Wasm Builders

Divya Mohan


Part #1 : How to read WASM and maybe write it too.

This post was originally published on my Medium blog and was later picked up for publication by DevGenius.

In the very first post of this series, we walked through the motivations behind the creation of WebAssembly and snuck in a quick peek at what a simple Hello, world program looks like in WASM.

Note: If you wish to try this out yourself, head over to webassembly.studio. Create and build a Hello World project in C. When you build the project, a .wat file will be generated - a snippet of which was shared in the last post.

To be honest, the generated .wat file (short for WebAssembly Text) wasn't that intimidating to me on account of my familiarity with assembly language. However, having dabbled in high-level languages, I am very aware that a language stripped of all the comfortable abstractions can be daunting to learn. Therefore, the next couple of posts in this series aim to familiarize the reader with the two basic representations of WASM: the binary and the textual format.


Why are there two formats?

Let's take a step back here and understand the motivation behind having two different formats. As mentioned earlier, WASM code has a binary and a textual format. Neither is exactly the most human-friendly language to exist (unless you're familiar with assembly language, of course), and the binary format is the harder of the two to read. Therefore, to enable WebAssembly to be read and edited by humans, there is a textual representation of the wasm binary format. This is an intermediate form designed to be exposed in text editors, browser developer tools, etc.

For those interested, this post briefly covers the nitty-gritty of the binary format found in the .wasm file.

The Module

Heard of the abstract syntax tree (abbreviated as AST)? S-expressions can be very loosely likened to the AST in the sense that they know nothing about a programming language's syntax but are a way to represent the same tree-structured data.
But how is this relevant? The fundamental unit of code in both the binary and textual formats is the module, and in the text format it is written as one big s-expression. Let's take a very simple piece of code as an example. Simpler than Hello World, since WebAssembly does not have a type to return strings…yet. How we managed to do so in the very first post is an advanced topic that I hope to cover in the next couple of posts. But this means that currently, we only have numeric types to play with for writing a simple program. Therefore, we'll write a simple program that returns a number instead of the standard Hello World.
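To make the "tree-structured data" idea concrete, here is a minimal Python sketch that turns an s-expression into nested lists. It uses the small module this post works with, written on one line. The `tokenize` and `parse` helpers are names I made up for illustration, and the tokenizer is deliberately naive: it would break on strings containing spaces or parentheses, which a real WAT parser has to handle.

```python
def tokenize(text: str) -> list[str]:
    # Pad parentheses with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]):
    """Turn a flat token list into nested lists, one list per s-expression."""
    token = tokens.pop(0)
    if token != "(":
        return token  # an atom such as `module` or `i32.const`
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # discard the closing ")"
    return node


source = '(module (func (export "main") (result i32) i32.const 42 return))'
tree = parse(tokenize(source))
print(tree)
```

Each pair of parentheses becomes one nested list, so the module is the root of the tree and the function is its only child.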

Full disclosure before I proceed. The example I have used here is from the website: https://blog.ttulka.com/learning-webassembly-2-wasm-binary-format.

Copy and paste the snippet below into this web utility and click Download.

(module
  (func (export "main") 
        (result i32)
    i32.const 42
    return))

You'll get a .wasm file on your machine that isn't human-readable: it is raw binary data, so a text editor will show mostly unprintable characters. You will need to view it with a hex viewer. I dumped the .wasm file using the Format-Hex cmdlet in Windows PowerShell.

The format-hex command

The output
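If you aren't on Windows, a few lines of Python can produce a similar dump. This is only a sketch: the bytes below are hand-assembled from the binary format for the module above, so treat them as an assumption, since the file you download may contain extra sections (a custom "name" section, for instance). `WASM_BYTES` and `hex_dump` are names I made up for illustration.

```python
# Hand-assembled bytes for the example module (an assumption: a real
# toolchain may emit additional sections).
WASM_BYTES = bytes.fromhex(
    "0061736d" "01000000"          # magic number and version
    "01050160" "00017f"            # type section
    "03020100"                     # function section
    "07080104" "6d61696e" "0000"   # export section ("main")
    "0a070105" "0041" "2a0f0b"     # code section
)


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk).ljust(width * 3)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part} {text_part}")
    return "\n".join(lines)


print(hex_dump(WASM_BYTES))
```

To dump a real file instead, read it with `open(path, "rb").read()`, where `path` is wherever your download was saved.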

Now I know the above output looks a bit weird, but stick with me.
If you look at the very first line, the first four bytes are 0061 736D. This is the WASM binary magic number. It translates to \0asm and identifies the file as a WASM binary. The next four bytes, 0100 0000, represent the WASM version, stored as a little-endian 32-bit integer. Yes, we're still at version 1!
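Here is a small Python sketch of that check. `read_header` is a hypothetical helper of my own, not part of any WebAssembly tooling. It also shows why version 1 appears as 01 00 00 00: the least significant byte comes first.

```python
import struct

WASM_MAGIC = b"\x00asm"  # the bytes 00 61 73 6D


def read_header(data: bytes) -> int:
    """Validate the magic number and return the version."""
    if len(data) < 8:
        raise ValueError("too short to be a wasm module")
    if data[:4] != WASM_MAGIC:
        raise ValueError("missing \\0asm magic number")
    # The version is a 4-byte little-endian unsigned integer.
    (version,) = struct.unpack("<I", data[4:8])
    return version


header = bytes.fromhex("0061736d01000000")
print(read_header(header))  # → 1
```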

But is there anything else in the binary format besides the magic number and version, and what on earth are those trailing numbers?

The section

Every module is organized into various sections in the binary format. Each section consists of the following:
A one-byte section ID
The section size
The actual content
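The three-part layout above can be walked with a short loop. One detail worth knowing: the section size is stored as an unsigned LEB128 integer, so it can span more than one byte for larger sections. The sketch below is a minimal reader, with helper names of my own choosing, run against the header plus the first two sections of our example.

```python
def read_u32_leb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:              # high bit clear marks the last byte
            return result, pos
        shift += 7


def iter_sections(module: bytes):
    """Yield (section_id, content) pairs, skipping the 8-byte header."""
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32_leb128(module, pos + 1)
        yield section_id, module[pos:pos + size]
        pos += size


# Header plus the type and function sections of the example module.
module = bytes.fromhex("0061736d01000000" "01050160" "00017f" "03020100")
for section_id, content in iter_sections(module):
    print(section_id, content.hex(" "))
```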

Depending on the kind of section, the following section IDs are used:

Credit: https://webassembly.github.io/spec/core/binary/modules.html#sections
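For quick reference while reading a hex dump, the section IDs from the specification can be written down as a Python enum. The member names are my own spelling, and the data count section (ID 12) only exists in newer versions of the spec, so treat this as a convenience sketch rather than an official API.

```python
from enum import IntEnum


class SectionId(IntEnum):
    CUSTOM = 0
    TYPE = 1
    IMPORT = 2
    FUNCTION = 3
    TABLE = 4
    MEMORY = 5
    GLOBAL = 6
    EXPORT = 7
    START = 8
    ELEMENT = 9
    CODE = 10
    DATA = 11
    DATA_COUNT = 12  # added after the first version of the spec


# Look up the two section IDs discussed in this post.
print(SectionId(1).name, SectionId(3).name)
```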

To understand the above better, let us go back to our example.
On the very first line, we see more bytes after the magic number and version. That, dear reader, is the beginning of a section. The very first byte (at offset 08) gives the ID, 01, and identifies it as a Type section. The next byte gives us the section size. In our example, the size is 05, meaning the section's content occupies the next five bytes, after which the section ends. Those five bytes declare the function signatures used in the module: how many types there are, a marker saying that the type is a function type, how many parameters the function takes, how many results it returns, and the type of each result. Standard programming stuff, but in binary format. The sequence is detailed below.

Number of types (function signatures)
Byte 0A: 01
Function type marker
Byte 0B: 60
Number of parameters
Byte 0C: 00
Number of results
Byte 0D: 01
Result type (i32)
Byte 0E: 7F

Our type section decodes into a vector of function types, each of which maps a vector of parameter types to a vector of result types. For our example, it boils down to a single function type with no parameters and one i32 result: [] -> [i32].
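The same decoding can be sketched in Python. `decode_type_section` is a hypothetical helper of mine, and it takes a shortcut: counts are read as single bytes, which only works for values below 128 (the real encoding is LEB128). It is fed the five content bytes of our type section.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def decode_type_section(content: bytes) -> list[tuple[list[str], list[str]]]:
    """Decode type-section content into (params, results) pairs.

    Simplification: counts are single bytes (values below 128 only).
    """
    pos = 0
    count = content[pos]
    pos += 1
    types = []
    for _ in range(count):
        if content[pos] != 0x60:
            raise ValueError("expected function type marker 0x60")
        pos += 1
        signature = []
        for _ in range(2):  # first the parameter vector, then the result vector
            n = content[pos]
            pos += 1
            signature.append([VALUE_TYPES[b] for b in content[pos:pos + n]])
            pos += n
        types.append((signature[0], signature[1]))
    return types


print(decode_type_section(bytes.fromhex("016000017f")))  # → [([], ['i32'])]
```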

If we look at the next couple of bytes, they signal the start of a new section with ID 3. Per the table above, this is the "function" section, with a section size of two bytes.

Beginning of "function" section (section ID: 3)
Byte 0F: 03
Section size
Byte 10: 02
Number of functions
Byte 11: 01
Type index of the function (an index into the type section)
Byte 12: 00
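In other words, the function section doesn't hold any code. It only records, for each function, which entry of the type section describes its signature. A minimal Python sketch of that lookup follows; the helper name is my own, and counts and indices are assumed to fit in a single byte.

```python
def decode_function_section(content: bytes) -> list[int]:
    """Return the type index of each function, in function-index order.

    Simplification: single-byte count and indices (values below 128 only).
    """
    count = content[0]
    return list(content[1:1 + count])


# Signature decoded from the type section of the example: [] -> [i32].
types = [([], ["i32"])]
type_indices = decode_function_section(bytes.fromhex("0100"))
for func_index, type_index in enumerate(type_indices):
    params, results = types[type_index]
    print(f"func {func_index}: type {type_index}, {params} -> {results}")
```

So function 0, our exported main, uses type 0: no parameters and a single i32 result.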

The remaining sections are described succinctly in this blog, and I don't think I can add anything to that. However, I'd like to leave you with an image of the pseudocode published on the blog, showing what our simple code translates to.

Credit: https://blog.ttulka.com/learning-webassembly-2-wasm-binary-format

So that's it for this post. This, hopefully, gave you a brief idea of WebAssembly's binary format without scaring you off into oblivion. In the next edition of this series, we will look at the text format for WebAssembly.

To stay updated with my latest tech shenanigans, do follow me on Twitter and LinkedIn. I also write a weekly newsletter, friday four, where I cover all the interesting goings-on in the world of tech as a highlight reel. Do consider subscribing if you feel like this is something up your alley :)

Discussion (2)

Zachary Stauber

Thank you for your articles on WebAssembly. I did want to note that webassembly.studio is no longer around. Someone said they found a fork of the GitHub code running at webassembly-studio.kamenokosoft.com/ but since the GitHub code has not been updated in over a year, it is probably the same thing.

Divya Mohan Author

Yeah, it is no longer around :) While I was experimenting with WebAssembly in December 21 - January 22, it was very much in place. But thank you for flagging that and thank you for following :)