This post was originally published on my Medium blog and was later picked up for publication by DevGenius.
In the very first post of this series, we walked through the motivations behind the creation of WebAssembly and snuck in a quick peek at what a simple Hello, world program looks like in WASM.
Note: If you wish to try this out yourself, head over to webassembly.studio. Create and build a Hello World project in C. When you build the project, a .wat file will be generated - a snippet of which was shared in the last post.
To be honest, the generated .wat file (an abbreviation of WebAssembly Text) wasn't that intimidating to me, thanks to my familiarity with assembly language. However, having dabbled in high-level languages, I am well aware that a language stripped of all the comfortable abstractions can be daunting to learn. Therefore, the next couple of posts in this series aim to familiarize the reader with the basic representations of WASM: the binary and textual formats.
Let's take a step back here and understand the motivation behind having two different formats. As mentioned, the generated WASM code has a binary and a textual format. They aren't exactly the most human-friendly languages in existence (unless you're familiar with assembly language, of course). Therefore,
to enable WebAssembly to be read & edited by humans there is a textual representation of the wasm binary format. This is an intermediate form designed to be exposed in text editors, browser developer tools, etc.
This post aims to briefly cover the nitty-gritty of the binary format generated in the .wasm file, for those interested.
Heard of the Abstract Syntax Tree (abbreviated as AST)? S-expressions can be very loosely likened to the AST in the sense that they know nothing about the programming language syntax but are a way to represent the same tree-structured data.
But how is this relevant? The fundamental unit of code in both the binary and textual formats is the module, and it is one big s-expression. Let's take a very simple piece of code as an example. Simpler than Hello World, since WebAssembly does not have a type to return strings…yet. How we managed to do so in the very first post is an advanced topic that I hope to cover in the next couple of posts. But this means that currently, we only have numeric types to play with for writing a simple program. Therefore, we'll write a simple program that returns a number instead of the standard Hello World.
Full disclosure before I proceed. The example I have used here is from the website: https://blog.ttulka.com/learning-webassembly-2-wasm-binary-format.
Copy and paste the snippet below on this web utility & click Download.
(module (func (export "main") (result i32) i32.const 42 return))
You'll get a .wasm file on your machine that isn't human-readable: it is a binary file, so a text editor shows it as a jumble of mostly unprintable characters. You will need to view it with a hex viewer. I dumped the .wasm file using the Format-Hex cmdlet in Windows PowerShell.
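If PowerShell is not at hand, a few lines of Python give a comparable dump. This is only a sketch: `hex_dump` is a hypothetical helper name, and the 38 bytes below are an assumption about what a standard assembler produces for the one-function module above, not a copy of the Format-Hex output.

```python
# Assumed bytes of the assembled module; in practice you would read the
# downloaded file instead, e.g. open("main.wasm", "rb").read()
# (the file name is hypothetical).
EXAMPLE_MODULE = bytes.fromhex(
    "0061736d01000000"      # magic number and version
    "0105016000017f"        # type section
    "03020100"              # function section
    "070801046d61696e0000"  # export section ("main")
    "0a07010500412a0f0b"    # code section
)


def hex_dump(data, width=16):
    """Return one text line per `width` bytes: an offset, then the bytes in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return lines


for line in hex_dump(EXAMPLE_MODULE):
    print(line)
```

Each printed line starts with the offset of its first byte, which is how the byte positions quoted later in this post (0A, 0F, and so on) can be located.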
Now I know the above output looks a bit weird, but stick with me.
If you look at the very first line, the first four bytes are 0061 736D. This corresponds to the WASM binary magic number. It translates to \0asm and identifies the binary as a WASM binary. The next four bytes, 0100 0000, represent the WASM version. Yes, we're still at version 1!
But is there anything in the binary format besides the module, and what on earth are those trailing numbers?
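To make the header concrete, here is a minimal sketch in Python that checks the eight header bytes quoted above. Reading the version bytes as a little-endian 32-bit integer is an interpretation that happens to give the familiar value 1.

```python
import struct

# The first eight bytes of the .wasm file, as quoted in the post.
header = bytes.fromhex("0061736d01000000")

magic = header[:4]
# Interpret the four version bytes as a little-endian unsigned 32-bit integer.
(version,) = struct.unpack("<I", header[4:8])

print(magic)    # b'\x00asm'
print(version)  # 1
```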
Every module is organized into various sections in the binary format. Each section consists of the following:
A one-byte section ID
The section size in bytes, stored as a variable-length (LEB128) unsigned integer
The actual content
Depending on the kind of section, the following section IDs are used:
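As a rough sketch of how this layout can be walked in code, the snippet below lists the section IDs defined by the WebAssembly core specification and steps through a module one section at a time. The helper names (`read_u32_leb128`, `walk_sections`) are hypothetical, and `EXAMPLE_MODULE` is an assumed byte-for-byte assembly of our one-function example rather than a dump taken from this post.

```python
# Section IDs from the WebAssembly core specification
# (0 to 11 date from version 1; 12 was added later).
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}

# Assumed bytes of the example module.
EXAMPLE_MODULE = bytes.fromhex(
    "0061736d01000000"      # magic number and version
    "0105016000017f"        # type section
    "03020100"              # function section
    "070801046d61696e0000"  # export section ("main")
    "0a07010500412a0f0b"    # code section
)


def read_u32_leb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def walk_sections(module):
    """Return (offset, section id, size, content) for every section."""
    pos = 8  # skip the 8-byte header (magic number + version)
    sections = []
    while pos < len(module):
        section_id = module[pos]
        size, content_start = read_u32_leb128(module, pos + 1)
        content = module[content_start:content_start + size]
        sections.append((pos, section_id, size, content))
        pos = content_start + size
    return sections


for offset, section_id, size, _ in walk_sections(EXAMPLE_MODULE):
    name = SECTION_NAMES[section_id]
    print(f"offset 0x{offset:02X}: id {section_id} ({name}), size {size}")
```

For the assumed bytes, this reports a type section at offset 0x08, a function section at 0x0F, an export section at 0x13, and a code section at 0x1D.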
To understand the above better, let us go back to our example.
On the very first line, we see a lot of bytes other than the magic number and version. That, dear reader, is the beginning of a section. The very first byte after the header gives the ID, 01, and identifies it as a Type section. The next byte gives us the size, as mentioned above. In our example, the size is 05, meaning the section's content runs for a further five bytes, after which the section ends. Those five bytes define how many function types (signatures) the module declares and, for each one, what it takes as parameters and what it returns. Standard programming stuff, but in binary format. The sequence is detailed below.
Number of function types (byte 0A): 01
Function type marker (byte 0B): 60
Number of input parameters (byte 0C): 00
Number of output parameters (byte 0D): 01
Result type, i32 (byte 0E): 7F
Our type section decodes into a vector of function types, each of which maps a vector of parameters to a vector of results. For our example, that is a single function type that takes no parameters and returns one i32. The image is what it boils down to.
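The same decoding can be sketched in a few lines of Python. This is a simplified illustration: `decode_type_section` is a hypothetical helper, and it assumes every count fits in a single byte (true for values below 128, as in our example), whereas a full decoder would read the counts as LEB128 integers.

```python
# Byte codes for the four basic numeric value types.
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def decode_type_section(content):
    """Decode type-section content into a list of (params, results) pairs."""
    pos = 0
    type_count = content[pos]  # assumption: the count fits in one byte
    pos += 1
    function_types = []
    for _ in range(type_count):
        if content[pos] != 0x60:
            raise ValueError("expected the function type marker 0x60")
        pos += 1
        param_count = content[pos]
        pos += 1
        params = [VALUE_TYPES[b] for b in content[pos:pos + param_count]]
        pos += param_count
        result_count = content[pos]
        pos += 1
        results = [VALUE_TYPES[b] for b in content[pos:pos + result_count]]
        pos += result_count
        function_types.append((params, results))
    return function_types


# The five content bytes of the type section (offsets 0A to 0E).
type_section = bytes.fromhex("016000017f")
print(decode_type_section(type_section))  # [([], ['i32'])]
```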
If we look at the next couple of bytes, they signal the start of a new section with ID 3. Per the table above, this is the "function" section, with a section size of two bytes.
**Beginning of "function" section (section ID: 3)**
Section ID (byte 0F): 03
Section size (byte 10): 02
Number of functions (byte 11): 01
Type index of the function (byte 12): 00
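In code, the function section is little more than a list of indices into the type section. A minimal sketch, again assuming single-byte counts and indices, with `decode_function_section` as a hypothetical helper name:

```python
def decode_function_section(content):
    """Return the type index of each function declared in the section."""
    function_count = content[0]  # assumption: the count fits in one byte
    return list(content[1:1 + function_count])


# The two content bytes of the function section (offsets 11 and 12).
function_section = bytes.fromhex("0100")
print(decode_function_section(function_section))  # [0]
```

The single entry, 0, says that our one function uses type 0, the first (and only) entry of the type section.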
Details of further sections have been described succinctly in this blog, and I don't think I can add anything to it. However, I'd like to leave you with an image of the pseudocode published on the blog for what our simple code translates to.
So that's it for this post. This, hopefully, gave you a brief idea of WebAssembly's binary format without scaring you off into oblivion. In the next edition of this series, we will look at the text format for WebAssembly.
To stay updated with my latest tech shenanigans, do follow me on Twitter and LinkedIn. I also write a weekly newsletter, friday four, where I cover all the interesting goings-on in the world of tech as a highlight reel. Do consider subscribing if you feel like this is something up your alley :)