Wasm Builders 🧱

Akmal Chaudhri
Akmal Chaudhri

Posted on • Updated on

Quick tip: Using WebAssembly to implement Pearson Product-Moment Correlation in SingleStoreDB

Abstract

In this short article, we'll use WebAssembly to extend SingleStoreDB with Pearson Product-Moment Correlation.

Introduction

Continuing our short series of articles on statistical computations, we'll focus on another example and see how we can utilise WebAssembly with SingleStoreDB.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.

Once we've created our database in the following steps, we'll make a note of our password and host name.

Create a Database

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS test;
Enter fullscreen mode Exit fullscreen mode

Setup local Wasm development environment

We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm tool.

Next, let's clone the following GitHub repo:

git clone https://github.com/singlestore-labs/singlestoredb-statistics
Enter fullscreen mode Exit fullscreen mode

Compile

We'll now change to the singlestoredb-statistics/correlation directory and build the code, as follows:

cd singlestoredb-statistics/correlation

cargo build --target wasm32-wasi --release
Enter fullscreen mode Exit fullscreen mode

Deploy

Once the code is built, we'll create an environment variable:

export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"
Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Next, we'll use pushwasm to load the Wasm modules into SingleStoreDB, one-by-one:

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corr2_init

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corr2_iter

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corr2_merge

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corr2_term

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corr2_termd

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corrmat_init

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corrmat_iter

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corrmat_merge

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm corrmat_term

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm vec_pack_f64

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit ./target/wasm32-wasi/release/correlation.wasm vec_unpack_f64
Enter fullscreen mode Exit fullscreen mode

The output should be as follows:

Wasm UDF 'corr2_init' was created successfully.

Wasm UDF 'corr2_iter' was created successfully.

Wasm UDF 'corr2_merge' was created successfully.

Wasm UDF 'corr2_term' was created successfully.

Wasm UDF 'corr2_termd' was created successfully.

Wasm UDF 'corrmat_init' was created successfully.

Wasm UDF 'corrmat_iter' was created successfully.

Wasm UDF 'corrmat_merge' was created successfully.

Wasm UDF 'corrmat_term' was created successfully.

Wasm UDF 'vec_pack_f64' was created successfully.

Wasm UDF 'vec_unpack_f64' was created successfully.
Enter fullscreen mode Exit fullscreen mode

Load and run SQL

We'll use a MySQL CLI client to connect to SingleStoreDB:

mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p
Enter fullscreen mode Exit fullscreen mode

We'll replace the <host> with the value from our SingleStoreDB Cloud account.

Once connected, we'll switch to the test database:

USE test;
Enter fullscreen mode Exit fullscreen mode

We'll then execute the SQL statements from the correlation.sql file, as follows:

SOURCE correlation.sql
Enter fullscreen mode Exit fullscreen mode

This will create and load data into the iris table, and create some additional functions.

We can quickly check the table:

SELECT * FROM iris LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+--------------+-------------+--------------+-------------+---------+
| sepal_length | sepal_width | petal_length | petal_width | species |
+--------------+-------------+--------------+-------------+---------+
|          4.7 |         3.2 |          1.3 |         0.2 | setosa  |
|            5 |         3.4 |          1.5 |         0.2 | setosa  |
|          5.4 |         3.9 |          1.3 |         0.4 | setosa  |
|          5.4 |         3.4 |          1.7 |         0.2 | setosa  |
|            5 |           3 |          1.6 |         0.2 | setosa  |
+--------------+-------------+--------------+-------------+---------+
Enter fullscreen mode Exit fullscreen mode

There are two aggregate functions:

  1. corr2d(): Returns the correlation coefficient between two variables
  2. corr2(): Returns more details of the relationship between two variables, including linear regression

Run Wasm in the database

We can test the Wasm functions with some examples from the GitHub repo.

  • Compute the correlation between sepal_width and sepal_length.
SELECT species, corr2d(sepal_width, sepal_length)
FROM iris
GROUP BY species;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+------------+-----------------------------------+
| species    | corr2d(sepal_width, sepal_length) |
+------------+-----------------------------------+
| virginica  |                0.4572278163941236 |
| versicolor |                 0.525910717282782 |
| setosa     |                0.7467803732639016 |
+------------+-----------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Compute the correlation between sepal_width and sepal_length and return the results as JSON.
SELECT species, corr2(sepal_width, sepal_length)
FROM iris
GROUP BY species;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| virginica  | {"b0":3.9068364663867405,"b1":0.9015344766688834,"corr":0.4572278163941236,"n":50,"nmiss":0,"r2":0.2090572760845384,"sse":15.670790000392019,"x_avg":2.9739999999999993,"y_avg":6.587999999999999} |
| versicolor | {"b0":3.53973471502592,"b1":0.8650777202072483,"corr":0.525910717282782,"n":50,"nmiss":0,"r2":0.2765820825528903,"sse":9.444365595856668,"x_avg":2.7700000000000005,"y_avg":5.935999999999998}     |
| setosa     | {"b0":2.6446596755601486,"b1":0.6908543956816411,"corr":0.7467803732639016,"n":50,"nmiss":0,"r2":0.5576809258921721,"sse":2.6929269869833408,"x_avg":3.418000000000002,"y_avg":5.005999999999999}  |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Compute the matrix of correlation coefficients between sepal_length, sepal_width, and petal_width.
SELECT corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width]))
FROM iris;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-----------------------------------------------------------------------+
| corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width]))       |
+-----------------------------------------------------------------------+
| [1,-0.10936924995068921,1,0.81795363336916715,-0.35654408961381734,1] |
+-----------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode

Summary

In this short article, we have seen how we can extend SingleStoreDB with Wasm to provide support for statistical correlation.

Acknowledgements

I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.

Latest comments (0)