Wasm Builders 🧱

Akmal Chaudhri
Akmal Chaudhri

Posted on • Updated on

Quick tip: Using WebAssembly to implement Linear Regression and Analysis of Variance in SingleStoreDB

Abstract

In this final part of the statistics series, we'll use WebAssembly to extend SingleStoreDB with Linear Regression and Analysis of Variance (ANOVA).

Introduction

This is the final part of the short statistics series of articles, and we'll wrap up with details of how to implement two well-known statistical analysis techniques in SingleStoreDB using WebAssembly.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.

Once we've created our database in the following steps, we'll make a note of our password and host name.

Create a Database

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS test;
Enter fullscreen mode Exit fullscreen mode

Setup local Wasm development environment

We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm tool.

Next, let's clone the following GitHub repo:

git clone https://github.com/singlestore-labs/singlestoredb-statistics
Enter fullscreen mode Exit fullscreen mode

Compile

We'll now change to the singlestoredb-statistics/linreg directory and build the code, as follows:

cd singlestoredb-statistics/linreg

cargo build --target wasm32-wasi --release
Enter fullscreen mode Exit fullscreen mode

Deploy

Once the code is built, we'll create an environment variable:

export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"
Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Next, we'll use pushwasm to load the Wasm modules into SingleStoreDB, one-by-one, in stages:

  • Analysis of Variance
pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm aov_init

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm aov_iter

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm aov_merge

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm aov_term
Enter fullscreen mode Exit fullscreen mode
  • Simple Linear Regression
pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm slr_init

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm slr_iter

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm slr_merge

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm slr_term
Enter fullscreen mode Exit fullscreen mode
  • Multiple Linear Regression
pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm mlr_init

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm mlr_iter

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm mlr_merge

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm mlr_term

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm mlr_terml
Enter fullscreen mode Exit fullscreen mode
  • Utility Functions
pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm vec_pack_f64

pushwasm --force $SINGLESTOREDB_CONNSTRING --wit ./linreg.wit ./target/wasm32-wasi/release/linreg.wasm vec_unpack_f64
Enter fullscreen mode Exit fullscreen mode

All the Wasm UDFs should be successfully created.

Load and run SQL

We'll use a MySQL CLI client to connect to SingleStoreDB:

mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p
Enter fullscreen mode Exit fullscreen mode

We'll replace the <host> with the value from our SingleStoreDB Cloud account.

Once connected, we'll switch to the test database:

USE test;
Enter fullscreen mode Exit fullscreen mode

We'll then execute the SQL statements from the linreg.sql file, as follows:

SOURCE linreg.sql
Enter fullscreen mode Exit fullscreen mode

This will create some additional functions.

Run Wasm in the database

For the next part, we'll need to download the bottle.csv file from the CalCOFI project at Kaggle. The downloaded zip file is approximately 50 MB and 257 MB when unpacked. We'll copy the unpacked CSV file to the singlestoredb-statistics/linreg/data directory.

Next, we'll execute the SQL statements from the schema.sql file, as follows:

SOURCE schema.sql
Enter fullscreen mode Exit fullscreen mode

This will create and load data into the fitness and bottle tables, and run some SQL statements. Further details of the tables and the following queries can be found in the schema.sql file in the GitHub repo.

1. Example Fitness Queries

  • Long output, full model
SELECT mlr(oxygen,vec_pack_f64([Age, Weight, RunTime, RestPulse, RunPulse, MaxPulse]))
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+--------------------------------------------------------------------------------------------------------------------------------------------------+
| mlr(oxygen,vec_pack_f64([Age, Weight, RunTime, RestPulse, RunPulse, MaxPulse]))                                                                  |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| [102.93447948433982,-0.22697379625619457,-0.07417741369123694,-2.6286528180843582,-0.021533639753211362,-0.36962775849034357,0.3032171294554114] |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Long output, small model
SELECT mlrl(oxygen,vec_pack_f64([Age, Weight]))
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mlrl(oxygen,vec_pack_f64([Age, Weight]))                                                                                                                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [31,3,128.24814438231112,723.13340045637744,851.38154483868857,2,28,30,2.4829084373909605,0.10169946158950849,77.290494127549152,-0.37041610365825456,-0.15823203989833429,13.854628842781686,0.18310073707407407,0.11457181170725468,5.5786766289172585,-2.0230180914476681,-1.3810730365566442,5.7196351554988922e-06,0.052713683614924456,0.17818046127909337] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Regression coefficients
SELECT mlr(oxygen,vec_pack_f64([Age, Weight]))
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+----------------------------------------------------------------+
| mlr(oxygen,vec_pack_f64([Age, Weight]))                        |
+----------------------------------------------------------------+
| [77.290494127549152,-0.37041610365825456,-0.15823203989833429] |
+----------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Simple Linear Regression
SELECT slr(oxygen,Age)
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-----------------------------------------------------------------------------------------------------------------------------------------+
| slr(oxygen,Age)                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------+
| {"b0":62.22063853036843,"b1":-0.31135980679392494,"n":31,"pvalue":0.09570086024288527,"r2":0.09277653134140218,"sse":772.3933182604706} |
+-----------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Regression coefficients
SELECT mlr(oxygen,vec_pack_f64([Age]))
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-------------------------------------------+
| mlr(oxygen,vec_pack_f64([Age]))           |
+-------------------------------------------+
| [62.220638530368433,-0.31135980679392494] |
+-------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Detailed output
SELECT mlrl(oxygen,vec_pack_f64([Age]))
FROM fitness;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mlrl(oxygen,vec_pack_f64([Age]))                                                                                                                                                                                                                                                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [31,2,78.988226578218018,772.39331826047055,851.38154483868857,1,29,30,2.9656633694439267,0.095700860242885266,62.220638530368433,-0.31135980679392494,8.6698326369370626,0.18080133071430227,7.1766827730079648,-1.7221101502064047,6.7106189272081451e-08,0.095700860242885266] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Simple Linear Regression with GROUP BY
SELECT slr(oxygen,Age)
FROM fitness
GROUP BY agegroup;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+-----------------------------------------------------------------------------------------------------------------------------------------+
| slr(oxygen,Age)                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------+
| {"b0":67.07355405406196,"b1":-0.3972297297298791,"n":12,"pvalue":0.5382534183098988,"r2":0.039028511587353654,"sse":239.5865736621429}  |
| {"b0":17.676813043480454,"b1":0.6239260869564743,"n":13,"pvalue":0.33971793419603324,"r2":0.08303100866375002,"sse":228.18461678696318} |
| {"b0":61.31558399999808,"b1":-0.24194399999995245,"n":6,"pvalue":0.8882375351410527,"r2":0.005572168395395094,"sse":217.63939676800314} |
+-----------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Multiple Linear Regression with GROUP BY
SELECT AgeGroup, mlr(oxygen,vec_pack_f64([Weight, RunPulse]))
FROM fitness
GROUP BY AgeGroup;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+----------+-----------------------------------------------------------------+
| AgeGroup | mlr(oxygen,vec_pack_f64([Weight, RunPulse]))                    |
+----------+-----------------------------------------------------------------+
| > 50     | [89.457181891040094,-0.061111943943463934,-0.23491808981133788] |
| 44-50    | [105.46359183230126,-0.12189672611963881,-0.28551453735354709]  |
| < 44     | [133.27903520507476,-0.10553676551069738,-0.42464015851549392]  |
+----------+-----------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode

2. Example Bottle Queries

  • Simple Linear Regression (Salinity as a function of T_degc)
SELECT slr(Salnty,T_degc)
FROM bottle
WHERE T_degc IS NOT NULL AND Salnty IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+--------------------------------------------------------------------------------------------------------------------------+
| slr(Salnty,T_degc)                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
| {"b0":34.44090915598341,"b1":-0.055207755972974094,"n":814247,"pvalue":0,"r2":0.255293714414324,"sse":129222.9813118148} |
+--------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode
  • Multiple Linear Regression (Salinity as a function of T_degc)
SELECT mlr(Salnty,vec_pack_f64([T_degc]))
FROM bottle
WHERE T_degc IS NOT NULL AND Salnty IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

+--------------------------------------------+
| mlr(Salnty,vec_pack_f64([T_degc]))         |
+--------------------------------------------+
| [34.440909155983412,-0.055207755972974094] |
+--------------------------------------------+
Enter fullscreen mode Exit fullscreen mode

Summary

In this article, we have seen how WebAssembly can provide significant new functionality to SingleStoreDB, allowing detailed data analysis without using external systems or exporting the data for use with third-party tools. We have also seen how powerful the Wasm functions can be through the example queries.

Acknowledgements

I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.

Top comments (0)