Abstract
Using WebAssembly, we can extend the capabilities of SingleStoreDB in many useful ways. In this article, we'll see how to implement the Chi-Square Test of Independence.
Introduction
In a series of short articles, we'll see how to extend SingleStoreDB with several statistical computations implemented in WebAssembly.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.
We'll also make a note of our password and host name, as we'll need both later when deploying the Wasm code.
Create a Database
In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS test;
Set up a local Wasm development environment
We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm tool.
Next, let's clone the following GitHub repo:
git clone https://github.com/singlestore-labs/singlestoredb-statistics
Compile
We'll now change to the singlestoredb-statistics/categorical directory and build the code, as follows:
cargo build --target wasm32-wasi --release
Deploy
Once the code is built, we'll create an environment variable:
export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"
We'll replace <password> and <host> with the values from our SingleStoreDB Cloud account.
Next, we'll use pushwasm to load the Wasm module's functions into SingleStoreDB as UDFs, one by one:
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_init
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_iter
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_merge
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_term
All the Wasm UDFs should be successfully created.
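The four commands above differ only in the value passed to --name. As a minimal sketch, the following Python script builds the same four command lines from a list of names. The helper name pushwasm_commands is hypothetical and is not part of pushwasm or the GitHub repo; the flags and paths are copied from the commands above. The script only prints the commands and does not run them.

```python
import os
import shlex

# The four UDF names used in the article; all are exported by the same Wasm module.
UDF_NAMES = ["chisq_init", "chisq_iter", "chisq_merge", "chisq_term"]


def pushwasm_commands(conn, names=UDF_NAMES):
    """Build one pushwasm argument list per UDF name (hypothetical helper).

    Only the flags and paths shown in the article are used.
    """
    return [
        [
            "pushwasm", "udf", "--force",
            "--conn", conn,
            "--wit", "./categorical.wit",
            "--wasm", "./target/wasm32-wasi/release/categorical.wasm",
            "--name", name,
        ]
        for name in names
    ]


if __name__ == "__main__":
    # Fall back to the placeholder connection string if the variable is not set.
    conn = os.environ.get(
        "SINGLESTOREDB_CONNSTRING",
        "mysql://admin:<password>@<host>:3306/test",
    )
    for cmd in pushwasm_commands(conn):
        # Print only; nothing is executed here.
        print(shlex.join(cmd))
```

To actually deploy, each printed line can be pasted into the shell from the singlestoredb-statistics/categorical directory.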
Load and run SQL
Next, from the SQL Editor in SingleStoreDB Cloud, we'll select the three vertical dots and choose the Load SQL File option, as shown in Figure 1.
We'll locate, choose and import the categorical.sql file from the GitHub repo. Once imported, and before running the SQL code in the editor, we'll ensure that we are using the correct database:
USE test;
Then we can select all the code and run it.
Along with some helper functions and calls to the Wasm modules, the code contains two main procedures:
- chisq_(): Chi-square test of independence for two classification variables
- chisq_grouped(): Chi-square test of independence for two classification variables when the data are already grouped
The code loads the following data into the employee_sat table, which contains three columns:
- EmpClass: Employee classification
- Opinion: Employee opinion
- Nij: The number of employees with a particular opinion in a particular classification
+---------------+--------------+------+
| EmpClass      | Opinion      | Nij  |
+---------------+--------------+------+
| Faculty       | Undecided    |   10 |
| Staff         | Favor        |   30 |
| Faculty       | Do not Favor |   50 |
| Administrator | Do not Favor |   25 |
| Administrator | Favor        |   10 |
| Staff         | Undecided    |   15 |
| Faculty       | Favor        |   40 |
| Staff         | Do not Favor |   15 |
| Administrator | Undecided    |    5 |
+---------------+--------------+------+
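Each row of this table is a grouped count: it stands for Nij individual employees who share the same classification and opinion. As a rough illustration of what "already grouped" means, the Python sketch below rebuilds raw, one-row-per-employee observations for two cells of the table and collapses them back into counts. The function name group_counts is hypothetical and is not part of the GitHub repo.

```python
from collections import Counter

# Raw observations, one tuple per employee, rebuilt for illustration
# from two cells of the table: Administrator/Undecided (5) and
# Administrator/Favor (10).
raw = [("Administrator", "Undecided")] * 5 + [("Administrator", "Favor")] * 10


def group_counts(observations):
    """Collapse (EmpClass, Opinion) pairs into (EmpClass, Opinion, Nij) rows."""
    counts = Counter(observations)
    return [(emp, opinion, n) for (emp, opinion), n in sorted(counts.items())]


for row in group_counts(raw):
    print(row)
```

Data in the raw form has no count column, whereas data in the grouped form, like employee_sat, carries the count in a column such as Nij.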
Run Wasm in the database
Since this is grouped data, we'll run chisq_grouped(), as follows:
ECHO chisq_grouped('employee_sat', 'EmpClass', 'Opinion', 'Nij');
The result should be similar to the following:
+--------------------------------------------------------------------+
| RESULT                                                             |
+--------------------------------------------------------------------+
| {"chisq":18.194444444444446,"df":4,"pvalue":0.0011306508216328837} |
+--------------------------------------------------------------------+
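As a sanity check, the same statistic can be computed by hand from the nine rows of employee_sat. The sketch below is an independent Python calculation, not the code that runs inside the Wasm module. It computes the expected count for each cell as (row total × column total) / grand total, sums (observed − expected)² / expected over all cells, and uses (rows − 1) × (columns − 1) degrees of freedom. The p-value uses a closed form that is valid only for an even number of degrees of freedom, which holds here since df is 4. The function names are hypothetical.

```python
import math
from collections import defaultdict

# The nine grouped rows from employee_sat: (EmpClass, Opinion, Nij).
ROWS = [
    ("Faculty", "Undecided", 10),
    ("Staff", "Favor", 30),
    ("Faculty", "Do not Favor", 50),
    ("Administrator", "Do not Favor", 25),
    ("Administrator", "Favor", 10),
    ("Staff", "Undecided", 15),
    ("Faculty", "Favor", 40),
    ("Staff", "Do not Favor", 15),
    ("Administrator", "Undecided", 5),
]


def chi_square_independence(rows):
    """Return (chisq, df) for grouped (row_label, col_label, count) data."""
    row_tot = defaultdict(float)
    col_tot = defaultdict(float)
    total = 0.0
    for r, c, n in rows:
        row_tot[r] += n
        col_tot[c] += n
        total += n
    observed = {(r, c): n for r, c, n in rows}
    chisq = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / total
            chisq += (observed.get((r, c), 0) - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return chisq, df


def chisq_pvalue_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    series = 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


chisq, df = chi_square_independence(ROWS)
pvalue = chisq_pvalue_even_df(chisq, df)
print({"chisq": chisq, "df": df, "pvalue": pvalue})
```

The printed values should agree with the database result above to several decimal places. Small differences in the last digits are possible because the two calculations may sum terms in a different order.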
Summary
In this example, we extended SingleStoreDB with Wasm UDFs and used them to run a Chi-Square Test of Independence on grouped data directly in the database.
Acknowledgements
I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.