Matt Butcher
Serverless AI Inferencing with Python and Wasm

In a recent article entitled Serverless AI Inferencing with Python and Wasm, Tim McCallum walks through the process of writing a new AI application with Spin, Python, and LLaMa2-Chat.

The basic process is as follows:

  • Create a new app with spin new http-py
  • Add ai_models = ["llama2-chat"] to the spin.toml file
  • Write some code (which I'll show below)
  • Build it with spin build
  • Deploy it with spin deploy

Tim's write-up is super simple and easy to follow. You can be writing your first AI app in no time.

Here's the code Tim added to the scaffolded app.py file:

from spin_http import Response
from spin_llm import llm_infer
import json
import re

PROMPT = """<<SYS>>
You are a bot that generates sentiment analysis responses. Respond with a single positive, negative, or neutral.
Follow the pattern of the following examples:

User: Hi, my name is Bob
Bot: neutral

User: I am so happy today
Bot: positive

User: I am so sad today
Bot: negative

User: """

def handle_request(request):
    request_body = json.loads(request.body)
    sentence = request_body["sentence"].strip()
    result = llm_infer("llama2-chat", PROMPT + sentence)
    # Strip the "\nBot: " prefix that the model emits before the label.
    response_body = json.dumps({"sentence": re.sub(r"\nBot: ", "", result.text)})
    return Response(
        200, {"content-type": "application/json"}, bytes(response_body, "utf-8")
    )
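
The post-processing step in that handler can be tried out without Spin or a model. Here is a minimal sketch, assuming the model replies with a leading newline and "Bot: " before the label, as the prompt pattern suggests. The helper names clean_label and build_body are hypothetical (they are not part of the Spin SDK), the sample string is made up for illustration, and the extra .strip() is an addition that is not in Tim's code.

```python
import json
import re


def clean_label(model_text):
    # Drop the "\nBot: " prefix the prompt pattern leads the model to emit.
    # The trailing .strip() is an addition for this sketch, not in the original.
    return re.sub(r"\nBot: ", "", model_text).strip()


def build_body(model_text):
    # Mirror the handler: wrap the cleaned label in JSON, encode as UTF-8 bytes.
    return bytes(json.dumps({"sentence": clean_label(model_text)}), "utf-8")


# Hypothetical model output, for illustration only.
sample = "\nBot: positive"
print(build_body(sample))
```

Keeping the cleanup in a small function like this makes it easy to check how the regular expression behaves on different model replies before building and deploying.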

That's it! The spin deploy command will give you a URL to test. You can use curl to send JSON requests to it:

$ curl -X POST --data '{"sentence":"Everything is awesome!"}' https://sentiment-analysis-abc-xyz.fermyon.app/

{"sentence": "positive"}
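
The same request can be sent from Python instead of curl. The following is a rough sketch using only the standard library. The function names build_request and analyze are hypothetical, the URL is the placeholder from the curl example (use the one that spin deploy prints), and the explicit Content-Type header is an assumption rather than something the app is known to require.

```python
import json
import urllib.request


def build_request(url, sentence):
    # Same payload as the curl example: a JSON object with a "sentence" key.
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        # Assumption: the curl example does not set this header explicitly.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(url, sentence):
    # Send the request and decode the JSON reply from the deployed app.
    with urllib.request.urlopen(build_request(url, sentence)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder URL from the curl example; nothing is sent here.
req = build_request(
    "https://sentiment-analysis-abc-xyz.fermyon.app/", "Everything is awesome!"
)
print(req.get_method(), req.data)
```

Calling analyze with a real deployment URL and a sentence should return the parsed JSON reply as a Python dict.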

If you're new to Spin, you may want to start with the Spin quickstart guide. Spin is open source, and you can check out the code for it on GitHub.

