datasette-llm-embed
Datasette plugin that adds an llm_embed(model_id, text) SQL function.
Installation
datasette install datasette-llm-embed
Usage
The plugin adds a SQL function that can be called like this:
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as datasette-faiss.
The models need to be installed using LLM plugins such as llm-sentence-transformers.
Use llm_embed_cosine(a, b) to calculate cosine similarity between two vector blobs:
select llm_embed_cosine(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
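For reference, the calculation behind a cosine similarity function like this can be sketched in plain Python. This is an illustration rather than the plugin's own code: it assumes each blob is a packed sequence of little-endian 32-bit floats (which matches how the LLM library encodes embeddings, but is not stated in this README), and the helper names decode_blob and cosine_similarity are hypothetical.

```python
import math
import struct


def decode_blob(blob):
    """Unpack a blob of little-endian float32 values (assumed format)."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack("<" + "f" * count, blob))


def cosine_similarity(a, b):
    """Cosine similarity between two vector blobs of equal length."""
    va, vb = decode_blob(a), decode_blob(b)
    if len(va) != len(vb):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so higher values indicate more similar text.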
The llm_embed_decode() function can be used to decode a binary BLOB into a JSON array of floats:
select llm_embed_decode(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
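If you need to decode a blob outside of SQL, the same conversion can be approximated in Python. The sketch below assumes the blob holds little-endian 32-bit floats, which is an assumption about the storage format and not something this README specifies; encode_vector and decode_to_json are hypothetical helper names used only for illustration.

```python
import json
import struct


def encode_vector(values):
    """Pack a sequence of floats as little-endian float32 (assumed format)."""
    return struct.pack("<" + "f" * len(values), *values)


def decode_to_json(blob):
    """Turn a float32 blob into a JSON array of floats."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length must be a multiple of 4 bytes")
    values = struct.unpack("<" + "f" * (len(blob) // 4), blob)
    return json.dumps(list(values))


# Round trip: encode a small vector, then decode it back to JSON text
blob = encode_vector([1.0, 0.5, -2.0])
print(decode_to_json(blob))
```

Because values are stored as 32-bit floats, numbers that cannot be represented exactly in that precision will come back slightly rounded.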
Models that require API keys
If your embedding model needs an API key - for example the ada-002 model from OpenAI - you can configure that key in metadata.yml (or JSON) like this:
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
Each entry under keys must be named with the full model ID, not an alias.
You can then set the OPENAI_API_KEY environment variable to the key you want to use before starting Datasette:
export OPENAI_API_KEY=sk-1234567890
Once configured, calls like this will use the API key that has been provided:
select llm_embed('ada-002', 'This is some text')
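To make the lookup concrete, here is a minimal Python sketch of how a keys mapping like the one above could be resolved for a given model ID. It is not the plugin's implementation: Datasette handles $env references in plugin configuration itself, and the functions resolve_key and key_for_model are hypothetical stand-ins that only illustrate the behavior described in this section.

```python
import os


def resolve_key(value, environ=os.environ):
    """Return a literal key, or look up a {"$env": NAME} reference."""
    if isinstance(value, dict) and "$env" in value:
        # Missing environment variables resolve to None
        return environ.get(value["$env"])
    return value


def key_for_model(plugin_config, model_id, environ=os.environ):
    """Find the API key configured for an exact model ID, if any."""
    keys = (plugin_config or {}).get("keys") or {}
    # Exact match on the model ID: an alias would not be found here
    return resolve_key(keys.get(model_id), environ)


# Mirrors the metadata.yml example above, with a stand-in environment
config = {"keys": {"ada-002": {"$env": "OPENAI_API_KEY"}}}
env = {"OPENAI_API_KEY": "sk-1234567890"}
print(key_for_model(config, "ada-002", env))
```

In this sketch a model ID with no matching entry returns None, which is one way to see why the configuration has to use the full model ID.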
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
```bash
pytest
```