Find stories in data


Annotated version of this introductory video

Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.

Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 46 tools and 157 plugins dedicated to making working with structured data as productive as possible.

Try a demo and explore 33,000 power plants around the world, then follow the tutorial or take a look at some other examples of Datasette in action.

Then read how to get started with Datasette, subscribe to the monthly-ish newsletter and consider signing up for office hours for an in-person conversation about the project.

New: Datasette Desktop - a macOS desktop application for easily running Datasette on your own computer!

Exploratory data analysis

Import data from CSVs, JSON, database connections and more. Datasette will automatically show you patterns in your data and help you share your findings with your colleagues.
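As a rough illustration of the import step, the sketch below loads CSV text into a SQLite table using only the Python standard library. The sample data, the `plants` table name and the `load_csv` helper are hypothetical; tools in the Datasette ecosystem such as sqlite-utils handle this more thoroughly (type detection, large files and so on).

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """name,country,capacity_mw
Plant A,France,1200
Plant B,Brazil,850
"""


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    with conn:  # commit the inserts together
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()[0]


# Use a file path such as "plants.db" instead of ":memory:" to keep the data.
conn = sqlite3.connect(":memory:")
row_count = load_csv(conn, "plants", SAMPLE_CSV)
```

Saved to a file, the resulting database can be opened with `datasette plants.db`.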

Instant data publishing

datasette publish lets you instantly publish your data to hosting providers like Google Cloud Run, Heroku or Vercel.

Rapid prototyping

Spin up a JSON API for any data in minutes. Use it to prototype and prove your ideas without building a custom backend.
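To give a feel for the JSON API, here is a minimal sketch that builds the URL for a table's JSON endpoint. The `table_json_url` helper, the local address and the `plants` names are assumptions for illustration; the `.json` suffix and the `_shape` and `_size` parameters follow Datasette's documented JSON API, but check the documentation for the version you are running. Names containing special characters may need Datasette's own encoding rather than the plain `quote()` used here.

```python
from urllib.parse import quote, urlencode


def table_json_url(base_url, database, table, **params):
    """Build the URL of a table's JSON endpoint (hypothetical helper)."""
    path = f"{base_url.rstrip('/')}/{quote(database)}/{quote(table)}.json"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


# Assumes a Datasette instance running locally on its default port.
url = table_json_url(
    "http://127.0.0.1:8001", "plants", "plants", _shape="array", _size=5
)
```

Fetching that URL with any HTTP client should return the first rows of the table as a JSON array.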

Latest news

18th February 2024

Datasette 1.0a10 is a focused alpha that changes some internal details about how Datasette handles transactions. The datasette.execute_write_fn() internal method now wraps the function in a database transaction unless you pass transaction=False.
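The effect of wrapping a write function in a transaction can be sketched with the standard library's `sqlite3` module. This is not Datasette's implementation: `run_write_fn` is a hypothetical stand-in that only mirrors the `transaction=False` opt-out described above, and the `items` table is made up for the example.

```python
import sqlite3


def run_write_fn(conn, fn, transaction=True):
    """Hypothetical helper: run fn(conn), inside a transaction by default."""
    if not transaction:
        return fn(conn)
    conn.execute("BEGIN")
    try:
        result = fn(conn)
    except Exception:
        conn.execute("ROLLBACK")  # undo everything fn wrote
        raise
    conn.execute("COMMIT")
    return result


# isolation_level=None puts the connection in autocommit mode, so the only
# transactions are the ones opened explicitly with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (name TEXT)")


def failing_write(conn):
    conn.execute("INSERT INTO items (name) VALUES ('first')")
    raise ValueError("simulated failure after a partial write")


try:
    run_write_fn(conn, failing_write)
except ValueError:
    pass
rows_after_rollback = conn.execute("SELECT count(*) FROM items").fetchone()[0]

try:
    run_write_fn(conn, failing_write, transaction=False)
except ValueError:
    pass
rows_without_transaction = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

With the transaction, the failed function leaves no rows behind; without it, the partial insert stays in the table.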

16th February 2024

Datasette 1.0a9 adds basic alter table support to the JSON API, tweaks how permissions work and introduces some new plugin debugging utilities.

7th February 2024

Datasette 1.0a8 introduces several new plugin hooks and a JavaScript plugin system, and moves plugin configuration from metadata.yaml to datasette.yaml. Read more about the release in the annotated release notes for 1.0a8.

1st December 2023

Datasette Enrichments is a new feature for Datasette that supports enriching data by running custom code against every selected row in a table. Read Datasette Enrichments: a new plugin framework for augmenting your data for more details, plus a video demo of enrichments for geocoding addresses and processing text and images using GPT-4.

30th November 2023

datasette-comments is a new plugin by Alex Garcia which adds collaborative commenting to Datasette. Alex built the plugin for Datasette Cloud, but it's also available as an open source package for people who are hosting their own Datasette instances. See Annotate and explore your data with datasette-comments on the Datasette Cloud blog for more details.

22nd August 2023

Datasette 1.0a4 fixes a security vulnerability in the Datasette 1.0 alpha series: the API explorer interface exposed the names of private databases and tables in public instances that were protected by a plugin such as datasette-auth-passwords, though not the actual content of those tables. See the security advisory for more details, and for workarounds if you can't upgrade immediately. The latest edition of the Datasette Newsletter also talks about this issue.

15th August 2023

datasette-write-ui: a Datasette plugin for editing, inserting, and deleting rows introduces a new plugin adding add/edit/delete functionality to Datasette, developed by Alex Garcia. Alex built this for Datasette Cloud, and this post is the first announcement made on the new Datasette Cloud blog - see also Welcome to Datasette Cloud.

9th August 2023

Datasette 1.0a3 is an alpha release of Datasette that previews the new default JSON API design that’s coming in version 1.0 - the single most significant change planned for that 1.0 release.

1st July 2023

New tutorial: Data analysis with SQLite and Python. This tutorial, originally presented at PyCon 2023, includes a 2h45m video and an extensive handout that should be useful with or without the video. Topics covered include Python's sqlite3 module, sqlite-utils, Datasette, Datasette Lite, advanced SQL patterns and more.

24th March 2023

I built a ChatGPT plugin to answer questions about data hosted in Datasette describes a new experimental Datasette plugin to enable people to query data hosted in a Datasette interface via ChatGPT, asking human language questions that are automatically converted to SQL and used to generate a readable response.

23rd February 2023

Using Datasette in GitHub Codespaces is a new tutorial showing how Datasette can be run in GitHub's free Codespaces browser-based development environments, using the new datasette-codespaces plugin.

28th January 2023

Examples of sites built using Datasette now includes screenshots of Datasette deployments that illustrate a variety of problems that can be addressed using Datasette and its plugins.

13th January 2023

Semantic search answers: Q&A against documentation with GPT3 + OpenAI embeddings shows how Datasette can be used to implement semantic search and build a system for answering questions against an existing corpus of text, using two new plugins: datasette-openai and datasette-faiss, and a new tool: openai-to-sqlite.

9th January 2023

Datasette 0.64 is out, and includes a strong warning against running SpatiaLite in production without disabling arbitrary SQL queries, plus a new --setting default_allow_sql off setting to make it easier to do that. See Datasette 0.64, with a warning about SpatiaLite for more about this release. A new tutorial, Building a location to time zone API with SpatiaLite, describes how to safely use SpatiaLite and Datasette to build and deploy an API for looking up time zones for a latitude/longitude location.

15th December 2022

Datasette 1.0a2: Upserts and finely grained permissions describes the new upsert API and much improved permissions capabilities introduced in the latest Datasette 1.0a2 alpha release.

All news

Latest releases

26th July 2024

datasette-extract 0.1a8 - Import unstructured data (text and images) into structured tables

  • Now uses GPT-4o mini, which is around 30 times cheaper than GPT-4o for text tasks, though the same price for image tasks. #30
  • New feature: on the "Extract data into this table" page there is now a link to "Duplicate these columns to a new table" which pre-fills the create table form with the same columns and hints. #29

18th July 2024

llm 0.15 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • Support for OpenAI's new GPT-4o mini model: llm -m gpt-4o-mini 'rave about pelicans in French' #536
  • gpt-4o-mini is now the default model if you do not specify your own default, replacing GPT-3.5 Turbo. GPT-4o mini is both cheaper and better than GPT-3.5 Turbo.
  • Fixed a bug where llm logs -q 'flourish' -m haiku could not combine both the -q search query and the -m model specifier. #515

sqlite-utils 3.37 - CLI tool and Python library for manipulating SQLite databases

  • The create-table and insert-files commands both now accept multiple --pk options for compound primary keys. (#620)
  • Now tested against Python 3.13 pre-release. (#619)
  • Fixed a crash that can occur in environments with a broken numpy installation, producing the error "module 'numpy' has no attribute 'int8'". (#632)
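For readers unfamiliar with compound primary keys, the sketch below shows what one looks like in SQLite itself, using Python's standard `sqlite3` module rather than sqlite-utils. The `memberships` table and its columns are hypothetical. Per the release note above, the sqlite-utils equivalent would pass `--pk` once per key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memberships (
        user_id INTEGER,
        group_id INTEGER,
        role TEXT,
        PRIMARY KEY (user_id, group_id)  -- the pair must be unique
    )
    """
)
conn.execute("INSERT INTO memberships VALUES (1, 10, 'admin')")
# Same user in a different group: allowed, because the pair differs.
conn.execute("INSERT INTO memberships VALUES (1, 11, 'member')")

try:
    # Same (user_id, group_id) pair as the first row: rejected.
    conn.execute("INSERT INTO memberships VALUES (1, 10, 'member')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM memberships").fetchone()[0]
```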

12th July 2024

datasette-python 0.1 - Run a Python interpreter in the Datasette virtual environment

21st June 2024

datasette 0.64.8 - An open source multi-tool for exploring and publishing data

  • Security improvement: 404 pages used to reflect content from the URL path, which could be used to display misleading information to Datasette users. 404 errors no longer display additional information from the URL. (#2359)
  • Backported a better fix for correctly extracting named parameters from canned query SQL against SQLite 3.46.0. (#2353)

17th June 2024

datasette-faiss 0.2.1 - Maintain a FAISS index for specified Datasette tables

  • Pin to NumPy 1.x - the faiss-cpu library this depends on is not yet compatible with NumPy 2. #4

13th June 2024

datasette-cluster-map 0.18.2 - Datasette plugin that shows a map for any data with latitude/longitude columns

  • Fixed bug where default tiles were displayed at retina resolution in a way that caused the map labels to be illegibly small. #48

12th June 2024

datasette 0.64.7 - An open source multi-tool for exploring and publishing data

  • Fixed a bug where canned queries with named parameters threw an error when run against SQLite 3.46.0. (#2353)

15th May 2024

datasette-enrichments-gpt 0.5 - Datasette enrichment for analyzing row data using OpenAI's GPT models

  • Now uses datasette-secrets for configuration. If you were previously using the environment variable OPENAI_API_KEY you should change that to DATASETTE_SECRETS_OPENAI_API_KEY. #9
  • Switched from using GPT-4 Turbo to the new GPT-4o, which is half the price and should provide better results. #14

datasette-extract 0.1a7 - Import unstructured data (text and images) into structured tables

  • Now uses GPT-4o instead of GPT-4 Turbo - the new model is cheaper, faster and likely a little bit better too. #28

13th May 2024

llm 0.14 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • Support for OpenAI's new GPT-4o model: llm -m gpt-4o 'say hi in Spanish' #490
  • The gpt-4-turbo alias is now a model ID, which indicates the latest version of OpenAI's GPT-4 Turbo text and image model. Your existing logs.db database may contain records under the previous model ID of gpt-4-turbo-preview. #493
  • New llm logs -r/--response option for outputting just the last captured response, without wrapping it in Markdown and accompanying it with the prompt. #431
  • Nine new plugins since version 0.13:
      • llm-claude-3 supporting Anthropic's Claude 3 family of models.
      • llm-command-r supporting Cohere's Command R and Command R Plus API models.
      • llm-reka supports the Reka family of models via their API.
      • llm-perplexity by Alexandru Geana supporting the Perplexity Labs API models, including llama-3-sonar-large-32k-online which can search for things online and llama-3-70b-instruct.
      • llm-groq by Moritz Angermann providing access to fast models hosted by Groq.
      • llm-fireworks supporting models hosted by Fireworks AI.
      • llm-together adds support for the Together AI extensive family of hosted openly licensed models.
      • llm-embed-onnx provides seven embedding models that can be executed using the ONNX model framework.
      • llm-cmd accepts a prompt for a shell command, runs that prompt and populates the result in your shell so you can review it, edit it and then hit <enter> to execute or ctrl+c to cancel. See this post for details.

3rd May 2024

datasette-upload-dbs 0.3.2 - Upload SQLite database files to Datasette

  • Tweak to the margins on the progress bar.

2nd May 2024

ttok 0.3 - Count and truncate text based on tokens

  • New --allow-special option for allowing special tokens: ttok '<|endoftext|>' --encode --allow-special #13

27th April 2024

datasette-enrichments 0.4.2 - Tools for running enrichments against data stored in Datasette

  • The get_config_form() method is now optional when implementing an enrichment, as previously incorrectly described in the documentation. #44

datasette-enrichments 0.4.1

  • Removed breakpoint() calls in an error path that should not have been released. #49

All releases