Find stories in data

Annotated version of this introductory video

Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.

Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 46 tools and 158 plugins dedicated to making working with structured data as productive as possible.

Try a demo and explore 33,000 power plants around the world, then follow the tutorial or take a look at some other examples of Datasette in action.

Then read how to get started with Datasette, subscribe to the monthly-ish newsletter and consider signing up for office hours for an in-person conversation about the project.

New: Datasette Desktop - a macOS desktop application for easily running Datasette on your own computer!

Exploratory data analysis

Import data from CSVs, JSON, database connections and more. Datasette will automatically show you patterns in your data and help you share your findings with your colleagues.

Instant data publishing

datasette publish lets you instantly publish your data to hosting providers like Google Cloud Run, Heroku or Vercel.

Rapid prototyping

Spin up a JSON API for any data in minutes. Use it to prototype and prove your ideas without building a custom backend.
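
As a rough sketch of the starting point for such a prototype, the snippet below loads a few CSV rows into a SQLite file using only Python's standard library. The file name demo.db, the table name plants and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3
import tempfile
from pathlib import Path

# Made-up sample data standing in for a real CSV file.
CSV_TEXT = """name,country,capacity_mw
Alpha Station,US,1200
Beta Plant,FR,900
"""


def csv_to_sqlite(csv_text, db_path, table):
    """Load CSV text into a new SQLite table and return the row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    conn = sqlite3.connect(db_path)
    with conn:  # commit the inserts when the block finishes
        col_sql = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'create table "{table}" ({col_sql})')
        conn.executemany(
            f'insert into "{table}" values ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
    count = conn.execute(f'select count(*) from "{table}"').fetchone()[0]
    conn.close()
    return count


# Hypothetical file and table names.
db_path = Path(tempfile.mkdtemp()) / "demo.db"
row_count = csv_to_sqlite(CSV_TEXT, str(db_path), "plants")
print(row_count, "rows written to", db_path)
```

Running datasette demo.db against a file like this serves it locally, and each table should then also be available as JSON, for example at /demo/plants.json.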

Latest news

7th October 2024

Python 3.13 was released today. Datasette 1.0a16 is compatible with Python 3.13, but Datasette 0.64.8 was not. The new Datasette 0.65 release fixes compatibility with the new version of Python.

5th August 2024

Datasette 1.0a14 includes some breaking changes to how metadata works for plugins, described in detail in the new upgrade guide. See also the annotated release notes that accompany this release.

18th February 2024

Datasette 1.0a10 is a focused alpha that changes some internal details about how Datasette handles transactions. The datasette.execute_write_fn() internal method now wraps the function in a database transaction unless you pass transaction=False.
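
The effect of wrapping a write function in a transaction can be sketched with the standard library's sqlite3 module. This is an illustration of the idea only, not Datasette's implementation: run_write_fn and the table t are hypothetical names.

```python
import sqlite3


def run_write_fn(conn, fn, transaction=True):
    """Hypothetical helper: call fn(conn), wrapped in a transaction by default."""
    if transaction:
        # sqlite3's connection context manager commits on success
        # and rolls back if fn raises.
        with conn:
            return fn(conn)
    return fn(conn)


def insert_then_fail(conn):
    conn.execute("insert into t (name) values ('lost')")
    raise ValueError("simulated failure after the insert")


def insert_ok(conn):
    conn.execute("insert into t (name) values ('kept')")


conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, name text)")

try:
    run_write_fn(conn, insert_then_fail)
except ValueError:
    pass  # the failed function's insert was rolled back

run_write_fn(conn, insert_ok)
names = [row[0] for row in conn.execute("select name from t")]
print(names)
```

Only the row written by the function that completed without an error survives, which is the main reason to wrap a multi-statement write in a single transaction.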

16th February 2024

Datasette 1.0a9 adds basic alter table support to the JSON API, tweaks how permissions work and introduces some new plugin debugging utilities.

7th February 2024

Datasette 1.0a8 introduces several new plugin hooks, a JavaScript plugin system and moves plugin configuration from metadata.yaml to datasette.yaml. Read more about the release in the annotated release notes for 1.0a8.

1st December 2023

Datasette Enrichments is a new feature for Datasette that supports enriching data by running custom code against every selected row in a table. Read Datasette Enrichments: a new plugin framework for augmenting your data for more details, plus a video demo of enrichments for geocoding addresses and processing text and images using GPT-4.

30th November 2023

datasette-comments is a new plugin by Alex Garcia which adds collaborative commenting to Datasette. Alex built the plugin for Datasette Cloud, but it's also available as an open source package for people who are hosting their own Datasette instances. See Annotate and explore your data with datasette-comments on the Datasette Cloud blog for more details.

22nd August 2023

Datasette 1.0a4 has a fix for a security vulnerability in the Datasette 1.0 alpha series: the API explorer interface exposed the names of private databases and tables in public instances that were protected by a plugin such as datasette-auth-passwords, though not the actual content of those tables. See the security advisory for more details and workarounds if you can't upgrade immediately. The latest edition of the Datasette Newsletter also talks about this issue.

15th August 2023

datasette-write-ui: a Datasette plugin for editing, inserting, and deleting rows introduces a new plugin adding add/edit/delete functionality to Datasette, developed by Alex Garcia. Alex built this for Datasette Cloud, and this post is the first announcement made on the new Datasette Cloud blog - see also Welcome to Datasette Cloud.

9th August 2023

Datasette 1.0a3 is an alpha release of Datasette that previews the new default JSON API design that’s coming in version 1.0 - the single most significant change planned for that 1.0 release.

1st July 2023

New tutorial: Data analysis with SQLite and Python. This tutorial, originally presented at PyCon 2023, includes a 2h45m video and an extensive handout that should be useful with or without the video. Topics covered include Python's sqlite3 module, sqlite-utils, Datasette, Datasette Lite, advanced SQL patterns and more.

24th March 2023

I built a ChatGPT plugin to answer questions about data hosted in Datasette describes a new experimental Datasette plugin to enable people to query data hosted in a Datasette interface via ChatGPT, asking human language questions that are automatically converted to SQL and used to generate a readable response.

23rd February 2023

Using Datasette in GitHub Codespaces is a new tutorial showing how Datasette can be run in GitHub's free Codespaces browser-based development environments, using the new datasette-codespaces plugin.

28th January 2023

Examples of sites built using Datasette now includes screenshots of Datasette deployments that illustrate a variety of problems that can be addressed using Datasette and its plugins.

13th January 2023

Semantic search answers: Q&A against documentation with GPT3 + OpenAI embeddings shows how Datasette can be used to implement semantic search and build a system for answering questions against an existing corpus of text, using two new plugins: datasette-openai and datasette-faiss, and a new tool: openai-to-sqlite.

All news

Latest releases

20th November 2024

llm 0.19a0 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • Tokens used by a response are now logged to new input_tokens and output_tokens integer columns and a token_details JSON string column, for the default OpenAI models and models from other plugins that implement this feature. #610
  • llm prompt now takes a -u/--usage flag to display token usage at the end of the response.
  • llm logs -u/--usage shows token usage information for logged responses.
  • llm prompt ... --async responses are now logged to the database. #641
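
Once token counts are stored in integer columns they can be summed with plain SQL. The sketch below uses an in-memory stand-in table: only the input_tokens, output_tokens and token_details column names come from the notes above, while the table name responses, the model column, the cached_tokens key and the sample rows are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified stand-in for the log table; only the three token
# columns are taken from the release notes.
conn.execute(
    """
    create table responses (
        id integer primary key,
        model text,
        input_tokens integer,
        output_tokens integer,
        token_details text
    )
    """
)
# Made-up rows for illustration.
conn.executemany(
    "insert into responses (model, input_tokens, output_tokens, token_details)"
    " values (?, ?, ?, ?)",
    [
        ("model-a", 10, 25, json.dumps({"cached_tokens": 4})),
        ("model-a", 7, 12, None),
        ("model-b", 3, 9, None),
    ],
)


def usage_by_model(conn):
    """Return {model: (total_input_tokens, total_output_tokens)}."""
    sql = """
        select model, sum(input_tokens), sum(output_tokens)
        from responses
        group by model
        order by model
    """
    return {model: (inp, out) for model, inp, out in conn.execute(sql)}


totals = usage_by_model(conn)
print(totals)
```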

17th November 2024

llm 0.18

  • Initial support for async models. Plugins can now provide an AsyncModel subclass that can be accessed in the Python API using the new llm.get_async_model(model_id) method. See async models in the Python API docs and implementing async models in plugins. #507
  • OpenAI models all now include async models, so function calls such as llm.get_async_model("gpt-4o-mini") will return an async model.
  • gpt-4o-audio-preview model can be used to send audio attachments to the GPT-4o audio model. #608
  • Attachments can now be sent without requiring a prompt. #611
  • llm models --options now includes information on whether a model supports attachments. #612
  • llm models --async shows available async models.
  • Custom OpenAI-compatible models can now be marked as can_stream: false in the YAML if they do not support streaming. Thanks, Chris Mungall. #600
  • Fixed bug where OpenAI usage data was incorrectly serialized to JSON. #614
  • Standardized on audio/wav MIME type for audio attachments rather than audio/wave. #603

14th November 2024

llm 0.18a1

  • Fixed bug where conversations did not work for async OpenAI models. #632
  • __repr__ methods for Response and AsyncResponse.

datasette-write-ui 0.0.1a11

datasette-dashboards 0.7.0 - Datasette plugin providing data dashboards from metadata

llm 0.18a0 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • Alpha support for async models. #507
  • Multiple smaller changes.

8th November 2024

sqlite-utils 3.38a0 - CLI tool and Python library for manipulating SQLite databases

  • Plugins can now reuse the sqlite-utils memory command with the new return_db=True parameter. #643

1st November 2024

llm 0.17.1 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • Fixed a bug where llm chat crashes if a follow-up prompt is provided. #601

29th October 2024

llm 0.17

Support for attachments, allowing multi-modal models to accept images, audio, video and other formats. #587

The default OpenAI gpt-4o and gpt-4o-mini models can both now be prompted with JPEG, GIF, PNG and WEBP images.

Attachments in the CLI can be URLs:

llm -m gpt-4o "describe this image" \
  -a https://static.simonwillison.net/static/2024/pelicans.jpg

Or file paths:

llm -m gpt-4o-mini "extract text" -a image1.jpg -a image2.jpg

Or binary data, which may need to use --attachment-type to specify the MIME type:

cat image | llm -m gpt-4o-mini "extract text" --attachment-type - image/jpeg

Attachments are also available in the Python API:

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe these images",
    attachments=[
        llm.Attachment(path="pelican.jpg"),
        llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg"),
    ]
)

Plugins that provide alternative models can support attachments, see Attachments for multi-modal models for details.

The latest llm-claude-3 plugin now supports attachments for Anthropic's Claude 3 and 3.5 models. The llm-gemini plugin supports attachments for Google's Gemini 1.5 models.

Also in this release: OpenAI models now record their "usage" data in the database even when the response was streamed. These records can be viewed using llm logs --json. #591

28th October 2024

llm 0.17a0

Alpha support for attachments, allowing multi-modal models to accept images, audio, video and other formats. #578

Attachments in the CLI can be URLs:

llm "describe this image" \
  -a https://static.simonwillison.net/static/2024/pelicans.jpg

Or file paths:

llm "extract text" -a image1.jpg -a image2.jpg

Or binary data, which may need to use --attachment-type to specify the MIME type:

cat image | llm "extract text" --attachment-type - image/jpeg

Attachments are also available in the Python API:

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe these images",
    attachments=[
        llm.Attachment(path="pelican.jpg"),
        llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg"),
    ]
)

Plugins that provide alternative models can support attachments, see Attachments for multi-modal models for details.

7th October 2024

datasette 0.65 - An open source multi-tool for exploring and publishing data

  • Upgrade for compatibility with Python 3.13 (by vendoring Pint dependency). (#2434)
  • Dropped support for Python 3.8.

27th September 2024

shot-scraper 1.5 - A command-line utility for taking automated screenshots of websites

  • Several new features for the YAML configuration used by shot-scraper multi:
      • You can now add a - server: python -m http.server 8003 block to start a server running before screenshots are taken. The PID for this server will be recorded and the server automatically terminated when the command completes, unless you specify the --leave-server option in which case it will be left running, useful for debugging. #156
      • The sh: shell command or python: python code blocks can specify Python or shell commands to run before a screenshot is taken. This means a YAML script can make modifications to the environment in between screenshots, useful for things like progressive tutorials. #155
  • Fixed a bug that occurred if a max-width was accidentally applied to the <div> used for region screenshots. Thanks, Johann Klähn. #143
  • Documented that shot-scraper will quit with an error if a --wait-for expression has not resolved in 30s.

12th September 2024

llm 0.16 - A CLI utility and Python library for interacting with Large Language Models, including OpenAI, PaLM and local models installed on your own machine.

  • OpenAI models now use the internal self.get_key() mechanism, which means they can be used from Python code in a way that will pick up keys that have been configured using llm keys set or the OPENAI_API_KEY environment variable. #552. This code now works correctly:

    import llm
    print(llm.get_model("gpt-4o-mini").prompt("hi"))

  • New documented API methods: llm.get_default_model(), llm.set_default_model(alias), llm.get_default_embedding_model(), llm.set_default_embedding_model(alias). #553
  • Support for OpenAI's new o1 family of preview models, llm -m o1-preview "prompt" and llm -m o1-mini "prompt". These models are currently only available to tier 5 OpenAI API users, though this may change in the future. #570

6th September 2024

csv-diff 1.2 - Python CLI tool and library for diffing CSV and JSON files

  • New feature: --extra key "python format string", for adding additional output keys to the human-readable version of the diff. #38
  • Don't crash in JSON format mode if some JSON keys are missing. #13
  • Now includes a Dockerfile and instructions for building and running it that way. Thanks, @gourk. #11
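
To show what a Python format string does when applied to a row, here is a minimal sketch. It assumes, for illustration only, that placeholders in the template name the row's columns; render_extras, the sample row and the example.com URL are hypothetical and are not taken from csv-diff itself.

```python
def render_extras(row, extras):
    """Fill each template with values from the row dictionary."""
    return {key: template.format(**row) for key, template in extras.items()}


# Made-up row and template for illustration.
row = {"id": "7", "name": "Pancakes"}
extras = {"latest": "https://example.com/latest/{id}"}

rendered = render_extras(row, extras)
print(rendered)
```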

datasette 1.0a16 - An open source multi-tool for exploring and publishing data

This release focuses on performance, in particular against large tables, and introduces some minor breaking changes for CSS styling in Datasette plugins.

  • Removed the unit conversions feature and its dependency, Pint. This means Datasette is now compatible with the upcoming Python 3.13. (#2400, #2320)
  • The datasette --pdb option now uses the ipdb debugger if it is installed. You can install it using datasette install ipdb. Thanks, Tiago Ilieve. (#2342)
  • Fixed a confusing error that occurred if metadata.json contained nested objects. (#2403)
  • Fixed a bug with ?_trace=1 where it returned a blank page if the response was larger than 256KB. (#2404)
  • Tracing mechanism now also displays SQL queries that returned errors or ran out of time. datasette-pretty-traces 0.5 includes support for displaying this new type of trace. (#2405)
  • Fixed a text spacing issue with table descriptions on the homepage. (#2399)
  • Performance improvements for large tables:
      • Suggested facets now only consider the first 1000 rows. (#2406)
      • Improved performance of date facet suggestion against large tables. (#2407)
      • Row counts stop at 10,000 rows when listing tables. (#2398)
      • On the table page the count stops at 10,000 rows too, with a "count all" button to execute the full count. (#2408)
  • New .dicts() internal method on Results that returns a list of dictionaries representing the results from a SQL query: (#2414)

    rows = (await db.execute("select * from t")).dicts()

  • Default Datasette core CSS that styles inputs and buttons now requires a class of "core" on the element or a containing element, for example <form class="core">. (#2415)
  • Similarly, default table styles now only apply to <table class="rows-and-columns">. (#2420)
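
A row count that stops at a fixed ceiling, as described for large tables above, can be sketched with a LIMIT inside a subquery. This is one way to get that behavior using the standard library's sqlite3 module and is not necessarily how Datasette implements it; capped_count and the table big are hypothetical names.

```python
import sqlite3


def capped_count(conn, table, limit=10_000):
    """Count rows in table, giving up once more than `limit` rows are seen.

    Returns (count, truncated); count is never larger than limit.
    """
    # The inner LIMIT lets SQLite stop scanning after limit + 1 rows.
    sql = f'select count(*) from (select 1 from "{table}" limit ?)'
    seen = conn.execute(sql, (limit + 1,)).fetchone()[0]
    if seen > limit:
        return limit, True
    return seen, False


conn = sqlite3.connect(":memory:")
conn.execute("create table big (n integer)")
conn.executemany("insert into big values (?)", ((i,) for i in range(25)))

over = capped_count(conn, "big", limit=10)    # table is larger than the cap
under = capped_count(conn, "big", limit=100)  # table fits under the cap
print(over, under)
```

Asking for one row more than the cap is what lets the caller tell "exactly at the cap" apart from "more than the cap", which is the point at which an interface could offer a full count instead.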

All releases