Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 42 tools and 110 plugins dedicated to making working with structured data as productive as possible.
New: Datasette Desktop - a macOS desktop application for easily running Datasette on your own computer!
Import data from CSVs, JSON, database connections and more. Datasette will automatically show you patterns in your data and help you share your findings with your colleagues.
datasette publish lets you instantly publish your data to hosting providers like Google Cloud Run, Heroku or Vercel.
Spin up a JSON API for any data in minutes. Use it to prototype and prove your ideas without building a custom backend.
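As a rough sketch of the workflow described above, the following Python uses only the standard library to load CSV rows into a SQLite table of the kind Datasette serves. The sample rows and the names `places` and `demo` are hypothetical. The URL helper assumes a Datasette instance running locally on its default port 8001 and exposing each table at a `.json` endpoint.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = "name,country\nParis,France\nLima,Peru\n"


def csv_to_sqlite(csv_text, db_path, table):
    """Load CSV text into a new SQLite table, storing every column as TEXT.

    Returns the number of rows in the table after loading.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    count = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()[0]
    conn.close()
    return count


def json_api_url(db_name, table, host="http://localhost:8001"):
    """Build the JSON URL for a table, assuming a local Datasette on its default port."""
    return f"{host}/{db_name}/{table}.json"
```

Writing to a file such as `demo.db` instead of an in-memory database and then running `datasette demo.db` should make the table available at the URL returned by `json_api_url("demo", "places")`.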
13th January 2023
Semantic search answers: Q&A against documentation with GPT3 + OpenAI embeddings shows how Datasette can be used to implement semantic search and build a system for answering questions against an existing corpus of text, using two new plugins: datasette-openai and datasette-faiss, and a new tool: openai-to-sqlite.
9th January 2023
Datasette 0.64 is out, and includes a strong warning against running SpatiaLite in production without disabling arbitrary SQL queries, plus a new --setting default_allow_sql off setting to make it easier to do that. See Datasette 0.64, with a warning about SpatiaLite for more about this release. A new tutorial, Building a location to time zone API with SpatiaLite, describes how to safely use SpatiaLite and Datasette to build and deploy an API for looking up time zones for a latitude/longitude location.
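A minimal sketch of how the new setting might be applied when launching Datasette from a script. The helper name and the database file `timezones.db` are hypothetical; only the `--setting default_allow_sql off` option comes from the release described above.

```python
import shlex


def datasette_serve_command(db_path, allow_sql=False):
    """Return the argument list for serving db_path with Datasette.

    Arbitrary SQL is disabled unless allow_sql is True.
    """
    command = ["datasette", db_path]
    if not allow_sql:
        # Setting introduced in Datasette 0.64, as described above.
        command += ["--setting", "default_allow_sql", "off"]
    return command


# Print the command as a shell-quoted string rather than executing it.
print(shlex.join(datasette_serve_command("timezones.db")))
```

The list form can be passed to `subprocess.run()` directly, which avoids shell quoting issues with unusual file names.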
15th December 2022
Datasette 1.0a2: Upserts and finely grained permissions describes the new upsert API and much improved permissions capabilities introduced in the latest Datasette 1.0a2 alpha release.
2nd December 2022
Datasette’s new JSON write API: The first alpha of Datasette 1.0 introduces the new write API shipped in the first of the Datasette 1.0 alpha series of releases, including detailed descriptions of two demos that show how the API can be used.
27th October 2022
8th September 2022
Exploring the training data behind Stable Diffusion describes the process of building and deploying a 4GB searchable SQLite database using Datasette, starting with Parquet data that was used to train the Stable Diffusion image generation model. See also Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator.
21st August 2022
Analyzing ScotRail audio announcements with Datasette—from prototype to production provides a detailed walk-through of the process of constructing an initial rapid prototype using Datasette Lite, extending it with a custom plugin and then deploying it as a full Datasette instance using GitHub Actions and Vercel.
14th August 2022
31st July 2022
New tutorial and accompanying ten minute video: Cleaning data with sqlite-utils and Datasette.
30th June 2022
s3-ocr is a new tool which can run OCR (via Amazon Textract) against every PDF file in an S3 bucket and write the results to a searchable SQLite database, ready to use with Datasette. Read more about it in s3-ocr: Extract text from PDF files stored in an S3 bucket.
5th May 2022
Datasette Lite is a new way to run Datasette: entirely in your browser, thanks to the Pyodide project which provides a full Python environment compiled to WebAssembly. You can use it to explore any SQLite database file hosted on a CORS-enabled static hosting provider, which includes GitHub and GitHub Pages. Read more about this project in Datasette Lite: a server-side Python web application running in a browser.
12th April 2022
Datasette for geospatial analysis describes how Datasette can be used in conjunction with SpatiaLite to work with geospatial data, including details of several geospatial plugins and tools from the Datasette ecosystem.
23rd March 2022
Datasette 0.61 introduces two potentially backwards-incompatible changes in preparation for the forthcoming 1.0 release: hashed URL mode has been moved to a new plugin, and the way URLs are generated to tables or databases containing special characters such as / has changed. Datasette 0.61.1 fixes a small bug in that release. See also the annotated release notes for these two versions.
27th February 2022
The first two of an ongoing series of official Datasette tutorials are now available: Exploring a database with Datasette introduces the Datasette web interface and shows how it can be used to explore a new database, and Learn SQL with Datasette provides an introduction to SQL using Datasette as a learning environment.
13th January 2022
Datasette 0.60 adds a new filters_from_request plugin hook, new internal methods for writing to the database, better performance and various faceting improvements. See also the annotated release notes.
27th January 2023
datasette-render-markdown 2.1.1 - Datasette plugin for rendering Markdown
- Fixed a bug where [Links containing & characters](...) were rendered with the ampersand double-escaped as
datasette-youtube-embed 0.1 - Turn YouTube URLs into embedded players in Datasette
- Initial release. Turns YouTube video URLs into embedded video players. #1
22nd January 2023
datasette-scraper 0.5 - Adds website scraping abilities to Datasette.
- feature: generic support for extracting json+ld data
- feature: specific support for extracting json+ld
- feature: add discover-allow to specify an allowlist of patterns to crawl
- seed-sitemaps only activates for seeds that are at the top-level of the domain
- extract_from_response can delete existing entries
- extract_from_response can add indexed entries with
- extract_from_response skips doing writes that wouldn't change the database
- enhancement: prune pages that exceed max depth/max page limit earlier
19th January 2023
datasette-faiss 0.2 - Maintain a FAISS index for specified Datasette tables
- faiss_agg_with_scores() aggregate functions. #3
14th January 2023
datasette-openai 0.2 - SQL functions for calling OpenAI APIs
- First stable release.
- Semantic search answers: Q&A against documentation with GPT3 + OpenAI embeddings provides a live demo and explanation of this project.
- openai_build_prompt() function is now documented. #4
13th January 2023
openai-to-sqlite 0.2 - Save OpenAI API results to a SQLite database
- openai-to-sqlite embeddings command can read JSON, CSV or TSV from a file or from standard input and fetch and store embeddings for that data. #1
- openai-to-sqlite embeddings --sql command can read the data to be embedded from a SQL query. #2
- Data is now sent to the OpenAI API in batches, defaulting to 100 and with a size that can be specified using --batch-size up to 2048. #5
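The batching behaviour in the last item can be illustrated with a short sketch. This is not the tool's actual implementation, only one plausible way to split records into batches. The function and constant names are hypothetical; the default of 100 and the limit of 2048 come from the release notes above.

```python
from itertools import islice

DEFAULT_BATCH_SIZE = 100  # default stated in the release notes
MAX_BATCH_SIZE = 2048     # upper limit stated in the release notes


def batches(items, batch_size=DEFAULT_BATCH_SIZE):
    """Yield successive lists of at most batch_size items from any iterable."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list would correspond to one API request, so 250 records at the default size would be sent as three requests.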
12th January 2023
datasette-cookies-for-magic-parameters 0.1.2 - UI for setting cookies to populate magic parameters
- Fix for a cookie parsing bug. #3
datasette-openai 0.1a2 - SQL functions for calling OpenAI APIs
11th January 2023
datasette-cookies-for-magic-parameters 0.1.1 - UI for setting cookies to populate magic parameters
- Fixed bug where duplicate mentions of the same parameter name resulted in duplicate form fields. #2
- Initial release. Adds a form to any canned query that uses :_cookie_x parameters allowing the user to set that cookie. #1
git-history 0.7a0 - Tools for analyzing Git history using SQLite
datasette 0.64.1 - An open source multi-tool for exploring and publishing data
datasette-faiss 0.1a0 - Maintain a FAISS index for specified Datasette tables
- Initial release. #1
10th January 2023
json-to-files 0.1 - Create separate files on disk based on a JSON object
- Initial release.
datasette-openai 0.1a1 - SQL functions for calling OpenAI APIs
- Calls to GPT-3 now have a 15s timeout, increased from 5s.