An open source multi-tool for exploring and publishing data



Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.

Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 40 tools and 93 plugins dedicated to making working with structured data as productive as possible.

Try a demo and explore 33,000 power plants around the world, then follow the tutorial or take a look at some other examples of Datasette in action.

Then read how to get started with Datasette, subscribe to the monthly-ish newsletter and consider signing up for office hours for an in-person conversation about the project.

New: Datasette Desktop - a macOS desktop application for easily running Datasette on your own computer!

Exploratory data analysis

Import data from CSVs, JSON, database connections and more. Datasette will automatically show you patterns in your data and help you share your findings with your colleagues.
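As a rough illustration of the CSV import step, the sketch below loads CSV text into a SQLite table using only Python's standard library. Datasette works with SQLite databases, so a file built this way could then be opened with it. The function name load_csv, the table name and the sample rows are made up for this example, and every value is stored as text; a dedicated import tool would normally handle type detection and much more.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Create `table` from CSV text, treating the first row as column names."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    # Quote identifiers so headers containing spaces still work.
    columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), reader
    )
    conn.commit()
    return conn.execute("SELECT count(*) FROM {}".format(quoted_table)).fetchone()[0]


# Hypothetical sample data, for illustration only.
SAMPLE = "name,country,capacity_mw\nPlant A,NZ,120\nPlant B,CL,45\n"
conn = sqlite3.connect(":memory:")  # pass a filename instead to keep the database
row_count = load_csv(conn, "plants", SAMPLE)
print(row_count)
```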

Instant data publishing

datasette publish lets you instantly publish your data to hosting providers like Google Cloud Run, Heroku or Vercel.
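A deployment like this can also be scripted. The sketch below only assembles the argument list for datasette publish cloudrun and does not run it. The helper name publish_command, the database filename and the service name are placeholders, and actually executing the command requires Datasette plus a configured Google Cloud account.

```python
def publish_command(db_path, service, timeout=None):
    """Build (but do not run) a `datasette publish cloudrun` command line."""
    cmd = ["datasette", "publish", "cloudrun", db_path, "--service", service]
    if timeout is not None:
        # --timeout raises the time limit applied by the Google Cloud build.
        cmd += ["--timeout", str(timeout)]
    return cmd


# Placeholder values for illustration only.
command = publish_command("mydatabase.db", "my-datasette", timeout=1800)
print(" ".join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it.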

Rapid prototyping

Spin up a JSON API for any data in minutes. Use it to prototype and prove your ideas without building a custom backend.
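To give a feel for what calling such an API might look like, here is a minimal sketch that builds the URL for running a SQL query and getting JSON back. The host, database name and helper function are hypothetical; the .json extension, the sql parameter and the _shape=array option are written from memory of Datasette's JSON API and should be checked against the documentation.

```python
from urllib.parse import urlencode


def json_query_url(base_url, database, sql):
    """Return the URL of the JSON endpoint that runs `sql` against `database`."""
    # _shape=array asks for a plain JSON list of row objects.
    query = urlencode({"sql": sql, "_shape": "array"})
    return "{}/{}.json?{}".format(base_url.rstrip("/"), database, query)


# Hypothetical instance and database, for illustration only.
url = json_query_url("https://example.com", "plants", "select 1 as one")
print(url)
```

Fetching that URL with any HTTP client, for example urllib.request.urlopen, would return the query results as JSON.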

Latest news

14th August 2022

Datasette 0.62 introduces compatibility with Pyodide for Datasette Lite, and incorporates a number of bug fixes, plugin hook upgrades and other improvements.

31st July 2022

New tutorial and accompanying ten minute video: Cleaning data with sqlite-utils and Datasette.

30th June 2022

s3-ocr is a new tool which can run OCR (via Amazon Textract) against every PDF file in an S3 bucket and write the results to a searchable SQLite database, ready to use with Datasette. Read more about it in s3-ocr: Extract text from PDF files stored in an S3 bucket.

5th May 2022

Datasette Lite is a new way to run Datasette: entirely in your browser, thanks to the Pyodide project which provides a full Python environment compiled to WebAssembly. You can use it to explore any SQLite database file hosted on a CORS-enabled static hosting provider, which includes GitHub and GitHub Pages. Read more about this project in Datasette Lite: a server-side Python web application running in a browser.

12th April 2022

Datasette for geospatial analysis describes how Datasette can be used in conjunction with SpatiaLite to work with geospatial data, including details of several geospatial plugins and tools from the Datasette ecosystem.

23rd March 2022

Datasette 0.61 introduces two potentially backwards-incompatible changes in preparation for the forthcoming 1.0 release: hashed URL mode has been moved to a new plugin, and the way URLs are generated to tables or databases containing special characters such as . or / has changed. Datasette 0.61.1 fixes a small bug in that release. See also the annotated release notes for these two versions.

27th February 2022

The first two of an ongoing series of official Datasette tutorials are now available: Exploring a database with Datasette introduces the Datasette web interface and shows how it can be used to explore a new database, and Learn SQL with Datasette provides an introduction to SQL using Datasette as a learning environment.

13th January 2022

Datasette 0.60 adds a new filters_from_request plugin hook, new internal methods for writing to the database, better performance and various faceting improvements. See also the annotated release notes.

5th December 2021

Observable notebooks recently added a SQL cell type, allowing SQL queries to be executed as part of an interactive notebook workflow. Alex Garcia built a Datasette Client for these which allows you to execute queries against any Datasette instance and explore and visualize the results using JavaScript code running in a notebook.

14th October 2021

Datasette 0.59 adds column descriptions in metadata, a new register_command plugin hook, enhanced --cors support and a bunch of other fixes and documentation improvements. See also the annotated release notes.

8th September 2021

Datasette Desktop is a new macOS desktop application version of Datasette, which supports opening SQLite files on your computer, importing CSV files and installing plugins. I wrote more about how it works in Datasette Desktop—a macOS desktop application for Datasette.

28th July 2021

The Baked Data architectural pattern describes a pattern commonly used with Datasette where the content for a site is bundled inside a SQLite database file and included alongside templates and application code in a deployment to a serverless hosting provider.

15th July 2021

Datasette 0.58 has new plugin hooks, a huge performance improvement for faceting, support for Unix domain sockets and several other improvements. Read the annotated release notes for extra background and context on the release.

5th June 2021

Datasette 0.57 is out with an important security patch plus a number of new features and bug fixes. Datasette 0.56.1, also out today, provides the security patch for users who are not yet ready to upgrade to the latest version.

10th May 2021

Django SQL Dashboard is a new tool that brings a useful authenticated subset of Datasette to Django projects that are built on top of PostgreSQL.


Latest releases

18th August 2022

sqlite-diffable 0.5 - Tools for dumping/loading a SQLite database to diffable directory structure

  • sqlite-diffable objects path-to/table.ndjson command for converting a newline-delimited file of JSON arrays into a sequence of JSON objects. #7
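To illustrate the conversion, here is a minimal sketch of turning newline-delimited JSON arrays into JSON objects. It is not the tool's own code: the function name arrays_to_objects is invented, the sample rows are made up, and the column names are supplied by the caller, which is an assumption of this sketch.

```python
import json


def arrays_to_objects(ndjson_text, columns):
    """Yield one dict per non-empty line, pairing each value with its column."""
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        values = json.loads(line)
        yield dict(zip(columns, values))


# Made-up sample rows and column names.
SAMPLE = '[1, "Cleo", 4]\n[2, "Pancakes", 3]\n'
objects = list(arrays_to_objects(SAMPLE, ["id", "name", "age"]))
for obj in objects:
    print(json.dumps(obj))
```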

14th August 2022

datasette-sentry 0.2 - Datasette plugin for configuring Sentry

datasette-sentry 0.2a1

  • Preview of 0.2 for final testing. #3

datasette 0.62 - An open source multi-tool for exploring and publishing data

Datasette can now run entirely in your browser using WebAssembly. Try out Datasette Lite, take a look at the code or read more about it in Datasette Lite: a server-side Python web application running in a browser.

Datasette now has a Discord community for questions and discussions about Datasette and its ecosystem of projects.

Features
  • Datasette is now compatible with Pyodide. This is the enabling technology behind Datasette Lite. (#1733)
  • Database file downloads now implement conditional GET using ETags. (#1739)
  • HTML for facet results and suggested results has been extracted out into new templates _facet_results.html and _suggested_facets.html. Thanks, M. Nasimul Haque. (#1759)
  • Datasette now runs some SQL queries in parallel. This has limited impact on performance; see this research issue for details.
  • New --nolock option for ignoring file locks when opening read-only databases. (#1744)
  • Spaces in the database names in URLs are now encoded as + rather than ~20. (#1701)
  • <Binary: 2427344 bytes> is now displayed as <Binary: 2,427,344 bytes> and is accompanied by a tooltip showing "2.3MB". (#1712)
  • The base Docker image used by datasette publish cloudrun, datasette package and the official Datasette image has been upgraded to 3.10.6-slim-bullseye. (#1768)
  • Canned writable queries against immutable databases now show a warning message. (#1728)
  • datasette publish cloudrun has a new --timeout option which can be used to increase the time limit applied by the Google Cloud build environment. Thanks, Tim Sherratt. (#1717)
  • datasette publish cloudrun has new --min-instances and --max-instances options. (#1779)
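The new binary size display listed among the features above can be approximated in a few lines. This is a sketch rather than Datasette's actual code: the helper names are invented, and the use of 1024-based units is an assumption, chosen because it reproduces the "2.3MB" figure quoted in the release notes.

```python
def binary_label(size):
    """Format a byte count with thousands separators."""
    return "<Binary: {:,} bytes>".format(size)


def binary_tooltip(size):
    """Return a short human-readable size using 1024-based units (assumed)."""
    value = float(size)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == "TB":
            break
        value /= 1024
    if unit == "bytes":
        return "{} bytes".format(int(value))
    return "{:.1f}{}".format(value, unit)


label = binary_label(2427344)
tooltip = binary_tooltip(2427344)
print(label, tooltip)
```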
Plugin hooks
  • New plugin hook: handle_exception(), for custom handling of exceptions caught by Datasette. (#1770)
  • The render_cell() plugin hook is now also passed a row argument, representing the sqlite3.Row object that is being rendered. (#1300)
  • The configuration directory is now stored in datasette.config_dir, making it available to plugins. Thanks, Chris Amico. (#1766)
Bug fixes
  • Don't show the facet option in the cog menu if faceting is not allowed. (#1683)
  • ?_sort and ?_sort_desc now work if the column that is being sorted has been excluded from the query using ?_col= or ?_nocol=. (#1773)
  • Fixed bug where ?_sort_desc was duplicated in the URL every time the Apply button was clicked. (#1738)
Documentation

12th August 2022

s3-credentials 0.13 - A tool for creating credentials for accessing S3 buckets

  • Documentation now lives on a dedicated documentation website: https://s3-credentials.readthedocs.io/ #71
  • s3-credentials create ... --website --create-bucket now creates an S3 bucket that is configured to act as a website, with index.html as the index page and error.html as the page used for any errors. #21
  • s3-credentials list-buckets --details now returns the bucket region and the URL to the website, if it is configured to act as a website. #77
  • Fixed a bug where list-bucket would return an error if the bucket (or specified --prefix) was empty. #76

10th August 2022

s3-ocr 0.6.3 - Tools for running OCR against files stored in S3

  • Pages with no OCR text on them are now recorded as rows with empty strings, instead of being skipped entirely. #23

9th August 2022

s3-ocr 0.6.2

  • Fixed bug where commands were sometimes not properly registered. #26

s3-ocr 0.6.1

  • Now pins to click>=8.0, which should avoid a bug where installing this on a machine with an older version of Click present would lead to the commands failing to register. #25
  • s3-ocr --help now includes links to the documentation and changelog.

8th August 2022

datasette-nteract-data-explorer 0.4.1 - automatic visual data explorer for datasette

What's Changed

Full Changelog: https://github.com/hydrosquall/datasette-nteract-data-explorer/compare/0.3.1...0.4.1

datasette-nteract-data-explorer 0.4.0

ignore - see 0.4.1.

7th August 2022

s3-ocr 0.6 - Tools for running OCR against files stored in S3

  • s3-ocr start now automatically pauses and then retries if Textract complains that there are too many jobs running. This can be turned into an early exit with an error message using the new --no-retry option. #21
  • New s3-ocr start --dry-run option for displaying what would happen without starting the OCR process. #22
  • Textract now runs in the same region as the S3 bucket it is writing to, avoiding an error. #24
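The pause-and-retry behaviour described for s3-ocr start can be pictured with the following sketch. It is a generic pattern, not s3-ocr's implementation: TooManyJobs, start_with_retry, the simulated job and the timing values are hypothetical stand-ins, and the retry flag mirrors what --no-retry is described as doing.

```python
import time


class TooManyJobs(Exception):
    """Stand-in for the service reporting that too many jobs are running."""


def start_with_retry(start_job, retry=True, pause=1.0, max_attempts=5,
                     sleep=time.sleep):
    """Call start_job(), pausing and retrying while it raises TooManyJobs."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_job()
        except TooManyJobs:
            # With retry disabled, or attempts exhausted, exit with the error.
            if not retry or attempt == max_attempts:
                raise
            sleep(pause)


# Simulated job that is rejected twice before succeeding.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TooManyJobs("limit exceeded")
    return "job-started"


pauses = []  # records each pause instead of really sleeping
result = start_with_retry(flaky_job, pause=0.5, sleep=pauses.append)
print(result, pauses)
```

Injecting the sleep function keeps the example fast to run and makes the pauses easy to inspect.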

5th August 2022

datasette-scale-to-zero 0.2 - Quit Datasette if it has not received traffic for a specified time period

  • New "max-age": "10h" configuration setting, which causes the server to exit after the specified amount of time whether or not it has received any traffic. #3
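A value such as "10h" has to be turned into a number of seconds before it can be compared with elapsed time. The sketch below shows one way to do that. The function name and the set of accepted suffixes (s, m, h) are assumptions for illustration and are not taken from the plugin.

```python
import re

# Assumed suffixes: seconds, minutes, hours.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert strings like "10h" or "30m" into a whole number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError("unrecognised duration: {!r}".format(text))
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


max_age_seconds = parse_duration("10h")
print(max_age_seconds)
```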

2nd August 2022

shot-scraper 0.14.3 - A command-line utility for taking automated screenshots of websites

1st August 2022

s3-credentials 0.12.1 - A tool for creating credentials for accessing S3 buckets

  • Using the --policy or --statement options now implies --user-permissions-boundary=none. Previously it was easy to use these options to accidentally create credentials that did not work as expected since they would have a default permissions boundary that locked them down to only being able to access S3. #74
  • The s3-credentials.AmazonS3FullAccess role created by this tool in order to issue temporary credentials previously used the default MaxSessionDuration value of 3600 seconds, preventing it from creating credentials that could last more than an hour. This has been increased to 12 hours. See this issue comment for instructions on fixing your existing role if this bug is affecting your account. #75

31st July 2022

datasette-sqlite-fts4 0.3.2 - Datasette plugin exposing SQL functions from sqlite-fts4
