1,315 rows

View and edit SQL

Suggested facets: type, key (date), timestamp (date)

Link type key title timestamp category is_public search_1 search_2 search_3
blog.db/entries http://simonwillison.net/2017/Dec/10/datasette/#atom-tag Datasette 0.14: customization edition 2017-12-10T01:55:26+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.14">Datasette 0.14: customization edition</a></p> <p>I just released the latest version of Datasette with a strong theme of customization: Datasette now supports custom templates and CSS styling hooks, and the metadata format has been expanded to allow per-database and per-table source/license/description information. You can also now define named canned queries which will be packaged up with your data.</p>    
blog.db/entries http://simonwillison.net/2017/Dec/11/vice/#atom-tag VICE News Police Shootings in Datasette 2017-12-11T22:47:11+00:00   1 <p><a href="https://vice-police-shootings.now.sh/">VICE News Police Shootings in Datasette</a></p> <p>VICE News collected data on both fatal and nonfatal police shootings from the 50 largest local police departments in the United States. They released the data under a CC BY-NC-SA 4.0 license so I&#x27;ve converted it to SQLite and loaded it into Datasette.</p> <p>Via <a href="https://twitter.com/simonw/status/940342936650125314">simonw on Twitter</a></p>    
blog.db/entries http://simonwillison.net/2017/Dec/12/location-time-zone-api/#atom-tag Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette 2017-12-12T15:52:20+00:00   1 <p>Given a latitude and longitude, how can we tell what time zone that point lies within? Here’s how I built a simple JSON API to answer that question, using a combination of data from OpenStreetMap, the SpatiaLite extension for SQLite and my <a href="https://github.com/simonw/datasette">Datasette</a> API tool.</p> <h3><a id="The_API_4"></a>The API</h3> <p>You can try the API out here: feed it a latitude and longitude and it will return the corresponding time zone ID: <a href="https://timezones-api.now.sh/timezones/by_point">https://timezones-api.now.sh/timezones/by_point</a></p> <p>Some examples:</p> <ul> <li><a href="https://timezones-api.now.sh/timezones/by_point?longitude=-0.1406632&amp;latitude=50.8246776">Brighton, England</a> is in Europe/London (<a href="https://timezones-api.now.sh/timezones/by_point.json?longitude=-0.1406632&amp;latitude=50.8246776">in JSON</a>)</li> <li><a href="https://timezones-api.now.sh/timezones/by_point?longitude=-122.4494224&amp;latitude=37.8022071">San Francisco, USA</a> is in America/Los_Angeles (<a href="https://timezones-api.now.sh/timezones/by_point.json?longitude=-122.4494224&amp;latitude=37.8022071">in JSON</a>)</li> <li><a href="https://timezones-api.now.sh/timezones/by_point?longitude=139.7819661&amp;latitude=35.6631424">Tokyo, Japan</a> is Asia/Tokyo (<a href="https://timezones-api.now.sh/timezones/by_point.json?longitude=139.7819661&amp;latitude=35.6631424">in JSON</a>)</li> </ul> <h3><a id="The_data_14"></a>The data</h3> <p>I was first introduced to Eric Muller’s <a href="http://efele.net/maps/tz/world/">tz_world shapefile</a> by <a href="https://twitter.com/Java_Nick">Nick Williams</a> at Eventbrite, who used it to build us an internal time zone lookup API on top of <a href="https://dev.mysql.com/doc/refman/5.7/en/spatial-types.html">MySQL’s geospatial data types</a>. Eric’s project is no longer updated and he recommends Evan Siroky’s <a href="https://github.com/evansiroky/timezone-boundary-builder">timezone-boundary-builder</a> project as an alternative, which der…    
blog.db/entries http://simonwillison.net/2017/Nov/13/csvs-to-sqlite/#atom-tag simonw/csvs-to-sqlite 2017-11-13T06:49:45+00:00   1 <p><a href="https://github.com/simonw/csvs-to-sqlite">simonw/csvs-to-sqlite</a></p> <p>I built a simple tool for bulk converting multiple CSV files into a SQLite database.</p>    
blog.db/entries http://simonwillison.net/2017/Nov/13/datasette/#atom-tag Datasette: instantly create and publish an API for your SQLite databases 2017-11-13T23:49:28+00:00   1 <p>I just shipped the first public version of <a href="https://github.com/simonw/datasette">datasette</a>, a new tool for creating and publishing JSON APIs for SQLite databases.</p> <p>You can try out out right now at <a href="https://fivethirtyeight.datasettes.com/">fivethirtyeight.datasettes.com</a>, where you can explore SQLite databases I built from Creative Commons licensed CSV files <a href="https://github.com/fivethirtyeight/data">published by FiveThirtyEight</a>. Or you can check out <a href="https://parlgov.datasettes.com/">parlgov.datasettes.com</a>, derived from the <a href="http://www.parlgov.org/">parlgov.org</a> database of world political parties which illustrates some advanced features such as <a href="https://parlgov.datasettes.com/parlgov-25f9855/view_party">SQLite views</a>.</p> <p><a href="https://fivethirtyeight.datasettes.com/fivethirtyeight/most-common-name%2Fsurnames"><img alt="Common surnames from fivethirtyeight" src="https://static.simonwillison.net/static/2017/fivethirtyeight-surnames.png" style="width: 100%;" /></a></p> <p>Or you can try it out on your own machine. If you run OS X and use Google Chrome, try running the following:</p> <pre><code>pip3 install datasette datasette ~/Library/Application\ Support/Google/Chrome/Default/History </code></pre> <p>This will start a web server on <a href="http://127.0.0.1:8001/">http://127.0.0.1:8001/</a> displaying an interface that will let you browse your Chrome browser history, which is conveniently stored in a SQLite database.</p> <p>Got a SQLite database you want to share with the world? Provided you have <a href="https://zeit.co/now">Zeit Now</a> set up on your machine, you can publish one or more databases with a single command:</p> <pre><code>datasette publish now my-database.db </code></pre> <p>The above command will whir away for about a minute and then spit out a URL to a hosted version of datasette with your database (or databases) ready to go. This is how I’m hosting the fivethirtyeight and parlgov example datasets, albeit on a cu…    
blog.db/entries http://simonwillison.net/2017/Nov/14/datasette-for-polar-bears/#atom-tag Datasette for Polar Bears 2017-11-14T05:41:30+00:00   1 <p><a href="https://gist.github.com/simonw/9f8bf23b37a42d7628c4dcc4bba10253">Datasette for Polar Bears</a></p> <p>I found a fun dataset of Polar Bear ear tag tracking data put out by the USGS Alaska Science Center and deployed it using datasette in just a couple of minutes - here’s how I did it.</p>    
blog.db/entries http://simonwillison.net/2017/Nov/14/datasettes/#atom-tag Datasettes · simonw/datasette 2017-11-14T07:39:41+00:00   1 <p><a href="https://github.com/simonw/datasette/wiki/Datasettes">Datasettes · simonw/datasette</a></p> <p>I&#x27;m collecting examples of datasette-powered APIs on the project wiki.</p>    
blog.db/entries http://simonwillison.net/2017/Nov/16/datasette-release/#atom-tag Datasette 0.12 2017-11-16T15:55:35+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.12">Datasette 0.12</a></p> <p>I just released v0.12 of Datasette. The most exciting new feature is the ability to display a UI for editing named parameters - so you can construct an arbitrarily complex SQL query, include some named parameters and then link directly to it in Datasette to provide a simple interface for changing those parameters. An example involving Australian dogs is included in the release notes.</p>    
blog.db/entries http://simonwillison.net/2017/Nov/16/recent-releases/#atom-tag Plotly Falcon SQL Client and the Remarkable Datasette SQLite2API Generator 2017-11-16T03:04:20+00:00   1 <p><a href="https://blog.ouseful.info/2017/11/15/recent-releases-plotly-falcon-sql-client-and-the-remarkable-datasette-sqlite2api-generator/">Plotly Falcon SQL Client and the Remarkable Datasette SQLite2API Generator</a></p> <p>Lovely write-up of Datasette by Tony Hirst. I’m particularly excited about its applications in the world of data journalism so it’s great to see it starting to get attention from people in that sphere.</p>    
blog.db/entries http://simonwillison.net/2017/Nov/17/paul-ford/#atom-tag Quoting Paul Ford 2017-11-17T01:10:49+00:00   1 <blockquote cite="https://trackchanges.postlight.com/big-data-small-effort-b62607a43a8c"><p>If you’re a public data provider—and many large NGOs, government organizations, cultural organizations, historical archives, media organizations, medical orgs, and academic institutions are exactly that—you can publish gigabytes of data, and make it available as an API, and make it easy to browse on the web, too, with extremely low effort. Put it into SQLite, point this little guy at it, and you’ve just radically increased the accessibility and utility of your data. Because messing around in SQL from a web browser is orders of magnitude more immediately useful than downloading a CSV, processing it, and figuring out what comes next.</p></blockquote><p class="cite">&mdash; <a href="https://trackchanges.postlight.com/big-data-small-effort-b62607a43a8c">Paul Ford</a>    
blog.db/entries http://simonwillison.net/2017/Nov/20/publishing-the-lahman-baseball-database-with-datasette/#atom-tag Publishing the Lahman Baseball Database with Datasette 2017-11-20T21:15:15+00:00   1 <p><a href="http://lawlesst.github.io/notebook/baseball-datasette.html">Publishing the Lahman Baseball Database with Datasette</a></p> <p>Ted Lawless used Datasette to publish an API for the Lahman Baseball Database - a dataset of players and teams spanning 1871 to 2016, available under a Creative Commons License.</p> <p>Via <a href="https://twitter.com/tedlawless/status/932709795064410112">Ted Lawless</a></p>    
blog.db/entries http://simonwillison.net/2017/Nov/25/new-in-datasette/#atom-tag New in Datasette: filters, foreign keys and search 2017-11-25T21:17:47+00:00   1 <p>I’ve released <a href="https://github.com/simonw/datasette">Datasette 0.13</a> with a number of exciting new features (<a href="https://simonwillison.net/2017/Nov/13/datasette/">Datasette previously</a>).</p> <h3><a id="Filters_4"></a>Filters</h3> <p>Datasette’s table view supports query-string based filtering. 0.13 introduces a new user interface for constructing those filters. Let’s use it to <a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9/bob-ross%2Felements-by-episode?CLOUDS__exact=1&amp;EPISODE__startswith=S03&amp;MOUNTAIN__exact=1">find every episode where Bob Ross painted clouds and mountains</a> in season 3 of The Joy of Painting:</p> <p><a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9/bob-ross%2Felements-by-episode?CLOUDS__exact=1&amp;EPISODE__startswith=S03&amp;MOUNTAIN__exact=1"><img alt="Animation demonstrating the new filter UI" src="https://static.simonwillison.net/static/2017/bob-ross.gif" style="width: 100%;" /></a></p> <p>The resulting querystring looks like this:</p> <pre><code>?CLOUDS__exact=1&amp;EPISODE__startswith=S03&amp;MOUNTAIN__exact=1 </code></pre> <p>Using the <code>.json</code> or <code>.jsono</code> extension on the same URL returns JSON (in list-of-lists or list-of-objects format), so the new filter UI also acts as a simple API explorer. If you click “View and edit SQL” you will get the generated SQL in an editor, ready for you to further modify it.</p> <h3><a id="Foreign_key_relationships_16"></a>Foreign key relationships</h3> <p>Datasette now provides special treatment for SQLite foreign key relationships: if it detects a foreign key when displaying a table it will show values in that column as links to the related records - and if the foreign key table has an obvious label column, that label will be displayed in the column as the link label.</p> <p>Here’s an example, using San Francisco’s <a href="https://data.sfgov.org/Economy-and-Community/Mobile-Food-Facility-Permit/rqzj-sfat">Mobile Food Facility Permit</a> dataset… aka food t…    
blog.db/entries http://simonwillison.net/2017/Nov/30/customize-presentation-of-specific-columns/#atom-tag Datasette: Ability to customize presentation of specific columns in HTML view 2017-11-30T07:27:58+00:00   1 <p><a href="https://github.com/simonw/datasette/issues/153">Datasette: Ability to customize presentation of specific columns in HTML view</a></p> <p>Still a work in progress, but Datasette master now allows you to inject links to one or more additional CSS and JavaScript resources (optionally with SRI hashes) which will be included on every page. Each template also now provides CSS classes on the body element derived from the current database and table names to provide hooks for custom styling. Next up: custom template support.</p>    
blog.db/entries http://simonwillison.net/2018/Apr/12/wireless-telegraphy-register/#atom-tag Wireless Telegraphy Register 2018-04-12T16:08:31+00:00   1 <p><a href="https://wtr-api.herokuapp.com/">Wireless Telegraphy Register</a></p> <p>Russ Garrett used Datasette to build a browsable interface to the UK&#x27;s register of business radio licenses, using data from Ofcom.</p> <p>Via <a href="https://twitter.com/russss/status/984460272961236997">@russss</a></p>    
blog.db/entries http://simonwillison.net/2018/Apr/14/units/#atom-tag Datasette 0.18: units 2018-04-14T15:56:53+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.18">Datasette 0.18: units</a></p> <p>This release features the first Datasette feature that was entirely designed and implemented by someone else (yay open source) - Russ Garrett wanted unit support (Hz, ft etc) for his Wireless Telegraphy Register project. It&#x27;s a really neat implementation: you can tell Datasette what units are in use for a particular database column and it will display the correct SI symbols on the page. Specifying units also enables unit-aware filtering: if Datasette knows that a column is measured in meters you can now query it for all rows that are less than 50 feet for example.</p> <p>Via <a href="https://twitter.com/simonw/status/985182954786734080">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Apr/17/datasette/#atom-tag Datasette 0.19: Plugins Documentation 2018-04-17T03:59:13+00:00   1 <p><a href="https://datasette.readthedocs.io/en/latest/plugins.html">Datasette 0.19: Plugins Documentation</a></p> <p>I&#x27;ve released the first preview of Datasette&#x27;s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come.</p> <p>Via <a href="https://github.com/simonw/datasette/releases/tag/0.19">Datasette 0.19</a></p>    
blog.db/entries http://simonwillison.net/2018/Apr/20/datasette-plugins/#atom-tag Datasette plugins, and building a clustered map visualization 2018-04-20T15:41:11+00:00   1 <p><a href="https://github.com/simonw/datasette">Datasette</a> now supports plugins!</p> <p>Last Saturday <a href="https://twitter.com/simonw/status/985377670388105216">I asked Twitter</a> for examples of Python projects with successful plugin ecosystems. <a href="https://docs.pytest.org/">pytest</a> was the clear winner: the <a href="https://plugincompat.herokuapp.com/">pytest plugin compatibility table</a> (an ingenious innovation that I would love to eventually copy for Datasette) lists 457 plugins, and even the core pytest system itself is built as a collection of default plugins that can be replaced or over-ridden.</p> <p>Best of all: pytest’s plugin mechanism is available as a separate package: <a href="https://pluggy.readthedocs.io/">pluggy</a>. And pluggy was exactly what I needed for Datasette.</p> <p>You can follow the ongoing development of the feature in issue <a href="https://github.com/simonw/datasette/issues/14">#14</a>. This morning I released <a href="https://github.com/simonw/datasette/releases/tag/0.20">Datasette 0.20</a> with support for a number of different plugin hooks: plugins can add custom template tags and SQL functions, and can also bundle their own static assets, JavaScript, CSS and templates. The hooks are described in some detail in the <a href="https://datasette.readthedocs.io/en/latest/plugins.html">Datasette Plugins</a> documentation.</p> <h2><a id="datasetteclustermap_10"></a>datasette-cluster-map</h2> <p>I also released my first plugin: <a href="https://pypi.org/project/datasette-cluster-map/">datasette-cluster-map</a>. Once installed, it looks out for database tables that have a <code>latitude</code> and <code>longitude</code> column. When it finds them, it draws all of the points on an interactive map using <a href="http://leafletjs.com/">Leaflet</a> and <a href="https://github.com/Leaflet/Leaflet.markercluster">Leaflet.markercluster</a>.</p> <p>Let’s <a href="https://datasette-cluster-map-demo.now.sh/polar-bears-455fe3a/USGS_WC_eartags_output_files_2009-2011-Status">try it o…    
blog.db/entries http://simonwillison.net/2018/Apr/20/datasette-uk-fsa/#atom-tag Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data 2018-04-20T20:50:10+00:00   1 <p><a href="https://blog.ouseful.info/2018/04/20/datasette-clustermap-plugin-querying-uk-food-standards-agency-fsa-food-hygiene-ratings-open-data/">Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data</a></p> <p>Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA</p> <p>Via <a href="https://twitter.com/psychemedia/status/987430857077030912">@psychemedia</a></p>    
blog.db/entries http://simonwillison.net/2018/Apr/25/json-escape-text/#atom-tag JSON Escape Text 2018-04-25T04:13:09+00:00   1 <p><a href="https://json-escape-text.now.sh/">JSON Escape Text</a></p> <p>I built a tiny tool for turning text into an escaped JSON string - I needed it to help create descriptions and canned SQL queries for adding to Datasette&#x27;s metadata.json files.</p>    
blog.db/entries http://simonwillison.net/2018/Apr/25/register-members-interests/#atom-tag Exploring the UK Register of Members Interests with SQL and Datasette 2018-04-25T15:49:39+00:00   1 <p>Ever wondered which UK Members of Parliament <a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+people_name%2C+person_id%2C+count%28*%29+as+n%2C+group_concat%28item%2C+%22+%3A%3A+%22%29+from+%28select+distinct+item%2C+people.name+as+people_name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash%29%0D%0Agroup+by+people_name%2C+person_id%0D%0Aorder+by+n+desc%3B&amp;search=helicopter">get gifted the most helicopter rides</a>? How about <a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+people.name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash&amp;search=gift+sultan+brunei">which MPs have been given Christmas hampers</a> by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.</p> <p><a class="a-img" href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+people.name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash&amp;search=gift+sultan+brunei"><img alt="Gifts from the Sultan" src="https://static.simonwillison.net/static/2018/mp-gifts-sultan.png" /></a></p> <p><a href="https://www.mysociety.org/">mySociety</a> have been building incredible civic participation applications like <a href="https://www.theyworkforyou.com/">TheyWorkForYou</a> and <a href="https://www.fixmystreet.com/">FixMyStreet</a> for nearly 15 years now, and have accumulated all kinds of interesting data along the way.</p> <p>They recently launched their own data portal at <a href="https://data.mysociet…    
blog.db/entries http://simonwillison.net/2018/Apr/28/datasette/#atom-tag Datasette - a talk at Zeit Day SF 2018 2018-04-28T21:31:40+00:00   1 <p><a href="https://speakerdeck.com/simon/datasette">Datasette - a talk at Zeit Day SF 2018</a></p> <p>Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference.</p> <p>Via <a href="https://twitter.com/simonw/status/990341615196688386">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Apr/9/datasette/#atom-tag Datasette 0.15: sort by column 2018-04-09T17:25:36+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.15">Datasette 0.15: sort by column</a></p> <p>I&#x27;ve released the latest version of Datasette to PyPI. The key new feature is the ability to sort tables by column, using clickable column headers or directly via the new _sort= and _sort_desc= querystring parameters.</p> <p>Via <a href="https://twitter.com/simonw/status/983395459640147968">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Aug/19/datasette-pybay/#atom-tag Slides, notes and links from my Datasette talk at PyBay 2018-08-19T23:23:12+00:00   1 <p><a href="https://simonwillison.net/2018/Aug/19/instantly-publish-datasette/">Slides, notes and links from my Datasette talk at PyBay</a></p> <p>I presented a session about Datasette at the PyBay conference in San Francisco this morning. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library.</p> <p>Via <a href="https://twitter.com/simonw/status/1031301958030614528">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Aug/19/instantly-publish-datasette/#atom-tag How to Instantly Publish Data to the Internet with Datasette 2018-08-19T23:23:00+00:00   1 <p>I presented a session about Datasette at the PyBay 2018 conference in San Francisco. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library.</p> <div class="resp-container"> </div> <!-- cutoff --> <div class="slide"> <img alt="Datasette - Simon Willison @simonw - PyBay 2018" height="337" src="https://static.simonwillison.net/static/2018/pybay-datasette/datasette-pybay.001.jpeg" width="450" /> <div> <p>Slides and notes from my presentation on <a href="https://datasette.io/">Datasette</a> given at <a href="https://pybay.com/">PyBay 2018</a> on 19th August 2018.</p> </div> </div> <div class="slide"> <img alt="Data Journalism" height="337" src="https://static.simonwillison.net/static/2018/pybay-datasette/datasette-pybay.002.jpeg" width="450" /> <div> <p>I designed Datasette based on my experence working as a data journalist at <a href="https://www.theguardian.com/">the Guardian newspaper</a> in London.</p> </div> </div> <div class="slide"> <img alt="Screenshot of the Guardian Data Blog" height="337" src="https://static.simonwillison.net/static/2018/pybay-datasette/datasette-pybay.003.jpeg" width="450" /> <div> <p>One of the projects I was involved with at the Guardian was <a href="https://www.theguardian.com/news/datablog/2011/jan/27/data-store-office-for-national-statistics">the Guardian Data Blog</a>.</p> <p>Simon Rogers was the journalist most responsible for gathering the data used for infographics in the newspaper. We decided to start publishing the raw data on a blog.</p> </div> </div> <div class="slide"> <img alt="Screenshot of a Google Spreadsheet" height="337" src="https://static.simonwillison.net/static/2018/pybay-datasette/datasette-pybay.004.jpeg" width="450" /> <div> <p>After some consideration of our options, we ended up chosing Google Spreadsheets as a publishing tool.</p> </div> </div> <div class="…    
blog.db/entries http://simonwillison.net/2018/Aug/2/datasette-monkeytype/#atom-tag Datasette unit tests: monkeytype_call_traces 2018-08-02T21:03:27+00:00   1 <p><a href="https://datasette-monkeytype.now.sh/monkeytype-436d0dc/monkeytype_call_traces?_facet=module&amp;_facet=qualname">Datasette unit tests: monkeytype_call_traces</a></p> <p>Faceted browse against every function call that occurs during the execution of Datasette&#x27;s test suite. I used Instagram&#x27;s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It&#x27;s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I&#x27;ve started exploring the intermediary format using Datasette. Generating this was as easy as running &quot;monkeytype run `which pytest`&quot; in the Datasette root directory.</p> <p>Via <a href="https://twitter.com/simonw/status/1025081051360583680">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Aug/6/russian-facebook-ads/#atom-tag Analyzing US Election Russian Facebook Ads 2018-08-06T16:01:18+00:00   1 <p>Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections.</p> <p>FiveThirtyEight <a href="https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/">published nearly 3 million tweets</a> from accounts associated with the Russian “Internet Research Agency” - see <a href="https://simonwillison.net/2018/Aug/6/troll-tweets/">my article and searchable tweet archive here</a>.</p> <p>Separately, the House Intelligence Committee Minority <a href="https://democrats-intelligence.house.gov/social-media-content/">released 3,517 Facebook ads</a> that were reported to have been bought by the Russian Internet Research Agency as a set of redacted PDF files.</p> <h3><a id="Exploring_the_Russian_Facebook_Ad_spend_18"></a>Exploring the Russian Facebook Ad spend</h3> <p>The initial data was released as <a href="https://democrats-intelligence.house.gov/social-media-content/social-media-advertisements.htm">zip files full of PDFs</a>, one of the least friendly formats you can use to publish data.</p> <p><a href="https://twitter.com/edsu">Ed Summers</a> took on the intimidating task of cleaning that up. <a href="https://github.com/edsu/irads">His results are incredible</a>: he used the <a href="https://pypi.org/project/pytesseract/">pytesseract OCR library</a> and <a href="https://pypi.org/project/PyPDF2/">PyPDF2</a> to extract both the images and the associated metadata and convert the whole lot into a single 3.9MB JSON file.</p> <p>I <a href="https://github.com/simonw/russian-ira-facebook-ads-datasette">wrote some code</a> to convert his JSON file to SQLite (more on the details later) and the result can be found here:</p> <p><a href="https://russian-ira-facebook-ads.datasettes.com/">https://russian-ira-facebook-ads.datasettes.com/</a></p> <p>Here’s an <a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/display_ads?_search=cops&amp;_sort_desc=spend_usd">example search for “cops” ordered by the USD equivalent spent on th…    
blog.db/entries http://simonwillison.net/2018/Aug/6/troll-tweets/#atom-tag Analyzing US Election troll tweets with Datasette 2018-08-06T15:15:30+00:00   1 <p>FiveThirtyEight <a href="https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/">published nearly 3 million tweets</a> from accounts associated with the Russian “Internet Research Agency”, based on research by Darren Linvill and Patrick Warren at at Clemson University.</p> <p>FiveThirtyEight’s tweets were <a href="https://github.com/fivethirtyeight/russian-troll-tweets">shared as CSV</a>, so I’ve used my <a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> tool to convert them and used <a href="https://github.com/simonw/datasette">Datasette</a> to publish them in a searchable, browsable interface: <a href="https://russian-troll-tweets.datasettes.com/">https://russian-troll-tweets.datasettes.com/</a></p> <p>The data is most interesting if you apply faceting. Here’s <a href="https://russian-troll-tweets.datasettes.com/russian-troll-tweets-dc6e91f/tweets?_facet=author&amp;_facet=region&amp;_facet=language&amp;_facet=post_type&amp;_facet=account_category">the full set of tweets faceted by author, language, region, post type and account category</a>:</p> <p><img alt="Faceted search interface showing Russian Troll Tweets" src="https://static.simonwillison.net/static/2018/troll-tweets-faceted.png" /></p> <p>The minimal <a href="https://github.com/simonw/russian-troll-tweets-datasette">source code for this Datasette instance</a> is on GitHub.</p>    
blog.db/entries http://simonwillison.net/2018/Dec/19/fast-autocomplete-search-your-website/#atom-tag Fast Autocomplete Search for Your Website 2018-12-19T00:26:32+00:00   1 <p><a href="https://24ways.org/2018/fast-autocomplete-search-for-your-website/">Fast Autocomplete Search for Your Website</a></p> <p>I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself - I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API.</p> <p>Via <a href="https://twitter.com/simonw/status/1075181158327934976">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Dec/19/fast-autocomplete-search/#atom-tag Fast Autocomplete Search for Your Website 2018-12-19T04:11:09+00:00   1 <p>Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive.</p> <p>I’m going to build a search engine for 24 ways that’s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I’ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source <a href="https://datasette.readthedocs.io/">Datasette</a> tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface.</p> <figure><img alt="Animated demo of autocomplete search against 24 ways" src="https://media.24ways.org/2018/willison/24ways-autocomplete.gif" /></figure> <p><a href="https://media.24ways.org/2018/willison/">Try it out here</a>, then read on to see how I built it.</p> <h3>First step: crawling the data</h3> <p>The first step in building a search engine is to grab a copy of the data that you plan to make searchable.</p> <p>There are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. If you don’t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need.</p> <p>I’m going to do exactly that against 24 ways: I’ll build a simple crawler using <a href="https://en.wikipedia.org/wiki/Wget">wget</a>, a command-line tool that features a powerful “recursive” mode that’s ideal for scraping websites.</p> <p>We’ll start at the <code>https://24ways.org/archives/</code> page, which links to an archived index for every year that 24 ways has been running.</p> <p>Then we’ll tell <code>wget</code> to recursively crawl the website, using the <code>--recursive</code> flag.</p> <p>We don’t want to fetch every single page on the site - we’re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell <code>wget</code> that we only care about pages that start with one of the years it has been running, using the <code>-I</code> argument like this: <code…    
blog.db/entries http://simonwillison.net/2018/Feb/1/observable-trees/#atom-tag Observable notebook: San Francisco trees from Datasette 2018-02-01T00:37:18+00:00   1 <p><a href="https://beta.observablehq.com/@simonw/san-francisco-trees-from-datasette">Observable notebook: San Francisco trees from Datasette</a></p> <p>I used an Observable notebook to rebuild my San Francisco tree search demo against a Datasette API of a CSV of trees published by the SF Department of Public Works. The map updates live as you type a query, and every cell can be toggled to view the underlying source code.</p>    
blog.db/entries http://simonwillison.net/2018/Jan/10/sqlite-osx/#atom-tag How to compile and run the SQLite JSON1 extension on OS X 2018-01-10T21:01:14+00:00   1 <p><a href="https://stackoverflow.com/a/40552648">How to compile and run the SQLite JSON1 extension on OS X</a></p> <p>Thanks, Stack Overflow! I&#x27;ve been battling this one for a while - it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python&#x27;s sqlite3 module (or with Datasette&#x27;s --load-extension= option) to gain access to the full suite of SQLite JSON functions - json(), json_extract() etc.</p>    
blog.db/entries http://simonwillison.net/2018/Jan/17/datasette-publish/#atom-tag Datasette Publish: a web app for publishing CSV files as an online database 2018-01-17T14:11:05+00:00   1 <p>I’ve just released <a href="https://publish.datasettes.com/">Datasette Publish</a>, a web tool for turning one or more CSV files into an online database with a JSON API.</p> <p>Here’s <a href="https://datasette-onrlszntsq.now.sh/">a demo application I built</a> using Datasette Publish, showing Californian campaign finance data using CSV files released by the <a href="https://www.californiacivicdata.org/">California Civic Data Coalition</a>.</p> <p>And here’s an animated screencast showing exactly how I built it:</p> <p><img alt="Animated demo of Datasette Publish" src="https://static.simonwillison.net/static/2018/datasette-publish-demo.gif" /></p> <p>Datasette Publish combines my <a href="https://github.com/simonw/datasette">Datasette</a> tool for publishing SQLite databases as an API with my <a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> tool for generating them.</p> <p>It’s built on top of the <a href="https://zeit.co/now">Zeit Now</a> hosting service, which means anything you deploy with it lives on your own account with Zeit and stays entirely under your control. I used the brand new <a href="https://zeit.co/blog/api-2">Zeit API 2.0</a>.</p> <p>Zeit’s generous free plan means you can try the tool out as many times as you like - and if you want to use it for an API powering a production website you can easily upgrade to a <a href="https://zeit.co/pricing">paid hosting plan</a>.</p> <h2><a id="Who_should_use_it_16"></a>Who should use it</h2> <p>Anyone who has data they want to share with the world!</p> <p>The fundamental idea behind Datasette is that publishing structured data as both a web interface and a JSON API should be as quick and easy as possible.</p> <p>The world is full of interesting data that often ends up trapped in PDF blobs or other hard-to-use formats, if it gets published at all. Datasette encourages using SQLite instead: a powerful, flexible format that enables analysis via SQL queries and can easily be shared and hosted online.</p> <p>Since so much of the data that I…    
blog.db/entries http://simonwillison.net/2018/Jan/20/datasette-json/#atom-tag How to turn a list of JSON objects into a Datasette 2018-01-20T01:07:09+00:00   1 <p><a href="https://gist.github.com/simonw/eb5ad8e55d75bbc3003dd9e5d6eb438b">How to turn a list of JSON objects into a Datasette</a></p> <p>ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette.</p>    
blog.db/entries http://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/#atom-tag Analyzing my Twitter followers with Datasette 2018-01-28T06:41:38+00:00   1 <p>I decided to do some ad-hoc analsis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using <a href="https://github.com/simonw/datasette">Datasette</a>.</p> <h3><a id="The_end_result_4"></a>The end result</h3> <p>Here’s the Datasette database containing all of my Twitter followers: <a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a">https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a</a></p> <p>Much more interesting though are the queries I can now run against it. A few examples:</p> <ul> <li><a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+followers.rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+location.value+as+location+from+followers+join+location+on+followers.location+%3D+location.id+where+followers.rowid+in+%28select+rowid+from+%5Bfollowers_fts%5D+where+%5Bfollowers_fts%5D+match+%3Asearch%29+order+by+followers_count+desc">Search my followers (their name, bio and location), return results ordered by follower count</a>. This is a parameterized query - here are the resuts for <a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+followers.rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+location.value+as+location+from+followers+join+location+on+followers.location+%3D+location.id+where+followers.rowid+in+%28select+rowid+from+%5Bfollowers_fts%5D+where+%5Bfollowers_fts%5D+match+%3Asearch%29+order+by+followers_count+desc&amp;search=django">django</a>.</li> <li><a href="https://simonw-twitter-followers.now.sh/simonw-twitter-followers-b9bff3a?sql=select+rowid%2C+name%2C+screen_name%2C+description%2C+followers_count%2C+friends_count%2C+verified+from+followers+order+by+followers_count+desc">Whe are my most influential followers based on their own follower count?</a></li> <li><a href="ht…    
blog.db/entries http://simonwillison.net/2018/Jan/29/datasette-demo/#atom-tag Datasette Demo (video) from the SF Python Meetup 2018-01-29T05:52:21+00:00   1 <p>I gave a short talk about <a href="https://github.com/simonw/datasette">Datasette</a> last month at the <a href="https://www.meetup.com/sfpython/events/245088012/">SF Python Meetup Holiday Party</a>. They've just posted the video, so here it is:</p> <p>I showed how I built <a href="https://sf-tree-search.now.sh/">San Francisco Tree Search</a> using Datasette, <a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> and data from <a href="https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq">the San Francisco Department of Public Works</a>.</p>    
blog.db/entries http://simonwillison.net/2018/Jan/8/csvkit/#atom-tag csvkit 2018-01-08T21:03:38+00:00   1 <p><a href="http://csvkit.readthedocs.io/">csvkit</a></p> <p>&quot;A suite of command-line tools for converting to and working with CSV&quot; - includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together.</p>    
blog.db/entries http://simonwillison.net/2018/Jan/8/himalayan-database-visual-foxpro-gui-json-api-datasette/#atom-tag Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette 2018-01-08T19:26:49+00:00   1 <p><a href="https://gist.github.com/atomotic/61542deb5d1e77a5ff842658b75982ef">Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette</a></p> <p>The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepalese Himalaya, originally compiled by journalist Elizabeth Hawley over several decades. The database is published as a Visual FoxPro database - here Raffaele Messuti‏ provides step-by-step instructions for extracting the data from the published archive, converting them to CSV using dbfcsv and then converting the CSVs to SQLite using csvs-to-sqlite so you can browse them using Datasette.</p> <p>Via <a href="https://twitter.com/atomotic/status/949975717734883328">Raffaele Messuti‏</a></p>    
blog.db/entries http://simonwillison.net/2018/Jul/12/now-cdn/#atom-tag The Now CDN 2018-07-12T03:34:06+00:00   1 <p><a href="https://zeit.co/blog/now-cdn">The Now CDN</a></p> <p>Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the &quot;datasette publish now&quot; CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments.</p> <p>Via <a href="https://twitter.com/zeithq/status/1017058342945087489">@zeithq</a></p>    
blog.db/entries http://simonwillison.net/2018/Jul/13/xars/#atom-tag XARs: An efficient system for self-contained executables 2018-07-13T19:00:12+00:00   1 <p><a href="https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/">XARs: An efficient system for self-contained executables</a></p> <p>Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It&#x27;s kind of like a Docker container without Docker - it uses the SquashFS compressed read-only filesystem. I can&#x27;t wait to try this out with Datasette.</p> <p>Via <a href="https://twitter.com/llanga/status/1017830352243617794">@llanga</a></p>    
blog.db/entries http://simonwillison.net/2018/Jul/26/datasette-publish-subcommand/#atom-tag Datasette: publish_subcommand hook + default plugins mechanism, used for publish heroku/now 2018-07-26T05:27:49+00:00   1 <p><a href="https://github.com/simonw/datasette/pull/349">Datasette: publish_subcommand hook + default plugins mechanism, used for publish heroku/now</a></p> <p>I just landed a new plugin hook to Datasette master: publish_subcommand, which lets you define new publisher subcommands for the &quot;datasette publish&quot; CLI tool in addition to Heroku and Zeit Now. As part of this I&#x27;ve refactored the heroku/now publisher implementations into two default plugins that ship as part of Datasette - I hope to use this pattern for other core functionality in the future.</p> <p>Via <a href="https://twitter.com/simonw/status/1022352329884938240">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Jul/28/documentation-unit-tests/#atom-tag Documentation unit tests 2018-07-28T15:59:55+00:00   1 <p>Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to get out-of-date as new features are added.</p> <p>One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.</p> <p>When combined with a code review system (like <a href="https://www.phacility.com/phabricator/">Phabricator</a> or <a href="https://help.github.com/articles/about-pull-requests/">GitHub pull requests</a>) this pattern lets you enforce documentation updates as part of the review process: if a change doesn’t update the relevant documentation, point that out in your review!</p> <p>Good code review systems also execute unit tests automatically and attach the results to the review. This provides an opportunity to have the tests enforce other aspects of the codebase: for example, running a linter so that no-one has to waste their time arguing over standardize coding style.</p> <p>I’ve been experimenting with using unit tests to ensure that aspects of a project are covered by the documentation. I think it’s a very promising technique.</p> <h3><a id="Introspect_the_code_introspect_the_docs_12"></a>Introspect the code, introspect the docs</h3> <p>The key to this trick is introspection: interogating the code to figure out what needs to be documented, then parsing the documentation to see if each item has been covered.</p> <p>I’ll use my <a href="https://github.com/simonw/datasette">Datasette</a> project as an example. Datasette’s <a href="https://github.com/simonw/datasette/blob/295d005ca48747faf046ed30c3c61e7563c61ed2/tests/test_docs.py">test_docs.py</a> module contains three relevant tests:</p> <ul> <li><code>test_config_options_are_documented</code> checks that every one of Datasette’s <a href="http://datasette.readthedocs.io/en/latest/config.html">configu…    
blog.db/entries http://simonwillison.net/2018/Jun/18/datasette-csv-export/#atom-tag Datasette 0.23: CSV, SpatiaLite and more 2018-06-18T15:34:04+00:00   1 <p><a href="http://datasette.readthedocs.io/en/latest/changelog.html#v0-23">Datasette 0.23: CSV, SpatiaLite and more</a></p> <p>The big new feature in 0.23 is CSV export: any Datasette table or query can now be exported as CSV, including the option to get all matching rows in one giant CSV file taking advantage of Python 3 async and Datasette&#x27;s efficient keyset pagination. Also in this release: improved support for SpatiaLite and various JSON API improvements including the ability to expand foreign key labels in JSON and CSV responses.</p> <p>Via <a href="https://twitter.com/simonw/status/1008733990172282880">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Jun/24/query-parquet-files-sqlite/#atom-tag Query Parquet files in SQLite 2018-06-24T19:44:06+00:00   1 <p><a href="https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html">Query Parquet files in SQLite</a></p> <p>Colin Dellow built a SQLite virtual table extension that lets you query Parquet files directly using SQL. Parquet is interesting because it&#x27;s a columnar format that dramatically reduces the space needed to store tables with lots of duplicate column data - most CSV files, for example. Colin reports being able to shrink a 1291 MB CSV file from the Canadian census to an equivalent Parquet file weighing just 42MB (3% of the original) - then running a complex query against the data in just 60ms. I&#x27;d love to see someone get this extension working with Datasette.</p>    
blog.db/entries http://simonwillison.net/2018/Jun/29/datasette-vega/#atom-tag datasette-vega 2018-06-29T15:00:48+00:00   1 <p><a href="https://github.com/simonw/datasette-vega">datasette-vega</a></p> <p>I wrote a visualization plugin for Datasette that uses the excellent Vega &quot;visualization grammar&quot; library to provide bar, line and scatter charts configurable against any Datasette table or SQL query.</p> <p>Via <a href="https://twitter.com/simonw/status/1012693000022036482">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Mar/28/datasette-in-the-wild/#atom-tag Baltimore Sun Public Salary Records 2018-03-28T17:12:52+00:00   1 <p><a href="https://salaries.news.baltimoresun.com/">Baltimore Sun Public Salary Records</a></p> <p>The Baltimore Sun have published an interactive search engine for public salaries of Maryland state employees, and it&#x27;s powered by Datasette! Since data journalism is one of my key use-cases for Datasette I&#x27;m incredibly excited to see this in the wild. They&#x27;ve also published the underlying source code (see the via link) which is a really nice example of how to use Datasette&#x27;s custom templates and canned query functionality.</p> <p>Via <a href="https://github.com/baltimore-sun-data/salaries-datasette">baltimore-sun-data/salaries-datasette</a></p>    
blog.db/entries http://simonwillison.net/2018/Mar/28/webapp-deployment-using-containers/#atom-tag Cloud-first: Rapid webapp deployment using containers 2018-03-28T15:50:36+00:00   1 <p><a href="https://wwwf.imperial.ac.uk/blog/research-software-engineering/2018/03/27/cloud-first-rapid-webapp-deployment-using-containers/">Cloud-first: Rapid webapp deployment using containers</a></p> <p>The Research Software Engineering group at ICL have written a tutorial on deploying web apps as Docker containers using Azure and they use Datasette as the example application.</p> <p>Via <a href="https://twitter.com/imperialrse/status/978900269667901441">@ImperialRSE</a></p>    
blog.db/entries http://simonwillison.net/2018/May/12/datasette-full-text-search/#atom-tag Datasette: Full-text search 2018-05-12T12:09:24+00:00   1 <p><a href="http://datasette.readthedocs.io/en/latest/full_text_search.html">Datasette: Full-text search</a></p> <p>I wrote some documentation for Datasette&#x27;s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter.</p>    
blog.db/entries http://simonwillison.net/2018/May/17/sqlitebiter/#atom-tag sqlitebiter 2018-05-17T22:40:28+00:00   1 <p><a href="https://github.com/thombashi/sqlitebiter">sqlitebiter</a></p> <p>Similar to my csvs-to-sqlite tool, but sqlitebiter handles &quot;CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets&quot;. Most interestingly, it works against HTML pages - run &quot;sqlitebiter -v url &#x27;https://en.wikipedia.org/wiki/Comparison_of_firewalls&#x27;&quot; and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there.</p>    
blog.db/entries http://simonwillison.net/2018/May/20/datasette-facets/#atom-tag Datasette Facets 2018-05-20T23:42:18+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.22">Datasette 0.22</a> is out with the most significant new feature I’ve added since the initial release: <strong>faceted browse</strong>.</p> <p><a href="https://github.com/simonw/datasette">Datasette</a> lets you deploy an instant web UI and JSON API for any SQLite database. <a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> makes it easy to create a SQLite database out of any collection of CSV files. <a href="https://publish.datasettes.com/">Datasette Publish</a> is a web app that can run these combined tools against CSV files you upload from your browser. And now the new <a href="https://datasette.readthedocs.io/en/latest/facets.html">Datasette Facets</a> feature lets you explore any CSV file using faceted navigation with a couple of clicks.</p> <h3><a id="Exploring_characters_from_Marvel_comics_6"></a>Exploring characters from Marvel comics</h3> <p>Let’s use facets to explore every character in the Marvel Universe.</p> <p>FiveThirtyEight have published <a href="https://github.com/fivethirtyeight/data/tree/master/comic-characters">a CSV file</a> of 16,376 characters from Marvel comics, scraped from Wikia as part of the research for their 2014 story <a href="https://fivethirtyeight.com/features/women-in-comic-books/">Comic Books Are Still Made By Men, For Men And About Men</a>.</p> <p>Here’s that CSV file <a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data">loaded into the latest version of Datasette</a>:</p> <p><img alt="Marvel characters explored using Datasette Facets" src="https://static.simonwillison.net/static/2018/marvel-facets.gif" /></p> <p>We start by applying the <a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3/comic-characters%2Fmarvel-wikia-data?_facet=ID&amp;_facet=ALIGN&amp;_facet=SEX">identity status, alignment and gender facets</a>. Then we filter down to just the <a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3…    
blog.db/entries http://simonwillison.net/2018/May/30/spatialite-datasette-docs/#atom-tag SpatiaLite — Datasette documentation 2018-05-30T04:34:06+00:00   1 <p><a href="http://datasette.readthedocs.io/en/latest/spatialite.html">SpatiaLite — Datasette documentation</a></p> <p>Datasette&#x27;s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I&#x27;m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go - as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it.</p>    
blog.db/entries http://simonwillison.net/2018/May/5/datasette-021/#atom-tag Datasette 0.21: New _shape=, new _size=, search within columns 2018-05-05T23:25:40+00:00   1 <p><a href="https://github.com/simonw/datasette/releases/tag/0.21">Datasette 0.21: New _shape=, new _size=, search within columns</a></p> <p>Nothing earth-shattering here but it&#x27;s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column.</p>    
blog.db/entries http://simonwillison.net/2018/May/9/changelog/#atom-tag Notes from my appearance on the Changelog podcast 2018-05-09T18:03:05+00:00   1 <p>After I spoke at <a href="https://zeit.co/day">Zeit Day SF</a> last weekend I sat down with <a href="https://twitter.com/adamstac">Adam Stacoviak</a> to record a 25 minute segment for <a href="https://changelog.com/podcast/296">episode 296</a> of the Changelog podcast, talking about Datasette. We covered a lot of ground!</p> <p>A transcript should be available soon (the Changelog have <a href="https://github.com/thechangelog/transcripts">excellent transcripts</a>) but in the meantime here are my own notes on the topics that we covered.</p> <p>My section starts <a href="https://changelog.com/podcast/296#t=00:54:45">54:45 minutes into the episode</a>.</p> <h3><a id="From_the_intro_8"></a>From the intro</h3> <p>Datasette was inspired by my work at the Guardian with <a href="https://www.theguardian.com/profile/simonrogers">Simon Rogers</a>, where we started the <a href="https://www.theguardian.com/data">The Guardian Data Blog</a>.</p> <p>Here’s <a href="https://github.com/simonw/datasette">Datasette</a> on GitHub, and <a href="https://simonwillison.net/2017/Nov/13/datasette/">Datasette: instantly create and publish an API for your SQLite databases</a> where I first announced the project.</p> <p>My <a href="https://www.sf-trees.com/">sf-trees.com</a> website, which is powered by <a href="https://san-francisco.datasettes.com/sf-trees/Street_Tree_List">this Datasette API</a> using <a href="https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq">data from the San Francisco Department of Public Works</a>.</p> <p>The hosting platform I use for immutable deployments of Docker containers is <a href="https://zeit.co/now">Zeit Now</a>.</p> <h3><a id="The_Datasette_tool_suite_18"></a>The Datasette tool suite</h3> <p>Here’s the full Datasette suite of tools:</p> <ul> <li><a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> is a command-line tool for converting CSV files into a SQLite database.</li> <li><a href="https://github.com/simonw/datasette">Datasette</a> is a web application for explori…    
blog.db/entries http://simonwillison.net/2018/May/9/datasette-metropolitan-museum-art/#atom-tag Datasette: The Metropolitan Museum of Art 2018-05-09T18:38:50+00:00   1 <p><a href="https://metmuseum.datasettes.com/MetObjects-00f9d76/MetObjects">Datasette: The Metropolitan Museum of Art</a></p> <p>The Metropolitan Museum of Art publish a CSV file on GitHub with details of 464,360 items from their collection. I turned it into a searchable Datasette instance.</p> <p>Via <a href="https://twitter.com/simonw/status/994285501271158784">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2018/Nov/19/smaller-python-docker-images/#atom-tag Zeit 2.0, and building smaller Python Docker images 2018-11-19T03:13:40+00:00   1 <p>Changes are afoot at <a href="https://zeit.co/now">Zeit Now</a>, my preferred hosting provider for the past year (see <a href="https://simonwillison.net/tags/zeitnow/">previous posts</a>). They have <a href="https://zeit.co/blog/now-2">announced Now 2.0</a>, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle for example (though it looks like this restriction will soon be <a href="https://spectrum.chat/?t=0ab38384-5aa3-4b04-899a-5b056f9b83b9">relaxed a little</a> - <strong>update November 19th</strong> you can now <a href="https://zeit.co/blog/customizable-lambda-sizes">bump this up to 50MB</a>).</p> <p>Unfortunately, they have also announced their intent to deprecate the existing Now v1 Docker-based solution.</p> <blockquote> <p>“We will only start thinking about deprecation plans once we are able to accommodate the most common and critical use cases of v1 on v2” - <a href="https://spectrum.chat/thread/96985341-e17f-4af4-a330-c726774ed436?m=MTU0MTcwOTU1ODIwNA==">Matheus Fernandes</a></p> </blockquote> <blockquote> <p>“When we reach feature parity, we still intend to give customers plenty of time to upgrade (we are thinking at the very least 6 months from the time we announce it)” - <a href="https://spectrum.chat/thread/46d54a53-f58d-4e8f-bce2-047a6ac93305?m=MTU0MjUyMDMwMzc5NQ==">Guillermo Rauch</a></p> </blockquote> <p>This is pretty disastrous news for many of my projects, most crucially <a href="https://github.com/simonw/datasette">Datasette</a> and <a href="https://simonwillison.net/2018/Jan/17/datasette-publish/">Datasette Publish</a>.</p> <p>Datasette should be fine - it supports Heroku as an alternative to Zeit Now <a href="https://datasette.readthedocs.io/en/stable/publish.html">out of the box</a>, and the <a href="https://datasette.readthedocs.io/en/stable/plugins.html#publish-subcommand-publish">publish_subcommand plugin hook</a> makes it easy to add further provid…    
blog.db/entries http://simonwillison.net/2018/Oct/25/how-instantly-publish-data-internet-datasette/#atom-tag How to Instantly Publish Data to the Internet with Datasette 2018-10-25T17:18:43+00:00   1 <p>I spoke about my <a href="https://github.com/simonw/datasette">Datasette</a> project at <a href="https://pybay.com/">PyBay</a> in August and they've just posted <a href="https://www.youtube.com/watch?v=lmP75mp3-Rg">the video</a> of my talk.</p> <div class="videoWrapper" style="margin-bottom: 1em;"> &#160; </div> <p>I've also published the <a href="https://static.simonwillison.net/static/2018/pybay-datasette/">annotated slides</a> from the talk, using a similar format to <a href="https://simonwillison.net/2010/Apr/25/redis/">my Redis tutorial</a> from back in 2010.</p>    
blog.db/entries http://simonwillison.net/2018/Oct/4/datasette-ideas/#atom-tag The interesting ideas in Datasette 2018-10-04T02:28:45+00:00   1 <p><a href="https://github.com/simonw/datasette">Datasette</a> (<a href="https://simonwillison.net/tags/datasette/">previously</a>) is my open source tool for exploring and publishing structured data. There are a lot of ideas embedded in Datasette. I realized that I haven’t put many of them into writing.</p> <p> <a href="#Publishing_readonly_data">Publishing read-only data</a><br /> <a href="#Bundling_the_data_with_the_code">Bundling the data with the code</a><br /> <a href="#SQLite_as_the_underlying_data_engine">SQLite as the underlying data engine</a><br /> <a href="#Farfuture_cache_expiration">Far-future cache expiration</a><br /> <a href="#Publishing_as_a_core_feature">Publishing as a core feature</a><br /> <a href="#License_and_source_metadata">License and source metadata</a><br /> <a href="#Facet_everything">Facet everything</a><br /> <a href="#Respect_for_CSV">Respect for CSV</a><br /> <a href="#SQL_as_an_API_language">SQL as an API language</a><br /> <a href="#Optimistic_query_execution_with_time_limits">Optimistic query execution with time limits</a><br /> <a href="#Keyset_pagination">Keyset pagination</a><br /> <a href="#Interactive_demos_based_on_the_unit_tests">Interactive demos based on the unit tests</a><br /> <a href="#Documentation_unit_tests">Documentation unit tests</a></p> <h3><a id="Publishing_readonly_data"></a>Publishing read-only data</h3> <p>Datasette provides a read-only API to your data. It makes no attempt to deal with writes. Avoiding writes entirely is fundamental to a plethora of interesting properties, many of which are expanded on further below. In brief:</p> <ul> <li>Hosting web applications with no read/write persistence requirements is incredibly cheap in 2018 - often free (both <a href="https://zeit.co/now">ZEIT Now</a> and a <a href="https://www.heroku.com/">Heroku</a> have generous free tiers). This is a big deal: even having to pay a few dollars a month is enough to dicentivise sharing data, since now you have to figure out who will pay and ensure the payments don’t expire i…    
blog.db/entries http://simonwillison.net/2019/Apr/13/where-sql/#atom-tag Datasette: ?_where=sql-fragment parameter for table views 2019-04-13T02:00:32+00:00   1 <p><a href="https://github.com/simonw/datasette/issues/429">Datasette: ?_where=sql-fragment parameter for table views</a></p> <p>I just shipped a tiny but really useful new feature to Datasette master: you can now add ?_where=sql-fragment on to the URL of any table view to inject additional SQL directly into the underlying WHERE clause. This tiny feature actually has some really interesting applications: I created this because I wanted to be able to run more complex custom SQL queries without losing access to the conveniences of Datasette&#x27;s table view, in particular the built-in faceting support. The feature actually fits in well with Datasette&#x27;s philosophy of allowing arbitrary SQL to be executed against a read-only database: you can turn this ability off using the allow_sql config flag.</p>    
blog.db/entries http://simonwillison.net/2019/Apr/23/datasette-glitch/#atom-tag Running Datasette on Glitch 2019-04-23T04:08:53+00:00   1 <p>The worst part of any software project is setting up a development environment. It’s by far the biggest barrier for anyone trying to get started learning to code. I’ve been a developer for more than twenty years and I still feel the pain any time I want to do something new.</p> <p><a href="https://glitch.com/">Glitch</a> is the most promising attempt I’ve ever seen at tackling this problem. It provides an entirely browser-based development environment that allows you to edit code, see the results instantly and view and remix the source code of other people’s projects.</p> <p>It’s developed into a really fun, super-creative community and a fantastic resource for people looking to get started in the ever-evolving world of software development.</p> <p>This evening I decided to get <a href="https://datasette.readthedocs.io/">Datasette</a> running on it. I’m really impressed with how well it works, and I think Glitch provides an excellent environment for experimenting with Datasette and <a href="https://datasette.readthedocs.io/en/stable/ecosystem.html">related tools</a>.</p> <p>TLDR version: visit <a href="https://glitch.com/edit/#!/remix/datasette-csvs">https://glitch.com/edit/#!/remix/datasette-csvs</a> right now, drag-and-drop in a CSV file and watch it get served by Datasette on Glitch just a few seconds later.</p> <h3><a id="Running_Python_on_Glitch_12"></a>Running Python on Glitch</h3> <p>The Glitch documentation is all about Node.js and JavaScript, but they actually have very solid Python support as well.</p> <p>Every Glitch project runs in a container that includes Python 2.7.12 and Python 3.5.2, and you can use <code>pip install --user</code> or <code>pip3 install --user</code> to install Python dependencies.</p> <p>The key to running non-JavaScript projects on Glitch is the <code>glitch.json</code> file format. You can use this to specify an <code>install</code> script, which sets up your container, and a <code>start</code> script, which starts your application running. Glitch will route HTTP traffic to …    
blog.db/entries http://simonwillison.net/2019/Apr/26/dockerfile-datasette-dentists/#atom-tag Dockerfile for creating a Datasette of NHS dentist information 2019-04-26T14:09:34+00:00   1 <p><a href="https://github.com/hubgit/datasette-dentists/blob/master/Dockerfile">Dockerfile for creating a Datasette of NHS dentist information</a></p> <p>Really neat Dockerfile example by Alf Eaton that uses multi-stage builds to pull dentist information from the NHS, compile to SQLite using csvs-to-sqlite and serve the results with Datasette. TIL the NHS like to use ¬ as their CSV separator!</p> <p>Via <a href="https://twitter.com/invisiblecomma/status/1121768361648635904">@invisiblecomma</a></p>    
blog.db/entries http://simonwillison.net/2019/Apr/9/vintage-python-on-cloud-run/#atom-tag Ministry of Silly Runtimes: Vintage Python on Cloud Run 2019-04-09T17:33:47+00:00   1 <p><a href="https://dev.to/di/ministry-of-silly-runtimes-vintage-python-on-cloud-run-3b9d">Ministry of Silly Runtimes: Vintage Python on Cloud Run</a></p> <p>Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a &quot;scale to zero&quot; environment, so you only pay for time spent serving traffic. It&#x27;s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x.</p> <p>Via <a href="https://twitter.com/jacobian/status/1115665262215352320">@jacobian</a></p>    
blog.db/entries http://simonwillison.net/2019/Aug/4/many-to-many/#atom-tag Working with many-to-many relationships in sqlite-utils 2019-08-04T03:57:20+00:00   1 <p><a href="https://sqlite-utils.readthedocs.io/en/stable/python-api.html#working-with-many-to-many-relationships">Working with many-to-many relationships in sqlite-utils</a></p> <p>I just released sqlite-utils 1.9 with syntactic sugar support for creating many-to-many relationships for records stored in SQLite databases.</p> <p>Via <a href="https://twitter.com/simonw/status/1157861422723940352">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Dec/10/better-presentations/#atom-tag Better presentations through storytelling and STAR moments 2019-12-10T00:00:42+00:00   1 <p>Last week I completed <a href="https://explorecourses.stanford.edu/search?view=catalog&amp;filter-coursestatus-Active=on&amp;q=GSBGEN%20315:%20Strategic%20Communication&amp;academicYear=20192020">GSBGEN 315: Strategic Communication</a> at the Stanford Graduate School of Business.</p> <p>The course has a stellar, well deserved reputation. It's principally about public speaking, and I gained a huge amount from it despite having over fifteen years of experience speaking at conferences.</p> <p>Some of the things that really stood out for me (partially in the form of catchy acronyms):</p> <ul><li>Every talk should start with an AIM: Audience, Intent, Message. Who are the audience for the talk? What do you intend to achieve by giving the presentation? With those two things in mind, you can construct the message - the actual content of the talk.</li><li>Try to include at least one STAR moment - Something They'll Always Remember. This can be a gimmick, a repeated theme, a well-selected video or audio clip. Something to help the talk stand out.</li><li>The human brain is incredibly attuned to stories. If you can find an excuse to tell a story, no matter how thin that excuse is, take it.</li><li>Presentations are most interesting if they are structured with contrasts. These can be emotional high and low points, or content that illustrates what <em>is</em> compared to what <em>could be</em>. <a href="https://www.duarte.com/presentation-skills-resources/creating-moments-of-impact-using-sparklines-for-strategic-conversations/">Sparklines</a> are a tool that can be used to think about this structure.</li></ul> <p>That last point about stories is where things get really interesting. We reviewed the classic <a href="https://en.wikipedia.org/wiki/Hero%27s_journey">hero's journey</a> story structure... but with a twist.</p> <p>When giving a talk, position your <em>audience</em> as the hero. They start in position of comfort and safety. Your job is to call them to adventure - guide them towards a dangerous and unknown realm…    
blog.db/entries http://simonwillison.net/2019/Dec/16/logging-sqlite-asgi-middleware/#atom-tag Logging to SQLite using ASGI middleware 2019-12-16T22:30:46+00:00   1 <p>I had some fun playing around with <a href="https://asgi.readthedocs.io/en/latest/specs/main.html#middleware">ASGI middleware</a> and logging during our flight back to England for the holidays.</p> <h3 id="asgi-log-to-sqlite">asgi-log-to-sqlite</h3> <p>I decided to experiment with SQLite as a logging mechanism. I wouldn’t use this on a high traffic site, but most of my Datasette related projects are small enough that logging HTTP traffic directly to a SQLite database feels like it should work reasonable well.</p> <p>Once your logs are in a SQLite database, you can use <a href="https://datasette.readthedocs.io/">Datasette</a> to analyze them. I think this could be a lot of fun.</p> <p><a href="https://github.com/simonw/asgi-log-to-sqlite">asgi-log-to-sqlite</a> is my first exploration of this idea. It’s a piece of ASGI middleware which wraps an ASGI application and then logs relevant information from the request and response to an attached SQLite database.</p> <p>You use it like this:</p> <pre class=" language-python"><code class="prism language-python"><span class="token keyword">from</span> asgi_log_to_sqlite <span class="token keyword">import</span> AsgiLogToSqlite <span class="token keyword">from</span> my_asgi_app <span class="token keyword">import</span> app app <span class="token operator">=</span> AsgiLogToSqlite<span class="token punctuation">(</span>app<span class="token punctuation">,</span> <span class="token string">"/tmp/log.db"</span><span class="token punctuation">)</span> </code></pre> <p>Here’s a demo Datasette instance showing logs from my testing: <a href="https://asgi-log-demo-j7hipcg4aq-uc.a.run.app">asgi-log-demo-j7hipcg4aq-uc.a.run.app</a></p> <p>As always with Datasette, the data is at its most interesting once you <a href="https://asgi-log-demo-j7hipcg4aq-uc.a.run.app/asgi-log-demo/requests?_sort_desc=rowid&amp;_facet=path&amp;_facet=user_agent&amp;_facet=content_type#facet-content_type">apply some facets</a>.</p> <h3 id="intercepting-requests-to-and-from-the-wrapped-asgi-app">Interc…    
blog.db/entries http://simonwillison.net/2019/Dec/20/data-driven-reporting-more-newsrooms/#atom-tag Building tools to bring data-driven reporting to more newsrooms 2019-12-20T11:17:41+00:00   1 <p><a href="https://medium.com/jsk-class-of-2020/building-tools-to-bring-data-driven-reporting-to-more-newsrooms-4520a0c9b3f2">Building tools to bring data-driven reporting to more newsrooms</a></p> <p>I wrote about my fellowship project so far and my goals for the next few months for the JSK Medium publication. My next priority: an invite-only hosted version for newsrooms so that figuring out how to install and manage the software isn&#x27;t the biggest barrier to entry.</p>    
blog.db/entries http://simonwillison.net/2019/Dec/23/weeknotes-datasette-033/#atom-tag Weeknotes: Datasette 0.33 2019-12-23T21:38:06+00:00   1 <p>I released <a href="https://datasette.readthedocs.io/en/latest/changelog.html#v0-33">Datasette 0.33</a> yesterday. The release represents an accumulation of small changes and features since Datasette 0.32 back in November. Duplicating the release notes:</p> <ul><li><code>rowid</code> is now included in dropdown menus for filtering tables (<a href="https://github.com/simonw/datasette/issues/636">#636</a>)</li><li>Columns are now only suggested for faceting if they have at least one value with more than one record (<a href="https://github.com/simonw/datasette/issues/638">#638</a>)</li><li>Queries with no results now display “0 results” (<a href="https://github.com/simonw/datasette/issues/637">#637</a>)</li><li>Improved documentation for the <code>--static</code> option (<a href="https://github.com/simonw/datasette/issues/641">#641</a>)</li><li>asyncio task information is now included on the <code>/-/threads</code> debug page</li><li>Bumped Uvicorn dependency 0.11</li><li>You can now use <code>--port 0</code> to listen on an available port</li><li>New <code>template_debug</code> setting for debugging templates, e.g. https://latest.datasette.io/fixtures/roadside<em>attractions?</em>context=1 (<a href="https://github.com/simonw/datasette/issues/654">#654</a>)</li></ul> <p>The last three items deserve extra explanation.</p> <h3>Port 0</h3> <p>I run a <em>lot</em> of Datasette instances on my laptop - I often have 5 or 6 running representing different projects or experiments.</p> <p>Eventually this means I start to hit port conflicts. I'll run <code>datasette mydb.db -p 8005</code> and get an error because port 8005 is already in use by something else.</p> <p>I <a href="https://twitter.com/simonw/status/1207585946440478720">asked on Twitter</a> if there were any downsides to having Datasette automatically pick an unused port... and @davids and @dominicrodger pointed out that port 0 is <a href="https://www.lifewire.com/port-0-in-tcp-and-udp-818145">a Unix convention</a> for "pick a port for me".</p> <p>It turne…    
blog.db/entries http://simonwillison.net/2019/Dec/3/datasette-atom/#atom-tag datasette-atom: Define an Atom feed using a custom SQL query 2019-12-03T00:20:44+00:00   1 <p>I've been having a ton of fun iterating on <a href="https://www.niche-museums.com/">www.niche-museums.com</a>. I put together <a href="https://simonwillison.net/2019/Nov/25/niche-museums/">some notes on how the site works</a> last week, and I've been taking advantage of the Thanksgiving break to continue exploring ways in which <a href="https://github.com/simonw/datasette">Datasette</a> can be used to quickly build database-backed static websites.</p> <p>I post a new museum to the site every day, so it was inevitable that someone would <a href="https://projects.metafilter.com/5697/Niche-Museums">ask for a feed</a>. And here it is: an <a href="https://www.niche-museums.com/browse/feed.atom?_feed_title=Niche+Museums">Atom feed for Niche Museums</a>.</p> <p>This means Niche Museums is effectively a blog now, which is fitting: it's a universal truth that any sufficiently advanced backend technology will evolve to the point where it can power a blog with an Atom feed.</p> <h3 id="datasette-atom">datasette-atom</h3> <p>I built the feed by wrapping up work on the first version of a new Datasette plugin: <a href="https://github.com/simonw/datasette-atom">datasette-atom</a>. It takes advantage of the <a href="https://datasette.readthedocs.io/en/stable/plugins.html#register-output-renderer-datasette">register_output_renderer</a> plugin hook, which was contributed by Russ Garrett <a href="https://github.com/simonw/datasette/pull/441">back in May</a>.</p> <p>The idea with the plugin is to make it possible to construct an Atom feed from an arbitrary SQL query.</p> <p>This is a really powerful ability. It means that a user with sufficent knowledge of SQL can subscribe to an arbitrary feed of data from any Datasette instance that is running the plugin. </p> <h3 id="defining-atom-with-sql">Defining an Atom feed with a SQL query</h3> <p>The plugin works by requiring you to provide a SQL query that produces the following columns in its output:</p> <p><code>atom_id</code>, <code>atom_title</code> and <code>atom_updated<…    
blog.db/entries http://simonwillison.net/2019/Dec/30/sqlite-utils-2/#atom-tag sqlite-utils 2.0: real upserts 2019-12-30T06:48:40+00:00   1 <p>I just <a href="https://github.com/simonw/sqlite-utils/releases/tag/2.0">released version 2.0</a> of my <a href="https://github.com/simonw/sqlite-utils">sqlite-utils</a> library/CLI tool to <a href="https://pypi.org/project/sqlite-utils/">PyPI</a>.</p> <p>It's 2.0 because this is my first backwards compatibility breaking version of the library. I was hoping to avoid this for as long as possible: given semantic versioning, <a href="https://twitter.com/simonw/status/1192130232498569216">my opinion</a> is that the highest quality libraries are the 1.x versions with the highest sub-version. Version 1.253 means a library managed to ship 253 new features without once breaking compatibility!</p> <p>For sqlite-utils I needed the clean break, because I made a big mistake in my definition of "upsert".</p> <p>An upsert is a database operation which says "Update this record with these new values. If the record does not exist, create it."</p> <p>This is actually a <a href="https://www.sqlite.org/lang_UPSERT.html">documented feature of SQLite</a> version 3.24.0 and higher. I've seen the term used for other databases <a href="https://wiki.postgresql.org/wiki/UPSERT">such as PostgreSQL</a> as well.</p> <p>The problem is: when I first implemented the <code>.upsert()</code> method for sqlite-utils I incorrectly assumed that upsert was an alias for <code>INSERT OR REPLACE</code>.</p> <p>So in sqlite-utils 1.x, the following code:</p> <pre><code>db["dogs"].upsert({ "id": 1, "name": "Cleo", "age": 4 }, pk="id")</code></pre> <p>Executes then following SQL:</p> <pre><code>INSERT OR REPLACE INTO dogs ("id", "name", "age") VALUES (1, "Cleo", 4);</code></pre> <p>It turns out this isn't how upsert should be expected to work. This query will entirely replace any existing record. But... an upsert should update the record in place. Crucially, it should leave any columns not referenced in the upsert alone. My implementation over-wrote the entire row, deleting any missing values.</p> <p>So if I run this Python …    
blog.db/entries http://simonwillison.net/2019/Feb/1/datasette/#atom-tag Datasette 0.27 2019-02-01T04:39:30+00:00   1 <p><a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-27">Datasette 0.27</a></p> <p>The latest release of Datasette introduces an option to output tables and SQL query results as newline-delimited JSON - plus a new &quot;datasette plugins&quot; command for listing available plugins.</p> <p>Via <a href="https://twitter.com/simonw/status/1091189046745849857">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Feb/1/ecosystem/#atom-tag The Datasette Ecosystem 2019-02-01T04:41:21+00:00   1 <p><a href="https://datasette.readthedocs.io/en/stable/ecosystem.html">The Datasette Ecosystem</a></p> <p>I&#x27;ve written a page of documentation that introduces the wider Datasette Ecosystem: csvs-to-sqlite, sqlite-utils, db-to-sqlite, dbf-to-sqlite, markdown-to-sqlite and a full collection of Datasette plugins.</p>    
blog.db/entries http://simonwillison.net/2019/Feb/25/sqlite-utils/#atom-tag sqlite-utils: a Python library and CLI tool for building SQLite databases 2019-02-25T03:29:20+00:00   1 <p><a href="https://github.com/simonw/sqlite-utils">sqlite-utils</a> is a combination Python library and command-line tool I’ve been building over the past six months which aims to make creating new SQLite databases as quick and easy as possible.</p> <p>It’s part of <a href="https://datasette.readthedocs.io/en/stable/ecosystem.html">the ecosystem of tools</a> I’m building around my <a href="https://datasette.readthedocs.io/">Datasette</a> project.</p> <p>I spent the weekend adding all kinds of exciting command-line options to it, so I’m ready to describe it to the world.</p> <h3><a id="A_Python_library_for_quickly_creating_databases_8"></a>A Python library for quickly creating databases</h3> <p>A core idea behind Datasette is that <a href="https://www.sqlite.org/">SQLite</a> is the ideal format for publishing all kinds of interesting structured data. Datasette takes any SQLite database and adds a browsable web interface, <a href="https://datasette.readthedocs.io/en/stable/json_api.html">a JSON API</a> and the ability to <a href="https://datasette.readthedocs.io/en/stable/csv_export.html">export tables and queries as CSV</a>.</p> <p>The other half of the equation then is tools to create SQLite databases. <a href="https://github.com/simonw/csvs-to-sqlite">csvs-to-sqlite</a> was my first CLI attempt at this. <code>sqlite-utils</code> takes a much more flexible and comprehensive approach.</p> <p>I started working on <code>sqlite-utils</code> last year as part of my project to <a href="https://simonwillison.net/2018/Aug/6/russian-facebook-ads/">Analyze US Election Russian Facebook Ads</a>. The initial aim was to build a library that made constructing new SQLite databases inside of a <a href="https://jupyter.org/">Jupyter notebook</a> as productive as possible.</p> <p>The core idea behind the library is that you can give it a list of Python dictionaries (equivalent to JSON objects) and it will automatically create a SQLite table with the correct schema, then insert those items into the new table.</p> <p>To illustrate, …    
blog.db/entries http://simonwillison.net/2019/Feb/8/socrata2sql/#atom-tag socrata2sql 2019-02-08T15:27:14+00:00   1 <p><a href="https://github.com/DallasMorningNews/socrata2sql">socrata2sql</a></p> <p>Phenomenal new open source tool released by Andrew Chavez at the Dallas Morning News. Socrata is the open data portal software used by huge numbers of local governments worldwide. socrata2sql is a tool that interacts with the standard Socrata API and can use it to suck down a dataset and save it as a SQLite, PostgreSQL, MySQL or other SQLAlchemy-supported database. I just tried this and it took a single command to create a SQLite database of every police arrest in Dallas in the past five years.</p> <p>Via <a href="https://twitter.com/newsnerdrepos/status/1093632312238063617">@newsnerdrepos</a></p>    
blog.db/entries http://simonwillison.net/2019/Jan/11/usable-data/#atom-tag Usable Data 2019-01-11T18:33:18+00:00   1 <p><a href="https://medium.com/@ftrain/usable-data-eb7234d64309">Usable Data</a></p> <p>A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don&#x27;t know how I missed this one - it predates Datasette, but it perfectly captures the benefits that I&#x27;m trying to expose with the project. &quot;In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.&quot;</p> <p>Via <a href="https://twitter.com/simonw/status/1083793391631032320">Twitter</a></p>    
blog.db/entries http://simonwillison.net/2019/Jan/15/sqlite-in-2018-a-state-of-the-art-sql-dialect/#atom-tag SQLite in 2018: A state of the art SQL dialect 2019-01-15T16:21:12+00:00   1 <p><a href="https://modern-sql.com/blog/2019-01/sqlite-in-2018">SQLite in 2018: A state of the art SQL dialect</a></p> <p>In 2018 SQLite gained boolean literals, window functions, filter clauses, upserts and the ability to rename a column. If you want to try it out the latest official datasetteproject/datasette Docker image now bundles SQLite 3.26.</p> <p>Via <a href="https://twitter.com/markuswinand/status/1085111833403580416">@markuswinand</a></p>    
blog.db/entries http://simonwillison.net/2019/Jan/7/exploring-search-relevance-algorithms-sqlite/#atom-tag Exploring search relevance algorithms with SQLite 2019-01-07T03:29:29+00:00   1 <p><a href="https://www.sqlite.org/index.html">SQLite</a> isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the <a href="https://www.sqlite.org/fts3.html">FTS4</a> and <a href="https://sqlite.org/fts5.html">FTS5</a> extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under-the-hood and use it to implement their built-in search.</p> <p>I’ve been using these capabilities for <a href="https://datasette.readthedocs.io/en/stable/full_text_search.html">basic search in Datasette</a> for over a year now, but I’ve recently started digging into some of their more advanced features. It turns out hacking around with SQLite is a great way to learn more about how fundamental information retrieval algorithms work under the hood.</p> <p>Today I’m releasing <a href="https://github.com/simonw/sqlite-fts4">sqlite-fts4</a> - a Python package that provides a collection of custom SQL functions for working with SQLite’s FTS4 module. It includes some neat tools for introspecting how relevancy ranking algorithms actually work.</p> <h3><a id="Why_not_just_use_FTS5_8"></a>Why not just use FTS5?</h3> <p>If it’s available to you FTS5 is usually the best option: it has a good ranking algorithm built in. I described how to use it to build <a href="https://24ways.org/2018/fast-autocomplete-search-for-your-website/">fast autocomplete search for your website</a> for the 2018 <a href="https://24ways.org/">24 ways advent calendar</a>. You can join directly against a virtual table and order by a pre-calculated relevance score accessible through that table.</p> <p>What makes FTS4 interesting is that it doesn’t include a scoring mechanism: it instead exposes raw statistical data to you in a way that lets you build your own ranking functions.</p> <p>You probably don’t <em>need</em> to do this - unless you are stuck on an older SQLite version that doesn’t support the latest features. But… if you’re interested in unde…    
blog.db/entries http://simonwillison.net/2019/Jul/1/db-to-sqlite/#atom-tag db-to-sqlite 1.0 release 2019-07-01T01:35:40+00:00   1 <p><a href="https://github.com/simonw/db-to-sqlite/releases/tag/1.0">db-to-sqlite 1.0 release</a></p> <p>I&#x27;ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I&#x27;ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable.</p>    
blog.db/entries http://simonwillison.net/2019/Jul/14/sso-asgi/#atom-tag Single sign-on against GitHub using ASGI middleware 2019-07-14T01:18:56+00:00   1 <p>I released <a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-29">Datasette 0.29</a> last weekend, the first version of Datasette to be built on top of ASGI (discussed previously in <a href="https://simonwillison.net/2019/Jun/23/datasette-asgi/">Porting Datasette to ASGI, and Turtles all the way down</a>).</p> <p>This also marked the introduction of the new <a href="https://datasette.readthedocs.io/en/stable/plugins.html#plugin-asgi-wrapper">asgi_wrapper</a> plugin hook, which allows plugins to wrap the entire Datasette application in their own piece of ASGI middleware.</p> <p>To celebrate this new capability, I also released two new plugins: <a href="https://github.com/simonw/datasette-cors">datasette-cors</a>, which provides fine-grained control over CORS headers (using my <a href="https://github.com/simonw/asgi-cors">asgi-cors</a> library from a few months ago) and <a href="https://github.com/simonw/datasette-auth-github">datasette-auth-github</a>, the first of hopefully many authentication plugins for Datasette.</p> <h3><a id="datasetteauthgithub_8"></a>datasette-auth-github</h3> <p>The new plugin is best illustrated with a demo.</p> <p>Visit <a href="https://datasette-auth-demo.now.sh/">https://datasette-auth-demo.now.sh/</a> and you will be redirected to GitHub and asked to approve access to your account (just your e-mail address, not repository access).</p> <p>Agree, and you’ll be redirected back to the demo with a new element in the Datasette header: your GitHub username, plus a “log out” link in the navigation bar at the top of the screen.</p> <h3><a id="Controlling_who_can_access_16"></a>Controlling who can access</h3> <p>The default behaviour of the plugin is to allow in anyone with a GitHub account. Since the primary use-case for the plugin (at least for the moment) is restricting access to view data to a trusted subset of people, the plugin lets you configure who is allowed to view your data in three different ways:</p> <ul> <li>You can restrict access to a specific list of Git…    
blog.db/entries http://simonwillison.net/2019/Jul/20/unlocking-the-department-of-states-foreign-military-training-dat/#atom-tag Unlocking the Department of State’s foreign military training data for good this time 2019-07-20T04:42:58+00:00   1 <p><a href="https://securityforcemonitor.org/2019/07/18/unlocking-the-department-of-states-foreign-military-training-data-for-good-this-time/">Unlocking the Department of State’s foreign military training data for good this time</a></p> <p>I&#x27;m so excited about this: Security Force Monitor used Datasette to publish a 200,000 row database of training engagements between the US military and foreign military units, based on their own massive efforts to clean up the official data (from thousands of PDF files). This is pretty much my dream use-case for Datasette, and their future goals are inspiring: &quot;Our hope is that when the next report arrives in a short few months, we will be able to turn it into machine readable data and pass it around the sector in minutes, rather than months.&quot;</p> <p>Via <a href="https://twitter.com/simonw/status/1152437436011565058">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jul/22/healthkit-sqlite/#atom-tag healthkit-to-sqlite 2019-07-22T03:34:38+00:00   1 <p><a href="https://github.com/dogsheep/healthkit-to-sqlite">healthkit-to-sqlite</a></p> <p>Ever since I got an Apple Watch I&#x27;ve been itching to get my hands on the step tracking and health data that it&#x27;s been collecting for me. I know it&#x27;s there in a SQLite database on my wrist, but I couldn&#x27;t figure out how to get it! A few days ago I stumbled across the &quot;Export Health Data&quot; button in the iOS Health app, and it turns out it creates a zip file containing XML with a full dump of the data collected by Apple Health. healthkit-to-sqlite is the tool I&#x27;ve built that can read that export and use it to create a SQLite database ready to be queried and explored with Datasette. It&#x27;s a pretty basic implementation but it&#x27;s already giving me access to over 3 million rows of data. Lots of potential here for interesting work with personal analytics.</p>    
blog.db/entries http://simonwillison.net/2019/Jul/8/datasette-029/#atom-tag Datasette 0.29 2019-07-08T04:26:56+00:00   1 <p><a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-29">Datasette 0.29</a></p> <p>I shipped Datasette 0.29! • ASGI all the way down! Plus a new asgi_wrapper plugin hook letting plugins do all kinds of powerful new things • New mechanism for secret plugin configuration options • Facet by date • ?_through= for joins through m2m tables. Much more.</p> <p>Via <a href="https://twitter.com/simonw/status/1148083732982591488">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jul/8/datasette-auth-github/#atom-tag datasette-auth-github 2019-07-08T04:28:17+00:00   1 <p><a href="https://github.com/simonw/datasette-auth-github">datasette-auth-github</a></p> <p>My first big ASGI plugin for Datasette: datasette-auth-github adds the ability to require users to authenticate against the GitHub OAuth API. You can whitelist specific users, or you can restrict access to members of specific GitHub organizations or teams. While it&#x27;s structured as a Datasette plugin it also includes ASGI middleware which can be applied to any ASGI application.</p> <p>Via <a href="https://twitter.com/simonw/status/1148085030448578561">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jul/8/datasette-cors/#atom-tag datasette-cors 2019-07-08T04:30:53+00:00   1 <p><a href="https://github.com/simonw/datasette-cors">datasette-cors</a></p> <p>My other Datasette ASGI plugin: this one wraps my asgi-cors project and lets you configure CORS access from a list of domains (or a set of domain wildcards) so you can make JavaScript calls to a Datasette instance from a specific set of other hosts.</p> <p>Via <a href="https://twitter.com/simonw/status/1148084447687786498">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jun/12/paginate-json/#atom-tag paginate-json 2019-06-12T15:22:20+00:00   1 <p><a href="https://pypi.org/project/paginate-json/">paginate-json</a></p> <p>I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils.</p> <p>Via <a href="https://twitter.com/simonw/status/1138828809849847808">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jun/14/locations-kml/#atom-tag Convert Locations.kml (pulled from an iPhone backup) to SQLite 2019-06-14T00:45:01+00:00   1 <p><a href="https://gist.github.com/simonw/2f09eb4b9fa41a3721f3e87f18813221">Convert Locations.kml (pulled from an iPhone backup) to SQLite</a></p> <p>I&#x27;ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python&#x27;s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database.</p>    
blog.db/entries http://simonwillison.net/2019/Jun/23/datasette-asgi/#atom-tag Porting Datasette to ASGI, and Turtles all the way down 2019-06-23T21:39:00+00:00   1 <p>This evening I finally closed a <a href="https://simonwillison.net/tags/datasette/">Datasette</a> issue that I opened more than 13 months ago: <a href="https://github.com/simonw/datasette/issues/272">#272: Port Datasette to ASGI</a>. A few notes on why this is such an important step for the project.</p> <p><a href="https://asgi.readthedocs.io/">ASGI</a> is the Asynchronous Server Gateway Interface standard. It’s been evolving steadily over the past few years under the guidance of Andrew Godwin. It’s intended as an asynchronous replacement for the venerable <a href="https://wsgi.readthedocs.io/">WSGI</a>.</p> <h3><a id="Turtles_all_the_way_down_6"></a>Turtles all the way down</h3> <p>Ten years ago at EuroDjangoCon 2009 in Prague I gave a talk entitled <a href="https://www.slideshare.net/simon/django-heresies">Django Heresies</a>. After discussing some of the design decisions in Django that I didn’t think had aged well, I spent the last part of the talk talking about <em>Turtles all the way down</em>. I <a href="https://simonwillison.net/2009/May/19/djng/?#turtles-all-the-way-down">wrote that idea up here</a> on my blog (see also <a href="https://www.slideshare.net/simon/django-heresies/65-The_Django_Contract_A_view">these slides</a>).</p> <p>The key idea was that Django would be more interesting if the core Django contract - a function that takes a request and returns a response - was extended to more places in the framework. The top level site, the reusable applications, middleware and URL routing could all share that same contract. Everything could be composed from the same raw building blocks.</p> <p>I’m excited about ASGI because it absolutely fits the <em>turtles all the way down</em> model.</p> <p>The ASGI contract is an asynchronous function that takes three arguments:</p> <pre><code>async def application(scope, receive, send): ... </code></pre> <p><code>scope</code> is a serializable dictionary providing the context for the current connection. <code>receive</code> is an awaitable which can be used t…    
blog.db/entries http://simonwillison.net/2019/Jun/9/datasette-bplist/#atom-tag datasette-bplist 2019-06-09T01:26:14+00:00   1 <p><a href="https://github.com/simonw/datasette-bplist">datasette-bplist</a></p> <p>It turns out an OS X laptop is positively crammed with SQLite databases, and many of them contain values that are data structures encoded using Apple&#x27;s binary plist format. datasette-bplist is my new plugin to help explore those files: it provides a display hook for rendering their contents, and a custom bplist_to_json() SQL function which can be used to extract and query information that is embedded in those values. The README includes tips on how to pull interesting EXIF data out of the SQLite database that sits behind Apple Photos.</p> <p>Via <a href="https://twitter.com/simonw/status/1137532700691128320">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Jun/9/datasette-render-binary/#atom-tag datasette-render-binary 2019-06-09T16:22:13+00:00   1 <p><a href="https://github.com/simonw/datasette-render-binary">datasette-render-binary</a></p> <p>Yet another tiny Datasette plugin. This one attempts to render binary data in a slightly more readable fashion - it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format.</p> <p>Via <a href="https://github.com/simonw/datasette/issues/506">Option to display binary data #506</a></p>    
blog.db/entries http://simonwillison.net/2019/Mar/9/datasette-jellyfish/#atom-tag datasette-jellyfish 2019-03-09T18:29:13+00:00   1 <p><a href="https://github.com/simonw/datasette-jellyfish">datasette-jellyfish</a></p> <p>I learned about a handy Python library called Jellyfish which implements approximate and phonetic matching of strings - soundex, metaphone, porter stemming, levenshtein distance and more. I&#x27;ve built a simple Datasette plugin which wraps the library and makes each of those algorithms available as a SQL function.</p>    
blog.db/entries http://simonwillison.net/2019/Mar/9/publish-data-behind-your-stories-sqlite-and-datasette/#atom-tag Publish the data behind your stories with SQLite and Datasette 2019-03-09T18:27:15+00:00   1 <p><a href="https://github.com/simonw/datasette-car-2019">Publish the data behind your stories with SQLite and Datasette</a></p> <p>I presented a workshop on Datasette at the IRE and NICAR CAR 2019 data journalism conference yesterday. Here&#x27;s the worksheet I prepared for the tutorial.</p>    
blog.db/entries http://simonwillison.net/2019/May/1/jsk-fellowship/#atom-tag JSK Journalism Fellowships names Class of 2019-2020 (and I'm in it!) 2019-05-01T16:43:54+00:00   1 <p><a href="https://jsk.stanford.edu/news-notes/2019/jsk-journalism-fellowships-names-class-of-2019-2020/">JSK Journalism Fellowships names Class of 2019-2020 (and I&#x27;m in it!)</a></p> <p>In personal news... I&#x27;ve been accepted for a ten month journalism fellowship at Stanford (starting September)! My work there will involve &quot;Improving the impact of investigative stories by expanding the open-source ecosystem of tools that allows journalists to share the underlying data&quot;.</p> <p>Via <a href="https://twitter.com/simonw/status/1123624552867565569">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/May/19/datasette-0-28/#atom-tag Datasette 0.28 - and why master should always be releasable 2019-05-19T22:15:56+00:00   1 <p>It's been quite a while since the last substantial release of Datasette. <a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-27">Datasette 0.27</a> came out all the way back in January.</p> <p>This isn't because development has slowed down. In fact, the project has had <a href="https://github.com/simonw/datasette/compare/0.27...0.28">131 commits</a> since then, covering a bewildering array of new functionality and with some significant contributions from developers who aren't me - Russ Garrett and Romain Primet deserve special recognition here.</p> <p>The problem has been one of discipline. I'm a big fan of the idea of keeping master shippable at all times in my professional work, but I hadn't quite adopted this policy for my open-source side projects. A couple of months ago I found myself in a situation where I had two major refactorings (of faceting and of Datasette's treatment of immutable files) going on in master at the same time, and untangling them turned out to take way longer than I had expected.</p> <p>So I've updated Datasette's <a href="https://datasette.readthedocs.io/en/stable/contributing.html#general-guidelines">contribution guidelines</a> to specify that <strong>master should always be releasable</strong>, almost entirely as a reminder to myself.</p> <p>All of that said, I'm finally back out of the weeds and I'm excited to announce today's release of <a href="https://pypi.org/project/datasette/0.28/">Datasette 0.28</a>. It features a <a href="https://adamj.eu/tech/2019/01/18/a-salmagundi-of-django-alpha-announcements/">salmagundi</a> of new features! I'm replicating <a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-28">the release notes</a> below.</p> <h3>Supporting databases that change</h3> <p>From the beginning of the project, Datasette has been designed with read-only databases in mind. If a database is guaranteed not to change it opens up all kinds of interesting opportunities - from taking advantage of SQLite immutable mode and HTTP caching to …    
blog.db/entries http://simonwillison.net/2019/May/21/discover-insights-developer-survey-results/#atom-tag Discover Insights in Developer Survey Results 2019-05-21T18:50:22+00:00   1 <p><a href="https://glitch.com/culture/discover-insights-explore-developer-survey-results-2019/">Discover Insights in Developer Survey Results</a></p> <p>Stack Overflow partnered with Glitch and used Datasette to host the full data set from Stack Overflow&#x27;s 2019 Developer Survey!</p>    
blog.db/entries http://simonwillison.net/2019/May/21/public-data-release-of-stack-overflows-2019-developer-survey/#atom-tag Public Data Release of Stack Overflow’s 2019 Developer Survey 2019-05-21T18:51:43+00:00   1 <p><a href="https://stackoverflow.blog/2019/05/21/public-data-release-of-stack-overflows-2019-developer-survey/">Public Data Release of Stack Overflow’s 2019 Developer Survey</a></p> <p>Here&#x27;s the Stack Overflow announcement of their developer survey public data release, which discusses the Glitch partnership and mentions Datasette.</p>    
blog.db/entries http://simonwillison.net/2019/May/25/sqlite-utils-1/#atom-tag sqlite-utils 1.0 2019-05-25T01:20:37+00:00   1 <p><a href="https://sqlite-utils.readthedocs.io/en/latest/changelog.html#v1-0">sqlite-utils 1.0</a></p> <p>I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn&#x27;t quite fit (using alter=True in the Python API or the --alter option to the &quot;sqlite-utils insert&quot; command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I&#x27;ve marked as a 1.0 release in a very long time - I&#x27;ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change.</p>    
blog.db/entries http://simonwillison.net/2019/May/30/datasette-jq/#atom-tag datasette-jq 2019-05-30T01:52:57+00:00   1 <p><a href="https://github.com/simonw/datasette-jq">datasette-jq</a></p> <p>I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo - it&#x27;s a neat way to play with the jq micro-language.</p> <p>Via <a href="https://twitter.com/simonw/status/1133912206859313152">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Nov/11/weeknotes-8/#atom-tag Weeknotes: Python 3.7 on Glitch, datasette-render-markdown 2019-11-11T23:26:34+00:00   1 <p><a href="https://simonwillison.net/2019/Oct/28/niche-museums-kepler/#Streaks_56">Streaks</a> is really working well for me. I’m at 12 days of commits to <a href="https://github.com/simonw/datasette">Datasette</a>, 16 posting a daily <a href="https://www.niche-museums.com/">Niche Museum</a>, 19 of actually reviewing my email inbox and 14 of guitar practice. I rewarded myself for that last one by purchasing an actual classical (as opposed to acoustic) guitar.</p> <h3><a id="Datasette_4"></a>Datasette</h3> <p>One downside: since my aim is to land a commit to Datasette master every day, I’m incentivised to land small changes. I have a bunch of much larger Datasette projects in the works - I think my goal for the next week should be to land one of those. Contenders include:</p> <ul> <li><a href="https://github.com/simonw/datasette/issues/617">TableView.data()</a> refactor - a blocker on a bunch of other projects</li> <li><a href="https://github.com/simonw/datasette/issues/567">Datasette Edit</a> - finish <a href="https://github.com/simonw/datasette/issues/569">the new connection work</a> so I can have plugins that write changes to databases</li> <li><a href="https://github.com/simonw/datasette/issues/417">Datasette Library</a> - watch a directory and automatically serve new database files that show up in that directory</li> <li>Finish and ship my work on <a href="https://github.com/simonw/datasette/issues/551">facet-by-many-to-many</a></li> <li>Implement <a href="https://github.com/simonw/datasette/issues/613">basic join support</a> for table views (so you can join without writing a custom SQL query)</li> <li>Probably the most impactful: Datasette needs a website! Up until now I’ve directed people to <a href="https://github.com/simonw/datasette">GitHub</a> or to <a href="https://datasette.readthedocs.io/">the documentation</a> but the project has grow to the point that it warrants its own home.</li> </ul> <p>I’m going to redefine my daily goal to include pushing in-progress work to Datasette branches in an attempt …    
blog.db/entries http://simonwillison.net/2019/Nov/12/datasette/#atom-tag Datasette 0.31 2019-11-12T06:11:57+00:00   1 <p><a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-31">Datasette 0.31</a></p> <p>Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch support Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build.</p>    
blog.db/entries http://simonwillison.net/2019/Nov/15/datasette-template-sql/#atom-tag datasette-template-sql 2019-11-15T00:59:50+00:00   1 <p><a href="https://github.com/simonw/datasette-template-sql">datasette-template-sql</a></p> <p>New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates.</p> <p>Via <a href="https://twitter.com/simonw/status/1195126529304301573">@simonw</a></p>    
blog.db/entries http://simonwillison.net/2019/Nov/18/datasette-template-sql/#atom-tag Weeknotes: datasette-template-sql 2019-11-18T22:29:29+00:00   1 <p><a href="https://simonwillison.net/2019/Nov/11/weeknotes-8/#Datasette_4">Last week</a> I talked about wanting to take ona a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn't listed there: <a href="https://github.com/simonw/datasette/issues/622">the upgrade of Datasette</a> to Python 3.8, which meant dropping support for Python 3.5 (thanks to <a href="https://github.com/simonw/datasette/pull/595">incompatible dependencies</a>).</p> <p>Since Glitch <a href="https://simonwillison.net/2019/Nov/11/weeknotes-8/#New_datasettecsvs_using_Python_37_on_Glitch_17">now runs Python 3.7.5</a> my biggest reason for supporting 3.5 was gone, so I decided to make the upgrade.</p> <p><a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-31">Datasette 0.31</a> was the first version to drop support. <a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-32">Datasette 0.32</a> is the first to take advantage of it: I <a href="https://github.com/simonw/datasette/issues/628">switched Datasette's template rendering</a> over to use Jinja's <a href="https://jinja.palletsprojects.com/en/2.10.x/api/#async-support">async template support</a>, which requires Python 3.6+.</p> <p>This has exciting implications for the <a href="https://datasette.readthedocs.io/en/stable/plugins.html#extra-template-vars-template-database-table-view-name-request-datasette">extra_template_vars plugin hook</a>, which allows plugins to add extra variables (and functions) to the template scope.</p> <p>Plugin authors can now add custom template functions that are defined with <code>async def ...</code> and make <code>await</code> calls within the body of the function. When the template is rendered, Jinja will automatically await those function calls.</p> <p>I released a new plugin that takes advantage of this capability: <a href="https://github.com/simonw/datasette-template-sql">datasette-template-sql</a>. It lets you embed additional SQL queries directly in a custom Datasette template. F…    
blog.db/entries http://simonwillison.net/2019/Nov/25/niche-museums/#atom-tag niche-museums.com, powered by Datasette 2019-11-25T22:27:46+00:00   1 <p>I just released a major upgrade to my <a href="https://www.niche-museums.com/">www.niche-museums.com</a> website (launched <a href="https://simonwillison.net/2019/Oct/28/niche-museums-kepler/">last month</a>).</p> <ul> <li>The site is now rendered server-side. The previous version used <a href="https://lit-html.polymer-project.org/">lit-html</a> to render content using JavaScript.</li> <li>Each museum now has its own page. Here's today's new museum listing for the <a href="https://www.niche-museums.com/browse/museums/46">Conservatory of Flowers</a> in San Francisco. These pages have a map on them.</li> <li>The site has an <a href="https://www.niche-museums.com/about">about page</a>.</li> <li>You can now link to the page for a specific latitude and longitude, e.g. <a href="https://www.niche-museums.com/?latitude=37.77&amp;longitude=-122.458">this location in Golden Gate Park</a>.</li> <li>The source code for the site is now <a href="https://github.com/simonw/museums">available on GitHub</a>.</li> </ul> <p>Notably, the site is entirely powered by <a href="https://github.com/simonw/datasette">Datasette</a>. It's a heavily customized Datasette instance, making extensive use of <a href="https://datasette.readthedocs.io/en/0.32/custom_templates.html#custom-templates">custom templates</a> and <a href="https://datasette.readthedocs.io/en/0.32/plugins.html">plugins</a>.</p> <p>It's a really fun experiment. I'm essentially using Datasette as a weird twist on a static site generator - no moving parts since the database is immutable but there's still stuff happening server-side to render the pages.</p> <h3 id="continuous-deployment">Continuous deployment</h3> <p>The site is entirely stateless and is published <a href="https://circleci.com/gh/simonw/museums">using Circle CI</a> to a serverless hosting provider (currently Zeit Now v1, but I'll probably move it to Google Cloud Run in the near future.)</p> <p>The site content - 46 museums and counting - lives in the <a href="https://github.com/simonw/museums/blob/master/museu…    
blog.db/entries http://simonwillison.net/2019/Nov/7/pinboard-to-sqlite/#atom-tag pinboard-to-sqlite 2019-11-07T20:46:38+00:00   1 <p><a href="https://github.com/jacobian/pinboard-to-sqlite">pinboard-to-sqlite</a></p> <p>Jacob Kaplan-Moss just released the second Dogsheep tool that wasn&#x27;t written by me (after goodreads-to-sqlite by Tobias Kunze) - this one imports your Pinterest bookmarks. The repo includes a really clean minimal example of how to use GitHub actions to run tests and release packages to PyPI.</p> <p>Via <a href="https://twitter.com/jacobian/status/1192500064163389442">@jacobian</a></p>    

Next page

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [search_index] (
   [type] TEXT,
   [key] TEXT,
   [title] TEXT,
   [timestamp] TEXT,
   [category] INTEGER,
   [is_public] INTEGER NOT NULL DEFAULT 0,
   [search_1] TEXT,
   [search_2] TEXT,
   [search_3] TEXT,
   PRIMARY KEY ([type], [key]),
   FOREIGN KEY(category) REFERENCES categories(id)
);
CREATE INDEX [idx_search_index_timestamp]
    ON [search_index] ([timestamp]);
CREATE INDEX [idx_search_index_category]
    ON [search_index] ([category]);
CREATE INDEX [idx_search_index_is_public]
    ON [search_index] ([is_public]);