datasette-sitemap
Generate sitemap.xml for Datasette sites
Installation
Install this plugin in the same environment as Datasette.
datasette install datasette-sitemap
Demo
This plugin is used for the sitemap on til.simonwillison.net.
Here's the configuration used for that sitemap.
Usage
Once configured, this plugin adds a sitemap at /sitemap.xml with a list of URLs.
This list is defined using a SQL query in metadata.json (or .yml) that looks like this:
{
    "plugins": {
        "datasette-sitemap": {
            "query": "select '/' || id as path from my_table"
        }
    }
}
Using metadata.yml allows for multi-line SQL queries, which can be easier to maintain:
plugins:
  datasette-sitemap:
    query: |
      select
        '/' || id as path
      from
        my_table
The SQL query must return a column called path. The values in this column must begin with a /. They will be used to generate a sitemap that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.com/1</loc></url>
<url><loc>https://example.com/2</loc></url>
</urlset>
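To make the mapping from query rows to XML concrete, here is a minimal Python sketch. It is not the plugin's own implementation: the function name render_sitemap and its arguments are assumptions made for illustration. It prefixes each path with a base URL, rejects any path that does not begin with a /, and escapes the result for XML.

```python
from xml.sax.saxutils import escape


def render_sitemap(base_url, paths):
    """Build a sitemap document from a base URL and a list of paths.

    Hypothetical helper for illustration only, not the plugin's own code.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in paths:
        # Mirror the documented rule that every path must begin with "/".
        if not path.startswith("/"):
            raise ValueError(f"path must begin with /: {path!r}")
        # Escape &, < and > so the URL is safe inside an XML element.
        lines.append(f"<url><loc>{escape(base_url + path)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


sitemap = render_sitemap("https://example.com", ["/1", "/2"])
print(sitemap)
```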
You can use UNION in your SQL query to combine results from multiple tables, or to add literal paths that you want to appear in the sitemap:
select
'/data/table1/' || id as path
from table1
union
select
'/data/table2/' || id as path
from table2
union
select
'/about' as path
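Before deploying a query like this, it can help to run it against a scratch database and check that it meets the plugin's two requirements. The sketch below uses Python's built-in sqlite3 module; the table definitions and rows are invented purely to exercise the query and are not part of the plugin.

```python
import sqlite3

# Hypothetical schema and rows, used only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table1 (id integer primary key);
    create table table2 (id integer primary key);
    insert into table1 (id) values (1), (2);
    insert into table2 (id) values (1);
    """
)

query = """
select '/data/table1/' || id as path from table1
union
select '/data/table2/' || id as path from table2
union
select '/about' as path
"""

cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
paths = [row[0] for row in cursor.fetchall()]

# The query must return a column called "path" whose values begin with "/".
assert "path" in columns
assert all(path.startswith("/") for path in paths)
print(paths)
```

Note that UNION removes duplicate rows, so a path produced by more than one branch of the query is returned only once.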
If your Datasette instance has multiple databases, you can choose which one to query using the database configuration property.
By default, the domain name for the generated URLs in the sitemap is detected from the incoming request. Set base_url to override this; the value should not include a trailing slash.
This example shows both of those settings, running the query against the content database and setting a custom base URL:
plugins:
  datasette-sitemap:
    query: |
      select '/plugins/' || name as path from plugins
      union
      select '/tools/' || name as path from tools
      union
      select '/news' as path
    database: content
    base_url: https://datasette.io
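The choice between a configured base_url and the request-derived default can be pictured with a short Python sketch. The helper resolve_base_url and its parameters are hypothetical and only illustrate the documented behaviour. Stripping a trailing slash is an assumption added here for safety; the plugin itself may not do this, which is why the setting should be written without one.

```python
def resolve_base_url(configured_base_url, request_scheme, request_host):
    """Return the URL prefix for sitemap entries (hypothetical helper)."""
    if configured_base_url:
        # Assumption: tolerate a trailing slash, since every path starts with "/".
        return configured_base_url.rstrip("/")
    # No override configured, so fall back to the incoming request.
    return f"{request_scheme}://{request_host}"


with_override = resolve_base_url("https://datasette.io", "http", "localhost:8001")
without_override = resolve_base_url(None, "http", "localhost:8001")
print(with_override + "/news")
print(without_override + "/news")
```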
robots.txt
This plugin adds a robots.txt file pointing to the sitemap:
Sitemap: http://example.com/sitemap.xml
You can take full control of the rest of the robots.txt file by installing and configuring the datasette-block-robots plugin.
This plugin will add the Sitemap: line even if you are using datasette-block-robots for the rest of your robots.txt file.
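One way to confirm that crawlers can discover the sitemap is to parse the robots.txt content with Python's standard urllib.robotparser module (site_maps() requires Python 3.8 or later). This sketch parses a hard-coded copy of the line shown above instead of fetching a live site, so the example.com URL is just sample data.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content matching the line this plugin adds.
robots_txt = "Sitemap: http://example.com/sitemap.xml\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```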
Adding paths to the sitemap from other plugins
This plugin adds a new plugin hook to Datasette called sitemap_extra_paths(), which other plugins can use to add their own paths to the sitemap.xml file.
The hook accepts these optional parameters:
- datasette: The current Datasette instance. You can use this to execute SQL queries or read plugin configuration settings.
- request: The Request object representing the incoming request to /sitemap.xml.
The hook should return a list of strings, each representing a path to be added to the sitemap. Each path must begin with a /.
It can also return an async def function, which will be awaited and used to generate the list of paths. Use this option if you need to make await calls inside your hook implementation.
This example uses the hook to add two extra paths, one of which came from a SQL query:
from datasette import hookimpl


@hookimpl
def sitemap_extra_paths(datasette):
    async def inner():
        db = datasette.get_database()
        path_from_db = (await db.execute("select '/example'")).single_value()
        return ["/about", path_from_db]

    return inner
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-sitemap
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest