# Adding a robots.txt using Cloudflare workers
I got an unexpected traffic spike to https://russian-ira-facebook-ads.datasettes.com/ - which runs on Cloud Run - and decided to use `robots.txt` to block crawlers.

Re-deploying that instance was a little hard because I didn't have a clean, repeatable deployment script in place for it (it's an older project), so I decided to try using Cloudflare Workers for this instead.

DNS was already running through Cloudflare, so switching it to "proxy" mode to enable Cloudflare caching and Workers could be done in the Cloudflare control panel.

![Having turned on the Proxied toggle in the Cloudflare control panel](https://user-images.githubusercontent.com/9599/147008621-6f87de32-4f6d-4d6b-a685-542fd21da7aa.png)

I navigated to the "Workers" section of the Cloudflare dashboard and clicked "Create a Service", then used their "Introduction (HTTP handler)" starting template. I modified it to look like this and saved it as `block-all-robots`:

```javascript
addEventListener("fetch", (event) => {
  event.respondWith(
    handleRequest(event.request).catch(
      (err) => new Response(err.stack, { status: 500 })
    )
  );
});

async function handleRequest(request) {
  const { pathname } = new URL(request.url);
  if (pathname == "/robots.txt") {
    return new Response("User-agent: *\nDisallow: /", {
      headers: { "Content-Type": "text/plain" },
    });
  }
}
```

After deploying it, https://block-all-robots.simonw.workers.dev/robots.txt started serving my new `robots.txt` file:

```
User-agent: *
Disallow: /
```

Then in the Cloudflare dashboard for `datasettes.com` I found the "Workers" section (not to be confused with the account-level "Workers" section where you create and edit workers). I clicked "Add route" and used the following settings:

![Screenshot of the Add Route dialog](https://user-images.githubusercontent.com/9599/147009015-222346ab-aa0f-403f-acdf-ca9788f525e6.png)

- Route: `russian-ira-facebook-ads.datasettes.com/robots.txt`
- Service: `block-all-robots`
- Environment: `production`

I clicked "Save" and https://russian-ira-facebook-ads.datasettes.com/robots.txt instantly started serving the new file.
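The same route setup can also be managed from the command line with Wrangler rather than the dashboard. A hedged sketch of a `wrangler.toml` that names the worker and attaches the route - the `main` filename, `zone_name` and `compatibility_date` values here are my assumptions, not taken from the original setup:

```toml
name = "block-all-robots"
main = "index.js"
compatibility_date = "2021-12-21"

# Attach the worker to a single path on the proxied hostname.
routes = [
  { pattern = "russian-ira-facebook-ads.datasettes.com/robots.txt", zone_name = "datasettes.com" }
]
```

Deploying with Wrangler then publishes the worker and registers the route in one step, instead of the two dashboard visits described above.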
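One detail worth noting about the handler: for any path other than `/robots.txt`, `handleRequest()` returns `undefined`, which would make `event.respondWith()` fail. That is safe here only because the Cloudflare route matches `/robots.txt` and nothing else. A sketch of the same logic with an explicit fallback (the 404 response is my own addition, not part of the original worker) - the routing function is plain enough to run outside the Workers runtime, since Node 18+ also provides `URL`, `Request` and `Response`:

```javascript
// Same routing logic as the deployed worker, plus an explicit fallback.
// Runnable in Node 18+ as well as in the Cloudflare Workers runtime.
async function handleRequest(request) {
  const { pathname } = new URL(request.url);
  if (pathname === "/robots.txt") {
    // Block all crawlers from all paths.
    return new Response("User-agent: *\nDisallow: /", {
      headers: { "Content-Type": "text/plain" },
    });
  }
  // Explicit fallback: the original returns undefined here, which only
  // works because its route matches /robots.txt alone.
  return new Response("Not found", { status: 404 });
}
```

With this fallback in place the worker could be mounted on a broader route, such as the whole hostname, without erroring on other paths.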
Created 2021-12-21