til: django_efficient-bulk-deletions-in-django.md
This data as json
| path | topic | title | url | body | html | shot | created | created_utc | updated | updated_utc | shot_hash | slug |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| django_efficient-bulk-deletions-in-django.md | django | Efficient bulk deletions in Django | https://github.com/simonw/til/blob/main/django/efficient-bulk-deletions-in-django.md | I needed to bulk-delete a large number of objects today. Django deletions are relatively inefficient by default, because Django implements its own version of cascading deletions and fires signals for each deleted object. I knew that I wanted to avoid both of these and run a bulk `DELETE` SQL operation. Django has an undocumented `queryset._raw_delete(db_connection)` method that can do this: ```python reports_qs = Report.objects.filter(public_id__in=report_ids) reports_qs._raw_delete(reports_qs.db) ``` But this failed for me, because my `Report` object has a many-to-many relationship with another table - and those records were not deleted. I could have hand-crafted a PostgreSQL cascading delete here, but I instead decided to manually delete those many-to-many records first. Here's what that looked like: ```python report_availability_tag_qs = ( Report.availability_tags.through.objects.filter( report__public_id__in=report_ids ) ) report_availability_tag_qs._raw_delete(report_availability_tag_qs.db) ``` This didn't quite work either, because I have another model `Location` with foreign key references to those reports. So I added this: ```python Location.objects.filter(latest_report__public_id__in=report_ids).update( latest_report=None ) ``` That combination worked! The Django debug toolbar confirmed that this executed one `UPDATE` followed by two efficient bulk `DELETE` operations. | <p>I needed to bulk-delete a large number of objects today. Django deletions are relatively inefficient by default, because Django implements its own version of cascading deletions and fires signals for each deleted object.</p> <p>I knew that I wanted to avoid both of these and run a bulk <code>DELETE</code> SQL operation.</p> <p>Django has an undocumented <code>queryset._raw_delete(db_connection)</code> method that can do this:</p> <div class="highlight highlight-source-python"><pre><span class="pl-s1">reports_qs</span> <span class="pl-c1">=</span> <span class="pl-v">Report</span>.<span class="pl-s1">objects</span>.<span class="pl-en">filter</span>(<span class="pl-s1">public_id__in</span><span class="pl-c1">=</span><span class="pl-s1">report_ids</span>) <span class="pl-s1">reports_qs</span>.<span class="pl-en">_raw_delete</span>(<span class="pl-s1">reports_qs</span>.<span class="pl-s1">db</span>)</pre></div> <p>But this failed for me, because my <code>Report</code> object has a many-to-many relationship with another table - and those records were not deleted.</p> <p>I could have hand-crafted a PostgreSQL cascading delete here, but I instead decided to manually delete those many-to-many records first. Here's what that looked like:</p> <div class="highlight highlight-source-python"><pre><span class="pl-s1">report_availability_tag_qs</span> <span class="pl-c1">=</span> ( <span class="pl-v">Report</span>.<span class="pl-s1">availability_tags</span>.<span class="pl-s1">through</span>.<span class="pl-s1">objects</span>.<span class="pl-en">filter</span>( <span class="pl-s1">report__public_id__in</span><span class="pl-c1">=</span><span class="pl-s1">report_ids</span> ) ) <span class="pl-s1">report_availability_tag_qs</span>.<span class="pl-en">_raw_delete</span>(<span class="pl-s1">report_availability_tag_qs</span>.<span class="pl-s1">db</span>)</pre></div> <p>This didn't quite work either, because I have another model <code>Location</code> with foreign key references to those reports. So I added this:</p> <div class="highlight highlight-source-python"><pre><span class="pl-v">Location</span>.<span class="pl-s1">objects</span>.<span class="pl-en">filter</span>(<span class="pl-s1">latest_report__public_id__in</span><span class="pl-c1">=</span><span class="pl-s1">report_ids</span>).<span class="pl-en">update</span>( <span class="pl-s1">latest_report</span><span class="pl-c1">=</span><span class="pl-c1">None</span> )</pre></div> <p>That combination worked! The Django debug toolbar confirmed that this executed one <code>UPDATE</code> followed by two efficient bulk <code>DELETE</code> operations.</p> | <Binary: 74,244 bytes> | 2021-04-09T10:58:37-07:00 | 2021-04-09T17:58:37+00:00 | 2022-03-20T21:56:38-07:00 | 2022-03-21T04:56:38+00:00 | 0b7d65f4eb063315a8e8369790c1f432 | efficient-bulk-deletions-in-django |