til: django_filter-by-comma-separated-values.md
This data as json
| path | topic | title | url | body | html | shot | created | created_utc | updated | updated_utc | shot_hash | slug |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| django_filter-by-comma-separated-values.md | django | Filter by comma-separated values in the Django admin | https://github.com/simonw/til/blob/main/django/filter-by-comma-separated-values.md | I have a text column which contains comma-separated values - inherited from an older database schema. I should refactor this into a many-to-many field (or maybe even a PostgreSQL array field), but I haven't done that yet. And I wanted to be able to filter by those values in the Django admin. Since I'm using PostgreSQL, I decided to figure out how to do this using the PostgreSQL `regexp_split_to_array()` function. There are two necessary SQL queries here: one to figure out all of the unique distinct values that are represented across all of those comma-separated lists, and one to filter for rows that include a specific value. Here's what I came up with for the first: ```sql select distinct unnest( regexp_split_to_array(my_column, ',\s*') ) from my_table ``` This uses `unnest()`, see [this TIL](https://til.simonwillison.net/postgresql/unnest-csv). For filtering down to rows that contain a specific value in their comma-separated list, I figured out this: ```sql select * from my_table where array_position( regexp_split_to_array( my_column, ',\s*' ), 'MyValue' ) is not null ``` That second one, translated into the Django ORM, looks like this: ```python from django.contrib.postgres.fields import ArrayField from django.db.models import F, IntegerField, TextField, Value from django.db.models.expressions import Func queryset.annotate( value_array_position=Func( Func( F(my_column), Value(",\\s*"), function="regexp_split_to_array", output_field=ArrayField(TextField()), ), Value(my_value), function="array_position", output_field=IntegerField() ) ).filter(value_array_position__isnull=False) ``` I didn't bother figuring out the ORM equivalent of that first `unnest()` SQL. Here's the reusable admin filter factory I came up with using these: ```python from django.contrib.admin import SimpleListFilter from django.contrib.postgres.fields import ArrayField from django.db import connection from django.db.models import F, TextField, Value from django.db.models.expressions import Func def make_csv_filter(filter_title, filter_parameter_name, table, column): class CommaSeparatedValuesFilter(SimpleListFilter): title = filter_title parameter_name = filter_parameter_name def lookups(self, request, model_admin): sql = """ select distinct unnest( regexp_split_to_array({}, ',\\s*') ) from {} """.format( column, table ) with connection.cursor() as cursor: cursor.execute(sql) values = [r[0] for r in cursor.fetchall() if r[0]] return zip(values, values) def queryset(self, request, queryset): value = self.value() if not value: return queryset else: return queryset.annotate( value_array_position=Func( Func( F(column), Value(",\\s*"), function="regexp_split_to_array", output_field=ArrayField(TextField()), ), Value(value), function="array_position", output_field=IntegerField() ) ).filter(value_array_position__isnull=False) return CommaSeparatedValuesFilter ``` Then you use it in a `ModelAdmin` subclass like this: ```python @admin.register(Reporter) class ReporterAdmin(admin.ModelAdmin): list_filter = ( make_csv_filter( filter_title="Roles", filter_parameter_name="role", table="reporter", column="role_names", ), ) ``` | <p>I have a text column which contains comma-separated values - inherited from an older database schema.</p> <p>I should refactor this into a many-to-many field (or maybe even a PostgreSQL array field), but I haven't done that yet. And I wanted to be able to filter by those values in the Django admin.</p> <p>Since I'm using PostgreSQL, I decided to figure out how to do this using the PostgreSQL <code>regexp_split_to_array()</code> function.</p> <p>There are two necessary SQL queries here: one to figure out all of the unique distinct values that are represented across all of those comma-separated lists, and one to filter for rows that include a specific value.</p> <p>Here's what I came up with for the first:</p> <div class="highlight highlight-source-sql"><pre><span class="pl-k">select distinct</span> unnest( regexp_split_to_array(my_column, <span class="pl-s"><span class="pl-pds">'</span>,<span class="pl-cce">\s</span>*<span class="pl-pds">'</span></span>) ) <span class="pl-k">from</span> my_table</pre></div> <p>This uses <code>unnest()</code>, see <a href="https://til.simonwillison.net/postgresql/unnest-csv" rel="nofollow">this TIL</a>.</p> <p>For filtering down to rows that contain a specific value in their comma-separated list, I figured out this:</p> <div class="highlight highlight-source-sql"><pre><span class="pl-k">select</span> <span class="pl-k">*</span> <span class="pl-k">from</span> my_table <span class="pl-k">where</span> array_position( regexp_split_to_array( my_column, <span class="pl-s"><span class="pl-pds">'</span>,<span class="pl-cce">\s</span>*<span class="pl-pds">'</span></span> ), <span class="pl-s"><span class="pl-pds">'</span>MyValue<span class="pl-pds">'</span></span> ) <span class="pl-k">is not null</span></pre></div> <p>That second one, translated into the Django ORM, looks like this:</p> <div class="highlight highlight-source-python"><pre><span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">contrib</span>.<span class="pl-s1">postgres</span>.<span class="pl-s1">fields</span> <span class="pl-k">import</span> <span class="pl-v">ArrayField</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">db</span>.<span class="pl-s1">models</span> <span class="pl-k">import</span> <span class="pl-v">F</span>, <span class="pl-v">IntegerField</span>, <span class="pl-v">TextField</span>, <span class="pl-v">Value</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">db</span>.<span class="pl-s1">models</span>.<span class="pl-s1">expressions</span> <span class="pl-k">import</span> <span class="pl-v">Func</span> <span class="pl-s1">queryset</span>.<span class="pl-en">annotate</span>( <span class="pl-s1">value_array_position</span><span class="pl-c1">=</span><span class="pl-v">Func</span>( <span class="pl-v">Func</span>( <span class="pl-v">F</span>(<span class="pl-s1">my_column</span>), <span class="pl-v">Value</span>(<span class="pl-s">",<span class="pl-cce">\\</span>s*"</span>), <span class="pl-s1">function</span><span class="pl-c1">=</span><span class="pl-s">"regexp_split_to_array"</span>, <span class="pl-s1">output_field</span><span class="pl-c1">=</span><span class="pl-v">ArrayField</span>(<span class="pl-v">TextField</span>()), ), <span class="pl-v">Value</span>(<span class="pl-s1">my_value</span>), <span class="pl-s1">function</span><span class="pl-c1">=</span><span class="pl-s">"array_position"</span>, <span class="pl-s1">output_field</span><span class="pl-c1">=</span><span class="pl-v">IntegerField</span>() ) ).<span class="pl-en">filter</span>(<span class="pl-s1">value_array_position__isnull</span><span class="pl-c1">=</span><span class="pl-c1">False</span>)</pre></div> <p>I didn't bother figuring out the ORM equivalent of that first <code>unnest()</code> SQL.</p> <p>Here's the reusable admin filter factory I came up with using these:</p> <div class="highlight highlight-source-python"><pre><span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">contrib</span>.<span class="pl-s1">admin</span> <span class="pl-k">import</span> <span class="pl-v">SimpleListFilter</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">contrib</span>.<span class="pl-s1">postgres</span>.<span class="pl-s1">fields</span> <span class="pl-k">import</span> <span class="pl-v">ArrayField</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">db</span> <span class="pl-k">import</span> <span class="pl-s1">connection</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">db</span>.<span class="pl-s1">models</span> <span class="pl-k">import</span> <span class="pl-v">F</span>, <span class="pl-v">TextField</span>, <span class="pl-v">Value</span> <span class="pl-k">from</span> <span class="pl-s1">django</span>.<span class="pl-s1">db</span>.<span class="pl-s1">models</span>.<span class="pl-s1">expressions</span> <span class="pl-k">import</span> <span class="pl-v">Func</span> <span class="pl-k">def</span> <span class="pl-en">make_csv_filter</span>(<span class="pl-s1">filter_title</span>, <span class="pl-s1">filter_parameter_name</span>, <span class="pl-s1">table</span>, <span class="pl-s1">column</span>): <span class="pl-k">class</span> <span class="pl-v">CommaSeparatedValuesFilter</span>(<span class="pl-v">SimpleListFilter</span>): <span class="pl-s1">title</span> <span class="pl-c1">=</span> <span class="pl-s1">filter_title</span> <span class="pl-s1">parameter_name</span> <span class="pl-c1">=</span> <span class="pl-s1">filter_parameter_name</span> <span class="pl-k">def</span> <span class="pl-en">lookups</span>(<span class="pl-s1">self</span>, <span class="pl-s1">request</span>, <span class="pl-s1">model_admin</span>): <span class="pl-s1">sql</span> <span class="pl-c1">=</span> <span class="pl-s">"""</span> <span class="pl-s"> select distinct unnest(</span> <span class="pl-s"> regexp_split_to_array({}, ',<span class="pl-cce">\\</span>s*')</span> <span class="pl-s"> ) from {}</span> <span class="pl-s"> """</span>.<span class="pl-en">format</span>( <span class="pl-s1">column</span>, <span class="pl-s1">table</span> ) <span class="pl-k">with</span> <span class="pl-s1">connection</span>.<span class="pl-en">cursor</span>() <span class="pl-k">as</span> <span class="pl-s1">cursor</span>: <span class="pl-s1">cursor</span>.<span class="pl-en">execute</span>(<span class="pl-s1">sql</span>) <span class="pl-s1">values</span> <span class="pl-c1">=</span> [<span class="pl-s1">r</span>[<span class="pl-c1">0</span>] <span class="pl-k">for</span> <span class="pl-s1">r</span> <span class="pl-c1">in</span> <span class="pl-s1">cursor</span>.<span class="pl-en">fetchall</span>() <span class="pl-k">if</span> <span class="pl-s1">r</span>[<span class="pl-c1">0</span>]] <span class="pl-k">return</span> <span class="pl-en">zip</span>(<span class="pl-s1">values</span>, <span class="pl-s1">values</span>) <span class="pl-k">def</span> <span class="pl-en">queryset</span>(<span class="pl-s1">self</span>, <span class="pl-s1">request</span>, <span class="pl-s1">queryset</span>): <span class="pl-s1">value</span> <span class="pl-c1">=</span> <span class="pl-s1">self</span>.<span class="pl-en">value</span>() <span class="pl-k">if</span> <span class="pl-c1">not</span> <span class="pl-s1">value</span>: <span class="pl-k">return</span> <span class="pl-s1">queryset</span> <span class="pl-k">else</span>: <span class="pl-k">return</span> <span class="pl-s1">queryset</span>.<span class="pl-en">annotate</span>( <span class="pl-s1">value_array_position</span><span class="pl-c1">=</span><span class="pl-v">Func</span>( <span class="pl-v">Func</span>( <span class="pl-v">F</span>(<span class="pl-s1">column</span>), <span class="pl-v">Value</span>(<span class="pl-s">",<span class="pl-cce">\\</span>s*"</span>), <span class="pl-s1">function</span><span class="pl-c1">=</span><span class="pl-s">"regexp_split_to_array"</span>, <span class="pl-s1">output_field</span><span class="pl-c1">=</span><span class="pl-v">ArrayField</span>(<span class="pl-v">TextField</span>()), ), <span class="pl-v">Value</span>(<span class="pl-s1">value</span>), <span class="pl-s1">function</span><span class="pl-c1">=</span><span class="pl-s">"array_position"</span>, <span class="pl-s1">output_field</span><span class="pl-c1">=</span><span class="pl-v">IntegerField</span>() ) ).<span class="pl-en">filter</span>(<span class="pl-s1">value_array_position__isnull</span><span class="pl-c1">=</span><span class="pl-c1">False</span>) <span class="pl-k">return</span> <span class="pl-v">CommaSeparatedValuesFilter</span></pre></div> <p>Then you use it in a <code>ModelAdmin</code> subclass like this:</p> <div class="highlight highlight-source-python"><pre><span class="pl-en">@<span class="pl-s1">admin</span>.<span class="pl-en">register</span>(<span class="pl-v">Reporter</span>)</span> <span class="pl-k">class</span> <span class="pl-v">ReporterAdmin</span>(<span class="pl-s1">admin</span>.<span class="pl-v">ModelAdmin</span>): <span class="pl-s1">list_filter</span> <span class="pl-c1">=</span> ( <span class="pl-en">make_csv_filter</span>( <span class="pl-s1">filter_title</span><span class="pl-c1">=</span><span class="pl-s">"Roles"</span>, <span class="pl-s1">filter_parameter_name</span><span class="pl-c1">=</span><span class="pl-s">"role"</span>, <span class="pl-s1">table</span><span class="pl-c1">=</span><span class="pl-s">"reporter"</span>, <span class="pl-s1">column</span><span class="pl-c1">=</span><span class="pl-s">"role_names"</span>, ), )</pre></div> | <Binary: 69,259 bytes> | 2021-04-21T09:31:55-07:00 | 2021-04-21T16:31:55+00:00 | 2021-04-28T16:23:10-07:00 | 2021-04-28T23:23:10+00:00 | b908bf07608a7730778650f861b6fb75 | filter-by-comma-separated-values |