In order to provide a broad, configurable search and filtering interface across areas of our site, we have implemented a custom StreamField block, FilterableList, that allows a user to specify what filters are available, how to order results, and which pages should be included in the search.
How It Works
The path by which a page gets a filterable form is not necessarily straight or simple, but it is important to understand. To start, the page must support the FilterableList block within a StreamField, as mentioned earlier, but from there the path diverges. To use FilterableList, the page must inherit from one of two mixin classes: FilterableListMixin or CategoryFilterableMixin.
The more common mixin for pages to extend is FilterableListMixin. This class defines several important methods, such as get_form_class, which defines the form to use, along with methods that retrieve the information the form needs, such as get_filterable_queryset. The bulk of the work is done in the get_context method, which builds and populates the form, processes it, and returns the results to the user.
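To show how these three methods fit together, here is a minimal, framework-free sketch. It assumes plain-Python stand-ins: SimpleFilterForm, ExamplePage, and the list-of-dicts pages are hypothetical, and on cf.gov these are Wagtail/Django objects whose get_context receives a request rather than a dict of form data.

```python
# Framework-free sketch; SimpleFilterForm and ExamplePage are hypothetical
# stand-ins for the real Django form and Wagtail page classes.


class SimpleFilterForm:
    """Stand-in for a form class such as FilterableListForm."""

    def __init__(self, data, filterable_pages):
        self.data = data
        self.filterable_pages = filterable_pages

    def is_valid(self):
        # Assumption: the form is valid when the title query is a string.
        return isinstance(self.data.get("title", ""), str)

    def get_page_set(self):
        # Case-insensitive substring match on the title (illustration only).
        query = self.data.get("title", "").lower()
        return [p for p in self.filterable_pages if query in p["title"].lower()]


class FilterableListMixin:
    def get_form_class(self):
        # Which form class the filterable list uses.
        return SimpleFilterForm

    def get_filterable_queryset(self):
        # Which pages the form may search; here, a plain list attribute.
        return self.pages

    def get_context(self, request_data):
        # Build and populate the form, process it, and return the results.
        # (The real method receives a request object, not a dict.)
        form = self.get_form_class()(request_data, self.get_filterable_queryset())
        results = form.get_page_set() if form.is_valid() else []
        return {"form": form, "results": results}


class ExamplePage(FilterableListMixin):
    def __init__(self, pages):
        self.pages = pages
```

For example, ExamplePage([{"title": "Rate hike"}, {"title": "Budget report"}]).get_context({"title": "rate"}) yields only the "Rate hike" page in its results.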
CategoryFilterableMixin extends the base FilterableListMixin with some additional functionality. It changes how get_filterable_queryset operates: it still gets an initial list of pages, but limits them to pages assigned a category within a set of initial categories, defined by the filterable_categories variable on a given page model. We can see this in action on both Newsroom (NewsroomLandingPage) and Recent Updates (…).
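The category restriction can be sketched as an override that calls the parent method and then filters its output. This is a simplified model, assuming pages are plain dicts with a "categories" list; ExampleNewsroomPage and the category slugs are hypothetical.

```python
# Simplified model of the category restriction; pages are plain dicts and
# the category slugs below are invented for illustration.


class FilterableListMixin:
    def get_filterable_queryset(self):
        return list(self.pages)


class CategoryFilterableMixin(FilterableListMixin):
    # Each page model sets this; the empty default allows no categories.
    filterable_categories = ()

    def get_filterable_queryset(self):
        pages = super().get_filterable_queryset()
        allowed = set(self.filterable_categories)
        # Keep only pages assigned at least one allowed category.
        return [p for p in pages if allowed.intersection(p["categories"])]


class ExampleNewsroomPage(CategoryFilterableMixin):
    filterable_categories = ("press-release", "speech")  # hypothetical slugs

    def __init__(self, pages):
        self.pages = pages
```

Because the override builds on super(), any restriction the base mixin applies is preserved, and the category check only narrows the result further.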
As of our initial release of Elasticsearch-backed filterable lists in March 2021, there are three filterable forms: FilterableListForm, EnforcementActionsFilterForm, and EventArchiveFilterForm. The majority of our filterable lists rely on FilterableListForm; the other two are each used by a single page.
FilterableListForm is the base form used by the vast majority of filterable lists on cf.gov. It defines the core fields that are visible on the form, as well as functions that help set initial data and sanitize form input. Most importantly, FilterableListForm defines the get_page_set function, which is responsible for invoking a search query. The logic for passing categories into the search object comes from the previously mentioned CategoryFilterableMixin, which modifies the initial search parameters to enforce a category search if and only if the filterable_categories list is passed into the form when it is initialized.
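The "if and only if" rule for categories can be sketched as a conditional when assembling the search parameters. SearchStub and the parameter names are hypothetical; the real get_page_set hands the parameters to an Elasticsearch search class rather than a recording stub.

```python
# Hypothetical sketch: SearchStub and the parameter names are invented here.


class SearchStub:
    """Records the parameters a search object would receive."""

    def __init__(self, **params):
        self.params = params


class FilterableListForm:
    def __init__(self, data, filterable_categories=None):
        self.cleaned_data = data
        self.filterable_categories = filterable_categories

    def get_page_set(self):
        params = {"title": self.cleaned_data.get("title", "")}
        # Enforce a category search only when categories were passed in.
        if self.filterable_categories is not None:
            params["categories"] = list(self.filterable_categories)
        return SearchStub(**params)
```

Checking for None rather than truthiness distinguishes "no categories were passed" from "an empty category list was passed", which is one reading of the rule above.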
EnforcementActionsFilterForm extends FilterableListForm, adding two fields specific to Enforcement Actions and using a refined search class that supports searching against those new fields and ordering by initial filing date.
EventArchiveFilterForm is another extension of FilterableListForm. Its only real modification is invoking an event-specific search class, which filters on event dates rather than page publication dates.
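Both subclasses follow the same pattern: inherit the base form and swap in a different search class, optionally adding fields. The sketch below assumes the search class is selected through a class attribute named search_class; that attribute, the stub search classes, and all field names are hypothetical.

```python
# Hypothetical sketch: search_class, the stub search classes, and the field
# names are assumptions used to illustrate the subclassing pattern.


class FilterablePagesSearchStub:
    """Stand-in for the base search class."""


class EventSearchStub(FilterablePagesSearchStub):
    """Stand-in: filters on event start/end dates, not publication dates."""


class EnforcementSearchStub(FilterablePagesSearchStub):
    """Stand-in: adds Enforcement Action filters and filing-date ordering."""


class FilterableListForm:
    search_class = FilterablePagesSearchStub
    field_names = ("title", "from_date", "to_date", "categories", "topics")

    def get_search(self):
        return self.search_class()


class EnforcementActionsFilterForm(FilterableListForm):
    # Two extra Enforcement Action fields (names are hypothetical).
    field_names = FilterableListForm.field_names + ("statuses", "products")
    search_class = EnforcementSearchStub


class EventArchiveFilterForm(FilterableListForm):
    # Same fields as the base form; only the search class changes.
    search_class = EventSearchStub
```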
There is currently only one type of document defined, FilterablePagesDocument, which is based on the AbstractFilterPage class. This document is responsible for housing data for any of the filterable page types that extend AbstractFilterPage, such as …, NewsroomPage, to name a few. To index fields that are specific to a page type, such as the status list for an Enforcement Action, use the prepare_field function syntax (a prepare_<field_name> method on the document). We use get_instances_from_related so that our index is updated automatically when a specific indexed page changes, rather than only when the AbstractFilterPage record it is related to in the database changes.
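The prepare_<field_name> convention can be illustrated with a small, framework-free model: when building the indexed data, a per-field prepare method is used if one exists, otherwise the attribute is copied from the page. MiniDocument, FilterablePagesDocumentSketch, Page, and the field names are hypothetical; the real document is a django-elasticsearch-dsl Document that also declares typed fields and an index.

```python
# Framework-free model of the prepare_<field_name> convention. All class and
# field names here are hypothetical stand-ins for the real document.


class MiniDocument:
    fields = ()

    def prepare(self, instance):
        data = {}
        for name in self.fields:
            # Prefer a prepare_<name> method; otherwise copy the attribute.
            preparer = getattr(self, f"prepare_{name}", None)
            if preparer is not None:
                data[name] = preparer(instance)
            else:
                data[name] = getattr(instance, name, None)
        return data


class FilterablePagesDocumentSketch(MiniDocument):
    fields = ("title", "statuses")

    def prepare_statuses(self, instance):
        # Only Enforcement Action pages carry statuses; other pages get [].
        return list(getattr(instance, "statuses", []))


class Page:
    def __init__(self, title, **extra):
        self.title = title
        self.__dict__.update(extra)
```

A page type without a statuses attribute still produces a consistent document shape, which keeps a single document class usable across all filterable page types.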
Search is the final piece of the puzzle: this is where we actually use Elasticsearch to filter and match documents and return them as an ordered QuerySet. Before breaking down the search classes, it's important to discuss the current implementation from an Elasticsearch perspective, to understand how we gather results.
The expanded search for filterable lists uses a multi-match query across the title, topic name, preview description, and content fields of all FilterablePagesDocuments. We use the phrase_prefix match type with a currently configured slop of 2, to allow somewhat looser matching. We also boost matches on the title and topic name fields, indicated by ^10 in the code base, to improve ordering by relevance when it is selected. Search currently supports two ways of ordering results: relevance and date published. Relevance is the score Elasticsearch calculates when returning results; date published orders results by page publication date. Enforcement Actions define their own ordering logic, based on initial filing date.
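Expressed as a raw Elasticsearch request body, the query described above looks roughly like the following sketch. The field names (title, topic_names, preview_description, content, date_published) and the descending date order are assumptions; the multi_match options (type, slop, ^ boosts) and the "_score" sort are standard Elasticsearch query DSL.

```python
# Sketch of the request body; field names and the descending date order are
# assumptions, not necessarily what cf.gov indexes or sends.


def build_query(text, order_by="relevance"):
    body = {
        "query": {
            "multi_match": {
                "query": text,
                "type": "phrase_prefix",
                "slop": 2,  # allow up to two positions of looseness
                # ^10 boosts title and topic matches for relevance ordering.
                "fields": [
                    "title^10",
                    "topic_names^10",
                    "preview_description",
                    "content",
                ],
            }
        }
    }
    if order_by == "relevance":
        body["sort"] = ["_score"]
    else:
        # Assumed: newest publication date first.
        body["sort"] = [{"date_published": {"order": "desc"}}]
    return body
```

The boost only affects the score, so it matters for relevance ordering and has no effect when results are sorted by date.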
FilterablePagesDocumentSearch is the core search class used for most of our searching. It is invoked from FilterableListForm. This class defines the common structure of our search function, the base logic for filtering against all common fields, and the logic behind our multi-match and ordering steps. The core function called from outside the class is search, which chains all of our filter, match, and sorting logic and returns the results as a Django QuerySet.
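The chaining inside search can be sketched as a class that runs each filter step in turn and then combines the match and sort steps into one request body. This sketch builds a plain dict instead of querying Elasticsearch and returning a QuerySet; filter_topics, build_match, the constructor arguments, and the field names are hypothetical, and treating apply_specific_filters as an empty hook on the base class is an assumption.

```python
# Hypothetical sketch: builds a request body dict (no network calls).
# filter_topics, build_match, and all field names are invented here.


class FilterablePagesDocumentSearchSketch:
    date_field = "date_published"

    def __init__(self, title="", topics=(), from_date=None, to_date=None,
                 order_by="relevance"):
        self.title = title
        self.topics = list(topics)
        self.from_date = from_date
        self.to_date = to_date
        self.order_by = order_by
        self.filters = []

    def filter_topics(self):
        if self.topics:
            self.filters.append({"terms": {"topic_names": self.topics}})

    def filter_date(self):
        bounds = {}
        if self.from_date:
            bounds["gte"] = self.from_date
        if self.to_date:
            bounds["lte"] = self.to_date
        if bounds:
            self.filters.append({"range": {self.date_field: bounds}})

    def apply_specific_filters(self):
        """Hook for subclasses; assumed to add nothing in the base class."""

    def build_match(self):
        if not self.title:
            return {"match_all": {}}
        return {"multi_match": {
            "query": self.title, "type": "phrase_prefix", "slop": 2,
            "fields": ["title^10", "topic_names^10",
                       "preview_description", "content"],
        }}

    def order_results(self):
        if self.order_by == "relevance":
            return ["_score"]
        return [{self.date_field: {"order": "desc"}}]

    def search(self):
        # Chain the filter, match, and sort steps into one request body.
        self.filters = []
        self.filter_topics()
        self.filter_date()
        self.apply_specific_filters()
        return {
            "query": {"bool": {"must": self.build_match(), "filter": self.filters}},
            "sort": self.order_results(),
        }
```

Keeping each step in its own method is what lets subclasses override a single step, such as the date filter or the ordering, without rewriting search.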
EventFilterablePagesDocumentSearch extends FilterablePagesDocumentSearch with behavior specific to our future and past Events listings. It overrides one method from its parent, filter_date, so that filtering uses fields specific to events: the start and end dates of an event.
The Enforcement Actions search class is also an extension of FilterablePagesDocumentSearch. It exposes some additional filter logic through the apply_specific_filters function, and it overrides order_results to sort on an Enforcement Action-specific field, initial filing date.
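The two subclasses can be sketched as single-method overrides on a minimal base. All class and field names here are hypothetical, and the event date logic (an event matches if it ends on or after the start of the range and starts on or before its end) is an assumed interpretation of "filter on the start and end date of an event", not the confirmed cf.gov behavior.

```python
# Hypothetical sketch of the subclass overrides; names and the event
# date-overlap rule are assumptions made for illustration.


class BaseSearchSketch:
    def __init__(self, from_date=None, to_date=None, statuses=()):
        self.from_date = from_date
        self.to_date = to_date
        self.statuses = list(statuses)
        self.filters = []

    def filter_date(self):
        bounds = {k: v for k, v in (("gte", self.from_date), ("lte", self.to_date)) if v}
        if bounds:
            self.filters.append({"range": {"date_published": bounds}})

    def apply_specific_filters(self):
        pass  # base class adds no extra filters

    def order_results(self):
        return [{"date_published": {"order": "desc"}}]

    def search(self):
        self.filters = []
        self.filter_date()
        self.apply_specific_filters()
        return {"filter": self.filters, "sort": self.order_results()}


class EventSearchSketch(BaseSearchSketch):
    def filter_date(self):
        # Assumed: event still running at from_date, starting by to_date.
        if self.from_date:
            self.filters.append({"range": {"end_dt": {"gte": self.from_date}}})
        if self.to_date:
            self.filters.append({"range": {"start_dt": {"lte": self.to_date}}})


class EnforcementSearchSketch(BaseSearchSketch):
    def apply_specific_filters(self):
        if self.statuses:
            self.filters.append({"terms": {"statuses": self.statuses}})

    def order_results(self):
        return [{"initial_filing_date": {"order": "desc"}}]
```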