avatarTom Nijhof

Summary

The article discusses Supabase's advanced full-text search capabilities, including the use of eq, like, ilike, and textSearch functions, and explains how to index for improved search performance using textSearch.

Abstract

The article delves into the functionalities provided by Supabase for conducting advanced full-text searches within a database, leveraging SQL and PostgreSQL features. It compares different search methods such as exact matching with eq, pattern matching with like and ilike, and the more sophisticated textSearch which utilizes PostgreSQL's tsvector for optimized searching. The author illustrates these methods with examples from a database of coffees and teas, searching for items containing "rooibos." The article also provides guidance on creating indexes to enhance search efficiency and performance, demonstrating the use of EXPLAIN to compare query costs. The conclusion emphasizes the power of Supabase's tools, particularly the textSearch feature with indexing, for executing effective and efficient full-text searches.

Opinions

  • The author suggests that eq is best for exact matches and is highly sensitive to errors, often used for IDs.
  • like and ilike are considered useful for pattern matching, with ilike offering case-insensitive searches.
  • The author expresses that on-the-fly transformation for textSearch is not recommended due to high query costs, but this can be mitigated with proper indexing.
  • The use of wildcards in textSearch is noted to be similar to like and ilike, but with potentially higher performance when indexed.
  • The author advocates for the creation of a tsvector column and a GIN index to significantly improve the efficiency of full-text searches.
  • The article conveys that while indexing increases the cost of inserting and updating records, the benefits of faster search operations often outweigh this cost.
  • The author concludes that Supabase's search tools, when used with indexing, provide a robust solution for advanced database searches, offering a balance between flexibility and performance.

Supabase for Advanced Full-Text Search

Searching within your database is a handy feature for your users. However, there are many different ways to do this. Supabase has multiple different functions implemented that can be used for it all based on SQL and PostgreSQL functionality. We will explore eq, like, ilike, textSearch, and how to index for textSearch.

The examples given below are all run on the same database of coffees and teas. We are looking for all the drinks with rooibos in the name; rooibos is a kind of herb tea from South Africa.

By using EXPLAIN I get a cost for the queries to compare against each other. This is done on the SQL equivalent of the Supabase functions. The cost will be in the form of 1..10 meaning it is between 1 and 10. The numbers are an abstraction. Lower numbers are better, numbers close to each other mean there is not a lot of variance. In general, I take the highest number to compare different queries.

Prerequisite knowledge

I assume you have the following knowledge:

  • Set up a basic Supabase project (Article)
  • You can use select, from, and order on a Supabase object (Article)
  • (optional) Advanced full-text search in SQL and PostgreSQL (Article)

This blog is part of caffeinecritics.com the code can be found on GitHub.

Supabase logo found by a magnifying glass in a text

Preparation

To make the comparison more realistic an index is put on the column name. This is done with the following SQL query.

CREATE INDEX if not EXISTS idx_drink_name
ON public.drinks
USING btree (name)
TABLESPACE pg_default;

You can execute SQL queries in SQL Editor within your Supabase dashboard.

Screenshot showing where the SQL Editor is within the Supabase dashboard.

eq

The most basic search is comparing. You use eq to compare a column against a value. This has the lowest cost of all methods, but it is also very sensitive to errors. A wrong capital, only part of a word, or only one word, and it does not work. For this reason, it is often used for IDs and other cases where you know exactly what you are looking for.

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .eq('name',"Rooibos")
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=0.27..2.49

Result of .eq with the input Rooibos it will only find 1 result with an exact match to the input.

LIKE and ILIKE

The like operator is used for pattern matching in place of the eq. It is used to search for a specified pattern in a column. The % wildcard is often used within a like operation to represent any sequence of characters, and _ (underscore) represents a single character. The ilike operator is similar to like. However, it disregards the case of the letters in the comparison. So like would say Rooibos and rooibos are different, but ilike would say they are the same. There is no cost difference between like and ilike. However, the max cost is about 10 times higher as a naive comparison.

The query will return 17 results, from “Rooibos” to “Rooibos Green Raspberry Orange”

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .ilike('name', "Rooib%") // replace with .like if you want case sensitive search
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=0.00..24.36

Result of ilike with the input rooib%. It finds all with rooibos in the name even if they are not lowercase.

Text search

Behind the textSearch is PostgreSQLs tsvector, a special data type used to represent text in a form optimized for full-text search. It’s part of PostgreSQL’s full-text search capabilities. A tsvector is a vector of lexemes, which are the basic units of text. It is a way to simplify language for search operations. This processing involves breaking down the document into words followed by:

  • removing common words. These are words like the, for, in, is, be, being etc. These words are so common that it does not add value to know if they are there. If you search for “the best coffee” you do not care for every coffee with the word “the” in the name.
  • Stemming is reducing a word to its most basic form. No plurals, and verbs only in the simplest form. So “cats” become “cat”, “loving” becomes “love” etc. This will help us if we search for “coffee loved by cats” you will find “a cat loves this coffee”.

Let’s look at this in action.

"This coffee is great for cats! Meow Coffees"

will become

'cat':6 'coffe':2, 8 'great':4 'meow':7

Coffe is a weird result, but this does not matter, because all coffee, coffe, coffees will all become coffe. For us, it is useful that a typo will still result in coffee results.

The number behind the word is the location where the word is found. In the case of coffee, the second and eighth words have the stem coffee.

Direct textSearch

We do not have to use tsvector directly, Supabase will do it for us when we use textSearch. Firstly we are going to transform our data on the fly. This means all the cost will be at the moment we search. This cost is very high, but we can bring that down later using an index. The below query will give us every name that has at least 1 word matching our text “rooibos”.

I would not recommend to match on the fly, in most cases like or ilike will be a cheaper alternative.

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .textSearch('name', "rooibos")
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=0.00..158.86

Using a wildcard

Let’s power this up a bit with wildcards. We do not need to complete the word “rooibos” but can use a wildcard with :*. Similar to % in like and ilike. Also here it does not have an impact on the cost.

Again matching on the fly is not really worth it, using like or ilike is generally better.

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .textSearch('name', "rooib:*")
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=0.00..158.86

Text search index

In order to index the vectors we are going to introduce a new column with the vectors. Every time we update a row or insert a new one, we will generate a (new) value for this column. This index makes inserting and updating more expensive. This time we will search over the name AND description combined.

In order to do this we do need to use the SQL editor with the Supabase dashboard.

Screenshot showing where the SQL Editor is within the Supabase dashboard.

Step 0: Remove NULL

Make sure none of the values are NULL. Replace them with empty strings. The query below does this for all descriptions.

UPDATE drinks
SET description = COALESCE(description, '')
WHERE description IS NULL;

Also, make sure new entries of name or description are not nullable. Edit the table and click the gear behind the columns to uncheck is Nullable.

Screenshot showing the checkbox of “is nullable” being unchecked

Step 1: Create an index

Just run this query. Trust me!

ALTER TABLE drinks
ADD COLUMN fts tsvector
GENERATED ALWAYS
as (to_tsvector('english', description || ' ' || name)) STORED;
CREATE INDEX drink_fts
ON drinks
USING gin (fts);

Fine, let’s break it down so you do not just have to trust a random person on the internet.

ALTER TABLE drinks
ADD COLUMN fts tsvector
GENERATED ALWAYS
as (to_tsvector('english', description || ' ' || name)) STORED;

This command adds a new column called “fts” of type tsvector. The generated always makes sure that the values in the “fts” column are automatically generated when a row is updated or inserted.

The data in this column is generated using the to_tsvector function, which processes the concatenated values of the “description” and “name” columns. It will use the English dictionary to get the stem of the words. The stored keyword indicates that the generated values are stored physically on disk for efficient retrieval.

CREATE INDEX drink_fts
ON drinks
USING gin (fts);

This command creates an index named “drink_fts” on the “drinks” table. The index is a GIN (Generalized Inverted Index) index, and it is created on the “fts” column. GIN indexes are suitable for indexing complex data types like tsvector.

Step 2: New search query

So now we use the indexed column “fts” for searching. We do this by replacing the to_tsvector(name) with our new column. The cost is lower than even the LIKE and the ILIKE. If we add a wildcard the cost is going up but still lower than the LIKE and the ILIKE. Making our PostgreSQL perform better than native SQL for full-text searching.

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .textSearch('fts', "rooibos")
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=3.31..4.42

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .textSearch('fts', "rooib:*")
  .then((res) => {
    const data = res.data
    // Do stuff
})

cost=4.69..11.26

If I want to order the results I will still name and NOT fts. So that would be:

supabase
  .from('drinks')
  .select('id, name, image_url, description')
  .textSearch('fts', "rooib:*")
  .order('name', { ascending: true })
  .then((res) => {
    const data = res.data
    // Do stuff
})
Result of textSearch of rooibos. Note how Haardvuur melange does not have rooibos in the name but does have it in the description.

Conclusion

In summary, Supabase offers powerful tools for advanced database searches. We explored methods like eq, like, ilike, and the robust textSearch. Each method serves different purposes, from basic comparisons to flexible pattern matching. The textSearch feature, backed by PostgreSQL’s tsvector, stands out for optimized full-text searches if we use the index.

Supabase
Full Text Search
Backend Development
Front End Development
Tutorial
Recommended from ReadMedium