avatarSoliman ElSaber

Summary

The website content introduces BigQuery's search indexes feature, which significantly accelerates text searches within unstructured text and semi-structured JSON data across all table columns.

Abstract

The article discusses the recent enhancement in BigQuery's search capabilities through the use of search indexes. This feature enables faster SQL-based searches for specific words across entire tables, including both structured and semi-structured data. The author illustrates the improvement by comparing the performance of the traditional CONTAINS_SUBSTR function with the new SEARCH command, both before and after implementing a search index. Using a football dataset as an example, the author demonstrates the accuracy and speed gains achieved with search indexes, particularly when dealing with large datasets. The article emphasizes the ease of finding unique data elements without prior knowledge of table schemas and the potential for significantly improving query performance in BigQuery.

Opinions

  • The author is impressed with the speed and efficiency of BigQuery's search indexes feature, noting it is much faster than the previous CONTAINS_SUBSTR function.
  • They highlight the accuracy of the search results, which improve when using the SEARCH command with indexing, as it reduces irrelevant results.
  • The author suggests that the new feature is user-friendly, allowing users to leverage Google Standard SQL for searching without needing to understand the underlying table schemas in advance.
  • They are optimistic about the benefits of using search indexes, especially for those working with large datasets, as it can dramatically decrease data reading times and enhance the overall data retrieval experience.

Short introduction for a nice feature to try from BigQuery

Enhancing BigQuery SEARCH features with SEARCH INDEX

A faster way to find text in unstructured text and semi-structured JSON

Recently, I started to explore the search indexes feature from BigQuery, which speeds up the search functionality a lot. The feature allows us to use SQL to find a specific word in the whole table fields in a faster way.

BigQuery search GCP blog

If you are familiar with the BigQuery string functions, then you may already use the CONTAINS_SUBSTR function to find a specific word that appears in the selected field. Then you now have a faster function with extra features.

The new feature facilitates text search using the SEARCH command, and it is faster than CONTAINS_SUBSTR because when you can create a search index on ALL COLUMNS, all STRING or JSON data the table will be indexed.

“BigQuery search indexes enable you to use Google Standard SQL to easily find unique data elements buried in unstructured text and semi-structured JSON data, without having to know the table schemas in advance.”

— GCP blog here.

Let’s try it together…

I am using a small football dataset, which contains 4 tables: players, clubs, countries, and transfers.

We will try first to use the CONTAINS_SUBSTR command to find a specific country:

You can take a note of the time consumed in reading the data, and also the accuracy of the results returned.

The command CONTAINS_SUBSTR(name, ‘oman’) here returned two results:

  • Oman
  • Romania

Let's search for a country under the countries table before indexing the SEARCH:

The command SEARCH(name, ‘oman’) here returned only one result:

  • Oman

Implement the search indexing:

CREATE SEARCH INDEX
    countries_index
ON
    `football.countries`(ALL COLUMNS);

Now, let’s run the same SEARCH query again

As you can see, the time consumed in reading the data decreased dramatically.

We will try the same using more bigger table, transfers.

Find player Neymar using CONTAINS_SUBSTR command:

Find player Neymar using SEARCH command without indexing:

Implement SEARCH indexing

CREATE SEARCH INDEX
    player_name_index
ON
    `football.transfers_all`(ALL COLUMNS);

Find player Neymar using SEARCH command after indexing:

Hope this quick introduction can help you to explore the enhancements that you can implement on your BigQuery queries to speed up the search results.

Sql
Google Cloud Platform
Big Data
Bigquery
Search
Recommended from ReadMedium