Combining the Best of Both Worlds: Hybrid Search in Elasticsearch with BM25 and HNSW

When it comes to search algorithms, there is no one-size-fits-all solution. Different algorithms work better in different scenarios, and sometimes a combination is needed to achieve the best results. In Elasticsearch, one popular approach is hybrid search, which pairs the BM25 algorithm for text search with the HNSW algorithm for nearest neighbor search. In this blog post, we’ll explore the benefits, challenges, and use cases of hybrid search in Elasticsearch.
BM25 is a widely used ranking function for text search that scores each document using the term frequency and inverse document frequency of the query terms, normalized by document length. HNSW (Hierarchical Navigable Small World), as we saw in the previous blog post, is an algorithm for approximate nearest neighbor search that builds a layered small-world graph of interconnected nodes. By combining the two, hybrid search draws on the strengths of both: exact keyword matching from BM25 and similarity matching from HNSW.
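To make the BM25 description more concrete, here is a minimal Python sketch of a BM25-style score for a single document. The function name `bm25_score` and the toy corpus are illustrative assumptions, not part of Elasticsearch. The sketch omits the constant (k1 + 1) numerator found in the textbook formula, which does not affect ranking, and it ignores implementation details such as how a real engine stores document lengths.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query (BM25-style)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freqs = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = term_freqs[term]
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        # Number of documents in the corpus that contain the term.
        doc_freq = sum(1 for d in corpus if term in d)
        # Inverse document frequency: rarer terms get a higher weight.
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        # Length normalization: b controls how strongly long documents are penalized.
        norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        # Term-frequency saturation: repeated terms help, with diminishing returns.
        score += idf * tf / (tf + norm)
    return score


# Toy corpus of pre-tokenized documents (hypothetical data).
corpus = [
    ["hybrid", "search", "in", "elasticsearch"],
    ["vector", "search", "with", "hnsw"],
    ["cooking", "pasta", "at", "home"],
]
scores = [bm25_score(["hybrid", "search"], doc, corpus) for doc in corpus]
```

In this toy corpus the first document matches both query terms, the second matches only one, and the third matches none, so their scores come out in that order.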
One of the biggest challenges of hybrid search is balancing the weights of the two algorithms. In other words, we need to decide how much weight to give to the BM25 score and how much weight to give to the HNSW score when combining them. This can be tricky, as the optimal weights may vary depending on the data and the specific search scenario.
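One common way to reason about this balance is a weighted sum of the two scores after rescaling them to a comparable range. The sketch below is a minimal Python illustration, assuming min-max normalization and hypothetical names (`min_max`, `hybrid_scores`, `weight`). It is not how Elasticsearch scores documents internally, and other fusion strategies exist.

```python
def min_max(scores):
    """Rescale a dict of doc_id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(bm25, vector, weight=0.5):
    """Blend normalized scores as weight * BM25 + (1 - weight) * vector."""
    bm25_n, vector_n = min_max(bm25), min_max(vector)
    doc_ids = set(bm25_n) | set(vector_n)
    return {
        # A document missing from one result list contributes 0 for that side.
        doc_id: weight * bm25_n.get(doc_id, 0.0)
        + (1 - weight) * vector_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores for illustration only.
bm25 = {"a": 12.0, "b": 4.0, "c": 8.0}
vector = {"a": 0.5, "b": 1.0, "d": 0.0}

ranked = sorted(
    hybrid_scores(bm25, vector).items(), key=lambda kv: kv[1], reverse=True
)
```

Changing `weight` toward 1.0 favors the text score and toward 0.0 favors the vector score, which is a cheap way to try different balances offline before changing any queries.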
However, when done correctly, hybrid search can lead to significant improvements in search accuracy and efficiency. For example, in e-commerce applications, hybrid search can be used to combine text search with visual search, enabling users to find products that match both their textual and visual queries. In scientific applications, hybrid search can be used to combine text search with similarity search on high-dimensional data, enabling researchers to find relevant documents based on both their textual content and their data.
Let’s take a look at an example of how to implement hybrid search in Elasticsearch. First, we need to create an index whose mapping includes a `text` field for BM25 scoring and a `dense_vector` field for HNSW search:
PUT /my_index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "similarity": "my_bm25"
      },
      "vector": {
        "type": "dense_vector",
        "dims": 10,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 48,
          "ef_construction": 200
        }
      }
    }
  },
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "b": 0.75,
          "k1": 1.2
        }
      }
    }
  }
}
Here, we’ve defined a mapping that includes both a `text` field for BM25 text search and a `vector` field for HNSW similarity search on high-dimensional data. The text field uses a custom BM25 similarity named `my_bm25`, and the vector field sets its HNSW graph parameters (`m` and `ef_construction`) under `index_options`. We’ve picked `cosine` as the vector similarity for this example; `dot_product` and `l2_norm` are also available. Note that Elasticsearch has no built-in "hybrid" similarity type, so the relative weight of the two algorithms cannot be set in the mapping. Instead, the weights are set at query time, as is the size of the HNSW candidate list.
Next, we can perform a search using both the text and vector fields, like this:
GET /my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "search term",
        "boost": 0.5
      }
    }
  },
  "knn": {
    "field": "vector",
    "query_vector": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "k": 5,
    "num_candidates": 100,
    "boost": 0.5
  }
}
This search request combines a `match` query on the `text` field with a `knn` search on the `vector` field, a combination supported in recent 8.x versions of Elasticsearch. Each matching document is scored as the sum of its BM25 score and its vector similarity score, each multiplied by its `boost`. The `num_candidates` parameter controls how many candidates HNSW considers on each shard before returning the top `k`. In this example, we’ve used a boost of 0.5 for both BM25 and HNSW, but keep in mind that the two scores are on different scales (BM25 scores are unbounded, while cosine-based scores fall between 0 and 1), so equal boosts do not guarantee equal influence. You can experiment with different boosts to find the optimal balance for your data and search scenario.
In conclusion, hybrid search in Elasticsearch is a powerful technique for combining the strengths of different search algorithms to achieve better search accuracy and efficiency. By combining the BM25 algorithm for text search with the HNSW algorithm for nearest neighbor search, users can leverage the power of both algorithms in their search and analytics applications. With careful tuning of the weights of the two algorithms, hybrid search can serve a wide range of applications, from e-commerce to scientific research.
About Author Evergreen Technologies:
Active in blogging and teaching online courses in Lifestyle, Travel, Wellbeing, Computer Vision, Natural Language Processing and SaaS system development
Over 20 years of experience in Fortune 500 companies
Website: https://www.mentorai.blog
Youtube Channel: https://www.youtube.com/channel/UCPyeQQp4CfprsybVr8rgBlQ?view_as=subscriber
LinkedIn: @evergreenllc2020
Twitter: @tech_evergreen
Over 22,000 students in 145 countries





