Orlando Moroni


Introducing the latest MongoDB Atlas Vector Search feature

Getting Started with MongoDB Atlas for Semantic Search

A quick tutorial for advanced search with MongoDB & Hugging Face

Photo by Google DeepMind on Unsplash

On June 22nd, MongoDB launched Atlas Vector Search in preview mode.

I tried this new feature for you!

The idea is to store a small dataset of common English proverbs in MongoDB and ask something like:

Question: Things that look good outwardly may not be as valuable or good.

Answer: All that glitters is not gold.

The inspiration for this post was taken from the official MongoDB Atlas Vector Search tutorial.

Introduction to MongoDB Atlas Vector Search

Vector search is an advanced technique for performing semantic searches, where data is searched based on its meaning rather than on exact keyword matches.

This search method uses Machine Learning models to search unstructured data effectively, including text, audio, video, and images. It finds items that are similar or related to the search item, and it powers use cases such as recommendation systems, chatbots, and search engines.

With text data, vector search can find words or phrases with a similar meaning even when the exact query words do not appear in the searched sentences.

Vector search is based on the concept of embeddings.

Embeddings

Vector Search employs sophisticated Machine Learning models, known as encoders, to produce vector embeddings that provide a numerical representation of unstructured input data.

Vector embeddings transform unstructured data, which is typically incomprehensible to computers, into a numerical format that the machine can easily interpret.

Atlas Vector Search Data Workflow

Embeddings are high-dimensional vectors, essentially arrays of numerical values. These vectors can capture the contextual and semantic information of the data, enabling us to perform meaningful comparisons and computations.

For instance, text embedding models (encoders) can learn the relationship between the words in a phrase, generating embeddings that capture the semantic and contextual information of the sentences.

OpenAI-Introducing Text Embeddings

In the above image, the phrase “bovine buddies say” has been encoded in an array of floating point numbers ([-0.005, 0.012, -0.008, …, -0.010]).

The dimensionality of the vector depends on the embedding model and can be high (up to thousands of elements).

OpenAI-Introducing Text Embeddings

Text embedding models assign similar numerical representations to phrases that have similar meanings.

When items are represented as vectors in a multi-dimensional space, the distance between two vectors tells us whether the corresponding sentences have similar meanings: the closer the vectors, the more similar the meanings.
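To make the idea of distance concrete, here is a minimal JavaScript sketch of two common similarity measures, cosine similarity and dot product. It uses toy 2-dimensional vectors, while real sentence embeddings have hundreds of dimensions. The helper names are illustrative and are not part of any MongoDB or Hugging Face API. For vectors normalized to unit length, the two measures give the same result. That is why dot-product similarity is normally used only with normalized embeddings.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimensionality");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Euclidean length (L2 norm) of a vector.
function norm(v) {
  return Math.sqrt(dot(v, v));
}

// Cosine similarity: 1 means same direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a, b) {
  return dot(a, b) / (norm(a) * norm(b));
}

// Scale a (non-zero) vector to unit length, so that dot product equals cosine similarity.
function normalize(v) {
  const n = norm(v);
  return v.map((x) => x / n);
}

// Toy "embeddings": a and b point in similar directions, c is orthogonal to a.
const a = [3, 4];
const b = [4, 3];
const c = [-4, 3];

console.log(cosineSimilarity(a, b)); // 0.96
console.log(cosineSimilarity(a, c)); // 0
console.log(dot(normalize(a), normalize(b))); // approximately 0.96
```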

Embeddings are not limited to text. With a multimodal model that maps images and text into the same vector space, you can even create an embedding of an image and compare it with a text embedding to check whether a sentence accurately describes the image.

Atlas Vector Search

Atlas Vector Search is a new MongoDB Atlas feature that adds vector-based semantic search to MongoDB’s existing search capabilities.

MongoDB Atlas Vector Search provides:

  • a vector store to persist embedding vectors generated by external ML models of your choice (OpenAI, Hugging Face, and more);
  • a vector store index for indexing the stored embedding vectors;
  • a search operation that implements an Approximate Nearest Neighbor (ANN) algorithm to perform semantic searches on the stored vectors.
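An ANN algorithm avoids comparing the query with every stored vector. The trade-off is that it may occasionally miss a true nearest neighbor. The result it approximates is exact k-nearest-neighbor search, sketched below as a brute-force JavaScript scan over a hypothetical in-memory list of documents. The `text` and `embedding` field names and the 3-dimensional vectors are made up for illustration. This is not how Atlas implements its index. It only shows what "the k most similar vectors" means.

```javascript
// Cosine similarity between two equal-length numeric arrays.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact k-NN: score every document against the query, then keep the k best.
// The cost grows linearly with the number of stored vectors, which is what ANN indexes avoid.
function exactKnn(queryVector, docs, k) {
  return docs
    .map((doc) => ({ text: doc.text, score: cosine(queryVector, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical documents with made-up 3-dimensional embeddings.
const docs = [
  { text: "doc A", embedding: [1, 0, 0] },
  { text: "doc B", embedding: [0.7, 0.7, 0] },
  { text: "doc C", embedding: [0, 1, 0] },
  { text: "doc D", embedding: [0, 0, 1] },
];

const results = exactKnn([1, 0.1, 0], docs, 2);
console.log(results.map((r) => r.text)); // [ 'doc A', 'doc B' ]
```

A brute-force scan is fine for a small dataset like the one in this tutorial. For large collections, an ANN index keeps query latency low.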

With MongoDB Atlas Vector Search, users can go beyond basic keyword matching. It enables context-aware semantic search that infers meaning from the user’s search terms.

Atlas Vector Search in action

Now let’s try out the new MongoDB Atlas Vector Search feature.

We’ll execute the following steps to complete this tutorial:

  1. Create a free MongoDB Atlas cluster.
  2. Create MongoDB collections for proverbs and queries.
  3. Generate a Hugging Face API token.
  4. Import Hugging Face API token into Atlas.
  5. Create Atlas Database Triggers and functions to invoke HF APIs.
  6. Create the Vector Search Index.
  7. Insert the proverbs dataset into MongoDB.
  8. Run the semantic queries.

We’ll use only the Atlas UI to perform the tasks in this tutorial.

1. Create a free MongoDB Atlas cluster

The first step is to deploy our MongoDB Atlas free cluster (M0 cluster).

For this tutorial, feel free to use an existing Atlas cluster instead of creating a new one.

2. Create MongoDB collections for proverbs and queries

We’ll use two collections belonging to the same database in this tutorial:

  • vector_search.proverbs for storing proverbs and their embeddings;
  • vector_search.queries for storing queries and answers.

We’ll create the database and the collections from the Atlas UI.

From your database deployment, click the Browse Collections button:

MongoDB Atlas UI

Then click + Create Database on the Collections tab, enter the database name (vector_search) and the first collection name (proverbs), and click the Create button.

Create Database

To create the second collection, select or hover over the vector_search database and click the plus sign (+) icon.

Collections Tab

Then create the queries collection inside the database vector_search:

Create Collection

You now have your collections ready.

3. Generate a Hugging Face API token

We will use the free public Hugging Face Inference API to obtain the vector embeddings for our proverbs.

We must create a read access token on the Hugging Face site before invoking the text embedding API.

Go to the Hugging Face website and log in or sign up.

https://huggingface.co

After logging in, click the Profile icon in the upper-right corner and select Settings. Then, on the left side of the Profile Settings page, click Access Tokens and press the New Token button.

Access Tokens Page

Give your token a name, select the read role, and click Generate a token.

Create a new access token.

Copy the generated token and store it in a safe place.

API Access Token

4. Import Hugging Face API token into Atlas

Before invoking the HF APIs, we have to import the previously generated Hugging Face token into Atlas.

Go to the App Services page on the Atlas UI:

Atlas App Services

Click on the Triggers application (the leftmost box), select Values in the menu on the left, then click the Create New Value button.

App Services Values

The first thing to create is a secret. Name your value HF_secret, choose Secret as the type, and paste the Hugging Face token into the Add Content field. Then click Save.

New Secret

Functions cannot read a secret directly, so we also need a value that links to the secret and that our functions can access. Click Create New Value again (the button in the upper-right corner).

Atlas Values Page

Create a value named HF_value of type Value, choose Link to Secret, and select HF_secret. Then press Save. Make sure your configuration matches the one below exactly.

New Value

5. Create Atlas Database Triggers and functions to invoke HF APIs

From the Atlas UI, we can define database triggers on our proverbs and queries collections that invoke the Hugging Face APIs each time a new document is inserted into those collections.
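Each trigger invocation receives a change event describing the database operation. The sketch below shows one way a function might decide what, if anything, to embed. `textToEmbed` is a hypothetical helper, and the event objects are hand-written stand-ins shaped like MongoDB change events (`operationType`, `fullDocument`). Checking `operationType` is a defensive choice: the triggers in this tutorial fire on inserts, and the check only matters if a trigger is ever configured to fire on updates too.

```javascript
// Decide what (if anything) a trigger invocation should embed.
// Returns the text to embed, or null when the event should be ignored.
function textToEmbed(changeEvent, fieldName) {
  // Only inserts carry a brand-new document to embed. If the trigger also fired on updates,
  // this check would stop the function's own updateOne() from triggering another round.
  if (!changeEvent || changeEvent.operationType !== "insert") {
    return null;
  }
  const doc = changeEvent.fullDocument;
  // Skip documents whose field is missing, not a string, or blank.
  if (!doc || typeof doc[fieldName] !== "string" || doc[fieldName].trim() === "") {
    return null;
  }
  return doc[fieldName];
}

// Hypothetical change events, shaped like MongoDB change stream events.
const insertEvent = {
  operationType: "insert",
  fullDocument: { _id: "example-id", proverb: "Night brings counsel." },
};
const updateEvent = {
  operationType: "update",
  fullDocument: { _id: "example-id", proverb: "Night brings counsel.", proverb_embedding: [0.1] },
};

console.log(textToEmbed(insertEvent, "proverb")); // Night brings counsel.
console.log(textToEmbed(updateEvent, "proverb")); // null
```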

To create a Database Trigger, navigate to your Database Deployment and click Triggers in the left navigation menu.

Click the Add Trigger button to configure your new trigger.

Atlas Triggers Page

HF_Create_Embeddings trigger

We’ll build the first trigger on our vector_search.proverbs collection and name it HF_Create_Embeddings. Configure the trigger as shown below:

Create Trigger

In the Function section, select Function as the event type and paste the JavaScript function code from the following block.

Trigger’s Function

Paste the following code into the form above:

exports = async function(changeEvent) {
    // Get the full document from the change event.
    const doc = changeEvent.fullDocument;

    // Define the Hugging Face API url and key.
    const url = 'https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2';
    // Use the name you gave the value of your API key in the "Values" utility inside of App Services.
    const hf_read_token = context.values.get("HF_value");

    // Skip documents without a text "proverb" field: there is nothing to embed.
    if (!doc || typeof doc.proverb !== 'string') {
        console.log("Document has no 'proverb' string field; skipping.");
        return;
    }

    try {
        console.log(`Processing document with id: ${doc._id}`);

        // Call Hugging Face API to get the embeddings.
        const response = await context.http.post({
            url: url,
            headers: {
                'Authorization': [`Bearer ${hf_read_token}`],
                'Content-Type': ['application/json']
            },
            body: JSON.stringify({
                // The field inside your document that contains the data to embed, here it is the "proverb" field from the sample proverbs data.
                inputs: [doc.proverb]
            })
        });

        // Check the response status before parsing: error responses may not be valid JSON.
        if (response.statusCode === 200) {
            console.log("Successfully received embedding.");

            // Parse the JSON response. We sent one input, so the first element is its embedding.
            const responseData = EJSON.parse(response.body.text());
            const embedding = responseData[0];

            // Get the cluster in MongoDB Atlas.
            const mongodb = context.services.get('Cluster0'); // Replace with your cluster (linked data source) name.
            const db = mongodb.db('vector_search'); // Replace with your database name.
            const collection = db.collection('proverbs'); // Replace with your collection name.

            // Update the document in MongoDB.
            const result = await collection.updateOne(
                { _id: doc._id },
                // The name of the new field you'd like to contain your embeddings.
                { $set: { proverb_embedding: embedding }}
            );

            if (result.modifiedCount === 1) {
                console.log("Successfully updated the document.");
            } else {
                console.log("Failed to update the document.");
            }
        } else {
            console.log(`Failed to receive embedding. Status code: ${response.statusCode}`);
            // Log the error body returned by Hugging Face, if any, to help with debugging.
            if (response.body) {
                console.log(response.body.text());
            }
        }

    } catch(err) {
        console.error(err);
    }
};

The trigger HF_Create_Embeddings invokes the Hugging Face all-MiniLM-L6-v2 model API to get the vector embedding of each proverb inserted into the proverbs collection, and stores it in the proverb_embedding field of the same document.
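Since the trigger sends inputs as a one-element array, it treats the response body as an array holding a single embedding, hence responseData[0]. Before writing that vector to the collection, it may be worth checking that it really has the 384 dimensions the vector search index in section 6 expects. The following is a minimal plain-JavaScript sketch of such a check; extractEmbedding is a hypothetical helper, and JSON.parse stands in for EJSON.parse, which is only available inside Atlas.

```javascript
// Expected size of all-MiniLM-L6-v2 embeddings, matching "dimensions": 384
// in the vector search index definition.
const EXPECTED_DIMENSIONS = 384;

// Hypothetical helper: turn the raw HTTP status and body text into an embedding,
// or throw if the response is not what the trigger expects.
function extractEmbedding(statusCode, bodyText) {
  if (statusCode !== 200) {
    throw new Error(`Hugging Face API returned status ${statusCode}: ${bodyText}`);
  }
  const data = JSON.parse(bodyText); // EJSON.parse inside Atlas
  // One input string in, one embedding out: the vector sits at index 0.
  const embedding = Array.isArray(data) ? data[0] : undefined;
  if (!Array.isArray(embedding) || embedding.length !== EXPECTED_DIMENSIONS) {
    throw new Error("Unexpected embedding shape");
  }
  if (!embedding.every((x) => typeof x === "number" && Number.isFinite(x))) {
    throw new Error("Embedding contains non-numeric values");
  }
  return embedding;
}

// Usage with a fake response body of the expected shape.
const fakeBody = JSON.stringify([new Array(EXPECTED_DIMENSIONS).fill(0.05)]);
const vec = extractEmbedding(200, fakeBody);
console.log(vec.length); // 384
```

Writing a vector of the wrong length would leave a document that the knnVector index cannot use, so rejecting it early keeps the proverbs collection consistent with the index definition.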

Semantic_Query trigger

The second trigger will be created on the queries collection. From the trigger’s function, we’ll invoke the Hugging Face embedding model to get the embedding of the user query, and then we’ll run the vector search with a MongoDB aggregation pipeline.

The result of the vector search will be saved in the queries collection itself.

To create the second trigger, follow the same process we used for the first one, but make sure to adjust the parameters as outlined below (all other parameter values remain the same):

Name: Semantic_Query

Collection Name: queries

For the function’s code, paste the following block:

exports = async function(changeEvent) {
    // Get the full document from the change event.
    const doc = changeEvent.fullDocument;

    // Define the Hugging Face API url and key.
    const url = 'https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2';
    // Use the name you gave the value of your API key in the “Values” utility inside of App Services
    const hf_read_token = context.values.get("HF_value");

    try {
        console.log(`Processing document with id: ${doc._id}`);

        // Call Hugging Face API to get the embedding of the query.
        const response = await context.http.post({
            url: url,
            headers: {
                'Authorization': [`Bearer ${hf_read_token}`],
                'Content-Type': ['application/json']
            },
            body: JSON.stringify({
                // The field inside your document that contains the data to embed, here it is the “query” field from the "queries" collection.
                inputs: [doc.query]
            })
        });

        // Check the response status before parsing the body.
        if (response.statusCode === 200) {
            console.log("Successfully received embedding.");

            // Parse the JSON response: one input sentence yields one embedding at index 0.
            const responseData = EJSON.parse(response.body.text());
            const embedding = responseData[0];

            // Get the cluster in MongoDB Atlas.
            const mongodb = context.services.get('Cluster0');
            const db = mongodb.db('vector_search'); // Replace with your database name.
            const proverbs_collection = db.collection('proverbs'); // Replace with your collection name.
            const queries_collection = db.collection('queries'); // Replace with your collection name.

            // Query for the k most similar proverbs through the vector search index.
            const documents = await proverbs_collection.aggregate([
                {
                    "$search": {
                        "index": "vector_search_index",
                        "knnBeta": {
                            "vector": embedding,
                            "path": "proverb_embedding",
                            "k": 2
                        }
                    }
                },
                {
                    "$project": {
                        "_id": 0,
                        "proverb": 1
                    }
                }
            ]).toArray();

            // Update the query document in MongoDB.
            const result = await queries_collection.updateOne(
                { _id: doc._id },
                // The "answer" field will contain the query result.
                { $set: { query_embedding: embedding, answer: documents } }
            );

            if (result.modifiedCount === 1) {
                console.log("Successfully updated the query document.");
            } else {
                console.log("Failed to update the query document.");
            }
        } else {
            console.log(`Failed to receive embedding. Status code: ${response.statusCode}`);
        }

    } catch(err) {
        console.error(err);
    }
};
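The aggregation pipeline in Semantic_Query is plain data, so it can be produced by a small function and inspected or reused, for example to change k without editing the stage by hand. The sketch below assumes the same index name, vector path and projection as the trigger; buildKnnPipeline is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Hypothetical helper that builds the same pipeline used in the Semantic_Query trigger.
function buildKnnPipeline(queryVector, k = 2) {
  if (!Array.isArray(queryVector) || queryVector.length === 0) {
    throw new Error("queryVector must be a non-empty array of numbers");
  }
  return [
    {
      $search: {
        index: "vector_search_index", // name given to the index in section 6
        knnBeta: {
          vector: queryVector,
          path: "proverb_embedding", // field written by HF_Create_Embeddings
          k: k, // number of nearest neighbours to return
        },
      },
    },
    // Inclusion projection: return only the proverb text, dropping _id and the embedding.
    { $project: { _id: 0, proverb: 1 } },
  ];
}

// Inside the trigger this could be used as:
//   proverbs_collection.aggregate(buildKnnPipeline(embedding, 2)).toArray()
const pipeline = buildKnnPipeline([0.1, 0.2, 0.3], 3);
console.log(pipeline[0].$search.knnBeta.k); // 3
```

Keeping the projection in the same helper guarantees that the answer field only ever stores proverb texts, never the 384-element vectors.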

6. Create the Vector Search Index

We must create a vector search index on the proverbs collection to enable vector searches. The proverbs collection holds the embeddings of our proverb sentences (proverb_embedding field), which will be searched to answer our queries.

To create the index, go to Atlas Search: from the Database Deployments page, click on Search on the left menu, then select your cluster in the Select data source drop-down menu, and press the Go to Atlas Search button.

Atlas Search

Click the Create Search Index button to configure the new search index:

Atlas Search Page

On the following page, select the JSON Editor box and press the Next button:

Create a Search Index

Select the vector_search database and the proverbs collection in the Database and Collection area, and name the index vector_search_index in the Index Name field.

JSON Editor

Paste the following JSON document into the text area, and click the Next button.

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "proverb_embedding": {
        "dimensions": 384,
        "similarity": "dotProduct",
        "type": "knnVector"
      }
    }
  }
}

Click Create Search Index to start building the index.

Create Atlas Search Index

After a short while, the new index reaches the ACTIVE state and is ready to use.

Index created
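The "similarity": "dotProduct" setting ranks neighbours by the dot product of two vectors. The Atlas documentation describes this option as intended for vectors normalized to unit length, and for such vectors the dot product equals the cosine similarity. We assume here that all-MiniLM-L6-v2 returns normalized embeddings; that is worth checking on your own data, for example by computing the norm of a stored proverb_embedding. The small sketch below uses made-up 3-dimensional vectors in place of real 384-dimensional ones to show the relationship.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  if (a.length !== b.length) throw new Error("length mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean (L2) norm.
function norm(a) {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: dot product divided by the product of the norms.
function cosine(a, b) {
  return dot(a, b) / (norm(a) * norm(b));
}

// Scale a vector to unit length.
function normalize(a) {
  const n = norm(a);
  return a.map((x) => x / n);
}

// Made-up vectors for illustration only.
const u = normalize([3, 4, 0]); // [0.6, 0.8, 0]
const v = normalize([4, 3, 0]); // [0.8, 0.6, 0]

console.log(dot(u, v).toFixed(4));    // 0.9600
console.log(cosine(u, v).toFixed(4)); // 0.9600
```

Without normalization the raw dot product of [3, 4, 0] and [4, 3, 0] would be 24, while their cosine similarity is still 0.96; that gap is why dotProduct only behaves like cosine on unit-length vectors.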

7. Insert the proverbs dataset into MongoDB

We will insert some English proverbs into the proverbs collection to populate our embedding store. We’ll add one proverb at a time from the Atlas UI.

The first proverb we are inserting says:

A jack of all trades is master of none

To insert a proverb:

  • navigate to Browse Collections from the Database Deployments page;
  • select the proverbs collection under the vector_search database;
  • add a single-field document with “proverb” as the field name and the proverb sentence as the field value;
  • press Insert.
Insert Document

As if by magic, a new field called proverb_embedding is added to the document:

Proverbs collection

The proverb_embedding field contains the embedding vector (an array of 384 floating-point elements) generated by the Hugging Face text embedding model API invoked by the HF_Create_Embeddings trigger.

You can insert any English proverbs of your choice into the proverbs collection. In our test, we inserted the following ten proverbs, randomly picked from the web:

A jack of all trades is master of none.

All that glitters is not gold.

An apple a day keeps the doctor away.

Better late than never.

Curiosity killed the cat.

If you play with fire, you’ll get burned.

Justice delayed is justice denied.

Night brings counsel.

Rome wasn’t built in a day.

The grass is greener on the other side of the fence.
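Inserting the proverbs one by one in the UI works well for ten documents. For a larger dataset you might prepare the documents in a script and insert them in a single call; each inserted document should still produce its own insert event, and therefore its own HF_Create_Embeddings execution. The sketch below only builds the documents in the same single-field shape used in the UI. The insertMany call is left as a comment because it assumes a connected MongoDB Node.js driver collection (a hypothetical `collection` variable), which is not part of this article's setup.

```javascript
// The ten proverbs used in this article.
const proverbs = [
  "A jack of all trades is master of none.",
  "All that glitters is not gold.",
  "An apple a day keeps the doctor away.",
  "Better late than never.",
  "Curiosity killed the cat.",
  "If you play with fire, you'll get burned.",
  "Justice delayed is justice denied.",
  "Night brings counsel.",
  "Rome wasn't built in a day.",
  "The grass is greener on the other side of the fence.",
];

// Same shape as the documents inserted from the UI: a single "proverb" field.
// The trigger adds proverb_embedding afterwards, so it is not set here.
const docs = proverbs
  .map((text) => text.trim())
  .filter((text) => text.length > 0)
  .map((text) => ({ proverb: text }));

console.log(docs.length); // 10

// Assumption: `collection` is a MongoDB Node.js driver Collection for
// vector_search.proverbs, e.g. client.db("vector_search").collection("proverbs").
// await collection.insertMany(docs);
```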

Our proverbs collection now contains ten documents.

Proverbs documents

8. Run the semantic queries

We’ll insert a single-field document in the queries collection to execute our search. The field name will be “query”, and the value will be the text of our search:

{ "query": "Things that look good outwardly may not be as valuable or good." }

As soon as a new document is inserted into the queries collection, the Semantic_Query trigger:

  • invokes the Hugging Face API to get the embedding of the query, passing the query sentence;
  • stores the received embedding vector in the document itself (query_embedding field);
  • executes the vector search on vector_search_index through a MongoDB aggregation pipeline;
  • saves the search results in the document itself (answer field).
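Conceptually, the knnBeta stage with k: 2 returns the two stored proverbs whose embeddings score highest against the query embedding under the index's dotProduct similarity. Atlas answers this with an approximate nearest-neighbour index, so the exact in-memory version below is only a mental model, handy for sanity-checking results on a handful of documents. The 2-dimensional vectors are made up (real embeddings have 384 dimensions), and topK is a hypothetical helper.

```javascript
// Dot-product similarity, matching "similarity": "dotProduct" in the index.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Exact top-k search over in-memory documents; returns only the proverb text,
// like the $project stage in the Semantic_Query trigger.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({ proverb: doc.proverb, score: dot(queryVector, doc.proverb_embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ proverb }) => ({ proverb }));
}

// Made-up unit-length 2-D "embeddings" for illustration only.
const docs = [
  { proverb: "All that glitters is not gold.", proverb_embedding: [1, 0] },
  { proverb: "Better late than never.", proverb_embedding: [0, 1] },
  { proverb: "A jack of all trades is master of none.", proverb_embedding: [0.8, 0.6] },
];

const answer = topK([0.96, 0.28], docs, 2);
console.log(answer);
```

With these made-up vectors the scores are 0.96, 0.28 and 0.936, so the two proverbs returned are "All that glitters is not gold." followed by "A jack of all trades is master of none.".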

To test the query, go to the Collections tab and insert our query into the queries collection:

Query document

Here is the answer:

Query Answer

The two proverbs with the most similar meaning to our query are:

“All that glitters is not gold.” and “A jack of all trades is master of none.”

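The answer looks fine! You can experiment with your own dataset and queries. You could also try different embedding models, such as the OpenAI text embedding API, to compare the accuracy of the responses; keep in mind that the dimensions value in the vector search index must match the size of the vectors the chosen model returns.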

References

If you wish to expand your knowledge of MongoDB, look at my articles on How MongoDB Works.
