Fabio Chiusano


Two minutes NLP — Using Word2Vec to learn node embeddings on graphs

Node2Vec, Word2Vec, graphs, and random walks

Photo by Markus Winkler on Unsplash

Learning good representations of graphs is valuable for a variety of machine learning applications. Naively, each node in a graph can be represented as a sparse one-hot vector, similar to how words are encoded with the bag-of-words approach in NLP. However, since machine learning models tend to work better on fixed-dimension continuous features, it is useful to embed nodes in low-dimensional spaces, much as word embeddings are learned in NLP with algorithms like Word2Vec, GloVe, and BERT.

Example of two-dimensional node embeddings obtained from a graph. Image from https://snap.stanford.edu/node2vec.

Can the word embedding algorithms from NLP be used to produce node embeddings of graphs as well?

Word embedding algorithms learn from a corpus of sentences, where each sentence is a sequence of consecutive words that share a context.

Graphical representation of a word, a sentence, and a corpus. Image by the author.
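To make "words that share a context" concrete, here is a minimal sketch of how a skip-gram style model could turn a sentence into (center word, context word) training pairs. The function name `context_pairs`, the example sentence, and the window size are illustrative assumptions, not something taken from a specific library.

```python
def context_pairs(tokens, window=2):
    """Pair each token with the tokens at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical example sentence, split on whitespace.
sentence = "the cat sat on the mat".split()
pairs = context_pairs(sentence, window=1)
```

Nothing in this function depends on the tokens being words: any sequence of hashable items, including node identifiers, would work the same way.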

Just as we do with sentences, let’s try to extract from a graph sequences of nodes that share some context.

A graph. Image by the author.

How can we get these sequences of nodes? Simple, using random walks!

Graphical representation of a node, a random walk, and a corpus of random walks. Image by the author.

We can perform many random walks from distinct starting nodes of the graph to obtain a corpus of random walks, i.e. sequences of related nodes. With this corpus, we can learn embeddings using the same models we use in NLP, such as Word2Vec.
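As a rough sketch of this step, the snippet below builds a corpus of uniform random walks using only the Python standard library. The toy graph, the function names, and the parameters `walks_per_node` and `walk_length` are assumptions made for illustration. The walks are unbiased, which is the simplest possible choice and not necessarily the sampling strategy a given library uses.

```python
import random

# Hypothetical toy graph as an adjacency list:
# two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}


def random_walk(graph, start, walk_length, rng):
    """Return one uniform random walk of up to `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node and collect them."""
    rng = random.Random(seed)  # seeded for reproducibility
    corpus = []
    for _ in range(walks_per_node):
        for start in graph:
            corpus.append(random_walk(graph, start, walk_length, rng))
    return corpus


corpus = build_corpus(GRAPH, walks_per_node=10, walk_length=5)
```

Each element of `corpus` is a list of node identifiers, which plays the same role as a tokenized sentence in NLP.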

Node2Vec builds on exactly this idea: it treats random walks through a graph like sentences in a corpus. It belongs to a family of algorithms called walk-based graph embedding algorithms, which learn node embeddings in two steps:

  1. Create a corpus of node sequences by performing random walks on the graph.
  2. Learn node embeddings on this corpus using machine learning models that learn from sequences.
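The second step can be sketched with a deliberately simplified skip-gram trainer in pure Python. This is a stand-in for Word2Vec, not its actual implementation: it uses a full softmax and plain SGD instead of the optimizations (such as negative sampling) that real libraries rely on. The hand-written `WALKS` corpus, the function name `train_skipgram`, and all hyperparameters are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical hand-written walks standing in for the output of step 1.
WALKS = [
    ["a", "b", "c", "a", "b"],
    ["b", "c", "a", "c", "d"],
    ["c", "d", "e", "f", "d"],
    ["d", "e", "f", "e", "d"],
    ["e", "f", "d", "c", "b"],
    ["f", "d", "e", "f", "e"],
]


def train_skipgram(walks, dim=8, window=2, epochs=20, lr=0.05, seed=0):
    """Learn one vector per node by predicting nearby nodes in each walk."""
    rng = random.Random(seed)
    nodes = sorted({node for walk in walks for node in walk})
    index = {node: i for i, node in enumerate(nodes)}
    # Input vectors are the embeddings; output vectors score context nodes.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in nodes]
    w_out = [[0.0] * dim for _ in nodes]

    for _ in range(epochs):
        for walk in walks:
            for i, center in enumerate(walk):
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    v = w_in[index[center]]
                    target = index[walk[j]]
                    # Softmax over all nodes (numerically stabilised).
                    scores = [sum(a * b for a, b in zip(v, u)) for u in w_out]
                    top = max(scores)
                    exps = [math.exp(s - top) for s in scores]
                    total = sum(exps)
                    grad_v = [0.0] * dim
                    for k, u in enumerate(w_out):
                        err = exps[k] / total - (1.0 if k == target else 0.0)
                        for d in range(dim):
                            grad_v[d] += err * u[d]  # uses u before its update
                            u[d] -= lr * err * v[d]
                    for d in range(dim):
                        v[d] -= lr * grad_v[d]
    return {node: w_in[index[node]] for node in nodes}


embeddings = train_skipgram(WALKS)
```

In practice one would more likely pass the corpus of walks to an existing Word2Vec implementation, since a full softmax becomes too slow on graphs with many nodes.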

Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on Medium, LinkedIn, and Twitter!

