Free AI web copilot to create summaries, insights and extended knowledge, download it at here

2016

Abstract

learned by predicting the context from a single word.</p><h2 id="3d76">Doc2Vec</h2><figure id="d29f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*0d7lR22v3IHndjJdAkiegg.png"><figcaption>Distributed Memory Model of Doc2Vec Image from the paper Distributed Representations of Sentences and Documents.</figcaption></figure><p id="dd10">We’ll see now how to learn embeddings of paragraphs but the same approach can be used to learn embeddings of entire documents as well.</p><p id="b3ba">In Doc2Vec, every paragraph in the training set is mapped to a unique vector, represented by a column in matrix D, and every word is also mapped to a unique vector, represented by a column in matrix W. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context.</p><blockquote id="e929"><p>The paragraph vector is shared across all contexts generated from the same paragraph but not across paragraphs. The word vector matrix W, however, is shared across paragraphs.</p></blockquote><p id="755e">The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context. For this reason, this model is called Distributed Memory (DM) Doc2Vec. There is also a second architecture called Distributed Bag-of-Words (DBOW) Doc2Vec, which is inspired by Skip-gram Word2Vec.</p><p id="81fa">The paragraph vectors and word vectors are trained using stochastic gradient descent.</p><blockquote id="a8fa"><p>At prediction time, one needs to obtain the paragraph vector for a new paragraph by gradient descent, keeping fixed the parameters for the rest of the model.</p></blockquote><p id="92a0">Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on <a href="https://medium.com/nlplanet">Medium</a>, <a href="https://www.linkedin.com/company/nlplanet">LinkedIn</a>, and <a href="https://twitter.com/nlplanet_">Twitter</a>!</p><p id="7f4a"><b>Two minutes NLP related posts

Options

</b></p><div id="55a0" class="link-block"> <a href="https://readmedium.com/two-minutes-nlp-tips-for-recommender-systems-with-nlp-a3f03555d65a"> <div> <div> <h2>Two minutes NLP — Tips for Recommender Systems with NLP</h2> <div><h3>Content-based and User-based Filtering, Collaborative Filtering, and Hybrid Approaches</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*xUOLCqGWldPFFrAs8hFXQQ.png)"></div> </div> </div> </a> </div><div id="4ba7" class="link-block"> <a href="https://readmedium.com/two-minutes-nlp-11-word-embeddings-models-you-should-know-a0581763b9a9"> <div> <div> <h2>Two minutes NLP — 11 word embeddings models you should know</h2> <div><h3>TF-IDF, Word2Vec, GloVe, FastText, ELMO, CoVe, BERT, RoBERTa, etc.</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*DpgA3bbFBmQ9I_amVDFvCQ.png)"></div> </div> </div> </a> </div><div id="c445" class="link-block"> <a href="https://readmedium.com/two-minutes-nlp-using-word2vec-to-learn-node-embeddings-on-graphs-ebece432a4b6"> <div> <div> <h2>Two minutes NLP — Using Word2Vec to learn node embeddings on graphs</h2> <div><h3>Node2Vec, Word2Vec, graphs, and random walks</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*T0uCEF-OkC6bNreF)"></div> </div> </div> </a> </div></article></body>

Two minutes NLP — Doc2Vec in a nutshell

CBOW and Skip-gram Word2Vec, DM and DBOW Doc2Vec

Doc2Vec is an unsupervised algorithm that learns embeddings from variable-length pieces of texts, such as sentences, paragraphs, and documents. It’s originally presented in the paper Distributed Representations of Sentences and Documents.

Let’s review Word2Vec first, as it provides the inspiration for the Doc2Vec algorithm.

Word2Vec

Word2Vec learns word vectors by predicting a word in a sentence using the other words in the context. In this framework, every word is mapped to a unique vector, represented by a column in a matrix W. The concatenation or sum of the vectors is then used as features for the prediction of the next word in a sentence.

The word vectors are trained using stochastic gradient descent. After the training converges, words with similar meanings are mapped to a similar position in the vector space.

The presented architecture is called Continuous Bag-of-Word (CBOW) Word2Vec. There is also a second architecture called Skip-gram Word2Vec, where word vectors are learned by predicting the context from a single word.

Doc2Vec

We’ll see now how to learn embeddings of paragraphs but the same approach can be used to learn embeddings of entire documents as well.

In Doc2Vec, every paragraph in the training set is mapped to a unique vector, represented by a column in matrix D, and every word is also mapped to a unique vector, represented by a column in matrix W. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context.

The paragraph vector is shared across all contexts generated from the same paragraph but not across paragraphs. The word vector matrix W, however, is shared across paragraphs.

The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context. For this reason, this model is called Distributed Memory (DM) Doc2Vec. There is also a second architecture called Distributed Bag-of-Words (DBOW) Doc2Vec, which is inspired by Skip-gram Word2Vec.

The paragraph vectors and word vectors are trained using stochastic gradient descent.

At prediction time, one needs to obtain the paragraph vector for a new paragraph by gradient descent, keeping fixed the parameters for the rest of the model.

Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on Medium, LinkedIn, and Twitter!

Two minutes NLP related posts

Two minutes NLP — Tips for Recommender Systems with NLP

Content-based and User-based Filtering, Collaborative Filtering, and Hybrid Approaches

medium.com

Two minutes NLP — 11 word embeddings models you should know

TF-IDF, Word2Vec, GloVe, FastText, ELMO, CoVe, BERT, RoBERTa, etc.

medium.com

Two minutes NLP — Using Word2Vec to learn node embeddings on graphs

Node2Vec, Word2Vec, graphs, and random walks

medium.com