Fanghua (Joshua) Yu


Improving GPT-3 Q&A Experiences with In-context Learning over Knowledge Graph

When In-Context Learning Meets Knowledge Graph

Melbourne is famous for its street art culture, not only in the town, but also in outside suburbs. Photo by the author.

Like many other people fascinated by the capabilities of ChatGPT, I keep asking it whatever comes to mind, at almost any time. One day I was testing how up-to-date its knowledge is:

Q: When was Neo4j 5.5 released?

A: Neo4j 5.5 was released on April 28, 2021.

From my daily work, I know this is not true, as Neo4j 5.5 was released just a couple of weeks ago. However, the answer sounds so certain that it may make you believe it is a fact, which it is not.

Many people have noticed this, and it raises concerns about whether GPT-3 is fit for purpose as a reliable source of truth. Apparently, it is not.

Behind ChatGPT is a Large Language Model (LLM) from the GPT-3 family, based on a transformer neural network architecture, which allows it to process and generate natural language text with a high level of accuracy and coherence. The model has been trained on a massive corpus of diverse text, including books, articles, and web pages, which has enabled it to develop a strong understanding of natural language and its nuances. ChatGPT’s knowledge of the world is based on the information it was trained on up to Sep. 2021, so it may generate incorrect responses to questions that involve current events or recent developments.

Generated Answer vs. Fact-based Answer

While GPT-3 is a highly advanced language model, there are several limitations to consider when it is applied to use cases which require up-to-date, reliable and controllable answers. Here are some of them:

  1. Limited real-world knowledge: While GPT-3 has been trained on a massive amount of text data, it still lacks real-world knowledge and experiences that humans possess. This means that GPT-3 may generate responses that are factually incorrect or inappropriate in certain contexts.
  2. Biases in training data: Since GPT-3 has been trained on large datasets of text written by humans, it may exhibit biases and perpetuate stereotypes that are present in the training data.
  3. Limited ability to understand context: While GPT-3 has the ability to generate text that is grammatically correct and semantically meaningful, it may struggle to understand the context and nuances of certain prompts or questions.
  4. Difficulty with rare or complex words: GPT-3 may struggle to generate text that includes rare or complex words that are not commonly used in everyday language. This is a big gap for domain-specific Q&A, e.g. science and research.
  5. High computational cost: GPT-3 is a highly complex and computationally intensive model, which means that it requires significant computing resources to run effectively. This may limit its accessibility to smaller organizations or individuals who don’t have access to high-performance computing resources.
  6. Lack of explainability: GPT-3 is a black box model, which means that it’s difficult to understand how it generates its responses and why it makes certain decisions. This lack of transparency can make it difficult to diagnose and correct errors or biases in the model.

These limitations highlight the importance of using GPT-3 as a tool in conjunction with human expertise and judgment, rather than relying on it as a sole source of information or decision-making. On the other hand, when given appropriate context, GPT-3 can overcome some of these limitations.

What Is In-Context Learning?

In-context learning for GPT-3 refers to the use of contextual information to improve the accuracy and relevance of the responses generated by the language model.

Contextual information can include the specific context of the question or prompt. By incorporating this contextual information into the response generation process, GPT-3 can generate more accurate, personalized, and relevant responses that better meet the needs of individual users.

Let’s have a look at how in-context learning may work using the same question given at the beginning of this article.

Answer the question as truthfully as possible using the provided context, and if the answer is not relevant to the context below, say “I don’t know.”

Context:

Neo4j version 5.5 was released in Feb. 2023.

Q: When was Neo4j 5.5 released?

A: February 2023.

This time, an instruction and a context are provided before the question, so that GPT-3 knows the answer should be extracted from the context, i.e.:

Providing Instruction and Context in the prompt of a question.

If I ask GPT-3 anything that is not relevant to the context, it will simply answer “I don’t know” rather than make something up based on its own knowledge.
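The prompt layout above can be assembled programmatically. Below is a minimal sketch in Python; the names `INSTRUCTION` and `build_prompt` are made up for illustration, and the sketch only builds the text, it does not call any model API.

```python
# Instruction text copied from the example prompt above.
INSTRUCTION = (
    "Answer the question as truthfully as possible using the provided context, "
    "and if the answer is not relevant to the context below, say \"I don't know.\""
)


def build_prompt(context: str, question: str) -> str:
    """Assemble instruction, context and question into one completion prompt.

    build_prompt is a hypothetical helper, not part of any library.
    """
    return (
        f"{INSTRUCTION}\n\n"
        f"Context:\n\n{context.strip()}\n\n"
        f"Q: {question.strip()}\n\n"
        "A:"
    )


prompt = build_prompt(
    "Neo4j version 5.5 was released in Feb. 2023.",
    "When was Neo4j 5.5 released?",
)
print(prompt)
```

Ending the prompt with "A:" leaves the model to complete only the answer, which matches the text-completion style used in this article.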

The Mystery of In-context Learning

In fact, an LLM such as GPT-3 is just a language model trained on internet-scale text data to predict the next token given the preceding text, i.e. so-called text completion, or generative AI for text. While the training procedure is both simple and general, the GPT-3 paper[1] found that the large scale leads to a particularly interesting emergent behavior called in-context learning. On many NLP benchmarks, in-context learning is competitive with models trained on much more labeled data. It performed extremely well on the LAMBADA (commonsense sentence completion) and TriviaQA (question answering) tasks.

Unlike conventional machine learning, in-context learning requires no optimization of any parameters. The mystery, however, is that the LM was not trained to learn from examples in the first place. Because of this, there is seemingly a mismatch between pretraining (what it is trained to do, which is next-token prediction) and in-context learning (what we are asking it to do).

So where exactly does this capability come from?

There are several papers on this subject. According to a study published by a team at Stanford University (http://ai.stanford.edu/blog/understanding-incontext/), in-context learning is a process in which the LM performs Bayesian inference of a latent concept conditioned on the prompt, and this capability comes from structure (long-term coherence) in the pretraining data.

What’s a concept? We can think of a concept as a latent variable that contains various document-level statistics. For example, a “news topics” concept describes a distribution of words (news and their topics), a format (the way that news articles are written), a relation between news and topics, and other semantic and syntactic relationships between words. In general, concepts may be a combination of many latent variables that specify different aspects of the semantics and syntax of a document, but we simplify here by grouping them all into one concept variable.

The prompt gives the LM something to concentrate on: it infers the concept (by locating previously learned capabilities) from the samples contained in the prompt, so that it can make correct predictions on the test example.
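One way to write down this Bayesian view, following the framing of the Stanford post, is to marginalize over the latent concept. The symbols are notation chosen here for illustration:

```latex
p(\text{output} \mid \text{prompt})
  = \int_{\text{concept}}
      p(\text{output} \mid \text{concept}, \text{prompt}) \,
      p(\text{concept} \mid \text{prompt}) \,
      d(\text{concept})
```

If the posterior p(concept | prompt) concentrates on the concept shared by the examples in the prompt, the model effectively "selects" that concept and predicts with it, without any parameter update.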

[1] Wei et al. Emergent Abilities of Large Language Models

Using Knowledge Graph to Prepare Context

Experiments

In-context learning can improve the accuracy of GPT-3 Q&A by providing the model with contextual information that can help it generate more accurate and relevant responses.

Having seen the power of in-context learning, I’m going to use a new strategy for knowledge-graph-based Q&A, as described below:

#1 Make initial assessment on the question and decide relevant subject(s). For example, for question:

How many movies Tom Hanks has acted in?

It’s not difficult to figure out it is about actor Tom Hanks.

#2 Retrieve data from the knowledge graph for the subjects, and prepare the context of the prompt. For the example question above, a query is executed against the KG to pull out sufficient facts about Tom Hanks.

#3 Prepare a prompt using both the context and the question, and send it to GPT-3 to get the answer.
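The three steps can be sketched as a small pipeline. This is an outline under stated assumptions, not the actual implementation: `answer_with_kg` and the `demo_*` functions are hypothetical names, the knowledge graph is replaced by an in-memory dictionary, and the model call is a stub that records the prompt it receives.

```python
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Answer the question as truthfully as possible using the provided context, "
    "and if the answer is not relevant to the context below, say \"I don't know.\"\n\n"
    "Context:\n\n{context}\n\nQ: {question}\n\nA:"
)


def answer_with_kg(
    question: str,
    extract_subjects: Callable[[str], List[str]],
    fetch_facts: Callable[[str], str],
    complete: Callable[[str], str],
) -> str:
    # Step 1: decide which subject(s) the question is about.
    subjects = extract_subjects(question)
    # Step 2: pull facts for each subject from the knowledge graph.
    context = "".join(fetch_facts(subject) for subject in subjects)
    # Step 3: combine context and question into one prompt and ask the model.
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return complete(prompt)


# Stand-ins for demonstration only (assumptions, not real services).
FACTS = {"Tom Hanks": "Tom Hanks born in 1956.\n"}
sent_prompts: List[str] = []


def demo_extract(question: str) -> List[str]:
    return [name for name in FACTS if name in question]


def demo_fetch(subject: str) -> str:
    return FACTS[subject]


def demo_complete(prompt: str) -> str:
    sent_prompts.append(prompt)  # record what would be sent to the model
    return "(model answer goes here)"


reply = answer_with_kg("When was Tom Hanks born?", demo_extract, demo_fetch, demo_complete)
print(sent_prompts[0])
```

Passing the three steps in as functions keeps the outline independent of any particular graph driver or model client; each stub can be swapped for a real call.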

To demonstrate the solution, I will use the Movies Graph created on Neo4j AuraDB (https://neo4j.com/cloud/platform/aura-graph-database/). If you didn’t follow my previous posts, the complete process is explained in “Adding Q&A Features to Your Knowledge Graph in 3 Simple Steps” (https://readmedium.com/adding-q-a-features-to-your-knowledge-graph-in-3-simple-steps-3ffe6f5caef4).

In that project, I gave GPT-3 samples of natural language questions and the corresponding Cypher statements so that it could produce correct Cypher for new questions. It worked in some cases, but remained quite limited without many more samples and fine-tuning.

With the new strategy introduced above, in step #2, sufficient facts about Tom Hanks are retrieved from the Movies Graph using the Cypher below:

:param person=>'Tom Hanks'

// 1. Person: one sentence with the year of birth
MATCH (p:Person{name:$person})
WITH p, p.name + ' born in ' + p.born + '.\n' AS text
// 2. Person and movies: one sentence per relationship to a movie, ordered by release year
MATCH (p) -[r1]-> (m:Movie)
WITH p, r1, text, m ORDER BY m.released ASC
WITH p, text, collect(m) AS movies, collect(p.name + ' ' + type(r1) + ' movie: ' + m.title + '.\n') AS text02
WITH p, movies, text + reduce(tt = '', t IN text02 | tt + t) AS text2
// 3. Other persons and movies: one sentence per person connected to the same movies
MATCH (p) -[r01]-> (m2) <-[r2]- (p2:Person)
WHERE m2 IN movies
WITH p, text2, movies, collect(p2.name + ' ' + type(r2) + ' movie: ' + m2.title + '.\n') AS text03
WITH p, movies, text2 + reduce(tt = '', t IN text03 | tt + t) AS text3
// 4. Movie details: append one sentence per movie with its release year
RETURN text3 + reduce(tt = '', t IN movies | tt + 'Movie ' + t.title + ' was released in ' + t.released + '.\n') AS context;

It returns facts about Tom Hanks, the movies he acted in, other people involved in the same movies, and details of those movies, as the text shown below (to save space, only the first 3 lines of facts in each category are listed):

Tom Hanks born in 1956.
Tom Hanks ACTED_IN movie: Joe Versus the Volcano.
Tom Hanks ACTED_IN movie: A League of Their Own.
Tom Hanks ACTED_IN movie: Sleepless in Seattle.
... ... ... ...
Ron Howard DIRECTED movie: Apollo 13.
Ed Harris ACTED_IN movie: Apollo 13.
Gary Sinise ACTED_IN movie: Apollo 13.
... ... ... ...
Movie Joe Versus the Volcano was released in 1990.
Movie A League of Their Own was released in 1992.
Movie Sleepless in Seattle was released in 1993.
... ... ... ...
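To make the layout of this context text explicit, here is a small Python sketch that renders the same four kinds of sentences from in-memory tuples. It mirrors the string concatenation done in the Cypher query, but it is only an illustration: `render_context` and its argument shapes are assumptions, and in the approach described here the text is produced by the query itself.

```python
from typing import List, Tuple


def render_context(
    person: str,
    born: int,
    roles: List[Tuple[str, str]],           # (relationship type, movie title)
    coworkers: List[Tuple[str, str, str]],  # (other person, relationship type, movie title)
    released: List[Tuple[str, int]],        # (movie title, release year)
) -> str:
    """Render facts as one sentence per line, in the same order as the query output."""
    lines = [f"{person} born in {born}."]
    lines += [f"{person} {rel} movie: {title}." for rel, title in roles]
    lines += [f"{name} {rel} movie: {title}." for name, rel, title in coworkers]
    lines += [f"Movie {title} was released in {year}." for title, year in released]
    return "\n".join(lines) + "\n"


context = render_context(
    "Tom Hanks",
    1956,
    roles=[("ACTED_IN", "Apollo 13")],
    coworkers=[("Ron Howard", "DIRECTED", "Apollo 13")],
    released=[("Apollo 13", 1995)],
)
print(context)
```

Short, uniform sentences like these keep each fact on its own line, which makes it straightforward to trim the context if it grows too long for the prompt.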

Because the data is stored as a Labelled Property Graph (LPG) in Neo4j, it’s much more efficient to find Tom Hanks (a Person node) and retrieve relevant knowledge (other Person and Movie nodes) through the various relationships connected to it. Some of the reasons are:

  • The logical model is the physical model in Neo4j
  • All labels (entity types), relationships and properties can be named in natural language
  • Labelled property graphs are highly flexible and can be used to model almost anything in the real world, and support a wide range of data structures, including hierarchical, network, and relational data.
  • Efficient querying: Labelled property graphs are optimized for querying and can perform complex graph traversals and filtering operations quickly and efficiently.
  • Expressiveness: Labelled property graphs are highly expressive and can capture complex relationships and patterns in data that might be difficult to represent in other data models, while keeping the model friendly for humans.

The Labelled Property Graph Model of Movies

Using the solution, I have tested some Q&As and here are the results:

Q: When was Tom Hanks born?
A: Tom Hanks was born in 1956.

Q: How many movies Tom Hanks has acted in?
A: Tom Hanks has acted in 13 movies.

Q: Did Tom Hanks act in The Matrix?
A: No, Tom Hanks did not act in The Matrix.

Q: Who else acted in the same movies that Tom Hanks acted?
A: Kevin Bacon, Bill Paxton, Parker Posey, Greg Kinnear, Meg Ryan, Steve Zahn, Madonna, Rosie O'Donnell, Geena Davis, Lori Petty, Nathan Lane, Liv Tyler, Charlize Theron, Julia Roberts, Philip Seymour Hoffman, and Rita Wilson.

Q: Who acted in the same movies that Tom Hanks acted for most times?
A: Meg Ryan.

Q: what is the weather today?
A: I don't know.

Q: Which director did  Tom Hanks work with most of the times?
A: Tom Hanks worked with Robert Zemeckis the most, having collaborated on Cast Away, The Polar Express, and That Thing You Do.

Q: What are movies Tom Hanks acted in before 2000, tell me all you have?
| movie | year |
| Joe Versus the Volcano | 1990 |
| A League of Their Own | 1992 |
| Sleepless in Seattle | 1993 |
| Apollo 13 | 1995 |
| That Thing You Do | 1996 |
| You've Got Mail | 1998 |
| The Green Mile | 1999 |

I have to say, the answers given by GPT-3 were impressive! They were simply extracted from the context, and Cypher translation is no longer needed. I even added an irrelevant question intentionally, i.e. “what is the weather today?”, and it responded with “I don’t know”, which is exactly what I expected.

Summary

GPT-3’s in-context learning capabilities can help overcome the challenges raised against it as a reliable source of truth. In-context learning works by:

  • Analyzing the input text (prompt), including any contextual information that may be relevant, such as up-to-date facts or the user’s history or preferences.
  • Extracting key features of the input text, including important words, phrases, and concepts.
  • Generating a response that is tailored to the specific context of the question or prompt, based on its analysis of the input text and its understanding of natural language.

By incorporating contextual information provided by querying relevant knowledge from a KG, GPT-3 can generate more accurate, personalized, and relevant responses.

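PS. If you are still wondering how to do step #1 (make an initial assessment of the question and decide the relevant subject(s)), the answer is GPT-3 again! Check the Entity & Relationship Extraction section of another post of mine, “Building An Academic Knowledge Graph with OpenAI & Graph Database - Part 2” (https://readmedium.com/building-an-academic-knowledge-graph-with-openai-graph-database-part-2-7986d3e5277b).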

This is the human version of In-context Learning!
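As a rough idea of what step #1 could look like when delegated to the model, the sketch below builds an extraction prompt and parses a comma-separated reply. The prompt wording, `build_extraction_prompt` and `parse_entities` are all assumptions made for illustration; they are not taken from the linked post, and no model is called here.

```python
from typing import List


def build_extraction_prompt(question: str) -> str:
    """Ask the model to name the subjects of a question (hypothetical wording)."""
    return (
        "List the people, movies or other named entities mentioned in the question "
        "below, separated by commas. If there are none, answer \"none\".\n\n"
        f"Question: {question}\n\n"
        "Entities:"
    )


def parse_entities(reply: str) -> List[str]:
    """Turn a comma-separated model reply into a list of entity names."""
    cleaned = reply.strip()
    if not cleaned or cleaned.lower().rstrip(".") == "none":
        return []
    return [part.strip() for part in cleaned.split(",") if part.strip()]


print(build_extraction_prompt("Did Tom Hanks act in The Matrix?"))
print(parse_entities(" Tom Hanks, The Matrix "))
```

The extracted names can then be passed as parameters (such as `$person`) to the retrieval query in step #2.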
