on't know, don't try to make up an answer.
"""</span>
    <span class="hljs-comment"># Prepend context if used</span>
    <span class="hljs-keyword">if</span> context != <span class="hljs-string">""</span>:
        question = <span class="hljs-string">"Use the following context to answer the user's question:\n```\n"</span> + context + <span class="hljs-string">"\n```\n\n"</span> + question
    <span class="hljs-comment"># Pre-1.0 openai SDK call (openai.ChatCompletion was removed in openai&gt;=1.0)</span>
    response = openai.ChatCompletion.create(
        engine=<span class="hljs-string">"gpt-35-turbo"</span>,
        messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: system}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: question}],
        temperature=<span class="hljs-number">0.0</span>,
        max_tokens=<span class="hljs-number">500</span>,
        top_p=<span class="hljs-number">0.95</span>,
        frequency_penalty=<span class="hljs-number">0</span>,
        presence_penalty=<span class="hljs-number">0</span>,
        stop=<span class="hljs-literal">None</span>)
    <span class="hljs-keyword">return</span> response[<span class="hljs-string">'choices'</span>][<span class="hljs-number">0</span>][<span class="hljs-string">'message'</span>][<span class="hljs-string">'content'</span>]</pre></div><p id="212c">This first one, <code>ask</code>, is a simple wrapper around a call to OpenAI’s GPT-3.5 Turbo, with a system prompt about looking through research papers. It also accepts a <code>context</code> variable, which is prepended to the question when it is not empty.</p><div id="6164"><pre><span class="hljs-keyword">def</span> <span class="hljs-title function_">extract_section</span>(<span class="hljs-params">documents, section_name, debug=<span class="hljs-literal">False</span></span>):
    <span class="hljs-comment"># section_page is the 1-based page number; both stay "" if nothing is found</span>
    section_page = <span class="hljs-string">""</span>
    section_text = <span class="hljs-string">""</span>
    <span class="hljs-keyword">for</span> idx, page <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(documents):
        <span class="hljs-comment"># Cheap keyword filter; skip further pages once a section has been extracted</span>
        <span class="hljs-keyword">if</span> section_text == <span class="hljs-string">""</span> <span class="hljs-keyword">and</span> section_name <span class="hljs-keyword">in</span> page.text.lower():
            <span class="hljs-keyword">if</span> debug: <span class="hljs-built_in">print</span>(idx)
            <span class="hljs-comment"># Context is this page plus the next two, when both exist</span>
            context = page.text
            <span class="hljs-keyword">if</span> idx < <span class="hljs-built_in">len</span>(documents)-<span class="hljs-number">2</span>:
                context += <span class="hljs-string">"\n"</span> + documents[idx+<span class="hljs-number">1</span>].text
                context += <span class="hljs-string">"\n"</span> + documents[idx+<span class="hljs-number">2</span>].text
            <span class="hljs-comment"># First prompt checks the section is really here; second extracts it verbatim</span>
            answer = ask(<span class="hljs-string">f"Does the above have the section called '<span class="hljs-subst">{section_name}</span>' or similar, and does it, in detail, explain the <span class="hljs-subst">{section_name}</span>?"</span>, context)
            <span class="hljs-keyword">if</span> answer.startswith(<span class="hljs-string">"Yes"</span>):
                answer = ask(<span class="hljs-string">f"\n-----\nWhat is the <span class="hljs-subst">{section_name}</span> in the document? Return everything in this section, up to the next heading. Do not interpret it, give me the verbatim text."</span>, context)
                <span class="hljs-keyword">if</span> debug: <span class="hljs-built_in">print</span>(answer + <span class="hljs-string">"\n----------"</span>)
                section_page = idx + <span class="hljs-number">1</span>
                section_text = answer
    <span class="hljs-keyword">if</span> debug: <span class="hljs-built_in">print</span>(section_page, section_text)
    <span class="hljs-keyword">return</span> section_text, section_page</pre></div><p id="f08f">In the <code>extract_section</code> function, we do a couple of things:</p><ol><li>We use the <code>section_name</code> we pass in for a very simple check: we iterate through all the pages in the document and see whether <code>section_name</code> appears in the lower-case version of each page’s text (so <code>section_name</code> should itself be lower case).</li><li>If it does, we take that page and the two pages after it and pass them into a couple of LLM prompts: the first asks whether they contain a section named <code>section_name</code>, and if so, the second extracts that section verbatim.</li><li>It returns a tuple of the section text and the page on which it was found. Only the first page that passes the check is used.</li></ol><p id="5b9a">Of course, this extraction is a one-time activity. In practice, you would run it once to extract the relevant sections and cache them for future use.</p><p id="eede">So let’s start building up a <code>sections</code> variable. For the first section I’m going to cheat a little and not use the <code>extract_section</code> function, because the section I want, <code>authors</code>, does not have a section heading. Instead, we just use the <code>ask</code> function and pass in the first page of the document.</p><div id="0aa4"><pre>sections = {}
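# Sketch (hypothetical helper, not in the original notebook): the cheap
# page-selection step that extract_section runs before any LLM call.
# Assumes each page has a .text attribute, as extract_section's page.text
# implies; plain strings also work, which keeps it easy to test offline.
# Unlike extract_section, it keeps a shorter window near the end of the
# document instead of dropping the following pages entirely.
def candidate_windows(pages, section_name, window=3):
    """Yield (page_number, context) for each page whose lower-cased text
    contains section_name; context joins that page with the next
    window - 1 pages. page_number is 1-based, like section_page."""
    texts = [getattr(page, "text", page) for page in pages]
    for idx, text in enumerate(texts):
        if section_name in text.lower():
            yield idx + 1, "\n".join(texts[idx:idx + window])

# Usage (hypothetical): list(candidate_windows(documents, "abstract"))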
sections[<span class="hljs-string">"authors"</span>] = (ask(<span class="hljs-string">"Who are the authors mentioned before the abstract"</span>, documents[<span class="hljs-number">0</span>].text), <span class="hljs-number">1</span>)
sections[<span class="hljs-string">"authors"</span>]</pre></div><figure id="2fc0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*iXz3LsCIKxj0qAHG2IV0bQ.png"><figcaption></figcaption></figure><p id="3aa2">OK, that looks good. Now let’s use the <code>extract_section</code> function to extract the <code>abstract</code> section.</p><div id="c468"><pre>sections[<span class="hljs-string">"abstract"</span>] = extract_section(documents, <span class="hljs-string">"abstract"</span>)
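# Sketch (hypothetical helpers, not in the original notebook): cache the
# extracted sections on disk so the LLM extraction only runs once per paper.
# JSON stores each (text, page) tuple as a list, so load_sections turns it
# back into a tuple. The cache path is your choice.
import json
from pathlib import Path

def save_sections(sections, path):
    Path(path).write_text(json.dumps(sections, indent=2), encoding="utf-8")

def load_sections(path):
    """Return the cached sections dict, or None if there is no cache yet."""
    cache = Path(path)
    if not cache.exists():
        return None
    raw = json.loads(cache.read_text(encoding="utf-8"))
    return {name: (text, page) for name, (text, page) in raw.items()}

# Usage (hypothetical path): save_sections(sections, "sections.json")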
sections[<span class="hljs-string">"abstract"</span>]</pre></div><figure id="4c09"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*AaGiiD9HRReA34Laz6ODCw.png"><figcaption></figcaption></figure><p id="e4af">OK, so let’s see whether what we’ve done is of any use.</p><p id="9052">First, let’s look at which license applies to this paper. We’ll start with the LlamaIndex search:</p><div id="601a"><pre>%%<span class="hljs-built_in">time</span>
query = <span class="hljs-string">'What licenses are mentioned?'</span>
<span class="hljs-built_in">print</span>(query)
answer = query_engine.query(query)
<span class="hljs-built_in">print</span>(answer.response)</pre></div><figure id="5b74"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*LjefM-QYa773vVlv_M9JVg.png"><figcaption></figcaption></figure><p id="f59c">Oh, that’s a little disappointing. It couldn’t find anything.</p><p id="67c9">What if we use just the abstract section?</p><div id="c085"><pre>%%time
ask(query, sections[<span class="hljs-string">"abstract"</span>][<span class="hljs-number">0</span>])</pre></div><figure id="3cee"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*QqBO27pBanGBmPf-89Sz-w.png"><figcaption></figcaption></figure><p id="ebe0">That looks good. Not only did it get the right answer, it was also quicker, because the prompt contains only the section of interest rather than the top-k chunks that the semantic search judged relevant.</p><p id="ef28">OK, another quick check. Let’s ask a question about an author. Jacob Austin is an author of one of the papers in the References, but not of this paper. So asking whether he is an author of this paper should return no, right?</p><div id="ae97"><pre>%%<span class="hljs-built_in">time</span>
query = <span class="hljs-string">'Is Jacob Austin an author of this paper?'</span>
<span class="hljs-built_in">print</span>(query)
answer = query_engine.query(query)
<span class="hljs-built_in">print</span>(answer.response)</pre></div><figure id="8ee4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2Bvv0A6CRjdrd-SbEVil9Q.png"><figcaption></figcaption></figure><p id="a977">OK, that’s a little odd. It thinks he is an author of this paper, because the semantic search found his name in a list of authors but did not distinguish a paper in the References from this paper itself.</p><p id="f0a0">What about using the sections specifically?</p><div id="2ffe"><pre>%%time
ask(query, sections[<span class="hljs-string">"authors"</span>][<span class="hljs-number">0</span>])</pre></div><figure id="5d52"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*JdCeCo313dTBVHVmAL0HqQ.png"><figcaption></figcaption></figure><p id="a781">Well, of course that works: it correctly recognises that he is not an author of this paper. And, as before, it’s quicker, because the prompt contains only the section of interest.</p><p id="5ce2">I personally use this approach a fair bit. I’m not saying it’s better; I’m saying it’s simpler and more balanced: an alternative approach, and another tool in your arsenal.</p><p id="dfff">So, how about it? KISS and BRAG?</p><p id="4a3a">Thanks for reading.</p></article></body>