

Tiny-Vicuna-1B is the lightweight champion of the Tiny Models

Command and Conquer: the smallest Vicuna flavor is the Tiny Master of Instruction, answering your every call (flawlessly!)

Forget the hulking behemoths of the AI arena, lumbering through processing power like a tank through mud. The battlefield has shifted, and the reigning champion isn’t Goliath, but David — in this case, the tiny and mighty Tiny-Vicuna-1b.

In a world obsessed with ever-larger language models, the quest for micro warriors that can dance on mobile devices and still pack a linguistic punch has become the new holy grail. And Tiny-Vicuna-1B, the smallest flavor of Vicuna, stands triumphant: a Tiny Master of Instruction, ready to answer your every command (flawlessly!) without weighing down your devices.

But will it be good enough? What can we do in real-life scenarios with a model this small?

Let’s have a look together!

This is the final Application running our Tiny Champion

The promises

The past year saw a general trend towards giant language models: later in 2023 it was also leaked (no big surprise, as I mentioned in several articles…) that GPT-4 is a Mixture of Experts (MoE) model.

So it seems that if you want to do anything with AI you need the computational power of IBM or NASA: how could you run 8 models of 220 billion parameters each, for a monstrous effective size of 1.76 trillion parameters?

Against transformers that guzzle terabytes like water, nano models (like the newborn Gemini Nano), tiny models (like the TinyLlama family) and Microsoft Phi-1 and Phi-2 are rewriting the rules.

All these players share one goal: producing models small enough to fit the hardware constraints of mobile devices and good enough to be useful in business.

Tiny-Vicuna-1B

Don’t be fooled by the name: Tiny-Vicuna is in fact a Llama model (then again, Vicuna has always been an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT).

This small model builds on the TinyLlama project, which aims to pretrain a 1.1B Llama model on 3 trillion tokens with some careful optimizations. Since Tiny-Vicuna-1B is TinyLlama 1.1B fine-tuned on the WizardVicuna dataset, it was named Tiny-Vicuna!

This lightweight champion proves that size doesn’t matter when it comes to brains. It sprints through NLP tasks with the grace of a cheetah, leaving bulkier models gasping in its dust. Think flawless summarization and precise question answering.

In its quantized version it requires no more than 700 MB of RAM! Tiny-Vicuna-1B whispers promises of a future where AI isn’t confined to data centers but lives in your pocket, ready to be your personal language ninja at a moment’s notice.

So, buckle up, fellow AI adventurers, because we’re about to dive into the fascinating world of Tiny-Vicuna-1B-GGUF.

Tiny-Vicuna-1B-GGUF on the test bench

We will use the quantized GGUF model files of Jiayi-Pan's Tiny-Vicuna-1B. The main reason is that, even though it is a barely 1.1-billion-parameter model, running it on CPU in full precision would require almost 10 GB of RAM.
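As a back-of-the-envelope check on why quantization matters, the memory taken by the weights alone is roughly parameters × bits per weight ÷ 8. The sketch below assumes nominal bit widths (real llama.cpp k-quant files such as q5_k_m are slightly larger because they also store block scales) and ignores the KV cache, activations and other runtime buffers, so the RAM you actually observe will be higher than these figures.

```python
# Back-of-the-envelope estimate of the memory taken by model weights ONLY.
# Assumptions: nominal bit widths; GGUF k-quants add some overhead for
# block scales, and runtime buffers (KV cache, activations) come on top.

N_PARAMS = 1.1e9  # TinyLlama / Tiny-Vicuna parameter count


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the size of the weights in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("fp32", 32), ("fp16", 16), ("8-bit", 8), ("5-bit", 5), ("4-bit", 4)]:
        print(f"{label:>6}: {weights_gb(N_PARAMS, bits):.2f} GB")
```

At a nominal 5 bits this gives about 0.69 GB for 1.1 billion parameters, in line with the ~700 MB quoted above for the quantized model.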

According to the original model card…

This model is a fine-tuned version of TinyLlama on WizardVicuna Dataset. It should be fully compatible with Vicuna-v1.5 series.

As a bonus… this model is under the Apache 2.0 license, and that means that 🥳

The Apache software license gives users permission to reuse code for nearly any purpose, including using the code as part of proprietary software. As with other open source licenses, the Apache license governs how end-users can utilize the software in their own projects. This license is a widely-used open source license, and like other permissive licenses, it continues to grow in popularity because it encourages the use of open source software within proprietary projects.

Required packages

We don’t need many dependencies. You can find all the code in my GitHub repo.

Create a virtual environment in a new directory and activate it:

mkdir TinyVicuna
cd TinyVicuna

python3.10 -m venv venv  # I am using Python 3.10
python -m venv venv      # if you are on Windows

# to activate the virtual environment
source venv/bin/activate  # on macOS/Linux
venv\Scripts\activate     # on Windows

At a minimum, since we also have a Gradio UI, we need to install the following pip packages:

pip install llama-cpp-python
pip install gradio
pip install psutil
pip install plotly

The last two dependencies are only used to show CPU/RAM usage statistics during inference. I think it is good to see all of this with your own eyes 😉.
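Before launching the app, a quick sanity check can confirm that everything is importable. Note that the import name is not always the pip name: llama-cpp-python is imported as llama_cpp. The helper below is a hypothetical convenience (not part of the author's repo) that only uses the standard-library importlib.

```python
import importlib.util
from typing import Iterable, List

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "llama-cpp-python": "llama_cpp",
    "gradio": "gradio",
    "psutil": "psutil",
    "plotly": "plotly",
}


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES.values())
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages like gradio.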

Next, download the GGUF file into the same directory. You can choose the quantization method, but do not go below q4. In my case I chose the q5 version: a little heavier, but with minor quality loss. You can download the files from the afrideva/Tiny-Vicuna-1B-GGUF repository on Hugging Face.

The core code for running the model lives in two sections of the Python file, explained below. You can also download the Python file directly from the repo.

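from llama_cpp import Llama

modelfile = "./tiny-vicuna-1b.q5_k_m.gguf"
contextlength = 2048
stoptoken = '<s>'

# Example generation settings (illustrative values; in the app they come from the Gradio UI)
prompt = "USER: What is Science? ASSISTANT:"
max_new_tokens = 512
temperature = 0.3
repeat_penalty = 1.2
top_p = 0.8

################ LOADING THE MODELS  ###############################
# Set n_gpu_layers to the number of layers to offload to GPU.
# Leave it at 0 (the default) if no GPU acceleration is available on your system.
####################################################################
llm = Llama(
  model_path=modelfile,  # Download the model file first
  n_ctx=contextlength,  # The max sequence length to use - note that longer sequence lengths require much more resources
  #n_threads=2,            # The number of CPU threads to use, tailor to your system and the resulting performance
  #n_gpu_layers=0,         # Number of layers to offload to the GPU
)

######### INFERENCE #######################
response = llm(prompt,
                max_tokens=max_new_tokens,
                stop=['Q:', stoptoken],  # llama.cpp also stops on the model's end-of-sequence token
                temperature=temperature,
                repeat_penalty=repeat_penalty,
                top_p=top_p,
                echo=False)  # do not repeat the prompt in the output
print(response["choices"][0]["text"])  # response is a dict; this is the generated text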

We import the Llama class from llama-cpp-python (the Python bindings for llama.cpp) and instantiate it as the llm variable. As you can see, we need to pass very few parameters here: the model path (which includes the GGUF filename) and the context window (n_ctx).
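Because n_ctx caps the prompt and the reply together, it helps to make sure that the prompt plus the requested max_tokens fits in the window before calling the model. A minimal sketch, assuming you already know the prompt's token count (with llama-cpp-python, `len(llm.tokenize(prompt.encode("utf-8")))` gives it); the function names are hypothetical.

```python
def max_reply_tokens(prompt_tokens: int, n_ctx: int = 2048) -> int:
    """Tokens left for the reply once the prompt occupies part of the context window."""
    if prompt_tokens < 0 or n_ctx <= 0:
        raise ValueError("prompt_tokens must be >= 0 and n_ctx must be > 0")
    return max(n_ctx - prompt_tokens, 0)


def clamp_max_tokens(requested: int, prompt_tokens: int, n_ctx: int = 2048) -> int:
    """Shrink the requested max_tokens so that prompt + reply fit in n_ctx."""
    return min(requested, max_reply_tokens(prompt_tokens, n_ctx))


if __name__ == "__main__":
    # A hypothetical 500-token prompt with the 2048-token window used in this article
    print(clamp_max_tokens(requested=2000, prompt_tokens=500, n_ctx=2048))
```

The result can then be passed as max_tokens in the inference call, so long prompts automatically leave less room for the answer instead of overflowing the window.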

NOTE 1: if you are running on Windows, the modelfile path does not need the leading ./, so it can simply be modelfile = "tiny-vicuna-1b.q5_k_m.gguf"
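One way to sidestep the ./ question on every operating system is to build the path with pathlib. This is a sketch, not the author's code; it assumes the GGUF file sits in the current working directory, and the helper name model_path is hypothetical.

```python
from pathlib import Path
from typing import Optional, Union

MODEL_FILENAME = "tiny-vicuna-1b.q5_k_m.gguf"


def model_path(filename: str = MODEL_FILENAME,
               base_dir: Optional[Union[str, Path]] = None) -> str:
    """Join the model filename to a directory using the OS-specific separator."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return str(base / filename)


if __name__ == "__main__":
    path = model_path()
    print(path, "(found)" if Path(path).is_file() else "(not found)")
```

The result can be passed straight to the constructor, e.g. `Llama(model_path=model_path(), n_ctx=2048)`.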

NOTE 2: every model is trained with a specific context window. If it is not mentioned in the model card, you can always check the terminal console log while loading the model for the first time.

log from llama.cpp while loading the model
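If you prefer to read the value programmatically rather than scroll the console, the trained context length is stored in the GGUF header metadata, under a key such as llama.context_length for Llama-architecture models. Below is a minimal, standard-library-only reader sketch, assuming the GGUF v2/v3 layout (little-endian, 64-bit string and array lengths). It is not the author's code; the gguf Python package maintained alongside llama.cpp is the more robust option.

```python
import io
import struct
from typing import Any, BinaryIO, Dict

# GGUF scalar value types mapped to little-endian struct formats (GGUF v2/v3)
_SCALAR_FORMATS = {
    0: "<B", 1: "<b",    # UINT8, INT8
    2: "<H", 3: "<h",    # UINT16, INT16
    4: "<I", 5: "<i",    # UINT32, INT32
    6: "<f", 7: "<?",    # FLOAT32, BOOL
    10: "<Q", 11: "<q",  # UINT64, INT64
    12: "<d",            # FLOAT64
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # uint64 length followed by UTF-8 bytes
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Parse the key/value metadata section at the start of a GGUF file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read(f, "<I") < 2:
        raise ValueError("GGUF v1 (32-bit lengths) is not handled by this sketch")
    _read(f, "<Q")             # tensor count (not needed here)
    kv_count = _read(f, "<Q")  # number of metadata entries
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        metadata[key] = _read_value(f, _read(f, "<I"))
    return metadata


def parse_gguf_bytes(data: bytes) -> Dict[str, Any]:
    """Convenience wrapper for an in-memory GGUF header."""
    return read_gguf_metadata(io.BytesIO(data))


def trained_context_length(metadata: Dict[str, Any]) -> Any:
    """Look up '<architecture>.context_length', e.g. 'llama.context_length'."""
    arch = metadata.get("general.architecture", "llama")
    return metadata.get(f"{arch}.context_length")
```

On a real model you would open the file in binary mode, e.g. `with open(modelfile, "rb") as f: print(trained_context_length(read_gguf_metadata(f)))`. The whole metadata section is parsed, tokenizer vocabulary included, so it can take a moment.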

You can read more about Model Cards in this article

Model Cards Is All You Need

Exploring Hugging Face Model Cards: Your All-in-One Guide to AI Mastery — Read, Craft, Deploy!

blog.stackademic.com

Then we run inference by calling llm with additional parameters. The only one really required is the prompt; the others are optional.
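Keep in mind that the call does not return a plain string: llama-cpp-python returns an OpenAI-style completion dictionary, with the generated text under choices[0]["text"] and token counts under usage. A small hypothetical helper to unpack it, exercised here on a hand-written dictionary of that shape so it runs without the model:

```python
from typing import Any, Dict, Tuple


def unpack_completion(response: Dict[str, Any]) -> Tuple[str, str, int]:
    """Return (text, finish_reason, completion_tokens) from a completion dict."""
    choice = response["choices"][0]
    text = choice["text"].strip()
    finish_reason = choice.get("finish_reason") or "unknown"
    completion_tokens = response.get("usage", {}).get("completion_tokens", 0)
    return text, finish_reason, completion_tokens


if __name__ == "__main__":
    # Hand-written example with the same shape as the dict returned by llm(...)
    fake_response = {
        "choices": [{"text": " Science is the study of ...", "index": 0,
                     "logprobs": None, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
    }
    text, reason, n_tokens = unpack_completion(fake_response)
    print(f"{text!r} (finish_reason={reason}, {n_tokens} tokens)")
```

finish_reason is "length" when max_tokens cut the reply short and "stop" when a stop string or the end-of-sequence token ended it, which is a handy signal for spotting truncated answers.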

Note that the prompt must follow the specific format the model was trained with. If you are unsure and nothing is mentioned in the model card (as in this case), you can look it up in the LLM database on Hardware Corner, an amazing resource.

screenshot from the Hardware corner LLM section

We search for Vicuna v1.5 because the original model card states that "This model is a fine-tuned version of TinyLlama on WizardVicuna Dataset. It should be fully compatible with Vicuna-v1.5 series."

Selecting the model, you can verify the expected prompt format.

screenshot from https://www.hardware-corner.net/llm-database/Vicuna/

So our prompt template is as follows (note that the real template has no newlines; I added them here for readability):

A chat between a curious user and an artificial intelligence assistant. 
The assistant gives helpful, detailed, and polite answers to the 
user's questions. USER: {prompt} ASSISTANT:

So, in the code, our prompt can be something like this:

user_question = "What is Science?"
prompt = f"USER: {user_question} ASSISTANT:"
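The example above passes only the USER/ASSISTANT part of the template. To send the full template, system message included, a small hypothetical helper can assemble it on a single line, as the format requires:

```python
SYSTEM_MESSAGE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(question: str, system: str = SYSTEM_MESSAGE) -> str:
    """Assemble a single-line Vicuna v1.5-style prompt: '<system> USER: <question> ASSISTANT:'."""
    return f"{system} USER: {question.strip()} ASSISTANT:"


if __name__ == "__main__":
    print(build_vicuna_prompt("What is Science?"))
```

The prompt ends with "ASSISTANT:" so that the model's continuation is the assistant's answer.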

Real-life Use Cases

OK then, but how good can this little model possibly be?

In a previous article I compared the performance of many small LLMs, and the GPT series was not that good.

Does SIZE (of LLMs) Matter?

Let’s compare the performance and capabilities of Tiny LLM and verify when too Small is really too much. — Part 1

medium.com

Since the base model is still TinyLlama-1.1B, we should not expect too much… but here is the catch!

The WizardVicuna dataset really boosted the capabilities of the base model. Let’s explore three mainstream use cases:

1. General questions
2. Extract information from a context (RAG)
3. Format the output following the instructions

1. General questions

What is science?

Science is the study of knowledge through observation, experimentation, analysis, and interpretation. It involves using logical reasoning, critical thinking, and empirical evidence to understand phenomena in the natural world or human experience.

Write a daily schedule for a busy professional man. Describe a typical daily routine from morning to bed time. Consider he is a man with a wife and kids.

Note: here I had to be more specific about what I wanted, because asking only for a "daily routine" gave me correct but inconsistent answers (a never-ending list).

 Morning Routine: 
- Wake up at around 6am, shower and get dressed.
- Have breakfast (breakfast can be anything from eggs, bacon or pancakes) and coffee/tea.
- Check emails and work on important tasks for the day.
- Take a quick walk outside to clear his mind before starting work.
- Start working on his daily routine: 
   - Opening email and answering any urgent messages.
   - Creating to-do lists for the day's activities.
   - Checking social media accounts for updates or breaking news.
   - Reviewing important documents/files.
   - Making phone calls if necessary.
   - Attending meetings or appointments as needed.
   - Taking breaks and stretching to avoid burnout.
- Lunch: 
   - Eating a light lunch, such as sandwiches or salads with hummus or guacamole.
   - Drinking water or tea/coffee.
- Afternoon Routine: 
   - Working on any important tasks from the morning's work.
   - Checking emails and answering urgent messages.
   - Taking breaks to stretch, walk around, or do some light exercise.
   - Spending time with family/friends if possible.
- Evening Routine: 
   - Watching TV or Netflix before going to bed.
   - Reading a book or watching an episode of his favorite show.
   - Taking a warm bath or reading a book while relaxing in bed.
- Bedtime Routine: 
   - Winding down by taking a hot bath, reading a book, and listening to soothing music.
   - Going back to sleep with the lights off.

Your prompt laboratory — from Lexica.art

2. Extract information from a context (RAG)

I took the transcript of a YouTube video and used it as context to answer user questions.

NOTE: this method also applies to chatbots that must answer only from specific data, without using external knowledge.

Reply to the question only using the provided context.
[context]WHAT IS HIERARCHY 4.0? whether you own build manage maintain or operate an oil plant inevitably issues arise that require immediate action and resolution. With big data flowing in constantly from all sectors making sense of everything while troubleshooting
issues without wasting time can be a huge challenge.  So what's the solution?
introducing hierarchy 4.0 and Innovative software solution for control Safety Systems . Hierarchy 4.0 presents an interactive diagram of the entire plant revealing cause and effect Behavior with readings provided in a hierarchical view allowing for a deep understanding of the system's strategy
All data is collected from multiple sources visualized as a diagram and optimized through a customized dashboard allowing users to run a logic simulation from live data or pick a moment from their history.  Your simulation is based on actual safety Logics not just on a math model
Now every users can prepare an RCA report 90 percent faster in just a few minutes. Hierarchy can be used for any project phase starting from engineering to commissioning and up to operation and maintenance while supporting hazop Hazard analysis by reducing human error and avoiding incorrect documentation. Hierarchy 4.0 supports plant operators in decision making taking into account both the safety and the operability of their assets.  Hierarchy 4.0 Embraces a block log approach: it automatically calculates all Logics affected by an
override and gives a full understanding of constraints.  Not convinced let's look at the data!
During its first project hierarchy 4.0 prevented a revenue loss of 45 million dollars.  Plants that utilize hierarchy 4.0 save up to 95 of their time and deliver a return on investment up to five times in value from day one and experience a Personnel utilization and plant efficiency increase by 20 percent per year.Try our demo and make the move to hierarchy 4.0 today[end of context]

Question: what is Hierarchy 4.0?
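The prompt above can be assembled programmatically. A minimal sketch, assuming the same instruction line, `[context]`/`[end of context]` markers and `Question:` label shown above (`rag_prompt` is a hypothetical helper); how the article then wraps this text into the Vicuna template is not shown, so treating the whole block as the `{prompt}` slot is an assumption:

```python
def rag_prompt(context: str, question: str) -> str:
    """Instruction, delimited context, then the question, in the layout shown above."""
    return (
        "Reply to the question only using the provided context.\n"
        f"[context]{context}[end of context]\n\n"
        f"Question: {question}"
    )


print(rag_prompt("Hierarchy 4.0 is an innovative software solution...", "what is Hierarchy 4.0?"))
```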

The model processed the 500-token prompt in 40 seconds and gave this reply 👍🥳:

Hierarchy 4.0 is an innovative software solution for control safety systems that provides a deep understanding of the system’s strategy through interactive diagrams and optimized dashboards. It allows users to run a logic simulation from live data or pick a moment from their history, reducing human error and avoiding incorrect documentation. The RCA report can be prepared in just a few minutes using Hierarchy 4.0’s block log approach. This supports plant operators in decision making taking into account both the safety and operability of their assets.
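As a rough back-of-the-envelope check, the figures above (about 500 prompt tokens in 40 seconds) correspond to 12.5 tokens per second. The sketch below assumes prompt-processing time grows roughly linearly with prompt length, which is only an approximation, and uses hypothetical prompt sizes of 250 and 350 tokens:

```python
def prompt_eval_rate(tokens: int, seconds: float) -> float:
    """Tokens processed per second during prompt evaluation."""
    return tokens / seconds


def estimated_seconds(tokens: int, rate: float) -> float:
    """Rough processing time, ASSUMING time scales linearly with prompt length."""
    return tokens / rate


rate = prompt_eval_rate(500, 40)  # figures reported above: 12.5 tokens/s
for size in (250, 350):
    print(size, estimated_seconds(size, rate))
# 250 20.0
# 350 28.0
```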

For precise, factual RAG we usually use smaller chunks (250/350) retrieved by a similarity search on a database, so the speed will be good enough.

And, more importantly, the information is correct!

⚠️ Remember to restrict the chatbot in your system prompt, so that it answers only from the provided knowledge and says “I don’t know” in all other cases.
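One way to phrase that restriction, as a sketch: the wording of `RESTRICTED_SYSTEM` is illustrative, not taken from the article. The hypothetical `guarded_answer` helper adds a different, complementary technique that the article does not describe: when the best retrieval score is below a threshold you choose, it returns the fallback without calling the model at all:

```python
from typing import Callable

FALLBACK = "I don't know"

# Illustrative system sentence (not the author's wording); it replaces the default
# Vicuna system sentence in front of "USER: ... ASSISTANT:".
RESTRICTED_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant answers only using the provided context. "
    f'If the answer is not in the context, the assistant replies "{FALLBACK}".'
)


def guarded_answer(best_score: float, threshold: float, generate: Callable[[], str]) -> str:
    """Skip the model when retrieval found nothing relevant enough (threshold is a tuning choice)."""
    if best_score < threshold:
        return FALLBACK
    return generate()
```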

A good model is able to follow instructions — Lexica.art image

3. Format the output following the instructions

Here I use two main tasks: summarizing a text, and asking the LLM to extract the main topics of a text, formatted as a list.

The prompt is quite simple, but I had to try several versions before finding one that returned only the summary, without any introductory words…

[text]... insert here your text...
... the move to hierarchy 4.0 today[end of text]

- SUMMARY:
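The summary prompt layout above can be produced by a small helper. Since getting a reply without introductory words took several attempts, a hypothetical post-processing step (`strip_intro`, not part of the article) can also remove a leading "Here is a summary:" style phrase if one slips through. A sketch:

```python
import re


def summary_prompt(text: str) -> str:
    """Reproduce the layout above: delimited text, then an open '- SUMMARY:' cue."""
    return f"[text]{text}[end of text]\n\n- SUMMARY:"


# Hypothetical safety net: drop one leading "Sure, here is ...:" style phrase, if present.
_INTRO = re.compile(r"^\s*(sure[,!.]?\s*)?(here is|here's)[^\n:]*:\s*", re.IGNORECASE)


def strip_intro(reply: str) -> str:
    return _INTRO.sub("", reply, count=1).strip()


print(strip_intro("Here is a summary of the text: Hierarchy 4.0 is an innovative software solution."))
# Hierarchy 4.0 is an innovative software solution.
```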

It did a good job here!

Hierarchy 4.0 is an innovative software solution for control safety systems that provides a deep understanding of the system’s strategy, visualizes data in a hierarchical view, and supports plant operators in decision making taking into account both the safety and operability of their assets. It also saves time and delivers a return on investment by preventing revenue loss due to errors or overlooked hazards.

For the second task, too, I had to find the right wording and order to get a markdown list as output. The model followed the instructions correctly.

Write the main points of the following text.
[text]WHAT IS HIERARCHY 4.0? whether you ... the move to 
hierarchy 4.0 today[end of text]

format the output as a markdown list.
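A sketch that rebuilds this prompt (instruction first, then the delimited text, then the formatting request) and checks whether a reply really is a markdown list. Both helpers are hypothetical; `parse_markdown_list` accepts `-`, `*`, `+` and numbered items and raises `ValueError` on any other non-empty line, which makes it easy to detect a bad reply and retry with a different wording:

```python
import re
from typing import List

# A bullet ("-", "*", "+") or numbered ("1." / "1)") item followed by its text.
_BULLET = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+(.*\S)\s*$")


def main_points_prompt(text: str) -> str:
    """Instruction, delimited text, then the formatting request, in that order."""
    return (
        "Write the main points of the following text.\n"
        f"[text]{text}[end of text]\n\n"
        "format the output as a markdown list."
    )


def parse_markdown_list(reply: str) -> List[str]:
    """Return the list items; raise ValueError if a non-empty line is not a list item."""
    items = []
    for line in reply.splitlines():
        if not line.strip():
            continue
        match = _BULLET.match(line)
        if match is None:
            raise ValueError(f"not a list item: {line!r}")
        items.append(match.group(1))
    return items
```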

And, as expected, the result is good!

- Hierarchy 4.0 is an innovative software solution for control safety systems that provides a deep understanding of the system's strategy through interactive diagrams, optimized with a customized dashboard and supported by RCA reports.
- All data is collected from multiple sources visualized as a hierarchical view allowing users to run a logic simulation from live data or pick a moment from their history.
- Your simulation is based on actual safety logics not just on a math model, making it faster and more accurate for any project phase starting from engineering to commissioning and up to operation and maintenance.
- Hierarchy can be used for any project phase starting from engineering to commissioning and up to operation and maintenance while supporting hazop Hazard analysis by reducing human error and avoiding incorrect documentation.
- Hierarchy 4.0 supports plant operators in decision making taking into account both the safety and the operability of their assets, with a block log approach that calculates all Logics affected by an override automatically.
- Not convinced let's look at the data! During its first project hierarchy 4.0 prevented a revenue loss of 45 million dollars, saving up to 95 of their time and delivering a return on investment up to five times in value from day one and experiencing a Personnel utilization and plant efficiency increase by 20 percent per year.
- Try our demo and make the move to hierarchy 4.0 today!

NOTE: To my surprise, the list generation was accurate and always on topic.

Test it yourself

To run the app with the same Gradio user interface I used:

prepare the virtual environment and activate it
install the dependencies as per the Required Packages section above
download the q5 GGUF model file from here
download the python file from here
fix line 39 as described above

# on Windows use only "tiny-vicuna-1b.q5_k_m.gguf" (without the "./" prefix)
39|  modelfile = "./tiny-vicuna-1b.q5_k_m.gguf"
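Instead of editing line 39 by hand for each operating system, one alternative (not what the article's script does) is to build the path with `pathlib`, which uses the correct separator on every OS. A sketch, assuming the script is launched from the project directory where the GGUF file was saved; `resolve_model_path` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional, Union

MODEL_FILENAME = "tiny-vicuna-1b.q5_k_m.gguf"


def resolve_model_path(filename: str = MODEL_FILENAME,
                       base_dir: Optional[Union[str, Path]] = None) -> str:
    """Build an absolute, OS-appropriate path to the model file and check that it exists."""
    # ASSUMPTION: by default the file sits in the current working directory.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return str(path)
```

Line 39 would then read `modelfile = resolve_model_path()`, and a missing download fails with a clear message instead of an error from the model loader.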

Save the file and run this command in your terminal from the project directory:

python 40-vicuna1B_PG_MEM.py

Your default browser will open a new tab with the Gradio interface.

You can adjust the AI parameters on the left. You can also leave comments and notes…

To close the application:

close the browser tab
press Ctrl+C in the terminal window

Conclusions

Lately it may seem that I am focusing too much on Small Language Models. And it is true.

But if you look around, there are tons of resources about private/paid AI services and very few for common users. This is really frustrating, considering how MANY open-source models are freely available…

The best SOTA (State Of The Art) benchmark will always be you! I mean, we know what we want to build, and we need a digital partner good enough to do a good job.

Hope you enjoyed the article. If this story provided value and you wish to show a little support, you could:

Clap a lot of times for this story
Highlight the parts more relevant to be remembered (it will be easier for you to find it later, and for me to write better articles)
Learn how to start to Build Your Own AI, download This Free eBook
Sign up for a Medium membership using my link — ($5/month to read unlimited Medium stories)
Follow me on Medium
Read my latest articles https://medium.com/@fabio.matricardi

All the code and screenshots are in my GitHub Repository

GitHub - fabiomatricardi/KingOfTheTiny: Repo of the code from the Medium article about TinyVicuna1B…


github.com

If you want to read more, here are some ideas using Small Language Models and the Hugging Face Hub:

A Hitchhiker Guide to LLM with Hugging Face.

12 things I wish I knew before starting to work with Hugging Face LLM

generativeai.pub

Past, Present, Perfect: Flan-T5 Stands the Test of Time

Fast, Flawless, Forever. Flan-T5 outsmart the Giants and flexes text prowess on everyday hardware.

medium.com

The LLM Revolution: Dethroning GPUs and Redefining AI’s Future

From Efficiency Concerns to Architectural Bottlenecks, Why the LLM Landscape is Shifting

ai.gopubby.com

Stackademic

Visit Stackademic.com to find out more about how we are democratizing free programming education around the world.