1*uk9fTHbJDS8EQGLRv5DyWw.png"><figcaption>A robot working in a restaurant, by author, generated by Midjourney</figcaption></figure><p id="f903"><b>In other words:</b> Drawing on the robot’s sensors, the language model should be able to assess the situation, ask the right questions, and act towards resolving any task at hand!</p><p id="c981" type="7">For this experiment, Google relied on a 540-billion-parameter language model called PaLM to develop a one-of-a-kind butler robot!</p><h1 id="c12b">Task-Oriented Algorithm: Input Analysis and Real-Time Decision Making</h1><p id="71d7">So what’s new? Can we simply connect outside sensors to an existing language model?</p><p id="b390">The answer is obviously no. We need to bring together language models and the physical world. And that’s SayCan’s mission!</p><p id="0409"><b>As a matter of fact, the SayCan algorithm bridges the gap between large language models (LLMs) and robotic systems </b>by providing a way to connect the high-level semantic knowledge represented in LLMs to real-world tasks that can be executed by robots.</p><p id="ec01">This is achieved by combining</p><ul><li>the outputs of the LLM</li><li>with the output of value functions, which provide a measure of how likely a particular task is to succeed from the current state.</li></ul><figure id="121f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*UBwGOudheQjarCV2FSo6PA.png"><figcaption>SayCan combines the LLM’s output with a value function that estimates how likely each potential action is to succeed.</figcaption></figure><p id="92f1">Thus, SayCan brings several advantages over traditional language models.</p><ul><li><b>Firstly, it provides a way to carry out high-level natural language instructions in a real-world environment. </b>Traditional language models can interpret and generate natural language text, but they lack the ability to execute those instructions in the real world. 
SayCan enables this by using the output of the LLM to inform decision-making and guide the execution of tasks by a robot.</li><li><b>Secondly, SayCan allows robots to operate in a more autonomous and flexible manner.</b> By combining the output of the LLM with value functions, SayCan can select tasks that are both feasible and contextually appropriate, so the robot can adapt as its situation changes.</li><li><b>Finally, SayCan provides a way to make the decision-making process of robots more interpretable and transparent</b>. SayCan’s visualizations of the decision-making process show how much the LLM and the value functions each contribute to a choice, which helps to build trust in autonomous systems.</li></ul><h1 id="9f57">Experimental Results</h1><p id="b3a3">In benchmarking experiments, SayCan has demonstrated impressive results in both</p><ul><li>a real office kitchen,</li><li>and a mock office kitchen,</li></ul><p id="2069">where it successfully planned and executed complex tasks specified by natural language instructions.</p><p id="c20d"><b>For example</b>, <b>when given the task “I spilled my coke, can you bring me something to clean it up?” SayCan planned and executed the steps to find a sponge, pick it up, bring it to the user, and complete the task.</b></p><figure id="b6ca"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*FeIjSe2U8QNsDKmwktUUuA.png"><figcaption>Given the task “I spilled my coke, can you bring me something to clean it up?”, SayCan successfully planned and executed the following steps: 1. Find a sponge 2. Pick up the sponge 3. Bring it to you 4. Done.</figcaption></figure><p id="9467">Similarly, when given the slightly different task “I spilled my coke, can you bring me a replacement,” SayCan successfully planned and executed the steps to find a coke can, pick it up, bring it to the user, and complete the task.</p><p id="a3a0"><b>To be fair, the robot is still slow, but for a first experiment it does look promising!</b></p><p id="70ca">This is still an early experiment and you can follow its updates <a href="https://say-can.github.io/">here</a>.</p><h1 id="0452">Conclusion</h1><p id="6a45">The convergence of robotics and large language models is a key development in the field of artificial intelligence, and it has the potential to revolutionize the way robots interact with humans.</p><p id="4c02">With algorithms like SayCan, we are now seeing the first glimpses of a new generation of robots that can understand high-level, temporally extended instructions and complete complex tasks in real-world environments.</p><p id="4222">The future of robotics and AI is exciting, and it will be fascinating to see what other breakthroughs lie ahead.</p><p id="31f1"><b>That being said, the range of applications is very wide, and some of them could be dangerous. 
It’s important that we think about safeguards limiting robots’ ability to harm humans.</b></p><p id="b640">Read more about robots and LLMs as well as potential safeguards to protect humans:</p><div id="978f" class="link-block">
<a href="https://readmedium.com/asimovs-3-laws-of-robotics-and-sparrow-s-23-laws-of-language-models-38aea9bc32ca">
<div>
<div>
<h2>Asimov’s 3 laws of Robotics and Sparrow’s 23 laws of Language Models</h2>
<div><h3>Asimov’s famous “Three Laws of Robotics” were created as a safeguard against the potential dangers of sentient robots.</h3></div>
<div><p>medium.com</p></div>
</div>
<div>
<div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*4bK1eO2VU25HSpW7l_gWIg.png)"></div>
</div>
</div>
</a>
</div><p id="985c"><i>If you liked this post, please consider supporting us: 🔔 <b>clap </b>& <b>follow 🔔</b></i></p></article></body>
Connecting Robots and Large Language Models, Google’s SayCan
The convergence of robotics and large language models is a new frontier in the field of artificial intelligence, and it holds immense promise for the future.
You can read more about recent advances in robotics and Large Language Models in my previous blog post.
Today’s blog post revisits a recent research paper that went largely unnoticed. With the development of systems like Google’s SayCan, we are now seeing the first glimpses of a new generation of robots that can interact naturally with humans and understand high-level, temporally extended instructions.
Paper Title: “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”
If you like this topic, please consider supporting us: 🔔 clap & follow 🔔
Connecting Language Models to the Physical World!
One of the main challenges in integrating robotics with large language models is providing the necessary contextual grounding that allows the language model to make informed decisions.
The robot acts as the “hands and eyes” of the language model, providing the necessary physical context, while the language model provides high-level semantic knowledge about the task.
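To make this grounding idea concrete, here is a minimal Python sketch. It assumes, following the SayCan paper, that each candidate skill gets two numbers: a language-model score for how useful the skill is for the instruction, and a value-function estimate of how likely the robot is to complete it from its current state. The two are multiplied and the highest product wins. The names `ScoredSkill` and `pick_skill` and all the numbers are invented for illustration and are not taken from Google's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSkill:
    """One candidate skill with its two scores (all values are hypothetical)."""

    name: str
    llm_score: float   # how useful the language model rates this skill (0..1)
    affordance: float  # estimated chance the robot can complete it now (0..1)

    @property
    def combined(self) -> float:
        # A skill is only a good choice if it is both useful and feasible.
        return self.llm_score * self.affordance


def pick_skill(candidates: list[ScoredSkill]) -> ScoredSkill:
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda skill: skill.combined)


# Instruction: "I spilled my coke, can you bring me something to clean it up?"
# The robot has no sponge in view yet, so picking one up is unlikely to succeed.
candidates = [
    ScoredSkill("pick up the sponge", llm_score=0.50, affordance=0.10),
    ScoredSkill("find a sponge", llm_score=0.40, affordance=0.90),
    ScoredSkill("find a coke can", llm_score=0.10, affordance=0.80),
]

best = pick_skill(candidates)
print(best.name)
```

With these made-up numbers, the language model alone would prefer “pick up the sponge”, but the low feasibility estimate pushes “find a sponge” to the top. That is the sense in which the robot’s view of the world grounds the language model’s suggestions.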