Eric Saund


How Do Conversational Agents Know So Much?

Jibo, Echo/Alexa, Google Home

This is Part III of a four-part series: How to Be a Robot Psychologist

  • Part I: Why Robot Psychology?
  • Part II: Human and Robot Psychology and Cognition
  • Part III: How Do Conversational Agents Know So Much?
  • Part IV: Reverse-Engineering Conversational Agents

The Three Pillars of Intelligence

To Amazon, the reception for its voice agent, Alexa, was a big surprise. Apple’s Siri had put voice input onto smartphones. But here was a new class of device that you could shout at across the kitchen to set a timer, play music, or look up facts on the internet. The culture adopted Alexa with an avalanche of jokes and memes. Google soon followed with its own smart speaker called Google Home. And the startup Jibo offered similar skills through its own AI technology and incorporation of third-party data sources, but wrapped in a fun character with a unique swiveling body form.

These conversational agents result from many millions of dollars of investment and the labor of hundreds of the smartest researchers and developers in academia and industry. They represent the current pinnacle of scientific and engineering accomplishment in the human endeavor to create artificial beings that augment and magnify our own brainpower. Indeed, Conversational AI agents are remarkably capable. Yet they are very dumb. Here, Part III of this series explains how AI can bring us instant access to vast knowledge. Then in Part IV we play robot psychologist to explore the ways in which AI agents are limited in their abilities to carry out even simple conversations. The key concept at play is Cognitive Architecture, which was introduced in Part II.

We can ask, “Alexa, who won the 1934 World Series?”, and immediately it responds, “The Saint Louis Cardinals beat the Detroit Tigers 4 to 3 in the 1934 World Series.” Without Alexa, if you are lucky enough to live near a library, it would take you half an hour to travel there and find a reference book containing this fact. That’s if the library is open. By any measure, Alexa’s instant answer is remarkable, intelligent behavior, and useful to boot.

Intelligence, however, is not a single thing. On the one hand, we recognize that people have different kinds of intelligence: verbal, visuo-spatial, musical, emotional, social. But in the fields of Cognitive Science and Artificial Intelligence, intelligence decomposes in a different way.

It breaks down into three capacities: Knowledge, Pattern Matching, and Reasoning. We can call these the three Pillars of Intelligence.

The Three Pillars of Intelligence
  • Knowledge refers broadly to facts, data, skills, procedures, and beliefs organized in such fashion that they can be looked up and accessed when needed. A major area of study in Artificial Intelligence and Cognitive Science is called knowledge representation. This is about the organization and expression of knowledge in computer data structures, along with the computing operations that function over them. While we colloquially distinguish knowledge from beliefs by virtue of whether they are true or not, in the fields of computational intelligence, both correct and incorrect assertions are regarded under the same rubric, “knowledge.” Evidence for and against, degree of confidence, and relations to grounded facts about the external world are all considered additional attributes that attach into knowledge representations.
  • Pattern Matching is about generalizing across specific cues and instances of data. The importance of pattern matching is most apparent when we consider visual scenes or spoken words. At a detailed signal level of pixels, our eyes never see exactly the same scene twice. Something is always different: maybe the lighting, the point of view, or the focus of our eyes. Similarly, at the level of acoustic waveforms, we never hear exactly the same audio signal twice. Even if we play a recording over and over, something changes: maybe the position of our head with respect to the speakers, a faint car horn in the distance, or an arrangement of pillows that slightly alters the acoustics of the room. Our brains are designed to factor apart irrelevant differences and distill out commonalities, so we can in fact recognize the same visual scene or spoken sentence, in terms of objects and words, at different times. The principle extends to more abstract concepts and ideas as well. We are able to recognize the same enumerated arguments for why the Beatles were the greatest band in history, even if expressed by different music historians in different words.
  • Reasoning is the ability to take some explicitly stated assertions and knowledge, and derive new assertions. The common notions of logical inference and deduction are important aspects of reasoning. But the concept is broader. Reasoning extends also to exploration of alternative outcomes by the application of different workflow steps, for example, contemplating the most efficient sequence of motions to unload the dishwasher. The remarkable thing about human reasoning is that it kicks in automatically just when needed. Consider: “A glass full of milk got knocked over and spilled on the floor.” So, was the glass full of sawdust? No! Of course not. I just said it was a glass full of milk. You never would have even entertained that extraneous sawdust proposition if it had not been suggested, but when it was raised, the answer came to mind immediately — No! Sawdust is nothing like milk.

The threads of Artificial Intelligence research tend to specialize in one or another of the three pillars, and sometimes build bridges across them. Most recently, the most striking breakthroughs have occurred in the Pattern Matching pillar. Deep Learning is a type of so-called Artificial Neural Network technology that has notably revolutionized the fields of Computer Vision, Speech Recognition, and Natural Language Processing. In addition, Artificial Neural Network methods impact the Knowledge pillar by bringing “soft” or “fuzzy” representations, achieved by distributing numeric values across vectors of feature attributes.
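To make the idea of a “soft” representation concrete, here is a minimal sketch. The feature attributes and their numeric values are invented for illustration and are not taken from any real system; real neural networks learn such vectors from data rather than having them written by hand.

```python
import math

# Hypothetical feature attributes: [is_liquid, is_granular, is_white, is_edible]
FEATURES = {
    "milk":    [1.0, 0.0, 1.0, 1.0],
    "cream":   [1.0, 0.0, 0.9, 1.0],
    "sawdust": [0.0, 1.0, 0.2, 0.0],
}

def cosine_similarity(a, b):
    """Similarity of two feature vectors: near 1.0 means alike, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

milk_vs_cream = cosine_similarity(FEATURES["milk"], FEATURES["cream"])
milk_vs_sawdust = cosine_similarity(FEATURES["milk"], FEATURES["sawdust"])
print(f"milk vs cream:   {milk_vs_cream:.2f}")
print(f"milk vs sawdust: {milk_vs_sawdust:.2f}")
```

With these made-up numbers, milk scores far closer to cream than to sawdust, which is the graded notion of similarity that a hard yes/no fact cannot express.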

Artificial Intelligence has not, however, been able to unify the three Pillars of Intelligence under an over-arching Cognitive Architecture. The human mind effortlessly invokes each pillar of intelligence in coordination with the others, as needed. When I say, “a glass full of milk spilled,” your mind automatically connects the word sequence to Knowledge, perhaps a visual image of a glass of milk. A Reasoning step triggers in the form of a mental simulation of the glass tipping, and consequently the liquid contained therein flowing over the rim. In the course of your own Reasoning, the apparatus of Pattern Matching and Knowledge both contribute. You know milk and cream to be similar sorts of things, and that therefore they spill in the same manner. By contrast, sawdust and sugar are not liquids, but instead similar in their granular properties; they flow and land differently when poured, more similarly to each other than to any liquid. For which would a paper towel be in order, and for which a broom? To you, the implications of liquid versus granular are immediately clear. No AI can do this today.

Architecture of a Conversational Agent

Short of this depth, today’s conversational agents nonetheless display remarkable abilities to answer even obscure questions. This is due to a specifically designed partnership between two of the Pillars of Intelligence, Knowledge and Pattern Matching. We’ll focus on a knowledge representation known as a Knowledge Graph, and the Pattern Matching component known as Entity/Intent Extraction.

The architecture of a conversational agent looks like this.

The Architecture of a Conversational Agent forms a Perception/Action Loop

Question Answering and other types of dialogue by a Conversational Agent occur via a Perception/Action Loop architecture. In its basic form, this is a series of five computing steps. The starting point is an acoustic waveform picked up by a microphone when a user speaks a question or command.

  1. A computing module called Automatic Speech Recognition (ASR) converts the waveform signal to representations for words.
  2. A Natural Language Processing module interprets the words into an internal computer “language” called a Logical Form. The Logical Form represents the meaning of the question in a unified way across different possible phrasings people might use to ask the same thing.
  3. A Dialog Manager module is responsible for receiving the Logical Form and deciding how to respond. It is in the Dialog Manager that possible answers to the question are searched for, and a response is formulated. The output is generally itself a Logical Form.
  4. Natural Language Generation turns the answer-bearing Logical Form back into a word sequence in the human language it was asked in.
  5. A Text-to-Speech module synthesizes an acoustic signal using some trained parameters defining vocal quality and intonation. This results in an output waveform that is sent to the speakers.
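The five steps above can be sketched as a chain of functions. Everything here is a stand-in written for illustration: the function names are hypothetical, the “waveform” is just text encoded as bytes, and each stage is a toy that handles a single phrasing. Real modules are large trained systems; only the order of the stages and the data passed between them follows the description.

```python
def speech_recognition(waveform):
    # Stand-in for ASR: the "waveform" here is simply UTF-8 text.
    return waveform.decode("utf-8").lower().rstrip("?").split()

def language_processing(words):
    # Stand-in for NLP: map one known phrasing to a Logical Form tuple.
    if words[:2] == ["who", "played"]:
        return ("ask", "played_by", " ".join(words[2:]))
    return ("unknown", None, None)

def dialog_manager(logical_form):
    # Look for an answer and return a response, itself a Logical Form.
    intent, relation, argument = logical_form
    facts = {("played_by", "spock"): "Leonard Nimoy"}
    answer = facts.get((relation, argument))
    if intent == "ask" and answer is not None:
        return ("tell", relation, argument, answer)
    return ("apologize",)

def language_generation(response):
    # Turn the answer-bearing Logical Form back into words.
    if response[0] == "tell":
        _, _, argument, answer = response
        return f"{answer} played {argument.title()}."
    return "Sorry, I don't know."

def text_to_speech(sentence):
    # Stand-in for TTS: return bytes instead of synthesized audio.
    return sentence.encode("utf-8")

def perception_action_loop(waveform):
    words = speech_recognition(waveform)
    logical_form = language_processing(words)
    response = dialog_manager(logical_form)
    sentence = language_generation(response)
    return text_to_speech(sentence)

print(perception_action_loop(b"Who played Spock?"))
```

Note that the Dialog Manager both receives and returns a Logical Form, so the language-specific work stays in the stages on either side of it.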

For decades, a huge obstacle to conversational agents lay in the Automatic Speech Recognition module. If the words spoken by the user were transmitted incorrectly to the NLP module, then there could be no hope of delivering a correct answer. The past 15 years have seen explosive improvement in ASR due to advances in Machine Learning algorithms, computing power, and available data sets for training the algorithms.

Knowledge Graph

These days, the cutting edge of intelligent conversational agents resides in the Natural Language Processing and Dialog Manager modules. First let’s consider how large collections of facts can be represented in a knowledge representation called a knowledge graph. Then we’ll see how Natural Language Processing constructs a Logical Form from a user’s query, to look up answers in a knowledge graph.

A small portion of a large knowledge graph

A graph consists of nodes, and links connecting nodes. In a knowledge graph, nodes represent things in the world, or entities, and links represent relations among the entities. Entities can be concrete or abstract. The figure above shows a small section of a much larger graph. This section represents the fact that Leonard Nimoy was a Person, that he played the role of Spock, that Spock is a Character in Star Trek, and that Star Trek is a TV Series.
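One common way to hold such a graph in a program is as a list of (node, link, node) triples. The sketch below encodes the facts just described; the relation names `is_a`, `played`, and `character_in` are labels chosen for this illustration, not those of any particular knowledge graph.

```python
# Each triple is (source entity, relation, target entity).
TRIPLES = [
    ("Leonard Nimoy", "is_a", "Person"),
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "is_a", "Character"),
    ("Spock", "character_in", "Star Trek"),
    ("Star Trek", "is_a", "TV Series"),
]

def neighbors(node):
    """Return (relation, target) pairs for the links leaving `node`."""
    return [(rel, dst) for src, rel, dst in TRIPLES if src == node]

print(neighbors("Spock"))
```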

Knowledge graphs can have different rules and design parameters in the way nodes and links are used. In some knowledge graphs, entities come in two flavors, type entities (green) which are classes of things, and token entities (blue), which are particular instances. The Leonard Nimoy node is a token, an instance of the type, Person. In some knowledge graphs, entities are organized hierarchically, so that for example, TV Series could be a subtype of the class, Entertainment Genre. Some knowledge graphs define a fixed set of link/relation types, while others are open-ended.
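As a rough sketch of the type/token distinction and a type hierarchy, the two dictionaries below are hypothetical and use only the examples from the text. Walking up the hierarchy shows how a token inherits membership in every ancestor type.

```python
# Token entity -> its type entity (assumed example data).
INSTANCE_OF = {"Star Trek": "TV Series", "Leonard Nimoy": "Person"}

# Type entity -> its parent type in the hierarchy (assumed example data).
SUBTYPE_OF = {"TV Series": "Entertainment Genre"}

def types_of(token):
    """List the token's own type followed by all of its ancestor types."""
    result = []
    current = INSTANCE_OF.get(token)
    while current is not None:
        result.append(current)
        current = SUBTYPE_OF.get(current)
    return result

print(types_of("Star Trek"))
```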

These rules and design parameters are the ontology of the knowledge graph. Ontology means the study of “the nature of being,” and the word comes from philosophy, where it names a branch of metaphysics. It is closely tied to epistemology, which studies the nature of knowledge. How does knowledge in the abstract stand in relation to the real world? How do agents acquire knowledge, maintain knowledge in the face of changes, and invest trust in its correctness? The history of AI research has revealed that a great deal hinges on design decisions about knowledge ontologies. For example, if a knowledge graph allows links to take any arbitrary labels, then how can it be discovered that two links are equivalent, or contradictory? On purely practical grounds, hardened technical engineers have come to greatly appreciate and respect the wisdom that philosophers bring to the table when it comes to designing knowledge ontologies.

Dozens of knowledge graphs have been constructed over the years according to an array of more or less tightly constrained knowledge ontologies. A number of them are very big and put to heavy use today, complete with APIs (Application Programming Interfaces) providing access to application developers. Knowledge content (actual nodes and links) is added by various combinations of hand curation and automatic harvesting from text found in Wikipedia, newspaper articles, and other online sources.

A knowledge graph enables direct answers to questions like, “Who played Spock in Star Trek?” To answer this question, the question first has to be transformed into a Logical Form representation that can address entities and relations. This job is performed by the NLP module, which, in conjunction with ASR, functions as the Pattern Matching pillar of a conversational agent.

Natural Language Processing is a practical application of Computational Linguistics. This field also aims for the more ambitious goal of Natural Language Understanding, but this term is something of an overreach because what we conventionally mean by “understanding” brings together Knowledge, Pattern Matching, and Reasoning to a degree that today’s AI simply does not approach.

The initial steps of Natural Language Processing involve classifying words in terms of their grammatical parts of speech and known entity types. The word “who” is an interrogative (question) pronoun, while “in” is a preposition that connotes containment or membership. “Spock” and “Star Trek” are labeled as known entity names as found in pre-compiled entity name lists.
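A toy version of this word-classification step might look like the following. The two lookup tables and their label names are assumptions made for the example; production systems use trained taggers and far larger entity lists. The one design point worth noting is that multi-word names such as “Star Trek” must be matched as a unit, so the longest matching phrase is tried first.

```python
# Hypothetical lookup tables for parts of speech and known entity names.
WORD_CLASSES = {"who": "QUESTION_PRONOUN", "played": "VERB", "in": "PREPOSITION"}
ENTITY_NAMES = {"spock": "CHARACTER", "star trek": "TV_SERIES"}

def classify(utterance):
    """Label each word or multi-word entity name in the utterance."""
    words = utterance.lower().rstrip("?").split()
    labeled = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in ENTITY_NAMES:
                labeled.append((phrase, ENTITY_NAMES[phrase]))
                i += length
                break
        else:
            # No entity name starts here, so fall back to the word classes.
            labeled.append((words[i], WORD_CLASSES.get(words[i], "OTHER")))
            i += 1
    return labeled

for item in classify("Who played Spock in Star Trek?"):
    print(item)
```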

Entity/Intent Extraction converts a natural language question to a Logical Form query.

Given this cataloging of the words, a processing step known as Entity/Intent Extraction attempts to classify the utterance in terms of its purpose (question, command, statement, etc.) and then assign known and unknown entities associated with that purpose. Entity/Intent Extraction analyzes the syntactic structure of the utterance, which is dependent on word order. These days, Machine Learning methods hold sway in the detailed steps for how this is carried out. Every syntactic pattern maps to a Logical Form template. The example in the figure shows one way that a Logical Form can be written out. If you can read the nested parentheses of computerese, then you can puzzle out for yourself how the original question gets transformed. Finally, the actual entities and relations from the original word sequence fill in the template, resulting in a final Logical Form that represents what the user asked, but now in a highly structured format.
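The mapping from a syntactic pattern to a filled-in Logical Form template can be sketched as below. A hand-written regular expression stands in for the Machine Learning methods used in practice, and the parenthesized notation, the `find` keyword, and the relation names are invented for this example rather than copied from the article’s figure or any real agent.

```python
import re

# Hypothetical table: (intent, syntactic pattern, Logical Form template).
PATTERNS = [
    (
        "question",
        re.compile(r"^who played (?P<character>.+) in (?P<series>.+?)\??$", re.IGNORECASE),
        '(find ?x (and (is_a ?x Person) (played ?x "{character}") '
        '(character_in "{character}" "{series}")))',
    ),
]

def extract(utterance):
    """Return (intent, logical_form), or ("unknown", None) if nothing matches."""
    for intent, pattern, template in PATTERNS:
        found = pattern.match(utterance.strip())
        if found:
            # Fill the template slots with the entities captured from the words.
            return intent, template.format(**found.groupdict())
    return "unknown", None

print(extract("Who played Spock in Star Trek?"))
```

Here `?x` marks the unknown entity the user is asking about; the named entities from the utterance fill the remaining slots.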

Conveniently, the Logical Form is itself equivalent to a graph. This is called a query graph. The example query graph looks a lot like the part of the knowledge graph pertaining to Spock and Star Trek, except the Person instance node has a question mark. That is the unknown value the user wants to know.

Subgraph matching finds a part of the knowledge graph that aligns with the query.

From here, a well-known method in computer science is applied, subgraph matching. That simply means finding a portion of the huge knowledge graph that aligns with the specified nodes and links of the query graph. The unknown Person instance node is a wildcard variable that can match any node in the knowledge graph, as long as the links around it also line up. When the subgraph match is completed, this node is found to correspond with the Leonard Nimoy node in the knowledge graph, and presto, an answer can be filled in and returned.
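Here is a minimal sketch of subgraph matching by backtracking search, assuming the graph is stored as (node, link, node) triples and that names beginning with `?` are wildcard variables. The relation names are illustrative, and the William Shatner and Kirk facts are added here only as a distractor to show that the wildcard does not match just any Person. Real systems rely on indexes and query planners rather than this brute-force scan.

```python
KNOWLEDGE_GRAPH = {
    ("Leonard Nimoy", "is_a", "Person"),
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "is_a", "Character"),
    ("Spock", "character_in", "Star Trek"),
    ("Star Trek", "is_a", "TV Series"),
    # Distractor facts, not in the article's figure:
    ("William Shatner", "is_a", "Person"),
    ("William Shatner", "played", "Kirk"),
    ("Kirk", "character_in", "Star Trek"),
}

# Query graph for "Who played Spock in Star Trek?"
QUERY_GRAPH = [
    ("?who", "is_a", "Person"),
    ("?who", "played", "Spock"),
    ("Spock", "character_in", "Star Trek"),
]

def is_variable(term):
    return term.startswith("?")

def unify(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on conflict."""
    new_bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_variable(p):
            # A variable must keep the same value everywhere it appears.
            if new_bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new_bindings

def match(query, graph, bindings=None):
    """Yield every variable binding under which all query links exist in graph."""
    bindings = {} if bindings is None else bindings
    if not query:
        yield bindings
        return
    first, rest = query[0], query[1:]
    for fact in graph:
        extended = unify(first, fact, bindings)
        if extended is not None:
            yield from match(rest, graph, extended)

print(list(match(QUERY_GRAPH, KNOWLEDGE_GRAPH)))
```

Both people satisfy the first query link, but only one binding survives the second, which is why the search returns a single answer.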

An inverse process converts the filled-in query graph back to a sentence in Natural Language, this time as a statement instead of a question. And via the Text-to-Speech module, the agent presents its brilliance with pride and joy. Well, such mechanical pride and joy as a computer can muster. The engineers and their sponsors harvest the credit.
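The inverse step can be approximated with sentence templates keyed by relation. This is a simplified sketch with a hypothetical template table and slot names; actual Natural Language Generation modules handle grammar, tense, and phrasing variety that a fixed template cannot.

```python
# Hypothetical statement templates, one per relation in the query graph.
ANSWER_TEMPLATES = {
    "played": "{who} played {character} in {series}.",
}

def generate_answer(relation, bindings):
    """Fill the statement template for `relation` with the matched entities."""
    template = ANSWER_TEMPLATES.get(relation)
    if template is None:
        return "Sorry, I can't phrase that answer."
    return template.format(**bindings)

filled_query = {"who": "Leonard Nimoy", "character": "Spock", "series": "Star Trek"}
print(generate_answer("played", filled_query))
```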

The Knowledge Graph is just one of several kinds of structured data representation that can be used to store information for natural language queries. Relational databases are preferred when large numbers of data items have the same sets of attributes. But like knowledge graphs, these other data organizations provide interfaces for access by Logical Form queries. In Part IV we’ll attempt to track down where Alexa’s knowledge about Star Trek and other television series comes from.

In Part IV we play with a question-answering Conversational Agent. Read Part IV: Reverse-Engineering Conversational Agents (https://readmedium.com/reverse-engineering-conversational-agents-a5c43c116c3c).
