Not unlike J.A.R.V.I.S. from Iron Man, Josh is a voice-activated personal assistant (focused on the smart home). One of the things we aspire to is a natural flow of communication between users and the system. In a perfect world, you should never have to stress about whether or not Josh understands you. However, let’s stop for a minute and think about what that actually means. What does it mean to have a truly conversational system? What does it mean to understand language?
Hollywood is all smoke and mirrors
First things first, let’s go back to the classic example of HAL 9000. In the infamous scene where Dave tries to open the pod bay doors, HAL famously responds with “I’m sorry Dave, I’m afraid I can’t do that.” Movies have a habit of making things seem far simpler than they actually are, so let’s break this down a little. Forming a short sentence should be pretty easy, right? Not exactly.
In this situation, HAL needs to understand two things: the concept of “open” (that is, to part the doors), and which doors Dave is referring to.
Simple is always complicated
If you’re familiar with any type of software development, you can picture this as something like:
open(podbay_doors)
Here open() is a function that takes an instance of a door and opens it, probably by calling an .open() method on the door object. This means that HAL knew that the word “open” had to map to the function open. Okay, cool, that’s not bad. You can maybe just look for words that are known to be actions, then use some cute reflection to call a method with the same name as the action uttered. Then, HAL had to know that the pod bay doors were the thing to open. Given the context, “pod bay doors” might seem obvious because they were on a spaceship. In truth it isn’t, because there are multiple doors that Dave could have meant, but let’s just say that it is actually that simple. Hold on though, there’s more. HAL had to respond to Dave. This is where the magic happens.
Firstly, we humans are notoriously bad at forming grammatically correct sentences. We drop words, we assume context, we backtrack. Building a coherent sentence is no easy task, but HAL doesn’t stop there. HAL starts by saying “I’m sorry, Dave.” HAL actually apologizes ahead of time, because HAL knows that Dave won’t like what follows. This means that HAL had to understand the emotions that Dave would go through after hearing the sentence that HAL hasn’t even said yet. Not only is that impressive, but HAL was actually polite enough to think to apologize ahead of time.
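To make the "look for a known action word, then use reflection" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the Door class, the KNOWN_ACTIONS set, and the handle() function are assumptions, not how HAL or Josh actually works. It also assumes the easy case described above, where the target is named word for word in the utterance.

```python
import re


class Door:
    """Hypothetical door object; the class and its methods are illustrative."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def open(self):
        self.is_open = True
        return f"{self.name} opened"

    def close(self):
        self.is_open = False
        return f"{self.name} closed"


# Assumed list of words we are willing to treat as actions.
KNOWN_ACTIONS = {"open", "close"}


def handle(utterance, devices):
    """Find an action word and a device name in the utterance, then dispatch."""
    words = re.findall(r"[a-z']+", utterance.lower())
    text = " ".join(words)

    # Step 1: map a spoken word to an action name.
    action = next((w for w in words if w in KNOWN_ACTIONS), None)
    if action is None:
        return "I don't know what you want me to do."

    # Step 2: find which device the speaker means (naive substring match).
    target = next((dev for name, dev in devices.items() if name in text), None)
    if target is None:
        return f"I don't know what to {action}."

    # Step 3: reflection, i.e. look up a method with the same name as the action.
    method = getattr(target, action, None)
    if not callable(method):
        return f"I can't {action} the {target.name}."
    return method()


devices = {"pod bay doors": Door("pod bay doors")}
print(handle("Open the pod bay doors, HAL", devices))
```

The substring match in step 2 is exactly the part that breaks down in practice: "the doors" or "that one" would not resolve to anything without context.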
Then he delivers the rest of the famous quote: “I’m afraid I can’t do that.” Again, HAL doesn’t just respond with a simple “no”. HAL actually makes it explicit that he’s “afraid.” I mean, he’s completely mocking Dave, but he knows to hide his mockery in the form of a polite phrase by saying “I’m afraid.” Lastly, HAL actually has the gall to say he “can’t” do that. Not that he won’t do that, but that he can’t do that. Now, most of us use “can” improperly, because we don’t actually know that we should use “will.” Being an advanced intelligence, HAL knows that “won’t” would be the proper thing to say. HAL chooses to use “can’t” as if someone else is giving him the permission, but at this point Dave knows, you know, I know, we all know HAL is the one in charge. It’s a double mock of Dave, and HAL is just being snide.
For many people, HAL 9000 was their first real exposure to the idea of an artificial intelligence. Even today, it’s still a fantastic example of one. There are more recent ones, like J.A.R.V.I.S. in the modern Iron Man movies. There’s also Samantha, the female AI from the movie Her. Maybe one of my favorite AIs is GLaDOS from the Portal franchise of video games. All of these systems show that they can sympathize, which is just a mind-blowing concept.
More than just words
So, what does it mean to understand language? Well, simply put, it means that you can understand the relationship between words and the objects they refer to. Then the next question is, what does it take to understand these relationships? It requires what we call “world knowledge”. The words only represent concepts, but a system that can understand language needs to understand what these concepts mean.
Word2Vec, a system developed by Google, represents words as vectors of numbers, so you can do math-like operations on words with fairly good success. For example:
king - man + woman = ?
Well, a king can be considered a male in a seat of power. What do we call the female in the seat of power? Word2Vec will tell you the answer is “queen”, and in my opinion that’s a pretty good answer.
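The arithmetic can be sketched in a few lines of Python. Note the assumptions: the three-dimensional vectors below are hand-made toy values (roughly "royalty", "maleness", "femaleness"), not real Word2Vec output, and the analogy() helper is a hypothetical name. Real embeddings are learned from large amounts of text and have hundreds of dimensions, but the lookup works on the same principle: compute king - man + woman, then find the nearest remaining word by cosine similarity.

```python
import math

# Toy, hand-made vectors: (royalty, maleness, femaleness).
# These are illustrative assumptions, not learned Word2Vec embeddings.
VECTORS = {
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "man": (0.0, 1.0, 0.0),
    "woman": (0.0, 0.0, 1.0),
    "prince": (0.5, 1.0, 0.0),
    "princess": (0.5, 0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, as analogy
    lookups commonly do.
    """
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # prints "queen" with these toy vectors
```

With these toy values the result is rigged to come out right. The interesting part of Word2Vec is that similar structure emerges from vectors nobody designed by hand.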
Obtaining world knowledge means that we need to be able to identify and segment what information we receive so that we can properly bin it into categories in our brain. There are many ways for us to obtain this world information as well. We take in data through all of our senses, and computers don’t have access to this type of data the same way we do.
A great quote on the subject comes from one of the world leaders in deep learning, Yann LeCun. In an AMA on Reddit, he was asked the following question:
“How would you rank the real challenges/bottlenecks in engineering an intelligent ‘OS’ like the one demonstrated in the movie ‘Her’ … given current challenges in audio processing, NLP, cognitive computing, machine learning, transfer learning, conversational AI, affective computing .. etc. (i don’t even know if the bottlenecks are in these fields or something else completely). What are your thoughts?”
His full answer is in the AMA thread; here is a condensed version:
“Something like the intelligent agent in “Her” is totally out of reach of current technology…
…The agent in Her has a deep understanding of human behavior and human nature. It’s going to take quite a while before we build machines that can do that…
…If I say “John is walking out the door”, we build a mental picture of the scene that allows us to say that John is no-longer in the room, that we are probably seeing his back, that we are in a room with a door, and that “walking out the door” doesn’t mean the same thing as “walking out the dog”. This mental picture of the world and the event is what allows us to reason, predict, answer questions, and hold intelligent dialogs…”
Understanding language is more than knowing the definitions of words. To understand language is to understand the world. We use language as a means of communicating ideas of the world to each other and to ourselves. For a computer system to truly understand what we want, it’ll need to experience the same events we do. It needs to understand emotion, it needs to know what it’s like to have our senses, and it needs to learn to speak and communicate organically the way we do. Truly understanding language may in fact mean that computers need to grow and develop the same way humans do. All I know is that although we’re nowhere near that, with the breakthroughs in AI and other fields, we’re realizing that maybe it’s not an impossible task.
This post was written by Aaron at Josh.ai. Before joining the Josh team, where he works on natural language processing (NLP) and artificial intelligence (AI), Aaron worked at Northrop Grumman. Aaron is a skilled YoYo expert, loves video games and music, has been programming since middle school, and just turned 21.
Josh.ai is an artificial intelligence agent for your home. If you’re interested in learning more, visit us at https://josh.ai.