In this episode Byron and Jakob Uszkoreit discuss machine learning, deep learning, AGI, and what this could mean for the future of humanity.
Jakob Uszkoreit has a master's degree in Computer Science and Mathematics from Technische Universität Berlin. Jakob has also worked at Google for the past 10 years, currently in deep learning research with Google Brain.
Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today our guest is Jakob Uszkoreit, he is a researcher at Google Brain, and that’s kind of all you have to say at this point. Welcome to the show, Jakob.
Let's start with my standard question which is: What is artificial intelligence, and what is intelligence, if you want to start there, and why is it artificial?
Jakob Uszkoreit: Hi, thanks for having me. Let's start with artificial intelligence specifically. I don't think I'm necessarily the best person to answer the question of what intelligence is in general, but I think for artificial intelligence, there are possibly two different kinds of ideas that we might be referring to with that phrase.
One is kind of the scientific or the group of directions of scientific research, including things like machine learning, but also other related disciplines that people commonly refer to with the term 'artificial intelligence.' But I think there's this other maybe more important use of the phrase that has become much more common in this age of the rise of AI if you want to call it that, and that is what society interprets that term to mean. I think largely what society might think when they hear the term artificial intelligence, is actually automation, in a very general way, and maybe more specifically, automation where the process of automating [something] requires the machine or the machines doing so to make decisions that are highly dynamic in response to their environment and in our ideas or in our conceptualization of those processes, require something like human intelligence.
So, I really think it's actually something that doesn't necessarily, in the eyes of the public, have that much to do with intelligence, per se. It's more the idea of automating things that at least so far, only humans could do, and the hypothesized reason for that is that only humans possess this ephemeral thing of intelligence.
Do you think it's a problem that a cat food dish that refills itself when it's empty, you could say has a rudimentary AI, and you can say Westworld is populated with AIs, and those things are so vastly different, and they're not even really on a continuum, are they? A general intelligence isn’t just a better narrow intelligence, or is it?
So I think that's a very interesting question: whether basically improving and slowly generalizing or expanding the capabilities of narrow intelligences will eventually get us there. If I had to venture a guess, I would say that's quite likely actually. That said, I'm definitely not the right person to answer that. I do think that guesses about that aspect of things are today still in the realm of philosophy and extremely hypothetical.
But the one trick that we have gotten good at recently that's given us things like AlphaZero, is machine learning, right? And it is itself a very narrow thing. It basically has one core assumption, which is the future is like the past. And for many things it is: what a dog looks like in the future, is what a dog looked like yesterday. But, one has to ask the question, “How much of life is actually like that?” Do you have an opinion on that?
Yeah, so I think that machine learning is actually evolving rapidly from the initial classic idea of basically trying to predict the future just from the past, and not even the past itself but a kind of encapsulated version of the past. So it's basically a snapshot captured in a fixed static data set. You expose machines to that, you allow them to learn from that, train on that, whatever you want to call it, and then you evaluate how the resulting model or machine or network does in the wild or on some evaluation tasks and tests that you've prepared for it.
It's evolving from that classic definition towards something that is quite a bit more dynamic, that is starting to incorporate learning in situ, learning kind of "on the job," learning from very different kinds of supervision, where some of it might be encapsulated by data sets, but some might be given to the machine through somewhat more high-level interactions, maybe even through language. There are at least a bunch of lines of research attempting that. Also, quite importantly, we're starting slowly but surely to employ machine learning in ways where the machine's actions actually have an impact on the world, from which the machine then keeps learning. I think that that's actually something [for which] all of these parts are necessary ingredients, if we ever want to have narrow intelligences that maybe have a chance of getting more general, and that maybe then, in the more distant future, might even be bolted together into a somewhat more general artificial intelligence.
Two years ago when AlphaGo played Lee Sedol, where were you? What were you doing when that was going on? Do you remember?
I was traveling actually, which was interesting because I only got the results with quite a bit of delay, but to be quite frank, I actually wasn't very surprised to see the results of the game. It was interesting to then see it actually happen, and really to also follow up on the media's reaction, the Go community's reaction to this event, but it wasn't like I was on the edge of my seat, very uncertain about the outcome.
And in Game 2 there was the legendary Move 37 which was the first time people started talking about AlphaGo and AI for that matter being creative, that this was a creative move. Do you believe that AlphaGo, just as an example, is creative, or can [it] only mimic creativity? Or there's no difference in those two statements?
I think that really depends on the actual definition of creativity that you want to use. If by creativity you mean basically breaking out of patterns that have been observed or that have ever occurred before, by virtue of having "understood" the dynamics of, in this case, a game, then I would say yes, that was kind of an act of creativity. That said, the way humans oftentimes conceptualize creativity is as something much more closely related to the human condition, and to expressing things about the human condition.
Creativity as you find it in say, music or the arts overall, I think is something that is quite different and requires a much deeper, and maybe even a real understanding of the human condition to be performed, and that it certainly was not. So I do think that for some definition, it definitely broke out of the established pattern so much that the human experts observing it were taken aback and thought they'd seen something that was novel, truly novel. And if you define that as creative, then yes it certainly was.
I'm going to misquote Moby Dick here, because I'm doing this from memory, but there's a passage in there, and it goes like this: "and he piled forth on the whale's white hump, the sum of all his rage and fury. If his chest had been a cannon, he would have fired his heart upon it." Now when you think of a passage like that, it's got metaphor, it's visual and it's vivid and it's emotional.
Does your analytical mind think, yeah a computer could do that, a computer could write that? If we gave it the corpus of the internet, and it could find out the kinds of metaphors that worked well, and applied them in the right situations and all that, do you think, in theory a computer could write that and is that going to be a general intelligence, or is that like a pretty simple thing?
So this actually gets us to a very interesting question that I've debated a lot throughout my career, because I have worked a lot on natural language understanding, and natural language generation systems. Just like a million monkeys let loose on typewriters will eventually produce Shakespeare, surely, especially given very powerful statistical models of language, as we have them today, it is conceivable that a machine could generate sentences or phrases that sound similar, that maybe hint at a similar command of English.
But the act of doing so, I think, is still fundamentally different from a human author writing the sentence, and the reason is that at least today's models of language have absolutely no connection to the extremely rich other aspects of the human experience. They are extremely "smart," in quotes, extremely advanced pattern matchers and pattern recognizers, and they use these patterns and recompose them in ways that seem somewhat likely according to previous observations, in order to generate stuff, but they cannot possibly relate to the depths of human emotions and the human experience and why that might be triggered by this specific situation as depicted in that book.
So on one hand I think it is conceivable that we can build machines in the short term that can fake certain things like this, and one approach could be that they find existing outputs of human creativity and then they modify it slightly, maybe even under guidance of a human, and then generate variants, but for them to have an experience or to be able to imagine an experience and then describe that in such a way, I think that is quite far off.
And why do you think it is, anytime I see a chatbot that purports to try to pass the Turing test—those are all constrained and all of that—any time I come across any of these, I type the same question: “What's bigger, a nickel or the sun?” and I never found a system that would answer that question. Why is that so hard?
It's difficult because these machines, first of all, [still have] nowhere near the computational power to even simulate, let alone in real time, the computational processes that are happening in human brains these days. But even if we were to have that computational power, the models trained under today's machine learning paradigm for chatbots or similar applications are just not exposed to sufficiently rich stimuli. All they typically ever see is text scans and static data sets. They cannot possibly, for example, experiment with actions they might take in an environment, and see how the environment reacts in order to understand causality as opposed to correlation. Nor can they hope to experience the rich perception of the world that humans have.
A funny kind of aspect of that is that humans are also extremely specific in their perception of the environment. That is, if you think about it, the senses we use in order to understand or to even just take in the world are really quite specific, and in a certain sense quite arbitrarily chosen. The fact that we have vision as our primary sense, and that we can also feel slight vibrations in the fluid medium around us, shapes our experience massively, but is actually somewhat arbitrary. I believe that until we have machines that we expose to a similarly broad and also similar range of stimuli, they will not be able to really understand what humans talk about when they write such a sentence. Even once we do, there still is a question of, “Will we have the compute, will we have the methodology to build models that get there?” But certainly without exposing them to it, we will not get there in my opinion.
Let's talk a little bit, if we can, about transfer learning. You know, a human you can train with a sample size of one, right? You could show somebody a photograph of a mythical creature and say, "find all these mythical creatures, find all occurrences of that in all these photos," and if it's upside down, inverted, different color, covered in peanut butter, whatever, people say, “Ah, there it is, there it is, there it is...” Do you think that's the secret to a general intelligence? Like that's 80% of the problem. Whatever mechanism we use to do that is how we take data from... why we don't have to be trained on everything? Or do you think that's just one component of many that we're going to have to crack, and do you have any insights on how we do that trick?
So, I don't have a particularly deep insight on how humans do that trick. I do think that being able to do it is in a certain sense, a necessary condition. Is it sufficient? I'm not so sure, but you can in some of these "transfer learning problems," you can actually wrap up, or require a large number of different aspects of intelligence in order to solve it, and I think, it's actually less about the transfer.
Or maybe the transfer plays an important role, but one of the interesting things that is often under-emphasized, I believe, is the fact that, even though you called it zero-shot or single-shot learning, when you actually described this task, there was no training, there was no preparation. Instead, you actually said a bunch of sentences to me in order to describe that task, so technically what really happened is you said a bunch of sentences to me that I understood to the extent that I created a simulation in my head, where I could imagine these pictures. Even though I've never actually seen a picture of a mythical creature covered in peanut butter, I imagined that, and I imagined the idea of having to count them, and then I can effectively understand, or can try to understand, what this task is that you expect of me, and I start performing it without ever having had any kind of training.
This isn't even a case where I can see a few examples of the task, and then generalize from a much smaller number of examples than machines can generalize from today; it's actually a fundamentally different way of communicating what a task is. Today, at least in machine learning research tasks, the meaning of what a task is, is not communicated to the machine via any means other than examples. So you literally just expose examples of the application of the mapping to the machine and then you expect the machine to re-create that mapping. Whereas the way humans learn, not always but to a large extent, and in this specific case that you brought up, spans different levels of abstraction. You can use language to communicate a bunch of different concepts to me, including a specific instance of a task, and then I can try to perform that, and I think that that kind of illustrates how far away today's standard machine learning is from what humans can do.
That said, there are people like I said earlier, who are starting to work on using machine learning in settings like this where you have agents that exist in and interact with simulated or real environments, and they try either communicating with other agents, or communicating with humans. They try to learn or to train the agents to use language—generating and understanding—in order to communicate the definition of tasks, goals, aspects of the environment etc. I think that's actually a fascinating research direction.
So coming back to your example, I think [it] kind of subsumes a whole bunch of interesting aspects of human intelligence that we would require machines to at least emulate in order for them to become much more general. I don't think transfer learning in particular is the only one or maybe even the most crucial one. I think using language or similar facilities to communicate much more abstract notions, to basically communicate simulations if you wish, is something extremely powerful as well.
So you are a researcher at Google Brain, and everybody listening knows that that's one of the epicenters of advances in artificial intelligence. I'm curious because when you read about the Manhattan Project, the people [involved] had a sense of the moment they were in and they had a sense of what they were doing and how it would change the world, and that there was a gravitas over [it].
If you think of the moon landing and you think of the Constitutional Convention in the United States, there are just these moments where people are so aware of their moment and that they are doing the thing that is going to change the world the most. Is it like that on a day-to-day [basis] around where you work? Or is it like, ‘hey what are they bringing in for lunch today?’ I don't want to say ‘it's just a job,’ or it's like, ‘we show up and solve whatever problem...’ or is it like, ‘we are doing something they'll talk about in 1,000 years?’
So I think it's a continuum. I definitely do sometimes wonder what I should be getting for lunch. But no, I think overall, there is something ‘in the air.’ There definitely is the perception that this is the beginning of something quite big, potentially quite big, likely quite big. There definitely are moments where just seeing what different parts of the team produce, seeing what kind of advances are made and how rapidly that happens, there definitely are these moments where you're looking at something and you think, ‘wow, this must appear like magic to the outside.’
Maybe sometimes there are people in the group who don't exactly know what's going on, in that specific case. And it's quite an exciting atmosphere actually to be in. I will say, and I think I can speak for a large part of the group in saying this, I don't think we necessarily think that artificial general intelligence is that, or even is absolutely crucial to that change that we might be effecting or helping along. I think that much before that, we will actually see this field of AI/machine learning have tremendous impact on the world, just using, I guess, what we would now maybe call somewhat more advanced narrow forms of artificial intelligence. I think a lot of people are already extremely excited about that prospect.
So talking briefly about a general intelligence, let me follow up with that. So also when you think of the Manhattan Project, there was also a lot of angst about this specific technology that can be used for good and for evil, and they would think through the implications of it. There would be late night discussions about what was going to happen. Do you have that lurking/hanging in the air all the time about: “Is this a technology that can be put to good or evil use, and in the wrong hands etc.?” Do you spend a lot of mental energy on how it's going to be used and any culpability, responsibility or credit you get from that?
Absolutely, I actually think people may be surprised by how conscious people are of these aspects of their work. I think it's like any technology: its use ultimately dictates whether it's for good or for evil, and, like any or most influential technologies, there definitely is potential on both sides.
I do think that people are, overall, very aware of that, and are thinking through a whole bunch of scenarios, not least because there is actually extremely interesting research and science to be done in considering the abuse of such technology. You try to anticipate what ill-intentioned actors could do with some of this intellectual property that we would like to share with the world, that we would like to be public, that we publish: if ill-intentioned actors were to use this, what could they do with it, and how could we mitigate those effects? Maybe we can't prevent them from doing it, though in some cases maybe you could, but we can certainly install safeguards, not necessarily in the technology itself, but by developing tools and safeguards outside it that limit the downside, so to speak. That could be technology for understanding how to make neural nets more robust to all sorts of different forms of tampering, or it could be detection of attempts to do so, etc., etc.
So let's talk briefly about a general intelligence. When I ask people when they think we're going to have one, I get answers between 5 and 500 years, which by the way is quite telling. And then there's a group of people who don't believe that it's possible, and that breaks down into two groups. One portion of them believe that humans aren't mechanistic, they have a soul or a spirit, there is something in them that doesn't necessarily obey the laws of physics, and that is the source of self and consciousness and all the rest, and that can't be manufactured in a lab. So draw a circle around that. There are a lot of people who would say that, that they're not machines.
But then I also find a group of people who don't appeal to that as an argument, and yet say, "we cannot build a general intelligence." Could you make that argument without appealing to anything spiritual necessarily, which could all well be the case, but could you make an argument why an AGI might be impossible, given the laws of science as we know them?
I don't see that actually. My personal opinion is that it's clearly not impossible. I do believe that humans are mechanistic to use that word. I'm not exactly sure we already understand all the necessary aspects of physics overall in order to treat them as such, but given that assumption, I think that it's certainly not impossible. I wouldn't exactly know how to make the argument.
Fair enough. Let me try this on you then: we have brains, and we don't understand how they do what they do, and it's not that we don't understand them because there are so many neurons. We've spent 20 years trying to figure out how a nematode’s 302 neurons make it do... We just don't know how neurons work, they could be as complicated as...
Then we have these minds, and the mind is all these things your brain can do that it doesn't seem like a brain should be able to do, like a sense of humor (your liver doesn't have a sense of humor, your stomach doesn't have one) [but] somehow your brain does. So call that the mind. And we don't understand that. And then we have ‘consciousness,’ and consciousness means we experience the world: a computer can measure temperature, we can feel warmth. Not only do we not know how it is that we're conscious, we don't even know how to ask the question scientifically, and we do not know what the answer would even look like. And so, it seems to me, to say, well, we have a general intelligence, we don't know how brains work, we don't know how our minds work, we don't know how it is we're conscious, but yeah, there's no question we can build one... That seems a stretch to me, so defend that if you're up...
So basically I think, there's a couple of different aspects, number one is, I'm not sure that we want to define—and now we're back to square one—defining artificial intelligence, or general intelligence, I'm not sure we want to define it in such a way that includes these potentially very particularly human aspects, such as, a sense of humor. And if we leave that out, then maybe it actually turns out that humor or even consciousness are just not necessary as phenomena, to be considered an artificial intelligence or an artificial general intelligence.
But the creativity certainly would be, right? I mean there's that scene in I, Robot where Spooner asks the robot, "Can a robot paint a painting, can a robot write a symphony?" And Sonny says, "Well, can you?" So we would expect them to write symphonies and write great plays. We would expect those to have humor and have emotional gravitas, and have all of this human stuff, and furthermore, it could very well be the case that you have to be conscious... So, fair enough, maybe we don't need all those parts to make a general intelligence, but that still doesn't say, well, I know we can build one, even though I don't know how any of that stuff works, I know we can build one.
Yeah, let me also qualify a little bit what I mean by ‘build,’ and what I'm actually, in the long term, optimistic about. I should say, I wouldn't call this a certain fact, but I definitely don't believe it's impossible. When I say ‘build,’ I don't necessarily mean architecting after having understood the machine as a whole, after having understood the roles of all the different capacities/capabilities/phenomena, maybe even emergent phenomena, of that machine, and then devising blueprints for something that, upon construction, actually turns out to be intelligent. When I say ‘build,’ I mean creating environments and machinery that are able to improve over time, explain themselves over time, improve themselves over time, and as a result maybe eventually, with sufficient capacity, become rather "intelligent" and generally intelligent. So, I don't necessarily think of this as something where it's a necessary condition for humans to have understood how it all works for us to be able to, in some sense, replicate some of it.
Fair enough, but, it seems to me what you're saying is, “Okay we have one example of general intelligence, and it stands to reason that some of these capabilities of that one example, like consciousness, that mind, that creativity and that humor may be components.” It sounds like you're saying, “Yes we can build one, we may never understand it, but we can at least evolve it or something like that.” Is that what I'm hearing?
That's basically what I'm trying to say.
So you mentioned emergence though, do you think that human intelligence is fundamentally an emergent property? I mean clearly it's emergent, clearly none of your cells are intelligent, and yet you, the collective you are. How do you wrap your head around that? Do you think it's possible that we're going to just build a computer that eventually becomes so complicated that somehow it has this emergent behavior come about that we don't understand? Is that what you're thinking?
Yeah, that's basically what I'm thinking. I don't feel like I'm qualified to really reason about the mechanisms of emergence in human intelligence, going from individual cells to a full organism, and having this seemingly sudden emergence of intelligence. But I do feel like if we manage to make progress towards something that is maybe a bit broader than narrow intelligence, then it's actually somewhat likely that it'll look like this: that basically putting together a whole bunch of comparatively simple things, and by ‘whole bunch’ I mean a great deal of them, might then give rise to dramatically more complex and potentially difficult to understand behavior.
And we're running out of time here. What, in the end, are you optimistic or pessimistic about the future? You're going to tell me you're optimistic, I assume. And if so, tell me why. Give me the case for that from what you know and what you're doing.
So far, I think if you're asking specifically about the future of this field AI or machine learning...
Let me clarify: so, you have all the promise of it, it'll prolong our lives and it'll help us figure out new forms of energy and we'll solve scientific problems and we'll increase the standard of living of everybody. We're going to increase productivity, we're effectively going to make everyone on the planet smarter. On the other hand, it's a technology that governments can use to spy on people, and it can listen to every phone conversation, works everything out. It can be used in all these other [ways], and somehow you net it all out, to say overall it's going to be good. Make that case in closing here.
I think we've seen that, at least in the longer term, technology overall, and information technology in particular, has had a democratizing effect, distributing power at least somewhat more evenly. I think that with AI-related technologies, that will not be any different. I definitely see these dangers, and at the same time, I see the leaders in the field, some of whom I'm lucky enough to work with here, being extremely conscious of this. They're aware of those dangers, actively working on ways to prevent them or mitigate them or limit them, and really trying to work responsibly. So I don't necessarily see why, with this admittedly maybe significantly different or more advanced technology, it should in the long term be any different.
All right, well I'm in agreement and let's hope that's how it all works out.
Give us a little insight into your day-to-day work day, and what you’re working on. What fires you up in the morning when you get there? What exciting breakthrough have we not talked about yet?
Let’s start with the more applied and more short-term technologies that I’ve been working on, that actually do motivate me quite a bit. When I started working in this overall field, I worked on Google Translate for several years, which back in those days worked quite differently and used quite different methods from what it’s using today. And one of the most exciting aspects of my day-to-day work these days is that I spend a significant part of my time working on deep learning methods that try to improve the quality of products like Google Translate.
And one reason that’s so motivating is because Google Translate in particular is one of those tools that really brings people together and allows broader and more equal access to information, especially in the global sense. And it has a very strong motivating effect on me to see that, when we do studies, for example, to see what people actually use this tool to translate. Of course, we find all sorts of things across the board, but some of the most common translations that we see, or use cases we hear about, are to communicate with new friends, help in precarious situations while traveling, etc., etc. And it’s really quite amazing to be able to have a part in that.
I think one fascinating thing about Translate is that, if you look at the Bureau of Labor Statistics, their estimates for the number of translators we need is skyrocketing and will continue to skyrocket. That the minute you can reduce the cost of something, like translating automatically, to zero it creates this enormous new demand. And then you need human translators to do all these other things, like the specifics of contracts and face-to-face meetings, localization, and customization and all of that.
Is that something which surprises you? That you make something that is as good as a human translator and the world’s going to need more translators.
It’s not too surprising actually. It’s one instance of a pattern that people often allude to when talking about the future of jobs or the future of work: automating a significant chunk of a certain family of jobs might actually not mean that there will be fewer jobs in that area, but might mean that they just become somewhat more specialized, somewhat more focused on the particular differentiating abilities of humans doing a subset of those tasks. So, overall, I think it’s something we’ve seen time and time again over the course of human evolution and civilization: improving technology actually increases the demand for human labor in the same area. It’s not always true, but often it is, and in the case of translation I’m not surprised that it might remain the case for quite a while, until machine translation gets to the point where it can translate a particularly important contract.
Right, I mean the textbook case is ATM machines. You know, there’s more tellers now than there were before you introduced the machine, because the machine lowered the cost of opening branches, and each branch needed tellers, and so forth.
So, is that kind of how it works, you focus on one thing, like machine translation, for a long time and you enhance, enhance, enhance, enhance? Or do you have multiple projects you work on at once?
This is exactly where I wanted to go with this. The interesting thing is that, as opposed to the way we worked on Translate back in 2006, one of the crucial differences between now and then is that the models that we’re using for translation now also apply quite well to dramatically different tasks. Even some that don’t deal with text at all.
So, one of the most exciting aspects of my work now is that even though I’ve invested a lot of time over the years in models that we initially used to improve machine translation, we then started to adapt those models, with sometimes surprisingly small changes, to, for example, do super resolution of images. That is where you take a low-resolution image of a human face and you generate a much higher resolution image of a human face automatically, where the model basically has to invent missing details that the low-resolution version just doesn’t contain anymore.
And this is one of the major advances we’ve seen with the rise of deep learning, that the underlying methodologies when you look at problems as different as image generation, machine translation, things like analyzing network traffic in networks that do intrusion detection—a lot of those problems now rely on extremely similar methodology. And it really changes the way you conduct research in multiple ways, but one way in particular is that you now basically have many completely different applications to evaluate your work in, but also you can use applications in these different areas to draw inspiration for how to improve certain things.
Basically, use different failure cases in different applications in order to understand what is it that these models are missing. What are the currently, truly limiting factors? I think that that is one of the most fascinating things about my day-to-day work—that basically I expect the mathematical models that I’m experimenting with now to not only be applied in Google Translate, but over time in a whole range of different products spanning different media—text, spoken languages, images, increasingly video, etc.
Day-to-day, how much autonomy do you and your team have in charting the direction? How is the ship guided about what you work on or not work on? What are the criteria and so forth?
So, Google Brain, overall, is now a fairly large and somewhat complex organization with different tiers, and those tiers span a pretty broad array of different areas of work. There’s a group that focuses on using AI and machine learning in health applications, a broad spectrum of those. There are, of course, many people who are working on tooling, TensorFlow probably being the most prominent example of that—tooling for enabling machine learning research, AI research, but also for deployment of the fruits of that labor. And then there’s a fairly large research arm. And the researchers within that research arm, ultimately, are extremely free at choosing their directions of research. There are a bunch of themes that emerge largely, and we have some organizational tools, if you wish, to then facilitate larger projects to drive forward research in these specific themes as we realize there is something significant that might require or benefit from lots of people, from a larger team. But ultimately, the individuals on the team are, to a very large extent, self-guided.
I want to thank you so much for a fascinating near-hour. I could have gone on all day, but I hope you'll come back to the show some other time.
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.