Episode 37: A Conversation with Mike Tamir

In this episode, Byron and Mike talk about AGI, Turing Test, machine learning, jobs, and Takt.

Guest

Mike Tamir serves as Head of Data Science at Uber ATG and lecturer for the UC Berkeley iSchool Data Science master's program. Mike has led several teams of data scientists in the Bay Area as Chief Data Scientist for InterTrust and Takt, Director of Data Sciences for MetaScale, and Chief Science Officer for Galvanize, where he oversaw all data science product development and created the MS in Data Science program in partnership with UNH. Mike began his career in academia, serving as a mathematics teaching fellow for Columbia University and graduate student at the University of Pittsburgh. His early research focused on developing the epsilon-anchor methodology for resolving both an inconsistency he highlighted in the dynamics of Einstein’s general relativity theory and the convergence of “large N” Monte Carlo simulations in statistical mechanics’ universality models of criticality phenomena.

Transcript

Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. I’m excited today, our guest is Mike Tamir. He is the Chief Data Science Officer at Takt, and he’s also a lecturer at UC Berkeley. If you look him up online and read what people have to say about him, you notice that some really, really smart people say Mike is the smartest person they know. Which implies one of two things: Either he really is that awesome, or he has dirt on people and is not above using it to get good accolades. Welcome to the show, Mike!

Mark Cuban came to Austin, where we’re based, and gave a talk at South By Southwest where he said the first trillionaires are going to be in artificial intelligence. And he said something very interesting, that if he was going to do it all over again, he’d study philosophy as an undergrad, and then get into artificial intelligence. You studied philosophy at Columbia, is that true?

I did, and also my graduate degree, actually, was a philosophy degree, cross-discipline with mathematical physics.

So how does that work? What was your thinking? Way back in the day, did you know you were going to end up where you were, and this was useful? That’s a pretty fascinating path, so I’m curious, what changed, you know, from 18-year-old Mike to today?

[Laughs] Almost everything. So, yeah, I think I can safely say I had no idea that I was going to be a data scientist when I went to grad school. In fact, I can safely say that the profession of data science didn’t exist when I went to grad school. Like a lot of people who joined the field around when I did, I kind of became a data scientist by accident. My degree, while it was philosophy, was fairly technical; it was more focused on mathematical physics, and I learned a little bit about machine learning while I was doing that.

Would you say studying philosophy has helped you in your current career at all? I’m curious about that.

Um, well, I hope so. My focus was very much the philosophy of science. So I think back to it all the time when we are designing experiments, when we are putting together different tests for different machine learning algorithms. I do think about what a scientifically sound way of approaching it is. That’s as much the physics background as it is the philosophy background. But it certainly does influence, I’d say daily, what we do in our data science work.

Even being a physicist, getting into machine learning, how did that come about?

Well, a lot of my graduate research in physics focused a little bit on neural activity, but also a good deal of it was focused on quantum statistical mechanics, which really involved doing simulations and thinking about the world in terms of lots of random variables and unknowns that result in these emergent patterns. And in a lot of ways that connects to what we do now at Takt; in fact, a lot is being written about group theory and how it can be used as a tool for analyzing the effectiveness of deep learning. Um, there are a lot of, at least at a high level, similarities between trying to find those superpatterns in the signal in machine learning and the way you might think about emergent phenomena in physical systems.

Would an AGI be emergent? Or is it going to be just nuts and bolts brute force?

[Laughs] That is an important question. The more I find out about the successes, at least the partial successes, that can happen with deep learning and with trying to recreate the sorts of sensitivities that humans have, that you would have with object recognition, with speech recognition, with semantics, with general natural language understanding, the more sobering it is to think about what humans can do, and what we do with our actual, our natural intelligence, so to speak.

So do you think it’s emergent?

You know, I’m hesitant to commit. It’s fair to say that there is something like emergence there.

You know this subject, of course, a thousand times better than me, but my understanding of emergence is that there are two kinds: there’s a weak kind and a strong one. A weak one is where something happens that was kind of surprising—like you could study oxygen all your life, and study hydrogen but not be able to realize, “Oh, you put those together and it’s wet.” And then there’s strong emergence which is something that happens that is not deconstructable down to its individual components, it’s something that you can’t actually get to by building up—it’s not reductionist. Do you think strong emergence exists?

Yeah, that’s a very good question and one that I’ve thought about quite a bit. The answer, or my answer I think, would be that it’s not as stark as it might seem. For most cases of strong emergence that you might point to, there are actually stories you can tell where it’s not as much of a category distinction or a non-reducible phenomenon as you might think. And that goes for things as well studied as phase transitions and criticality phenomena in the physics realm, as it does possibly for what we talk about when we talk about intelligence.

I’ll only ask you one more question on this, and then we’ll launch into AI. Do you have an opinion on whether consciousness is a strong emergent phenomenon? Because that’s going to speak to whether we can build it.

Yeah so, that’s a very good question, again. I think that what we find out when we are able to recreate some of these—we’re really just in the beginning stages in a lot of cases—at least semi-intelligent behaviors, or components of what integrated AI would look like, shows more about the magic that we see when we see consciousness. It brings human consciousness closer to what we see in the machines rather than the other way around.

That is to say, human consciousness is certainly remarkable, and is something that feels very special and very different from what, say, imperatively constructed machine instructions are. There is another way of looking at it, though, which is that by seeing how a deep neural net is able to adapt to signals that are very sophisticated and maybe even almost impossible to really boil down, we see something like what we might imagine our brains are doing all the time, just at a far, far larger magnitude of parameters and network connections.

So, it sounds like you’re saying it may not be that machines are somehow endowed with consciousness, but that we discover that we’re not actually conscious. Is that kind of what you’re saying?

Yeah, or maybe something in the middle.

Okay.

Certainly, our personal experience of consciousness, and what we see when we interact with other humans or other people, more generally; there’s no denying that, and I don’t want to discount how special that is. At the same time, I think that there is a much blurrier line, is the best way to put it, between artificial, or at least the artificial intelligence that we are just now starting to get our arms around, and what we actually see naturally.

So, the show’s called Voices in AI, so I guess I need to get over to that topic. Let’s start with a really simple question: What is artificial intelligence?

Hmm. So, until a couple years ago, I would say that artificial intelligence really is what we maybe now call integrated AI. So a dream of using maybe several integrated techniques of machine learning to create something that we might mistake for, or even accurately describe as, consciousness.

Nowadays, the term “artificial intelligence” has, I’d say, probably been a little bit whitewashed or diluted. You know, artificial intelligence can mean any sort of machine learning, or maybe even no machine learning at all. It’s a term that a lot of companies put in their VC deck, and it could be something as simple as just using a logistic regression—hopefully, a logistic regression that uses gradient descent as opposed to a closed-form solution. Right now, I think it’s become kind of indistinguishable from generic machine learning.
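
To make that aside concrete, here is a minimal sketch of a logistic regression trained by gradient descent; the data and hyperparameters are made up purely for illustration and are not from anything discussed in the episode.

```python
import numpy as np

# Logistic regression fit by plain gradient descent on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # 200 examples, 3 features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))          # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)                 # gradient of the log loss w.r.t. weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w                                # one gradient descent step
    b -= lr * grad_b

print(w, b)
```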

I obviously agree, but take just the idea that you have in your head that you think is legit: is it artificial in the sense that artificial turf isn’t really grass, it just looks like it? Or is it artificial in the sense that we made it? In other words, is it really intelligence, or is it just something that looks like intelligence?

Yeah, I’m sure people bring up the Turing test quite a bit when you broach this subject. You know, the Turing test, very coarsely, asks: how would you even know? How would you know the difference between something that is an artificial intelligence and something that’s a bona fide intelligence, whatever bona fide means. I think Turing’s point, or one way of thinking about Turing’s point, is that there’s really no way of telling what natural intelligence is.

And that again makes my point that it’s a very blurry line, the difference between true or magic soul-derived consciousness and what can be constructed maybe with machines; there’s not a bright distinction there. And I think maybe what’s really important is that we probably shouldn’t discount ostensible intelligence that can happen with machines, any more than we should discount intelligence that we observe in humans.

Yeah, Turing actually said, a machine may do it differently but we still have to say that the machine is thinking, it just may be different. He, I think, would definitely say it’s really smart, it’s really intelligent. Now of course the problem is we don’t have a consensus definition even of intelligence, so, it’s almost intractable.

If somebody asks you what’s the state of the art right now, where are we at? Henceforth, we’re just going to use your idea of what actual artificial intelligence is. So, if somebody said “Where are we at?” are we just starting, or are we actually doing some pretty incredible things, and we’re on our way to doing even more incredible things?

[Laughs] My answer is, both. We are just starting. That being said, we are far, we are much, much further along than I would have guessed.

When do you date, kind of, the end of the winter? Was there a watershed event or a technique? Or was it a gradualism based on, “Hey, we got faster processors, better algorithms, more data”? Like, was there a moment when the world shifted? 

Maybe harkening back to the discussion earlier, you know, for someone who comes from physics, there’s what we call the “miracle year,” when Einstein published some really remarkable papers, roughly just over a hundred years ago. You know, there is a miracle year, and then there’s also when he finally was able to crack the code on general relativity. I don’t think we can safely say that there’s been a miracle year, until far, far in the future, when it comes to the realm of deep learning and artificial intelligence.

I can say that, in particular, with natural language understanding, the ability to create machines that can capture semantics, the ability of machines to identify objects and to identify sounds and turn them into words, that’s important. The ability for us to create algorithms that are able to solve difficult tasks, that’s also important. But probably at the core of it is the ability for us to train machines to understand concepts, to understand language, and to assign semantics effectively. One of the big pushes that’s happened, I think, in the last several years, when it comes to that, is the ability to represent sequences of terms and sentences and entire paragraphs in a rich, mathematically-representable way that we can then do things with. That’s been a big leap, and we’re seeing a lot of that progress with neural word embeddings and with sentence embeddings. Even as recently as a couple months ago, some of the work with sentence embeddings that’s coming out is certainly part of that watershed, and of that move from the dark ages in trying to represent natural language in an intelligible way, to where we are now. And I think that we’ve only just begun.

There’s been a centuries-old dream in science to represent ideas and words and concepts essentially mathematically, so that they can be manipulated just like anything else can be. Is that possible, do you think?

Yeah. So one way of looking at the entire twentieth century is as a gross failure in the ability to accurately capture the way humans reason in Boolean logic, in the way we represent first-order logic, or more directly in code. That was a failure, and it wasn’t until we started thinking about the way we represent language in terms of the way concepts are actually found in relation to one another, training an algorithm to read all of Wikipedia and to start embedding that with Word2vec—that’s been a big deal.
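
As a rough illustration of that Word2vec idea, here is a toy sketch using the gensim library (assuming gensim 4.x); in practice the corpus would be Wikipedia-sized rather than a handful of hand-written sentences.

```python
from gensim.models import Word2Vec

# The model never sees definitions, only co-occurrence patterns in tokenized text.
corpus = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
    ["the", "man", "walked", "to", "the", "market"],
    ["the", "woman", "walked", "to", "the", "market"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=100)

print(model.wv["king"][:5])                   # a dense 50-dimensional vector for "king"
print(model.wv.most_similar("king", topn=3))  # neighbors learned purely from usage
```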

And by doing that, now we can start capturing everything. It’s sobering, but we now have algorithms that can, with embedded sentences, detect things like logical implication, logical equivalence, or logical non-equivalence. That’s a huge step, and it’s a step that many tried to take before and failed.

Do you believe that we are on a path to creating an AGI, in the sense that what we need is some advances in algorithms, some faster machines, and more data, and eventually we’re going to get there? Or, is AGI going to come about, if it does, from a presently-unknown approach, a completely different way of thinking about knowledge?

That’s difficult to speculate on. Let’s take a step back. Five years ago, less than five years ago, if you wanted to propose a deep learning algorithm for an industry to solve a very practical problem, the response you would get is, “Stop being so academic; let’s focus on something a little simpler, a little bit easier to understand.” There’s been a dramatic shift, just in the last couple years, such that now the expectation is that if you’re someone in the role that I’m in, or that my colleagues are in, and you’re not considering things like deep learning, then you’re not doing your job. That’s something that seems to have happened overnight, but was really a gradual shift over the past several years.

Does that mean that deep learning is the way? I don’t know. What do you really need in order to create an artificial intelligence? Well, we have a lot of the pieces. You need to be able to observe maybe visually or with sounds. You need to be able to turn those observations into concepts, so you need to be able to do object recognition visually. Deep learning has been very successful in solving those sorts of problems, and doing object recognition, and more recently making that object recognition more stable under adversarial perturbation.
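
He doesn’t name a particular method, but work on adversarial perturbation of the kind he alludes to often looks like this fast gradient sign method sketch, where an input is nudged slightly in the direction that most increases the classifier’s loss; treat it as an illustrative example rather than anything specific to Uber ATG or Takt.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Fast gradient sign method: move each input value a small step in the
    direction that increases the classification loss. `model` is any
    differentiable classifier returning logits; eps controls how large
    (and how visible) the perturbation is."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```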

You need to be able to possibly hear and respond, and that’s something that we’ve gotten a lot better at, too. A lot of that work has been done by various research labs; there’s been some fantastic work in making that more effective. You need to be able to not just identify those words or those concepts, but also put them together, not just in isolation but in the context of sentences. So, the work that’s coming out of Stanford and some of the Stanford graduates, Einstein Labs, which is sort of at the forefront there, is doing a very good job in capturing not just semantics—in the sense of, what is represented in this paragraph and how can I pull out the most important terms?—but also doing abstractive text summarization, and, you know, being able to boil it down to terms and concepts that weren’t even in the paragraph. And you need to be able to do some sort of reasoning. Just like the example I gave before, you need to be able to use sentence embeddings to be able to classify—we’re not there yet, but—that this sentence is related to this sentence, and this sentence might even entail this sentence.

And, of course, if you want to create Cylons, so to speak, you also need to be able to do physical interactions. All of these solutions in many ways have to do with the general genre of what’s now called “deep learning,” of being able to add parameters upon parameters upon parameters to your algorithm, so that you can really capture what’s going on in these very sophisticated, very high dimensional spaces of tasks to solve.

No one’s really gotten to the point where they can integrate all of these together, and is that going to be something that falls under what we now very generically call deep learning, which is really a host of lots of different techniques that just use high-dimensional parameter spaces, or is it going to be something completely new? I wouldn’t be able to guess.

So, there are a few things you left off your list, though, so presumably you don’t think an AGI would need to be conscious. Consciousness isn’t a part of our general intelligence?

Ah, well, you know, maybe that brings us back to where we started.

Right, right. Well how about creativity? That wasn’t in your list either. Is that just computational from those basic elements you were talking about? Seeing, recognizing, combining?

So, an important part of that is being able to work with language, I’d say, being able to do natural language understanding, and to do it at higher than the word level, at the sentence level. Certainly anything that might be mistaken for, or identified as, thinking has to have that as a necessary component. And being able to interact, being able to hold conversations, to abstract, and to draw conclusions and inferences that aren’t necessarily there.

I’d say that that’s probably the sort of thing that you would expect of a conscious intelligence, whether it’s manifest in a person or manifest in a machine. Maybe I should say manifested in a human, or manifested in a machine.

So, you mentioned the Turing test earlier. And, you know, there are a lot of people who build chatbots and things that, you know, are not there yet, but people are working on it. And I always type in the same first question, and I’ve never seen a system that even gets the question, let alone can answer it.

The question is, “What’s bigger, a nickel or the sun?” So, two questions, one, why is that so hard for a computer, and, two, how will we solve that problem?

Hmm. I can imagine how I would build a chatbot, and I have worked on this sort of project in the past. One of the things—and I mentioned earlier this allusion to a miracle year—is the advances that happened, in particular, in 2013 with figuring out ways of doing neural word embeddings. That’s so important, and one way of looking at why that’s so important is that, when we’re doing machine learning in general—this is what I tell my students, this is what drives a lot of our design—you have to manage the shape of your data. You have to make sure that the amount of examples you have, the density of data points you have, is commensurate with the number of degrees of freedom that you have representing your world, your model.

Until very recently, there had been attempts, but none of them as successful as what we’ve seen in the last five years. The baseline has been what’s called the one-hot vector encoding, where you have a different dimension for every word in your language, usually around a million words. You have all zeros, and then a one in the dimension for that word. Say you order the words the way a dictionary would: the first word, ‘a,’ which is spelled with the letter ‘a,’ gets a one in the first dimension and zeros everywhere else. Then the second word gets a zero, then a one, and the rest zeros. So the point here, not to get too technical, is that your dimensions are just too many.
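
A small sketch of that one-hot encoding, with a ten-word vocabulary standing in for the roughly million-word one he describes.

```python
import numpy as np

# One-hot encoding: one dimension per vocabulary word, a single 1, all other 0s.
# With a real vocabulary of ~1,000,000 words, every word becomes a
# 1,000,000-dimensional vector; a learned embedding squeezes the same word
# into a few hundred dense dimensions instead.
vocab = ["a", "aardvark", "able", "about", "above", "abuse", "accept", "ace", "act", "add"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

print(one_hot("a"))      # [1, 0, 0, ...]: the first dictionary word
print(one_hot("able"))   # [0, 0, 1, ...]
```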

You have millions and millions of dimensions. When we talk with students about this, it’s called the curse of dimensionality, every time you add even one dimension, you need twice as many data points in order to maintain the same density. And maintaining that density is what you need in order to abstract, in order to generalize, in order to come up with an algorithm that can actually find a pattern that works, not just for the data that it sees, but for the data that it will see.

What happens with these neural word embeddings? Well, they solve the problem of the curse of dimensionality, or at least they’ve gotten their arms a lot further around it than ever before. They’ve enabled us to represent terms, to represent concepts, not in these million-dimensional vector spaces, where all that rich information is still there but is spread so thinly across so many dimensions that you can’t really find a single entity as easily as you could if it were represented in a smaller number of dimensions; that’s what these embeddings do.

Now, once you have that dimensionality, once you’re able to compress them into a lower dimension, you can do all sorts of things that you want to do with language that you just couldn’t do before. And that’s part of why we see this slow progress with chatbots; they probably have something like this technology. What does this have to do with your question? These embeddings, for the most part, happen not by getting instructions—well, nickels are this size, and they’re round, and they’re made of this sort of composite, and they have a picture of Jefferson stamped on the top—that’s not how you learn to mathematically represent these words at all.

What you do is you feed the algorithm lots and lots of examples of usage—you let it read all of Wikipedia, you let it read all of Reuters—and slowly but surely what happens is, the algorithm will start to see these patterns of co-usage, and will start to learn how one word follows after another. And what’s really remarkable, and could be profound (at least I know that a lot of people would want to infer that), is that the semantics kind of come out for free.

You end up seeing it in the geometry of the way these words are embedded. A famous example is that the king vector minus the man vector plus the woman vector comes out close to the queen vector, and that actually bears out in how the machine can now represent the language, and it did that without knowing anything about men, women, kings, or queens. It did it just by looking at frequencies of occurrence, how those words occur next to each other. So, when you talk about nickels and the sun, my first thought, given that running start, is that, well, the machine probably hasn’t seen a nickel and a sun in context too frequently, and one of the dirty secrets about these neural embeddings is that they don’t do as well on very low-frequency terms, and they don’t always do well in being able to embed low-frequency co-occurrences.

And maybe it’s just the fact that it hasn’t really learned about, or so to speak hasn’t read about, nickels and suns in context together.
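
The king-minus-man-plus-woman arithmetic can be checked against publicly available pretrained vectors; this sketch assumes the "glove-wiki-gigaword-50" vectors that ship through gensim's downloader, but any pretrained word vectors would do.

```python
import gensim.downloader as api

# Load a small set of pretrained word vectors (trained on Wikipedia + Gigaword).
wv = api.load("glove-wiki-gigaword-50")

# king - man + woman: the nearest neighbors should include "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Low-frequency pairs like "nickel" and "sun" are exactly where this
# geometry tends to be least reliable, as noted above.
print(wv.similarity("nickel", "sun"))
```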

So, is it an added wrinkle that, for example, you take a word like set, s-e-t, I think the OED has two or three hundred definitions of it, you know—it’s something you do, it’s an object, etcetera. You know there’s a Wikipedia entry on a sentence, an eight-word-long grammatically correct sentence, which is, “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo,” which contains nouns, verbs, all of that. Is there any hope that if you took all the monkeys in all the universe typing cogent and coherent sentences, would it ever be enough to train it up to what a human can do?

There’s a couple things there, and one of the key points that you’re making is that there are homonyms in our language, and so work should be done on disambiguating the homonyms. And it’s a serious problem for any natural language understanding project. And, you know, there are some examples out there of that. There’s one recently which is aimed at not just identifying a word but also disambiguating the usages or the context.

There are also others, not just focused on how to mathematically-represent how to pinpoint a representation of a word, but also how to represent the breadth of the usage. So maybe imagine not a vector, but a distribution or a cloud, that’s maybe a little thicker as a focal point, and all of those I think are a step in the right direction for capturing what is probably more representative of how we use language. And disambiguation, in particular with homonyms, is a part of that.

I only have a couple more questions in this highly theoretical realm, then I want to get down to the nitty gritty. I’m not going to ask you to pick dates or anything, but the nickel and the sun example, if you were just going to throw a number out, how many years is it until I type that question in something, and it answers it? Is that like, oh yeah we could do it if we wanted to, it’s just not a big deal, maybe give it a year? Or, is it like, “Oh, no that’s kind of tricky, wait five years probably.”

I think I remember hearing once never make a prediction.

Right, right. Well, just, is that a hard problem to solve?

The nickel and the sun is something that I’d hesitate to say is solvable in my lifetime, just to give a benchmark there, violating that maxim. I can’t say exactly when, what I can say is that the speed with which we are solving problems that I thought would take a lot longer to solve, is accelerating.

To me, while it’s a difficult problem and there are several challenges, we are still just scratching the surface in natural language understanding and word representation in particular, you know words-in-context representation. I am optimistic.

So, final question in this realm, I’m going to ask you my hard Turing test question, I wouldn’t even give this to a bot. And this one doesn’t play with language at all.

Dr. Smith is eating lunch at his favorite restaurant. He receives a call, takes it and runs out without paying his tab. Is management likely to prosecute? So you have to be able to infer it’s his favorite restaurant, they probably know who he is, he’s a doctor, that call was probably an emergency call. No, they’re not going to prosecute because that’s, you know, an understandable thing. Like, that doesn’t have any words that are ambiguous, and yet it’s an incredibly hard problem, isn’t it?

It is, and in fact, I think that is one of the true benchmarks—even more so than comparing a nickel and the sun—of real, genuine natural language understanding. It has all sorts of things—it has object permanence, it has tracking those objects throughout different sentences, it has orienting sequences of events, and it has “management,” mentioned in that last sentence, where you have to be able to infer that the management is somehow connected to the management of the restaurant.

That is a super hard one to solve for any Turing machine. It’s also something we’re starting to make progress on. Using LSTMs that do several passes through a sequence of sentences, on a classic artificial-sentence dataset for natural language understanding (the Facebook bAbI dataset, which is actually out there to use as a benchmark for training this sort of object permanence across multi-sentence threads), we’ve made modest gains on that. There are algorithms, like the Ask Me Anything algorithm, that have shown that it’s at least possible to start tracking objects over time, and with several passes come up with the right answer to questions about objects in sentences across several different statements.
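
As a very rough sketch of what a single-pass version of such a reader might look like (the Ask Me Anything / dynamic-memory-network style models he refers to make several attention passes, which this toy omits), assuming PyTorch:

```python
import torch
import torch.nn as nn

class StoryReader(nn.Module):
    """Toy bAbI-style reader: encode each story sentence as a bag of words,
    run an LSTM over the sentence sequence (so facts stated earlier persist
    in the hidden state), then score answer words given a question."""
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)
        self.reader = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.answer = nn.Linear(hidden_dim + emb_dim, vocab_size)

    def forward(self, story, question):
        # story: list of LongTensors of word ids (one tensor per sentence)
        # question: LongTensor of word ids
        sentence_vecs = torch.stack([self.embed(s.unsqueeze(0))[0] for s in story])
        _, (h, _) = self.reader(sentence_vecs.unsqueeze(0))   # read sentences in order
        q_vec = self.embed(question.unsqueeze(0))[0]
        return self.answer(torch.cat([h[-1, 0], q_vec]))      # scores over answer words
```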

Pulling back to the here and now, and what’s possible and what’s not. Did you ever expect AI to become part of the daily conversation, just to be part of popular culture the way it is now?

About as much as I expect that in a couple years that AI is going to be a term much like Big Data, which is to say overused.

Right.

I think, with respect to an earlier comment, the sort of AI that you and I have been dancing around, which is fully-integrated AI, is not what we talk about when we talk about what’s in daily conversation now, or for the most part not what we’re talking about in this context. And so it might be a little bit of a false success, or a spurious usage of “AI,” with as much frequency as we see it.

That doesn’t mean that we haven’t made remarkable advances. It doesn’t mean that the examples that I’ve mentioned, in particular, in deep learning aren’t important, and aren’t very plausibly an early set of steps on the path. I do think that it’s a little bit of hype, though.

If you were a business person and you’re hearing all of this talk, and you want to do something that’s real and that’s actionable, and you walk around your business, department to department—you go to HR, and to Marketing, and you go to Sales, and Development—how do you spot something that would be a good candidate for the tools we have today, something that is real and actionable and not hype?

Ah, well, I feel like that is the job I do all the time. We’re constantly meeting with new companies, Fortune 500 CEOs and C-suite execs, talking about the problems that they want to solve and thinking about ways of solving them. I think a best practice is always to keep it simple. There are a host of pre-deep-learning techniques for doing all sorts of things—classification, clustering, user-item matching—that are still tried-and-true, and that should probably be tried first.

And then there are now a lot of great paths to using these more sophisticated algorithms, which means you should be considering them early. How exactly to tell one case from the other, I think, is partly practice. It’s actually one of the things that, when I talk to students about what they’re learning, I find they’re walking away with: not just, “I know what the algorithm is, I know what the objective function is, and how to manage momentum in the right way when optimizing that function,” but also how to see the similarity between matching users and items in a recommender and abstracting the latent semantic association of a bit of text or an image; there are similarities, and certain algorithms that solve all those problems. And that’s, in a lot of ways, practice.

You know, when the consumer web first came out and it became popularized, people had, you know, a web department, which would be a crazy thought today, right? Everything I’ve read about you, everybody says that you’re practical. So, from a practical standpoint, do you think that companies ought to have an AI taskforce? And have somebody whose job it is to do that? Or, is it more the kind of thing that it’s going to gradually come department by department by department? Or, is it prudent to put all of your thinking in one war room, as it were?

So, yeah, the general question is what’s the best way to do organizational design with machine learning teams, and the first answer is that there are several right ways and there are a couple of wrong ways. One of the wrong ways, from the early days, is where you have a data science team that is completely isolated and is only responsible for R&D work, prototyping certain use cases and then, to use a phrase you hear often, throwing it over the wall to engineering to go implement, because “I’m done with this project.” That’s a wrong way.

There are several right ways, and those right ways usually involve bringing the people who are working on machine learning closer to production, closer to engineering, and also bringing the people involved in engineering and production closer to the machine learning. So, overall, blurring those lines. You can do this with vertically integrated small teams, you can do this with peer teams, you can do this with a mandate, as some larger companies, like Google, are doing in really focusing on making all their engineers machine learning engineers. I think all those strategies can work.

It all sort of depends on the size and the context of your business, and what kind of issues you have. And depending on those variables, then, among the several solutions, there might be one or two that are most optimal.

You’re the Chief Data Science Officer at Takt, spelled T-A-K-T, and it’s takt.com if anybody wants to go there. What does Takt do?

So we do the backend machine learning for large-scale enterprises. So, you know, many of your listeners might go to Starbucks and use the app to pay for Starbucks coffee. We do all of the machine learning personalization for the offers, for the games, for the recommenders in that app. And the way we approach that is by creating a whole host of different algorithms for different use cases—this goes back to your earlier question of abstracting those same techniques for many different use cases—and then applying that for each individual customer. We find, in the list-completion use case, the recurrent neural network approach works well, where there’s a time series of opportunities: you can have an interaction with an end user, learn from that interaction, and follow up with another interaction, doing things like reinforcement learning to do several interactions in a row, which may or may not get a signal back, but which have been trained to work towards that goal over time without that direct feedback signal.

These are the same sorts of algorithms, for instance, that were used to train AlphaGo to win a game. You only get the feedback at the end of the game, when you’ve won or lost. We take all of those different techniques and embed them in different ways for these large enterprise customers.
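
A minimal sketch of that delayed-reward setup, in the REINFORCE style: the reward arrives only once the whole episode is over, yet every action along the way gets credit. The `env` object here is hypothetical, and nothing in this sketch is Takt's or DeepMind's actual implementation.

```python
import torch
import torch.nn as nn

# Tiny policy network: 4-dimensional state in, 2 possible actions out.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def train_episode(env):
    # `env` is a hypothetical environment with reset(), step(), final_reward().
    log_probs = []
    state, done = env.reset(), False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, done = env.step(action.item())          # no per-step reward used
    reward = env.final_reward()                        # e.g. +1 for a win, -1 for a loss
    loss = -reward * torch.stack(log_probs).sum()      # push up actions from winning episodes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```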

Are you a product company, a service company, a SaaS company—how does all that manifest?

We are a product company. We do tend to focus on the larger enterprises, which means that there is a little bit of customization involved, but there’s always going to be some customization involved when it comes to machine learning, unless it’s just a suite of tools, which we are not. And what that means is that you do have to train and apply and suggest the right kinds of use cases for the suite of machine learning tools that we have.

Two more questions, if I may. You mentioned Cylons earlier, a Battlestar Galactica reference for those who don’t necessarily watch it. What science fiction do you think gets the future right? Like, when you watch it or read it, or what have you, you think, “Oh yeah, things could happen that way, I see that”?

[Laughs] Well, you know, the physicist in me is still both hopeful and skeptical about faster-than-light travel, so I suppose that wouldn’t really be the point of your question, which is more about computers and artificial intelligence.

Right, like Her or Ex Machina or what have you.

You know, it’s tough to say which of these, like, conscious-being robots is the most accurate. I think there are scenes worth observing that already have happened. Star Trek, you know, we created the iPad way before they had them in Star Trek time, so, good for reality. We also have all sorts of devices. I remember when, in the ’80s—to date myself—one of the Star Trek movies came out, and Scotty gets up in front of a computer, an ’80s computer, and picks up the mouse and starts speaking into it, saying, “Computer, please do this.”

And my son will not get that joke, because he can say “Hey, Siri” or “Okay, Google” or “Alexa” or whatever the device is, and the computer will respond. I like to focus on those smaller wins, where in some cases we are able to accomplish things dramatically quicker than forecast. I did see an example the other day about HAL, the Space Odyssey artificial intelligence, where people were mystified that this computer program could beat a human in chess, but didn’t blink an eye that the computer program could not only hold a conversation, but had a very sardonic disposition towards the main character. That probably captures the dichotomy very well: there are things that we can get to very quickly, and other things that we thought were easy but that take quite a lot longer than expected.

Final question, overall, are you an optimist? People worry about this technology—not just the killer robots scenario, but they worry about jobs and whatnot—but what do you think? Broadly speaking, as this technology unfolds, do you see us going down a dystopian path, or are you optimistic about the future?

I’ve spoken about this before a little bit. I don’t want to say, “I hope,” but I hope that Skynet will not launch a bunch of nuclear missiles. I can’t really speak with confidence to whether that’s a true risk or just an exciting storyline. What I can say is that the displacement of service jobs by automated machines is a very clear and imminent reality.

And that’s something that I’d like to think that politicians and governments and everybody should be thinking about—in particular how we think about education. The most important skill we can give our children is teaching them how to code, how to understand how computer programs work, and that’s something that we really just are not doing enough of yet.

And so will Skynet nuke everybody? I don’t know. Is it the case that I am already teaching my six-year-old son how to code? Absolutely. And I think that will make a big difference in the future.

But wouldn’t coding be something relatively easy for an AI? I mean it’s just natural language, tell it what you want it to do.

Computers that program themselves. It’s a good question.

So you’re not going to suggest, I think you mentioned, your son be a philosophy major at Columbia?

[Laughs] You know what, as long as he knows some math and he knows how to code, he can do whatever he wants.

Alright, well we’ll leave it on that note, this was absolutely fascinating, Mike. I want to thank you, thank you so much for taking the time. 

Well thank you, this was fun.

Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.