Episode 68: A Conversation with Suju Rajan

In this episode Byron and Suju discuss differences in machine and human learning as well as where machines could take us in advertising, privacy and medicine.

Guest

Suju Rajan has a PhD in Machine Learning from the University of Texas. Dr. Rajan is also currently the head of research at Criteo.

Transcript

Byron Reese: This is Voices in AI, brought to you by GigaOm and I'm Byron Reese. Today I'm so excited our guest is Suju Rajan. She is the head of research over at Criteo, and she holds a PhD in Machine Learning from the University of Texas. Welcome to the show.

Suju Rajan: Great to be here, Byron.

You know, we're based in Austin, so I drive by your alma mater every day almost, so it's kind of like a hometown interview.

That's pretty cool. Go Longhorns!

We're recording this in August and you picked a good time not to be here, you know?

I can imagine. I think when I graduated they actually were at the Rose Bowl (not that they actually won it), so I'm happy I was there at the right time.

There you go. So I always like to start with the simple question: what is artificial intelligence, or if you prefer, what is intelligence?

Let's go with artificial intelligence, because I don't think I'm quite qualified to answer what is intelligence overall. Let's say the classical definition of artificial intelligence, what can I say, it's more 'textbook,' right? So this is where the whole field started off a few decades ago in fact, where the goal was to create intelligence in machines, which was comparable to human level intelligence, and what does that mean? What do we think when we say someone is intelligent, right? So it is the ability for us to reason, to be able to extrapolate to situations that we hadn't been in before, and to come out of it relatively unscathed in some sense. So, the ability to reason, to make sort of facile decisions, to be able to solve a longer term problem than just the task at hand, and the relative information to do this is what I think is the standard of artificial intelligence.

So that's a really high bar, because a simple definition is: ‘systems that respond to their environments.’

So let me take it a step down, so that's to your point, my high bar. Today the way artificial intelligence is being used overall in media and maybe in some portions of even the community is the ability to perform really well at certain specific tasks at a level that is comparable to what a human would do. Now nobody questions, 'is it really human-like'? Because it's within a really constrained environment within the space of the data the thing has been trained on. If you look at some of the tests that they've done, it's in a very narrow domain. Now ‘do we all agree that that is artificial intelligence?’ becomes an interesting debate, but I want to say that the mainstream has focused a lot more on intelligence in very narrow specific tasks, but I wouldn't call it artificial intelligence.

All right, so your particular area of study is a technique used in artificial intelligence, called machine learning. And machine learning, simply put, is: you take a lot of data about the past and you study it and you make predictions about the future, is that a fair oversimplification?

A fair oversimplification, yes.

And so the philosophic implication is that the future behaves like the past, and in a lot of cases that's [true]: what a cat looks like tomorrow is probably what a cat looked like yesterday. But what a cellphone looks like tomorrow is not what a cellphone looked like 10 years ago, right? And how Chess is played tomorrow is the same as it's been played for 400 years, so, that's a really good application of it. What are some good applications of AI and things that aren't so good?

Okay, great question here again, so, I think you've sort of nailed the whole answer. So imagine that your goal is somewhat fixed, right? And we as humans know what that goal needs to be. So if you could figure out that all that you needed the system to do was to recognize cats in a picture (and this is a very, very well defined problem), maybe we mess up how we train the model, maybe we are not careful about how it can be adapted and so forth, but within the scope of these sorts of problems, the goal is really well defined.

Chess, for all its beauty, is still a constrained problem, right? There is a fixed space that you can explore and maybe I'm over trivializing this, but in some sense it's a constrained problem. It's here that we have made lots of good progress and at least the algorithms that we are inventing are enabling us to make lots of progress in that sphere.

Now what it is not good at is to be able to do a longer term task. So imagine that, there was this interesting problem that someone was talking to me about… If you wanted to graduate from a school with a good GPA or if you wanted to land a specific good job, now what is the set of courses that you would have to take, how would you have to perform, and so on and so forth? But the kind of data that we had to solve this particular problem through an AI system, it became so trivialized that it was almost laughable, the sorts of things that came out of it.

So in terms of a long term projection where the path is pretty fuzzy, and it really comes down to human experience and having to talk to bunches of people and constantly learning and readjusting and so on and so forth. These sorts of longer term goals in which the end state is not as clear, we have a long, long, long, long way to go.

Do you think that human language can be understood by a computer? Can you predict the next thing I'm going to say by studying everything everybody's ever said before in the history of recorded words?

Okay, can the computer predict it? Yes, with maybe a high probability. In fact, if we had taken all of Byron's conversations over lots and lots of time, we would be able, to a good extent, to model what your next statement could be. But would it be able to understand the context in which you are going to say it, and to have a meaningful conversation off of that?

Again... let's call it a longer term objective, right? So within a shorter term context, right now, what would you say as the next word in a sequence of words? So within that window, maybe we might do a pretty good job, or maybe some phrases that you might use, as fantastic or whatever, so these sorts of filler things, just because by the frequency in which they appear and the context in which you say it, maybe yes. But can a computer have the sort of conversation we are having, where it's able to reason over long sentences, relate back to the fact that I said something about Austin, inject it? I don't think we are there yet.

Banana, banana, hedgehog. Now, no machine would have ever predicted I would say that next, would it have?

No it wouldn't have, no.

And likewise, if I started a sentence and said, "Today I feel really..." It doesn't have any way... it could know every time I've ever said that before in the past, and it would have no insight whatsoever as to how I would feel today, right?

Okay, but, this is going to sound sort of creepy, not to say that people are building this in any way—again, the notion is not to understand everything that's going on—but imagine we had this sort of complex system that knew, and if it had monitored you, this is where the big catch is when we talk about how intelligent these things can be, and if you were to personalize it to this level.

Imagine that we could have observed Byron and the way he uses language in a variety of different contexts, but see it's not just the speech patterns, it's the context in which you are saying things, right? And maybe if I take it a level creepier, if we had monitored how you did chores around your house, maybe you woke up mad, and maybe it even comes down to the physical signs. Maybe you have an elevated blood pressure, your temperature is a bit high and your pulse high, etc… If we had gobs of these sorts of context, then maybe it's possible to say, “Hey, in the context of these things, given work he has done in the past, this is what he would predict.”

But the amount of signal that needs to go in, and the amount of data that you would have had to have to make this sort of a prediction, is almost insane. To your point then, it comes down to ‘do you even trust a system which would look at you at that level?’ I'm going to say it's not infeasible, but again, requires a lot of work to be done.

Well that's really interesting, because I guess your thesis is that if you had cameras that saw everything I did, and read every emotion in my face and, to your point, my blood pressure and my pulse, and the tone of my voice and all of those things...

Exactly, right? So I want to counter it with, I don't know, a good friend or a spouse: at some point we kind of know what this person is going to say, right? What makes that possible? It's years and years of observing how Byron behaves in certain settings and maybe if you have been friends with someone for longer, and you've come into a restaurant and you're looking mad, they can look at your face and they kind of know that you're mad, right? And they know what Byron's ‘feeling mad’ vocabulary sounds like, but how do we as humans even learn this? It's observing how you behave in these contexts and getting all of these cues, it's not just based off of the words that you say.

So, I asked the question about language because, you know, the Turing test is a commonly known [indicator of whether] a machine can think. Turing said that if you're chatting with it... like text messaging it, and the computer gets you to pick it 30% of the time, you have to say that the machine is thinking. I guess what my question’s boiling down to, is if you had to build something that passed the Turing test, would you use machine learning primarily as the tool to do it? And the only reason we don’t have that now, is we simply don't have enough data.

It's not just a question of data. Do we have algorithms that can do what we call 'causal inference?' Do we have algorithms that are able to have a good sense of how probable any possible situation is? As an example, and again your oversimplification was right, as I said: as long as your task is well scoped, you will be able to define a metric for it, and you can train a model. So machine learning works perfectly in these cases.

Now, for things like a long haul conversation, what is even the metric that you would tune the model on? It goes beyond the scope of the sort of research that we have done in this. Maybe we have started to think on those lines, but it is research that still needs to be done, right? So I'm not going to be able to tell you it's just machine learning with lots of data. New fields of research even have to be invented for this sort of thing to happen, so it's not machine learning as we know it, which is going to solve this problem.

The first thing I always ask whatever bot I come across, is "what's bigger, a nickel or the sun?" And no bot's ever answered that, and we understand why a person would know a nickel—I'm referring to the coin—because it's round like the sun. But here's a hard question for a computer, which is: “Doctor Smith is eating lunch at his favorite restaurant when he receives a phone call, where he rushes out the door neglecting to pay his bill, are the owners likely to prosecute?” And a person takes that all apart and says, "Well, Doctor Smith's eating at his favorite restaurant, and he's a doctor, he probably just got an emergency call and so, no, they're going to not prosecute him. He'll settle up another time."

But what I just heard you say is [for] a question like that, we don't even have the basic tools to teach a computer to answer that question, and we're going to need new algorithms and new data and new ways of thinking to do it.

Right, so when your question was as simple as "Can you predict the next word I say?" even for that I had to say, "Hey you're going to forget my..., blah, blah, blah." Right? But the whole context of this thing, yes, somehow as humans, we have learned, right? Maybe because we have been in similar situations, we know when humans have different relationships with owners, with the proprietors, the knowledge that goes into it, and how do you represent this knowledge in a way that computers can reason with it? Yes, we have barely barely scratched the surface.

I'm going to ask you a series of questions now. How do you think people are like machines? Because I can confidently say if you hit your thumb with a hammer, it's going to hurt, and then somebody would say, "have you ever hit your thumb with a hammer?" And I'd be like, "oh yeah, of course," and then they would say, "when?" And I can't remember any time in particular. So somehow humans are really good at taking data and somehow extracting metadata or learning or something out of it, like we can remember the conclusion: “Don't hit your thumb with a hammer,” without having to remember ever having done it. Do we do anything like that in machine learning? Or is it just simply, brute force studying of data?

I'm hard pressed here to come up with a good analogy. Are we able to reason in a way or be... Okay, I don't think so, at least I'm not aware of any work that is able to extrapolate even at this level and I'm going to give you a pretty weak answer over here. Maybe the closest that I can think of is just in terms of one of the most talked about things, like language translation as an example, right? So maybe they do not construct sentences in this way, and try to translate it to other languages, but, even that doesn't quite compute. No, so my answer is going to be ‘no.’ I don't think we have machine learning algorithms which are able to extrapolate in the way you're talking about, because of this context. I cannot apply it in a different context just because I remember it, no.

It's almost as if machines are able to play Chess, but they're not playing Chess in any way shape or form. It's kind of like you can use a microwave to make a corn dog or you can put it in the oven, and they both create a hot corn dog, but they don't work anything alike. And so I'm curious with people, you can train a person with the sample size of one. I could draw a realistic drawing of an alien with tentacles coming out of its nose and 14 feet and 9 eyes and whatever, and then I could say, "Find that in this series of photographs," and even if it's upside down or half obscured or what have you, I can go, "there it is, there it is, there it is..." So, why can't we do that with machines, why do we need so many examples of something to teach a machine, but a person can do it with one?

Awesome question. I'm going to borrow a little bit of what I heard at [a conference] today, right? Alex Polla, one of the leading researchers was talking about the perfect storm, and how there is this good confluence of the data and the infrastructure and the tools, which also sort of influence the algorithms that we are coming out with. And I'll stop my reference to him, but, going on with my personal take on this, so if you look at the algorithms that have come out, that is driving some of these recent things about how computers are able to do a fantastic job at recognizing objects and whatnot, it really comes down to it learning from gobs and gobs of data.

The sort of algorithms that you need to fit this giant convolutional neural net with so many different parameters, and to be able to train a network at that scale, you really need an enormous ton of labelled data even to pass it back and forth so that this network even converges. Now that's one aspect of machine learning and by all means, it serves its own purpose, it's a good field of exploration and we have the compute power. Let's go along that, because it's serving some specific need.

Now flip it aside. Is this the most efficient way to train something? Perhaps not. Should we reimagine modelizations which enable us to do these tasks more efficiently without using as much labelled data? Absolutely yes, but again this is research that needs to happen, because at the end of the day, the sorts of algorithms that they have been playing with, which are more mainstream now, require gobs of data, right? And I don't think… it's an interesting question.

If you were to put a gun to my head and say, "Hey, come up with something that recognizes a cat with just one example of it?" I'll completely overfit and obviously it's going to fail. So, yeah as a field, we need to evolve more in terms of how these things are designed. But I want to add on a little bit as well, right? So if you see what has driven most of the progress in this field, it stems from the ability to have tons of data, right? The fact that AI is having a second resurgence, (a) is because of the compute power in fact we now have, so we are able to train these giant networks, and (b) it's also access to the tremendous amount of data that we are able to collect as well—the fact that searches are worth several billions possibly a month.

So the ability that they're having this sort of data, the amount of digitized text, the amount of online photographs that have been tagged, that are being made available, so maybe it's the richness of data that is making us think about, “Hey we have all of this, now what are core things that we can build from having access to this data?” Maybe that is what is driving the current field of work in a certain sense, and it's going to take us a while to now step back and say, "Hey maybe it is not as efficient, what can we do better if we had less?" Because right now we are suffering from a richness of data.

If I were to ask you to imagine a trout swimming in a river, and imagine a trout in a jar of formaldehyde in a laboratory, okay? These are two things you probably don't have a bunch of familiarity with I'm guessing. [If] I were to say, “are they the same temperature?” You would say, "well no." Do they smell the same? "No." Do they weigh the same? "Yes." Are they the same color? "Maybe." And I could just throw a bunch of things like that at you, and with little experience, little familiarity with the topic, you can answer all the questions almost instantly and nail them.

What we're doing there I'm guessing, is taking a bunch of knowledge, but you may not have a lot of familiarity with anything in a jar of formaldehyde, so your brain is saying, "well I'm just going to imagine it in fluid, and I know formaldehyde has an odor maybe. And I've never seen a fish in a river, but, I know what animals are like." And so we do all of these multi-level transfer learning things that we don't ever deliberately think about. It just comes so naturally, and it makes me wonder if we have this incredibly sparse matrix of data, and we have this incredibly rich knowledge that comes out of it because we're so good at just intuitively applying something here from there.

You bring a great point to this. I want to say something about Tom Mitchell, who is a professor at Carnegie Mellon University (CMU), who was doing this work on NELL, which was ‘Never-Ending Language Learning.’ Striking a chord of thoughts over here, so why am I able to answer these questions? So, it's life experience overall, the fact that you have heard somewhere that trout is a fish and this is what fish do in rivers, and you know about labs, you know about animals in jars, and so, you've seen this, and somehow you have all of this experience that you're able to reason out of.

Now, that's a lot of data, right? And so, what Tom did with NELL was basically to use textual data, so all the news articles of the world and all of the written stuff, to compute something that is always learning, learning about relationships with an entity, what words appear in the context of other words, so you're able to reason on top of this sort of knowledge base that you can build. But think about it, it's just one aspect of it right, so, you might have to go find gobs and gobs of written text about how animals in the wild differ from animals in labs, and to be able to infer, ‘hey, there is this smell associated with it.’

To some extent, we as humans are learning from a tremendous amount of sensory signals as well: the things that we see, the things that we smell, they're experiences that we have had which sort of help us to see, to your point, transfer from one environment to another. Maybe that's what is helping us, and even though I'm saying it's ‘data richness,’ maybe I should have qualified it to say, ‘data richness in the form of written text.’ Maybe we are getting there with voice-based communications, but in some minimalistic way still, but how do you encapsulate everything that you learn as a human? People always keep saying, "How do babies learn from these few examples?" But it's not just that, the thing that you observe. How do we encode this information, get it to talk to other parts, what's the architecture that you would have to build? Becomes an interesting problem, and in fact, an amazing problem to think about.

Well I will only ask one more human question, which is: If humans do this, if we hit our thumb with the hammer, and we don't remember it but we remember the learning, and somehow that information is encoded in a brain in a way that we don't understand, because there's not a location where the fact hitting your thumb with the hammer hurts, there's no location in my brain where you say, "ahh, that's that, if I cut that out, you'll no longer know that." Then we're able to like cross-pollinate all this knowledge effortlessly, like we're not even thinking it through. We just "instinctively," know these things, and then we have all of this knowledge. Isn't that so dissimilar to what we do with machine learning—that we can't really study the brain to know any more about how do we make better machine learning algorithms?

So maybe this is a good segue in some sense. I guess the larger question we have to ask ourselves is… we have so many interesting problems that we need to sort, with data, there are so many ways in which life can be made more efficient, if they used existing tools per se, to solve some of the interesting problems. What is the end goal we are getting to here? Is it to create a replica of a human mind and a human brain?

No, no, I agree completely with where you're going, and that's what I was going to say… What is it that we do something that seems almost magical and we get these kind of amazing results, but machines do something else completely different? They do math very quickly, and so, flipping that coin over, what are things that machines can do now that no human can even come close to? What kinds of things are these machines going to let us do that all the people in all the world would never be able to do?

I'll give you an example straight out of my current application that I'm working on, or even anything else that I've been thinking of... a search engine as an example. Imagine that you literally had - when did I look this up, maybe a couple of days back - close to 1.9 billion websites in the world. Imagine that I told you, as a human, given a search query, to find me, out of these 1.9 billion websites, the particular webpage that best answers this particular person. Can we as humans even think about solving this problem? It's impossible, right? Even if we put together all of our lifetimes in some sense.

These are the sorts of tasks that machines are doing a great great job at these days, that humans are not even thinking about. Now the question is, “Do we need to spend our time thinking about that, or do we try to make lives more efficient in other ways, so, really making sense but not necessarily reasoning?” Maybe trying to find patterns which humans have encoded in some way, to be able to retrieve nuggets of information from large treasure troves of data is what I think machines are doing a really good job at these days.

And so what are some examples of things that, whether we have the data yet or not, paint me a picture of all the good things that you believe we're going to use machines to do. We know they'll diagnose disease, they'll match it to cures better... but what are some other ones? Just stimulate our minds for a moment with your wishlist of problems you would love to see machines focused on.

Anything that has got to do with human loss of life. As an example, let's say [regarding] the whole California fires [situation], the point that I was trying to ask myself is, if they had lots and lots of good sensor data about what the wind movement was like, what the soil moisture level could have been, the precipitation patterns over the years, if they could sort of have pulled together all of this knowledge… Of course we didn't have sensors for a lot of these things, and fires don't happen, thank God, as often. If they could have pulled together all of this, can we come up with a model that can prevent this sort of large-scale human devastation?

I think the sensor technology is there, it's cheap enough, the compute power is there, they are able to model these things a lot more efficiently. What is the direction in which the wind can move? People working in climate sciences have been doing this for a very, very long time, so places where human life can somehow be saved. I would love to see machines being able to help us more, maybe they're even able to prevent—it's very hard to predict where exactly a fire is going to happen, that's almost impossible in some sense—but, once it happens, what could be a good containment plan to prevent this from spreading further? Again I'm trivializing the whole problem, and maybe I don't understand the space enough, but if we could do something on that scale, that would be something that would be amazing for me.

The second part is of course health, which is getting a lot of good attention these days. I'm going to share something about pregnancy: for the number of humans who give birth to children, it's still a very unknown process, and there is a lot of anxiety about whether the kids are developing well. There are all sorts of these loss of life [cases], which could have been avoided if they had somehow gotten the data together; and heart attacks as well, in the sense that the sensations that you get before you actually have a heart attack, are pretty… but this is something I would love to see. I'm hearing of folks putting together all of these physical symptoms, the medicines that you have been taking, what are your sensations now? So, between the point that you start getting the feeling of how your heart is not doing well to what would be the next steps before you get to a hospital... So anything that can help us, what can I say, ‘reduce the unnecessary loss of human life,’ is where I would love to see machine learning take us for the better.

You know the tricky thing about it of course is that the same tools that we build to read through all of ...

Can be used for very bad reasons...

Right, it used to be that everybody's privacy was protected because there's just so many people and so much data. But now, every phone call can be voice to text, and now AIs can read lips, so you've got cameras everywhere... and facial recognition is coming into its own, so that a despotic government will find it ever easier to build profiles on every single person based on every word they say and all of that. I'm an optimist by nature, I write optimistically, but there's no easy answer to that, is there?

No, you're perfectly right and I think for every good thing that you want to create, there could be nefarious elements that want to use the data in a not so nice way, right? Of course I think anywhere where there is money to be made, unfortunately people get creative about how you could misuse or abuse the data. Now how do we prevent this from happening? And also to the larger question of what is my data being used for? Maybe I don't care about having my data shared for whatever reasons, it's a personal choice at the end of the day it's my signal, it's my data. It's a question I think we as a society need to start thinking about soon, because with the more development that we have with AI, the more data that is possible to get collected.

Right now, even the watches that we wear on our hands are capable of collecting how good your heart rate is, how much are you moving, is it a healthy lifestyle, this is already being done. And we rely on the collectors of this data to stand by their promise that yes, we are not going to use your data in some sort of nefarious way. But who enforces it? What happens when push comes to shove?

These are questions that I think as a society we haven't even started thinking about, but in all fairness, for us to have evolved to a state where we understand what a law about something has to be… that thing needs to have existed for a while. And this whole revolution with data and all of the things that we can do with it, the possibility of doing these things, if you think about it, is fairly recent, and in fact has happened within not even our whole lifetime, right, a part of our lifetimes.

I think they are at the cutting edge in some sense, and it's heartening to know that companies want to come together to create an ‘AI for good.’ What are these policies? It needs to have a lot more voices on the table; it needs a lot more participation than just a few big companies putting together a consortium of sorts, even though they control a lot of the data, but how do we re-imagine the field going forward? We have our work cut out for us, but it's an exciting time as well.

Do you think that privacy as an idea was just a really short lived fad? We used to live in these small communities, 300 people or whatever, and everybody knew everything about everybody else, and you lived with your parents and your grandparents, and everybody knew everybody's business. Then the industrial revolution came along, people moved out, they moved to the city, anonymity was born, they got an apartment, and since then, people have had a lot of privacy.

Now, all these tools make our lives radically transparent again. Do you think the normal state of humanity is no privacy, and we're just returning to that and you kind of have to get used to it, or not?

Okay maybe it's a weird thought experiment, but even in your version before where it was a small community where everyone knew everything in some sense, I would still say we had some control. It still was up to you to decide yes, people knew when I went in, when I went out, but they didn't know anything more than that. They possibly could not have seen what books did this person read in their own leisure time, what music did they listen to.

Alternatively, yeah, they know you went to a museum, so in some sense you still had a notion [of privacy]. I would not say that we never were private; maybe there was a semblance of that within a small community, and of course your family knows a lot about you. But these days we express a lot through online media and whatnot, and that gets us into trouble more often than we would like, but I would say humans have always had a notion of privacy, and we always will want some amount of privacy and control of what everybody knows. The further away, the bigger your network gets to know, the more uncomfortable we become, and I don't see that changing personally speaking.

So you're the head of research at Criteo. Tell us a little bit about your company and what you do there, what are you researching?

Okay, so what Criteo does is it helps the brands of the world and the retailers of the world to get the word out about the products that they have, to the users around the world. To give you an example, and maybe this is an interesting one, so let's say, as an example we have a big box retailer or let's say a maker of some fancy shoes and they have this new product, and they want to figure out who is going to buy this product, so they can show it to them and convince them to purchase.

At the end of the day, advertising is what gets people aware of what is being sold, and that has always been the format played since ages past. Now the question is, “How do you reach people online in a meaningful way?” So obviously if you're reading some very not so great article, or it's in a very different context, showing an ad to you doesn't make sense… When does it make sense for us to show an ad about a product that you're likely to purchase?

It needs to have some sort of a value to the advertiser to show the ad to you as well, or maybe there's awareness in some sense that they want to tell the consumers of the world, “Hey, we are here and here is the shoe that we buy,” so how do we make that connect? Maybe I live in a very hot part of the world, and I'm never going to buy a woolen coat as an example. It's meaningless to say, "show this ad to this person out there," because you're never going to convert. So, how do you find the right people who possibly could be interested in the products that advertisers and retailers and brands want to get them in front of so a connect can be made and hopefully everybody's happy on both sides? That's what we try to solve.

And so as the head of research, what are some of the kinds of practical problems you tackle?

Okay, so exactly the scale of the data, so these are anonymized sources in some sense. If you think about the sorts of data that we have, Criteo as an example fields close to 200 billion ad requests a day. That's 200 billion, so we need to be able to decide, hey, do we even get to show an ad at this particular instance? How much do we have to pay for this particular opportunity so that it's meaningful to the advertiser and there is a reasonable chance that there is value that is going to be generated from this transaction? The scale of that, it has to happen, and after we do the number of personalized ads that we show so that it's meaningful, that the woolen coat is shown to a person who is living in a cold country, is actually 3.5 billion.

So how do we make sense of all of the data so that we are able to show the right ad to the right person, is a challenging problem overall. Going back to something that we were touching upon earlier, whenever money is involved, there is always this element of people getting creative, so the advertising ecosystem for the most part is a pretty complicated mechanism, because money is involved. There are always publishers of sites online who are trying to monetize.

Another thing we as humans have to work on is the economics of free content in some sense. Today what drives that is online advertising. The publishers, the people who put up the content (the New York Times, the CNNs of the world, and I guess even the GigaOms of the world), show ads, so this is how you monetize. Now the question is, they are trying to make the most money, the advertisers obviously do not want to overspend, they only want to spend the right amount of money so that they are able to drive customers to come to their store, to buy things at their store, so there is a sort of a conflicting thing. There is a part of the group which wants to make a lot and there's a part that needs to be efficient, so how do you balance this whole transaction in a way that everyone's happy at the end of the day?

As you can imagine, trying to make everybody happy is a hard problem, so, I'm oversimplifying this, but how do you come up with the algorithms that are sensitive to changes in the market, which still try to maximize the revenue goals for our advertisers, is fairly challenging. It requires us to be thinking outside of some of the traditional machine learning algorithms as well, and of course, the scale, because at any given point, being able to field as many ads as the ones I was talking about, it literally means that we have less than 10 milliseconds to make a decision sometimes. The latest greatest deep net model with 100 layers would not suffice, so you still need to be intelligent about the features that go into these models. Creating new areas of research around these topics is what the group focuses on.
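To make the bidding question above concrete, here is a minimal sketch of one common way to price an ad opportunity: estimate the probability of a purchase with a cheap model, then bid that probability times the value of a sale. This is an illustration only, not Criteo's method. The feature names, the weights, and the function names are all hypothetical; a small linear model is used simply because it is fast enough to score within a tight latency budget.

```python
import math

# Hypothetical weights for a tiny logistic model (made-up numbers, not learned from real data).
WEIGHTS = {"bias": -4.0, "viewed_product_recently": 2.0, "cold_climate": 1.0}


def purchase_probability(features):
    """Estimate the chance of a purchase from a dict of feature name -> value."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def bid(features, value_of_sale):
    """Expected-value bid: probability of a sale times what the sale is worth."""
    return purchase_probability(features) * value_of_sale


# A shopper in a cold climate who recently viewed the woolen coat gets a higher bid
# than a shopper about whom nothing is known.
warm_lead = bid({"viewed_product_recently": 1, "cold_climate": 1}, value_of_sale=100.0)
unknown = bid({}, value_of_sale=100.0)
```

The design choice here is the trade-off mentioned in the conversation: a handful of well-chosen features in a simple model can be evaluated in microseconds, whereas a very deep network may not fit the time available per request.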

So do you believe that the kinds of technologies you're building have general purpose beyond your specific application? Are you building tools that can be used to solve all kinds of other problems, or are the kinds of problems you're tackling really very narrow to the task at hand?

Great question. This is something that is interesting, so I think of machine learning research overall as being transferable in some sense, that the algorithms that you come up with are not necessarily driven by the application, application being advertising versus music prediction, versus healthcare prediction and so on and so forth. The short answer is, no we are not building very specialized algorithms to this use case.

As an example, let me talk to you about recommendations. After we decide an ad needs to be shown, we need to figure out what product we put in that ad which makes sense. The point of this whole exercise is to understand user preferences, so you match them up. Now where do you see the commonality to this problem? Almost anything else that you do. Music that you hear, the news articles that are being recommended to you, maybe the movies that Netflix wants to queue up. So in a broad sense, the tools that we build are transferable to other parts or other similar tasks. Of course advertising has its specific constraints, but the constraints are not baked into the model itself for the most part.

So you know, I think everybody is on board with, "I don't want to be shown a wool coat if I'm living in the tropics." A lot of people find that to be good, and the ad you click on isn't, if you click on it you're like "ooh, all right," and so I guess in theory I would want to click on, I would want to find more things that were click worthy as I shop.

I certainly find on Amazon, that [when] I was ordering something one day, I don't even remember what, and it said, "Do you want these robot salt and pepper shakers that you wind up and they walk across the table?" And I'm like, "yes I do." And you know, I never would have found those on my own, but luckily somebody ‘like me’ discovered them and bought them, and now they got my $11 too.

I'm kind of curious what does that alchemy look like? What sorts of factors do you find are ‘telling’ in terms of predicting what I would be interested in? You mentioned geography, but what are all the variables that you're going into (because you're not monitoring my heart rate at this point...)?

No, so perfect example. You said that thank God there are some people like me who also expressed an interest in these little robotic salt shakers, which sounded pretty cool by the way. In this sense a lot of it comes from wisdom of the crowds, so as much as we would like to believe that purchase patterns are so very very unique, it's easy to think that as humans there are other people who are possibly similar in the way we purchase things at least. It's definitely true of life stages in the sense that when you are moving out to set up your own apartment, when you end up getting married, when you end up having babies.

There is a lot to do with context in which these purchases are being made, which you can extrapolate in aggregate from, you can learn from the wisdom of the crowds. So besides the geographic aspect of it, the second thing that I want to highlight a lot more, is the sequential nature of our purchases. Yes the robotic salt shakers were pretty cool, but… when I had a baby a while back, the sorts of purchases I had to make… of course I had no clue. You start looking up what were the reviews on this, but given what I skew towards, maybe I only buy things from a certain specific brand, or maybe I'm interested in a particular facet that it's non-BPA and so on. You would have been able to infer these sorts of things.

Again, you don't need to know my specific preferences, but you can learn from the wisdom of the crowd. It comes down to some aspect of ‘what are your preferences over a period of time?’ What is the context in which you're currently shopping? As an example, and quite a bit is what we call the ‘collaborative filtering’ aspect, what are people like me also interested in? [These] are some of the factors that go into the recommendations.
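The "people like me" idea can be illustrated with a very small collaborative filtering sketch. It scores items a shopper has not bought by how similar the shoppers who did buy them are, using Jaccard overlap of purchase histories. The data, the names, and the choice of Jaccard similarity are assumptions made for illustration; production recommenders are far more elaborate.

```python
from collections import Counter

# Hypothetical purchase histories (illustrative data only).
purchases = {
    "alice": {"printer paper", "robot salt shakers", "coffee mug"},
    "bob": {"printer paper", "robot salt shakers"},
    "carol": {"printer paper", "stapler"},
    "dave": {"wool coat", "scarf"},
}


def jaccard(a, b):
    """Overlap between two sets of purchased items, from 0 to 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories, top_n=3):
    """Rank items the user has not bought, weighted by similarity to the shoppers who bought them."""
    own = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0:
            continue  # shoppers with nothing in common contribute no signal
        for item in items - own:
            scores[item] += similarity
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("carol", purchases))
```

Carol has only printer paper in common with Alice and Bob, but that is enough for the salt shakers they both bought to rise to the top of her list, while Dave's unrelated purchases are ignored.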

Are there factors you deliberately wouldn't use, that at some level, you think they're just too controversial or they're too invasive or what have you?

It's a great question, so, at least at Criteo our focus has been pretty much on the retailers' side of things and maybe when it comes to really really specific aspects, I don't know if you remember this story from a while ago, about Target sending an ad to...

Right, the woman who was going to have the baby before the family even knew it?

Exactly. These are the sorts of things that advertising as a field needs to watch out for. How do you make sure that you don't end up putting people in some sort of a compromised situation, is the part that I feel we need to pay a lot of attention to in some sense, but this is a hard problem. I think Target's problem was a little bit more, because it was also a flier that they sent, so not necessarily an ad that was shown on her device. They are not necessarily good at figuring out, who was looking at the device right now, so, if that ad had shown on that girl's phone, and if her dad happened to be looking at that phone, maybe because it might have looked random, and it could have been explained away in some sense.

But I wish, as a field, advertising needs to be a lot more cognizant of ‘where do we pass the line.’ Recommending a woolen coat or robotic salt shaker seems innocuous enough. But when it is something a lot more personal, then we need to be a lot more careful that the user is really interested in this, as opposed to us trying to be too smart about it.

I see, but even in your example about shopping for stuff for your child, your newborn, that would be in that area that you could [object to], in theory.

But it's something that I'm choosing to do, right? Like I'm happy to engage with these sorts of ads because I really need help in figuring out what products to buy. So, maybe this whole notion that user consent, that I wanted to be shown ads of this type, would really help, and having been in recommendation systems for a long time now, everybody complains when recommendations are dumb and stupid, but as humans we somehow expect machines to be smart. Like how can it show me this dumb ad, or how can it show me this dumb article, but the cognitive overload to get that feedback in… and in fact in my previous company, we had tried proposing all sorts of controls to let humans control their feed.

So tell me that you're not interested in this entity, that you don't want to see articles like this so I can learn better. You would imagine that people used these tools a lot more to curate their experiences, but the uptake that we had was so trivial and it was not for lack of better UI design, because we tried out many different variants of it in some sense. There is this, I also want to say unreasonable expectation that we expect things to be perfect, and how do you draw that fine balance without asking for too much feedback is an interesting problem.

Retargeting isn't so much ethically problematic, but it certainly can be annoying. I was on this office supply place website to order a ream of paper for my printer, right? And I saw that you could order one ream, or a box of five or a pallet, and I was like, “I wonder what a pallet of paper costs?” And so I like click on this pallet of paper and sure enough, it's $1900 worth of paper, it would fill my garage kind of thing. Then I clicked back, and then for like two months, every website I go to is trying to sell me a pallet of paper, because I made the mistake of looking at it. I'm not going to buy a pallet of paper people... Luckily we're getting [to] a place [where] the worst thing is when you buy something, but the system doesn't know you bought it, and so it keeps showing you ads...

Yeah we also call it ‘the fridge problem’ right? Hey I've already purchased a fridge, I am not going to need one for the next five years, stop showing me ads. So, these I think are commonsensical things. Hopefully the ad that you were seeing was not from Criteo. I sure hope so. If it is, please send me an email the next time.
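One simple way to picture a fix for the fridge problem is a suppression rule: once a purchase in a category is known, hold back ads for that category until a typical repurchase cycle has passed. The sketch below assumes the system actually learns about the purchase, which is exactly the hard part mentioned above. The cycle lengths and function name are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical repurchase cycles per category (assumed values for illustration).
REPURCHASE_CYCLE = {
    "fridge": timedelta(days=5 * 365),
    "printer paper": timedelta(days=30),
}


def should_show_ad(category, last_purchase, today):
    """Return False while the shopper is still inside the category's repurchase cycle."""
    if last_purchase is None:
        return True  # no known purchase, so the ad may still be relevant
    cycle = REPURCHASE_CYCLE.get(category, timedelta(0))
    return today - last_purchase >= cycle


# Five months after buying a fridge, fridge ads stay suppressed.
print(should_show_ad("fridge", date(2018, 1, 1), date(2018, 6, 1)))
```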

No, what they're getting good at is, ‘I don't want to see this ad anymore,’ and then...

No, but you would think that even we are able to learn. If you had an office supply manager, right, we should have seen in the signal that people who buy pallets of paper, also are interested in buying a cartload of printer cartridges, or they're buying huge rolls of carpet for their office. I'm just making up things, but there is something in the history of your purchases which is what I talked about, the linearity and the context. So who is likely to be interested in buying a pallet of paper?

And to be fair to you, do we need to show an ad if you're this one time buyer who knows for a fact that you are going to buy a pallet of paper? So yes, maybe it helps to show you some options, but it has to be more than just one fact leak on an ad. There are reasonable things that advertising systems can do, but why is advertising overall so inefficient? Because I think that the incentives are not necessarily lined up.

There are some folks who are okay with just being for clicks, so the click could be on any random ad and you get paid, and so yes let me just put the worst looking ad, and I hope you don't remember, but these yellow teeth ad which used to show up for a while for a long time, like why are these ads even being created in the first place? And so maybe just optimizing for clicks is not a good idea. You need to be optimizing for something that matters, that it's actually a sale that is going to happen, and there's this whole notion of incrementality that I personally believe in. If you were anyway going to buy that exact pallet of paper, it doesn't matter that I'm going to show you 20 ads about it, because you already made up your mind, you're going to get it, so why do I need to show an ad?

But what matters or what should matter for our advertiser is what else do you buy because of that ad that was shown? It's not already the thing that you were supposed to purchase, so I think the more the industry measures, again I believe that online advertising is a fairly recent industry in some sense, and we are just about figuring out what are the right objective functions, what should matter for our advertisers at the end of the day, what would be the right business model that we would have to build? These are questions that are actually being worked on, and Criteo was also in the space of coming up with business models that make more sense for our advertisers, so, I think yeah, we are evolving.
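The notion of incrementality is often measured with a holdout comparison: one group of users is shown the ad, a comparable group is not, and the difference in purchase rates is attributed to the ad. The sketch below is a minimal version of that arithmetic with made-up numbers; it is not a description of how Criteo measures it, and the function name is hypothetical.

```python
def incremental_lift(exposed_conversions, exposed_users, holdout_conversions, holdout_users):
    """Difference in conversion rate between users who saw the ad and a holdout group who did not."""
    exposed_rate = exposed_conversions / exposed_users
    holdout_rate = holdout_conversions / holdout_users
    return exposed_rate - holdout_rate


# Made-up numbers: 30 of 1,000 exposed users bought, versus 20 of 1,000 in the holdout group.
# Only the gap between the two rates counts as incremental; the holdout purchases
# would have happened without any ad.
lift = incremental_lift(30, 1000, 20, 1000)
```

If the two rates are equal, the lift is zero, which is the pallet-of-paper case: the shopper was going to buy anyway, so the ad added nothing.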

Well that is a good place to leave it. It's been a delightful hour. I want to thank you for taking the time out of your very busy schedule to chat with us.

Thank you Byron, I had a fantastic time chatting with you as well.

Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.