Episode 99: A Conversation with Patrick Surry

Byron speaks with Patrick Surry of Hopper on the nature of intelligence and the path that our relationship with AI is taking.


Guest

As Chief Data Scientist at Hopper, Patrick Surry analyzes flight data to help consumers make smart travel choices. Patrick is recognized as a travel expert and frequently provides data-driven insight on the travel industry and airfare trends. His studies and commentary are regularly featured in outlets such as The New York Times, USA Today, The Wall Street Journal, and TIME, among many others, and he appears on various broadcast stations to offer travel insight and tips. Patrick holds a PhD in mathematics and statistics from the University of Edinburgh, where he studied optimization based on evolutionary algorithms, following an HBSc in continuum mechanics from the University of Western Ontario.

Transcript

Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today my guest is Patrick Surry. He is the Chief Data Scientist at Hopper. He holds a PhD in math and statistics from the University of Edinburgh. Welcome to the show, Patrick.

Patrick Surry: It’s great to be here.

I like to start our journey off with the same question for most guests, which is: What is artificial intelligence? Specifically, why is it artificial?

That’s a really interesting question. I think there’s a bunch of different takes you get from different people about that. I guess the way I think about it [is] in a pragmatic sense of trying to get computers to mimic the way that humans think about problems that are not necessarily easily broken down into a series of methodical steps to solve.

It’s getting computers to think like humans, or is it getting computers to solve problems that only humans used to be able to solve?

I think for me the way that AI started was this whole idea of trying to understand how we could mimic human thought processes, so thinking about playing chess, as an example. We were trying to understand – it was hard to write down how a human played chess, but we wanted to make a machine that could mimic that human ability. Interestingly enough, as we build these machines, we often come up with different ways of solving the problem that are nothing like the way a human actually solves the problem.

Isn’t that kind of almost the norm in a way? Taking something pretty simple, why is it that you can train a human with a sample size of one? “This is an alien. Find this alien in these photos.” Even if the alien is upside down or half obscured or under water, we’re like “there, there, and there.” Why can’t computers do that?

I think computers are getting better at those kinds of problems. Humans have a whole set of not-well-understood pattern-matching abilities that we've evolved over thousands of years and trained since we were born as individuals. Those abilities constrain the kinds of problems we solve and the way we solve them, but in a really interesting way that lets us handle the practical problems we're actually interested in as a species: being able to survive and eat and find a mate and those kinds of things.

You know, it’s interesting because you’re right. It took us a long time, but it shouldn’t take computers nearly that long. They’re moving at the speed of light, right? If it takes a toddler five years, won’t we eventually be able to train a blank slate of a computer in five minutes?

Yes. I think you’re starting to see evidence of that now, right? I think we sort of started from a different place with computers. We started with this very predictable step-by-step binary system. We could show mathematically you could solve any kind of well-formulated mathematical problem. Then we decided [with] this universal computing device, it would be cool if we could make it solve the kinds of problems that humans solve. It’s almost like we started from the wrong place, in a sense. If you were trying to mimic humans, maybe we should have gone a lot farther down the analog computing path instead of trying to build everything on top of this binary computer, which doesn’t really match the underlying hardware of a human very well.

We’re massively parallel, and computers are just sequential but enormously fast.

Also, this sort of digital versus analog thing is always interesting. The way human brains seem to work is with lots of gradients of electricity and chemicals and that is very different from the fundamental unit of a computer, which is this 0 or 1 bit. I think when you look at a lot of the recent work that’s being done in computer vision and these generative networks and so forth, the starting point is first of all to construct something that looks a lot more analog and a lot more like things that you find in someone’s brain out of these fundamental units that we originally built in the computer.

You know, records, LPs, they’re analog. CDs came along and they’re digital. Do you think people can tell the difference between the two when they listen to them?

I certainly cannot.

I can’t either. Yet, I think maybe it’s my own shortcoming. I don’t know. That’s not an approximation of an analog experience. It’s beyond an approximation to me at least.

There are people I know who claim that they can tell the difference. I think it’s like that with a lot of things. We’ve gotten to a point where you have such a high fidelity of approximation that you can’t really tell it’s different. You look back at the early days of television, or the first computer monitor that I had way back in the day with my Apple IIe or whatever it was: there were four colors, and you could individually see every box on the screen as a little pixel.

Now you have an 8K TV. If you’re not within an inch of the screen, it looks like a completely continuous picture. It’s sort of the same thing with the CD: once you get to a certain level of digital approximation, it may not be the most efficient, but you can trick most of the people.

Do you think a general intelligence is possible?

Yeah, I think I’m in that camp. It certainly feels and looks like a lot of the things we’re starting to be able to do now with computers are getting pretty close to replicating things that we thought were uniquely solvable by people. Chess has always fascinated me. I’ve been a big fan of both playing chess and following the advances in how computers play the game. In the early days of computers, we were using all these tricks that played the game in a completely different way than humans think about it.

Now we’re getting to a point where it’s not clear any more that the computer is really playing a different game. You read the latest books by Kasparov and it’s fascinating to see how he thinks about it as the first person to lose to a machine. Now these really interesting hybrids, the most powerful chess players are these hybrid teams where you have one or more people paired with their favorite computer, and they can beat both the best people and the best computers individually.

That’s Kasparov’s thing. I don’t know if you remember that he famously said right after he lost, “Well, at least Deep Blue didn’t enjoy beating me.” I guess that’s my question. Will we get a computer that will enjoy at some point beating you?

I think maybe that depends on what you mean by ‘enjoy.’ I think we’re still at baby steps: even where a computer is doing something that is hard to distinguish from a human, it’s working in a very narrow space. In some sense, it’s enjoying the fact that it can win, because it’s getting feedback about the moves it made in the previous games it played. That’s how it trains itself, by getting this positive reinforcement. We wouldn’t say that’s enjoyment, because it’s not part of this bigger holistic organism.

It feels like as you make these machines into more and more general-purpose thinking machines, they will start to exhibit some of the things that we might identify now as emotional responses. It’s a kind of generalized feedback mechanism.

Let me ask a different form of the same question, which is: the way we are having success with AI right now is we say “Let’s take a lot of data about the past and let’s study it and let’s make projections into the future.” Is that a fair description of how machine learning works?

I think so. For a long time there’s certainly been this sort of directed learning, where we present lots of historical examples and a bunch of right answers, asking how we can figure out patterns in that historical data that help us guesstimate.

It’s all predicated on the assumption that the future is like the past, right?

Exactly.

This isn’t my analogy. I need to figure out whose it is. Somebody pointed out that if you feed a computer everything about the orbits of all the planets and the moons and all that, it would be able to predict eclipses. It wouldn’t ever probably come up with gravity. It can take a bunch of data about past things and make projections about future things, but it doesn’t understand necessarily what’s going on. Is that fair?

I don’t know if I agree totally with the gravity thing. If you have a rich enough data set about positions of objects and movements of objects, you can start finding these simplified patterns. Gravity is just a bunch of equations. That’s a simpler way –

Let me say it slightly differently. How about this? A human could answer the question of what would happen if Mars disappeared overnight. How would things change? What if the moon disappeared? How would things change? If all you’ve got is data about our steady state universe and you ask the computer, ‘What would happen if I remove the moon or I shrunk the sun by half?’ It could have a billion years’ worth of data but wouldn’t know how to solve that problem. Do you still disagree with it?

I think so. You think of all this historical data starting off as this massive volume of unexplained numbers. It’s this huge data set. The goal of artificial intelligence, or of any of this predictive modeling, is to develop a much more compact representation in an information-theoretic sense. I want to come up with a formula that explains all these massive historical numbers. If you come up with the right representation, and I think gravity or the gravitational equations are a form of that, then you can say, “The way I explain where Mars is, is because I know where all these other things are. I can tell that ten minutes from now they’re going to move in a certain way because of this set of equations. That’s the underlying pattern.”

If you figure out what that pattern is, now you can say, “If I delete the moon or shrink the sun, I could explore what would happen in this alternative universe and how things might change.” Whether you really work that pattern out correctly, I don’t know. I have certainly seen some interesting examples where people can do that on a small scale. You have a data set of a ball bouncing and where it is at different points in time, and the machine can figure out that there’s a formula that simplifies that data set.
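The bouncing-ball idea Patrick describes — a machine compressing a raw data set into a compact formula — can be sketched in a few lines. This is an illustrative toy, not anything Hopper or Patrick built: it simulates noisy observations of a ball's height and recovers the gravitational constant by fitting a quadratic.

```python
import numpy as np

# Simulate noisy observations of a falling ball's height:
# h(t) = h0 + v0*t - 0.5*g*t^2  (constant-gravity assumption)
g_true, h0, v0 = 9.81, 10.0, 2.0
t = np.linspace(0, 1, 50)
rng = np.random.default_rng(0)
h = h0 + v0 * t - 0.5 * g_true * t**2 + rng.normal(0, 0.01, t.shape)

# Fit a quadratic: 50 noisy numbers "compress" into three coefficients,
# and the leading coefficient recovers -g/2.
coeffs = np.polyfit(t, h, deg=2)
g_est = -2 * coeffs[0]
print(g_est)  # close to 9.81
```

Once the compact representation (the three coefficients) is in hand, you can answer counterfactuals the raw data never contained — the "what if I shrink the sun by half" kind of question — by changing a coefficient and re-simulating.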

Let me ask a different form of the question. I would love to get to the specifics of the work you’re doing. This whole idea of ‘let’s take a bunch of data about the past and make projections into the future,’ I don’t know how that gets you the Harry Potter series. I don’t know how that gets you Lin-Manuel Miranda’s Hamilton. Do you think it does, given enough books – maybe there aren’t enough books that have been written, but a million years from now, does that give you that level of complex creativity?

That’s a super interesting question. I don’t know what the ultimate answer to that is. I feel it’s a little bit like the enjoyment question that you asked about. You look at some of these generative adversarial networks that people are working on now, and it seems like what you’re really talking about is a kind of dreaming. It’s like this thing of ‘I built this representation of the world based on a whole bunch of past experience, which is unique to me and the observations I’ve made. Then I’m sort of dreaming about these simulated universes where I delete your moon’ or whatever it is.

Some of these dreams are really interesting patterns that other people enjoy because for some reason they resonate with the patterns that they’ve absorbed. Maybe far enough down the road you do get to a situation like that, where you have these generalized-enough computers that are able to meaningfully share their representation of the world. I think that’s probably something that’s missing now.

You can’t take one chess program and have it share its explanation of how to play chess with another one, because they operate in totally different kinds of niches and representations of the world. Once you get to the point where you do have a way for these things to communicate, maybe you do get a Harry Potter of the AI world.

I do have two more questions along these lines, and then I want to move on. My first one is: when it comes to general intelligence, the vast majority of guests on this show say “We don’t know how to build it, but we know it’s possible because people are machines. If everything in you is mechanistic, then someday we’ll build a mechanistic – we can duplicate it. General intelligence is a mechanistic phenomenon.” Do you agree with that statement that the reason we believe we can make a general intelligence is because we fundamentally think our intelligence is mechanistic?

I don’t really like to agree with that statement, but I find myself having to go down that path. The more layers you peel back, the deeper you look, it feels like all these processes are driven by fundamental physics and chemistry underneath. It’s a complicated way that we’ve assembled, but we haven’t been able to find Roger Penrose’s unique thing that makes us different from another machine. It does take you down this uncomfortable path that first of all, everything is predetermined because we’re all just operating according to the laws of physics. We should be able to construct something that replicates the way we do things.

I’m going to put you in the column of ‘yes.’ I’m still at 95% of people agreeing with that. The other question I want to ask you is (I get almost a 50/50 split on this question): do you believe a general intelligence – we have machine learning; we can get it to solve certain tasks. Some people think we’re just going to be able to solve bigger and more complicated tasks, and eventually you evolve your way gradually to this general intelligence.

Other people say “No, we haven’t started working on general intelligence. That basic idea, that basic structure of ‘let’s take a bunch of data and just look for patterns and then project them’ isn’t actually – I mean, it’s great, but it’s not one step along the path to general intelligence.” Which of those two camps would you fall in?

I think we’re still only playing around the barest edges of that problem. Almost everything that people work on [is] immediately narrowed to a very specific domain. I think some of the approaches are in the direction of that sort of generalized intelligence where we’re not having to provide explicit training and we are able to develop these patterns within a particular domain.

I don’t think we’ve really made or tried to make a lot of progress on this sort of general set of feedback mechanisms and representation that you would really need for a general intelligence. Even in a broad domain, look at game playing, having a machine that you can give any board game and it will figure out how to play it with you, it feels like we’re a long way even from thinking about something like that.

The funny thing is [that at] the original conference in ’56 at Dartmouth, where they thought they could solve AI in a summer with hard work, that was based on the idea that intelligence was probably just a handful of simple laws, like the way physics or electricity or magnetism [works]. But that doesn’t seem to be the case.

On to you and the fascinating work you’re doing at Hopper. For the people who aren’t familiar with Hopper, talk a little bit about Hopper’s mission. Then tell us about some of the AI problems you’ve tackled and overcome, or are still working on, or what have you. Share some of your experiences working with large data sets there.

I think the mission at Hopper is really to help travelers make smarter decisions about their travel, so buying airfare and hotels. As somebody who used to travel a lot as a consultant, it always felt to me like this was an industry that was pretty archaic and broken in a lot of ways and also one that had evolved in a strange direction.

Back in the old days when I used to travel a lot, there was a person I could call and would figure things out and call me back and save me a bunch of time and money for the privilege of charging me $25 a booking or something. We as an industry moved away and said we don’t need travel agents. We can build these tools that let people make all their own decisions.

If you look at how people use them, people spend hours online trying to figure out a flight to buy, and then end up buying something and feeling bad about it: they didn’t research enough, or the price is going to change, or they’re going to pay more than the person next to them. What we’re trying to do at Hopper is provide that advice to the consumer, ultimately recreating some of what that great travel agent used to do for you. Like somebody who is an expert in the domain, we watch all the historical data we can find about prices, and we help make suggestions to you about when to buy a ticket.

When you look at the price, is it a good price? Is it likely to go down? Should you wait? Are there other options? If you’re looking for a beach vacation in February from Montreal, you probably don’t care which island it is in the Caribbean if we can get you somewhere cheaper. It’s more money to spend at the bar or whatever you want to do on your holiday. That’s the vision of Hopper.

We do all this on a mobile phone, so we have a great way to talk to our individual users about the trips that they’re watching. From a data science and AI point of view, the thing that’s fascinating to me is the scale and complexity of the data. Every day we collect something like 30 billion priced itineraries from airfare and hotel searches that people make all over the world. It’s all anonymous. To us, it’s like a giant stock ticker in Times Square, where we’re seeing all these different trips that were available for sale at different points in time and what the asking price was for each trip. It’s a fascinating marketplace.

The complexity comes because there’s just so many things you can buy. If you think about all the different itineraries you can purchase at any point in time, it’s in the trillions. Imagine a Dow Jones where you have trillions of different stocks being traded. Even though we’re seeing billions of prices every day, we don’t know the price of most things most of the time. It’s a really interesting kind of environment. How do you do prediction on top of that when you have this huge volume of time series that are sparse but obviously correlated to each other in really interesting ways?

Help me understand how do you deal with that? What do you do? The way you set the problem up is there’s too many different fares that you can’t know them all at once. How do you use AI to solve that problem?

I think where we started, the way I always start with these problems is we start small, start looking at individual examples. We started by creating a whole set of relatively traditional statistical models for describing explainable features of travel prices. For example, if you look at the price of a ticket at the same time last year, it tends to be similar to what it is this year. There’s a very predictable seasonal variation. There’s also a really interesting variation with advanced purchase, so how far ahead of the flight you are when you’re going to buy a ticket or book your hotel, which is a function of the strange economics of the industry.

The people who are willing to pay the most for tickets and hotels are business travelers who don’t know they want the thing until a few days beforehand. The challenge with the airlines and the hotel companies is to sell enough of their inventory to make sure the flight or the hotel is going to be full, but save enough that they can sell at two, three, or even ten times the price at the last minute. That means there tends to be a predictable variation with how far in advance you are.

There’s lots of interesting variation by market. For example, if you’re flying from New York to Honolulu, prices don’t tend to rise very much at the last minute because it’s not a place where a business person is going to go and set up a meeting two days beforehand.

If you’re flying from New York to Chicago, you’re going to pay much more at the last minute. These predictable variations are things that we can study and model in aggregate. On top of that, there’s really interesting volatility that we look at, so the way that prices vary over time, how much they bump up and down as airlines try to maximize their revenue or hotels try to make sure all those rooms are going to be filled. We can kind of predict that. We end up with this huge multi-dimensional time series prediction problem.
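The features Patrick lists — seasonality, advance purchase, and market-specific behavior — can be illustrated with a deliberately toy fare model. Everything here (the functional forms, the coefficients, the `expected_fare` name) is a hypothetical sketch for intuition, not Hopper's actual model:

```python
import math

def expected_fare(base, day_of_year, days_ahead):
    """Toy fare model: a seasonal sine term times an advance-purchase
    term that climbs sharply near departure (business travelers buy late)."""
    seasonal = 1.0 + 0.15 * math.sin(2 * math.pi * day_of_year / 365)
    advance = 1.0 + 1.5 * math.exp(-days_ahead / 14)
    return base * seasonal * advance

# Same flight quoted 90 days out versus 3 days out:
early = expected_fare(200, day_of_year=170, days_ahead=90)
late = expected_fare(200, day_of_year=170, days_ahead=3)
print(late > early)  # the last-minute fare is higher
```

A business-heavy route like New York–Chicago would get a steep advance-purchase term like this one; a leisure route like New York–Honolulu would get a nearly flat one — that's the market-by-market variation Patrick describes.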

Part of it is creating all these aggregates and then synthesizing models on top of that, which allow us to predict the central trend. Then that really allows us to make these estimates of how likely it is that a future price is going to be better than today’s. Our goal is not to predict exactly what the price is going to be tomorrow. It’s to tell the user [the answer to the question] ‘should I buy it now or wait because it’s going to drop?’ We want to know how confidently we can make that prediction.
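The buy-or-wait question itself can be framed as a probability estimate over simulated price paths. The sketch below assumes a simple random-walk model of daily fare changes purely for illustration (the drift and volatility numbers, and the function name, are made up — the show doesn't describe Hopper's internals):

```python
import numpy as np

def prob_price_drops(current_price, drift, volatility, horizon_days,
                     n_sims=10_000, seed=1):
    """Monte Carlo estimate of P(the fare dips below today's price at
    some point before departure), under a log random-walk assumption."""
    rng = np.random.default_rng(seed)
    # daily multiplicative log-changes
    steps = rng.normal(drift, volatility, size=(n_sims, horizon_days))
    paths = current_price * np.exp(np.cumsum(steps, axis=1))
    return float((paths.min(axis=1) < current_price).mean())

# A $400 fare with slight upward drift as departure nears:
p = prob_price_drops(400.0, drift=0.002, volatility=0.03, horizon_days=30)
```

If `p` is high, the advice is "wait"; if low, "buy now" — which matches Patrick's point that the goal is a confident buy/wait recommendation, not an exact forecast of tomorrow's price.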

In that example you gave, where Honolulu at the last minute is cheaper than New York to Chicago at the last minute, is that a person at each airline who makes that determination? Or are they also just using models to try to maximize? They want to maximize total revenue for the flight. They can sell a few tickets at the last minute for a lot, or a bunch of tickets beforehand [for] cheap [prices]. Are they just using pure automated models, and you’re studying this whole system like we were talking about before, the orbits of the planets, and making predictions on that? Or are there people who do that manually?

That’s a great question. That analogy is really good. Historically it was all done by people. There are market managers at the airlines and they’re responsible for a few routes. They would set these prices.

In fact, way back in history these prices were all regulated. Nowadays it’s mostly done by computers. In some sense it’s like an example of algorithmic trading where the sellers are all using this army of computers to optimize their prices. As a consumer you’re up against this army of machines. We’re trying to help you predict what their machines are going to say.

There are still some examples where people are involved. Those are interesting for extreme prices. Sales, for example, are things that are often triggered by people. The airline will decide they’re doing a marketing campaign or they want to meet a quarterly goal or something, so they’ll put a bunch of flights on sale. Although it’s started by a person, it has some predictability.

We find that these sales tend to start on Mondays and Tuesdays because it’s when people come into the office and make these decisions about overriding the machine. There’s definitely a combination. It’s mostly computers, but there’s a little bit of a human element in there. That’s what helps us to be able to make these predictions fairly accurately. There’s many nuances that I think we’re only just beginning to uncover in the laws of gravity, if you want to take that analogy.

There’s something on the order of 100,000 flights a day. You can buy 331 days in advance on many airlines. You’re looking at something like 33 million flights in a year. How do you get from 33 million to your trillions? I guess there’s three classes of service. That gets me to 33 million times 3. Now I’m up to 100 million. How do you step-function up to the trillions you were talking about? What are the other dimensions I’m missing in that analysis?

The really interesting thing is the suppliers, the airlines are thinking in terms of flights. As you say, there’s not that many flights that fly every day, around 100,000. There’s only about 15,000 city pairs that are directly connected by flights. The way consumers buy travel is by trips.

You might be wanting to fly from Wichita in Kansas to Vladivostok in Russia. There is no flight that goes there. You have to buy a combination of three or four different flights on the outbound and another three or four flights on the return. Obviously, a lot of those combinations are rare. As soon as you allow connections and return trips, you’re squaring and squaring those numbers several times, which is how you get to the trillions of options.

If you think about an itinerary as a combination of even up to three or four connected outbound legs plus three or four connected return legs in the next year between something like 2,000 distinct airports around the world, it’s easy to get to those big numbers. It’s obviously a long tail. There’s a lot.
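The "squaring and squaring" Patrick describes can be made concrete with back-of-envelope arithmetic, using the round numbers from the conversation (these are the figures quoted in the interview, not exact industry counts):

```python
flights_per_day = 100_000
window_days = 331  # how far ahead tickets are typically sold

# One-way flight-dates in the booking window:
flight_dates = flights_per_day * window_days
print(flight_dates)  # 33,100,000 — Byron's "33 million flights"

# A round trip pairs one outbound with one return. Most pairs are
# nonsense, but the raw combinatorial space squares:
round_trip_space = flight_dates ** 2
print(f"{round_trip_space:.1e}")  # already past a quadrillion
```

Allowing multi-leg connections on each direction multiplies the space again, which is why even a pruned set of sensible itineraries lands in the trillions.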

The data you have to feed your models with are historic data of – do you even know what the ticket actually sold for in the end or do you just know what price it was offered at in the end?

We just see the quote. The stock ticker analogy is pretty good. It’s basically the airline that was willing to sell the combination of flights gives a quote as to how much they would sell it for at a particular point in time. Even that, the problem of deciding whether there is a valid itinerary that an airline will sell is an interesting mathematical problem in itself.

It’s not a simple marketplace where all these products are listed and there’s a price tag. It’s an actual calculation to figure out how you can combine different flights together to get to a ticket price, which is actually a super clever mechanism invented back in the ‘70s or ‘60s to let travel agents with a price book and a pencil work out the price of these probably billions of itineraries at that time. Nowadays anybody who is building a marketplace like this would do it in a different way. It’s an interesting archaic thing.

You have bunches of users using this, right?

Yeah.

Do you have a network effect that you achieve as you get more – you, in the end, do know what people on your platform spent, don’t you? Or not?

Yes, exactly. The place where it all started is we were looking at all this external data about pricing. Then on the user side we build this really interesting network of how people purchase flights and how people interact with flights. Because we’re a mobile app, we actually have a really good understanding of who the people are that generate all these different searches, which allows us to do some really interesting things in terms of aggregating lots of demand. We know that certain users are flying from New York to the Caribbean for a vacation. If they’re flexible, which many of our users are, we potentially can steer them towards a particular destination that might be cheaper. We can do interesting matching between supply and demand.

The other thing that’s really interesting is to watch how users interact with multiple destinations. You can imagine somebody flying from New York to Barbados also will search for a flight to the Bahamas. That gives us a signal that those two destinations, even though they’re far apart, are related in some way. That allows us to make these interesting alternative recommendations. You as a user search for Barbados, but we see Bermuda is cheap. Then we might recommend that to you.

There’s a lot of subtlety even in that. Our offices are split between Boston and Montreal. If you look at how people search for Europe from Boston, there’s a lot of amorphous blob of destinations in Europe. People are trying to go to Europe. They don’t care so much if it’s Rome or Paris or London. They’re doing this European trip.

If you look at how people in Montreal think about Europe, it’s very distinct, particularly inside France. It’s very different flying from Montreal to Paris versus flying Montreal to Lyon. For someone from Boston, they probably wouldn’t care. There’s a lot more substitutability is what I’m saying. There’s these really interesting patterns in terms of where people live, where people are traveling. We can use that to help make smarter recommendations to people, depending on what their particular flexibility might be.

That is fascinating. It’s a lot like Amazon suggesting a product to buy: you want to fly from Boston to the Bahamas, then maybe you would rather fly to Grand Cayman or something. The app is Hopper. It’s all app based, right?

It is, yes.

Then I guess it’s on all the appropriate platforms, etc., wherever fine apps are sold?

Yes, exactly.

What about you, Patrick? You’re a fascinating guy. How can people keep up with what you’re doing and working on and noodling about?

I’m just @PatrickSurry. I’m also easy to reach at Hopper. I’m just Patrick@Hopper. I spend a lot of time thinking about these problems and going around talking. I’m certainly interested in feedback and other people who are looking at these large scale market and recommendation systems. I think there’s a bunch of fascinating things. As I say, I feel like we’ve only really scratched the surface of this huge dataset that we’re exploring and all these correlations that are buried inside it.

Thank you for being on the show.

I appreciate it. Thank you very much. Great to talk to you.