In this episode, Byron and Gregory talk about consciousness, jobs, data science, and transfer learning.
Gregory Piatetsky-Shapiro is a data scientist, co-founder of KDD conferences and ACM SIGKDD association for Knowledge Discovery and Data Mining.
Byron Reese: This is “Voices in AI”, brought to you by Gigaom. I’m Byron Reese. Today our guest is Gregory Piatetsky. He’s a leading voice in Business Analytics, Data Mining, and Data Science. Twenty years ago, he founded and continues to operate a site called KDnuggets about knowledge discovery. It’s dedicated to the various topics he’s interested in. Many people think it’s a must-read resource. It has over 400,000 regular monthly readers. He holds an MS and a PhD in computer science from NYU.
Welcome to the show.
Gregory Piatetsky: Thank you, Byron. Glad to be with you.
I always like to start off with definitions, because in a way we’re in such a nascent field in the grand scheme of things that people don’t necessarily start off agreeing on what terms mean. How do you define artificial intelligence?
Artificial intelligence is really machines doing things that people think require intelligence, and by that definition the goalposts of artificial intelligence are constantly moving. It was considered very intelligent to play checkers back in the 1950s, then there was a program. The next boundary was playing chess, and then computers mastered it. Then people thought playing Go would be incredibly difficult, or driving cars. General artificial intelligence is the field that tries to develop intelligent machines. And what is intelligence? I’m sure we will discuss, but it’s usually in the eye of the beholder.
Well, you’re right. I think a lot of the problem with the term artificial intelligence is that there is no consensus definition of what intelligence is. So, if we’re constantly moving the goalposts, it sounds like you’re saying we don’t have systems today that are intelligent?
No, no. On the contrary, we have lots of systems today that would have been considered amazingly intelligent 20 or even 10 years ago. And the progress is such that I think it’s very likely that those systems will exceed our intelligence in many areas, you know maybe not everywhere, but in many narrow, defined areas they’ve already exceeded our intelligence. We have many systems that are somewhat useful. We don’t have any systems that are fully intelligent, possessing what is a new term now, AGI, Artificial General Intelligence. Those systems remain still ahead in the future.
Well, let’s talk about that. Let’s talk about an AGI. We have a set of techniques that we use to build the weak or narrow AI we use today. Do you think that achieving an AGI is just continuing to apply and evolve those, with faster chips, better algorithms, bigger datasets, and all of that? Or do you think that an AGI really is qualitatively a different thing?
I think AGI is qualitatively a different thing, but I think that it is not only achievable but also inevitable. Humans also can be considered as biological machines, so unless there is something magical that we possess that we cannot transfer to machines, I think it’s quite possible that the smartest people can develop some of the smartest algorithms, and machines can eventually achieve AGI. And I’m sure it will require additional breakthroughs. Just like deep learning was a major breakthrough that contributed to significant advances in state of the art, I think we will see several such great breakthroughs before AGI is achieved.
So if you read the press about it and you look at people’s predictions on when we might get an AGI, they range, in my experience, from 5 to 500 years, which is a pretty telling fact alone that it’s that kind of range. Do you care to even throw in a dart in that general area? Like do you think you’ll live to see it or not?
Well, my specialty as a data scientist is making predictions, and I know when we don’t have enough information. I think nobody really knows. And I have no basis on which to make a prediction. I hope it’s not 5 years and I think our experience as a society shows that we have no idea how to make predictions for 100 years from now. It’s very instructive to find so-called futurology articles, things that were written 50 years ago about what will happen in 50 years, and see how naive were those people 50 years ago. I don’t think we will be very successful in predicting in 50 years. I have no idea how long it will take, but I think it will be more than 5 years.
So some people think that what makes us intelligent, or an indispensable part of our intelligence, is our consciousness. Do you think a machine would need to achieve consciousness in order to be an AGI?
We don’t know what consciousness is. I think machine intelligence would be very different from human intelligence, just like airplane flight is very different from a bird’s, you know. Both airplanes and birds fly, the flight is governed by the same laws of aerodynamics and physics, but they use very different principles. The airplane flight does not copy bird flight, it is inspired by it. I think in the same way, we’re likely to see that machine intelligence doesn’t copy human intelligence, or human consciousness. “What exactly is consciousness?” is more a question for philosophers, but probably it involves some form of self-awareness. And we can certainly see that machines and robots can develop self-awareness. And you know, self-driving cars already need to do some of that. They need to know exactly where they’re located. They need to predict what will happen. If they do something, what will other cars do? They have a form that is called model of the mind, mirror intelligence. One interesting anecdote on this topic is that when Google’s self-driving car originally started its experiments, it couldn’t cross the intersection because it was always yielding to other cars. It was following the rules as they were written, but not the rules as people actually execute them. And so it was stuck at that intersection supposedly for an hour or so. Then the engineers adjusted the algorithm so it would better predict what people will do and what it will do, and it’s now able to negotiate the intersections. It has some form of self-awareness. I think other robots and machine intelligence will develop some form of self-awareness, and whether it will be called consciousness or not will be for our descendants to discuss.
Well, I think that there is an agreed upon definition of consciousness. I mean, you’re right that nobody knows how it comes about, but it’s qualia, it’s experiencing things. It’s, if you’ve ever had that sensation when you’re driving and you kind of space, and all of a sudden two miles later you kind of snap to and think, “Oh my gosh, I’ve got no recollection of how I got here.” That time you were driving, that’s intelligence without consciousness. And then when you kind of snap to, and all of a sudden you’re aware, you’re experiencing the world again. Do you think a computer can actually experience something? Because wouldn’t it need to experience the world in order to really be intelligent?
Well computers, if they have sensors, actually they already experience the world. The self-driving car is experiencing the world through its radar and LIDAR and various other sensors and so on, so they do experience and they do have sensors. I think it’s not useful to debate computer consciousness, because it’s like a question of, you know, how many angels can fit on the pin of a needle. I think what we can discuss is what they can or cannot do. How they experience it is more a question for philosophers.
So a lot of people are worried – you know all of this, of course – there’s two big buckets of worry about artificial intelligence. The first one is that it’s going to take human jobs and they’re going to have mass unemployment, and any number of dystopian movies play that scenario out. And then other people say, no, every technology that’s come along, even disruptive ones like electricity, and mechanical power replacing animal power and all of that, were merely then turned around and used by humans to increase their productivity, and that’s how you get increases in standard of living. On that question, where do you come down?
I’m much more worried than I am optimistic. I’m optimistic that technology will progress. What I’m concerned with is it will lead to increasing inequality and increasingly unequal distribution of wealth and benefits. In Massachusetts, there used to be many toll collectors. And toll collector is not a very sophisticated job, but recently they were eliminated. And the machines that eliminated them didn’t require full intelligence, basically just an RFID sensor. So we already see many jobs being eliminated by a simpler form of automation. And what society will do about it is not clear. I think the previous disruptions had much longer timespans. But now when people like these toll collectors are being laid off, they don’t have enough time to retrain themselves to become, let’s say computer programmers or doctors. What I’d like to do about it, I’m not sure. But I like a proposal by Andrew Ng, who was from Stanford and Coursera. Andrew proposed a modified version of basic income, that people who are unemployed and cannot find jobs get some form of basic income. Not just to sit around, but they would be required to learn new skills and learn something new and useful. So maybe that would be a possible solution.
So do you really think that when you look back across time – you know, the United States, I can only speak to that, went from generating 5% of its energy with steam to 80% in just 22 years. Electrification happened electrifyingly fast. The minute we had engines there was wholesale replacement of the animals, they were just so much more efficient. Isn’t it actually the case that when these disruptive technologies come along, they are so empowering that they are actually adopted incredibly quickly? And again, just talking about the US, unemployment for 230 years has been between 5% and 9%, other than the Great Depression, but in all the other time, it never bumped. When these highly disruptive technologies came along, it didn’t cause unemployment generally to go up, and they happened quickly, and they eliminated an enormous number of positions. Why do you think this one is different?
The main reason why I think it is different is because it is qualitatively different. Previously, the machines that came, like the steam and electricity-driven, it would eliminate some of the manual work and people could climb up on the pyramid of skills to do more sophisticated work. But nowadays, artificial general intelligence sort of captures this pyramid of skills, and it now competes with people on the cognitive skills. And it can eventually climb to the top of the pyramid, so there will be nowhere to climb to exceed it. And once you generate one general intelligence, it’s very easy to copy it. So you would have a very large number, let’s say, of intelligent robots that will do a very large number of things. They will compete with people to do other things. It’s just very hard to retrain, let’s say, a coal miner to become, let’s say, producer of YouTube videos.
Well that isn’t really how it ever happens, is it? I mean, that’s kind of a rigged set-up, isn’t it? What matters is, can everybody do a job a little bit harder than they have? Because the maker of YouTube videos is a film student. And then somebody else goes to film school, and then the junior college professor decides to… I mean, everybody just goes up a little bit. You never take one group of people and train them to do an incredibly radically different thing, do you?
Well, I don’t know about that exactly, but to return to your analogy, you mentioned that in the United States for 200 years the pattern was such. But, you know, the United States is not the only country in the world, and 200 years is a very small part of our history. If we look at several thousand years, and look at what happened in the north, we see there are very complex things. The unemployment rate in the Middle Ages was much higher than 5% or 10%.
Well, I think the important thing, and the reason why I used 200 years is because that’s the period of industrialization that we’ve seen, and automation. And so the argument is Artificial Intelligence is going to automate jobs, so you really only need to look over the period you’ve had other things automating jobs to say, “What happens when you automate a lot of jobs?” I mean, by your analogy, wouldn’t the invention of the calculator have put mathematicians out of business? I mean like with ATM machines, an ATM machine in theory replaces a bank teller. And yet we have more bank tellers today than we did when the ATM was introduced, because that tool allows banks to open more branches and hire more tellers. I mean, is it really as simple as, “Well, you’ve built this tool, now there’s a machine doing a job a human did and now you have an unemployed human.” Is that kind of the only force at work?
Of course it’s not simple, there are many forces at work. And there are forces that resist change, as we’ve seen from the Luddites in the 18th century. And now there are people, for example in coal mining districts, who want to go back to coal mining. Of course, it’s not that simple. What I’m saying is we only had a few examples of industrial revolutions, and as data scientists say, it’s very hard to generalize from few examples. It’s true that past technologies have generated more work. It doesn’t follow that this new technology, which is different, will generate more work for all the people. It may very well be different. We cannot rely on three or four past examples to generalize for the future.
Fair enough. So let’s talk, if we can, about how you spend your days, which is in data science, what are some recent advances that you think have materially changed the job of a data scientist? Are there ones? And are there more things that you can kind of see that are about to change and begin? Like how is that job evolving as technology changes?
Yes, well data scientists now live in the golden age of the field. There are now more powerful tools that make data science much easier, tools like Python and R. And Python and R both have a very large ecosystem of tools, like scikit-learn for example in the case of Python, or whatever Hadley Wickham comes up with in the case of R. There are tools like Spark and various things on top of that that allow data scientists to access very large amounts of data. It’s much easier and much faster for data scientists to build models. The danger for data scientists, again, is automation, because those tools make it easier and easier, and soon they will make the work, you know, a large part of it, automated. In fact, there are already companies like DataRobot and others that allow business users who are not data scientists just to plug in their data, and DataRobot or their competitors just generate the results. No data scientist needed. That is already happening in many areas. For example, ads on the internet are automatically placed, and there are algorithms that make millions of decisions per second and build lots of models. Again, no human involvement because humans just cannot do millions of models a second. There are many areas where this automation is already happening. And recently I had a poll in KDnuggets asking, when do you think data science work will be automated? The median answer was about 2025. So although this is a golden age for data scientists, I think they should enjoy it because who knows what will happen in the next 8 to 10 years.
So, when Mark Cuban was talking about the first – he gave a talk earlier this year – he said the first trillionaires will be in businesses that utilize AI. But he said something very interesting, which is, he said that if he were coming up through university again, he would study philosophy. That’s the last thing that’s going to be automated. What would you suggest to a young person today listening to this? What do you think they should study, in the cognitive area, that is either blossoming or is it likely to go away?
I think what will be very much in demand is at the intersection of humanities and technology. If I was younger I would still study machine learning and databases, which is actually what I studied for my PhD 30 years ago. I probably would study more mathematics. The deep learning algorithms that are making tremendous advances are very mathematically intensive. And the other aspect is, kind of maybe the hardest to automate is human intuition and empathy, understanding what other people need and want, and how to best connect with them. I don’t know how much that can be studied, but if philosophy or social studies or poetry is the way to it, then I would encourage young people to study it. I think we need a balanced approach, not just technology but humanities as well.
So, I’m intrigued that our DNA is– I’m going to be off here, whatever I say. I think it’s about 740 meg, it’s on that order. But when you look at how much of it we share with, let’s say, a banana, it’s 80-something percent, and then how much we share with a chimp, it’s 99%. So somewhere in that 1%, that 7 or 8 meg of code that tells how to build you, is the secret to artificial general intelligence, presumably. Is it possible that the code to do an AGI is really quite modest and simple? Not simple – you know, there’s two different camps in the AGI world. And one is that humans are a hack of 100 or 200 or 300 different skills that you put them all together and that’s us. Another one is, we had Pedro Domingos on the show and he had a book called The Master Algorithm, which posits that there is an algorithm that can solve any problem, or any solvable problem, the way a human does. Where on that spectrum would you fall? And do you think there is a simple answer to an AGI?
I don’t think there is a simple answer. Actually, I’m a good friend with Pedro and I moderated his webcast on his book last year. But I think that the master algorithm that he looks for may exist, but it doesn’t exclude having lots of additional specialized skills. I think there is very good evidence that there is such a thing as general intelligence in humans, that people, for example, may have different scores on the SAT on verbal and math. I know that my verbal score would be much lower than my math score. But usually if you’re above average on one, you would be above average on the other. And likewise, if you’re below average on one, you will be below average on the other. People seem to have some general skills, and in addition there are a lot of specialized skills. You know, you can be a great chess player but have no idea how to play music, or vice versa. I think there are some general algorithms, and there are lots of specialized algorithms that leverage special structure of the domain. You can think of it this way, that when people were developing chess-playing programs, they initially applied some general algorithms, but then they found that they could speed up these programs by building specialized hardware that was very specific to chess. Likewise, when people start new skills they approach them generally, then they develop the specialized expertise which speeds up their work. I think likewise it could be with intelligence. There may be some general algorithm, but it would have ways to develop lots of special skills that would leverage whatever is specific to particular tasks.
Broadly speaking, I guess data science relies on three things: it relies on hardware, faster and faster hardware; better and better data, more of it and labeled better; and then better and better algorithms. If you kind of had to put those three things side by side, where are we most deficient? Like if you could really amp one of those three things way up, what would it be?
That’s a very good question. With current algorithms, it seems that more data produces much better results than a smarter algorithm, especially if it is relevant data. For example, for image recognition there was a big quantitative jump when deep learning trained on millions of images as opposed to thousands of images. But I think what we need for next big advances is having somewhat smarter algorithms. One big shortcoming for deep learning is, again, it requires so much data. People seem to be able to learn from very few examples. And the algorithms that we have are not yet able to do that. In the algorithms’ defense, I have to say that when I say people can learn from very few examples, we assume those are adults and they’ve already spent maybe 30 or 40 years of training interacting with the world. So maybe if algorithms can spend some years training and interacting with the world, they’ll acquire enough knowledge so they’ll be able to generalize to other similar examples. Yes, I think probably data, then algorithms, and then hardware. That would be my order.
So, you’re alluding to transfer learning, which is something humans seem to be able to do. Like you said, you could show a person who’s never seen an Academy Award what that little statue looks like, and then you could show them photographs of it in the dark, on its side, underwater, and they could pick it out. And what you just said is very interesting, which is, well yeah, we only had one photo of this thing, but we had a lifetime of learning how to recognize things underwater and in different lighting and all that. What do you think about transfer learning for computers? Do you think we’re going to be able to use the datasets that we have that are very mature, like the image one, or handwriting recognition, or speech translation, are we going to be able to use those to solve completely unrelated problems? Is there some kind of meta-knowledge buried in those things we’re doing really well now, that we can apply to things we don’t have good data on?
I think so. I think because the world itself is the best representation. So recently I read a paper that applied this negative transformation to ImageNet, and it turns out that now a deep learning system that was trained to recognize, I don’t remember exactly what it was, but let’s say cats, would not be able to recognize negatives of cats, because the negative transformation is not part of its repertoire. But that is very easy to remedy if you just add negative images to the training. I think there is maybe a large but finite number of such transformations that humans are familiar with, like the negative and rotated and other things. And it’s quite possible that by doing such transformations to very large existing databases, we could teach those machine learning systems to achieve and exceed human levels. Because humans themselves are not perfect in recognition.
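The remedy Gregory describes, adding negated copies of the training images, can be sketched in a few lines. This is only an illustration under stated assumptions: images are represented as plain nested lists of 8-bit grayscale pixels, and the names `negative` and `augment_with_negatives` are hypothetical, not taken from the paper he mentions or from any library.

```python
# Illustrative sketch only: images are nested lists of 8-bit grayscale pixels.

def negative(image):
    """Return the photographic negative of an 8-bit grayscale image."""
    return [[255 - pixel for pixel in row] for row in image]


def augment_with_negatives(dataset):
    """Return the dataset plus a negated copy of every (image, label) pair."""
    augmented = list(dataset)
    for image, label in dataset:
        # The label is unchanged: a negative of a cat is still a cat.
        augmented.append((negative(image), label))
    return augmented


toy_dataset = [([[0, 128], [255, 64]], "cat")]
augmented = augment_with_negatives(toy_dataset)
print(len(augmented))   # 2
print(augmented[1])     # ([[255, 127], [0, 191]], 'cat')
```

The same pattern would apply to the other transformations he lists, such as rotation: each one is a function from image to image that leaves the label alone.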
Earlier, this conversation we’re having, we’re taking human knowledge and how people do things and we’re kind of applying that to computers. Do you think AI researchers learn much from brain science? Do they learn much from psychology? Or is it more that’s handy for telling stories or helping people understand things? But as you started at the very beginning with airplanes and birds we were talking, there really isn’t a lot of mapping between how humans do things and how machines do them.
Yes, by the way, the airplanes and birds analogy I think is due to Yann LeCun. And I think some AI researchers are inspired by how humans do things, and the prime example is Geoff Hinton who is an amazing researcher, not only because of what he achieved, but he has extremely good understanding of both computers and human consciousness. In several talks of his that I’ve heard, and in some conversations afterwards, he suggested that he uses his knowledge of how the human brain works as an inspiration for coming up with new algorithms. Again, not copying them but inspiring the algorithms. So to answer your question, yes, I think human consciousness is very relevant to understanding how intelligence could be achieved, and as Geoff Hinton says, that’s the only working example we have at the moment.
We were able to kind of do chess in AI so easily because there were so many – not so easily, obviously people worked very hard on it – but because there were so many well-kept records of games that would be training data. We can do handwriting recognition well because we have a lot of handwriting and it’s been transcribed. We do translation well because there is a lot of training data. What are some problems that would be solvable if we just had the data for them, and we just don’t have it nor do we have any good way of getting it? Like, what’s a solvable problem that really our only impediment is that we don’t have the data?
I think at the forefront of such problems is medical diagnosis, because there are many diseases where the data already exists, it’s just maybe not collected in electronic form. There is a lot of genetic information that could be collected and correlated with both diseases and treatment, what works. Again, it’s not yet collected, but Google and 23andMe and many other companies are working on that. Medical radiology recently witnessed great success of a startup called Enlitic, where they were able to identify tumors using deep learning at almost the same quality as human radiologists. So I think in medicine and health care we will see big advances. And in many other areas where there is a lot of data, we can also see big advances. But the flipside of data, or what we can touch on it, is people, at least in some part of the political spectrum, are losing connection on whether it’s actually true or not. Last year’s election saw a tremendous amount of fake news stories that seemed to have significant influence. So while on one hand we’re training machines to do a better and better job in recognizing what is true, many humans are losing their ability to recognize what is true and what is happening. Just witness the denial of climate change by many people in this country.
You mention text analysis on your LinkedIn profile. I just saw that that was something that you evidently know a lot about. Is the problem you’re describing solvable? If you had to say the number one problem of the worldwide web is you don’t know what to believe, you don’t know what’s true, and you just don’t have a way necessarily of sorting results by truthiness, do you think that that is a machine learning problem, or is that not one? Is it going to require moderation in humans? Or is truth not a defined enough concept on which to train 50 billion web pages?
I think the technical part certainly can be solved from machine learning point of view. But the worldwide web does not exist in a vacuum, it is embedded in human society. And as such, it suffers from all the advantages and problems of humans. If there are human actors that will find it beneficial to bend the truth and use the worldwide web to convince other people what they want to convince them of, they will find some ways to leverage the algorithms. The algorithm by itself is not a panacea as long as there are humans with all of our good and evil intentions around it.
But do you think it’s really solvable? Because I remember this Dilbert comic strip I saw once where Dilbert’s on a sales call and the person that he’s talking to says, “Your salesman says your product cures cancer!” And Dilbert says, “That is true.” And the guy says, “Wait a minute! It’s true that it cures cancer or it’s true that he said that?” And so it’s like that, that statement, “Your salesperson said your product cures cancer,” is a true statement. But that subtlety, that nuance, that it’s-true-but-it’s-not-true aspect of it, I just wonder, it doesn’t feel like chess, this very clear-cut win/lose kind of situation. And I just wonder even if everybody wanted the true results to rise to the top, could we actually do that?
Again, I think technically it is possible. Of course, you know nothing will work perfectly, but humans also do not make perfect decisions. For example, Facebook already has an algorithm that can identify clickbait. And one of the signals is relatively simple, just look at the number of people, let’s say, who look at a particular headline, click on a particular link, and then how much time they spend there or whether they return and click backwards. If the headline is like, “Nine amazing things you can do to cure X,” and you go to that website and it’s something completely different, then you quickly return. Your behavior will be different than if you go to a website that matches the headline. And you know, Facebook and Google and other sites, they can measure those signals and they can see which type or which headlines are deceptive. The problem is that the ecosystem that has evolved seems to reward capturing attention of people, and the headlines that are more likely to be shared are the ones that capture the attention of people and generate emotion, either anger or some cute things. We’re evolving toward internet of anger, partisan anger, and cute kittens. That’s the two extreme axes of what gets attention. I think the technical part is solvable. The problem is that, again, there are humans around it that have a very different motivation from you and me. It’s very hard to work when your enemy is using various cyber-weapons against you.
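The dwell-time signal Gregory describes can be illustrated with a small sketch. It is not Facebook’s or Google’s actual algorithm. It assumes a click log of (headline, seconds spent on the page) pairs, and the names `quick_return_rates` and `flag_suspect_headlines`, the 10-second threshold, and the 0.5 cutoff are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical threshold: a visit shorter than this counts as a quick return.
QUICK_RETURN_SECONDS = 10.0


def quick_return_rates(clicks):
    """Map each headline to the fraction of its clicks that bounced back quickly.

    `clicks` is an iterable of (headline, dwell_seconds) pairs.
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for headline, dwell_seconds in clicks:
        totals[headline] += 1
        if dwell_seconds < QUICK_RETURN_SECONDS:
            quick[headline] += 1
    return {headline: quick[headline] / totals[headline] for headline in totals}


def flag_suspect_headlines(clicks, max_rate=0.5):
    """Return headlines whose quick-return rate exceeds `max_rate`, sorted."""
    rates = quick_return_rates(clicks)
    return sorted(h for h, rate in rates.items() if rate > max_rate)


clicks = [
    ("Nine amazing things you can do to cure X", 3.0),
    ("Nine amazing things you can do to cure X", 4.5),
    ("Nine amazing things you can do to cure X", 60.0),
    ("City council approves new budget", 45.0),
    ("City council approves new budget", 8.0),
    ("City council approves new budget", 120.0),
]
print(flag_suspect_headlines(clicks))
# ['Nine amazing things you can do to cure X']
```

A real system would presumably combine many such signals. The point of the sketch is only that reader behavior after the click, rather than the headline text itself, carries the evidence.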
Do you think nutrition may be something that would be really hard as well? Because no two people – you eat however many times a day, however many different foods, and there is nobody else who does that same combination on the planet, even for seven consecutive days or something. Do you think that nutrition is a solvable thing, or are there too many variables for there to ever be a dataset that would be able to say, “If you eat broccoli, chocolate ice cream, and go to the movie at 6:15, you’ll live longer”?
I think that is certainly solvable. Again, the problem is that humans are not completely logical. That’s our beauty and our problem. People know what is good for them, but sometimes they just want something else. We sort of have our own animal instinct that is very hard to control. That’s why all the diets work, but just not for a very long time. People go on diets very frequently and then, you know, find that it didn’t work and go on it again. Yes, for information, nutrition can be solved. How to motivate and convince people to follow good nutrition, that is a much, much harder problem.
All right! Well it looks like we are out of time. Would you go ahead and tell the listeners how they can keep up with you, go on your website, and any ways they can follow you, how to get hold of you and all of that?
Yes. Thank you, Byron. You can find me on Twitter @KDnuggets, and visit the website KDnuggets.com. It’s a magazine for data scientists and machine learning professionals. We publish only a few interesting articles a day. And I hope you can read it, or if you have something to say, contribute to it! And thank you for the interview, I enjoyed it.
Thank you very much.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.