In this episode, Byron and Bryon talk about intelligence, consciousness, emergence, automation and more.
Bryon is the CTO and co-founder of data.world - on a mission to build the world’s most meaningful, collaborative, and abundant data resource. Bryon is a recognized leader in building large-scale consumer internet systems and an expert in data integration solutions.
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today my guest is Bryon Jacob. Bryon is an entrepreneur and technologist and is the CTO and co-founder of data.world, the social network for data. Bryon has been a coder since the age of ten, and received his bachelor’s and master’s degrees from Case Western Reserve University, where he researched building cognitively realistic models of computation. Welcome to the show, Bryon.
Bryon Jacob: Thank you. Great to be here.
Let’s start with the basics: What is artificial intelligence?
Artificial intelligence is a human-built machine emulating what would be traditionally thought of as a cognitive task that a human being would have to do; something that emulates natural intelligence, but within a human-constructed machine.
But let me take a step back even further then, what is intelligence?
That’s a fantastic question. If you think about artificial intelligence, most of what people are talking about with AI today are practical applications of narrow artificial intelligence, right, artificial intelligence that is directed at solving particularly difficult computational tasks that traditionally you would have thought you needed a person to solve.
If you look at science fiction, there’s a lot more thought around a general artificial intelligence, something that actually can speak and interact and seem fully sentient, the same as a person. I think intelligence is not something that we have a strong quantified definition of. It’s that thing that people can do that we can seem to get close to approximating with advanced computing machinery today.
Okay. I’ll only ask one more definition kind of question. In what sense do you interpret it as being artificial? Is it artificial like artificial turf isn’t real grass it just looks like it, and therefore it’s not really intelligent? Or is it just artificial in the sense that it’s not biological, that it’s made?
So, it really is intelligent?
I think that we don’t know enough about exactly what it is that people do to quantify that machines that we built could or couldn’t do that. I take the point of view, definitely, that what we do arises out of physical processes, that as we get better and smarter at modeling physical processes, we can get increasingly closer to approximating that, so that, yes, there could be an intelligence that’s built completely out of technology that is exactly equivalent and is in fact “intelligence” the same way that people are intelligent.
How would you describe the state of the art now? Are we at the very beginnings of AI, and it’s a brand-new thing and we’re feeling our way along? Or are we at a point where we know what we’re doing and we’re on a path to making better and better and better and really amazing things, like the kinds of things you see in science fiction?
I mean, I think both. I think we are definitely at the crawling stage in terms of this. This has been an active area of real hands-on, practical research for thirty to forty or more years, but that’s a tiny amount of time when you consider how much more time there is in the future to continue understanding, both understanding ourselves and understanding what can be done with computing technology. So, I think we are definitely just at the infancy.
At the same time, we know a lot of things, particularly in the area of narrow, problem-specific types of artificial intelligence. There’s any number of things that we know that we can get machines to do better, where better means more efficiently, more accurately, faster than people. That doesn’t mean that those machines can replace people. It’s certainly not close to an artificial general intelligence. But we have gotten extremely good at having learning machines that can learn to do things in narrowly focused areas better than human beings can.
But, to be perfectly clear from your earlier comments, you are one hundred percent convinced we are just machines as well, and that computer machines are just another animal almost?
Yeah. I, maybe, take a tiny bit of issue with “one hundred percent convinced.” That is my belief, based on what I understand about the universe and applying my brand of rational thought to it, yep.
Okay, so we’re machines? Are we alive? Does that word have any meaning to you?
Are machines alive?
Okay, so what is life?
Well, this is not my scientific area of expertise, but certainly there’s a pretty well-defined definition for life.
Interestingly, there isn’t. There’s no consensus definition of life, and there’s no consensus definition of intelligence. It’s one of those things…who was it, about pornography, who said, “I can’t define it, but—”
“I know it when I see it.”
Life is self-replicating…
Some people say there are five characteristics. There’s a book that Nova Spivack, an earlier guest, wrote that had nineteen characteristics of life. And it’s really interesting because I’ve been reading a lot about the Gaia hypothesis lately, which posits that the Earth is alive, or at least behaves like a living system. So, just to ask straight out—and these are all opinions, I’m not trying to hold you to anything like in a future court of law—you’re saying that we’re machines and we’re alive, and the machines we have right now aren’t alive yet, but they may be in the future.
Sure. I think it really depends on what context you want to quantify what life is. But I think, you know, if part of the calculus is, “Could a machine be sentient? Could a machine be an individual with an identity? Could it have legal rights? Could it be something that we have to treat the same way as we would treat a person?” I think the answer is yes. I don’t know if that maps to the same thing as “life.”
We’ll come to that, but first—humans are intelligent by virtue of our brains, and I guess it goes without saying, we don’t know really how the brain works?
Not fully, certainly not fully. We know some things, but we don’t—
Well, I would challenge that, and let me give you my case and you take it to task, if you like. So, most people look at the brain and they say it’s got one hundred billion neurons and it’s got countless connections between those, and from this lies intelligence. And that the reason we don’t understand the brain is because there’s just too many…it’s just too big.
There’s an animal called the nematode worm. It’s the most successful animal—ten percent of all living creatures are nematode worms—and we sequenced their genome very early. And they have—and someone will correct me on this—three hundred, plus or minus twenty, neurons in their brain. Three hundred, the number of Cheerios in a bowl of cereal. And there has been an effort underway for twenty years to model that on a computer, to basically build a digital nematode worm.
Three hundred neurons, a number of connections you can count, and yet, not only do we not know how to do that, there’s certainly more than a minority of people in the project who say that we may never be able to. So, what do you do with that?
It’s interesting, actually, I don’t know anything about that research, but this relates really closely to the work of the research group I was in in graduate school. Dr. Randy Beer, at Case Western had a cross-functional group who worked between the computer science department and the biology department. Most of the team was focused on research around building robot cockroaches.
Effectively, what the research was centered around was building cockroaches that were modeled exactly after real cockroaches’ neural system. The exact number of neurons—and I’m going to get this slightly wrong—was, I think, 178 neurons in a cockroach brain. So, we’re talking about the same order of magnitude as we’re talking with the nematodes. And what we built was a neural network that modeled those 178 neurons with the same architecture, the same connectivity as existed in the cockroach brain. The idea was to see whether that neural construct was actually what derived the functional characteristics of the cockroach.
The experimentation was, would a cockroach learn to walk? Starting from an empty base state, could that neural system learn to walk given a goal of, I want to move as far forward in a certain direction as possible, just based on feedback, on the connectedness approach to that, to the neural network of that architecture. And what we were able to accomplish was not only would those cockroaches learn to walk, but they would learn the exact gait that a cockroach uses to walk. But without, kind of, designing that system, other than designing the same neural system, we could get these robots from a base state of an empty neural network, to walking the same way a cockroach walks.
I’m not familiar with that, but I would be highly suspect. Everything I know along those lines has been modeling physical systems, like, the joints of the creature. But I think I can say, confidently, that nobody knows how a thought is encoded. Like, when I say, “What color was your first bicycle?”
Was it really?
I am just making that up. I don’t know.
What’s something from way back you remember that you don’t think about very often?
The color of my first wagon, certainly was red.
Okay. So, the point is, when I just ask you this question: “What’s something from way back that you haven’t thought about in a long time?” You don’t have some area of your brain called “things I haven’t thought about in a long time,” right? So somehow that thought is recorded in a way that is unknown to us.
I am of the opinion, and this is an opinion, that the most likely thing is that we are so reductionist in what a neuron is, we almost think of it as, like, this little “click, click, click,” and my speculation is it’s as complicated as a supercomputer. That a single neuron has far more complexity built into it than IBM Watson. And that, therefore, all these errands to model, I don’t care if it’s got ten neurons, you’re not going to be able to do it in the foreseeable number of centuries. What would you say to that?
I think it’s definitely not as simple as, there’s a neuron in there that’s encoding for that color. I think every thought that we have gets encoded somehow as a memory that’s spread out among the neural network in our brain. But the notion that each neuron is in and of itself more complex than a supercomputer, I’ve seen nothing that leads me to believe that that’s the case.
Okay. So, getting past the brain, and my position is that all we have is this vague word, “brain activity” which is analogous to somebody looking at New York City and saying that there’s movement. Like, that doesn’t give you any useful information about what’s happening in New York City.
But we do know that physical modification of the brain—through injury, through disease—maps directly to loss of memories, of cognitive capabilities.
Unquestionably. We know that this part of your brain sees color, and this part of your brain interprets emotions, but we don’t know how. We just say, “There’s glowing on this thing there,” but we don’t know what that—
And there’s a lot of evidence that suggests that the brain is incredibly capable of working around those kinds of injuries and capabilities.
The younger you are.
Younger, but also the more pliable. The more different ways you exercise and stretch your brain, the more different neural pathways you create, the less susceptible you are to deteriorative diseases later in life—diseases like Alzheimer’s start destroying the brain and therefore destroying memories. They do so less in people who have built more different pathways, which lends a lot to the argument that the information is not encoded in a neuron, so much as it’s encoded in the connection between many millions of neurons.
That’s the analogy of music isn’t the notes, it’s the space between the notes. It’s funny—this is a complete aside—I was a Sherlock Holmes buff growing up as a young boy, and Sherlock Holmes, famously, didn’t know the planets revolved around the sun, because he said he didn’t want to clutter up his mind attic with this unnecessary stuff. And then they preserved that in the new show Sherlock. I just read something recently that says old people forget for that very reason, that they just know too many things and they lose their ability to retrieve it all. So, it turns out Holmes was right.
I’m intrigued at really exploring this view, because on my spectrum of guests, you’re on the more mechanistic side with how you view all of this. I’m curious if you think there is a thing called the “mind.” Everybody knows we have a brain, and the brain has this activity, but there’s a concept that we have a “mind,” which some people say is emergent. It’s certainly a concept we use a lot, “He’s out of his mind!” and “Have you lost your mind?”
That the mind is the place from whence all the things the brain doesn’t seem to be able to do—like creativity, what is creativity—reside and which may be emergent. Or, it doesn’t have to be supernatural, it could be a quantum phenomenon or something. When you turn your mind to this, do you ever employ the concept of “mind” in any way, or is that not useful to you?
Sure. I mean, I think of it, definitely, as an emergent process from the physical. And I don’t pretend that I know any more than anyone else does about how, exactly, it emerges from that physical process. I think it is unquestionable that we don’t have a direct path that we can say, “Because we know that our brain has neurons, we know how there is an emergent mind.” But I also don’t see anything to believe that it could not be emergent, the way that we think, the way we conceive of ourselves as having a mind, you know?
One thing about your mind is, you think about your mind as the same thing as your identity, as your sense of self, as what you are, the thing that’s having thoughts. It’s a continuous story. It’s a sequence of events and experiences and beliefs that you’ve accumulated, and how much is that actually a thing? Is that an actual entity or how much is that just the language that we use to describe this collection of thoughts and experiences?
I mean, it certainly feels, to me sitting here, like I exist, and I have a mind. But we know that a lot of the memories we have, a lot of the stories we have that really provide that continuity of you as an individual, are false memories, or they’re stories that we’ve reconstructed from bits of memories, that very often people have clear recollections of things in ways that are not exactly the way they happened.
We’re taking a roundabout way there, but I want to get us to AGI, because whenever AGIs are discussed, they are human-level intelligence—that’s how you think of them. So, to get there, I think it would be useful to understand what human intelligence is. It’s a roundabout path, but one that I would defend.
So, I only want to go one more level up. We started with the brain, and the brain is this thing that we may or may not understand to varying degrees. It may produce this mind through emergence. Generally speaking, there are two kinds of emergence to think of. There’s weak emergence, which is where you could spend a year studying oxygen and a year studying hydrogen, and not realize that when you put them together you get water and that you get something that’s wet. That’s a brand-new idea, wet. Where did that wetness come from, right? And yet, a good scientist looks at it and says, “Ah, I see now, how wetness emerges from these two gases. I get that.” And everybody knows that this exists, weak emergence—it’s beehives, it’s anthills, it may be all those things; they’re definitely emergent.
But then there’s a notion of something called strong emergence, which isn’t reductionist. It says there are things that are emergent that you cannot derive from the properties of the individual elements. Like your sense of humor, no cell in your body has a sense of humor, but you have a sense of humor. So, a lot of the times people are resistant to strong emergence because they think it’s almost like a cheat, it’s like magic that you’re introducing in the system.
Do you have a thought on emergence, on what it is and how it works, and whether it is reductionist at its core? Because if intelligence is emergent, we would want it to be reductionist if we want to build our own, right?
Lot to unpack in there. So, the analogy that I always think of when I think about this is—and this is certainly not my original hypothesis, Marvin Minsky’s book Society of Mind is the main thing I think—that an ant colony is intelligent in ways that an individual ant isn’t. And that no amount of time you spend studying an ant and its behavior is going to fully account for the kinds of sophisticated behaviors that you can see an entire ant colony exhibit.
Is it reductionist? Well, in a certain sense it is, because if you break it down and look at it, well, actually each of those ants is acting somewhat autonomously. It is acting according to its own design, and over a long period of time those ants have evolved to have, individually, very simple behaviors and stimulus responses that have evolved in their systems that gave advantage to the colony, that the colony has effectively evolved as an organism.
And if you trace that to us, we have evolved from single-celled organisms that lived in colonies that worked together, that then became multi-celled organisms, that then became us. There’s so many layers of abstraction between those underlying processes and the sophistication of something like the human mind, that to say whether it is or isn’t reductionist I think maybe isn’t the right question. Maybe it’s, “Is the reductionism tractable? Is it the sort of thing where it is reductionist, but it’s so complex that we’re not going to be able to go back and suss it out?” And then, “Does that matter?”
Because, if you’d want it to be reductionist so that we could reproduce the same thing, is that the only path to creating artificial general intelligence—to reduce it the same way that humans do it mechanically? I’d say almost certainly not, right? That the way that we think, using the neurons made of meat that we have is one way that evolution found a way to get to human level intelligence. But the same phenotypic behavior that we exhibit, could be gotten to through other pathways.
We went brain, to mind, to emergence, now let’s talk about consciousness. People often say we don’t know what consciousness is. This is not true. We know exactly what it is, nobody knows where it comes from—or, I suspect there’s somebody who lives in upstate New York who knows and is just not telling anybody, but we think nobody knows. Consciousness is this experience you have of feeling things, of sensing them. It’s the “I” that experiences the world. You could put a sensor on a computer that could measure heat, but the difference between the computer measuring heat and the computer feeling heat, that’s consciousness. So, is consciousness a real thing? How do we get it? Is it necessary for intelligence? Can we make computers that do it? Is it a real thing?
I certainly feel like I’m conscious, but I think this is a case where you almost turn the Turing test back on yourself and say, well, there is a collection of experiences, a collection of sensations, a collection of responses to that, and, among those, is this notion of continuity of thought, of mind, of consciousness. I don’t think of all those things as necessarily distinct things. And the question is, are they? I mean, you’re asking a great question.
I don’t know, is there an entity there that is me, that is conscious? Or is “consciousness” just the word that we use to describe that that’s how it feels? The language gets in the way. It almost makes it hard to even ask or answer the question, because what I want to say is, I don’t know if there’s an “I” behind all of these experiences that I attach to myself.
Well, let’s use a real-world example that, probably, everyone is familiar with. You’re driving along and a few minutes pass, and all of a sudden you, kind of, wake up, and you’re like, “Oh, my gosh, I don’t have any memory of driving to this spot.” And that period before you, kind of, woke up is intelligence without consciousness. You’re merging in traffic, you’re paying attention, you’re doing all these things; you’re clearly intelligent, but then you have that a-ha moment. You go, “Oh, my gosh,” and, all of a sudden, your consciousness is back on. I assume you’ve experienced something like that.
So, what is your materialistic understanding or explanation, or how do you even describe the difference between those two states?
Sure. I mean, without sounding like I’m going to reduce it to, “It’s this simple,” but is it fundamentally different from when a computer has some amount of processing capability, and it diverts that to where it needs to be to address certain problems? Is that really any different than a computer focusing its processing power on something else, and kind of letting a subsystem go on autopilot?
Well, that sounds almost like sophistry, because you can say, “Well, show me that it isn’t.” But I think people intuitively know what my iPhone’s doing and what I’m doing. When I get an amber alert, my iPhone pops it up. That’s different than me noticing my child is about to walk out into traffic. Those are very different things. But you would say not necessarily?
I don’t necessarily think those are fundamentally different things. Certainly, different by a matter of degree and complexity and sophistication. I’m not suggesting that an iPhone is the computational equivalent of the human mind, but I am suggesting that, does it have to be a fundamentally different thing, or could it be a matter of degree and complexity?
You mentioned sophistry, and you can take that to, “Are we all just a brain in a jar? Am I just a brain in a jar being fed these experiences? How do you know that I’m a conscious being, that I’m not just a figment of your imagination?” Those are all, kind of, ridiculous arguments, like, “Well, I need to accept that you’re a conscious being to exactly the same degree as I am.” But, I guess what it boils down to is I feel that there is an “I.”
I have this sense that I exist and that I’m this thing that’s having a continuity of experiences and thoughts and beliefs. But why is it that it’s impossible to think that a sufficiently advanced machine would also believe that about itself? And does that make it untrue because we could trace through that we started from dumb silicon and turned it into this thing that thinks it’s really a person? Well, maybe it is. And, in fact, turn that back on yourself. If you were just a machine, isn’t it possible that you would believe that you were something more, that there was some entity there?
No. I mean, I don’t think so. I think when people invoke Descartes, they don’t really think that they’re going into that, they really don’t.
And I don’t either.
They think the reason you invoke Descartes is you say, “How do we know what we know?” That’s really what you’re asking. How do I know that I’m here? How do I know this? How do I know that? And consciousness explains it. You know it because you’ve experienced it.
I guess, if I were to wrap it all up, I would say, “We have this one example of an AGI, us.” We have this brain, which, to varying degrees, we don’t understand. We have something that is perhaps emergent, a mind, that gives us these attributes we cannot reasonably derive from it. We have consciousness that we can experience. Then, the leap is, “Ergo, we can build that.” Like, “We don’t understand the brain, we don’t understand the mind, we don’t understand consciousness, but we can build it,” seems like an act of faith more than an act of science. What would you say to that?
Well, I think the notion that it’s all a mechanical process, and therefore it could be built, is very different from saying, “I know how to go do that.” So, is it science? I think it’s science to the degree that, it’s a line of inquiry, it’s something worth continuing to investigate, it’s a reason to try to solve some of these puzzles, both because it helps build some incredibly useful problem-solving technology, but also because it helps us understand more about ourselves.
Let’s switch gears a little bit. Let’s assume you’re right, that an AGI is mechanical and engineerable, and how long it’s going to take us to do it is an unknown. By the way, do you have an opinion on that?
Yeah, actually, my opinion will take a little bit of a left turn. I think an AGI that you’d identify from sci-fi is something that is—
You’re wearing a Data shirt. He’s wearing a Star Trek Data shirt, by the way. So, how long till you have that? Data’s an AGI, right?
Exactly. So what I would say is, although I think that’s a possible thing, will that ever be a thing that gets built in that form? No, not necessarily. I actually think what’s more likely, and we’re going to see happen increasingly, is where some of the notions of AGI and of narrow AI start to converge. Although it would be possible to do that if that was the goal that you set out to accomplish, and given enough time—which I don’t have a great speculation, you could be right that it’s hundreds of years away—to go directly after that goal and accomplish it, but I think what’s more likely is you’re going to start to see humans and technology converging. You’re going to start to see things that we would write that fit squarely in this world of artificial intelligence being utilized as tools by people, and the combination of people working with these machines is going to be something new and an evolutionary step.
Well, I think there’s three or four reasons that people suspect we’ll build mechanical humans. One is that, with our aging population and declining birth rates, we will need more caregivers and whatnot. And there will be an affinity towards the ones that are humanlike, if for no other reason than people instantly form emotional bonds and connections to them and they will value that.
I guess the question is will those rise to the level of AGI, or will those be specialized applications that have a humanoid form because it’s useful for that application?
Well, if you believe in Society of Mind, Minsky thought we were essentially not an AGI, we’re a hack of a couple of hundred skills that our brain can do. So if they’re also a hack of a couple of hundred skills, they can make your tea in the morning and help you out of bed and all of that…
It’s amazing though, what people will connect with. We do it at an early age. Every little kid has their toy that they love, that they bond with, so we’re programmed to like and love things. Then you have the uncanny valley that if they’re close but not exact, then they’re creepy.
It’s really funny we’re having this conversation now, and, in the future, the path that we choose will be obvious and this conversation will be ridiculous, no matter which way it goes. “Of course we would make them humanoid,” or “Of course we wouldn’t make them humanoid,” and yet, from our vantage point, we can’t see that.
So, you think sooner than later, if you were a betting man? You said that we’ll merge, but there is a point at which you have a machine and you say, “How do you make a perfect pot of tea?” and it tells you. And then you say, “Should I break up with my girlfriend?” and it tells you. And then you say, “Hey, I’m working on a poem, help me come up with the last line,” and it’ll tell you. Like, something that does all that and it didn’t have to be trained on those things?
So when would that be? I’m expecting month and day.
[Laughs] I think, for a large subset of those kinds of problems, it’s decades, it’s not more than decades.
So, we have DNA with our two billion base pairs or whatever, and it’s not binary, it’s four letters, GTCA. You can remember it because, have you seen the movie Gattaca? So, those are the GTCA. It took me a while to notice that. And so it works out to be about 700MB, if I’m right on this, but then of course we share ninety percent of it with a banana, right, like all living life? And that’s like 70MB. And then if you compare us to chimps or something, you have 99.5 percent.
So, really, what gives us our intelligence is probably 7MB of code. Do you think it’s possible that an AGI really is that simple? Because, our computer programs today are of increasing complexity, not decreasing—code doesn’t decrease over time. Do you think an AGI is an elegant, simple answer that somebody somewhere in a garage will discover, or is it one thousand times bigger than Windows?
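The back-of-envelope arithmetic in the question above can be checked with a short sketch. It assumes 2 bits per base (four letters) and megabytes of 10^6 bytes; the function names are hypothetical, and the shared percentages are the ones quoted in the conversation, not measured values. Under those assumptions, two billion base pairs come to 500 MB, while the commonly cited figure of roughly three billion comes to 750 MB, nearer the 700 MB quoted. The 7 MB figure corresponds to 99 percent shared; 99.5 percent shared would leave about 3.5 MB.

```python
# Rough storage estimate for a genome, assuming 2 bits per base (G, T, C, A).
BITS_PER_BASE = 2
BYTES_PER_MEGABYTE = 1_000_000  # assumption: decimal megabytes


def genome_megabytes(base_pairs: int) -> float:
    """Raw size in megabytes of a sequence with the given number of base pairs."""
    return base_pairs * BITS_PER_BASE / 8 / BYTES_PER_MEGABYTE


def unshared_megabytes(total_mb: float, shared_fraction: float) -> float:
    """Portion of total_mb left over after removing the shared fraction."""
    return total_mb * (1 - shared_fraction)


if __name__ == "__main__":
    # Base-pair counts: the figure as spoken, and the commonly cited one.
    for base_pairs in (2_000_000_000, 3_000_000_000):
        print(f"{base_pairs:,} base pairs -> {genome_megabytes(base_pairs):.1f} MB")

    quoted_total_mb = 700.0  # figure quoted in the conversation
    for shared in (0.90, 0.99, 0.995):
        left = unshared_megabytes(quoted_total_mb, shared)
        print(f"{shared:.1%} shared -> {left:.1f} MB unshared")
```

The point of the sketch is only to make the orders of magnitude explicit; real genome comparisons are not a simple subtraction of shared bytes.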
Well, I think the DNA analogy is a really interesting one, especially if you think about machine learning and how that is applied right now today for narrow AI problems. Somebody who has learned a little bit about machine learning, and knows how to use some of the off-the-shelf toolkits can write a couple hundred lines of code that will then cause a network of computers to go out and do massive computations and produce large amounts of interconnected data, which is then itself capable of solving much more sophisticated problems than you would expect to be able to solve with one hundred lines of code.
So I think that’s the analogy to DNA and our brain, that while it might be 7MB of code that’s the instruction for building a human brain, that human brain is a complex machine in and of itself that exists in the world, takes in inputs, processes that data, and produces a program, which is effectively what a trained neural network is, that can interact with input in really, really sophisticated ways.
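A toy illustration of that point, that a short training program produces a separate artifact (the learned weights) which then does the work, might look like the following. This is a minimal sketch using a single-neuron perceptron in plain Python, far simpler than the off-the-shelf toolkits mentioned above; the function names, the AND task, and the learning rate are all assumptions chosen for illustration.

```python
# A single-neuron perceptron: the training code is short, and the learned
# weights and bias are the "program" that it produces.

def predict(weights, bias, x1, x2):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias from (inputs, target) pairs by error feedback."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(weights, bias, x1, x2)
            # Nudge each parameter in the direction that reduces the error.
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


# Training data for logical AND (an illustrative task, not from the interview).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(AND_SAMPLES)
    print("learned weights:", weights, "bias:", bias)
    for (x1, x2), _ in AND_SAMPLES:
        print((x1, x2), "->", predict(weights, bias, x1, x2))
```

Nothing in the training loop mentions AND; the behavior lives entirely in the numbers the loop arrives at, which is the sense in which a trained network is itself a program.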
Stephen Wolfram has a book called A New Kind of Science. I don’t know if he posits this in his book, or I heard him say this, but basically the idea is that with very simple rules, if you iterate them, you can create incredible complexity. And he speculated that the entire universe could be derived from maybe thirty lines of code, that it’s that simple. Does that seem preposterous to you, or is it, like, no, that’s how iterative systems work?
I mean, it makes a certain amount of sense. If you think about that thirty lines of code being certain physical constants, that we can observe as kind of being the base laws that hold everything else together, and everything else is recursively built on top of that, I think it’s basically the same analogy.
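The idea that very simple iterated rules can produce complex behavior is often illustrated with elementary cellular automata such as Rule 30, which Wolfram discusses in A New Kind of Science. The sketch below is one minimal way to run such an automaton; the function names, grid width, wraparound edges, and step count are assumptions made for illustration.

```python
# Elementary cellular automaton: each cell's next value depends only on
# itself and its two neighbors, looked up in the bits of the rule number.

def step(cells, rule=30):
    """Apply one update of the given rule to a row of 0/1 cells (edges wrap)."""
    n = len(cells)
    next_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        next_cells.append((rule >> index) & 1)       # pick that bit of the rule
    return next_cells


def run(width=31, steps=15, rule=30):
    """Start from a single live cell in the middle and return all rows."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in run():
        print("".join("#" if cell else "." for cell in row))
```

The whole rule fits in a single byte, yet the printed triangle quickly develops an irregular pattern on one side, which is the kind of behavior the question is pointing at.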
So, let’s go to the here and now, there’s a lot of fear wrapped up in AI, there’s two brands of it. There’s killer robots and then there’s automation and loss of jobs. There’s three viewpoints on the effect of robots on jobs. One of them views humans in kind of skilled and unskilled, and the robots are going to take the unskilled jobs and we’re going to have a permanent group of people who were unable to add economic value. We’re going to have the Great Depression kind of permanently. Second one is that, like all technologies—electricity and mechanization and assembly lines—humans just use it to increase their own productivity and everybody remains employable, and we never see spikes in unemployment history due to adoption of technology. The third one is we’re all going to lose our jobs, everybody, because we’re just machines, and the minute the machine can learn a new skill faster than a human can, we’re all out-classed.
It seems to me, philosophically, you’re in that third camp, but I wonder if, pragmatically, when you look in your crystal ball in the next ten or twenty or thirty years, what do you see happening?
I wouldn’t put myself in that third camp at all.
But philosophically there’s nothing that we can do that a machine won’t be able to do eventually, right?
And you’re saying decades for AGI, not centuries.
Just going back to that, I say decades for the kinds of augmentative AGI that you talked about—lots of tasks, lots of skills being reduced to things that machines can help us with significantly. I think the belief that physically there’s nothing preventing there being a fully conscious, fully sentient AGI, is different from how is AI going to impact human society and economy in the near future, in the predictable future.
And that’s what it goes back to is, I think what we’re going to see, pragmatically, happen is less that there are these distinct artificially intelligent entities, and more that there are increasingly sophisticated artificial agents that we use to augment our capabilities. So, I think I fall more in the camp of, for the foreseeable future, for as far out as it makes any sense to have any sort of reasonable prediction, you’re going to see more of the second case you indicated, you’re going to see that adding new artificial intelligence is going to increase productivity and create new capabilities.
I do think we’re going to see choppy waters. I think that we’re going to have a number of crises of what to do with all these people who derive their identity, their self-worth from jobs that just don’t need to be done anymore? How do we fit them into the economy, into what society will become?
Well, I want to challenge that. Tell me your historical precedent for that happening?
So that’s a fair question. I’m not sure I have one.
Let me set my question up a little more. So, two hundred and fifty years of industrial revolution—and let’s just talk about the United States because I know it, but I think it applies to the west in general—unemployment is between five and nine percent for two hundred and fifty years. Other than the Depression, which wasn’t caused by technology, unemployment remained at five to nine percent.
And in that time, we had three big things happen. We had the assembly line. And if you think about the assembly line, that’s like AI, right? If you’re a craftsman and you’re building a car by hand, and all of a sudden Henry Ford is cranking them out, I mean, that’s an AI. And you would say, “Everyone who’s a skilled craftsman, boom!” And then you had the replacement of animals with machines. So, in twenty-two years we went from producing five percent of our energy with steam, to eighty percent of our energy with steam. And, what happened? Unemployment never left five to nine percent. Teamsters stopped handling animals and started handling machines. Then you had electricity, electrification of industry. That came along very fast and disruptive.
Each time another technology comes out, everybody says, “Oh, this time’s different. This time it’s going to be rough, unemployment, which has never gone above nine percent, is going to just be terrible and we’re going to have to rethink our social institutions and so forth.” What say you?
If you look back at that—and I don’t have the historical analysis in front of me to back this up—with each one of those, the timeframes over which these revolutions are changing things are getting shorter. I think what’s changed is, if you look at that progression over the last two hundred and fifty or so years, what’s happening at this point is the changes are happening so fast that multiple change moments in what technology is doing to the landscape are happening in the space of a single person’s adult working life.
I think where we might be starting to see different kinds of things happening is that in most of those previous examples a lot of jobs got replaced, but they got replaced over a long enough period of time that most of the people whose jobs were replaced got to finish out their work and retire as opposed to midstream having the rug yanked out from underneath them.
And I do agree with the basic economic argument that we’re going to increase our capability, increase our productivity, do more with technology and ultimately the math works out. I worry that some of the changes are going to happen so fast because—if you think that things are building kind of along an exponential curve—we’re starting to get to the point where the change is happening at a fast enough rate that our institutions, our laws, our societal constructs can’t quite keep up with it.
Is that really the case though? I hear that, but I go back twenty years—I’ve been on the Internet for twenty-one years, so let’s say twenty years—in the last twenty years of the real widespread adoption of the consumer Internet, you had smartphones, you had eBay come along, you have Google with a $600 billion market cap, you have Apple at $600 billion, again, and kind of out of nowhere, you have Facebook with two billion users, you have Uber… You have all this revolution, all of these things that have happened, trillions upon trillions of dollars of value, changes in society brought about by the Internet, and still unemployment remains five percent to nine percent. It never goes out.
So, even in the period of Moore’s law, even in the period of computerization, even in the period of widespread, rapid adoption of the Internet, disintermediation, entire categories of employment wiped out, all of that and unemployment never goes up. So what’s different?
Well I think that five to nine percent number, the way that’s measured—this could venture more into, like, political and socio-economic commentary than thinking about AI—there’s trillions of dollars of value in that relatively low unemployment number, and it’s not evenly distributed. A lot of the people who are employed are underemployed and have really, kind of, a fragile existence that can’t tolerate too much more shift.
There’s a lot of folks who’ve stopped seeking employment that aren’t counted in those numbers, particularly if you go out rurally, at least in America. Just as Silicon Valley—and it’s a machine that I’m certainly part of and I benefit disproportionately from it—as that value gets accumulated there, there are folks who are not benefiting from that, and there’s not a clear path to how they get to participate in that.
I don’t see ways that AI and automation are going in the direction, where they’re going to be really benefiting that rural population, that blue collar employed class of folks that are likely to have a lot of their jobs replaced. I think those are the problems we have to solve. I think they’re solvable, and I think eventually we will. The challenge is going to be how can we address those problems proactively enough that it doesn’t become another Great Depression, that they don’t turn into violent types of situations.
I hear you, and I don’t want to cast myself as the Pollyanna-ish person who says, “Everything’s wonderful.” What I’m trying to figure out is, are things substantially different? Because, like you said earlier, there’s a lot to unpack there. With regard to workforce participation, it is true that it is down. But if you adjust it and you take out the Baby Boomers who are retiring, and you take out seasonality—the best people can tell is it’s between a quarter and a half percent of workforce participation decline is people, quote, “giving up,” so it’s not a widespread epidemic.
To say that you don’t see how AI and automation is going to benefit a certain class of people, I think is an equally unprecedented statement. It would be akin to saying, “I don’t see how the Industrial Revolution is going to help them. I don’t see how electrification is going to help people who raise draft animals. I don’t see how any of that is the case.” I still am wrestling with—
I meant that in the context of giving them a way to provide, economically, for their family. Obviously all these things are a benefit that trickles down in a consumer fashion, but the bogeyman that always gets held up—this might be an interesting one to unpack as just a practical example—is truck driving. It’s the argument that gets held up as, this employs—
3.2 million people, 2.5 million in supporting roles. I think it’s a terrible argument, frankly.
I suspected you would.
I think it’s the one everyone repeats. I think the problems with it as the example are, first of all, we agree that there’s probably less people going into truck driving now, because it’s so widely articulated that this may not be a growth profession, right? And then, of course, as time progresses, a certain number of people retire out of trucking, right? And a certain number of people just decide to do something different, right?
And then, we both know that as fast as technology advances, the adoption of technology doesn’t happen as quickly, right? We know that from the first computers in World War II to the PC took forty years. We know that from the Wright brothers to widespread commercial jetliners took forty years. We know that ARPANET to the Internet took forty years. I mean, these things take a long time, they take decades to work their way through the system, just like every other economic change that has happened.
You know, there are technology hurdles, they’re not in any sense insurmountable, but we’re not there yet—a bag blowing across the street still looks like a deer running across the street to a self-driving car. Then you get past that, and you have regulatory hurdles, then you have to get to social acceptance, then you have to get to cycling out all the old equipment and cycling in the new equipment, and by this time, whoops, all the truckers have retired and no new ones are coming in and people are doing all the new things.
It’s what’s happened with every single other technology. I mean, think of every job in 1900 that doesn’t exist, and every job in 1950. How many switchboard operators have you met recently? That’s just what happens. New technology comes along, and people retire or change or they don’t.
What I think happens is—and, look, I’m in media, Gigaom is a media company—media companies peddle fear and they peddle clickbait. Frey and Osborne, two fine scholars at Oxford, publish a study that says forty-seven percent of skills within jobs, forty-seven percent of the things you do in your daily job, can probably be automated, which is good news. But every time you see it reported, it says forty-seven percent of jobs will be lost. But I believe that that whole narrative of, “It’s all about to hit the fan,” is without basis and without precedent. What do you say?
Well, a couple of things. One, the forty-year time frame, I’d be curious as to whether those numbers are starting to compress and get shorter. I think, you know, it’s a very different argument if you think that that’s likely to happen in forty years versus twenty. I think that does make a material difference when you factor out the ability for people to gently retire out of professions and get onto the next new thing. I agree that a huge percentage of lots of jobs—truck driving is by far not the best example—a lot of legal and accounting type professions should be far more concerned, if there is cause for concern.
You point out, a lot of your job could be automated and that’s a good thing. I agree, but I still think it presents a problem in the sense that, if you say that forty-seven percent of a particular job can be automated away, there is a reasonable argument there to say that means the same amount of productivity could be accomplished with half as many people, or we can do more with that excess fifty percent productivity, but the pace at which that innovation comes across in other industries, is going to matter if it happens too fast.
I agree with all of that. And you’re right, the forty-seven percent, there’s plenty of examples of both things happening. Clearly there are far less people employed in agriculture because the cost of producing food plummets, but there are more bank tellers now than there were before the ATM was introduced, even though we automated much of that job because the cost of opening branches went down and banks opened more branches. So both of those can happen.
And I certainly don’t want to cast myself in a role of… I mean, I think we ought to be doing more as a society to prepare people for different jobs to help people ease in transition. I think there are all kinds of problems we need to solve. I think the germane question is, are things different this time? And that’s the one that I think bears scrutiny.
Let’s switch gears one final time, as we’re running out of time here, I’d like to hear more about data.world. You’re the co-founder of that with Brett Hurt and there’s another co-founder.
Two other co-founders, Jon Loyens and Matt Laessig.
Three questions; why did you start it, what are you trying to do, and where are you at?
Okay. So we started it. Jon and I have known each other for about twenty years. We both moved to Austin around the same time.
When you got out of prison if I remember correctly?
That’s correct, when I got out of grad school.
Oh, yeah, that was it. I knew it was one or the other.
We both worked for a company here in Austin called Trilogy, fairly well-known company from the first dot-com era. We’ve remained friends since then. We both ended up, through different paths, we ended up working together at a company here called HomeAway. It’s also where I met Matt. And Matt and Jon had worked with Brett at his previous company, Bazaarvoice. So, that’s how everyone was interconnected and knew each other. Brett and Matt actually have known each other for twenty years, going back to their business school as well. We’d all worked at a number of companies here in Austin, both started and worked at a number of the big success stories here in town.
We got together to brainstorm new business ideas and tried on a couple of ideas. The core idea for data.world, I guess I can say is something that I had originally conceived of, although it really became a real business when we got everyone together. The idea behind it was that there’s a great technology that’s been around for a long time—came out of the AI community—called Semantic Web, linked data. It’s a way of encoding data in a very standardized fashion, based around connected graphs of data, and the ability to layer semantics and meaning about what data is into the data. That technology has been really well developed in academics and certain industries, but hasn’t really made it through to the mainstream.
What we wanted to do was add a lot more intelligence to build an interconnected network of data to help people working with data and machines working with data, use it to its best capability to solve problems. The core of what data.world is, is a social network for the data itself, a way to bring data sets together, connect them, identify what real world entities that data’s about, and allow people to follow links from one data set to another, using this technology. But also to make it easy for people who are working with data to work together, and build a social network of people working with data to employ them as they’re working with data, to iteratively make the data itself more intelligent, bring more semantic information into it.
I got excited about the Semantic Web way back in the day, and it never really caught on. There’s an archived copy of the web called Common Crawl, with several billion pages of the web, and at one point, we pulled down several million pages to look at what percent of them had encoding in them, and it was two percent or three percent.
I’m surprised it’s that high.
And yet it was advocated for by the man who invented the web. He pushed really hard on it. Was it just too much hassle for the regular website person to encode, this is a person, this is a date, this is a place, this is a birthdate?
Yeah. I think it’s definitely an idea that has a bit of a learning curve, a bit of a mental model shift to comprehend, and I think that’s one reason it’s been somewhat slow to get adopted mainstream, and on the web, which is where it was intended. The technology itself, the RDF underlying technology as a data model, has been really well-developed in academia, but also then in industries like banking and finance, in pharmaceutical, and life science research.
Google leverages this extensively for things like their knowledge graph. Companies like Goldman Sachs have built their data lake, their model of the world, based on this technology. What you have is that large, wealthy organizations who have the capability to make that investment and get over that learning curve and really deeply ingrain this technology in how their organizations think, have absorbed it, have developed it, have really leveraged it to great effect.
It’s like any network effect concept. When the network is small, the incremental benefit to joining it is small. And if the cost is relatively high because the technology is complicated, there’s no incentive for the vast majority of people to do it. If you look back to Tim Berners-Lee’s TED talk in 2009—this is years after they’d developed the Semantic Web, and even then it wasn’t really catching on—and he came out and gave a great TED talk, and the main point he made was, I’m here talking to people at TED, because I feel like TED attendees are the sort of people who would do something just because it would be great if everyone did it. And I think that’s exactly the way that Semantic Web has gotten to where it is, is that there’s a certain number of people who get that this is really the best way to model information, as universal identifiers for entities, and modeling the relationships, having the intelligence, the knowledge about those entities be embedded in the relationships between them.
There was a project out of the University of Galway where they’re measuring the linked open data cloud, basically the number of linked open data sets on the web. And if you go back and look at the evolution of that network since the inception of the Semantic Web, it’s still relatively small. If you look at the latest 2017 numbers, there’s only about 1,800 to 1,900 nodes in that graph. But the growth is absolutely exponential. It’s got the right shape to it. It is growing in the right way to believe that as the graph grows, the value in being part of the graph grows and more people have an incentive to grow it.
So, I follow you guys pretty closely, and I see when Brett tweets, “We just added this data set, this government data set. It’s a great resource.” What does that mean, you added a data set?
So any data set that comes into our platform—whether a user brings it or we bring it in through a partner like a government agency that’s publishing data sets, any structured data—we parse that data, we build an RDF model of that data, so a linked data model of data. Now, initially, if that source data’s just a CSV or some tabular structure, that initial RDF model of it is not necessarily all that much more intelligent, right? It can just be a straight transliteration of that structured tabular data into an RDF model.
But now that it’s in that model, all the data in the system, regardless of its source format, is in an interoperable format. It’s all in this RDF format where everything is represented as atomic facts—subject, predicate, and object type relationships. And what that allows us to do is treat all of that data like data, homogeneously. I can go analyze that data, query it, understand it, we can start to find patterns of behavior.
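The "straight transliteration" of a table into subject, predicate, and object facts can be sketched in a few lines. This is only an illustration under stated assumptions, not data.world's actual pipeline: the `http://example.org/dataset/` namespace, the row and column naming scheme, and the sample rows are all hypothetical, and plain tuples stand in for a real RDF library.

```python
import csv
import io

# Hypothetical namespace; a real system would mint its own identifiers.
BASE = "http://example.org/dataset/"


def csv_to_triples(text, base=BASE):
    """Turn each CSV cell into one (subject, predicate, object) fact.

    Subject   = an identifier for the row
    Predicate = an identifier for the column
    Object    = the cell value, kept as a plain string
    """
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base}row/{row_number}"
        for column, value in row.items():
            predicate = f"{base}column/{column}"
            triples.append((subject, predicate, value))
    return triples


# Made-up sample data, for illustration only.
SAMPLE = "zip_code,stores\n78701,3\n78702,1\n"

if __name__ == "__main__":
    for triple in csv_to_triples(SAMPLE):
        print(triple)
```

As the answer notes, this first pass adds no intelligence: the predicates are just column names. The gain is that every source ends up in the same shape, so later steps can attach meaning to a predicate once and have it apply everywhere.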
A lot of what we do is look at how data scientists, analysts, and researchers work with data, and a huge percentage of everyone’s effort is spent cleaning, prepping, and understanding the data that they’ve acquired for how they can use it and how it applies to their problem. And that work is knowledge engineering. What they’re doing is effectively aligning that data to some taxonomy. They’re figuring out what that data means.
The problem is most of that work just gets thrown away. Most of the time, somebody will bring in a number of data sets, they will do that work to prep it and clean it for the purpose of their project, they’ll produce their result, and all that intermediate work is maybe just thrown away. At best it might be kept in a team repository, something where that person could go back and reuse it again, but that work is never kind of published back alongside the data.
What we want to do is take in those data sets, build this RDF model of them, and then, as people are working with the data and identifying what the real meaning behind the data is, what concepts this data is about, capturing that along with the data, so the next person who comes along doesn’t have to repeat that work.
And every time you do that, you build another linkage from one data set to everything else that’s about those same concepts, so that I can do things like, you know, if you bring a CSV to the platform and you’ve got a variable called zip code, that contains five-digit numbers, we can identify that it’s not just a set of five-digit numbers, those are zip codes and that’s a real world entity. The census uses something called “zip code tabulation areas,” which are basically polygons that it’s computed that roughly approximate the postal codes. Well now that I know you’ve got data that aligns to zip codes, I can surface to you that I have US census data here, would you like to see your data also broken down by racial demographics or by income, or by anything else that the census provides.
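The zip code example amounts to two steps: recognize that a column holds zip codes, then use that shared concept to join against a reference data set. A rough sketch follows, with the caveat that the detection heuristic, the function names, and every value in the sample data (including the "population" figures) are invented placeholders rather than real census data or data.world behavior.

```python
def looks_like_zip_codes(values):
    """Crude heuristic: every value is exactly five digits."""
    return bool(values) and all(len(v) == 5 and v.isdigit() for v in values)


def link_on_zip(rows, zip_column, reference):
    """Attach reference attributes to each row whose zip code has a match."""
    linked = []
    for row in rows:
        extra = reference.get(row[zip_column], {})
        linked.append({**row, **extra})
    return linked


# Hypothetical user data and a hypothetical reference table keyed by zip code.
USER_ROWS = [
    {"zip_code": "78701", "stores": "3"},
    {"zip_code": "99999", "stores": "1"},
]
REFERENCE = {
    "78701": {"population": 1000},  # placeholder number, not a census figure
}

if __name__ == "__main__":
    column = [row["zip_code"] for row in USER_ROWS]
    if looks_like_zip_codes(column):
        for row in link_on_zip(USER_ROWS, "zip_code", REFERENCE):
            print(row)
```

A five-digit check alone would also match other five-digit identifiers, which is why the conversation stresses capturing what people already know about their data instead of relying only on pattern matching.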
Stephen Wolfram who I mentioned earlier, has a site called Wolfram Alpha, and it’s an answer engine. And what he has purported to do is bring in all these data sets and do, kind of, what you’re talking about, normalize them, such that you can ask questions like, “What is the total number of wars started by presidents born on Fridays who had older sisters?” and it could answer that.
A question that’s never been asked before, it can answer it because all of these disparate data sources have been, as you’ve said, tagged in a certain consistent way so that it can be queried across multiple data sites. So, business aside, technically-speaking, is what he’s doing, what you’re doing?
Technically, there’s a lot of similarities. Like, you know, Stephen would talk a lot about computable data, the goal is to get your data into something that’s computable. And I think that the approach that Wolfram takes is great, and it’s necessary. I think to highlight the distinction between what Wolfram does and what data.world does is, Wolfram’s approach is very top-down. A lot of the value that they add is in taking these sources of data and curating them and hand tagging that with a team of ontological experts. I think that is certainly necessary to get some of the highly articulated, kind of, concept mapping that you’re talking about. I think those things are great, I think they’re necessary.
What we’re looking at is where most people aren’t really benefiting from that, because they’re working with data in flat, tabular types of formats that make great visual sense to a human being when they’re looking at them, but aren’t all that machine-processable without embedding some of this knowledge into it.
So I think we’re looking at, how do we give people a bottoms-up tool, where people are the subject matter experts in their data, and they’re working with it and they’re actually interacting with that data and adding knowledge to it as they work with it—how can we capture that knowledge in much the same way, and the same kind of technologies as something like Wolfram Alpha?
So, on one extreme, you have Wolfram, which is a system in which they are ingesting everything, and they are normalizing it, and it’s closed in the sense that I can’t just upload my data set to it. And on the other extreme, you have Tim Berners-Lee who says, “Hey, everybody should do it this way.” And you’re on that continuum saying, “Hey, we’re going to make it so that other people can add their data or reference their own data to these other data sets that other people have referenced as well.”
Yeah, absolutely. I think that’s a great way to look at that spectrum, and data.world as being a tool to help further the Tim Berners-Lee end of the spectrum. It’s a tool for anyone to be able to do this without having to become an expert. You know, a great analogy I like here is in the early days of the web, if you wanted to put up a webpage you had to learn HTML, you probably had to know something about HTTP, and how to set up a web server, then along came things like Blogger, and now the popular choice would be Medium.
Now, if you want to go post content on the web, you need to know how to write English and understand how to use a really simple user-friendly site like Medium. I think that’s where I’d like to see us sitting, a little bit more of a technical platform than Medium, for sure, but something where people can be publishing semantically-linked data, without having to really have a deep concept of what that means or what the technology that underlies it is.
And then what does the business look like of that? Who’s charged for what and who gets what for free?
Yeah. You know, we are currently in what we would call our preview release—although we’ve been in our preview release for about a year—and really what that means is all use of the platform is free for everyone right now, for any purpose. The thing that I can say about that is that we definitely want the majority of people who use data.world to always be able to use it for free.
And the things we want to be able to support are, if you’re publishing data and making it open and available for everyone to use, and if you’re using our platform to find other people’s open data, collaborate with them on things that are open and made available for anyone; we want you to be able to do that for free. We will monetize on folks who want to use our platform to use those same tools on private data, on data they keep to themselves, to have that data in proximity to a lot of open data so they can use that for analysis.
Two questions along those lines: One, what do you do about data quality? Because if I had just a few zip codes wrong, and somebody else has a few wrong, won’t errors in the data multiply on each other because you’re linking based on their flaws and my flaws? And then this third data set, it has flaws, so are you magnifying errors throughout it?
No, not necessarily. If you think about people posting erroneous, flawed, perhaps intentionally-misleading content on the web—there’s wrong data about everything on the web right now—the power you have is the power to decide which things you want to read.
Anyone can say anything about any topic on the web, and the same, kind of, fundamental precepts of Semantic Web is that any statement can be made by anyone about any topic. You get to know the authority of, the identity of who you’re listening to, and what you want to pay attention to. So while the burden of what data do you want to trust still rests with you, what we want to do is provide you a place to put that data out there, and to have some of that social signal around who else is using that data and for what purposes. And if the provenance of this data is no good, be able to communicate that back to other users of the system.
Is the data in the system static? Or is the data manifested as feeds? So, for example, what if the data is the price of IBM stock which changes by the second? Does it handle dynamic data sets as well?
I guess the short answer is, right now, we don’t do a lot of anything that would be approximating near-real-time type feeds.
But, I mean, every data point is just a piece of data that’s true in that moment, even census data.
Exactly. Well, I think it’s interesting you brought this up. That’s true, but the census data stats would be, well this would be the 2015 census, right? It’s sampled at a moment in time.
But it has to have a time next to it. Every data element has to have a time associated with when it was true.
That’s right. There’s a temporal component, some data sets are just organized that way. Census is basically an annual sampling, so it’s very explicit. Most, kind of, reference data sets exist like that—they are sampled at a point in time, and that point-in-time sampling is encoded. The way we think about continuously-streaming data—the way that big data is generally handled for all types of streaming data, this is consistent with how we’ve kind of laid out our system—is, ultimately, everything is kind of file or batch-based when you look at the over-time accumulation of data.
So you’re accumulating a continuous stream of data, but you’re going to sample chunks of that data and write that out as batches or files. The big data processing frameworks out there—Hadoop, Spark and all the large parallel processing—what they’re really good at is taking lots of these files, distributed over large computer clusters, and computing things on them very quickly. And as real-time data is coming in, you’re continuously taking batches of that data and adding it to that for batch processing.
And then if you want to get true near-real-time things, the current state of the art is what’s called lambda architecture, where you’re basically doing large batch operations on historical data and then you have a layer that just really keeps the last buffer of time of that streaming data, and you query both of those systems and you merge them together. And that’s very much consistent with how we think about evolving to handle more and more stream-like data.
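The lambda architecture described here can be reduced to a toy: a batch view computed from historical data, a small buffer of recent events, and a query that merges the two. The sketch below is a deliberate simplification under assumed names (`LambdaCounter`, `ingest`, `run_batch`, `query` are all hypothetical); real systems recompute batch views from an immutable master data set on a cluster rather than folding a list into a counter.

```python
from collections import Counter


class LambdaCounter:
    """Toy event counter with a batch layer and a speed layer."""

    def __init__(self):
        self.batch_view = Counter()  # results of the last batch computation
        self.recent = []             # events that arrived since the last batch

    def ingest(self, key):
        """New events land in the speed layer first."""
        self.recent.append(key)

    def run_batch(self):
        """Periodic batch job: fold the buffered events into the batch view."""
        self.batch_view.update(self.recent)
        self.recent = []

    def query(self, key):
        """Serve a query by merging the batch view with the recent buffer."""
        return self.batch_view[key] + self.recent.count(key)


if __name__ == "__main__":
    counter = LambdaCounter()
    counter.ingest("IBM")
    counter.ingest("IBM")
    counter.run_batch()    # two events are now in the batch view
    counter.ingest("IBM")  # one more sits in the speed layer
    print(counter.query("IBM"))
```

The query stays correct between batch runs because it always consults both layers, which is the merge step mentioned in the answer.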
But most of the data sets on data.world, I think, are of the type where they are already sampled at a point in time. They’re either large reference data sets like census, or large derivative reports that are produced at a period of time, or there are things that are streaming that we’re putting in the daily sample, the weekly sample, the monthly sample of that data.
So bring all of that home for me, last question—second-to-last question: There’s a lot of people listening who, today, should go use your service now. Give me a use case, “If you’re a ______ with a ______ and need the ______, you should use data.world.”
Sure. If you’re a person with a problem that can be solved with data, you should use data.world. We don’t want to be everything to everyone. We do want to be kind of the collaboration tool for everyone working with data, so we think about a lot of different personas.
If you’re a data scientist who wants to pull data into a Python environment, we have a good tool for that. If you are a lot less technical than that, and you just want to look at interesting data and facts, and see some analyses and conclusions that other people have built from data, you can do that. If you’re a business analyst who’s accustomed to working with tools like Excel, you’d be very comfortable using our tool.
And we think that teams of people are really effectively using data.world—academic research teams, teams inside of companies that are doing data science and business analysis, and a lot of civic hacking groups, like Data for Democracy, are bringing people together to work with public data and use it to solve social problems.
Alright. And so the last question is how do people keep up with you, personally, if you maintain an online presence and Twitter and all that, and how do they keep up with data.world?
So you can track me on Twitter. I’m just Bryon Jacob, although my name is spelled funny. It’s B-R-Y-O-N J-A-C-O-B. I get Byron a lot.
And I get Bryon a lot, too.
Data.world on Twitter is @datadotworld with the dot spelled out. And the website is just data.world—it’s a .world top-level domain.
Everything on the platform is currently free and available to everyone and we want to keep that the case for the majority of users forever. So, go sign up and check it out. I think you’ll find something you’re interested in. If you’re listening to this podcast, there’s almost certainly something on data.world that you would find of interest.
Wonderful. This has been fantastic. We ran over. That’s always a sign of a great conversation. And I want to thank you for taking the time. It’s been a delight.
Thank you, Byron.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.