In this episode, Byron and Nir Bar-Lev discuss narrow and general AI and the means by which we build them out and train them.
Nir has almost three decades of technology and business leadership experience. Prior to Allegro AI, Nir spent a decade at Google in senior leadership roles. He started his career at Google as the founding product lead for Google's voice recognition platform, which today powers Google Assistant. His most recent leadership roles at Google were serving as the product lead for all of Google's Search Advertising in EMEA -- a business representing at the time over $14B in annual revenues -- and culminating as the General Manager of BebaPay, a mobile payments solution. Nir holds an MBA from the University of Pennsylvania's Wharton School, is an experienced engineer, and is an alumnus of an elite IDF technology unit.
Byron Reese: This is Voices in AI brought to you by GigaOm and I'm Byron Reese. Today I'm excited my guest is Nir Bar-Lev. He is the CEO and the co-founder of allegro.ai. He holds a degree in law and economics from the University of Haifa. He holds a Bachelor of Science in software engineering. He holds an MBA from Wharton and probably a whole lot more. Welcome to the show, Nir.
Nir Bar-Lev: Hi Byron, thank you so much, I'm honored to be on the show.
So I'd like to start off with a signposting kind of question about the nature of intelligence: when we talk about AI, do you think we're really building something that is truly intelligent, or are we building something that can mimic intelligence but can never actually be smart?
So I think that, you know, when we talk about AI, there's always futuristic talk about 'General Intelligence,' which is attempting to really mimic human intelligence. But apart from academia and maybe a handful of places in industry, when we talk about AI in general we're actually talking about the ability to solve specific problems, and really to marry a couple of things, right? We're mimicking the ability to learn a specific problem and how to solve it, and we're marrying that with some of the things that computers have always done better than humans -- being able to manage and manipulate huge amounts of data really, really quickly, and do calculations really, really quickly.
And when you think of it like that, where do you think we are? Do you think we have key insights that are going to serve us going forward -- that we know fundamental truths -- or are we still groping around in the dark, such that even the techniques we use now may seem antiquated and outdated in a few years?
Yeah, it's a good question, and first I have to say that I feel less equipped to answer that than some of the professors in universities. I'm coming at it from a very industry-specific, very practical viewpoint, and from where I'm sitting, we are just at the beginning of a revolution around being able to solve very specific problems with AI much, much better. And that is going to open a huge opportunity for us.
At the same time, we're very, very far away from general intelligence, so I don't think this is necessarily going to get us there. The practices we're using today in industry, and the development happening there, seem to some extent incremental, in the sense that we're using deep learning as really the forefront of AI, but there isn't anything revolutionary coming in the next, I would say, five to ten years. Those revolutionary things are going to come from academia. What we're going to see is incremental development in the science, but revolutionary developments in the applicability of the science that already exists.
So your company is specifically trying to solve one problem relating to computer vision. Describe what you're trying to do and why it's so hard.
Absolutely. This goes to the heart of the applicability of what we're doing. So let's start with maybe some context on AI and deep learning, and why it is "intelligent." When traditional software engineers try to solve a problem, they are basically tasked with building out a workflow -- this idea of 'if this, then that' -- for the software they want to design, where they need to foresee in advance all the different situations the software is going to encounter and what to do in each one in order to reach a certain goal.
And then once they design that, the task is really to simply translate that into code -- something a machine or a computer can understand. Whereas with AI, and deep learning specifically, the idea is that we have an algorithm called a neural network, which is a very, very simplistic attempt to mimic how the brain works. Literally, it's a network of neurons, or nodes, that through a process of training -- a.k.a. "learning" -- builds this flow all on its own, and then by definition, it's already translated into something the machine can understand.
Let me explain that a bit further with an example from computer vision. In traditional computer vision, if we wanted to identify, say, a person or a face, scientists and engineers were required to figure out what differentiates humans from anything else, or what differentiates one face from another -- come up with those parameters and then turn them into mathematical formulas or vectors, in code, that can identify them. So for example a human -- and this is obviously very simplistic -- has two legs and two arms and protruding features; maybe the texture is this way or that; obviously the color; and there is always something that looks different, maybe the head, maybe the hair. They literally had to figure those things out to come up with some sort of mathematical identifier of a human.
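The hand-engineered approach Nir describes can be caricatured in a few lines of code: the programmer, not the data, decides which measurements distinguish the classes. This is a deliberately toy sketch -- the features and thresholds below are invented for illustration, not anything a production vision system uses:

```python
# Traditional computer vision, caricatured: a human picks the
# distinguishing measurements and hard-codes thresholds for them.

def classify_silhouette(height, width, limb_count):
    """Toy hand-crafted rules for telling a standing person from a car.

    The features (aspect ratio, count of protruding limbs) and the
    thresholds are hypothetical -- real classical CV relied on edge
    detectors, HOG descriptors, and similar engineered features.
    """
    aspect_ratio = height / width
    if aspect_ratio > 2.0 and limb_count >= 4:   # tall, with arms/legs
        return "person"
    if aspect_ratio < 1.0 and limb_count == 0:   # wide, no limbs
        return "car"
    return "unknown"

print(classify_silhouette(180, 50, 4))   # person
print(classify_silhouette(150, 400, 0))  # car
```

The brittleness is the point: every situation the rules don't anticipate falls through to "unknown," which is exactly the limitation deep learning sidesteps.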
What deep learning will do is look at an image, for example, and try to figure out those things itself, by looking at all the different possible ways to identify a human -- all the different traits of that image -- and come up with something similar. What's interesting is that we as humans have a notoriously difficult time really describing what physically differentiates one thing from another, whereas a computer isn't challenged by that. It can look at an image and find multiple mathematical identifiers -- call them ID prints -- that we may not even notice as humans. And in that way it can actually come up with a result that's much better.
So it's going to look at all the different ways that an object is different from something else and codify that, through a training process, which we can dive into more deeply. That's the way it works. As you can see, this is a very, very different paradigm from traditional software, and there are really two consequences. One is that you need different people to manage this training process -- a process where they feed the neural network information so it can figure things out on its own -- and that experimentation takes time. It's actually much closer to a scientific process than an engineering process: the work these people do is closer to what a chemist or a physicist does in a lab than to what an engineer does.
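The training loop Nir describes -- feed examples, measure error, adjust -- can be sketched with a single artificial neuron. This is a minimal illustrative sketch (nothing from Allegro's platform): a logistic unit learns its own weights from labeled points, rather than having the rule hand-coded as in the traditional approach.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled data: the rule (label 1 iff x + y > 1) is never written
# into the model -- the neuron has to discover it from examples.
points = [(random.random(), random.random()) for _ in range(200)]
data = [(x, y, 1 if x + y > 1 else 0) for x, y in points]

# One artificial neuron: two weights and a bias, adjusted by
# stochastic gradient descent on the logistic loss.
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(1000):
    for x, y, label in data:
        p = sigmoid(w1 * x + w2 * y + b)
        err = p - label              # gradient of the logistic loss w.r.t. z
        w1 -= lr * err * x
        w2 -= lr * err * y
        b  -= lr * err

correct = sum(1 for x, y, label in data
              if (sigmoid(w1 * x + w2 * y + b) > 0.5) == (label == 1))
print("training accuracy:", correct / len(data))
```

Real networks stack thousands of these units, but the experimental character he mentions is visible even here: the practitioner chooses the learning rate, the number of passes, and the data, then observes what the model converged to.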
And once you figure out what you want to do -- once you decide to adopt this new paradigm because it gives you a better result; say you want to build an autonomous car, and you figure out that deep learning is much better at identifying the different objects on the road that you need to feed into the system for the car to navigate -- you're going to have to adopt this paradigm. And as with any process or paradigm, when you want to turn it into a product, you need to put engineering back in place, because a product is something that's engineered.
You want to scale up the process. You want to be able to repeat it; you want to be able to target a certain quality level in advance, to make sure that every unit that is built is at the same level. You want to do it faster and more affordably. So you need a production line, or infrastructure, or a toolchain -- it's called different things in different areas, but it's all the same -- and that production line doesn't really exist commercially today. A great example is what Henry Ford did about 110 years ago with the Model T. Henry Ford did not invent a new car; Henry Ford invented the production line. Before the Model T -- before the production line, really -- only people like Rockefeller could buy a car, and after it anyone could, because he could produce cars en masse, at a consistent quality level and a much lower unit cost.
That's the problem we're trying to solve: give companies this production line, and also give these people with a very different skillset, the research scientists, tools that they don't have. If you have a master craftsman, a master carpenter, that carpenter is still not going to be able to saw a straight line without an electric saw. Similarly here: you may have master research scientists, but if they don't have the tools, they're not going to be able to develop the highest-quality products.
I gotcha. So I understand what you're saying: there's a lot of infrastructure we need in place in order to be able to scale these projects. But I'm curious about vision in general. When I look at a pantry full of food, I can see there's a can of beans and a cake mix, and what I'm doing is complex in the extreme, right? I'm using subtle cues of shading to see, 'oh, that's a circular can or a square box…' I'm using a lifetime of experience that cans and boxes are in pantries; I'm using a lifetime of experience that 'oh, that label is a well-known brand.' I can spot things like 'oh, there's a swollen can that's probably gone bad; there's a ripped box; there's a box that's fallen over; there's a box on its side; oh, somebody accidentally put the dog's toy in here.' All of that I do in a fraction of a second. How do we teach computers to do all of that?
That's exactly the point. Going back to what we said at the very beginning: the neural networks at the heart of this process are very, very simplistic at the end of the day. And it's exactly what you said: you're using a lifetime of experience, so you can deduce things really quickly -- in a second, even for things you may have never seen before -- to come to conclusions, and you can do it from a very small amount of data.
Take your pantry, or the example of identifying cats and dogs. If you're a one or two year old child seeing a cat for the first time, and you're told that's a cat -- that's it. You don't need any more cats to be able to identify cats. But because neural networks today are so early on, we need to mimic this lifetime of experience, and we also need to do much more than show the network a single image of a cat, or one image of what's in a pantry. We have to show it a lot of images, and in multiple ways: things that are partially occluded, or seen from different angles, or in different lighting conditions. It has to learn to automatically remove the effects that lighting conditions create on an object -- to separate the visual characteristics coming from the lighting from the actual physical characteristics of the object.
To do that for a neural network, we have to expose it to lots of images in different conditions, to teach it to make that separation. So what happens is we're left with a process that's all about collecting enough data that is representative of the physical world, and also identifying the edge cases. Edge cases are situations where the object you're trying to teach isn't exposed to the sensor in a simple way -- great lighting, right in front of you. Instead it may be in bad lighting; it may be at a very bad angle, partially occluded. The system has to be able to identify it anyway.
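One standard way to manufacture that variety of conditions is data augmentation: each collected image is perturbed -- brightness shifted to fake different lighting, a patch blacked out to fake occlusion -- while the label stays the same. A toy sketch on a tiny grayscale "image" represented as a grid of pixel values; the specific transforms and parameters here are illustrative, not anything from Allegro's toolchain:

```python
import random

random.seed(42)

def shift_brightness(img, delta):
    """Simulate a different lighting condition; clamp pixels to [0, 255]."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def occlude(img, top, left, size):
    """Black out a square patch to simulate partial occlusion."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(img))):
        for c in range(left, min(left + size, len(img[0]))):
            out[r][c] = 0
    return out

def augment(img, label, n=4):
    """Produce n perturbed copies of one photo; the label never changes."""
    samples = [(img, label)]
    for _ in range(n):
        variant = shift_brightness(img, random.randint(-60, 60))
        variant = occlude(variant, random.randrange(len(img)),
                          random.randrange(len(img[0])), 2)
        samples.append((variant, label))
    return samples

cat = [[120, 130, 125], [118, 140, 122], [119, 121, 128]]
augmented = augment(cat, "cat")
print(len(augmented))  # 5 training samples from one photo
```

Real pipelines apply the same idea with rotations, crops, color jitter and so on, so the network sees the object under conditions the camera never captured.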
That's really the work: those are the edge cases, and then there are the biases. Biases are another angle we need to look at, because again, think of the neural network as something extremely simplistic compared to a human. If you're trying to identify things in a pantry, there may be an object that's very rare there. Say it's a pantry for all kinds of things that have to do with food, and suddenly there's a detergent in there. Humans have seen detergents before, so they know it doesn't belong: 'I identify that as a detergent.'

But if all I did was expose a neural network to lots of images of pantries that never had detergents in them, that system is never going to be able to identify a detergent. Or, put another way, if out of millions of pantry pictures we use to teach it, only a single picture has a detergent in it, because it's so rare -- well, neural networks have a trait of tending to forget, just like humans do. It won't be exposed to enough images with detergents. So the job of the research scientist is to set up a dataset that to some extent reflects reality, but also addresses these edge cases and biases -- for example, by actually giving the network slightly more images of detergents than the original data contains, so that it can identify these rare situations.
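The detergent problem is class imbalance, and the remedy Nir sketches -- giving the network slightly more examples of the rare class than the raw data contains -- is oversampling. A minimal sketch, with hypothetical class names and counts (real workflows also weight the loss or synthesize new examples rather than just duplicating):

```python
import math
import random

random.seed(7)

def oversample(dataset, min_fraction=0.1):
    """Duplicate examples of rare classes until each class makes up at
    least min_fraction of the (original-sized) training set.

    dataset: list of (example, label) pairs.
    """
    by_label = {}
    for example, label in dataset:
        by_label.setdefault(label, []).append((example, label))
    target = math.ceil(min_fraction * len(dataset))
    balanced = list(dataset)
    for label, items in by_label.items():
        shortfall = target - len(items)
        if shortfall > 0:
            # Draw rare-class examples with replacement to fill the gap.
            balanced.extend(random.choices(items, k=shortfall))
    random.shuffle(balanced)
    return balanced

# 999 pantry shots of cans, and a single shot containing a detergent.
pantry = [(f"img_{i}", "can") for i in range(999)] + [("img_999", "detergent")]
balanced = oversample(pantry, min_fraction=0.1)
print(sum(1 for _, lbl in balanced if lbl == "detergent"))  # 100
```

With the rare class now at roughly ten percent of the set, the network sees detergents often enough during training that it doesn't simply forget them.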
That's where ninety-something percent of the research scientists' work goes: once they've built a model, that's where they spend their time. And it's an area that's severely lacking today in terms of tools that can support the process.
And so where are you in the lifecycle of your company? Your company is about three years old and so where are you in development?
So this is a very, very tech-heavy problem that we're trying to solve. We've spent quite a while building the toolchain -- the production line -- and we've gone to market. We've been engaging with customers in alphas and betas for a year and a half now, and at the end of 2018 we started actual sales. So we'd say we're in the early stages of commercialization.
You know, it's interesting, because computer listening has all the same problems. It has to take a voice and, if a car drives by behind it, remove that car. It has to deal with letters that sound very similar, with words that are homophones. Would you say that whatever technology you're building -- that assembly line -- would be applicable to other problems in AI?
Yes, absolutely. Deep learning itself shines in situations where we need to identify objects or situations in data that is unstructured in nature -- images, audio files, natural language, where it's very difficult to codify the rules explicitly. Those are really its areas: computer vision, NLP, and voice recognition. Our platform is actually very agnostic. It will help any company that's addressing what we call perceptive data -- any sensor data that's used to identify the physical world. So beyond computer vision, think of audio, or of sensors used in manufacturing and in multiple other areas, with multiple benefits.
So if you take a vertical area of vision, like recognizing the differences in faces -- and there's a lot of good that can come out of that, I mean a lot of bad too. You can verify people's identity so they can do banking online with their device. You can find criminals walking around... you know, there are all these things you can do in theory.
How far along is that single problem of identifying faces? You only have, you know, seven billion people on the planet -- and granted, sometimes they have beards and then they shave the beards, then they're wearing sunglasses, then they have a tan and they don't have a tan, then they have a band-aid on their face, and all of that. But where are we with that one problem, do you think?
From a scientific perspective, it's a solved problem -- already solved. From a practical perspective, can a single piece of software right now identify every person? We're not quite there yet. But when you think of who's the most advanced at this, it's actually China. And why is that?
Because they have half a billion people that are unbanked, who need to be able to conduct transactions and verify their identity on their phones, right?
Well, that's one side of the equation. The other side is: remember, to build a detector -- a model for a specific problem -- you need to expose it to a lot of data. And in China, some of the leading AI companies have gotten access -- and the government, as you know, publicizes this -- to literally billions of images of people from millions of cameras across China, and used that to build a model.
And so when you have so many images -- again, with enough time and skill -- you can build a model, and in a way that's what they've done: they've solved the problem. Those detectors will not work well on Western or Caucasian faces, or other types of faces, because they don't have training data for those. And that's why in the West -- the U.S. and other places -- we're behind: our culture treats privacy differently. But scientifically it's a done problem; it's solved.
That's a big statement. Explain why it's solved -- because it's simply a matter of what?
The human face has enough differentiating, unique characteristics, and we have computing power that's literally strong enough to pick up on those today -- given the quality of video we can now get relatively cheaply, HD, 4K, etcetera -- that we can build a model that can uniquely identify one person from another. The only thing we need is a training set to train it so it can do the job.
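What "solved" means in practice: modern face recognition maps each face to an embedding vector, and two photos are declared the same person when their vectors are close enough. A sketch of just the comparison step, with made-up four-dimensional embeddings (real embeddings are typically 128+ dimensions produced by a trained network, and the threshold is tuned on validation data):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_person(emb1, emb2, threshold=0.9):
    """Verify identity by embedding similarity; 0.9 is a made-up cutoff."""
    return cosine_similarity(emb1, emb2) >= threshold

# Hypothetical embeddings: two photos of the same person (beard vs. no
# beard should barely move the vector) and one of someone else.
alice_1 = [0.9, 0.1, 0.3, 0.5]
alice_2 = [0.88, 0.12, 0.28, 0.52]
bob     = [0.1, 0.9, 0.6, 0.2]

print(same_person(alice_1, alice_2))  # True
print(same_person(alice_1, bob))      # False
```

This is why beards, sunglasses, and lighting matter less than one might expect: the network is trained so those variations barely move the embedding, while different identities land far apart.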
And again, if you think of China, that's also a done thing -- they have big enough datasets. If you look at what they're doing there, they already have applications deployed: police walking around with glasses, like Google Glass, that have computer vision in them. A database of criminals is fed in, and in real time they can identify those people in crowds of thousands and tens of thousands, and they've done that successfully. If we in the U.S. had access to such a huge, diverse dataset of people, we could do the same thing.
So, I'm an optimist about technology, I really am -- anybody who reads my writing would know that. But it does strike me that we've only ever had privacy at all because there are just so many people. No government can listen to all the phone conversations; it's just impossible. No government can follow everybody everywhere; it's just impossible. But with these technologies, both of those things become quite possible.
Do you worry that these technologies are going to not only be misused, but misused at scale and that they're going to be used by totalitarian regimes to lock in their control and silence dissidents and all of the rest? Or do you think we'll figure out a way out of that knotty problem?
I absolutely worry about that. I think we're probably at an age where privacy as we know it doesn't exist anymore, and that manifests itself because of these technologies and others. Think of Facebook and the like, which don't necessarily have anything to do with AI: we now have enough data collected in accessible locations, and when you marry that with the ability to do big-data analysis, and then AI, privacy really doesn't exist anymore. I think we're beyond that to some extent, and we have to realize it.
And I think we need to come to terms with that and figure out what we do, and that's probably beyond what I, as a technology company, would do. This needs to be a wider conversation that happens in society, because if you give this to companies that are motivated only by making more money for their shareholders, you may come out the other end with very bad results. If you just give it to the government, you may also come out with very bad results (i.e., the NSA, etc.). You know, you talked about there simply being 'too many' humans -- I'm not sure about that. I think we're already there. Every single phone call, and everything that happens on the Internet, is already being monitored.
Well, it's a sobering place to leave this conversation. So Nir, where can people keep up with what you're doing and with what your company... the URL is allegro.ai, but other than that, how can people keep up with your fascinating work?
I think that certainly you can go to the website. We also actively blog and write on social media and on LinkedIn, so you can go to our company profile on LinkedIn and follow us. We're also starting to appear at trade shows and industry conferences -- that's probably more at the industry level. Just this month we'll be at the computer vision conference in Santa Monica at the Intel partner booth, and next month at the AI Summit in London with NetApp. So those are other places where you can keep up with us, and it's all on the website and on our LinkedIn profile.
All right, well thank you so much. It's been a fascinating chat and I wish you all the best.
Thank you very much, it was a pleasure.