Byron speaks with Cindi Howson of ThoughtSpot about the direction of explainable AI and where we are going as an industry.
Cindi is an analytics and BI expert with more than 20 years' experience and a flair for bridging business needs with technology. Cindi was previously a Gartner Vice President in data and analytics, lead author of the Analytics and BI Magic Quadrant and the data and analytics maturity model, as well as research in data and AI for good, NLP/BI search, and augmented analytics. She introduced the BI bake-offs and innovation panels at Gartner events globally and is a popular keynote speaker. Prior to this, she was founder of BI Scorecard, a resource for in-depth product reviews based on exclusive hands-on testing, a contributor to InformationWeek, and the author of several books, including Successful Business Intelligence: Unlock the Value of BI & Big Data and SAP BusinessObjects BI 4.0: The Complete Reference. She served as a faculty member of The Data Warehousing Institute (TDWI) for more than a decade.
Prior to founding BI Scorecard, Howson was a manager at Deloitte & Touche and a BI standards leader for Dow Chemical. She has an MBA from Rice University and a BA from the University of Maryland.
Byron Reese: This is Voices in AI brought to you by GigaOm and I'm Byron Reese. Today my guest is Cindi Howson. She is the Chief Data Strategy Officer at ThoughtSpot. She holds a degree in English from the University of Maryland and an MBA from my alma mater, Rice University. Welcome to the show, Cindi.
Cindi Howson: Thank you, Byron. Yes Rice, where they had more trees than students.
That is one of the things that was in the original charter, that it'd always be that way and it makes a real difference on the campus, doesn't it?
It does. It's a beautiful campus.
And you know a lot of people aren't big into living in Houston. But the part where we lived had those beautiful oaks that hang over all the streets and it has all the museums and it's just a really, really lovely place to be. I'll only tell one small short story and then we'll get onto the topic at hand. But at Shepherd and Westheimer there's a Randall's grocery store. And it might be Shepherd and West Gray. And it's that grocery store that Boris Yeltsin went into and saw it and he said it was that moment he knew that the Soviet Union wasn't going to be successful, that their system had failed and that if people in the Soviet Union knew that this is where Americans get to shop, there would be riots the next day. And he pointed at that one grocery store in that one moment where it all became very clear to him. And I always think there should be a plaque there and there isn't. It's just a Randall's grocery store.
Interesting, well maybe you need to make that happen Byron.
It's a historical moment. It's probably a Kroger there now.
No, I drove by it recently to check because I was showing it to my son and I was like, hey that's where they realized communism failed. So in any case, thanks for being on the show. So you're the Chief Strategy Officer at ThoughtSpot. For those of the listeners who don't know, tell me about ThoughtSpot. What is the mission? What do you do?
Yeah. So ThoughtSpot is a BI [business intelligence] search and AI and analytics technology provider and we give users, any users, particularly business users, the ability to ask questions of their data using search and natural language query. And that is a very different paradigm than how most BI vendors have people work with their data, which is more power user oriented, I would say. The other thing where ThoughtSpot is very early to the market is on this next wave of disruption combining AI and analytics -- telling you what you didn't even know to look for in your data: the hidden patterns, as we call them.
So let's start with the first half of that equation. Taking BI down to something that doesn't require a power user. Give me kind of a real world example of that. Who would be somebody that normally wouldn't ever think of themselves as a BI consumer who now would be?
Yes. Keep in mind, Byron, I have worked in the BI and analytics space, it's almost embarrassing to say, but for more than 20 years now, and we have made some improvements in bringing data and BI to more people. But I do think it's still mostly power users.
So I'll give you an example of one retailer [and] imagine a college student even with no training, and their goal is just to sell telephones, smartphones or particular plans. So they can just type in the keywords and say “Show me my commissions this month versus last month” or “Show me how many people bought an unlimited data plan vs. a ‘pay as you go’ plan.” And it's as easy as a Google search or Google-like search. A number of the founders do come from Google, so they bring that IP to the product.
So you're saying it's a natural language, minimal training kind of approach to just ask what it is you want to know and it shall be answered?
It is and as you're very technical, and as your listeners are also very technical, I want to parse these words: ‘natural language’ because natural language processing (NLP) does have a precise meaning in the AI world, but search is also a particular type of technology and some vendors only take one approach.
ThoughtSpot uses both search, so if I find that keyword, I'm going to leverage it and I'm going to use algorithms to give you the best hit on those keywords, but I'm also going to use natural language. So if I wanted to see store sales this month versus last month, "versus" may not exist as a keyword, and those time periods don't exist, so that's where the natural language part comes in. But it is a combination of search and natural language. Did I get too picky there, Byron?
Not at all. It's a useful distinction actually. And then the other half of the equation was bridging the world of AI and analytics. Can you talk a little bit about that? Like what's the end goal there?
The end goal is really to create a platform that is human scale. So one of the founders keeps saying he has a mission. So it is now my mission, too, to create a more fact driven world. And right now facts are hard to get to. Data is hard to get to. It's almost easier for somebody just to do a search on the Internet to query their own internal corporate data. But if you're trying to analyze a particular problem, if it's a manufacturing quality problem, being able to ask those questions and get to those insights easily, quickly is what we're trying to deliver.
I started a company called Knowingly [and] the goal of it was to solve what I thought was the biggest problem on the web, which is: you don't know what to believe and what not to believe. And it's really hard to say something is objectively true, in large part because everything really depends on the definition of the words you're using, [like] natural language versus search, for instance. And so how do you kind of get around that?
How do you make sure that (and there may not be a simple answer to this) the question that is being posed, in whatever form, is answered? For instance, let me give you an example. So I had an Alexa and a Google Home on my desk, and I wrote this article about how they answer questions differently. Questions they should answer the same, like: "How long is a year in minutes?" and "Who designed the American flag?" And for both of those questions, they gave me different answers. And the reason with the first one was: one of them gave me a calendar year and one of them gave me 365.24 days (a solar year); and with the other one: one said Betsy Ross and one said Robert Heft, and Robert Heft was the guy who designed the 50-star configuration. So there was an inherent ambiguity in the question that made a [single] answer very hard. How do you solve a problem like that?
Yes, [that's] so fascinating because you already introduced some concepts that I do worry about: the state of the industry here and that is also what are the biases both in the data sets, but also in the people programming the algorithms. So the first thing is the question you posed was to Alexa and Google Home. What they did is they went out to the Internet and ran a search. What we're talking about here is structured data, and that is largely internal corporate data, also some of our clients are bringing in weather data, economic data, what have you. But we're talking about numbers -- not that World Wide Web of documents, for example. So there is already a little more trust in that data.
Now I do think though your question is still a valid one. If I ask a question [such as] “How is manufacturing quality or on time shipments out of this particular plant?” Can I trust that data? So where did the data come from? We expose that to any user, so some of our design principles are trust and transparency. You always have to get accurate, correct results because as soon as you have a query that generates wrong results, nobody is going to ever trust it again. It's harder to get people to come back.
So we do give users the option to look at what was the actual query that was generated. What does that look like? And then users have control, for example, when they're using one of the AI-generated insights. So if you say "Why are our on-time deliveries late? Why are we so much worse this quarter?" -- the platform will generate a series of insights for you automatically, running all these algorithms. Well you can ask, "Was it based on weather? Was it based on a new driver? Was there some outage?" Or things like that. And you can toggle and say some of these things are noise and take that out of the feature selection, for example. So you do have to start with: Do we have the data? What is the source of the data? But also exposing the details behind that I think is what gives people confidence in the answers. Does that answer your question, Byron?
I think so. I mean there's no easy answer to it. Unfortunately language is ambiguous and people when they're asking the question [might say] “I didn't even know if I was asking a solar year or calendar year.” In any case, so you mentioned something about pushing information to people before they even know what question to ask. That's kind of the Holy Grail isn't it?
It is. It absolutely is. So it's about (I like to use this analogy), it's almost a cliche, but we talk about finding the proverbial needle in the haystack. So at a minimum, I think organizations are starting to get to that very basic descriptive analysis: what's going on. We're getting a little better at the diagnostic: why. Why did something happen? But now the predictive: tell me what is really important. Give me that signal above the noise; give me that needle in the haystack, and I don't want just the needle, I want that gold needle, or I want that pitchfork that's going to really be a problem.
And so this is the Holy Grail. At Gartner we talked about the market in terms of these waves of disruption in BI and analytics, and this third wave of disruption that Gartner calls, as an umbrella term, 'augmented analytics.' We are very early in this, but ThoughtSpot is one of the first to market with these capabilities. And so I do think this is just the future of BI and analytics.
It does feel very much like early days, like all of a sudden we wake up with all this data and all this computing power -- and incidentally all these toolkits -- that we can now apply to this technology. But it still feels like... you're almost more shocked when it works, you know, like 'oh my gosh it did that.' We don't expect it all to work now. How long do you think it's going to take us to make huge strides in that needle in the haystack, [or] gold needle in a haystack, pitchfork world?
Yeah so this is where we have to [decide]: are we talking industry or are we talking about ThoughtSpot customers? But I mean some of our customers that have adopted this early... I look at a telecom customer -- they are enabling this to all their telco customers. So this is tens of thousands of customers. Or we have a banking customer, where it's helped them understand credit card promotions, taking an analysis that used to take two weeks manually, down to just two minutes. Or a travel company, that again is providing it to all their travelers.
So I think we're early days as an industry, but some of our customers are sharing the insights that they're finding with this that just were not possible before. I do think part of it is, it's almost that perfect storm of everything coming together. We have machine learning and AI getting to a certain point of maturity. We have computing costs -- the cost of memory -- coming down.
One of the other ingredients that is necessary to make this work is it has to be on all data. There have been past attempts to bring that combination of a search and NLP to different products, and they were met with mixed success usually because they were small data sets. The ThoughtSpot starting size of dataset is 250GB, and some of our clients are analyzing 70TB. And again all of that, there's engineering that goes into it, but it is the cost of computing that has steadily gone down over the last decade.
Okay, so it sounds like it's further along than I was describing and that's great. So you know the debate around explainability and just to frame it: some people believe that if an AI makes a decision about you, like whether to give you a loan or not, you're entitled to an explanation of why. Other people say “Look, these models are so complicated…” and there is not necessarily a why. If you ask Google “Why do I rank number three for some search and my competitor ranks number two?” they may very well say “We have no idea in fifty billion pages. We can't say why they're two and you’re three, that just can't be done.”
And there are those who say a high hurdle of explainability is an impediment to development. Where do you kind of weigh in on that?
Yes. So I think explainable AI is really important. And I do get worried that the US in particular is lagging a little bit here. I've seen some policies coming out of the EU that is making this a more pressing issue and the challenge, I see it from both sides. So look, as a technology provider [whether it’s] ThoughtSpot, Google, Amazon, nobody wants to give away their own intellectual property. At the same time, if we do not support explainable AI as an industry, we risk having bias and discrimination at scale.
So let's take one of the big headlines. We could choose facial recognition, which is a hot button right now, or even the COMPAS algorithm that is allowed to be used in the US for sentencing. Nobody really knows what goes into that model. So why was a white guy who had never committed even small crimes before sentenced to six years in jail for really stealing -- borrowing -- a car? That seems like a very aggressive sentence, but the COMPAS model said, you know, he's at risk of repeat offense.
I think what we need to get to is at least explaining what the inputs into that model were, and making sure people understand what that even means. So with facial recognition -- for a police force to be using it, if they set the match threshold at, hey, 85% is good... Well, I don't know, not before jail time certainly. Now if I'm using recognition on dolphin types in the ocean, 85% [is] fine. I'm happy with that.
So having a literate workforce that at least understands what these things mean, I think is very important. And getting to a certain level where we can reveal what went into the model, what are the variables you can control, without giving away intellectual property so that another competitor can reverse engineer it -- we have to do that as an industry. What do you think? Do you agree?
I think these are all open questions. I do question whether perhaps the industry talks more about explainability than some people care, because there are unexplainable algorithms that govern our lives and people don't seem to care that much about them. For instance, your credit score. You know, you're 723, I'm 642. Why? This is as opaque as opaque can be. And yet people are like, "642, that's what I am." And so it doesn't necessarily feel like there's a groundswell among the general public to say "I want explainability." So what would you say to that?
Well I think there hasn't been a groundswell because there has not been enough of a backlash. But I think as more organizations use these things and larger groups of people are punished... Take your credit score -- or sorry, yours is a good credit score. If you had a bad credit score, as an individual, you would have no power. Now if you start talking about funding... there was a case where a town lost particular funding in the school district based on a recommendation from an algorithm; that now affects many more people.
So there has to be an ability to be organized in order for anyone to effect change. If it's an individual person, that was the credit score. Good luck trying to fight that. You would need some consumers group to help fight that for you. And I do think people are starting to get more educated about that, [and] other things...[like] if you cancel all your credit cards or change jobs, your credit score will go down. So there has been some degree of education in the past few years, but I think some of these things, as we start to hear about the wrong person being jailed, or I mean I'm upset about [how] in China they're using facial recognition to target ethnic minorities or to profile students in schools who are looking bored. Well why is the student looking bored? Are they bored because the teacher's ineffective? Are they bored because they're actually at the wrong grade level or really are they just not a good student not paying attention? All of those are possibilities.
But if we abuse these algorithms, these AI models, I think people will get organized. So that time is coming. Some of the incidents in the last year -- how Facebook was used to influence political elections -- I think these times are coming and we're early days. But I would like to see technology providers stay ahead of the curve, because I don't think legislation is the answer. I think that can stifle innovation and creativity.
So to be clear, are you willing to say: "Yes, explainability may slow AI development, but that's a cost we just all have to agree is worth it"? Or are you saying, "No, that's a false choice -- explainability and AI development aren't at odds; they're independent factors"?
So I am saying that explainable AI matters and transparency and trust are core design principles at ThoughtSpot. So we will have explainable AI to the extent that I can influence that. As an industry, I want all technology providers to pursue that. My concern is that when we don't and it's not a level playing field, then we risk it being regulated and when it's the government dictating the rules about things that they may or may not understand, that is what stifles innovation.
So let's pull the lens out from BI in general. And I'm curious just when you think of AI and its impact on the world, are you in the end, optimistic about the future or do you think this is a technology that on balance is gonna be much better than the harm? Are you worried or is the jury still out? What do you think the future is going to hold with all of these developments?
Yeah so this is tough, Byron, because I am a glass half full person, and some might even say I have rose colored glasses. And that's partly true. There are dangers. There definitely are dangers. But some of my research in the last two years around data and 'AI for good,' I tell you, it restores my faith in humanity. Not that I ever lost it, I think there are some bad apples out there. But when I look at how AI and data is being used in novel ways, whether it's to meet the needs of the homeless... we just had Memorial Day. Oh my gosh, it breaks my heart that veterans account for a large portion of the homeless population across the U.S., it just kills me. So how do we address that? Can we solve the needs there?
Or when I look at the bad side of society -- technology has also accelerated human trafficking. People prey on [others over] the internet, but AI is also being used to reduce the risk of human trafficking. Then I think these are good things. And there is a lot that's happening in the medical community as well around better treatments, cures for cancer. I think we'll get there. I think there absolutely will be missteps and abuses along the way. So this is where I do look at ethics and diversity as having to come together. And I do think good ultimately wins over evil. So if we make data and AI for good just a core way of doing business, I'm optimistic.
Well that's a good place to leave it: on an optimistic note. I want to thank you for a fascinating half hour. Can you tell people how they can keep up with either you or ThoughtSpot or both?
Yes. So I am on Twitter. My legacy Twitter handle (I can never change this from my old company) @BIScoreCard or on LinkedIn or go to www.ThoughtSpot.com. You'll see me blogging at least once a month, and we have our big user conference in the fall in October in Dallas, and I'm at various events around the world.
We'll all keep an eye on what you're doing. Thank you so much. And I hope you'll come back on the show down the road.
Definitely. Thank you Byron.