FinTech, ML model explainability, and Google’s new ideas – a transcript of Episode 9
Hi and welcome to the Loka podcast. I'm your host Bobby Mukherjee.
On this episode, I talk to Chanchal Chatterjee who leads a team as a machine learning specialist at Google.
This is a great episode for anyone who wants a primer on how machine learning is being used,
particularly in the financial services industry. We cover a lot of different use cases for machine learning,
including fraud and risk. And for those who really want to take a deep dive into ML model explainability,
Google is clearly a thought leader in applying ML to problems, and this episode covers a lot of
Google's tools that help with that, including TensorFlow, and newer ideas like AutoML. Enjoy.
All right. So I'm here with my guest, and first things first, I'd like him to go ahead and introduce himself.
Hi, my name is Chanchal. I am a machine learning specialist at Google Cloud. My job is to come to customers,
translate their business problems into machine learning problems, and lead a team to execute on these machine
learning problems and create solutions.
Well, terrific. Chanchal, thanks so much for taking the time.
It's a real pleasure to be here.
The first thing I'd like to understand is, you know, there are so many different facets of computer science
that one could study. For you, when was your first introduction to this world of machine learning, and what
started to make you gravitate towards it?
Yeah, I did machine learning a long time ago, when machine learning was not the hottest thing to do.
The reason I did machine learning is that I was trying to solve the optical character recognition problem,
and I didn't know exactly how to solve the problem by using vision. Every character was slightly different.
I went to Purdue University, where I met my professor, who was an old-time friend. And I said, how can I solve
this problem where you have numerous characters coming through and they're constantly changing? And so we took
the problem up as an adaptive learning problem, and we found out that the solution wasn't very interesting.
So I said we should solve some more interesting problems, like trying to find more important features out
of characters or any other signals, like voice signals. And that's how I embarked on the journey of machine
learning. That was the beginning, looking for a more interesting problem to solve.
That's always a good bet.
One of the things that really fascinated me about you was how you had done a deep dive and become a
specialist in use cases within, among other things, the financial services vertical.
And I thought what we could spend the balance of our time doing is talking through as many of these
use cases as we have time for. You know, it's dealer's choice. So why don't we pick out the first one?
In financial services we have many, many machine learning problems, and at Google we see hundreds of these
problems. The top 100 banks are our customers. Categorizing all of these problems would be too hard,
but at a high level, financial fraud is a very big area of interest.
The next one is risk: how to mitigate risk and how to quantify risk as margins or interest rates.
There are other areas, like how to redact sensitive data so that no sensitive data lands in any public place,
especially the cloud. The other area is conversational agents. Google is big on conversational agents.
We have Google Home, which is very popular. We also have a product on the cloud called Dialogflow.
So that's another topic. Then Google has been doing ads and recommendations for 20 years. So we have
recommendation as another solution. Search would be another. And one of the big areas that is of great
interest to me is model explainability, which is very important from the consumer standpoint, the regulator
standpoint, the developer standpoint, and the management standpoint. So that's another area of great interest.
Then forecasting, user behavior, anomalies, smart ticketing.
You name it. There are hundreds of applications.
So all of that's really fascinating. So if you were a beginner trying to understand where to start
in financial services, specifically in applying machine learning in that industry, which
one of these use cases would you pick to walk through as an example of how to apply it?
It all depends. You start from the business first. Check which problems are the most important
from the business standpoint. We don't start from the technology. Then we convert those into technology
problems. So for example, from the business side, fraud is more of a preventive mechanism.
That is where you prevent loss.
Whereas risk, or how to charge more margin or interest, is more of a revenue-generating solution.
So, I guess from a very layman's point of view, revenue generation will be very important from the bottom
line or the top line perspective for any business entity. However, compliance is very important
in financial services because you can incur terrible fines.
So financial fraud and similar problems, and model explanation, or regulator-based model explanation, will be
very important from a loss prevention standpoint.
So those would be the two top categories in mind: things that are helping you on the revenue-increasing side,
and then the things that are helping you limit your losses on the expense side of the equation.
Let's take fraud, for example. So at a high level, generically, because every bank is going to be set
up differently, but if we just take a generic use case, what are the inputs that are being
fed into a machine learning construct? And then how does that play out?
So there are multiple types of fraud.
There's business banking fraud, which is very transaction based. Then you have credit card fraud,
which is also very transaction based. So online e-business and credit card are the two top
categories of fraud. The other is money laundering and anti-money laundering. Of course, that's the other
category, which is very important.
I put fraud and anti-money laundering as two separate categories. And within fraud there is banking fraud,
credit card fraud, check fraud, ATM image fraud, and so on and so forth. So let's look at transactional
fraud, which is banking and credit card. These are actually very complex problems to solve,
because you have real-time transactions. Everything from the time a credit card is swiped to the time the fraud
is detected has to be done in far less than a second. And there are a lot of bottlenecks in the system.
There could be a transaction server that is very slow. That would be one. So the whole machine learning
problem becomes a very short latency problem to solve, like 25 milliseconds.
So those are some of the challenges.
Presumably the ML building blocks that are present today were not available maybe five or six years ago.
So, on the ML building blocks, what is great about cloud-based machine learning solutions is that
we have the entire ML infrastructure, what we call the ML platform.
It is very hard and complex to build ML platforms. I have done that many times in my career, and you
only build bits and pieces. In order to build an ML solution, you have to build the entire ecosystem around
the machine learning code. It has been famously published by some Googlers that machine learning code is
only 5% or 10% of the entire problem.
The other issues that you have to worry about in machine learning are how to ingest the data, both real-time
and stored data, how to pre-process the data, how to extract the features, and how to do continuous training.
How do you orchestrate the training? Because data is what we call non-stationary, which means that
your consumer base today and tomorrow will be very different, and so will the transactions today and tomorrow.
So the changes are very time sensitive. You cannot build a model and use it forever. You have to constantly
retrain, sometimes hourly or daily, and therefore you need an infrastructure that does this whole training
orchestration.
Then you have to store these models, which needs a model repository. It has to be properly versioned. It has to remit attack.
Then you also have to have a proper serving infrastructure so that you can serve the right model. A credit card
transaction that is a banking purchase, or one made with a mobile or on a website, might have different models,
and therefore you need a proper, intelligent serving infrastructure and monitoring.
So there are lots of pieces. And it's very difficult to build all these pieces, which has already been done by Google.
Yeah. So there's no need to reinvent the wheel and make these yourselves.
Absolutely.
So it just dawned on me that you said something very interesting when you were talking about, just generically,
the example of credit card fraud.
Credit cards have been around for decades and decades, and databases that store credit card transactions
have also been around, I'd say, for as long as Oracle, the company. What's an example of either
a hardware-based or software-based ML building block that came onto the market, you know, from Google,
that allowed one to take this transactional data that has been on these older systems and start to
do recommendations about fraud in the sub-second category, which wasn't possible, I don't know, five or 10 years ago?
Yeah. So there are three things. The first is the availability of large volumes of data from multiple sources.
Data was very fragmented before. Now we have the ability to consolidate data from similar types of sources.
There are basically three types of data. One is real-time data, which is streaming data, and that can be the
transactions, the click stream, the app and web logs.
So there is lots of streaming data. You can mix that with social media data, for example. Because
I could be saying that I'm on vacation, but I'm posting on Twitter that I'm having fun in the hills.
So there's social media data, which can also reveal a lot about your behavior. Then there is
the second category of data, which is batch data.
Batch data would be things like, in a banking context, accounts, marketing data, order management.
So those are batch data. Then you also have repository data, which is typically stored in CRM systems
and enterprise data warehouses. It is a lot easier today to have all this data in a single source.
For example, Google Cloud offers BigQuery, which is a data warehouse and an incredible place to store all
your warehouse data and batch data in one place. And it's a simple query to extract it. You can pull streaming
data with Cloud Pub/Sub or any other public sources. There are a lot of technical advances that have happened,
which have made data easy to access.
The second big innovation is algorithms. Before, it was very difficult
to take multiple types of data and build algorithms that would actually find insightful features. So people
used to have very well-trained PhDs go in and do feature engineering, which is to look at these huge
volumes of data and find which features are important.
Now the advent of neural networks has made it very easy to extract those features. So that's the second
innovation that has happened. And the third big innovation is compute. That is, we have so much
compute power with the GPUs, with the TPUs, and with the very powerful processors from Intel and AMD and all
these other companies that we can now process huge amounts of data in a short amount of time.
So these are the three pillars that have really propelled machine learning. When I did my PhD,
we used to build machine learning models with neural networks, but it was so hard to either build
or deploy them. Now it's so much easier.
Yeah. And I'm guessing back then you were doing Sun SPARC work.
Most of the highest-powered processors were 550 megahertz processors. So those were immensely underpowered.
But this is interesting, because a lot of these algorithms that are used today have been around
for decades, but you couldn't get the full power out of them.
Yes, but that's not entirely true, because algorithms like convolutional networks were discovered 20, 30 years
ago, but they were never effectively used for face recognition, for example.
So the use of the data matters, and some innovations have also happened in those algorithms,
which have really enhanced the ability to increase performance. I was looking at
some of the historical performance figures. Five years ago, doing text recognition, a human
being generally made a mistake about 3% of the time. Machines made a mistake 5 to 10% of the time.
Today a lot of the text recognition solutions that Google or many other companies produce can do 1%
or less, and they're trying to reduce it down to 0.1%.
So from a place where we were far worse than humans, now we are better than humans. It's not
entirely because of compute. A lot of algorithmic innovations have happened.
That's pretty exciting to see, that rate of innovation that's going on right now.
Let's talk about some of these building blocks, just to give my listeners more context,
because they come up a lot, and maybe you could give a layman's explanation of some of these things.
So let's take TensorFlow. Purely from a user standpoint,
how does that building block fit into something like fraud?
TensorFlow is a machine learning platform. It is a way by which you can write complex machine learning
code in a simple way. There is a construct on top of TensorFlow, Keras. Keras is also owned by Google.
It basically simplifies the complexity of TensorFlow so that you can write machine learning
code. Back when I did my PhD in 1996, in order to build a decision tree, we had to take the C4.5
decision tree library. You had to spend hours and hours figuring out what function
to call for each of these steps.
And it was very difficult. It took you days, if not weeks, to implement. Today with Keras, or with
scikit-learn, for example, which is another package, you can do a decision tree in one line. Now, what did
TensorFlow do? TensorFlow is immensely useful for complex deep learning, for building a neural network that
was so complex to build 10 or 20 years ago, even five years ago. Today with Keras, in five lines of
code, you can build a very complex deep learning network.
And you can train your machine learning models and you can run your machine learning models.
It will be a very short snippet of code.
Wow. It just allows you to accelerate your progress.
Absolutely. It's so easy to build, and therefore it is so easy to experiment. And TensorFlow within
Cloud Machine Learning Engine has many, many very exciting components.
One of the components I particularly want to speak about is called hyperparameter tuning.
So when you build a complex machine learning model, it has many, many parameters. Like how many
layers should the neural network have? How many nodes? What are the different activation
functions, which are different functions that change the output?
So you could come up with thousands, if not millions, of combinations of these parameters. How you change
them in this complex parameter space to find the optimum combination is very difficult, and that is what
hyperparameter tuning does. It's a very complex algorithm that Google invented and partly open-sourced,
and it's available in the cloud in order to do that.
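The idea of searching a parameter space for the best combination can be sketched as follows. This is a minimal sketch that assumes plain random search, not the tuning algorithm Google runs in the cloud. The search space, the `toy_score` objective, and every name here are hypothetical stand-ins; a real objective would train a model and return a validation metric.

```python
import random

# Hypothetical search space: each hyperparameter maps to its candidate values.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "nodes_per_layer": [16, 32, 64, 128],
    "activation": ["relu", "tanh", "sigmoid"],
}


def toy_score(params):
    """Stand-in objective; a real one would train a model and score it."""
    score = 0.0
    score -= abs(params["num_layers"] - 3)              # pretend 3 layers is best
    score -= abs(params["nodes_per_layer"] - 64) / 64   # pretend 64 nodes is best
    score += 0.5 if params["activation"] == "relu" else 0.0
    return score


def random_search(space, objective, num_trials, seed=0):
    """Sample random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(SEARCH_SPACE, toy_score, num_trials=50)
    print(params, score)
```

Random search is used here only because it fits in a few lines. Smarter tuners pick the next trial based on the results of earlier ones, which matters when every trial is an expensive training run.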
And in the spirit of making it easy to experiment, maybe this doesn't fit perfectly,
but tell us a little bit about what the promise of AutoML is.
So there are three ways of doing machine learning on Google Cloud.
One is you can build machine learning solutions from scratch.
We call it your data, your model. You, the customer, bring in your data.
You bring in a data scientist. We basically provide you the AI infrastructure
for you to build the models. The second one is AutoML. There you bring in your data.
We provide you the models. These are predefined models, not trained models. So you bring
in your data and you can train it with our models.
So we bring in all our experience in order to create these models for you. The third one is APIs.
APIs are where we have pre-trained models. You bring in your data, the models are already trained, and
you can use them as is. So that's the beauty of using machine learning on the cloud, because you can
now use any of these three different options in order to build your model, from very simple
to very complex.
That makes a lot of sense. And I think people are pretty excited about the possibilities of
what they can now do in a much shorter period of time, thanks to AutoML. One of the things you
touched on earlier was that there are many aspects of deploying a machine learning solution.
And one aspect of it is the data ingestion and processing stage. I mean, it's like table
stakes. You must do this. So talk to me a little bit about the challenges
in that and some of the building blocks or things that you've seen that have helped.
So in the financial services industry, a lot of the confidential and proprietary data resides on premises.
So bringing it to the cloud is always a challenge because there's a lot of personal data there.
So you need to de-identify that data and redact the sensitive content. We don't want any of that
on the cloud. So that is obviously a challenge. We have an API called the DLP API, which allows
you to basically redact data.
You can also use third-party solutions in order to redact sensitive content. That's the first challenge.
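Redaction before data leaves the premises can be illustrated with a few lines of pattern matching. This is a minimal sketch under loud assumptions: it uses plain regular expressions, not the DLP API or any third-party product, and the three patterns and their labels are hypothetical and far from complete.

```python
import re

# Hypothetical patterns for illustration; real detectors cover many more formats.
PATTERNS = {
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD_NUMBER": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Card 4111 1111 1111 1111, contact jane.doe@example.com."))
```

Replacing a value with its type label keeps the record usable for downstream processing while the sensitive value itself never reaches the cloud.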
The second challenge you have is the multiple sources of data. For example, in financial services,
you might have a fraud repository. You might have some other system where you have stored like a...
So you might have 10 different repositories in your bank, each of which is different. So consolidating
and aggregating all of that towards the cloud involves a lot of different types of ETL work
to get that data over to the cloud. Even after you get the data to the cloud, it is also very
complex to pre-process the data so that you can create a single source of truth from which you can build
the fraud model.
And just pausing for a sec. Could you talk a little bit about what would be an example of preprocessing
and why that's important?
Preprocessing is very important. For example, say I typically withdraw a
hundred dollars from the bank, and all of a sudden, one day I draw $10,000. So how would you know that
the $10,000 is an anomaly?
You need to know the average transaction for the last 30 days. The average transaction for the last
30 days compared to the current transaction is what we call a feature. And that average transaction
for the last 30 days will be the necessary component to extract the feature. So that's why you need
to pre-process the data, in order to extract that feature, for example.
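The 30-day average feature described above can be sketched in a few lines. This is a minimal illustration, assuming the history is a list of `(date, amount)` pairs for one customer; the function names and the choice to express the feature as a ratio are assumptions, not something stated in the conversation.

```python
from datetime import date, timedelta


def trailing_average(history, as_of, window_days=30):
    """Mean amount of the transactions in the window_days before as_of."""
    start = as_of - timedelta(days=window_days)
    amounts = [amount for day, amount in history if start <= day < as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0


def amount_ratio_feature(history, as_of, current_amount, window_days=30):
    """Current amount divided by the trailing average; large values look unusual."""
    average = trailing_average(history, as_of, window_days)
    return current_amount / average if average > 0 else float("inf")


if __name__ == "__main__":
    today = date(2020, 1, 31)
    # Thirty days of typical $100 withdrawals, then a $10,000 withdrawal today.
    history = [(today - timedelta(days=d), 100.0) for d in range(1, 31)]
    print(amount_ratio_feature(history, today, 10_000.0))  # 100.0
```

A ratio of 100 against a customer's own recent behavior is the kind of signal a fraud model can learn from, whereas the raw amount of $10,000 on its own says little.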
Again, the cloud platform is what allows a lot of that ingestion and
preprocessing to happen much more easily today than it would have before.
Correct, and it can be stored perpetually. For example, you can create the last-30-days
average transaction offline on a customer-by-customer basis, and you can keep it stored in your internal
repository so that you can create the features on the fly.
This is just an example of one feature. You can do hundreds of features like this. And all of those have
to be processed. A new customer comes in, and all of a sudden a new feature vector has to be created for
that customer.
And all of that becomes a lot easier to do once you've got that base going.
And for that you need an automatic orchestration platform. These are not done manually. These are all
done automatically as a new data point comes in or a new customer comes in. So there has to be a set
of triggers based on the amount of data, the type of users coming in, the type of consumers coming in. So you
have to constantly have an internal logic in order to create these things and initiate
the training process.
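A set of triggers that initiates retraining could look something like the sketch below. The three trigger conditions, the thresholds, and all names are hypothetical; the conversation only says that triggers depend on the amount of data and the kinds of users coming in.

```python
from dataclasses import dataclass


@dataclass
class TriggerConfig:
    # Hypothetical thresholds; real values depend on the business and the data.
    max_new_rows: int = 100_000
    max_new_customers: int = 1_000
    max_hours_since_training: float = 24.0


def should_retrain(new_rows, new_customers, hours_since_training, config=None):
    """Return the names of the triggers that fired; an empty list means wait."""
    config = config or TriggerConfig()
    fired = []
    if new_rows >= config.max_new_rows:
        fired.append("data_volume")
    if new_customers >= config.max_new_customers:
        fired.append("new_customers")
    if hours_since_training >= config.max_hours_since_training:
        fired.append("model_age")
    return fired


if __name__ == "__main__":
    print(should_retrain(new_rows=250_000, new_customers=40, hours_since_training=3.0))
```

Returning the list of fired triggers, rather than a bare yes or no, leaves a record of why each training run was started.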
So on the training process, one of the things you talked about earlier was the need for a model
repository. What have you seen as being effective in monitoring models once they're deployed?
Correct. So we have multiple ways of monitoring the whole model. We have what is called TensorBoard,
which is a great way of visualizing the model training process and the prediction process.
In terms of the internals of the training, for collecting the analytics, we have what is called
Stackdriver. Stackdriver basically collects the analytics of the CPU, the GPU, all that consumption.
The other awesome thing that we should mention, besides GPUs, is TPUs. I have done
projects in retail,
and I've seen TPUs give a 3x advantage in cost and a 3x advantage in processing. We were building
a very large, complex neural network model. And the data was also very large. It was 3D
volumetric data. With that model, we couldn't do much training with the GPU.
When we moved to the TPU, we could get what we call a batch size, that is, a batch of 16, into the memory
in order to make it work. So I've seen myself that the TPUs have been pretty beneficial.
And when you said the 3x advantage, was that over GPUs or just traditional CPUs?
Over GPUs. Yeah. I saw that in one of the...
We don't want to compare GPUs to TPUs, but I'm just telling you. We are still a big user of GPUs.
We love Nvidia, and we work with them. We want a lot of GPUs to be on the cloud, but in this one example,
I've seen that we got a 3x advantage in price and a 3x advantage in speed.
That's pretty compelling.
Let's do an overview and then maybe drill down a little bit deeper into the notion of conversational agents.
Maybe kick us off with what a conversational agent is, and then how that is applied
in, say, the financial services use case?
Absolutely. So conversational agents are very helpful in call center management.
We also have a complete product called Conversational AI, which is conversational
agent management all the way from input to fulfillment, but there is a subset of that, which is
called Dialogflow. Dialogflow is a customized solution that we have built in order to
build chatbot agents.
So this is where we can create delightful and natural conversational experiences. The key component of
that is that you have agents for different questions that a person could ask. We train those agents
with different types of utterances, because the same question can be asked in many different ways.
And then on the fulfillment side, we can integrate with messaging,
with Google Home, and with most other types of output, and those are called fulfillment. The other big
thing is that it has to support multiple languages. We have at least 30 supported languages.
You also need to have what you call session flow visualization so that you can see how the sessions...
And it also helps to have pre-built agents. Asking your name, for example, is very common in a lot of these.
So there are pre-built agents. We have many pre-built agents, at least 30, maybe more.
Again, at a high level, how does the training of the conversational agent happen?
It is very simple, actually. You take the multiple types of utterances.
You identify an agent for each different question, and you take the utterances and you train the agent.
It is very fast and very simple.
Wow.
The complexity is on the fulfillment side.
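The idea of training on several phrasings of the same question can be illustrated with a toy matcher. This is a sketch under stated assumptions: it compares word overlap (Jaccard similarity) against stored example utterances, which is not how Dialogflow works internally, and the question types, utterances, and threshold are all hypothetical.

```python
import re

# Hypothetical training utterances: several phrasings per question type.
TRAINING_UTTERANCES = {
    "check_balance": [
        "what is my balance",
        "how much money do I have",
        "show my account balance",
    ],
    "report_lost_card": [
        "I lost my card",
        "my credit card was stolen",
        "block my card",
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def match_question(utterance, training=TRAINING_UTTERANCES, threshold=0.2):
    """Return the question type with the closest example utterance, or None."""
    tokens = tokenize(utterance)
    best_label, best_score = None, 0.0
    for label, examples in training.items():
        for example in examples:
            score = jaccard(tokens, tokenize(example))
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    print(match_question("someone stole my credit card"))
```

Adding more phrasings per question type is what "training" amounts to in this toy version; a production agent generalizes far beyond exact word overlap.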
Got it. Let's look at recommendation. Again, what is it at a high level, and how could we apply it?
In financial services, one of the biggest things is portfolio management, wealth management.
These are instances where you want to have a personalized service. So when you create an automatic
trading platform or an automatic portfolio management platform like Wealthfront, you do need to
have a personalized service.
So given a person's risk appetite, their point in life, their income, and so on and so forth, you can have
a combination of equities and bonds, the two major types of assets. And then you can create a sort of
pyramid of what you need to invest in order to get the return that you want. So this is an example
of a personalized recommendation.
Google has been doing recommendation for a lot of years, and we have open-sourced many recommendation solutions.
So for example, historically, we have done matrix factorization products like factorization
machines and distributed WALS. If you search for Google Cloud WALS, you will find the open source
code for that.
We also have later innovations like wide and deep learning, Brain-based watch recommendation, and deep retrieval.
And then most of the ahead-of-the-curve recommendation engines that we have now use time-delta sequence modeling
and reinforcement learning. This is where you are dealing with non-stationary data, data that is changing,
so you model the time sequence. Reinforcement learning is an active learning method.
So these are all a new generation of learning methods. Google has also been using recommendation in a lot of
our products, like Search, apps, Maps, videos, music, and so on and so forth. And so
we have two types of solutions. One is Recommendations AI, which is an AutoML solution.
It's a retail-centric product. And it has been very beneficial in terms of increasing the click-through rate,
customer conversion, which is also known as CVR, revenue generation, and so on and so forth. The other one is
that we have open source products like distributed WALS and matrix factorization, where you
can download the code, or you can just go to the cloud and use it as is.
I could see how that got its start more in the e-commerce retail world, because
people like Amazon and so forth had made that very famous: if you read these books, you might
like some of these other books. But you can apply that same principle.
Yeah. So that concept of recommendation comes from, basically, you have a set of users, you
have a set of products, and you have a set of ratings.
And then you take the users, the products, and the ratings, and you match them and recommend a new product.
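The users, products, and ratings setup can be sketched with a small matrix factorization. This is a minimal illustration trained with plain stochastic gradient descent, not the WALS method or any Google product; the ratings, dimensions, and training settings are hypothetical.

```python
import random

# Hypothetical (user, product, rating) triples on a 1-to-5 scale.
RATINGS = [
    (0, 0, 5), (0, 1, 5), (0, 2, 1),
    (1, 0, 5), (1, 1, 4), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and product factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


def rmse(ratings, users, items):
    """Root mean squared error over the known ratings."""
    total = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def factorize(ratings, num_users, num_items, num_factors=2,
              epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and product factors by SGD on the known ratings."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 0.5) for _ in range(num_factors)] for _ in range(num_users)]
    items = [[rng.uniform(0.1, 0.5) for _ in range(num_factors)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for f in range(num_factors):
                uf, vf = users[u][f], items[i][f]
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


if __name__ == "__main__":
    users, items = factorize(RATINGS, num_users=4, num_items=4)
    # User 0 never rated product 3; the learned factors fill in a prediction.
    print(predict(users, items, 0, 3))
```

The point of the factorization is that a prediction exists for every user and product pair, including pairs with no rating, and the highest predictions among unrated products become the recommendations.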
There are some very interesting variations of that. One that I have seen is where you do editorial
recommendations. So say a big publishing house, with its editors, is your customer. They want to know how an
editor should recommend content to the users.
So that's a completely different model. There you don't have the users and products with ratings in between.
Your user is the editor and the products are all your content, so there can be slightly different variations.
So the same algorithm cannot be used there, but all I'm saying is that recommendations can have specific...
You can see how that can be very powerful, how the mind blows at the possibilities on that front.
The second one is very interesting because you are creating trends, right? Like in journalism, you could create a new trend.
That's pretty wild. Let's talk about one thing I think is near and dear to your heart: model explainability.
What is that? And how can we use it?
So in the banking sector, model explainability is very, very useful in many areas, like credit lending,
transaction fraud, anti-money laundering, and so on. But a key thing to understand is that model explainability
is not a monolithic thing. You have multiple audiences.
You have the developers as one audience, for whom model explainability is mostly about
how to build better models. So we look at the model, look at its output and what features
went into it, so that we can enhance those features, build another model, and iterate through that process.
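Looking at which features went into an output can be sketched for the simplest case, a linear scoring model, where each feature's contribution is its weight times how far the value sits from a baseline. This is a minimal sketch, not the explainability tooling discussed in the episode; the weights, feature names, and baseline applicant are hypothetical.

```python
# Hypothetical linear credit-scoring model: score = bias + sum(weight * value).
WEIGHTS = {"income_k": 0.04, "debt_ratio": -3.0, "late_payments": -0.8}
BIAS = -1.0


def score(applicant):
    """Model output for one applicant."""
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def contributions(applicant, baseline):
    """Per-feature contribution to the score, relative to a baseline applicant."""
    return {name: WEIGHTS[name] * (applicant[name] - baseline[name]) for name in WEIGHTS}


def top_negative_reasons(applicant, baseline, k=2):
    """Names of the features that pulled the score down the most."""
    negative = sorted(
        (value, name) for name, value in contributions(applicant, baseline).items() if value < 0
    )
    return [name for _, name in negative[:k]]


if __name__ == "__main__":
    baseline = {"income_k": 80, "debt_ratio": 0.3, "late_payments": 0}
    applicant = {"income_k": 50, "debt_ratio": 0.5, "late_payments": 2}
    print(contributions(applicant, baseline))
    print(top_negative_reasons(applicant, baseline))
```

For a linear model the contributions add up exactly to the difference between the applicant's score and the baseline score. Non-linear models such as neural networks need approximate attribution methods to get a comparable breakdown.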
The second audience for model explainability is the consumers. You go to the bank and you ask for a loan
of 250K and you only get 200K, or you get denied. So the person asks, why did my loan get denied?
You cannot just say it's because the model came up with it. You have to give the reasons: the reason is
your income wasn't sufficient, or whatever.
And you have to come up with that from the model itself. The third audience is regulators. Regulators
want model explainability to see that you're compliant with regulations. And the fourth is
system management. I'll only address the first three. Each of them has different goals. So engineers
want to produce better models.
Consumers want to see better analysis. So engineers, models;
consumers, analysis; regulators, reports. Each of them has a different end result,
using the same basic principle.
So as you look forward into 2020, and you know, you're spending
all this time in financial services and talking about these different things, for which of these types
of applications are you thinking it's still early days, and there's going to be significant
uptake in the year ahead, more than others? Do any jump out at you as
getting more traction for whatever reason?
Yes, that's right.
Yeah. So I think that a lot of the use cases we talked about will be very, very useful in financial
services and other industries too.
I view risk or margin prediction as a huge topic of discussion, because it's a revenue generator.
How do you appropriately choose the margin in any type of lending situation? So I think
that's a big topic of discussion. If you can predict how a stock is going to do, you could charge
the appropriate amount of margin.
And if you can explain that well to your customer, I think you can make a lot of money. And then again,
even simple things like hedge funds and others: if you could predict which stocks are going to do better,
then that will be a big one.
Yeah, and whom to invest in, right.
Stock prediction is not just an individual matter.
It's also a societal matter. Right? A lot of the time, I mean, who knew Snapchat would be so big, right?
So understanding the underpinnings of society that led to some of these successes is also a very difficult
machine learning problem to solve.
It's interesting. So if you think about the financial services world and all of these different use cases
within it, take an extreme example. Let's say that next year was a super down economy year and the market
takes a 20% or bigger crash. I'm just saying.
Then we have a good year for short trading. If you could predict that accurately, you can still make a lot of
money in hard-to-borrow securities, finding which ones will go down the most.
Yeah. I mean, famously, I think they made some Hollywood movies out of people doing exactly
that, like The Big Short, in 2008, for those reasons. So who knows, maybe there's an opportunity there. I feel
like I've borrowed you for a long time and I really appreciate it. This has been an absolute pleasure.
Thank you so very much.
Thank you very much. I appreciate it.