Data storytelling, women in
tech, and ethical AI – a transcript of Episode 11
Bobby:
Welcome to Loka’s podcast, “What fascinates you?”
Conversations with entrepreneurs, engineers,
and visionaries who are driven to bring innovations to
life. I'm Bobby Mukherjee and today's
podcast is about the power of telling stories with your
data and how more diversity in our data
– and our teams – can uncover much greater outcomes.
Here to share her story through video chat is Ahna
Girshick – life-long researcher, creator, and
former manager of computational genomics at
Ancestry.com. Like many of our guests, Ahna’s career
journey is as fascinating as the innovations she
pursues.
Her research has spanned the worlds of science and data
– from neuroscience to machine learning and
computational storytelling.
She has received over 3,000 citations, as well as press from the New
York Times and NPR, and has been
featured at the Museum of Modern Art for her work with
Philip Glass and Björk.
If you work with data in any capacity, if you want to
connect better with your customers,
or if you want to see more diversity in the tech world,
I think you'll come away with something
valuable from this interview.
Hello. Welcome to the show.
Ahna:
Hi, thank you. It's nice to be here.
Bobby:
Pleasure to have you here. So, the first thing I wanted
to talk about was when you and I were
talking previously, you had mentioned that your dad was
a scientist and your mom was an artist,
both at Stanford, which made me wonder what were
conversations like at your dinner table growing up?
Ahna:
I really grew up surrounded by both arts and sciences,
from both my parents. From my father,
I saw his scientific journey. It was very academic. It
felt very innovative and stimulating.
And then also surrounded by the arts. So my mother was
painting inside the house and in her home studio.
So that felt very creative and beautiful. And my parents
had quite a bit of appreciation for each other's
professions, but at the same time, even as a kid, I
could see how different they were. And so, as a kid,
I loved the arts. I loved the sciences. And it didn't
really dawn on me until probably fairly late in high
school, and then increasingly in college, that most of our
society is kind of set up in this... it's very
specialized, right? So our educational systems, our professional
tracks, where people tend to self-segregate, generally,
you know, it could be scientific, it could be artistic,
or it could be many other things, but
there's more and more, deeper and deeper specialization.
So that's great. I love going deep, but it also creates
some limitations on diversity of ways of thinking,
working, and communicating. And so that was a little bit
disappointing, I think, for me as a young adult, and
confusing.
Bobby:
Do you remember if, at an early age, you found yourself
drawn to one over the other?
Ahna:
I think I was probably a little bit more drawn to math
and science. But I'm not sure how natural that was.
It might've been because I internalized this kind of
story from my mother that, you know, you don't want
to become a starving artist. Like, I'm going to get a
good... get a computer science degree or something like
that.
I'm not quite sure, but I was always looking for ways to
connect them. And I still have been, my whole career.
Bobby:
So one of the things that you had taught me when we were
just talking the last time was you provided this
lens on how to look at data, and you talked about the
power of stories and how data can be very dry and
not very meaningful in and of itself, and how it can
become so much more powerful if you can craft a
powerful story around it.
So before we dive into the specific notion of creating
stories with data, I was just curious, at a higher
level, what role do stories play in your life?
Ahna:
Stories are this uniquely human format for communicating
information. So we can trace stories back,
like, almost 5,000 years. And you know, I'm a mother,
and so I quickly discovered that even for very young
children, storytelling has this huge power. I think
we're wired to be captivated by stories because they
give us this framework for interpreting our lives.
And they're also very connecting, which is why they run
strong in families and communities, and get passed
down over centuries.
So I'm not a natural storyteller. I'm not a published
author of literature. Most of us aren't. But I
think when I was at Ancestry, during my time as a
research scientist there, I became really interested
in this idea of data-driven stories and saw that my work
there was really kind of like computational
storytelling, which was a way for me to fit...
You know, maybe it's the story I was telling myself, but
it connected into my career quest to connect
AI and technology into the human experience.
Bobby:
Let's dig into that a bit. So you worked at Ancestry as
a manager of computational genomics research
for a little under five years. What were your major
roles during that time?
Ahna:
I started as a research scientist and then later on I
was managing a research team. Ancestry
is, uh, about a 40-year-old company. Their main focus is
family history, and users can create family
trees. And when I joined, my association with building
a family tree, which is called genealogy,
you know, was that it was a hobby for a kooky great-uncle who I
didn't really want to talk to or something
like that. And I think a lot of people have that
association, but you know, I'm also a data geek.
And then I quickly understood that family trees,
especially at the scale Ancestry has (they have
over a hundred million family trees),
are this very rich data source. And they're spatial,
because usually in family trees, people
say so-and-so was born in this place and they've been in
this place and died in this place. And
that covers the whole globe. And they're also
temporal, because they say my mother was born
in this year, my grandfather was born in this year, going
back: the birth dates, death dates,
marriage dates,
you know, all the significant events, when the
children were born, et cetera. So you can
think of them as this raw material for creating a
computational history. And then when you
aggregate them, you can discover historical trends. So,
you know, we have textbook history, which
was written by historians, their version of the story.
And then there's this kind of computational aggregated
family tree story, which is potentially
the same and possibly a little different, right? So, uh,
hold on to that thought because I'm going
hold on to that thought because I'm going
to return to it. But another way to look at your past is
genetics. And I was in the AncestryDNA
science research labs.
So we were very focused on genetics, and genetics tells
you about your past because all the DNA you have,
you inherited from your parents, who inherited it from
their parents, all the way back. Right? And, you know,
it's funny because in school, you don't get to combine
or even major in, like, history and genetics, or
combine those data sets.
But that's what we did. So we were combining historical
data from the family trees and the genetics data,
which I think is something pretty awesome in data
science: you find these disparate data
sources and you get to work sort of
cross-disciplinarily. So I had the opportunity to do that
research project.
We called it... you know, I think of it as data storytelling.
So the way we did this is, first we built a social
network. So think of it like Facebook, but instead of
friendships connecting two individuals, the connection is determined
based on genetic relatedness, how much DNA you share. So
siblings are going to be directly connected and
very close in that network,
whereas people from opposite parts of the world are
going to be very far apart in that network. So think of this
as a massive genetic social network. And then you can use
clustering algorithms (you don't really need to know
what those are) to find clusters of individuals in
that social network. They're large clusters, in the tens of
thousands or hundreds of thousands, and these clusters
are representing a group of people who share more
DNA with one another than they do with others.
And generally when you share DNA, especially when we're
going back like eight generations, it means you share a
common history.
You know, it was only in the past few hundred years that
people have been traveling across the world to meet their
mate. Right?
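[Editor's sketch: the "genetic social network" idea described above can be
illustrated with a few lines of Python. This is only a toy under stated
assumptions, not Ancestry's actual method: the names, the shared-DNA
values (in centimorgans), and the 20 cM threshold are all made up, and
plain connected components stand in for whatever clustering algorithm
the research team really used.]

```python
from collections import defaultdict

# Hypothetical pairwise shared-DNA amounts in centimorgans (made-up values).
SHARED_CM = {
    ("ana", "ben"): 2600.0,  # sibling-level sharing
    ("ben", "cara"): 90.0,   # more distant relatives
    ("ana", "cara"): 75.0,
    ("dev", "eli"): 120.0,
    ("eli", "fay"): 60.0,
    ("cara", "dev"): 8.0,    # weak link, below the default threshold
}


def genetic_clusters(shared_cm, min_cm=20.0):
    """Group people into connected components of the graph whose edges
    are the pairs sharing at least `min_cm` centimorgans."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), cm in shared_cm.items():
        root_a, root_b = find(a), find(b)  # also registers both people
        if cm >= min_cm and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for person in parent:
        groups[find(person)].add(person)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for cluster in genetic_clusters(SHARED_CM):
        print(cluster)
```

[With the default threshold the weak "cara"/"dev" link is ignored, so the
six people fall into two groups; lowering the threshold below 8 cM would
merge them into one. Real community-detection methods weigh how much DNA
is shared rather than using a hard cutoff.]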
Bobby:
Right. So I definitely believe in the power of
storytelling. And then you take something
like data, which untouched can be really dry and about
the furthest thing away from a story.
So if I'm trying to create better stories with data, one
of the things that I just picked up
is first you have to have a character or characters in
your story that people can empathize
with and relate to.
Ahna:
Yeah. I mean, I think connecting to people is a powerful
technique. It's probably not the only
technique, but we empathize with people. We empathize
with other people, especially if we can
relate to them.
There are data products out there that are geared
towards customers, and so there, the people
are there already. And then there's also data
journalism in the news, where it's looking at
large populations of people. So it's not necessarily
targeted towards individuals, but there's
still that human connection.
Bobby:
Right. So I think that seems to be, at least one of the
key ingredients in trying to make
a more compelling story.
Ahna:
Right. So if you're making a story, even if it's about
climate change or something like that,
and you're looking at the data on that, how does that
affect people?
Because we care about people. Oh, you could talk about
how it affects animals too, I suppose, right?
Bobby:
No, no. I mean, exactly. But I think again, the key
ingredient is: have you created a character,
whether it's a person, animal, or whatever form,
that people will relate to and empathize with?
That's the key thing.
So switching gears a bit, something that I would
really love your perspective on is, having had
this tremendous journey in the field of AI, being a
practitioner, you know, just your perspective:
is it different for women in the field of AI and machine
learning than for men?
Ahna:
Like, I have anecdotal answers to that question for
myself, but I think what I know is that,
you know, the more diversity we can bring to teams
building AI systems, the more the algorithms
can reflect that diversity, which I think is a very good
thing. You know, diversity comes in many forms.
It's not just gender, right? It's race, but it's also a
diversity of educational institutions and diversity
of training, right? So matching computer scientists with
anthropologists or artists or journalists,
or matching a Stanford PhD with someone who is
self-educated.
Bobby:
Hm. So...?
Ahna:
Cognitive diversity.
There are many forms, but I think when we are willing to
create a more diverse team,
sometimes that might feel hard or different, but it ends
up creating different types of solutions.
I definitely have seen that happen with us and with
other teams, and I'm constantly,
constantly looking for opportunities to make that happen
because the outcomes are fantastic.
Bobby:
So here we are, hopefully, hopefully in the tail end of
the pandemic. In your mind,
did 2020 change things for data, AI, and storytelling?
Ahna:
Yeah. I mean, 2020 changed the world, right? It was
just a moment. Right? I mean,
two things stand out to me. One, we didn't talk about
COVID research, but because of COVID,
I noticed that self-reported health data became more
mainstream and more accessible.
So addressing the COVID pandemic while we're all remote,
it kind of forced the healthcare
industry to be more accepting of that self-reported
data.
And also people were motivated to help. So one of the
things I did at ancestry was to help
coordinate a COVID research study. We had nearly a
million volunteers sign up to anonymously
participate, you know, to contribute to the scientific
understanding of the genetics and other
risk factors underlying COVID.
And I saw many other efforts like this, where
organizations were collecting data in cell phone
apps that, like, ping you every day, and you could enter
your zip code and what symptoms you were
feeling, that sort of thing. And the healthcare
industry, which has traditionally been rather
low tech, was, you know, looking for this data because it's
only en masse.
So that was one big change. And I'm hoping that
trajectory will continue because self-reported
data is really valuable, really powerful in what it
can do for science and healthcare.
And then the other big issue, of course, of 2020 was, you
know, racial justice movements that
sort of engulfed the country.
But I think they also helped drive this kind of more
honest dialogue about racial disparities
in the workplace and in machine learning. Last week on PBS, I
saw the Coded Bias documentary, which,
if you haven't seen it, is amazing. And it's about bias
in AI. It's disturbing, but it didn't
tell me anything I didn't already know.
And it kind of highlights bias in
training data, the lack of diversity
of engineers designing the algorithms, but also the
business forces dominating that conversation.
And so my hope is that those AI ethics organizations
that you mentioned will be applying pressure
and working with the big AI companies
(there's only a few really big ones right now), you know,
just to prioritize transparency,
for example. That would be a great start: how
the algorithms are working and where
the data is coming from. And so, you know, this
goes back to data storytelling: people who
want to use that product want to understand why this
algorithm is,
you know, maybe doing something serious, like telling them they have
cancer or something, right? Like,
how did it make that decision, based on what data, and how
was that learned? How do you contextualize it
within a population to understand? Because if you're
making a big decision about your healthcare or
getting a job, you know, you want to understand that
whole context behind it.
Bobby:
Where my optimism comes from for '21, I think, is
that those observations, as you said, lay the groundwork
for some momentum in that direction. So you see
better outcomes with things like model
explainability and less confusion about how algorithms
are making these weird decisions that
we believe they shouldn't be.
So I think that is great cause for optimism.
Well, I know this has been fantastically useful and
engaging. I've learned a ton. I have many more
things to ask you, but I really, really appreciate your
time. Thank you so much for being on the show.
Ahna:
Yeah, it was a pleasure.
Bobby:
That was Ahna Girshick, researcher, creator, former
manager of computational genomics at
Ancestry.com,
and barrier breaker in AI.
If you're interested in learning more about Ahna and
her research, you can visit her website at mikedark.org.
And if you enjoyed our show, please like and rate us.
Until next time, this is Bobby Mukherjee.