The Risks and Rewards of Big Data, Algorithms, and Machine Learning, with danah boyd

Featuring

Stephanie Sy

PBS NewsHour; Former Carnegie Council Host, Ethics Matter

How do we analyze vast swaths of data and who decides what to collect? For example, big data may help us cure cancer, but the choice of data collected for police work or hiring may have built-in biases, explains danah boyd. "All the technology is trying to do is say, 'What can we find of good qualities in the past and try to amplify them in the future?' It's always trying to amplify the past. So when the past is flawed, it will amplify that."

STEPHANIE SY: Welcome to Ethics Matter. I'm Stephanie Sy.

Our guest today is danah boyd [styled lower case], a principal researcher at Microsoft Research and the founder and president of Data & Society, a research institute focused on understanding the role of data-driven technology in our society, technologies used by companies like Facebook, Google, and soon perhaps your employer, your government, and your police department. She has degrees in computer science as well as a Ph.D. in information, and has written two books on the intersection of society, culture, and Internet technologies. danah is here to help us understand the limitations and the risks of big data, algorithms, and machine learning, and hopefully, danah, we will be able to define some of these terms as we get into the conversation.

Welcome to the Carnegie Council.

DANAH BOYD: Thanks for having me.

STEPHANIE SY: Are we in the midst of a technological upheaval that is drastically changing society? Is that why you started your research center?

DANAH BOYD: For me, it's not so much that we are in the midst of something that is changing, it's more that there is a moment where we're suddenly paying attention to a set of issues that have actually been going on for a very long time. When you step back, you actually can see patterns over a longer period, but we're in a moment where everybody is paying attention. It's a phenomenon, and people want to understand it. And there's nothing like that moment of phenomenon for people to get obsessed with hype, imagine all of the great things it will do, and also simultaneously be terrified.

So a lot of what I'm interested in with Data & Society is to make certain we can ground it, and sort of step back and say: "Okay, what is real? What do we know? What do we not know? How do we actually get our heads around what shifts are happening? What are the ramifications in light of a whole set of other social dynamics?" and really try to make certain that we're more informed in our approach to a lot of these technologies.

STEPHANIE SY: A phrase that I hear thrown around a lot in the last few years is "big data," and I'm sure that is something your research center looks into. How big has big data gotten, and how did we get here?

DANAH BOYD: I joke that big data often has nothing to do with bigness and rarely has anything to do with data, but it is in many ways the mythology that if we just collect more information about more people that we can do magical things, that we can solve problems that have been historically intractable. That is actually where we get ourselves into trouble. There are a lot of techniques and technologies that are actually doing data analytics across large swaths of data, and some of the most sophisticated have nothing to do with people: astronomy data, for example—pretty amazing; what we're seeing in terms of genetic analysis—unbelievable.

But a lot of what we talk about when we talk about big data is the idea that companies like Facebook have tremendous information about you and your practices and what you're doing, and they're trying to understand patterns. So a lot of what it becomes synonymous with is the idea of prediction, the idea that we could just take this data and predict something about you. The question is: Should we be doing those predictions? Under what terms? What are the outcomes when that information is wrong? What are the biases that are in that data? Who is manipulating that data, and what are the ramifications there?

The other thing about big data is that it has become collectively synonymous with artificial intelligence (AI), which is our other term.

STEPHANIE SY: We are going to get into artificial intelligence, but can you give us a broad definition of what you mean by big data? You brought up some of the ways data is collected, but when we talk about "big data," what are we actually referring to?

DANAH BOYD: From my perspective, big data is actually about a phenomenon; it's not actually about the collection of large swaths of data. It's about a set of business practices instead of technologies, and a set of beliefs about what we can do with a huge amount of information about people and their practices.

STEPHANIE SY: Part of that is algorithms and how big data is used by algorithms to make decisions.

DANAH BOYD: That has been a lot of the interesting transition, right? The phenomenon has been what do we do with all the analytics, or the information we have? How do we analyze it? Often we get into conversations then about machine learning. Machine learning is usually the next translation. So at that moment we can take all of this and not just do run-of-the-mill business analytics or statistical processing, but say, "How do we actually analyze this data for prediction?" A lot of machine learning algorithms are to cluster or to predict, to make specific decision-making processes available.

That actually is one of the reasons why you have to connect it to artificial intelligence because big data became almost synonymous with Big Brother, with big surveillance, and so that became a term that has been deprecated by a lot of different communities and been replaced with "artificial intelligence," where we actually meant many of the same things—large amounts of data analytics, the ability to do sophisticated machine learning, and more and more advances in machine learning.

STEPHANIE SY: The way I think most of us typically confront and use these technologies is every time we go on Google or Facebook. What are other examples? In other words, every time we search a term in Google, we're talking about lots of data that Google is then putting through an algorithm and some degree of machine learning—and, from my understanding, some degree of AI—to pop out your results. Every day these technologies are impacting us, and we know that they are on Google and on Facebook and those platforms. What are other ways and other examples of how big data and machine learning and algorithms are impacting our lives today?

DANAH BOYD: Within a core technology universe, think of any time you are given a recommendation: What movies should you watch? What things you should purchase next? What news articles should you read next? Those are all part of this ecosystem.

But, of course, it goes beyond the world of information technologies. It's also starting to shape things like medicine. How do we start to understand your cancer? What do we understand analytically about that cancer as it relates to a whole set of other cancers so that we give you better and more appropriate treatments?

We also see this in environments like criminal justice, which is where it can actually be a much more problematic environment. For example, risk assessment scores, looking at somebody who has been arrested to decide whether or not they should be given the opportunity to get bail or not based on patterns within their data compared to other people's data. You see those types of dynamics.

STEPHANIE SY: Let's start there, with criminal justice. That is an area in which algorithms are being applied that some say is ethically concerning. Let's parse that out a little bit more. What types of risks are there to using machine learning in criminal justice?

DANAH BOYD: The biggest challenge with criminal justice is that the public does not actually agree on the role of criminal justice. Is criminal justice intended to punish somebody? Is it intended to prevent them from committing future crimes? Is it intended to rehabilitate? What is the actual role of that? That's part one, because actually it starts to shape the kinds of data that we collect. We collect different data depending on what we view the role of criminal justice to be.

Next, we have a whole set of biases in how we've actually deployed criminal justice practices, how we've engaged with policing. Our policing structures have long been biased along axes like race, gender, income level, and communities.

STEPHANIE SY: In theory, wouldn't making machines more part of the process make it more neutral, make it less biased?

DANAH BOYD: It depends on what data it's using. The challenge here is what most of those systems are designed to do is to say, "Let me learn based on what I've known in the past to make decisions about what should happen in the future."

STEPHANIE SY: Give me an example of that.

DANAH BOYD: For example, when you're trying to do a predictive-policing algorithm, you're trying to say, "Where have there been criminal activities in the past?" Let me send, then, law enforcement to those sites where there is a higher likelihood of criminal behavior. Where has there been activity in the past? It's the places where police have chosen to spend time; it's the people that they've chosen to arrest.

STEPHANIE SY: And that might be based on their personal biases.

DANAH BOYD: Or a whole set of other values. For example, if you look at drug arrests, we actually know from the drug data in the United States that whites are far more likely to consume and to sell drugs. Yet, when we look at the arrest records of the United States, we overwhelmingly arrest blacks for both consumption and sales of drugs. As a result, when you're trying to predict who is most likely to engage in criminal activity around drugs, your algorithms are going to say, "Oh, well, actually it seems to be mostly something that occurs with African American individuals." That's not true. That's based on flawed data.

That's the problem in criminal justice. Could we design a system to be more responsible? Certainly. But it all depends on the data. The problem with machine-learning algorithms or big data or artificial intelligence is that when the data is flawed we are going to not only pipe that flawed bias through the system, but we're going to amplify it. The result is that we increase the likelihood that we are reproducing more and more biases in our data.
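[Illustrative sketch, not part of the conversation.] The feedback loop described above can be shown with a small simulation. It is a toy model under stated assumptions, not any vendor's predictive-policing system: two neighborhoods have the same true offense rate, the historical arrest counts are slightly skewed, and a hypothetical "hot spot" policy sends most patrols to whichever area has more recorded arrests. The function name, the 80 percent hot-spot share, and all numbers are invented for illustration.

```python
def simulate_hotspot_policing(recorded, true_rate=0.1, patrols=100,
                              hotspot_share=0.8, rounds=5):
    """Return the hot spot's share of all recorded arrests after each round.

    Every area has the SAME true offense rate; only the starting
    arrest records differ. All parameters are illustrative assumptions.
    """
    recorded = list(recorded)
    shares = []
    for _ in range(rounds):
        # The "prediction": the area with the most past arrests is the hot spot.
        hot = recorded.index(max(recorded))
        for area in range(len(recorded)):
            if area == hot:
                sent = patrols * hotspot_share
            else:
                sent = patrols * (1 - hotspot_share) / (len(recorded) - 1)
            # New arrests scale with police presence, not with any real
            # difference in behavior between the areas.
            recorded[area] += sent * true_rate
        shares.append(recorded[hot] / sum(recorded))
    return shares


# Start from a mild historical skew: 55 recorded arrests versus 45.
shares = simulate_hotspot_policing([55, 45])
print([round(s, 3) for s in shares])
```

Although behavior in the two areas is identical by construction, the area that starts with more recorded arrests accounts for a growing share of the record each round, because the record decides where police look next.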

STEPHANIE SY: I know another area in which it's being explored—the use of data, algorithm, machine learning to pop out decisions—is in hiring practices, another example of how the outcomes are only as reliable as the data you put into them. Can you talk about that?

DANAH BOYD: Again, what you're trying to do is say: "I have had previous employees that have been very successful. Let's figure out who those employees are, and let's try to find more people like them." You look at information that you have as a company about what has made somebody successful.

The reality is that we actually bias in our decisions around hiring. We bias for people who are geographically closer to the office, which actually shapes a whole variety of factors that we don't necessarily think about; we bias based on personal perception, and this actually has a huge set of gendered implications; we bias based on people who have never had any departure in the workforce. Again, that really shapes dynamics around parenting or elder care.

When we then look at our data, we're like, "Oh, well, who is the ideal population?" We actually get to a very narrow population, and that's deeply problematic. This is one of the challenges when you see large employers turning over and saying: "I'm going to give you all this data. Please find for me who, amongst all the candidates, I should now hire."

There is some really interesting research to combat this—this is where technology can play the shifting role—and I'm particularly interested in the work by Sorelle Friedler and her colleagues. They were interested in large retail use of these hiring algorithms. What they did is, they said: "Okay, we're going to take all of the data from these retail folks, and we're going to look at equal opportunity categories. We're going to say, 'We're not going to allow for decisions that are biased on the basis of gender or race or national origin, etc.'"

Then they re-normalized the data set, which is basically a way of saying: "If I compare these two candidates and I actually change the weighting so that it's actually not a weighting based on gender, but it might be a weighting based on other performance categories, then I can pull out and learn what are the categories of performance that have nothing to do with these other protected categories." That's a way of actually shaping the system and saying, "Okay, look for categories that are not these protected categories or not proxies for these protected categories."
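[Illustrative sketch, not part of the conversation.] One simple way to picture "re-normalizing" a data set is to rescale a feature within each protected group so that group membership no longer shifts the score. This is a simplified stand-in and should not be read as the method used by the researchers mentioned above; the function name, the example scores, and the group labels are hypothetical.

```python
from statistics import mean, pstdev


def renormalize_within_group(scores, groups):
    """Rescale each score relative to its own group's mean and spread.

    After rescaling, every group has mean 0, so the feature can no
    longer act as a direct proxy for group membership, while the
    ordering of candidates inside each group is preserved.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}

    rescaled = []
    for score, group in zip(scores, groups):
        mu, sigma = stats[group]
        # Guard against a group whose scores are all identical.
        rescaled.append((score - mu) / sigma if sigma else 0.0)
    return rescaled


# Hypothetical performance scores where group "b" was rated higher overall.
scores = [10, 12, 14, 20, 22, 24]
groups = ["a", "a", "a", "b", "b", "b"]
print(renormalize_within_group(scores, groups))
```

In this toy data the top candidate in group "a" ends up with the same rescaled score as the top candidate in group "b", even though the raw scores differed by 10 points. Whether equalizing groups this way is appropriate depends on why the raw scores differ, which is the values question raised in the conversation.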

STEPHANIE SY: So there is this computational fix, but that relies on the computer engineers who are actually programming the algorithm?

DANAH BOYD: It's a very intended intervention to an algorithmic system. It's one of the reasons why it's important to realize that we can't do this if we don't understand the values that are getting embedded within our system, and then think about what values we actually want and redesign our systems according to the values we're aiming for.

STEPHANIE SY: So when it comes to hiring, when it comes to policing, there are already biases that are well-documented that are already embedded in the way humans execute and deploy criminal justice and hiring practices. Again, I go back to the idea that why does having computers involved in the process magnify the potential social problem?

DANAH BOYD: It really comes down to the fact that all the technology is trying to do is say, "What can we find of good qualities in the past and try to amplify them in the future?" It's always trying to amplify the past. So when the past is flawed, it will amplify that.

That is also why you have to look to domains where it is more likely that you can actually start to tease out these issues. You start to look in places where you can find clusters of data that actually are less likely to be biased, or how do you know the biases and re-normalize them? And those are much harder to do in socially fraught environments.

STEPHANIE SY: Are local police departments actually starting to deploy algorithmic machine learning, and can we trust them as a society to program these algorithms in a way that is fair and reflects the law?

DANAH BOYD: Keep in mind, it's not the police office that actually does the building of the system, it's built by third-party companies. They are hired by, or paid for by, the police force.

STEPHANIE SY: Do we trust the third-party companies?

DANAH BOYD: That's a question. Do we trust the values that the police force has structured? And I think that's a very contested feeling, especially in the United States right now.

What would it look like if, instead of these systems being purchased by the police system, they were purchased for social services? What would that look like, if instead of intervening and trying to find somebody who is most likely to engage in criminal activity, you were instead trying to find somebody who is most likely suffering and could actually do meaningful outreach? That's a very different shift to the same set of technologies. It's a question of what are our values as a society, what are we really willing to spend time on?

It's also important to realize that we're doing a lot of these technologically mediated interventions for ideals of efficiency. Should criminal justice really be more efficient? Is that the value we should be aiming for? I think it should be more just, I think it should be more fair, I think it should be more responsible to society. I don't believe that efficiency is actually compatible with that.

STEPHANIE SY: It sounds like what you're saying is the efficiencies that might be gained from using algorithms and machine learning may actually be counterproductive and harmful when used in some of these institutions.

DANAH BOYD: I think that's absolutely true.

STEPHANIE SY: I actually read that social welfare programs are also looking at this. It was a child custody situation in Los Angeles where they're piloting a program to test whether machine learning can judge whether parents should be able to keep custody of their children. It's worrying not just because of the things you bring up, but because so many of us have very little understanding of how data and algorithms actually work, and there seems to be very little transparency.

This brings me to what I've heard described as the "black box." Can you talk about what that is?

DANAH BOYD: The idea of the black box is that we don't understand how the system works. We give it input variables, and then something happens, and we get something on the outside, and we're supposed to make decisions based on that. One of the major pushes is from a lot of people saying, "We should have transparency of what the black box does."

I actually tend to resist that, not because I think the black box is not problematic—I actually think the black box is hugely problematic—but I think that even when we have access to the code and have access to the data, even the most sophisticated computer scientists actually debate what's going on. If we rely on transparency as the solution, I think we miss the point.

For me, what we need to be transparent about is the process: What are we actually trying to design a system to optimize for? When are we trying to replace humans, or when are we trying to augment them?

Take, for example, the question of child protective services. What you really want is a system that is actually trying to give whoever the service providers are—whether we're talking about psychologists, whether we're talking about family services—information to be more strategic in how they engage with communities for certain desires that we all agree on: keeping families together in healthy ways, making certain that the child's welfare is taken care of, all of those values.

What would it mean, instead of an algorithm designed to say who should be taken away from their family, for it to say, "How do I find families where if I did an intervention right now, I could probably make certain that their family stays together?" That's just a shift in what the values and goals are; it's not about the technology.

That, to me, is the important thing about the transparency process. We want to be transparent about those values, and then we want to actually figure out, regardless of what that black box does, how it augments human decision-making.

STEPHANIE SY: There is another part of this which—when we talk about self-driving cars, a very similar debate comes up, which is that some people trust machines more, and some people trust humans more when it comes to judgment.

We know that we humans are flawed. But is there a sense that for municipalities or police departments that may be thinking about deploying machine learning, that they can trust machines more, that they can make more neutral moral judgments, and that because they have lots more data and lots more neural connections or whatever the language is, that they can make better decisions than humans can, and is there evidence that they can?

DANAH BOYD: The only decisions that they make are the decisions that are programmed for them to make.

STEPHANIE SY: That humans have programmed them to make.

DANAH BOYD: That humans have programmed them to make. So the question is, what have they programmed, and do we have the data to actually understand that?

Let's compare it to self-driving cars. What self-driving cars are optimized to do is not hit other cars. That's actually a pretty neutral and valuable goal. It still will mess up, but the goal is to do everything possible to not hit other cars, to stay safely on the road, not go over the road, to be able to respond as fast as possible to ice, etc.

We actually have environments where that kind of automation has been playing out for decades. The example that my colleague Madeleine Elish talks about is what it means to deal with autopilot. You get into a plane today, it's not actually flown by a pilot, it is mostly flown by an autopilot. That's actually a very good thing. That's the thing that overwhelmingly reduced crashes in airplanes, because actually those systems, the data, and that they know what to do, is actually more efficient and more effective.

But there's something funny: In the 1970s we ruled in debates around the FAA (Federal Aviation Administration) about whether or not we should have pilots in the plane. The decision was that a pilot should always be in the plane and be in a position to take over from the autopilot. What does that mean in practice? That means that a pilot has not practiced landing a plane or dealing with crises in a very long time, and when things are really wrong with the machine, they are now responsible for taking over.

STEPHANIE SY: Right, and they're expected to know everything that the machine does and be able to use this last minute—

DANAH BOYD: Respond immediately.

STEPHANIE SY: Yes.

DANAH BOYD: That ends badly.

The funny thing is that that human has become what Madeleine refers to as a "liability sponge." They're not there to actually solve the problem, they're there to actually take the liability for the entire system—

STEPHANIE SY: Away from the machine.

DANAH BOYD: —away from the machine. The result she talks about as a "moral crumple zone"—the crumple zone is that part of your car that takes pain on impact—and what does it mean to put the human in that moral crumple zone to really receive the impact?

Here's the question for self-driving cars: We are most likely in the next decade to see major automobile moments, so automated, self-driving cars are going to be better. The question is: Should the human be allowed to take over from the machine, when we know that that is most likely going to cause more problems in places like long haul? So there are going to be some tricky questions.

STEPHANIE SY: Long-haul trucking and things like that.

DANAH BOYD: Long-haul trucking, long-haul driving, where exhaustion is a huge part of what goes on. What makes humans have terrible judgment? Sleep deprivation, not being able to respond as fast to data, being distracted, all of these different environments.

But for me, self-driving cars is a very clear case. We know what the goal is. I don't see any parallel between that and what the goal of policing is because policing isn't about finding every criminal activity, policing is about a strategic deployment of positions of power to get a society to comply with a set of norms, structures, and laws. The result is that we are going to have an uneven distribution of whatever we do in policing, whereas the idea with the self-driving car is actually to make a very even distribution of what we're doing.

STEPHANIE SY: Can you program a machine, or can you get AI to learn morality so that it can make the sort of individual moral decisions that one must make, not only when it comes to child protective services and whether a child should stay with a family, but also in self-driving situations? We've talked about on this stage the moral difference between saving the driver in your car versus running over a group of schoolchildren and whether you can program—are machines at that advanced stage where you can actually program morality consistently?

DANAH BOYD: It depends on what we even mean by that. That's a philosophical question, which is to say what you can program a computer to do is to understand the norms and the formal instantiation of moral values by a society and then replicate the statistical dynamic of that. That is never going to be moral decision-making. That is about understanding the decision-making of a body of people, a body of data, and replicating it.

The question also is: Whose values are we actually asking them to model and replicate?

STEPHANIE SY: I guess the other question is, as companies and governments and police departments use and think about using machine learning, are they realizing that values play into it? Is that part of the discussion when they adopt these technologies?

DANAH BOYD: Values always play into it, but the values that they care about are often things like efficiency. Those are not my values.

STEPHANIE SY: Or making money. I want to bring this back to, I think, the real-world use again, the ways we often encounter these technologies, again through Facebook and through Google. How do companies like Facebook and Google use machine learning and algorithms, for example, to in their case optimize their bottom line? How do they account for values such as democracy and privacy and free speech?

DANAH BOYD: Take something like a search engine. That's probably the easiest example to make sense of. When you put in a search term like "cats," what you might want to get out of it is the Broadway show. What I might want to get out of it is pictures of soft, fuzzy things. Part of it is the system is trying to figure out, it's trying to make a good prediction of what, based on knowing about you, you actually meant by that very vague term. The result is that the data is used to start personalizing the search queries. The result is that you search for "cats," you get the Broadway show because we all know you love Broadway; I, who have clearly watched way too many cat videos, I'm getting lots of fuzzy animals, and that feels all nice and fine.

But what happens when I decide to figure out about, say, a political candidate? I want to search for the current mayoral candidates in my home city. What is the information that I should receive? I have a clear history of watching a particular segment of news. Let's say I regularly watch Fox News. Should I receive the Fox News link to the information about that candidate as the first thing? Or should I receive, for example, a New York Times response to it? The challenge with those is those are two different social views on a political candidate.

What Google is trying to do for its bottom line is to try to give you the information it believes you want the most. That is because it makes certain that you come back and are a return customer. It fulfills your goals, so you are more likely to spend time in its services and therefore click on its advertisements, etc.
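[Illustrative sketch, not part of the conversation.] The personalization described above can be pictured as re-ranking the same results differently for each user based on what that user clicked before. This is a deliberately minimal toy and makes no claim about how any real search engine works; the function name, the topic labels, and the click counts are all hypothetical.

```python
def personalize(results, click_history):
    """Order (title, topic) results by the user's past clicks per topic.

    `click_history` maps a topic to how many times this user clicked
    on it. Results on topics the user never clicked sink to the bottom.
    """
    total = sum(click_history.values()) or 1

    def affinity(result):
        _title, topic = result
        return click_history.get(topic, 0) / total

    # sorted() is stable, so results with equal affinity keep their order.
    return sorted(results, key=affinity, reverse=True)


# The same query, "cats", returns the same two candidate results...
results = [("Cats: the Broadway show", "theater"),
           ("Fuzzy cat videos", "pets")]

# ...but two users with different histories see different top results.
theater_fan = {"theater": 9, "pets": 1}
cat_video_fan = {"pets": 20}
print(personalize(results, theater_fan)[0][0])
print(personalize(results, cat_video_fan)[0][0])
```

Swap the topics for news outlets and the same few lines reproduce the mayoral-candidate problem: the ranking optimizes for what each user already clicks, not for a shared view of the candidate.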

STEPHANIE SY: This goes into that whole idea of confirmation bias, that what people want in general is for their views to be confirmed.

DANAH BOYD: What they want is to feel like they have control over the information that they're receiving. So the result is that the combination of their perception that they have control with their perception that they're getting what they want is what makes them commit to that particular technology.

This is the funny thing. People actually want to be given the information that confirms their worldview. They want the things that actually make them feel comfortable. It's hard work to deal with things that are contradictory; it's hard work to tease out information. People generally want things that are much more simple.

STEPHANIE SY: That's always been the case. There have always been, for example, traditional news media and newspapers that have tended to lean one way politically or the other, and somebody who is more left-leaning buys that newspaper. How has the Internet magnified that, if at all, or is the Internet just a reflection of what we've been doing all along as a society, which is getting the information that confirms our views?

DANAH BOYD: I'm a big believer that the Internet mirrors and magnifies the good, bad, and ugly. What that means is that it is the magnification that we're really concerned about.

For example, you've always gotten opinions and perspectives on political issues or social issues from your friends and peers. You turn to a site like Facebook, and you have all of your friends and peers. You want to see all of the content, you want to see their links, you want to see their baby pictures, you want to see the information that they give you, and Facebook wants to serve that up to you because you're really happy.

You have a crazy uncle. Your crazy uncle's views on the world are totally contradictory to yours, and you stomach your way through a Thanksgiving dinner every year, just like bracing yourself for it. You know that you have to at least pay attention politely so that your mother is not upset with you.

But what happens on Facebook is that you can mute that crazy uncle. Not only can you forcibly mute that crazy uncle, but more likely than not, you're not going to click on the stuff that the crazy uncle has posted anyway, so a lot of the algorithm is going to sit there and say, "It doesn't look like you want to hear the stuff from your crazy uncle." And in reality, you don't.

STEPHANIE SY: The other thing is, the crazy uncle may have a bunch of other crazies following him, and his message is then magnified over what might not be crazy.

DANAH BOYD: But again, it all depends on which place. In your worldview, you most likely have done everything possible to make him and his network disappear.

STEPHANIE SY: Even if the crazy uncle is full of facts.

DANAH BOYD: Even if the crazy uncle is full of facts.

STEPHANIE SY: What impact do you see this having on democracy, on open societies where all of these platforms—the last statistic I read is more than 60 percent of people now get their news through social media?

DANAH BOYD: We have the technologies today to self-segregate more than ever before. That means that we believe we're making informed choices, and our choices are allowing us to separate from people who view the world differently. That happens at every level.

That is certainly the case with a site like Facebook, but it is also the politics of something like school choice. You think that you have tons of data and information to make informed choices as a parent, and you will self-segregate. You will pull yourself away from people who are different than you. You believe yourself to be a part of a diverse environment, but yet the data consistently shows that the public as a whole self-segregates en masse. Nowhere is that more clear than New York City, which has a longstanding history of school choice that has actually enabled a much more segregated schooling environment than anywhere else in the country.

STEPHANIE SY: All of these technologies and platforms were supposed to make us more connected.

DANAH BOYD: That was the dream, right?

STEPHANIE SY: Are you feeling a sense of panic or urgency about the need for these social media companies and these technology companies to really confront these issues head-on—and are they?

DANAH BOYD: I think more I'm feeling a level of sadness about—those technologies were idyllic. We dreamed about what would happen because we imagine the best in people. People have used those technologies not just to do the really beautiful things, but to do some really devastating things as well.

It's that challenge that we're struggling with today. What does it mean, even at that light level of using it to self-segregate? This happens all over the place. There is workplace data on this. Mahzarin Banaji has beautiful work on how not only do we—so we know that more diverse populations in a workforce are more successful, but they perceive themselves to be less successful, and as a result all sorts of cultures of performance reviews result in more diverse teams actually having lower ratings, and they perceive themselves to be less happy. So when we have cultures where we value happiness in the workforce or perception of what it means to work together, we actually see the rise of homogeneous populations.

What's challenging for me is that as a society we're so obsessed with us individually, our choices, our opportunities, what will let us feel good, that we're not able to think holistically about what is healthy for society. That is a challenge at every level. We live in an individualistic society, and even though we can use technology to connect with people, we use it to magnify our relationships with people that we like, that we're interested in, who share our values.

STEPHANIE SY: There is the magnification part, and I also want to talk about the manipulation part.

In this past election, American intelligence believes that there was intervention by a foreign power, specifically by Russia. There is a sense that there was a manipulation of social media and other search platforms. The stakes are high in all the ways you describe them, but even to the point that on a geopolitical scale that's how high the stakes are. Was that a wake-up call?

DANAH BOYD: I think it's become a wake-up call for many. I think it has a long history.

Let's just take the core data architecture that is built into our Constitution—the census. The census is what allows us every 10 years to count the population in the United States and then to make decisions about how we reapportion the population and how we distribute a lot of resources.

Since 1790 when we started actually doing this, people have manipulated that data source. They've manipulated it for all sorts of political gains, they've manipulated the outputs of it for all sorts of gerrymandering, they've tried to mess with it. Voting records? No different. We have a long history of trying to mess with voter registration.

That manipulation is not just by external actors; there is also manipulation within our own state. Nowhere is that clearer than the history of Jim Crow and what we have done around a huge amount of racism in the United States.

Here we are walking into a 2016 election with a long history of every data type being messed with for economic gain, for political ideology, for fun and games, for foreign adversarial attacks. Of course people tried to mess with this election, they always have. The question is, what was different about this one, and how did it play out?

STEPHANIE SY: Okay. What was different?

DANAH BOYD: For me, what I saw again was that we started to see technologies be part of the equation, and they started being part of the equation on multiple fronts. On one hand, there was the moment of using technology to manipulate the media, and that perhaps is the one that is often most challenging.

STEPHANIE SY: How was the media manipulated?

DANAH BOYD: Any journalist knows that you get phone calls trying to get you to sell their product effectively or to tell their story or their opinion from the White House or whatever variation of it. Journalists have long dealt with very powerful actors trying to manipulate them directly. What they are less familiar with is a world of things that look organic designed to manipulate them.

Let's talk about some concrete examples. When you have decentralized populations who are running campaigns to get content onto Twitter to make it look natural, to produce sock puppets—basically fake accounts on Twitter—to then write out to you as a journalist and be like, "Hey, you know, what's going on with this Pizzagate thing?" And all of a sudden, you as a journalist are like, "What is happening? Somebody in the public has given me a tip. I need to pay attention."

Except it wasn't just somebody in the public; it's somebody who is intending to get a message to you very much designed to send you down a particular track. That's when we started to see massive coordinated efforts. These efforts had been happening for social media marketing for the better part of a decade, but we started to see it really turn political.

The interesting thing is the political coordination of it—at least that I got to witness—was, first, not foreign actors, it was people who were messing with systems. I watched this pattern with young people for the better part of 10 years.

STEPHANIE SY: So it was trolls and people who were just having fun?

DANAH BOYD: It started out that way. Years ago there was nothing funnier than to get Oprah Winfrey to say inappropriate things on TV. It was great. I watched teenagers build these skills in order to get Oprah to say something ludicrous. And they learned how to do this. That's a skill that is interesting when you start to think of how it can be mobilized.

Then we had a situation about four or five years ago where we had a lot of very misogynistic practices happening through technology. New techniques, things like "doxing," the idea of finding somebody's full information so that you can actually cause them harm. An example of causing them harm would be something like "swatting," which is the idea that I would call 911 and say that there's a bomb in your house. The result is that the police would send out a SWAT (Special Weapons and Tactics) team—swatting—to your house, cordon it off, looking for the bomb. But it was a hoax, it was not actually real.

These were things that were done to start attacking a group of women in a whole set of phenomena known as "Gamergate." These were moments when these same networks started to take a more problematic turn. They started actually doing things that did a lot more harm to people.

These are the cornerstones of a lot of groups who began then trying to mess with journalists for the election. In the beginning, it was pure spectacle. It was hysterical to watch during the Republican primaries this weird candidate, who for all intents and purposes was a reality TV show star, be such a fun game to mess with because you get the journalists to obsess over him. And the answer is, yes, you can get journalists to obsess over something.

STEPHANIE SY: Do you think the obsession with Donald Trump started with these people on the Internet that were just trying to get journalists to take this candidate seriously?

DANAH BOYD: Oh, no. I think that they scaled it. I think these were dynamics that we had seen for a whole variety of reasons.

STEPHANIE SY: They scaled it, and then a foreign government hired them?

DANAH BOYD: No. Much of what we've seen in these crazy environments has not actually been coordinated at that level.

STEPHANIE SY: So it happened organically?

DANAH BOYD: There are versions of it that did. I think what we don't know at this point is what are some of the other formal versions that happened. We've seen the rise of white supremacists and white nationalists who started coordinating with these groups last summer.

STEPHANIE SY: It feels scary to hear you talk about this because it feels like we have surrendered our control entirely to these anonymous people who have figured out how to utilize these technologies to manipulate societies, governments, democracy, voters, journalists, every aspect of society that we could talk about that is dependent now on social media and online technologies.

DANAH BOYD: But that's been true of every technology throughout history.

STEPHANIE SY: Has it?

DANAH BOYD: Yes. That was the story of film. Look at the story of film and propaganda and the anxieties in the 1930s that we had because we thought it was a fascist media.

We've had these turns and we've had these moments where we had wake-up calls. What we're in the middle of right now is a serious wake-up call. And the question is what we're going to build in response to it.

Also, are we going to be able to address some of the root problems that are actually made visible during these moments, root problems of serious racism? That is not new in this country, but for so many people the Obama years meant, "Oh, we're past that." It's like, no. We're not even close to being past that.

Or these moments where we actually have to deal with destabilized identities. We have a large number of people in this country—especially young people—who don't feel secure in who they are or where they're going. They are so ripe for radicalization, and that is extremely scary. We, again, have seen this throughout history.

How do we get ahead of that and say: "Whoa. It's not just about who is engaged currently in horrible, nefarious, racist acts, but also who has the potential to be, where we have a moment we can actually turn them." I think that's where we should be starting to be responsible about our actions. When we think about the morality of these technologies, it's not just about thinking about the technologies, but the people as they're interfacing with them.

STEPHANIE SY: I agree that we can point to different technologies throughout time, even dating back to the printing press, as being sort of periods of, I think you've called it "moral panic" in your writings. But that brings me to artificial intelligence and the new dimension and the new risks and worries that we're hearing about with AI.

First of all, give me your sixth-grader definition of AI, and then let's talk about how that maybe changes the game a little bit.

DANAH BOYD: I think that what AI currently means is not the technical definition. It's actually about a set of business and social processes where we're going to take large quantities of information, and we're going to use it to train a decision-making algorithm to then produce results that we then go and use in different ways.

STEPHANIE SY: Okay. And eventually that machine will be trained to in some ways think on its own, make decisions based on huge amounts of data, machine learning. AI is sort of that next level.

DANAH BOYD: It's not "think on their own" in the way that we as humans think about thinking. It's about going beyond procedural decision-making to basically training an algorithm to design better algorithms for the broader system. But the values are still there the whole way through.

The idea that the machines will suddenly wake up and start thinking—that is not the case. It's more that they will no longer just do exactly what they're told, they'll be designed to iterate themselves.

STEPHANIE SY: But doesn't that definition surrender part of a human being's ability to control that machine?

DANAH BOYD: Part of why we have always designed machines is to scale up our capacities. I can count pretty high, but a machine is going to be able to count a lot higher and a lot faster. I can certainly divide perfectly fine, but a machine is going to be able to divide a lot faster. Those are those moments of scale.

What you want is for technologies to be designed in ways that actually allow us to do things for which we simply don't have the capacity. Think about something in a medical realm—detection of cancer. We have to use tools to detect cancer. We have used a ton of tools throughout the history of medicine. We have the ability to use more and more sophisticated tools based on data, artificial intelligence systems, to be able to detect cancer faster, and over the next 10 years we're going to see phenomenal advancements with this. Those are the moments where I get very excited because that is leveling up a capacity that we don't have.

It's also about pattern matching in other contexts. I'll give you one. I'm on the board of Crisis Text Line, which is this amazing service where we counsel young people—and adults, but primarily young people—through text messaging with trained counselors. We use a lot of technologies to augment the capabilities of those counselors. A counselor may have had a hundred sessions, they have experienced those sessions, and they use their past knowledge to decide then how to interact with whoever is coming in their text message stream.

But what does it mean to use technology for that counselor to learn from the best practices of thousands of counselors and for the technology to sit in a relationship to her and say: "Guess what? I've seen this pattern before. Maybe you want to ask if this is a disordered eating issue." And that actually is that augmentation.

STEPHANIE SY: That's terrific, and obviously there are a lot of positive uses of AI.

Let's talk about in the context of our previous conversations, again, that idea that every time there is a new technology, society must reckon with how it expresses its values, and whether you feel like artificial intelligence presents yet another challenge to what we've already been talking about here in the deployment of algorithms and machine learning.

DANAH BOYD: I think it presents a challenge for the same reasons that I'm talking about it in these positive dimensions. Just because it can scale positive doesn't mean it will only scale positive. It will also scale negative. How do we grapple with that?

Again, I like history because it's a grounding force. Watching the history of scientists around nuclear energy is a really good reminder of this. They saw the potential for a much more environmentally friendly ability to achieve energy at a significant level. Of course, we also know where that technology got deployed in much more horrifying ways.

STEPHANIE SY: But we also assign our national governance structures, so that came in. I love this example because it leads me to the discussion we should have about solutions and what your research center is exploring as far as—when you talk about nuclear, you talk about the Non-Proliferation Treaty and all of the structures that came around it that then reflected humanity's values and its way to contain this technology.

DANAH BOYD: That's the moment we're in. We're seeing these technologies rolling out, we're seeing them scale both good and ugly, and now we have to actually grapple with how we deal with it. It's not going to be about removing, for example, the root, the uraniums, because that doesn't make sense for the AI system. It's going to be a lot of understanding how to assess the data that it's using, to assess the use cases of where it's acceptable or not acceptable, to see how warfare is going to be playing out using these systems.

We are going to be running into more and more battles as nation-states as well as corporations fight each other over the quality of data. We're going to run into challenges, for example, about accessing energy grids that use artificial intelligence to power the city. We're going to run into those problems. A lot of what we have to grapple with is how do we secure systems, how do we think about not just the vulnerabilities of access but the vulnerabilities of manipulation of data, how do we think about what environments should they be used in or what environments should they not be used in. Those are the hard problems that we're facing.

STEPHANIE SY: They are really hard. And part of that goes back to the issue that who controls the algorithm and who controls the data in a lot of cases is the one that controls how these decisions are made.

DANAH BOYD: Twenty years from now, I don't think that will be so simple.

STEPHANIE SY: Are you optimistic that there can be a regime that can grapple with these issues and hold the different players to account in ways that we saw with nuclear technology? Are you optimistic about that?

DANAH BOYD: Yes and no, and I say that because I think we've done a decent enough job on nuclear. We're still contending with it massively. We've done a lousy job on climate. We have the data on that, and we can't get our political processes together to actually do anything about it. So I can see both possibilities.

I think there are a lot of really good interventions or efforts being made. I think there are a lot of attempts to build out tools to understand what's going on with the system. The question for me is, it's not actually about the technology or about the systems; it's about us as agented actors in the society, and what will it take for us to mobilize to do the hard political work?

It's not clear to me. I can see us getting there, but I would have thought we would be a lot further on climate today than we are. That's the challenge.

STEPHANIE SY: Just one personal question before I let you go.

Given all of your research and everything you know about how our data is used—I just want to bring this down to the level where all of our data is online now—how do you manage your relationship with technology?

DANAH BOYD: A lot of what I do personally is thinking about why I'm using something and what I'm using it for and what are the decisions I can make to adjust. I'm a geek. I still run my own email server. I don't use anybody else's services.

STEPHANIE SY: What about your news and information? Because the way you make it sound, I feel like I can't trust anything that pops up on my feed or on my Google search without thinking, They think they know me.

DANAH BOYD: Part of it is the fun that I have in resisting it. But I have the luxury of being able to do that. I'm that person who reads news from across the spectrum and tries to triangulate. But I recognize that that's a privilege. That's a privilege of time, that's a privilege of resources. That's where I struggle, because I actually think that I'm in a much better place—I know that there are still limitations to that, but I am in a much better place.

My challenge is how to make certain that other people don't necessarily need to have that much time to be able to invest it in these activities.

STEPHANIE SY: Part of it is just the education that is coming out of this interview, that people maybe don't understand how their news feed populates, and by virtue of understanding it maybe it will also cause them to consume news differently.

DANAH BOYD: What do you do to purposefully decide to get information that contradicts your views? Can you stomach that? Because this is the interesting challenge: Most people, when they consume news that contradicts their views, double down on their own worldview. The hardest thing to do is to use information to be empathetic to somebody else, to truly understand. If we actually want to solve a lot of these challenges, we need to actually introduce empathy into this whole process.

STEPHANIE SY: Do you feel that democracy is at stake with the way these platforms are shaping how news is distributed and consumed?

DANAH BOYD: I think democracy is at stake right now, but I don't think it is simply about these technologies. I actually am far more concerned about how the existential crisis of what journalism has been going through for the last 10 years has destabilized journalism and made it really challenging for all sorts of incentives. News media has been financialized. That's causing massive challenges at every level. And you know this; it's local news challenges, etc. That content then feeds through social media and gets magnified. That makes it all the worse.

But I think the biggest challenge to democracy—which again, we've seen throughout history—is polarization. There is no doubt that mainstream news media and social media are magnifying that polarization. But it's not just them, it's our political processes, it's our instability economically, it's our financialization.

Part of it for me is to not see any technology as the single silver bullet. If I make Facebook go away tomorrow, we won't have solved any of these problems. These problems are much more endemic. Facebook is one way to see it, but it is only one piece of that puzzle. What it does is it scales the other pieces, and that scaling is where we're really forced to contend with it.

I would argue that is our political state right now. Like I said, racism isn't new in this country, but, boy, has it scaled in the last six months in terms of its visibility and the pain that it is causing so many people. The question is, do we have the political and social will to respond to that or not?

STEPHANIE SY: Danah boyd, thank you so much. Fascinating insights.

DANAH BOYD: Thank you.
