What just happened: ‘Big data’ got personal
What the Cambridge Analytica story and last week’s congressional hearings with Facebook’s CEO are really all about is that people – and not “just” social media users, voters and policymakers – are waking up to the meaning of “big data.”
It’s a big story not only because Facebook has more than 2.2 billion users or because Cambridge Analytica may have helped Donald Trump become president, as mind-bending as both the data point and the possibility are. It’s big because it’s personal. People weren’t going to understand the implications of big data until Facebook was in the story. Although there are uncountable retailers, banks, publishers, campaigns, governments and bad actors benefiting from big data, Facebook brought it home to us because the data there is so visibly us – our own and significant others’ everyday likes and lives, in all our own words, photos and videos, posted by us.
Not that “big data” hasn’t been a lot more than all that for a long time. But with the personal part added to the results of two game-changing votes in the UK and US and the confusing mix of political news, information, misinformation, disinformation and advertising on Facebook that appeared to affect those votes, you get what may well turn out to be a story we’ll tell our grandchildren as well as children.
So here are some talking points for family and classroom conversations about this pivotal moment:
First, what is “big data”? Well, the dictionary definition is: “extremely large data sets that may be analyzed computationally [like with machine learning] to reveal patterns, trends, and associations, especially relating to human behavior and interactions.” Data is just information that comes in all kinds of forms: text, numbers, photos, videos, etc. Even though not all of it needs to stay private, what we’re finding out is, it’s hard to tell how much people and companies can tell about us when the kind of data that’s fine to make public gets blended with other data that’s stored or private. That unknown concerns us, which is why we’re hearing more and more calls for “transparency.” So you can tell from the definition that “big data” is about a whole lot more than lots of information; it’s more about what can be discovered from the data than the data itself. That can be all kinds of things, good and bad, from banks being able to find patterns of fraud to governments stopping infectious diseases from spreading to companies like Cambridge Analytica using people’s information to create and place ads aimed at getting people to vote a certain way.
So is social media big data? It’s only part of it. It’s just the very visible part that regular people like us contribute to. When we post comments, photos and videos, “like” others’ content, click on ads, buy things online, visit other sites, etc. we’re adding all kinds of information (called “psychographic data,” which I’ll explain in a minute) to the databases at social media companies and sometimes elsewhere, whether unethically, criminally or just mistakenly, as happened with Cambridge Analytica, which bought some 87 million people’s data from someone who Facebook says violated its policy. Facebook doesn’t sell data to other companies, it says; the way it makes money is from advertisers who, based on our detailed data in its ad placement system, place their ads on the pages of users who will really like the ads (and maybe buy the thing being advertised). Does that make sense? All that detailed information we share – and the technology I’ll tell you about in a minute – makes it possible for advertising to be more relevant, or more “highly targeted,” than ever before in the history of advertising, which makes it more valuable than ever to advertisers (because it’s more likely to lead to a purchase). Some companies, called data brokers, do sell our data so that the buyers will have even more data on us to help them get even better at placing ads that will make us want to buy stuff.
What else makes up big data? Just about every kind of information we share anywhere – by playing online games, filling out online forms, taking online quizzes, setting up accounts in apps, banking online, shopping online, sending emails, taking out car loans, sharing health information, searching for information, and so on. Sometimes some of that information is in separate databases or data centers, and sometimes big chunks of it get mixed together and sold or hacked into by criminals who want to steal and sell our information. That’s why we hear about “data breaches” in the news, for example at credit bureaus that store all kinds of valuable information about us.
What technology made that possible? A number of tech developments, of course, starting with the Internet and digital technology enabling so much of the world’s information to move off of paper and onto digital devices and then connecting so many of those devices. But what helped “big data” take off from that foundation was a small set of tech developments about 10 years ago: 1) the ability to store almost unlimited information or data on a huge number of computers, connect them all together and search all that data like it was on a single computer, 2) machine learning, which started earlier but really took off when fueled with all that data so that it could detect patterns and “discover” things that couldn’t be “seen” before, and 3) the ability to do all that with all kinds of data, the old demographic kind that advertisers had used for a long time and a new, more random or unstructured kind called “psychographic” data.
So about psychographic data: Up until around the time that social media started to take off, also in the middle of the last decade, advertisers, political campaigns and others were mostly targeting us with the demographic data I mentioned above – information like age, gender, single/married, household income, geographic location, memberships, etc. Psychographic data is more random: for example, whether a person collects things, worries about their appearance, feels family’s important, likes fishing, works out, attends worship services, buys self-help books, etc. It’s the kind we post in social media. According to news reports (including this one at the New Yorker), it’s what a Cambridge University researcher collected through a quiz he created as an app on Facebook and then sold to Cambridge Analytica.
Is that why everybody’s so worried about Cambridge Analytica? Well, there’s more to that part of the story. C.A. is based in the UK, and the British government is investigating what it did with voter data and whether it violated British law. Then Facebook will conduct its own investigation, its CEO Mark Zuckerberg said in the congressional hearings last week. But beyond that, C.A.’s parent company, SCL, has been called a “military contractor” by a US professor named David Carroll, who is suing its Cambridge Analytica subsidiary “to disclose how it came up with the psychographic targeting profile it had on him,” according to Columbia Journalism Review. Carroll says SCL has worked or is working with political campaigns in countries all over the world, using the same technique of blending demographic and psychographic data to see if it can influence election outcomes to benefit its clients.
What do we do about all this? That’s not clear yet. Some companies, such as the company behind Kik Messenger, up in Toronto, are working on new business models (see TechCrunch and Coinbase), because big data is making the old free-content-paid-for-by-advertising model of the network TV era feel threatening. Some people are thinking there needs to be regulation. But of what? If it’s regulation of social media companies then, as I wrote in my last post, before that happens we all, and especially policymakers, need to understand that companies like Facebook, Google and Twitter are now social institutions that need to be accountable to more than just shareholders. They’re not just tech companies, media companies, or even some blend of those (Claire Wardle, a scholar I cited here, called them “a hybrid form of communication”). Interestingly, even Mark Zuckerberg said to lawmakers last week that, though the details are important, he’s not against regulation. But, given the pace of technological change, any new laws will at least need expiration dates. And, as I hope was pretty clear above, this isn’t just about social media or elections or “fake news,” so regulation can’t only focus on those. This is about “big data,” which is about even more than our data privacy. It’s about how we maintain the safety and integrity of our identities, institutions and other things that matter to us in this ever more connected world.
It’s a puzzle, but we’ve got this. We will figure this out. We do need lots of perspectives and skill sets in the conversation – all the stakeholders, including tech and social media companies, which are doing some waking up of their own. New business models are entering the scene. Old-school adversarial and exclusionary approaches will only slow the process down. So will messages that claim technology users, including children, are just technology’s victims. We need to think critically not only about how technology is affecting us but also about claims that it’s hijacking our brains. As attorney Mike Godwin, who was the Electronic Frontier Foundation’s first staff counsel, put it in a recent discussion on Facebook, “My big question for those who believe Facebook has overcome the free will of 2 billion people: How did all of you escape?”
And if you’re talking with young people about all this, please don’t forget that Facebook, Twitter and other social media are also organizing and mobilizing tools as well as platforms for young activists and many others around the world.
- Two columns by Steve Lohr in the New York Times on the origins of the term “big data”: in early 2013 and, in 2012, on “How Big Data Became So Big.” Some insightful quotes in his columns include: “Big Data is a tagline for a process that has the potential to transform everything” (Cornell University computer scientist Jon Kleinberg); “The keepers of big data say they do it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended” (author Erik Larson in 1989); and “What you’re seeing is a marriage of structured databases and novel, less structured materials…. It can be a powerful tool to see far more” (Fred Shapiro, editor of the Yale Book of Quotations).
- “‘It just felt right’: David Carroll on suing Cambridge Analytica” in the Columbia Journalism Review
- “Facebook is not the problem. Lax privacy laws are.” – from the New York Times editorial board
- The New York Times’s Keith Collins and Larry Buchanan illustrate how ad targeting has changed, with more and more psychographic data mixed into the demographic kind
- “To Facebook — and Its Critics: Please Don’t Screw Up Our Internet,” by author and journalism professor Jeff Jarvis
- From the History Repeats Itself (in a Way) Dept.: “Why Mark Zuckerberg should read A Tree Grows in Brooklyn,” by Amy Davidson Sorkin at The New Yorker
- About the role of social media in society and thinking about what’s next, a blog post here at NetFamilyNews.org
- About another historic hearing in Washington, DC, this year – takeaways here at NFN from the formal one held by British MPs at George Washington University in February
[Disclosure: Facebook, Google, Snapchat and other companies have provided some funding to my nonprofit work over the years, including iCanHelpline.org, but I have been writing about youth and connected media since 1997, long before I began advising the industry.]