Dear Colleagues,
When I was young, the go-to source for important information was a reference book, like the Encyclopaedia Britannica. It kept its secrets about who I was, what I read, which sections got attention, and which didn’t. Those who watch over reference books, the librarians, are the custodians of the human knowledge embedded in the materials in their care. They have long been admonished to maintain an ethic of “facilitating, not monitoring, access to information”. Indeed, in statements reflecting myriad court cases over the years, the right to privacy in your use of library resources has been affirmed again and again. Isn’t it a bit alarming, therefore, that without much fanfare, the protection of our right to privately access information has been completely upended? I take a look at the issues in this month’s newsletter.
We’re collectively ignoring the wisdom of the Library Bill of Rights
Today, our most personal information has value – by some estimates around $240/year for advertising value alone. I find it concerning that many people are completely unaware of how their personal information is used, or how much of it lands in the files of third parties.
Consumers do, however, seem to have a sneaking feeling that all is not well in this area. In a 2016 report on Americans’ attitudes toward data privacy, the TRUSTe/National Cyber Security Alliance (NCSA) Consumer Privacy Index found that more people were worried about data privacy than about losing their primary source of income. And yet, only a fraction of respondents understood how their data are being captured, aggregated and compiled into individual dossiers.
It has been in the interest of these data collectors to keep people in the dark, legally covering their behavior with terms of service that run to thousands of words. Today, data brokers can cavalierly buy and sell your most personal information (Do you smoke? Are you gay? Do you like the Fifty Shades of Grey series?) without oversight, and most of us don’t even know it. With the sight of Mark Zuckerberg being hauled before Congress, the introduction of the General Data Protection Regulation (GDPR) in Europe, and ever more examples of personal data in the wrong hands producing horrible outcomes, are we on the brink of an inflection point in how the use of personal information is governed? Let’s consider the evidence.
What has changed in the buying and selling of information
Part 1: The ease of combining databases
The cost of cross-matching multiple databases has plummeted, creating new possibilities to target and link formerly separate bits of information. In the pre-digital era, information about where you lived, how you drove, what diseases you had, what run-ins with the government you may have had, what you liked to spend money on and your political leanings was, to some extent, available to third parties with an interest in finding out. But gathering it was effortful and expensive, and the resulting picture was often shallow. Today, data brokers can match multiple databases against each other cheaply and at lightning speed. Further, as paper records are digitized, the cost of accessing them drops dramatically.
Tim Sparapani, an expert on the data broker industry, says that most of us would be stunned by how much personally identifiable information about our most intimate behaviors is freely available to anyone with the budget to buy it. And you don’t have to reach Facebook’s scale to benefit from people volunteering their personal information. A 60 Minutes investigation found that data brokers are perfectly happy to set up websites such as GoodParentingToday.com, whose “community” members volunteer specific details about their lives (Are you expecting? Do you have adequate life insurance? Do you have pets?). In its privacy policy (which is pretty understandable to the ordinary person, as these texts go), the site states:
We may share your personal information with other companies in order to provide users with selected retail and other types of offers. In addition to marketing and advertising, these companies may use your data for services that include but are not limited to, verification, validation, reference, lookup, risk mitigation and data enhancement.
What Take5 Solutions (the company behind this site and a whole bunch of others) doesn’t tell you is that once you’ve told it something about yourself, it can cross-match the clues you’ve given against other massive databases that fill out the picture in much more granular detail. Scarily, there is no particular oversight over what these organizations store or whom they are prepared to provide it to.
Individual databases, on their own, may contain innocuous information. But harness the power to combine them, and entirely unanticipated outcomes can occur. And we’ve known this for a long time. For instance, in 2009, researchers were able to predict individuals’ Social Security numbers by combining data from publicly available sources – without a data breach and without accessing any privileged information.
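To make the mechanics concrete, here is a deliberately tiny Python sketch (with invented names and records, not any broker’s actual data or tools) of how two individually innocuous files can be linked on shared quasi-identifiers such as ZIP code and birth date:

```python
# Toy illustration of record linkage on quasi-identifiers.
# All names and records below are invented for illustration only.

# Dataset A: an identified but innocuous marketing list.
marketing_list = [
    {"name": "J. Smith", "zip": "55401", "birth_date": "1984-03-12"},
    {"name": "R. Jones", "zip": "55401", "birth_date": "1990-07-02"},
]

# Dataset B: an "anonymized" survey with no names but sensitive answers.
survey = [
    {"zip": "55401", "birth_date": "1984-03-12", "smoker": True},
    {"zip": "55401", "birth_date": "1990-07-02", "smoker": False},
]

# Index the survey by the quasi-identifiers the two files share.
by_quasi_id = {(r["zip"], r["birth_date"]): r for r in survey}

# Re-identify: attach a name to each supposedly anonymous record.
for person in marketing_list:
    match = by_quasi_id.get((person["zip"], person["birth_date"]))
    if match:
        print(f'{person["name"]} -> smoker: {match["smoker"]}')
```

With two attributes and a handful of rows this looks trivial; run the same join across hundreds of millions of rows and dozens of attributes, and the “anonymous” file stops being anonymous.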
What has changed in the buying and selling of information
Part 2: You are traveling amidst a digital throng that knows everything you’re doing online
When logging on to a website, most of us think we’re just, well, logging on to that website. What we don’t realize is that third parties with whom we may have no relationship at all are watching us, tracking our movements and capturing that information in their databases. They know how we moved through the site, what we clicked on, what we read and what we skipped over – and, combined with what is already in our dossiers, this creates a much richer view of who we are.
And, while this is pretty much an open secret, the average person seems to have no idea that telling any website anything may send that information far beyond the site itself.
Moreover, just getting on the internet creates a digital footprint that tells interested parties about your online behavior. While users are dimly aware that websites track them using cookies, many do not realize that there are also so-called “third-party cookies”. For instance, if you visit a news site that has a Facebook “like” button on it, a cookie that Facebook can access is placed on your computer. So even if you have never visited Facebook or don’t have an account, the social network still receives information about what you’ve been doing on the web. So much for all those folks who say, “I don’t have a Facebook account, why should I worry about the network having my data?”
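For the technically curious, here is a highly simplified Python simulation of the mechanism. The sites, the tracker and the cookie handling are all hypothetical; real third-party tracking works through HTTP cookies and referrer headers attached to requests for the embedded widget, not a Python dictionary.

```python
# Simplified simulation of third-party cookie tracking across sites.
import uuid

class Tracker:
    """Plays the role of a widget provider embedded on many unrelated sites."""
    def __init__(self):
        self.profiles = {}  # cookie id -> list of pages where the widget loaded

    def serve_widget(self, cookie, referring_page):
        # First request from this browser: set a persistent identifier.
        if cookie is None:
            cookie = str(uuid.uuid4())
            self.profiles[cookie] = []
        # Every later widget load reports which page embedded it.
        self.profiles[cookie].append(referring_page)
        return cookie

tracker = Tracker()
browser_cookie = None  # the browser keeps the tracker's cookie between sites

# The user visits three unrelated sites, each embedding the same widget.
for page in ["news-site.example/politics",
             "health-forum.example/anxiety",
             "shopping.example/cribs"]:
    browser_cookie = tracker.serve_widget(browser_cookie, page)

# The tracker now holds a cross-site browsing history for this one user.
print(tracker.profiles[browser_cookie])
```

The user never visits the tracker’s own site, yet the tracker ends up holding a cross-site browsing history keyed to a single persistent identifier.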
What has changed in the buying and selling of information
Part 3: Your environment is watching you
Right. So, most of us know that the Internet is a bit of a Wild West when it comes to our privacy, and we can only hope that our information doesn’t leak out to the wrong people. But, what about when the rest of our world starts to take on Internet-like tracking?
The New York Times recently reported that in millions of households, so-called smart televisions are set up to track what you’re watching and, not only that, to link that information to all the other devices (such as a smartphone) connected to the same network. One company involved in this activity is Samba TV. As the Times reports:
Once enabled, Samba TV can track nearly everything that appears on the TV on a second-by-second basis, essentially reading pixels to identify network shows and ads, as well as programs on Netflix and HBO and even video games played on the TV. Samba TV has even offered advertisers the ability to base their targeting on whether people watch conservative or liberal media outlets and which party’s presidential debate they watched.
This is a holy grail for advertisers, because they can immediately gauge the effectiveness of a television ad based on subsequent visits to their sites by the people being tracked. As Christine DiLandro, then a marketing director at Citi, told an industry event, the ability to connect people’s real-time viewing behavior with digital activity was “a little magical.” Companies can pay Samba and its ilk to show their ads after a competitor’s have been viewed, after their own ads have been viewed or after a particular show has been seen. And this is all perfectly legal, as long as the companies can claim to have provided consumers with information that accurately describes the tracking – even if it is buried in a thousand-page-long acceptance policy.
Elsewhere in our homes, for the 47.3 million people with access to a smart speaker, it’s a guarantee that the device is listening. And, in incident after incident, it is very clear that the speakers are collecting and capturing data in the background of our lives. In one case, an Amazon Echo device recorded a Portland couple’s private conversation and sent snippets of the recording to an acquaintance in Seattle, hundreds of miles away. More and more for the dossier.
Once data is out there, it’s anyone’s guess how it might be used
All of this means that the constraints that used to keep information relatively bounded have changed massively. The problem is that institutions, both commercial and governmental, haven’t kept up. The abuses and mistakes organizations make with personal data are alarming to privacy advocates.
Employees at the ride-hailing company Uber, for instance, used its “God View” feature to spy on the activity of everyone from Beyoncé to critical journalists. In 2012, an audit discovered that fully half of Minnesota’s law enforcement personnel had queried the state’s driver database for questionable reasons, leading to millions of dollars in settlements paid to people whose records were illegally viewed. Police officers in other jurisdictions have been found to use official databases for queries that have nothing to do with their jobs.
And, of course, there are Facebook’s scandals: the company has now been revealed to have potentially played a major role in elections throughout the world by essentially handing over very detailed personal information to third-party developers. The unprecedented amount of data made available for commercial purposes has begun to stagger politicians. As Molly Scott Cato, a British MEP for the Green Party, observed, “I watched [Zuckerberg] walk in and he looked pretty scared,” she said. “He’s totally out of his depth — he talks about setting Facebook up in college with this homey story and I’m, like, ‘Christ, this guy has the fate of European democracy in his hands and he doesn’t know what to do.’”
For good reason: a recent IBM survey found that 78% of respondents said keeping their data private was “extremely important”, yet only 20% said they “completely trust” organizations to keep their information safe.
David vs. Goliath: Privatizing Profits while Socializing Costs
At the core of all these privacy issues is a major imbalance in economic interests. In what we may come to realize is a Faustian bargain, consumers have accepted that they can get services for free by agreeing to hand over personal information. And, in many cases, the companies profiting from the use of that data are doing so entirely legally.
In the case of Samba TV’s viewing-monitoring system, David Kitchen, a software engineer who sounded an alarm about the company, put it this way: “The thing that really struck me was this seems like quite an enormous ask for what seems like a silly, trivial feature. You appear to opt into a discovery-recommendation service, but what you’re really opting into is pervasive monitoring on your TV.”
In another major economic and power imbalance, rules restricting the ability of internet service providers (ISPs) to sell your browsing history were effectively eliminated. As the Electronic Frontier Foundation points out, there are at least five creepy things ISPs can do with your data under weaker regulations, because they have done them before. The first is to sell your information to marketers. The second is to “hijack” your searches by diverting you to a website the ISP was paid to promote. The third is really invasive: actually inspecting your traffic and using its contents for sundry purposes. The fourth is pre-installing monitoring software on your phone that logs which apps you use and which websites you visit and sends that information back to your ISP. And perhaps the scariest is that some companies can inject ‘supercookies’ into your traffic that you don’t know about, can’t delete and from which there is no escape.
In a highly critical article, Cory Doctorow makes the lopsided economics of personal information the centerpiece of his argument that there is something fundamentally problematic about the way data collection and distribution are governed. The small gains made by advertisers are not commensurate with the potential risks to society; as he puts it, “amassing huge dossiers on everyone who used the internet could create real problems for all of society that would dwarf the minute gains these dossiers would realize for advertisers.”
The dilemma is that situations such as these are almost impossible for markets to resolve by themselves. Because the costs are diffuse and the gains are private, actors have a powerful incentive to benefit themselves, even if the larger society suffers. The unregulated use of personal data is thus a classic ‘tragedy of the commons’ problem, in which the gains to some actors are substantial but produce widespread costs or even the ultimate destruction of the resource being exploited.
Privacy advocates have been warning about the dangers of unfettered access to, and the ability to manipulate, information since the earliest days of the World Wide Web. Indeed, in 1996, no less a luminary than Tim Berners-Lee wondered about the effects of the Net:
Will it enable a true democracy by informing the voting public of the realities behind state decisions, or in practice will it harbor ghettos of bigotry where emotional intensity rather than truth gains the readership? It is for us to decide, but it is not trivial to assess the impact of simple engineering decisions on the answers to such questions.
Unfortunately, it seems the second scenario has unfolded. As Doctorow points out, we need to distinguish between personal information being used for automated persuasion and personal information being used for automated targeting. Persuasion – trying to change someone’s mind about something they believe – is very difficult. Targeting, however, can be far more insidious, permitting a ‘long tail’ of people with idiosyncratic characteristics to find one another, and permitting identification of individuals by governments and other actors whose motivations can be suspect. As Doctorow says,
Gathering huge dossiers on everyone in the world is scary in and of itself: in Cambodia, the autocratic government uses Facebook to identify dissidents and subject them to arrest and torture; the US Customs and Border Protection service is using social media to find visitors to the US guilty by association, blocking them from entering the country based on their friends, affiliations and interests. Then there are the identity thieves, blackmailers, and con artists who use credit bureau data, leaked user data, and social media to ruin peoples’ lives. Finally, there are the hackers who supercharge their “social engineering” attacks by harvesting leaked personal information in order to effect convincing impersonations that trick their targets into revealing information that lets them break into sensitive networks.
Facebook got itself into hot water in 2016 (one of a long string of such instances) when it allowed advertisers to exclude certain ethnic groups from receiving an ad – in this case, users it labeled African American, Asian American or Latino. Shocked observers pointed out that, for housing ads, this practice runs afoul of the Fair Housing Act, and that such practices are “predatory” because they exclude often-vulnerable communities from access to information and opportunity.
Scary stuff indeed.
Taming the Wild West of Personal Information
Alessandro Acquisti, in a 2012 essay, laid out the goal: “a future in which privacy by design and by default minimally interfere with the benefits that can be extracted from the analysis of individuals’ data.” He goes on to observe that “Achieving that goal will require more than self-regulation and technological ingenuity, however. It will require direct policy intervention, and will rely on our society’s collective call for a future in which the balance of power between data subjects and data holders is not so dramatically skewed, as current technological and economic trends are suggesting it may be.”
Policymakers are starting to show a willingness, if often a reluctant one, to tackle privacy issues. The General Data Protection Regulation in Europe began to legally specify individuals’ rights to some control over their own information, but with an important loophole – if you want to use a service, you almost always have to agree to share your data.
Growing public concern over privacy in the United States crystallized into a ballot initiative in California that garnered over 600,000 signatures. The initiative would have been even stricter than the privacy law the California legislature rushed to pass on the condition that the Californians for Consumer Privacy withdraw their ballot measure. Predictably, the tech companies benefiting from unfettered use of personal data fought hard against the ballot measure and then, with some grumbling, agreed to support the legislation. As what happens in California is often a bellwether, it is highly likely that the California model will be adopted by other states.
Privacy rights are beginning to attract formidable support in other quarters. Marc Benioff, the founder and CEO of Salesforce, has publicly advocated for a national privacy law that would assign property rights in individuals’ data to those individuals. He compares the manner in which Facebook profits from the disclosure of private information to the marketing of cigarettes – an addictive product that many use, but which isn’t good for them.
In an even more aggressive development, a new company, Hu-Manity (full disclosure – I am on its Board of Advisors), is dedicated to creating a 31st human right, under which “every human being owns their inherent data forever, and said inherent data is categorized as property.” The company proposes an economic model, based on the blockchain, that could put power back in the hands of data owners by intermediating companies’ ability to aggregate data. Characterizing data as property would be a huge advance. Remember that capitalism depends on property rights – once personal data is defined as property, a wealth of precedent about how it can be used can be developed. Property ownership is widely recognized as one mechanism for avoiding the worst tragedy-of-the-commons problems. In addition, allowing individuals to benefit from the monetization of their own data holds the promise of letting them generate income that does not depend on their skills or the whims of employers – potentially a universal basic income that could provide an economic cushion.
I am not one to predict the future. But the signs are clear that the current free-for-all in the market for personal information is likely to change in some non-trivial ways. Just as protests over the introduction of genetically modified food in Europe led the companies offering it to pull back, protests over the use of personal information are likely to lead companies such as Facebook to retreat. Indeed, 82 percent of people who learned of their new rights under the GDPR said they intended to exercise more control over their data. When enough consumers are unpleasantly surprised by the existing state of things, pressure builds to the point at which there is a backlash.
Wouldn’t it be interesting if we had a back-to-the-future moment about privacy and rediscovered the protections around free inquiry that were codified in the Library Bill of Rights? In thinking about who gets to know what inquiries we make, that might not be a bad place to start. In short: “In a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others. Confidentiality exists when a library is in possession of personally identifiable information about users and keeps that information private on their behalf”.