Michael Zimmer offers an excellent discussion of this week's controversy regarding Facebook's removal of an image of two men kissing. I want to put this up next to the recent article by Mike Ananny in The Atlantic, where he interrogates the possible reasons why, when he went to load the gay social networking app Grindr, Google's App Market 'recommended' an app that tracks sex offenders.

As we begin to unravel how and why content platforms and app stores make curatorial decisions about the content they provide, we are asking the kinds of questions both Zimmer and Ananny ask about these instances. Are we looking at the result of a human intervention or an algorithmic one? (Is it even possible or productive to draw this distinction so clearly?) Was this intentional or accidental? (And is it too simple to equate human judgment with an intentional choice and an algorithmic conclusion with a lack of intention?) Does this judgment, however it was made, adhere to or exceed the site’s stated rules and expectations? (And, implied in that, is a reprehensible judgment acceptable simply because it isn’t hypocritical?) And, perhaps the hardest question, what are the consequences of these decisions, for users and for the contours of public discourse? Does the removal of images of men kissing, while allowing thousands of images of heterosexual kisses to remain, help to marginalize public expressions of gay intimacy? Does the recommendation link between gay social life and sex offenders reinforce an association in some people’s minds about gay men as sexual predators?

I find all of these questions intensely important ones to ask, and am struck by the fact, or at least the perception, that this issue has been more publicly visible as of late. Facebook has faced trouble over the years for how it applies its rules, particularly around nudity: much of that trouble came from the disputed removal of images of women breastfeeding. LiveJournal faced a similar controversy in 2006. Apple has drawn scrutiny and sometimes ire for its recent removals of apps from anti-gay churches, of political satire, and of an app for Wikileaks, but questions about what their review criteria are and when they'll be applied have been raised since the app store first opened.

But perhaps what is trickiest here is to consider both of these examples together. What is the comprehensive way of understanding both kinds of interventions: the removal of content, and the shaping of how the content that does remain in an archive will be found and presented? In my own research I have been focusing on the former: the decisions about and justifications for removing content perceived as objectionable, or disallowing it in the first place. But in some ways, this kind of border patrol, of what does and does not belong in the archive, is the most mundane and familiar of these interventions. We know how to raise questions about what NBC will or will not show, what The New York Times will or will not print. We need to examine these kinds of judgments together with a spectrum of choices these sites and providers are increasingly willing and able to make:

- techniques for dividing the archive into protective categories (age barriers, nation-specific sub-archives)
- mechanisms for displaying or blocking content based on explicitly indicated user preferences
- predictive adjudications on whether to display something, based on aggregate user data (national origin, previously viewed or downloaded content, aggregate judgments based on the preferences of similar users)
- categorization and tagging of content to direct its flow
- search and recommendation mechanisms based on complex algorithmic combinations of aggregated user purchases or activity, semantic categories and meta-information
- value- or activity-based mechanisms for navigating content, such as ‘bestseller’ or ‘most emailed’ lists, offered as objective criteria
- structural mechanisms for the preferred display of content to first-time and to returning users
- choices about ‘featured’ or otherwise prioritized content

All of these are complex combinations of technical design and human judgment (whether in anticipation of problematic content or in the moment of its encounter); all struggle with values, among them the provider's economic priorities and legal obligations, their assessment of the wants and hesitations of their community, and the broader cultural norms they believe or claim they are approximating.
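To make the layering concrete, here is a minimal sketch (in Python) of how a few of the mechanisms listed above might stack on a single piece of content. Everything in it, the item names, the signals, the eighteen-and-over rule, is invented for illustration; it is not drawn from any actual platform's logic.

```python
# Purely illustrative: a toy stack of the curatorial layers described above,
# i.e. removal rules, protective categories, explicit user preferences, and
# aggregate-data ranking. None of this reflects any real platform's logic.

from dataclasses import dataclass, field

@dataclass
class Item:
    title: str
    tags: set = field(default_factory=set)
    flagged: bool = False          # judged (by human or algorithm) to violate the rules
    age_restricted: bool = False   # placed in a protective category

@dataclass
class User:
    age: int
    blocked_tags: set = field(default_factory=set)   # explicitly indicated preferences
    affinity: dict = field(default_factory=dict)     # inferred from aggregate activity, e.g. {"music": 0.9}

def visible(item: Item, user: User) -> bool:
    """Removal, age barriers, and preference-based blocking."""
    if item.flagged:
        return False               # gone from the archive entirely
    if item.age_restricted and user.age < 18:
        return False               # walled off into a protective category
    if item.tags & user.blocked_tags:
        return False               # blocked by the user's own stated preferences
    return True

def rank(items: list, user: User) -> list:
    """Order whatever survives by inferred affinity (aggregate user data)."""
    shown = [i for i in items if visible(i, user)]
    return sorted(shown, key=lambda i: sum(user.affinity.get(t, 0.0) for t in i.tags), reverse=True)

catalog = [
    Item("Concert clip", {"music"}),
    Item("Protest footage", {"politics"}, flagged=True),
    Item("Horror trailer", {"film"}, age_restricted=True),
]
print([i.title for i in rank(catalog, User(age=16, affinity={"music": 0.9}))])
# -> ['Concert clip']  (the other two simply never reach this user)
```

Even in this toy version the point holds: the user sees a clean, seemingly complete set of results, and nothing in the interface marks what was removed, walled off, or quietly demoted.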

These sites, I believe, still have the appearance of neutrality and totality on their side (maybe not Apple). Despite the increasing occurrence of these incidents, most users still experience these sites as open and all-encompassing, and most will not run into an edge in their own surfing, where something is simply disallowed. So the complex curation of these sites, along all of the dimensions I mentioned above, quietly shapes archives that, by and large, still feel unmediated — every video one could imagine, or whatever users want to post. To the degree that this perception persists (and is actively maintained by the providers themselves), it will remain difficult to raise the questions that Zimmer and Ananny and others are trying to raise: questions not just about the fact that these sites are curated, but about the mechanisms by which they are curated, the subtle forces that shape what is available and how it is found, and the different justifications for curating at all. These are what shape the digital cultural landscape, and what subtly reinforce what Mary Gray (scroll down in the comments to Ananny's essay) called the "cultural algorithms," the associations and silences in our culture around controversial viewpoints, images, and ways of life.

The Chronicle of Higher Ed reported this week that a decision was handed down in the copyright case against Turnitin, the plagiarism detection site. (Quickie: Schools subscribe to Turnitin, and teachers require their students to submit their papers to the service before handing them in. Turnitin compares each new paper against its database of existing papers and indicates whether it detects plagiarism. It also adds the new paper to that database, meaning the database grows. Four students sued the parent company, iParadigms, for copyright violation, in that the site makes a copy of their paper, and [in cases where it later detects plagiarism] can occasionally distribute that paper to specific faculty.) Turnitin claimed fair use, arguing that its use is transformative and does not hurt the commercial value of the original. The court agreed.
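For readers curious about the mechanics, here is a toy sketch of the general idea behind a service like this: fingerprint the submitted paper, compare it against the stored corpus, and then keep the paper, so the corpus grows. Turnitin's actual matching algorithm is proprietary, so the shingling-and-overlap approach below is only a generic stand-in, and every function name and threshold is my own invention.

```python
# A generic stand-in for Turnitin-style matching, not the real algorithm:
# break each paper into overlapping word n-grams ("shingles"), measure overlap
# with previously submitted papers, and then store the new paper too.

def shingles(text: str, n: int = 5) -> set:
    """All overlapping n-word sequences in a text, used as its fingerprint."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}

def overlap(a: str, b: str) -> float:
    """Jaccard similarity between two papers' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

database: list[str] = []   # every prior submission; it only ever grows

def check(paper: str, threshold: float = 0.3) -> bool:
    """Flag the paper if it overlaps too much with any prior submission, then keep it."""
    suspicious = any(overlap(paper, prior) > threshold for prior in database)
    database.append(paper)   # this retained copy is what the students sued over
    return suspicious
```

The detail worth noticing is the append inside check(): in this design the comparison only works because the service keeps a copy of every paper, which is exactly the copying the students objected to.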

The Chronicle came to the same conclusion I did when I first heard the news — this is very good news for Google Books. If the AAP lawsuit against Google ever goes to court, Google is going to need to argue that, though they do make single copies of books, they do so not for redistribution or in a way that harms the commercial value of the original, but for a different use. I have argued elsewhere that, though I think Google should be allowed to do this, trying to stretch and pull fair use to cover all of these "indexing" kinds of activities is problematic for fair use. Apparently, this court saw fit to extend fair use to cover this.

I don’t claim that this is original, and I bet I could guess who’s already said something like this, if I had an afternoon to go look at their books/blogs/articles. This is just a thought, walking out of my class today, a way I found I could make sense of something worth making sense of.

The topic this week was whether the classic concerns about media concentration around broadcasting and publishing, i.e. the worry that more and more media outlets are owned by fewer and fewer companies, apply and raise the same implications in new media industries, such as the search business. The point I think I closed with today, though it's only coming clear in my head now, is that the concerns we had for traditional media emerged from the "economic imperative of mass appeal": If your business model depends on helping an advertiser get the same message in front of as many eyes as possible, and the economics are such that it costs a whole lot to make the movie or show that's going to draw them in but is cheap to get that show to a huge audience, then the tendency is to try for a mass audience, make one thing as appealing to as many as possible, and be sure it's something the advertiser won't shy away from. And from that, the risks and abuses that can come from media concentration are of a certain kind: shying away from volatile topics, homogenizing the content, chasing past successes, failing to report on news that might damage your own business or that of your advertisers. (This is not to say that this always or even endemically happens, but that it can, and does.)

On the other hand, in the search industry, the business model is to attempt to give each user what they're looking for, not to give them all the same thing. And advertisers pay to associate themselves with specific terms and pages, not to be everywhere for everyone. So the business logic, and with it the risks that emerge from economic concentration, come not from mass appeal but from the "economic imperative of comprehensiveness". The best search engine will be the one that catalogs the most of the web, or the most of the web that's relevant to the most people, and serves that index up in a way that satisfies users' requests, or seems to. The goal is to give every user to the right advertiser, every advertiser to the right user. And it benefits the search company to find ways to bring users to them and to keep them there, not just by doing search well, but by building themselves into other services so users are channeled back to them. (Google does this by building its search into a browser toolbar, into other websites, into GMail and YouTube and Picasa and Google Maps and Google Books and iPhones and so on…) This is the "googlization of everything" that Siva Vaidhyanathan has been writing about.

And, thus, all the kinds of risks and abuses that have emerged around Google's dominance in the search industry and around concentrated corporate ownership in the new media realm stem from this economic imperative of comprehensiveness. It is not about content control or political timidity, as it can be with traditional media. Instead, it's Google choosing to scan books first and letting copyright owners opt out (rather than asking them all for permission first, which would have been legally safer) — the value of that library will depend in large part on being able to say that it's "everything," or close to it. It's the temptation to mine GMail messages and search queries and Deja News posts as consumer data to better fit ads to users and search terms, because Google needs to know as much as it can about every user and every kind of interest, no matter how obscure. It's the Google Maps "street view," where privacy concerns come second to the impulse to document every inch of every street corner.

This framework, I’m sure, was inspired by Elizabeth van Couvering’s dissertation work on search engines, part of which was assigned reading for my class today.

According to Machinist and CNet News, Google has promised the court that it will launch a technology for YouTube designed to automatically locate and take down material that infringes copyright. Google is being sued by Viacom and by a consortium of European sports teams for not sufficiently patrolling the video site for instances of their content being posted by users. The law requires Google to respond to take-down notices submitted by copyright owners; the case, if it doesn’t get settled before going to court, will deal with what counts as a reasonable response.

The plan to automatically filter YouTube for infringing content should take us right back to the Napster case. As I predicted in the book, we’re already collectively forgetting that the court did not shut down Napster. It merely required Napster to filter its network, blocking users from accessing copyrighted material on other users’ computers by removing it from its search results. There was a lot of back and forth about how effective the filter that Napster installed was, and how diligent the RIAA was about providing Napster with the information it needed to filter out its member companies’ content, but it didn’t matter because, with so much music unavailable, the network dried up and users went elsewhere.

So, what's different here? First, in the intervening time, the technology for filtering has certainly improved. Google has not gone into detail about how their YouTube filter will work, but it will certainly benefit from six years of innovation in such tools. Moreover, all the content is stored at YouTube. Napster had to recognize in real time that a logged-in user was offering something they shouldn't, whereas Google has the entire database just sitting there, ready to be scanned and filtered. And, in terms of long-term consequences, the value of YouTube is not overwhelmingly its provision of copyrighted content, the way Napster's was — an effective filter is not likely to kill off the site.

On the other hand, part of the problem is that YouTube is a massive and constantly fluctuating corpus — precisely the problem Google is being sued for in the first place. Despite being diligent about removing content, they can't seem to keep up with all the users uploading clips from TV shows and movies, and all the take-down notices coming from the studios and broadcasters. Presumably, an automatic filter is intended to improve on whatever they're currently doing. But it will also presumably suffer from the same problems Napster's filter did. First, users will game the system, trying to beat the filter. Napster users started renaming files with obvious spelling errors to avoid the early filter that looked for artist names, even going so far as converting them to pig latin, e.g. "itneybray earsspay," or reversing the name, e.g. "yentirb sraeps". More importantly, the filter will likely identify false positives, removing content that shouldn't in fact be removed. And there's great incentive for Google/YouTube to over-filter (to appease the court and avoid a lawsuit) and little incentive for them to protect those users who get caught up in that net, or to reinstate their videos.
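To see why the filename-based version of this filtering is so easy to game, here is a toy filter of the sort Napster first deployed. The blocklist and matching rule are invented for illustration; Napster's actual implementation was more involved, but it failed in essentially this way.

```python
# Toy version of a filename-based filter (illustrative only): block any shared
# file whose name contains a blocked artist. Trivial renaming defeats it.

BLOCKLIST = {"britney spears", "metallica"}

def blocked(filename: str) -> bool:
    name = filename.lower()
    return any(artist in name for artist in BLOCKLIST)

print(blocked("Britney Spears - Oops.mp3"))        # True: caught
print(blocked("itneybray earsspay - Oops.mp3"))    # False: pig latin slips through
print(blocked("yentirb sraeps - Oops.mp3"))        # False: reversed name slips through
```

The obvious fix, matching the content itself rather than the name, is exactly the move to audio recognition described next.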

My particular concern is that the filter will depend on some form of visual recognition and pattern matching — i.e., it will look for what is likely to be Stephen Colbert's face, and assume it has likely located an unauthorized clip from The Colbert Report. Napster upgraded its filter, from one that blocked according to filenames to a system of audio recognition that compared the music itself to known songs. The risk, as usual, is for fair use. Would a news documentary or a video parody that included a few seconds of Colbert get caught in the filter Google plans to impose?
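A rough sketch of that worry, under invented assumptions: content-based matching works on short segments, so a brief quoted excerpt can trip the same match as a full unauthorized upload. The hashing scheme below is made up for illustration and is nothing like how real audio or visual fingerprinting actually works.

```python
# Invented for illustration only: fingerprint short overlapping windows of a
# "signal" (a stand-in for audio or video frames) and flag any upload that
# shares enough windows with a copyrighted reference.

def fingerprints(signal: list, window: int = 3) -> set:
    return {tuple(signal[i:i + window]) for i in range(max(len(signal) - window + 1, 0))}

def flagged(upload: list, reference: list, min_hits: int = 2) -> bool:
    """Flag the upload if enough of its windows also appear in the reference."""
    return len(fingerprints(upload) & fingerprints(reference)) >= min_hits

episode = list(range(100))                      # the full copyrighted broadcast
parody = [-1, -2] + episode[10:16] + [-3]       # a parody quoting only a few seconds of it

print(flagged(episode, episode))   # True: the infringing full upload is caught
print(flagged(parody, episode))    # True: the brief, arguably fair-use excerpt is caught too
```

The knobs that matter are the window size and the match threshold: set them tight enough to catch determined infringers and the parody gets swept up; loosen them and the infringement slips through. That is the over-filtering incentive described above.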

I was talking with my student Dima today, and we were going over the recent controversy about Google's "Street View" map feature and its potential privacy implications. And it occurred to me that Google has adopted a very powerful strategy for how it introduces new features, one that changes the game for how public consideration of their implications goes. Rather than announcing that it is about to begin to take photographs of every point on every street of major U.S. cities and post them online, so you can see faces and license plates and questionable behavior and right into front windows, and then facing the potential debate or outrage, they simply do it. They do it without fanfare, without even any public knowledge (a pretty amazing accomplishment for a project of this scope — but they seem to do it all the time).

So they still face the public debate, whether it sways in their favor or not. But the debate happens in the context of an existing feature — and, as is typical of Google, a beautifully designed and intuitive one — which can argue for its own value. If we were having this privacy debate about a feature yet to be designed, I think it would be much easier to see it only in the light of privacy risks, and the debate might even be intense enough to discourage Google from doing it. But now, it's harder to argue when the tangible value of the feature is so palpably obvious.

This doesn't always work — the uproar about the Facebook "News Feed" feature, which simply appeared rather than being announced, may actually have been more intense because it was already up and running, already revealing people's every action on the network to all of their contacts. But it does let Google win a lot of support from those who might say, "sure, it's got some privacy implications, but look how handy it is!" And, as Dima pointed out, it's free. Which got us thinking about the cultural implications of free. Chris Anderson, author of The Long Tail, is apparently working on a book called Free for 2008, discussing the cultural implications of goods that are priced at zero. Here's one. There is an illusion of benevolence that seems to come with Google's offerings: "Hey, here's the greatest search engine you'll ever find! Hey, do you want intuitively designed maps of the entire continent? Here you go. Need a better email client? Why not take ours." It's not as if these are actually acts of benevolence. Google is a for-profit company, and quite a profitable one. But because there's no visible price tag, no subscription fee, these services feel like gifts. When you pay for that music subscription service, or buy that expensive software, you are faced with the undeniable fact that the provider wants your money, and even in our consumer culture that comes with skepticism — am I being hoodwinked into a lousy product? Does this company have my best interests at heart, or just their own? I wonder if Google, and other providers of "free" stuff, subsequently get a bit of a pass from their consumers because of this seeming generosity.

This would be funny if it weren't such a sadly chronic misunderstanding of copyright. Engadget reported yesterday that Richard Charkin, the CEO of Macmillan Publishers, stole a couple of laptops from the Google table at the BookExpo America convention, returning them later and noting that "there wasn't a sign by the computers informing him not to steal them." This was a painfully misinformed commentary on the Google Books project, where Google is working to scan all printed books in order to make a searchable index of all written literature. (I have commented on this case before, at InsideHigherEd, if you want background.) This is yet another example of the painfully endemic assumption, one especially shared and perpetuated by the content and publishing industries, that copying=theft. (Here's just one example I've been writing about: click on "what is piracy?" to see what I mean.) It's not true, at all, in a legal sense or in a cultural sense; Lawrence Lessig goes point by point through just how wrongheaded the parallel Charkin is drawing really is. Maybe these kinds of puerile antics are common inside the corporate spaces into which I rarely venture, or maybe we really still are in the very heart of the copyright wars, or maybe book publishers are just now experiencing the shock + outrage + haughtiness + opportunism that the software, music, and movie industries already got over. But as a language game, the claim that copying=theft is a powerful discursive tactic, one that is going to have more consequence than any particular case or piece of software will.