Monday, May 07, 2018

Content Moderation at Scale, 2/2

You Make the Call: Audience Interactive (with a trigger warning for content requiring moderation)

Emma Llanso, Center for Democracy & Technology & Mike Masnick, Techdirt

Hypo: “Grand Wizard Smith,” w/user photo of a person in a KKK hood, posts a notice for the annual adopt-a-highway cleanup project.  TOS bans organized hate groups that advocate violence.  This post is flagged for review.  What to do?  A majority wanted takedown: over 40 said take down, 18 escalate, 12 leave it up, and 12 flag (leave up w/a content warning).  Take down: he’s a member of the KKK.  Keep up: his identity isn’t verified; the post doesn’t say KKK, and it requires a cultural reference point to know what the hood means/what a Grand Wizard is.  Escalate: if the moderator can only ban the post, the real problem is the user/the account, so you may need to escalate to get rid of the account.

Hypo: “glassesguru123” says same sex marriage is great, love is love, but what do I know, I’m just a f----t.  Flagged for hate speech. What to do?  83 said leave it up, 5 flag, 2 escalate, 1 take it down.  Comment: In Germany, you take down some words regardless of context, so it may depend on what law you’re applying.  Most people who would leave it up are adding context: the slur isn’t being used in a hateful manner. But strictly by the policy it raises issues, which is why some would flag it.

Hypo: “Janie, gonna get you, bitch, gun emoji, gun emoji, is that PFA thick enough to stop a bullet if you fold it up & put it in your pocket?”  What to do? 57 take it down, 27 escalate, and 1 said leave it up/flag the content.  For escalate: need a subject matter expert to figure out what a PFA is.  [A Protection From Abuse order.] Language taken from a Supreme Court case about what constituted a threat.  I wondered whether these were rap lyrics, but decided it was worrisome enough even if they were.  Another argument for escalation: check if these are lyrics/if there’s an identifiable person “Janie.” [How you’d figure that out remains unclear to me—maybe you could confirm that there is a Janie by looking at other posts, but if you don’t see mention of her you still don’t know she doesn’t exist.]  Q: threat of violence—should it matter whether the target is famous or just an ex?

Hypo: photo of infant nursing at human breast with invitation to join breast milk network.  Flagged for depictions of nudity. What to do? 65 said leave it up, 13 said flag the content, 5 said escalate, and 1 said take it down.  Nipple wasn’t showing (which suggests uncertainty about what should happen if the baby’s latch were different/the woman’s nipple were larger).  Free speech concerns: one speaker pointed that out and said that this was about free speech being embodied—political or artistic expression against body shame.  You have this keep-it-up sentiment now but that wasn’t true on FB in the past.  Policy v. person applying the policy.

Hypo: jenniferjames posts a link to a site with Harvey Weinstein’s information: home phone, emails, everything—“you know what to do: get justice.” Policy: you may not post personal information about others without their consent.  This one was the first that I found genuinely hard.  It seemed to be inciting, but it wasn’t posting the information directly and thus wasn’t within the literal terms of the policy. I voted to escalate.  Noteworthy: fewer people voted. A plurality voted to escalate; a substantial number said take it down, and some said leave it up/flag it.  One possibility: the other site might have that info by consent!  Another response: block everything from that website (which is supposed to host personal info for lots of people).

Hypo: Verified world leader tweets: “only one way to get through to Rocket Man—with our powerful nukes. Boom boom boom. Get ready for it!”  Policy: no specific credible threats.  I think it’s a cop-out to say it’s not a credible threat, though that doesn’t mean there’s a high probability he’ll follow up on it; I don’t think high probability is ordinarily part of the definition of a credible threat. But this is not an ordinary situation, so. Whatever it is, I’m sure it’s above my pay grade if I’m the initial screener: escalate. Plurality: leave it up. Significant number: escalate.  A smaller number said flag/delete.  Another person said that this threat couldn’t be credible b/c of its source; still, he said, there shouldn’t be a presidential exception—there must be something he could say that would cross the line. Same speaker: Theresa May’s threat should be treated differently.  Paul Alan Levy: read the policy narrowly: a threat directed to a country, not an individual or group.

Hypo: Global Center for Nonviolence posts a video with a thumbnail showing a mass grave. Caption from the source: “slaughter in Duma.”  “A victorious scene today” is another caption, apparently from another source. I wasn’t sure whether “victorious” could be read as biting sarcasm. Escalate for help from an area expert. Most divided hypo—the most popular responses were flag or escalate, but there were substantial numbers of leave it up and take it down too. The original video maybe could be interpreted as glorifying violence, but sharing it to inform people doesn’t violate the policy, and awareness is important. The original post also needs separate review. If you take down the original video, though, then the Center’s post gets stripped of content. Another argument: don’t censor characterizations of victory v. defeat; compare Bush’s “Mission Accomplished” when there were hundreds of thousands of Iraqis dead.

Hypo: Johnnyblingbling: ready to party—rocket ship, rocket ship, hit me up mobile phone. Email from the city police department says it’s a fake profile in the name of a local drug kingpin: the only way we can get him, his drugs, and his guns off the street. Policy: no impersonation; parody is ok. Escalate, because this is a policy decision: if I am supposed to apply the policy as written, then it’s easy and I delete the profile (assuming this too doesn’t require escalation; if it does, I escalate for that purpose). But is the policy supposed to cover official impersonation?  [My inclination would be yes, but I would think you’d want to make that decision at the policy level.] 41 said escalate, 22 take down, 7 leave it up, 1 flag. One concern: creating special exceptions violates user trust.  Goldman points out that you should verify that the sender of the email was authentic: people do fake these.  Levy said there might be an implicit law enforcement exception. But that’s true of many of these rules—context might lead to implicit exceptions v. reading the rules strictly.

1:50 – 2:35 pm: Content Moderation and Law Enforcement
Clara Tsao, Chief Technology Officer, Interagency Countering Violent Extremism Task Force, Department of Homeland Security

Jacob Rogers, Wikimedia Foundation: works w/LE requests received by Foundation. We may not be representative of different companies b/c we are small & receive a small number of requests that vary in what they ask for—readership over a period of time v. individual info. Sometimes we only have IP address; sometimes we negotiate to narrow requests to avoid revealing unnecessary info.

Pablo Peláez, Europol Representative to the United States: Cybercrime unit is interested in hate speech & propaganda. 
Dan Sutherland, Associate General Counsel, National Protection & Programs Directorate, U.S. Department of Homeland Security: Leader of a “countering foreign influence” task force. Works closely w/FBI but not in a LE space.  Constitution/1A: protects things including simply visiting foreign websites supporting terror.  Gov’t influencing/coercing speech is something we’re not comfortable with. Privacy Act applies, & w/in our dep’t Congress has built into the structure a Chief Privacy Officer/Privacy Office. Sutherland was formerly Chief Officer for Civil Rights/Civil Liberties.  These are resourced offices w/in the dep’t that influence these issues.  DHS is all about info sharing, including sensitive security information shared by companies.

Peláez: Europol isn’t working on foreign influence. Relies on member states; referrals go through national authorities.  EU Internet Forum brings together decisionmakers from states and private industry. About 150-160 platforms that they’ve looked at; in contact w/about 80. Set up internet referral management tool to access the different companies.  Able to analyze more than 54,000 leads.  82% success rate.

Rogers: subset of easy LE requests for Wikipedia & other moderated platforms—fraudulent/deceptive, clearly threats/calls to violence. Both of those, there is general agreement that we don’t want them around. Some of this can feed back into machine learning.  Those tools are imperfect, but can help find/respond to issues. More difficult: where info is accurate, newsworthy, not a clear call to violence: e.g., writings of various clerics that are used by some to justify violence. Our model is community based and allows the community to choose to maintain lawful content.

LE identification requests fall into 2 categories: (1) people clearly engaged in wrongdoing; we help as we can given technical limits.  (2) Fishing expeditions, made b/c the gov’t isn’t sure what info is there. The company’s responsibility is to educate/work w/LE to determine what’s desired and to protect the rights of users where that’s at issue.

YT started linking to Wikipedia for controversial videos; FB has also started doing that.  That is useful; we’ll see what happens.

Sutherland: We aren’t approaching foreign influence as a LE agency like FBI does, seeking info about accounts under investigation or seeking to have sites/info taken down. Instead, we support stakeholders in understanding scope & scale & identifying actions they can take against it. Targeted Action Days: one big platform or several smaller—we focus on them and they get info on content they must remove. 

Peláez: we are producing guidelines so we understand what companies need to make requests effective.  Toolkit w/18 different open source tools that will allow OSPs and LE to identify and detect content.

What Machines Are, and Aren’t, Good At
Jesse Blumenthal, Charles Koch Institute: begins with a discussion that reminds me of this xkcd cartoon.

Frank Carey, Machine Learning Engineer, Vimeo: important to set the threshold for success up front. 80% might be ok if you know that going in.  Spam fighting: video spam looks like a movie, but then it’s a black screen + a link + “go to this site for the full download” for the rest of the 2 hours.  Very visual example; could also do text recognition.  These are adversarial examples. Content moderation isn’t usually about making money (on our site)—but spam was, and we were vastly outnumbered.  Machine learning is being used to generate the content.  It’s an arms race. The success threshold is thus important.  We had a great model with a low false positive rate, and we needed that: even a .1% false positive rate would mean thousands of accounts/day. But as we’d implement these models, they’d go through QA, and within days people would change tactics and try something else. We needed to automate our automation so it could learn on the fly.
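Carey’s false-positive arithmetic is easy to make concrete. A minimal sketch, with an assumed (made-up, not Vimeo’s) daily upload volume, showing why even a tiny false positive rate is a large absolute number at scale:

```python
# Illustrative arithmetic: a "low" false positive rate still produces a
# large absolute number of wrongly flagged accounts at scale.
# The daily volume below is a hypothetical assumption, not a real statistic.

def expected_false_positives(daily_items: int, false_positive_rate: float) -> int:
    """Expected number of legitimate items wrongly flagged per day."""
    return round(daily_items * false_positive_rate)

daily_uploads = 5_000_000   # hypothetical daily volume
fp_rate = 0.001             # a 0.1% false positive rate

print(expected_false_positives(daily_uploads, fp_rate))  # → 5000 wrongly flagged/day
```

At that volume, a rate that sounds negligible on paper means thousands of wrongly actioned accounts every day, which is why the acceptable threshold has to be decided before deployment.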

Casey Burton, Match: machines can pick up some signs like 100 posts/minute really easily but not others. Machines are good at ordering things for review—high and low priority.  Tool to assist human reviewers rather than the end of the process. [I just finished a book, Our Robots, Ourselves, drawing this same conclusion about computer-assisted piloting and driving.]
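Burton’s point—machines ordering items for human review rather than deciding—can be sketched as a priority queue. The scoring signals (posts per minute, report count) and weights here are illustrative assumptions, not Match’s actual model:

```python
import heapq

# Sketch: a machine scores items and orders the human review queue by
# priority, rather than making the final moderation call itself.
# The signals and weights are illustrative assumptions.

def risk_score(posts_per_minute: float, report_count: int) -> float:
    """Crude priority score: obvious automation signals rank highest."""
    return posts_per_minute * 10 + report_count

queue = []  # min-heap; negate score so the highest-risk item pops first
for item_id, ppm, reports in [("a", 100, 0), ("b", 0.5, 3), ("c", 2, 12)]:
    heapq.heappush(queue, (-risk_score(ppm, reports), item_id))

review_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(review_order)  # → ['a', 'c', 'b']
```

The 100-posts/minute account jumps to the front of the queue, while ambiguous items still reach a human—the machine is an ordering tool, not the end of the process.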

Peter Stern, Facebook: Agrees. We’re now good at spam, fake accounts, and nudity, and we remove them quickly.  Important areas that are more complicated: terrorism.  Blog posts describe how we’ve used automation in service of our efforts—a combo of automation and human review.  A lot of video/propaganda comes from official terrorist channels—we removed almost 2 million instances of ISIS/Al Qaeda propaganda; 99% was removed before it was seen. We want to allow counterspeech—we know terror images get shared to condemn them. Where we find terror accounts, we fan out for other accounts—look for shared addresses, shared devices, shared friends. Recidivism: we’ve gotten better at identifying the same bad guy with a new account. Suicide prevention has been a big focus: now using pattern recognition to identify suicidal ideation and having humans take a look to see whether we can send resources or even contact LE.  Graphic violence: can now put up warning screens, allowing people to control their experience on the platform.  More difficult: for the foreseeable future, hate speech will require human judgment. We have started to bubble up slurs for reviewers to look at w/o removing them—that has been helpful.  Getting more eyes on the right stuff. Text is typically more difficult to interpret than images.
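The “fan out” Stern describes is essentially a one-hop traversal over shared signals. A minimal sketch under assumed data—the account names, signal format, and single-hop scope are all illustrative, not Facebook’s actual system:

```python
# Sketch of the "fan out" idea: from a known bad account, find candidate
# accounts sharing at least one signal (device, address, friend).
# Data model and signals are illustrative assumptions.

accounts = {
    "bad1":  {"device:x1", "ip:1.2.3.4", "friend:alice"},
    "acct2": {"device:x1", "ip:9.9.9.9"},    # shares a device with bad1
    "acct3": {"ip:1.2.3.4", "friend:bob"},   # shares an IP with bad1
    "acct4": {"device:zz", "friend:carol"},  # no shared signals
}

def fan_out(seed: str) -> set[str]:
    """Return accounts sharing at least one signal with the seed account."""
    seed_signals = accounts[seed]
    return {a for a, sig in accounts.items()
            if a != seed and sig & seed_signals}

print(sorted(fan_out("bad1")))  # → ['acct2', 'acct3']
```

In practice the matches become candidates for review rather than automatic removals, since shared infrastructure (a household IP, say) is suggestive but not conclusive.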

Burton: text overlays over images challenged us. You can OCR that relatively easily, but it is an arms race. So now you get a lot of different types of text designed to fool the machine.  Machines aren’t good at nuance.  We don’t get too much political, but we see a lot of very specific requests about who they want to date—“only whites” or “only blacks.”  Where do you draw the line on deviant sexual behavior? Always a place for human review, no matter how good your algorithms.

Carey: Rule of thumb: if it’s something you can do in under a second, like nudity detection, machine learning will be good at it.  If you have to think through the context, and know a bunch about the world—like what the KKK is and how to recognize the hood—that will be hard, but maybe you can get 80% of the way.  The challenge is adversarial actors.  Like a laser beam: if they move a little to the left, the laser doesn’t hit them any more. So we create two nets, narrow and wide. Narrow net: very low false positive rate, so it can act automatically. The wider net sends content to a review queue.  You can look at confidence scores, how the model is trained, etc.
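Carey’s two-net setup amounts to two confidence thresholds on the same classifier. A minimal sketch; the threshold values and label names are illustrative assumptions:

```python
# Sketch of the "two nets": a high-confidence threshold auto-actions content
# (keeping false positives very low), while a lower threshold routes
# moderately suspicious content to a human review queue.
# Threshold values are illustrative assumptions.

NARROW_THRESHOLD = 0.98  # auto-remove only when the model is very sure
WIDE_THRESHOLD = 0.70    # anything moderately suspicious gets human eyes

def route(confidence: float) -> str:
    if confidence >= NARROW_THRESHOLD:
        return "auto_remove"
    if confidence >= WIDE_THRESHOLD:
        return "review_queue"
    return "leave_up"

print([route(c) for c in (0.99, 0.85, 0.30)])
# → ['auto_remove', 'review_queue', 'leave_up']
```

Tuning the narrow threshold controls the false positive rate of automatic action; the wide threshold controls how much the human queue grows—the tradeoff the panel keeps returning to.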

Ryan Kennedy, Twitch?: You always need the human element.  Where are your adversaries headed?  Your reviewers are R&D.

Burton: Humans make mistakes too. There will be disagreement or just errors, clicking the wrong button, and even a very low error rate will mean a bunch of bad stuff up and good stuff down. 

Blumenthal: we tend not to forgive machines when they err, but we do forgive humans. What is an acceptable error rate?

Carey: if 1-2% of the time, you miss emails that end up in your spam folder, that can be very bad for the user, even if it’s a low error rate.  For cancer screening, you’re willing to accept a high false positive rate.  [But see mammogram recommendations.] 

Stern, in response to a Q about diversity: We are seeking to build a diverse reviewer pool, whose work is used for the machine learning that builds classifications.  Also seeking diversity on the policy team, b/c that’s also an issue in linedrawing. When we are doing work to create labels, we try to be very careful about whether we’re seeing outlying results from any individual—that may be a signal that somebody needs more education. We also try to be very detailed and objective in the tasks that we set for the labelers, to avoid subjective judgments of any kind.  Not “sexually suggestive” but “do you see a swimsuit” + whatever else might go into the thing we’re trying to build. We are also building a classifier from user flagging.  User reports matter, and one reason is that they give us signals we can use to build out the process.

Kennedy, in response to Q about role of tech in dealing w/ live stream & live chat: snap decisions are required; need machines to help manage it.

Carey: bias in the workforce is an issue, but so is implicit bias in the data; everyone in this space should be aware of that. Training sets: there’s a lot of white American bias in the people in photos.  Nude photos are mostly of women, not men. You have to make sure you’re thinking about those things as you put these systems in place.  Similar thing w/WordNet, a list of synonyms infected w/gender bias. English bias is also a thing.

Q: outsourced/out of the box solutions to close the resource gap b/t smaller services and FB: costs and benefits?

Burton: vendors are helpful.  Google Vision has good tools to find & take down nudity.  That said, you need to take a look and say what’s really affecting our platform.  No one else is going to care about your issues as much as you do.

Carey: team issues; need for lots of data to train on, like fraud data; for Vimeo, nudity detection was a special issue b/c we don’t have a zero nudity policy.  We needed to ID levels of nudity—pornographic v. HBO. We trained our own model that did pretty well. Then you can add human review. But off the shelf models didn’t allow that.  Twitch may have unique memes—site tastes are different.  Vendors can be great for getting off the ground, but they might not catch new things or might catch too many given the context of your site.

Kennedy: vendors can get you off the ground, but we have Twitch-specific language.  Industry standards can be helpful, raising all ships around content moderation.  [I’d love to hear from someone from reddit or the like here.]

Q re automation in communication/appeals: Stern says we’re trying to improve. It’s important for people to understand why something did/didn’t get taken down. In most instances, you get a communication from us about why there was a takedown. Appeals are really important—they allow more confidence in the process b/c you know mistakes can be corrected.  Always a conundrum about enabling evasion, but we believe in transparency and want to show people how we’re interacting w/their content. If we show them where the line is, we hope they know not to cross it.

Burton: There are ways to treat bots differently than humans: don’t need to give them notice & can put them in purgatory. We keep info at a high level to avoid people tracking back the person who reported them and going after them.

David Post, Cato Institute

Kaitlin Sullivan, Facebook: we care about safety, voice, and fairness: trust in our decisionmaking process even if you don’t always agree w/it. Transparency is a way to gain your trust.  The new iteration of our Community Standards is now public w/the full definition of “nudity” that our reviewers use. We also want to explain why we’re using these standards. You may not agree that female nipples shouldn’t be allowed (subject to exceptions such as health contexts), but at least you should be able to understand the rule.  Called us “constituents,” which I found super interesting.  Users should be able to tell whether there is an enforcement error or a policy decision.  We are also investing more in appeals; we used to have appeals just for accounts, groups, and pages. We’ve been experimenting w/individual content reviews, and now we have an increased commitment to that.  We hope to have more numbers beyond IP, gov’t requests, and terror content soon.

Kevin Koehler, Automattic: 30% of internet sites use WordPress, though we don’t host them all. Our transparency report lists what sites we geoblock due to local law & how we respond to gov’t requests. We try to write/blog as much as we can about these issues to give context to the raw numbers. Copyright reports have doubled since 2015; gov’t info requests have tripled; gov’t takedowns are up 145x from what they once were. Largely driven by Russia, former Soviet republics, and Turkey; but countries that we never heard from before are also sending notices, sometimes in polite and sometimes in threatening terms.

Alex Walden, Google: values freedom of expression, opportunity, and the ability to belong.  400 hours of content uploaded every minute. Doubling down on machine learning, particularly for terrorist content. Including experts as part of how we ID content is key.  Users across the board are flagging lots of content; the accuracy rates of ordinary users are relatively low, while trusted flaggers are relatively accurate. 8 million videos removed for violating community guidelines, 80% flagged by machine learning. Flag → human review. Committed to 10,000 reviewers in 2018.  Spam detection has informed how we deal w/other content.  Also dealing w/scale by focusing on content we’ve already taken down, preventing its reupload.  Also important that there’s an appeals process. A new user dashboard shows users where flagged content is in the review process—it was available to trusted flaggers, but is now available to others as well.
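The reupload-prevention idea—matching new uploads against content already removed—can be sketched as a hash lookup. Real systems use perceptual hashes that survive re-encoding and cropping; the exact SHA-256 match below is a deliberate simplification, and the byte strings are placeholders:

```python
import hashlib

# Sketch: fingerprint content that has already been removed and block
# exact re-uploads. Production systems use perceptual hashing (robust to
# re-encoding); exact SHA-256 matching here is a simplification.

removed_hashes: set[str] = set()

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def on_removal(data: bytes) -> None:
    """Record the fingerprint of content taken down."""
    removed_hashes.add(fingerprint(data))

def allow_upload(data: bytes) -> bool:
    """Reject uploads whose fingerprint matches removed content."""
    return fingerprint(data) not in removed_hashes

on_removal(b"previously-removed-video-bytes")
print(allow_upload(b"previously-removed-video-bytes"))  # → False
print(allow_upload(b"unrelated-video-bytes"))           # → True
```

The exact-match version only stops byte-identical reuploads, which is why the perceptual-hash variants matter in practice: an adversary who re-encodes the file defeats a cryptographic hash but not a similarity hash.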

Rebecca MacKinnon, New America’s Open Technology Institute: Deletions can be confusing and disorienting. Gov’ts claim to have special channels to Twitter, FB to get things taken down; people on the ground don’t know if that’s true. Transparency reports are for official gov’t demands but it’s not clear whether gov’ts get to be trusted flaggers or why some content is going down. Civil society and human rights are under attack in many countries—lack of transparency on platforms destroys trust and adds to sense of lack of control.

Human rights aren’t measured by the absence of rules; that’s the state of nature, nasty, brutish, and short. We look to see whether companies respect freedom of expression. We expect that the rules are clear, that the governed know what the rules are and have an ability to provide input into them, and that there is transparency and accountability about how the rules are enforced.  Also looking for impact assessment: looking for companies to produce data about the volume and nature of information that’s been deleted or restricted to enforce TOS and in response to external requests.  Also looking in governance for whether there’s human rights impact assessment.  More info on superusers/trusted flaggers is necessary to understand who’s doing what to whom. We’re seeing increasing disclosure about process over time.

If the quality of content moderation remains the same, then more journalists and activists will be caught in the crossfire.  More transparency for gov’ts and people could allow conversations w/stakeholders who can help w/better solutions.

Koehler: reminder that civil society groups may not be active in some countries; fan groups may value their community very strongly and so appeals are an important way of getting feedback that might not otherwise be available.  Scale is the challenge. 

Post asked about transparency v. gaming the system/machine learning. [The stated concern about disclosing detection mechanisms as part of transparency doesn’t seem very plausible for most of the stuff we’re talking about.  Not only was last session’s point about informing bots v. informing people a very good one; “flagged as © infringement” is often pretty clear without disclosing how it was flagged.]

Sullivan: gaming the system is often known as “following the rules,” and we want people to follow the rules. They are allowed to get as close to the line as they can as long as they don’t go over it.  Can we give people detailed reasons with automated removal?  We have improved the information we have reviewers identify—we ask reviewers why something should be removed, for internal tracking as well as so that the user can be informed.  A machine can say it has 99% confidence that a post matches bad content, but being transparent about that would be a different matter.

Koehler: the content/context that a user needs to tell you the machine is doing it wrong is not the same content that the machine needs to identify content for removal: nudity as a protest, for example.
