Friday, June 30, 2023

Transatlantic Dialogue Workshop, Institute for Information Law (IViR), Amsterdam Law School
Part 3: Algorithms, Liability and Transparency

Chair: Martin Senftleben

Impulse Statement: Sebastian Felix Schwemer

Recommender systems: transparency is the regulatory approach to recommender systems, and it intersects with privacy/data protection. How much can we throw recommendation of information and moderation of information into the same bowl? Algorithmic recommendation/moderation: we're interested in platforms, but there's a world beyond platforms where automation is an issue, such as the DNS. Keep in mind that the regulatory focus for algorithmic moderation is platforms and VLOPs.

Transparency is a wicked question: transparency for whom, and how. Not only the rules in Art. 14(4) but also a requirement about terms & conditions: balancing of fundamental rights when providers use algorithmic content moderation. Affects decisionmaking. There is an obligation to report on the use of automated means for content moderation—for all intermediary service providers, not just VLOPs, including accuracy and possible errors. This relates to the Q of benchmarking: how do we evaluate the quality of this decisionmaking? The DSA doesn't have answers to this. Decision quality might look very different across fields: © might have a yes/no answer, but misinformation might be very tricky. Very little information on what kind of human competence is needed.

Impulse Statement: Rebecca Tushnet

Benchmarking: spam detection—interesting that until a political party got interested there was no inquiry into reliability, and still no standard for spam detection quality other than “don’t screen out political fundraising.” Related to the abject quality of real content moderation: it is literally beneath our notice and we have contempt for the people who carry out abject functions.

VLOP definition versus "sites that actually have the problems against which the DSA is directed"—not only fashion sites. Nonprofits are a special issue: Wikipedia, the Internet Archive, AO3, which does not recommend anything. Compare to DMCA Classic and DMCA Plus, where some large entities have repeated © issues and lots of valid notices and others simply don't—the DMCA is a reasonable system for most of them. Etsy has problems, but not ones that make sense to frame in any way as Instagram's or Parler's.

The DSA's separation of overarching patterns from individual decisions is good if maintainable, but doesn't fit easily into the US framework—the Texas and Florida laws are both indications of what politicized targeting looks like, and they are relatively unsurprising in their reliance on private claims (though outsized damage awards show the targeting).

Problems with scale: inherent inconsistency. This is usually shorthanded as “we make 100 million decisions a day, so even a tiny error rate means a large absolute number of errors.” But it is more than that: inconsistency and conflicting decisions. We have to accept that—indeed, it will mostly go undetected—but we also have to accept that the existence of conflicting decisions does not mean that either one is wrong. Compare: TM applications—in the US system at least, it is an explicit principle of law that one cannot dispute a registration decision by pointing to others that seem factually similar (or even identical) but went the other way; see also: grading by school teachers.
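[A minimal arithmetic sketch of the scale point, with invented numbers (the decision volume is just the shorthand above; the error rates are assumptions, not platform data): even a tiny per-decision error rate produces a large absolute number of errors, and two independent reviews of the same hard case will routinely conflict without either being wrong.]

```python
# Illustrative only: all figures below are assumptions, not platform data.
decisions_per_day = 100_000_000   # the "100 million decisions a day" shorthand
error_rate = 0.01                 # assume a 1% per-decision error rate

# Even a tiny error rate yields a large absolute number of errors.
print(f"Expected errors per day: {decisions_per_day * error_rate:,.0f}")  # 1,000,000

# Inconsistency is a distinct effect: if each of two independent reviews of the
# same item departs from the "correct" call with probability p, the two reviews
# conflict with each other with probability 2*p*(1-p).
for p in (0.01, 0.20):  # a routine call vs. a hard judgment call
    print(f"p = {p:.0%}: chance two reviews of the same item conflict = {2 * p * (1 - p):.1%}")
```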

This is related to the DSA's mandatory appeals system, which does look like Texas and Florida. One size fits all for YouTube comments and entire accounts; not a great model—the same degree of due process for everything, instead of allowing services to focus only on serious disputes, like when someone loses an account. Significant concerns: disproportion in the demographics of who appeals moderation, already well known as an issue—men, English speakers. But inconsistency even w/in categories will also be something we have to live with.

Senftleben: Overarching issues—the relationship of the different things addressed in the DSA, moderation and recommendation: are they comparable/do they pose similar problems? Are there new problems left unaddressed, such as generative AI? Then getting public interest/human rights balancing into the system: who is responsible for checking that this is done properly at the platform level? Consistency of decisions across countries, cultural backgrounds, appeals. [To be clear: I don't think it's just about cultural backgrounds or demographics: two people w/ the same background will also make different decisions on the same facts, and that doesn't necessarily mean either one is wrong. (Also: ought implies can, and what I'm arguing is that consistency cannot be achieved at this scale.)]

Goldman: disparate impact can come from many sources, often impossible to tell what they are. Lots of evidence of disparate impact in content moderation whose causation will be disputed. Humans v. Machines: there’s a cost to having humans in the loop: worker wellness. Regulators just don’t value it and that is a problem in balancing costs and benefits.

Daphne Keller: DSA prohibits inconsistency: you have a right of appeal to resolve hard judgment calls and get to consistency. [But: ought implies can; I would think a civil law system is ok with calling each judgment a tub on its own bottom.]

Leistner: in the context of the European court system, a small PI will stay local; there are divergent results already b/c cases reach the European courts only in rare circumstances. Thus you have inconsistencies based on the same standards.

Hughes: inconsistency at the first decision is not the same thing as inconsistency at the appeal level. The TTAB and PTO try to be consistent. [We disagree about this. They certainly try to have rules, but they also don't hold that they're required to treat the same facts the same way—there might be undisclosed differences in the facts or a different record, and they don't try to find those differences, just presume they're there. This is aided by the fact that different TM applications will, by virtue of being different TM applications, have slightly different features from previous applications—which is also true of stuff that gets moderated. The school discipline cases also show that even when you get to the second level, the variety of possible circumstances makes "consistency" a hopeless ideal—the "impersonate a teacher Finsta" will play out differently at different schools, so the facts will always be differentiable.]

Schwemer: Nondiscriminatory and nonarbitrary, which is the DSA standard, doesn't necessarily require consistency in that strict sense.

Keller: suppose you have a rule: I will remove everything the machine learning model says is nudity, knowing it has a 10% error rate.

Van Hoboken: No—there’s a due process reconsideration requirement.

Keller: but the model will give the same result on reconsideration.

Van Hoboken: still not ok. [I take it because that’s not a fair rule to have?]

Keller: so that is a requirement for hiring people.
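[A rough sketch of this exchange, using Keller's hypothetical 10% error rate plus invented volume and base-rate figures: a blanket "remove whatever the model flags" rule can make most removals erroneous once actual nudity is rare, and re-running the same deterministic model on appeal changes nothing, which is why the reconsideration requirement becomes a requirement to hire people.]

```python
# Illustrative only: volume and base rate are invented; 10% is the hypothetical error rate.
items = 1_000_000            # hypothetical daily uploads
base_rate = 0.02             # assume only 2% of uploads are actually nudity
false_positive_rate = 0.10   # the hypothesized 10% error rate on non-nudity
true_positive_rate = 0.90    # assume 90% detection of actual nudity

wrong = items * (1 - base_rate) * false_positive_rate   # 98,000 lawful items removed
right = items * base_rate * true_positive_rate          # 18,000 correct removals
print(f"Share of removals that are errors: {wrong / (wrong + right):.0%}")  # ~84%

# Reconsideration by the same deterministic model returns the same answer, so
# "re-run the classifier" is not meaningful review; meaningful reconsideration
# requires a different decision-maker, i.e., a human.
```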

Rachel Griffin: Analyzing meaning in context is getting better—so will people start saying it’s ok to leave it to the machine?

Niva Elkin-Koren: We assume that a good decision by a court/oversight body on particular facts will immediately translate into the system, but that's not true. One reason is that there is a huge gap between the algorithmic system running on everything and the decision of a panel on one removal. There's a translation gap, so we have to ask whether there is bias, as well as looking at compliance/error rates. Agree there's no evidence that a human in the loop improves the process, but we could encourage regulators/implementers to enhance the opportunities for making humans more efficient—researchers can help with this. And avoid humans who participate just to rubber-stamp the system itself.

Séverine Dusollier: Inconsistencies: we know as professors that we aren't completely consistent in grading; we know that morning grading differs from afternoon grading—we fight it, but it is human nature. There is something you can't completely analogize to machine inconsistency: we might have some randomness in both, but humans are also informed by bias/political perspective/etc. The machine will also be inconsistent, perhaps in better ways; what we have to do is rely on social science studies about how bias is actually entrenched in machine and human decisions. The consequences are not the same: a decision on an online platform has different impacts. [I agree that we don't know what machine inconsistency will look like; of course the inputs to the machine come from humans!]

Wikipedia doesn’t make recommendations. Sometimes the answer you get from the community is so sexist and misogynistic that it shows it’s still a system that needs intervention. [Agreed, but my point was that it doesn’t have the “algorithms” that people accuse of hurting democracy/spurring anorexia/etc. because it’s not optimized for ad-display engagement. So the mechanisms for addressing the problems will necessarily be different as will definition of the problems.]

Samuelson: Audits are really important to counteract the human tendency toward inaction (and perhaps bias in who appeals), but there are no standards! Financial audits work because we have standards for the kinds of activities we should and shouldn't be doing in accounting. There is an emerging new profession of computing audits, but right now the promise exceeds the capability. We need standards for what the audits should look like and for what the profession that does the audits should look like.

Van Hoboken: it's not very clear how to judge moderation, especially at the level at which Art. 14 is drafted. There's surface specificity, but application to many different types of services means there's a lot to work out. Focusing on inconsistency is a problem precisely b/c we want to see diversity in approaches to content moderation. We might want "free speech friendly" services in the mix and "safe for kids" ones; both are good things, and there are different ways to achieve those results. Media pluralism in the DSA can be broadened to say there's value in pluralism generally: the DSA doesn't tell you how to do things. When we add in the human cost of moderation, we should accept that social media companies can't do it all.

Senftleben: inconsistency is a pathological perspective; pluralism is a democratic one. [Although I was thinking more about whether a breastfeeding photo shows too much nipple, etc.]

Comment: just because a human is in the loop doesn't mean they can make an easy decision. The legality of a © use in the EU, under 27 different legal regimes, with lots of grey zones in parody/creative uses, is not simple for a human. You'd probably need a comparative © law professor. What kind of human are we talking about, and what are the standards they are to apply?

Senftleben: isn’t the human value the intuition we bring? Failures and all?

Schwemer: in the online framework, differentiate b/t different stages in the decision process. There is no human involvement/judgment required in initial decisions; it comes in ex post only, once there is an appeal. The 2018 Commission recommendation was the blueprint for the DSA, but it talked about "oversight" rather than human review. Oversight relates to design/development/operations, while "review" is ex post, which is an important difference. There's a case for the desirability of human involvement in the operation of first-instance moderation, not just ex post redress. There's a huge cost to the humans involved, which the DSA overlooks. The AI Act actually says something about the training and competences of humans, but that relates to oversight of design/development, not operations.

Keller: FB conversation in which appeals resulted in reversals less often than random sampling of removal decisions for review did. Per the transparency report, there's about a 50% success rate for appeals under FB's nudity/terrorism policies and a lot lower for harassment/bullying. So our knowledge is mixed: it seems unlikely that FB is wrong 50% of the time about nudity.
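[A toy selection-effect sketch with invented numbers: an appeal "success" rate near 50% need not mean the underlying nudity decisions are wrong half the time, because the people who appeal are not a random sample of removals.]

```python
# Illustrative only: every figure here is an assumption, not FB data.
removals = 1_000_000
overall_error_rate = 0.05           # assume 5% of removals are actually wrong
appeal_rate_if_wrong = 0.30         # wrongly removed users appeal more often
appeal_rate_if_right = 0.02         # some correctly removed users appeal anyway

wrong_appeals = removals * overall_error_rate * appeal_rate_if_wrong         # 15,000
right_appeals = removals * (1 - overall_error_rate) * appeal_rate_if_right   # 19,000
print(f"Appeal reversal rate: {wrong_appeals / (wrong_appeals + right_appeals):.0%}")
# ~44% of appeals succeed even though only 5% of removals were wrong.
```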

Big audit firms seem reluctant to sign up for content moderation auditing b/c they're accustomed to working from standards, which don't exist here. They're legally accountable for "reasonable" results—they'd have to vouch for way more than they're comfortable vouching for, given the lack of existing standards. This is why civil society groups are more invested in being at the table: they need to be there as standards are developed, so it's not just a conversation b/t Deloitte and Meta.

Elkin-Koren: we use the term audit but the free speech tradeoffs here are different than the tradeoffs involved in financial audits. The purpose is not to show compliance with a standard but to provide us with info we need to decide whether the system is biased against particular values or to decide what values it does and should reflect. It has to be understood as a work in progress [in a different way than financial audits].

Keller: It would be awesome if the DSA allows descriptive audits, but it’s not clear to her there’s room for that, or that there’s room for civil society to participate deeply in analyzing the information.

Samuelson: financial audits work through transparency about what GAAP are—both a procedure and a set of substantive rules. Then people in the auditing business know what they have to comply with. If Meta and Deloitte do this, they’ll do it in house and not publicize it. So another issue the Commission will have to grapple with is oversight of how standards are set.

Comment: the DSA really does require an audit to assess compliance w/ due diligence obligations, including risk mitigation/fundamental rights. The risk mitigation obligations are so fluffy/vague that the audits might define in practice what mitigation means. There is going to be a strong flavor of "compliance" here. What are you going to do when 10 audits say that Meta complied w/ its obligations and there are appeals that show this might not be the case?

Van Hoboken: the audit requirement was specifically linked to the risk mitigation approach. In some sense: "We want other people to do our job." That changes the character of the audit. It's an institution-building move toward a European standard for algorithmic transparency—developing the capacity to do that. Government-funded research can also help.

Frosio: what about lawful but awful content? It's center stage in UK debates. It seems that the DSA doesn't want lawful but awful content filtered out automatically. The UK Online Safety Bill seems to have retreated from an obligation to remove lawful but awful content, substituting more protections for children, new criminal offenses, and more control for users over what they see on social media.

Art. 14 is the entry point for terms and conditions, which can restrict lawful but awful content; it applies not just to illegal content but also to content prohibited by the terms of service, and services have to protect fundamental rights. But if content is lawful but awful, what is the fundamental right at issue, and how is the balancing against terms & conditions supposed to work?

Griffin: fundamental rights are not only freedom of expression. EU regulators are concerned w/ child safety, so automated moderation of nudity/pornography/self-harm may be necessary to do that effectively. It's unlikely that courts/regulators will disapprove of that. Regulatory priorities are going in the opposite direction.

Frosio: Some of that will be legal/illegal. My question is more general: what should we do with lawful but harmful content? Should we think it's ok to block harmful content even though it is lawful? What does that mean for what balancing fundamental rights means? Who's going to decide what harmful content is? At one point, pro-democratic/revolutionary content was "harmful." [LGBTQ+ content, and anti-LGBTQ+ content, is a vital example. What does "protecting children" mean?]

Schwemer: if an intermediary service provider does screen lawful but awful content, it is restricted in terms and conditions both substantively and procedurally. What about spam? Spam is not illegal in Europe. That would be a case of moderating awful but not illegal content.
