Chair: Martin Senftleben
Impulse Statement: Sebastian Felix Schwemer
Recommender systems: transparency is the approach taken to recommender systems, which intersects with privacy/data protection. How much can we throw recommendation of information and moderation of information into the same pot? Algorithmic recommendation/moderation: we’re interested in platforms, but there’s a world beyond platforms where automation is an issue, such as the DNS. Keep in mind that the regulatory focus for algorithmic moderation is platforms and VLOPs.
Transparency is a wicked question: for whom and how. Not only the rules in Art. 14(4) but also a requirement about terms & conditions: balancing of fundamental rights when providers use algorithmic content moderation. Affects decisionmaking. Obligation to report on use of automated means for content
moderation—for all intermediary service providers, not just VLOPs, including
accuracy and possible errors. Relates to Q of benchmarking/how do we evaluate
the quality of this decisionmaking? DSA doesn’t have answers to this. Decision
quality might look very different across fields: © might have a yes/no answer,
but misinformation might be very tricky. Very little information on what kind
of human competence is needed.
Impulse Statement: Rebecca Tushnet
Benchmarking: spam detection—interesting that until a
political party got interested there was no inquiry into reliability, and still
no standard for spam detection quality other than “don’t screen out political
fundraising.” Related to the abject quality of real content moderation: it is
literally beneath our notice and we have contempt for the people who carry out abject
functions.
VLOP definition versus “sites that actually have the
problems against which the DSA is directed”—not only fashion sites; nonprofits
as a special issue: Wikipedia, Internet Archive, AO3, which does not recommend
anything; compare to DMCA Classic and DMCA Plus, where some large entities have
repeated © issues and lots of valid notices and others simply don’t—DMCA is a
reasonable system for most of them: Etsy has problems, but not ones that make
sense to frame in any way as Instagram’s or Parler’s.
DSA’s separation of overarching patterns from individual decisions is good if maintainable, but doesn’t fit easily into the US framework: the Texas and Florida laws are both indications of what politicized targeting looks like and relatively unsurprising in their reliance on private claims (though outsized damage awards show the targeting).
Problems with scale: inherent inconsistency. This is usually
shorthanded as “we make 100 million decisions a day, so even a tiny error rate
means a large absolute number of errors.” But it is more than that: inconsistency
and conflicting decisions. We have to accept that—indeed, it will mostly go
undetected—but we also have to accept that the existence of conflicting
decisions does not mean that either one is wrong. Compare: TM applications—in the
US system at least, it is an explicit principle of law that one cannot dispute
a registration decision by pointing to others that seem factually similar (or
even identical) but went the other way; see also: grading by school teachers.
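A back-of-the-envelope sketch of the scale point, in Python; the daily volume and the error rates below are hypothetical illustrations, not any platform’s reported figures:

decisions_per_day = 100_000_000  # the "100 million decisions a day" shorthand
for error_rate in (0.001, 0.01, 0.05):  # assumed error rates of 0.1%, 1%, 5%
    expected_errors = decisions_per_day * error_rate
    print(f"error rate {error_rate:.1%}: ~{expected_errors:,.0f} erroneous decisions per day")

Even at the most optimistic rate here, that is on the order of a hundred thousand wrong calls every day, before counting the inconsistent-but-defensible calls that are the real point above.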
This is related to the DSA mandatory appeals system, which
does look like Texas and Florida. One size fits all for YouTube comments and
entire accounts; not a great model—the same degree of due process for
everything instead of allowing services to focus only on serious disputes like
when someone loses an account. Significant concerns: disproportion in the
demographics of who appeals moderation, already well known as an issue—men, English
speakers. But inconsistency even w/in categories will also be something we have to live with.
Senftleben: Overarching issues: the relationship between the different things addressed in the DSA, moderation and recommendation; are they comparable/do they pose similar problems? Are there new problems left unaddressed, such as generative AI? Then getting public interest/human rights balancing into the system: who is responsible for checking that this is done properly at the platform level? Consistency
of decisions across countries, cultural backgrounds, appeals. [To be clear: I
don’t think it’s just about cultural backgrounds or demographics: two people w/
the same background will also make different decisions on the same facts
and that’s not necessarily a wrong. (Also: ought implies can and what I’m
arguing is that consistency cannot be achieved at this scale.)]
Goldman: disparate impact can come from many sources, often
impossible to tell what they are. Lots of evidence of disparate impact in
content moderation whose causation will be disputed. Humans v. Machines: there’s
a cost to having humans in the loop: worker wellness. Regulators just don’t value
it and that is a problem in balancing costs and benefits.
Daphne Keller: DSA prohibits inconsistency: you have a right
of appeal to resolve hard judgment calls and get to consistency. [But: ought
implies can; I would think a civil law system is ok with calling each judgment
a tub on its own bottom.]
Leistner: in the context of the European court system, a small PI injunction will stay local; there are divergent results already b/c cases only reach the European courts in rare circumstances. Thus you have inconsistencies based on the same standards.
Hughes: inconsistency at first decision is not the same
thing as inconsistency at the appeal level. The TTAB and PTO try to be
consistent. [We disagree about this. They certainly try to have rules, but they
also don’t hold that they’re required to treat the same facts the same way—there
might be undisclosed differences in the facts or a different record and they
don’t try to find those differences, just presume they’re there. This is aided
by the fact that different TM applications will, by virtue of being different
TM applications, have slightly different features from previous applications—which
is also true of stuff that gets moderated. The school discipline cases also show that even when you get to the second level, the variety of possible circumstances makes “consistency” a hopeless ideal: the “impersonate a teacher Finsta” will play out differently at different schools, so the facts will always be differentiable.]
Schwemer: “Nondiscriminatory and nonarbitrary,” which is the DSA standard, doesn’t necessarily require consistency in that strict sense.
Keller: suppose you have a rule: I will remove everything
the machine learning model says is nudity, knowing it has a 10% error rate.
Van Hoboken: No—there’s a due process reconsideration requirement.
Keller: but the model will give the same result on
reconsideration.
Van Hoboken: still not ok. [I take it because that’s not a
fair rule to have?]
Keller: so that amounts to a requirement to hire people.
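A minimal sketch of this hypothetical, assuming a deterministic classifier with a fixed threshold; the stand-in scorer, threshold, and names are illustrative, not any platform’s real system:

def fake_nudity_score(image_bytes: bytes) -> float:
    # stand-in scorer, not a real model: pseudo-score in [0, 1) derived from the input
    return (hash(image_bytes) % 1000) / 1000

def moderate(image_bytes: bytes, threshold: float = 0.5) -> str:
    # the hypothesized rule: remove anything the model flags, despite a known error rate
    return "remove" if fake_nudity_score(image_bytes) >= threshold else "keep"

img = b"example upload"
print(moderate(img), moderate(img))  # same input, same model: "reconsideration" yields the same outcome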
Rachel Griffin: Automated analysis of meaning in context is getting better, so will people start saying it’s ok to leave it to the machine?
Niva Elkin-Koren: We assume that a good decision by a court/oversight
body on particular facts will immediately translate into the system, but that’s
not true. One reason is that there is a huge gap between the algorithmic system
running on everything and the decision of a panel on one removal. Translation
gap: so we have to ask whether there is bias as well as compliance/error rates.
Agree there’s no evidence that human in the loop improves process, but we could
encourage regulators/implementors to enhance the opportunities for making
humans more efficient; researchers can help with this. Avoid humans who participate just to rubber-stamp the system.
Séverine Dusollier: Inconsistencies: we know as professors that
we aren’t completely consistent in grading; we know that morning differs from
afternoon—we fight it but it is human nature. There is something that you can’t
completely analogize with machine inconsistency—we might have some randomness
in both, but we are also informed by bias/political perspective/etc. The machine will also be inconsistent, perhaps in better ways, but what we have to do is rely on social science studies of how bias is actually entrenched in machine and human decisions. The consequences are not the same: a decision on
online platforms has different impacts. [I agree that we don’t know what
machine inconsistency will look like; of course the inputs to the machine come
from humans!]
Wikipedia doesn’t make recommendations. Sometimes the answer
you get from the community is so sexist and misogynistic that it shows it’s still
a system that needs intervention. [Agreed, but my point was that it doesn’t
have the “algorithms” that people accuse of hurting democracy/spurring
anorexia/etc. because it’s not optimized for ad-display engagement. So the
mechanisms for addressing the problems will necessarily be different, as will the definition of the problems.]
Samuelson: Audits are really important to counteract human
tendency towards inaction (and perhaps bias in who appeals), but there are no
standards! Financial audits work because we have standards for the kind of
activities we should and shouldn’t be doing in accounting. There is an emerging new profession for computing audits, but right now the promise exceeds the capability. We need standards for what the audits should look like and what
the profession that does the audits should look like.
Van Hoboken: it’s not very clear how to judge moderation
especially at the level at which Art. 14 is drafted. Surface specificity, but
application to many different types of services means there’s lots to do. Inconsistency
is a problem of focus precisely b/c we want to see diversity in approaches to
content moderation. We might want “free speech friendly” services in the mix
and “safe for kids” ones. Both are good things. There are also different ways
to achieve those results. Media pluralism in DSA can be broadened to say there’s
value in pluralism generally: DSA doesn’t tell you how to do things. When we
add in the human cost of moderation, we should accept that social media companies
can’t do it all.
Senftleben: inconsistency is a pathological perspective;
pluralism is a democratic one. [Although I was thinking more about whether a
breastfeeding photo shows too much nipple, etc.]
Comment: just because a human is in the loop doesn’t mean
they can make an easy decision. Legality of © use in the EU, under 27 different
legal regimes, with lots of grey zones in parody/creative uses, is not simple for a human. You’d probably need a comparative © law professor. What
kind of human are we talking about and what are the standards they are to
apply?
Senftleben: isn’t the human value the intuition we bring? Failures
and all?
Schwemer: in the online framework, differentiate b/t the different stages in the decision process. There is no human involvement/judgment required in
initial decisions. Ex post only, once there is an appeal. 2018 Commission
recommendation was the blueprint for the DSA, but that talked about “oversight”
rather than human review. Oversight relates to design/development/operations
but “review” is ex post, which is an important difference. Desirability of
human involvement in operations of first moderation, not just ex post redress. There’s
a huge cost to the humans involved, which DSA overlooks. AI Act actually
mentions something about training and competences of humans, but that relates
to oversight of design/development, not operations.
Keller: described an FB conversation in which appeals resulted in reversals less often than random sampling of removal decisions for review did. Transparency report: there’s about 50% success for appeals under FB’s
nudity/terrorism policies and a lot lower for harassment/bullying. So our
knowledge is mixed: seems unlikely that FB is wrong 50% of the time about
nudity.
Big audit firms seem reluctant to sign up for content
moderation auditing b/c they’re accustomed to working from standards, which don’t
exist. They’re legally accountable for “reasonable” results—they’d have to vouch
for way more stuff than they’re comfortable vouching for given the lack of
existing standards. This is why civil society groups are more invested in being
at the table: they need to be there as standards are developed, not just a
conversation b/t Deloitte and Meta.
Elkin-Koren: we use the term audit but the free speech
tradeoffs here are different than the tradeoffs involved in financial audits.
The purpose is not to show compliance with a standard but to provide us with
info we need to decide whether the system is biased against particular values
or to decide what values it does and should reflect. It has to be understood as
a work in progress [in a different way than financial audits].
Keller: It would be awesome if the DSA allowed descriptive audits, but it’s not clear to her that there’s room for that, or that there’s room for civil
society to participate deeply in analyzing the information.
Samuelson: financial audits work through transparency about
what GAAP are—both a procedure and a set of substantive rules. Then people in
the auditing business know what they have to comply with. If Meta and Deloitte
do this, they’ll do it in house and not publicize it. So another issue the Commission
will have to grapple with is oversight of how standards are set.
Comment: DSA really does require an audit to assess
compliance w/ due diligence obligations including risk mitigation/fundamental
rights. The risk mitigation obligations are so fluffy/vague that this might
define in practice what mitigation means. There is going to be a strong flavor of “compliance” here. What are you going to do when 10 audits say that Meta
complied w/its obligations and there are appeals that show this might not be
the case?
Van Hoboken: audit requirement was specifically linked to
risk mitigation approach. In some sense: “We want other people to do our job.”
Changes the character of the audit. It’s an institution-building move, European
standard for algorithmic transparency—developing capacity to do that.
Government-funded research can also help.
Frosio: what about lawful but awful content? Center stage in
UK debates. Seems that DSA doesn’t want lawful but awful content filtered out
automatically. The UK Online Safety Bill seems to have retreated from an obligation to remove lawful but awful content, substituting more protections for children/new criminal offenses/more control for users over what they see on social media.
Art. 14: entry point for terms and conditions, which can
restrict lawful but awful content; applies not just to illegal content but
content prohibited by terms of service—services have to protect fundamental
rights. But if it’s lawful but awful, what is the fundamental right at issue, and how is the balancing against terms & conditions supposed to work?
Griffin: fundamental rights are not only freedom of
expression. EU regulators are concerned w/child safety, so automated moderation
of nudity/pornography/self-harm may be necessary to do that effectively.
Unlikely that courts/regulators will disapprove of that. Regulatory priorities
are going in the opposite direction.
Frosio: Some of that will be legal/illegal. My question is more general: what should we do with lawful but harmful content? Should we think it’s ok to block harmful content although it is lawful? What does that
mean about what balancing fundamental rights means? Who’s going to decide what
harmful content is? At one point, pro-democratic/revolutionary content was “harmful.”
[LGBTQ+ content, and anti-LGBTQ+ content, is a vital example. What does “protecting
children” mean?]
Schwemer: if an intermediary service provider does screen
lawful but awful content, it is restricted in terms and conditions both substantively
and procedurally. What about spam? Spam is not illegal in Europe. That would be
a case of moderating awful but not illegal content.