Chair: João Quintais
Samuelson: Joel Reidenberg’s Lex Informatica
is a foundational text worth revisiting. Riffs off the concept of the law merchant: people engaged in cross-border commerce made up sales law through their practices, and those informal rules became law. He was thinking of Lex Mercatoria as a metaphor for Lex Informatica, where similarly we need to think about new tools. The Commission has tried to invent these new tools.
Proposed AI Act disclosure of data: if you don’t want us to cough up every URL on the internet, what do you want? “We used Common Crawl”?
What is the purpose for which disclosure is sought? Whether you want a map the
size of the territory depends on the goal—is it collective licensing? [Would
that even help get the money to the right people? I guess that’s rarely the big
concern of people demanding collective licensing.]
Eleonora Rosati: EU Parliament wants to get AI Act to finish
line in 2023. Goal: framework for trustworthy AI. Continues along the lines of transparency/disclosure.
But also can’t exist w/o thinking of other frameworks; shows how fragmentary EU
law is. Consider: deepfakes and training data. Original EC proposal provided for training material disclosure but didn’t clarify what permission was needed (if
any). Now refers to “without prejudice to applicable © rules.” No mention of
whether permission is required for deepfakes.
Justin Hughes: you can have deepfakes about floods and
tornadoes, not just about people. In an effort to address free expression they’ve also added unnecessary bells and whistles. Current proposal: deepfakes are defined as things that falsely appear to be authentic or truthful, which requires disclosure, except if they’re evidently satirical, artistic, or fictional (which seems like it wouldn’t falsely appear authentic or truthful).
“Sufficiently detailed” summary of use of training data protected by © is
required, but as interesting or more so is the requirement that generative AI have adequate safeguards against generation of content in breach of EU law (which
means ©). [I assume they also mean CSAM and other things to be named later.]
Art. 27 of the DSA is recommender system transparency; are recommender systems high-risk AI systems w/in the meaning of the AI Act? Yes in Parliament’s version. That means
direct overlap in rules. His view: some recommender systems should be prohibited
AI, if social media use is addictive.
Sebastian Schwemer: Understand where the AI Act comes from: the new legislative framework for product regulation. Talk to those who followed the broader process.
Sean O’Connor: training and outputs may need different
safeguards. Each has different relationships to ©.
Eric Goldman: dictating how content is published is the
fundamental framework of the Act; we’re heading toward the idea that gov’t will dictate that, which he strongly dislikes.
Quintais: they realized that they hadn’t clearly covered
generative AI and panicked and started introducing new rules.
Daphne Keller: Such a mistake to add generative AI—the policy
questions around AI for criminal sentencing, whether you get a loan, etc. are
so important and deserve attention—would be better to deal with content
generation/speech separately. Use in content moderation—deciding what to take
down—v. using in recommendation—do you have to guard against addiction in
recommendation?
Quintais: Drafters didn’t talk to the people doing the DSA
or the overlaps. Depending on what happens next, there might be real overlap.
Matthias Leistner: if you take measures to avoid substantial
similarity in the models, you might stave off fundamental challenges to ©
principles that show up only in case law—no protection for ideas or style,
though protection for characters. Taking measures to limit the models might be
a good strategy to deal with the long-term danger of loss of those principles.
Use of existing works to train is a separate issue.
Quintais: for the first time, have heard © lawyers say there’s
a need to protect style—not a good development.
Hughes: doesn’t think that AI output is speech.
Goldman: does. Collect information, organize it, disseminate
it. AI does those things, which is what makes a publication.
Hughes: expression is by humans.
Goldman: makes a different choice.
Keller: readers have a right to read what they’re interested
in.
Niva Elkin-Koren: when
I prompt ChatGPT and interact w/it, that is speech.
Hughes: if an algorithm
suggests content written by human, there’s still human participation in the
underlying creation. Recommendation automation itself shouldn’t be speech b/c
it’s not human.
Elkin-Koren: ranking
search results should be considered speech b/c it reflects an opinion about how
to rank information implemented by code.
Samuelson:
explainability as a different factor—if it’s not possible to explain this
stuff, generative AI may not have much of a future in Europe. [Of course “the
sorting principle is stuff I like” is not really explainable either, even if
there is in fact a deterministic physical source in my brain. But scale may
make a difference.] “As explainable as possible” might work.
Keep in mind that
standard-setting also favors power: who can afford to go to all the meetings
and participate throughout. Delegating public authority to private entities.
Different regulatory structures for different entities—when telcos became
broadband providers, had to decide where they would be regulated, which is
similar to the Qs raised by definitions of covered AI—regulatory arbitrage.
Senftleben: use of
collecting societies/levies can be a better regulatory answer than a cascade of
opt-out and then a transparency rule to control whether opt-out is honored and
then litigation on whether it’s sufficiently explained. If we’re afraid we
might lose freedom of style/concepts, telling © owners to accept a right of
remuneration is an option.
Matthias Leistner: don’t give in too soon—remuneration already
frames this as something requiring compensation if not control, but that’s not obvious.
Note that Japan just enacted a very strong right to use works for machine learning, and the anime/comics industries apparently didn’t object to it.
Van Hoboken: May need new speech doctrines for, e.g.,
incorporating generative AI into political speech.
Schwemer: we might want special access to data for purposes
of debiasing AI as a uniquely good justification for, e.g., copying for
training.
Bernt Hugenholtz: these companies want to move forward, and
if they can get certainty by paying off rightsholders they will do so; probably
not collective licensing although the societies would like that; they don’t
have mandates. Instead firms will get rid of uncertainty through cutting big
private deals.
Senftleben: we can give a collective licensing mandate if we
choose—the only way to get money to individuals.
Hugenholtz: but levy systems take forever to introduce too.
We’ve never had a levy regulation.
Elkin-Koren: Google already has an enormous advantage over
newcomers; making everyone who enters pay a levy would kill competition
forever. [I also wonder about what a levy would mean for all the individual
projects that use additional datasets with existing models to refine them.]
Senftleben: his idea is to put a levy on the output.
Samuelson: but they’re demanding control of the input in the
US, not the output (unless it is infringing in the conventional sense).
Frosio: In US it is obvious that training the machine is
fair use; not the case in Europe. What do we do? [Some discussion of how
obvious this was; consensus is that’s the way to bet although the output will
still be subject to © scrutiny for substantial similarity.]
Some discussion of German case holding that, where full
copies of books were in US, Germany only had authority over snippets shown in
search, and those were de minimis. Frosio: a French decision held that Google Books violated copyright and that the quotation right didn’t apply. At some point some countries are going
to find this infringing, and there will be a divide in the capacity to develop
the tech.
Keller: Realpolitik: when platforms face compelled carriage of disinformation and hate speech, their main defense is that they have First Amendment rights to set editorial policy through content moderation and through ranking; this was relatively uncontroversial (though it no longer is!). Eugene Volokh thinks that ranking algorithms are more speechy than
content moderation b/c former are written by engineers and bake in value
judgments; she thinks the opposite. There’s caselaw for both, but Volokh’s
version has been embraced by conservatives.
Leistner: why a levy on the output if it’s distant enough to
not infringe a protected work? If you have a levy on the input, why? Results
don’t reflect inputs/the model itself doesn’t contain the inputs, so people
will just train the models outside Europe. So that means that you’d need to
attach levies to output, but that’s just disconnected from ©: an entirely new basis.
Dusollier: If the issue is market harm from competition w/an
author’s style, a levy is not compensation for that—it harms specific people
and if it is actionable it should be banned, not subjected to a levy.
Elkin-Koren: if generative models destroy the market for human
creativity, does that mean we pay a levy for a few years and then © ceases to
exist? What is the vision here?
Frosio: another question is who is liable: if we focus on
output, liability should be on end users: end users are the ones who instruct the model to come up w/something substantially similar and publish the output.
Samuelson: a global levy is not feasible; also, most of the works
on which the models have been trained are not from the big © owners or even
from small commercial entities; they’re from bloggers/people on Reddit/etc. How would
you even get money to them? [I mean, I’m easy to find 😊]