Friday, June 30, 2023

Transatlantic Dialogue Workshop, Institute for Information Law (IViR), Amsterdam Law School

Part 5: Beyond the DSA

Chair: João Quintais

Samuelson: Joel Reidenberg’s Lex Informatica is a foundational text worth revisiting. It riffs off the concept of the law merchant: people engaged in cross-border commerce made up sales law through their practices, and those informal rules became law. He was thinking of Lex Mercatoria as a metaphor for Lex Informatica, where similarly we need to think about new tools. The Commission has tried to invent these new tools.

Proposed AI Act disclosure of data—if you don’t want us to cough up every URL on the internet, what do you want? “We used Common Crawl”? What is the purpose for which disclosure is sought? Whether you want a map the size of the territory depends on the goal—is it collective licensing? [Would that even help get the money to the right people? I guess that’s rarely the big concern of people demanding collective licensing.]

Eleonora Rosati: EU Parliament wants to get the AI Act to the finish line in 2023. Goal: a framework for trustworthy AI. Continues along the lines of transparency/disclosure. But it also can’t exist w/o thinking of other frameworks; shows how fragmentary EU law is. Consider: deepfakes and training data. The original EC proposal provided for training material disclosure, but didn’t clarify what permission was needed (if any). Now it refers to “without prejudice to applicable © rules.” No mention of whether permission is required for deepfakes.

Justin Hughes: you can have deepfakes about floods and tornadoes, not just about people. In the effort to address free expression they’ve also added unnecessary bells and whistles. Current proposal: deepfakes are defined as things that falsely appear to be authentic or truthful, which requires disclosure, except if they’re evidently created as satirical, artistic, or fictional works (which seems like they wouldn’t falsely appear authentic or truthful anyway). A “sufficiently detailed” summary of the use of training data protected by © is required, but as or more interesting is the requirement that generative AI have adequate safeguards against generation of content in breach of EU law (which means ©). [I assume they also mean CSAM and other things to be named later.] Art. 27 of the DSA is recommender system transparency; are recommender systems high-risk AI systems w/in the meaning of the AI Act? Yes in Parliament’s version. That means direct overlap in rules. His view: some recommender systems should be prohibited AI, if social media use is addictive.

Sebastian Schwemer: Understand where it comes from—new legislative framework for product regulation. Talk to those who followed the broader process.

Sean O’Connor: training and outputs may need different safeguards. Each has different relationships to ©.

Eric Goldman: dictating how content is published is the fundamental framework of the Act—we’re heading toward the idea that gov’t will dictate that, which he intensely dislikes.

Quintais: they realized that they hadn’t clearly covered generative AI and panicked and started introducing new rules.

Daphne Keller: Such a mistake to add generative AI—the policy questions around AI for criminal sentencing, whether you get a loan, etc. are so important and deserve attention—would be better to deal with content generation/speech separately. Use in content moderation—deciding what to take down—v. using in recommendation—do you have to guard against addiction in recommendation?

Quintais: Drafters didn’t talk to the people doing the DSA or the overlaps. Depending on what happens next, there might be real overlap.

Matthias Leistner: if you take measures to avoid substantial similarity in the models, you might stave off fundamental challenges to © principles that show up only in case law—no protection for ideas or style, though protection for characters. Taking measures to limit the models might be a good strategy to deal with the long-term danger of loss of those principles. Use of existing works to train is a separate issue.

Quintais: for the first time, have heard © lawyers say there’s a need to protect style—not a good development.

Hughes: doesn’t think that AI output is speech.

Goldman: does. Collect information, organize it, disseminate it: AI does those things, and those are what make something a publication.

Hughes: expression is by humans.

Goldman: makes a different choice.

Keller: readers have a right to read what they’re interested in.

Niva Elkin-Koren: when I prompt ChatGPT and interact w/it, that is speech.

Hughes: if an algorithm suggests content written by a human, there’s still human participation in the underlying creation. Recommendation automation itself shouldn’t be speech b/c it’s not human.

Elkin-Koren: ranking search results should be considered speech b/c it reflects an opinion about how to rank information implemented by code.

Samuelson: explainability as a different factor—if it’s not possible to explain this stuff, generative AI may not have much of a future in Europe. [Of course “the sorting principle is stuff I like” is not really explainable either, even if there is in fact a deterministic physical source in my brain. But scale may make a difference.] “As explainable as possible” might work.

Keep in mind that standard-setting also favors power: who can afford to go to all the meetings and participate throughout? It delegates public authority to private entities. Different regulatory structures for different entities—when telcos became broadband providers, regulators had to decide where they would be regulated, which is similar to the Qs raised by definitions of covered AI—regulatory arbitrage.

Senftleben: use of collecting societies/levies can be a better regulatory answer than a cascade of opt-out and then a transparency rule to control whether opt-out is honored and then litigation on whether it’s sufficiently explained. If we’re afraid we might lose freedom of style/concepts, telling © owners to accept a right of remuneration is an option.

Matthias Leistner: don’t give in too soon—remuneration already frames this as something requiring compensation if not control, but that’s not obvious. Note that Japan just enacted a very strong right to use works for machine learning, and apparently the anime/comics industries didn’t object to it.

Van Hoboken: May need new speech doctrines for, e.g., incorporating generative AI into political speech.

Schwemer: we might want special access to data for purposes of debiasing AI as a uniquely good justification for, e.g., copying for training.

Bernt Hugenholtz: these companies want to move forward, and if they can get certainty by paying off rightsholders they will do so; probably not through collective licensing, although the societies would like that (they don’t have the mandates). Instead firms will get rid of uncertainty by cutting big private deals.

Senftleben: we can give a collective licensing mandate if we choose—the only way to get money to individuals.

Hugenholtz: but levy systems take forever to introduce too. We’ve never had a levy regulation.

Elkin-Koren: Google already has an enormous advantage over newcomers; making everyone who enters pay a levy would kill competition forever. [I also wonder about what a levy would mean for all the individual projects that use additional datasets with existing models to refine them.]

Senftleben: his idea is to put a levy on the output.

Samuelson: but they’re demanding control of the input in the US, not the output (unless it is infringing in the conventional sense).

Frosio: In the US it is obvious that training the machine is fair use; not the case in Europe. What do we do? [Some discussion of how obvious this was; consensus is that’s the way to bet, although the output will still be subject to © scrutiny for substantial similarity.]

Some discussion of a German case holding that, where the full copies of books were in the US, Germany had authority only over the snippets shown in search, and those were de minimis. Frosio: a French decision held that Google Books violated copyright and that the quotation exception didn’t apply. At some point some countries are going to find this infringing, and there will be a divide in the capacity to develop the tech.

Keller: Realpolitik: when platforms face compulsion to carry disinformation and hate speech, their main defense is that they have First Amendment rights to set editorial policy through content moderation and through ranking—this was relatively uncontroversial (though is no longer!). Eugene Volokh thinks that ranking algorithms are more speechy than content moderation b/c the former are written by engineers and bake in value judgments; she thinks the opposite. There’s caselaw for both, but Volokh’s version has been embraced by conservatives.

Leistner: why a levy on the output if it’s distant enough not to infringe a protected work? And if you have a levy on the input, why? The results don’t reflect the inputs and the model itself doesn’t contain the inputs, so people will just train the models outside Europe. So that means you’d need to attach levies to output, but that’s just disconnected from ©: an entirely new basis.

Dusollier: If the issue is market harm from competition w/an author’s style, a levy is not compensation for that—it harms specific people, and if it is actionable it should be banned, not subjected to a levy.

Elkin-Koren: if generative models destroy the market for human creativity, does that mean we pay a levy for a few years and then © ceases to exist? What is the vision here?

Frosio: another question is who is liable: if we focus on output, liability should be on end users—end users are the ones who instruct the model to come up w/something substantially similar and publish the output.

Samuelson: a global levy is not feasible; also, most of the works on which the models have been trained are not from the big © owners or even from small commercial entities—they’re from bloggers/people on Reddit/etc.—how would you even get money to them? [I mean, I’m easy to find 😊]
