Thursday, June 29, 2023

Transatlantic Dialogue Workshop, Institute for Information Law (IViR), Amsterdam Law School Part 2: Data Access

Impulse Statement: Christophe Geiger: Relevance to © exceptions and limitations—access to ©-protected works is important for this work. Research organizations have an exception in the © Directive and are also vital to the DSA, so we must look at both. Only researchers approved by a Digital Services Coordinator are allowed access, with some limited exceptions similar to the fallback provisions in DSM Directive art. 4.

Impulse Statement: Sean Flynn: Data protection can be seen as protecting the right to privacy but can interfere with the right to research. Need balancing/narrow tailoring. Duty to protect: a duty to regulate third parties—protecting both privacy rights and researchers with respect to data held by third parties. Duty to promote society’s right to benefit from research—similar to the duty to create libraries—use the idea to check whether we’re balancing rights correctly, regulating the appropriate third parties, and creating institutions to implement rights.

Europeans were less generous in their concepts of “educational”/“scientific” research than his US perspective—formal research organizations may be required. Are journalists, in some key categories, involved in scientific research? Consumer organizations?

Senftleben: Subordinated to the goals of the DSA—research has to be about systemic risk (or the mechanisms platforms use to control systemic risk), which interferes with the freedom of research. If we want researchers to understand what is going on, you have to open up the data silos anyway. Thus there would have been more than enough reason to include a provision opening up data for research in general—trust the research community to formulate the questions. That is not reflected in the provision. Para. 12 opens things up a bit b/c it goes outside the vetted-researcher dynamic, but systemic risk still defines what can be done with the data.

Keller: the provision formally sets out a really dumb procedure: the researcher formulates the data request without any contact w/the platform, gets approval from the authority, then goes to the platform, which has to respond in 2 weeks. The request is unlikely to be in a format/of a type that is immediately possible to collect, and the platform can object on only 2 enumerated grounds. So the workaround is to create a more dynamic feedback process so researchers can ask for what platforms can actually give. Hopefully an entity set up to deal w/GDPR issues can also check whether the researcher is asking for the right data/what the parameters should be. This hangs on the reference to “independent advisory mechanisms” to prevent the process actually described in the DSA from happening.

Elkin-Koren: An example of studying society, not just digital platforms: studying health-related factors not caused by platforms but for which platforms have tons of data. Basic/exploratory research where you don’t know the specifics of the data you want or the specifics of the research question, but would benefit from exploring what’s there. The key innovation of the DSA is turning research from a private-ordering question into one of public ordering.

Quintais: you have to be careful about whom you invite into your research—if a researcher is from outside the jurisdiction, they may have to be excluded from the data.

Leistner: one strategy is to interpret research as broadly as possible; another is to ask whether the exception is exclusive. The NetzDG used to have a broader scope; if a member state chooses to keep/provide a new exception for research purposes, maybe it is at liberty to do so—there’s no harmonization of general access to data for research purposes. Maybe that is necessary, and it would have to transcend the various IP rights, including © and trade secrets.

Keller: note that having platforms store data in structures amenable to researchers also makes them more attractive to law enforcement. Plus, researchers are likely to find things that they think are evidence of crimes. National security claims: NATO actually indicated that it wanted to be considered a covered research organization. In the US there’s a very real 1A issue about access, but the Texas/Florida social media cases include a question about transparency mandates—not researcher access like this but not unrelated. Also 4A issues.

Comment: No explicit consideration of IP in the grounds for rejection, but third-party data leads to the same place.

Van Hoboken: Bringing different parts of civil society together for platform accountability for VLOPs; data access is the way to bring researchers in on these risks/mitigation measures. If this provision didn’t exist, you’d have VLOPs doing risk audits/mitigation measures but no way to compare them. Some requests will be refused if the platforms say “this isn’t really a risk.” Platforms may also have incentives to deny that something is a mitigation measure in order to avoid research access. Mid-term value—it won’t work fast and maybe will ultimately be defeated.

Goldman: What are the Internet Observatory’s experiences w/benefits & threats from House Republicans?

Keller: serious internet researchers are among the many academic researchers in the US targeted by various far-right people, including members of Congress and journalists with good relations w/Elon Musk, as a supposed Democratic elite censorship apparatus: allegedly, by identifying specific tweets as disinformation, they contributed to the suppression of speech in some kind of collusion w/gov’t actors. About 20 lawsuits; [Goldman: subpoenas—information in researchers’ hands is being weaponized—consider this a warning for people here. His take: they’re trying to harm the research process.] Yes, they’re trying to deter such research and punish the people who already did it, including seeking students’ information when students have already had their parents’ homes targeted. Politicians are threatening academic speech b/c, they say, they’re worried about gov’t suppressing speech.

Goldman: consider the next steps; if you have this information, who will want it from you and what will they do with it? A threat vector for everyone doing the work.

Keller: relates to IP too—today’s academic researcher is tomorrow’s employee of your competitor or of the gov’t; researchers are not pure and do not form a category nonoverlapping w/others.

Elkin-Koren: is the data more secure when held by the platform, though? One can subpoena the platform as well as the university.

Goldman: but you would take this risk into account in your research design.

Van Hoboken: At the point that this is happening, you have bigger democratic problems; in Europe we are trying to avoid getting there and to promote research with a broader impact. But it’s true there are real safety and politicization issues around what questions you ask.

Goldman: bad-faith interpretation of research: the made-up debate over

RT: Question spurred by a paper I just read: is the putative “value gap” in © licensing on UGC platforms a systemic risk? Is Content ID a mitigation measure?

[various] Yes-and-no answers. One: © infringement is illegal content, so you could fit it in somewhere, but to create a problem it would have to go beyond the legal obligations of Art. 17 b/c there’s already a specific legal obligation.

Keller: don’t you need to do the research to figure out if there’s a problem?

Yes—to study the effects of content moderation you need access; you can get data with appropriate questions. One could argue it’s discriminatory against independent creators, or that there is overfiltering, which there isn’t supposed to be. But that’s not regulated by Art. 17.

Catch-22—you might have to first establish that Content ID is noncompliant before you can get access.

Frosio: you might need the data to test whether there is overblocking. [Which is interesting—what about big © owners who say it’s not good enough & there’s too much underblocking? They’d seem to have the same argument in reverse.]

Would need a very tailored argument.

Quintais follow-up: he had conversations with Meta—asked for data to assess whether there was overblocking, and their response was “it’s a trade secret.”

Samuelson: The Art. 40 process assumes a certain procedure for getting access. One question is whether you can talk to the platforms first despite the enumerated process. Some people will probably seek access w/o knowing whether the data exists. There’s an obligation to at least talk to the approved researchers. But the happy story isn’t the only story: platforms could facilitate good-for-them research.

A: the requirements, if taken seriously, can guard against that—you have to be a real academic in some way to be a vetted researcher; reveal funding; not have a commercial interest. The underlying concept: the funder can’t have preferred access to the results. Platforms can already fund research if they want to.

Flynn: Ideological think tanks?

A: probably won’t qualify under DSA rules.

Samuelson: but the overseers of this access won’t be able to assess whether the research is well-designed, will they?

A: that’s why there’s an in-between body that can make recommendations. They propose to provide expertise/advice.

Leistner: Art. 40 comes with a price: concentration of power in the Commission—that is, the executive, not even the legislature. Issues might arise where we are as scared of the Commission as US folks are of Congress at the moment. That doesn’t mean Art. 40 is bad, but there are no transparency duties on the Commission about what it has done! How the Commission fulfills this powerful role, and what checks and balances might be needed on it, needs to be addressed.

Paddy Leerssen: Comment period: the US was the #1 country of response b/c US universities are very interested in access. Scraping issues: access to publicly accessible data/noninterference obligations. How far that goes (overriding contracts, © claims, TPMs) is unclear. Also unclear: who will enforce it.

Conflict with open science/reproducibility/access to the data underlying research. The apparent compromise: people who want to replicate will also have to go through the data-request process.

Leistner: but the best journals require access to data, and that qualified critics be given access to that underlying data—your agreement with Nature will say so.