Thursday, October 05, 2017

FTC loses motion to dismiss because court doesn't deal well with statistics

FTC v. Quincy Bioscience Holding Co., 2017 WL 4382312, No. 17 Civ. 124 (S.D.N.Y. Sept. 28, 2017)

Lawyers and especially generalist-by-necessity judges need to understand some statistical basics. When they don’t, they let bad science shape consumers’ decisions and even law.  The FTC and NY’s AG sought to hold Quincy liable for false advertising.  Quincy sells a dietary supplement known as Prevagen, whose active ingredient, apoaequorin, is a dietary protein originally derived from the jellyfish Aequorea victoria. Defendants claim that “Prevagen improves memory,” that it “has been clinically shown to improve memory,” that “A landmark double-blind and placebo controlled trial demonstrated Prevagen improved short-term memory, learning, and delayed recall over 90 days,” that Prevagen “Helps with memory problems associated with aging,” that “Prevagen is clinically shown to help with mild memory problems associated with aging,” and that Prevagen can support “healthier brain function, a sharper mind and clearer thinking.”

The primary support for these claims is the Madison Memory Study, a randomized, double-blind, placebo-controlled study involving 218 adults between the ages of 40 and 91. Participants were assigned “AD8” scores of 0 through 8, with an AD8 score of 2 used to differentiate between those who are cognitively normal or very mildly impaired (with scores of 0-2) and those with higher levels of impairment (with scores of 3-8). At intervals during the 90-day trial, participants were assessed on a variety of cognitive skills. “No statistically significant results were observed for the study population as a whole on any of the cognitive tasks.”  However, test cell participants in the AD8 0-1 subgroup showed statistically significant improvements over those who received the placebo in three of the nine tasks (measuring memory, psychomotor function, and visual learning), and showed a “trend toward significance” in two more tasks (measuring verbal learning and executive function). Test cell participants in the AD8 0-2 subgroup showed statistically significant improvements over those who received the placebo in three of the nine tasks (measuring executive function, attention, and visual learning), and showed a “trend toward significance” in one more task (measuring memory). Thus, the study concluded, “Prevagen demonstrated the ability to improve aspects of cognitive function in older participants with either normal cognitive aging or very mild impairment, as determined by AD8 screening.”

The FTC alleged that “the researchers conducted more than 30 post hoc analyses of the results looking at data broken down by several variations of smaller subgroups for each of the nine computerized cognitive tasks,” and that post hoc subgroup analysis “greatly increases the probability that the statistically significant improvements shown are by chance alone.” As a result, “the few positive findings on isolated tasks for small subgroups of the study population do not provide reliable evidence of a treatment effect.”  Further, plaintiffs alleged that Quincy’s theory was that apoaequorin enters the human brain to supplement endogenous proteins that are lost during the natural process of aging, but there are no studies showing that orally-administered apoaequorin can cross the human blood-brain barrier. Instead, Quincy’s studies allegedly show that orally-administered apoaequorin is rapidly digested in the stomach and broken down into amino acids and small peptides like any other dietary protein.
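The FTC’s point about running more than 30 post hoc analyses comes down to simple arithmetic. Here is a minimal Python sketch, assuming every test is independent and the supplement has no effect at all. Real subgroup analyses overlap and are correlated, so treat the numbers as rough illustrations, not as a reanalysis of the Madison Memory Study. The function names are hypothetical.

```python
ALPHA = 0.05  # conventional per-test significance threshold


def familywise_error_rate(k, alpha=ALPHA):
    """Chance of at least one p < alpha among k independent tests of a true null."""
    return 1 - (1 - alpha) ** k


def expected_false_positives(k, alpha=ALPHA):
    """Average number of 'significant' results among k tests of a true null."""
    return k * alpha


if __name__ == "__main__":
    # 9 = one test per cognitive task; 20 and 30 = more slicing into subgroups.
    for k in (1, 9, 20, 30):
        print(
            f"{k:>2} tests: P(at least one false positive) = {familywise_error_rate(k):.2f}, "
            f"expected false positives = {expected_false_positives(k):.2f}"
        )
```

Under those assumptions, nine tests (one per cognitive task) already give roughly a 37% chance of at least one false positive, 20 tests about 64%, and 30 tests about 79%.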

In a footnote, the court said that these studies were “contradicted by canine studies whose relevance plaintiffs challenge,” and also that the FTC’s argument “loses force when applied to the results of the subgroup study which make it clear that something caused a statistically significant difference between those subjects who took Prevagen and those given a placebo” (emphasis added).

And here, in the footnote, we have the core of the problem: that “something” causing the statistically significant difference is, at a minimum, plausibly random error.  When you analyze 20 different subgroups and one of them shows a statistically significant difference at the .05 significance level, that is exactly what you would expect if there is no effect at all: when the null hypothesis is true, each test still has a 1-in-20 chance of coming out “significant” by chance alone, so across 20 tests you should expect about one false positive.  That’s literally (numerically) what a .05 threshold means.  And it’s also part of why post hoc subgrouping is so risky and potentially misleading: once you slice and dice, each slice is a smaller, noisier sample, and every additional slice is another chance at a false positive.  If the only evidence you had were from the subgroup, then yes, the results would support the hypothesis of efficacy, but you can’t ignore that you also have the evidence from the other subgroups.  Moreover, a related danger is that the analysis is post hoc precisely because you had no preexisting reason to suspect a difference in reaction to the test substance.  Occam’s Razor works well here: the simplest and most plausible explanation is that the subgrouped results, which aren’t even for the same cognitive tasks across groups (making a posited mechanism other than random error even harder to come up with), are positive as a result of random error. This is why lawyers desperately need statistics classes.
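The same point can be checked by brute force. This is a hypothetical Monte Carlo sketch in Python, not the study’s actual design: it invents 218 participants for whom the supplement does nothing, gives each a made-up 0-8 baseline score, slices them into subgroups at a few arbitrary cutoffs, and counts how often any subgroup-task combination comes out “significant.” For simplicity it uses a large-sample z-test from the standard library, which slightly overstates significance in the smallest subgroups compared with a t-test. All names, cutoffs, and score distributions are assumptions.

```python
import random
from statistics import NormalDist, fmean, variance

ALPHA = 0.05
N_PARTICIPANTS = 218  # study size from the post; everything else is invented
N_TASKS = 9           # nine cognitive tasks, all with NO true treatment effect
# Hypothetical subgroups: "baseline score <= cutoff" on a made-up 0-8 scale.
# A cutoff of 8 includes everyone, i.e., the whole-population analysis.
SUBGROUP_CUTOFFS = [8, 1, 2, 3, 4]


def two_sided_p(treated, placebo):
    """Large-sample z-test p-value for a difference in group means."""
    se = (variance(treated) / len(treated) + variance(placebo) / len(placebo)) ** 0.5
    z = (fmean(treated) - fmean(placebo)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_null_trial(rng):
    """Simulate one no-effect trial; return how many analyses hit p < ALPHA."""
    people = []
    for _ in range(N_PARTICIPANTS):
        baseline = rng.randint(0, 8)     # invented baseline score, uniform 0-8
        is_treated = rng.random() < 0.5  # coin-flip assignment to supplement/placebo
        # Both arms draw scores from the same distribution: the supplement does nothing.
        scores = [rng.gauss(0, 1) for _ in range(N_TASKS)]
        people.append((baseline, is_treated, scores))

    hits = 0
    for cutoff in SUBGROUP_CUTOFFS:
        group = [p for p in people if p[0] <= cutoff]
        for task in range(N_TASKS):
            treated = [s[task] for _, t, s in group if t]
            placebo = [s[task] for _, t, s in group if not t]
            if len(treated) > 1 and len(placebo) > 1 and two_sided_p(treated, placebo) < ALPHA:
                hits += 1
    return hits


def summarize(n_trials=200, seed=0):
    """Return (share of trials with >= 1 'significant' result, mean hits per trial)."""
    rng = random.Random(seed)
    hits = [simulate_null_trial(rng) for _ in range(n_trials)]
    return sum(h > 0 for h in hits) / n_trials, fmean(hits)


if __name__ == "__main__":
    share, mean_hits = summarize()
    print(f"{len(SUBGROUP_CUTOFFS) * N_TASKS} analyses per simulated trial, no real effect")
    print(f"trials with at least one 'significant' result: {share:.0%}")
    print(f"average 'significant' results per trial: {mean_hits:.2f}")
```

Changing the seed or the cutoffs changes the exact figures, but “significant” findings keep turning up in data generated with no effect at all.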

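xkcd has this on lock, as usual with math stuff: see its “Significant” comic, in which testing jelly beans of 20 different colors turns up one color “linked” to acne.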

Despite this, the court found that the FTC didn’t plausibly allege that the representations at issue were false or unsubstantiated, given that “the Madison Memory Study followed normal well-accepted procedures, conducted a ‘gold standard’ double blind, placebo controlled human clinical study using objective outcome measures of human cognitive function using 218 subjects.”  The parties agreed that the study failed to show a statistically significant improvement in the experimental group over the placebo group as a whole. The court said “[t]hat confined plaintiffs’ attack to the studies of subgroups,” but that’s an odd way to frame it: the best evidence we have indicates that the claims Quincy actually made, which were phrased generally in the advertising and not directed at people with low AD8 scores, are untrue.  The best evidence that those claims are substantiated comes from the subgroups, but at a minimum the claims, as made, were not properly qualified to match those subgroups, and the better evidence comes from the study as a whole.  Anyway, setting that aside, the court ruled that, as to the subgroups, “the complaint fails to do more than point to possible sources of error but cannot allege that any actual errors occurred.”

The court thought that the FTC’s post hoc argument was merely theoretical.  “They say that findings based on post hoc exploratory analyses have an increased risk of false positives, and increased probability of results altered by chance alone, but neither explain the nature of such risks nor show that they affected the subgroups performance in any way or registered any false positives.”  When I was in practice, we had a case where we ended up having a math professor testify. He concededly had absolutely no expertise in trademark law, or surveys, or drug errors, but he was really helpful in explaining statistics, and why a supposedly positive result from a “confusion analysis” didn’t mean that confusion was likely.  (Bayes’ theorem, so useful.)  Given the substantiation standard, the other evidence from the study as a whole, and the evidence about the blood/brain barrier, it is at least plausible that the positive results were false positives. Replicating the study and checking whether the same subgroups and tasks again come out significant would be a start toward refuting that.  The court thought there was no “reason to suspect that these risks are so large in the abstract that they prevent any use of the subgroup concept, which is widely used in the interpretation of data in the dietary supplement field.”  Even if that’s true, (a) what is the court doing deciding this on a motion to dismiss? And (b) post hoc subgrouping is a very different animal from prespecified subgroup analysis.  There’s a very good book about this, Richard Harris’ Rigor Mortis, which I highly recommend to the interested. Still, “[a]ll that is shown by the complaint is that there are possibilities that the study’s results do not support its conclusion,” which isn’t enough for plausibility.
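The math professor’s point can be put in Bayes’ theorem terms. Here is a minimal Python sketch with invented priors and power values; they are illustrative assumptions, not estimates for Prevagen. The probability that a “significant” result reflects a real effect depends on how plausible the effect was beforehand (which is where evidence like the blood/brain barrier studies comes in), on the study’s power to detect a real effect, and on the significance threshold. The function name is hypothetical.

```python
def prob_real_effect_given_significant(prior, power, alpha=0.05):
    """Bayes' theorem: P(real effect | p < alpha).

    prior: probability the effect is real before seeing the data (assumed)
    power: P(p < alpha | real effect), the chance of detecting a real effect (assumed)
    alpha: P(p < alpha | no effect), the per-test false-positive rate
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical priors: strong prior doubt (e.g., evidence that the protein is
    # digested before it can reach the brain) corresponds to a low prior.
    for prior in (0.5, 0.1, 0.01):
        for power in (0.8, 0.3):  # small post hoc subgroups tend to have low power
            ppv = prob_real_effect_given_significant(prior, power)
            print(f"prior={prior:<4} power={power}: P(real | significant) = {ppv:.2f}")
```

With a 1% prior and 30% power, for example, a significant result implies only about a 6% chance that the effect is real.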

The court dismissed the coordinate state law claims, to be renewed (I hope) in state court, if the case isn’t revived on appeal.


Amy Kapczynski said...

Great post. I think the prob runs deeper than judges needing statistical training, because there are so many ways to monkey with trial data and its presentation, and post-hoc subgroup analysis is just one of them. I learn about new ones all the time from my med school colleagues. Fundamentally, I think this implies that you need some kind of deference to agencies where they are addressing these kinds of sophisticated questions that demand serious scientific knowledge...

Anonymous said...

Ugh, infuriating!

David Abrams said...

But what about the many cases with no agency involvement and the key questions are statistical in nature?

Bruce Boyden said...

Same judge as in Steinberg v. Columbia Pictures.