FTC v. Quincy Bioscience Holding Co., 2017 WL 4382312, No. 17
Civ. 124 (S.D.N.Y. Sept. 28, 2017)
Lawyers and especially generalist-by-necessity judges need
to understand some statistical basics. When they don’t, they let bad science
shape consumers’ decisions and even law.
The FTC and NY’s AG sought to hold Quincy liable for false
advertising. Quincy sells a dietary
supplement known as Prevagen, whose active ingredient, apoaequorin, is a
dietary protein originally derived from the jellyfish Aequorea victoria. Defendants
claim that “Prevagen improves memory,” that it “has been clinically shown to
improve memory,” that “A landmark double-blind and placebo controlled trial
demonstrated Prevagen improved short-term memory, learning, and delayed recall
over 90 days,” that Prevagen “Helps with memory problems associated with
aging,” that “Prevagen is clinically shown to help with mild memory problems
associated with aging,” and that Prevagen can support “healthier brain
function, a sharper mind and clearer thinking.”
The primary support for these claims is the Madison Memory
Study, a randomized, double-blind, placebo-controlled study involving 218
adults between the ages of 40 and 91. Participants were assigned “AD8” scores
of 0 through 8, with an AD8 score of 2 used to differentiate between those who
are cognitively normal or very mildly impaired (with scores of 0-2) and those
with higher levels of impairment (with scores of 3-8). At intervals during the
90-day trial, participants were assessed on a variety of cognitive skills. “No
statistically significant results were observed for the study population as a
whole on any of the cognitive tasks.”
However, test cell participants in the AD8 0-1 subgroup showed
statistically significant improvements over those who received the placebo in
three of the nine tasks (measuring memory, psychomotor function, and visual
learning), and showed a “trend toward significance” in two more tasks
(measuring verbal learning and executive function). Test cell participants in
the AD8 0-2 subgroup showed statistically significant improvements over those
who received the placebo in three of the nine tasks (measuring executive
function, attention, and visual learning), and showed a “trend toward
significance” in one more task (measuring memory). Thus, the study concluded,
“Prevagen demonstrated the ability to improve aspects of cognitive function in
older participants with either normal cognitive aging or very mild impairment,
as determined by AD8 screening.”
The FTC alleged that “the researchers conducted more than
30 post hoc analyses of the results looking at data broken down by several
variations of smaller subgroups for each of the nine computerized cognitive
tasks,” and that post hoc subgroup analysis “greatly increases the probability
that the statistically significant improvements shown are by chance alone.” As
a result, “the few positive findings on isolated tasks for small subgroups of
the study population do not provide reliable evidence of a treatment
effect.” Further, plaintiffs alleged
that Quincy’s theory was that apoaequorin enters the human brain to supplement
endogenous proteins that are lost during the natural process of aging, but
there are no studies showing that orally-administered apoaequorin can cross the
human blood-brain barrier. Instead, Quincy’s studies allegedly show that
orally-administered apoaequorin is rapidly digested in the stomach and broken
down into amino acids and small peptides like any other dietary protein.
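To put a rough number on the FTC’s multiple-comparisons point: if each of roughly 30 subgroup analyses is tested at the conventional .05 level, the chance of at least one spurious “significant” result is nowhere near 5%. Here’s a back-of-the-envelope calculation; treating the tests as independent is an illustrative simplification, not the study’s actual structure (the tasks and subgroups overlap and are surely correlated):

```python
# Family-wise error rate for m independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^m
# Independence here is an illustrative simplification.
alpha = 0.05
for m in (1, 9, 20, 30):
    print(f"{m:2d} tests: P(>=1 false positive) = {1 - (1 - alpha) ** m:.2f}")
# 1 test: 0.05 | 9 tests: 0.37 | 20 tests: 0.64 | 30 tests: 0.79
```

On that simplified model, a handful of “significant” results among 30-odd looks is close to the expected outcome even if the product does nothing.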
In a footnote, the court said that these studies were “contradicted
by canine studies whose relevance plaintiffs challenge,” and also that the
FTC’s argument “loses force when applied to the results of the subgroup study
which make it clear that something
caused a statistically significant difference between those subjects who took
Prevagen and those given a placebo” (emphasis added).
And here, in the footnote, we have the core of the problem:
that “something” causing the statistically significant difference is, at a
minimum, plausibly random error. When you analyze 20 different subgroups and
one of them shows a statistically significant difference at the .05
significance level, that is exactly what you would expect when the hypothesis
that there is no effect is true: under that null hypothesis, a test at the .05
level will come up “significant” by chance 1 time in 20, and match the
underlying truth the other 19. That’s literally (numerically) what a .05
threshold means. And it’s also part of why post hoc subgrouping is so risky
and potentially misleading: once you slice and dice, you have shrunk each
sample (making the estimates noisier) and multiplied the number of tests
you’re running (making it far more likely that at least one comes up positive
by chance). If the only evidence you had were from the subgroup, then yes, the
results would support the hypothesis of efficacy, but you can’t ignore that
you also have the evidence from the other subgroups. Moreover, a related
reason post hoc subgrouping is dangerous is precisely that it’s post hoc: you
had no preexisting reason to suspect a difference in how the subgroups would
react to the test substance. Occam’s Razor works well here: the simplest and
most plausible explanation is that the subgrouped results, which aren’t even
for the same cognitive tasks across groups (making a posited mechanism other
than random error even harder to come up with), are positive as a result of
random error. This is why lawyers desperately need statistics classes. A quick
simulation, below, makes the point concrete.
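Here’s a minimal sketch, emphatically not the Madison Memory Study’s actual data or analysis: the cell structure below (four subgroups, nine tasks, 25 subjects per cell) is hypothetical and chosen only to echo the study’s general shape, and the “supplement” has zero effect by construction.

```python
# Hypothetical simulation: a treatment with NO real effect, analyzed
# separately for each (subgroup, task) cell, as in post hoc subgrouping.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_subgroups, n_tasks, n_per_cell, alpha = 4, 9, 25, 0.05

hits = []
for subgroup in range(n_subgroups):
    for task in range(n_tasks):
        treatment = rng.normal(0, 1, n_per_cell)  # no true effect:
        placebo = rng.normal(0, 1, n_per_cell)    # same distribution
        _, p = ttest_ind(treatment, placebo)
        if p < alpha:
            hits.append((subgroup, task))

print(f"'Significant' cells out of {n_subgroups * n_tasks}: {len(hits)}")
# Expect roughly 36 * 0.05, i.e. about 2 purely spurious positives per run.
```

Run it a few times with different seeds and some cell somewhere will reliably come up “significant,” but which subgroup and which task varies from run to run, much as the “significant” tasks differed between the AD8 0-1 and AD8 0-2 subgroups.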
Xkcd has this on lock, as usual with math stuff: see “Significant” (xkcd #882), the green jelly bean comic.
Despite this, the court found that the FTC didn’t plausibly
allege that the representations at issue were false or unsubstantiated, given
that “the Madison Memory Study followed normal well-accepted procedures,
conducted a ‘gold standard’ double blind, placebo controlled human clinical
study using objective outcome measures of human cognitive function using 218
subjects.” The parties agreed that it
failed to show a statistically significant improvement in the experimental
group over the placebo group as a whole. The court said “[t]hat confined
plaintiffs’ attack to the studies of subgroups,” but that’s an odd way to
frame it: the best evidence we have indicates that the claims Quincy actually
made, which were phrased generally in the advertising and not directed only at
people with low AD8 scores, are untrue. The best evidence that Quincy’s claims
are substantiated comes from the subgrouping, but at a minimum the claims as
made would then be improperly unqualified, and the better evidence remains the
study as a whole. Setting that aside, the court ruled that,
as to the subgroups, “the complaint fails to do more than point to possible
sources of error but cannot allege that any actual errors occurred.”
The court thought that the FTC’s post hoc argument was
merely theoretical. “They say that
findings based on post hoc exploratory analyses have an increased risk of false
positives, and increased probability of results altered by chance alone, but
neither explain the nature of such risks nor show that they affected the
subgroups performance in any way or registered any false positives.” When I was in practice, we had a case where
we ended up having a math professor testify. He concededly had absolutely no
expertise in trademark law, or surveys, or drug errors, but he was really
helpful in explaining statistics, and why a supposedly positive result from a
“confusion analysis” didn’t mean that confusion was likely. (Bayes’ theorem, so useful.) Given the substantiation standard and the
other evidence from the study as a whole and the evidence about the blood/brain
barrier, it is at least plausible that the positive results were false positives.
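This is where Bayes’ theorem earns its keep. As a minimal sketch with assumed, illustrative numbers (the prior and power below are invented for the example, not drawn from the case or the study):

```python
# Bayes' theorem: P(real effect | significant result).
# All three inputs are assumptions for illustration only.
prior = 0.10   # assumed prior probability that the product works at all
power = 0.80   # assumed probability a real effect would be detected
alpha = 0.05   # false positive rate for a single test under the null

p_significant = prior * power + (1 - prior) * alpha
posterior = (prior * power) / p_significant
print(f"P(real effect | significant result) = {posterior:.2f}")  # 0.64
```

Even on these generous assumptions, a single “significant” result leaves roughly a one-in-three chance of a false positive, and that figure assumes one pre-specified test, not dozens of post hoc looks, which drive the effective false positive rate, and thus the room for doubt, much higher.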
Perhaps that could be refuted by replicating the study and seeing whether the
same subgroups and tasks show up as significant, as a start. The court thought there was no “reason to
suspect that these risks are so large in the abstract that they prevent any use
of the subgroup concept, which is widely used in the interpretation of data in
the dietary supplement field.” Even if
that’s true, (a) what is the court doing deciding this on a motion to dismiss?
And (b) post hoc subgrouping is a very different animal. There’s a very good book about this, Richard
Harris’ Rigor Mortis, which I highly recommend to the interested. Still, “[a]ll
that is shown by the complaint is that there are possibilities that the study’s
results do not support its conclusion,” which isn’t enough for plausibility.
The court dismissed the coordinate state law claims to be
renewed (I hope) in state court, if not on appeal.
Comments:

Great post. I think the problem runs deeper than judges needing statistical training, because there are so many ways to monkey with trial data and its presentation, and post hoc subgroup analysis is just one of them. I learn about new ones all the time from my med school colleagues. Fundamentally, I think this implies that you need some kind of deference to agencies where they are addressing these kinds of sophisticated questions that demand serious scientific knowledge...

Ugh, infuriating!

But what about the many cases with no agency involvement, where the key questions are statistical in nature?

Same judge as in Steinberg v. Columbia Pictures.