Factor Zoo and Multiple Testing

Harvey, Liu, and Zhu (2016) document the explosion of proposed factors in asset pricing and argue that standard statistical significance thresholds are inadequate given the extent of multiple testing. Their paper catalogues 316 factors from 313 papers and proposes that new factors need a t-statistic exceeding 3.0 to be credible.

The problem

By 2012, more than 300 factors had been published, each claiming to explain the cross-section of expected returns. The conventional significance threshold of t > 2.0 (roughly p < 0.05, two-sided) is calibrated for a single test. When hundreds of factors are tested, the probability that at least one passes by chance approaches certainty, even if none has real explanatory power. The authors argue that “most claimed research findings in financial economics are likely false.”
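The following is a back-of-the-envelope sketch of the chance-pass arithmetic. It assumes that every test is independent and that no factor has true explanatory power. The paper's own framework allows correlated tests, so this is a simplification. The function names are hypothetical.

```python
def prob_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of true
    null hypotheses rejects at level alpha: 1 - (1 - alpha)^M."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of chance rejections under the same assumptions."""
    return alpha * num_tests


# Show how the family-wide risk grows with the number of tests.
for m in (1, 10, 100, 316):
    print(
        f"{m:>4} tests: P(at least one false positive) = "
        f"{prob_any_false_positive(m):.4f}, "
        f"expected false positives = {expected_false_positives(m):.2f}"
    )
```

With ten tests, the chance of at least one spurious pass is already about 40%. With 316 tests it is indistinguishable from 1, and about 16 spurious passes are expected.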

Multiple testing framework

The paper adapts two approaches from statistics:

Family-wise error rate (FWER): controls the probability of making even one false discovery. The Bonferroni and Holm adjustments target this rate, and both are very conservative.
False discovery rate (FDR): controls the expected proportion of false discoveries among all discoveries. It is less stringent, tolerating some false positives in exchange for more power when many true factors exist. The paper implements it with the Benjamini, Hochberg, and Yekutieli (BHY) adjustment.
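The three adjustments can be sketched on a plain list of p-values. This is a minimal illustration, not the paper's code. It ignores the correlation structure and the missing (unpublished) tests that Harvey, Liu, and Zhu model, and the function names are hypothetical. BHY here is the Benjamini-Yekutieli form of the Benjamini-Hochberg step-up procedure. It divides each threshold by c(M) = 1 + 1/2 + ... + 1/M so that it stays valid under arbitrary dependence.

```python
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject H_i when p_i <= alpha / M."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FWER control, step-down: compare the k-th smallest p-value
    (k = 1..M) with alpha / (M - k + 1) and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        if pvals[i] > alpha / (m - k + 1):
            break
        reject[i] = True
    return reject


def bhy(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FDR control, step-up: find the largest k with
    p_(k) <= k * alpha / (M * c(M)) and reject the k smallest p-values."""
    m = len(pvals)
    c_m = sum(1.0 / j for j in range(1, m + 1))  # dependence correction c(M)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / (m * c_m):
            k_max = k
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values for five candidate factors.
pvals = [0.0001, 0.004, 0.015, 0.03, 0.2]
for name, rule in (("Bonferroni", bonferroni), ("Holm", holm), ("BHY", bhy)):
    print(name, sum(rule(pvals)), "rejections")
```

On these five p-values, Holm rejects one more hypothesis than Bonferroni, because its threshold loosens after each rejection. In a sample this small, BHY's c(M) factor keeps it no more lenient than Bonferroni. The FDR approach pays off when M is large and many true signals push the step-up thresholds k·α/(M·c(M)) upward.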

Both approaches raise the required t-statistic well above 2.0. The paper provides historical cutoffs. In 1967 (one factor tested), t > 2.0 sufficed. By 2012 (316 factors), a new factor needs t > 3.0. The threshold continues to rise because every new test adds to the count.
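The sketch below shows how a cutoff rises with the number of tests. It converts a Bonferroni-adjusted two-sided significance level into a t-statistic cutoff using the normal approximation. It treats all tests as independent and counts only the tests supplied, so it will not reproduce the paper's figures. Those come from a richer model that also accounts for correlation among factors and for unpublished tests. The helper name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_t_cutoff(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided |t| cutoff under a Bonferroni adjustment, using the normal
    approximation: t* = Phi^{-1}(1 - alpha / (2 * M))."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * num_tests))


for m in (1, 10, 100, 316):
    print(f"{m:>4} tests -> |t| > {bonferroni_t_cutoff(m):.2f}")
```

A single test recovers the familiar 1.96. With 316 tests the naive Bonferroni cutoff is roughly 3.8, which shows how quickly a strict FWER bar climbs.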

Factor taxonomy

The 316 factors are classified into:

Common risk factors (113): financial (46), macro (40), microstructure (11), behavioral (3), accounting (8), other (5)
Characteristics (203): financial (61), microstructure (28), behavioral (3), accounting (87), other (24)
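The subcategory counts can be tallied to check them against the group totals and the 316-factor headline figure. This is a trivial sketch, and the dictionary simply transcribes the numbers listed above.

```python
# Subcategory counts as listed in the taxonomy above.
taxonomy = {
    "Common risk factors": {
        "financial": 46, "macro": 40, "microstructure": 11,
        "behavioral": 3, "accounting": 8, "other": 5,
    },
    "Characteristics": {
        "financial": 61, "microstructure": 28, "behavioral": 3,
        "accounting": 87, "other": 24,
    },
}

# Sum each group, then combine the groups into a grand total.
subtotals = {group: sum(counts.values()) for group, counts in taxonomy.items()}
grand_total = sum(subtotals.values())

for group, n in subtotals.items():
    print(f"{group}: {n}")
print(f"All factors: {grand_total}")
```

The tally gives 113 common risk factors and 203 characteristics, for 316 in total. This matches the headline count.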

Implications

Many published anomalies are likely spurious, products of data mining and publication bias
“Nonresults” (factors that fail) rarely get published, biasing the literature toward false positives
New factor discoveries should be evaluated against the full history of prior tests, not in isolation
Out-of-sample validation (McLean and Pontiff 2015 document that anomaly returns weaken after publication) and economic theory provide additional filters

Connection to other work

This paper is the academic anchor for practitioner concerns about data mining in factor investing, discussed in Arnott et al. (2019). Campbell Harvey has continued this research program with further work on factor replication and statistical methods. Hou, Xue, and Zhang (2015) independently echo the concern, finding that about half of the 80 anomalies they test are insignificant in the broad cross-section.

Sources

Harvey, Liu, and Zhu (2016), “…and the Cross-Section of Expected Returns,” Review of Financial Studies 29(1): 5–68.

Factor Investing Wiki
