The Efron-Morris Rule: A Practical Application of the Empirical Bayes Paradigm

Written by lossfunctions | Published 2025/09/09
Tech Story Tags: loss-function | empirical-bayes | shrinkage-estimators | james-stein-estimators | bayesian-statistics | efron-morris-rule | multivariate-gaussian | econometrics

TL;DR: This article explores variations on the James-Stein estimator, focusing on the Efron-Morris rule and its implications for shrinkage in statistical analysis.

Table of Links

Abstract and 1. Introduction

  2. The Compound Decision Paradigm
  3. Parametric Priors
  4. Nonparametric Prior Estimation
  5. Empirical Bayes Methods for Discrete Data
  6. Empirical Bayes Methods for Panel Data
  7. Conclusion

Appendix A. Tweedie’s Formula

Appendix B. Predictive Distribution Comparison

References

3. Parametric Priors

The hyperparameters, v0 and s0, can be estimated by maximum likelihood. The null hypotheses H0 : µi = 0 have p-values,

Another prominent option for the penalty P is to treat the coordinates of β as if they were drawn i.i.d. from the Cauchy distribution, as considered by Johnstone and Silverman (2004) and Castillo and van der Vaart (2012). Although such priors are generally structured to shrink coefficients toward zero, this is typically rationalized by some form of prior standardization of the design matrix. The empirical aspect of these procedures is generally restricted to the choice of the tuning parameter λ representing the scale of the prior density. However, more flexibility can be achieved by permitting larger parametric families; for example, Azevedo et al. (2020) employ a class of Student-t priors for large-scale A/B testing settings, with the location, scale, and degrees of freedom of the prior estimated by maximum likelihood.
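As a sketch of how such marginal maximum likelihood estimation of a Student-t prior's hyperparameters might look (simulated data and variable names are my own, not the procedure of Azevedo et al.): with θi drawn from a Student-t prior and yi = θi + standard Gaussian noise, the marginal density of y is a convolution that can be approximated on a grid, and the location, scale, and degrees of freedom maximized numerically.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
# Simulated example: theta_i ~ Student-t(df=3, loc=0, scale=1.5), y_i = theta_i + N(0, 1) noise
theta = stats.t.rvs(df=3, loc=0.0, scale=1.5, size=200, random_state=rng)
y = theta + rng.standard_normal(200)

def neg_marginal_loglik(params, y):
    """Negative log marginal likelihood: f(y_i) = integral of N(y_i - th; 0, 1) * t(th) dth."""
    loc, log_scale, log_df = params
    scale, df = np.exp(log_scale), np.exp(log_df)
    th = np.linspace(y.min() - 10.0, y.max() + 10.0, 2000)   # quadrature grid for theta
    prior = stats.t.pdf(th, df, loc, scale)
    # Approximate each convolution integral by the trapezoid rule over the grid
    f = np.trapz(stats.norm.pdf(y[:, None] - th[None, :]) * prior, th, axis=1)
    return -np.log(np.maximum(f, 1e-300)).sum()

# Maximize over location, log-scale, and log-degrees-of-freedom
res = optimize.minimize(neg_marginal_loglik, x0=[0.0, 0.0, np.log(5.0)],
                        args=(y,), method="Nelder-Mead")
loc_hat, scale_hat, df_hat = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])
```

Parameterizing scale and degrees of freedom on the log scale keeps the unconstrained optimizer from wandering into invalid (negative) values.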

Returning to the simple Gaussian sequence model with scalar parameters θi, in Figure 2 we contrast several forms of shrinkage: linear shrinkage with the classical Stein rule; the lasso procedure, which shrinks moderately when y is near zero; and the Cauchy penalty, which shrinks very aggressively near zero while leaving large departures from zero nearly untouched. It should be stressed that tuning the location and scale of these penalties offers some flexibility, but the selection of a functional form for such parametric priors involves a leap of Bayesian faith that may trouble some researchers.

Thus far we have focused entirely on settings in which our base model, φ(y|θ), is Gaussian. Parametric mixture priors play an important role in many other corners of statistics: Poisson models are often paired with gamma mixing, and the modern literature on survival analysis is permeated by parametric models of “frailty.” As anticipated by the pioneering critique of Heckman and Singer (1984), choosing a specific parametric model for frailty can be difficult, so it is natural to turn to nonparametric methods for guidance.
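To make the Poisson-gamma pairing concrete, here is a minimal empirical Bayes sketch on simulated data (names and tuning are my own): with λi ~ Gamma(a, b) and yi | λi ~ Poisson(λi), the marginal distribution of y is negative binomial, so (a, b) can be estimated by marginal maximum likelihood, after which the posterior means (yi + â)/(1 + b̂) shrink the raw counts toward the prior mean â/b̂.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
# Simulate: lambda_i ~ Gamma(shape=a, rate=b), y_i | lambda_i ~ Poisson(lambda_i)
a_true, b_true = 2.0, 0.5
lam = rng.gamma(shape=a_true, scale=1 / b_true, size=2000)
y = rng.poisson(lam)

def neg_marginal_loglik(params, y):
    """Poisson-Gamma marginal is negative binomial with n = a, p = b / (1 + b)."""
    log_a, log_b = params
    a, b = np.exp(log_a), np.exp(log_b)
    return -stats.nbinom.logpmf(y, a, b / (1 + b)).sum()

res = optimize.minimize(neg_marginal_loglik, x0=[0.0, 0.0], args=(y,))
a_hat, b_hat = np.exp(res.x)

# Empirical Bayes posterior means: counts shrunk toward the prior mean a_hat / b_hat
posterior_mean = (y + a_hat) / (1 + b_hat)
```

The closed-form conjugate update is what makes this pairing so convenient; the Heckman-Singer critique applies when one has no such principled reason to believe the gamma form in the first place.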

Authors:

(1) Roger Koenker;

(2) Jiaying Gu.


This paper is available on arxiv under CC BY 4.0 DEED license.


Published by HackerNoon on 2025/09/09