Authors:
(1) Mohamed A. Abba, Department of Statistics, North Carolina State University;
(2) Brian J. Reich, Department of Statistics, North Carolina State University;
(3) Reetam Majumder, Southeast Climate Adaptation Science Center, North Carolina State University;
(4) Brandon Feng, Department of Statistics, North Carolina State University.
Table of Links
1.1 Methods to handle large spatial datasets
1.2 Review of stochastic gradient methods
2 Matérn Gaussian Process Model and its Approximations
3 The SG-MCMC Algorithm and 3.1 SG Langevin Dynamics
3.2 Derivation of gradients and Fisher information for SGRLD
4 Simulation Study and 4.1 Data generation
4.2 Competing methods and metrics
5 Analysis of Global Ocean Temperature Data
6 Discussion, Acknowledgements, and References
Appendix A.1: Computational Details
Appendix A.2: Additional Results
6 Discussion
SG methods offer considerable speed-ups when the dataset is very large: in the time required for a single full-data gradient step, one can take the hundreds or even thousands of minibatch steps that make up a full pass through the data, enabling much faster exploration of the posterior. GPs, however, fall within the correlated-data setting, where SG-MCMC methods have received limited attention. Spatial correlation is a critical component of GPs, and naive subsampling during parameter estimation amounts to randomly partitioning the spatial domain at each iteration; without the Vecchia approximation, such subsampling strategies would yield biased gradient estimates. Because the Vecchia approximation factors the likelihood into a product of univariate conditional densities, the log-likelihood becomes a sum over observations, and we leverage this form to derive unbiased gradient estimates based on minibatches of the data. Building on these estimates, we developed a new stochastic gradient MCMC algorithm for scalable Bayesian inference in large spatial data settings. The proposed method also uses the exact Fisher information to speed up convergence and explore the parameter space efficiently. Our work contributes to the literature on scalable methods for Gaussian processes, and can be extended to non-Gaussian models, e.g., classification.
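To make the unbiasedness argument concrete, below is a minimal sketch, not the authors' implementation: it assumes an exponential covariance (the Matérn kernel with smoothness 1/2) in place of the paper's general Matérn model, and the helper names (exp_cov, conditional_loglik, minibatch_vecchia_loglik) are hypothetical. The point it illustrates is that the Vecchia log-likelihood is a sum of per-observation conditional log-densities, so rescaling a uniformly sampled minibatch of terms by n/|B| gives an unbiased estimate of the full sum; the same rescaling of the sampled terms' gradients gives the unbiased stochastic gradient used in the SG-MCMC updates.

```python
import numpy as np

def exp_cov(A, B, sigma2, rho):
    """Exponential covariance (Matern, smoothness 1/2) between two sets of locations."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def conditional_loglik(i, y, locs, neighbors, sigma2, rho, tau2):
    """One Vecchia term: log p(y_i | y_{N(i)}) with nugget variance tau2."""
    nb = neighbors[i]
    v_marg = sigma2 + tau2                      # marginal variance of y_i
    if len(nb) == 0:                            # first point conditions on nothing
        mu, v = 0.0, v_marg
    else:
        C_nn = exp_cov(locs[nb], locs[nb], sigma2, rho) + tau2 * np.eye(len(nb))
        c_in = exp_cov(locs[[i]], locs[nb], sigma2, rho).ravel()
        w = np.linalg.solve(C_nn, c_in)         # kriging weights
        mu = w @ y[nb]                          # conditional mean
        v = v_marg - w @ c_in                   # conditional variance
    return -0.5 * (np.log(2.0 * np.pi * v) + (y[i] - mu) ** 2 / v)

def minibatch_vecchia_loglik(y, locs, neighbors, theta, batch):
    """Unbiased minibatch estimate: (n / |B|) * sum of the sampled conditional
    terms. Gradients of the sampled terms, rescaled the same way, give the
    unbiased stochastic gradient for SG-MCMC."""
    sigma2, rho, tau2 = theta
    s = sum(conditional_loglik(i, y, locs, neighbors, sigma2, rho, tau2)
            for i in batch)
    return len(y) / len(batch) * s

# Toy usage: neighbors are the (up to) m nearest points among those earlier
# in the ordering, as in nearest-neighbor Vecchia approximations.
rng = np.random.default_rng(0)
n, m = 500, 10
locs = rng.uniform(size=(n, 2))
y = rng.normal(size=n)                          # placeholder data, not a GP draw
neighbors = [np.argsort(np.linalg.norm(locs[:i] - locs[i], axis=1))[:m]
             for i in range(n)]
batch = rng.choice(n, size=50, replace=False)
theta = (1.0, 0.2, 0.1)                         # (sigma2, rho, tau2)
print(minibatch_vecchia_loglik(y, locs, neighbors, theta, batch))
```

Averaging this estimator over many random minibatches recovers the full Vecchia log-likelihood, which gives a simple numerical check of unbiasedness under these assumptions.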
Acknowledgements
This research was partially supported by National Science Foundation grants DMS2152887 and CMMT2022254, and by grants from the Southeast National Synthesis Wildfire and the United States Geological Survey's National Climate Adaptation Science Center (G21AC10045).
References
Aicher, C., Ma, Y.-A., Foti, N. J. and Fox, E. B. (2019) Stochastic gradient MCMC for state space models. SIAM Journal on Mathematics of Data Science, 1, 555–587.
Aicher, C., Putcha, S., Nemeth, C., Fearnhead, P. and Fox, E. B. (2021) Stochastic gradient MCMC for nonlinear state space models. arXiv preprint arXiv:1901.10568.
Argo (2023) Argo Program Office. https://argo.ucsd.edu/. Accessed: 2023-11-26.
Baker, J., Fearnhead, P., Fox, E. B. and Nemeth, C. (2019) Control variates for stochastic gradient MCMC. Statistics and Computing, 29, 599–615.
Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2008) Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 825–848.
Barbian, M. H. and Assunção, R. M. (2017) Spatial subensemble estimator for large geostatistical data. Spatial Statistics, 22, 68–88.
Chee, J. and Toulis, P. (2018) Convergence diagnostics for stochastic gradient descent with constant learning rate. In International Conference on Artificial Intelligence and Statistics, 1476–1485. PMLR.
Chen, C., Ding, N. and Carin, L. (2015) On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In Neural Information Processing Systems. URL https://api.semanticscholar.org/CorpusID:2196919.
Chen, H., Zheng, L., Al Kontar, R. and Raskutti, G. (2020) Stochastic gradient descent in correlated settings: A study on Gaussian processes. Advances in Neural Information Processing Systems, 33, 2722–2733.
Chen, T., Fox, E. and Guestrin, C. (2014) Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, 1683–1691. PMLR.
Cressie, N. (1988) Spatial prediction and ordinary kriging. Mathematical Geology, 20, 405–421.
Cressie, N. and Johannesson, G. (2008) Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 209–226.
Dalalyan, A. S. and Karagulyan, A. (2019) User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129, 5278–5311.
Datta, A., Banerjee, S., Finley, A. O. and Gelfand, A. E. (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812.
Dubey, K. A., J Reddi, S., Williamson, S. A., Poczos, B., Smola, A. J. and Xing, E. P. (2016) Variance reduction in stochastic gradient Langevin dynamics. Advances in Neural Information Processing Systems, 29.
Durmus, A. and Moulines, É. (2017) Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27, 1551–1587. URL https://doi.org/10.1214/16-AAP1238.
Finley, A. O., Datta, A., Cook, B. D., Morton, D. C., Andersen, H. E. and Banerjee, S. (2019) Efficient algorithms for Bayesian nearest neighbor Gaussian processes. Journal of Computational and Graphical Statistics, 28, 401–414.
Furrer, R., Genton, M. G. and Nychka, D. (2006) Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15, 502–523.
Gelfand, A. E. and Schliep, E. M. (2016) Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics, 18, 86–104. URL https://www.sciencedirect.com/science/article/pii/S2211675316300033. Spatial Statistics Avignon: Emerging Patterns.
Guhaniyogi, R. and Banerjee, S. (2018) Meta-kriging: Scalable Bayesian modeling and inference for massive spatial datasets. Technometrics.
Guinness, J. (2018) Permutation and grouping methods for sharpening Gaussian process approximations. Technometrics, 60, 415–429. URL https://doi.org/10.1080/00401706.2018.1437476. PMID: 31447491.
— (2019) Gaussian process learning via Fisher scoring of Vecchia's approximation.
Guinness, J., Katzfuss, M. and Fahmy, Y. (2018) GpGp: Fast Gaussian process computation using Vecchia's approximation. R package version 0.1.0.
Hardt, M., Recht, B. and Singer, Y. (2016) Train faster, generalize better: Stability of stochastic gradient descent. In International Conference on Machine Learning, 1225–1234. PMLR.
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M. et al. (2019) A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics, 24, 398–425.
Heaton, M. J. and Johnson, J. A. (2023) Minibatch Markov chain Monte Carlo algorithms for fitting Gaussian processes. arXiv preprint arXiv:2310.17766.
Heidelberger, P. and Welch, P. D. (1981) A spectral method for confidence interval generation and run length control in simulations. Communications of the ACM, 24, 233–245.
Hinton, G., Srivastava, N. and Swersky, K. (2012) Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent.
Kang, E., Liu, D. and Cressie, N. (2009) Statistical analysis of small-area data based on independence, spatial, non-hierarchical, and hierarchical models. Computational Statistics & Data Analysis, 53, 3016–3032.
Katzfuss, M. and Cressie, N. (2011) Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. Journal of Time Series Analysis, 32, 430–446.
Katzfuss, M. and Guinness, J. (2021) A general framework for Vecchia approximations of Gaussian processes. Statistical Science, 36, 124–141.
Kaufman, C. G., Schervish, M. J. and Nychka, D. W. (2008) Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103, 1545–1555.
Kim, S., Song, Q. and Liang, F. (2022) Stochastic gradient Langevin dynamics with adaptive drifts. Journal of Statistical Computation and Simulation, 92, 318–336.
Kingma, D. P. and Ba, J. (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, C., Chen, C., Carlson, D. and Carin, L. (2016a) Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30.
Li, W., Ahn, S. and Welling, M. (2016b) Scalable MCMC for mixed membership stochastic blockmodels. In Artificial Intelligence and Statistics, 723–731. PMLR.
Ma, Y., Ma, Y.-A., Chen, T. and Fox, E. B. (2015) A complete recipe for stochastic gradient MCMC. In Neural Information Processing Systems. URL https://api.semanticscholar.org/CorpusID:17950949.
Ma, Y.-A., Foti, N. J. and Fox, E. B. (2017) Stochastic gradient MCMC methods for hidden Markov models. In International Conference on Machine Learning, 2265–2274. PMLR.
Mardia, K. V. and Marshall, R. J. (1984) Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71, 135–146. URL http://www.jstor.org/stable/2336405.
Nemeth, C. and Fearnhead, P. (2021) Stochastic gradient Markov chain Monte Carlo. Journal of the American Statistical Association, 116, 433–450. URL https://doi.org/10.1080/01621459.2020.1847120.
Newton, D., Yousefian, F. and Pasupathy, R. (2018) Stochastic gradient descent: Recent trends. Recent Advances in Optimization and Modeling of Contemporary Problems, 193–220.
Robbins, H. and Monro, S. (1951) A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407. URL https://doi.org/10.1214/aoms/1177729586.
Roberts, G. O. and Rosenthal, J. S. (1998) Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60. URL https://api.semanticscholar.org/CorpusID:5831882.
Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 319–392.
Saha, S. and Bradley, J. R. (2023) Incorporating subsampling into Bayesian models for high-dimensional spatial data. arXiv preprint arXiv:2305.13221.
Sang, H., Jun, M. and Huang, J. Z. (2011) Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors. The Annals of Applied Statistics, 2519–2548.
Stein, M. L. (1999) Interpolation of spatial data: some theory for kriging. Springer Science & Business Media.
— (2002) The screening effect in kriging. The Annals of Statistics, 30, 298–323. URL https://doi.org/10.1214/aos/1015362194.
— (2011) 2010 Rietz lecture: When does the screening effect hold? The Annals of Statistics, 39, 2795–2819. URL http://www.jstor.org/stable/41713599.
Stein, M. L., Chi, Z. and Welty, L. J. (2004) Approximating likelihoods for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 275–296.
Teh, Y., Thiéry, A. and Vollmer, S. (2016) Consistency and fluctuations for stochastic gradient Langevin dynamics. Journal of Machine Learning Research, 17.
Vecchia, A. V. (1988) Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society: Series B (Methodological), 50, 297–312.
Welling, M. and Teh, Y. W. (2011) Bayesian learning via stochastic gradient Langevin dynamics. In International Conference on Machine Learning. URL https://api.semanticscholar.org/CorpusID:2178983.
Woodard, R. (2000) Interpolation of spatial data: some theory for kriging. Technometrics, 42, 436.
This paper is available on arXiv under CC BY 4.0 DEED license.