Authors:
(1) Mohamed A. Abba, Department of Statistics, North Carolina State University;
(2) Brian J. Reich, Department of Statistics, North Carolina State University;
(3) Reetam Majumder, Southeast Climate Adaptation Science Center, North Carolina State University;
(4) Brandon Feng, Department of Statistics, North Carolina State University.
Table of Links
1.1 Methods to handle large spatial datasets
1.2 Review of stochastic gradient methods
2 Matern Gaussian Process Model and its Approximations
3 The SG-MCMC Algorithm and 3.1 SG Langevin Dynamics
3.2 Derivation of gradients and Fisher information for SGRLD
4 Simulation Study and 4.1 Data generation
4.2 Competing methods and metrics
5 Analysis of Global Ocean Temperature Data
6 Discussion, Acknowledgements, and References
Appendix A.1: Computational Details
Appendix A.2: Additional Results
4.3 Results
Table 1 gives the MSE results. Our SGRLD method outperforms all the others, with very low MSE across parameters. In particular, the SGMCMC methods all outperform the NNGP method. In our experiments, we noticed that the NNGP method suffers from very slow mixing due to the Metropolis-Hastings (M-H) step needed to sample the covariance parameters. In fact, even if we start the NNGP sampler at the true values of the covariance parameters and reduce the variance of the proposal distribution, the acceptance rate of the M-H step stays below 15%. None of the SGMCMC methods requires such a step, as long as the learning rate is kept small.
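To illustrate why a small learning rate removes the need for an M-H correction, the following is a minimal sketch of a stochastic gradient Langevin dynamics (SGLD) update on a toy one-parameter Gaussian model, not the Matérn GP of the paper; the data-generating values and step size are illustrative assumptions. The minibatch gradient is rescaled by n over the batch size so it is unbiased, and the injected noise variance is matched to the step size, so each iteration is a plain gradient-plus-noise update with no accept/reject step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y_i ~ N(theta, 1) with flat prior on theta,
# so the posterior mean equals the sample mean of y.
y = rng.normal(2.0, 1.0, size=10_000)
n = y.size

def stoch_grad(theta, batch):
    # Unbiased stochastic gradient of the log-posterior:
    # scale the minibatch gradient up by n / |batch|.
    return (n / batch.size) * np.sum(batch - theta)

theta, eps, m = 0.0, 1e-4, 100   # illustrative step size and batch size
samples = []
for _ in range(5_000):
    batch = rng.choice(y, size=m, replace=False)
    # SGLD update: half-step gradient plus N(0, eps) noise,
    # with no Metropolis-Hastings accept/reject step.
    theta += 0.5 * eps * stoch_grad(theta, batch) + np.sqrt(eps) * rng.normal()
    samples.append(theta)

post_mean = float(np.mean(samples[1_000:]))   # discard burn-in
```

With a small step size the discretization bias is modest, which is why the SGMCMC samplers above can skip the M-H correction that throttles the NNGP chain.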
For the ESS results in Table 3, the SGRLD method offers superior effective samples per unit time for all the parameters. The pSGLD and MSGLD methods seem to adapt to the curvature of the variance parameter, with pSGLD yielding more effective samples than SGRLD for that parameter. This suggests that the preconditioner computed by pSGLD adapts mainly to the curvature of the variance term but fails to capture the curvature of the smoothness and range parameters. Similar behavior is observed for MSGLD and ADAMSGLD. The ESS for SGRLD, on the other hand, is of the same order for all the parameters. We believe this indicates that using the Fisher information matrix as a Riemannian metric provides an accurate measure of the curvature and results in higher effective sample sizes for all the parameters. The NNGP method provides low effective sample sizes compared to the other methods due to the low acceptance rate of its M-H correction step.
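To make the role of the Fisher information concrete, here is a minimal sketch of a Riemannian-preconditioned Langevin (SGRLD-style) update on a toy model, estimating the variance of zero-mean Gaussian data; the model, true value, step size, and batch size are all illustrative assumptions, not the paper's Matérn GP setup. The inverse Fisher information rescales both the gradient and the injected noise, and the derivative of the inverse metric supplies the usual drift correction, so the step size is automatically adapted to the local curvature of the parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y_i ~ N(0, sigma2) with true sigma2 = 4.0.
y = rng.normal(0.0, 2.0, size=20_000)
n = y.size

def stoch_grad(s2, batch):
    # Unbiased minibatch gradient of the log-likelihood wrt sigma2.
    g = -1.0 / (2 * s2) + batch**2 / (2 * s2**2)
    return (n / batch.size) * np.sum(g)

s2, eps, m = 1.0, 5e-3, 200   # illustrative step size and batch size
samples = []
for _ in range(8_000):
    batch = rng.choice(y, size=m, replace=False)
    G_inv = 2 * s2**2 / n      # inverse Fisher information for sigma2
    drift = 4 * s2 / n         # d(G_inv)/d(sigma2): metric correction term
    # Riemannian Langevin step: preconditioned gradient, drift
    # correction, and noise scaled by the inverse metric.
    s2 += 0.5 * eps * (G_inv * stoch_grad(s2, batch) + drift) \
          + np.sqrt(eps * G_inv) * rng.normal()
    samples.append(s2)

post_mean = float(np.mean(samples[3_000:]))   # discard burn-in
```

Because the metric here is the exact Fisher information for this one-parameter model, the chain takes comparably efficient steps regardless of the scale of the parameter, which is the behavior the ESS comparison above attributes to SGRLD.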
Given the performance of the SG-based methods in this simulation study, especially SGRLD, we conducted an additional simulation study focused on point estimates rather than fully Bayesian inference. In Appendix A.2, we modify the SGRLD method into an SG Fisher scoring (SGFS) algorithm for point estimation. We compare this method to the full-data Fisher scoring method (Guinness, 2019) already implemented in the GpGp R package (Guinness et al., 2018), and find improved speed and estimation precision compared to the GpGp package.
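Dropping the injected noise from an SGRLD-style update and using Robbins-Monro step sizes turns it into a stochastic gradient Fisher scoring iteration for point estimates. The sketch below shows this on the same toy variance-estimation model as an illustration; it is an assumed simplification, not the SGFS algorithm of Appendix A.2 or the GpGp implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: y_i ~ N(0, sigma2) with true sigma2 = 4.0.
y = rng.normal(0.0, 2.0, size=20_000)
n = y.size

s2, m = 1.0, 500   # illustrative starting value and batch size
for t in range(1, 2_001):
    batch = rng.choice(y, size=m, replace=False)
    # Unbiased minibatch gradient of the log-likelihood wrt sigma2.
    grad = (n / m) * np.sum(-1.0 / (2 * s2) + batch**2 / (2 * s2**2))
    fisher = n / (2 * s2**2)   # Fisher information for sigma2
    # Fisher scoring step with Robbins-Monro learning rate 1/t:
    # no noise injection, so the iterates converge to a point estimate.
    s2 += (1.0 / t) * grad / fisher

s2 = float(s2)
```

For this model the preconditioned gradient reduces to the minibatch second moment minus the current estimate, so the 1/t schedule makes the iterate a running average that settles at the maximum likelihood estimate.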
This paper is available on arxiv under CC BY 4.0 DEED license.