This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.
Authors:
(1) U Jin Choi, Department of mathematical science, Korea Advanced Institute of Science and Technology & [email protected];
(2) Kyung Soo Rim, Department of mathematics, Sogang University & [email protected].
In this mathematical study, we introduce a novel approach to variational non-Bayesian inference. Most significantly, we propose a new method that uniquely determines the hidden PDF from a random sample alone, relying on mathematical theory rather than on prior information about the population.
Beyond approximating the shape of the unknown PDF, our results give practical help in inferring important moments, such as the mean and variance, from the estimated PDF.
Methods based on artificial intelligence are now widely used, but they generally depend on initial conditions, i.e., a prior distribution, and on iterative computations, i.e., backpropagation. Their ability to reach a global optimum is limited by this dependence. The proposed method can contribute to predictive and classification models even when no information on the population is available, with potential applications in domains such as finance, economics, weather forecasting, and machine learning, each of which presents its own challenges.
The best-known method for approximating hidden PDFs is the variational approach in a Bayesian context. In contrast, our study extends this problem by seeking to determine unknown PDFs precisely through a system of equations. We prove the Fréchet differentiability of the entropy in order to establish uniqueness of the energy function in the Wiener algebra, and we then show that the energy function is uniquely determined by minimizing the Kullback-Leibler (KL) divergence.
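As a reminder of why minimizing the KL divergence can pin down an energy function, the following is a standard calculation and not a reproduction of the derivation in the paper. The symbols $q$ (the unknown PDF on a domain $\Omega$), $E$ (the energy function), $Z(E)$ (the normalizing constant), and $\varphi$ (a perturbation direction) are notation assumed here for illustration.

```latex
\begin{align}
p_E(x) &= \frac{e^{-E(x)}}{Z(E)}, \qquad Z(E) = \int_\Omega e^{-E(x)}\,dx, \\
D_{\mathrm{KL}}(q \,\|\, p_E) &= \int_\Omega q \log q \,dx + \int_\Omega q\,E \,dx + \log Z(E), \\
\left.\frac{d}{dt}\, D_{\mathrm{KL}}(q \,\|\, p_{E+t\varphi})\right|_{t=0}
  &= \int_\Omega q\,\varphi \,dx - \int_\Omega p_E\,\varphi \,dx
   = \mathbb{E}_q[\varphi] - \mathbb{E}_{p_E}[\varphi].
\end{align}
```

The derivative vanishes in every direction $\varphi$ exactly when $q$ and $p_E$ assign the same expectation to every $\varphi$, and since $\log Z(E)$ is convex in $E$, a minimizer is unique up to an additive constant in $E$, which the normalization absorbs.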
Using the ergodic theorem, we show that the coefficients of the energy function are the solutions of a system of equations built from series of polynomial functions, and we confirm numerically that the partial sums of the energy function obtained from a finite number of equations converge.
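To make the idea of recovering energy coefficients from a finite system of equations concrete, here is a minimal Python sketch. It is not the system derived in this paper: it swaps in a score-matching (integration-by-parts) identity, $\mathbb{E}[f'(X)] = \mathbb{E}[f(X)E'(X)]$ for periodic $f$, with a truncated trigonometric energy on a $2\pi$-periodic domain, and replaces expectations by sample averages. The function names, the von Mises test distribution, the truncation order `K`, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np


def fit_energy_coefficients(samples, K):
    """Solve G @ theta = h for E(x) = sum_k a_k cos(kx) + b_k sin(kx).

    The density is assumed proportional to exp(-E(x)) on a 2*pi-periodic
    domain; theta = (a_1..a_K, b_1..b_K). Illustrative, not the paper's system.
    """
    x = np.asarray(samples, dtype=float)[:, None]   # shape (n, 1)
    k = np.arange(1, K + 1)[None, :]                # shape (1, K)
    # First and second derivatives of the basis functions cos(kx), sin(kx).
    d1 = np.hstack([-k * np.sin(k * x), k * np.cos(k * x)])
    d2 = np.hstack([-k**2 * np.cos(k * x), -k**2 * np.sin(k * x)])
    # Sample averages stand in for expectations (law of large numbers).
    G = d1.T @ d1 / x.shape[0]
    h = d2.mean(axis=0)
    theta = np.linalg.solve(G, h)
    return theta[:K], theta[K:]


def density_on_grid(a, b, grid):
    """Evaluate exp(-E) on a uniform grid and normalize by a Riemann sum."""
    k = np.arange(1, len(a) + 1)[None, :]
    g = grid[:, None]
    energy = np.cos(k * g) @ a + np.sin(k * g) @ b
    w = np.exp(-energy)
    dx = grid[1] - grid[0]
    return w / (w.sum() * dx)


# Hypothetical test case: a von Mises sample, whose true energy is
# -kappa*cos(x - mu), i.e. a_1 = -kappa*cos(mu), b_1 = -kappa*sin(mu).
rng = np.random.default_rng(0)
mu, kappa = 0.5, 2.0
samples = rng.vonmises(mu, kappa, size=200_000)

a, b = fit_energy_coefficients(samples, K=2)

# Moments of the estimated PDF on [-pi, pi) by numerical quadrature.
grid = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
p = density_on_grid(a, b, grid)
dx = grid[1] - grid[0]
mean = np.sum(grid * p) * dx
var = np.sum((grid - mean) ** 2 * p) * dx

print("a:", a, "b:", b)
print("mean:", mean, "variance:", var)
```

In this sketch the coefficients come from a single linear solve, with no prior distribution and no iterative training, and the mean and variance are then read off the estimated density. Whether this matches the behavior of the paper's own equations is not claimed here.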
In summary, our mathematical exploration has shown the potential of variational non-Bayesian inference in the Wiener algebra. We anticipate that these ideas can offer an innovative framework for probability density estimation and predictive modeling, and that they can help broaden the scope of statistical methodology.