Table of Links
- Some recent trends in theoretical ML
2.1 Deep Learning via continuous-time controlled dynamical system
2.2 Probabilistic modeling and inference in DL
3.1 Kuramoto models from the geometric point of view
3.2 Hyperbolic geometry of Kuramoto ensembles
3.3 Kuramoto models with several globally coupled sub-ensembles
- Kuramoto models on higher-dimensional manifolds
4.1 Non-Abelian Kuramoto models on Lie groups
4.2 Kuramoto models on spheres
4.3 Kuramoto models on spheres with several globally coupled sub-ensembles
5.1 Statistical models over circles and tori
5.2 Statistical models over spheres
5.3 Statistical models over hyperbolic spaces
5.4 Statistical models over orthogonal groups, Grassmannians, homogeneous spaces
6.1 Training swarms on manifolds for supervised ML
6.2 Swarms on manifolds and directional statistics in RL
6.3 Swarms on manifolds and directional statistics for unsupervised ML
6.4 Statistical models for the latent space
6.5 Kuramoto models for learning (coupled) actions of Lie groups
6.6 Grassmannian shallow and deep learning
6.7 Ensembles of coupled oscillators in ML: Beyond Kuramoto models
- Examples
7.2 Linked robot’s arm (planar rotations)
7.3 Linked robot’s arm (spatial rotations)
7.4 Embedding multilayer complex networks (Learning coupled actions of Lorentz groups)
2.3 Deep Learning in non-Euclidean spaces
A great deal (probably the majority) of data sets are naturally represented in non-Euclidean geometries. This fact has been widely recognized in ML only recently, motivating research efforts in non-Euclidean data representations and in ML algorithms over curved spaces. Inferring the curvature and symmetries encoded in data sets is of crucial importance in many tasks.
The necessity of geometric methods in ML is easy to justify; it is apparent even in basic setups. To support this point, we provide three illustrative cases.
A) When learning rotations in three-dimensional space, traditional NN architectures, which rely on Euclidean addition and averaging of vectors, are inappropriate.
B) When optimizing over a certain family of probability distributions (say, for inference problems), it is advisable to take into account the intrinsic geometry of this specific family. For instance, a Gaussian policy parametrization is typically used in stochastic RL algorithms over continuous strategy sets. However, the intrinsic geometry of the family of Gaussian distributions N(a, Σ) is hyperbolic. If one applies standard Euclidean gradient descent to the problem of learning the parameters a and Σ, the algorithm may well learn a matrix Σ that is not positive definite and hence does not correspond to any probability distribution. Therefore, one should adapt the gradient descent so that it takes the geometry of the manifold of Gaussian distributions into account (a minimal numerical sketch of this failure is given after case C below).
C) Although the Gaussian family usually provides the most convenient statistical model for probabilistic ML algorithms over Euclidean spaces, it is not suitable when learning orientations in space. Data of this kind require families of probability distributions over the sphere.
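To make case B concrete, here is a minimal numerical sketch (the synthetic data, step sizes and the Cholesky-based alternative are illustrative choices, not taken from the text): a plain Euclidean gradient step on Σ can leave the cone of positive definite matrices, while a parametrization that respects the structure of the family cannot.

```python
import numpy as np

# Illustration of case B: fitting a Gaussian N(a, Sigma) by gradient descent
# on the negative log-likelihood (all numbers below are illustrative).

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2)) * 0.1        # low-variance synthetic samples
a = data.mean(axis=0)
S = (data - a).T @ (data - a)                # scatter matrix
n = len(data)

def nll_grad_sigma(Sigma):
    """Euclidean gradient of the Gaussian negative log-likelihood w.r.t. Sigma."""
    inv = np.linalg.inv(Sigma)
    return 0.5 * (n * inv - inv @ S @ inv)

# (1) Naive Euclidean gradient step: nothing keeps Sigma inside the cone of
#     positive definite matrices.
Sigma = np.eye(2) - 0.5 * nll_grad_sigma(np.eye(2))
print("naive step, eigenvalues:", np.linalg.eigvalsh(Sigma))        # negative!

# (2) A geometry-respecting alternative: parametrize Sigma = L @ L.T by its
#     Cholesky factor and update L instead; Sigma then stays positive
#     (semi-)definite by construction, whatever the step size.
L = np.linalg.cholesky(np.eye(2))
grad_L = 2.0 * nll_grad_sigma(L @ L.T) @ L   # chain rule through Sigma = L L^T
L = L - 0.02 * grad_L
print("Cholesky step, eigenvalues:", np.linalg.eigvalsh(L @ L.T))   # >= 0
```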
Inferring the curvature and symmetries hidden in data sets is a central problem of geometric ML. Roughly, such data sets can be classified into spherical data (data embedded into spaces with strictly positive curvature), Euclidean data (data with zero curvature) and hyperbolic data (data with strictly negative curvature). Of course, this classification is a strong oversimplification, since the majority of real-life big data have mixed curvature [28, 29].
An obvious example of spherical data is orientations in Euclidean space. Another, less obvious, example is the space of categorical distributions (probability distributions over a finite set) equipped with the Fisher information metric. The natural gradient update (i.e. the update with respect to the Fisher metric) on the manifold of categorical distributions amounts to optimization in spherical geometry.
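As a small illustration of this last point (a standard fact, stated here with assumed notation rather than taken from the paper): the square-root map p ↦ √p sends the probability simplex onto a portion of the unit sphere, and the Fisher-Rao distance between categorical distributions is simply a spherical arc length.

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two categorical distributions.

    Under the embedding p -> sqrt(p) the simplex becomes part of the unit
    sphere, and the geodesic distance is twice the Bhattacharyya angle.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)   # Bhattacharyya coefficient
    return 2.0 * np.arccos(bc)                        # arc length on the sphere

print(fisher_rao_distance([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]))
```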
On the other hand, hyperbolic data are even more ubiquitous in science. A great deal of data sets have some (possibly hidden) hierarchical structure, and such data are naturally embedded into manifolds with hyperbolic geometry. For example, the power law (also known as the Pareto-Zipf law) for degrees of nodes in complex networks implies hyperbolic geometry, and vice versa [30, 31]. Other examples of inherently hyperbolic data arise in word embeddings and natural language processing [32, 33], molecular structures [34], and families of Gaussian distributions [35]. In general, one might claim that most biological data have inherently hyperbolic geometry.
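A minimal sketch of the hyperbolic side (an illustrative computation, not taken from the cited works): the geodesic distance in the Poincaré ball model, which is the quantity typically optimized in hyperbolic embeddings of hierarchical data.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between points u, v inside the unit (Poincare) ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

# Points near the boundary are exponentially far from the origin, which is
# what allows trees and other hierarchies to be embedded with low distortion.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.0, 0.0], [0.99, 0.0]))
```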
Optimization on manifolds is a young subdiscipline within the broad field of mathematical optimization. Although particular problems of this kind (such as Wahba’s problem) have occasionally appeared in the literature for a long time, systematic approaches were elaborated only in the XXI century [36, 37]. Nowadays, advances in the theory of optimization on manifolds are, to a great extent, motivated by applications to geometric ML. For some examples, we refer to ML algorithms based on optimization over hyperbolic [38, 39, 40] or spherical [41, 42] geometries. Recently, novel architectures of spherical [42] and hyperbolic [38] NN’s have been proposed for dealing with geometric data.
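The following sketch shows the basic pattern behind such algorithms in the simplest setting (the objective, step size and retraction are illustrative assumptions): Riemannian gradient descent on the unit sphere, where the Euclidean gradient is projected onto the tangent space and the iterate is retracted back onto the manifold.

```python
import numpy as np

# Minimize f(x) = -<c, x> over the unit sphere; the minimizer is c / |c|.
c = np.array([1.0, 2.0, 2.0])
x = np.array([1.0, 0.0, 0.0])            # initial point on the sphere

for _ in range(100):
    euclid_grad = -c                                       # Euclidean gradient
    riem_grad = euclid_grad - np.dot(euclid_grad, x) * x   # tangent projection
    x = x - 0.1 * riem_grad                                # gradient step
    x = x / np.linalg.norm(x)                              # retraction to sphere

print(x, c / np.linalg.norm(c))          # x converges to c / |c|
```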
In parallel, probabilistic modeling in geometric ML exploits statistical models over Riemannian manifolds. This has motivated a growing interest in applications of directional statistics to ML [43, 44, 45]. Directional statistics is the subdiscipline of statistics and probability theory that deals with observations on compact Riemannian manifolds. The classical and probably most comprehensive reference in this field is the book of Mardia and Jupp [46].
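As a small, self-contained illustration (assumed parametrization; not an example taken from [46]): the von Mises-Fisher distribution, arguably the most basic model of directional statistics, has log-density on the sphere S^(d-1) equal to κ μᵀx plus a normalizing constant involving a modified Bessel function.

```python
import numpy as np
from scipy.special import iv   # modified Bessel function of the first kind

def vmf_log_density(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on S^(d-1).

    mu is the mean direction (unit vector), kappa > 0 the concentration.
    """
    d = len(mu)
    log_c = ((d / 2 - 1) * np.log(kappa)
             - (d / 2) * np.log(2 * np.pi)
             - np.log(iv(d / 2 - 1, kappa)))   # log normalizing constant
    return log_c + kappa * np.dot(mu, x)

mu = np.array([0.0, 0.0, 1.0])
x = np.array([0.0, 1.0, 0.0])
print(vmf_log_density(x, mu, kappa=5.0))
```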
Another approach to probabilistic modeling in geometric ML is provided by normalizing flows over Riemannian manifolds. Several researchers [47, 48, 49, 50] have reported experiments with normalizing flows over spheres, tori and other manifolds for density estimation problems.
Summarizing, encoding geometric features of data in deep learning models has evolved into the emerging field of Geometric Deep Learning [51].
2.3.1 Learning (coupled) actions of transformation groups
In Geometric DL there is an important class of problems where the goal is to learn transformations, such as rotations in d-dimensional space, conformal mappings, groups of isometries, etc. Such problems arise in robotics (movement prediction [52] and imitation learning [53]), in the analysis of facial expressions [54], in computer vision [55], etc. The corresponding algorithms rely on optimization over Lie groups [56, 57], as well as on learning probability distributions over Lie groups [58, 59, 60].
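One standard ingredient of such algorithms is sketched below (a generic construction, not the specific method of the cited works): when a model outputs an unconstrained 3×3 matrix, a valid rotation can be recovered by projecting it onto SO(3) via the singular value decomposition, which is also how Wahba-type problems are solved.

```python
import numpy as np

def project_to_SO3(M):
    """Closest rotation to M in the Frobenius norm (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])   # fix reflections: det(R) = +1
    return U @ D @ Vt

M = np.random.default_rng(1).normal(size=(3, 3))     # arbitrary network output
R = project_to_SO3(M)
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```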
One conceptual approach to problems of this kind is the longstanding idea of NN’s with non-Euclidean neurons (and possibly weights) [61, 62]. More recently, this line of reasoning has resulted in novel architectures named equivariant neural networks [63]. Equivariant NN’s are designed to ensure that outputs transform consistently under symmetry transformations of the inputs.
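To illustrate the idea of equivariance in the simplest possible setting (the permutation group is chosen only because it is the easiest case; the layer below is a generic sketch, not one of the architectures of [63]): a layer of the form f(X) = X W1 + mean(X) W2 commutes with any permutation of the rows of X.

```python
import numpy as np

rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

def equivariant_layer(X):
    # Each row is transformed identically and mixed only through the row mean,
    # so permuting the rows of X permutes the rows of the output in the same way.
    return X @ W1 + X.mean(axis=0, keepdims=True) @ W2

X = rng.normal(size=(5, 4))            # 5 set elements, 4 features each
P = np.eye(5)[rng.permutation(5)]      # random permutation matrix

# Equivariance check: f(P X) == P f(X)
print(np.allclose(equivariant_layer(P @ X), P @ equivariant_layer(X)))
```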
Very recently, several researchers have reported pioneering efforts in RL with non-Euclidean spaces of states (and actions). So far, robotics is the dominant field of application. There are two possible approaches to stochastic policies in RL problems of this kind. The first parametrizes policies on the tangent space using standard statistical models (typically the Gaussian family) and projects the learned policies onto the manifold via the exponential map [64]. The second parametrizes policies directly by families of distributions over Riemannian manifolds, thus employing results from directional statistics [59].
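A minimal sketch of the first approach (the choice of the sphere S², the names and the dimensions are illustrative assumptions, not taken from [64]): sample a Gaussian perturbation in the tangent space at the mean action and map it onto the manifold with the exponential map.

```python
import numpy as np

def sphere_exp(mu, v):
    """Exponential map on the unit sphere at mu, applied to tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return mu
    return np.cos(norm) * mu + np.sin(norm) * v / norm

rng = np.random.default_rng(3)
mu = np.array([0.0, 0.0, 1.0])           # mean action: a point on S^2
sigma = 0.2                               # policy "spread" in the tangent space

v = rng.normal(scale=sigma, size=3)
v = v - np.dot(v, mu) * mu                # project the sample onto T_mu S^2
action = sphere_exp(mu, v)                # stochastic action on the sphere
print(action, np.linalg.norm(action))     # unit norm, i.e. a valid point of S^2
```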
Author:
(1) Vladimir Jacimovic, Faculty of Natural Sciences and Mathematics, University of Montenegro, Cetinjski put bb, 81000 Podgorica, Montenegro ([email protected]).
