Table of Links

Abstract and 1. Introduction
Preliminaries
Proposed Approach
3.1 Notation
3.2 Neural Networks on SPD Manifolds
3.3 MLR in Structure Spaces
3.4 Neural Networks on Grassmann Manifolds
Experiments
Conclusion and References
A. Notations
B. MLR in Structure Spaces
C. Formulation of MLR from the Perspective of Distances to Hyperplanes
D. Human Action Recognition
E. Node Classification
F. Limitations of our work
G. Some Related Definitions
H. Computation of Canonical Representation
I. Proof of Proposition 3.2
J. Proof of Proposition 3.4
K. Proof of Proposition 3.5
L. Proof of Proposition 3.6
M. Proof of Proposition 3.11
N. Proof of Proposition 3.12

C FORMULATION OF MLR FROM THE PERSPECTIVE OF DISTANCES TO HYPERPLANES

D HUMAN ACTION RECOGNITION

D.1 DATASETS

HDM05 (Müller et al., 2007) This dataset contains 2337 sequences of 3D skeleton data classified into 130 classes. Each frame contains the 3D coordinates of 31 body joints. We use all the action classes and follow the experimental protocol of Harandi et al. (2018), in which 2 subjects are used for training and the remaining 3 subjects are used for testing.

FPHA (Garcia-Hernando et al., 2018) This dataset contains 1175 sequences of 3D skeleton data classified into 45 classes. Each frame contains the 3D coordinates of 21 hand joints. We follow the experimental protocol of Garcia-Hernando et al. (2018), in which 600 sequences are used for training and 575 sequences are used for testing.

NTU60 (Shahroudy et al., 2016) This dataset contains 56880 sequences of 3D skeleton data classified into 60 classes. Each frame contains the 3D coordinates of 25 or 50 body joints. We use the mutual actions and follow the cross-subject experimental protocol of Shahroudy et al. (2016), in which data from 20 subjects are used for training and data from the other 20 subjects are used for testing.

D.2 IMPLEMENTATION DETAILS

D.2.1 SETUP

D.2.2 INPUT DATA

For SPDNet and SPDNetBN, each sequence is represented by a covariance matrix (Huang & Gool, 2017; Brooks et al., 2019).
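As a rough sketch of this representation (the sequence length, the centering step, and the small regularization term below are our own illustrative choices, not details from the paper), an HDM05-style sequence of T frames, each holding the 3D coordinates of 31 joints flattened to a 93-dimensional vector, can be summarized by the covariance of its frame features:

```python
import numpy as np

# Hypothetical HDM05-style input: T frames, 31 joints x 3 coordinates = 93 features.
T, n_joints = 100, 31
rng = np.random.default_rng(0)
sequence = rng.standard_normal((T, n_joints * 3))  # placeholder skeleton data

# Represent the whole sequence by the covariance matrix of its frame features.
features = sequence - sequence.mean(axis=0, keepdims=True)
cov = features.T @ features / (T - 1)              # 93 x 93 symmetric matrix

# A small ridge keeps the matrix strictly positive definite (our own choice,
# a common safeguard when feeding covariance matrices to SPD networks).
cov += 1e-6 * np.eye(cov.shape[0])
```

The resulting 93×93 SPD matrix is what an SPDNet-style network consumes in place of the raw sequence.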
The sizes of the covariance matrices are 93×93, 60×60, and 150×150 for the HDM05, FPHA, and NTU60 datasets, respectively. For SPDNet, we use the same architecture as in Huang & Gool (2017), with three BiMap layers. For SPDNetBN, we use the same architecture as in Brooks et al. (2019), also with three BiMap layers. The sizes of the transformation matrices for the experiments on the HDM05, FPHA, and NTU60 datasets are set to 93×93, 60×60, and 150×150, respectively.

D.2.3 CONVOLUTIONAL LAYERS

D.2.4 OPTIMIZATION

For parameters that are SPD matrices, we model them on the space of symmetric matrices and then apply the exponential map at the identity. This lets us optimize all parameters in Euclidean spaces without resorting to optimization techniques developed for Riemannian manifolds.

D.3 TIME COMPLEXITY ANALYSIS

D.4 MORE EXPERIMENTAL RESULTS

D.4.1 ABLATION STUDY

Tab. 4 reports the mean accuracies and standard deviations of GyroSpd++ for different settings of β on the three datasets. GyroSpd++ with β = 0 generally works well on all datasets. Setting k = 3 improves the accuracy of GyroSpd++ on the NTU60 dataset. We also observe that setting k to a high value, e.g., k = 10, lowers the accuracies of GyroSpd++ on all datasets.

Output dimension of convolutional layers Tab. 5 presents the results and computation times of GyroSpd++ for different settings of the output dimension m of the convolutional layer on the FPHA dataset. The results show that the setting m = 21 clearly outperforms m = 10 in terms of mean accuracy and standard deviation. However, compared to m = 21, the setting m = 30 only increases the training and testing times without improving the mean accuracy of GyroSpd++.

Design of Riemannian metrics for network blocks Using different Riemannian metrics for the convolutional and MLR layers of GyroSpd++ results in different variants of the same architecture.
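The parameterization described in D.2.4 above can be sketched as follows (a minimal illustration, assuming the exponential map at the identity reduces to the matrix exponential, as it does under the affine-invariant metric; the matrix size and values are arbitrary):

```python
import numpy as np

# Free Euclidean parameter: an unconstrained n x n matrix, updated by any
# ordinary optimizer (SGD, Adam, ...) with no manifold-aware machinery.
n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))

# Symmetrize, then apply the matrix exponential. For a symmetric matrix S,
# expm(S) = V diag(exp(w)) V^T has strictly positive eigenvalues, so the
# result is always SPD regardless of how A moves during training.
S = 0.5 * (A + A.T)
w, V = np.linalg.eigh(S)        # symmetric eigendecomposition
P = (V * np.exp(w)) @ V.T       # SPD matrix consumed by the network layer
```

Because P is SPD by construction for every value of A, the constraint never has to be enforced by the optimizer itself.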
Results of some of these variants on the FPHA dataset are shown in Tab. 6. Our architecture gives the best performance in terms of mean accuracy, while the architecture with Log-Cholesky geometry for the MLR layer performs the worst.

D.4.2 COMPARISON OF GYROSPD++ AGAINST STATE-OF-THE-ART METHODS

Finally, we present a comparison of the computation times of SPD neural networks in Tab. 10.

Authors:

(1) Xuan Son Nguyen, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France (xuan-son.nguyen@ensea.fr);

(2) Shuo Yang, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France (son.nguyen@ensea.fr);

(3) Aymeric Histace, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France (aymeric.histace@ensea.fr).

This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

[3] https://github.com/dalab/hyperbolic_nn
[4] https://github.com/kenziyuliu/MS-G3D
[5] https://github.com/zhysora/FR-Head
[6] https://github.com/Chiaraplizz/ST-TR