Exploring the Impact of Riemannian Metrics in Human Action Recognition Tasks Using GyroSpd++

by Hyperbole, December 3rd, 2024

Too Long; Didn't Read

We evaluate GyroSpd++ in human action recognition using three datasets (HDM05, FPHA, NTU60), reporting on performance, convolutional layer design, and optimization. Ablation studies and comparisons with state-of-the-art methods highlight its advantages and challenges.

Abstract and 1. Introduction

  2. Preliminaries

  3. Proposed Approach

    3.1 Notation

    3.2 Neural Networks on SPD Manifolds

    3.3 MLR in Structure Spaces

    3.4 Neural Networks on Grassmann Manifolds

  4. Experiments

  5. Conclusion and References

A. Notations

B. MLR in Structure Spaces

C. Formulation of MLR from the Perspective of Distances to Hyperplanes

D. Human Action Recognition

E. Node Classification

F. Limitations of our work

G. Some Related Definitions

H. Computation of Canonical Representation

I. Proof of Proposition 3.2

J. Proof of Proposition 3.4

K. Proof of Proposition 3.5

L. Proof of Proposition 3.6

M. Proof of Proposition 3.11

N. Proof of Proposition 3.12


C FORMULATION OF MLR FROM THE PERSPECTIVE OF DISTANCES TO HYPERPLANES


D HUMAN ACTION RECOGNITION

D.1 DATASETS

HDM05 (Muller et al., 2007). It contains 2337 sequences of 3D skeleton data classified into 130 classes. Each frame contains the 3D coordinates of 31 body joints. We use all the action classes and follow the experimental protocol of Harandi et al. (2018), in which 2 subjects are used for training and the remaining 3 subjects are used for testing.


FPHA (Garcia-Hernando et al., 2018). It contains 1175 sequences of 3D skeleton data classified into 45 classes. Each frame contains the 3D coordinates of 21 hand joints. We follow the experimental protocol of Garcia-Hernando et al. (2018), in which 600 sequences are used for training and 575 sequences are used for testing.


NTU60 (Shahroudy et al., 2016). It contains 56880 sequences of 3D skeleton data classified into 60 classes. Each frame contains the 3D coordinates of 25 or 50 body joints. We use the mutual actions and follow the cross-subject experimental protocol of Shahroudy et al. (2016), in which data from 20 subjects are used for training and data from the other 20 subjects are used for testing.

D.2 IMPLEMENTATION DETAILS

D.2.1 SETUP



D.2.2 INPUT DATA



For SPDNet and SPDNetBN, each sequence is represented by a covariance matrix (Huang & Gool, 2017; Brooks et al., 2019). The sizes of the covariance matrices are 93 × 93, 60 × 60, and 150 × 150 for the HDM05, FPHA, and NTU60 datasets, respectively. SPDNet uses the same architecture as in Huang & Gool (2017), and SPDNetBN the same architecture as in Brooks et al. (2019), each with three Bimap layers. The sizes of the transformation matrices for the experiments on the HDM05, FPHA, and NTU60 datasets are set to 93 × 93, 60 × 60, and 150 × 150, respectively.
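As a rough illustration of the covariance representation described above, the following sketch builds one covariance matrix from one skeleton sequence. It is not the authors' preprocessing code: the function name `sequence_covariance`, the flattening of each frame into a single coordinate vector, and the small ridge term `eps` are all assumptions made for the example, and the input is synthetic noise rather than real HDM05 data. With 31 joints and 3 coordinates per joint, the result has the 93 × 93 size quoted for HDM05.

```python
import numpy as np


def sequence_covariance(seq, eps=1e-6):
    """Map a skeleton sequence of shape (frames, joints, 3) to a covariance matrix.

    Hypothetical helper for illustration only.
    """
    num_frames = seq.shape[0]
    # Flatten each frame into one vector of 3 * joints coordinates.
    feats = seq.reshape(num_frames, -1)
    centered = feats - feats.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (num_frames - 1)
    # Assumption: a small ridge keeps the matrix strictly positive definite.
    return cov + eps * np.eye(cov.shape[0])


rng = np.random.default_rng(0)
seq = rng.standard_normal((200, 31, 3))  # synthetic stand-in for a 31-joint sequence
cov = sequence_covariance(seq)
print(cov.shape)  # (93, 93)
```

The ridge term matters mainly for short sequences, where the sample covariance of fewer frames than coordinates would be rank-deficient and therefore not SPD.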


D.2.3 CONVOLUTIONAL LAYERS



D.2.4 OPTIMIZATION


For parameters that are SPD matrices, we model them on the space of symmetric matrices, and then apply the exponential map at the identity.



Thus, we can optimize all parameters on Euclidean spaces without having to resort to techniques developed on Riemannian manifolds.
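The parametrization described above can be sketched as follows. This is a minimal example, assuming the exponential map at the identity reduces to the matrix exponential of a symmetric matrix (as it does for the affine-invariant and Log-Euclidean metrics); the function name `spd_from_unconstrained` and the 4 × 4 size are hypothetical choices for illustration.

```python
import numpy as np


def spd_from_unconstrained(a):
    """Map an arbitrary square matrix to an SPD matrix.

    Hypothetical helper: symmetrize, then apply the matrix exponential.
    """
    # Project the free parameter onto the space of symmetric matrices.
    sym = 0.5 * (a + a.T)
    # Matrix exponential of a symmetric matrix via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 4))  # unconstrained Euclidean parameter
spd = spd_from_unconstrained(raw)
```

Because every eigenvalue of the output is the exponential of a real number, the result is positive definite for any value of `raw`, so a standard Euclidean optimizer can update `raw` freely.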

D.3 TIME COMPLEXITY ANALYSIS

D.4 MORE EXPERIMENTAL RESULTS

D.4.1 ABLATION STUDY



Tab. 4 reports the mean accuracies and standard deviations of GyroSpd++ with respect to different settings of β on the three datasets. GyroSpd++ with the setting β = 0 generally works well on all the datasets. Setting k = 3 improves the accuracy of GyroSpd++ on the NTU60 dataset. We also observe that setting k to a high value, e.g., k = 10, lowers the accuracy of GyroSpd++ on the datasets.


Table 4: Results (mean accuracy ± standard deviation) of GyroSpd++ with respect to different settings of β on the three datasets (computed over 5 runs).


Table 5: Results and computation times (seconds) of GyroSpd++ with respect to different settings of the output dimension of the convolutional layer on FPHA dataset (computed over 5 runs). Experiments are conducted on a machine with Intel Core i7-8565U CPU 1.80 GHz 24GB RAM.


Output dimension of convolutional layers. Tab. 5 presents results and computation times of GyroSpd++ with respect to different settings of the output dimension of the convolutional layer on the FPHA dataset. Results show that the setting m = 21 clearly outperforms the setting m = 10 in terms of mean accuracy and standard deviation. However, compared to the setting m = 21, the setting m = 30 only increases the training and testing times without improving the mean accuracy of GyroSpd++.


Design of Riemannian metrics for network blocks. The use of different Riemannian metrics for the convolutional and MLR layers of GyroSpd++ results in different variants of the same architecture. Results of some of these variants on the FPHA dataset are shown in Tab. 6. Our architecture gives the best mean accuracy, while the variant with Log-Cholesky geometry for the MLR layer gives the worst.
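To make concrete how the choice of metric changes the geometry a layer sees, the sketch below computes the distance between two SPD matrices under the Log-Cholesky metric and, purely for contrast, under the Log-Euclidean metric. It illustrates the standard closed-form distances only and is not taken from the GyroSpd++ implementation; the function names are hypothetical, and the Log-Euclidean metric is included here as an assumption about what a useful comparison looks like, not as a statement of which variants appear in Tab. 6.

```python
import numpy as np


def spd_log(p):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(p)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(p, q):
    """Frobenius norm of the difference of matrix logarithms."""
    return np.linalg.norm(spd_log(p) - spd_log(q), ord="fro")


def log_cholesky_distance(p, q):
    """Distance computed on Cholesky factors, with the diagonal taken in log scale."""
    lp, lq = np.linalg.cholesky(p), np.linalg.cholesky(q)
    # Strictly lower-triangular parts are compared directly.
    strict = np.tril(lp, k=-1) - np.tril(lq, k=-1)
    # Diagonal entries are positive, so they are compared after a logarithm.
    diag = np.log(np.diag(lp)) - np.log(np.diag(lq))
    return np.sqrt(np.sum(strict ** 2) + np.sum(diag ** 2))


p = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.eye(2)
print(log_euclidean_distance(p, q), log_cholesky_distance(p, q))
```

The two distances generally differ for the same pair of matrices, which is one way to see why swapping the metric in a layer yields a different variant of the same architecture.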


D.4.2 COMPARISON OF GYROSPD++ AGAINST STATE-OF-THE-ART METHODS



Finally, we present a comparison of computation times of SPD neural networks in Tab. 10.


Table 6: Results (mean accuracy ± standard deviation) of GyroSpd++ with different designs of Riemannian metrics for its layers on FPHA dataset (computed over 5 runs).


Table 7: Results of our networks and some state-of-the-art methods on HDM05 dataset (computed over 5 runs).


Authors:

(1) Xuan Son Nguyen, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France;

(2) Shuo Yang, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France;

(3) Aymeric Histace, ETIS, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France.


This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

[3] https://github.com/dalab/hyperbolic_nn.


[4] https://github.com/kenziyuliu/MS-G3D.


[5] https://github.com/zhysora/FR-Head.


[6] https://github.com/Chiaraplizz/ST-TR.