
Stable Nonconvex-Nonconcave Training via Linear Interpolation: Related work


Too Long; Didn't Read

This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Thomas Pethick, EPFL (LIONS) [email protected];

(2) Wanyun Xie, EPFL (LIONS) [email protected];

(3) Volkan Cevher, EPFL (LIONS) [email protected].

Lookahead. The Lookahead algorithm was first introduced for minimization by Zhang et al. (2019). In the context of Federated Averaging in federated learning (McMahan et al., 2017) and the Reptile algorithm in meta-learning (Nichol et al., 2018), the method can be seen as a single-worker and a single-task instance, respectively. Convergence analyses of Lookahead were carried out for nonconvex minimization (Wang et al., 2020; Zhou et al., 2021), and a nested variant was proposed by Pushkin & Barba (2021). Chavdarova et al. (2020) popularized the Lookahead algorithm for minimax training by showing state-of-the-art performance on image generation tasks. Apart from the original local convergence analysis in Chavdarova et al. (2020) and the bilinear case treated in Ha & Kim (2022), we are not aware of any convergence analysis of Lookahead for minimax problems and beyond.
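
Since the outer interpolation step is the core of Lookahead, a minimal sketch of the minimization variant may help. This is an illustrative reconstruction, not the paper's implementation; the names (lookahead_minimize, grad_f) and the parameter values are assumptions.

```python
import numpy as np

# Hypothetical demo objective: f(w) = 0.5 * ||w||^2, so grad_f(w) = w.
def grad_f(w):
    return w

def lookahead_minimize(grad, w0, k=5, alpha=0.5, eta=0.1, outer_steps=50):
    """Lookahead sketch: run k fast gradient steps, then linearly
    interpolate the slow weights toward the fast weights."""
    slow = np.asarray(w0, dtype=float).copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):                # inner loop: fast weights
            fast -= eta * grad(fast)
        slow += alpha * (fast - slow)     # outer step: linear interpolation
    return slow

print(lookahead_minimize(grad_f, np.array([1.0, -2.0])))  # approaches the minimizer at 0
```

The interpolation factor alpha trades off stability against progress: alpha = 1 recovers the plain inner optimizer, while smaller values pull the iterates back toward the slow anchor.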


Cohypomonotone. Cohypomonotone problems were first studied in Iusem et al. (2003) and Combettes & Pennanen (2004) for proximal point methods and were later treated in greater detail in Bauschke et al. (2021). The condition was relaxed to its star-variant, referred to as the weak Minty variational inequality (MVI), in Diakonikolas et al. (2021), where the extragradient+ algorithm (EG+) was analyzed. The analysis of EG+ was later tightened and extended to the constrained case in Pethick et al. (2022).
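
To make the EG+ update concrete, here is a minimal sketch of the scheme as we understand it from the cited works: a full extrapolation step followed by an update step damped by a factor alpha < 1. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def eg_plus(F, z0, gamma=0.1, alpha=0.5, steps=2000):
    """EG+ sketch: extrapolate at full step size, then update with a
    damped step size alpha * gamma (alpha = 1 recovers plain EG)."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(steps):
        z_half = z - gamma * F(z)          # extrapolation step
        z = z - alpha * gamma * F(z_half)  # damped update step
    return z

# Demo on the saddle operator of f(x, y) = x * y, i.e. F(z) = (z2, -z1).
F = lambda z: np.array([z[1], -z[0]])
print(eg_plus(F, np.array([1.0, 1.0])))  # converges toward the saddle point at 0
```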


Proximal point. The proximal point method (PP) has a long history. For maximally monotone operators (and thus convex-concave minimax problems), convergence of PP follows from Opial (1967). The first convergence analyses of inexact PP date back to Rockafellar (1976) and Brézis & Lions (1978). It was later shown that convergence also holds for the relaxed inexact PP as defined in (8) (Eckstein & Bertsekas, 1992). More recently, PP has gained renewed interest due to its success on certain nonmonotone structures. Inexact PP was studied for cohypomonotone problems in Iusem et al. (2003). Asymptotic convergence of the relaxed inexact PP was established for a sum of cohypomonotone operators in Combettes & Pennanen (2004), and the exact variant was later considered in Grimmer et al. (2022). Last-iterate rates were established for PP on ρ-comonotone problems, with a dependency on ρ (Gorbunov et al., 2022b). Explicit approximation of PP through a contractive map was used for convex-concave minimax problems in Cevher et al. (2023) and was the original motivation for the MirrorProx method of Nemirovski (2004). See Appendix A for additional references in the stochastic setting.
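
As a rough illustration of the relaxed inexact PP update referred to in (8), the sketch below approximates the resolvent (I + γF)^{-1} by a few fixed-point iterations and then relaxes toward the result. The inner solver, parameter names, and values are assumptions for illustration, not the scheme of Eckstein & Bertsekas (1992) verbatim.

```python
import numpy as np

def relaxed_inexact_pp(F, z0, gamma=0.5, lam=0.8, inner=20, steps=100):
    """Relaxed inexact PP sketch: approximate zbar = (I + gamma*F)^{-1}(z)
    via the fixed-point iteration zbar <- z - gamma * F(zbar) (contractive
    when gamma * Lipschitz(F) < 1), then relax: z <- (1 - lam)*z + lam*zbar."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(steps):
        zbar = z.copy()
        for _ in range(inner):             # inexact resolvent evaluation
            zbar = z - gamma * F(zbar)
        z = (1.0 - lam) * z + lam * zbar   # relaxation step
    return z

# Monotone bilinear saddle operator F(z) = (z2, -z1).
F = lambda z: np.array([z[1], -z[0]])
print(relaxed_inexact_pp(F, np.array([1.0, 1.0])))  # approaches the solution at 0
```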