Stable Nonconvex-Nonconcave Training via Linear Interpolation: Acknowledgements & References

Too Long; Didn't Read

This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
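In rough terms, the scheme runs an inner optimizer for a few steps and then linearly interpolates between the resulting iterate and the anchor point the inner loop started from. The toy sketch below illustrates such a Lookahead-style outer loop on a bilinear saddle-point problem; the objective f(x, y) = x·y, the step sizes, and all names (`inner_update`, `lam`) are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Toy sketch of stabilization via linear interpolation (Lookahead-style).
# Everything here is an illustrative assumption, not the paper's code:
# the bilinear objective f(x, y) = x * y, the step size, and all names.

def inner_update(x, y, step=0.1):
    """One gradient descent-ascent step on f(x, y) = x * y."""
    gx, gy = y, x                      # df/dx = y, df/dy = x
    return x - step * gx, y + step * gy

def interpolated_training(x, y, outer_steps=200, inner_steps=5, lam=0.5):
    """Run a few inner steps, then interpolate back toward the anchor."""
    for _ in range(outer_steps):
        ax, ay = x, y                  # anchor for this outer iteration
        for _ in range(inner_steps):
            x, y = inner_update(x, y)
        # Linear interpolation between the anchor and the inner result.
        x = (1 - lam) * ax + lam * x
        y = (1 - lam) * ay + lam * y
    return x, y

print(interpolated_training(1.0, 1.0))  # approaches the saddle point (0, 0)
```

On this toy problem, plain gradient descent-ascent spirals away from the saddle point, whereas the interpolated outer loop contracts toward it; that stabilizing effect is what the paper analyzes in far greater generality.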

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Thomas Pethick, EPFL (LIONS);

(2) Wanyun Xie, EPFL (LIONS);

(3) Volkan Cevher, EPFL (LIONS).

10 Acknowledgements

This work was supported by Google, by the Hasler Foundation Program: Hasler Responsible AI (project number 21043), and by the Swiss National Science Foundation (SNSF) under grant number 200021_205011. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement n° 725594 - time-data).

References

Banach, S. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fundamenta Mathematicae, 3(1):133–181, 1922.


Bauschke, H. H. and Combettes, P. L. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2017. ISBN 978-3-319-48310-8.


Bauschke, H. H., Moursi, W. M., and Wang, X. Generalized monotone operators and their averaged resolvents. Mathematical Programming, 189(1):55–74, 2021.


Bertsekas, D. P. Incremental proximal methods for large scale convex optimization. Mathematical Programming, 129(2):163–195, 2011.


Bianchi, P. On the convergence of a stochastic proximal point algorithm. In CAMSAP, 2015.


Bravo, M. and Cominetti, R. Stochastic fixed-point iterations for nonexpansive maps: Convergence and error bounds. arXiv preprint arXiv:2208.04193, 2022.


Brézis, H. and Lions, P. L. Produits infinis de résolvantes. Israel Journal of Mathematics, 29(4):329–345, 1978.


Cevher, V., Piliouras, G., Sim, R., and Skoulakis, S. Min-max optimization made simple: Approximating the proximal point method via contraction maps, 2023. URL https://arxiv.org/abs/2301.03931.


Chavdarova, T., Pagliardini, M., Stich, S. U., Fleuret, F., and Jaggi, M. Taming GANs with Lookahead-Minmax. arXiv preprint arXiv:2006.14567, 2020.


Combettes, P. L. Quasi-Fejérian analysis of some optimization algorithms. In Studies in Computational Mathematics, volume 8, pp. 115–152. Elsevier, 2001.


Combettes, P. L. and Pennanen, T. Proximal methods for cohypomonotone operators. SIAM Journal on Control and Optimization, 43(2):731–742, 2004.


Diakonikolas, J., Daskalakis, C., and Jordan, M. I. Efficient methods for structured nonconvex-nonconcave min-max optimization. In International Conference on Artificial Intelligence and Statistics, pp. 2746–2754. PMLR, 2021.


Eckstein, J. and Bertsekas, D. P. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293–318, 1992.


Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.


Gorbunov, E., Loizou, N., and Gidel, G. Extragradient method: O(1/K) last-iterate convergence for monotone variational inequalities and connections with cocoercivity. In International Conference on Artificial Intelligence and Statistics, pp. 366–402. PMLR, 2022a.


Gorbunov, E., Taylor, A., Horváth, S., and Gidel, G. Convergence of proximal point and extragradient-based methods beyond monotonicity: the case of negative comonotonicity. arXiv preprint arXiv:2210.13831, 2022b.


Grimmer, B., Lu, H., Worah, P., and Mirrokni, V. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, pp. 1–35, 2022.


Ha, J. and Kim, G. On convergence of Lookahead in smooth games. In International Conference on Artificial Intelligence and Statistics, pp. 4659–4684. PMLR, 2022.


Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.


Hsieh, Y.-P., Mertikopoulos, P., and Cevher, V. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. In International Conference on Machine Learning, pp. 4337–4348. PMLR, 2021.


Iusem, A. N., Pennanen, T., and Svaiter, B. F. Inexact variants of the proximal point algorithm without monotonicity. SIAM Journal on Optimization, 13(4):1080–1097, 2003.


Korpelevich, G. Extragradient method for finding saddle points and other problems. Matekon, 13(4):35–49, 1977.


Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.


Lee, S. and Kim, D. Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems. Advances in Neural Information Processing Systems, 34:22588–22600, 2021.


Luo, Y. and Tran-Dinh, Q. Extragradient-type methods for co-monotone root-finding problems.


McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, 2017.


Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.


Nemirovski, A. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.


Nichol, A., Achiam, J., and Schulman, J. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.


Obukhov, A., Seitzer, M., Wu, P.-W., Zhydenko, S., Kyl, J., and Lin, E. Y.-J. High-fidelity performance metrics for generative models in PyTorch, 2020. URL https://github.com/toshas/torch-fidelity. Version: 0.3.0, DOI: 10.5281/zenodo.4957738.


Opial, Z. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4):591–597, 1967.


Patrascu, A. and Irofti, P. Stochastic proximal splitting algorithm for composite minimization. Optimization Letters, 15(6):2255–2273, 2021.


Patrascu, A. and Necoara, I. Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization. The Journal of Machine Learning Research, 18(1):7204–7245, 2017.


Pethick, T., Latafat, P., Patrinos, P., Fercoq, O., and Cevher, V. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. In International Conference on Learning Representations, 2022.


Pethick, T., Fercoq, O., Latafat, P., Patrinos, P., and Cevher, V. Solving stochastic weak Minty variational inequalities without increasing batch size. In The Eleventh International Conference on Learning Representations, 2023.


Pushkin, D. and Barba, L. Multilayer Lookahead: a nested version of Lookahead. arXiv preprint arXiv:2110.14254, 2021.


Rockafellar, R. T. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.