Authors:
(1) Simone Silvestri, Massachusetts Institute of Technology, Cambridge, MA, USA;
(2) Gregory Wagner, Massachusetts Institute of Technology, Cambridge, MA, USA;
(3) Christopher Hill, Massachusetts Institute of Technology, Cambridge, MA, USA;
(4) Matin Raayai Ardakani, Northeastern University, Boston, MA, USA;
(5) Johannes Blaschke, Lawrence Berkeley National Laboratory, Berkeley, CA, USA;
(6) Valentin Churavy, Massachusetts Institute of Technology, Cambridge, MA, USA;
(7) Jean-Michel Campin, Massachusetts Institute of Technology, Cambridge, MA, USA;
(8) Navid Constantinou, Australian National University, Canberra, ACT, Australia;
(9) Alan Edelman, Massachusetts Institute of Technology, Cambridge, MA, USA;
(10) John Marshall, Massachusetts Institute of Technology, Cambridge, MA, USA;
(11) Ali Ramadhan, Massachusetts Institute of Technology, Cambridge, MA, USA;
(12) Andre Souza, Massachusetts Institute of Technology, Cambridge, MA, USA;
(13) Raffaele Ferrari, Massachusetts Institute of Technology, Cambridge, MA, USA.
5.3 Optimization of ocean free surface dynamics for unprecedented GPU scalability
In hydrostatic ocean models with a free surface, the vertically averaged, two-dimensional “barotropic mode” evolves orders of magnitude faster than the three-dimensional “baroclinic” component, so it must be treated by a dedicated “barotropic solver”. Because of communication overhead, barotropic solvers in current ocean models, whether implicit or explicit, are a major bottleneck. They account for between 40% [22] and 60% [48, 36] of the cost of a typical IPCC-class ocean simulation.
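A back-of-the-envelope calculation shows why the barotropic mode needs its own, much shorter time step. The sketch below assumes a one-dimensional CFL limit set by external gravity waves travelling at c = sqrt(gH). The depth, grid spacing, baroclinic time step and Courant number are hypothetical values chosen for illustration. They are not parameters from the simulations reported in this paper.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def barotropic_subcycles(depth, dx, dt_baroclinic, courant=0.7):
    """Number of explicit barotropic substeps per baroclinic time step.

    Assumes the 1D CFL condition dt <= courant * dx / c, where
    c = sqrt(g * H) is the external (surface) gravity wave speed.
    """
    c = math.sqrt(G * depth)
    dt_max = courant * dx / c
    return math.ceil(dt_baroclinic / dt_max), c


# Hypothetical parameters: 4000 m depth, 10 km grid spacing, 600 s baroclinic step.
n_sub, c = barotropic_subcycles(depth=4000.0, dx=10e3, dt_baroclinic=600.0)
print(f"c = {c:.0f} m/s, subcycles = {n_sub}")  # c = 198 m/s, subcycles = 17
```

With these illustrative values, each baroclinic step requires 17 explicit barotropic substeps. This falls within the 10 to 30 subcycles quoted for the solver described here. Production models use two-dimensional stability limits and other constraints, so this is only an order-of-magnitude estimate.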
Oceananigans’ excellent scalability is enabled by an innovative optimization of the parallel barotropic solver. This optimization exploits the two-dimensionality of the barotropic problem to trade additional computation for reduced communication latency. Our new barotropic solver is based on explicit subcycling of the barotropic mode. For a nearest-neighbor stencil, each explicit subcycle invalidates one layer of halo cells. Widening the barotropic halo to equal the number of explicit subcycles (typically 10 to 30) therefore keeps the halo valid for the entire sequence of subcycles, at the price of redundantly updating the halo cells. As a result, communication is required once per time step rather than once per subcycle, which reduces the frequency of communication by a factor of 10 to 30. Consequently, the barotropic solver accounts for less than 10% of the total cost of a time step. Because our barotropic solver communicates so rarely, all communication operations can be overlapped with computational workloads, as sketched in Figure 2.
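The halo-widening idea can be checked on a toy problem. The sketch below is a minimal, hypothetical one-dimensional illustration, not Oceananigans' implementation (which is written in Julia). A radius-1 explicit diffusion stencil (`step`, with a made-up coefficient `NU`) stands in for one barotropic subcycle. A periodic domain split into `n_ranks` subdomains stands in for GPU ranks, and halo exchanges are simulated by copying neighbors' edge cells (`exchange`). The helpers `narrow_halo` and `wide_halo` are hypothetical names. The first exchanges a 1-cell halo before every subcycle; the second exchanges a single halo as wide as the number of subcycles.

```python
import numpy as np

# Hypothetical 1D toy: a diffusion stencil stands in for the barotropic update.
NU = 0.25  # made-up stencil coefficient (stable for NU <= 0.5)


def step(u):
    """One explicit subcycle of a radius-1 stencil.

    Only cells with both neighbors present can be updated, so the
    returned array is two cells shorter than the input.
    """
    return u[1:-1] + NU * (u[:-2] - 2.0 * u[1:-1] + u[2:])


def global_reference(u, n_sub):
    """Advance the undecomposed periodic field by n_sub subcycles."""
    for _ in range(n_sub):
        u = step(np.concatenate(([u[-1]], u, [u[0]])))
    return u


def exchange(local, halo):
    """Simulated halo exchange: pad each subdomain with `halo` cells
    copied from its periodic left and right neighbors."""
    p = len(local)
    assert all(halo <= len(block) for block in local), "halo wider than subdomain"
    return [
        np.concatenate((local[(r - 1) % p][-halo:], local[r], local[(r + 1) % p][:halo]))
        for r in range(p)
    ]


def narrow_halo(local, n_sub):
    """Exchange a 1-cell halo before every subcycle."""
    exchanges = updates = 0
    for _ in range(n_sub):
        padded = exchange(local, 1)
        exchanges += 1
        local = [step(block) for block in padded]
        updates += sum(len(block) for block in local)
    return local, exchanges, updates


def wide_halo(local, n_sub):
    """Exchange once with halo width n_sub, then subcycle locally.

    Each subcycle consumes one halo layer per side, so after n_sub
    subcycles exactly the interior cells remain, all up to date.
    """
    padded = exchange(local, n_sub)
    exchanges, updates = 1, 0
    for _ in range(n_sub):
        padded = [step(block) for block in padded]
        updates += sum(len(block) for block in padded)
    return padded, exchanges, updates


rng = np.random.default_rng(0)
n, n_ranks, n_sub = 64, 4, 16
u0 = rng.standard_normal(n)
local0 = np.split(u0, n_ranks)

ref = global_reference(u0, n_sub)
narrow, n_narrow, w_narrow = narrow_halo(local0, n_sub)
wide, n_wide, w_wide = wide_halo(local0, n_sub)

# Both decompositions reproduce the single-domain result.
assert np.allclose(np.concatenate(narrow), ref)
assert np.allclose(np.concatenate(wide), ref)
print(f"exchanges: {n_narrow} vs {n_wide}")     # exchanges: 16 vs 1
print(f"cell updates: {w_narrow} vs {w_wide}")  # cell updates: 1024 vs 1984
```

The wide-halo variant needs one exchange per time step instead of `n_sub`, but it updates more cells because it redundantly recomputes the shrinking halo region. In this toy case each subdomain is only as wide as the halo, so the redundant work roughly doubles. For subdomains much wider than the halo, the relative overhead is correspondingly smaller.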
This paper is available on arXiv under the CC BY 4.0 DEED license.