Abstract:
Numerical models are used extensively for simulating complex physical
systems, including fluid flows, astronomical events, weather, and
climate. Many researchers struggle to bring their model developments
from single-computer, interpreted languages to parallel high-performance
computing (HPC) systems. There are initiatives to make interpreted
languages such as MATLAB, Python, and Julia feasible for HPC
programming. In this talk I argue that the computational overhead
is far costlier than any development time potentially saved. Instead,
doing model development in C and Unix tools from the start minimizes
porting headaches between platforms, reduces energy use on all
systems, and ensures reproducibility of results.
## brcon2020 - 2020-05-02
title: Energy efficient programming in science
author: Anders Damsgaard (adc)
contact: anders@adamsgaard.dk
gopher://adamsgaard.dk
https://adamsgaard.dk
## About me
* 33 y/o Dane
* #bitreichen since 2019-12-16
present:
* postdoctoral scholar at Stanford University (US)
* lecturer at Aarhus University (DK)
previous:
* Danish Environmental Protection Agency (DK)
* Scripps Institution of Oceanography (US)
* National Oceanic and Atmospheric Administration (NOAA, US)
* Princeton University (US)
#pause
academic interests:
* ice sheets, glaciers, and climate
* earthquake and landslide physics
* modeling of fluid flows and granular materials
## Numerical modeling
* numerical models used for simulating complex physical systems
* n-body simulations: planetary formation, icebergs, soil/rock mechanics
* fluid flows (CFD): aerodynamics, weather, climate
* domains and physical processes split up into small, manageable chunks
## From idea to application
1. Construct system of equations
        |
        v
2. Derivation of numerical algorithm
        |
        v
3. Prototype in high-level language
        |
        v
4. Reimplementation in low-level language
## From idea to application
1. Construct system of equations
        |
        v
2. Derivation of numerical algorithm          OK
        |
        v
3. Prototype in high-level language
        |
        v
4. Reimplementation in low-level language     OK
## Numerical modeling
task: Solve partial differential equations (PDEs) by stepping through time
PDEs: conservation laws; mass, momentum, enthalpy
example: heat diffusion through a homogeneous medium

∂T/∂t = k ∇²T
domain:
.-----------.
|           |
|     T     |
|           |
'-----------'
## Numerical modeling
domain: discretize into n=7 cells
.----+----+----+----+----+----+----.
| T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
'----+----+----+----+----+----+----'
#pause
* Numerical solution with high-level programming:
MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t)
Python: fenics.solve(lhs==rhs, heat_pde, heat_bc)
Julia: sol = solve(heat_pde, CVODE_BPF(linear_solver=:Diagonal); rel_tol, abs_tol)
(the above are not entirely equivalent, but you get the point...)
## Numerical solution: Low-level programming
example BC: outer boundaries constant temperature (T₁ & T₇)
* computing ∇²(T)
         .----+----+----+----+----+----+----.
t        | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
         '----+----+----+----+----+----+----'
             \ | /  \ | /  \ | /  \ | /
              \|/    \|/    \|/    \|/
         .----+----+----+----+----+----+----.
t + dt   | T₁ | T₂ | T₃ | T₄ | T₅ | T₆ | T₇ |
         '----+----+----+----+----+----+----'
              <-dx->
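The stencil above corresponds to the forward-time, centered-space update for
each interior cell i, each new value depending on the cell itself and its two
neighbors at the previous time:

```latex
T_i^{t+\Delta t} = T_i^{t}
    + \frac{k\,\Delta t}{\Delta x^{2}}
      \left( T_{i+1}^{t} - 2\,T_i^{t} + T_{i-1}^{t} \right)
```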
## Numerical solution: Low-level programming
* explicit solution with central finite differences:

for (t = 0.0; t < t_end; t += dt) {
	r_norm_max = 0.0;
	for (i = 1; i < n - 1; i++) {
		T_new[i] = T[i] + k*dt/(dx*dx)
		    *(T[i+1] - 2.0*T[i] + T[i-1]);
		if (fabs((T_new[i] - T[i])/T[i]) > r_norm_max)
			r_norm_max = fabs((T_new[i] - T[i])/T[i]);
	}
	tmp = T;
	T = T_new;
	T_new = tmp;
	if (r_norm_max < RTOL)	/* steady state reached */
		break;
}
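The loop above can be packaged as a complete, self-contained function. This is
a minimal sketch, assuming constant-temperature boundaries as in the example
BC; the function name heat_solve and all tolerances are chosen here for
illustration, not taken from the talk:

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

#define RTOL	1.0e-9	/* steady state when relative change < RTOL */
#define MAXSTEP	1000000	/* bail out if the scheme does not converge */

/* explicit FTCS heat diffusion with constant-temperature boundaries
 * T[0] and T[n-1]; steps until steady state and returns the number
 * of steps taken, or -1 on allocation failure.
 * stability requires dt <= dx*dx/(2.0*k) */
int
heat_solve(double *T, int n, double k, double dx, double dt)
{
	double *T_new, r, r_norm_max;
	int i, steps = 0;

	if (!(T_new = malloc(n * sizeof(double))))
		return -1;
	do {
		r_norm_max = 0.0;
		for (i = 1; i < n - 1; i++) {
			T_new[i] = T[i] + k * dt / (dx * dx)
			    * (T[i + 1] - 2.0 * T[i] + T[i - 1]);
			/* small offset avoids division by zero where T == 0 */
			r = fabs((T_new[i] - T[i]) / (T[i] + 1.0e-16));
			if (r > r_norm_max)
				r_norm_max = r;
		}
		for (i = 1; i < n - 1; i++)
			T[i] = T_new[i];
		steps++;
	} while (r_norm_max > RTOL && steps < MAXSTEP);
	free(T_new);
	return steps;
}
```

With T = {1, 0, 0, 0, 0, 0, 0} and n = 7 the converged result is the linear
steady-state profile between the two fixed boundary temperatures.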
## HPC platforms
* Stagnation of CPU clock frequency
* Performance through massively parallel deployment (MPI, GPGPU)
* NOAA/DOE NCRC Gaea cluster
* 2x Cray XC40, "Cray Linux Environment"
* 4160 nodes, each 32 to 36 cores, 64 GB memory
* InfiniBand interconnect
* total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak)
## A (non)solution
* high-level, interpreted code with extensive solver library > low-level, compiled, parallel code
* suggested workaround: port interpreted high-level languages to HPC platforms
#pause
NO!
* high computational overhead
* many more machines needed for the same throughput
* reduced performance and energy efficiency
## A better way
1. Construct system of equations
        |
        v
2. Derivation of numerical algorithm
        |
        v
3. Prototype in low-level language
        |
        v
4. Add parallelization for HPC
## Example: Icesheet flow with sediment/fluid modeling
             ATMOSPHERE
  _________
     ICE   `-.______________________
      -> -> ->                      \_____
  ___________________________        \
                             \        \     <><   OCEAN
                              \        \  ><>
  _____________________________\________\_______________________
     SEDIMENT
  _______________________________________________________________
* example: granular dynamics and fluid flow simulation for glacier flow
* 90% of Antarctic ice sheet mass driven by ice flow over sediment
* need to understand ice-basal sliding in order to project sea-level rise
## Algorithm matters
sphere: git://src.adamsgaard.dk/sphere
C++, CUDA C, cmake, Python, Paraview
massively parallel, GPGPU
detailed physics
20,191 LOC
#pause
3 months of computing time on an Nvidia Tesla K40 (2880 cores)
#pause
* gained understanding of the mechanics (what matters and what doesn't)
* simplify the physics, algorithm, and numerics
#pause
1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear
C99, makefiles, gnuplot
single threaded
simple physics
2,348 LOC
#pause
real: 0m00.07s on a laptop from 2012
#pause
...guess which one is more portable?
## Summary
for numerical simulation:
* high-level languages
  * easy
  * produce results quickly
  * do not develop low-level programming skills
  * no insight into the numerical algorithm
  * realistically speaking: no direct way to HPC
* low-level languages
  * require low-level skills
  * save electrical energy
  * go directly to HPC, just sprinkle some MPI on top
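"Sprinkle some MPI on top" amounts to decomposing the domain into subdomains
and exchanging ghost (halo) cells between neighbors every time step. A serial
sketch of that pattern follows; the names halo_exchange, NSUB, and NCELLS are
illustrative, and with real MPI each copy below becomes an MPI_Send/MPI_Recv
(or MPI_Sendrecv) pair between ranks:

```c
#include <assert.h>

#define NSUB	2	/* subdomains, i.e. "ranks" (illustrative) */
#define NCELLS	8	/* interior cells per subdomain (illustrative) */

/* each subdomain stores NCELLS interior cells plus one ghost cell
 * per side: indices 0 and NCELLS+1 mirror the neighbors' edge cells */
static double sub[NSUB][NCELLS + 2];

/* copy each subdomain's edge cells into the neighboring subdomain's
 * ghost cells; with MPI this loop becomes message passing across ranks */
void
halo_exchange(void)
{
	int s;

	for (s = 0; s + 1 < NSUB; s++) {
		sub[s + 1][0] = sub[s][NCELLS];		/* right edge -> right neighbor's left ghost */
		sub[s][NCELLS + 1] = sub[s + 1][1];	/* left edge  -> left neighbor's right ghost */
	}
}
```

After the exchange, every subdomain can run the same finite-difference update
on its interior cells independently, which is what makes the decomposition
embarrassingly parallel between communication steps.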
## Thanks
20h && /names #bitreichen