Why are most climate models in Fortran?

Why are most climate models in Fortran? (partee.io)

119 points by JohannMac 5 years ago | 132 comments

cbkeller 5 years ago |

High performance scientific computing is very much still a FORTRAN, C, and C++ game. And of those, FORTRAN has some compelling advantages in terms of first-class built-in support for multidimensional arrays, and quite excellent compilers. And, as others have noted, until the `restrict` keyword in C99, there were optimizations in FORTRAN that were not even possible in C.

I mostly used C for my (small-scale) HPC work in grad school because it’s what I knew best, but at several points I wished I had learned Fortran instead.

Probably one of the only “higher level” languages that’s ever been used for serious petascale scientific computing is Julia (first with Celeste on the astro side, possibly soon with CliMA for climate modeling), which, not coincidentally, follows array indexing conventions similar to FORTRAN's. And while that’s what I mostly use now, I don’t see Fortran going away any time soon.

If anything, with cool new projects like LFortran [1] making it possible to use Fortran more interactively, it’s probably quite a good time to learn modern Fortran!

[1] https://lfortran.org/

yodelshady 5 years ago | |

Open question: why are multi-dimensional arrays, and matrices specifically, so neglected in almost every other language?

They map well to practically freakin' everything, for what seems like.. not that much effort on the language design side, but an enormous amount of tedious, duplicated effort on the user side.

cbkeller 5 years ago | | |

That is a great question, and while I don’t know for sure, I think a relevant anecdotal observation that stands out to me is that many languages which do have first-class multidimensional arrays also have one-based indexing. And I would speculate this in turn is because most people who wanted matrices badly enough to make them a language feature wanted to do linear algebra with them, where you generally also want one-based notation since that is how all the equations in textbooks and papers are written.

So a language with multidimensional arrays is in a lose-lose position of having to choose to either satisfy the linear algebraists at the cost of alienating general-purpose programmers who want to do pointer arithmetic, or else satisfy the latter while alienating the core demographic for multidimensional numeric arrays.

Personally, I’m fine with (or even slightly prefer) one-based for my own scientific computing, despite starting with C, since it really is more elegant for linear algebra, and I have never found myself needing or wanting to do pointer arithmetic in a language that does have good multidimensional arrays — but clearly it is still a major turn-off to many others.

znpy 5 years ago | | |

My dumb guess is that whereas Fortran was designed with scientific computation as a first-class use case, and C/C++ let you do anything with memory (and make it easy to use inline assembly to access specific instructions), most other languages are designed as general-purpose languages without scientific (as in "high performance") computing as a main design objective.

And thus specialized data types, operators and syntax look like overhead, so language designers leave them out.

hyperrail 5 years ago | | |

One interesting wrinkle is that traditionally (dating back to FORTRAN IV at least), Fortran compilers store matrices in RAM address space using column-major order, where consecutive elements are in the same column, not the same row. [1]

Most languages that prioritize Fortran code interop also adopt column-major order, but most other languages that support multidimensional arrays do row-major order. I'm not sure why Fortran went column-major but because it did, a lot of libraries designed for Fortran callers (such as LAPACK and all BLAS implementations) need to be told that input arrays have been transposed when they come from languages like C++.

[1] https://en.wikipedia.org/wiki/Row-_and_column-major_order
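The layout difference described above can be made concrete with a small sketch. The helper names below (`row_major_offset`, `col_major_offset`, `flatten`, `transpose`) are made up for illustration and are not part of any library. The sketch only shows why a row-major buffer, read by a routine that assumes column-major storage, looks like the transposed matrix.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Zero-based linear offset of (i, j) when each row is contiguous (C convention)."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Zero-based linear offset of (i, j) when each column is contiguous (Fortran convention)."""
    return j * n_rows + i


def flatten(matrix, order):
    """Lay out a list-of-rows matrix in a flat list; order is 'C' (row-major) or 'F' (column-major)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    offset = row_major_offset if order == "C" else col_major_offset
    flat = [None] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            flat[offset(i, j, n_rows, n_cols)] = matrix[i][j]
    return flat


def transpose(matrix):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]


a = [[1, 2, 3],
     [4, 5, 6]]

row_major = flatten(a, "C")  # rows stored one after another
col_major = flatten(a, "F")  # columns stored one after another

# The row-major bytes of `a` are exactly the column-major bytes of its transpose,
# which is why a column-major routine handed a row-major buffer sees a transposed matrix.
same_buffer = row_major == flatten(transpose(a), "F")
```

In practice this is why callers from row-major languages either pass a "transposed" flag or swap the dimensions when handing buffers to column-major routines, rather than copying the data.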

leephillips 5 years ago | | |

That’s one reason why APL and its descendants are so powerful in the hands of people who have become fluent in viewing computation through the lens of arrays.

tryonenow 5 years ago | | |

Because the only place you really need them is in mathematical/scientific computing, and while many scientists tend to pick up a little programming for their research, the vast majority of computer scientists and programmers, in particular the ones with the interest and ability to work on languages, are not concerned with mathematical modeling/processing. It is a fairly rare interdisciplinary combination.

hntrader 5 years ago | | |

This is a great question and I hope you get an answer to this.

The number of woeful ad-hoc solutions I've seen to people handling matrix data in Java/C# for what should've been otherwise very basic analysis ...

Something with pandas-like capability in a lower level language would be amazing.

hilbert42 5 years ago |

1. There are thousands of scientific subroutines written in Fortran that are stable and well tested - in fact, there are well established formal libraries of them that go back over 60 years. The researchers know and trust them.

2. Despite the sneers and derision that Fortran has been subjected to from non-Fortran programmers over recent decades, Fortran is an excellent language to do intensive scientific and mathematics work. Compilers are optimized for calculation/math speed and large intensive calculations. From the outset, it could handle complex numbers, double precision, etc., etc. natively without having to resort to calling libraries/special routines as other languages had to do back then.

3. Scientific enterprises, along with mainframes and supercomputers, have well-established and stable ways of working, including program and data exchange. Essentially, a well-established computing ecosystem/infrastructure surrounds scientific computing that researchers know and understand well. There is no need to change it as it works well. Moreover, it's a stable and reliable environment when compared to other computing environments: Fortran was introduced long before the cowboys entered the computing field, when the programming/computing environment was more formal and structured, and this contributed to that stability.

For the record, Fortran was the first language I learned, and my programming back then was done on an IBM-360 using KP-26 and KP-29 keypunches and 80-column Hollerith cards.

dwheeler 5 years ago |

The actual title is, "Why are Climate models written in programming languages from 1950?".

I think the assumption that "old is bad" is the cause of many, many, many foolish decisions. Useless code rewrites, company reorganizations that are not significant improvements, and many other bad ideas hinge on this Worship Of The New. Why are we using an alphabetic system originally developed c. 1800BC? It's old, we should switch to new writing systems every 10 years because they're new, right :-)?

Older is not better. Newer is not better. Better is better. There's no point in switching something if the destination isn't better, and even if it's clearly better, it needs to be so much better that it's worth the switching cost.

madhadron 5 years ago | |

And it's just a misunderstanding. I think it was Perlis who said, "We don't know what the programming language of the future will look like, but we know it will be called FORTRAN."

gnufx 5 years ago | | |

Hoare said "I don’t know what the language of the year 2000 will look like, but I know it will be called Fortran". He also said "ALGOL 60 is alive and well and living in Fortran 90", which is a decent compliment from him.

anthk 5 years ago | |

This. The OS I use dates back to BSD 4.4, which is a rewrite and rehashing of an OS that is about 50 years old.

The audio plug is over 100 years old, and modern TTYs date back to what, 80 years? If it works, it works.

Also, damn, calculus is over 200 years old. Or maybe 2000, depending on whether you compare it to the method of exhaustion or not.

colllectorof 5 years ago |

Engineers are horrified to learn that mathematical modeling is done in a language created in the 50s, but aren't bothered by the fact that the dominant computer interaction model used in the field right now dates back to pre-WW2 teleprinters. Someone is lacking in self- and historic awareness.

What practical problems does Fortran cause when used for numerical computing?

enriquto 5 years ago | |

Somebody should warn them to avoid accidentally using Pythagoras' theorem, which was introduced 2500 years ago.

amelius 5 years ago | | |

Unlike computer software, mathematical theorems don't suffer from bit rot.

Trex_Egg 5 years ago | | |

Funny

pron 5 years ago | |

Or that the languages they use to express more important things than computer programs all date back much longer than that.

sampo 5 years ago |

Hint for commenters: Since Fortran 90, it's spelled Fortran, not FORTRAN. By using the latter you signal that your experience on the topic is from 30 years ago.

https://en.wikipedia.org/wiki/Fortran#Fortran_90

jcranmer 5 years ago | |

But Fortran is case insensitive, so it should be no problem if you spell it Fortran or FORTRAN or fORtRaN.

fuzzfactor 5 years ago | |

>it's spelled Fortran, not FORTRAN. By using the latter you signal that your experience on the topic is from 30 years ago.

Excellent idea to help filter out those having the lesser number of decades experience.

hilbert42 5 years ago | | |

Correct, but as I mentioned above, one should use the original acronym form when referring to an old version that was specifically named that way.

hilbert42 5 years ago | |

Right. Sometimes old habits die hard and I use FORTRAN when I mean Fortran but I don't do it intentionally nor do I do it out of ignorance. (I note that as I type this into Firefox, its speller still wants to capitalize the word! As I've discovered so do many other editors and word-processors.)

In recent years I've adopted the following nomenclature and you'll note I've done so here in my earlier posts. That is to treat the name of each specific version as a proper name. As FORTRAN IV was originally called that including the Romanized numerals for the version number I use that out of respect for those who originally named it in the same way I'd always use say John and not john. Nowadays, when I refer to Fortran in its generic sense I use its new default name rather than its old acronym form.

shortlived 5 years ago | |

Interesting... iOS will autocorrect to “FORTRAN”

hatmatrix 5 years ago |

C and C++ are definitely the competitors to Fortran; not Chapel or Python. In the life sciences, a large amount of Fortran code has been rewritten in C/C++. But they have orders of magnitude more funding than climate science and teams of professional programmers to maintain the code.

Fortran is a domain-specific language for scientists, and excels at array arithmetic (for graph-based problems though, maybe look elsewhere). Even badly-written code can run reasonably fast, which is not the case for C/C++. There are also the decades of concerted hardware and compiler optimizations that make Fortran hard to beat on HPC systems.

It's not as readable as Python, but it's more readable than C/C++ written by a professional programmer.

cozzyd 5 years ago | |

There's a saying among physicists that you can write Fortran in any language.

hatmatrix 5 years ago | | |

I've seen it with my own eyes. It's not pretty, but usually works.

Hankenstein2 5 years ago |

I work at one of the labs mentioned and get paid for running not only the climate models but mesoscale models as well, which are also written in Fortran.

The premise of the article is that Fortran, 70 years later, is still an appropriate tool to use for crunching numbers, which it absolutely is, but it neglects one major problem.

Like the COBOL issue that was all the rage 20 years ago, it is difficult to hire younger generation programmers that want to and are excited to develop in Fortran.

enriquto 5 years ago |

Not only "climate models"... a large chunk of scipy is just a thin Python layer over decades-old Fortran code. That many physicists chose to use the real thing instead of the fisher-price interface speaks in favor of them.

analog31 5 years ago | |

Actually, as a user of the fisher-price interface, I'm glad that it can bind to C and FORTRAN libraries, so my numerics are based on the highest quality code.

enriquto 5 years ago | | |

I'm also a very happy user of the dumbed-down interfaces. But I find it strange when my fellow users are surprised or even horrified at the fact that some people still write and maintain Fortran and C code. Hell the Python interpreter they use is written in C! But apparently if you write in C you are some sort of old man yelling at clouds.

wrnr 5 years ago | |

Gonum wraps the same code. From what I could tell, Fortran seems to have a neat way to handle more specialised number systems like dual numbers and hyperbolic numbers.

spacedome 5 years ago | | |

You can do operator overloading in Fortran to add a new number type. I don't remember the details, but I wrote a quaternion library once.
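To illustrate what "adding a new number type through operator overloading" looks like, here is a minimal sketch of dual numbers. It is written in Python rather than Fortran, and the `Dual` class and the function `f` are hypothetical names invented for this example. In Fortran the same idea would be expressed with a derived type plus operator interfaces; this sketch does not try to reproduce that syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number real + eps*e, where e*e == 0 (hypothetical illustration type)."""
    real: float
    eps: float = 0.0

    @staticmethod
    def _lift(x):
        # Promote plain ints/floats to Dual so mixed arithmetic works.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = Dual._lift(other)
        # (a + b e)(c + d e) = ac + (ad + bc) e, because e*e == 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__  # multiplication is commutative here


def f(x):
    """An ordinary polynomial; it works unchanged on floats and on Dual values."""
    return 3 * x * x + 2 * x + 1


# Seeding eps with 1.0 makes the eps part of the result carry the derivative f'(2).
result = f(Dual(2.0, 1.0))
```

Here `result.real` holds the value of `f` at 2 and `result.eps` holds its derivative there, which is one reason dual numbers come up in numerical codes.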

cozzyd 5 years ago | |

It is sad indeed that so few new languages have a stable ABI like Fortran and C.

waynesonfire 5 years ago |

Something really strange happens when an industry sector is highly populated. Think Fortran vs JavaScript, or FreeBSD vs Linux.

Seems like a sector with high population and low barrier to entry is prone to illusory superiority that lowers the quality of the system.

m463 5 years ago |

Has anyone LOOKED at fortran recently?

Some excerpts from https://en.wikipedia.org/wiki/Fortran

Fortran 90:

- Ability to operate on arrays (or array sections) as a whole, thus greatly simplifying math and engineering computations.

- whole, partial and masked array assignment statements and array expressions, such as X(1:N)=R(1:N)*COS(A(1:N))

Fortran 2003:

- Object-oriented programming support: type extension and inheritance, polymorphism, dynamic type allocation, and type-bound procedures, providing complete support for abstract data types

Fortran 2008:

- Sub-modules—additional structuring facilities for modules; supersedes ISO/IEC TR 19767:2005

- Coarray Fortran—a parallel execution model

- The DO CONCURRENT construct—for loop iterations with no interdependencies

- The BLOCK construct—can contain declarations of objects with construct scope

Fortran 2018:

- Further interoperability with C
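For readers who have not seen Fortran 90 array syntax, the statement X(1:N)=R(1:N)*COS(A(1:N)) quoted above is an elementwise operation over array sections. A rough Python equivalent is sketched below. The function name `scaled_cosine` and the sample values are assumptions for illustration, and Fortran's inclusive 1-based section `1:N` corresponds to the Python slice `[:n]`.

```python
import math


def scaled_cosine(r, a, n):
    """Return [r[k] * cos(a[k])] for the first n elements, computed elementwise."""
    return [rk * math.cos(ak) for rk, ak in zip(r[:n], a[:n])]


r = [1.0, 2.0, 3.0, 4.0]
a = [0.0, math.pi, 0.0, math.pi]
x = [0.0] * 4
n = 3

# Slice assignment plays the role of the array-section assignment X(1:N) = ...;
# elements beyond n are left untouched.
x[:n] = scaled_cosine(r, a, n)
```

The Fortran version needs no helper function or explicit loop, which is the convenience the excerpt is pointing at.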

CookieMon 5 years ago |

How is Fortran coming along with GPUs? (last I looked it was being done with proprietary compiler language extensions, but that was a while ago)

Are modern supercomputers faster than a cluster of consumer-grade GPU cards?

ch_123 5 years ago | |

> How is Fortran coming along with GPUs? (last I looked it was being done with proprietary compiler language extensions, but that was a while ago)

There is support for CUDA in Fortran. In fact, Nvidia purchased one of the main Fortran compiler vendors (PGI) and is open sourcing their compiler as flang.

CUDA is the predominant GPU programming model in the HPC space. There are open standards, but they are nowhere nearly as widely used.

> Are modern supercomputers faster than a cluster of consumer-grade GPU cards?

Fundamentally, supercomputers use the same processors and GPUs that you find in consumer hardware. The differences tend to lie in A) the sheer quantity of hardware used (think millions of cores for Top 10 systems), B) high-bandwidth, low-latency interconnects and C) some market segmentation by hardware vendors (e.g. Nvidia deliberately limits the double-precision floating-point performance of consumer hardware).

CookieMon 5 years ago | | |

Wow, the PGI compiler becoming open-sourced is awesome.

sampo 5 years ago | |

> Are modern supercomputers faster than a cluster of consumer-grade GPU cards?

On the top500 list, #1 does 400,000 TFlop/s, #500 does 1000 TFlop/s. How much would the kind of GPU cluster you're thinking of do?

https://www.top500.org/lists/top500/2020/11/

simplicio 5 years ago |

Another advantage to Fortran in academic settings, at least the older version of Fortran, is that there isn't much to the language. Someone already familiar with programming can pick it up in a day or two.

So if you're a prof with a large code-base that you want a stream of grad students, undergrad research assistants, assoc. profs etc. to contribute to before they move on, having a language that doesn't require squandering half a semester on learning to code before you can start doing actual science is a big bonus.

airhead969 5 years ago |

Fortran will always be around because there's too much investment in it.

A nuclear reactor simulator I ported from UNIX to Win32 in 1998 was several million lines of code written by nuclear engineers (not software engineers) and physicists. It's over 60 years old now.

spartee 5 years ago |

Author here - Thanks for reading!! Lots of great comments here. Happy Pi day!

complex_pi 5 years ago |

Fortran is hot at the moment :-) See "Resurrecting Fortran" (blog post by Ondřej Čertík https://news.ycombinator.com/item?id=26445438 https://ondrejcertik.com/blog/2021/03/resurrecting-fortran/ ) about the promising Fortran "standard lib" and the Fortran community in general (see the related website https://fortran-lang.org/ ). The story here is interesting but lacks a bit about this wider context.

jessaustin 5 years ago |

TFA would seem more reliable if it didn't have such a needlessly obscurantist Fortran-python code comparison. Nothing about the two languages calls for different base case logic! That is, in order to prevent confusion, the "Python3" code should have been this:

  def fibonacci(n):
    if n < 2:
      return n
    else:
      return fibonacci(n-1) + fibonacci(n-2)

hyperrail 5 years ago | |

From the other side, I thought the Fortran code had too much syntactic ceremony.

I haven't written Fortran in a while, but I was pretty sure that for illustrative examples like this, you could dispense with the entire MODULE declaration, the use of END FUNCTION Fibonacci instead of just END, and the usually-optional :: separator between the variable's type and name.

Something like this? Again, no recent experience:

  recursive function fibonacci(n) result (fib)
    ! implicit none belongs inside the function, right after the function statement
    implicit none
    integer n
    integer fib
    if (n < 2) then
      fib = n
    else
      fib = fibonacci(n - 1) + fibonacci(n - 2)
    endif
  end

(The IMPLICIT NONE has to stay because of the now-regrettable Fortran convention that without it, the type of a variable is determined by the first character of the name (n would be integer because variables starting with m, n, i, etc. are integer, while fib would be floating point).)
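The default implicit-typing rule mentioned above is simple enough to state as code: names whose first letter is I through N are integer, and everything else is real. The Python helper below (`implicit_type`, a name invented for this sketch) only models that default rule and ignores any IMPLICIT statements that could change it.

```python
def implicit_type(name):
    """Default Fortran implicit type for a variable name (no IMPLICIT statement in effect)."""
    first = name[0].upper()  # Fortran names are case-insensitive
    return "integer" if "I" <= first <= "N" else "real"


# Without IMPLICIT NONE, the argument n would be integer but the result fib would be real.
types = {name: implicit_type(name) for name in ("n", "fib", "i", "m", "x")}
```

This is why the recursive Fibonacci function would silently return a floating-point value if the declarations and IMPLICIT NONE were both dropped.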

pjmlp 5 years ago |

Why do the examples in Fortran 90 in a 2021 blog post, when 2018 is the most recent standard revision?

jcranmer 5 years ago | |

I'm not an expert in Fortran, but my understanding is that Fortran 90 is basically the equivalent of C++11--it added a slew of major features (such as free format code and array notation) that make it a pretty different language from pre-90 code. Even if newer language revisions add some more useful features, it's the core set from Fortran 90 that's worth differentiating, much as I might describe modern C++ code as C++11 even if the project requires C++14 or C++17.

ch_123 5 years ago | | |

This is absolutely correct - Fortran standards after 90/95 have mostly added extra features, rather than fundamental changes to how people write Fortran. Fortran 2003 added OO support, but I don’t believe that has seen widespread adoption.

hatmatrix 5 years ago | |

I don't think it would look different. There is a big difference between Fortran 77 and Fortran 90, but less between Fortran 90 and Fortran 2018, at least for this example.

chris_va 5 years ago |

(disclaimer: I work in a Climate&Energy R&D Lab)

I don't entirely agree with the overall assertion of this article. The author has some valid points, but I think it misses the forest for the trees.

TLDR: I think Fortran tooling and HPC clusters are a self-reinforcing local maximum. They are heavily optimized for each other, but at the cost of innovation and extensibility.

For example, we'll never get a fully differentiable climate model in Fortran. The tooling does not exist, and there are not enough Fortran developers to make a serious dent in the tooling progress made outside of the HPC world. The MPI stacks these codes rely on are not great for hardware outside of a supercomputer, and Fortran codes basically are built around full interconnect. I have many PFLOPs at my disposal that I cannot use because these codes are too brittle without being entirely rewritten.

At the end of the day, everything is a Turing machine, so you can technically do whatever you want in Fortran or any other language (or mix and match), but strategically staying in Fortran leaves a lot of resources on the table.

cbkeller 5 years ago | |

Well Fortran was, notably, one of the first languages to have proper source-to-source autodiff (TAPENADE) [1-3], so it’s probably not impossible, though my choice for a fully differentiable climate model would personally be Julia, like the CliMA folks at Caltech [4].

[1] https://doi.org/10.1145/2450153.2450158

[2] http://www-tapenade.inria.fr:8080/tapenade/index.jsp

[3] http://www-sop.inria.fr/ecuador/tapenade/distrib/README.html

[4] https://clima.caltech.edu/

TheRealKing 5 years ago |

For heaven's sake, let's stop the discussion of Fortran array index starting at 1. In Fortran the starting index can be anything. A(-10:-5) is valid, A(-10:10) is valid, A(1:10) is valid, A(0:10) is valid. Choose what you want and do not complain about it again, please. No other language has this amazing capability.
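To show what a declaration like A(-10:-5) means in terms of storage, here is a small Python sketch of a one-dimensional array with user-chosen inclusive bounds. The `BoundedArray` class is hypothetical and written only for this illustration. Fortran provides this natively in the declaration, with no wrapper type.

```python
class BoundedArray:
    """1-D array indexed from `lower` to `upper` inclusive, in the spirit of A(lower:upper)."""

    def __init__(self, lower, upper, fill=0.0):
        if upper < lower:
            raise ValueError("upper bound must not be below lower bound")
        self.lower = lower
        self.upper = upper
        self._data = [fill] * (upper - lower + 1)

    def _offset(self, index):
        # Translate a user-facing index into a zero-based storage offset.
        if not self.lower <= index <= self.upper:
            raise IndexError(f"index {index} outside {self.lower}:{self.upper}")
        return index - self.lower

    def __getitem__(self, index):
        return self._data[self._offset(index)]

    def __setitem__(self, index, value):
        self._data[self._offset(index)] = value

    def __len__(self):
        return len(self._data)


a = BoundedArray(-10, -5)  # six elements, indices -10 through -5
a[-10] = 1.5
a[-5] = 2.5
```

The bounds only change how an index is translated into an offset; the underlying storage is the same contiguous block either way.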

tianlong 5 years ago |

These considerations are also valid in other fields, such as computational quantum physics/chemistry. Major software packages are written in Fortran and in C++. I work in the ML community now, after many years of quantum chemistry, and when I say that I know Fortran people usually laugh :).

dgellow 5 years ago |

I'm not familiar at all with the world of high performance scientific computing. Are C++, Rust, Nim, Zig, & co. even remotely considered as potential candidates in the future, or is it really only C and Fortran with no expectation to see much change? Just curious.

JustSomeNobody 5 years ago |

Because there is absolutely nothing wrong with using Fortran for what Fortran is very good at.

anthk 5 years ago |

BLAS uses Fortran, I think, and so does LAPACK.

Good luck calling that slow.

Another clueless JS hipster, maybe.

gnufx 5 years ago | |

The reference BLAS in Fortran is indeed slow. I'm not aware of any tuned version in Fortran. It might be possible to re-write the BLIS structure in Fortran and get reasonable performance on, say, Haswell, but not on SKX, if we talk x86.

anthk 5 years ago | | |

>Tuned.

Intel on HPCs.

person_of_color 5 years ago |

Anyone doing gen-art in Fortran?

crb002 5 years ago |

The tooling is stable so when you link to FORTRAN binaries the ABI tends not to bitrot. IMHO FORTRAN might get a borrow checker before C/C++ to be on parity with Rust for memory safety.