The Making of SciPy 1.0(numfocus.org) |
The Making of SciPy 1.0(numfocus.org) |
https://github.com/DanburyAI/SG_DLB_2017/blob/master/Noteboo...
In addition, recent versions of Matlab have broken some common plotting tools, contourf in particular. Where this once produced elegant and light weight results it now generates a tessellated mess with tiny gaps between elements that look awful in paper manuscripts and baloon the file size for no good reason. The Mathworks response? It fixes some edge case.[0]
I find many of students we take on as interns often lack more than rudimentary coding skills making the jump to Python easier.
[0] https://uk.mathworks.com/matlabcentral/answers/162257-proble...
A few of them have transitioned from Matlab. Even though we're a commercial business, buying each engineer a Matlab seat with packages is not a trivial expense, and asking for the money requires more management attention than we really need. With Matlab, it's: "You need Matlab for what? Why are you working on that?"
With Python, it's: "You solved that problem? Great."
But there are of course some caveats. Learning Python per se won't turn you into a professional software developer. A lot of people list Python on their resume's, and our interviewers are engineers and managers who only know that Python is a good buzzword to have. Learning Python ultimately translates into a widely varying ability to actually do anything. One of my colleagues has really taken to it, and has made an effort to develop disciplined programming skills beyond "coding." A couple others are using it for basic scripting, data analysis, and so forth. I've seen some horrendous code, and I'm torn about whether to intervene, when they are in fact getting useful results. Usually, I tell them about some things to learn on their own before they start their next project.
But that problem -- what really constitutes good programming skill for people who aren't formally trained, being hired by people who don't really know either -- isn't confined to Python.
Library support is perhaps better with SciPy, but Octave is free as in freedom, fast, works.
But SciPy is a great project and everyone can learn something from their successes.
(a) writing your research as a bunch of Fortran or C kernels and intrgrate them with automatic bindings such as f2py? Especially Fortran is a great fit for NumPy datastructures - because they are the same.
(b) Use high performance python environments like Numba, NumbaPro (GPU) or even Cython?
With SciPy, the community is much bigger and present in every imaginable scientific field. Whenever someone comes up with a new algorithm, there's a python implementation before you know it. I'm not going to pretend that this isn't a huge added value, but I find the way GP phrased themselves a little bit annoying, as it implies the language Julia is is the issue. It is honestly a more beautiful and more efficient language to program in.
(also, these days you can call out to Python and R without significant overhead, so you can use any package out there; Julia has a pretty good interop story)
If I have to write everything important and difficult in Fortran or C kernels, then I'm losing the productivity of the higher level language. And in this case it would be writing kernels from scratch instead of tweaking high level package code if I want to change data structures and implement new algorithms for existing problems. This makes it hard to directly re-purpose the internal linesearch code from the SciPy optimizers without Python in-between into my own Fortran optimizer, or add new projections and stuff to existing SciPy implementations and have that be a separately maintained package but dependent on SciPy.
Even just talking about this starts to sound like a mess from the past, but fortunately this problem is already solved by Julia so there's no reason to have to deal with it.
>(b) Use high performance python environments like Numba, NumbaPro (GPU) or even Cython?
Well the biggest thing against Numba is once again you have to write everything from scratch if you want to do things non-trivial. They finally have a way for you to extend Numba to add a class, but you don't by default get the whole Python standard library compatible with it, and classes defined in user's packages aren't automatically compatible with it. So you have to start re-writing basic data structures and other types to get the full algorithm JIT'd. And Python's standard library isn't all setup with @generated_jit to then dispatch everything effectively. So you end up having to build tons of things on your own because of the lack of library support, and it adds development time. Then there are restrictions on allocations, no easy way to do distributed parallelism, take in functions to my functions to inline them (this is important for writing things like ODE solvers, but with Numba you have to do a hack to make this almost work but it won't optimize this well), etc. Essentially it hits a feature wall when it gets past microbenchmarks and to real programming. This all hearkens back to your first comment:
>Let's say you're exploring with NumPy
Exactly: if you're content with building everything off of arrays yourself then Numba will do fine. But its limitations significantly impact the ability to compose full packages in native Python with constant propagation, interprocedural optimization, etc.: all the stuff you you depend on static languages doing in order to optimize large-scale software. On top of that, it's a difficult dependency to install. So in the end, it's great for JITing little blurbs, but I quickly hit the edge when trying to fight it.
Cython is similar to Numba, where you can setup your own fused types and extension classes, re-writing the standard library, etc., but now you've taken to writing essentially the stdlib for a language (or at least the parts you need) just to be allowed to use this "easy accelerator" for a demanding project.
If it was 10 years ago these issues would be okay, but these days it stands out as a productivity and performance barrier. Julia has already solved this problem, and hell even LuaJIT is easier to build this kind of software with. Or if you're going to fuss around with compiled code, we can use D or Rust, or even go all the way back to C++ and Fortran. In fact, it's quite telling that even fairly recent big "Python" projects like TensorFlow, PyTorch, FEniCS, etc. all built their whole package in C++ and made Python bindings instead of trying to use these accelerators. Meanwhile people are punching out large Julia projects are using pure Julia without hesitation.
To an end user of packages it really doesn't matter how it was implemented, but when your work and research is in developing these kinds of large software projects, these limitations make a huge difference.