Design of GNU Parallel (2015) (gnu.org)
Perl 5.8.0 is over 20 years old (https://dev.perl.org/perl5/news/2002/07/18/580ann/) while CentOS 3.9 was released in 2007! It seems both not-that-old and ancient at the same time.
My personal anecdote with gnu parallel was running into it while working in academia. It worked well and saved me some time, but I felt that it was unreasonable of a tool to ask for a citation to parallelise a script - it seemed that matplotlib, jupyter and co would need one as well. On the other hand, I decided to not use it, because I also feel that authors can ask for whatever they want.
It still works, though you would have to archive/vendor dependencies
But I still think making citation for gnu parallel is unreasonable. There is a huge body of software, of which gnu parallel is probably the least important, that contributed to (at least my) research. Blowing up citation lists with those makes the citation list borderline useless.
It makes citations into advertising space for software - it's bad enough being coerced to make it an advertisement for reviewers' papers.
I would have thought it's black magic with assembler optimisations for MIPS and special considerations for HP-UX...
This is such a lovely and interesting writeup, it's wonderful that people take their time to share so generously!
[1]: an 11k LOC Perl script, you can read along here: https://github.com/gitGNU/gnu_parallel/blob/master/src/paral...
A sample use case would be having a file that has words in it, one per line, and you want to run a program that operates on each word (device name, dollar amount, whatever). Sure, you can use a loop, but if the words and actions are independent, parallel is one way to spin up N copies of your program and pass it a single word from the file. Can get around Python's GIL without having to use multiprocessing or threads (as a more concrete example).
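A minimal sketch of that use case, assuming GNU Parallel is installed. The word list and the `echo processed` command are hypothetical stand-ins; in practice the command could be any independent program, for example a Python script invoked once per word:

```shell
# Hypothetical input file: one word per line.
words_file=$(mktemp)
printf '%s\n' alpha beta gamma > "$words_file"

# Run up to 4 jobs at once (-j 4). Each job receives one line of the
# file as {}. -k keeps the output in the same order as the input, and
# :::: tells parallel to read its arguments from the named file.
result=$(parallel -j 4 -k echo processed {} :::: "$words_file")

printf '%s\n' "$result"
rm -f "$words_file"
```

Each word gets its own process, so the jobs do not share an interpreter and can run on separate cores.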
Didn't realise that it busy waits, but I'm typically running it on a not very busy server with tens of cores.
A) You don't understand. Please read the "Citation notice" section in the article.
B) You understand but don't use GNU Parallel.
C) You understand and use GNU Parallel in a non-academic setting and find the hassle of supplying --no-notice to be onerous vs the effort to write/maintain your own tool.
D) You understand and use GNU Parallel in an academic setting and have cited Ole or plan to cite Ole.
From the article, nearly 10 years ago Ole added the citation behavior after discussing it with his users: https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
Ole's citations took off roughly coincident with this behavior being added: https://scholar.google.com/citations?hl=en&user=D7I0K34AAAAJ... (click "Cited By" and notice the bar chart).
But quite useless as it'll print poorly and is overall a waste of resources to have that lovely beach scene in the background.
https://zenodo.org/record/1146014/files/GNU_Parallel_2018.pd...
The i7 on my laptop with quite a few CPUs/threads and a few optimisations got the job finished in 10 minutes.
(I later put the Hadoop use on my resume, not the GNU parallel. That's the joke of modern job hunting. There is no interest in what you did, just buzzwords and leetcode. Luckily there are still a few places that value real work or I'd be too old to get a job. :) )
https://github.com/shenwei356/rush
I use it pretty extensively with ffmpeg, imagemagick and the like.
I'd been using the mmstick/parallel for a while, but it moved to RedoxOS repos and then stopped being updated, while still having some issues not ironed out.
And with the `--halt now,done=1` option (that I think is relatively recent?) it means that if any of the parallel processes exit, parallel would exit itself, the whole container will shut down, and external orchestration would start another one if needed.
Which is a shame - 95% of my make usage is PHONY targets where I have a task and not a generated artifact. My current use case would have greatly benefited from the native parallelism and the ability to restart only failed files.
These are a must have today.
- entr. It runs a command on file/directory changes.
- spt. Simple pomodoro technique. A good timer to help yourself to work and take rests.
- herbe. It works great as a notifier for spt. Add "play" from sox to write a script to both
notify and play a sound in parallel.
- sox/ffmpeg/imagemagick. Audio, video and image production and conversion on the CLI. A must have.
- catdoc/antiword/odt2txt/wordgrinder+sc-im+gnuplot. Word/Excel/Libreoffice files reading and editing on the terminal. Gnuplot will help with sc-im. This can be a beast over SSH. With Gnuplot compiled with sixel support (and XTerm) you can do magic.
- iomenu - cat bookmarks.txt | iomenu | xargs firefox. Pick from a list of items (one per line) and choose. I think it has fuzzy-finding matches.
I have several more. Simple battery meter (sbm), grabc to grab a color from the screen,
pointtools+catpoint to do "presentations" over a terminal, nncp-go+yggdrasil for
ad-hoc networking and secure encrypted backups between devices...
No need for massive distributed clusters when you have a simple perl oneliner
seq 0 10000 | parallel dd if=/dev/urandom of=/mnt/foo/input bs=10M count=10 seek={}0
dd if=/dev/urandom of=/mnt/foo/input bs=10M count=100000
in the amount of time that it took?
Example of installing it in a Debian/Ubuntu container during container build, here's an example Dockerfile:
RUN apt-get update \
    && apt-get -yq --no-upgrade install \
        supervisor \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists /var/cache/apt/*
Then it's possible to create a configuration file, for example /etc/supervisord.conf, to specify what should run and how:
[supervisord]
nodaemon=true
[program:php-fpm]
command=/usr/sbin/php-fpm8.0 -c /etc/php/8.0/fpm/php-fpm.conf --nodaemonize
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:nginx]
command=/usr/sbin/nginx
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
And finally it can be run inside of the container entrypoint, along the lines of this in docker-entrypoint.sh:
#!/bin/bash
echo "Software versions..."
nginx -V && supervisord --version
echo "Running Supervisor..."
supervisord --configuration=/etc/supervisord.conf
Here's more information about the configuration file format, in case anyone is curious: http://supervisord.org/configuration.html
It should be noted that this package will bring in some dependencies, though, which may or may not be okay, depending on how stringent you are about space usage and what's in your containers. Example for an Ubuntu container:
The following NEW packages will be installed:
libexpat1 libmpdec3 libpython3-stdlib libpython3.10-minimal libpython3.10-stdlib libreadline8 libsqlite3-0 media-types
python3 python3-minimal python3-pkg-resources python3.10 python3.10-minimal readline-common supervisor
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 6905 kB of archives.
After this operation, 25.7 MB of additional disk space will be used.
(just found the piece of software itself useful for this use case, figured I'd share my experiences)
My problem is that it's not always immediately clear how software that would normally run as a systemd service could be launched in the foreground instead. It usually takes a bit of digging around.
But inside Docker, where something else already has the job of restarting things if they fall over, it feels a bit overcomplicated, in that there are multiple ways of doing the restarting. Plus, I think there is a touch more visibility - it's all just command line arguments to parallel:
parallel --will-cite --line-buffer --jobs 2 --halt now,done=1 ::: \
    "some_proc some args" \
    "another_proc some more args"
Maybe it's not dead. Maybe it's just finished. Does everything need to keep changing? Change isn't always improvement, and even if it is, if you have to maintain backwards compatibility, sometimes the conceptual load of having to keep the old ways and the new ways in your head all the time isn't worth it.
Maybe we should start letting things just be finished.
Why does a language being stable mean it's dead? Is Awk dead?
Keeping long-term backward compatibility does not necessarily mean dying. C is 50 years old and still alive. I have written a lot more Perl than Python. IMHO, Perl is dying because its syntax is arcane and confusing. We can't solve this problem unless we design a brand new language.
Asking for citations is fine. But GNU parallel wants to treat it like a requirement of using the software, without making it a condition of the copyright: "== Is the citation notice compatible with GPLv3? ==
Yes. The wording has been cleared by Richard M. Stallman to be compatible with GPLv3. This is because the citation notice is not part of the license, but part of academic tradition."
This is disingenuous, because citing every tool you use in preparing a scientific work is not part of academic tradition. And the statement that "If you pay 10000 EUR you should feel free to use GNU Parallel without citing." doesn't make any sense in the "academic tradition" framing. If Ole thinks citations are required by academic tradition, that shouldn't change if I pay him enough money.
"If you disagree with Richard M. Stallman's interpretation and feel the citation notice does not adhere to GPLv3, you should treat the software as if it is not available under GPLv3. And since GPLv3 is the only thing that would give you the right to change it, you would not be allowed to change the software.
In other words: If you want to remove the citation notice to make the software compliant with your interpretation of GPLv3, you first have to accept that the software is already compliant with GPLv3, because nothing else gives you the right to change it. And if you accept this, you do not need to change it to make it compliant."
And this is legal nonsense. If I release something under a license, and then break that license, that doesn't nullify the original license. Claiming otherwise would allow me to un-copyleft someone else's code.
Whether or not it's standard is irrelevant. Ole asked you to cite him if you use it. So, if you publish academically, either don't use it or cite him. If not using GNU Parallel hinders your science then the tool must be material to your workflows.
For comparison, how many dumb citations do people add to their papers that point to marginally relevant work coming out of the same research center or academic lineage? Those aren't scientifically relevant but they are standard. Let's not pretend the academy is full of citation purists.
Quite honestly, I think the behavior is of the highest order of jerkishness. A nice request could be made in the documentation; instead, the path chosen is to bully users of the software.
Once more, because it is free software, we are free to use it despite what Ole thinks. We are free to patch it out too.
Why? Whether something has contributed meaningfully to my research is my decision, not Ole's. Not having light "hinders my science", so I'll be sure to cite Edison on all my papers.
I agree with the sibling commenter that Ole's behavior is jerkish. Not because he asked for citations, but because he misleads users by claiming his request is standard, when it is decidedly not. He also obfuscates the voluntary nature of his request as much as possible, to make it seem like citing is a legal requirement. And he is inflammatory in responding to people who make the perfectly valid decision to not cite him, or to patch the notice out.
I have never felt this, and that is not how FOSS works. By definition, they cannot restrict how you use the software. Thus, the citation request is just a request. Hypothetically, you could slander and ruin the author's life (the extreme polar opposite of a citation) and still freely use the software.
This is no different than an author asking users to retweet, post on reddit, etc. Certainly it may be annoying to some, but it does not restrict how you may use or fork the software.
One could fork GNU parallel to remove the copyright, and let the democratic public user base vote on whether they care enough to use your fork, or if they think you (or the other author) are an asshole, etc.
Right, and that's what the rest of my sentence was meant to convey. However, the author goes to great lengths to obfuscate this fact, as this faq demonstrates: https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/cita...
Not once in that 2000-word rant does Ole outright state that citation is entirely voluntary, and not a condition of the license. Instead, he describes the notice's "GPLv3 compatibility" in a way that incorrectly states you must either respect the license notice or treat the software as if it is not open-source. He also responds with vitriol to people who do choose to fork his software, as evidenced here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=905674
I wouldn't have a problem with the program's current behavior if it simply made you type 'i understand' instead of 'will cite', and made clear that it was a non-binding request. As is, the program attempts to sound like a license agreement while Ole insists to maintainers it is not.
E) I understand and use GNU Parallel and also completely disagree with the author's insistence that citing tools is appropriate.
Even in your second link, almost everything listed are papers about Parallel itself. If I was writing about Parallel, I'd be fine with citing it. If instead it's the means to another end, I wouldn't.
As others point out, it's further annoying because it doesn't even make any sense to begin with. If it was asking for donations or something I could maybe even get behind it, but the current message is pretentious and useless. It serves no real purpose.
The Golden Rule.
You would be pissed if you spent years on something, felt it was a contribution, saw the community use it, asked them to cite it, and weren't cited.