GNU Parallel Cheat Sheet [pdf](gnu.org)
From the man page:
"--citation Print the BibTeX entry for GNU parallel and silence citation notice. If it is impossible for you to run --bibtex you can use --will-cite. If you use --will-cite in scripts to be run by others you are making it harder for others to see the citation notice. The development of GNU parallel is indirectly financed through citations, so if your users do not know they should cite then you are making it harder to finance development. However, if you pay 10000 EUR, you should feel free to use --will-cite in scripts."
Asking for donations/citations is one thing, but putting this junk in the man page about 10000 EUR and nagging users is quite an annoyance. How GNU allows such junk in their man pages puzzles me. Obviously the GPL allows one to remove the nagware and redistribute, but I don't know if anyone has forked it.
It's a great tool I'm sure, but I've been able to get by using just xargs, flock, etc., for most use cases.
This isn't nearly the case, so until then, blaming FOSS authors for a bit of experimentation is just unwarranted.
Citing it or not is an issue of academic practice/considerations (whether its use was a significant part of the research etc.). Mandating it through nag messages is too much.
What's next? make will print ads while the compilation runs? GIMP will watermark my images if I don't pay 10K or promise to cite it if I make figures with it for my paper?
So again, my main confusion is about how this can be an official GNU tool.
They spent a lot of time and effort, and made a cool thing and gave it away for free. If it bothers you so much, just add the flag. Or patch it out.
"GNU Parallel is indirectly funded through citations. It is therefore important for the long term survival of GNU Parallel that it is cited. The citation notice makes users aware of this."
It's a bit like saying:
"Webkit is indirectly funded by iPhones. It is therefore important for the long term survival of Webkit that people purchase iPhones. The iPhone notice make users aware of this."
Imagine if every utility, library, or driver in a typical Linux distribution took this approach. :(
I encourage Debian et al. to adopt a "no nagware" policy.
> Programs whose authors encourage the user to make donations are fine for the main distribution, provided that the authors do not claim that not donating is immoral, unethical, illegal or something similar; in such a case they must go in non-free.
BTW, the nagware code has been removed in Debian unstable:
To me the dialog box is actually worse, because the program often blocks until you close the dialog box (not 100% sure if that is the case with Firefox).
With GNU Parallel you run 'parallel --citation' once, and you are done. We are talking about an effort of 15 seconds or less.
When I install a library, the install command alone often takes longer than 15 seconds.
Finally, I would like to understand why you do not just use another utility? Would that not solve your issue?
- `--joblog` writes out a detailed logfile of the jobs, which can be used to resume interrupted runs with `--resume` or `--resume-failed`
- `--slf filename` can be used to provide a list of ssh logins to remote worker nodes to run jobs. Importantly, parallel will automatically reread this list when it changes. This lets you very easily distribute batch jobs across preemptible gcloud vms (or ec2 spot instances) and gracefully handle worker nodes appearing/disappearing with just a few lines of bash https://gist.github.com/gpittarelli/5e14fb772ce0230a3c40ffad...
- When used with bash, parallel can run bash functions if you export them with `export -f functionName`.
For ad-hoc system modifications I've found myself using tmux's synchronize-panes feature, or xargs. For anything bigger or more involved, I break out Ansible/Chef/Puppet depending on which client project I'm working on.
I remember one place I worked at had a huge, elaborate configuration/deployment system, hand-written by the head IT guy, which used Parallel+bash+perl extensively. Thing is, while it was a great system, I could make the same changes in Ansible or Puppet with a couple of lines and push them within minutes, while making changes using the hand-written system might take hours. Plus, no logging and poor error handling led to all sorts of problems with that system, despite it being a real labour of love by that wacky Finnish dude.
However this sheet is really nice because it is just one side of a letter/A4 piece of paper and lays out the information clearly. I definitely want to mess around with Parallel now because of this cheat sheet. I wonder how it was typeset or laid out on the page? I try to write my own cheat sheets but they always seem way too sparse with too much white space. Maybe it is written in LaTeX or similar.
I also use it as a rudimentary queue system for stacking up the next jobs (while scripts stack up the next jobs, but..).
It had a bit of a learning curve because the docs are really technical and not geared enough towards new users, but reading and re-reading them and trying some examples helped it stick.
Here are a few ways I use it:
echo "Number of RAR archives: $(ls *.rar | wc -l)"
ls *.rar | parallel -j0 1_1_rarFilesExtraction
ls -d stocks_all/Intraday/*.txt | parallel -j${ccj}% 1_2_stockFileProcessing {}
I'd like to scale this to work across multiple machines (as Parallel can do), but I'm really tempted to write my own parallel processor just so I can rely on my own code.
I wrote a very different style of command parallelizer that I named lateral. It doesn't require constructing elaborate command lines that define all of your work at once. You start a server, and separate invocations of 'lateral run' add your commands to a queue to run on the server, including their file descriptors. It makes parallelizing commands with complex arguments easier.
Take a look if this sort of thing interests you, as I haven't seen anyone write one like this before. Its primary difference is the ease with which each separate command can output to its own log, and the lack of need to play games with shell quoting and positional arguments.
Check it out: https://github.com/akramer/lateral
Can you make a comparison between lateral and sem?
Can a single lateral server queue be used across multiple host machines? And in the other direction, can lateral launch and monitor processes that reside across multiple machines?
https://github.com/Miserlou/Loop
The author of GNU Parallel wrote a pretty detailed comparison, which you can find in the linked README.
Additionally, you can run a series of unrelated commands that aren't from a list/piped in with parallel using the `--` syntax:
`parallel -j 3 -- ls df "echo hi"`
You can limit system load using parallel, which as far as I know isn't possible with xargs: `parallel --load L`, where L is the average system load you want to remain beneath.
- Splitting input lines into multiple fields and building more complex commands from them
- Running jobs on remote nodes
- Pausing/resuming batch jobs (--joblog)
- ETA and progress bars
- Passing data to programs on stdin and generally many, many other ways of distributing and collecting data that xargs can't do
You can see a bunch of examples at: https://www.gnu.org/software/parallel/man.html
$ PAGER=cat man xargs | wc -l
259
$ PAGER=cat man parallel | wc -l
3985
PAGER="wc -l" man xargs
(although my man page for xargs is just 211 lines)
I normally use xargs for simple things, and if it’s a regular business operation I’d set up a task queue, but there’s a fair amount of work in the middle where it’s nice to have a solid tool with most of the features you could want built in and tested.
You might want to look at:
https://unix.stackexchange.com/questions/104778/gnu-parallel...
I'll say that field separation / null termination is a bit annoying with xargs/find etc., but more so perhaps for novice shell users. I do like shell pipelines, but quoting can be gnarly.
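To make the null-termination point concrete, here is a sketch using only `find` and `xargs` (the GNU and BSD versions both support `-print0`/`-0`). The directory and file names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: why null termination matters for find | xargs.
# The temp directory and file names are illustrative only.
set -eu
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# Whitespace-delimited: xargs splits "with space.txt" into two arguments.
naive=$(find "$dir" -name '*.txt' | xargs -n1 basename | sort)

# NUL-delimited: each path arrives as exactly one argument.
safe=$(find "$dir" -name '*.txt' -print0 | xargs -0 -n1 basename | sort)

echo "naive: $(printf '%s\n' "$naive" | wc -l) args"
echo "safe:  $(printf '%s\n' "$safe" | wc -l) args"
```

The naive pipeline yields three names (`plain.txt`, `space.txt`, `with`) for two files. The `-print0`/`-0` pair yields the two real names.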
I see loads of commercials for buying iPhones. I do not see a lot of commercials for citing GNU Parallel.
If I have to make a citation, it will cost me no money, but one line of text if I write an article. If I have to buy an iPhone, it will cost me many hours of work.
To me the two things are not even close to being similar.
But I can find one similar aspect: No one forces you to use an iPhone.
1. It included a click-wrap agreement in violation of the Debian Free Software Guidelines.
2. Fishing for inappropriate citations should not be encouraged, as it compromises the integrity of scholarship.
At least with web browsers they are user-facing and you only have a few of them to deal with.
I never choose parallel intentionally, but I still encounter the nagware messages in the output of scripts that other people wrote. And disabling the nagware message on my laptop doesn't disable it in a container, in the cloud, etc. It's very annoying.
Wasting 15 seconds of human time certainly isn't scalable over dozens or hundreds of utilities. And applying the Steve Jobs computation[1], 15 seconds * 1 million users = nearly six months of wasted human lifetime.
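Checking the arithmetic, assuming an average month of about 30.4 days:

```latex
15\ \text{s} \times 10^{6} = 1.5\times10^{7}\ \text{s}, \qquad
\frac{1.5\times10^{7}\ \text{s}}{86\,400\ \text{s/day}} \approx 173.6\ \text{days}
\approx \frac{173.6}{30.4}\ \text{months} \approx 5.7\ \text{months}
```

So "nearly six months" holds.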
Fortunately Debian-unstable seems to have fixed the issue by removing the nag message (which violates the DFSG). With luck this will propagate into mainline and into all of the downstream distributions like Ubuntu.
[1] https://www.folklore.org/StoryView.py?story=Saving_Lives.txt
Here’s my thought process:
- the GNU Parallel author(s) want/wants people to use and contribute to it.
- they think that most users are academics who write papers and that potential users will find the project after reading the citation, which may or may not be true
- they include a nagware message that “reminds” users to cite the software
- despite the message being controversial and being the subject of the #1 comment in an otherwise unrelated HN thread about the software in general, an FAQ is written to back up the existence of this message
This brings me to the question of whether the inclusion of this message acts more as a deterrent to potential contributors and users. I agree with the motivation, but the means feels petty and undercuts the original goal.
"In other words: It is preferable having fewer users, who all know they should cite, over having many users, who do not know they should cite.
If the goal had been to get more users, then the license would have been public domain.
...
The citation notice is about (indirect) funding - nothing else."
Does that fit with your assumption that "the GNU Parallel author(s) want/wants people to use and contribute to it"?
Googling for "bsd parallel command" doesn't seem to show anything relevant.
My configure.ac recipe for the proper parallel is this, setting logs_all to the GNU parallel variant of the script.
dnl GNU parallel; skip the old non-perl version from moreutils for now
AC_CHECK_PROGS([PARALLEL], [parallel])
logs_all=logs-all-serial.sh.in
if test -n "$PARALLEL"; then
  AC_MSG_CHECKING([PARALLEL version])
  dnl first line is "GNU parallel 20xxxxxx": the version starts at column 14
  parallel_version=`$PARALLEL --version 2>&1 | head -n1 | cut -c14-`
  dnl m4 strips one level of square brackets, so the digit class is double-quoted
  case "$parallel_version" in
    [[0-9]]*)
      logs_all=logs-all-parallel.sh.in
      ;;
    *)
      PARALLEL=
      parallel_version="skip old moreutils version, need GNU parallel"
      ;;
  esac
  AC_MSG_RESULT([$parallel_version])
fi
AM_CONDITIONAL([HAVE_PARALLEL], [test -n "$PARALLEL"])
moreutils parallel was written in 2008 and added to moreutils in 2009, just when Ole asked them. https://git.joeyh.name/index.cgi/moreutils.git/commit/?id=0f...
https://www.gnu.org/software/parallel/history.html
Hence I called it a "fork". Independent, yes, but when you know about the project, steal its name, and put it into wide distribution under that same name because you think your name has a better chance of being adopted, that is called a fork. Like a pitchfork, poking into the original author's eyes with a sharp instrument.
I’m confused as to how “[not including citations] would not have been sustainable in the long term” unless either citations become money at some point or the author is motivated sufficiently by citations to the extent that they would otherwise not work on the project.
If you are an author or are involved in the project, please know that this isn’t intended to be an attack, I’m just interested as to why a project would do something that seems counterintuitive (at least from my point of view).
https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...