GNU Parallel, where have you been all my life? (alexplescan.com)
I was expecting something as simple as 'parallel -j10 curl https://whatever', but I couldn't find the right syntax in less time than it took me to write a dirty shell script that did the same.
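For the repeat-the-same-command case, here is a rough sketch with plain xargs instead of parallel. `repeat_parallel` is a made-up helper name, and the URL in the usage comment is just the placeholder from the comment above. `-I` is used only so that each input line triggers one run without being appended to the command.

```shell
#!/usr/bin/env bash
# repeat_parallel N JOBS CMD...: run CMD N times, at most JOBS at once.
# (Hypothetical helper, not part of xargs or GNU Parallel.)
repeat_parallel() {
  local n=$1 jobs=$2
  shift 2
  # seq only supplies N input lines. Because CMD contains no '{}',
  # xargs runs it unchanged once per line.
  seq "$n" | xargs -P "$jobs" -I '{}' "$@"
}

# Usage with the placeholder URL:
# repeat_parallel 10 10 curl -s -o /dev/null https://whatever.com
```

As far as I can tell from the man page, GNU Parallel's `-N0` option covers the same "consume an input but insert nothing" case, so something like `seq 10 | parallel -j10 -N0 curl ...` should be the one-liner the parent wanted.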
wrk -t2 -c100 -d30s -R2000 http://127.0.0.1:8080/index.html
> This runs a benchmark for 30 seconds, using 2 threads, keeping 100 HTTP connections open, and a constant throughput of 2000 requests per second (total, across all connections combined).
Some distros include `ab`[2] which is also good, but wrk2 improves on it (and on wrk version 1) in multiple ways, so that's what I use myself.
parallel -j 10 curl 2> /dev/null \
::: $(for i in {1..10};do echo 'https://whatever.com';done)
/usr/bin/parallel
But because of the mini learning curve on each use, and because I find I need a little more boilerplate to use parallel, I use xargs -P more often, only using parallel when I need its special features (e.g. multiple hosts or collating the output streams).
Oh also, parallel itself can be a bit of a resource hog. (Obviously that depends a lot on how you're using it-- but I mean in cases where xargs' usage is unnoticeable I sometimes have to change the size of my jobs to get parallel out of the way).
Thanks for the link to the book: https://zenodo.org/record/1146014
Here are a bunch of examples: https://www.gnu.org/software/parallel/parallel_examples.html
A fun one I end up using ~monthly or so for various things (usually with more switches added as needed):
GNU Parallel as queue system/batch manager
# start queue
true >jobqueue; tail -n+0 -f jobqueue | parallel
# add job
echo my_command my_arg >> jobqueue
# to start queue for remote execution
true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
Before I learned of parallel, I tried a hack where I'd manually assemble jobs into batches, and wait on each batch before starting the next. It achieved very low system utilization, because inevitably one job in each batch takes much longer than the rest. A slight improvement (still not good) is to use `split` to chop your jobs file into $num_cores chunks, and background each chunk. But this still gets low utilization, the problem being that you aren't using a thread/worker pool.
Parallel (or, TIL, xargs) can maintain 100% system utilization, until the very last $num_cores jobs.
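A minimal sketch of the worker-pool behaviour described above, using `xargs -P` so it runs with stock tools. The durations, the pool size of 2, and the `run_pool` name are all made up for illustration, and fractional `sleep` assumes GNU or BSD `sleep`.

```shell
#!/usr/bin/env bash
# Hypothetical job list: each entry is a sleep duration in seconds.
# One long job is mixed in with several short ones.
durations=(2 0.2 0.2 0.2 0.2 0.2)

run_pool() {
  # -P 2 keeps two workers busy: when a job exits, the next one starts
  # immediately, so the short jobs drain through one slot while the
  # long job occupies the other.
  printf '%s\n' "${durations[@]}" |
    xargs -P 2 -n 1 sh -c 'sleep "$1"; echo "done $1"' _
}

run_pool
```

With fixed batches of two, the second slot would idle until the 2-second job finished. Here the whole run takes about as long as the longest job.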
But it can be done in pure BASH: https://gist.github.com/mped-oticon/b11dafa937e694ce4fa6fbf2...
GNU parallel supports expansion, which bash_parallel doesn't. However bash_parallel works with bash functions, which GNU parallel doesn't.
Nix + GDAL + GNUParallel + autoscaling groups === massive geospatial data processing pipeline
#!/bin/bash
cat - | parallel --line-buffer --pipe --roundrobin jq "$@"
E.g. in Python this would all be very easy to do. Just start a bunch of threads and e.g. invoke subprocess.run() from them.
I am trying to use Python by default when writing scripts nowadays, but sometimes the best tool for the job isn't Python or writing your own Python.
From this perspective, the languages of the glue, the libraries, and the external code all matter less than the ease of writing the glue; interfacing with the external code; and maintaining the libraries. The best language for this probably comes down to a combination of what you're comfortable writing (and reading, and maintaining) and what kinds of tasks you're trying to solve.
For me personally, using Python glue and libraries strikes a pretty good balance here. Writing a script "in Python" doesn't mean you need to reinvent the wheel. If you think `parallel` provides a better interface for map-reduce parallelism than `subprocess` (or than a library function you've written on top of `subprocess`), no problem: you can just call `parallel` from Python (and you'll probably find yourself writing a library function on top of it to abstract away the fact that it's a shell script).
But if you're much more effective working in Bash than Python, then writing your glue and developing your libraries in Bash could be the way to go.
Then you _have_ to resort to 'wait <pid>', with the 20 lines of bash code needed to manage all those PIDs. I have a large bash snippet in my editor just for that.
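For what it's worth, newer bash can shrink that PID bookkeeping quite a bit. This is only a sketch, assuming bash 4.3 or later (for `wait -n`). `work`, `run_all`, and the pool size of 3 are stand-ins I made up.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.3, which added `wait -n`.
max_jobs=3   # hypothetical pool size

work() {     # stand-in for a real command
  sleep 0.1
  echo "job $1 finished"
}

run_all() {
  local i
  for i in 1 2 3 4 5 6 7; do
    # If the pool is full, block until any one background job exits.
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
      wait -n
    done
    work "$i" &
  done
  wait   # drain whatever is still running
}

run_all
```

This does not collect per-job exit codes. If you need those, you are back to tracking PIDs and calling `wait <pid>` on each.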
It might be a culture thing. In .NET code I see people running things in parallel a lot within code but maybe this is less so for linux tools.
Maybe functional programming style could lend itself to a parallel-first programming style, with heuristics to decide when it isn’t worth it.
Imagine a world where there were only GPUs for example - then everyone by default would be running parallel-first code, and in that imaginary world you would need to do nothing to run a series of bash commands piping into each other in parallel.
Done that many, many times and honestly combining python with parallel is in many cases the best way to go. Write your python script to be as fast as possible on one core and then use parallel to run it on all your cores. This has the added advantage that you can go from running on all the cores on your machine to running on all the cores on a 100 machine cluster by just changing a couple of lines of code.
In a proper programming language, we'd have something like
parallel [1..5], i => { sleep random()*10+5; possibly_flaky i }
// [{"Seq": 4, "Host": ":", "Starttime": 1692491267...
And `parallel` would only have to worry about parallelization.
Instead, the shell environment forces programs to invent their own parameter separator (:::), a templating format ({1}), and a way to output a list of structures (CSV-like). You can see the same issues in `find`, where the exec separator is `\;`, the template is `{}`, and the output is delimited by \n or \0. And `xargs` does it in yet another different way.
It's very hard to acquire and retain mastery over a toolbox where every tool reinvents the basics. If you ever found yourself searching "find exec syntax" multiple times in a week, it's not your fault.
As for alternatives, I'm a fan of YSH[1] (Javascript-like), Nushell[2] (reinvented from first-principles for simplicity and safety) and Fish[3] (bash-like but without the footguns). Nushell is probably my favorite from the bunch, here's a parallel example:
ls | where type == dir | par-each { |it|
{ name: $it.name, len: (ls $it.name | length) }
}
[1] https://www.oilshell.org/release/latest/doc/ysh-tour.html
It isn't even just the newer shells that have solved this, zsh also has a solution out of the box¹. The extensive globbing support in zsh can largely replace `find`, and things like zargs allow you to reuse your common knowledge throughout the shell.
For example, performing your first example with zargs would use regular option separators(`--`), regular expansion(`{1..5}`), and standard shell constructs for the commands to execute.
I'll contrive up an example based around your file counter, but slightly different to show some other functionality.
f() { fs=($1/*(.)); jo $1=$#fs }
zargs -P 32 -n1 -- **/*(/) -- f
That should recursively list directories, counting only the files within each, and output² jsonl that can be further mangled within the shell². You could just as easily populate an associative array for further work, or $whatever. Unlike bash, zsh has reasonable behaviour around quoting and whitespace too.
Edit to add: I'm not suggesting zargs is a replacement for parallel, but if you're only using a small subset of its functionality then it may be able to replace that.
¹ https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht...
Spending extra time doing simple things — because you need to Google e.g. "how to pass multiple space-separated arguments from a string to a command" — is also a waste of time.
Honest question, as I’m struggling to leave the shell environment once the program gets too large. I could use Perl, but $? and the likes get quickly out of hand. Python’s support for pipes was difficult last time I used it, but that may have changed. What would you recommend?
GNU xargs implements limited parallelization, and is compiled C. This functionality is present within busybox, including the Windows version.
https://www.linuxjournal.com/content/parallel-shells-xargs-u...
GNU Parallel will have much greater functionality, but it will not reach as far as xargs.
Time to rewrite it in Rust /s
:p
If you feel like the answer is rewriting the shell, the answer is practically never rewriting the shell. It's learning to use it.
parallel 'sleep {= $_=rand()*10+5; =} ; possibly_flaky {}' ::: {1..5}
The {= =} escapes to perl, so you have a full programming language available.
Contrast that with GPU shaders where one C-style loop operates on buffers separate from system memory, and can't access system services like network sockets or files. GPUs have around 32 or 64 physical cores, so theoretically that many shaders could run simultaneously, although we rarely see that in practice. And we'd need bare-metal drivers to access the GPU cores directly, does anyone know of any?
The closest thing now is Apple's M1 line, but it has specialized NN and GPU cores, so missed out on the potential of true symmetric multiprocessing.
The reason I care about this so much is that with this amount of computing power, kids could run genetic algorithms and other "embarrassingly parallel" code that solves problems about as well as NNs in many cases. Instead we're going to end up with yet another billion dollar bubble that locks us into whatever AI status quo that the tech industry manages to come up with. And everyone seems to love it. It reminds me of the scene in Star Wars III when Padme notes how liberty dies with thunderous applause.
Many thanks to Ole Tange for developing the wonderful tool and helping the users on Stack Overflow sites to this day.
Shameless plug, I am developing a tutorial on GNU Parallel to be presented at eScience conference in Cyprus this year: https://www.escience-conference.org/2023/tutorials/gnu_paral...
mkdir /tmp/g
seq 1 10 | tr \\n \\0 |
xargs -0n2 -P4 bash -c 't=$EPOCHREALTIME; sleep $((RANDOM%5)); echo "$@" >/tmp/g/$t' d0
cat /tmp/g/*
Another one is
xargs -P "$(nproc)" --process-slot-var=s sh -c 'grep X "$@" >>/tmp/g.$s' d0
cat /tmp/g.*
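The trailing `d0` in both commands is there because `sh -c` assigns the first word after the script to `$0`, not to `$@`. A small sketch of the difference (the function names are made up):

```shell
#!/usr/bin/env bash
# With a placeholder: d0 fills $0, so a and b both land in "$@".
show_args() {
  sh -c 'echo "0=$0 args=$*"' d0 a b
}

# Without a placeholder: the first real argument is swallowed as $0.
show_args_missing() {
  sh -c 'echo "0=$0 args=$*"' a b
}

show_args           # prints: 0=d0 args=a b
show_args_missing   # prints: 0=a args=b
```

With xargs this matters because the items it appends come after the placeholder, so none of them gets silently dropped from `"$@"`.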
You can also cobble together that second style with a custom config setup wherein a command is given $s and responds with some host names and there might be an `ssh` in front of the `grep`, for example. That `d0` argument (for $0) is a bit janky and there can be shell quoting issues, of course. But then again, you may not have hostile filenames/whatever. Remote loadavg adaptation might be nice, but then again, maybe you control all the remotes. Similarly, I could not get back-to-back executions of the EPOCHREALTIME thing closer than 250 microseconds. So, collision basically will not happen even though it probably could in theory.
It’s like xargs with sane defaults and a couple tricks of its own.
It looks like the actual name of the task-spooler command on Debian after install is “tsp”, not “ts”. So no collision :)
Now it just remains to be seen if the package by default allows the tasks to continue to run after I log out, or if systemd will annoyingly kill the tasks after I disconnect from ssh the same way systemd annoyingly kills my “screen” sessions when I disconnect ssh, and there is some cumbersome thing you have to do on each of your systems to have systemd not kill “screen” :(
And still answering every xargs Stackoverflow question with "you should use GNU Parallel" instead of answering the question? That really gets old quickly when googling for xargs answers.
These are just some of the reasons I'll never use parallel. xargs is perfectly fine for most usecases, and it can do everything I need it to.
find ./ -type f -iname '*.jpg' -size +1M -print0 | parallel -0 mogrify -format webp -quality 80 {}
xargs -n 1 -P 8
#!/usr/bin/env bash
set -e
main() {
  if [ "$1" = "handle-file" ]; then
    shift
    handle-file "$@"
  else
    # Pass each NUL-delimited path as a real argument instead of splicing it
    # into a `bash -c` string, so quotes in file names cannot break the command.
    find . \
      -type f \
      -not -path '*/optimized/*' \
      -print0 \
      | xargs \
        -0 \
        -n 1 \
        -P 8 \
        "$0" handle-file
  fi
}
handle-file() {
  echo "handle-file $1 ..."
}
main "$@"
* Grouped output (prevents one process from writing output in the middle of another's output)
* In-order output (task a output first, task b output second even though they ran in parallel)
* Better handling of special characters
* Remote execution
More here: https://www.gnu.org/software/parallel/parallel_alternatives....
I was a little put off by the annoying/scary citation issue mentioned by another commenter, so I am not sure I will use parallel.
I want to pipe the output of parallel processes into a utility that I wrote for progress printing (https://github.com/titzer/progress), but I think that neither of these solutions work; my progress utility will have to do this on its own.
Edited to add: finally got signed in to work, you create the script via:
parallel --embed > scriptname.sh
It's about 14,000 lines of awesome and works on "ash, bash, dash, ksh, sh, and zsh"
In this case, we don't have control over the docker images used to build our apps.
See also https://hn.algolia.com/?q=gnu+parallel for other related discussions.
For example, translating a large list of IPv4 ranges into a standard format for a firewall rule-set parser:
cat ~/blacklist.p2p | parallel --ungroup --eta --jobs 20 "ipcalc {} | sed '2!d' " | grep -Ev '^(0.|255.|127.)' >> ~/blacklist_p2p_converted
Makes an annoyingly slow task tolerable, as parallel doesn't block while fetching to preserve order. We probably should rewrite this to be more efficient, but this task is run infrequently.
Happy computing =)
While there are cases when it makes sense to stick to what is specified by POSIX, there are also cases when the POSIX specification is so obsolete that using POSIX instead of some free ubiquitous programs is a big mistake.
Among these latter cases are writing scripts for a POSIX shell instead of writing them for bash and using xargs instead of parallel.
Second paragraph: I want to test my test-tester.
OP 100% fell down a rabbit-hole.
"they execute extensive scenarios against a live service over HTTP"
Any time I've seen people think they've needed to test live services, over HTTP... it means that there are far deeper issues.
Curious if anyone else has experiences with it, honestly been surprised at how little I've heard about it
2) Shared resources (in-memory mutable data, hardware devices) mean the ratio of contention to CPU work goes up when you have more cores.
3) Cores on a single die need to share the same constraints - thermal limits and transistor count. So you're best off having enough powerful cores to get you to a sweet spot of single-core performance vs multi-core parallelism.
4) It's hard to provide a performant and useful many-core machine model. Cache coherence makes it easier to program a many-core machine, but limits performance. Without it, you're stuck with distributed systems-style problems.
An AMD GPU is a grid of independent compute units on a memory hierarchy. At the fine grain, it's a scalar integer unit (branches, arithmetic) and a predicated vector unit, with an instruction pointer. Ballpark of 80 of those can be on a given compute unit at the same time, executed in some order and partially simultaneously by the scheduler. GPU has order of 100 compute units, so that's ~8k completely independent programs running at the same time.
You've got a variety of programming languages available to work with that. There's a shared address space with other GPUs and the system processors, direct access to system and GPU local memory. Also some other memory you can use for fast coordination between small numbers of programs.
There's a bit of a disconnect between graphics shaders, the ROCm compute stack and what you can build on the hardware if so inclined. The future you want is here today, it just has a different name to what you expected.
If there's no straightforward way to do that, then I'm afraid that hardware represents a huge investment in the wrong direction.
Because a GPU can be built from the general-purpose multicore CPU I'm talking about. But a CPU can't be built from a GPU.
What I'm getting at is that if I have to "drop down" to an orthodox way of solving problems, rather than being able to solve them in the freeform way that my instincts lead me, then I will always be stifled.
Intel ca. 2010, probably
The real problem we are facing is that our programming models aren't parallel by default.
>By Moore's Law, we could have had MIPS machines with 1000 cores around 2010, and 100,000 to 1 million cores today, for under $1000.
You can have 10000 RISC-V cores on an FPGA but nobody cares. Why? Because even a bit serial processor (that means it processes one bit per clock cycle, or 32 clock cycles for a 32 bit addition) runs into memory bandwidth limitations very quickly if you have enough of them. Main memory is very slow compared to registers and caches. The only way to utilize this many cores is by having a workload that is entirely latency bound. Your memory access pattern is perfectly unpredictable. The moment you add caching, the number of cores you can have shrinks dramatically and companies like AMD are not slimming down their CPUs, they are adding more and more cache. Their highest end processors have almost a gigabyte of cache.
I agree about the programming models not being parallel by default, and that's one of the things that I specifically rail against in most of my comments. MATLAB/Octave is a good introduction to what parallel programming could be. Also the endless doubling down on large caches, because the multicore design I have in mind would mostly eliminate cache and use that die area for cores and local memories.
I think we're slightly talking past each other here though. The CPU I want to build would have around 10-256 cores on 90s tech. So the same transistors holding 1 Pentium Pro would allow for 1-2 orders of magnitude more MIPS or RISC-V cores and local memories. The design is so simple that I think that's why it was missed by the big fabs.
Today there's little demand for 1000+ cores, but that's partly because nobody can see what they could do. But we can't design the thing, because the status quo has us all working pedal to the metal in first gear to make rent. It's a chicken and egg problem that has a lower likelihood of being solved as time goes on. Which is why I think we're on the wrong timeline, because if the system worked then actual innovation would become more accessible over time.
At least, 6000+ 32-bit multiplies per clock tick on ~2GHz+ clocks. Even cheap GPUs easily have 2000+ shaders.
> GPUs have around 32 or 64 physical cores
NVidia SMs and AMD WGPs are not "cores", they are... weird things. They have many shaders inside of them and have huge amounts of parallelism.
As far as grunt-work goes, a "multiplier unit" (literally A x B) is perhaps the most accurate count to compare CPU cores vs GPU "cores", because the concept of CPU-core vs GPU WGP / SM is too weird and different to directly compare.
Split up that WGP / SM into individual multipliers... and also split up the ~3 64-bit multipliers or ~48 CPU SIMD multipliers per core (3x 512-bit on Intel AVX512 cores), and it's perhaps a fairer comparison point.
---------
Back 20 years ago, you'd only have 1x multiplier on a CPU core like a Pentium 4, maybe as many as 4x with the 128-bit SSE instructions.
But today, even 1x core from Intel (3x 512-bit SIMD) or 1x core from AMD (4x 256-bit SIMD) has many, many, many more parallel elements compared to a 2004-era CPU core.
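The per-core figures quoted here follow from simple division, assuming 32-bit lanes (my assumption, matching the 32-bit multiplies mentioned earlier): number of vector units times vector width, divided by the lane width. `simd_lanes` is a made-up helper.

```shell
#!/usr/bin/env bash
lane_bits=32   # assumed lane width

# simd_lanes UNITS WIDTH_BITS: parallel 32-bit lanes per core.
simd_lanes() {
  echo $(( $1 * $2 / lane_bits ))
}

simd_lanes 3 512   # Intel, 3x 512-bit: prints 48
simd_lanes 4 256   # AMD, 4x 256-bit: prints 32
simd_lanes 1 128   # one 128-bit SSE unit: prints 4
```

That reproduces the ~48 figure for an AVX512 core and the "as many as 4x" figure for 128-bit SSE.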
They aren't weird things. They are the equivalent of CPU cores. By your logic CPU cores aren't CPU cores, "they are... weird things" because of SMT.
https://en.wikipedia.org/wiki/Cray_X-MP
Price US$7.9 million in 1977 (equivalent to $38.2 million in 2022)
Weight 5.5 tons (Cray-1A)
Power 115 kW @ 208 V 400 Hz[1]
CPU 64-bit processor @ 80 MHz[1]
Memory 8.39 Megabytes (up to 1 048 576 words)[1]
Storage 303 Megabytes (DD19 Unit)[1]
FLOPS 160 MFLOPS
In 2070 it still won't be enough for you. It never will be enough.
IIRC the citation notice was cleared by Stallman as GPL compatible. I’d be surprised if anyone’s paid, I assumed that’s rhetoric to imply the value of a citation, or lack of citation, for anyone publishing scientific works.
> These are just some of the reasons I’ll never use parallel.
Hey I’ve actually ranted on HN before about the citation notice (e.g. https://news.ycombinator.com/item?id=15319715) - in part because I find the language of the notice a little misleading; it’s not tradition to write citations for tools used to conduct research, and it’s a requirement (not just tradition) to cite academic sources. If I used parallel to speed up some calculations, that doesn’t justify an academic citation. I don’t cite bash or python or C++ when I write papers either. On the other hand, if I’m writing a computer science paper about how to parallelize code, and especially if I compare it to GNU Parallel, then a citation isn’t optional, and I don’t need a guilt trip to add one, it’ll get requested in review, and rejected without. Is there even a journal publication to cite? (Edit: found it - the request is to cite an article in USENIX magazine.) So I find the notice a little irritating and I’m not sure who it’s aimed at exactly, or what the history of Ole feeling snubbed by scientists really is. Maybe some people were trying to compete with GNU Parallel and failing to cite it? Maybe Ole is paid by an organization that appreciates citations and will continue to fund development on Parallel if there’s evidence of its use in academia?
That said, GNU Parallel really is totally awesome, the documentation is amazing, and the citation notice is a one-time thing you can silence permanently. I don’t think the notice is a good reason to never use Parallel, and I do think Parallel is worth using, FWIW.
This is true, but it also makes it very hard for academics and PhD students who mainly write software over papers. They get no citations and eventually have to leave academia.
If we had a better practice of citing central software we use - at least the academic software that wants to be cited - we could have a more flourishing ecosystem of such software funded by the universities.
Academics seem to have a very blinkered attitude to this. I wrote some software that was popular for a while in a niche field, and people were forever asking me to waste my time by 'publishing' the manual in some pointless journal so that they could cite something and give me credit. Writing useful software counts for less in that world than publishing another pointless paper that no-one will read.
This doesn't scale. Imagine if all the software you used nagged you and had their own individual methods to silence them. I don't think this would be reasonable.
What makes this particular software so special?
Do you have a source for this? I’m confused by this, as the GPL section 7 is pretty clear that additional restrictions are effectively void. I suppose it’s technically not contrary to the GPL to idly state those restrictions, but it is contrary to the GPL to expect them to do anything. If the author is deliberately including an impotent clause in the hope that people will follow it anyways, I feel that trying to confuse or scare people into doing something the GPL gives them explicit permission to do is contrary to the spirit of the GPL.
Furthermore, trying to retaliate against people who (as permitted by the GPL) remove the citation notice, as the author here has done, seems very contrary to the spirit of the GPL.
I really hope that whoever adjudicates these disputes regarding licence agreements doesn't care what a random person says about it.
https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
Debian too (thanks to iib for pointing this out)
https://salsa.debian.org/med-team/parallel/-/tree/master/deb...
And looks like the author is aware of both:
https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
But yeah, if guy wants to have the name of the app mentioned there is a BSD license for that I thin...
- # *YOU* will be harming free software by removing the notice. You
- # accept to be added to a public hall of shame by removing the
- # line. That includes you, George and Andreas.
The Open source way of buzzing a contestant
[david@pc ~]$ echo foo | parallel echo
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
Tange, O. (2023, July 22). GNU Parallel 20230722 ('Приго́жин').
Zenodo. https://doi.org/10.5281/zenodo.8175685
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#citation-notice
To silence this citation notice: run 'parallel --citation' once.
foo
[david@pc ~]$
> Tange, O. (2023, July 22). GNU Parallel 20230722 ('Приго́жин').
Looks like the latest release is named after Prigozhin. Yeah, probably that one, although I couldn't find anything in the mailing list to confirm it.
https://en.wikipedia.org/wiki/Yevgeny_Prigozhin
edit: all releases are named after current political events:
You can add any message you want into your GPL program. Also, a GPL program does not have to be free.
This has nothing to do with the GPL. You can say in your program that 'by using this software you agree that you're a cat' and license it under the GPL.
That does not mean the GPL relates to cats in any way.
> All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.
Debian explains this further in their patch file.
If I put the GPL in my software and add a file next to it that says "Also you can't use this software if you make more than $100k/year", I've pretty clearly added an additional clause that's incompatible with the GPL.
It says "please cite" and "feel free to not cite if you pay".
It doesn't say "must cite" or "you may only not cite if you pay".
IANAL, but it doesn't seem like it would interact with the GPL at all. So the worst that could be said is that the implementation is annoying or in poor taste.
I agree that in this case it’s likely not enforceable/binding especially since the GPL specifically allows you to ignore those terms. Hopefully that’s legally binding in your jurisdiction vs the other party.
But it’s a straightforward clickwrap agreement, even if the terms are non-monetary the GPL simply doesn’t allow these at all. Can’t place any stipulations on how the user uses the software.
What gives?
The startups you mention actually changed their license. That's what GNU Parallel would have to do to make this extra condition ok, but he won't do it because being a GPL-licensed GNU tool is critical to its popularity in the first place.
Part of the issue is that Ole’s citation notice doesn’t appear at first glance to some people to be compatible with the GPL. You have to read the language carefully, and read the history of GNU Parallel’s citation notice, to understand that the notice is not a licensing term.
Another part of the issue is that the notice doesn’t sound like someone just trying to make a living. It sounds like a demand or even a veiled threat, and one that is inflicted on everyone, not just academics. It’s not exactly clear about what the legal requirements even are.
I’m in favor of Ole getting citations, and I’m in favor of his right to ask. But the way it’s being asked for rubs me the wrong way a little bit, and it’s rubbed other people the wrong way a little bit ever since it was introduced. BTW, the whole reason it seems like all hell breaks loose, and the only reason this matters is precisely because the software is widely used. If it wasn’t widely used and it didn’t sit under the GNU umbrella, you’d never hear about this.
[0] https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
https://GitHub.com/shenwei356/rush
As you mention xargs has parallel capabilities and gargs is Apache licensed software that fixes some of xargs shortcomings:
https://GitHub.com/brentp/gargs
No reason to use gnu parallel.
Where did you get the "or pay 10000€" part from? As far as I remember, the software, unless told otherwise, asks authors of scientific papers to cite GNU Parallel if they used it when writing their papers. And it doesn't force it; it's not part of the license, but it asks you to do so as it's academic tradition to use citations.
You could just ignore the citation and not break the license, no one would think less of you for doing so.
Most likely from the manpage:
If you use --will-cite in scripts to be run by others you are
making it harder for others to see the citation notice. The
development of GNU parallel is indirectly financed through
citations, so if your users do not know they should cite then you
are making it harder to finance development. However, if you pay
10000 EUR, you have done your part to finance future development
and should feel free to use --will-cite in scripts.
If you do not want to help financing future development by letting
other users see the citation notice or by paying, then please
consider using another tool instead of GNU parallel. You can find
some of the alternatives in man parallel_alternatives.
FWIW some distros remove the nagging message (e.g. mine - openSUSE - has it removed and the patch seems to come from Debian so I'd guess Debian and its derivatives also remove it).
"If you pay 10000 EUR you should feel free to use GNU Parallel without citing."
https://www.zdnet.com/article/experimental-intel-chip-could-...
Programmability is always the biggest issue, and that's not really a chicken-and-egg problem because decades of research have gone into writing compilers and languages for massively parallel machines -- it's just hard, some would say intractable (and local memories tend to make programmability issues worse.) There are niche or embarrassingly-parallel problems that will run great. But it's hard to sell hardware that will solve only some of your problems well. And GPUs have taken over for many of those very regular problems as well.
The full crossbar, allowing each shader to individually issue a fetch from memory. The shared memory space is not like cache but instead is a shader-to-shader communication scratchpad.
Atomics support, coalescing atomics together.
-------
I mean hell: what is a core? Do remember that on SMs, every single shader (not SM) has its own instruction pointer.
Is the shader a core? No, not really. But SMs aren't a core either.
I wouldn't compare GPU and CPU architecture at all. They're just different. What I did above, breaking both down into individual multipliers then counting them seems like the best way forward, especially as we remain multiplier bound in practice.
Administrators who gauge work quality by counting citations are not helping the world much. Maybe it's time we started citing administrators who help us in our work ... so that their administrators can get rid of them if they are not helping. But of course I'm dreaming in technicolour -- administrators are never really subject to review, it seems.
Which is, to be frank, ridiculous compared to the number of papers it enabled.
> all releases are named after current political events
Section 7 of the GPL specifically says that additional restrictive terms on GPL software (like “pay me $1000 or cite me”) can be ignored or removed. If the software’s author doesn’t want people to remove his additional terms he shouldn’t have used the GPL. Publicly shaming other open source contributions for doing something that the GPL explicitly and deliberately permits (removing additional restrictive terms) is extremely improper in my opinion.
I think the confusing issue here is that the notice is not a license requirement, it does not add additional licensing restrictions. It’s an honor-system agreement between the user and Ole, and does not involve the GPL. It does seem to be walking a very fine line, and it’s easy for users to not understand the distinction, but I believe the notice does adhere to the GPL’s rules, even if it doesn’t initially appear to for us non-lawyers.
The question was rhetorical, I know that this place is frequented by quite a lot of people who wish to be part of the next YC batch, so they see themselves in the shoes of the startup, rather than the solo dev.
Still, I don't think it should be this way.
Look what parent company of HN does
zsh gives you a config wizard, sudo admonishes you to use it responsibly, just about every iOS app and an increasing number of desktop apps gives you a few pages of “what’s new” every time they’re upgraded. Desktop apps have given tips-on-startup since the 90s.
It does scale solely for GNU Parallel though for now, and very few other people have taken the same tack as GNU Parallel’s citation notice. Despite the potential for a slippery slope, it doesn’t seem to be happening. I’d speculate that if it did start to happen, then GNU would change their stance on what’s allowed by the license, perhaps.
--will-cite
you need a more generalized --clickwrap-consent parameter really. One that just says "whatever it is, I accept and I'll do it".
And that's exactly the thing GPL was supposedly founded to get away from. Restrictions on user freedoms. Especially violations so routine and tedious that we open-palm-slam "accept" without reading them.
You could absolutely write this to not look like a clickwrap agreement and lean on users. "please cite me, I'm an academic and impact matters" in the manfile or --help is not something anyone would ever get upset about or probably even patch to remove.
The only reason it's OK is because basically everyone knows it's not enforceable because of the severability part of the GPL. But it's blatantly designed to look like a serious and enforceable notice to users who don't know that, and require affirmative action from the user to "consent" and bypass the screen. And clickwrap agreements of this type are generally enforceable if there is not something like the GPL that allows you to ignore it.
like I flatly do not get why this is even debatable or questionable, the dude is trying to pull a fast one on users with a scary-sounding legal notice that implies that you need to accept this clickwrap agreement. and it's not entirely clear that he cannot actually burden you with this in all jurisdictions, since it's an agreement between you and the author that exists outside the actual source code/distribution. You can end up paying for free stuff in lots of places in life, if you're not aware about what "should" be free, and those agreements stand and are enforceable even though the thing was supposed to be free. You agreed to it. You don't have to, the GPL says that, you can edit the software to remove it without consenting, but you did accept it.
Letting the camel's nose under the tent on clickwrap agreements on GPL'd software is such an incredibly bad idea legally and morally, and this dude has been an utter dick about anyone who questions that. Sure, "he's willing to do it and nobody else is stepping up" but on the other hand he's also going off and attacking other maintainers doing their jobs, too. But that's not Stallman's problem I guess. That's another problem that only works with N=1 jerk, if that was normalized we'd have a problem.
I do not get why this guy is getting this special blessing or dispensation from FSF. Like it's not just that he's a random weirdo releasing under GPL and then trying to add additional terms (lol get stuffed), this is all occurring with the FSF's blessing, Stallman's signoff, and in the GNU distribution. Official GNU clickwrap license I guess.
At the end of the day - if the guy can't be satisfied with a polite request in the manfile, wow that sucks. But the GPL isn't about you, it's about the end user. There are explicitly licenses like BSD that require acknowledgement if that's your thing!
the user isn't merely ignoring the output though, they are actively interacting with the program in a way that the program is presenting as accepting of the agreement being presented to the user.
the agreement is plainly presented in a way that implies that it's an obligation, like any other clickwrap agreement. and everyone except ole and stallman seems to agree that it's self-evidently apparent that it's a clickwrap agreement restricting the freedoms of the user.
"free software that only prints a message and exits unless you agree to a clickwrap with further licensing terms" is not a road that FSF should go down. And it's only because of the GPL severability clause that it's not a crisis, everyone knows it's a farce, except for a bunch of the users, who are affirmatively taking action to indicate consent with an additional licensing agreement.
it's not facially clear that in most jurisdictions that the clickwrap agreement is null and void merely because the software is free. you can end up paying for lots of free stuff in life if you're not careful. you agreed to the agreement, it's on you.
you are of course free to remove the prompt and use the software yourself, and ole rants and raves about that on his website. but, agreeing to the license is a separate thing from the GPL license, most likely. just like paying for credit monitoring is different from getting your free credit reports or freezes - they'll try and railroad you into paying, definitely! and just because it's supposed to be free, doesn't mean you're not getting charged if you agree to it!
Even so, knowingly naming a release after a war criminal is very off-putting.
The author is having a Musk-ish kind of fun, and that's their freedom. My freedom is to feel disgust at seeing mass murderers there, even if they are treated the same as uncontroversial topics.
The good news is that the new ‘tradition’ these days for academic software is to open-source all the software written for a paper or academic project, so practically everything done is visible on github & arxiv.
You're welcome?
Seriously though, adding the citation nag to software is two wrongs not making a right.
As a software user, it isn't my fault academia hasn't figured out how to reward software contribution. If they can't figure it out, finding a greener pasture makes a lot of sense.
If you're not writing papers, the citation notification isn't for you. Can't you just mute it and continue using the software without worries?
It _seems_ like a reasonable thing to ask, it's a minor inconvenience, really, so what's the big deal?
The big deal is that the behavior doesn't fit the unix philosophy. Tools are meant to do one thing, and do it well. They get composed in pipelines to get jobs done. In these pipelines, the communication medium is text, via stdin/stdout/stderr. If a tool is unpredictable in what it puts out via text, it can make the whole pipeline unpredictable, or at least more complicated.
If it _was_ okay, we should welcome everyone putting nag features in these simple cli tools, right? Well, I'd be on board with that as long as I can blanket disable all of them. If not, let's just leave our political/professional/begging messaging outside our computing tools. Okay?
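To make the pipeline concern concrete, here is a minimal sketch. `noisy_tool` is a hypothetical stand-in, not GNU Parallel itself, and the sketch assumes the notice is written to stderr. Under that assumption the data stream stays clean, but anything that merges stderr into stdout (logging wrappers, `2>&1`) picks up the extra line.

```shell
#!/bin/sh
# Hypothetical tool (an assumption for illustration): data on stdout, notice on stderr.
noisy_tool() {
  echo "please cite me" >&2   # the nag notice goes to stderr
  printf '%s\n' a b c         # the actual data goes to stdout
}

# Only stdout flows through the pipe, so downstream tools see three clean lines.
data_lines=$(noisy_tool 2>/dev/null | wc -l | tr -d ' ')

# If stderr is merged into stdout, the notice becomes part of the data stream.
merged_lines=$(noisy_tool 2>&1 | wc -l | tr -d ' ')

echo "stdout only: $data_lines, merged: $merged_lines"
```

The point is only that a consumer has to know which stream a tool uses for non-data output; a tool that mixes the two makes every downstream stage more fragile.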
Notionally the GPL allows you to disregard this but it may or may not be binding depending on your jurisdiction, and it’s certainly distasteful and against the absolute spirit and most likely the text of the GPL. This is an incompatible term being forced on the end user and the entire license might well be void.
To silence this citation notice: run 'parallel --citation' once.
this is like saying that a user doesn't actually agree to anything just because they clicked "accept" in a EULA. you're just clicking buttons in software, it doesn't obligate you to anything!!! but actually yes that is most likely fairly binding in a lot of jurisdictions.
that is, again, literally the definition of a clickwrap licensing agreement and you cannot do that in GPL software, even if it's non-monetary. Requiring the user to submit a selfie in a funny hat would not be permissible under the GPL either. You can't limit what the user does with the software and how, or else it's not GPL.
it's open and shut, clickwrap agreements completely subvert and nullify the moral stand the FSF is trying to make. And it doesn't matter how innocuous it seems, it undermines the whole point of the exercise.
fortunately the GPL includes a "severability" clause that basically allows you to ignore this and grants you a license regardless. but it is not a good look, it is not good behavior, and if every GPL'd package started adding random clickwrap agreements with big "IM A DOODOO HEAD IF I IGNORE THIS" parameters the whole ecosystem would degrade.
Arch and others are not only allowed but actually morally and practically in the right for stripping these messages, and it doesn't reflect well on Ole at all that he then goes on and throws more tantrums because he doesn't like the consequence of the license he chose.
If he wants to go proprietary, or BSD (which requires acknowledgement!), that's fine, but he's being a child and the terms he is adding are utterly noncompliant with the GPL, and it's unprofessional for FSF to even humor him on this. If there were a hundred Oles the FSF would have a real problem on its hands, it's only because he's N=1 jerk that this is remotely tolerable.
But I was responding to the comment upthread "You can add any message you want into your GPL program". If you add a message that says a user of the software must do something / must agree to additional terms / etc, that additional text is not compatible with the GPL. I'm not a lawyer, so I have no idea whether the result would be that the restriction doesn't count and the software is GPL'd, or that the software isn't viably GPL'd because the GPL+clause isn't a valid license for somebody to use.
For one thing, if the author provides the source with a GPL license to user A, and user A sends it to user B, user B has the software under a GPL'd license. The normal reason for dual licensing GPL/proprietary is so that user B can pay money to bundle the software in a non-GPL-compliant way. The author can stop licensing future releases under GPL, but they can't revoke the GPL on already-distributed software.
For another, this isn't what's happening here. GNU Parallel is released under the GPL, and the author is affixing what is debatably an additional term to the GPL'd release, under the claim that it doesn't count as an additional restriction because it's "academic tradition". By the same token, I can add a clause to my software saying that rich people can't use it, because it's "hippie tradition" to stick it to The Man.
The author of Notepad++ for example is famous for adding all kinds of statements associated with the software and in no way is that part of the license.
On the other hand, if your license.txt states, e.g., that the software must not be used for evil, as JSON's license famously did, then yes, it is part of the license.
> If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.
https://www.gnu.org/licenses/gpl-3.0.en.html
However, this doesn't really even come into play because the citation request is not a restriction on the license. It's not anything. As far as the GPL is concerned, it's just some code, and the GPL grants you the right to redistribute modified copies.
And by renaming it to "free-parallel" you have respected the author's trademark. You can absolutely do this, at the cost of the author being upset at you. They might get upset that "free-parallel" is too close to their "GNU Parallel" trademark but I (IANAL) don't think they'd be legally right about that. GNU Parallel coexists with other software called "parallel".
Which is a fair enough position to take, in my opinion.
But importantly you can use the software however you want that is compatible with GPLv3. That includes ignoring or removing the citation notice without paying a cent. However just because it's legal doesn't mean it won't come with the potential for social consequences.
== Is the citation notice compatible with GPLv3? ==
Yes. The wording has been cleared by Richard M. Stallman to be compatible with GPLv3. This is because the citation notice is not part of the license, but part of academic tradition.
Therefore the notice is not adding a term that would require citation as mentioned on:
https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation
The link only addresses the license and copyright law. It does not address academic tradition, and the citation notice only refers to academic tradition.
[...]
https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/cita...
and from the GPL faq itself (which said citation FAQ references):
Does the GPL allow me to add terms that would require citation or acknowledgment in research papers which use the GPL-covered software or its output? (#RequireCitation)
No, this is not permitted under the terms of the GPL. While we recognize that proper citation is an important part of academic publications, citation cannot be added as an additional requirement to the GPL. Requiring citation in research papers which made use of GPLed software goes beyond what would be an acceptable additional requirement under section 7(b) of GPLv3, and therefore would be considered an additional restriction under Section 7 of the GPL. And copyright law does not allow you to place such a requirement on the output of software, regardless of whether it is licensed under the terms of the GPL or some other license.
https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation
---
TLDR: The citation notice is a "cite it in academic works or pay me" agreement that is as legally binding as a pinky promise. You can break it without concern but some people may look negatively on that and it may come with social consequences.
On all such systems, it is very easy for the user to install any missing POSIX utility, but it is also easy to install any non-POSIX GNU utility.
So not even xargs is certain to exist by default on all systems.
Moreover, POSIX xargs is restricted to executing all processes sequentially.
Any use of xargs for parallel execution is non-POSIX, so in that case there is no reason to not use "parallel" instead.
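For comparison, here is a minimal sketch of the non-POSIX route mentioned above, using the `-P` extension found in GNU and BSD xargs. The squaring job and the worker count of 4 are assumptions chosen purely for illustration. Jobs finish in arbitrary order, so the output is sorted before use.

```shell
#!/bin/sh
# Run up to 4 jobs at a time with the non-POSIX -P extension (GNU/BSD xargs).
# -n 1 passes one input item per invocation; "_" fills $0 so the item lands in $1.
results=$(printf '%s\n' 1 2 3 4 5 6 7 8 |
  xargs -P 4 -n 1 sh -c 'echo "$(( $1 * $1 ))"' _ |
  sort -n)   # completion order is not guaranteed, so sort for stable output

echo "$results"
```

Unlike GNU Parallel, plain `xargs -P` does nothing to keep the output of concurrent jobs apart; short single-line writes like these are fine, but longer outputs can interleave.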
parallel --embed > parallel.sh
Then store that in your source repo and use it wherever shells are used!
$ parallel --embed > parallel.sh
Unknown option: embed
[edit] Ran it in Ubuntu 22.04, it does output a bash script ... which still depends on Perl.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
Tange, O. (2023, July 22). GNU Parallel 20230722 ('Приго́жин').
Zenodo. https://doi.org/10.5281/zenodo.8175685
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#citation-notice
To silence this citation notice: run 'parallel --citation' once.
[edit]This is really similar to the kerfuffle where the maintainer of Home Assistant asked distros to not repackage HA; it's a request, a bit at odds with community norms for libre software, but one that people are legally free to ignore.
Sharing this because it's the route I went: anything I'd have written in Bash I'd now do in Perl.
Nim is statically typed and (generally) native-compiled, but it has very low ceremony ergonomics and a powerful compile-time macro/template system as well as user-defined operators (e.g., you can use `+-` to make a constructor for uncertain values so that `9 +- 2` builds a typed object as in https://github.com/SciNim/Measuremancer ).
My use case is approx. like this: I can get 80% what I want with ls … | sed … | grep -v … but then it gets complicated in the script and I’d like to replace the sed or grep part with some program.
import posix; for line in popen("ls", "r").lines: echo line
in Nim, though you obviously need to replace `echo line` with other desired processing and learn how to do that.
You might also want to consider `rp` which is a program generator-compiler-runner along the lines of `awk` but with all the code just Nim snippets interpolated into a program template: https://github.com/c-blake/bu/blob/main/doc/rp.md . E.g.:
$ ls -l | rp -pimport\ stats -bvar\ r:RunningStat -wnf\>4 r.push\ 4.f -eecho\ r
RunningStat(
number of probes: 26
max: 31303.0
min: 23.0
sum: 84738.0
mean: 3259.153846153846
std deviation: 6393.116633069013
)
Python to me, is too far away from shell/unix. It is a programming language for writing applications. For the use case of writing shell scripts but in a more powerful language, perl is still the king here (or it should be. Sadly it doesn't appear to be the case. No one is using it except for die hard gray beards.)
Raku is a modern (still a big) language with kitchen sink. Again doesn't appear to be much uptake.
Even allows to add dependencies and if necessary compile the script on the fly.
Just the inclusion of argparse alone is worth it IMO.
> Python’s support for pipes was difficult
Well, the idea would be to replace a lot of your pipe usage.
Off the wall, but Scala has a concise syntax for process operations, but startup time is likely prohibitive.
https://docs.github.com/en/repositories/archiving-a-github-r...
If someone writes a piece of software specifically for the purposes of doing certain types of scientific research, and then other scientists use this software to help conduct published experiments, then IMO it really ought to be possible to give that person meaningful credit for their work. It's a perfectly legitimate way to contribute to a field, even if it does not take the form of a paper. But with the system as it stands, the only way to get meaningful credit is to publish a pointless paper saying, in effect, "Hey! I wrote some software!"
>if you put a bibtex snippet on your site that indicates how you’d like to be cited, that is super helpful.
I should probably have done that, but from my point of view it didn't really matter. I have a name, and the software had a website. I didn't really mind exactly how individual people chose to cite it. The absence of a ready-baked bibtex snippet would never be accepted as an excuse for failing to cite any other kind of source.
There should be a high-prestige “journal of READMEs and User Handbooks,” haha.
While it may not be in any ANSI/ISO spec for C, even Windows has popen these days. There are also some tiny Nim popenr/popenw wrappers in https://github.com/c-blake/cligen/blob/master/cligen/osUt.ni... covering the Windows case.
Depending upon how balanced work is on either side of the pipe, you usually can even get parallel speed-up on multicore with almost no work. For example, there is no need to use quote-escaped CSV parsing libraries when you just read from a popen()d translator program producing an easier format: https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim
In this case, Stallman simply clarified that Parallel’s notice did not count as a legal requirement and does not conflict with the GPL. His opinion wasn’t necessary, but since he wrote the license, it is authoritative. In this case, the question wasn’t brought to court, it was simply a clarifying discussion, and thus his intention did affect how things go in practice.
> And specially someone who is neither licensee or licensor?
Also wasn’t Stallman effectively the licensor or representing the licensor at the time, as president of FSF, head of the GNU project, and author of the GPL?
Generally though, the contents of the text and (if applicable) case law surrounding its interpretation is more important.
Whether his opinion on GPL is relevant, or if it is, how important it is, is up for debate. But I still don’t think it’s “based on a random choice or personal whim, rather than any reason or system”.
Stallman is rightfully the most prominent voice to comment on the spirit of the GPL/CopyLeft/Free Software.
No it isn't. Licences, like most legal documents, are construed objectively. The subjective intention of the author is totally irrelevant to the meaning.
I feel like the whole problem here is that the legality of Parallel’s notice, and the separation of the notice from the GPL, is not at all clear. The language is confusing to users. People who take the license seriously are staying away from Parallel because of the fear of accidentally breaking the license terms.
This is a question of law that only a court can answer.
If it is something that needs to be "confirmed" by someone "authoritatively" then you should ask a lawyer for advice. You should not ask a programmer for a "ruling".
What RMS might be saying is "we won't seek to enforce it". That is completely different.
If you review the thread from the top, you might find the primary question we were discussing from the start before you jumped in is whether the Parallel notice is GPL compliant. Whether Parallel’s notice is definitively and absolutely legally binding on its own and away from the GPL is a nuance you introduced, but it has been answered for all practical purposes by both Ole and RMS. It will probably never go to court or be tested by a judge, partially as a result of what Ole and RMS have said: that the notice is not a license and is not contractual.
There is no dispute about this, and because there is no dispute and because it’s not going to court, the statements by Ole and RMS are the most definitive answer we’ve got, and to date is what people are using when making and acting on decisions about Parallel usage. Both of them have said the Parallel notice complies with the GPL because the notice is not legally binding, so Ole & RMS both were saying more than GNU won’t seek to enforce Parallel’s notice. “Academic tradition” is not legally binding law, and the notice doesn’t reference any other relevant law. The notice is full of legal holes, if you insist on interpreting it as a legal contract. It was written by Ole (not a lawyer) and doesn’t define what research usage would constitute a mandatory citation, nor what happens if the user doesn’t see the notice, or if a citation is inappropriate, or if the citation is rejected by reviewers, among many other possibilities. It doesn’t take a lawyer or judge to see that the Parallel notice is not legally enforceable, and it doesn’t take a legal education to see that it’s not Ole’s intent to enforce it as a contract. He is just asking for citations, in slightly confrontational language.
It would be fair to say that a judge or court, if this issue was ever tested in court, might overrule some aspect of Ole’s or RMS’s stated intent because their language was imprecise and effectively said something different than they meant. Then again, another judge can override the first judge. There’s nothing definitive or absolute or permanent in law, regardless of whether a judge rules on it, and intent does matter in practice. Before this ever goes to court (probably never), all questions on this topic can be (and already are!) answered by non-judges, which is why it’s demonstrably not true to claim this question can only be answered in court or by a judge.
> You should not ask a programmer for a “ruling”.
RMS wasn’t acting as a programmer when he wrote the GPL, btw, nor when he opined on whether Parallel’s notice complies, so in that sense your framing is veering into the hyperbolic.