Hints for writing Unix tools (monkey.org)
If the implementation isn't respecting The Rule of Composition it's actually not adhering to the Unix philosophy in the first place. The tweet is referring to one of Doug McIlroy's (one of the Unix founders, inventor of the Unix pipe) famous quotes:
"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."
Pure beauty, but it's almost too concise a definition if you haven't experienced the culture of Unix (many years of usage / reading code / writing code / communication with other followers). ESR's exhaustive list of Unix rules in plain English might be a better start for the uninitiated (among which one will find the aforementioned Rule of Composition).
For all those seeking enlightenment, go forth and read The Art of Unix Programming:
https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming
17 Unix Rules:
https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E...
You can do it too, and if you're serious about writing Unix-style filter programs, you will someday need to. How do you know which format to write? Call "isatty(STDOUT_FILENO)" in C or C++, "sys.stdout.isatty()" in Python, etc. This returns true if stdout is a terminal, in which case you can provide pretty output for humans and machine-readable output for programs, automatically.
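A minimal sketch of the isatty() check in Python, assuming a tool that lists (name, size) pairs. The render() function and the JSON-lines layout are made up for illustration; only the isatty() call comes from the comment above.

```python
import io
import json
import sys


def render(rows, stream):
    """Return an aligned table for a terminal, JSON lines for anything else."""
    if stream.isatty():
        # A human is watching: pad the name column so the sizes line up.
        width = max((len(name) for name, _ in rows), default=0)
        return "\n".join(f"{name.ljust(width)}  {size:>8}" for name, size in rows)
    # A pipe or file: one self-describing record per line.
    return "\n".join(json.dumps({"name": name, "size": size}) for name, size in rows)


if __name__ == "__main__":
    sample = [("notes.txt", 120), ("a.out", 8192)]  # hypothetical data
    print(render(sample, sys.stdout))       # format depends on where stdout goes
    print(render(sample, io.StringIO()))    # never a tty, so always JSON lines
```

Run it directly and then again piped through cat to see the first print switch formats.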
“please don’t make the behavior of a command-line program depend on the type of output device it gets as standard output or standard input.”¹
① https://www.gnu.org/prep/standards/standards.html#User-Inter...
When used sparingly and thoughtfully, I've never personally had an issue with it.
"ps -f" truncates long lines instead of wrapping, while "ps -f | cat" lets the long lines live
How people usually discover what these commands do is by running them interactively, and if that results in some output being hidden vs being run noninteractively, then they have little reason to believe that it could yield more output than what they're used to seeing. I think a certain number of "ps" users don't know it can display full paths and commands, if they've only ever used it interactively.
It may have some merits, but as general advice this is definitely an anti-pattern.
Another example is "curl", where "curl URL >outfile" is chatty on stderr, while "curl URL" is quiet on stderr. That's very annoying for scripting, you easily forget to set "-s" in your scripts due to that behaviour.
I love that 'git log' outputs in a pager. 'svn log' by comparison is nuts.
ls is a bit more than just a command though. It's part of the furniture and prehistoric.
Dealing with programs that act differently depending on their output device is very annoying.
// Run "[ -t 1 ]" with our stdout inherited; exit status 0 means fd 1 is a terminal.
cmd := exec.Command("/bin/[", "-t", "1", "]")
cmd.Stdout = os.Stdout
isatty := nil == cmd.Run()
Examining the characteristics of the output stream and changing behavior is another "rule" that is not mentioned often. Another example is buffering the output to a large block if sending to a pipe, but making it line-buffered if going to a terminal.
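The buffering choice described above can be sketched in Python as follows. This assumes a text stream with a reconfigure() method (Python 3.7 or later); configure_buffering() is a name invented for this example, and C stdio and Python already do much the same for stdout by default.

```python
import io


def configure_buffering(stream):
    """Line-buffer a terminal, block-buffer a pipe or file; return the mode chosen."""
    interactive = stream.isatty()
    # line_buffering=True flushes on every newline, which suits a human reader;
    # otherwise writes are collected into larger blocks for throughput.
    stream.reconfigure(line_buffering=interactive)
    return "line" if interactive else "block"


# An in-memory stream is never a terminal, so it ends up block-buffered.
pipe_like = io.TextIOWrapper(io.BytesIO())
print(configure_buffering(pipe_like))
```

In a real filter the call would be configure_buffering(sys.stdout) at startup.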
If it's JSON and I know what object I want, I just have to pipe to something like jq [1].
PowerShell takes this further and uses the concept of passing objects around - so I can do things like ls | $_.Name and extract a list of file names (or paths, or extensions etc)
On the other hand, you got me thinking: you should probably have three code paths by default:
[0] normal behaviour (exit 0)
[1] bad arguments (exit EINVAL)
[2] --usage (print to stdout but exit != 0)?
Anyway I am not sure if it makes sense to declare "usage" as normal behaviour.
The former, I think, should write to stdout and return 0; the latter should write to stderr and return something non-zero.
Giving help if the user asks for it is normal behaviour.
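One way to read the convention argued for above (requested help is normal output, a usage error is a diagnostic), sketched in Python. The tool name "frob", its usage string and the exit code 2 are assumptions for illustration.

```python
import io
import sys

USAGE = "usage: frob [-h] FILE\n"  # hypothetical tool


def main(argv, out=sys.stdout, err=sys.stderr):
    """Return the exit status; the streams are parameters so they can be swapped out."""
    args = argv[1:]
    if "-h" in args or "--help" in args:
        out.write(USAGE)   # the user asked for help: stdout, success
        return 0
    if len(args) != 1:
        err.write(USAGE)   # the user got it wrong: stderr, failure
        return 2
    out.write(f"processing {args[0]}\n")
    return 0
```

A script would finish with sys.exit(main(sys.argv)). With this split, "frob --help | less" works without needing 2>&1, and a script that calls frob with bad arguments gets nothing unexpected on its stdout.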
annoying_program 2>&1 | less
but it is very unfriendly to stymie a user's attempt to get help when they're already probably confused.
That approach dates from the days when you got multi-column directory listings with
ls | mc
Putting multi-column output code in "ls" wasn't consistent with the UNIX philosophy.
There's a property of UNIX program interconnection that almost nobody thinks about. You can feed named environment variables into a program, but you can't get them back out when the program exits. This is a lack. "exit()" should have taken an optional list of name/value pairs as an argument, and the calling program (probably a shell) should have been able to use them. With that, calling programs would be more like calling subroutines.
PowerShell does something like that.
http://www.catb.org/~esr/writings/taoup/html/ch06s06.html
Or write environment variables to stdout in Bourne shell syntax so the caller can run "eval" on it. Like ssh-agent, for example.
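A rough sketch of the eval approach in Python, assuming a helper that wants to hand a couple of variables back to the calling shell. export_lines() and the variable names are invented here; shlex.quote() does the quoting so that values containing spaces or quotes survive the eval.

```python
import re
import shlex

_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def export_lines(variables):
    """Render a dict as Bourne shell assignments suitable for eval."""
    lines = []
    for name, value in variables.items():
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
        # Quote the value so the shell treats it as a single literal word.
        lines.append(f"{name}={shlex.quote(value)}; export {name};")
    return "\n".join(lines)


print(export_lines({"FROB_SOCK": "/tmp/frob dir/sock", "FROB_PID": "1234"}))
```

The caller would then run something like eval "$(frobd)", where frobd is a hypothetical program that prints these lines.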
I suggest -0 for symmetry with xargs. find calls it -print0, I think.
(In my view, this is poor design on xargs's part; it should be reading a newline-separated list of unescaped file names, as produced by many versions of ls (when stdout isn't a tty) and find -print, and doing the escaping itself (or making up its own argv for the child process, or whatever it does). But it's too late to fix now I suppose.)
That breaks when you have newlines in filenames, no?
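To make the newline concern concrete, here is a small Python sketch comparing newline-separated and NUL-separated name lists (the latter is what find -print0 produces and xargs -0 consumes). The sample names and both helper functions are made up for illustration.

```python
def split_lines(data: bytes):
    """Parse a newline-separated list of names."""
    return [name for name in data.split(b"\n") if name]


def split_nul(data: bytes):
    """Parse a NUL-separated list of names."""
    return [name for name in data.split(b"\0") if name]


names = [b"plain.txt", b"two\nlines.txt"]  # the second name contains a newline

newline_stream = b"".join(name + b"\n" for name in names)
nul_stream = b"".join(name + b"\0" for name in names)

print(split_lines(newline_stream))  # three entries: the second name is cut in two
print(split_nul(nul_stream))        # two entries, identical to the input
```

NUL works as a separator because it is one of the two bytes (the other being "/") that cannot appear in a file name component.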
That seems like an extremely pathological case.
This does what you would expect:
echo My brother\'s 12\" records.txt | parallel touch
$ printf '"foo bar"' | xargs -n1
and
$ printf '"foo" "bar"' | xargs -n1
and
$ printf "%s" '\\"foo bar\\"' | xargs -n1
In a magical dream world I'd start a distro where every command has its interface rewritten to conform to a command line HIG. Single-letter flags would always mean only one thing, common long flags would be consistent, and no new tools would be added to the distro until they conformed. But at this point everyone's used to (and more importantly, the entire system relies on) the weird mismatches and historical leftovers from older commands. Too bad!
The "portable output" thing is especially subjective. I buy that it probably makes sense for compilers to print full paths. But it's nice that tools like ls(1) and find(1) use paths in the same form you gave them on the command-line (i.e., absolute pathnames in output if given absolute paths, but relative pathnames if given relative paths). For one, it means that when you provide instructions to someone (e.g., a command to run on a cloned git repo), and you want to include sample output, the output matches exactly what they'd see. Similarly, it makes it easier to write test suites that check for expected stdout contents. And if you want absolute paths in the output, you can specify the input that way.
'One thing well' is often intended to make people's lives easier on the console. Sometimes this means assuming sane defaults, and sometimes just a simpler program that does/assumes less. Take these two examples and tell me which you'd prefer to type:
user@host~$ ls *.wav | xargs processAudio -e mu-law --endian swap -c 2 -r 16000
user@host~$ find . -type f -maxdepth 1 -name '*.wav' -exec processAudio -e mu-law --endian swap -c 2 -r 16000 {} \;
Write concise technical documentation. Imagine it's your first day on a new job and you need to learn how all your new team's tools work; do you want to read every line of code they've written just to find out how it works, or do you want to read a couple pages of technical docs to understand in general how it works? (That's a rhetorical question)
Definitely provide a verbose mode. When your program doesn't work as expected, the user should be able to figure it out without spending hours debugging it.
It's possible other descriptors would be useful, like stdlog for insecure local logs, stddebug for sending gobs of information to a debugger. It's certainly not in POSIX, so too bad, but honestly stdout is hard to keep readable and pipe-able. Adding just one more file descriptor separates the model from the view.
http://javier.io/blog/en/2014/10/21/hints-in-writing-unix-to...
If you are intercepting UNIX signals (starting with SIGINT), go back to the drawing board and think again. Don't do it. There is almost never a good reason for doing it, and you will likely get it wrong and frustrate users.
Input from stdin, output to stdout: Nicely side-stepped in that most cmdlets allow binding pipeline input to a parameter (either byval or byname, if needed). Filters are trivial to write, though.
Output should be free from headers: Side-stepped as well, in that decoration comes from the Format-* cmdlets that should only ever be at the end of a pipeline that's shown to the user.
Simple to parse and to compose: Well, objects. Can't beat parsing that you don't need to do.
Output as API: Well, since output is either a collection of objects or nothing (e.g. if an exception happened) there isn't the problem that you're getting back something unexpected.
Diagnostics on stderr: Automatic with exceptions and Write-Error. As an added bonus, warnings are on stream 2, debug output on stream 3 and verbose output on stream 4. All nicely separable if needed.
Signal failures with an exit status. Automatic if needed ($?), but usually exception handling is easier.
Portable output: That's about the only advice that would still hold and be valuable. E.g. Select-String returns objects with a Filename property which is not a FileInfo, but only a string; subject to the same restrictions that are mentioned in the article.
Omit needless diagnostics: Since those would be on either the debug or the verbose stream, they can be silenced easily and don't interfere with other things you care about. Cmdlets have a switch for each, which means you only get that output if you actually care about it.
Avoid interactivity: Can happen when using the shell interactively, e.g.
Home:> Remove-Item
cmdlet Remove-Item at command pipeline position 1
Supply values for the following parameters:
Path[0]: _
However, this only ever happens if you do not bind anything to a parameter, which shouldn't happen in scripts. If you bind $null to a parameter, e.g. because pipeline input is empty or a subexpression returned no result, then an error is thrown instead, avoiding this problem.
Nitpick: You'd need ls | % Name or ls | % { $_.Name } there. Otherwise you'd have an expression as a pipeline element, which isn't allowed.
But then you've oddities like plutil behaving like gzip by modifying the file you specify rather than printing to stdout. You have to pass -o and a dash to get it to leave the file alone and instead reformat it to stdout. That one gets me every time. And I'm not alone: https://twitter.com/mavcunha/status/417823730505895936
But other parts are nice. For instance, "system_profiler -xml > MyReport.spx" generates XML that will open in the System Profiler GUI app. The XML generated is usually a Plist, since that's as native to the platform as the Registry might be to Windows...
Let me know when PowerShell gets tabs though. Maybe there's a Terminal.app port running in Mono somewhere? Seriously, I wish somebody would build a better terminal, maybe get creative with scrollback and chaining commands, and ship it in an OS... with tabs. ;-)
Long and Short Options: https://www.gnu.org/prep/standards/html_node/Option-Table.ht...
General Interfaces: https://www.gnu.org/prep/standards/html_node/User-Interfaces...
Command Line Interfaces: https://www.gnu.org/prep/standards/html_node/Command_002dLin...
Program Argument Syntax: http://www.gnu.org/software/libc/manual/html_node/Argument-S...
http://www.robertames.com/blog.cgi/entries/the-unix-way-comm...
""" The two surprising finds in the above documents are the standard list of long options and short options from -a to -z.
Forever and a day I am trying to figure out what to name my program options and these two guides definitely help. It allows me to definitively say you should use -c … for “command” instead of -r … for “run” because -r means recurse or reverse. """
--Robert
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...
(I'm not so convinced that long options are a good thing, as evidenced by the --extended-regexp/--regexp-extended and other little "was it spelt this way or that?" type of confusions. It's not hard to remember single letters, especially if they're mnemonic.)
curl -kLIiso example.org www.example.org
versus: curl --insecure --location --head --include --silent --output example.org www.example.org
And of course as a practical matter, with short opts you'll run out of characters eventually, and meaningful mnemonics before that.
I can't find documentation on what I mean, but try ip --help
As for dd, it came from a non-UNIX OS and kept the original syntax.
Obviously not every program will use just two file descriptors. Binary isn't handled by stdin and stdout because they're typically used for tty input/output. If you need to handle multiple files you'll take a list of file arguments. Often a program takes no input at all that isn't a command-line option.
And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something, which you would disable if your fd wasn't a tty. And why would you send 'headers' to a completely different file descriptor anyway?
Oh, I think I get it now. You confused the MVC architecture with Unix programs. Unix programs don't provide a user interface.
Not at all. cat wouldn't have a ncurses GUI, that doesn't make sense. My point is that 'cat --verbose' should be an option, where the stdout doesn't change but extra crap is sent elsewhere, and probably just dumped on the terminal like stderr. I sometimes want to see extra context and line numbers in my grep searches (grep -nC 3 ..) but I might want the stdout to remain clean. This makes programs more composable. Right now it's like we've got stdfmt permanently redirected towards stdout.
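A sketch of what "stdout doesn't change but extra stuff is sent elsewhere" could look like, in Python. The grep-like filter and the diag parameter are invented for this example; with no dedicated descriptor in POSIX, stderr stands in for the proposed extra stream.

```python
import io
import sys


def grep(pattern, lines, out=sys.stdout, diag=None):
    """Write matching lines to out; optionally describe each match on diag."""
    matches = 0
    for number, line in enumerate(lines, 1):
        if pattern in line:
            out.write(line)  # stdout carries the data and nothing else
            matches += 1
            if diag is not None:
                diag.write(f"{number}: matched\n")  # context goes elsewhere
    return 0 if matches else 1  # grep-style status: 0 if anything matched


# Verbose run: the data stream is byte-for-byte the same as a quiet run.
data, notes = io.StringIO(), io.StringIO()
status = grep("b", ["a\n", "b\n", "c\n"], out=data, diag=notes)
print(status, repr(data.getvalue()), repr(notes.getvalue()))
```

A --verbose flag would simply pass diag=sys.stderr, so piping the output onward stays safe either way.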
In practical terms, vi does its own paging. It's not a wrapper over echo | ed | less. One giant monolithic subsystem. Perhaps vi is the exception. But dd reports progress only if you send it a signal of some sort, while wget shows a progress bar by default (silence is golden? not so much). ls yields differently columned outputs to ttys or files. I suppose this is the simplicity of Unix that I shouldn't touch.
Some unix tools work really well already, and I'm not suggesting destroying tar or xargs. I'm not sure how systemd works into this, but I'm not really a fan of that.
I guess Plan9 wasn't Unix, either.
His point is that two streams are not enough: you don't want to present the same output stream to a human, a logfile, or another utility.
> And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something
Right, so there is markup on a terminal.
> which you would disable if your fd wasn't a tty.
Which would be much simpler to handle if there was a stream for human consumption and one for piping
> And why would you send 'headers' to a completely different file descriptor anyway?
Because headers are useful to human users, or when capturing output in a file to read later, rather than when feeding another utility?
In your program's design, the 'cat' program would handle all kinds of file i/o, provide some kind of ncurses text GUI to select a file, a progress bar for the progress of text flowing through it, sending errors to a logging subsystem, storing header metadata in some object passed along its output streams, etc. The Unix designers had dealt with this kind of crap before, and were sick of it, and so they wrote a program which did only one thing.
What you describe is the systemd school of design. If I just make my program more complex and technically superior, I'll have a better program. Who cares that nobody wants to use it, or that it's burdensome, hard to extend, difficult to understand, and incompatible with everything that exists today? Who cares if we can already do all these things without all the downsides? Technical superiority trumps practicality. Well, that's not Unix.
The Unix environment flourished not only because it was widely available, but mainly because it was incredibly efficient. By removing all the things they didn't need, they made the system better. There are four words that accurately express all of this, and that should guide the development of any Unix tool:
Keep it simple, stupid. https://people.apache.org/~fhanik/kiss.html
That said, I'm not entirely sure which git pipes to.
Especially for PowerShell the whole problem that Console2, etc. have is trivial, as you have an API to create a host application instead of relying on polling a hidden console window. The console host is just one of those hosts.
Some work has been done on the svn side: http://svn.apache.org/repos/asf/subversion/branches/automati...
However, it's not on trunk yet because it's hard to find good defaults. An automatic pager makes sense for some commands, but not all -- and in a meritocratic development model this kind of thing can cause an endless discussion... I suspect we'll eventually merge the feature in a disabled by default state and allow users to enable it on a per-command basis.
Git's hard-coding of options passed to the pager has problems, too: https://mail-archives.apache.org/mod_mbox/subversion-dev/201...
git log | wc
vs git --no-pager log | wc
I'm sure it's neither here nor there in practice. More of a hypothetical question.
Edit: my mistake, they can't contain nulls either: https://news.ycombinator.com/item?id=8485861
When a human is creating files by hand, I almost certainly agree. When a program is creating files, however, it's only a matter of time before weird characters wind their way in there.
I really wish newlines had been disallowed. (There's UI implications, in addition to the parsing ones — how do you do a list view with newlines in the filename?; I also wish filenames had a reliable character set and weren't just bytes.)
Someone replied on LWN, when he posted his proposal, that he had implemented a sort of home-grown database using non-UTF8 characters for the file names.
Rube Goldberg, indeed!
Show them with the standard escape sequence for a newline:
This\ filename\ncontains\ a\ newline
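The escaped form shown above could be produced by something like the following Python sketch. display_name() is an invented helper, and which characters count as 'special' (here: backslash, space, and ASCII control characters) is an assumption, not an established convention.

```python
def display_name(name: str) -> str:
    """Return a one-line, backslash-escaped rendering of a file name."""
    named = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r", " ": "\\ "}
    out = []
    for ch in name:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters get a generic hex escape.
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


print(display_name("This filename\ncontains a newline"))
```

Because the backslash itself is escaped, the rendering stays unambiguous: a literal backslash followed by "n" in a name comes out differently from a real newline.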
Same for any other characters that could be considered 'special' in output; I really wish the backslash convention for escaping was more common. Character sets and such are a UI/display issue, so I don't think there should be any special handling for them at the lower levels of the system.
For example, I run shells in Emacs and have had to tweak loads of shell scripts written by colleagues to fix their poorly-implemented colourisation. It's useful to know when a test has failed; it's not so useful to have the whole terminal set to white text on a pale pink background.
One day I couldn't SSH into our servers from Emacs. It turned out somebody had edited .bashrc for the admin user to make the bash prompt blue. Emacs' TRAMP process was looking for a prompt ending in "$" or "#", not "$\[\033[0m\]", so it didn't realise the connections were successful.
There are two ways of handling this: we can blame the source of the bug (the person adding the colours incorrectly, or the assumption-loaded TRAMP regex), but there will always be more bugs in situations we'd never think of. Alternatively, we can avoid being 'too clever', and instead aim for consistency and least surprise.
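The prompt failure described above is easy to reproduce in miniature. In this Python sketch the prompt pattern is a simplified stand-in (not TRAMP's actual regex), and the coloured prompt is what a terminal would receive once bash has dropped the \[ and \] markers.

```python
import re

PROMPT_RE = re.compile(r"[#$] ?$")              # hypothetical "ends in $ or #" check
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI sequences such as ESC[0m


def looks_like_prompt(text, strip_ansi=False):
    """Return True if text ends like a shell prompt."""
    if strip_ansi:
        text = ANSI_RE.sub("", text)
    return PROMPT_RE.search(text) is not None


plain = "admin@server:~$ "
coloured = "\x1b[34madmin@server:~$ \x1b[0m"  # blue prompt, then a colour reset

print(looks_like_prompt(plain))                      # True
print(looks_like_prompt(coloured))                   # False: the reset code follows the "$ "
print(looks_like_prompt(coloured, strip_ansi=True))  # True
```

Stripping escape sequences fixes this particular case, but it is exactly the kind of patch that has to be repeated for every consumer that parses output.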
(Actually, if you are suggesting that, I'm not going to disagree. But I am going to say that if so, those rules don't apply in the case of colored prompts, because colored prompts are useful.)
Anything we add on top of that, eg. ANSI colour codes, will be useful to some but harmful to others. The tricky part is working out which of those categories the current user is in.
I'm sure it's possible, but you'd have to acquire a new pty and decide what termios settings you want. It's a nontrivial hack, I think.
I'm actually kind of surprised it's not in moreutils[1].
[1]: https://joeyh.name/code/moreutils/
EDIT: Hmm, maybe it's not possible. I can't figure out exactly how to do it, anyway.
EDIT again: Apparently `script` does this on Linux: http://monosnap.com/image/Qlig4CHmQgV9pxvSmndVUgMTU88Adz
EDIT again: Expect's `unbuffer` also does it: http://expect.sourceforge.net/example/unbuffer.man.html#toc
Daniel J. Bernstein wrote a "pty" package back around 1991 that did this. Version 4 of the package was published in 1992 to comp.sources.unix (volume 25 issues 127 to 135). It's still locatable on the World Wide Web.
Bernstein later updated this, around 1999, with a "ptyget" package that was more modular and that had the session management commands moved out of the toolset to elsewhere. The command from that package to do exactly what you describe is "ptybandage". There is also "ptyrun". Paul Jarc still publishes a fixed version of ptyget (that attempts to deal with the operating-system-specific pseudo-terminal device ioctls in the original) at http://code.dogmap.org./ptyget/ .
As a bonus feature for people who use source code, there are similar "ptybandage" and "ptyrun" scripts, for which you will need Laurent Bercot's execline tool (http://skarnet.org./software/execline/), in the source archive for the nosh package at http://homepage.ntlworld.com./jonathan.deboynepollard/Softwa... . These make use of the terminal-management tools in the nosh toolset.
With both of these, you should be able to run "ptybandage uses_colours_if_tty | less -R"
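For comparison, here is a rough Python sketch of the same trick using the standard pty module. This is not how ptybandage, script or unbuffer are implemented; run_on_pty() is an invented name, only stdout and stderr are attached to the pseudo-terminal, and the end-of-output handling assumes Linux-like behaviour (EIO on read once the child side is closed).

```python
import os
import pty
import subprocess
import sys


def run_on_pty(argv):
    """Run argv with stdout/stderr on a fresh pseudo-terminal; return its output as bytes."""
    master, slave = pty.openpty()
    try:
        proc = subprocess.Popen(argv, stdout=slave, stderr=subprocess.STDOUT)
    finally:
        os.close(slave)  # the child holds its own copy; ours would block end-of-file
    chunks = []
    while True:
        try:
            data = os.read(master, 4096)
        except OSError:  # Linux reports EIO once every slave descriptor is closed
            break
        if not data:     # other systems report plain end-of-file
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks)
```

Calling run_on_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]) should give output containing True even though the caller is capturing it. Expect "\r\n" line endings, since the terminal's default output processing is still in effect.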
So this feature must actually be present in the shell and maybe it is. I’m no expert but maybe zsh already offers something like this?
I don't consider that an acceptable solution.