The Birth of Standard Error (2013)(www2.dmst.aueb.gr) |
All the same, I'd be willing to believe that Unix's standard error could have been an "independent rediscovery" of one feature made highly desirable by other features (redirection and pipes). It's not clear how much communication there was among distinct OS researcher groups back then, so even if other systems had an analogue, Bell Labs people might not have been aware of it.
p.s.
Well, actually more completely, something like this:
              +---------+
[meta-in] --> |         | --> meta-out
              | p r o c |
    input ==> |         | ==> output
              +---------+
https://unix.stackexchange.com/questions/197809/propose-addi...
The idea is that some output is metadata (such as ps headers) and some is data. With stdmeta we could differentiate between the two.
TBH, it's a great idea, but history proved that we apparently prefer a single stream of data and solving all the problems it brings ...
So, well then: allowing programs to consume and emit JSON: is this progress?
https://www.lispworks.com/documentation/lw50/CLHS/Body/v_deb...
https://www.lispworks.com/documentation/lw50/CLHS/Body/v_ter...
reminds me of those t-shirts or digital billboards displaying some system error that we've all seen as memes
I often get output from multiple threads or multiple processes garbled together on the same line. I know how to fix this, but I feel my OS should do it for me.
The pipe buffer is big enough that sane programs aren't likely to run into problems. The math:
PIPE_BUF is 512 per POSIX but in practice 4096 on Linux (probably others too?). If we assume a horrible-and-unlikely 12 formatting characters per real character (and assume a real character is non-BMP and thus 4 bytes, but still single-column), Linux has enough for 64 characters. With more reasonable assumptions (mostly ASCII, no more than 4 formatting changes per line) we get more like 6 lines of output being atomic on Linux, and even POSIX being likely to get at least one whole line.
I'd argue we haven't really "solved" the optimal way to do error handling in programming: using union types remains one of the best options, but even that has its downsides. Consider the ergonomics of forwarding an error type multiple layers in a Rust program: you can remove some of the boilerplate by strapping macros on top, but I'd argue that's more of a bandage than a fix. Most other programming languages either use exceptions, which I don't like as they complicate control flow significantly, or simply ignore error handling entirely (like C and Go; both provide some standard facilities for dealing with error values, but handling them is completely manual. I do like this, since it's very straightforward, but it nonetheless just sidesteps the problem). And even trying to keep it simple can create new problems, like the way pthreads has to contort errno into a thread-local, for obvious reasons.
And while stderr has created a somewhat unified channel for dumping errors into once they've bubbled up to the point where the program needs to output them, there's an almost unlimited number of opinions on exactly how error logging should work. Some software won't use stderr by default; other software only uses stderr for specific types of errors. Some software dumps everything that isn't data output into stderr, including e.g. `--help` text, whereas some software uses stdout for anything that isn't explicitly an error (which often leads to me needing to pipe `--help` to less twice: once without, and once with `2>&1`). Categorization of error logging is also somewhat contentious: should there be a "warning" severity? Should you split errors into modules? Formatting, too: what should be in a log line? Should logs be structured into a machine-readable format such as JSON?
It was probably a bad omen that even very old versions of UNIX ran into problems dealing with error logging and wound up needing to bifurcate things. Few programs feel as 'lazy' as UNIX; if UNIX couldn't ignore the problem, god knows the rest of the software was doomed.
Timestamping or sync points, so that if I pipe multiple streams (say stdout and stderr) I can keep them in sync further along when various buffers may have been involved.
Metadata, such as magic file types.
Structured data (this may link with meta data, and maybe there is even a way programs could negotiate what to send to each other).
When using PowerShell, I find it useful that it handles progress separately, so it doesn't interfere with piping (putting aside that cmdlets pass .NET-based objects anyway). Is there something like stdprogress?
tangentially related.
The Great 202 Jailbreak - Computerphile
https://www.youtube.com/watch?v=CVxeuwlvf8w
starring the inimitable Professor Brailsford.
this is just to say that you could probably have run a webserver on a pdp-8/s, which was about the size of an atx case and would be a reasonable controller to build into a phototypesetter at the time
It does have the union type Result<normal, error>, but most people throw/catch Error.
In Swift, error is a simple value (without a stack frame) and thus is as cheap as a return value, but can be handled/caught anywhere in the call chain like an exception.
Error is a protocol that tags any type, so it can carry any details you like, and your catch can switch on the type.
But it's only now (10 years on) that they're declaring error types in the function signature. In this world, it turns out that not throwing is the same as throws(Never). It took this long because it's unclear (though possible) whether per-type error handling helps, mainly with libraries.
Serializing/tracking the originating (thread) context and avoiding merge conflicts in error streams seems like the unsolved problem. Both Java and Swift have structured concurrency with parent/child relations with derived cancellation/termination. Perhaps later that can include errors.
p.s. That is pL(n).stderr -> pE.stdin, where pL is the 'business logic' and pE is the system's error processing aspects. I.e. the error processing component's stdin is the stderr of the logical processes (pL), so there is a uniform process model applicable to both logical and error processing elements of the pipeline.
The issue is how to do this within the limits of a command-line interface (CLI). In code (as in in-process chaining) that aspect is a non-issue.
A plain byte stream can be easily aligned to work with any future or past encoding fashion. Consider the situation if those who designed Unix had not been so aggressively minimal. We would probably be complaining about how streams had to be ASN.1 encoded and how much of a pain it is to define the schema for what should be a simple ad-hoc data transfer.
As it stands, you can put whatever object format you want on top of the stream. I think it is the same with files. I am sort of pleased we are not stuck with some obsolete, no-longer-relevant, screwball structured format from the '70s that all our files have to conform to. Instead, our files are a simple range of bytes and we can impose whatever structure on them that we want.
TCP/IP streams are bidirectional, but there is a limited way of sending "out of band" data, though it is not used as much. It would have been nice if the stdout/stderr multiple streams extended to TCP/IP networking and even HTTP messages too.
It's not real "out of band" data: that's something wholly invented by the Unix socket API. TCP itself just has an "urgent pointer", which addresses some byte further in the data stream that the receiver doesn't have yet, with the intent that higher-level protocols could use it as a signal to flush any data up to that pointer to observe whatever the urgent message is. There's nothing in the protocol itself to actually send a message separately from the rest of the stream.
given this program
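#include <string.h>
#include <stdio.h>

/* 16384 'A' characters plus the terminating NUL */
char large[16385];

int main(void)
{
    printf("BUFSIZ is %d\n", BUFSIZ);
    memset(large, 'A', sizeof(large));
    large[sizeof(large) - 1] = '\0';
    /* one fprintf call, one line of output, to unbuffered stderr */
    fprintf(stderr, "%s\n", large);
    return 0;
}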
compiled with `gcc -static` against glibc 2.36-9+deb12u7, we get this strace
execve("./a.out", ["./a.out"], 0x7fffafcb4a30 /* 49 vars */) = 0
brk(NULL) = 0x1e28000
brk(0x1e28d00) = 0x1e28d00
arch_prctl(ARCH_SET_FS, 0x1e28380) = 0
set_tid_address(0x1e28650) = 1501924
set_robust_list(0x1e28660, 24) = 0
rseq(0x1e28ca0, 0x20, 0, 0x53053053) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=9788*1024, rlim_max=RLIM64_INFINITY}) = 0
readlink("/proc/self/exe", "<censored>", 4096) = 21
getrandom("<censored>", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x1e28d00
brk(0x1e49d00) = 0x1e49d00
brk(0x1e4a000) = 0x1e4a000
mprotect(0x4a0000, 16384, PROT_READ) = 0
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x7), ...}, AT_EMPTY_PATH) = 0
write(1, "BUFSIZ is 8192\n", 15) = 15
write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
write(2, "\n", 1) = 1
exit_group(0) = ?
+++ exited with 0 +++
you can see that the single fprintf call resulted in three separate calls to write(2), even though it is only a single line of desperate screaming. those three calls happen at three separate times, typically on the order of tens of microseconds apart. if that file descriptor is open to, for example, a terminal or pipe or logfile that some other process is also writing to, that other process can write other data during those tens of microseconds, resulting in the intercalation of that other data in the middle of the screaming
threads are completely irrelevant here, except that i guess in an exotic scenario the 'other process' that is writing to the file could conceivably be a different thread in the same process? that would make your remarks about 'distinct file descriptors' and thread safety make sense. but we were talking about entirely separate processes writing to the file, since that's the usual case on unix, and in that case no form of thread-safety is worth squat; what matters is the semantics of the system calls
i don't think posix makes any guarantees about how many calls to write(2)† a call to fprintf(3) will result in, though i haven't actually looked, and i don't think wg14 concerns itself with environment-dependent questions like this at all
______
† or writev(2)
(amelius, however, did mention the possibility of multiple threads!)
it also wasn't what they were saying
this thread is starting to remind me of the 'i'm not your buddy, pal' cascades from reddit