Structured logging with slog(go.dev) |
Structured logging with slog(go.dev) |
slog.Info("hello, world", "user", os.Getenv("USER"))
It's a little magical that "user" is a key. So what if you have multiple key-value pairs? Arguably it most likely going to be obvious which is the keys, but having every other value be a key and the rest values seems a little clumsy.I really like Pythons approach where you can have user="value" it makes things a bit more clear.
logger.Info("failed to fetch URL",
// Structured context as strongly typed Field values.
zap.String("url", url),
zap.Int("attempt", 3),
zap.Duration("backoff", time.Second),
)
They also have zap.Error(err) which generates the "error" key as a convention.For example: https://pkg.go.dev/golang.org/x/exp/slog#example-Group
sugar.Infow("failed to fetch URL",
"url", "http://example.com",
"attempt", 3,
"backoff", time.Second,
)
Really slog presents mostly the same two APIs as Zap, with a single entrypoint.You... add them afterwards? It's really just a plist (https://www.gnu.org/software/emacs/manual/html_node/elisp/Pr...), that's hardly novel. The method takes any number of parameters and pairs them up as key, value.
Or if you really hate yourself, you use LogAttrs with explicitly constructed Attr objects.
> I really like Pythons approach where you can have user="value" it makes things a bit more clear.
The trouble is, Go doesn't have keyword parameters so that doesn't work.
I don’t mind it. You can use LogAttrs if you want to be explicit.
Although I do wonder if there’s anything tricky with the type system that is preventing something like this from being supported: https://go.dev/play/p/_YV7sYdnZ5V
The reason it doesn’t use maps is that maps are significantly slower, TFA has an entire section on performances.
However if you prefer that interface and don’t mind the performance hit, nothing precludes writing your own Logger (it’s just a façade over the Handler interface) and taking maps.
slog.Info("hello, world", slog.String("user", os.Getenv("USER")))
[1]: https://pkg.go.dev/log/slog#hdr-Attrs_and_Values use slog::info;
...
info!("hello, world"; "user" => std::env::var("USER"));For better or for worse, Go doesn’t have that.
It would definitely not be better from the point of view of
> We wanted slog to be fast.
json_build_object ( VARIADIC "any" ) → json
jsonb_build_object ( VARIADIC "any" ) → jsonb
Builds a JSON object out of a variadic argument list.
By convention, the argument list consists of alternating keys and values.
Key arguments are coerced to text; value arguments are converted as per to_json or to_jsonb.
json_build_object('foo', 1, 2, row(3,'bar')) → {"foo" : 1, "2" : {"f1":3,"f2":"bar"}} json_build_object(
'foo', 1,
2, row(3, 'bar')
)I’ve been working on a side project to bring something like the DataDog log explorer to the local development environment. The prototype I made has already been extremely helpful in debugging issues in a very complex async ball of Rails code. Does something like that sound useful to other folks? Does it already exist and I just can’t find it?
Logs are data.
> With many structured logging packages to choose from, large programs will often end up including more than one through their dependencies. The main program might have to configure each of these logging packages so that the log output is consistent: it all goes to the same place, in the same format. By including structured logging in the standard library, we can provide a common framework that all the other structured logging packages can share.
This is IMO the right way of doing it. Provide an interface with simple defaults, usable out of the box. Those who need more can use a library that builds towards the interface.
So when evaluating any library, you can ask "How well does this integrate with interfaces in the standard library?". Discovering that some functionality is just a "Fooer" that pieces well together with existing stuff is calming. Not only do you already know how to "Foo", you also get a hidden stability benefit: There's an implied API surface contract here.
This is in stark contrast to the "builds on top of" approach, where you end up with competing, idiosyncratic interfaces. This is often necessary, but there is always an implied risk in terms of maintenance and compatibility.
It’s a lot of effort when all you want is to log everything to STDOUT, in JSON, but you have to choose one of half a dozen logging libraries that all behave extremely differently.
Good job Go though for being opinionated but rational.
[1] https://www.elastic.co/guide/en/ecs-logging/overview/current... [2] https://opentelemetry.io/docs/specs/otel/logs/data-model/
Handle(context.Context, Record) error
This method has access to the context, which means you can get the logging handler to extract values from the context. Instead of storing the logger on the context, you can extract the traceId, etc values from the context and log those.It's a little bit involved to write a whole logger from scratch, but you can 'wrap' the existing logger handlers and include values from the context relatively easily.
There are examples in this project (which aims to help solve your usecase): https://github.com/zknill/slogmw
https://github.com/indexsupply/x/blob/main/wslog/slog_test.g...
> As the call to LogAttrs shows, you can pass a context.Context to some log functions so a handler can extract context information like trace IDs.
I’m not a fan of slog’s syntax, but the convenience of having it in the stdlib trumps that, for me.
It’s really designed to be a minimum necessary package to allow interop (via handlers) and a baseline of standalone usability (via loggers). The stdlib only provides a text and a json handler, not even a no-op handler which I think is sorely neededor a multi handler which I think would make a lot of sense.
But nothing precludes you publishing a messagetemplates handler, or whatever else you may want.
It's easy to get started with log/slog and one of the built in handlers, but as soon as you want to change something the library design pushes you towards implementing an entire handler.
For example, if I want the built in JSON format, but with a different formatting of the Time field, that's not easy to do. It's not obvious how to change the built in handler.
I wrote slogmw[1] to solve this problem. It's a set of middleware and examples that make it easy to make small changes to the built in handlers without having to write a whole new handler from scratch.
Yes, it annoyed me to no end. But OTOH I think it may be wise of them to see what the ecosystem finds and provide more convenience later. After all, it's std we're talking about, and this takes time to get right.
I'm personally missing:
- A goddamn default no-op/nil logger, that can be declared before initialized in e.g. structs.
- Customization to TextHandler for changing format easily, and omit keys (AIUI ReplaceAttr cannot omit stuff like `time="..."` on each line), which is critical since CLI screen real estate is incredibly sparse.
- (Controversial) but I would like opinionated logger initialization guidance for package authors, so that you get consistency across the ecosystem. Doesn't have to be exactly one way, but say.. two ways? E.g. a package-global and a struct-initialized version? Right now, people are even confusingly wondering if they should accept slog.Handler or *slog.Logger.
I have to admit, the `log.InfoContext(ctx,...` style of redundancy that permeates the standard lib at this point is really gross, especially given that the most common use case for go is going to have contexts everywhere.
So,
slog.Info("failed to frob", "thing", GetThing(1))
Still calls GetThing(1) when the log level is greater than Info. The only solution right now for this is to test the log level before making the logging call. It would be amazing if language designers could make the arguments late bound instead or used aspect-oriented programming approaches to protect each logging call site. LOG(INFO) << WowExpensiveFunction();
This is safe when the log level is set to WARN or higher. For the same reasons, LOG_EVERY_N and LOG_FIRST_N in the same library are pretty cheap.It's not the level of type-safety that I expect from a strongly typed language.
What it comes down to is that zap special cases things like slice-of-int, slice-of-string, slice-of-timestamp, slog doesn't, and the benchmark includes all those special cases. I question whether your typical log statement includes slices. A more fair benchmark would be just scalar types, and zap & slog optimizations there look pretty similar.
https://github.com/uber-go/zap/blob/fd37f1f613a87773fc30f719...
https://github.com/uber-go/zap/blob/fd37f1f613a87773fc30f719...
I've got a few packages that accept a basic logger interface, eg:
type debugLogger interface {
Debugf(format string, args ...any)
}
type MyThing struct {
logger debugLogger
}
func New(logger debugLogger) *MyThing {
return &MyThing{logger}
}
I'd love to switch to slog but I'll have to v2 these packages now.It's like when you're trying to internationalize, you want to emit as constant of a string as reasonably practical, so that it can be straightforwardly matched and substituted into a different language. Except in this case that different language is regexes being used to change the thing into a SQL statement to fix the mess (or whatever).
So much easier to say "stop trying to Sprintf your logs, just add the values as key-value pairs at the end of the function call."
log.Printf("failed to frob %s: %s", thing, error)
then moving from that to: slog.Error("failed to frob", "thing", thing, "error", error)
isn't terribly difficult, and will make log analysis dramatically easier. log.Printf("failed to frob %s: %s", thing, error)
Wouldn't you just want to use slog.Error("failed to frob", thing, error)
That keeps the _value_ of `thing` as the key and the _value_ of `error` as the value. That would keep more in line with your first example.If you want to use slog, I can imagine setting up the logger with a handler specific for the CLI output in the CLI tool, and a json or text structured handler in the webservice app.
The quesiton is, do you actually want structured logging in the CLI app? Yes you probably want to print something out, but is it _structured_ in the sense that slog expects? Or is it just some output.
If it's not really structured, then you probably want some other interface/library that better represents the logging you want to do. Slog will push you towards structured key-value pairs, and you might find yourself fighting against this in the CLI app.
We need to start making Json viewer were we look into log files without tooling the default.
It's so much easier to use Json automatically and ship them to systems out of the box.
Linux logging should do that too. Not just container in k8s
The prevailing solution is SLF4J, which is a facade that can be implemented by any number of backends, e.g. Logback.
There is logging in the stdlib (java.util.logging), but it's the less common choice, for whatever reason.
And then I heard "... with a chainsaw. It's a chainsaw mill" and realized I may have misunderstood the context.
Very much so!
One piece I'd like to have is a good output format from my program. Right now I have stuff outputting logs in JSON format to stderr/local file (then picked up by the Opentelemetry OTLP collector, Datadog agent, AWS CloudWatch, whatever) and the traces/spans actually sent from my process to the collector over the network. It baffles me why the traces/spans are done in that way rather than placed in a local file (and for compressibility, ideally the same file as the logs). The local file method would make it easier to have simple local-focused tools, would lower the requirements for outputting these records (no need to have async+http client set up, just the writer thread), and would better handle the collector being in a bad mood at the moment.
That's the most important piece, but if getting that done requires inventing a new format (it seems to?!?), there are some other details I'd like to it to do well. You should be able to see all the stuff associated with each log entry's matching span, which means that spans and attributes should be recorded on open, not just on close. (Attributes can also be updated mid-span; the Rust tracing library allows this.) (OpenTelemetry notably only supports complete spans). It could be more efficient by interning the keys. some care about how rotation is handled, timestamps are handled, etc.
The prototype I built is a web application that creates websocket connections, and if those connections receive messages that are JSON, log lines are added. Columns are built dynamically as log messages arrive, and then you can pick which columns to render in the table. If you're curious here's the code, including a screenshot: https://github.com/corytheboyd-smartsheet/json-log-explorer
With websockets, it's very easy to use websocketd (http://websocketd.com), which will watch input files for new lines, and write them verbatim as websocket messages to listeners (the web app).
To make the idea real, would want to figure out how to not require the user to run websocketd out of band, but watching good ol' files is dead simple, and very easy to add to most code (add a new log sink, use existing log file, etc.)
If you are concerned with cost of the log processor, Loki again has your back by being very easy and lightweight to deploy. Giving you same tools in dev as in prod.
Step one would bee sum types, so only valid value space can be represented (return value or error, but not both or neither).
Just because Go made opinionated design decisions around their error handling a decade ago when developing the language doesn't mean that there's not practical room for improvement as the language is widely in production and shortcomings in its error handling have been found.
The number of hacks I've seen over the years to try and solve the "wait, where did this error originate" problem in Go are legion, with little standardization.
And no, using Errorf with '%w' to wrap error messages along the stack isn't exactly an elegant solution.
Even if they want to keep the core error behavior as it is for compatibility, providing a core library way of wrapping with stacktraces would be a very useful next step, particularly given the most popular package doing that previously is now unmaintained.
I don’t really see the enum thing happened. Is lack of enums a real problem? Theoretically, it would be convenient, but I can’t say that I see bugs caused by its lack.
The addition of enums would move all these runtime checks to compile time.
Let's be clear: this is just a peeve of mine.
my %hash = (
Foo => “bar”,
Baz => “qux”,
);
the behaviour is no different than my %hash = (
Foo, “bar”,
Baz, “qux”,
);
it’s just that a literal plist in a hash context is interpreted as a hash.Hence “=>” being called “fat comma” in perl.
Which means you can writer
my %h = (a => b => c => d);
or my %h = (a, b => c, d);
they all mean the same thing.The library ecosystem is already excellent, and the tooling is good, lack of sum types is the single wart that makes me regret it every time I pick Go up for a project.
As far as I can tell, the lnav feature you describing allows you to define a JSON log format with predefined fields. But if some log message uses an attribute you haven't anticipated in the format definition, there's no way to see it in the pretty-printed output or filter on it in the UI and no ability to see it in the SQLite virtual table. That's why I say lnav doesn't appear to support structured logs.
edit: oh, I missed `"hide-extra": false`! That significantly improves things! Still, I don't see a way to access it from the SQLite virtual table. I also don't see something else I might want: a way to have a different "view" for certain log messages; maybe to switch between filtering/viewing particular ones, maybe to just have line-format be conditional based on the detected format. (I guess I can sort of do this based on `module-field`? but I might want it lighter-weight/finer-grained than that.)
I'm iterating on a log pretty printer that accepts structured logs in a pipe and does things like color coding, adding terminal-recognized vscode:// hyperlinks for call stacks, smart wrapping based on the terminal width, and special formatting for panics and stuff.
NCurses is probably coming in a couple months.
Does anything like this already exist?
If "thing" were a variable with a small cardinality (like a class name or an enumeration), that might change matters. But I'd still be reluctant to do that; having the two values available in separate fields, rather than as a single key/value pair, is a lot more flexible.
Most log metadata will be attached by libraries and middleware, so service/application devs won't even see most of it.
What eventually became the standard library error wrapping proposal evolved from the work done on the Upspin project. It did include stacktraces, and believed like you that it would be useful to have them. But analysis of the data showed that nobody ever really used them in practice and, for that reason, was removed from the final proposal.
> particularly given the most popular package doing that previously is now unmaintained.
Lacking wide appeal doesn't mean there isn't a niche need, of course. However, with the standard library accepting a standard for error wrapping, which this package you speak of has been updated to be compatible with, what further maintenance would be needed, exactly? It would be more concerning if it wasn't considered finished by now. It seems the solution for niche needs is right there.
I see this all the time:
main.go:141 error: could not transmogrify the thing: a144cd21c48
And then I literally grep the code base to find the error message. That works ~50% of the time, but the other 50%, I see this: main.go:141 error: not found
And then I have to spend 5-10 minutes spelunking to try to find where that error might have originated from.But this would be amazing:
main.go:141 error: not found callstack=...Have you seen https://github.com/tomarrell/wrapcheck? It's a linter than does a fairly good job of warning when an error originates from an external package but hasn't been wrapped in your codebase to make it unique or stacktraced. It comes with https://github.com/golangci/golangci-lint and can even be made part of your in-editor LSP diagnostics.
But still, it's not perfect. And so I remain convinced that I'm misunderstanding something fundamental about the language because not being able to consistently find the source of an error is such an egregious failing for a programming language.
func bar() error {
err := baz.Transmogrify()
return fmt.Errorf("transmogrify: %w", err)
}
func foo() error {
err := bar()
return fmt.Errorf("bar: %w", err)
}
func main() {
err := foo()
fmt.Printf("foo: %v", err)
// foo: bar: transmogrify: not found
}
There's your callstack, without the cost of carrying around the actual callstack.I don't think anyone has ever claimed otherwise. But I do think its a pretty good solution. Whats elegance worth, anyways?
Ignore the word error for a moment. Think about how you program in the general case, for a hypothetical type T. What is it that you do to to your T values to ensure that you don't have the same problem?
Now do that same thing when T is of the type error. There is nothing special about errors.
Properties in the log message that are not in the "line-format" are displayed below the message as key-value pairs. As an example, the bunyan[1] log format[2] has a few standard properties that are used in the "line-format" to form the main message. Then, as shown in this test[3] for this log[4], the remaining properties (even ones not mentioned in the format file) are shown underneath.
> or filter on it in the UI
Since they are part of the message as mentioned above, they can be filtered on.
> and no ability to see it in the SQLite virtual table.
The "log_raw_text" column in the table can be used to access the original log message from the file. So, you can use the JSON functions in SQLite to retrieve the value:
;SELECT log_raw_text ->> '$.repository' from bunyan
[1] - https://github.com/trentm/node-bunyan[2] - https://github.com/tstack/lnav/blob/master/src/formats/bunya...
[3] - https://github.com/tstack/lnav/blob/master/test/expected/tes...
[4] - https://github.com/tstack/lnav/blob/master/test/logfile_buny...
Have a look at the following comment on an issue that might be similar to what you're thinking of:
https://github.com/tstack/lnav/issues/1065#issuecomment-1602...
> I guess I can sort of do this based on `module-field`? but I might want it lighter-weight/finer-grained than that.
Unfortunately, the "module-field" does not work for JSON logs at the moment. It's something I should really fix.
Ultimately, lnav has existed for almost two decades now and I use it every day. So, it's always seeing improvements. If you're having a problem with it, file an issue on github. I don't always get around quickly to fixing other folks feature requests / issues, but it tends to happen eventually.
Thanks.
Just my opinions here. I don’t question the value of slog as it sits, just could have been better for the community at large is all.
func bar() error {
err := baz.Transmogrify()
return fmt.Errorf("bar: %w", err)
}
func foo() error {
err := bar()
return fmt.Errorf("foo: %w", err)
}
func main() {
err := foo()
fmt.Printf(err)
// foo: bar: transmogrify: not found
}
Also, I tend to skip quite a lot of layers. The (only?) advantage of manual wrapping over stack traces is that a human can leave just 3 wrappings which are deemed sufficient for another human, while stack trace would contain 100 lines of crap. func bar() error {
err := baz.Transmogrify()
return fmt.Errorf("bar: %w", err)
}
This is broken. If baz.Transmogrify() returns a nil error, bar will return a non-nil error.Also, annotations like this, which repeat the name of the function, are backwards. The caller knows the function they called, they can include that information if they choose. Annotations should only include information which callers don't have access to, in this case that would be "transmogrify".
The correct version of this code would be something like the following.
func main() {
fmt.Printf("err=%v\n", foo())
}
func foo() error {
if err := bar(); err != nil {
return fmt.Errorf("bar: %w", err)
}
return nil
}
func bar() error {
if err := baz.Transmogrify(); err != nil {
return fmt.Errorf("transmogrify: %w", err)
}
return nil
}And I disagree that the cost of carrying around the callstack is something to worry about. Errors are akin to exceptions in C++/Java: no happy path should rely on errors for control flow (except io.EOF, but that won't generate a call stack). They should be rare enough that any cost below about 1ms and 10k is negligible.
> Errors are akin to exceptions in C++/Java: no happy path should rely on errors for control flow (except io.EOF, but that won't generate a call stack). They should be rare enough that any cost below about 1ms and 10k is negligible.
This may be true in C++ or Java, but in Go, it is absolutely not the case.
Errors are essential to, and actually the primary driver of, control flow!
Any method or function which is not guaranteed to succeed by the language specification should, generally, return an error. Code which calls such a method or function must always receive and evaluate the returned error.
Happy paths always involve the evaluation and processing of errors received from called methods/functions! Errors are normal, not exceptional.
(Understanding errors as normal rather than exceptional is one of the major things that distinguish junior vs. senior engineers.)
> Errors are normal, not exceptional.
The _handling_ of errors is normal. Code that doesn't consider errors is not production code.
And granted, in Go, control flow is driven by errors more often than in C++ or Java. Sentinel error values are common. See for example all usage of error.Is, checking for io.EOF, packages that define ErrSituationA and ErrSituationB, etc.
But my argument was about errors that can't be dealt with locally, where the origination and ultimate handling are very far apart. A given flow will encounter these errors relatively rarely compared to the happy path (and if it's not rare, you probably need to fix or change something). Having an intuition about this is important for predicting your code's performance. For example:
- The SQL call failed because the network connection dropped; client gets 500 or 502, or retry.
- A call to an external service failed because the network was bad; it gets retried.
- The SQL call succeeded, but the record the client asked for wasn't found, so the client gets a 404.
- Writing to a temporary file failed because the disk is full, so some batch job fails with an error.
Apart from potential concerns about DoS, worrying too much about the performance of error handling in these relatively rare cases is absolutely premature optimization.
DoS isn't even a concern. I just benchmarked capturing a call stack in Go, and it's on the order of a few microseconds. Unless you're in performance critical code (and you're benchmarking, right?), it's fine.
Most Go programmers are too scared to panic and abort when invariants are violated. I think most codebases contain at least 2x as much error handling as is really necessary.
It would be a bit odd to not add context, wouldn't it? Same goes for any value. This is not exclusive to errors. If you consider a function which returns T, the T value could equally be hard to trace back if you find you need to determine its call site and someone blindly returned it up the stack. There is nothing special about errors.
While ideally you are returning more context than Errorf allows, indeed, it is a good last resort. If your codebase is littered with blind returns, the good news is that it shouldn't be too hard to create a static analyzer which finds blind returns of the error type and injects the Errorf pattern.
Do you think most errors look more like ParseInt, or more like sql.Open where 1ms might be acceptable? (Do you think a call stack from the insides of sql.Open would be useful? My experience, mostly not...)
So the stacks should probably only be for "complex errors", and only for frames that happen in code you (hand waving) "care about". Maybe your programs just have far too complex internal error handling?
1. errors that can be handled locally (such as parsing; in other languages, these situations are often signaled with return values instead of exceptions)
2. errors that can't be handled locally (such as network errors; other languages use exceptions for these)
My argument was that worrying too much about error handling performance in #2 is premature optimization. 1ms is extreme, but the actual figure of capturing a call stack in Go -- several microseconds, by my benchmark -- puts it squarely in the "don't worry about it unless your code is performance-critical" category.
See my sibling comment: https://news.ycombinator.com/item?id=37234455
When code encounters an error, it can either deal with that error programmatically, or return that error to its caller. I don't think you can make any generalized assertions about whether one or the other of these cases is more common, and I'm confident that you can't assert that one or the other of these cases is better or worse than the other, or that one of them represents a problem worth fixing.
Errors potentially occur at every fallible expression. Where an error is handled is orthogonal, and generally unknowable, to the given bit of code that receives that error.
I agree with you that "the performance of error handling" should never be a first-order concern when writing code.
I don't agree with you that capturing a call stack is fast enough to ignore. Calling runtime.Callers (https://pkg.go.dev/runtime#Callers) takes time proportional to the size of the pc []uintptr slice, and can easily get to O(ms) or beyond. It's fine if a given bit of code opts in to this cost, but it's not something that you should do by default; the threshold for performance critical code is O(ns), not O(us).
It's not something to worry about, but it's also a premature optimization to include when there is no need. The Go team considered adding stack traces as described before 1.13, postulating that it would be useful, but measurement determined that they were rarely used in the real world.
If your measurements (you are measuring, right?) that pertain to your specific situation tells a different story, they aren't something to be afraid of, but would be silly to make the default for everyone. The standard tools don't need to serve every single use case ever imagined.
The reality is, unless you forget how to program every time you see the word error (which seems to be a thing), in the real world you are never going to just `return err` up, up, up the stack anyway. Even ignoring traceability concerns, that is going to introduce horrible coupling. You wouldn't do that for any arbitrary type T, so why would you for type error? There is nothing special about errors.
Panic isn't an ersatz error reporting mechanism, it's a tool of absolute last resort. Any function or method that can fail should return an error, and should signal failure via that error. Callers that invoke any fallible function or method should always receive, inspect, and respond to the returned error.
And, in any case, arbitrary code doesn't have the right to abort the program in the first place! Only func main is allowed to terminate the process. Errors in any other context should always be reported to the caller via normal control flow, i.e. return.
A service should never crash in response to a runtime error, or any kind of business logic conditional! Crashes are not normal, they signal critical problems and/or programmer errors. A service that crashes in production is a priority-0 bug that needs to be addressed and fixed immediately.
More specifically, if you can't trust that calling a function will always return execution to the call site, then there is no way to build a practical and deterministic model of control flow, and it becomes impossible to manage resource lifecycles.
Consider a network protocol that expects an explicit disconnect/teardown procedure. If your process creates a connection with this sort of protocol, you need to at least try to tear the connection down before terminating the process outright. You can't just yolo kill the client and expect the server to deal with it. And this is just one example among infinitely many possible examples: basically every API that understands a concept of a resource has similar expectations: filesystems, consensus protocols, etc. etc.
Crashing is an error handling strategy only if you know a priori that your code will be running in an execution environment that treats crashes as normal. Such environments are exceptions, not the rule.