Zq: An easier and faster alternative to jq (brimdata.io)
- https://github.com/thisredone/rb is a widely used Ruby version of this idea
- https://github.com/KelWill/nq#readme is something similar that I wrote for my own use
If so, I find it fascinating that jq is so powerful it's useful even when handling JavaScript Object Notation in JavaScript.
[1] https://github.com/antonmedv/fx
[2] https://twitter.com/antonmedv/status/1515429017582809090
echo '1 2 3' | jq ....
is treated as three separate JSON documents, each with a single number as its top-level "document", body, or content.
So of course you can't sum them. They are fed to the jq pipeline as separate documents, as if you had run three separate jq commands.
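A minimal Python sketch of the behaviour described above, assuming only that jq reads its input as a stream of whitespace-separated JSON values. `iter_json_values` is a hypothetical helper, not part of jq: handling each value on its own never gives you a running total, while collecting all values first (which is what `jq -s add` does with its slurp flag) lets you sum them.

```python
import json
from typing import Any, Iterator


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each whitespace-separated JSON value in text, one at a time."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # Skip the whitespace that separates one document from the next.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Per-document processing: each value is handled in isolation.
for doc in iter_json_values("1 2 3"):
    print(doc)

# Slurp-style processing: gather every document first, then aggregate.
print(sum(iter_json_values("1 2 3")))  # 6
```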
Perhaps by stateless you mean no mutable global state? But it certainly maintains state from the location in the input document to the output of each selector/functor.
IMO it helps if you have a background in some of the concepts of functional programming.
good_data = fetch("openlibrary.json").docs.filter({"author_name": Arr, "publish_year":Arr})
good_data.map({{"title": A.title, "author_name": A.author_name[0], "publish_year": A.publish_year[0]}}).group("author_name").mapv(len).sortv((>=)).limit(3)
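For readers unfamiliar with the language used in the snippet above, here is a rough Python paraphrase of the same pipeline. It is a sketch under assumptions: the input shape (a top-level `docs` list whose entries carry `title`, `author_name` and `publish_year`) is inferred from the snippet, and empty lists are skipped so that taking the first element cannot fail.

```python
from collections import Counter
from typing import Any


def top_authors(payload: dict[str, Any], limit: int = 3) -> list[tuple[str, int]]:
    # Keep docs whose author_name and publish_year are non-empty lists.
    good = [
        d for d in payload["docs"]
        if isinstance(d.get("author_name"), list) and d["author_name"]
        and isinstance(d.get("publish_year"), list) and d["publish_year"]
    ]
    # Project each doc down to its title plus the first author and first year.
    rows = [
        {"title": d.get("title"), "author_name": d["author_name"][0],
         "publish_year": d["publish_year"][0]}
        for d in good
    ]
    # Group by author, count each group, and keep the largest groups.
    return Counter(r["author_name"] for r in rows).most_common(limit)


sample = {"docs": [
    {"title": "A", "author_name": ["Ann"], "publish_year": [1990]},
    {"title": "B", "author_name": ["Ann", "Cy"], "publish_year": [1991]},
    {"title": "C", "author_name": ["Bob"], "publish_year": [2000]},
    {"title": "D", "author_name": "not a list", "publish_year": [2001]},
]}
print(top_authors(sample))  # [('Ann', 2), ('Bob', 1)]
```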
[1] https://github.com/dflemstr/rq [2] https://news.ycombinator.com/item?id=13090604
Also prob not the first to create a project for personal use that just wraps evals in another language haha: https://www.npmjs.com/package/jsling
That plus good old fashioned sed/grep/awk give me everything I need to do on the cli.
If I want more, it's python or node.
> for d in $(echo $PATH | tr ":" "\n") ; do ls $d | grep "^..$"; done | sort -u | wc -l
> 52
I can fit a few more in
(edit: I can't work out how to put code in a comment)
Zq looks cool, but the fact that this piece doesn't contain a single instance of the word "map" tells me the authors still haven't gotten jq. Especially with the running strawman example of adding numbers.
I feel the author makes his case clearly, then presents an alternative. Underneath all this is a ton of work, for which I applaud OP.
It may not scratch your particular itch, but come on!
Being an ass on HN is a choice. It happens far too often, and I wish everyone would just dial it back.
This place has a high standard for new tools and libraries, particularly ones that claim to be better in any respect ("faster" and "easier"). If this were, say, a college student learning programming and presenting it as "hey I made a jq alternative and I believe it's easier and faster", I imagine it would elicit softer feedback.
Come prepared, and ready to defend your stance. If you can't take the heat, don't come in the kitchen.
I see criticism for the way they're trying to position it as easier than jq when it's just different than jq.
It looks like a cool project on its own and doesn't need to describe jq as confusing to make that point.
Easier, as a universal claim, is hard to establish - you'd need to do user studies. Easier in the author's opinion is normal usage, and their opinion is as good as anyone else's. They gave a reasonable justification.
I kind of think you'd need to use both tools to have an informed opinion about which you think is easier. But most of us aren't going to do that, which is fine.
I think having strong opinions about which is easier without trying them both is weird, though.
The question for me is this: can I do with JSON files, using Zq, what I can do with Python?
Seems to me that if you're in a shell, then you should be "shell-like." There should not be much of a learning curve at all, and when in doubt, try to behave like other shell tools, in a Unix way. Make pipe behavior generally predictable, especially for those who aren't deep into JSON et al.
And if you're not going to do that, say so "on the box."
(Disclaimer, it could be that I'm an idiot when it comes to all of this and I'm missing something big. Kind of feels that way, and I welcome correction)
I almost gave up before I got to the first mention of zq, and then wished I had.
https://docs.microsoft.com/en-us/archive/msdn-magazine/2003/...
Anyway, I've installed ZQ and will look to use it, even my simple usage of JQ had already led to thoughts of writing my own, better version :)
Quick bug report: On the Aggregate Functions page the link to _countdistinct_ goes to the page for _count_, and there actually isn't a page at https://zed.brimdata.io/docs/language/aggregates/countdistin....
I think jq has a pretty elegant data model, but the syntax is often very clunky to work with.
So here is a half thought-out idea how you might improve the syntax for the "stateful operations" usecase the OP outlined:
I think it's not quite true that different elements of a sequence can never interact. The OP mentioned reduce/foreach, but it's also what any function that takes argument does:
If you have an expression 'foo | bar', then bar is called once for every element foo emits. However, foo could also be a function that takes arguments. Then you can specify bar as an argument of foo like this: 'foo(bar)'. In this situation, execution of bar is completely controlled by foo. In particular, foo gets to see all elements that bar emits, not just one at a time. I believe this is how e.g. [x] can collect all elements of x into an array.
In the same way, you could write a function 'add_all(x)' which calls x and adds up all emitted elements to a sum.
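The distinction drawn above can be illustrated with Python generators, which behave much like jq's streams. This is an analogy, not jq's implementation: `pipe` runs its second stage once per value, whereas `add_all` receives the whole generator and decides how to consume it. In jq itself, such a function could be written as `def add_all(f): reduce f as $x (0; . + $x);`.

```python
from typing import Callable, Iterable, Iterator


def pipe(foo: Iterable[int], bar: Callable[[int], Iterable[int]]) -> Iterator[int]:
    # 'foo | bar': bar is invoked once per value foo emits and sees only that value.
    for value in foo:
        yield from bar(value)


def add_all(x: Iterable[int]) -> int:
    # 'add_all(x)': the function drives x itself, so it sees every emitted value.
    total = 0
    for value in x:
        total += value
    return total


def foo() -> Iterator[int]:
    # Stands in for a jq expression that emits three values.
    yield from (1, 2, 3)


print(list(pipe(foo(), lambda v: [v * 10])))  # [10, 20, 30]
print(add_all(foo()))  # 6
```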
However, this wouldn't help you with collecting all input lines, as there is nothing for your function to "wrap around". Or at least, there used to be nothing, but I think in one of the recent builds, an "inputs" function was added, which emits all remaining inputs. So now, you can write e.g. '[., inputs]' to reimplement slurp. In the same way, you could sum up all input lines by writing 'add_all(., inputs)'.
However, this is still ugly and unintuitive to write, so I think introducing some syntactic sugar for this would be useful. E.g., you could imagine a "collect operator", e.g. '>>', which treats everything left of it as the first argument to the function to the right of it.
e.g., writing 'a >> b' would desugar to 'b(a)'.
Writing 'a | b >> c' would desugar to 'c(a | b)'.
Any steps further to the right are not affected:
'a | b >> c | d' would desugar to 'c(a | b) | d'.
Scope to the left could be controlled with parentheses:
'a | (b >> c)' would desugar to 'a | c(b)'.
To make this more useful for aggregating on input lines, you could add a special rule that, if the operator is used with no parentheses, it will implicitly prepend '(., inputs)' as the first step.
So if the entire top-level expression is 'a | b >> c', it would desugar to 'c((., inputs) | a | b)'.
This would make many usecases that require keeping state much more straight-forward. E.g. collecting all the "baz" fields into an array could be written as '.baz >> []' which would desugar to '[(., inputs) | .baz]'
Summing up all the bazzes could be written as '.baz >> add_all' which would desugar to 'add_all((., inputs) | .baz)'
...and so on.
On the other hand, this could also lead to new confusion, as you could also write stuff like '... | (.baz >> map) | ...' which would really mean 'map(.baz)' or 'foo >> bar >> baz' which would desugar to the extremely cryptic expression 'baz((., inputs) | bar((., inputs) | foo))'. So I'm not quite sure.
Any thoughts about the idea?
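To make the proposed rules concrete, here is a toy Python rewriter for the hypothetical `>>` operator. It is only a sketch: it handles `>>` at the top level (occurrences inside parentheses, brackets or braces are left alone, so the 'a | (b >> c)' case is not rewritten), it ignores string literals and operators such as `|=`, and '[]' is special-cased so that '.baz >> []' becomes '[...]'.

```python
def _split_top(expr: str, sep: str) -> list[str]:
    """Split expr on sep, ignoring occurrences nested in (), [] or {}."""
    parts: list[str] = []
    depth, start, i = 0, 0, 0
    while i < len(expr):
        ch = expr[i]
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0 and expr.startswith(sep, i):
            parts.append(expr[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(expr[start:])
    return [p.strip() for p in parts]


def desugar(expr: str, implicit_inputs: bool = False) -> str:
    """Rewrite the hypothetical 'a | b >> c | d' into 'c(a | b) | d'."""
    parts = _split_top(expr, ">>")
    if len(parts) == 1:
        return expr.strip()
    # The last top-level '>>' takes everything to its left as its argument;
    # that left side may itself contain '>>' and is desugared recursively.
    left = desugar(" >> ".join(parts[:-1]), implicit_inputs)
    func, *rest = _split_top(parts[-1], "|")
    arg = f"(., inputs) | {left}" if implicit_inputs else left
    # '[]' is special-cased so that '.baz >> []' becomes '[.baz]'.
    call = f"[{arg}]" if func == "[]" else f"{func}({arg})"
    # Steps further to the right of the function are left untouched.
    return " | ".join([call, *rest])


print(desugar("a | b >> c | d"))  # c(a | b) | d
print(desugar(".baz >> []", implicit_inputs=True))  # [(., inputs) | .baz]
print(desugar("foo >> bar >> baz", implicit_inputs=True))
# baz((., inputs) | bar((., inputs) | foo))
```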
The purpose of life is not to know JQ. I just want to process the JSON so I can move on and do whatever is actually important. Ideally, I'd just be able to tell GPT-codex to do what I want to do to the JSON in English.
We're not there yet, but in the meantime if there's another tool that allows me to know less in exchange for doing more, I'll gladly use it.
When you have time to sharpen the saw, come back and dig into the details of how jq and tools like it work and where their limits are. Looking at the jq builtins[1] can be very enlightening.
If you get to the point where your goal is to increase your jq skills I'd recommend looking at the jq questions on Stack Overflow and posting your own solution. Contributing a solution to https://rosettacode.org/wiki/Category:Jq is also good.
[1] https://github.com/stedolan/jq/blob/master/src/builtin.jq
... | deno xeval '...stdin processing code using special var $'
which was close to xargs in terms of conciseness. Unfortunately, it was removed as being considered "too niche" [1]. That is a very rare event with established tooling.
Most of the time complexity is just shifted around.
English descriptions will never be completely unambiguous and unique keys into a JSON data structure. There is a very good reason programming languages (and other forms of languages) exist.
jq had a tough learning curve so you should switch to zq which is a (closed source?) wrapper around an obscure language you’ve never heard of that we promise is easier because reasons. Also coincidentally it’s the language of an ecosystem we were funded to build.
Edit: mea culpa, turns out you can download the source (revealed half way through the article).
Yes, it’s an obscure query language. But if you were interested in jq, that clearly wasn’t a barrier to entry.
I agree the author is happy to show off their tool, but disagree that that is somehow disqualifying. They made a cool thing, they’re allowed to be proud about it.
The pipe operator proposed for JavaScript uses '|>' as its sigil, which is a decent compromise between not conflicting with existing operators, being compatible with developers' existing pattern matching, and somewhat representing what it does. And 'a | b |> c' is ok.
You could just have '@c a | b' mean "everything from @c onwards to the end of the string is the argument to c" i.e. 'c(a | b)' and have 'c a | b' be 'c(a) | b', then anything more complicated just requires using the parentheses operator to enclose an expression i.e. 'c (a | b)' or just 'c(a | b)' if your tokenizer is a bit cleverer :) Actually I like this idea, because '@' is syntactic sugar for () around the rest of the query, and a function then operates on the value of the expression following it.
As they wrote it, it would be surprising if it weren't.
I've learned a lot from your comments and pointers.
The Zed project is broader than "a jq alternative" and my bad for trying out this initial positioning. I do know there are a lot of people out there who find jq really confusing, but it's clear if you become an expert, my arguments don't hold water.
We've had great feedback from many of our users who are really productive with the blend of search, analytics, and data discovery in the Zed language, and who find manipulating eclectic data in the ZNG format to be really easy.
Anyway, we'll write more about these other aspects of the Zed project in the coming weeks and months, and in the meantime, if you find any of this intriguing and want to kick the tires, feel free to hop on our slack with questions/feedback or file GitHub issues if you have ideas for improvements or find bugs.
Thanks a million!
https://github.com/brimdata/zed https://www.brimdata.io/join-slack/
People with the time and inclination to slow down and think a little more about how the tools work will produce cleaner solutions.
In your example, to convert
{"name":"foo","vals":[1,2,3]}
to {"name":"foo","val":1}
{"name":"foo","val":2}
{"name":"foo","val":3}
All you need is this jq filter {name:.name, val:.vals[]}
To me this is much better than the proposed zq or jq solution you're using as a basis for comparison. You could almost use the shorter .vals = .vals[] if the field name in the output didn't change.
These filters take advantage of how jq's [] operator converts a single result into separate results. For people new to jq this behavior is often confusing unless they've seen things like Cartesian products.
.[] - https://stedolan.github.io/jq/manual/#Array/ObjectValueItera...
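A minimal Python sketch of why {name: .name, val: .vals[]} fans out into several objects, assuming the usual reading of jq's object construction: each value expression may emit several results, and the constructor emits one object per combination (a Cartesian product). `expand` is an illustrative name, not a jq function.

```python
from itertools import product
from typing import Any


def expand(record: dict[str, Any]) -> list[dict[str, Any]]:
    # .name emits one value; .vals[] emits one value per array element.
    names = [record["name"]]
    vals = record["vals"]
    # The object constructor emits one object per combination of those values.
    return [{"name": n, "val": v} for n, v in product(names, vals)]


print(expand({"name": "foo", "vals": [1, 2, 3]}))
# [{'name': 'foo', 'val': 1}, {'name': 'foo', 'val': 2}, {'name': 'foo', 'val': 3}]
```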
I think jq is very elegant - genius even - but whenever I use it, I have to look up the docs for syntax. But I guess that's true for any infrequently used tool.
ZSON looks way better than JSON. I pray that the Zed project becomes more popular.
Coincidentally, after hearing of a friend's woes dealing with massive amounts of CSV coming from a BPF-instrumented kernel, I played around a bit with integrating Zed and BPF. Just an experimental toy (and the repo is already out of date)...
https://github.com/brimdata/zbpf
The nice thing about Zed here is any value can be a group-by key so it's easy, for example, to use kernel stacks (an array of strings) in a grouping aggregate.
(p.s. for the record, the only thing I have to do with the modern linux BPF system is the tiny vestige of origin story it shares with the original work I did in the BSD kernel around 1990)
Positioning it as an open-source Splunk would be an interesting play. Going through your docs, the union() function looks like it returns a set, akin to Splunk's values(). Is there an equivalent to list()?
Elastic is great in its lane, but it requires more resources, and its monolithic weight has left a sour taste in our internal testing. A minimal Elasticsearch-compatible API would open up your target audience; are there any plans to do it in the short term (< 1 year)?
As for list() and values() functions, Zed has native arrays and sets so there's no need for a "multi-value" concept as in splunk. If you want to turn a set into an array, a cast will do the trick, e.g.,
echo '1 2 2 3 3' | zq 'u:=union(this) | cast(u,<[int64]>) ' -
[1,2,3]
(Note that <[int64]> is a type value that represents array of int64.)
Show HN post (FAQ) [2]
Disclaimer: I'm the founder/CEO of Dassana.
* jq (a great JSON-wrangling tool)
* jc (convert various tools’ output into JSON)
* jo (create JSON objects)
* yq (like jq, but for YAML)
* fq (like jq, but for binary)
* htmlq (like jq, but for HTML)
List shamelessly stolen from Julia Evans[1]. For live links see her page.
Just a few days ago I needed to quickly extract all JWT token expiration dates from a network capture. This is what I came up with:
fq 'grep("Authorization: Bearer.*" ) | print' server.pcap | grep -o 'ey.*$' | sort | uniq | \
jq -R '[split(".") | select(length > 0) | .[0],.[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson]' | \
jq '.[1]' | jq '.exp' | xargs -n1 -I! date '+%Y-%m-%d %H:%M:%S' -d @!
It's not a beauty but I find the fact that you can do it in one line, with proper parsing and no regex trickery, remarkable.
[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...
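The gsub("-";"+") | gsub("_";"/") step in that pipeline converts JWT's base64url alphabet back to standard base64 before @base64d. A minimal Python sketch of the same decoding, assuming an unpadded base64url payload segment that contains an `exp` claim. It does not verify the signature, and it prints UTC, whereas `date` in the original prints local time. `jwt_expiry` and the helpers are illustrative names.

```python
import base64
import json
from datetime import datetime, timezone


def b64url_decode(segment: str) -> bytes:
    # JWT segments use the base64url alphabet ('-' and '_') without '=' padding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def jwt_expiry(token: str) -> str:
    # A JWT is header.payload.signature; 'exp' (seconds since the epoch) is in the payload.
    _header, payload, *_rest = token.split(".")
    claims = json.loads(b64url_decode(payload))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def b64url_encode(obj: dict) -> str:
    # Helper for building a test token; strips padding like real JWTs do.
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


token = b64url_encode({"alg": "none"}) + "." + b64url_encode({"exp": 86400}) + "."
print(jwt_expiry(token))  # 1970-01-02 00:00:00
```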
[1] "the person who was retweeted" in lieu of a better word.
The touted claim that jq is fundamentally stateless is not true. jq is also stateful in the sense that it has variables. If you want, you can write regular procedural code this way. Some examples [1]
The real problem of jq is that it is currently lacking a maintainer to assess a number of PRs that have accumulated since 2018.
[0] https://github.com/stedolan/jq/wiki/jq-Language-Description
[1] https://github.com/fadado/JBOL/blob/master/fadado.github.io/...
Incidentally there are many tools that help you do this like dsq [0] (which I develop), q [1], textql [2], etc.
[0] https://github.com/multiprocessio/dsq
Rewriting all or parts of it in C++ would make it dramatically faster. I would start by ripping out the asserts and switching to a different strtod, since they spend an awful lot of time in it.
Just that jq does two things: 1) ingest and 2) query.
If you're doing a bunch of exploration on a single dataset in one period of time or if the dataset is large enough and you're selecting subsets of it, you can ingest the data into a database (and optionally toggle indexes).
Then you can query as many times as you want and not worry about ingest again until your data changes.
All three of the tools I listed have variations of this sort of caching of data built in. For dsq and q with caching turned on, repeat queries against files with the same hashsum only do queries against data already in SQLite, no ingestion.
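The ingest-once, query-many idea can be sketched with Python's built-in `sqlite3` module. This is not how dsq or q are implemented; the fixed two-column schema (`method`, `status`), the in-memory database and the `_cache` dictionary keyed by a SHA-256 of the input are all assumptions made to keep the example short.

```python
import hashlib
import json
import sqlite3

# Hypothetical cache: content hash -> already-populated database connection.
_cache: dict[str, sqlite3.Connection] = {}


def load(ndjson_text: str) -> sqlite3.Connection:
    """Ingest newline-delimited JSON into SQLite, once per distinct input."""
    key = hashlib.sha256(ndjson_text.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]  # Same hash: skip ingestion and reuse the data.
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (method TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(r.get("method"), r.get("status")) for r in rows],
    )
    _cache[key] = conn
    return conn


data = (
    '{"method": "GET", "status": 200}\n'
    '{"method": "POST", "status": 201}\n'
    '{"method": "GET", "status": 404}\n'
)
conn = load(data)
# Repeat queries hit the already-ingested table.
print(conn.execute("SELECT method, COUNT(*) FROM t GROUP BY method ORDER BY 2 DESC").fetchall())
# [('GET', 2), ('POST', 1)]
```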
To me they look similarly complicated, and the examples stress certain aggregation operations that are harder to do in jq (due to it being stateless).
I think you got it — that’s exactly the idea. They claim (reasonably?) that it’s a more intuitive DSL; and it supports state. They also make some performance claims towards the end of the article.
Essentially, they think there's a marginal speed increase on JSON, but a much bigger one (5x-100x, they claim) if you switch to their native format, ZNG.
if I'm switching formats completely, I'm not sure why I care about jq vs zq in json performance ...
A saner approach is to gron the damn json and just use regular unix tools on the data.
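A minimal Python sketch of the gron idea: flatten a JSON document into one assignment per line so that grep, sed and friends can work on it. It imitates gron's general `json.path = value;` style but is not guaranteed to match gron's exact output; `gron_lines` is an illustrative name.

```python
import json
import re
from typing import Any, Iterator

# Keys that are valid identifiers use dot syntax; anything else is quoted.
_IDENT = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def gron_lines(value: Any, path: str = "json") -> Iterator[str]:
    """Yield one 'path = value;' assignment per node of a parsed JSON document."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            sub = f"{path}.{key}" if _IDENT.match(key) else f"{path}[{json.dumps(key)}]"
            yield from gron_lines(child, sub)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, child in enumerate(value):
            yield from gron_lines(child, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"


for line in gron_lines({"name": "foo", "vals": [1, 2]}):
    print(line)
# json = {};
# json.name = "foo";
# json.vals = [];
# json.vals[0] = 1;
# json.vals[1] = 2;
```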
I can see where jq might confuse someone new to it, but their replacement is irregular, stateful, still difficult, and I don't even see variable binding or anything.
jq requires you to understand that `hello|world` will run world for each hello, passing world's output values on to the next piped expression, to the wrapping value-collecting list, or to stdout.
it's a bit unintuitive if you come in thinking of them as regular pipelines, but it's a constant in the language that once learned always applies.
this zed thing has what appears to be a series of workarounds for its own awkwardness, where they kept tacking on new forms to try to bandaid those that came before.
additionally, since they made attribute selectors barewords where jq would require a preceding reference to a variable or the current value (.), I'm not sure where they'll go for variables should they add them.
expand_vals_into_independent_records='
.name as $name | .vals[] | { name: $name, val: . }
'
echo '{"name":"foo","vals":[1,2,3]} {"name":"bar","vals":[4,5]}' |
jq "$expand_vals_into_independent_records"
Also, generally, not a fan of the tone of this article.
jq '{name, val: .vals[]}'
Get-Content cars.json | ConvertFrom-Json | ? { $_.color -eq 'red' }
The beauty of this is that the query syntax applies not just to JSON but to every type of collection, so you don't have to learn a specific syntax for JSON and another for another data type. You can use Get-Process on Linux to get running processes and filter them in the same way. The same for files, HTML tags, etc. I think nushell is doing something similar, though I haven't tried it yet.
I prefer this approach to another domain-specific language, as interesting as jq's and zq's are.
Sometimes jq -r '.[]' works, but it's all just trial and error. I use plenty of jq in my scripts, but I can never seem to visualize how jq looks at the data. I just have to toss variations of '.[whateveriwant].whatever[.want.]' until something works...
I suppose the root of my complaint is that jq does not do a good job of teaching you to use jq. It either works or gives you nothing, and while I've learned to work around that, I'll try anything that claims to be even 1% better than jq.
which follows the JMESPath standard
I'm personally trying to move away from jq and towards jp, because
- there's a standard defining it, not just an implementation, decreasing the odds of being stuck with an unmaintained tool
- there are libraries supporting the syntax for most of the major programming languages
- JMESPath's relative simplicity compared to jq is a good thing, IMO - Turing-completeness is a two-edged sword
- JMESPath is the AWS CLI query language, which is a convenient bonus
Or rather the pure Go rewrite https://github.com/itchyny/gojq which is a better faster implementation, with bugs fixed
Please do not recommend Homebrew for Linux. A binary download is safer than letting Homebrew clobber a Linux machine. If you do not wish to use a Linux package manager, simply point at the binary download. It is much safer and less intrusive.
I wrote a tool to do this -- https://github.com/hotsphink/sfink-tools/blob/master/bin/jso... -- but I do not recommend it to anyone other than as perhaps a source of inspiration. It's slow and buggy, the syntax is cryptic and just matches whatever I came up with when I had a new need, etc. It probably wouldn't exist if I had heard of jq sooner.
But for what it does, it's awesome. I can do things like:
% json somefile.json
> ls
0/
1/
2/
> cd 0
> ls
info/
files/
timings/
version
> cat version
1.2b
> cat timings/*/mean
timings/firstPaint/mean = 51
timings/loadEventEnd/mean = 103
timings/timeToContentfulPaint/mean = 68
timings/timeToDomContentFlushed/mean = 67
timings/timeToFirstInteractive/mean = 658
timings/ttfb/mean = 6
There are commands for searching, modifying data, aggregating, etc., but those would be better done in a more principled, full-featured syntax like jq's.
I see ijq, and it looks really nice. But it doesn't have the context and restriction of focus that I'm looking for.
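The `cat timings/*/mean` trick above amounts to treating object keys and array indices as path segments and globbing over them. A minimal Python sketch of that idea, not the tool's actual code, using `fnmatch.fnmatchcase` so that `*` matches one whole segment. Unlike the listing above, results come out in document order rather than sorted; `glob_json` is an illustrative name.

```python
from fnmatch import fnmatchcase
from typing import Any


def glob_json(doc: Any, pattern: str) -> list[tuple[str, Any]]:
    """Return (path, value) pairs whose slash-separated path matches pattern."""
    want = pattern.split("/")
    results: list[tuple[str, Any]] = []

    def walk(node: Any, parts: list[str]) -> None:
        if len(parts) == len(want):
            results.append(("/".join(parts), node))
            return
        # Object keys and array indices play the role of directory entries.
        if isinstance(node, dict):
            children = list(node.items())
        elif isinstance(node, list):
            children = [(str(i), v) for i, v in enumerate(node)]
        else:
            return  # A scalar is a "file": nothing below it.
        segment = want[len(parts)]
        for key, child in children:
            if fnmatchcase(key, segment):
                walk(child, parts + [key])

    walk(doc, [])
    return results


doc = {"version": "1.2b", "timings": {"ttfb": {"mean": 6}, "firstPaint": {"mean": 51}}}
for path, value in glob_json(doc, "timings/*/mean"):
    print(f"{path} = {value}")
# timings/ttfb/mean = 6
# timings/firstPaint/mean = 51
```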
The language itself borrows a lot of concepts from functional languages, such as higher-order functions and closures... you can also develop modules with functions for easy reuse...
A simple join for instance looks like this:
let $stores :=
[
{ "store number" : 1, "state" : "MA" },
{ "store number" : 2, "state" : "MA" },
{ "store number" : 3, "state" : "CA" },
{ "store number" : 4, "state" : "CA" }
]
let $sales := [
{ "product" : "broiler", "store number" : 1, "quantity" : 20 },
{ "product" : "toaster", "store number" : 2, "quantity" : 100 },
{ "product" : "toaster", "store number" : 2, "quantity" : 50 },
{ "product" : "toaster", "store number" : 3, "quantity" : 50 },
{ "product" : "blender", "store number" : 3, "quantity" : 100 },
{ "product" : "blender", "store number" : 3, "quantity" : 150 },
{ "product" : "socks", "store number" : 1, "quantity" : 500 },
{ "product" : "socks", "store number" : 2, "quantity" : 10 },
{ "product" : "shirt", "store number" : 3, "quantity" : 10 }
]
let $join :=
for $store in $stores, $sale in $sales
where $store=>"store number" = $sale=>"store number"
return {
"nb" : $store=>"store number",
"state" : $store=>state,
"sold" : $sale=>product
}
return [$join]
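For readers who don't use JSONiq, the join above is a nested-loop join on "store number". A rough Python equivalent, using a trimmed copy of the sample data; `join_sales` is an illustrative name, and the loop order mirrors the `for $store in $stores, $sale in $sales` clause.

```python
from typing import Any

Row = dict[str, Any]


def join_sales(stores: list[Row], sales: list[Row]) -> list[Row]:
    # for $store in $stores, $sale in $sales / where ... = ... / return {...}
    return [
        {"nb": store["store number"], "state": store["state"], "sold": sale["product"]}
        for store in stores
        for sale in sales
        if store["store number"] == sale["store number"]
    ]


stores = [{"store number": 1, "state": "MA"}, {"store number": 3, "state": "CA"}]
sales = [
    {"product": "broiler", "store number": 1, "quantity": 20},
    {"product": "toaster", "store number": 3, "quantity": 50},
    {"product": "socks", "store number": 1, "quantity": 500},
]
print(join_sales(stores, sales))
```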
Of course you can also group by, count, order by, nest FLWOR clauses...
$ wc -l big.log
979400 big.log
$ du -hs big.log
570M big.log
`count` is a small program that counts lines on stdin, like `sort|uniq -c |sort -n`.
jq takes 12 seconds:
$ time cat big.log |jq -cr .method |~/bin/count
848000 GET
94800 POST
34000 HEAD
2400 OPTIONS
200 null
real 0m12.381s
user 0m12.427s
sys 0m0.333s
my tool takes .5 seconds:
$ time cat big.log |~/bin/jj method |~/bin/count
848000 GET
94800 POST
34000 HEAD
2400 OPTIONS
200
real 0m0.466s
user 0m0.512s
sys 0m0.198s
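The `count` helper used in these timings isn't shown; a hypothetical Python stand-in that tallies identical lines and lists the most frequent first, as in the output above. Passing `sys.stdin` instead of a list would turn it into a filter.

```python
from collections import Counter
from typing import Iterable


def count_lines(lines: Iterable[str]) -> list[tuple[str, int]]:
    # Tally identical lines (ignoring the trailing newline), most frequent first.
    return Counter(line.rstrip("\n") for line in lines).most_common()


for value, n in count_lines(["GET\n", "POST\n", "GET\n"]):
    print(f"{n:>8} {value}")
```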
`jj` is a little tool I wrote that uses https://github.com/buger/jsonparser
The post links to the tutorial "An Introduction to JQ" at [1].
Somewhere inside the tutorial, array operators are introduced like this:
> jq lets you select the whole array [], a specific element [3], or ranges [2:5] and combine these with the object index if needed.
This is not supposed to be criticism on this particular tutorial (I've seen this kind of description quite often), but I could imagine this to be a typical "eyes glaze over" moment, where people subtly lose track of what is happening.
It appears to make sense on first glance, but leaves open the question what "selecting the whole array" actually means - especially, since you can write both ".myarray" and ".myarray[]" and both will select the whole array in a sense.
I think this is the point where one would really need to learn about sequences and about jq's processing model to not get frustrated later.
I don't know how jq works internally and in my mental model [] maps into the json array and also can wrap things back into an array. So that [.[]] unwraps and then rewraps a JSON array, sort of like how [.[].title] is the same as map(.title).
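A small Python model of the processing model discussed above, using generators to stand in for jq's streams of outputs. It is an illustration, not how jq is implemented; `field`, `iterate`, `pipe`, `collect` and `jq_map` are made-up names. It shows why `.myarray` yields one value while `.myarray[]` yields several, and why `[.[].title]` and `map(.title)` agree: jq's builtin.jq defines `map(f)` as `[.[] | f]`.

```python
from typing import Any, Callable, Iterator

# A jq-style filter: takes one input value and emits zero or more outputs.
Filter = Callable[[Any], Iterator[Any]]


def field(name: str) -> Filter:
    # '.name' emits exactly one value: the whole field, even when it is an array.
    def run(v: Any) -> Iterator[Any]:
        yield v.get(name)
    return run


def iterate(v: Any) -> Iterator[Any]:
    # '.[]' emits one value per array element (or per object value).
    yield from (v.values() if isinstance(v, dict) else v)


def pipe(f: Filter, g: Filter) -> Filter:
    # 'f | g' runs g once for every value f emits.
    def run(v: Any) -> Iterator[Any]:
        for x in f(v):
            yield from g(x)
    return run


def collect(f: Filter) -> Filter:
    # '[f]' gathers everything f emits into a single array value.
    def run(v: Any) -> Iterator[Any]:
        yield list(f(v))
    return run


def jq_map(f: Filter) -> Filter:
    # Mirrors jq's own definition: def map(f): [.[] | f];
    return collect(pipe(iterate, f))


doc = {"myarray": [1, 2, 3]}
print(list(field("myarray")(doc)))                 # [[1, 2, 3]]  (one output)
print(list(pipe(field("myarray"), iterate)(doc)))  # [1, 2, 3]    (three outputs)

books = [{"title": "a"}, {"title": "b"}]
print(list(collect(pipe(iterate, field("title")))(books)))  # [['a', 'b']]
print(list(jq_map(field("title"))(books)))                  # [['a', 'b']]
```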
One thing that seems to be perhaps a misconception amongst some is that jq invocations are short and only 'one-liners', and that a 'real script' (in a 'real language') would be better in many cases. I think this lack of larger program examples probably helps to perpetuate this misunderstanding too.
Anyway, I was inspired enough by the article in question to write up some of my own thoughts on jq and statelessness: https://qmacro.org/blog/posts/2022/05/02/some-thoughts-on-jq...
I also have never seen jq as a performance bottleneck.
jq is stable, I have never encountered a bug with it and I have never seen it getting stuck after years of usage. It is dependable and practical.
jq has helped me put out countless fires throughout my career. I should donate to it one day.
I do like tools that complement/supplement jq though, like jid: https://github.com/simeji/jid
Why would you change that?
Also, I'm working on and prototyping some kind of HTTP decoding support that will make things like selecting on headers and automatic/manual body decoding possible.
The issue with jq is that I use it maybe once a month, or even less. The syntax is "arcane enough" that I keep forgetting how to use it because I use it so sporadically.
In comparison awk – which I also don't use that often – has a much easier syntax that I can mostly remember.
Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.
There are at least a dozen tools and languages and syntaxes that I've used sporadically over the years - awk, sed, bash, Mongo, perl, etc. I don't use them often enough to remember exactly how they work, and so I always have to spend a few hours reviewing manuals or old code repos or an O'Reilly book.
But if I do end up using it for a few days in a row, it starts to make sense, and I improve each time I use it.
But not with jq.
It just does not make sense to my brain, no matter how many times I've had to use it. Every single time I need to use it, it requires finding some Stack Exchange or blog and just copying and pasting. Even after seeing the solution, rarely do I then really understand why or how it works. Nor can I often take that knowledge and apply it to similar problems.
About the only other syntax or language that gives me such problems is Elastic Search DSL.
[1,2,3] | js "out = 0; for (const n of this) out += n"
That would print "6". `out` would be a special variable you write to in order to print the result, and `this` would be the input.
I've tried a couple of times to get into awk, but still find the syntax arcane.
> Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.
I think this is the main thing. I’d prefer a streamlined CLI tool where you passed in some JS code and it’d just run it on the input (with the same slurp/raw args as jq). Could just be npm with underscore.js autoimported.
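A minimal Python version of the "pass in some code and run it on the input" idea from these comments, assuming the same conventions as the hypothetical `js` command: the parsed input is bound to `this`, and whatever the code assigns to `out` is the result. `run_snippet` is a made-up name, and `exec` runs arbitrary code with no sandboxing, so only use it on snippets you trust.

```python
import json
from typing import Any


def run_snippet(code: str, raw_input: str) -> Any:
    """Run `code` with the parsed JSON bound to `this`; return what it assigns to `out`."""
    namespace: dict[str, Any] = {"this": json.loads(raw_input), "out": None}
    exec(code, namespace)  # No sandboxing: trusted snippets only.
    return namespace["out"]


print(run_snippet("out = 0\nfor n in this:\n    out += n", "[1,2,3]"))  # 6
```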
I ended up writing my own guide to it, that in my unbiased opinion makes it easier to get the point where in-depth examples and language descriptions are easier to understand.
Edit: Oh, wow, it's even mentioned in this article. Maybe I should read before commenting.
Your guide was great for this. It stepped me through enough of the bare basics in a way that the underlying model was obvious. It didn't get me nearly far enough for many of the tasks that I need jq for, but it got me started and that's all I really needed. Everything additional that I need to learn becomes obvious in retrospect—"of course there's an operator for this, there kind of has to be!".
Thank you!
> The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive.
Yeah, not so much boys! Also, that disclaimer should really be at the top of the manual, with a link to the wiki, rather than vice-versa, as it is now.
The wiki is like secret information -- "oh, hey, here's the page that actually tells you how it works!"
This part in particular jumped out at me:
> To work around this statelessness, you can wrap a sequence of independent values into an array, iterate over the array, then wrap that result back up into another array so you can pass the entire sequence as a single value downstream to the “next filter”.
This is literally just describing a map. A technique so generally applicable and useful that it's made its way into every modern imperative/procedural programming language I can think of. The idea that this person fails to recognise such a common multiparadigmatic programming idiom doesn't fill me with confidence about the design of zq.
def map(f): [.[] | f];
Many built-in functions in jq are implemented in jq, in terms of a small set of core primitives. The implementations can be inspected in builtin.jq: https://github.com/stedolan/jq/blob/master/src/builtin.jq#L3
Results can be emitted iteratively using generators, which are implemented as tail-recursive streams [0]. Combined with the `input` built-in filter, which yields the next item in the input stream, this lets jq handle real-time I/O and function as a more general-purpose programming language.
I built an interpreter for the Whitespace programming language in jq using these concepts and it's easily one of the most complex jq programs out there.
[0]: https://stedolan.github.io/jq/manual/#Generatorsanditerators
Like you, I almost always just write Python scripts for such tasks because it's a lot easier for me to reason through it and debug it, but it's definitely slower-going than what I might do if I were very adept in a terse language like jq. I don't do this too often, so it makes little difference to me, but if someone is doing this multiple times a day, every day, it'll add up. As you say, it takes a few minutes; with jq, it could be a few seconds.
It might actually make sense to embed jq functionality into your favourite language (as a library or so), as it is quite a nice and well-chosen set of functionality.
curl -s http://foo.bar | jq .some.nested.value
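For comparison, the same lookup as a tiny Python function, a sketch assuming the input is already parsed JSON. `get_path` is a hypothetical helper; like jq, it returns null (`None`) for a missing key, and indexing into a non-object raises an error (here an `AttributeError`).

```python
from typing import Any


def get_path(doc: Any, dotted: str) -> Any:
    """Follow a jq-style path such as '.some.nested.value' through parsed JSON."""
    cur = doc
    for key in [k for k in dotted.split(".") if k]:
        if cur is None:
            return None  # jq: indexing null yields null
        cur = cur.get(key)  # missing key -> None, like jq's null
    return cur


doc = {"some": {"nested": {"value": 42}}}
print(get_path(doc, ".some.nested.value"))  # 42
print(get_path(doc, ".some.missing.value"))  # None
```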
For anything more complicated I would indeed go for writing a proper script.
As the complexity of the input JSON or of your processing grows, it does make sense to leave jq behind for a higher level language.
I disagree with 'leaving for a higher level language'. Jq is an extremely high level language.
What it is _not_ is a general purpose language.
I used to 100% agree with you, but these days I understand why so much stuff ends up being bash and jq.
I don't even process complex JSON...it's usually pretty basic. But being able to quickly select parts out of streams of JSON data on the CLI is incredibly useful to me, and learning even just the basics of jq has paid for itself a hundred times over by now.
Granted, a lot of my job right now is data forensics stuff, so I breathe this kind of stuff. You might never need jq.
I imagine this is how it's used 90% of the time, but can do lots more advanced stuff as described in the article.
In many cases, the most appropriate and useful tool for the job would be jq - one line in the shell script corresponding to the required data transform, calling out to `jq`, which already has a reasonable user base and documentation, and could be trivially replaced by anyone if the business needs change.
And since it relies on .NET, that also requires its own separate opt-out for its telemetry. There might be other components, now or in the future, that also send data to Microsoft by default and would have to be separately discovered and disabled.
[1] https://docs.microsoft.com/en-us/powershell/module/microsoft...
Building a program with .NET does NOT cause that program to send telemetry to Microsoft.
You're thinking of the .NET SDK itself. Using PowerShell does not trigger any use of the .NET SDK.
Disclaimer: I work for Microsoft.
> There might be other components, now or in the future, that also send data to Microsoft
Of course. Do your due diligence on whatever you install. No tool should be exempt from that.
This is the best part of pwsh. Everything is standardized, you're not guessing at the idioms of each command, and you're working with objects instead of parsing strings!
My second favorite part is having access to the entire C# standard library.
I've been using it every single day since day one in 2006. It has come a long way, and the current PS7 is the best shell experience there is. Hands down, no contest.
Snover's passionate early presentation about the PS pipeline is a pretty cool tech video. https://www.youtube.com/watch?v=325kY2Umgw8
Actually PowerShell is "Perl interactive done right" if you read what the designers say about their influences - the automatic variable $_ is straight from Perl and the array creation syntax @(a, b, c) is also a Perl-ism from @arr = (a, b, c). Which is funny as I dislike Perl intensely but really like PowerShell :)
To be fair there's not much Perl in PS, it's as much influenced by KSH, Awk, cmd.exe and VBScript as Perl. Thankfully "influenced by" isn't "a melange of", because a combination of all of those sounds like an abomination lol, and PS is wonderful in being about as consistent and simple as a proper shell can get.
Set-PSReadLineKeyHandler -Key Tab -Function MenuComplete
fetch("cars.json").filter({"color": "red"})
# or
echo(fetch("cars.json").filter({"color": "red"}))
Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.
Inapt comparison. The people using VS Code are more likely to be migrating from proprietary tools like PyCharm or Sublime Text, bloated offerings like NetBeans, or roughly equivalent offerings like Atom.
The people that would use PowerShell would be migrating from the likes of Zsh, Bash, Fish, and other “hard core free” software.
jq and its unix-y friends allow me to trade off expressiveness against having to memorize arcane invocations
gc cars.json | ConvertFrom-Json | ? color -eq 'red'
`ConvertFrom-Json` doesn't have a default alias, but you can define one in your PowerShell profile. I do that for commands I find myself using frequently. Say we pick convjson:
gc cars.json | convjson | ? color -eq 'red'
That's more like what my typical pipelines look like. The nice thing about aliases is that you can always switch back to the verbose names when clarity is more important than brevity, like in long-term scripts.
Edit: Seems I've been using too many braces and dollar signs all these years. Thanks to majkinetor for the tip.
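For comparison outside PowerShell, here is a rough Python equivalent of `gc cars.json | ConvertFrom-Json | ? color -eq 'red'`. It is a minimal sketch, assuming cars.json holds a top-level JSON array of objects with a "color" field; the function names are hypothetical. Since PowerShell's `-eq` compares strings case-insensitively by default, the sketch uses `casefold()` to mirror that.

```python
import json
from typing import Any


def filter_by_color(records: list[dict[str, Any]], color: str) -> list[dict[str, Any]]:
    """Keep records whose "color" equals `color`, ignoring case like PowerShell's -eq."""
    wanted = color.casefold()
    return [
        r for r in records
        if isinstance(r.get("color"), str) and r["color"].casefold() == wanted
    ]


def load_red_cars(path: str = "cars.json") -> list[dict[str, Any]]:
    # `gc cars.json | ConvertFrom-Json`: read the file and parse it as JSON.
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # `? color -eq 'red'`: keep only the matching objects.
    return filter_by_color(records, "red")


if __name__ == "__main__":
    cars = [
        {"make": "A", "color": "red"},
        {"make": "B", "color": "Red"},
        {"make": "C", "color": "blue"},
    ]
    print(filter_by_color(cars, "red"))
```

Unlike the PowerShell version, this returns plain dicts rather than objects with properties, so anything downstream has to index by key.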
However, I really do like jq for queries and scripting, so I keep both around.
If you can emit the syntactic form as a Python or perl ref, or a jq array ref, then I could use your tool to find the structure and the other ones to stream.
Great example! Thanks for posting this.
Detect changes of a specific node and its whole subtree:
let $node := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]]
let $result := for $node-in-rev in jn:all-times($node)
               return
                 if ((not(exists(jn:previous($node-in-rev))))
                     or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in-rev)))) then
                   $node-in-rev
                 else
                   ()
return [
  for $jsonItem in $result
  return { "node": $jsonItem, "revision": sdb:revision($jsonItem) }
]
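The query keeps a revision of the node only if it is the first one or if its hash differs from the previous revision's hash. To show the idea outside SirixDB, here is a minimal Python sketch, assuming the node's history is available as a plain list of JSON values (index 0 is revision 1). SHA-256 over sorted-key JSON is a stand-in for `sdb:hash`, not how SirixDB actually computes its hashes, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Any


def subtree_hash(node: Any) -> str:
    # Stand-in for sdb:hash: SHA-256 over a canonical (sorted-key, compact) JSON encoding.
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_revisions(history: list[Any]) -> list[dict[str, Any]]:
    """history[i] is the node's value in revision i + 1."""
    result = []
    prev_hash = None
    for rev, node in enumerate(history, start=1):
        h = subtree_hash(node)
        # Keep the first revision, and any revision whose subtree hash changed.
        if prev_hash is None or h != prev_hash:
            result.append({"node": node, "revision": rev})
        prev_hash = h
    return result


if __name__ == "__main__":
    history = [{"a": 1}, {"a": 1}, {"a": 2}, {"a": 2}, {"a": 1}]
    print([r["revision"] for r in changed_revisions(history)])
```

Comparing hashes rather than whole subtrees is what makes this cheap in a database that already stores per-node hashes; here the hash is recomputed each time, so it only illustrates the logic.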
Get all diffs between all revisions and serialize the output in an array:
let $maxRevision := sdb:revision(jn:doc('mycol.jn','mydoc.jn'))
let $result := for $i in (1 to $maxRevision)
               return
                 if ($i > 1) then
                   jn:diff('mycol.jn','mydoc.jn',$i - 1, $i)
                 else
                   ()
return [
  for $diff at $pos in $result
  return {"diffRev" || $pos || "toRev" || $pos + 1: jn:parse($diff)=>diffs}
]
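The loop above pairs every revision with its successor, so revision n yields n - 1 diffs keyed "diffRev1toRev2", "diffRev2toRev3", and so on. A minimal Python sketch of that pairing, assuming the revisions are available as a list of JSON objects; `shallow_diff` is a hypothetical top-level-only diff and does not reproduce the format `jn:diff` returns.

```python
from typing import Any


def shallow_diff(old: dict[str, Any], new: dict[str, Any]) -> list[dict[str, Any]]:
    # Hypothetical top-level-only diff; SirixDB's jn:diff produces its own richer format.
    diffs: list[dict[str, Any]] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            diffs.append({"delete": key})
        elif key not in old:
            diffs.append({"insert": key, "value": new[key]})
        elif old[key] != new[key]:
            diffs.append({"replace": key, "old": old[key], "new": new[key]})
    return diffs


def all_diffs(revisions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """revisions[i] is the document in revision i + 1; diff each revision against the next."""
    return [
        {f"diffRev{i}toRev{i + 1}": shallow_diff(revisions[i - 1], revisions[i])}
        for i in range(1, len(revisions))
    ]


if __name__ == "__main__":
    revs = [{"a": 1}, {"a": 2, "b": 3}, {"b": 3}]
    for entry in all_diffs(revs):
        print(entry)
```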
Open a specific revision by datetime:
jn:open('mycol.jn','mydoc.jn',xs:dateTime('2022-03-01T00:00:00Z'))
By revision number:
jn:doc('mycol.jn','mydoc.jn',5)
And a view of an outdated frontend: https://github.com/sirixdb/sirix/raw/master/Screenshot%20fro...
def stores:
[
{ "store number" : 1, "state" : "MA" },
{ "store number" : 2, "state" : "MA" },
{ "store number" : 3, "state" : "CA" },
{ "store number" : 4, "state" : "CA" }
];
def sales:
[
{ "product" : "broiler", "store number" : 1, "quantity" : 20 },
{ "product" : "toaster", "store number" : 2, "quantity" : 100 },
{ "product" : "toaster", "store number" : 2, "quantity" : 50 },
{ "product" : "toaster", "store number" : 3, "quantity" : 50 },
{ "product" : "blender", "store number" : 3, "quantity" : 100 },
{ "product" : "blender", "store number" : 3, "quantity" : 150 },
{ "product" : "socks", "store number" : 1, "quantity" : 500 },
{ "product" : "socks", "store number" : 2, "quantity" : 10 },
{ "product" : "shirt", "store number" : 3, "quantity" : 10 }
];
[
{store: stores[], sale: sales[]}
| select(.store."store number" == .sale."store number")
| { nb: .store."store number",
state: .store.state,
sold: .sale.product
}
]
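`{store: stores[], sale: sales[]}` emits one object for every (store, sale) combination, and `select` keeps only the pairs whose store numbers match, which amounts to an inner join. Here is a minimal Python sketch of the same logic over the same sample data; the function name is hypothetical, and the output order is not guaranteed to match jq's.

```python
STORES = [
    {"store number": 1, "state": "MA"},
    {"store number": 2, "state": "MA"},
    {"store number": 3, "state": "CA"},
    {"store number": 4, "state": "CA"},
]

SALES = [
    {"product": "broiler", "store number": 1, "quantity": 20},
    {"product": "toaster", "store number": 2, "quantity": 100},
    {"product": "toaster", "store number": 2, "quantity": 50},
    {"product": "toaster", "store number": 3, "quantity": 50},
    {"product": "blender", "store number": 3, "quantity": 100},
    {"product": "blender", "store number": 3, "quantity": 150},
    {"product": "socks", "store number": 1, "quantity": 500},
    {"product": "socks", "store number": 2, "quantity": 10},
    {"product": "shirt", "store number": 3, "quantity": 10},
]


def join_sales(stores, sales):
    # The two nested loops form the cross product; the `if` plays the role of select().
    return [
        {"nb": store["store number"], "state": store["state"], "sold": sale["product"]}
        for store in stores
        for sale in sales
        if store["store number"] == sale["store number"]
    ]


if __name__ == "__main__":
    for row in join_sales(STORES, SALES):
        print(row)
```

Like the jq version, this checks every pair, so it costs O(stores × sales). For large inputs, indexing stores by number in a dict first would turn it into a single pass over the sales.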
Try it online - https://tio.run/##rZPPUsMgEMbP5Sl2ctIZmklbe6HTg@PZJ8jkkD84Rh...
https://www.npmjs.com/package/json
Edit: added more information
I didn't check any of them out though.
Considering how much data I can already get as JSON (or converted to JSON via JSON-based standards such as GeoJSON), there doesn't seem to be much of a compelling case to use ZNG.
I'd love to hear different though!
And in Ansible, too, FWIW, but yes, it's my hand-to-hand combat with the language in both of those circumstances that has formed my opinion about it.
Regrettably, "kubectl get -o jsonpath" is _almost_ the same, but just different enough to trip me up :-(
FWIW, my choice to move towards JMESPath is definitely a pragmatic one, not an aesthetic preference. I can't say JMESPath fills me with joy, and the syntax does not come naturally to me - it just solves the problem acceptably well and in a way that has some network effects.
I have plenty of respect for "I've used this tool in the wild and the experience was unpleasant."
> we have also introduced an x-robots-tag: none in the http response header of Wiki pages
> Abusive behavior in Wikis had a negative impact on our search engine ranking
> GitHub is currently permitting a select criteria of GitHub Wikis to be indexed
I don't see anything here about wiki specifically but maybe one of the rules hits wiki pages?
$ printf '{"a": {"b":"c", "d":["d0","d1"]}}' | jq -r '[ .a as $a | $a.d[] | {x: ., y: $a.b}]'
[
{
"x": "d0",
"y": "c"
},
{
"x": "d1",
"y": "c"
}
]
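The `.a as $a` binding lets the inner `{x: ., y: $a.b}` still reach `.a.b` after `$a.d[]` has moved the context to each array element. In Python, the same thing is just a local variable. A minimal sketch with a hypothetical function name:

```python
import json
from typing import Any


def pair_with_parent(doc: dict[str, Any]) -> list[dict[str, Any]]:
    a = doc["a"]  # like `.a as $a`: keep a handle on the outer object
    # like `$a.d[] | {x: ., y: $a.b}`: iterate the array but still read $a.b
    return [{"x": item, "y": a["b"]} for item in a["d"]]


if __name__ == "__main__":
    doc = json.loads('{"a": {"b":"c", "d":["d0","d1"]}}')
    print(json.dumps(pair_with_parent(doc), indent=2))
```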
And I realize this isn't as pure CS-y as you were asking, but this syntax is hell on quoting:
$ printf '["a","b"]' | jp -u 'join(`"\n"`, @)'
# vs
$ printf '["a","b"]' | jq -r 'join("\n")'
I also find jq more intuitive, but I really dislike that we have three standards, each used by a number of tools: JSONPath, JMESPath, and jq.
gc cars.json | convjson | ? color -eq 'red'
There are a couple of other copy variants as well:
I was running 0.7.1 rather than 0.8.0, so will upgrade now
Edit: Works great!
However, presumably PowerShell requires at least the .NET Runtime if not the .NET SDK, doesn't it? The docs [2] suggest running "dotnet --list-runtimes" to "see which versions of the .NET runtime are currently installed", so it sounds like the Runtime also includes the dotnet command. Does running the recommended "dotnet --list-runtimes" command send telemetry, like most of the commands? Or are you saying that the Runtime, unlike the SDK, doesn't include telemetry at all?
[1] https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry
[2] https://docs.microsoft.com/en-us/dotnet/core/install/how-to-...
Nope, these days .NET programs (like PowerShell) bundle the runtime. But even if they did a lighter distribution that depended on the runtime already being installed, there would be no .NET telemetry sent.
> Does running the recommended "dotnet --list-runtimes" command send telemetry, like most of the commands?
This is still an SDK command. I don't personally know if this one sends any telemetry.
> Or are you saying that the Runtime, unlike the SDK, doesn't include telemetry at all?
The runtime does not send telemetry.
Edit: Actually, the ".NET Runtime 6.0.4" [1] (not the SDK) definitely has a "dotnet" command included. Presumably with the telemetry?
Thanks for linking this project, will try it, it may be a game changer.
> an implicit loop
As an occasional awk user, I'd love it if you expanded on this. Maybe it will help clear things up for me. You're not referring to the fact that awk operates on every line independently, are you?
for line in readfile():
    for block in script:
        if block.match(line):
            run_block(block)
Here the "for line in readfile()" is the "implicit loop", and the blocks are the "condition { .. }" blocks. The actual flow is a bit more complex and has some exceptions (e.g. BEGIN/END), but this is the gist of it.
... | awk '{print $2}'
I thought there was all this confusing syntax, but something like
awk '/pattern/ {print}'
was clearer to me. In the first case, the empty pattern matches every line of the input, and the action is simply to print the second field of each line. Patterns can vary in complexity from the empty pattern to long chains of logical operators and regular expressions, such as /pattern/ in the second example. The outer quotes are just there to prevent the shell from eating your dollar signs or other special characters. In a standalone AWK script you can write it like
/pattern/ {
    print
}
which also makes it look more like other languages.
If you can get your hands on a copy of The AWK Programming Language, it's a pretty quick and pleasant read that helped everything make more sense to me. I do most of the data analysis for my research using AWK and really enjoy working with it.
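For comparison, here are rough Python equivalents of the two one-liners, as a minimal sketch with hypothetical function names. The regex side is an approximation: awk uses POSIX extended regular expressions, while Python's `re` differs in places (for example, bracket expressions like `[[:digit:]]` are not supported).

```python
import re
from typing import Iterable, Iterator


def second_field(lines: Iterable[str]) -> Iterator[str]:
    """Like awk '{print $2}': the empty pattern means the action runs on every line."""
    for line in lines:
        fields = line.split()
        # awk prints an empty line when a record has no second field
        yield fields[1] if len(fields) > 1 else ""


def matching_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Like awk '/pattern/ {print}': emit whole lines where the regex matches anywhere."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = ["alice 42 admin\n", "bob 17\n", "carol\n"]
    print(list(second_field(sample)))
    print(list(matching_lines(sample, r"^b")))
```

To use either one as a filter in a pipeline, pass `sys.stdin` as `lines` and print each result.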
That's a ridiculous take. 99% of users don't understand what all that technobabble in a typical EULA means, they will just go for the option they are nudged to (which is why first the courts and now enforcement agencies are stepping up their game against that practice [1]).
The way that the GDPR expects stuff to be handled is by getting explicit user consent, the consent must be a reasonably free choice (i.e. deals like "give me your personal data and the app is free, otherwise pay" are banned), and there must not be any exchange of GDPR-protected data without that consent unless technically required to perform the service the user demands. Clearly, a telemetry opt-out is completely against the spirit of the GDPR and I seriously hope for Microsoft to get flattened by the courts for the bullshit they have been pulling for way too long now.
What I would actually expect of Microsoft is to follow the Apple way: have one single central place, ideally at setup and later in the System Preferences, where tracking, analytics and other optional crap can be disabled system-wide.
[1] https://www.hiddemann.de/allgemein/lg-rostock-bejaht-unterla...
Then it befits a ridiculous state of affairs. It would be great to have the standards you suggest, and it's a shame that we don't. But that doesn't change the fact that we don't, and because we don't, we need to do due diligence on the tools we install.
https://docs.microsoft.com/en-us/powershell/module/microsoft...
and is "anonymized information about the host running PowerShell, and information about how PowerShell is used". It sucks that it has telemetry, but anonymised information about whether a computer ran 10 .exe files or 10 cmdlets pales into insignificance next to Windows and Edge and OneDrive slurping up names, addresses, and files, moving logins to Microsoft accounts, sending browser history to Microsoft, checking downloads with Microsoft, keeping a history of all programs run in Windows for Timeline and trying to send that to Microsoft to sync it between devices, moving OneNote to the cloud, making the Start menu search a Bing web search, defaulting to Cortana being a cloud-based voice search, sending pen and ink data to Microsoft, etc., etc.
I would simply wish for no telemetry to happen at all without user consent. If Microsoft wants information about how people use their software or how stable it is and not enough people opt in, they should fucking pay people money for market research and QA.
This is still GDPR non-compliant, you should have a central place to _opt-in_ tracking, analytics and other optional crap if you so desire.
In Debian, you can opt-in at setup time or any later time with a simple "dpkg-reconfigure popularity-contest" (even though that one isn't fully GDPR-compliant as you can't easily read what exactly is being done from the same screen).
Yes, if you download a .NET Runtime distribution, it will include the `dotnet` command from the SDK so that basic commands like `dotnet --list-runtimes` and `dotnet --list-sdks` are available. These commands may send telemetry. But as you probably saw on https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry , using `dotnet path/to/program.dll` to run an unbundled .NET program will never send telemetry.
> The AWK Programming Language
I see it's public domain and discussed here on HN:
https://news.ycombinator.com/item?id=13451454
I'll go over it, thank you very much for the suggestion.
You cannot opt-in. You can go to `System Preferences > Security & Privacy > Analytics & Improvements` and opt-out, but the default is not opt-in.
I draw your attention to the link I posted, and the purple background call out box with the exclamation mark icon and the heading "Note" which says: "Application Insights uses the hosts IP address to determine the geographic location. The IP address is never included in the telemetry data or stored in the database."
> "I would simply wish for no telemetry to happen at all without user consent."
I would, too. So did someone on Github: https://github.com/PowerShell/PowerShell/issues/15722 "Change telemetry from opt-out to opt-in" where Microsoft said "we felt that by making telemetry opt-in it would bias and limit our telemetry in a way that would make less useful to our users."
It nevertheless is transferred to the origin server (otherwise, how would TCP work), and that is enough for it to be considered under GDPR!
> we felt that by making telemetry opt-in it would bias and limit our telemetry in a way that would make less useful to our users
Well, that still doesn't override the GDPR. The GDPR is law, precisely in the spirit to prevent un-consented tracking!
This makes it sound like this is some antagonistic relationship where the OSS maintainer loses. But the idealistic scenario that you are alluding to[1] is about a developer who develops free OSS in their free time. And then, yes, very few end up paying or donating anything. But how is a predictable chain of events a loss? What is the “economics” of it?
[1] Some OSS developers do it as their day job.
In any case, what I meant by the "economics" of it is that in general a person can only afford to work for free for so long before they need to pay bills, eat, have and/or acquire a standard of living that isn't poverty. If they have a day job where they are writing this software in their free time, how long can they do this before burning out?
How does one afford to work for free? One has a day job. How does someone who volunteers for search-and-rescue afford it? That’s obviously a ridiculous question—they are volunteers so they necessarily must do something from nine to five. Or be independently wealthy.
But how does one avoid burnout as a double-worked programmer? I think we have ourselves to blame on that point since we have put the double-worked programmer on a pedestal. So we can either:
1. Not work on things both professionally and in our free time; or
2. Force ourselves to do just that because we gain something extrinsic from it that we might need, like simply keeping up with the Joneses (having an answer for “where’s your private GitHub” in interviews…)
https://changelog.com/posts/i-just-hit-100000-per-year-on-gi...
> How does one afford to work for free?
This is exactly my point. The work isn't done for free: the person is spending their own money and time, which takes away from a limited pool of resources they own. If they're insanely rich, they could probably "afford" to do this work until they die.
But you're making a mistake in your reasoning relating "volunteers", "free work", and "day jobs." Here is what I think you are missing in this assessment: a worker at a company (a day job) works under a contract that obliges their employer to compensate them for their time. A volunteer works without any contractual obligation on the community that benefits from their work to compensate them for their time. In the latter case, while there is no contractual obligation for a society/community to compensate the volunteer, nothing forbids it either. Does someone who volunteers for search-and-rescue deserve to be compensated? I'd say yes, in fact, they do. They are providing a service.
Now I'll get ahead of the next possible argument. "But there's not enough work or compensation for them to make a living!" This is two parts:
1. Not enough work - This is only true because of the example chosen and our human tendency to draw broad analogies. There can definitely be enough work in multiple domains (especially in software), and what about volunteer firefighters?
2. Not enough compensation - This is because people with the means to compensate the work simply are not doing so. And it's not a good-faith argument to tell me that, in the original case, there aren't enough people with enough money using the project to fund its continued development and maintenance.
To sum up all of the above: yes, work like this is volunteer work, and it says a lot that societies and communities do not compensate it. The fact that they don't compensate that work doesn't mean it can't be compensated. And there are key differences between this relationship of work and compensation and the colloquial "work" of a day job, where an employer reserves your time under contract.
Now for burnout as a double-worked programmer. I think you're right on these two points. Obviously the second situation is not ideal. If someone wants to do it, let them. There are plenty of open source projects still maintained by stretched-thin developers. Is this a tenable solution long term? Not in the vast majority of cases, and that's my point!