Show HN: Choose – An alternative to cut and sometimes awk (github.com)

87 points by theryangeary 6 years ago | 45 comments

I think it is very cool that Rust has led to a renaissance of rewriting classic Unix tools to make them fit more with current use. Unix was never meant to stand still. It just happened that AT&T broke up and it took a while for Linux to catch up, and by then people got used to the idea of a fixed set of POSIX utilities. But their CLIs are often quite bad, and security was never a consideration in the olden days, so it’s good to see them re-evaluated.

masklinn 6 years ago | |

> I think it is very cool that Rust has led to a renaissance of rewriting classic Unix tools to make them fit more with current use. Unix was never meant to stand still.

Technically it has not stood still: even the core tools have kept getting extended, usually incompatibly, in both the GNU and BSD lineages. Though it's pretty funny how much the Rust community has been taken up with providing alternatives and replacements for "classic" (POSIX) utilities.

ryi 6 years ago | | |

While development has not stopped, there haven't really been many advances in improving the syntax. Many of these tools are stuck with awkward, unintuitive syntax, which is cumbersome unless you use them frequently. I feel like the renaissance has lately been one of usability, which I personally really appreciate. Obviously, hard-core daily users would disagree, but considering I use `awk` at most once every 6 months, I hate that I need to spend 20 minutes re-learning how to use it every single time, particularly for basic purposes.

oefrha 6 years ago |

Hmm, so compared to cut this

1. Saves typing -f, because it doesn’t support cut’s -b and -c modes (edit: actually -c is supported, I just didn’t see it);

2. Uses -f instead of -d, making it rather confusing for cut users;

3. Uses : instead of - for range specifications;

4. Offers an exclusive indexing mode;

5. Misses a bunch of other cut features (assuming coreutils cut).

Not sure I see much appeal...

Edit: Another thing I missed: regex separator instead of just character list.

masklinn 6 years ago | |

> Not sure I see much appeal…

The appeal is the same as replacing grep with a fancier searcher:

1. It has good and sensible defaults: field mode, for one. I'd have to check, but hopefully, unlike cut, it doesn't print the entire line when it's unhappy with the selection you asked for; that's worse error handling than ed. (Edit: confirmed. If you give `choose` a nonsensical selection it doesn't print anything: if you ask `cut` for columns 10-15 of data with 3 columns it's going to print the source as-is, whereas choose is properly going to print a bunch of empty lines. That alone makes it better than cut.)

2. It works better on actual data, which is generally whitespace-separated rather than tab-separated, meaning cut requires preprocessing before it'll do anything of use

Can you massage cut or the data to fit? Yes, in the same way you can massage grep or your data to fit. That you don't have to and the utility behaves sensibly by default is appealing. This exact thing is one I've been thinking about for some time now, I'm glad somebody else agreed and did the legwork.
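A minimal sketch of the cut behaviour discussed above, assuming a POSIX cut and made-up sample data. As far as POSIX specifies it, the pass-through applies to lines that contain no delimiter at all, which is exactly what happens when space-separated data meets cut's default tab delimiter:

```shell
# Space-separated input contains no tab, so cut's default delimiter never matches.
line='alpha beta gamma'

# No delimiter found: cut passes the whole line through unchanged.
printf '%s\n' "$line" | cut -f 10-15        # prints: alpha beta gamma

# -s suppresses lines that contain no delimiter, so nothing is printed.
printf '%s\n' "$line" | cut -s -f 10-15

# With genuinely tab-separated input, out-of-range fields give an empty line.
printf 'alpha\tbeta\tgamma\n' | cut -f 10-15
```

So `-s` is available as an opt-in, but the default is the pass-through the comment complains about.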

BiteCode_dev 6 years ago | |

To get the appeal, give a generalized cut version of:

    echo -e "foo   bar   baz" | choose -1 -2
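For comparison, a rough awk equivalent of that selection. This assumes `-1` and `-2` mean the last and second-to-last fields; `last_two` is a hypothetical helper name, and skipping lines with fewer than two fields is a choice made here, not something choose is known to do:

```shell
# last_two: print the last field, then the second-to-last, of each input line.
# awk splits on runs of blanks by default, so no preprocessing is needed.
last_two() {
    # Guard on NF so short lines do not trigger $(0) or a negative field index.
    awk 'NF >= 2 { print $NF, $(NF - 1) }'
}

printf 'foo   bar   baz\n' | last_two   # prints: baz bar
```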

xthetrfd 6 years ago |

I can't understand why you are mentioning awk. Cut or choose cannot be compared to awk, awk is a programming language.

Also I don't think that it's so much easier to use than cut. On the other hand every *nix system has cut so if you make scripts with it they are portable.

BiteCode_dev 6 years ago | |

> I can't understand why you are mentioning awk. Cut or choose cannot be compared to awk, awk is a programming language.

Because 99% of awk IRL use is just as a fancier cut.

It's very rare someone even sets a variable using awk. If you do it, you are a statistical rarity.

> Also I don't think that it's so much easier to use than cut. On the other hand every *nix system has cut so if you make scripts with it they are portable.

I, for one, never remember the syntax for cut. If "choose" gets a deb, I'll use it: Python slicing is something familiar to me.

I don't care if cut is on every unix system: if I have the possibility to install things on the machine, then I'll just install what I need. I have a script for that. If I don't, I'll google/man/--help GNU commands as usual.

And as for writing shell scripts, I use Python anyway.

masklinn 6 years ago | | |

> Because 99% of awk IRL use is just as a fancier cut.

You say "fancier", I say "working": since cut can't work on general whitespace without a pre-processing phase (e.g. tr), it simply doesn't work for the vast majority of the things I try to shove into it, and I pretty much always end up using awk instead.

Choose means my awk use will drop by 99% or so.
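A sketch of the pre-processing phase mentioned above, using only POSIX tr and cut. `squeeze_blanks` is a hypothetical helper name and the input is made up:

```shell
# squeeze_blanks: turn tabs into spaces, then collapse runs of spaces into one.
squeeze_blanks() {
    tr '\t' ' ' | tr -s ' '
}

# After squeezing, a single space works as the cut delimiter.
printf 'foo   bar\t\tbaz\n' | squeeze_blanks | cut -d ' ' -f 2   # prints: bar

# awk needs no preprocessing for the same job.
printf 'foo   bar\t\tbaz\n' | awk '{ print $2 }'                 # prints: bar
```

One caveat: leading blanks survive as a single leading space, so cut then sees an empty first field and every index shifts by one, whereas awk ignores leading blanks.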

Legogris 6 years ago | |

> regular expression field separators using Rust's regex syntax

This actually makes choose a cut-killer for me. It can be frustrating having to figure out which delimiters to use - tabs or spaces? If spaces, you'll have to chain it with tr, or resort to awk.

BiteCode_dev 6 years ago | | |

And choose makes it even better by using "\s" as a default separator. So you usually don't have to specify a separator at all.

asicsp 6 years ago |

Not sure why it is even compared to awk instead of just cut. It could've been introduced as a cut-like command with a regex input field separator. Or at least not say things like:

>However, the awk command is not ideal for rapid shell use

And

>cut is far from ideal for rapid shell use, because of its confusing syntax

anything new is confusing until you learn enough to be comfortable

>ranges are just plain difficult to get right on the first try

and how does choose become easy to use with ':' character instead of '-'

Is this a typo or does inclusive/exclusive depend on whether first number is specified?

>choose 2:5 # print everything from the 2nd to 5th item on the line, _inclusive_ of the 5th

>choose :3 # print the beginning of the line to the 3rd item _exclusive_

tzs 6 years ago |

Can it output fields in an order other than their input order? That's the one thing I regularly wish cut could do. I would like the output of the second cut below to be "3,1", not "1,3".

  $ echo "1,2,3,4,5" | cut -d , -f 1,3
  1,3
  $ echo "1,2,3,4,5" | cut -d , -f 3,1
  1,3

Klasiaster 6 years ago | |

Yes:

  $ echo "1,2,3,4,5" | choose -f , 2 0
  3 1
  $ echo "1,2,3,4,5" | choose -f , 2:0
  3 2 1

Note that the indexing starts with 0, "-d" is "-f", and a range is denoted by ":" instead of "-" which is used for indexing from the end.
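To make the index translation concrete, here is an awk sketch that mimics the 0-indexed, possibly reversed ranges shown above. `zero_range` is a hypothetical helper, not part of choose; it handles only non-negative indexes and always joins its output with single spaces:

```shell
# zero_range START END [SEP]: print 0-indexed fields START..END inclusive,
# walking backwards when START > END. SEP defaults to awk's blank splitting.
zero_range() {
    awk -F "${3:- }" -v s="$1" -v e="$2" '{
        step = (s + 0 <= e + 0) ? 1 : -1
        out = ""; sep = ""
        for (i = s + 0; i != e + step; i += step) {
            out = out sep $(i + 1)   # awk fields are 1-indexed, hence the + 1
            sep = " "
        }
        print out
    }'
}

printf '1,2,3,4,5\n' | zero_range 2 0 ,   # prints: 3 2 1
printf '1,2,3,4,5\n' | zero_range 0 1 ,   # prints: 1 2
```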

js2 6 years ago | |

    awk -F, '{print $3","$1}'

typon 6 years ago |

I love my coreutils replacements... not because they're in Rust but because they're generally faster and easier to use. fd, rg, bat and now I shall use choose! I almost always have to look up awk's syntax but the defaults in choose seem trivial to remember. Thanks for making this!

ilovetux 6 years ago |

I am not sure why being zero-indexed is considered a feature. I have no problem using a zero-indexed system, but I've never really thought of it as a feature. Is there something I'm missing that makes zero-indexed systems faster, easier to use or otherwise better than a one-indexed system?

xxpor 6 years ago | |

There's a better reason than this that I'm forgetting, but never underestimate the power of being the same as what people are already familiar with. Every time I have to write lua or read some Matlab, the mental overhead of having to remember everything is one-indexed is just incredibly annoying.

fwip 6 years ago | | |

Anyone used to command-line tools is used to fields being 1-indexed.

awk uses $0 as the whole line, and $1 as the first field.

cut uses -f1 as the first field.

$1 is the first argument to a POSIX shell script.

\1 is the first matched reference in sed.

$1 is the first regex match in perl.
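A quick sketch of those 1-indexed conventions side by side, using a made-up input line and a hypothetical `first_arg` helper (perl is left out to keep this to POSIX tools):

```shell
line='first second third'

# first_arg: echo back the first positional argument it receives.
first_arg() {
    printf '%s\n' "$1"
}

printf '%s\n' "$line" | awk '{ print $1 }'          # awk: $1 is the first field
printf '%s\n' "$line" | cut -d ' ' -f 1             # cut: -f 1 is the first field
printf '%s\n' "$line" | sed 's/^\([^ ]*\).*/\1/'    # sed: \1 is the first group
first_arg $line                                     # shell: $1 is the first argument
```

Each of the four commands prints `first`.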

A command-line tool being 0-indexed breaks from expectation of what everybody is used to using on the command line.

Klasiaster 6 years ago |

Inspired by a remark about Python's default split behavior in comment https://news.ycombinator.com/item?id=23446146 I wrote a Python one-liner for field selection that works similarly to "choose" but throws exceptions when the field cannot be found:

  $ echo " a    b c" | choose 1 2
  b c
  $ echo " a    b c" | python3 -c 'import sys; [print(f[1], f[2]) for line in sys.stdin if (f := line.split()) or True]'
  b c

BiteCode_dev 6 years ago |

Someone should make a bundle installer with this, bat, fdfind and ripgrep. I do enjoy those alternatives to the GNU tools, and install as many as I can: they are easier to use, usually faster, and just make more sense to my brain.

This pain is real: https://xkcd.com/1168/

asicsp 6 years ago | |

There is https://github.com/uutils/coreutils implemented in Rust

BiteCode_dev 6 years ago | | |

Yes, but their goal seems to be API compatibility, which I understand the point of, but is not useful to me.

jpxw 6 years ago |

What does this do that cut can’t?

LeonB 6 years ago | |

This is a poor question as it invokes the Turing tarpit. (Why use any language that is higher level than machine code?)

If it is more comfortable to use for some people then it’s a great invention.

Legogris 6 years ago | |

Select fields from output with space-delimiters (commands like `docker images`, which have to be preprocessed with tr)

BiteCode_dev 6 years ago | |

Nothing, but try doing a general version of:

    echo -e "foo   bar   baz" | choose -1 -2

With cut.

fwip 6 years ago | | |

    echo -e "foo   bar   baz"  | xargs | cut -d\  -f1,2

tyingq 6 years ago |

Very cool. An inverse mode to suppress matched fields might be a neat feature.

LockAndLol 6 years ago |

Very nice. Good work.