Grep flags – the good stuff (zwischenzugs.com)

87 points by U1F984 4 years ago | 52 comments

nickcw 4 years ago |

My favorite feature is:

       -P, --perl-regexp
              Interpret PATTERNS as Perl-compatible regular
              expressions (PCREs). This option is experimental when
              combined with the -z (--null-data) option, and grep -P
              may warn of unimplemented features.

Everything (Python, Go, JavaScript, etc.) uses Perl-style regexps nowadays, and I can never remember which characters need escaping in the old gods' regexp dialects.

kjeetgill 4 years ago | |

Haha, agreed 100%. My go-to for years has been https://www.regexplanet.com/. It lets you test which regex escapes work where without spinning up one-off `void main()`s.

silisili 4 years ago | |

Slightly pedantic, but Go uses RE2 which is subtly different than PCRE. In most common use cases, you'd probably never know.

hrez 4 years ago | |

Also handy combined with -o, e.g.:

grep -Po "Name:\K\w+"
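
A minimal sketch of what the `\K` in that pattern does, assuming GNU grep built with PCRE support. The sample text and the `names` variable are made up for illustration:

```shell
# Hypothetical sample input; only the "Name:" prefix comes from the pattern above.
sample='Name:alice Age:30
Name:bob Age:41'

# \K discards everything matched so far from the reported match,
# so with -o only the word after "Name:" is printed.
names=$(printf '%s\n' "$sample" | grep -Po 'Name:\K\w+')
printf '%s\n' "$names"
```

Without `\K`, the same command would print the `Name:` prefix as part of each match.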

mmh0000 4 years ago |

An amazing grep trick that I use all the time: the -e flag can be given more than once to search for multiple patterns, and an empty pattern (-e '') matches every line. Thus:

Let's assume we have a log file with a bunch of relevant stuff. I want to highlight my search term, BUT I also want to keep all the other lines around for context:

  $ dmesg
  ...SNIP...
  [2334597.539661] sd 1:0:0:0: [sdb] Attached SCSI removable disk
  [2334597.548919] sd 1:0:0:0: [sdb] 57280429 512-byte logical blocks: (29.3 GB/27.3  GiB)
  [2334597.761895] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [2334597.761900] sdb: detected capacity change from 0 to 57280429
  [2334597.772736]  sdb:
  [2334631.115664] sdb: detected capacity change from 57280429 to 0
  ...SNIP...

A simple grep will only return the matching lines:

  $ dmesg | grep capacity
  [2334597.761900] sdb: detected capacity change from 0 to 57280429
  [2334631.115664] sdb: detected capacity change from 57280429 to 0

But I want all lines:

  $ dmesg | grep --color -e capacity -e ''
  ...SNIP...
  [2334597.539661] sd 1:0:0:0: [sdb] Attached SCSI removable disk
  [2334597.548919] sd 1:0:0:0: [sdb] 57280429 512-byte logical blocks: (29.3 GB/27.3  GiB)
  [2334597.761895] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  *[2334597.761900] sdb: detected capacity change from 0 to 57280429*
  [2334597.772736]  sdb:
  *[2334631.115664] sdb: detected capacity change from 57280429 to 0*
  ...SNIP...
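
A small self-contained way to see the effect without real `dmesg` output. The log lines below are shortened stand-ins, and counting with `-c` replaces the colour highlighting so the difference is visible in plain text:

```shell
# Shortened, hypothetical stand-in for the dmesg output above.
log='sd 1:0:0:0: [sdb] Attached SCSI removable disk
sdb: detected capacity change from 0 to 57280429
 sdb:
sdb: detected capacity change from 57280429 to 0'

# Only lines containing "capacity" are selected.
matching=$(printf '%s\n' "$log" | grep -c capacity)

# The extra empty pattern matches every line, so nothing is dropped.
everything=$(printf '%s\n' "$log" | grep -c -e capacity -e '')

echo "capacity only: $matching, with empty pattern: $everything"
```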

The null trick also works well on directories with many small files, like /proc/ or /sys/. Say, for example, you wanted to get the filename and value of each file:

  $ grep -R '' /sys/module/iwlwifi/parameters/
  /sys/module/iwlwifi/parameters/nvm_file:(null)
  /sys/module/iwlwifi/parameters/debug:0
  /sys/module/iwlwifi/parameters/swcrypto:0
  /sys/module/iwlwifi/parameters/power_save:N
  /sys/module/iwlwifi/parameters/lar_disable:N
  ...SNIP...
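
The same idea can be tried on a throwaway directory instead of /sys. This is only a sketch: the file names and values are invented, and `mktemp -d` is assumed to be available:

```shell
# Build a scratch directory with a few one-line "parameter" files.
dir=$(mktemp -d)
printf '0\n' > "$dir/debug"
printf 'N\n' > "$dir/power_save"

# The empty pattern matches every line, and -R prefixes each line with its path.
# sort is only there to make the output order predictable.
out=$(grep -R '' "$dir" | sort)
printf '%s\n' "$out"

rm -rf "$dir"
```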

addingnumbers 4 years ago | |

I always used "-e ^" the same way you're using a null string, to show all lines of the files, each prefixed with the path and filename. Are they equivalent or is there a caveat I should watch out for?

likpok 4 years ago | | |

Zsh (and possibly other shells?) will expand a raw ^ into filenames. '' is a little shorter than '^' if you have to quote it.

throwawayboise 4 years ago | |

Since you're probably (or at least often) pipelining this into "less" assuming there is more than one screenful of lines, why not just use the regexp searching/highlighting built into less?

pjungwir 4 years ago | |

I have this in my ~/bin, but I like your version even better!:

    grep --color "$1"'\|$'
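
A sketch that combines the two versions into a reusable function. The name `hl` is made up, and `-E` with two `-e` patterns is used in place of the `\|` alternation, which is a GNU extension in basic regexps:

```shell
# Hypothetical helper: reads stdin, highlights matches of $1,
# and keeps every other line because '$' matches the end of any line.
hl() {
    grep --color=auto -E -e "$1" -e '$'
}

# All three input lines survive; only "beta" would be coloured on a terminal.
kept=$(printf '%s\n' alpha beta gamma | hl beta | grep -c '')
echo "$kept"
```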

asicsp 4 years ago |

>The -I flag only considers text files. This radically speeds up recursive greps.

I use ripgrep when I need better speed. I've pretty much switched to ripgrep these days, but still use GNU grep when I'm answering questions on stackoverflow, reddit, etc.

>ABC flags

Good to also know about `--group-separator` and `--no-group-separator` when there are multiple non-contiguous matches. Helps to customize the separator or remove them altogether. Sadly, these options are still not explained in `man grep` on Ubuntu. You'll have to use `info grep` or the online manual to find them.
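
A quick sketch of what those two options change, assuming GNU grep. The input is invented so that the two matches are not adjacent:

```shell
# Five lines; "m1" and "m2" match, and -A1 adds one line of trailing context.
input=$(printf '%s\n' m1 x x x m2)

# Default: non-contiguous groups are separated by a line containing "--".
default_sep=$(printf '%s\n' "$input" | grep -A1 '^m')

# Custom separator string.
custom_sep=$(printf '%s\n' "$input" | grep -A1 --group-separator='====' '^m')

# No separator at all.
no_sep=$(printf '%s\n' "$input" | grep -A1 --no-group-separator '^m')

printf '%s\n' "$custom_sep"
```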

Options I use often that are not mentioned in the article:

* `-c` to count the number of matches

* `-F` for fixed string matching

* `-x` to match whole lines

* `-P` for PCRE (as mentioned in many comments here)

* `--color=auto`, which is part of my `grep` alias, so it is always used
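
To make the difference between the first three options concrete, here is a small sketch on invented data. The variable names are only for illustration:

```shell
data='a.b
axb
a.b.c'

# As a regex, "." matches any character, so all three lines match.
count_regex=$(printf '%s\n' "$data" | grep -c 'a.b')

# -F treats the pattern as a literal string, so "axb" no longer matches.
count_fixed=$(printf '%s\n' "$data" | grep -cF 'a.b')

# -x additionally requires the whole line to match.
count_whole=$(printf '%s\n' "$data" | grep -cFx 'a.b')

echo "$count_regex $count_fixed $count_whole"
```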

I wrote a book as well on "GNU grep and ripgrep": https://github.com/learnbyexample/learn_gnugrep_ripgrep Free to read online.

CalChris 4 years ago |

I have a shell alias/function variations of which I've used for decades. This is the zsh version:

  function fvi { grep -rl $1 . | xargs nvim +/$1 }

It greps a directory recursively and opens files which have a pattern and puts the pattern in the search buffer.

chillpenguin 4 years ago | |

very cool!

tptacek 4 years ago |

Honorable mention for `-q`, which is useful in shell scripts when you don't want the output, just the result code.

cjcampbell 4 years ago | |

I am a big fan of the `if [!] grep -q ...` pattern. I'd probably rank -q near the top of the list for shell scripts.
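
A minimal sketch of that pattern, using an invented config file so it can be run anywhere:

```shell
# Hypothetical config file for the demonstration.
conf=$(mktemp)
printf 'debug=0\nverbose=1\n' > "$conf"

# -q prints nothing; only the exit status (0 = found) drives the branch.
if grep -q '^verbose=1$' "$conf"; then
    verbose=on
else
    verbose=off
fi

# The negated form: act when a line is missing.
if ! grep -q '^color=' "$conf"; then
    color=unset
else
    color=set
fi

rm -f "$conf"
echo "verbose=$verbose color=$color"
```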

fomine3 4 years ago |

Here's how GNU grep detects binary files, good to know sometimes: https://unix.stackexchange.com/a/276028

js2 4 years ago |

In my PATH I have this script as git-gsr, which I can call as "git gsr".

    #!/bin/sh
    
    usage () {
        cat >&2 <<'__USAGE__'
    usage: git gsr [-P | --perl-regexp] <old> <new> [paths...]
    
      replace all occurrences of <old> with <new> optionally limited to
      <paths...> (as interpreted by git grep)
    
      -P, --perl-regexp     interpret <old> as perl regular expression;
                            default is to treat it as a fixed string.
    
    __USAGE__
        exit 1
    }
    
    pattern='-F'
    perl='BEGIN {($old, $new) = (shift, shift)} s/\Q$old\E/$new/g'
    
    case "$1" in
        -P|--perl-regexp)
            shift
            pattern='-P'
            perl='BEGIN {($old, $new) = (shift, shift)} s/$old/$new/g'
            ;;
        -*) usage
            ;;
    esac
    test $# -lt 2 && usage
    old=$1; new=$2; shift; shift
    git grep -l -z $pattern "$old" -- "$@" |
    xargs -0 perl -pi -e "$perl" "$old" "$new"

pavon 4 years ago |

Learning about -o has decreased my use of sed considerably. Where I used to use:

    sed -n 's/.*\(pattern\).*/\1/p'

it can instead simply be:

    grep -o 'pattern'

The -w flag is new to me today - excited to save still more keystrokes!
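
A sketch comparing the two forms on an invented line, plus a quick look at `-w`. One difference worth knowing: when a line has several matches, `grep -o` prints each of them, while the sed form prints only one per line.

```shell
line='error code=42 at startup'

# sed: capture the interesting part and print only that.
via_sed=$(printf '%s\n' "$line" | sed -n 's/.*\(code=[0-9]*\).*/\1/p')

# grep -o: print only the matched part of the line.
via_grep=$(printf '%s\n' "$line" | grep -o 'code=[0-9]*')

# -w matches whole words only, so "concatenate" is not counted.
whole_words=$(printf '%s\n' cat concatenate 'the cat sat' | grep -cw cat)

echo "$via_sed | $via_grep | $whole_words"
```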

aidenn0 4 years ago |

I think I have never used grep -r. I'm sure gnu grep has some way to specify which files to search, but why would I learn that syntax as well when I already know find, and exec works (exec + is much faster, but exec ; gets you the results too if your find lacks exec +).
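
For anyone who has not seen the two `-exec` forms side by side, a small sketch on a scratch directory with invented file names:

```shell
dir=$(mktemp -d)
printf 'needle\n' > "$dir/a.txt"
printf 'hay\n'    > "$dir/b.txt"
printf 'needle\n' > "$dir/c.log"

# "{} +" passes many file names to each grep invocation (fewer processes).
hits_plus=$(find "$dir" -type f -name '*.txt' -exec grep -l needle {} +)

# "{} ;" runs one grep per file: slower, but the result is the same.
hits_semi=$(find "$dir" -type f -name '*.txt' -exec grep -l needle {} \;)

rm -rf "$dir"
printf '%s\n' "$hits_plus"
```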

bloopernova 4 years ago | |

Check out ripgrep

nicholasjarnold 4 years ago | | |

...you beat me to it! RipGrep[0] is among my favorite semi-recently discovered CLI tools.

[0] - https://github.com/BurntSushi/ripgrep#why-should-i-use-ripgr...

aidenn0 4 years ago | | |

I'm aware of rg, ag, and similar tools. I even wrote a clone of ag in shell using find/grep/xargs (the last one being needed to get parallelism to match ag's speed).

throwawayboise 4 years ago | |

Yep, find ... | xargs grep ... is something I use almost daily. I don't use -exec with find, mostly just because I learned the xargs way and it's habit.

throwawayboise 4 years ago |

It's good to be aware that gnu grep has a lot more features than "unix" grep, so if you find yourself on a BSD system a lot of this stuff doesn't work.

simon04 4 years ago |

https://explainshell.com/explain?cmd=grep+-rilIv

cjcampbell 4 years ago | |

And explainshell might be my favorite discovery of the day. I hadn't stumbled on it before now. I think it'll be helpful for folks I mentor/train. I do push everyone toward using the man pages directly, but I can see how this would be less intimidating for a beginner.

jjoonathan 4 years ago |

It's weirdly difficult to get grep to search for fixed binary strings, with lots of gotchas if you don't understand grep internals. I still don't, but this is the best I have been able to do after knocking my forehead on three or four of said gotchas:

    LC_ALL=C grep -larP '\x1A\x2B\x3C\xFF'

wahern 4 years ago | |

I presume -P is a hack so that grep does the job of decoding the escape sequences? It seems your struggle here is mostly with the shell, not grep; specifically, with the fact that normal shell syntax does not recognize escape sequences other than \$, \`, \", \\, and \<newline>. Try this, which uses printf to process escape sequences.

  grep -larF "$(printf '\032\053\074\377')"

The -F flag should also make this faster as it doesn't actually need to use a regular expression engine, let alone a Perl-compatible one.

Caveats:

1) POSIX only requires printf to recognize octal escapes (\nnn, or \0nnn if using %b specifier), not hexadecimal escapes. Many implementations recognize the latter, but not Debian dash.

2) Shell command substitution strips trailing newlines from the output, so if your binary string ends in a newline you'll need to use extra tricks. E.g. S="$(printf '\032\053\074\nX')"; grep -larF "${S%X}"

3) It's probably a good idea to still specify LC_ALL=C, but because the binary string is now being passed through the shell's innards it might need to be set in the environment of the shell itself, not simply the environments of the printf and grep subcommands. (Also, technically I'm not sure if the C/POSIX locale is required to be 8-bit clean, yet, but in practice it will be.)

Bash and some other shells support an extension ($') for expanding escape sequences inline:

  grep -larF $'\x1A\x2B\x3C\xFF'

If you do any amount of shell programming--even if you only stick with Bash--it's worth spending 30 minutes reading the "Shell Command Language" chapter of the POSIX specification: https://pubs.opengroup.org/onlinepubs/9699919799/ The first few sections are the most concise resource available for explaining, step-by-step, shell parsing rules.
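
The printf approach above can be tried end to end with a scratch file. This is a sketch under a few assumptions: the file names are invented, grep supports `-a` and `-r`, and the byte string contains no NUL bytes (shell variables cannot hold them):

```shell
dir=$(mktemp -d)

# One file containing the bytes 0x1A 0x2B 0x3C 0xFF, one plain text file.
printf 'head\032\053\074\377tail\n' > "$dir/blob.bin"
printf 'plain text only\n'          > "$dir/notes.txt"

# Octal escapes are the portable form for printf.
needle=$(printf '\032\053\074\377')

# -F: fixed string, -a: treat binary as text, -l: list files, -r: recurse.
found=$(LC_ALL=C grep -larF "$needle" "$dir")

rm -rf "$dir"
printf '%s\n' "$found"
```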

chaps 4 years ago |

Grep is nice and I've used it daily, but damn does it need multi threading! Especially for recursive greps. I find myself doing this a heck of a lot these days:

  find . -type f -name \*txt | xargs -I{} -P24 bash -c "grep -Hi foo '{}' ; :"
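
A variant of the same idea that survives odd file names, as a sketch. It assumes a find and xargs that support `-print0`, `-0` and `-P` (GNU and BSD do; POSIX does not require them), and the scratch files are invented:

```shell
dir=$(mktemp -d)
printf 'Foo here\n'  > "$dir/one.txt"
printf 'nothing\n'   > "$dir/two.txt"
printf 'foo again\n' > "$dir/three.txt"

# File names travel NUL-separated and reach grep as arguments ("$@")
# instead of being pasted into the command string.
# The trailing ":" hides grep's "no match" status from xargs.
matches=$(find "$dir" -type f -name '*txt' -print0 |
    xargs -0 -n 1 -P 4 sh -c 'grep -Hi foo "$@"; :' sh |
    sort)

rm -rf "$dir"
printf '%s\n' "$matches"
```

The `sort` is only there because parallel workers finish in no particular order.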

milliams 4 years ago |

I have problems remembering the mnemonic for -A and -B. I can't get it straight whether it's "before" and "after" or "above" and "below". I always just try one then the other!

______-_-______ 4 years ago | |

I've never had trouble remembering, but after reading this comment I'm afraid I might start :(

(joking)

VTimofeenko 4 years ago | |

The way I remember this is by keeping in mind that grep processes the area of the search as a stream of discrete lines by looking at each line individually. "Above/below" are concepts on the stream level, and "after/before" are on the line level.

waynesonfire 4 years ago | |

A after, B before, C context (for both)
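
A tiny sketch to check the mnemonic against, using five invented lines:

```shell
lines=$(printf '%s\n' one two MATCH four five)

# -A1: one line After the match.
after=$(printf '%s\n' "$lines" | grep -A1 MATCH)

# -B1: one line Before the match.
before=$(printf '%s\n' "$lines" | grep -B1 MATCH)

# -C1: one line of Context on both sides.
context=$(printf '%s\n' "$lines" | grep -C1 MATCH)

printf '%s\n' "$context"
```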

indigodaddy 4 years ago | |

Same!

dcassett 4 years ago |

I looked for an option to specify a filelist (like ctags -L file) but didn't seem to find one. I'm assuming others are happy with using xargs or `cat file`, but it would seem like a useful feature.
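
For reference, the xargs route can look like the sketch below. The list file name `filelist` is made up, one path per line is assumed, and `xargs -0` is a GNU/BSD extension:

```shell
dir=$(mktemp -d)
printf 'needle\n' > "$dir/a.txt"
printf 'hay\n'    > "$dir/b.txt"
printf 'needle\n' > "$dir/c.txt"

# Hypothetical file list: only a.txt and b.txt should be searched.
printf '%s\n' "$dir/a.txt" "$dir/b.txt" > "$dir/filelist"

# Converting newlines to NULs keeps paths with spaces intact.
hits=$(tr '\n' '\0' < "$dir/filelist" | xargs -0 grep -l needle)

rm -rf "$dir"
printf '%s\n' "$hits"
```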

waynesonfire 4 years ago |

very frequently I want to chain grep things... I lean on the "|" operator for this, e.g. cat hello | grep foo | grep bar and it seems verbose. any tips?

xorcist 4 years ago | |

That kind of match is what regexps were intended for:

  grep "foo.*bar" hello

Unfortunately, the basic grep syntax doesn't give you an easy way of specifying both orders, so that would have to be something like:

  grep -e "foo.*bar" -e "bar.*foo" hello

You can match the terms in either order in a couple of different ways with Perl-compatible regexes, such as a lookahead for each search term anchored at the start of the line. But it's not as easy to read as it should be.
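
The three approaches side by side, as a sketch on invented data. The lookahead form assumes GNU grep with PCRE support:

```shell
data='foo then bar
bar then foo
only foo
neither'

# Chained greps: each stage narrows the previous result.
chained=$(printf '%s\n' "$data" | grep foo | grep -c bar)

# Two patterns, one per order.
two_patterns=$(printf '%s\n' "$data" | grep -c -e 'foo.*bar' -e 'bar.*foo')

# One lookahead per term, both anchored at the start of the line.
lookahead=$(printf '%s\n' "$data" | grep -cP '^(?=.*foo)(?=.*bar)')

echo "$chained $two_patterns $lookahead"
```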

danadam 4 years ago | |

With sed instead of grep, and probably not much better, but in a single command:

  cat hello | sed -n '/foo/{ /bar/{ /baz/ p }}'

ravoori 4 years ago | | |

with awk: <hello awk '/foo/ && /bar/ && /baz/'

BenjiWiebe 4 years ago | |

Keep doing it that way, I'd say. The other replies are ways to do it without chaining grep, but they don't seem any less verbose and definitely are less obvious.

inetknght 4 years ago |

My grep is almost always:

    grep -nRI foo ./

Sometimes I add `-i`

Often I will add `-P` and encase the regex with single-quotes of course

zwieback 4 years ago |

Ah, learned the difference between -i and -I.

I also like --include and --exclude, especially since they take glob patterns for which files to look at

rustyminnow 4 years ago | |

If you need to include/exclude multiple things, you can go `grep pat --exclude={foo,bar,baz}` which expands to: `grep pat --exclude=foo --exclude=bar --exclude=baz`. Much easier than typing out the flag multiple times

zwieback 4 years ago | | |

or even something like --exclude={foo,ba[rz]} , probably getting the syntax wrong
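
A sketch of `--include` and `--exclude` on a scratch directory with invented file names. The flags are written out in full here because brace expansion is a bash/zsh feature and is not available in plain POSIX sh:

```shell
dir=$(mktemp -d)
printf 'pat\n' > "$dir/a.txt"
printf 'pat\n' > "$dir/b.log"
printf 'pat\n' > "$dir/c.bak"

# Only search files whose names match the glob.
only_txt=$(grep -rl --include='*.txt' pat "$dir")

# Skip files matching either glob; in bash or zsh the two flags could be
# produced by brace expansion instead of being typed twice.
no_logs=$(grep -rl --exclude='*.log' --exclude='*.bak' pat "$dir")

rm -rf "$dir"
printf '%s\n' "$only_txt"
```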

declnz 4 years ago |

I feel `grep -E` (at least) should have been the default in retrospect

Living without those few "extensions" feels... empty

notatoad 4 years ago |

i've had this stuck to my office wall for a while, and i've internalized most of it by now but it's still great

https://twitter.com/b0rk/status/991880504805871616

beembeem 4 years ago |

"-Irs/-Iirs --color=always" is my standard set of flags

-l/-h/-v/-o show up every now and then
