How does Awk ' a[$0]++' work?(unix.stackexchange.com) |
How does Awk ' a[$0]++' work?(unix.stackexchange.com) |
awk '!a[$0]++'
"removes duplicates while maintaining order. a feat unachieved by uniq or sort -u which either removes only adjacent duplicates or has to break order to remove duplicates."Also, this version[1], which works with gawks from 2013 and after, changes the file in place as it goes, where the original version just makes the array which then needs to be printed/saved:
gawk -i inplace '!a[$0]++' file
(The original version seems to only work with files up to about 2GB.)p.s. The current misspelled title using "Awk", raises another question–I was surprised that "Awk '!a[$0]++'" works too (on Mac at least). If I "ls Awk" in \usr\bin, it says Awk is there. If I "ls awk" it says awk is also there—but it seems they're the same file, and it's only pretending Awk is a file. AWK also... I never noticed that before!
[0] I sympathize—I once tried to submit a !!Con talk.
I presume this is a side-effect of the macOS default filesystem being case-insensitive. I'm running macOS with a case-sensitive fs and that does not work:
$ awk
usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]
$ Awk
-bash: Awk: command not found
This default is a topic of great debate.This is quite a different culture then "young folks" [TM] who write a unit-tested nodejs library with tons of dependencies to do the same ;-)
Of course I'm exaggerating, but there's some wisedom in these onliners, or Perl5 with it's sigils.
This is one of these "it works for me" problems, the main issue is probably actually the lack of documentation. I wonder why quickly rushing wizards don't adopt the dictation technique used by (alway rushing) medicals/physicians. They outsource the transcription to lower paid assistants.