Most Pressed Keys and Programming Syntaxes (mahdiyusuf.com)
For most other languages the focus seems to be mainly on letters (typing out keywords/identifiers), so in that sense there is little evidence that one language would be easier to write than another.
Additionally, I imagine that some of the code is auto-completed by an IDE, which this analysis fails to account for.
As an aside, it's interesting (to me, as a Colemak user) that all of the hot alphabetic characters in all the analyzed languages are on the home row of the Colemak layout (except L, which seems to be kind of hot in C++).
while the following do not: break, case, continue, default, double, else, enum, extern, register, return, signed, sizeof, typedef, unsigned, volatile
Seems legit.
It can generate keystroke data like this, but the data is stored locally.
(Simple) Example:
PHP:
$myString = sprintf('Total: %.2f', $myFloat);
ObjC:
NSString *myString = [NSString stringWithFormat:@"Total: %.2f", myFloat];
Maybe this is the reason these characters don't score too highly in overall frequency? I have no answer for the high frequency of '/'.
Since these are generated offline, the keyboard heatmaps are meaningless and the representation is slightly misleading IMO.
As mentioned, auto-complete and similar functionality change the heatmap, but that's what people actually press. This data would be a lot better for actual use.
Though I don't mean it as a scold, it wasn't really in the hands of the author to collect such vast amounts of live data, and surely a lot more work than was his intention.
Also, for Lisp, I never touch the closing paren. M-( does both at the same time.
And I can visually see the reasoning behind Colemak being like this, now.
[1] http://softwaremaniacs.org/soft/highlight/en/ http://softwaremaniacs.org/media/soft/highlight/test.html
[2] http://blog.chrislowis.co.uk/2009/01/04/identify-programming...
The reason, of course, is that 4 is also $, which is used to denote a scalar in Perl.
Thus, because 5,6,7 correspond to %,^,&, which generally get used to a lesser degree for things like modulo, hashes, exponentiation and logical-and, they're used less.
The heat map isn't accounting for shift. 5,6,7 also include %,^,&
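To make that concrete, here is a rough sketch of folding shifted symbols back onto the digit key they share. It assumes a US QWERTY number row, and `SHIFTED_TO_BASE` and `key_heat` are names made up for this illustration, not anything from the article's tooling.

```python
from collections import Counter

# Assumption: US QWERTY number row. Other layouts put these symbols elsewhere.
SHIFTED_TO_BASE = dict(zip("!@#$%^&*()", "1234567890"))

def key_heat(text):
    """Count presses per physical number-row key, folding each shifted
    symbol onto the digit key it shares."""
    heat = Counter()
    for ch in text:
        if ch in SHIFTED_TO_BASE:
            heat[SHIFTED_TO_BASE[ch]] += 1   # e.g. '$' lands on the '4' key
        elif ch in "0123456789":
            heat[ch] += 1
    return heat

# A Perl-ish line: every '$' sigil heats up the '4' key.
print(key_heat("my $total = $price % 7;"))
```

With a fold like this, a Perl file full of `$` sigils shows up as a hot 4 key even if the digit 4 itself is rare.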
I only know about this because it was referenced in a book on cryptanalysis. The simplest sort of cipher can be broken by paying attention to the relative frequency of letters in the original text. I remember a useful mnemonic for remembering the most common letters: the sentence "a sin to err" contains them. E, followed by t and a, are the most common out of those (t and a are very close).
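A toy sketch of that idea, assuming the cipher is a plain Caesar shift over lowercase letters and that "e" really is the most common letter in the plaintext (both are assumptions for the example, and the sample sentence is deliberately e-heavy):

```python
from collections import Counter
import string

def caesar(text, shift):
    """Shift lowercase letters by `shift` places; leave everything else alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext):
    """Guess the shift by assuming the most frequent ciphertext letter is 'e'.
    Raises IndexError if the text contains no lowercase letters."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    most_common = counts.most_common(1)[0][0]
    return (ord(most_common) - ord("e")) % 26

plain = "meet me here then see the free trees"
secret = caesar(plain, 3)
shift = guess_shift(secret)
print(shift, caesar(secret, -shift))
```

On short or unusual texts the most frequent letter may not be "e", so a real attack would compare the whole frequency distribution rather than a single letter.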
However, it is in most of my variable names. Given that it's the most common letter in English, that makes sense.
There's the famous phrase "ETAOIN SHRDLU", dating back to printing press days, of the approximate order of the most common letters in English.
Not to be confused with the early AI program "SHRDLU". :-)
It seems the only safe characters across many countries are 0-9a-z.,-\ plus space/tab/return. Not a lot to work with :)
~ $ curl -s 'http://lib.store.yahoo.net/lib/paulgraham/onlisp.lisp' | egrep -o '.' | sort | uniq -c | sort -nr
18641
3277 )
3276 (
2561 e
2159 a
1934 s
1903 r
1672 n
1661 t
1477 l
1380 o
1287 c
1282 d
1067 p
1064 i
1030 m
[...]
Without vowels:
18641
3277 )
3276 (
1934 s
1903 r
1672 n
1661 t
1477 l
1287 c
1282 d
1067 p
1030 m
929 f
852 b
654 ,
601 g
[...]
My first thought: 'unmatched paren?'
But I bet there's a smiley face on that page somewhere.
EDIT: If there is, I don't see it.
"Shouldn't shift be twice as hot as the parentheses?"
For offline analysis? No. Shift should be as hot as the SUM of both types of parentheses. In practice, both parentheses will be equal in count modulo some epsilon for unmatched parentheses in strings and comments. Therefore, shift will be close to twice as hot as either parenthesis.
For online analysis? No. Shifted characters can come into existence without being typed. For parentheses, autocompletion is one way. Automatic bracket matching is another. There are many more, including template expansion, copy/paste, and several paredit operations.
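A quick way to check the offline case on a source file might look like the sketch below. It assumes a US QWERTY layout and charges one Shift press per shifted character, which overcounts if you hold Shift across several characters; `SHIFTED` and `shift_stats` are names invented for the example.

```python
# Assumption: US QWERTY. These are the characters typed with Shift held.
SHIFTED = set('~!@#$%^&*()_+{}|:"<>?') | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def shift_stats(source):
    """Return (open parens, close parens, shifted characters) for a string."""
    opens = source.count("(")
    closes = source.count(")")
    shifted = sum(1 for ch in source if ch in SHIFTED)
    return opens, closes, shifted

code = "(defun add (a b) (+ a b))"
opens, closes, shifted = shift_stats(code)
# Shift is needed at least once per parenthesis, so it can never be
# cooler than the two paren counts added together.
print(opens, closes, shifted)
```

Any other shifted character in the file (here the `+`) pushes Shift above the sum of the two parentheses.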
Shift is used for many combinations besides just parentheses, such as that capital at the start of this sentence. It would likely be more than twice as much.
And furthermore, where are all the numbers, tabs and spaces?
(let ((rpar (get-macro-character #\) )))
Escaped. :)