Impending kOS(archive.vector.org.uk) |
Impending kOS(archive.vector.org.uk) |
However, they've been working on a new version of k that's not publicly available yet and I suppose the kparc stuff requires that.
Tristan
When I mentioned to a friend that the current version of K5 was a binary < 100KB, his response was "I don't believe it. I can't write HelloWorld in less than 100 KB!".
Access to more resources does not mean that one should be wasteful.
K benefits from a dedication to avoiding waste and duplication and efficient use of mathematical concepts. The Fundamental New Computing Technologies at VPRI has similar goals (an entire end-user system including "Office" apps in 40 KLOC or less).
Being able to prototype a multi-proc map/reduce algorithm in k with 1000 procs on a laptop with 8 GB RAM is quite nice.
> 500 lines of C code that's meant to change the world? I really doubt it (or, these guys arent using line breaks). There are line breaks, undoubtedly. That said, I'm sure the code is concise - much like k code.
K3 was 1200 lines of code and included the language, windows(GUI), database, IPC, REPL (w/ simple debugger), FFI and OS interaction. The Windows executable was 320 KB.
The real challenge is that, 99% of the time, requirements and integration is what kills you, not raw performance. For the cases where performance (or formal correctness, or whatever) matters, the main challenge is usually to convince the market that it's worth paying for, and then finding the right developer project match.
No shit. I used to work as a quant, and while I was an okay quant and mediocre trader at best, I survived for three years in the industry because of my kdb+ proficiency: the firm I was at spent a couple of million dollars on kdb+ only to find out that most people could not wrap their heads around kdb+ let alone debug it effectively.
My (former) colleagues were definitely smart people. In many ways, they were way smarter than myself. But I somehow could get a much better handle of kdb+'s idiosyncrasies, and my ability to stare at dense k/q code (usually no more than a dozen lines) and figure out what's wrong with it earned me the reputation as the "q guy" - and some level of job security.
The firm eventually phased out kdb+ completely after my boss and I left (the two proponents of kdb+).
Yea, I know all about the training and First Derivative. My employer also hired them.
In their defense, every First Derivative KDB+ consultant that I worked with was very sharp and an excellent teacher. They really knew their stuff, and First Derivative is no small part of what has made KDB+ so successful. However, even with their excellent pedagogy, most of my co-workers were totally lost/weren't willing to apply themselves to learn q/kdb+ well.
Here is another way to think about it: many people can't ever get their heads around certain conceptually difficult topics, say, measure theory or quantum physics. I don't think kdb+ is nearly as hard, but it seemed that way looking at my peers who were no slowpokes.
> ... came back buzzing with excrement
1. kdb+ was (and maybe is) a good solution to the problem that we had: doing complex data manipulation/simple statistical calculations against billions of rows of time series data. Hadoop is the term du jour for data processing, but truth of the matter is that finance doesn't have really huge data. At best, it's a couple of terabytes, and most of the time, you are working with a small subset of it. Running KDB+ on a beefy server or two would usually do the job (rather well).
2. Maybe because I studied math, but I find k/q's vectorial/functional sematics appealing. I think the syntax is horrible, but the semantics is very neat.
3. Finally, because it helped me keep my job. It was rather amazing to me that all these Ph.D. statisticians that I worked with couldn't bring themselves to learn kdb+ effectively. Apparently this stuff can be very hard for even the smartest people (or maybe they thought it was such a niche skill with a low ROI).
That's what irritated me most.
What I'd like to understand is - what led the author to this particular conclusion? Is it the fact that this language is super expressive and concise? Is it that it routinely [1] outperforms its C counterparts even if it ultimately translates to C? Is the Z graphical interface so superior that it'll blow the pants off Cocoa and Quartz and X.org or Wayland or what have you? Why would one rewrite emacs or vim on it? I don't want some basic 4 line text editor - I would like to be productive. Why would Mozilla spend energy porting firefox to it? Or Google, chrome? Or bash?
Simply talking about the history of K/kdb+ and how brilliant its creator is simply doesn't help the reader understand why they should be excited about it. If that was the intention of this article, then the real points to make should've started after that line.
That would've been much more interesting.
[1] - No pun intended, of course
Btw: k doesn't translate to C. It's actually a quite simple interpreter. The fact that it outperforms other languages so easily should be saying more about those languages than it should be saying anything about k.
As I commented above, the syntax is horrible. But I still think it is an effective tool for certain problems.
And to help calibrate, what are your preferred languages and styles?
The code is, well, not the easiest to understand.
If you've worked with such things for your day job, exposure to an APL language is mind blowing in the same way as exposure to Lisp is. You'll rapidly find out that an awful lot of the numerics world is an ad-hoc reinvention of an APL language. Leading thinkers in the numerics world have noticed. Have a look at the Tensor type inside Torch7, or -idx- class in Lush (the same thing): they are, in fact, a sort of APL with a more conventional, aka painfully wordy, notation.
Writing an editor in an array language seems crazy, but then, writing a parallel processing system in a language that was designed to run applications in your web browser also seems crazy. If people had stuck with APL style languages, well, databases, particularly distributed databases (Kx and 1010, both K based systems, scale to Pentascale, and have for a long time), would suck less, as would CUDA programming. Their revival could make life easier in these problem domains.
The download link at http://www.kparc.com/ asks for password, so I'm not sure whats going on with that.
Looking at his code on http://www.kparc.com/edit.k I'd like to disagree with that statement
atom (int, float, char, date, symbol, ...)
list (one dimensional array of atoms, dicts, flips or lists)
dict (a map from one list to another)
There's also a flip, which exchanges the first two indexes applied to an item (so, e.g., it effectively transposes a list of lists) but it is just sugar (both syntactic and semantic).
You can trust Whitney that all of these are properly implemented, including appends.
It's not often that you actually need more. I've discovered this after using K for a while, and going back to python.
Back in my pre-K (ha!) C++ and Python day, I had an awful lot of classes everywhere. After using K for a while, my Python and C both have much much fewer (structs in C more often than python, as C is missing python's dict). And the code has gotten much shorter and more efficient. Arguably, more readable as well. And I've essentially dropped C++ for C, because the extra complexity is just not worth it.
But the actual meaning of the program is still lost on me. I can only guess it has something to do with parsing files (note the checks for curly braces). Feeding it its own source code produces some output, but I have no idea what it actually modified.
I believe the audience is developers using K in a commercial/production environment.
The biggest differences, compared to Arthur's style of writing, are: * Less code on each line * A separate comment column on each line * Nominal use of spaces for readability
We also use Forth which I think is a really another way of shifting mind.
Your code in whatever language you work will benefit from that rethinking.
The closest I could find is this [1] but "The model is expressed in SHARP APL", so from the start, it's circular.
https://www.youtube.com/watch?v=VSJpJt3c11c
It goes form solving some Euler problems to a full-blown web app in J.
Coming back to the question, for J its vocabulary on jsoftware.com is a good resource.
An OS this small is incredibly exciting to me.
This summer, Pierre and I got kOS to boot directly into g (the graphical interface; formally called z) with ISR, keymap, modesetting, basic filesystem, etc weighing in around 100 lines of C. That was pretty exciting. Could probably be done with less with some deeper changes to Arthur's code, but it's still very useful to run k under Linux. Oleg made a silly little game in kOS.
Arthur and Oleg did some performance benchmarks staging k against q (current kdb+), Postgres, some "popular RDBMS" (that I can't name), and MongoDB. It was impressive that k is so much faster than q, but it also really underscores the cost of the wrong data structure (and how hard it is to get the right one with SQL or MongoDB).
I realize you have a thriving commercial software company and that's cool. But...wow. This is exactly the sort of thing Alan Kay's team has been working on for the past five years, and you guys seem to be beating them to it, with a completely different approach. It would be pretty amazing to be able to dig into it, find out how the whole system works, and contribute.
"Whitney sent Oleg and Pierre some of the C code he was working on, and notes on a problem he didn’t know how to solve. They emailed back a solution, coded in his style."
Did Pierre and Oleg think their solution out in standard code first and then make it Whitney-like, or did they find themselves thinking in Whitneyese straight away? I imagine their teacher may have noticed Whitney tendencies and that is what led to the original encouragement to make contact.
"Whitney’s strategy was to implement a core of the language – including the bits everyone thought most difficult, the operators and nested arrays – and use that to implement the rest of the language. The core was to be written in self-expanding C. As far as I know, the kdb+ interpreter is built the same way.
Unlike the tall skinny C programs in the textbooks, the code for this interpreter spills sideways across the page. It certainly doesn’t look like C."
This mean that the code is very unreadable C? Like a kind of code-golf?
How replicate it for built a speedy interpreter? And what if I use lua or python instead?
The mainstream says: readable C has function and variable and type names that express meaning, so a function is read like a narrative with verbs, adjectives and nouns. The fact that this narrative scrolls over pages, is unimportant.
The APL/K/J school says: readable C has functions, variables, and types named with single letters, so that the totality of a function is short enough to fit in one glance - preferably, on one line that does not need a scrollbar. The function does exactly what it says, no more and no less; its intent is thus completely clear. To name it descriptively would ruin the ability to grasp the whole thing as a gestalt.
But that how relate to how build a fast interpreter? Faster than C?
However, whats really impressed me with KDB is that you can do so much more with it. In some banks it has effectively become the messaging middleware for connecting hundreds of disparate data sources. In addition to passing messages you get the data storage and analytics tools for free...
Its continued survival is rather a testament to excellency of marketing (as evidenced by this article!) rather than actual merits.
And with the rise of parallel programming we're going to actually switch more to more vector way of describing and solving problems. Just like Lisp ideas are spreading everywhere in modern languages, APL ideas are also fruitful.
There seems to be this strange idea going around that if we just get the right tool, everything else is going to change forever. I see this a lot with people trying to create IDEs that let non-programmers create programs without really knowing how to code.
But the thing is, most people just don't have anything worth coding. The problem isn't that the tools don't exist. They do, even if they aren't perfect. It's that making something that matters isn't an easy thing to do. And no tool can change that.
> todo
> files, procs, tcp/ip, usb, ..
Oh.
The rest of the wiki is quite useful including a references and tutorials for q.
k5 uses operators instead of words like q.
Btw, being able map/reduce w/ 1000 procs on a 8GB Linux vm (using k5) is both useful and fun.
First, get out the reference manual: http://kparc.com/k.txt and we'll do the first couple lines.
The sequence that goes f x applies x to f. this f is unary.
The sequence that goes x f y applies x and y to f. this f is binary (and just labelled verb).
Some things (adverbs) go f a x and apply f in some special way to x.
Last hint: You read this code from left to right (like english). Do not scan it.
Now let's dive in:
c::a$"\n"
This says: c is a view of where a is a newline. That is, if "a" contains a file, then c is the offsets of each newline.A view is a concept in k that is unusual in other languages: When "a" gets updated, then "c" will automatically contain the new values. This is similar to how a cell in Excel can refer to other cells and reflect whatever the value of those cells happens to be.
b::0,1+c
This says: b is a view of zero join one plus c. That is, where "c" contains the offsets of each newline, 1+c would contain the beginning of each new line. We joint zero to the beginning because of course, the beginning of the file is also the beginning of a line. d::(#c),|/-':b
This says: d is the view of the (count of c) joined with the max reduce, of each pairs difference of b. That sounds like a lot, but "each pairs difference of b" (where b is the position of all the new lines) is also the length of each line, and the max reduce of that is the longest line. You might say that "d" is the dimensions of the file. i::x,j-b x:b'j
This says: i is a view of x (?) joined with j (?) minus b (the offset of the beginning of each line) applied to each x which is defined as the bin of j in b.j hasn't been defined yet, but you can see that x is defined later in the definition, and used to the left. This is because K (like APL) executes from right to left just like other programming languages. you write a(b(c(d()))) in C and you're executing the rightmost code first (d) then applying that leftwards. We can do this with anything including definitions.
The other string thing is that we know that b is the offset of the beginning of each line, and yet we're applying something to it. This is because k does not distinguish between function application and array indexing. Think of all the times you write in JavaScript x.map(function(a) {return a[whatever] }) when you'd really just like to write x[whatever] -- k let's you do this. It's a very powerful concept.
On that subject of binning: b'j is going to find the index of the value of b that is smaller or equal to j. Since we remember that b is the offset of the beginning of each line, then if j is an offset in the file, then this will tell us which line it is on(!)
But we don't understand what j is yet; it's the next definition:
j::*|k
This says: j is a view of the last (first reverse) of k. We don't know what k is yet. f:""
This says: f is defined as an empty string. g::a$f
This says: g is a view of the offsets of f (an empty string) in a. Initially this will be null, but other code will assign to f later making g a value.Next line.
s::i&s|i-w-2
This is very straightforward; & is and, and | is or. While excel doesn't let us use a cell in it's own definition, k does: It means the old value of s. So this is literally: the new view of s is i and the old view of s or i minus w (?) minus 2.We don't know what w is yet.
S:{s::0|x&d-w}
This is a function (lambda). x is the first argument. If we called S[5] it would be the same as setting s to zero or 5 and d (dimensions) minus w. Double-colon changes meaning here; it no longer means view, but set-global. px:{S s+w*x}
This requires some knowledge of g/z: http://kparc.com/z.txtNote that px is defining a callback for when the pageup/pagedown routines are called. x will be 1 if we page down and -1 if we page up. It may now become clear that S is a setter of s that checks it somehow. When we understand what w is (later in the program) it will be absolutely clear, but pageup/pagedown are changing s by negative w when pageup, and positive w when pagedown.
wx:{S s+4*x}
Consulting the g/z documentation, we can see this has to do with wheels. Note we modify s again relative to 4 times x; x is -1 when wheelup and 1 when wheeldown. It becomes clear that s is modified by the pageup/pagedown and the mouse wheel. cc:{9'*k_a}
Again: in the documentation, 9' stashes something in the clipboard. cc is a function that takes the first slice of offsets k (we still don't understand) in a (the file). The g/z documentation says that cc is the callback for control-C. This is expected as control-C traditionally copies things to the clipboard. Since the slice of offsets k in a are being saved in the clipboard, we may guess at this point that k will contain the current selection.This process is time consuming, but it is to be expected: Learning english took a while at first, and often required consulting various dictionaries. Eventually you got better at it and could read somewhat quickly.
I don't know if you want to try the next few lines yourself to see if you can get a feel for it, or if you want to try one of the less dense examples:
* http://www.kparc.com/$/view.k * http://www.kparc.com/$/edit.k
... or if you want me to keep going like this, or if you want to ask a few questions. What are your thoughts?
Can you write this editor in one line of less than 6000 chars of javascript (without a "textarea")?
data:text/html, <html contenteditable>
Done :)
I studied Arabic a lot, and Chinese about a year. I cannot speak to Chinese with only one hazy year under my belt, but I can speak to Arabic.
Because Arabic has lots of syntax realized at the morpholgical level, you can encode a whole sentence (subject (with declension inherent and gender variable, verb conjugated (to passive/active, past/present/future, standard/subjunctive) and direct object (declension inherent and gender variable) all in one word as we know the in English.
أضربه (A-dr-b-u; a (I) dr-b (hit) u (him/it): I hit him (present tense)
And that is a super simple example. I have seen much more compicated setences in one word, and even better in two or three. So, I hypothesized Arabic is very, very dense. I think and Russian and others could be considered similar.
However, with this level of density (maybe we argue "compression" from a CS perspective) I noticed books and their translation were routinely about the same length in pages. Never identical mind you, but never something crazy like 50 pages more (I am guessing; it has been a long time since I made such an experiment and would have trouble agreeing with someone on what is significant).
Now, one could hypothesize a shitload about what this means, but computation is realized as the same "stuff" (machine code instructions) in programming languages, where no parallel exists in human language for mapping human language to computaion, as far as I know from my between minor and major courseload in linguistics, specifically computational linguistics. If someone can contradict me, I would LOVE to read about measured cognition and language constructs.
However if I then show you a programming language, a database engine (similar in capability to SQL but around 1000x faster), a graphical desktop with icons, mouse, editors, filesystems, ISR, and so on- and it's still under 400 lines of C code, then maybe it's easier to have this conversation that I think we need to have: That maybe all you programmers have simply been doing it wrong, and there's something fundamentally wrong with the way you and everyone else programs computers.
I think most programmers take so much offence from the thesis: that everyone programs wrongly (or badly) and that a fundamental shift in the way we program could make programming much better, that it's been difficult to actually work on this problem. Just look at all the people who are complaining about the text editor being difficult to read without saying I can't read this, and I want to get better.
And I think it should be obvious: You've built bridges for thousands of years, so you expect you're pretty good at it now, and yet there are still improvements in bridgemaking today; why do you expect programming to be any different? Why do you have "best practices" for programming if you've only been doing it for a few decades and don't really know how to do it yet?
Maybe it's much easier to think we're working on giving non-programmers the ability to program, but we're not: we're trying to make programming suck less.
At the same time, it does sound slightly off. What's the catch? In the text editor example, how capable is it? Often, such claims rely on a trick, like saying "<textarea/>" is a text editor in 11 characters. Or the tiny Haskell quicksort that's actually rather inefficient.
I think I'll try playing with it. How long should a Logo interpreter in K be?
It's that way with religion, paradigms in medicine, and even chemistry/crystallography (see e.g. Shechtman vs Pauling[1]). It's human nature.
It is also worth pointing out that while this specific version of K is new, K itself is 20, and it is mostly purified APL, itself over 60 years old. People have been ignoring this since forever (APL found stellar success in the OR community in the 70s and 80s on its own, in trading floors in the 80s and 90s thanks to Arthur Whitney, but has since mostly disappeared)
[0] http://en.wikipedia.org/wiki/Somebody_Else%27s_Problem#Ficti...
[1] http://www.theguardian.com/science/2013/jan/06/dan-shechtman...
Right, and I'm not saying programming won't improve. It obviously has. No one wants to write a web app in C++ or assembly. What I'm specifically taking issue with is the idea that there is some sort of monumental change out there that is going to enable us to do... what exactly?... well no one can really answer that, but the claim is that it is big and exciting and will change everything.
To me, part of being an engineer is not taking offense when being told there are potentially better ways to do things, but at the same time I can understand why some people jump to that kind of statement since at first glance it seems like a regression of sorts, especially when the industry has a continuing history of trying to lower the entry barrier. Even though underneath the programming concepts may be revolutionary, people may also be quick to balk at it when it's in such a form (hence the first question).
Would you rather do:
IXDCCCLVI * VIIDCCCXLIX
or
9856 * 7849 ?
If it's really similar in capabilities, why not strap an SQL parser on it and sell it? You can do it client-side to avoid wasting performance where it matters. Surely there's money to be made in the database business if you are an order of magnitude faster than everyone else -- let alone three.
I'm trying not to add more words to that paragraph, though I could/should.
+/%#
You can set this up as: avg=: +/%#
or tally =: #
divide =: %
sumall =: +/
How's that for a dsl? Which allows this now: avg=: sumall divide tally
avg 2 4 6
4
EDIT: I forgot to add that these operations, as others in J or APL, work on scalars, vectors and arrays without any special typing or handling. See this presentation for at least the first 5 minutes to see J in action with explanation: http://www.infoq.com/presentations/j-language
The conciseness allows you to start seeing the small patterns in the small expressions just like mathematics, hence bringing clarity and speed of abstraction to your concept manipulations.I seem to remember being asked about one of those array languages and saying no thanks.
Some databases don't manage that situation as well as they should. I.e. - they were developed in an era of small RAM & large disk.
> And if it's not I/O bound, you're not pushing it right. That's exactly the point. k is "pushing it right" while the others don't do nearly as well.
> the new OS probably doesn't natively support 99.9% of existing high performance hardware.
At <500 LOC of ANSI C there's not much to port to new hardware, given a decent C compiler.
Humorously, it looks like this code would have received a poor grade. It meets just about every standard for low quality outlined here: http://web.stanford.edu/class/cs107/landmarks.html, particularly "Fast code which doesn't work quite right." (Due to an off-by-one error, this code fails to properly reconstruct the example text.)
What's your example? It seems to work fine to me on "{all is well}{ell that en}{hat end}{t ends well}"
Writing in a dense fashion facilitates thinking about the solution.
What I was thinking, thanks
However I think there is value in the dense coding style that has nothing to do with it's performance: it's that it reduces bugs and helps you think about the problem.
Thanks for the perspective. Good luck to you all with kOS and others.
np::*((l$[=/k;(*i),0;i]),j)_a;ci:{{J y;x:3:. x;J y;kx x;K(y,j)}[np;j]}
When you type !10 and press control-I it prints the result where the cursor is and selects it. This way you can press backspace to delete the output and change the code easily.How much code should that take? Probably less than that (I didn't know about bin yet), but I think that's part of the learning process.
I think a minimal logo would only take a couple lines.
Some q is readable english - e.g., an expression like
sum price where size>3
is (to the uninitiated) more readable than the equivalent k +/price@&size>3
but that only works for simple stuff. The (idiomatic!) computation of maximum-subarray-sum[0] |/0(0|+)\
becomes max over 0 (0 max +) scan
which is not more readable. And you can drive the point ad absurdum by making it even more verbose: max over zero (zero max plus) scan
The syntax seems like it is what stops you from understanding it because it is the first thing you meet. But it's the semantics that you need to grok, and the syntax just matches them.over(max, scan(0, max(+, 0))
slightly more readable, even if that still gives me a higher-order-headache.
Because, y'know, it's confusing to have symbol salad with a weird mixture of infix and postfix operators, even more so if you have type raising (or the moral equivalent thereof) thrown in for good measure.
I think this is true of most programmers.
While "hello world" type programs tend to be the pedagogical example, kOS demonstrates that the complexity of such a one-screen low-defect program is much higher than people previously thought.
My screen is about 55 lines tall and maybe 170 characters wide. I sometimes shrink or increase the font slightly to make programs fit on screen.
http://code.kx.com/wiki/Startingkdbplus/qlanguage http://www.kx.com/q/d/kdb+.htm http://www.kx.com/q/d/kdb+1.htm
It's a commercial product and I understand kx does pretty well.
The list is subscriber-only, and I apologise that the information isn't very easy to get at.
I'm more interested in smaller devices though (ARM, etc).
> I'm more interested in smaller devices though (ARM, etc).
Yes. I imagine a smartphone with a very low-power processor and a couple orders of magnitude less RAM than the mainstream models could have very impressive battery life.
I have noticed every page I scroll causes a comprehensive loss of around 90%, so in reading something that is 10 pagefuls long, I might only be able to produce a tiny part of the program.
Your milage may vary.
I find not scrolling, and just moving my eyes, I rapidly absorb the program, and I find most bugs just by reading the code. This practice is absolutely impossible for me if I have to scroll very far and made difficult by scrolling at all.
It is for this reason that I find simply counting the actual words to be an excellent estimate of complexity.
By the way: There are several temporary variables in that code; c:: creates a view called "c" which automatically updates whenever the dependent variables on the right side change.
After digging around for a while, I discovered there was no bug. The partner's client code had the auth disabled, and the pervious server was misconfigured to not require auth. All which would not have been a problem if the system just did an "if headers.auth != "Basic ..." - but buried in this forest of stuff, it was overlooked.
It seems that some developers just love their edifices. They build all this "infrastructure", expanding code by an order of magnitude or more. It's considered good and robust and so, so much writing online is dedicated to this pursuit. I think it gives those programmers a feeling of import, as if they're really architecting something, not just pushing a few form fields around.
Even on the line by line basis, it's shocking how they love verbosity. Type inference? Nope, that makes things too compact and hard to read. Higher order functions to wrap up common patterns? Too difficult to understand. I'm not sure if developers simply lack the tiny bit of extra intelligence, or if they've tried it and honestly concluded that overflowing verbosity is the key to readability. Either way, it's sad, and holding back progress slightly.
Why do you think that is?
Is there evidence one way or the other on whether it's better to measure size with, say, number of lines, number of tokens, or number of nodes in a parse tree? or something else?
And token counts don't help as code that insists that each brace must be on its own line detracts from readability. For one thing it pushes the last bit of the function off the bottom of the screen meaning you have to scroll.
A line that is overly complex is eventually get rewritten.
I say this as someone who has written large bodies of code in sigma 5 assembly, Fortran II and IV bliss 36, C, C++, and Lisp. Perhaps more to the point, these days I read large bodies of code measured in millions. Lines of code dictates how long it will take to understand it.
Peter Norvig in paip gives some examples of small code and how it can be exceedingly clear.
avg list -> (sumall list) / (tally list)
I guess you just need to know how the argument you give to avg when you use it distributes over the functions that comprise the expression. The list argument to tally isn't a problem, but why does sumall get it's own copy of it? Or is the execution model something else entirely?1) when you want to calculate f(y, g(y)) , you write (f g) y - this is "hook" of one argument (monadic, in J terms)
2) when you want to calculate f(x, g(y)) , you write x (f g) y - this is "hook" of two arguments (dyadic)
3) when you want to calculate f(g(y), h(y)) , you write (g f h) y - this is monadic "fork"
4) when you need f(g(x, y), h(x, y)) , you use x (g f h) y - dyadic fork
5) when you have a train - say, (a b c d e) x - longer than 3 elements, then you consider rightmost 3 functions (functions are called verbs in J) as a single fork - say, f - and then consider (a b f) x . So (a b c d e) x is
b(a(x), d(c(x), e(x)))
If the train length is even, the last operation becomes hook - so x (a b c d) y is
a(x, c(b(x, y), d(x, y)))
You can express any computation as a sufficiently complex train.
a(x, c(b(y), d(y)))
Roger himself has dismissed [0] hooks as an unfortunate result of J4's myriad train rules, made in the name of tacitable everything, which I lament because for some reason, tacit programming is just so much more satisfying than normally solving the problem.
[0]: http://www.jsoftware.com/jwiki/Essays/Hook%20Conjunction%3F
Would you say that this is 90% of the reason it looks so uncomprehensible?
> The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor.
Sounds like the overhead's acceptable?
.
Edit: the following page (2002) says
> Even though K is an interpreted language, the source code is somewhat compiled internally, but not into abstract machine code like Java or Python. The interpreter can be invoked interactively or non-interactively, and you can also compile a source file down to a binary file for product distribution, if you wish.
sumsquares:{+/a*a*a*a:1+!x}
which sums pow(a,4) for a from 1 to x (its argument), when interpreted, allocates a vector of length x, initializes it to the numbers 0..(x-1), then allocates another vector of length(x), to which it copies the first vector while adding 1 to each element, and calls it a. Then makes another vector of length(x), puts a * a in respective elements, etc. - and finally, sums the elements of the remaining vector.If we can learn from numpy (K equiv) -> numexpr (same work, smarter about allocations) -> numba (jit compiler, avoids allocations), we can expect x2 faster speed in both "upgrades" - about x4 overall.
If that sounds too little speedup for so much effort, you are not giving modern caches enough credit. Almost every single memory access here is in L1, as they are all sequential and perfectly predictable. So it's just 2-5 slower than doing the perfect register allocation with no memory access - but with the cost of ballooning code size (going out of L1 itself ...). Also, interpretation overhead is nearly nonexistent because operation on vectors are coded in tight cache aware C loops.
Overall in the interpreter/compiler tradeoff, K comes out far enough ahead of everything else that it did not make sense for them to invest in a sufficiently smart compiler. In fact, it is only a few months ago that they bothered using SSE/AVX vector instructions for significant speedup. It is so far ahead of the curve (for its use cases) that doing that didn't really give any marketing advantage until recently (and I'm not sure it does now).
In addition, K is written in ANSI C to maintain portability.
I exclude textarea for the same reason dictionaries define something without using the same word and base form.
And the argument is generally meaningless: You may just as well call emacs and say you are using "the operating system's high level functionality".
There is, however, a fundamental difference between codegolfed K and codegolf in essentially any other language:
Codegolfed k uses 1-letter variable names, and drops white spaces (which, unfortunately, is kind of idiomatic). But everything else is idiomatic even for the non-golfers.
Codegolfed JS or C much more often than not is non-idiomatic in order to save all that space.
e.g., compare the K Sudoku solution from http://kparc.com/z/game.k to any other language, and you'll see that the K code is almost a generic solver (the 3, 9 and 10 stand for box size, board size and number of options respectively while setting up the constraints for the solver), whereas every other short solution (that usually come about twice as long to 10 times as long) is extremely specific to the standard sudoku table.
That's not random - idiomatic K solutions tend to be more generic than other languages. The syntax and semantics gently push you towards more general and more efficient solutions compared to most languages, in that the more general (and/or efficient) solution is usually shorter and simpler to code than a specific case.
I know it sounds unbelievable. But you might believe it when you see example after example and realize that it's not just a "one trick pony". And still, most people surround it with a SEP field. I recommend spending the time needed to grok it instead. [Or APL or J; my personal preference is K, but to each his own]
This is gibberish. It fails at the one task of a programming language: to translate something a human can understand into something a computer can understand, with as little confusion as possible.
Much like math and physics, this is an extremely terse notation that is immensely useful to those who practice it, and looks like gibberish to those who have not attempted to study it. You could give the same comment about an article in any area of math whose notation you are unfamiliar with -- and you would be just as equally wrong.
The real question is why is X language so slow?
This is not intended to be glib: but I do not think I can put it more simply than that. X language is slow because it uses lots of library code that it doesn't actually use (to get a friendly interface), it has a lot of redundancy (because of the wrong abstraction level), and because it wastes memory (in order to have an API that links well with others). Or it's slow because it thinks B-Trees are really cool. Or because it has the wrong intrinsics. I don't know.
But I am convinced this is the discussion we need to be having.
The code should provide isomorphic samples from the languages (or implementations of languages) that are being tested. Ideally, the code samples should be idiomatic.
Benchmarks should test the aforementioned code such that performance comparisons can be made. (With some degree of accuracy.)
Analysis should summarize and draw tempered conclusions from the code and benchmarks. Trade offs should be documented.
The Computer Language Benchmarks Game[1] is a first approximation of this. It provides the first two things but has no analysis in prose.
Does such a thing for k exist? Even if it's a very rough blog post somewhere from 5 years ago?
I ask this because there's a lot of lofty claims being made in this thread, and the closest thing to evidence I've seen is, "trust us, it's made a lot of money doing [financial things]. oh, and it can update a million records in a second." Since I don't believe that a free lunch can exist, I am naturally inclined to reach the truth of the matter. I see that k is supposed to be wicked fast---at what cost? Where are the examples? Where is the analysis documenting this?
Incomprehensibility happens in part because all ASCII characters are used - so you have a lot of differently-looking symbols; because [ and { aren't paired with ] and }, and " isn't paired either; because you often have . and : as the second character of built-in entities, so you've got a lot of dots... APL used the symbols invented explicitly for those purposes, but the price - non-ASCII alphabet - was considered too high. So - no, I wouldn't say forks and hooks are the source of 90% of uncomprehensibility.
We've debated the merits of counting tokens before, but I don't recall anyone mentioning a study about it. In real programs—i.e. when you're dealing with idiomatic code as opposed to something designed to game a metric—I doubt that LoC, lexical length, and number of tokens differ much.
In terms of information density per syllable, mandarin wins, with english coming in a close second. When speaking, english usually has more syllables per unit time than mandarin, so english has the highest spoken information density of any language. Japanese is the on the opposite end of the spectrum. Despite having the highest syllabic rate, it has the lowest information density.[1]
For written information density, logographic languages win. This is pretty obvious if you've seen a Chinese or Japanese translation of something familiar, such as a Harry Potter book. They're ludicrously thin.
1. See the figures at the end of this paper: http://www.ddl.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegri...
Like Apple fanbois have "there's an app for that", I love HN moments "Oh I got a citation for that" and for topics I would find very difficult to research at a cursory glance!
Of the seven languages in the study, using 20 specific short texts, that were originally written in English then translated (well?) in other languages.
Since the texts were not explicitly designed for detailed cross-language comparison, they exhibit a rather large variation in length. For instance, the lengths of the 20 English texts range from 62 to 104 syllables. To deal with this variation, each text was matched with its translation in an eighth language, Vietnamese (VI), different from the seven languages of the corpus. This external point of reference was used to normalize the parameters for each text in each language and consequently to facilitate the interpretation by comparison with a mostly isolating language (see below).
It shouldn't be particularly surprising that english comes out ahead. It has a huge vocabulary, tons of phonemes, and makes many parts of speech optional. It lacks tones, but would probably have to sacrifice some phonemes to stay comprehensible.
If we loosen that req, it gets more interesting. I assume Russian will line up with the following.
In Arabic, the default in formal Standard Arabic (not the dialects, that is another can of Bedouin worms) is Verb Subject Object. You can, however, have VSO, SVO, OSV, OVS, depending on context. I think you decline and conugate verbs in Arabic as you would in Russian. So you can probably play with written form, emphasizing different parts as you suggest in a similar way.
Am I way off? That is what I gathered from Russian/USSR republic kids I have befriended over the years. Not sure if that scans.
Iverson received a turing award[1] for his work on this subject.
mss = lambda x: max(scan(lambda a,b: max(0,a+b),0,x))
assuming a definition "scan" (which is like "reduce", except it gives you all intermediate values), an example of which is: def scan(f,x0,x):
r = [x0]
for x1 in x:
x0 = f(x0, x1)
r.append(x0)
return r
Note that the K is idiomatic whereas the python is (arguably) not. Of course, it could be worse; the advantage is that, much like math, the 9-char K version is pattern-matched by your eyes once you are familiar with it, whereas no other version presented here (or in almost any other language) can utilize that feature of your brain. def scan(f, iterator, initial=0):
yield initial
yield from scan(f, iterator, f(initial, next(iterator)))
Even with python2, you could make a non-recursive version that'd be still shorter than your scan and faster. Alternatively, you could pair a coroutine with that reduce function....But always be suspect of your code if you are iterating and appending to a list. Likely there is a much better way.
Second, it is ~10% faster, but that speed difference disappears completely if you eliminate the namespace lookup (that is, add e.g. "o = r.append" before the loop, and call o() instead of r.append() inside the loop). It potentially uses less memory - but not the way you did it (unless Python 3 gained TCO when I wasn't looking. Did it?) - your formulation does not load the call stack, but it does create len(iterator) generators that - until the innermost StopIteration - all need to live somewhere on the heap. recursive solutions without TCO are rarely good enough to replace iteration.
Even if you did it right, it's more efficient, but not significantly so timewise, and slightly easier to use iterators in general, yes. It is mostly space-efficient in general.
I think it is more idiomatic, though - and also Python2 compatible - to just replace references to 'r' with yield in my code, than using the recursive definition you gave above - which is more idiomatic in functional languages, but less in Python (and harder to debug in any language than the iterative version)
If that doesn't work download q/kdb+ from kx.com (Windows/Linux/Mac OS X). It can runk4.
/ k4 implementations of the Bell Labs benchmarks are available http://kx.com/a/k/examples/bell.k
Having learned BASIC, FORTRAN and Pascal, C seemed like line noise - at first. As did PERL. And then k.
Btw, COBOL seemed "too verbose".
Once I actually started writing many k programs and then reading even more of them, I was able to recalibrate for the abstraction/density. I moved my intellectual comfort zone. Ironically, I was already there with mathematics. However, programming languages were different :).
Now, as a result, every time I have to read Java, I suffer from a kind of fatigue - having to read way too much code to glean the writer's intent. I just want them to get to the F'ing point.
N.B. - Mathematical literature/writing went through this same transition during the Renaissance. Equations were described in natural language (not unlike COBOL). A simple polynomial could require a paragraph of text to describe.
I think with a language like k or q, which appears to be purpose-built for certain types of problems, people look at it and get easily confused and discouraged because it's so different from all the more mainstream general-purpose programming languages they're used to. And it's a lot easier to put down something you don't understand than to admit you don't get it, or to spend lots of time learning something that may not be of much use to you. Kinda sucks, but it's often human nature.
The thing is, it's not purpose built, and it doesn't even appear to be if you suspend your disbelief. The only reason you'd think it is purpose built is because "well, it can't be this short if it wasn't purpose built". But if you go over the manual, and find special built operators, please tell us what they are.
e.g., to compute an average, you can use the function avg:{(+/x)%#x} - with the exception of parentheses, every character has an orthogonal function. Similarly, the maximum subarray sum solution mss:|/0(0|+)\ ; and there are many others. And it's not just math stuff - http://nsl.com has lots of other examples of many kinds -- and most importantly -- is an operating system + GUI not general enough?
This doesn't happen very often, but I find the thought comforting.
Comments obviously are not code, so it's reasonable to complain about lack of comments.
You suggested wordcount, I think wordcount is good, so it's reasonable to complain about single letter words rather than descriptive words.
uberalex's suggestion for reformatting wouldn't change the algorithm or speed. It would simply spread operations across more lines. That also seems like a reasonable thing to ask, to me. They can learn your method either way.
Edit: I mean, I'm sure fitting more on the screen is valuable, but people already know how to fit many times as much code onto a screen. They avoid it on purpose for whatever reason.
I think this reason (whatever it happens to be) is probably wrong.
And things like K rarely do.
It uses generator/iteration semantics instead of list semantics. If you wrap the whole thing with a decorator like function that does:
def scan_wrapper(f, x0, x):
return list(scan(f, iter(x), x0)
You get the exact same semantics. For most cases (including the one you cited), the alternate semantics are actually better, more flexible, and avoid requiring a list to be built in the first place.No problem with iteration order side effects either unless your f() somehow invalidates your iterable... and you still have some potential exposure there in your original implementation.
> Second, it is ~10% faster... > It potentially uses less memory...
Yeah, I think you are understating it to say the least. Not only are you using less memory, but you are saving having to rejuggle/resize the list all the time.
> (unless Python 3 gained TCO when I wasn't looking. Did it?)
I guess in a way it sort of did for the case of yield from: 'The iterator is run to exhaustion, during which time it yields and receives values directly to or from the caller of the generator containing the yield from expression (the "delegating generator").'
So, even without full on TCO (which is still possible... I'm not sure if they did it with yield from), you at least have direct pass through from the generator to the caller. Because of iterator semantics, that should mean that each of the generators gets created on an as needed basis and the previous generator should get destroyed right thereafter. It is possible though that it isn't quite doing it right, in which case I'll concede that I'm still allocating an N deep generator stack, but that is still likely to be more memory efficient because it isn't having to reallocate/resize/copy increasingly larger lists throughout the execution.
> recursive solutions without TCO are rarely good enough to replace iteration.
As I mentioned, you can do the recursive solution as well, and it has the advantage of working with old Python. Still simpler and still far more efficient (here it is with extra wrapping to keep the semantics the same):
def scan(f, x0, x):
def scan_helper():
yield x0
for x1 in x:
x0 = f(x0, x)
yield x0
return list(scan_helper())
> It is mostly space-efficient in general.You say that like when doing statistical analysis space-efficiency isn't a concern...
> I think it is more idiomatic, though - and also Python2 compatible - to just replace references to 'r' with yield in my code, than using the recursive definition you gave above - which is more idiomatic in functional languages, but less in Python (and harder to debug in any language than the iterative version)
I was actually mostly getting at using yield instead of list append. I was just trying to express it as tersely as possible, which unsurprisingly became Python 3 and a functional style mechanism.
While I agree that often there is a struggle to understand functional programming, I think in this case it is very idiomatic Python (particularly since they defined "yield from" specifically for cases like this), and the code is very simple, readable, and easier to verify for correctness.
> Yeah, I think you are understating it to say the least.
I actually measured it. It was 10% faster with a call to 'r.append', and within 0.1% with the append lookup hoisted out of the loop, on 1000 external iterations over 50,000 list items, minimum of 3, inconsistent which version was faster. my scanned function was def f(x,y): return max(0,x+y)
> You get the exact same semantics. For most cases (including the one you cited), the alternate semantics are actually better, more flexible, and avoid requiring a list to be built in the first place.
range() on Python3 is not a list, and the original works on it and yet you needed scan_wrapper(). I didn't try to fix it - I just tried to use your version and stumbled on the differences.
> but that is still likely to be more memory efficient because it isn't having to reallocate/resize/copy increasingly larger lists throughout the execution.
Instead you allocate a generator state each time. Surprisingly, it doesn't make a difference in practice (I've measured) - the python lists grow exponentially, so we have e.g. 20 allocations+copies instead of 1,000,000 generator state allocations. I would have expected the list to be faster - but there's no measurable difference.
> As I mentioned, you can do the recursive solution as well, and it has the advantage of working with old Python. Still simpler and still far more efficient (here it is with extra wrapping to keep the semantics the same):
Recursive takes about 10 times more memory. Seriously - no TCO means call stacks are not free. Instead of one object which I stored, you store about 10 in each call frame!
But yes, your latest version is the idiomatic preferred version style, IMO: it's also the most efficient. (And unlike the earlier python3 functional, it doesn't need a helper to fix the INPUT)
> You say that like when doing statistical analysis space-efficiency isn't a concern...
Of course it's a concern. I'm not saying it is not useful, I'm saying it is overrated, especially when compared to recursive solutions, which -- in python -- cost about 10 times more in stack space then you save in list space.
> I was just trying to express it as tersely as possible, which unsurprisingly became Python 3 and a functional style mechanism.
... and taking much more memory in the process, though believing that you are using less. Not blaming you - python overlooks these things. One of K's nice features is that the costs are mostly evident on a cursory look.
> I think in this case it is very idiomatic Python (particularly since they defined "yield from" specifically for cases like this), and the code is very simple, readable, and easier to verify for correctness.
The recursive functional version is speedwise comparable, and spacewise much less efficient than my (admittedly, stupid) version. It is easier to prove formally, but harder to debug with a debugger because of the generator stack (which is a call stack, even though it does not exhaust the C call stack like regular calls do).
Can we agree that your latest scan(), if we dropped the two lines that have "scan_helper" in them, is the most efficient, most idiomatic, most (py2 and py3) compatible and clearest?
Yeah. Actually, I just generally liked your feedback here.
Specifically about graphs, you can look at:
http://nsl.com/papers/order.htm - topological sorting
http://nsl.com/k/tarjan.q - strongly connected components
http://nsl.com/k/loop.q - find loops in graphs
I think in all of these the graph is represented either as a list of edges or a dictionary of node->(list of nodes that it has edges to)
Unary over is the "fixed point"/"converge" adverb, which does
x <- f(x)
until x stabilizes (to within floating point tolerance if it is a float), returns to its first value, or goes through a requested number of iterations.The best example of this that I can think off is the K "flatten" idiom:
,//
read: "concat over, converge". That is, given a general list, it concatenates all its items promoting atoms to one-element lists - thus, flattening one level of the list; And then applies it again and again until there is no further change, thus flattening successive levels of the list.Is this the most efficient way to do this? No! in fact, for an unbalanced one sided list it will do O(n^2) where n is the number of items, with a best (and idiomatic Lisp/Haskell) solution being O(n), although it's usually 100 chars rather than 3.
But the actual code orchestrated by these 3 chars behind the scenes is all tight C loops, so for small n it will beat complex solutions. And it is all of 3 self-describing, easily remembered, easily recognized, easily optimized (if Arthur ever cared ...) characters. If you care about worst case, you can easily code the standard Lisp/Haskell solution just as you would in those languages. See [0] for more.
The underlying computational model fits sequential, parallel, SIMD, and almost every other paradigm much better than all the popular programming languages. Unfortunately, there's a learning curve that puts of most people (and is perhaps insurmountable to some people who have no problem with Python, Java, C or PHP) - it's much more Math-oriented.
[0] http://www.math.bas.bg/bantchev/place/k.html
edit: added [0] link and ref
DFS is then a variation of the more familiar functional style of tackling the problem where you have your end condition (i.e. something that matches what you're looking for) and failing that do something else (typically recursion).
I can't recall enough of the K syntax these days to actually implement that right now though, or if K has TCO.
This is a honest question and I feel like there is some upside to those names I just keep missing. As I said, I'm not fluent in J, but while learning it I wrote and read quite a bit of it, and I only made it through some longer (like, longer than half a line!) examples thanks to a sheet of paper and sheer determination. I often was going through a fairly complicated expression and was starting to see what is it about, only to be stopped by an 'x' or 'c': I then had to go back a couple of lines, read 'x' definition again, and retry parsing that line from the beginning, hoping that I will remember what 'x' is this time. I started taking notes for this reason (it worked quite well I think).
Anyway, you seem to have no problems reading such code, so I figured I'd ask you: why and what is this style of naming good for, and what one needs to do to master it?
When I sit down to a program, I have an idea about what I want to do. I don't have home/end keys on my keyboard and I note that pressing Fn-Left and Fn-Right is annoying because I don't usually push them (I use emacs sometimes). So when I sit down to edit I want to add emacs keys.
I can see at http://kparc.com/z.txt that hx handles home/end, and that cX is control-X, so I think it should be something like:
ca:{hx 0 -1};ce:{hx 0 1}
Now I look at the characters spent: Is it possible I could say this simpler? I don't think so, but I'd love someone to tell me.How about delete? I think what I want to do is select the next character and remove it. So where is the cursor? Well I remembered that hx moves the cursor, so I read it's definition:
hx:{L i+d*x}
Okay, so I remember d is "dimensions" and "i" is something, and L I haven't seen yet. So I go look at L: L:{K@B/0|x&d-1}
Now I have two new words: K and B to look up: B:{c[x]&y+b x}
K:{J(;|\(*k),)[H]x}
Okay, B is straightforward: b is indexed by line, so b x returns the offset of line x. We add this to y (which is the second value of |x&d-1) and take the min of it and the offset for c. I might think at this point B converts the x/y coordinate into a linear offset. But what about K?K is using J and H (but it turns out we only need J):
J:{k::2#0|x&-1+#a}
And here we see the definition of k: x&-1+#a is the smallest of x and the second to last character in the file (a), and the max of that and zero (0|). This reads like "bounds check x to the shape of a". Taking 2 from that simply takes two values and assigns them to k so if x is only one-value k will still have two.k is the cursor selection.
From here, I can make an attempt at implementing delete. My first attempt looked something like:
cd{a::((0,k)_a),(((k+1),(#a)-(k+1)))_a}
Oh! But I've seen a lot of these terms before! Maybe I can do better. I note that kx is documented as the callback for keystrokes http://kparc.com/z.txt and can come up with: cd:{K j+!2;kx""}
This seems about as simple as I can make it. I'd like to see someone do better, but knowing that K is a setter for the cursor selection and j is the offset of the cursor, then j+!2 simply returns offset and offset+1; K will select it and kx"" will remove it.Now maybe if k were called cursorSelection I might have gotten my first cd faster, but I would've missed the opportunity to see how these functions and variables were interconnected and I might not have written the second one.
I noticed however, that not scrolling really helped this exercise. I just moved my eyes around, and jotted a few symbols down on my notepad. I feel like if I had to scroll or switch windows I would not have been able to do this.
As for your last question: What does one need to master it? For now, I would say practice. Try writing a program in as dense a manner as possible. Remove as much redundancy as you can. Read it and re-read it until you feel like it can be no more abstract.
I am working on a much better answer, but I hope that one will do for now.
The shortcut of using descriptive names just doesn't have the same ROI in all kinds of code. The first time I was forced to abandon my descriptive-naming ways was when I started writing finite element solvers. I don't think it's much of a stretch to believe that (some areas of) finance have the same cost/benefit profile. Since this is a desktop programming example it's relatively easy to come up with good descriptive names, so the short names are almost certainly a holdover.
Or it's just a macho thing. There's enough pixie dust floating around this press release that I wouldn't be surprised.
It's bullshit. The letters are not letters, they're symbols. Replace them with JPGs of pokemon if you want. It's just as well if they were ancient hieroglyphs. If you forgot what they all meant, you have to read the code and keep track of what variable name contains what. Or make a note on some paper.
This is why one-letter variable names are an antipattern: when there's no rhyme or reason to what value is associated with what symbol (why is `c` the index of the newline and not `n`?), your brain isn't going to remember it, and you've instantly forgotten what that code does. And everyone after you needs to do the same thing: you need to re-remember how each line of code works each time you work with it. And more importantly, looking at one dense line of code provides no context as to what the code around it does, compared to C or Python where one function can give you a decent idea of what its use is.
It's even more of a nonsense issue because the actual name of the variable is just a number: every one-letter variable name maps to an integer. Why not support proper identifiers and replace them with unique integers at runtime? The same goes for whitespace: it doesn't need to be kept in the program at runtime, but makes the application infinitely more readable because units of logic can be grouped on their own line.
So to answer your question: no, this is not good for anything and no, you shouldn't try to master it because there are much more useful things that you could be doing with your life.
Another reason is the same as in math. When you write equations on a whiteboard - the long-sought golden standard of expressiveness - you don't use long names - at best you encode names with subscripts, indexes etc. But names themselves are usually pretty short. APL languages use the same rationale.
Just like you need to read carefully every line of a math equation to understand what's going on, you have to read carefully each symbol of programs in APL family languages. It's unusual for programmers who got used to more help along the line - but the vocabulary of all such languages is quite short and doesn't extend that often. Another reason why APL could be used for teaching math.
I don't agree that this style of naming is not good for anything. Somehow majority of programmers in these languages agree with that. Regarding much more useful things -
"If you are interested in programming solutions to challenging data processing problems, then the time you invest in learning J will be well spent." (http://jsoftware.com/)
The smooth creation of lists is I think one of the most important language features higher level languages have over lower level languages like C.
Just this thing:
c::a$"\n"
That's all I needed to be convinced that modern languages should have similar view constructs. In ruby it'd be: c = a.map.with_index{|a,i| a == "\n" ? i : nil }.reject{|i| i.nil? }
Quite a mouthful, mainly because Ruby lacks a neat way to do '$'. But doing the same thing in C would really be awkward, and likely not as efficient unless you have some fancy code for building enumerators in C.You obviously have some experience working with K, and it sounds like at least Javascript, too? K is so foreign I expect it has a lot of interesting thoughts locked up in there that maybe don't get the attention they deserve.
Would you say there are any "killer features" of the language / environment that you miss when working with more traditional languages?
K/Q/kdb+ deploys a single executable and some additional k code (Q is written in k) in < 600KB uncompressed. Admin is very simple. Multicore use is trivial Hot code updates/upgrades (interpreter uses new definition on next execution) Code works the same (in most instances) for: * a single file/database * multiple files/tables * column files (for each table) * files on multiple machines (like MapReduce/Hadoop)
with no changes.
K/Q/kdb+ deploys a single executable and some additional k code (Q is written in k) in < 600KB uncompressed. Admin is very simple. Multicore use is trivial Hot code updates/upgrades (interpreter uses new definition on next execution) Code works the same (in most instances) for: * a single file/database * multiple files/tables * column files (for each table) * files on multiple machines (like MapReduce/Hadoop)
with no changes. </pre>
The code is full of these things, that looks conceptually a lot like Perl oneliners (which I don't mean in a negative way). But the trick to coding with these short idioms is to make sure your input is strict in a very specific way that fits your particular problem.
A nontrivial codebase could not be written in this way. A real life editor would for example have to the concept of active parts of files to edit files larger than fits in memory, the ability parse different encodings and line endings without changing the on-disk format unless asked for, and many other things (and this just to implement the bare bones of an editor from the 80s).
This looks a lot like a macro language, with its many ready made functions for accessing ctrl-key sequences, cursor movements etc. Would it really hold up outside its problem domain?
Why? Address space is cheap, and a is simply map'd to the file.
I don't have any text files bigger than 64 bits of address space. Do you?
> the ability parse different encodings and line endings
vi doesn't do this. acme doesn't do this. I would agree it's a popular feature, but I still think it's a mistake: If every program on my computer has to continuously parse and deparse bytes into text, and be aware of all the different things every other operating system calls text, then it seems like a waste; it seems redundant. It's a waste of code, and of memory, and I think any mere editor that "needs" this lacks sufficient composition.
Its actually not at all an unusual concept; its well known from SQL, for instance, but also Scala has them (Scala views aren't views in the sense that you use the term, but the lazily-implemented transformers they implement produce views of the type under discussion) as does Ruby (via Enumerable::Lazy). In fact, while the name "views" isn't exactly commonly used for them outside of SQL (though, as noted, Scala uses the term for the construct through which one obtains them), they are pretty ubiquitous in modern languages.
Unlike SQL, views in k are not limited to queries as dependency expressions. The expressions can be arbitrary.
Doing this generally in Ruby I think is impossible, but you might be able to get close if all your objects are based on ActiveModel::Dirty
Not so much. At least with views over most Enumerables, its quite possible in Ruby -- that's the whole reason that Enumerable::Lazy exists.
The existing File class doesn't quite support it because of the way its iterators are implemented (particularly, they are one way) but the class is easily extended to allow it, e.g.:
class RewindFile < File
def each
rewind
each_char {|x| yield x}
end
end
Then you can create a synced view that has the character positions of the newlines in a file like this: a = RewindFile.new "myfile.txt"
c = a.lazy.each
.with_index
.find_all {|x,_| x=="\n"}
.map {|_,y| y} c = []; a.lines{c << a.pos}
As far as $, you may know it from Regex as the new line indicator. c=[];n=0;a.lines{|x|c<<(n+=x.size)-1}
but $"\n" wasn't special, and this allocates tons of memory. tinco's implementation is much closer to what k is actually doing.The value K provides is the collection of operators like `$` that implement a high level language for the sorts of problems K programmers face.
If you went through and implemented all of those operators in Ruby and only used those instead of things like loops, your code would be "unreadable" to the standard Ruby programmer; essentially you'd be programming in a different language.
module Enumerable
def dollar(c)
map.with_index{|a,i| a == c ? i : nil }.reject{|i| i.nil? }
end
end
And then you could do: c = a.dollar("\n")
There is of course a reason this dollar method is not a part of the standard library. Its name makes no sense and it's oddly specific, how often would you want the indexes of matches to a character? Most modern languages don't like to work with indexes a lot, and in my day to day work I don't need indexes very frequently either. This I guess is just a thing that these finance/apl people do more often, so they have a specialized standard library.(So when I said 'a neat way to do $' I meant it has no find_all_with_index method, which would make my implementation much cleaner)
.find_all.with_index { |item, index| ... }What I think is being sought is more like:
module Enumerable
def find_indices
each.with_index
.find_all {|x,_| yield x}
.map {|_,y| y}
end
end