Also, when someone claims 1000x better performance I want to know why. For example MySQL or PostgreSQL -> Clickhouse I can clearly attribute to column store, compression, vectorization, parallel execution on multiple CPU cores and machines...
EDIT: shakti.com/k/k.zip is now returning 404.
[1] https://en.wikipedia.org/wiki/Security_through_obscurity
The code does, as well. Either Mr. Whitney's brain is not wired like a regular homo sapiens sapiens, or the entire thing smells of "I am smarter than you and I don't need to lower myself to your level."
I do not buy for a single second that for Mr. Whitney debugging IOCCC-level obfuscated code is easier than plain C code. One writes "normal code" because one will have to read it later, and they don't want to spend ages doing so, unless they have to keep an air of superiority about their abilities to their peers.
I get that APL is obtuse and dense. But writing obtuse and dense C doesn't turn it into APL.
He was superb at finding errors at code review. As in looking through code someone else had written and pulling out the mistakes. Presumably everything looked completely trivial to him, regardless of how tangled the control flow had got.
Whitney may be similar.
Lagniappe: click on the circular red button underneath the comic, to the right of the orange "RANDOM" :-)
Calling it a 'release' is an overstatement. The docs state that it is a work in progress. It's also quite buggy (it's easy to get a segmentation fault). The version I saw in January was about 1/3 the size of this version, and also buggy. I hope that the final version of this code is less buggy and more usable.
If you want to learn the K language, don't use this version. Any of the other open source K projects are better than this (more complete, less buggy, better documented). This project is good if you want to learn more about the Arthur Whitney C coding style, because it is so small. Other projects written in this style (some open source K implementations, the J language) are huge by comparison.
One thing that puzzles me about array languages is that, despite several open source implementations already existing, like J, it's surprisingly difficult to find them packaged in Linux repositories. For example, you can't just "apt install J", or "apt install gnu-apl" on Ubuntu. In J's case, it seems the default is just compiling it from source. Is there something tricky about packaging them?
The closest to a repository-friendly array language I could find was the klongpy implementation of klong[0], that is pip installable.
I also wonder if the commercial version has anti-benchmark clauses like some database vendors. I've always seen claims that K is much faster than anything else out there, but I've never seen an actual independent benchmark with numbers.
Edit: according to https://mlochbaum.github.io/BQN/implementation/kclaims.html, commercial licenses do indeed come with anti-benchmark clauses, which makes it very hard to take the one in this post at face value.
More recently, we have been working on GFQL (graph dataframe query language) with users at places like banks, where we translate down to tools like pandas & cudf. A big "aha" is that columnar operations are great -- not far from what array languages focus on -- and having a static/dynamic query planner to drive optimizations around that helps once you hit memory limits. Eg, dask has dynamic DFS reuse of partitions as part of its work stealing. More SQL-y tools like Spark may make plans like that ahead of time. In contrast, that lands more on the user if they stick with pandas or k, eg, manual tiling.
Kdb/q is like minimalist footwear. But you can run longer and faster with it on. There's a tipping point where you just "get it". It's a fantastic language and platform.
The problem is very few people will pay 100k/month for shakti. I'm not saying people won't pay and it won't be a good business. But if you want widespread adoption you need to create an ecosystem. Open sourcing it is a start. Creating libraries and packages comes after. The mongodb model is the right approach IMO
Is something else better (if so what)?
Quick for evaluating some idea you just had if you are a quant? Yes absolutely!
So imagine you have a massive dataset, and an idea.
For testing out your idea you want that data to be in an “online analytical processing” (OLAP) kind of database. These typically store the data by column not row and other tricks to speed up crunching through reads, trading off write performance etc.
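To make the row-versus-column point concrete, here is a rough sketch in plain Python. The trade records and field names are made up for illustration, and a real OLAP engine adds compression and vectorized execution on top of this layout, so treat it as a picture of the idea rather than how any particular database works.

```python
from array import array

# Hypothetical trade records, row-oriented: one dict per row.
rows = [
    {"sym": "AAA", "price": 10.0, "size": 100},
    {"sym": "BBB", "price": 20.5, "size": 50},
    {"sym": "AAA", "price": 10.5, "size": 200},
]

# The same data, column-oriented: one contiguous sequence per field.
columns = {
    "sym": [r["sym"] for r in rows],
    "price": array("d", (r["price"] for r in rows)),
    "size": array("q", (r["size"] for r in rows)),
}


def total_size_rows(rows):
    # Visits every row object even though only one field is needed.
    return sum(r["size"] for r in rows)


def total_size_columns(columns):
    # Scans a single contiguous array; the other columns are never read.
    return sum(columns["size"])
```

Both functions return the same total; the difference is how much unrelated data each has to walk past to get there.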
There are several big tech choices you could make. Mainstream king is SQL.
Something that was trendy a few years ago in the nosql revolution was to write some scala at the repl.
It is these that K is competing with, and being faster than.
Yes, a very specific implementation will be faster than a generic system which includes network delays and ensures you can handle things larger than your memory in a multi-client system. But the result is meaningless - perl or awk will also be faster here.
If you need a database system, you're not going to replace it with K, if you need super fast in-memory calculations, you're not going to use a database system. Apples and oranges.
I doubt they'd give them out to a random individual or small startup, but maybe still possible for a serious potential customer.
#define _(e) ({e;})
//!< isolate expression e in its own lexical scope and clamp it with ;
//!< note the outer parens, which is a very powerful c trick: they turn _(e) into a so called
//!< r-value, which basically means we can do x=_(e) for as long as e evaluates to or returns
//!< at least anything at all, i.e. not void. this macro is fundamental to k/simple implementation.
I didn't know that corner of C. Removing the () from the macro does change what you can pass as e, and assigning the result of a block does work as one would expect.
edit:
-Wpedantic on gcc will tell me ISO C doesn't like the construct but it still compiles it happily.
Clang offers -Wgnu-statement-expression-from-macro-expansion
So it looks likely that this is the GNU statement expression extension after all and not a part of C. Shame.
The Linux kernel makes quite a bit of use of them as far as I'm aware.
the free(MIT license) version is shakti.com/k/k.zip
the supported version(supported and 10 to 100 times faster)
is $100K per month(minimum)
https://groups.google.com/g/shaktidb/c/5SPufca3mo4
Good joke, though.
For example, I have an old friend from a major investment bank who used to work on an internal (proprietary) pub/sub system but who, these days, works on integrations between that system and Apache Kafka.
Wouldn't work the same way if your core customer base is elementary school teachers.
> K is a proprietary array processing programming language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the foundation for kdb+, an in-memory, column-based database, and other related financial products. The language, originally developed in 1993, is a variant of APL and contains elements of Scheme. Advocates of the language emphasize its speed, facility in handling arrays, and expressive syntax.
There was also a great thread on HN about it as well[2].
And many programming languages do, for example C# and Racket. I have a feeling it isn't very hard to implement, since that's what you typically do if you need complex numbers in Java.
It’s deliberate and powerful.
Here is a recent one: https://news.ycombinator.com/item?id=39026551
There was an epic post some years ago but couldn’t find it now from my phone.
If you'd like an antidote, have a read of Gerald Jay Sussman's books, where you'll see profound concepts from maths and physics captured in succinct and expressive (as opposed to merely terse) code, accompanied by eloquent explanations devoid of boasts or name dropping and provided free of charge online. That will change the way you think about computing too, but it will be a more pleasant experience.
The first one I found was "Structure and Interpretation of Computer Programs"
g(_M,W-=1<<f;xx=M[f];M[f]=x)Ui(M_,Ux=M[i];x?(W+=1<<i,M[i]=xx,x):30>i?_M(i,M_(i+1))+(2*n0<<i):OO())f(_r,pt?x:65535&rx?(--rx,x):e(if(!Tx)n(_r(xU))_M(mx,x-n0);x))f(r_,pt?x:65535&++rx?x:OO())
Edit: Looking at it a bit more, I can't tell if the code is obfuscated or if the author really wrote it like this...
vector the journal of the British APL Association
Perhaps a half way house is sql. The difference between ORM-style CRUD and a power user using window functions to make the data dance shows there is still art to be had in programming :)
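For anyone who hasn't seen the window-function style, a minimal sketch using Python's bundled sqlite3 module (this assumes the underlying SQLite is 3.25 or newer, which is when window functions were added). The `trades` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (sym TEXT, ts INTEGER, size INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 1, 100), ("BBB", 1, 50), ("AAA", 2, 200), ("BBB", 2, 25)],
)

# Running total of size per symbol, ordered by time, in one declarative query.
running = conn.execute(
    """
    SELECT sym, ts,
           SUM(size) OVER (PARTITION BY sym ORDER BY ts) AS running_size
    FROM trades
    ORDER BY sym, ts
    """
).fetchall()
```

The per-symbol running sum here is the kind of thing that takes a loop and some bookkeeping in ORM-style code.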
That being said, I'm not convinced that the extremely minimal syntax is essential. I think it can be done another way ;)
If you're looking for a practical k implementation, I recommend ngn/k, and several other implementations are listed at https://k.miraheze.org/wiki/Running_K .
Oftentimes, the way something is presented and how the language is used, might be as important as the thing itself ;-)
If you want you can switch out his terse names in the .h and .c and see if that helps. I'm not so sure it does, but experience with array languages and a couple of decades with rather advanced C will. As in, experience is what matters rather than "IQ".
Is the terseness of the site meant to reproduce the terseness of the language? Is that the gimmick?
This isn‘t an advocacy piece directed at the general public. You‘re not his audience.
Fortunately, there is secondary commentary, like this thread, so we can get an idea what this is about.
E.g. it's tempting to dismiss Haskell as something invented by mathematicians more concerned with the elegance of their abstractions than actually getting things done, but Pandoc is so undeniably good and useful that you're forced to admit Haskell can be a good choice. What's the Pandoc of K?
//k(c)2024 arthur whitney(l)MIT
Imho software size should reflect complexity of the problem domain. Not arbitrary metrics like say, the capabilities of a system executing it.
So "Hello World!" should weigh in at mere bytes. Not KBs or even MBs.
I mean faster at filtering data than a python script? Sure. Faster than a database or hand-rolled C code? Only if your benchmarks are misleading.
The new version requires ARM 64 or Intel 64 with AVX2. It requires clang-13 (clang-14 and later won't work). Gcc doesn't work.
With clang-14, I got build errors. First error: ./a.h:38:30: error: use of unknown builtin ‘__builtin_ia32_pminub256’ [-Wimplicit-function-declaration]
Seems to be related to this LLVM change which removed the above builtin: https://reviews.llvm.org/D117798
When I replaced __builtin_ia32_pminub256 with __builtin_elementwise_min and ditto for max, then it compiles and apparently works.
https://ktye.github.io/kdoc.htm
https://github.com/ktye/i/releases/download/latest/k.c
IIRC, some old UNIX versions had an APL interpreter in the userland. For me, a k interpreter could be the ultimate UNIX utility. But interoperability with pipes and other UNIX utilities is awkward to say the least, as is having to use other programming languages as duct tape.
It's writing C in array-language style rather than intentional obfuscation.
I can totally believe that he didn't intentionally obfuscate it, but its incomprehensibility made it harder for other people to make a knockoff and that's why it survived and became successful.
Bonus: Go visit and do "View Source" on that website. Even HTML has fragrance of K.
I read a couple of other threads and some people try to claim less code = fewer bugs, but that's pretty clearly nonsense otherwise minifiers would magically fix bugs.
As for why people actually use this (it seems like they really do), my guess would be that it's used for write-only code, similar to regexes.
Like, using a regex you can't deny that you get a lot of power from not many keystrokes, and that's really awesome for stuff you're never going to read again, like searching code in your editor, or throwaway shell commands.
But they also tend to be completely unreadable and error prone and are best avoided in production code.
[0] https://scholarworks.iu.edu/dspace/bitstreams/dcbd5240-8454-...
[1] https://www.bonfire.com/co-dfns-thesis-edition/
(a) come to think of it, theses are one and done
(b) thanks to Kragen for pointing out this 02019 work!
One of these days I need to (a) learn more about the Berber culture, and then (b) write an array language which exploits the ⵜⵉⴼⵉⵏⴰⵖ symbology.
https://en.wikipedia.org/wiki/Tifinagh#/media/File:Tifinagh_...
https://www.win.tue.nl/~aeb/natlang/berber/tifinagh/tifinagh...
https://www.edition-originale.com/media/h-3000-saint-exupery...
Concision is the handmaiden of clarity.
i don't think i'm going to get into the practice of doing it to that extreme, but i'll probably adopt the multiple-things-per-line part for side projects, and skip the shortening of keywords
with a wide monitor, you can have 3 files like this open at a time. you can fit a surprisingly large program on the screen at once, which is the benefit of coding like that
An ultimate expression of insider elitism and knowledge hoarding, which can be a self-interested asset to profitable, closed-source, specialized software.
For everyone else, code should present no surprises whenever possible with semantic expressiveness and reserve comments for explanation of surprises, design choices, and protocols.
There is a real tradeoff between making code friendly to the uninitiated and making code ergonomic for the expert. It's completely natural that non-initiates feel unwelcome when the code is written for domain experts. In your company's codebase, is it really best to optimize for onboarding newcomers over optimizing for the productivity of your engineers? Where along that spectrum maximizes your goals?
Are we building chairs or are we building chair-builders?
I don't know this AW guy, but to me that's a huge red flag and a sign that a programmer hasn't worked on anything substantial. Ie non-trivial stuff that's maintained by a team over time.
Being able to read the code is irrelevant, as the comments should tell you why the code is doing what it's doing.
For example, yeah I trivially can see the code is doing a retry loop trying to create a file with the same name.
That looks like a bug, if you can't create the file with that name, you change the name in the retry loop.
But the comment will tell me this is due to certain virus scanners doing dumb stuff, so we might have to try the same name a few times.
Sure, good code will have few comments as most of it should be self-documenting through good structure and names of classes and variables. But in non-trivial code there will always be places where it is not obvious why the code does what it does.
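A rough sketch of what that looks like in practice. The function name, the retry count and the choice of `PermissionError` are my own assumptions for illustration, not taken from any real codebase:

```python
import time


def create_file_with_retry(path, attempts=3, delay_s=0.1):
    """Create `path` exclusively, retrying with the SAME name on failure."""
    for attempt in range(attempts):
        try:
            # Mode "x" creates the file and fails if it already exists.
            return open(path, "x")
        except PermissionError:
            # WHY the same name: some virus scanners briefly hold a lock on
            # a freshly touched path, so the create can fail transiently.
            # Picking a new name would hide that and leave stray files.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)
```

The loop itself is readable without help; the comment is the only place the reason for not changing the name is recorded.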
Maybe you want to check who he is?
This isn't the case. It's more likely a lack of business socialization combined with individual hyper-achievement. Reminds me of Ian Pratt in some ways.
An annoyance in tech, both startups and corporate, is technically-capable people but with outsized egos.
Sure there might be some substantial software that was written as a one-off (ie not modified after release), but that's the minority by far.
https://newenglandhistoricalsociety.com/the-mordant-humor-of...
Reminds me of:
> Brevity is... wit
(Mr. Lisa Goes to Washington)
In other cases, especially in my sector, it's simply that the customer values lower cost over absolute performance.
Of course, not saying you shouldn't be mindful of introducing an O(n^2) algorithm when an O(n log n) can trivially be used, or similar.
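As a small illustration of the kind of trivial swap I mean (duplicate detection is just an example I picked, and the function names are hypothetical):

```python
def has_duplicates_quadratic(xs):
    # O(n^2): compares every pair of elements.
    n = len(xs)
    return any(xs[i] == xs[j] for i in range(n) for j in range(i + 1, n))


def has_duplicates_sorted(xs):
    # O(n log n): after sorting, equal elements sit next to each other.
    s = sorted(xs)
    return any(a == b for a, b in zip(s, s[1:]))
```

Same answer either way, and the second version costs nothing extra to write.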
You can try klongpy, a K-like array language implementation that runs atop numpy: https://pypi.org/project/klongpy/
(Copied from some forum, since I don't use Python much)
import numpy as np
def geo_mean_overflow(iterable):
    return np.exp(np.log(iterable).mean())
Or,
from statistics import geometric_mean
geometric_mean([1.0, 0.00001, 10000000000.])  # 46.415888336127786
In J, since I don't know K: gm=:#%:*/
Even shorter than Python whether it's a canned lib routine or created from composing simple functions.
And I don't need to format code on HN in J because it's so short anyway, besides I don't know how!
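For readers who don't read J: `#%:*/` is a fork that takes the count-th root (`#` then `%:`) of the product (`*/`). A stdlib-only Python sketch of the same thing, with a log-space variant alongside; the function names are mine:

```python
import math


def gm_product(xs):
    # Mirrors the J fork  # %: */  : the len(xs)-th root of the product.
    return math.prod(xs) ** (1.0 / len(xs))


def gm_logs(xs):
    # Log-space variant: avoids overflow or underflow of the raw product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))
```

The direct product version is the literal translation; the log version is the one to reach for when the product itself might not fit in a float.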
I’m not saying they were right or better. Horses for courses. Array languages do my head in and my choice is sql.
"the only supported atom/vector type is 8bit integer, so beware of overflows"
Still, it's fascinating how an interpreter can be written with such a small amount of code.
From the April github site: "April compiles a subset of the APL programming language into Common Lisp. Leveraging Lisp's powerful macros and numeric processing faculties, it brings APL's expressive potential to bear for Lisp developers. Replace hundreds of lines of number-crunching code with a single line of APL."
https://github.com/phantomics/april
For example, he wrote an early J interpreter this way in 1989. There's also a buddy allocator he wrote at Morgan Stanley that's only about 10 lines of C code.
I'd say the minimal syntax isn't just a gimmick, because it really does help with mentally chunking phrases/idioms to a degree that's not possible when the same phrases are multiple lines long. Terseness also makes it physically faster to write the same thing, which encourages interactive experimentation much more than other languages.
These are small things, but taken together you get an experience that's more than the sum of its parts.
A lot of folks seem to tolerate K syntax because K jobs pay well. (Supposedly. I've never seen a super-high paying K job in real life.) But I actually like the K syntax because it helps organize my problem solving, and it gets out of the way during experimentation time. To me it's like NumPy/Pandas but better designed and without all the ceremonial boilerplate.
I should make a ranked list.
In regards to programming, my top 30 would include:
ScrollSets: https://breckyunits.com/scrollsets.html
RegEx.
The dataflow paradigm as popularized by dplyr would be on there.
HIT ranking: https://breckyunits.com/hits.html
> real-sql(k) is consistently 100 times faster (or more) than redshift, bigquery, snowflake, spark, mongodb, postgres, ..
> same data. same queries. same hardware. anyone can run the scripts.
When talking about speed I was rolling the time taken to write the query and get the query to run with the run time. The total speed depends upon both.
In another thread someone pointed out that being a good quant is also about having the best ideas in the first place.
This is just a general comment on the whole comments section in general that I leave here:
We have this weird situation where the general programming population looks at a small group of the most profitable programmers who are thinking about their domain problems in languages that are a mapping of mathematical symbols to a qwerty keyboard (in the old days there were APL keyboards). And the mainstream programmers say that is so weird that it must be wrong and must be a lie and so on.
Occam’s razor says that those profitable programmers wouldn’t be buying K if the same results could be had at lower TCO, or better results at a higher price.
In broader data engineering there has been tech like “I use scala!” that are used to gatekeep and recognise the in crowd. But that is in the faceless corporate end of enterprise data engineering where people are not measured in bottom lines.
Sorry for venting :)
It doesn't look to me like GP switched to develop and complained of conflation. The switch happened higher up the thread by wood_spirit, and GP just continued the conversation (and called out the tendency to conflate, without calling out a specific person).
On a meta note, I wish this trend of saying "it seems weird" and then calling out some fallacy or error would die. Fallacies are extremely common and not "weird", and it comes off as extremely condescending.
It happens quite frequently on HN (and surely other places, though I don't regularly patronize those). So to be clear, this isn't criticism levelled at you exclusively. (I even include myself as target for this criticism, as I've used the expression previously on HN as well, before I thought more about it).
Firstly, in this case and in most cases where that expression is used, it's actually weird to call it weird[1]. Fallacies, logic errors, and other mistakes are extremely natural to humans. Even with significant training and effort, we still make those mistakes routinely.
Secondly, it seems like it's often used as a veiled ad hominem or insult. It's entirely superfluous to add. In this case you could have just said "you complained about people conflating the two aspects and then conflated them yourself." (It still wouldn't have been correct as GP didn't conflate them, but it would have been more direct and clear).
Thirdly, it comes off as condescending[2]. It's sort of like saying, "whoa dude, we're all normal and don't make mistakes, but you're weird and do make mistakes." In reality, we all do it so it's not weird at all.
[1]: https://www.merriam-webster.com/dictionary/weird
1: of strange or extraordinary character : ODD, FANTASTIC
2: of, relating to, or caused by witchcraft or the supernatural : MAGICAL
[2]: The irony of this is not lost on me. I can definitely see how this comment might also come off as condescending. I don't intend it to be, but it is a ridiculously long comment for such a simple point. It also included a dictionary check, which is also frequently a characteristic of condescending comments. I don't intend it to be condescending, merely reflective and self-analytical, but as is my theme here, we all make mistakes :-)
- Use a language that minimizes the code to write
- Use a helper function, maybe at some runtime cost
- Use a macro, turning a short piece of code into a longer one, at a compile-time cost
Having a DTI (debt-to-income ratio) macro and a very short definition of DTI that looks similar everywhere in code sort of do the same thing.
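The helper-function option might look roughly like this. The name `dti`, its arguments and the zero-income check are all my own choices for the sketch, not a standard definition from any library:

```python
def dti(monthly_debt, monthly_income):
    """Debt-to-income ratio (illustrative helper; signature is assumed)."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return monthly_debt / monthly_income


# Every call site now reads the same way, whatever surrounds it.
applicant_ratio = dti(monthly_debt=1500.0, monthly_income=6000.0)
```

The point is the same in all three approaches: the definition appears once and every use looks identical.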
A brilliant programmer I met who came from DE Shaw said he reimplemented a K-based portfolio optimization pipeline because the performance hit a wall once the dataset got large enough. He was able to beat K with Java of all things.
Columnar and timeseries dbs have continued to evolve, K is the same tech it was in the 2000s. The only reason it gets used at a Millennium is that whatever trade is still printing money, not any tech advantage.
Based on what I've seen before as well as the supplied code, I would be very skeptical to have him on my team.
Data pipeline A is written and maintained by a team in a type safe language with extensive unit tests.
Data pipeline B was written long ago by a scientist who has since left, in sql in a day.
Both compute the same dataset, but B gets the answer correct.
Which is the better pipeline, and why?
If you just care about today, then clearly B is better because it provides the correct result today.
Also, just because A is written in a type-safe language and has extensive unit tests doesn't in itself mean it's any less complex and undecipherable than B.
I can think of several takes, with different assumptions, leading to very different perspectives.
One take could be this:
Let's assume A has been written by a competent team, using good practices. Let's also assume the problem of incorrect answers in A has been known for some time and has been investigated a fair bit. That is, it's not just a trivial bug that's not been caught yet.
Since A doesn't work one could reasonably assume B is complex and difficult to understand, otherwise A's team should be able to find their error based on studying the SQL in B. Otherwise it indicates A's team is not competent, which goes against our previous assumption.
Given that, one could reasonably assume changing B will be very difficult.
Thus if one cares about maintaining and evolving the pipeline due to changing demands over many years, then it's likely A is better, as the bug in A producing the wrong answer should be fixable by a competent team.
Again, just one take of many possible...
An alternate, more trivial take could be that team A were given an incorrect specification. So while they implemented the specification correctly, B actually implements something slightly different.
We see this one with customers all the time. Where they think the old system does X but it does in fact do something slightly different, so when we implement the new system as requested, the customer files a bug report because it doesn't do what the old system actually did.
The disbelief on first encounter is totally reasonable, but from personal experience, once you've gotten past that and invested the time to really grok the array language paradigms, code like this Whitney style actually ends up feeling more readable than whatever features our current SE culture deems Good and Proper.
There are a whole lot of moving parts to the why and how of these ergonomics, so I don't expect to be able to convince anyone in a simple, short comment, but if you're at all interested and able to suspend disbelief a little bit, it's worth watching some of Aaron Hsu's talks to get more of a taste.
(I mean, you could skip the whole hallucination thing and write an exact converter, but that'd be a lot of effort for code that'd probably get used about as much as M-expression to S-expression converters do in the lisp world?)
https://arxiv.org/pdf/1912.05234
Regular expressions can be constructed using various syntaxes. Some are optimized for writing them out quickly, and some are not. Choose the latter when going to production, and you'll be fine.
As for K/J/APL - it's similar. You can write incredibly terse code which works wonderfully in a REPL (console). Working in an interactive manner is the default mode of operation for those languages. If you ever worked with Numpy/Pandas or similar, in a REPL, you likely concocted similarly terse monstrosities. Check your %hist the next time you have a chance. Going to production, you naturally rewrite the code instead of just committing the first one-liner that produces the desired results.
Otherwise, I don't see anything you can do in an array language that you couldn't do in any other language, albeit less verbosely. But I believe in this case a certain amount of verbosity is a feature if you want people to be able to read and understand the code. Array languages and their symbol salad programs are like the modern day equivalent of medieval alchemists writing all their lab notes in a bespoke substitution cipher. Not unbreakable (like modern cryptography) but a significant enough barrier to dissuade all but the most determined investigators.
As an aside, I think the main reason these languages took off among quants is that investing as an industry tends toward the exaltation of extremely talented geniuses. Perhaps unintelligible "secret sauce" code has an added benefit of making industrial espionage more challenging (and of course if a rival firm steals all your code they can arbitrage all of your trades into oblivion).
A few years ago I wrote a pipeline in J and then re-implemented it in R. The J code was exactingly crafted and the R code was naive, but the R code was still faster, easier to read and maintain and, frankly, easier to write. J gives a certain perverse frisson, but beyond that I don't really see the use case.
I've seen systems with this structure. Part of the fun is B's code is likely to have a lot of errors in it that cancel each other out in exciting ways when running on the domain of interest, which makes using it to work out why A's code is failing to correspond to reality much harder than it could be.
This more than anything demonstrates the hothouse-flower mentality of K stans. Quants have long since stopped being the best-paid or most value-generating engineers, and since K has zero application outside of quant, it's no longer even a particularly lucrative skill to acquire.
It's interesting though that the opacity of the "I make more money than you" argument fits so snugly with other unverifiable and outdated claims of K supremacy, like performance, job security, or expressiveness.
That said, I would personally rather do research in python, especially now that the performance situation is reversed.
If I was paid by the hour to write C, then I'd use single letter variables too, but I'm too lazy to do twice the work, when I can make my life simpler.
Simplicity is a virtue, there is nothing interesting about complexity for complexity's sake.
In the words of Terry Davis: https://youtu.be/k0qmkQGqpM8?si=larQzV0Ngdba6vQI
But then the scope grows as the code evolves and suddenly you've got 200 lines with a bunch of single-letter variable names. I'm not a sadist, so I rename the variables before submitting the PR, but there's definitely a flow state where I've been living the code for too long and what makes total sense to me looks like line noise to others, or even future me. (I was real good at perl, back in the day.)
Point being, Arthur Whitney writes like this, even if you and I can't comprehend it, and yes it's obtuse. I wouldn't even want to work with myself if I wrote code like this, but I'm not as smart as Arthur Whitney.
As you say though, simplicity is a virtue. This is simpler for Arthur Whitney, even if it's more complicated for the rest of us.