JavaScript for Data Science (js4ds.org)
We will cover:
Core features of modern JavaScript
Programming with callbacks and promises
Creating objects and classes
Writing HTML and CSS
Creating interactive pages with React
Building data services
Testing
Data visualization
Combining everything to create a three-tier web application
- This isn't data science. Like, it's all the stuff "for my data science", such as making a visualization website, etc.
Publishers should just offer a free e-book of said language, and make it a requirement.
That sentence from the book clarifies a lot for me. It is JavaScript for Data Science People. Taken in that context, this is an excellent book, written with empathy for the data science user who is usually making uneasy excursions into JavaScript, which they hope and pray are only temporary, and running back to Python the first time they encounter a Promise or a callback.
Compare this with: "Data Wrangling with JavaScript" [1]
[1] https://www.amazon.de/Data-Wrangling-JavaScript-Ashley-Davis...
A better name would be “JS for data scientists”
I’d suggest that there are 3 important primitives for data science: flexible numeric types, fast math/algorithm libraries, and data manipulation being easy.
JavaScript doesn’t really have any of these. Numbers are 64-bit floats only - no integers, no big numbers. There aren’t equivalents to Numpy/Pandas/Scikit Learn, and the lack of a standard library and of expressiveness in data manipulation in the language makes basic tasks harder.
JavaScript has its uses, but there’s really no reason to force data science to be one of them.
There is a lot of great work happening in this space already. In the R world for example, shiny makes heavy use of js to the point that you often can't tell where R code ends and javascript begins. Plotly's Dash provides bindings for R, Python, and Julia. Personally, as a data scientist, I have been excitedly learning React because it really rips the landscape wide open for all the use cases I mentioned above. It then makes sense to have libraries that give JS users a good data model and can do most of the same numerical computation that we'd be doing in other languages. Again, you probably don't want to do serious numerical work in js, but remember people said that about Python ten years ago too.
I love the framing of this book, because I want more data scientists to start thinking about the presentation of data and spark some bits of ingenuity to make datasets and model outputs accessible to non-data scientists. Data scientists should be the ones writing the tools that interface data with humans because of their domain knowledge. But this is a different skillset and usually the work of SW engineers. Of course engineers can also have great data intuition too, but I really do encourage data scientists to develop their front end skills, it's well worth it.
Python definitely has some problems, but if you were going to have a new lingua franca for data science, it would probably be something like Julia, certainly not JavaScript.
So it seems a bit weird to me that they advocate using arrow function syntax instead of the regular syntax. They seem to be advocating using the new class syntax instead, so I guess they don't need the constructor or method creation features of the normal syntax, but I still don't see why they would specifically advocate for arrow function syntax. Is it faster? They say it interferes with other features, but which features?
[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
FWIW I personally like the function keyword, since it's clear what it is to non-JS readers, but primarily because it hoists to the top of its file, so unimportant utility functions can sit unobtrusively at the end of the file, thereby letting readers encounter more important logic earlier in the file.
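To make the hoisting point concrete, here is a minimal sketch. The names `summarize` and `summarizeArrow` are made up for illustration; the point is only that a `function` declaration can be called above where it is written, while an arrow function assigned to a `const` cannot.

```javascript
// The "important" logic comes first and calls a helper defined further down.
const result = summarize([1, 2, 3, 4]);

// An arrow function bound with const is not usable before its definition:
// touching the binding early throws a ReferenceError.
let earlyCallFailed = false;
try {
  summarizeArrow([1, 2, 3]);
} catch (err) {
  earlyCallFailed = err instanceof ReferenceError;
}

// Hypothetical utility kept at the bottom of the file. Function declarations
// are hoisted together with their bodies, so the call above works.
function summarize(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  return { total, mean: total / values.length };
}

// Same idea written as an arrow function; only usable from this line onward.
const summarizeArrow = (values) => values.length;
```

So the choice is partly about file layout: declarations let helpers sit at the end, arrow functions force definition before use.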
side-note: Most of my JS experience is writing userscripts for myself, so I definitely do my share of 'this' shenanigans.
Even Python isn't really used for data science: all the heavy lifting is done in C/Fortran, and Python is just the glue.
I can't find any benchmark of how they compare to data.table or pandas.
Without a dominant and high performance data frame library as a foundation, I wouldn't even try.
Personally, I'm excited to build apps that don't require cloud computing and if they do, have access to one of the largest software engineering libraries through NPM. Sure, I'm not doing just Data Science in JavaScript but rather building apps that use DS/ML/AI, but that's still a valid use case. The alternative would be to use Python for prototyping then rewrite for production apps.
https://github.com/nicolaspanel/numjs
https://www.npmjs.com/package/fast-math
https://smartbear.com/de/blog/2013/four-serious-math-librari...
That's not the problem. The problem is mindshare and network effects. When analyzing why Python is used one way and JS another we're tempted to retroactively rationalize this with something fundamental about the language. There's nothing fundamental about it. It's just happenstance. Python was around longer as a general purpose script, and it filled that niche. JS is relatively new as a script outside the browser.
https://github.com/nicolaspanel/numjs/graphs/contributors
I sincerely believe it is possible for JavaScript to be a viable language ecosystem, but there is dire need for cohesion, collaboration, and longevity. As it stands, there are so many potentially viable projects strewn across the NPM landscape like old, discarded toys.
I'm not aware of an initiative, let alone ethos, in the JS community that comes anywhere close to something like NumFocus.
I'm not suggesting these are the first tools you'd reach for for data science in production, but I've found them extremely useful for prototyping, experimenting with algorithms, and visualization. I think it's got to the stage they should be seriously considered for some types of relatively simple data processing work due to their ease of deployment.
[0] https://github.com/uwdata/arquero
[1] https://observablehq.com/
* tensorflowjs, which runs on GPUs https://www.tensorflow.org/js and
* danfo, which aims to be a pandas equivalent for JS: https://danfo.jsdata.org/
Given the powerful interactive visualisation capabilities available in JS, it's only a matter of time until JS becomes a serious contender IMO.
When you’re writing data science code, the value is in the answer more than the process of getting to that answer. Anything that complicates that gets in the way. This is why things like Pandas are so popular despite having some questionable engineering. Using a library for big number support, having to get that to play nicely with other libraries, it all goes against the aims.
Now for data engineering it’s very different. I wouldn’t choose JS myself, but it’s a much more reasonable choice. For engineering the process by which you get the answer matters far more - is it scalable, testable, repeatable, etc. Having to use a library for big number support is fine.
It’s two very different ways of working and I’m still fairly convinced that JS is not conducive to the former.
Is that generally true for data science type tasks, though, where the "fast" in python is really numpy, pandas, etc?
>or can use bindings just like Python
But there's not really anything like numpy/pandas for it to bind to at the moment, is there? Meaning anything as broad in functionality, fast, mature, etc.
both of those libraries are for arbitrary precision decimals, not floats.
NodeJS evented architecture is great for multitasking training (and prediction) jobs. I use Node Streams to extract and process data flows out of several data sources on my Macbook Air M1 using the new Neural Engine 16-core chip to train CNN models with excellent results.
Data prediction then runs on a ReactJS app, which gives my users a way to model, transform and visualize data on a browser. Everything is in Typescript, which reduces cognitive overload for our programmers and produces good end-to-end duck-type coherence and easy, integrated testing.
Now, most data science libs for JavaScript are either in their infancy, are proofs of concept, or are just abandoned, but TFJS is solid, and if you know what, why and to which extent you're using JS for data science, then it's absolutely fine.
JavaScript does have integers (e.g. `Uint8Array`) and it also has big numbers (e.g. `BigInt`). It's true that there's not yet an equivalent to Numpy/Pandas/Scikit, but POCs show that it will be possible to create such a thing and that we will be able to use the WebGPU API to access higher performance than is available using Python [2].
I'm not saying that it will definitely happen, but why not?
[1] http://benschmidt.org/post/2020-01-15/2020-01-15-webgpu/
That is not true. BigInt has been available for a bit already.
MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
Availability: https://caniuse.com/bigint
I don't want to argue for or against using JS for "data science" (I myself used R for that but I use JS a lot for other things), just a clarification on this one concrete claim.
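For anyone who wants to check the numeric claims in this sub-thread, a small sketch using only built-ins (no library assumed). It shows where plain numbers stop being exact, and what `BigInt` and the typed arrays give you instead. It says nothing about speed.

```javascript
// Plain numbers are IEEE 754 doubles: integers are exact only up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER;

// Past that limit, distinct integers collapse onto the same double.
const losesPrecision = maxSafe + 1 === maxSafe + 2;

// BigInt keeps integers of arbitrary size exact.
const big = BigInt(maxSafe) + 2n;

// Typed arrays store fixed-width integers; a Uint8Array wraps modulo 256.
const bytes = new Uint8Array([250, 251]);
bytes[0] += 10; // 260 does not fit in 8 bits, so it wraps around

// BigInt64Array stores signed 64-bit integers, but its elements are BigInts,
// not plain numbers, so ordinary number arithmetic does not mix with them.
const wide = new BigInt64Array(1);
wide[0] = 2n ** 62n;
```

So fixed-width storage exists, but 64-bit values still go through `BigInt`, which is the part the performance complaint is about.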
performance-wise, BigInts are terrible. Tried to use them, made things about a hundred times slower. What JS needs are 64 bit integer types, and some form of typing system that allows differentiating between various number types.
There is absolutely nothing wrong with coders/analysts/scientists building solutions in any language. The "hate" that you mention -- and then proceed to echo -- is a narrow way of asserting the superiority of $mylanguage and the inferiority of $yourlanguage.
> flexible numeric types, fast math/algorithm libraries, and data manipulation
Your point b) is usually written in a performant, compiled language, and your point c) can be built from robust primitives in any language. However, I will add a point d) about speed and memory usage.
I do data analysis with the simplest set of performant tools: sqlite, bash-awk-sed-grep, Perl, Python, C++, SVG, and a browser to render. Any kind of glorified REPL beyond a terminal creates fragile complexity and dependency Hell.
My kit doesn't include Node.js or ECMAscript but I'm willing to open my mind enough to think it might, one day. The current tooling for data analysis (or "data science" if we want to be faddish) is a mess and I look forward to better tools in the future.
Take the following example, which is a normal class method:
> alertSum() { alert(this.a + this.b); }
And here we have an arrow function used to create an instance method (just an arrow function assigned to a property on the instance):
> alertSum = () => { alert(this.a + this.b); }
Then let's say we want to pass the method directly as callback:
> this.button.addEventListener('click', this.alertSum)
The first example (class method syntax) won't have the necessary `this` context unless it has its context bound to the instance through `Function.prototype.bind`. There are other patterns to avoid this (e.g. wrapping all callbacks in arrow functions when passing them), but it's useful to consider that class methods can easily create confusion because that's _exactly where_ someone more used to a different language may assume the `this` context is bound lexically.
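A runnable sketch of the difference, with some assumptions: the class `Summer` and the helper `invoke` are made up, and `invoke` is only a stand-in for `addEventListener` that calls the callback with no receiver. (A real DOM listener would be called with `this` set to the element, so the detached method would compute `NaN` rather than throw.) The methods return the sum instead of calling `alert` so it runs outside a browser.

```javascript
class Summer {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }

  // Prototype method: `this` is decided by how the function is called.
  sumMethod() {
    return this.a + this.b;
  }

  // Class field holding an arrow function: `this` is fixed to the instance.
  sumArrow = () => this.a + this.b;
}

const s = new Summer(2, 3);

// Hypothetical stand-in for an API that takes a callback and calls it bare.
const invoke = (callback) => callback();

// Passing the method directly loses `this`. Class bodies are strict mode,
// so `this` is undefined inside and reading `this.a` throws a TypeError.
let detachedFailed = false;
try {
  invoke(s.sumMethod);
} catch (err) {
  detachedFailed = err instanceof TypeError;
}

// Three ways to keep the instance attached.
const viaArrow = invoke(s.sumArrow);
const viaBind = invoke(s.sumMethod.bind(s));
const viaWrapper = invoke(() => s.sumMethod());
```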
Edit: I was confused about how this could work, so I dug through [1] for a bit. It appears that for each object of that class created, an arrow function will be created on that object and its this will indeed be bound to the same scope that the constructor function is bound to. This is really clever and I applaud whoever thought it up!
It is interesting to note that this creates a new arrow function on each object as opposed to the normal definitions which create a single function which is stored in the prototype of the class. (It's easier to check this in a browser's dev console than it is to decode the spec.)
This would suggest that one should use different approaches for different types of objects: It makes sense to use arrow functions for "resource" or "actor" objects, of which there are few but they may have callback functions. It makes sense to use normal method definitions for "plain old data", of which there may be many, (which would make the arrow functions too expensive) but they should not have callback functions.
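The dev-console check mentioned above looks roughly like this. `Point` is a made-up class for illustration; the comparisons only use standard language behavior.

```javascript
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  // Normal method: one function object, stored on Point.prototype.
  norm() {
    return Math.hypot(this.x, this.y);
  }

  // Arrow function in a class field: a fresh function for every instance.
  normArrow = () => Math.hypot(this.x, this.y);
}

const p = new Point(3, 4);
const q = new Point(6, 8);

// Both instances see the very same prototype method...
const sharedMethod = p.norm === q.norm;

// ...but each instance carries its own arrow function.
const sharedArrow = p.normArrow === q.normArrow;

// Where each one lives.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const arrowIsOwnProperty = hasOwn(p, "normArrow");
const methodIsOwnProperty = hasOwn(p, "norm");
const methodOnPrototype = hasOwn(Point.prototype, "norm");
```

With many "plain old data" instances, that per-instance function is the extra cost being described.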
Not really. It's contortionist and wasteful and one of the many reasons why mainstream web apps are one big celebration of bloat on a boat.
The neophyte programmers who have turned into expert Modern JS programmers are always recommending arrow functions like this because they've never actually looked at the event listener interface. What happens is they try to make things more complicated than they need to be and bodge their event registration. So they apply a "fix" by doing what they do with everything else: layering on even more. "What we need," they say, "are arrow functions."
No.
Go the other way. Approach it more sensibly. You'll end up with a fix that is shorter than the answer that the cargo cult NPM/GitHub/Twitter programmers give. It's familiar to anyone coming from a world with interfaces as a language-level construct and therefore knows to go look at the interface definition of the interface that you're trying to implement.
Make your line for registering an event listener look like this: `this.button.addEventListener("click", this)`, and change the name of your `alertSum` method to `handleEvent`. (Read it aloud. The object that we're dealing with (`this`) is something that we need to be able to respond to clicks, so we have it listen for them. Gee, what a concept.) In other words, the real fix is to make sure that the thing we're passing in to `addEventListener` is... actually an event listener.
This goes over 90% of frontend developers' heads (and even showing them this leads to them crying foul in some way; I've seen them try to BS their way through the embarrassment before) because most of the codebases they learned from were written by other people who, like themselves, only barely knew what they were doing. Get enough people taking this monkey-see-monkey-do approach, and from there you get "idioms" and "best practices" (no matter whether they were even "good" in the first place, let alone best).
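For reference, a minimal sketch of the `handleEvent` registration described above. Assumptions: the class name `SumButton` and the `lastSum` field are made up, the sum is recorded rather than passed to `alert`, and a bare `EventTarget` stands in for the DOM button so the sketch also runs in a recent Node.js, where `EventTarget` and `Event` are globals.

```javascript
class SumButton {
  constructor(button, a, b) {
    this.button = button;
    this.a = a;
    this.b = b;
    this.lastSum = null;
    // Register the object itself as the listener: no bind, no arrow function.
    this.button.addEventListener("click", this);
  }

  // Called by the event system with `this` set to this object, because the
  // object (not a detached function) is what was registered.
  handleEvent(event) {
    if (event.type === "click") {
      this.lastSum = this.a + this.b;
    }
  }
}

// Stand-in for a real button element.
const fakeButton = new EventTarget();
const widget = new SumButton(fakeButton, 2, 3);

// Dispatch is synchronous, so lastSum is set once this call returns.
fakeButton.dispatchEvent(new Event("click"));
```

Removing the listener later takes the same two arguments, `removeEventListener("click", this)`, with no saved function reference needed.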
---
As I understand the interface, practical use of the EventListener interface boils down to the implementer performing a form of event delegation, where you'd wind up delegating from a single `handleEvent()` method on a class to handle different event types and/or events with different target elements -- as opposed to a single click handler in a simple button click example. I'd love to understand and quantify the benefit of this. If it's a significant improvement, it'd be doubly unfortunate, as many callback-based interfaces in JavaScript APIs and libraries at a higher level don't support such a model.
Assuming there's a strong benefit, I also wonder if there's an opportunity for build-time tooling to rewrite code from ad-hoc callbacks to more efficient delegation along these lines. Tangentially, I also don't know about `Function.prototype.bind` vs wrapping arrow functions in terms of performance; that's also something I'm curious about, as they're often a 1:1 analog.
I'm tempted to test a few of these things out myself, but if you have references, that would be helpful.
You're mixing up cause and effect; there's a reason why the perspective laid out in that comment has come to carry the amount of derision evident in its tone. It's wrong to assume things started out like this (and it's wrong to assume that a different tone will achieve better results; experience shows that the difference in results, if any, is worse otherwise).
> Assuming there's a strong benefit, I also wonder if there's an opportunity for build-time tooling to rewrite code[...]
That's another example of "layering on more" approach, and the small matter of changing the actual runtime mechanism in isolation isn't going to solve much. The big, relevant matter at hand is a complete difference in mindset that programmers have when they're approaching a problem with the ultimate intent of solving it with code.
For example, let's say you have a list of items, and M different kinds of events you want to respond to. That's not just a matter of a difference of e.g. 1 listener (implementing `handleEvent`) vs 2 or more (individual arrow functions). If that list is N items long, now we're talking about M × N different objects (worse, closures—each with their own lexical scope, keeping the bindings alive and potentially thwarting the GC's work)[0]. But the programmer who knows not to use arrow functions here doesn't just stop at implementing `handleEvent`, he or she also recognizes that if all of those listeners share common behavior, then the whole thing can be handled by a single event listener—on an ancestor element that just looks at the event target.
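A rough sketch of that last step, the single listener on an ancestor. Everything named here is hypothetical: `ListController`, the `data-index` attribute used to identify items, and the `log` array. Plain objects stand in for DOM events so it runs without a browser; in a page the registration would be something like `listElement.addEventListener("click", controller)`, once per event type, on the list's container.

```javascript
// One listener object for the whole list, regardless of how many items exist.
class ListController {
  constructor(items) {
    this.items = items;
    this.log = [];
  }

  handleEvent(event) {
    // In a browser, event.target is the element the event originated on;
    // the item is identified from an assumed data-index attribute.
    const index = Number(event.target.dataset.index);
    if (Number.isNaN(index)) return; // the event did not come from an item
    this.log.push(`${event.type}:${this.items[index]}`);
  }
}

const controller = new ListController(["alpha", "beta", "gamma"]);

// Simulated events: two from items, one from somewhere else in the list.
controller.handleEvent({ type: "click", target: { dataset: { index: "1" } } });
controller.handleEvent({ type: "keydown", target: { dataset: { index: "2" } } });
controller.handleEvent({ type: "click", target: { dataset: {} } });
```

The listener count here is one object in total, versus M × N closures when every item gets its own arrow function per event type.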
Once again, though, this is much bigger than event listeners. This is about a mindset that pervades everything at every level. The mindset of programmers working in "modern", "standard"[1][2] JS is a mindset that fundamentally impacts how they approach everything, from how to decompose a system into its natural parts, to what routines need to be written and what factors to be aware of while implementing them.
Note that unlike the comments elsewhere, this isn't mistakenly misplaced aggression towards JS per se, nor is it an endorsement of other communities/ecosystems like Python, which tends to have programmers that are just as bad, if not worse. A big part of the problems afflicting the world of modern JS for that matter is programmers who have ended up there by way of Python or similar.
0. Runtimes will have some mitigations for this, but eventually that runway ends; the sufficiently smart compiler, although an enduring myth, is unattainable.
2. Let's consider, for that matter, the obnoxious temerity that it takes to decide to anoint one's opinionated, less-than-well-informed set of personal preferences—many of which ironically reek of fighting against the language itself—as "standard"
The only item which standardjs forbids that I could identify as possibly producing a decent performance improvement if used is the "with" statement. They also forbid wrapping non-objects in objects, which I guess could provide extra-expressiveness, but seems like it would usually just be a performance penalty.
I have seen people recommending obviously inefficient things like including large third-party libraries just to look up an element by class name, but standardjs looks like it is just a set of linting rules. I don't agree with all of them but I don't think there is a set of rules everyone would agree to.
Eh, show me an object that both owns a button and is an event listener attached to that button, and I’ll bet you just showed me an object that violates the SRP. OO event listeners make sense to encapsulate state narrowly associated with handling the events being listened for.
But, having been doing OOP since I was a teenager in the late 1980s (and even that after programming for more than half a decade), I don’t prefer to do as much JS as possible in a functional style because I don’t get interfaces or other OO concepts, but because I understand the limits of their utility and managed not to be one of the one-paradigm-to-rule-them-all programmers that came out of the excessively OO-focussed pedagogy of much of the 90s and 00s.
Sounds like a veiled criticism towards an implied target (of me), which would be interesting, considering the retort is an appeal to "SRP" (referred to only with shorthand, even), and indeed, the first mention of OO(P) is your comment. The shape of your foe isn't what you think it is.
> OO event listeners make sense to encapsulate state narrowly associated with handling the events being listened for
For this defense to be valid, you have to ignore the preconditions that got us here. The entire context here is making sure that `this` refers to the correct thing at the time that the event listener is executed. Either you want that, or you don't—pick one.
(And as with, like, 90% of mainstream applications of the label "functional programming", look at what people are actually talking about when they use those words, and it's immediately apparent that theirs is a style of programming where the label is unjustifiable, considering it's fundamentally at odds with what functional programming actually is. Again, pick one: you either intend to do FP and do so through the use of functions, or you want your state baggage. Modern appeals to FP by JS programmers in the mainstream and the practices that get smuggled in under that brand are just as bankrupt as arguments for modern JS practitioners' best practices regarding anything else—ones that are entirely self-defeating or fail to hold up under scrutiny otherwise.)
It's a platform API (the browser DOM). It's a completely separate matter from JS the language. It has, though, been the "correct" way to work with events since at least as far back as 1998 or so. It's the official platform APIs that had to change to accommodate the function-as-a-listener pattern (requiring an entire revision to the interface definition language itself so that it would have the power to express it), since so many browsers implemented that non-standard extension, and so many programmers insisted on using it. And that pattern is also directly responsible for why arrow functions and Function.prototype.bind were added to the language itself. If only it had been addressed differently, writing programs for the web browser might look a lot different than the way it does today.