I was working on a new type of locking mechanism and thought I would be smart by modelling it in Spin [http://spinroot.com], which has been used for this kind of thing before.
I ended up with a model that was proven in spin, but still failed in real code.
Granted, that's anecdata with a sample size of 1, but it was still a valuable experience to me.
Shower: Out-of-date sources
This article isn't about showers, nor positive results, making the title quite confusing :)
[1] https://www.medicalnewstoday.com/articles/325725 [2] https://www.wimhofmethod.com/science
https://news.ycombinator.com/item?id=8289007
Topic: The curious case of the cyclist’s unshaven legs
From a comment (this part clearly intended to be witty, I think): Really, I thought it was weird, and probably inappropriate, to mix in so much of an outsider's amateur and unsupported opinion about science into an otherwise interesting story about leg hair drag.
I love being able to show that study. If you properly architect your SQLite system and are willing to purchase hardware, you can go a long, long way, much further than almost all companies go, with your data access code needing nothing more than the equivalent of System.Data.Sqlite.
1. Only use a single connection for all access. Open the database one time at startup. SQLite operates in serialized mode by default, so the only time you need to lock is when you are trying to obtain the LastInsertRowId or perform explicit transactions across multiple rows. Trying to use the one connection per query approach with SQLite is going to end very badly.
2. Execute one time against a fresh DB: PRAGMA journal_mode=WAL
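The two rules above can be sketched roughly as follows. This is a minimal illustration using Python's standard `sqlite3` module rather than System.Data.Sqlite; the `events` table, the `insert_event` helper, and the temporary file path are all hypothetical names invented for the example.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical scratch path; WAL needs a file-backed database, not ":memory:".
DB_PATH = os.path.join(tempfile.mkdtemp(), "app.db")

# Rule 1: open the database once at startup and share that one connection.
conn = sqlite3.connect(DB_PATH, check_same_thread=False)
write_lock = threading.Lock()

# Rule 2: switch to WAL. The journal mode is stored in the database file,
# so this only has to succeed once against a fresh database.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
conn.commit()


def insert_event(body: str) -> int:
    """Insert a row and return its rowid.

    The lock keeps the insert and the rowid lookup together, so another
    thread cannot slip its own insert in between on the shared connection.
    """
    with write_lock:
        cur = conn.execute("INSERT INTO events (body) VALUES (?)", (body,))
        conn.commit()
        return cur.lastrowid
```

Plain reads on the shared connection need no extra locking in this sketch; only the insert-then-read-rowid step is guarded.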
If at this point you are still finding that SQLite is not as fast as or faster than SQL Server, MySQL, et al., then I would be very surprised.
I do not think you can persist a row to disk faster with any other traditional SQL technology. SQLite has the lowest access latency that I am aware of. Something about it living in the same process as your business application seems to help a lot.
We support hundreds of simultaneous users in production with 1-10 megs of business state tracked per user in a single SQLite database. It runs fantastically.
E.g.:
transaction {
if (expensiveFunction(query()))
update();
}
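For anyone who wants to try the pattern in the pseudocode above, here is a rough sketch with Python's `sqlite3` module. The `accounts` table, `withdraw`, and `expensive_function` are hypothetical stand-ins, and `BEGIN IMMEDIATE` is just one possible choice: it takes the write lock up front, which means every other writer waits while the expensive function runs.

```python
import sqlite3

# isolation_level=None puts the module in autocommit mode, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")


def expensive_function(balance: int) -> bool:
    # Hypothetical stand-in for slow application-side logic.
    return balance >= 50


def withdraw(amount: int) -> bool:
    """Read, decide in application code, then update, all in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # write lock is held from here to COMMIT
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = 1"
        ).fetchone()
        updated = False
        if expensive_function(balance):
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,)
            )
            updated = True
        conn.execute("COMMIT")
        return updated
    except Exception:
        conn.execute("ROLLBACK")
        raise
```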
(My applications always get much faster when I plug them into Postgres after SQLite. But then I do do the odd sort and group by, not OLAP, but because they express the computation I want and Postgres is simply much better at that.)
What about large aggregation queries that are parallelized by a modern DBMS?
Does it still scale that well if you have many concurrent read and write transactions (e.g. on the same table)?
If you can come to my company and replace our 96-core SqlServer boxes with SQLite I'll pay you any salary you ask for.
> SQLite scales almost perfectly for parallel read performance (with a little work)
They aren't using stock SQLite, they're using SQLite wrapped in Bedrock[1], and their use case is primarily read-only.
SQLite is fantastic at read-only, or read-mostly, use cases. You start to run into trouble when you want to do concurrent writes, however. I tried to use SQLite as the backend of a service a couple of years ago, and it locked up at somewhere around tens of writes per second.
[1]: www.bedrockdb.com
(a) built their own transaction/caching/replication layer using Blockchain no less.
(b) paid SQLite team to add a number of custom modifications.
(c) used expensive, custom, non-ephemeral hardware.
Now, you could do all of this, or you could just use an off-the-shelf database that doesn't require custom code to use. If you choose a distributed one, e.g. Cassandra, it will be able to run on cheap, ephemeral hardware.
(a) They implemented a very boring transaction/caching/replication layer that is like any other DB except they borrowed the idea that "longest chain" should be used for conflict resolution.
(b) They worked with upstream to get a few patches that were unique to their use-case. Once you're in deep with any DB this really isn't that uncommon.
(c) They used a dedicated (lol non-ephemeral) white-box server that has a lower amortized cost than EC2.
(d) Bedrock isn't bound to the hardware. You could run it on EC2 and reap the benefits just the same except you'd pay more.
But what does "properly architect your sqlite system" mean and how does this compare to just spinning up a postgres service (nothing sharded or fancy otherwise)?
- https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q... - https://bedrockdb.com/
I've seen so many solutions that would be easily and reliably implemented on a single or small SQL database cluster of various types turn into these complex systems to avoid the costs of scaling up the RDBMS.
SQLite's resource needs are a lot lower because the database is a flat file on the OS. Its only main resource is storage.
MySQL is an application that not only requires configuration, tweaking, tuning and tender loving care, but also constantly consumes processor, memory and storage resources.
Not knocking SQLite - great for desktop apps or maybe local dev environments.
You can store it in localStorage and read it making your reload be a lot smaller. If it's not in localStorage, you can just request a fresh copy of SQLite from us.
I recall that F Scott Fitzgerald said "the test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function."
The value of new methodologies, languages, and techniques is partly that the enthusiastic proponents of them are given a chance to prove out that there is value, and so become motivated to go the extra distance to achieve the project specific outcome.
This value is destroyed if people are forced to use the technique, instead of championing its introduction. So measurement is made even harder!
My favorite thing about static typing is that it makes code more self-documenting. The reason I love Go specifically is because if you have 10 people write the same thing in Go, it's all going to come out relatively similar and use mostly built-in packages. Any API requests are going to be self-documenting, because you have to write a struct to decode them into. Any function has clear inputs and outputs; I can hover a parameter in my IDE and know exactly what's going on. You can't just throw errors away; you always are aware of them, and any functions you write should bubble them up.
Typescript addresses this somewhat, but basically offsets that complexity with more configuration files. I like Typescript in use, but I can't stand the fact that Javascript requires configuration files, transpilers, a million dependencies. Same for Python and mypy.
Yes, I could just look at class members in a dynamic language, but there's nothing that formally verifies the shape of data. It's much more annoying to piece apart. I don't use static analyzers, but my guess is that languages like Go and Rust are the most compatible with them. Go programs are the closest thing to a declarative solution to a software problem, of any modern language IMO. As we continue experimenting with GPT-generated programs, I think we're going to see much more success with opinionated languages that have fewer features and more consistency in how finished programs look.
Microservices are also great at making large applications more maintainable, but add additional devops complexity. It's harder to keep track of what's running where and requires some sort of centralized logging for requests and runtime.
That list reminds me of [1], which rants about this state of affairs and [2] that puts many beliefs to the test.
[1] https://youtu.be/WELBnE33dpY
[2] https://www.oreilly.com/library/view/making-software/9780596...
If the spec is 5x more complicated than the code would be then I'm not sure I see much of a point coz you're just creating different spaces for bugs to hide in.
We are throwing a lot of resources at a problem because we are not able to educate people well enough to understand basic performance optimizations.
You are a Data Scientist/anyone else and you don't understand your tooling? You are doing your job wrong.
It would be nice to have proper studies, but it's difficult to control the other variables ...
(Not a balanced evaluation, just cherry-picking failures. But I suppose we're at the point in the hype cycle where it's easier to find success stories being talked about.)
The hype is huge in that train, but I'm sure there must be lots of professionals that have already learned about its shortcomings. Not sure if proper studies exist about Kubernetes yet, though. Hopefully you'll get a PR with some content.
There used to be an often-cited paper by Boehm about the cost of catching bugs early vs. late in production, usually mentioned by advocates of early testing, where the quoted conclusion was something like "studies show it's 10 times more costly to catch bugs late in production". This is a very well-known study; I'm likely misquoting it (the irony!), and readers here are probably familiar with it or its related mantra of early testing.
I haven't read the paper itself (I should!), but later someone claimed that (a) Boehm doesn't state what people quoting him say he said, (b) the relevant studies had serious methodological problems that call into question the conclusion he did state, and (c) there are plenty of examples where fixing bugs late in production wasn't particularly costly.
edit: I'm not arguing testing isn't necessary, in case that upset someone reading this post. I'm not really arguing anything, except that the study by Boehm that most people quote was called into question (and was probably misquoted to begin with). This doesn't prove/disprove anything, except maybe hinting at a possible Cold Shower. It does show that we as a field have a serious problem in software engineering with backing up claims with well designed studies and strong evidence, but this shouldn't come as a surprise to anyone reading this.
Often a fancy new thing is introduced with a very long list of pros: "fast, scalable, flexible, safe". Rarely, is a list of cons included: "brittle, tough learning curve, complicated, new failure modes".
This practice always strikes me as odd because the first law of engineering is "everything is a trade-off". So, if I am going to do my job as an engineer, I really need to understand both the "pros" and the "cons". I need to understand what trade-off I'm making to get the "pros". Only then can I reason about whether the cost is justified.
I would not have expected that. Still, I prefer to use full(er) identifiers. I don't like to guess how things were abbreviated, especially when consistency isn't guaranteed. If I were using a different language and IDE, this might be better.
If you don't have more data than can fit on a reasonably large hard drive, you do not have big data and you are likely able to process it faster and cheaper on one system.
Today that threshold would be around 10TiB.
Thoughts on this one? I found the presentation to be somewhat mixed.
I found the initial comb through of the agile principles to be needlessly pedantic ("'Simplicity... is essential' isn't a principle, it's an assertion!"); anyone reading in good faith can extract the principle that's intended in each bullet of that manifesto.
The critique of user stories (~35 mins in) was more interesting; it's something we've been bumping up against recently. I think the agile response would be "if your features interact, you need a user story covering the interaction", i.e. you need to write user stories for the cross-product of your features, if they are not orthogonal.
I'm not really convinced that this is a fatal blow for user stories, and indeed in the telephony example it is pretty easy to see that you need a clarifying user story to say how the call group and DND features interact. But it does suggest that other approaches for specifying complex interactions might be better.
Maybe it would be simpler to show a chart of the relative priorities or abstract interactions? E.g. thinking about Slack's notorious "Should we send a notification" flowchart (https://slack.engineering/reducing-slacks-memory-footprint-4...), I think it's impossible (or at least unreasonably verbose) to describe this using solely user stories. I do wonder if that means it's impossible for users to understand how this set of features interact though?
Regarding the purported opposition in agile to creating artifacts like design docs, it's possible that I'm missing some conversation/context from the development of Agile, but I've never heard agile folks like Fowler, Martin, etc. argue against doing technical design; they just argue against doing too much of it too early (i.e. against waterfall design docs and for lean-manufacturing style just-in-time design) and that battle seems to have largely been won, considering what the standard best-practices were at the time the Agile manifesto was written vs. now.
Rxjs, etc.
Angular uses typescript and rxjs excessively and, while I used to like typescript, the combo has made me reconsider.
Rxjs seems like an overcomplex way to do common tasks. Has RRP caught on anywhere else? Is there a usage that doesn't suck?
All research is inconclusive? Sure. I wonder what kind of type systems were in there? I guess Java and similars are accounted and yet I wouldn’t put any faith in them. ML, Swift, Haskell... now that’s something else.
1. e.g. what percentage of the gains attributed to Lisp were more likely due to the candidate pool in the 90s/2000s skewing heavily towards people who learned it at elite CS programs, especially if you're doing a challenge competition which benefits from having studied various algorithms?
success, err := fail() // 1: err is assigned but never checked
do(success)
success, _ := fail()   // 2: err is explicitly discarded
fail()                 // 3: all return values are ignored
True, if you completely ignore the function's return values, you can throw errors away, but then you wouldn't be using the language in the way that makes it powerful to me: that there are simple and clear interfaces that you interact with.
The first and third ones fail on go vet. The first one also fails to compile if you never read from err in the entire function.
I'm hoping to have a written version by the end of September.
I'm somewhat ADD and get bored easily, so not only do I need to do something more like software but I also have to stay as broad and high-level as possible within the discipline, to stave off ennui. NOTE: very much not arguing this is a good way to go through life.
The hardest part of most projects is taking unrealistic and ever changing client demands and trying to turn them into something that will actually work in reality; a process which is probably all too familiar to many software developers.
I think software 'engineering' is uniquely ambiguous in this regard, because software development as a discipline is in equal parts both design, and construction, and the design part bleeds into the 'construction' part, corrupting it (for want of a better word) in a way you would imagine that 'pure' engineering would not.
The issues are us - the developers. The machine code is mostly fine...
This seems applicable to everything, almost, in this whole experiment humans have going on here on planet earth, it just doesn't seem like it. To see it, you have to have (at least) the ability and willingness to look.
I'm honestly not convinced it helps that much. And it seems to cost a lot to me.
I like database and API schemas though. And I like clojure.spec and function preconditions a lot.
I just don’t get how people are working such that it represents a time cost rather than a large time savings. I don’t mean that as a dig, I just mean I genuinely don’t know what that must look like. And I’ve written a lot more code in dynamic languages, and got my start there, so it’s not like I “grew up” writing Java or something like that.
On the other end people who prefer weakly typed languages see problems as primarily that of data transformation. For example from HTML Form to Http Request to SQL Db to CSV File and so on.
Both approaches are differentiated by the perspective on the problem.
That, plus type inference makes the static typing pretty painless.
There's also the case that I find the type systems of Rust, Elm, etc to be much more helpful than the type systems of C++ or Sorbet (type system for Ruby).
But my current job has very, very little that would benefit from static typing. Adding it into the mix would slow us down, both literally and figuratively.
However I do think static typing provides an enormous benefit to picking up code that is 5 years old and written by someone else. The ability to see “this is a nonnullable int32 value type” greatly reduces the amount of paths you have to go down when you have to change something or understand what’s going wrong with it. Tradeoff is you end up with a lot more code to maintain...
For example, I'm using TypeScript with a GraphQL code generator. Now let's assume I add a new value to a GraphQL enum. I run codegen, then fix everything until the compiler is happy. Afterwards, all places where this enum was ever touched will take it into account correctly, including mappings, translations, all switch statements, conditions, lists where some of the other values are mentioned and so on.
This is something that's not possible in a dynamic language and it's not even possible in Java, really.
I rely on this daily.
I want my code to be clear and with certain expectations fulfilled, rather than a mystery in front of me. I'm not there to learn what could be passed into my functions - I'm there to create functionality.
In saying that, I'm interested in whether there is any accepted, peer-reviewed literature with quantifiable data as to whether strongly typed languages are "better" (whatever the study might define as better, such as being faster, more scalable, etc). From what I've heard and read, most of the better-ness that strong typing provides is related to people problems and being able to scale a team, not necessarily scaling a system or making the system better. Having learned Go and TypeScript after primarily writing Ruby and Javascript, I'm convinced of the better-ness strong typing provides, whether it's related to readability, better IDE intellisense, or speed (although Go, for example, is faster than Ruby and JS not just because it's strongly typed, but because it's compiled). I'm just interested in whether there's real data to support using them instead of anecdata.
The TypeScript Tax: A Cost vs. Benefit Analysis - https://medium.com/javascript-scene/the-typescript-tax-132ff...
The author leans against TypeScript, but does cite some relevant studies on the benefits.
---
One of the cited articles:
To Type or Not to Type: Quantifying Detectable Bugs in JavaScript - http://earlbarr.com/publications/typestudy.pdf (PDF)
So I’d guess that the number of type related bugs in dynamic languages is just a little bit greater than in static languages, simply because it is harder to make that kind of mistake in a typed language. But as a category, they aren’t common mistakes in the first place.
I can confidently say that I’m a bit of an expert at writing bugs :) and of all the kinds of bugs I write, type related bugs are probably nowhere near the top of the list.
That’s not to say that static typing isn’t better - I definitely think it is. But I can also believe that it doesn’t necessarily reduce the bug count by a huge margin. (For whatever it’s worth I think the main benefits are documentation and refactoring...)
But a lot of the time, their language is simply unable to encode certain properties as types, so by definition they don't think of some classes of bugs they do write as "type errors". Maybe in a statically typed language they would have been type errors indeed!
It's as if the tool you use sometimes reinforces your blind spots: "you don't know what you don't know".
PS: anecdote is probably irrelevant, but I've written plenty of dumb type errors with Python. Things that would have been caught by a test or an external tool, sure, but that the type checker of a statically typed language would have caught for me for free, leaving the more relevant logic tests to me. I tend to write type errors left and right. Maybe I'm simply not a good Python programmer, of course!
I also believe that, especially after it outscales what one person is able to (fully) overview. Static types are something a program can reason about, so they allow much more productivity-boosting tooling to be created. This also goes way beyond simply catching type errors at compile time vs. runtime (a downside which can largely be mitigated by test coverage). Just look at what an IDE for e.g. Java can do simply in helping you navigate a big codebase. Then throw in refactoring, which in many cases can even be a completely automated operation and in many more cases is at least greatly assisted by the tools. Tools for dynamic languages can often at most guess, and making good guesses is hard, so in practice you mostly get stuff which is pretty limited in its usefulness.
database.db <=> SQLiteAgent <=> localhost HTTP JSON <=> MyPythonWebApps
This exercise would also encourage development of a concise and generic schema for storing your business data (presumably because changes to the above JSON contract would be time consuming).
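To make the shape of that JSON contract a bit more concrete, here is a deliberately tiny sketch of the agent's request handler, with the HTTP layer left out. Everything in it is an assumption made for illustration: the `kv` table, the `handle` function, and the `op`/`key`/`value` fields are invented here, not part of any existing SQLiteAgent.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a real agent would open database.db instead
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.commit()


def handle(request_body: str) -> str:
    """Take one JSON request string and return one JSON response string."""
    req = json.loads(request_body)
    op = req.get("op")
    if op == "put":
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (req["key"], req["value"]),
        )
        conn.commit()
        return json.dumps({"ok": True})
    if op == "get":
        row = conn.execute(
            "SELECT value FROM kv WHERE key = ?", (req["key"],)
        ).fetchone()
        return json.dumps({"ok": row is not None, "value": row[0] if row else None})
    # Anything outside the agreed contract is rejected rather than guessed at.
    return json.dumps({"ok": False, "error": "unknown op"})
```

A web app would then only ever see the JSON messages, never the SQL behind them.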
These data types are much more powerful than the fancy arrays that wowed me back in the day :)
They serve very different purposes and generally are not first and foremost good communication tools the way static types are, for a bunch of reasons. That doesn’t mean tests aren’t very useful and welcome things to have, however.
No, but it's quite plausible that it was a niche thing that might have been a fad at some points in the past, only to be revived by a new generation that includes many fad-chasing types, SV people, and BS-artists (aka influencers)...
And the term fits here, I believe: cold showers do very much wake you up and bring you into reality quickly. There's no dreaming about hypes when you're under a cold shower.
https://github.com/kisielk/errcheck
In any case, it’s trivial to detect via static analysis.
You can add that in Eastern Europe you can get an engineering degree, a "technical bachelor" from a technical university, so I am a software engineer, as is printed on my diploma.
Software has far more distinct pieces than any other product you can find anywhere (maybe the human body?) so it's impossible to completely check the modes of failure. I was just reading before about a hardware corruption bug due to a kernel feature [1] and it's hard to imagine the same chain reaction in other engineering areas.
In software it's also really hard to model behavior. In engineering you'll get tolerances, strength and other features of the pieces you use. In software, you can't even benchmark something and expect the same benchmark to translate to a different computer.
not to mention things like generative design and process automation are getting us to that point.
Would you mind expanding on this? I'd be interested in what processes you have and if you use any additional tooling.
Thanks!
For database stuff, an ORM with some validation rules is generally enough, and couldn't be replaced with static typing anyhow.
For anything that absolutely has to be a certain kind of data, there are things built into dynamic languages to check the type of something, and you just call it as needed on a case-by-case basis.
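A case-by-case check of that kind might look like the sketch below in Python. The `apply_discount` function and its parameters are hypothetical; the only built-in relied on is `isinstance`.

```python
def apply_discount(price_cents, percent):
    """Return the discounted price in cents (hypothetical helper)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(price_cents, int) or isinstance(price_cents, bool):
        raise TypeError(
            f"price_cents must be an int, got {type(price_cents).__name__}"
        )
    if not isinstance(percent, (int, float)) or isinstance(percent, bool):
        raise TypeError(f"percent must be a number, got {type(percent).__name__}")
    return round(price_cents * (100 - percent) / 100)
```

The check runs only when the function is actually called, so a bad call site is found at run time rather than before the program starts.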
My summary would be: The spec must cover all possible implementations so it is usually larger than the most simple one.
An example from there:
> The authors of SibylFS tried to write down an exact description of the `open` interface. Their annotated version of the POSIX standard is over 3000 words. Not counting basic machinery, it took them over 200 lines to write down the properties of `open` in higher-order logic, and another 70 to give the interactions between open and close.
> For comparison, while it’s difficult to do the accounting for the size of a feature, their model implementation is a mere 40 lines.
Handwaving this layer away as "very boring" isn't exactly fair, either. What does boring even mean here? I mean, this layer solves problems that are both essential to performance scaling of RDBMS and have been proven time and again to be hard to reliably solve in a general case. And it has furthermore been built from the ground up tailored towards the specific needs/use cases of the company.
By the aforementioned handwaving the presented successes are implicitly attributed to SQLite to a degree that isn't justified IMO.
1U of cohost with 100mbps in a cheap Eastern European DC will cost a few hundred euro per month... and my info is a few years old. It's more expensive now.
It is a very specific case, and we should not extrapolate from it to the general use case.
AWS/GCP/Azure are still better places to start for most people.
I would also say that there's a pretty big difference between the "stiff upper lip/no emotions" that people imagine when using the adjective "stoic" and the Stoic writings of Marcus Aurelius and the like.
Static/dynamic is whether type checking is done at compile time or run time.
Strong/weak is how flexible the language is with type conversion.
Another explanation: https://en.hexlet.io/courses/intro_to_programming/lessons/ty...
As in:
Strong typing - 1 == "1" is false (Python style, integer isn't converted to string for comparison)
Weak typing - 1 == "1" is true (PHP style, integer or string may be converted)
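The Python half of that comparison is easy to check directly. A small sketch (the variable names are made up for the example):

```python
# Comparing an int with a str is allowed, but nothing is converted,
# so the two values are simply not equal.
equal = (1 == "1")

# Mixing the two in arithmetic is refused outright instead of coerced.
try:
    1 + "1"
    mixed_add_error = None
except TypeError as exc:
    mixed_add_error = exc

# Any conversion has to be spelled out by the programmer.
converted = 1 + int("1")
```

All of this is decided while the program runs, which is why Python is usually described as dynamically typed as well.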
Types can't check the correctness of everything, but they do prove that certain classes of errors don't exist in your program.
Tests, on the other hand, can test for many more types of bugs, but they can only look for errors, they can't prove correctness (except in very small, closed environments where you can literally test every possible combination of inputs and outputs).
That's particularly important when refactoring because you want to assert that you haven't introduced new bugs, and the type system will often let you prove that with almost zero effort on your part.
Static typing makes both of those trivial if your language (or linter) has enum-exhaustiveness checks for switch statements.
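As a rough idea of what such a check looks like outside a compiled language, here is a Python sketch that leans on an external type checker. It assumes a checker such as mypy is run over the code; the `Status` enum, `label`, and the locally defined `assert_never` helper are hypothetical names.

```python
from enum import Enum
from typing import NoReturn


class Status(Enum):  # hypothetical enum, e.g. one produced by a code generator
    ACTIVE = "active"
    SUSPENDED = "suspended"


def assert_never(value: NoReturn) -> NoReturn:
    # Reached at run time only if a member slipped through unhandled.
    raise AssertionError(f"unhandled value: {value!r}")


def label(status: Status) -> str:
    if status is Status.ACTIVE:
        return "Active"
    elif status is Status.SUSPENDED:
        return "Suspended"
    else:
        # If a new member is added to Status, the type checker reports this
        # call as an error until a branch for the new member exists above.
        assert_never(status)
```

The Python interpreter itself does not enforce any of this; without the external checker, a missed member only shows up as the `AssertionError` at run time.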
Tests are existentially quantified, types are universally quantified.
(Yeah, existential types, I know. Shut up. ;) )
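The exception mentioned a few comments up, a small closed domain where every input can be tried, is the one place where a test stops being existential. A sketch, assuming a hypothetical `abs_branchless` helper that is only meant to work on signed 8-bit values:

```python
def abs_branchless(x: int) -> int:
    """Absolute value via the sign-mask trick, for x in [-128, 127] only."""
    mask = x >> 7  # -1 for negative inputs, 0 otherwise (within the 8-bit range)
    return (x ^ mask) - mask


# Trying every value in the domain turns the test into a proof for that domain.
DOMAIN = range(-128, 128)
exhaustively_correct = all(abs_branchless(x) == abs(x) for x in DOMAIN)
```

With 256 inputs this is instant; for a function of two 64-bit arguments the same approach is hopeless, which is where a universally quantified guarantee from a type system earns its keep.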
Whether or not I could perform this miracle depends entirely on your specific use cases. Many people who have this sort of reaction are coming from a place where there is heavy use of the vendor lock-in features such as SSIS and stored procedures.
If you are ultimately just trying to get structured business data to/from disk in a consistent manner and are seeking the lowest latency and highest throughput per request, then SQLite might be what you are looking for.
The specific core counts or other specifications are meaningless. SQLite scales perfectly on a single box, and if you have some good engineers you might even be able to build a clustering protocol at the application layer in order to tie multiple together. At a certain point, writing your own will get cheaper than paying Microsoft for the privilege of using SQL Server.
I can think of a couple of places I've worked where they had simple problems that could have been solved by some thinking and coding but instead were solved* by more expensive hardware.
In my experience, SQLite is likely to be faster when you have lots of reading. Being in-process gives SQLite a natural advantage in read-heavy situations.
See: https://www.sqlite.org/wal.html
"Write transactions are very fast since they only involve writing the content once (versus twice for rollback-journal transactions) and because the writes are all sequential. Further, syncing the content to the disk is not required, as long as the application is willing to sacrifice durability following a power loss or hard reboot."
I also had a server with 96 cores until we realized a developer had inadvertently made a query happen every time a scroll event fired... it was a nice chunk of change saved.
What people are conveniently leaving out is they wrote a serious wrapper around it that makes it very similar to other conventional large scale systems like MSSQL or MySQL: https://bedrockdb.com/
If we wanted to run an aggregate that could potentially impact live transactions, we would just copy the SQLite db to another server and perform the analysis there. We have some telemetry services which operate in this fashion. They go out to all of the SQLite databases, make a copy and then run analysis in another process (or on another machine).
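A minimal sketch of that copy-then-analyze approach, using the online backup API exposed by Python's `sqlite3` module (`Connection.backup`) rather than a file copy to another server. The `orders` table and the in-memory databases are assumptions made to keep the example self-contained.

```python
import sqlite3

# Stand-in for the live transactional database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
live.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (20,), (30,)])
live.commit()

# Take a consistent copy; in practice the target would be a file that is
# shipped to another process or machine.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# Live traffic continues and does not affect the copy.
live.execute("INSERT INTO orders (amount) VALUES (?)", (1000,))
live.commit()

# The aggregate runs against the copy only.
total, count = snapshot.execute(
    "SELECT SUM(amount), COUNT(*) FROM orders"
).fetchone()
```

The analysis is stale by however long ago the copy was taken, which is the trade-off being accepted here.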
I am not aware of any hosted SQL technology which is capable of magically interleaving large aggregate queries with live transactions and not having one or both impacted in some way. At the end of the day, you still have to go to disk on writes, and this must be serialized against reads for basic consistency reasons. After a certain point, this is kinda like trying to beat basic information theory with ever-more-complex compression schemes. I'd rather just accept the fundamental truth of the hardware/OS and have the least amount of overhead possible when engaging with it.
No, absolutely not.
That's why modern databases use a thing called multi version concurrency control. You can run (multiple) queries on the same table that is updated by multiple transactions at the same time without one blocking the others (assuming the write transactions don't block each other). Of course they are fighting for I/O, but there is no need to serialize anything.
Mixing OLTP and OLAP is becoming increasingly "normal" these days as the capabilities of the database products and the hardware improve. With modern high-end hardware (hundreds of CPUs, a lot of SSDs, large RAM) this actually scales quite nicely.
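The "readers are not blocked by writers" behaviour is easy to observe on a small scale. The sketch below uses SQLite in WAL mode purely because it runs anywhere; it gives each reader a snapshot but still allows only one writer at a time, so it is not a full MVCC engine of the kind described above. The table `t`, the helper names, and the scratch file are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "snapshot_demo.db")  # hypothetical scratch file


def connect() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, BEGIN/COMMIT are explicit.
    return sqlite3.connect(path, isolation_level=None)


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


writer = connect()
writer.execute("PRAGMA journal_mode=WAL").fetchall()
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = connect()
reader.execute("BEGIN")
before = row_count(reader)  # the first read pins the reader's snapshot

writer.execute("INSERT INTO t VALUES (2)")  # commits without waiting for the reader

during = row_count(reader)  # same transaction, so still the old snapshot
reader.execute("COMMIT")
after = row_count(reader)   # a new read sees the committed insert
```

The reader never waits and never sees a half-applied write; what it gives up is freshness for the duration of its transaction.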
OLAP databases tend to write columns in large blocks and apply sort orders to improve compression. This type of structure works well if you write the data once and read it many times. It's horrendously inefficient for concurrent updates to things like user session contexts. (Or even reading them for that matter.) You are better off using a row store with ACID transactions and relatively small pages.
The dichotomy has been visible for decades and shows no sign of disappearing, because the difference is mostly how you arrange and access data, not so much the hardware used.
So in your case, MVCC is what you're talking about, which is not the same level of consistency guarantee as serializable; rather, it is based on snapshot isolation. Some database vendors consider them effectively the same isolation level because the anomalies associated with other common non-serializable isolation levels aren't typically present in most MVCC implementations, but there's a lot more complexity here than you are acknowledging.
Mixing OLTP and OLAP workloads on the same database is pretty much always a bad idea. This is why it's common practice to use ETL jobs to move data from an OLTP optimized database like Postgres or MySQL to a separate database for OLAP (which could be another MySQL or PG instance, or could be something like ClickHouse or another columnar database optimized for OLAP). Just because you /can/ do something, doesn't mean you /should/ do something...
Just because it's "normal" doesn't mean it's correct. Just because you can doesn't mean you should.
All hail bob1029:
> We aren't running any reports on our databases like this. I would argue it is a bad practice in general to mix OLTP and OLAP workloads on a single database instance, regardless of the specific technology involved.
Persisting a transaction to the database is still (and especially in MVCC): "send data write". "wait for write to be flushed". "toggle metadata bit to mark write as completed". "wait for bit to be completed" which still serialises transaction commits while reads can complete in parallel as fast as the device can handle.
Especially now that the reads and writes don't have to share the disk head, it makes sense for random reads to keep on scaling better than writes.
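The four steps can be sketched against a plain file. This is an illustration only: the one-byte marker layout and the `commit_record` helper are made up here and do not describe any particular database's on-disk format.

```python
import os
import tempfile

def commit_record(path, payload):
    """Append payload behind a 'pending' marker byte, flush, then flip the
    marker to 'committed' and flush again. Returns the record's offset."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        f.write(b"\x00" + payload)   # 1. send data write (marker = pending)
        f.flush()
        os.fsync(f.fileno())         # 2. wait for the write to be flushed
        f.seek(offset)
        f.write(b"\x01")             # 3. toggle the metadata marker
        f.flush()
        os.fsync(f.fileno())         # 4. wait for the marker to be flushed
    return offset

fd, log_path = tempfile.mkstemp()
os.close(fd)
first = commit_record(log_path, b"hello")
second = commit_record(log_path, b"world")
with open(log_path, "rb") as f:
    contents = f.read()
os.remove(log_path)
```

Each commit waits on two flushes in a row, so commits queue up behind each other, while nothing here stops any number of readers from scanning the file concurrently.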
I do sometimes wonder if dirty reads are what the business folks actually want.
Not necessarily unconstrained dirty reads. But if it were possible to say, "The statistics in your reports may only be accurate to (say) +/- x%," would that be good enough?
Going really philosophical, might they even make better decisions if they had less precision to work with? There are certainly plenty of studies that suggest that that's basically how it works when people are managing their investments.
Primary/replica is probably the correct way to solve this. In some places, I have also shunted writes through SQS queues, which in practice protects us from a locking operation in one place impacting other operations in a customer-facing way. I don’t think this is strictly necessary but it is a nice technical guard against the sociological problem of a contractor merging code like that. They don’t feel the pain of locks gone bad because they (per contract) can’t be on call.
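A rough sketch of the "shunt writes through a queue" idea, with Python's in-process `queue.Queue` standing in for SQS and SQLite standing in for the real database. The `events` table and the `writer_loop` name are assumptions for the example:

```python
import os
import queue
import shutil
import sqlite3
import tempfile
import threading

STOP = object()  # sentinel telling the writer to exit

def writer_loop(db_path, jobs):
    """Single consumer: the only code path that ever takes write locks."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (name TEXT)")
    while True:
        job = jobs.get()
        if job is STOP:
            break
        conn.execute("INSERT INTO events (name) VALUES (?)", (job,))
        conn.commit()
    conn.close()

db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "app.db")
jobs = queue.Queue()
worker = threading.Thread(target=writer_loop, args=(db_path, jobs))
worker.start()

# Request handlers only enqueue; they never hold a database lock.
for name in ["signup", "login", "purchase"]:
    jobs.put(name)
jobs.put(STOP)
worker.join()

reader = sqlite3.connect(db_path)
stored = [row[0] for row in reader.execute("SELECT name FROM events ORDER BY rowid")]
reader.close()
shutil.rmtree(db_dir)
```

A slow or badly written write then backs up the queue instead of blocking customer-facing requests.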
The fact that most of the stuff we care about is time-domain & causal means that we can typically leverage this basic ideology. Very rarely does a time-series aggregate query need to be consistent with the OLTP workloads in order to satisfy a business question.
Strongly typed dynamic: Python
Weakly typed dynamic: Javascript
Strongly typed static: Haskell
Weakly typed static: C
Liskov, B; Zilles, S (1974). "Programming with abstract data types".
http://web.cs.iastate.edu/~hridesh/teaching/362/07/01/papers...
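A small illustration of the strong/weak axis from the list above. The first function is real Python behaviour; the second only emulates how JavaScript's `+` coerces when a string is involved, so treat it as an approximation rather than a spec of JavaScript:

```python
def add_as_python(a, b):
    """Python is strongly typed: mixing str and int raises TypeError
    instead of silently coercing one operand."""
    try:
        return a + b
    except TypeError as exc:
        return f"TypeError: {exc}"

def add_as_javascript(a, b):
    """Emulates JavaScript's `+`, which falls back to string concatenation
    when either operand is a string (a weak-typing coercion)."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b

strong_result = add_as_python("1", 1)
weak_result = add_as_javascript("1", 1)
```

Both checks happen at run time, which is what makes both languages dynamic; the difference is whether the mismatch is an error or a coercion.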
You'll see modern NVMe disks with sustained writes greatly outpacing reads until the write cache is saturated, at which point what you say is true and reads will greatly outpace writes. But you don't want your disks to ever be in that threshold.
And sometimes it's just a matter of framing. Don't say accurate, say, "Accurate to within x%" or "Rounded to the nearest $x" type of thing. But I certainly would never actually pick an argument over it. Sometimes they do know what they want. Other times they really don't, but you still don't get to decide for yourself how the problem is going to be solved.
For instance, we use .NET Core Self-Contained Deployments combined with SQLite. As a result, we ship a zip file containing the dotnet build artifacts to a blank windows or linux host and have a working application node within a matter of seconds. The databases will be created and migrated automatically by our application logic, and the host is automatically configured using OS interop mechanisms.
So, when you really look at it, the constraints imposed upon us by SQLite encouraged us to completely sidestep the containerization game. Our "container" is just a single platform-specific binary path that has all of its dependencies & data contained within.
Without SQLite, we would have to have some additional process for every environment that we stand up. This is where the container game starts to come in, and I feel like it's a band-aid to a problem that could have been avoided so much further down in the stack (aka SQLite vs MySQL/SQLServer/Postgres). Sure, there are applications where you absolutely must use a hosted solution for one reason or another, but for many (most) others where you do not, it's really the only thing stopping you from having a single process/binary application distribution that is absolutely trivial to install and troubleshoot. You literally just zip up prod bin path and load it on a developer workstation to review damages. 100% of the information you need will be there every time. No trying to log into a SQL server or losing track of which trace folder is for what issue # and sql dump. It keeps things very well organized at all levels. We can just launch the production app with --console to see its exact state at the time of the copy operation and then attach debuggers, etc.
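For readers wondering what "created and migrated automatically by our application logic" can look like, here is one common pattern sketched in Python rather than .NET. The migration list and `open_database` are hypothetical; the thread does not show the actual implementation.

```python
import sqlite3

# Hypothetical migration list; the real application's schema is not shown
# in the thread. Position + 1 is the schema version each script produces.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def open_database(path):
    """Open (creating if needed) the database and apply pending migrations."""
    conn = sqlite3.connect(path)
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, script in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(script)
        # PRAGMA does not accept bound parameters, so format the integer.
        conn.execute(f"PRAGMA user_version = {version:d}")
        conn.commit()
    return conn

conn = open_database(":memory:")
schema_version = conn.execute("PRAGMA user_version").fetchone()[0]
user_columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

Because the version lives inside the database file itself, a copied production file opens at whatever schema it had at the time of the copy.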
The point is that I robbed someone of the chance to buy a shiny new computer.
It's crazy. I could maybe understand if there's a time crunch where it's quicker and easier to get more hardware in order to make a sale that'll keep the company alive (which I have experienced once) but that's maybe 1% of the cases.
Anyway, in lieu of their gratitude, I offer my thanks because I appreciate the effort.
A quick way to find out what sort of self-proclaimed lover of Meditations you’re dealing with is to ask what they think of its physics and metaphysics.
I don’t give a damn whether people call themselves Stoics or not and whether they’re sincere—whatever that means—or not, but I’m very sure, specifically, that there are a lot more fans of Meditations than people who’ve read it or even meaningfully read about it, which is funny in that “oh boy, aren’t we humans goofy” sort of way. It’s also an easy phenomenon to stumble on innocently while trying to discuss the book, though I think that goes for a lot of Very Important Books that more people claim to have read than actually have. IIRC someone wrote a whole tongue-partially-in-cheek guide to pretending at having read books, for the reason that it’s pretty common.
In most cases, snapshot isolation is sufficient, and some database vendors even conflate snapshot isolation with serializable, but they're not the same thing. I'd be hesitant to believe any vendor claims that they implement serializable MVCC without validating it via testing. As we've been shown by Jepsen, database vendors make many claims, some of which are unsubstantiated. Spanner is very cool technology, however I have personally heard some very interesting claims from folks on the Spanner team that would violate the laws of physics, so again, without a demonstration of their claims, I'd take them with a grain of salt.
If your condition for believing these claims is approval from Jepsen (i.e. Kyle Kingsbury), he apparently didn’t bother testing FoundationDB because their test suite is “waaaay more rigorous”: https://twitter.com/aphyr/status/405017101804396546. In particular their test suite is able to produce reproducible tests of any variation in messaging (dropped messages, reordered messages) across their single-threaded nodes, which is extremely useful in narrowing down places where serialization violations can hide. He also seems to believe Spanner’s claims: https://www.youtube.com/watch?v=w_zYYF3-iSo
I’m not sure where this “conventional wisdom” about serializability having an unavoidable large performance hit is coming from; the databases I mentioned are very well known in the distributed systems field and not some random vendors making outlandish claims.
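The core trick behind reproducible tests of dropped and reordered messages is to drive every source of nondeterminism from one seed. A toy sketch of that idea only; the `simulate` function is made up and is not FoundationDB's actual simulation framework:

```python
import random

def simulate(seed, messages, drop_rate=0.2):
    """Deliver messages in a seed-determined order, dropping some of them.
    The same seed always yields the same delivery trace."""
    rng = random.Random(seed)
    pending = list(messages)
    rng.shuffle(pending)                                      # reordering
    return [m for m in pending if rng.random() >= drop_rate]  # drops

messages = ["m1", "m2", "m3", "m4", "m5"]
trace_a = simulate(42, messages)
trace_b = simulate(42, messages)
```

When a run with some seed exposes a violation, rerunning with that seed replays exactly the same drops and reorderings, which is what makes the failure debuggable.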
I mean, at least three orders of magnitude, minimum.
It's just a ludicrous argument. SQLite is fine for a file format, and in very specific dumb CRUD scenarios it's just about ok. But it's not worth sticking with if you need anything interesting over any volume of data, far far far below what would warrant a different DB tech to rdbms.
Ironically, you're more representative of the "we'll need Hadoop" crowd.
That seems an especially odd assertion to make. What is your basis for this comment? I'd suggest it makes clear you know nothing about me.
"I don't like to waste time writing tests, because I need that time to fix bugs on production that happened because I don't write tests".
The relation to static typing is that static types are a kind of test the computer automatically writes for you.
An example would be Common Lisp's `map` function [0] (it takes a number of sequences and a function that has as many parameters as there are sequences). It would be hard to come up with a type for this in Java, and it would be a pretty complicated type in Haskell.
Another example of many people's experience with static typing is the Go style of language, where you can't write any code that works for both a list of strings and a list of numbers. This is no longer common, but it used to be very common ~10-15 years ago and many may have not looked back.
[0] http://www.lispworks.com/documentation/HyperSpec/Body/f_map....
http://learnyouahaskell.com/functors-applicative-functors-an...
max <$> ZipList [1,2,3,4,5,3] <*> ZipList [5,3,1,2]
> [5,3,3,4]
Anyway, bitter syntax sugar aside, the way you wrote the function I proposed was... a completely different function with similar results, which does not have the type I was asking for, and you only had to introduce 2 or 3 helper functions and one helper type to do it. I wanted to work with functions and lists, but now I get to learn about applicatives and ZipLists as well... no extra complication required!
Edit to ask: could this method be applied if you didn't know the number of lists and the function at compile time? CL's map would be the equivalent of a function that produces the expression you have showed me, but it's not clear to me that you could write this function in Haskell.
Edit2: found a paper explaining that this is not possible in Haskell, and showing how the problem is solved in Typed Scheme: https://www2.ccs.neu.edu/racket/pubs/esop09-sthf.pdf
> Another example of many people's experience with static typing is the Go style of language
Remember that a lot of backlash against Go's type system comes from static typing advocates used to more expressive static type systems :) It'd be a shame if, after all we complained about Go's limitations, newcomers held Go as an example of why static typing is a roadblock...
I mostly agree, don't get me wrong. And it's important to note that Common Lisp's `map` functions do more than what people traditionally associate with `map` - they basically do `map(foo, zip(zip(list1, list2), list3)...)`.
Still, this is a pretty useful property, and it is very natural and safe to use or implement, while being impossible to give a type to in most languages.
C++ can do it with the template system, as can Rust with macros (so, using dynamic typing at compile time).
Haskell can make it look pretty decent (if you can stand operator soup) by relying on auto-currying and inline operators and a few helper functions. I would also note that the Haskell creators also thought that this functionality is useful, so they implemented some of the required boilerplate in the standard lib already.
In most languages, you can implement it with lambdas and zips (or reflection, of course).
So I think that this is a nice example of a function that is not invented out of thin air, is useful, is perfectly safe and static in principle, but nevertheless is impossible to write "directly" in most statically typed languages.
Just to show the full comparison, here is how using this would look in CL, Haskell and C#:
CL
(map 'list #'max3 '(1 3 5) '(-1 4 0) '(6 1 8))
Haskell
max3 <$> ZipList [1,3,5] <*> ZipList [-1,4,0] <*> ZipList [6,1,8]
OR
(<*>) ((<*>) ((<$>) max3 (ZipList [1,2])) (ZipList [-1,4])) (ZipList [3,1])
C#
new int[]{1,3,5}.Zip<int,int,Func<int,int>>(new int[]{-1,4,0}, (a,b) => (c) => max3(a, b, c)).Zip(new int[]{6, 1, 8}, (foo,c) => foo(c))
Note only the CL version, out of all these languages, can work for a function known at runtime instead of compile-time. None of the static type systems in common use can specify the type of this function, as they can't abstract over function arity.
Here's a paper showing how this was handled in Typed Scheme: https://www2.ccs.neu.edu/racket/pubs/esop09-sthf.pdf
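For another dynamically typed point of comparison, Python's built-in `map` accepts several iterables in the same way as CL's. The `max3` and `apply_chosen` helpers below are defined just for this example:

```python
def max3(a, b, c):
    return max(a, b, c)

# Same call shape as (map 'list #'max3 '(1 3 5) '(-1 4 0) '(6 1 8)).
three_lists = list(map(max3, [1, 3, 5], [-1, 4, 0], [6, 1, 8]))

def apply_chosen(func, *sequences):
    """The function and the number of sequences are only known at run time;
    map stops at the shortest sequence."""
    return list(map(func, *sequences))

one_list = apply_chosen(abs, [-1, 2, -3])
two_lists = apply_chosen(max, [1, 2, 3, 4, 5, 3], [5, 3, 1, 2])
```

Nothing checks that the function's arity matches the number of sequences until the first element is processed, which is the part a static type would have to express.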
The language developers themselves have repeatedly stated that its type system being very limited is intentional.
See e.g. here: https://github.com/golang/go/issues/29649#issuecomment-45482...
TBH, sometimes I wonder why they bothered with static typing at all...
This is not a question of just supporting parametric polymorphism, but of abstracting over the number of arguments of a function, which is not supported in almost any type system I know of; and then of matching the number of arguments received with the type of function you specified initially.
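To show what "matching the number of arguments received with the type of function" means operationally, here is a dynamically checked version in Python. This `cl_map` is a sketch written for this comment; it does at call time the check that a variadic type would do at compile time:

```python
import inspect

def cl_map(func, *sequences):
    """Apply func element-wise across sequences, checking up front that
    func can accept exactly one argument per sequence."""
    try:
        inspect.signature(func).bind(*([None] * len(sequences)))
    except TypeError as exc:
        raise TypeError(
            f"{len(sequences)} sequences given, but function rejects that arity: {exc}"
        ) from None
    return [func(*args) for args in zip(*sequences)]

def max3(x, y, z):
    return max(x, max(y, z))

ok = cl_map(max3, [1, 2], [3, 4], [5, 6])
plus_three = cl_map(lambda x: x + 3, [1, 2, 3])

try:
    cl_map(max3, [1, 2], [3, 4])   # two sequences, three-argument function
    mismatch = None
except TypeError as exc:
    mismatch = str(exc)
```

The arity of `func` and the length of `sequences` are both plain run-time values here; a static type for `cl_map` has to relate the two, which is the hard part.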
self[expr.type](self, expr)
There are better languages for expressing this more naturally (such as Idris), but in the end, the fallacy seems to lie in your claim that this would be "safe and easy to do with dynamic typing". That's what you think until you find out that your solution works in 99% of the cases, except in some special cases, because the compiler didn't have your back.
Examples are the standard sort functions in Java and Python, which were buggy for a very long time.
Btw, here is the executable code in Scala: https://scalafiddle.io/sf/UrDu12b/1
Posting it for reference in case Scalafiddle is down:
import shapeless._, ops.function._
def multiMap[InputTypes <: HList, MapF, HListF, MapResult] (inputs: List[InputTypes])(mapF: MapF)
(implicit fn: FnToProduct.Aux[MapF, InputTypes => MapResult]) = inputs.map(fn(mapF))
val testList2Elems = List(
3 :: "hi" :: HNil,
5 :: "yes" :: HNil
)
multiMap(testList2Elems){ (num: Int, str: String) =>
s"$num times $str is ${List.fill(num)(str).mkString}"
}.foreach(println)
val testList3Elems = List(
3 :: "hi" :: 3 :: HNil,
5 :: "yes" :: 2 :: HNil,
2 :: "easy" :: 1 :: HNil
)
multiMap(testList3Elems){ (num: Int, str: String, mult: Int) =>
s"$num * $mult times $str is ${List.fill(num*mult)(str).mkString}"
}.foreach(println)
// As expected, the compiler has our back and the following does not compile
val testListWrongElems = List(
3 :: "hi" :: HNil,
5 :: "yes" :: "ups?" :: HNil
)
/*
* Whoops, does not compile, list shape not good for multiMap :)
*
multiMap(testListWrongElems){ (num: Int, str: String) =>
s"$num times $str is ${List.fill(num)(str).mkString}"
}.foreach(println)
*/
/*
* Whoops, does not compile, 2-sequences vs 3 argument function :)
*
multiMap(testList2Elems){ (num: Int, str: String, mult: Int) =>
s"$num * $mult times $str is ${List.fill(num*mult)(str).mkString}"
}.foreach(println)
*/
template <typename... Lists, typename Func> auto map(Func && func, Lists &&... lists) -> std::vector<decltype(func(std::declval<typename std::decay<Lists>::type::value_type>()...))>;
Yes, nothing hard to understand or discover about that at all...
Is this an unusual use case?
This is the classic use case for code generation. (And IMO one of the few justified ones.)
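Assuming "this" means families of functions that differ only in arity (such as Haskell's zipWith and zipWith3, mentioned elsewhere in the thread), a generator can be very small. The sketch below only emits type signatures as text, and the `a1..aN` naming is my own choice, not what the Haskell libraries use:

```python
def zip_with_signature(n):
    """Return a Haskell-style type signature for an n-ary zipWith."""
    name = "zipWith" if n == 2 else f"zipWith{n}"
    args = [f"a{i}" for i in range(1, n + 1)]
    func_type = " -> ".join(args + ["r"])
    list_types = " -> ".join(f"[{a}]" for a in args)
    return f"{name} :: ({func_type}) -> {list_types} -> [r]"

signatures = [zip_with_signature(n) for n in range(2, 5)]
```

The generator itself abstracts over arity, even though each generated signature is fixed.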
1. There's no guarantee the correct theoretical model of your program fits the type system of your programming language.
2. Sometimes there are multiple correct models for different purposes in the same program, similar to how sometimes you need multiple views onto the same database tables.
3. Sometimes you just need the ability to bodge things.
Just wanted to point out that even though you can have multiple views of your database tables, they all still adhere to the same type system.
More explicitly, the expression there seems to rely on knowing the arity of the function and the number of lists at compile time. Basically, I was asking for a function cl_map such that:
cl_map foo [xs:[ys:[zs:...]]] = foo <$> xs <*> ys <*> zs <*> ...
Edit: found a paper explaining that this is not possible in Haskell, and showing how the problem is solved in Typed Scheme: https://www2.ccs.neu.edu/racket/pubs/esop09-sthf.pdf
> traverse_ print (sequenceA [ZipList [1,2], ZipList [3,4]])
[1,3]
[2,4]
Basically, instead of your example I would like to do something like this:
> cl_map (+) [ZipList [1,2,3], ZipList [4,5,6]]
[5,7,9]
> cl_map (+ 3) [ZipList [1,2,3]]
[4,5,6]
> cl_map max3 [ZipList [1,2], ZipList [3,4], ZipList [5,6]] where max3 x y z = max x (max y z)
[5, 6]
Can this be done? What is the type of cl_map?
Note: If this doesn't work with ZipList, that's ok - the important part is being able to supply the function at runtime. Also, please don't assume that the function is associative or anything like that - it's an arbitrary function of N parameters.
> (+) <$> ZipList [1,2,3] <*> ZipList [4,5,6]
ZipList {getZipList = [5,7,9]}
> (+3) <$> ZipList [1,2,3]
ZipList {getZipList = [4,5,6]}
> let max3 x y z = max x (max y z)
> max3 <$> ZipList [1,2] <*> ZipList [3,4] <*> ZipList [5,6]
ZipList {getZipList = [5,6]}
If you want to use "functions unknown until runtime that could take
any number of arguments" then you'll have to pass the arguments
in a list. Of course these can crash at runtime, which
Haskellers wouldn't be happy with given an alternative, but
hey-ho, let's see where we get.
> let unsafePlus [x, y] = x + y
> fmap unsafePlus (sequenceA [ZipList [1,2,3], ZipList [4,5,6]])
ZipList {getZipList = [5,7,9]}
> let unsafePlus3 [x] = x + 3
> fmap unsafePlus3 (sequenceA [ZipList [1,2,3]])
ZipList {getZipList = [4,5,6]}
> unsafeMax3 [x, y, z] = x `max` y `max` z
> fmap unsafeMax3 (sequenceA [ZipList [1,2], ZipList [3,4], ZipList [5,6]])
ZipList {getZipList = [5,6]}
So the answer to your question is that
cl_map :: ([a] -> b) -> [ZipList a] -> ZipList b
cl_map f = fmap f . sequenceA
except you don't actually want all the elements of the list to be
of the same type, you want them to be of dynamic type, so let's
just make them Dynamic.
> let unwrap x = fromDyn x (error "Type error")
>
> let unsafeGreeting [name, authorized] =
> if unwrap authorized then "Welcome, " ++ unwrap name
> else "UNAUTHORIZED!"
>
> fmap unsafeGreeting (sequenceA [ZipList [toDyn "tome", toDyn "simiones", toDyn "pg"]
> , ZipList [toDyn True, toDyn True, toDyn False]])
ZipList {getZipList = ["Welcome, tome","Welcome, simiones","UNAUTHORIZED!"]}
and the type of cl_map becomes
cl_map :: ([Dynamic] -> b) -> [ZipList Dynamic] -> ZipList b
cl_map f = fmap f . sequenceA
One could polish this up a bit and make a coherent ecosystem out
of it, but Haskell programmers hardly ever use Dynamic. We just
don't come across the situations where Clojurists seem to think
it's necessary.
It's nice that Haskell does offer a way to circumvent the type system to write somewhat dynamic code, but it's a shame that in order to write a relatively simple function we need to resort to that.
Note that the type of cl_map is perfectly static. It would be `Integer N => (a_0 ->... a_N -> r) -> [a_0] ->... [a_N] -> [r]` assuming some fictitious syntax.
Steady on! You posed a question and I gave an answer. You weren't happy with that answer. I think it's a bit premature to conclude that "this function can't be written in a simple, safe way in Haskell".
> as the article I linked claims, Haskell's type system can't encode the type of the cl_map function.
Could you say where you see that claim in the article? I can see three mentions of "Haskell" in the body, two of them mentioning that one researcher's particular implementation doesn't handle this case, but not a claim that it can't be done.
> Note that the type of cl_map is perfectly static. It would be `Integer N => (a_0 ->... a_N -> r) -> [a_0] ->... [a_N] -> [r]` assuming some fictitious syntax.
OK, fine, it's a bit clearer now what you are looking for. How about this:
> cl_map (uncurry (+)) ([1,2,3], [4,5,6])
[5,7,9]
> cl_map (+3) [1,2,3]
[4,5,6]
> let max3 (x, y, z) = x `max` y `max` z
> cl_map max3 ([1,2], [3,4], [5,6])
[5,6]
Notice that the function arguments have different,
statically-known types! The type of this miracle function?
cl_map :: Default Zipper a b => (b -> r) -> a -> [r]
And the implementation?
-- Type definition
newtype Zipper a b = Zipper { unZipper :: a -> ZipList b } deriving Functor
-- Instance definition
instance a ~ b => D.Default Zipper [a] b where def = Zipper ZipList
-- These three instances are in principle derivable
instance P.Profunctor Zipper where
  dimap f g = Zipper . P.dimap f (fmap g) . unZipper
instance Applicative (Zipper a) where
  pure = Zipper . pure . pure
  f <*> x = Zipper (liftA2 (<*>) (unZipper f) (unZipper x))
instance PP.ProductProfunctor Zipper where
  purePP = pure
  (****) = (<*>)
Given that the only two lines that actually matter are
newtype Zipper a b = Zipper { unZipper :: a -> ZipList b } deriving Functor
instance a ~ b => D.Default Zipper [a] b where def = Zipper ZipList
and the rest are boilerplate that could be auto-derived, I think this
is pretty satisfactory. What do you think?
Still, you haven't written exactly the function I was asking for. You require a manual, compile-time step of transforming the N-ary function to a unary function taking a tuple. Still, it's impressive that this can define variable-length, variable-type tuples. Unfortunately I am not able at all to follow your solution, as it's using too many types that I'm not familiar with, and it seems to require some external packages, so I can't easily try it out in an online compiler to understand it better (as I have been doing so far).
Either way, I would say we are well outside the limits of an easy to understand way of specifying this kind of function - even if you are only showing 2 lines of code, it seems that your definition requires, outside of lists and functions (the objects we intended to work with): ZipList, Default, Functor, Profunctor, ProductProfunctor, Applicative, and a helper type. Even if these were derivable, someone seeking to write this function would still need to be aware of all of these types, some of which are not even part of the standard library; and of the way they work together to magically produce the relatively simple task they had set out to do.
> Could you say where you see that claim in the article?
The claim is presented implicitly: for one, they conjecture that, were Haskell or SML to "pragmatically support" such a feature, it would be used more often (offering as argument the observation that both Haskell's and SML's standard libraries define functions that differ only in the arity of their arguments, such as zipWith/zipWith3 in Haskell). This implies that, to their knowledge, it is not pragmatically possible to implement this in Haskell.
Similarly, given that in their "Related Works" section they don't identify any complete implementation of variadic polymorphism, it can be assumed that they claim at least not to have found one.
I'm afraid I'm now completely stumped about what you're asking for. If you have a function with a known arity and want to apply it to a known number of arguments then you can use the original formulation:
f <$> args1 <*> args2 ... <*> argsN
You then asked what happens for unknown numbers of arguments, so I produced a solution that works with lists, which isn't very Haskelly, but does the job. After that you said you wanted something with a more specific type, so I came up with the answer that works generally over tuples (or indeed any type that contains a sequence of arguments). That's not satisfactory either! It seems you literally want a function with type `Integer N => (a_0 ->... a_N -> r) -> [a_0] ->... [a_N] -> [r]`. Well, I don't know how to do that in Haskell -- maybe my most recent solution extends to that -- but nor do I know why you'd want to do that! If you have a known number of arguments the first solution works fine. If you have an unknown number of arguments then you must have them all together in one datastructure, so the most recent solution works fine. Haskellers would be very happy with either of those and I don't see how we're missing out on programming convenience because of that. Maybe you could elaborate?
Try this. It's a full working program. The packages it depends on are "profunctors" and "product-profunctors".
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
module MyExample where
import Control.Applicative (ZipList (..))
import qualified Data.Profunctor as P
import qualified Data.Profunctor.Product as PP
import qualified Data.Profunctor.Product.Default as D
-- Traverse and runTraverse are used below but not imported in the post as
-- given; they are assumed to come from the product-profunctors package.
newtype Zipper a b = Zipper { unZipper :: Traverse ZipList a b }
  deriving Functor
instance a ~ b => D.Default Zipper [a] b where
  def = Zipper (P.dimap ZipList id D.def)
instance P.Profunctor Zipper where
  dimap f g = Zipper . P.dimap f g . unZipper
instance Applicative (Zipper a) where
  pure = Zipper . pure
  f <*> x = Zipper ((<*>) (unZipper f) (unZipper x))
instance PP.ProductProfunctor Zipper where
  purePP = pure
  (****) = (<*>)
cl_map :: D.Default Zipper a b => (b -> r) -> a -> [r]
cl_map f = getZipList . fmap f . runTraverse (unZipper D.def)