Lambda: The Ultimate Excel Worksheet Function (microsoft.com)
Lambda: The ultimate Excel worksheet function - https://news.ycombinator.com/item?id=25923628 - Jan 2021 (4 comments)
http://lambda-the-ultimate.org/papers
Including:
"Lambda the Ultimate Imperative"
"Lambda the Ultimate Declarative"
Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.
I get that calling code by its name can make it sound scary, but this whole notion of it being 'easy because it is not code' seems to be a big fat lie for comfort. Same goes for the magic no-code systems where code is replaced with 'expressions' or graphical 'workflows', which are essentially exactly the same thing, only shaped slightly differently.
This makes me wonder if it wouldn't be much better if we could focus on making people be able to code and have more 'coding capacity' instead of having less of that capacity and then reducing it even more by using some of it to create 'let us pretend this is not code'-applications.
Microsoft has some initiatives exploring it[2] (including the limited autocomplete in Excel as a toy version of the concept), but this is not it.
[1] https://en.wikipedia.org/wiki/Programming_by_example
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/...
VBA's potential as an attack vector results in it being unavailable or heavily restricted in many corporate environments through stuff like Group Policies[1]. And I've worked with some clients whose IT goes a step further and completely blocks sending or receiving emails with .xlsm[2] attachments.
Since lambda-defined logic is all formula-based, it's not considered 'code' in that sense and can be used and passed around as a standard Excel file without any of the VBA-oriented restrictions. So you can approach your Excel workbook more like a programming project, centrally defining your complex logic once and referencing it elsewhere every time you want to use it. This is super helpful for auditability and maintenance, while staying within the bounds of what'll be applicable/usable across any Excel environment.
[1] https://4sysops.com/archives/restricting-or-blocking-office-...
[2] .xlsm is the extension Excel uses for spreadsheets containing VBA code
I think those are basically Excel formulas, which in context, are not what Excel users would consider “code”.
The argument I've heard against doing this sorta thing was that they wanted to keep Excel simple enough to not alienate many non-technical users, sorta forcing it to be a simple, accessible environment for everyone.
It'll be neat to see how the user-base adapts to a more powerful feature-set. I mean, it'd seem like a lot of folks will be thrilled, finally having some extra functionality without having to use macros/VBA/VSTO/COM/etc., though how might non-technical folks feel about a coworker sending them a spreadsheet with function-values?
I don't really see the addition of new advanced functionality changing that paradigm.
That's the biggest issue with LET / LAMBDA at the moment. Users are terrified of the Name Manager and simply do not understand what names are for or what "scope" means. On top of that, copying content from one workbook to another copies the names over as well, which is how I often end up with ancient names such as FXRATE1997.
It could of course be a reflection of my teaching ability, but it always seems to be a tough one.
Isn't it a feature of natural languages to have the same word assume different meanings depending on where it's used? The concept translates nicely, and in PLs it's completely explicit whenever this happens.
- You have a special area or special kind of sheet, where some cells are inputs, one is output, and all others are used for temporary calculation
or:
- You define your calculation as usual, in B5: = 10*A5 - then in C5:
=MYLAMBDA(B5; A5)(3)
Meaning: Take the formula in B5, treat A5 as an argument, and return a function. Then call this function with the argument 3. The benefit of this? You can have an area in your sheet where the user can enter formulas and multi-cell-calculations, not just numbers, and they are applied elsewhere.
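To make the idea concrete outside of Excel, here is a minimal Python sketch of the same "turn a cell formula into a function" behaviour. Everything in it is hypothetical: the dict-based `sheet`, the `make_lambda` helper, and modelling B5 as a Python function are assumptions for illustration, not how Excel or the proposed MYLAMBDA would work internally.

```python
def make_lambda(formula, sheet, param):
    """Return a function that re-evaluates `formula` with cell `param` overridden.

    `formula` is a hypothetical stand-in for a cell formula: a callable that
    reads other cells from a dict mapping cell names to values.
    """
    def call(value):
        cells = dict(sheet)      # copy, so the real sheet is left untouched
        cells[param] = value     # treat `param` (e.g. A5) as the argument
        return formula(cells)
    return call


sheet = {"A5": 7}

def b5(cells):
    # Stand-in for the formula in B5: =10*A5
    return 10 * cells["A5"]

# Roughly what =MYLAMBDA(B5; A5)(3) is meant to express.
my_lambda = make_lambda(b5, sheet, "A5")
result = my_lambda(3)  # 30
```

The copy of the sheet is the important design choice here: calling the function with 3 does not change what A5 holds on the sheet itself.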
On that note, TFA claims that the introduction of LAMBDA finally makes Excel Turing complete, unlike the kind of Turing machine simulators the stick figure is referring to in the XKCD comic...
> (In contrast, Felienne Hermans’s lovely blog post about writing a Turing machine in Excel doesn’t, strictly speaking, establish Turing completeness because it uses successive rows for successive states, so the number of steps is limited by the number of rows.)
Also, does this:
> With LAMBDA, Excel has become Turing-complete.
sound like a threat to anyone? :)
https://en.wikipedia.org/wiki/History_of_the_Scheme_programm...
=LAMBDA(global_function_name, [cell_input_1, cell_input_2, ...])
Wouldn't this be a cleaner design? Trying to deal with cells whose formulas are way too long to be put in a single cell is Excel's Achilles Heel (and a footgun that you are nearly guaranteed to encounter sooner rather than later). This LAMBDA proposal as written seems to exacerbate that problem, not improve it.
If I'm writing a longer formula that's going to be tough to read, I make it multiple lines and add spaces at the start of the lines for indentation. Makes readability so much better!
(Is the title a deliberate call to that site, or something even older?)
What's next? A full implementation of scheme? Common lisp?
"A paper written with Ronen Gradwohl on Lisp and Symbolic Functionality in an Excel Spreadsheet: Development of an OLE Scientific Computing Environment, August, 2002. (code available on request) "
Also, Emacs's org-mode implements some basic spread-sheet utilities, so...
https://techcommunity.microsoft.com/t5/excel-blog/bg-p/Excel...
Now they lose termination for the use case where someone knows how to program but can't or won't program in some "normal" programming language.
https://powerapps.microsoft.com/en-us/blog/introducing-micro...
I'm not sure what you mean by this.
One of the most annoying things about Excel is it has so many parts apparently designed by people or groups that didn't talk to each other and didn't have a grasp of all the rest of it, let alone the world of the (various groups of) users.
Who ordered another Turing-complete system in Excel? One that is, like all the others, a pain and a half to debug or analyze? Has anyone figured out how to turn this into a security vulnerability yet?
Saying "yay people are making videos" only makes me think of all the horrific tutorials on Power Automate. And this: https://xkcd.com/763/
I hope this gets implemented in libreoffice too; I will certainly tell non-programmers to stop using python or whatever and go back to spreadsheets!
Because Power Query is not a spreadsheet application, and has some much more severe performance cliffs than Excel proper does.
I don't think people at Microsoft are looking at Excel as a whole. They are like lost souls squatting in a mansion, building sand castles in the rooms they live in that have no relationship to the actual building and what it needs to keep from falling down.
I'm not sure what you mean by performance cliffs. Can you give an example of where and how you would better accomplish something without Power Query? Are you talking about processing data in the range of a few hundred megabytes?
Mess up in Outlook, you right click and it gives a couple suggestions. It'll call out the typo right after you finish the word.
Mess up in Teams? It'll wait until you finish the next word (charitably, giving you a second to figure it out?) then will suggest a different word than Outlook would.
One thing that would greatly improve the experience would be to allow formulas to contain just a lambda and then reference that lambda from another cell as a cell reference. Currently you have to manage lambdas under Formulas > Name Manager. This would make debugging a lot easier in my opinion, so that you can freely mix data entry with computation. Not sure why they haven't done this already, but I suspect it is because of assumptions baked into Excel.
My pet project from a couple of years ago[1] had cells-as-functions. I think it works really well. I also think names are important, but yeah they should either be easy or optional. Glad the Excel folks liked my rad idea though, even if they didn't quite hit all the high notes :-).
1: https://6gu.nz/, IMO worth watching the first minute of the video to see it in action
It's possible we're not talking about the same thing. Microsoft has slapped "Power" on so many different things. When I google "Power Query" I get a lot of "Power BI" stuff and I try to avoid that like the plague. In my limited experience, it's flaky, unstable, and adds negative value to my reports.
From my perspective, Power Query appears to be similar to the scripting language in something like Qlikview. Except much less painful (for me). I also think "grokking" Power Query could lead to improving SQL, even. The split between SQL and things like PL/SQL or T-SQL always felt wrong to me. Just having functions as a seamless part seems like the thing that was always missing.
https://xkcd.com/2453/ Wouldn't you know it?
https://powerapps.microsoft.com/en-us/blog/introducing-micro...
I’m still impressed with their renewal as a company. Rare for a stagnant tech firm to come back.
I suspect it is also related to the Curse of Knowledge (https://en.wikipedia.org/wiki/Curse_of_knowledge). Once you are past the hurdle of initially learning a concept, it is hard to imagine not being able to grasp it, especially when dealing with abstract concepts such as scopes.
The amount of raw material in the universe is finite at a given point in time (it could be infinite over time, we don't know if time is infinite either).
I think we've already established (especially over the past year) that fiat money is infinite.
The Turing machine is a mathematical model. Infinity only exists in the world of mathematics. The physical world is by definition finite.
This isn't obvious to me, would you elaborate?
https://answers.yahoo.com/question/index?qid=20200123104919A...
Sadly that thread will soon be gone due to the approaching Yahoo Answers apocalypse. Hopefully Internet Archive will save it!
They are as much a software developer as they are a writer because they write e-mails; a presenter because they present the annual accounts to the CFO; and a cleaner because they put away their mugs at the end of the day (usually.. hopefully..)
You don’t have to be someone because you sometimes do something that other person also does.
Lots of people write code, using lots of applications and devices - spreadsheets, MATLAB, database queries, it's all code.
It’s not going to replace functionality of core spreadsheet-based excel for accountants, for instance, who typically won’t have a use for PowerQuery as their data is structured differently.
In a spreadsheet the data is much less structured which is where a lot of the power comes from - for instance PowerQuery doesn’t really support things like subtotals easily, or doing scratch-calculations, or building quick financial models. It is closer to a paper-ledger with calculations scribbled into the margins than a big-data database.
PowerQuery is more about ingesting lots of data and cleaning it, while finance is often about working stuff out and playing with numbers to see what happens, and playing with numbers is easier in a less structured, loosely typed environment.
Accumulation patterns perform abysmally even with data in the 100Ks of elements. Say you have a table of inventory movements and want instead a snapshot table of inventory at point in time. You can do an O(n^2) self-join of a table with itself to all records with a lesser date, summing all movements to derive a total quantity at that time.
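For anyone who wants to see the shape of that self-join, here is a rough Python sketch rather than M. The sample `movements` data and the function name are made up for illustration, and I've assumed each snapshot includes the movement on its own date (i.e. "less than or equal").

```python
from datetime import date

# Hypothetical inventory movements: (movement date, quantity change).
movements = [
    (date(2021, 1, 1), 5),
    (date(2021, 1, 2), -2),
    (date(2021, 1, 3), 4),
]

def snapshot_by_self_join(rows):
    """For every row, rescan all rows and sum those dated on or before it.

    This is the self-join shape: n rows, each compared against n rows,
    so the work grows as O(n^2).
    """
    return [
        (d, sum(qty for other_date, qty in rows if other_date <= d))
        for d, _ in rows
    ]

snapshots = snapshot_by_self_join(movements)
# [(2021-01-01, 5), (2021-01-02, 3), (2021-01-03, 7)]
```

At a few hundred thousand rows the inner rescan is tens of billions of comparisons, which is where the pain comes from regardless of the tool.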
If you want to use an accumulation pattern, you can sort and cast your table to a list of records and then use List.Accumulate to iterate over each list element, deriving a new field with the running total of inventory amount. If you do this, you will find that it falls right over even with 1Ks or 10Ks of records. This is because the intermediate list that you're appending to through the accumulation is itself a lazy stream. Thus, you have to use List.Buffer at each step. Even with List.Buffer at each step, this solution falls over at high 10Ks or low 100Ks of records.
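The accumulation shape described here, sketched in Python with `functools.reduce` as a loose stand-in for List.Accumulate. This is only an analogy: Python lists are eager, so it does not reproduce M's lazy-stream behaviour or the need for List.Buffer. The function names are hypothetical.

```python
from functools import reduce

def step(totals_so_far, qty):
    """One accumulation step: append the new running total to the list."""
    previous = totals_so_far[-1] if totals_so_far else 0
    # Building a new list each step mirrors "appending to an intermediate
    # list"; in Python this copy makes the whole fold quadratic.
    return totals_so_far + [previous + qty]

def running_totals_by_fold(quantities):
    """Fold over the (already sorted) quantities, carrying the list of totals."""
    return reduce(step, quantities, [])

totals = running_totals_by_fold([5, -2, 4])  # [5, 3, 7]
```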
Incredibly unintuitively, you can use List.Generate with an already-buffered input list to derive a new list that can then be cast back to a table, though this still struggles with 100Ks of records.
If your snapshots can be aggregates, then you can happily throw out the idea of such an accumulation pattern and just join to a date table at the appropriate grain with all movement records less than or equal to the date in that date table.
I'll note that I regularly speak with several of the people whose blogs you will inevitably come across when performance tuning Power Query. The approaches above are the current state of the art in PQ for iteration and accumulation patterns. This is not an appeal to authority or a brag. This is to highlight the difference with the Excel spreadsheet formula approach below, which even beginners can derive from first principles.
In an Excel spreadsheet, for the same challenge, you just define a new column with a special first row formula, and each subsequent cell referencing the row above. This will happily run right up to the spreadsheet row limit with no performance concerns. If you really want, you can spill over to multiple spreadsheets, which is clunky to manage, but still performs just fine, and degrades slowly. The M approaches above hit a cliff and start hanging.
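In Python terms, the spreadsheet column described here is just a single pass where each value depends on the one directly above it. A minimal sketch, with the cell references in the comments (B2, C2 and so on) being an assumed layout rather than anything prescribed:

```python
def running_total(quantities):
    """Mimic a running-total column: special first row, then 'row above + this row'."""
    totals = []
    for i, qty in enumerate(quantities):
        if i == 0:
            totals.append(qty)                   # like C2: =B2
        else:
            totals.append(totals[i - 1] + qty)   # like C3: =C2+B3, filled down
    return totals

column = running_total([5, -2, 4])  # [5, 3, 7]
```

Each row does a constant amount of work, so the whole column is O(n), which is consistent with it running comfortably up to the row limit.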
Excel formulas make it trivial to reference arbitrary cells. M is a nearly general-purpose language. PQ uses M, but as a framework for writing M, it has a strong emphasis on a query/table paradigm. A table-based operation model cuts against the grain of a spreadsheet, because a spreadsheet is a collection of arbitrary cells. A tabular approach is a collection of similarly shaped records stacked one upon the other. These two paradigms have a fair amount of overlap, but are not isomorphic. There are things trivial to express in one that become difficult bordering on impossible in the other.
My language is strict and statically typed. However, after arrays (tables are arrays of records conceptually) exceed a certain length, rather than processing them in-memory as arrays, they will be offloaded to storage and processed (transparently) in a streaming fashion.
I’m surprised that this doesn’t work well in PowerQuery. I would have thought that 100K would be peanuts for it.
Mine is a SaaS however, so the user’s laptop isn’t a constraint, and I can transparently throw a million records in BigQuery or some other data warehouse and use its aggregates if needed. Although at the 100K scale you can use SQLite and it can handle that scale of data trivially on commodity laptops.
So your experience is interesting indeed.
I'll note, as I did to a sibling reply of yours, I made observations about a specific pattern that showcases performance issues in PQ/M. PQ/M easily scales beyond 100Ks of records, but not for arbitrary processing patterns.
That's not my experience. At work, the data usually isn't very large, but I have experimented on my own time with, for instance, a public covid data file that I think was several GB.
I also thought lazy semantics is a good thing, not a fundamental flaw.
Rather than debate, I would be interested enough to spend some time on a sample problem, if you could provide one, where you believe Power Query inadequate, and at the same time have an alternate solution to provide a benchmark of what is adequate.
These are two very different statements. I've happily used PQ to ingest GBs of data. Its streaming semantics are fine to great for some types of processing and introduce performance cliffs for others. There's no binary judgment to be made here. Laziness is neither a fundamental flaw nor an unmitigated good.
I've already shared one specific pattern above. I can share some mocked up data if you need me to, but that might be a day or two. Also, feel free to reach out via email (in my profile).
If you mean this:
"Say you have a table of inventory movements and want instead a snapshot table of inventory at point in time"
Then I can make my own data to play with - I only want to be clear about the constraints. Would 500K records be enough to make the distinction between naive and non-naive approaches obvious? Can you quantify (not precisely) "struggle"?
I have used Table.Buffer, but I probably don't thoroughly understand its use yet.
(I belatedly realized your problem is something I've done with Sharepoint list history recently, but not that many records, so I'm going to look for a public dataset to try)
P.P.S. I guess it also makes me think - I frequently am getting my data from an Oracle database, so if something is easier done there, I'd put it in the SQL. Analytic functions are convenient.
P.P.P.S. Aha! I found a file of parking meter transactions for 2020 in San Diego, which is about 140MB and almost 2 million records. This seems like a good test because not only is it well over the number you said was problematic, but it's well over the number of rows you can have directly in one Excel sheet.
https://data.sandiego.gov/datasets/parking-meters-transactio...
I am very not an algorithm person, but I got a huge speedup from a "parallel prefix sum" instead of the obvious sequential approach or the even worse N^2.
I translated this to M by rote and trial and error (page 2): https://www.cs.utexas.edu/~plaxton/c/337/05f/slides/Parallel...
Implementing the parallel, recursive solution got me a million rows in about three and a half minutes.
Fill down (which I had to do anyway to compare) was about 10 seconds.
So...probably not the first choice in this scenario but could be worse?
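For readers unfamiliar with the term, here is one common recursive formulation of a prefix sum, sketched in Python. It is an assumption that this matches the linked slides or the M translation mentioned above; it is only meant to show the idea of summing adjacent pairs, recursing on the half-length list, and then filling in the gaps. It runs sequentially here; the "parallel" part is that each level's pair sums are independent of one another.

```python
def prefix_sum(xs):
    """Inclusive prefix sums via the pairwise-recursive scheme."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    # Sum adjacent pairs; a trailing odd element is handled when expanding.
    pairs = [xs[i] + xs[i + 1] for i in range(0, n - 1, 2)]
    # sub[k] is the sum of xs[0 .. 2k+1]
    sub = prefix_sum(pairs)
    out = []
    for i in range(n):
        if i % 2 == 1:
            out.append(sub[i // 2])               # odd index: taken directly
        elif i == 0:
            out.append(xs[0])
        else:
            out.append(sub[i // 2 - 1] + xs[i])   # even index: previous odd + self
    return out

sums = prefix_sum([1, 2, 3, 4, 5])  # [1, 3, 6, 10, 15]
```

The recursion depth is about log2(n), so a million rows needs roughly 20 levels rather than a million dependent steps.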
Subtotals? I was used to using GROUPING SETS with Oracle SQL, and found I could roll my own in Power Query. It's a good example of exactly why I like it.
Also, Power Query doesn't prevent you from using the regular table total feature or a pivot table based off of the Power Query output.
That is, even if Power Query doesn't provide all the subtotaling features you'd like in the way you'd like, it doesn't restrict you from anything, does it?
> or doing scratch-calculations, or building quick financial models
I do use it to do all sorts of ad hoc calculations - for instance, it can ingest PDF files or HTML with tables.
It sounds as if you're saying it's too complicated for really trivial calculations?
I'm saying it's not the right tool for some classes of calculations.
For instance I work in designing warehouses, and use both tools. Here are some use cases where Excel doesn't do well and I would use PowerQuery:
* Ingesting millions of historical orders
* Handling relational data
* Data cleaning and aggregations
Here are some example use cases where PowerQuery doesn't work as well, but Excel is perfectly good:
* What height should the pallet racking bays be in this warehouse, and how many pallets am I likely to fit in the building envelope? (considering my other space requirements)
* What's the likely transport impact of opening a new distribution point?
* Running lots of scenarios or sensitivities.
Why are these better in Excel? Well, there are just some things PowerQuery doesn't do well. For instance, an Excel cell can take any other arbitrary cell's value into account in its own calculation, while in PowerQuery you generally have to use an intermediary table and joins to handle this.
Can both tools physically do it? Yes, it's just some problems suit one rather than the other, and identifying the right tool for the right problem saves you lots of time. One thing that makes Excel better for scratch calculations for example is the fact that it's a live environment (with PowerQuery you have to run it after changes to get the results back, and this can be really slow compared to excel).