More Itertools (more-itertools.readthedocs.io)
results = list(get_some_stuff(...))
assert len(results) == 1
result = results[0]
into
result = one(get_some_stuff(...))
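For readers who have not used it, here is a minimal sketch of what a helper like one does. This is not the more-itertools implementation; one_sketch and get_some_stuff are hypothetical names used only for illustration, and the error messages are made up.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def one_sketch(iterable: Iterable[T]) -> T:
    """Return the only item of `iterable`, or raise ValueError otherwise."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("expected exactly one item, got none") from None
    try:
        next(it)  # a second item means the input was too long
    except StopIteration:
        return first
    raise ValueError("expected exactly one item, got more than one")


def get_some_stuff() -> Iterator[int]:
    # Hypothetical stand-in for the function in the comment above.
    yield 42


result = one_sketch(get_some_stuff())

# Tuple unpacking enforces the same "exactly one" rule at runtime.
(unpacked,) = get_some_stuff()
```

Both forms raise ValueError on an empty or over-long input; the helper version just reads more clearly and has an obvious Iterable[T] -> T signature.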
I guess you could also use tuple-unpacking: result, = get_some_stuff(...)
But the syntax for unpacking a single item is awkward. Doesn't that trailing comma just look implausible? (Also, I've worked with type checkers that complain when a tuple unpacking could potentially fail, while one has a clear type signature: Iterable[T] -> T.)
[result] = get_some_stuff(...)
result, *_ = iterable()
The proposal seemed very close to getting shipped alongside https://github.com/tc39/proposal-iterator-helpers while basically accepting many of the constraints of current async iteration (one at a time consumption). But the folks really accepted that concurrency needs had evolved, decided to hold back & keep iterating & churning for better.
I feel like a lot of the easy visible mood on the web (against the web) is that there's too much, that stuff is just piled in. But I see a lot of caring & deliberation & trying to get shit right & good. Sometimes that too can be maddening, but ultimately with the web there aren't really re-do-es & the deliberation is good.
Disclaimer: this code was written several years ago with few downstream users, not all of these are super high performing, and they have not been super extensively tested.
Here is an update that should be much easier to convert to JS:
def tee(iterable, n=2):
    iterator = iter(iterable)
    shared_link = [None, None]
    return tuple(_tee(iterator, shared_link) for _ in range(n))

def _tee(iterator, link):
    try:
        while True:
            if link[1] is None:
                link[0] = next(iterator)
                link[1] = [None, None]
            value, link = link
            yield value
    except StopIteration:
        return

I'm not sure if it was this proposal or another one in a similar space, but I've recently heard about several async improvements that were woefully under-spec'd, and would likely have caused much more harm than good due to all the edge cases that were missed.
For an idea of the process followed, look up PEP 417 (a Python Enhancement Proposal).
Or maybe you mean the backport of dataclasses to 3.6 that is available on PyPI? That actually came after dataclasses was added to 3.7.
Source: I wrote dataclasses.
from itertools import chain
flatten = chain.from_iterable
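A quick usage sketch of that alias (the sample lists are made up for illustration). Note that chain.from_iterable only flattens one level of nesting:

```python
from itertools import chain

flatten = chain.from_iterable

# Flattens exactly one level: a list of lists becomes a flat sequence.
nested = [[1, 2], [3], [], [4, 5]]
flat = list(flatten(nested))

# Deeper nesting is left alone; only the outermost level is unwrapped.
deeper = list(flatten([[1, [2, 3]], [4]]))
```

Here flat is [1, 2, 3, 4, 5], while deeper still contains the inner list [2, 3].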
Ref: pytudes - https://github.com/norvig/pytudes/blob/main/ipynb/Advent-202...
https://pybites.circle.so/c/python-discussion/functional-com...
Usually, I'd cast my arrays into a pandas DF and then use the equivalent dataframe operations. To me, pandas and numpy might as well be part of the python stdlib.
How should I reason about the tradeoff of using something like this vs pandas/numpy ? Esp. with Numpy 2.0 supporting the string dtype.
I promise I mean no offense by this but this is so comically absurd. Like you know it's not a cast right? Ie that you're constructing pandas dataframes.
> How should I reason about the tradeoff of using something like this vs pandas/numpy ?
For small sizes, operations on native types will be faster than the construction of complex objects.
The only way to understand what's going on with DF code is to step it in a debugger. I know they can be much faster, but man you pay a maintainability price!
My tasks aren't usually bottlenecked by the df creation operation. To me, the convenience offered by dfs outstrips the compute hit. However, if this is an order-of-magnitude difference, then it would push me to adopt the more-itertools formulation.
I would like to see some kind of query AST for this stuff, in a query engine whose semantics let its ops be fused together for efficiency. For example, like a Clojure transducer.
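To make the transducer idea concrete, here is a rough Python sketch of fusing a filter and a map into a single pass with no intermediate lists. It is only an illustration of the concept, not Clojure's semantics in full (no early termination, no completion step), and mapping, filtering, compose, and append are hypothetical helper names.

```python
from functools import reduce


def mapping(f):
    """Transducer: apply f to each item before passing it to the next step."""
    def transducer(step):
        return lambda acc, x: step(acc, f(x))
    return transducer


def filtering(pred):
    """Transducer: pass an item on only when pred(item) is truthy."""
    def transducer(step):
        return lambda acc, x: step(acc, x) if pred(x) else acc
    return transducer


def compose(*transducers):
    """Combine transducers so data flows through them left to right."""
    def composed(step):
        for t in reversed(transducers):
            step = t(step)
        return step
    return composed


def append(acc, x):
    """Final reducing step: collect items into a list."""
    acc.append(x)
    return acc


# Keep odd values, then square them, in one fused pass over the input.
xform = compose(filtering(lambda v: v % 2), mapping(lambda v: v * v))
result = reduce(xform(append), range(10), [])
```

Each input item goes through the filter and the map in one function call chain, so result ends up as [1, 9, 25, 49, 81] without building a filtered list first.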
Much appreciated!
My friend, it's much worse than a single order of magnitude for small inputs:
import time
import pandas as pd
ls = list(range(10))
b = time.monotonic_ns()
odds = [v for v in ls if v % 2]
e = time.monotonic_ns() - b
print(f"{e=}")
bb = time.monotonic_ns()
df = pd.DataFrame(ls)
odds = df[df % 2 == 1]
ee = time.monotonic_ns() - bb
print(f"{ee=}")
print("ratio", ee/e)
>>> e=1166
>>> ee=656792
>>> ratio 563.2864493996569

(Sure, it's easy to write obfuscated pandas, and it sometimes has version-specific bugs or deprecations which need to be hacked around in a way that compromises readability, and sometimes the API has active changes/namings that are non-trivial. But that's miles from "only way to understand is with a debugger". If you want to claim otherwise, post a counterexample on SO (or Codidact) and post the link here.)
I don't have anything I can show because the stuff I was working on was commercial and I don't code Pandas for fun at home ;)
The code I was maintaining / updating had long pipelines, had lots of folding, and would drift in and out of numpy quite a bit.
Part of the issue was my unfamiliarity with Pandas, for sure. But if I just picked a random function in the code, I would have no idea as to the shape of the data flowing in and out, without reading up and down the callstack to see what columns are in play.
Breakpoint and then look at the data, every time!
> The code I was maintaining / updating had long pipelines, had lots of folding, and would drift in and out of numpy quite a bit.
(Protein folding?)
Anyway, yeah, if your codebase is a large proprietary pipeline that thunks to and from pandas-numpy, then now I understand you. But that's your very specific use case. The claim "The only way to understand what's going on with DF code is to step it in a debugger" is, in general, overkill.
(Disclaimer: I wrote three of them and spend a good deal of my time helping others level up their Pandas. Spent this morning helping a medical AI company with Pandas.)