Python 2 vs. Python 3: A retrospective (dropbox.com)
Set and dict comprehensions
{x**2 for x in range(10)}
{x: x**2 for x in range(10)}
Why reduce() must die:
... the applicability of reduce() is pretty much limited
to associative operators, and in all other cases it's
better to write out the accumulation loop explicitly.
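To make the associativity point concrete, here is a rough Python 3 sketch (there, reduce() lives in functools; the sample numbers are made up). With an associative operator like addition the grouping doesn't matter, but with subtraction the result depends on the left-to-right grouping that reduce() silently picks:

```python
from functools import reduce  # reduce() is no longer a builtin in Python 3
import operator

numbers = [1, 2, 3, 4]  # made-up sample data

# Associative operator: reduce() and the explicit loop agree, grouping is irrelevant.
total_reduce = reduce(operator.add, numbers, 0)

total_loop = 0
for n in numbers:
    total_loop += n

# Non-associative operator: reduce() always folds from the left,
# i.e. (10 - 3) - 2, which differs from 10 - (3 - 2).
left_fold = reduce(operator.sub, [10, 3, 2])
right_grouping = 10 - (3 - 2)
```

Here left_fold is 5 while right_grouping is 9, which is the kind of thing an explicit loop makes obvious.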
int [divided by] int should return float
nonlocal
Explicit nonlocal variable modifier which (I guess) "promotes" the variable outside its local scope. Kind of the inverse of Java requiring 'final' to bind a variable to a closure.
Douglas Crockford made a point in a recent interview[1] (not the first time, I'm sure) that this is exactly the wrong reason to keep doing things "the way it's always been". Other examples he mentions: line endings (CR/LF), integer overflow, short vs long. Big vs little endian would be another obvious example.
Fred Brooks (Mythical Man Month) calls this "accidental complexity".
> [novices are] only part of Python's audience, and probably won't be the longest-lived part
By definition. But, it's not really a good use of anyone's time to be dealing with truncation in a time when it no longer has any reason to be the default except historical accident.
[1] http://hanselminutes.com/396/bugs-considered-harmful-with-do...
The first one is in 2.7 (it's completely backwards compatible). The division is in 2.5 or 2.6, imported from __future__. #4 skirts the line; new keywords have been added in point releases in the past. #2 wouldn't fly.
But here's the thing, P3 was not about these (or not only about them), they got bundled in because P3 was allowed to be significantly backwards-incompatible and thus a lot of changes became acceptable which were not justifiable or much harder to justify in a point release. The primary breakage point of P3 is not any of these, it's the string changes.
from __future__ import division
And then dividing numbers works a bit more logically (well, I think it's more logical). You can do floor division with the // operator: a // b, and it even works with floats: (3.9 // 1.2) == 3
3/2 = 1
as the default behaviour.
instead of writing
for index, element in enumerate(some_list)
for key, value in my_dict.items()
they should unify and make items and enumerate default behavior. i.e.
for index, element in my_list:
for key, value in my_dict:
I really don't see the benefit of not doing this as default behavior. I always find if I need to loop a list there is a good chance the index can help, and even if I don't need it it doesn't hurt to have one either. Simple is better. And the whole looping dict and get back the key only sucks too because you often need the value as well so you essentially do dict["key"] but why not just default return both key and value?
Wouldn't it be better to pick a better-suited language then?
John Carmack put it nicely:
"One of the lessons that we took away from Doom 3 was that script interpreters are bad, from a performance, debugging, development standpoint. It’s kind of that argument “oh but you want a free-form dynamically typed language here so you can do all of your quick, flexible stuff, and people that aren’t really programmers can do this stuff”, but you know one of the big lessons of a big project is you don’t want people that aren’t really programmers programming, you’ll suffer for it!"
I have huge respect for Carmack, but some other people have proved him wrong in the past. His opinions are often taken as gospel, but more discreet people (like Sweeney) may have different and as worthy points of view.
a = '12345'
a[0:-1] == '1234'
a[-1:0:-1] == '5432'
Personally, I think that "downcounting" slices are rarely used. For code clarity, I prefer reversing the string/list first.
Because too much shit is broken (NumPy, hello). Because Python 3 has been the default on basically no system ever (OK, maybe this is changing right now, slowly).
As Guido says, it's been five years and it will take another five. This whole experiment has been a huge misstep for Python, an absolutely massive gaffe. Some of Python's peers did it too, roughly around the same time (Perl, and to a lesser extent Ruby).
Python (Guido?) noticed its own maturity a bit too late. The damage is incredible; along with the performance stuff (which is in a way easier to overcome) this may be a key factor leading to the fall of a great language.
I'm looking forward to 2.x dying out to eliminate that retrofitting step (and it's happening: the improving dependency landscape means I find I have to do it less and less often), but I've not experienced any major pain overall. From where I'm sitting, Python 3 is a better, cleaner language, and as someone new to Python, I'm happier for it.
Too many ugly renames, too few alternatives of doing things. To be honest the only attractive thing to me is all the libraries that they support but I don't find the language itself interesting.
I have been using Python full time for the last 7 years and I very rarely have to call either of those functions.
> Too few alternatives of doing things.
Can you explain that as well? What do you mean by alternatives of doing things? Like say you want to read a file and you might want to use a wider variety of options when opening the file handle or say you want to parse JSON and you'd like standard library to have more parsers available?
That's what you want for maintainability. I'm not interested in maintaining a codebase where every programmer has their own idea of how something should be done. Of course, you have code reviews for this kind of thing. Except when the code is already written. And when it is not, arguing over minor points is an unnecessary timesink.
It's just that if you want to use a few new language niceties like range-based for loops then you'll need to conform to that convention.
Edit: not really needed, just loaded the dropbox preview, it is still readable
Ah, the road not taken.
Then Py3 could be nicknamed "Vista," I suppose.
I think you think for is magical and could be modified like this, but really for just iterates over something. It's enumerate and items that are the magic. Enumerate zips a range onto a list, the ``index,element`` unpacks the zip. Items returns a list of (key,value) tuples and the key,value unpacks that.
You couldn't modify the iterators because it would affect EVERYTHING. sum([1,1,1]) would now be sum([(0,1),(1,1),(2,1)]) AHH! And ``key in dict`` wouldn't work any more, since the iterator would return key,values. EEK.
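For what it's worth, here is a small Python 3 sketch of where the "magic" actually sits (sample data is made up; note that in Python 3 items() returns a view rather than a list, and sorted() is used only to make the results order-independent):

```python
letters = ["a", "b", "c"]          # made-up sample data
ages = {"ann": 31, "bob": 27}

# enumerate() pairs each element with a running index, like zipping a range on.
pairs = list(enumerate(letters))
same_pairs = list(zip(range(len(letters)), letters))

# Plain iteration yields the elements themselves, which is what sum() relies on.
total = sum([1, 1, 1])

# Iterating a dict yields its keys; .items() yields (key, value) tuples.
keys = sorted(ages)
items = sorted(ages.items())

# Unpacking in the for statement is ordinary tuple unpacking, nothing more.
unpacked = [(index, element) for index, element in enumerate(letters)]
```

The for statement itself never changes; only the iterable handed to it does.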
Being simple vs explicit is a political debate. I prefer if Python has simpler magical syntax.
The "in" operator is actually a separate operator; it doesn't depend on the dict iterator implementation. So you could keep the semantics of the two separate.
However, if the dict iterator semantics were changed, it would make sense to change the semantics of "in" as well--and since it doesn't really make sense to change the semantics of "in" (if you're asking if something is "in" the dict, you mean the key), that would explain why the dict iterator semantics are the way they are: the dict is a container of keys, not key-value pairs.
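A quick way to see that the two are independent is a throwaway dict subclass (PairIterDict is a hypothetical name, not anything in the standard library) that changes iteration but leaves membership alone. A minimal Python 3 sketch:

```python
class PairIterDict(dict):
    """Hypothetical dict variant whose iteration yields (key, value) pairs."""

    def __iter__(self):
        # Changed iteration semantics: yield pairs instead of keys.
        return iter(self.items())

    # __contains__ is inherited from dict unchanged, so "in" still tests keys.


d = PairIterDict(a=1, b=2)

iterated = sorted(d)         # iteration now produces (key, value) pairs
has_key = "a" in d           # membership still looks at keys
has_pair = ("a", 1) in d     # a pair is not a key, so this is False
```

So the semantics can be kept separate; whether they should be is the real argument.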
You may always find this, but lots of people don't. I find myself doing lots of iteration that doesn't need the index.
> and even if I don't need it it doesn't hurt to have one either
Yes it does, because you're adding extra code to compute the index to every iteration, whether it's needed or not. It's not a good idea to encumber such a basic language construct with any extra baggage; that's why the extra baggage is in "enumerate", so you only get it if you actually need it.
`list(seq)` will return a list of elements. `list(dct)` will return a list of keys. A for-loop will always loop over the abstract "list representation" of the object (the list result itself is just an accumulation of values from a for-loop).
So your first suggestion would make no sense whatsoever. As for the second...one could maybe argue for that. Many people end up using `dct.keys()` anyway when they want the keys, and `.items()` is so common that it could maybe be made the default. Ruby actually does this by default. However, it would break a ton of current code where people expect to be looping over the keys.
Dictionaries also have the same ambiguity problem (tuples can be dictionary keys). The noise problem is a bit more justified, but the entire construct is less needed since `D[k]` is less clunky and error-prone than `L.index`.
It's definitely the case that python was shoehorned into some places where it probably wasn't the best fit, but at the time (1999-2001) it was really the high-level dynamic language that had a lot of mindshare. A lot (almost all?) of companies that tried to use Zope as their application server back then would probably be looking at Java as their deployment platform today.
Python sits in that nice "batteries included, easy to read, reasonably fast to write" space. I tend to write most of my scripts in Python, because a week later, I could never understand the perl code I wrote, but, for some reason, python code never had that problem for me.
I don't expect too many people are writing quick "one-off" scripts in Go (I'd be interested to be proven wrong though), so perl/python/awk/bash all still have a place in people's toolkit.
Python went everywhere Perl did and then expanded the map for "scripting languages". This didn't happen by accident: Python is, by design, very easy to comprehend and learn. The Python community is also one of the most newbie-friendly around, with mountains of freely available resources for beginners.
A programming language cannot be sustained by uber hackers, PLT nerds, and hipsters alone. You gotta make it reach the world (like JavaScript, Java) or it'll never be a "Tier 1" programming language. I have yet to see Go's developers or community put forth a strategy to make this happen -- which is completely understandable given how new the language is.
Imagine someone who was used to doing something like this:
import string
list=open('foo.txt').readlines()
list_strip=map(string.strip,list)
And then discovering this no longer works because the strip function has been removed from the string module (and moved to the string object). If all they wanted to do was a quick read of a file and then parse, they are not going to dive into Python 3 and figure out the new way to do it; they'll just stick with Python 2.
It does, on the other hand, break backwards compatibility. Which is why hardly anyone updated.
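For reference, the string.strip snippet above ports to Python 3 roughly like this. It's only a sketch: io.StringIO stands in for open('foo.txt') so the example is self-contained, and the file contents are made up.

```python
import io

# Stand-in for open('foo.txt'); the contents here are invented for the example.
fake_file = io.StringIO("  first line \n second line\t\n")

lines = fake_file.readlines()

# Python 2's map(string.strip, lines) becomes a method call on each str.
stripped = [line.strip() for line in lines]

# Closest literal translation: str.strip as the function, wrapped in list()
# because map() returns a lazy iterator in Python 3.
also_stripped = list(map(str.strip, lines))
```

Neither spelling is hard, but you do have to know to look for it.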
When Python 3 started out, it had almost no libraries. So most projects at the top of the software stack couldn't use Python 3 because all the libraries they needed only worked with Python 2.
Below the top, to this day any individual library that wants to switch to 3 basically has to maintain separate forks for 2 and 3, because a lot of downstream users still use 2 because not all libraries are 3-compatible yet.
I think the Python developers are crazy for not using the proven __future__ import mechanism to allow new features to be introduced gradually and have new code interoperating with old code.
Change is hard. People don't want to spend the money on upgrading. It's almost all downside with very little upside.
Don't ever upgrade a running system until you are forced to do so.
http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/
Yes.
> Or is the Python community just very reluctant to jump on new things?
In my experience no. Lots of features have been added over the years, and they seem to be adopted quickly. v3 is a really big change though.
I'm kinda impressed though. I don't think I have a program that will view pptx files, so I was happy to be able to read it online.
For sequences that aren't strings, yes. :-) Strings are a sort of hybrid between "atomic" values and sequences, so their semantics can be different.
In fact it should never happen, that's what the `iter` and `next` builtins are for. The only use case for calling __next__ by hand is iffy as hell, it's overloading it while inheriting from an iterator.
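In other words, something like the following Python 3 sketch (the sample list is made up), where the dunder methods are only ever reached through the builtins:

```python
numbers = [10, 20, 30]   # made-up sample data

it = iter(numbers)       # calls numbers.__iter__() under the hood
first = next(it)         # calls it.__next__() under the hood
second = next(it)
third = next(it)

# next() also accepts a default, which avoids catching StopIteration by hand.
exhausted = next(it, None)
```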
RPython is a restricted subset of python indeed, but its purpose is to be a toolkit for implementing virtual machines. It is not and does not aim to be a general-purpose programming environment.
PyPy is a JITed Python runtime implemented in RPython.
The relation between PyPy and RPython is more or less the relation between CPython and C.
x = int(input(">>> "))
a = x / 2
append_int_to_magical_db(a)
If the division does a "natural" thing, you suddenly have a float "polluting" your integer algorithm, but it's _not consistent_. If the user enters "4", you get an int back. If they enter 5, you get a float.
No, it always returns a float in Python 3. 4/2 gives 2.0.
It's a breaking change, yes, but in general I think it's a good one.
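A short sketch of the Python 3 behaviour being described, for anyone who wants to check it in a REPL (the variable names are just for illustration):

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
even_case = 4 / 2            # a float even though the result is whole
odd_case = 5 / 2             # a float, no truncation
floored = 5 // 2             # stays an int
floored_float = 3.9 // 1.2   # floor division also works on floats

result_types = (type(even_case), type(odd_case), type(floored), type(floored_float))
```

So the type of the result depends on the operator, never on the particular values entered.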
When I decided to use pelican for a non-English blog, I thought it would be a piece of cake; just changing the theme and plugging in a calendar converter and I would be done with it. In reality, I had to fork pelican and the calendar library (which was not well-maintained) and bang my head against the wall for three days to make them work together, all because of the whole string/unicode separation and the fact that things work automagically as long as you're just using ASCII.
I like the explicit separation that Racket has between "here is a buffer of binary data" and "here is a sequence of Unicode characters," and (looking on the outside without working with it), I'm glad that Python 3 began to adopt some of that.
In Python 2, it's really easy to write code that confuses bytes and characters, which introduces bugs and crashes when non-ASCII characters show up.
In Python 3, they made it easier to work with Unicode, because it's the default for everything, and much harder to confuse bytes and characters, because of that separation between the data types.
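A minimal Python 3 sketch of that separation (the sample word is arbitrary, written with an escape so the source stays ASCII):

```python
text = "caf\u00e9"               # str: a sequence of Unicode characters
data = text.encode("utf-8")      # bytes: the encoded binary form

char_count = len(text)           # counts characters
byte_count = len(data)           # counts bytes; the accented letter takes two in UTF-8

round_trip = data.decode("utf-8")

# Mixing the two types fails loudly instead of guessing an encoding.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

In Python 2 the equivalent concatenation would have attempted an implicit ASCII decode, which is where the "works until a non-ASCII character shows up" bugs came from.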
There's also a lot of reuse and expansion of existing code bases, which would involve a lot of work to migrate to 3.x. On top of moving to 3.x, you also have the task of making sure there are no hidden bugs that may alter your results in ways you may not notice. A lot of scientific code has been repeatedly vetted to make sure that there's no bias or glitches that may skew your results.
Hell, I know astronomers who are still using Fortran code that was written in the 80's. It still works (though now it requires a long build process, as it is no longer compatible with the latest Fortran compilers), so no reason to try to rewrite it just because the language is dated.
They were some of the earliest ported widely-used packages. Numpy was ported in 2010, scipy very early in 2011 (except for weave iirc)
[0] http://sourceforge.net/projects/numpy/files//NumPy/1.5.0/NOT... https://pypi.python.org/pypi/numpy/1.5.0
[1] http://docs.scipy.org/doc/scipy-0.9.0/reference/release.0.9....
What are the use cases where you frequently need the index? I write and read a lot of python and I almost never see it. Maybe if you're working in a problem domain where it's a common issue you could create some abstraction to better handle it for you.
It doesn't go against the ethos of Python. There is a Zen of Python; I don't know of an Ethos of Python. http://www.python.org/dev/peps/pep-0020/
Beautiful is better than ugly. Writing .items() or surrounding enumerate() does not make your code look prettier, does it?
Explicit is better than implicit. In fact, my proposal is more explicit than the implicit behavior of for key in my_dict or for item in my_list. I think returning both key and value is more explicit than, say, remembering that looping over a dict yields the key, not the value, or relying on remembering enumerate or items.
It is simpler to write such code, and if you need just one of the returned values, you can use just that one, since they are named.
You want a use case. I will give you one: find the location of a specific item in the list. And if you use django, you know your template engine can read your pseudo-Python template code. And you often need that index.
One could just as easily say that your proposal is less explicit, because a sequence is a sequence of items, not (index, item) tuples, yet you're making the "for" iteration yield tuples.
Similarly, a dict is a container of keys, not (key, value) pairs. The reason is that otherwise you would be unable to check to see if a key was in the dict unless you knew the value that went with it: but in that case why would you need the dict? (Technically, you could still iterate over the entire dict looking for your key, but that's extremely slow; the whole point of having a dict is to be able to do fast lookups of keys in order to retrieve their values.)
> Writing .items() or surrounding enumerate() does not make your code look prettier, does it?
Neither does having to extract just one item from a tuple when I don't need the index. There's no way around the fact that one of the types of iteration (either just items, or index, item tuples) is going to have to be spelled with something extra. So just saying "I shouldn't have to type something extra" isn't a sufficient argument. You need to justify why your preferred type of iteration should be the one with the shorter spelling: and since your preferred type of iteration has extra baggage attached to it, it seems perfectly legitimate to me that it should have the longer spelling, not the shorter one.
> find the location of a specific item in the list
That's what the "index" method is for.
>>> b=['one','two','three','four','two']
>>> b.count("two")
2
>>> b.index("two")
1
>>> b.index("two",2)
4
If the best use case you can come up with is the one you've given then we've got real issues. If my team were ever iterating over a list to find the index of an item I would be very upset. That is categorically not the right way to do it.
Regarding the ethos - zen, ethos, call it what you will. Explicit is better. Maybe English isn't your first language but your idea is less explicit than the way it works at the moment.
Consider the following:
menu = [("Apples", 5),
        ("Cream Pie", 2),
        ("Tea and scones", 3)]
for food, price in menu:
    print "To buy %s, please pay %d dollars" % (food,price)
Right now, it works unambiguously -- just the way you'd expect. The above prints:
To buy Apples, please pay 5 dollars.
To buy Cream Pie, please pay 2 dollars.
To buy Tea and scones, please pay 3 dollars.
What if we implemented your rule? Would the interpreter print the above, or would it say this?
To buy 0, please pay ("Apples", 5) dollars.
To buy 1, please pay ("Cream Pie", 2) dollars.
To buy 2, please pay ("Tea and scones", 3) dollars.
What if you wanted to print the first one? If your syntax were implemented, the programmer would have to write something awful like
for index, (food, price) in menu:
or even
for food, price in destructuring_without_index(menu):
which puts us in full circle again!
The reason why most of us don't like your idea is that it introduces ambiguity and doesn't even remove the trade-off. No matter how you implement it, there's going to be a trade-off.
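For comparison, this is roughly how the menu example looks with today's rules, written as a Python 3 sketch (the "numbered" format string is invented for the illustration). The index only appears when enumerate() is asked for, and the nested pattern keeps the inner unpacking unambiguous:

```python
menu = [("Apples", 5), ("Cream Pie", 2), ("Tea and scones", 3)]

# Plain iteration: each element is a (food, price) tuple, unpacked directly.
lines = ["To buy %s, please pay %d dollars" % (food, price)
         for food, price in menu]

# When the index is wanted, enumerate() adds it explicitly.
numbered = ["%d: %s costs %d" % (index, food, price)
            for index, (food, price) in enumerate(menu)]
```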
The zen of python, by Tim Peters:
Explicit is better than implicit.
Special cases aren't special enough to break the rules.
In the face of ambiguity, refuse the temptation to guess.
In retrospect, I suppose the .index() method would be fine, since Lists can't hold two of the same object, and Dicts return the key by default.
Yes, they can. Try it! The index method has extra arguments to let you specify a range of indexes in the list to search, so you can find multiple indexes that point to the same object (by picking a range that excludes indexes you've previously found).
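As a sketch of that technique, using the start argument of list.index() to skip past earlier hits (all_indexes is a hypothetical helper written for this example, not a builtin):

```python
def all_indexes(items, value):
    """Hypothetical helper: every position of value, via list.index(value, start)."""
    found = []
    start = 0
    while True:
        try:
            pos = items.index(value, start)
        except ValueError:
            # index() raises ValueError once there are no more matches.
            return found
        found.append(pos)
        start = pos + 1  # resume the search just past the last hit


b = ['one', 'two', 'three', 'four', 'two']
positions = all_indexes(b, "two")
missing = all_indexes(b, "five")
```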
> Dicts return the key by default
Dicts don't have an index method; dict keys are not ordered.