Why Is the Migration to Python 3 Taking So Long? (stackoverflow.blog)
How can we expect Python 3 to become the default if Python 2 still asserts such dominance?
I've been meaning to dig into Maya, Houdini, Nuke's Python 3 transition plans. I know Houdini will offer a Python 3 option with Houdini 18 (shipping in the next month or so).
I don't think the reason was because of downstream users. Python 3 was an inevitable change. Previously, they swapped out PyQt for PySide which wasn't a forced change, but required everyone to update their Python scripts.
So much effort wasted doing this in a large codebase. And what do you get for it? It’s just not worth it. Nobody actually needs Python 3, it was foisted on them by the developers. What everyone really wanted was Python 2.8.
I think many people underestimate the challenge that the 2 to 3 migration presents for large enterprises. The core issue is that even though the migration for any given module is normally really easy, the total effort required to migrate is still essentially O(n) in module count/file count, because even with current tooling you still need to have an engineer look at every module to do the change safely. Even if it only takes ~5 minutes per module to make the changes and validate that it works correctly, this becomes a giant undertaking when you have tens of thousands of files to migrate.
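To put rough numbers on the O(n) point above, here is a minimal back-of-the-envelope sketch. Only the ~5 minutes per module comes from the comment; the file count of 20,000 and the 8-hour engineering day are assumptions picked for illustration.

```python
# Hypothetical inputs: only minutes_per_file is taken from the comment above.
files = 20_000               # assumed stand-in for "tens of thousands of files"
minutes_per_file = 5         # "~5 minutes per module"
hours_per_engineer_day = 8   # assumed

total_hours = files * minutes_per_file / 60
engineer_days = total_hours / hours_per_engineer_day

print(f"{total_hours:.0f} hours, roughly {engineer_days:.0f} engineer-days")
```

Even under these optimistic assumptions the sweep costs most of an engineer-year, before counting re-sweeps of modules that change during the migration.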
The fact that it takes a long time also creates other problems. Your business isn't going to hit "pause" on other development, so there will be changes constantly introduced into modules you've already "swept". It's going to be hard to make sure 100% of your engineers and code reviewers are knowledgeable about the specific requirements to make sure the code works in both 2 and 3, so you would really like some automated safeguards to make sure they don't introduce anything that won't work in 3. Pylint helps with this, but won't catch everything. Unit tests are obviously essential, but:
1. Even a well-tested project won't have tests that cover 100% of code paths and behavior.
2. You're stuck running the tests on both python2 and python3 for the duration of the migration, which doubles the resource (compute, memory, etc.) cost of your Python CI and regression testing infrastructure.
Most big companies have passionate Python advocates who really want to be on Python 3, but the scale of the problem and the lack of tooling to tackle it with a sub-O(n) amount of effort make the overall project risky and expensive for the business.
The problem we have where I work is some very clever 2.7 code that isn't easy to redo in Python 3. For any new project I do, I use Python 3.
The unicode switch is a nightmare in terms of having to go through and double/triple check everything and still get it wrong half the time. Particularly when it comes to moving data over the network.
The big selling point for Python3 finally came with the built-in async support, but we've been using Twisted for a decade, which works nearly identically, so even that wasn't a huge draw for us.
Further, many of our dependencies were python2-only up until the last year or two.
Really the only reason we're going through the effort right now is that Python2 is rapidly approaching End of life.
This doesn't require parallel testing. These all improve the quality of 2.x code even if you never make the leap to 3.x.
Once this is done you can use 2to3 to mechanically fix the remaining differences. Anything else that remains broken can be special-cased in the 2.7 code until 2to3 works without intervention.
That's why six and manual changes are always needed...
It's a great comment otherwise.
No, it could not. Python itself is a C executable, which makes the distinction moot.
For some reason, a lot of people seem to be laboring under the impression that Python 2 code is just going to stop working in 2020. The only thing stopping is the Python core team's bug-fix releases. Python 2 itself will continue to exist. Existing installations will keep working. Linux distributions _can_ choose to keep Python 2 in their repositories and maintain it separately going forward, although they are not likely to. Ubuntu, Red Hat, and other OS providers all have operating systems which include Python 2 that they are contractually obligated to support and patch for years in the future. And of course, the source code for Python 2 will never just up and disappear within our lifetimes unless human civilization does as well.
As for businesses, if your application is mission-critical and you want to keep it going, then you get to decide whether to invest in keeping your application current with the state of the art, or invest in keeping the application's environment static. This means having a reliable source of the required hardware, archived copies of the OS, all dependencies and libraries, and the application itself. And presumably you still need someone knowledgeable enough to fix bugs in the stack from time to time.
EDIT: Personally, while I find older Windows interfaces ugly, for example, they were very consistent and functional. In modern designs I sometimes can hardly find what is clickable/actionable. The interface isn't working for me; it's the other way around.
No, users most definitely do not care about Material Design. They only care about being able to quickly do the task the app or web site claims to allow them to do.
That's kind of a low bar. An IBM PC running MS-DOS 3.3 works as advertised but I wouldn't want to use one today. Except for the keyboard.
This suggestion was adopted by nearly everyone, including Arch Linux. If you're writing a script that calls Python, you probably still want python3 as your command for compatibility across all current distributions.
In other words, the intention of the PEP was not to ingrain Python 2 as the "default" Python, it was to get people to stop making assumptions about what Python version they would get and use the python2 and python3 commands instead. I do agree that it has had that effect.
Python 2 was happy to (try to) let you call text methods on a JPEG. Python 3 draws an appropriate distinction between the two.
1. Unicode support was actually an anti-feature for most existing code. If you're writing a simple script you prefer 'garbage-in, garbage-out' unicode rather than scattering casts everywhere to watch it randomly explode when an invalid byte sneaks in. If you did have a big user-facing application that cared about unicode, then the conversion was incredibly painful for you because you were a real user of the old style.
2. Minor nice-to-haves like print-function, float division, and lazy ranges just hide landmines in the conversion while providing minimal benefit.
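For readers who haven't hit these landmines, a small sketch of the Python 3 semantics for division and ranges mentioned above. The variable names are made up for illustration; the comments about Python 2 describe its behavior for plain int operands.

```python
# Python 3 semantics for two of the changes named above.
true_div = 7 / 2         # 3.5 in Python 3; Python 2 returned 3 for two ints
floor_div = 7 // 2       # 3 in both versions
lazy = range(10**12)     # lazy sequence in Python 3; Python 2 would try to build a full list
length = len(lazy)       # no elements are materialized to compute this

print(true_div, floor_div, length)
```

On Python 2.7, `from __future__ import division, print_function` opts a module into the first change and the print function, which is one way to defuse these before switching interpreters.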
In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.
Actually that's the behavior of python 2, it works fine, until you send invalid characters then it blows up.
In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.
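A minimal sketch of what "always blows up" means in Python 3. The sample string is arbitrary.

```python
text = "caf\u00e9"
data = text.encode("utf-8")    # bytes: b'caf\xc3\xa9'

try:
    text + data                # mixing str and bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False           # Python 3 refuses regardless of the content

roundtrip = data.decode("utf-8")   # the explicit conversion works
print(mixed_ok, roundtrip == text)
```

The failure does not depend on whether the data happens to be ASCII, which is the difference from Python 2's implicit conversion.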
> In the latest py3 versions we've finally gotten some sugar to tempt people over: asyncio, f-strings, dataclasses, and type annotations. Still not exactly compelling, but at least something to encourage the average Joe to put in all the effort.
That's because until 2015 all python 2.7 features were from python 3. Python 2.7 was basically python 3 without the incompatible changes. After they stopped backporting features in 2015, python 3 suddenly started looking more attractive.
> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.
Sometimes you don't care about weird characters being printed as weird things. In python 2 it works fine: you receive garbage, you pass garbage. In python 3 it shuts down your application with a backtrace.
Dealing with this was one of my first Python experiences and it was very frustrating, because I realized that simply using #!/usr/bin/python2 would solve my problem but people wanted python3 just because it was fancier. So we played a lot of whack-a-mole to make it not explode regardless of the input. And the documentation was particularly horrible regarding that, not even the experienced pythoners knew how to deal with it properly.
This is definitely the case. I've been wrestling with bytes and strings all the time during the port of a Django application to Python 3 for a customer. I can see myself encoding and decoding response bodies and JSON for the time being. For reasons I didn't investigate I don't have to do that with projects in Ruby and Elixir. It seems everything is a string there and yet they work.
Not that I've seen.
Example of where Python 3 has rained shit on my parade: I wrote a program that backs up files for Linux. It works fine in python 2, but in python 3 you rapidly learn you must treat filenames as bytes otherwise your backup program blows up on valid Linux filenames. It's not just decoding errors, it's worse. Because Unicode doesn't have a unique encoding for each string, so the round trip (binary -> string -> binary) is not guaranteed to get you the same binary. If you make the mistake of using that route (which Python3 does by default) then one day Python3 will tell you can't open a file you os.listdir() microseconds ago and can clearly see is still there.
Later, you get some sort of error when handling one of those filenames, so you sys.stderr.write('%s: this file has an error' % (filename,)). That worked in python2 just fine, but in python3 generates crappy looking error messages even for good filenames. You can't try to decode the filename to a string because it might generate a decoding error. This works: sys.stderr.buffer.write(b'%b: this file has an error' % (filename,)), but then you find you've inserted other strings into error messages and soon the only "sane" thing to do is to convert every string in your program to bytes. There are other solutions like sys.stderr.write('%s: this file has an error' % (filename.decode(errors='ignore'),)), but they corrupt the filename the user sees, are verbose, and worst of all, if you forget one it isn't caught by unit tests but will still cause your program to blow up in rare instances.
I realise that for people who live in a land of clearly delineated text and binary, such as the django user posting here, these issues never arise and the clear delineation between text and bytes is a bonus. But people who use python2 as a better bash scripting language than bash don't live in that world. For them python2 was a better scripting language than bash, but it is being deprecated in favour of python3, which is actually more fragile than bash for their use case. (That's a pretty impressive "accomplishment".) Perhaps they will go back to Perl or something, because as it stands Python3 isn't a good replacement.
Not always. As far as I can tell writing garbage bytes to various APIs works fine unless they explicitly try to handle encoding issues. First time I noticed encoding issues in my code was when writing an xml structure failed on windows, all because of an umlaut in an error message I couldn't care less about. The solution was to simply kill any non ascii character in the string, not a nice or clean solution but the issue wasn't worth more effort.
> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.
That is nice if your job involves dealing with unicode issues. My job doesn't, any time I have to deal with it despite that is time wasted.
We're talking about simple scripts, the solution is to not send in invalid characters.
Personally, asyncio and type annotations are a big turnoff. I know this is a bit contrarian, but I've always favored the greenlet/gevent approach to doing cooperative multi-tasking. Asyncio (née twisted) had a large number of detractors, but now that the red/blue approach has been blessed, it seems like many are just swallowing their bile and using it.
Type annotations really chafe because they seem so unpythonic. I like using python for its dynamicity, and for the clean, simple code. Type annotations feel like an alien invader, and make code much more tedious to try and read. If I want static typing, I'll use a statically typed language.
No one wants to spend energy re-programming to stay in place.
Especially APIs.
- run 2to3
- spend 2h max fixing any failing tests
- cook off any remaining issues in a few days of beta testing like you'd do for any new release
Now no doubt Python 2.7 is an excellent and solid release and will remain so for as long as anyone keeps the bitrot in check, but to keep using it because porting is 'hard' is patent bs.
https://www.mercurial-scm.org/repo/hg/log?rev=py3&revcount=2...
They've been porting hg into Python 3 for the last 10 years and are only now nearing completion.
I've written a bit more about this in Lobsters:
https://lobste.rs/s/3vkmm8/why_i_can_t_remove_python_2_from_...
The only real killer feature of Python3 is the async programming model. Unfortunately, the standard library version is numbingly complex. (Curio is far easier to follow, but doesn't appear to have a future.)
On the down side, switching to Unicode strings is a major hurdle. It mostly "just works", but when it doesn't, it can be difficult to see what's going on. Probably most programmers don't really understand all of the ins and outs. And on top of that, you get weird bugs like this one, which apparently is simply never going to be fixed.
Migrating from Python 2 to Python 3 is way worse than that -- code changes are required, and because Python is a dynamic language you may not notice bugs until you actually run the code (or even worse, until after you release it to production and some code branch that is rarely invoked somehow gets called...). In other words, the tooling and the type system are not confidence-inspiring and it's really hard to verify that you migrated without breaking stuff.
Early Python 3 was hell for conversion. The syntax was changed for no good reason. u'word' became illegal. (That later went back in.) The "2 to 3 converter" was a joke. I didn't have the "print statement problem" because my code called a logging function for all debug output.
Many of the P3 libraries didn't work. (The all-Python MySQL connector failed the first time I tried to do a bulk load bigger than a megabyte, indicating that nobody was using it.) It took years before the libraries were cleaned up.
Python 3 got some really weird features, such as type declarations that don't do anything. I can see having type declarations, especially for parameters, but they need to be used both for checking and optimization. CPython boxes everything, which is terrible for numerics and is why most serious math has to be done in C libraries. My comment on that was "Stop him before he kills again."
Case in point, I worked in a project using Ruby. When we migrated from Ruby 2.4.0 to 2.4.6 (yeah, a minor upgrade), it broke spectacularly. Trying multiple Ruby versions, the change was actually introduced in Ruby 2.4.1. After some investigation, a change in Net::HTTP library from stdlib had a change that broke a dependency from a dependency. The fix was just a line of code (we just need to change the adapter used for HTTP communication), however it was two days of work for a minor upgrade.
My current job tried to migrate from Java 8 to Java 11. It also broke multiple services. This one is still in progress, months later.
Python 2 to Python 3 is bigger than both of those version changes (however it is equivalent to Ruby 1.8 to 1.9 changes), so yeah, it does take more time. And like some projects that are forever running Ruby 1.8 or Java 8 (or even worse, Java 6), we will have projects forever running Python 2 too.
According to my highly unscientific survey of the packages in Gentoo's package repo, there are roughly:
- 2500 packages that work with Python 2 or 3
- 1350 packages that work with Python 2 only
- 350 that work with Python 3 only
My methodology:
- 3122 Python 3 only
- 88 Dual support
- 8 Py2 leaf (standalone packages; may be dropped)
- 77 Not ported (will be dropped unless ported)
- 100 Blocked (require 1 or more "not ported" packages)
- 18 Legacy (will be dropped)
note that py3only/dualsupport only reflects how it is packaged in fedora, not what upstream provides.
For the same reason why migration to IPv6 is taking so long.
Both technologies don't solve immediate problems end users are facing. Instead they solve 'nice to fix' problems that few people care about.
I work in an industry where there is basically one 800lb gorilla of a vendor. They update rarely, because their product is a mission-critical, life-or-death sort of thing. Their current product is heavily, heavily integrated with x.y.z version of software from a different vendor in a different segment, but also weighing in at 800lb. Yes, they specify x.y.z, not just x or even x.y. That software comes bundled with a Python 2.7.5 distribution.
Imagine my woes trying to get pip running, which unhelpfully suggests I upgrade Python. Cannot seem to find any other path to even get pip going because of what I call the "lol just upgrade n00b" factor. Perhaps that information once existed but I cannot find it.
So, I am stuck on this version because of some pretty tight integration, at a couple of removes. I think the vendor-linkage can cause some "drag" that folks who work in a greenfield environment might not be thinking about. It can be unfortunate but there it is.
If it can help you. The trick I use is to install a normal python 2.7 interpreter with pip. Then you can use it to install software to any directory, including the one from the other application. There are flags to specify what to install, from where to where, internet or not, something like
pip install packagename --target=/to/app/lib
Considering all the stuff that is written in Py2 I really don't see it being out and out abandoned. That wouldn't really make any sense. With computer languages stuff never goes away.
$ python3
Python 3.7.4 (default, Sep 7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>
[2]+ Stopped python3
$ lsof -c Python | sed -n "/fortran/s/$USER/<redacted>/gp"
Python 35190 <redacted> txt REG 1,4 1550456 12887664541 /Users/<redacted>/Library/Python/3.7/lib/python/site-packages/numpy/.dylibs/libgfortran.3.dylib
It was good in its time, and great things were done in it that are still around... but let's move onto F90 already.
I say this in part because comedy, but also because it was anticipated to be a long project. It was originally called "Python 3000".
And you can't do it gradually, so it's all-or-nothing. (yes, "six" exists, but you still execute one way or another)
And you'll have to change the versions of all your libraries, which is not usually a smooth experience in the Python ecosystem. (this is another place where it's "all or nothing", since six can't help you if your dependencies don't all use it + use it correctly)
---
It's a huge risk with huge cost for already-working, running code. For new stuff, sure, write it in 3, but 2.7 works fine and has the added benefit of being very well understood by this point.
I still haven't forgiven them for killing the print statement, which could have peacefully coexisted with a print() function.
The migration is financially negative in the short term, and very clearly so. It might be financially positive over the long term (due to easier maintenance and higher performance), but that is definitely maybe. Especially for an app that is otherwise very stable.
If you have a hole it's hard to dig yourself out of it. This is why I prefer modular apps instead of monolithic codebases. You can upgrade piece by piece. Otherwise it's all or nothing and dangerous
I think it is true (as of pretty recently) that Red Hat is the only company employing a Python core dev to work on Python core dev stuff full time (see https://discuss.python.org/t/official-list-of-core-developer...). But the core dev team is focused on Python 3, so that isn't a sign of Red Hat's Python 2 commitment either.
That means they'll patch Python 2 should vulnerabilities be found on their OS.
https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Versi...
RHEL 6/7 and Centos 6/7 will support Python 2 until at least mid-2024.
However, barring speed improvements, there isn't much to offer, apart from unicode, f strings and annotations.
If python 3 had proper multithreading, that might have been worth breaking backwards compatibility for.
I have a lot of Python 2.7 code that I wrote years ago which has been running smoothly and my team is generally going to rewrite rather than "convert" because I really don't trust conversions. I'd rather see all bugs upfront rather than hidden in the fog.
A lot of my code is performance critical, and, for example, I'm still salty about dictionary operations taking O(log(n)). But the proliferation of active minor versions makes it very difficult to write portable, performant code.
It's become a sticky wicket. I want to migrate to Python 3 (and, by and large, I have in most of my projects). But what version do I target? Will my dependencies make the same choice? Or does "migration" turn into a sisyphean task? It's becoming burdensome enough that I'm contemplating abandoning the language for something more stable.
Current version is 3.7. If you expect your migration work to take a year, you should consider going for 3.7 and above only, because the previous minor versions will be dropped by the time you're done.
And fwiw "3.7 is the current version" doesn't help my users.
Migration in interpreted languages that implement major breaking changes is really tedious.
That’s the reason I am so upset with today’s JavaScript ecosystem - things move so fast that good technology is being deprecated and changed constantly which breaks all kinds of things in other places.
The model is similar to Golang in many ways, e.g. communication using channels [2] and cancellation [3] reminiscent of context.WithTimeout, except that in Golang you need to reify the context passing.
The author has written some insightful commentary on designing async runtimes [4] and is actively developing the library, so I'm optimistic about its future. There were plans to use it for requests v3 until the fundraiser fiasco [5].
[0] https://github.com/python-trio/trio
[1] https://vorpus.org/blog/announcing-trio/
[2] https://trio.readthedocs.io/en/stable/reference-core.html#us...
[3] https://trio.readthedocs.io/en/latest/reference-core.html#ca...
[4] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
[5] https://vorpus.org/blog/why-im-not-collaborating-with-kennet...
The link to support requests (which is a great piece of software) is here:
https://cash.app/$KennethReitz
Note: This is NOT a charitable donation, it is a gift to an individual. These are not tax deductible under US law.
Njs has a long attacking blog post saying this needs to go through PSF (huh?) and that they should be getting most of this money not the person the funds were directed towards (it's not clear how much they've actually contributed to requests over time). This supposedly also may trigger folks who have also suffered from "gaslighting".
Supporting the developer of a piece of software does not, as far as I know, require that they sign up to handle it on a charitable basis. A big to-do is made about the "large" amount raised. The amount is 33K. To be frank, this is almost zero in tech land, at least in the bay area, and requests is a very highly used project. I was literally expecting something like 300K or even $1M. Silly kickstarter projects raise far more and deliver nothing. Requests has already delivered a lot of utility.
Just a bit of perspective from someone who wasn't familiar with this "fiasco".
Dropbox invested three years of work, actually hired Python's creator, and are still not done. What are they getting out of it that they wouldn't have gotten if Python2 simply had been maintained?
Who wants to break old SQL? Nobody.
Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely. If someone other than the core Python team wants to step up and maintain Python 2, they are free to do so; it's open source. But failing that, expecting the Python team to support the older, less functional version of the code indefinitely is unrealistic. Corporate-owned languages have even shorter lifecycles for exactly this reason.
I understand that this is one of the major features, but I personally never saw the appeal, given that gevent exists and in my experience works well most of the time. It also allows me to multiplex IO operations and doesn't rely on new syntax. I'm probably missing something?
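For comparison, this is roughly what multiplexing two waits looks like with the new syntax. It is a minimal sketch: `fetch` is a hypothetical stand-in for a network call, and `asyncio.sleep` simulates the I/O wait.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network request.
    await asyncio.sleep(delay)
    return name


async def main() -> list:
    # Both coroutines wait concurrently; gather returns results in argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())   # asyncio.run requires Python 3.7+
print(results)
```

The practical difference from gevent is that every suspension point is marked with `await`, at the cost of the function "coloring" other commenters here object to.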
- mandatory keyword arguments
- multi-dict splatting
- nicer yield semantics for generators
- Fixing system-specific encoding ambiguities
- dataclasses
- inline type annotations
- better metaclass support
- more introspection tooling
- pathlib (for nicer path handling)
- mocking pulled into the standard library in a cleaner way
- stable ABIs for extensions
- secrets handling
- ellipsis instead of pass (yeah who cares but I care)
- lots of standard lib API cleanup
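A few of the items above in one place, as a minimal sketch. All names here (`Job`, `connect`, the config keys, the path) are invented for illustration; it assumes Python 3.7+ for dataclasses.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Job:                          # dataclass with inline type annotations
    name: str
    retries: int = 3


def connect(host, *, timeout):      # timeout is a mandatory keyword argument
    return f"{host} (timeout={timeout})"


defaults = {"retries": 3, "verbose": False}
overrides = {"retries": 5}
merged = {**defaults, **overrides}  # multi-dict splatting; later dicts win

config_path = Path("etc") / "app" / "settings.toml"   # pathlib joins with /


def not_implemented_yet(): ...      # ellipsis instead of pass
```

Calling `connect("db", 5)` raises TypeError because `timeout` cannot be passed positionally, which is the kind of guard rail that helps library authors.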
All of this is very helpful for making clean applications. But I would say it's _very_ helpful for making good libraries as well. This stuff is about having a strong language foundation to avoid plain weirdness like the click issue.
Obviously it doesn't kill all of them, but there used to be even more of that kind of thing all the time. Library issues would basically get exported to its users, all basically due to language problems.
pd.read_excel(filepath) will read an entire dataset even if it contains unicode characters.
pd.ExcelFile() silently drops(!!) unicode rows. The resulting object will simply skip unicode-containing rows (in ANY column) without even a warning.
For example, if you had an excel file:
word
---
"hello"
"hello"
你早
你早
"hello"
then pd.read_excel() would give you a dataframe with 5 rows. ExcelFile() on the other hand would return (silently!) a dataframe with only the first two and the last row.
Maybe this is a pandas issue, not a python issue, but it was really horrendous to debug for such a long time only to realize this was the issue.
I understand why it's the way it is, but when it comes to the typical unixy things I need to do, shuffling files around, tar'ing stuff, etc, it definitely trips me up more than I'd wish.
At a certain point this sort of compatibility/forward motion of a codebase through big language revisions is something that has to be designed as part of the language in either being able to break it down into small enough chunks to chew through in pieces (updating a submodule with the updated language without affecting anything else), completely transparent to the code being run through it (this happens for compilers for C for different standards), or to have a version to version automated rewriting mechanism that is so reliable the outcome of the automated tool is not in question (tools like Go's gofmt). Python in my opinion only has partial solutions to all of those answers so it turns into a lot of hand work.
So while there are other languages that may do other things better, there is still a class of programs that are very effective to write in Python, and that's plenty enough reason to keep it around. Do not forget that Python 2 was released in 2000 and Python 3 was released almost a decade later. The general time scale makes worrying about the next release a non-issue for many people, but the people who do worry start considering other languages because that's important to them.
Besides Java and Python already discussed, another big mess of a transition was from Qt 4 to Qt 5, where all the strings became unicode.
It did, but in a way that chainsaws support sculpting just fine. Technically possible. Very advanced people will know how to handle it. Everybody else is just going to injure themselves randomly.
Most people writing py2 got the text/binary processing working by accident. Things appear to work until you throw actual Unicode into parameters and then nobody knows what happens. There's a number of "what does this decoding exception mean" questions on stackoverflow every day. They're often actual bugs people could ignore before. Now they're told immediately and I believe that's better.
It didn't help that Py2's IDLE had (has? I don't recall that they actually resolved this; they simply closed the issue) a major bug [1] that even if you explicitly use u-literals (a = u'日本語'), it will still be encoded in your locale (shift_JIS [2] in the Japanese case), instead of unicode/utf-8. You can imagine how confused people would get when they were testing unicode support of py2 in IDLE and saw this.
Or so you wish, it's not necessarily true though. It's just as likely to pass through gibberish without blowing up.
I have a tiny relay service written in Django that lets me pass messages between my phone and home computer, and I recently upgraded both the python and Django versions. The service is only two views of about 3 lines each - and a unicode conversion bug crept in such that it stored "b'text'" in the database instead of "text". No warnings, no errors.
Some are quite good and finely detailed in my opinion. It's really nothing like what you'd expect after hearing chainsaw. There actually are small chainsaws, maybe even one-handed ones, to do that.
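A plausible mechanism for that kind of bug, sketched with a made-up payload: calling str() on a bytes object in Python 3 does not decode it, it returns the repr, and it does so without raising.

```python
payload = b"text"                  # hypothetical request body

wrong = str(payload)               # gives the repr as text, no exception raised
right = payload.decode("utf-8")    # explicit decode gives the intended string

print(repr(wrong), repr(right))
```

Running the interpreter with the `-b` flag turns this silent conversion into a BytesWarning (`-bb` makes it an error), which can help flush these out in tests.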
I wouldn't even be surprised if they provide support contracts specifically for Python 2 on their newer OSes. ActiveState announced that they'll continue providing paid support for Python 2 on all platforms, so some people at least feel like there's a business model there...
https://www.activestate.com/company/press/press-releases/act...
The only reasonable scenario I can think of is when you are porting python 2 code to python 3 and play with .decode() and .encode().
This! IMO Python 2 has better usability for prototyping and thinking and doing things on the fly. Python 3 also often seems to have deprecated the functions I want to use in favor of those that are more cumbersome and take more keystrokes. More explicit sure, but less fluid.
When python was created the Unicode didn't even exist.
Anyway in python 3, many os functions accept both strings and bytes, and might behave differently depending on which they get. For example os.walk, if you pass the path as a byte string, will output paths as bytes.
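A minimal sketch of both behaviors, using os.listdir rather than os.walk for brevity. The filename bytes are an arbitrary example of a name that is not valid UTF-8. The explicit "surrogateescape" error handler is used here for determinism; as far as I know it is also what os.fsdecode/os.fsencode use on POSIX systems.

```python
import os
import tempfile

raw_name = b"caf\xe9.txt"   # Latin-1 bytes, not valid UTF-8

# surrogateescape carries undecodable bytes through str and back unchanged.
as_text = raw_name.decode("utf-8", "surrogateescape")
back = as_text.encode("utf-8", "surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w"):
        pass
    str_names = os.listdir(tmp)                  # str path in, str names out
    bytes_names = os.listdir(os.fsencode(tmp))   # bytes path in, bytes names out

print(back == raw_name, str_names, bytes_names)
```

So a backup tool can either stay in bytes end to end, or use str with surrogateescape and get the original bytes back when it needs to open the file.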
Then you wouldn't have much to port.
You run your python 2 code on python 3 and it fails. Most people at that point will place encode() or decode() at the spot where the failure is, when the correct fix would be to place encode/decode at the I/O boundary (writing to files (and in python 3 even that is not needed if you open files in text mode), network etc).
Ironically a python 2 code that doesn't use unicode is easier to port.
When you program in python 3 from the start it's very rare to need encode/decode strings. You only do that if you are working on I/O level.
> And the documentation was particularly horrible regarding that, not even the experienced pythoners knew how to deal with it properly.
Because it's not really python specific knowledge. It's really about understanding what the unicode is, what bytes are, and when to use each.
The general practice is to keep everything you do as text, and do the conversion only when doing I/O. You should think of unicode/text as a representation of a text, as you think of a picture or sound. Similarly to image and audio, text can be encoded as bytes. Once it is bytes it can be transmitted over the network or written to a file etc. If you're reading the data, you need to decode it back to text.
This is what Python 3 is doing:
- by default every string is of type str, which is unicode
- bytes are meant for binary data
- you can open files in text or binary mode; if you open in text mode, the encoding happens for you
- socket communication: here you do need to convert strings to bytes and back
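To make the text-mode vs binary-mode point concrete, here is a small sketch. The file name and sample string are assumptions, nothing more:

```python
import os
import tempfile

text = "naïve café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Text mode: we hand over str and the file object encodes it for us.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

    # Binary mode: we get the raw encoded bytes back, no decoding applied.
    with open(path, "rb") as fh:
        raw = fh.read()

    # Text mode again: the bytes are decoded back to str on read.
    with open(path, "r", encoding="utf-8") as fh:
        round_tripped = fh.read()
```

No encode() or decode() call appears anywhere; the conversion lives entirely in how the file was opened.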
Python 2 is a tire fire in this area:
- text is bytes
- text can also be unicode (so two ways to represent the same thing)
- binary data can also be text
- I/O accepts text/bytes, with no conversion happening
- a lot (most? all?) of the stdlib actually expects string/bytes as input and output
- the cherry on top is that Python 2 also implicitly converts between unicode and str, so you can do crazy things like my_string.encode().encode() or my_string.decode()
So now you get Python 2 code where someone wanted to be correct (which is actually quite hard, mainly because of the implicit conversion), so the existing code has plenty of encode() and decode() calls, because some functions expect str and some expect unicode.
In different functions you might then have either bytes or unicode as a string.
Now you take such code and try to move it to Python 3, which no longer has implicit conversion and will throw an error when it expects text and gets bytes, or vice versa. str is now unicode, the unicode type no longer exists, and bytes is no longer the same thing as str. So your code blows up.
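A tiny illustration of that "blows up" moment in Python 3, with made-up sample data:

```python
text = "naïve"
data = text.encode("utf-8")   # explicit str -> bytes conversion

try:
    text + data               # Python 3 refuses to guess an encoding here
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting explicitly, with a named encoding, is the only way across.
joined = text + data.decode("utf-8")
```

The error shows up at the point where the types meet, which is often far from the place where the stray encode() or decode() was originally added.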
Most people see an error, so they add encode() or decode(), often by trying which one works (like what you were removing), when the proper fix would actually be removing encode() and decode() calls in other places in the code.
It's quite a difficult task when your code base is big, which is why Guido put a lot of effort into type annotations and mypy. One of their benefits is supposed to be helping with these issues.
Native English speakers are usually the ones blissfully unaware of it, because it just happens to cover all their usual inputs. But as soon as you have so much as an umlaut, surprise! And there are plenty of ways to end up with a Unicode string floating around even in Python 2 - JSON, for example. And then it ends up in some place like a+b, and you get an implicit conversion.
https://sunscrapers.com/blog/python-best-practices-what-ever...
With 2 vs 3 code, it is easiest to write your code for Python 3 and then, for 2, import everything you can from the __future__ package, including unicode_literals. That's still not enough, and you still might need to do extra work. In Python 3 there's an encoding argument which could do the encoding for you, but it doesn't look like it is available in Python 2. So you probably shouldn't use it, and should treat all input/output as bytes (i.e. call encode() when sending data to stdin, and decode() on what you get back from stdout and stderr).
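A rough sketch of the bytes-at-the-boundary approach, assuming the stdin/stdout in question belong to a child process started with subprocess. The child command and the run_upper helper are invented for the example, and the sketch itself targets Python 3; Popen.communicate is used because it is the call that exists in both versions:

```python
import subprocess
import sys

# Hypothetical child process: upper-cases whatever bytes arrive on stdin.
CHILD = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"


def run_upper(text):
    """Send text to the child as bytes and decode what comes back."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # encode() on the way in: the pipe only ever sees bytes.
    out, _err = proc.communicate(text.encode("utf-8"))
    # decode() on the way out: the rest of the program only ever sees text.
    return out.decode("utf-8")
```

Everything between the encode() and the decode() is bytes, and everything outside run_upper is text, so there is exactly one place to look when an encoding bug appears.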
Perhaps that might be enough for your case, although many things are hard to get right in Python 2 even when you know what you should do, because of the implicit conversion.
Edit: this also might be useful: https://unicodebook.readthedocs.io/good_practices.html
Also this could help: https://unicodebook.readthedocs.io/programming_languages.htm...
Python 3 came out in 2008, so say no backported features after 2009 and no bug fixes after 2012. All announced in 2008, of course.
Given 4 years to migrate most would have made the jump sooner.
Once again, how can you ask/require users to expend precious limited energy to re-program just to stay in place? It's totally obnoxious and completely unnecessary.
This is exactly backwards from reality. It's as if they were eating at someone's home and had turned a cup of coffee into a week-long stay during which they rudely complained when the host asked them to please do something about their pile of dishes, trash, laundry, and leavings.
Nobody is, after all, taking away your version of Python 2 or your ability to use and maintain it. It takes active effort to keep fixing bugs in software that may be network facing. If you want to do that maintenance you can, of course, but people, it seems, aren't going to be doing this for Python 2 forever. If you disagree, either take up the reins or pool your funds to pay someone to do it.
The thing to do back in 2008 was to figure out when you wanted to switch and schedule a bit of time to learn python 3. Anyone who did this by oh 2009 or 2010 would have virtually no work to do now. Any work that has been created since based on something you were told 11 years ago was going away is most assuredly work that you have created for yourself and will be obliged to take up.
Anyone who did this in 2014 would have a decade of runway before they can no longer run their python 2 apps on rhel/centos. Anyone who switches TODAY 11 years late to the party can run python + redhat for another 4 years.
>completely unnecessary
It would be more work to do otherwise. Nobody wants to do that work. You don't and they don't.
If/when the day comes that using Python 2 isn't realistic, I may go with 3, or I may choose a different language, depending on the project. I'll cross that bridge when I come to it.
Don't forget that that person/organization would not only have to maintain Python but also all the packages that you use.
I'm quite certain I'm not the only one.
genuinely curious...
I'm not sure how to submit a bug report, to be honest.
Maybe write in the README that the package is only tested on 3.7 and above so users can be aware of that and check. Bet they struggle to figure out what version to upgrade to as well.
I don't think pip can install new versions of python. This becomes burdensome, especially for novices.
> Maybe write in the README that the package is only tested on 3.7 and above so users can be aware of that and check. Bet they struggle to figure out what version to upgrade to as well.
I must disagree with this philosophy. As a package maintainer, I think it's my responsibility to "struggle" so that my users don't need to. And fwiw, my employer also thinks that this is part of my job.
But changing database vendors for a company can be a big deal, as bad as going from Python 2 to 3.
As a language however, it's a whole lot of nonsense. Extremely inconsistent syntax, stuffing a trinary logic into a boolean system, a standard that gets extended arbitrarily, and even the tooling ecosystem is a fair bit pathetic (the lack of formatters particularly annoy me; everyone tries to support SQLs generally, and end up missing every extended feature.. if it's not a simple select query/ddl, you're not getting a decent format output)
And it most certainly is a whole lot of fuss to migrate unless your database is tiny, or you didn't actually utilize the DB except as a dumb datastore (eg you relied solely on your ORM + indexes); there's a reason no good translator exists, and those that exist only support a very limited subset of any particular SQL variant, despite programming languages having a whole array of transpilers; it's simply not at all a simple language, and the variants only superficially look the same.
Every couple of months there's a new startup / dev site that says "SQL is broken/old/bad, so we reinvented it!". They all sink without trace, but there's a cohort who agrees with them.
It's not clear to me why postgres hasn't simply grown a whole array of frontends..
The visibility of the errors is a minor point, but I think it more appropriate that it be solved by e.g. the windowing toolkit API.
You need to slap a decode anyway on reads from subprocesses in python3, and files open in Unicode mode by default. Wouldn't that fix the majority of silly UTF-8 compat bugs? Or am I missing a class of bugs that's not avoided automatically by python3 strings?
Perhaps there’s something about a port that requires encoding/decoding bytes/strings?
Ironically, when your Python 2 app doesn't care about unicode, porting to Python 3 is actually much easier.
If you write code in Python 3 from the start, you rarely need to use encode() and decode(). Typically what you want is text, not bytes.
The exception might be places where you serialize, like I/O (network or files, although even files are converted on the fly unless you open the file in binary mode).
Example, I just had to write this
return (
    urllib.request
    .urlopen(url, timeout=60)
    .read()
    .decode("utf-8", errors="backslashreplace")
)
(wrapped in parentheses so the line breaks are valid)
Then I use that string in a regexp, etc.
This is the only language where I have to explicitly deal with encodings at such low level. I don't feel like I want to use it for my pet projects.
Of course urllib could have method text() that would do such conversion, but then urllib is not requests. It never was user friendly.
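For what it's worth, the missing text() convenience is only a few lines to write yourself. This is a sketch, not anything urllib ships: fetch_text and its fallback_encoding parameter are hypothetical names, and the UTF-8 fallback is an assumption:

```python
import urllib.request


def fetch_text(url, timeout=60, fallback_encoding="utf-8"):
    """Hypothetical helper: fetch a URL and return its body as str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in Content-Type, if one was sent.
        charset = response.headers.get_content_charset() or fallback_encoding
    # Undecodable bytes become backslash escapes instead of raising.
    return raw.decode(charset, errors="backslashreplace")
```

The decode still happens, but only once, at the point where the bytes enter the program. The regexp code after it never sees bytes.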
Edit: personally I use aiohttp, the interface is much nicer: https://aiohttp.readthedocs.io/en/stable/client_reference.ht... if I can't use asyncio then would use requests.
The money was raised specifically to support development of requests 3
> [Reitz] announced that work had begun on "Requests 3", that its headline feature would be the native async/await support I was working on, and that he was seeking donations to make this happen.
It's not so much that PSF needed to be used, as that there needed to be some accountability as to how those funds were used.
> [Reitz] chose a fundraiser structure that avoids standard accountability mechanisms he was familiar with. He never had any plan or capability to deliver what he promised. And when I offered a way for him to do it anyway, he gave me some bafflegab about how expensive it is to write docs. Effectively, his public promises about how he would use the Requests 3 money were lies from start to finish, and he hasn't shown any remorse or even understanding that this is a problem.
It sounds like a great deal of the work being done on requests is done by volunteers but the funding only goes to support Reitz
> I think a lot of people don't realize how little Reitz actually has to do with Requests development. For many years now, actual maintenance has been done almost exclusively by other volunteers. If you look at the maintainers list on PyPI, you'll see he doesn't have PyPI rights to his own project, because he kept breaking stuff, so the real maintainers insisted on revoking his access. If you clone the Requests git repo, you can run git log requests/ to see a list of every time someone changed the library's source code, either directly or by merging someone else's pull request. The last time Reitz did either was in May 2017, when he made some whitespace cleanups.
The issue is not so much that money is being made, but the way that it is done and the lack of accountability
> I don't have any objection to trying to make money from open-source. I've written before about how open-source doesn't get nearly enough investment. I do object to exploiting volunteers, driving out community members, and lying to funders and the broader community. Reitz has a consistent history of doing all these things.
NOWHERE that I can see did Reitz say he would hire NJS to do some work. Is Reitz even set up properly to report for taxes or withhold taxes on amounts paid to NJS, check paperwork for work eligibility, etc.? Would this even be allowable if requests is not a business, or would deductions be disallowed as non-business (i.e., a payment to NJS -> subject to tax for both Reitz and NJS, potentially)?
If you as a donor want full charitable compliance using PSF you would ask to give through them - and perhaps more would have been given if PSF had been an option.
I don't even see NJS on the requests contributor list:
https://github.com/psf/requests/blob/master/AUTHORS.rst
Finally, requests 3 has a number of features.
> And on a more personal level, I felt his interactions with me were extremely manipulative. I felt like he tried to exploit me, and that he tried to make me complicit in covering up his lies to protect his reputation. I was extremely uncomfortable with the idea of going along with this, but he created a situation where my only other options were to either give up on working on async entirely, or else to go public with the whole story, at potentially serious cost to myself.
> Ultimately, I decided to speak out because I care deeply about the Python community and its members. If one of our community's most prominent members freely lies to donors and harms volunteers, and if we all let that go without saying anything, then that puts everything we've built together at risk. And I'm in a better position than many to speak up.
The intent seems not to be trying to get people to blacklist or dogpile on Reitz, but to simply make people aware of the issues so they won't get caught off guard.
> This is the classic "missing stair" problem. Those in the inner circle quietly work around the toxic person. Outsiders come in blind. I'm pretty well-connected in the Python world, and I came in blind.
> Since this is the internet, I have to say explicitly: Please do not harass or abuse Reitz. That's never appropriate
The takeaway here appears to be "never work for free". If NJS had worked on his own project, controlled by him alone, this wouldn't have happened. If you donate a bunch of work to an open source project, then... well... the source is open.
And is it impossible that KR is doing an order of magnitude more work on the requests projects, with a much longer track records and raised the money directly to himself (and so has to pay taxes potentially on it) such that using it to support his work on requests is unreasonable?
"Dealing with unicode" is really just about dealing with it at the input/output boundaries (and even then libraries handle it most of the time). But without the clear delineation that Python 3 provides, when you _do_ hit some issue you probably insert a "fix" in the wrong space. Leading to the classic Py2 "I just call decode 1000 times on the same string because I've lost track"
Interesting text follows company set naming schemes, which means all english and ascii. The rest could be random bytes for all I have to care about. Many formats like plain text or zip don't have a fixed encoding and I am not going to start guessing which one it is for every file i have to read, there is no way to do that correctly. Dealing with that mess is explicitly something I want to avoid.
This is a lot of old code, and it's all ASCII, no matter what the locale of the system is. And even if the code was updated, all the messages would still be in some text == bytes encoding, because there's no "user data" involved, and the throughput desired is in many gigabytes of text processed per second.
So yeah, unicode is not "everywhere": it may be everywhere on the public internet, but there is a world beyond this.
So you can throw in your emoji and they might not correctly show up on the xml logging metadata I write, because I don't care. But they will end up in the processed file the same way they came in instead of <?> or some random Chinese or Japanese symbol that the guessing algorithm thought appropriate.
Also, there's no guessing happening in this instance. The locale configured in your environment variables is used if you open files in text mode.
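You can check which encoding that is on a given machine. A quick sketch, where the temp file is just scaffolding:

```python
import locale
import os
import tempfile

# The locale-derived encoding that text-mode open() normally falls back to
# when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    with open(path, "w") as fh:                    # no encoding= argument
        used_encoding = fh.encoding                # whatever the environment says
        fh.write("plain ascii")

    with open(path, "w", encoding="utf-8") as fh:  # explicit, environment ignored
        explicit_encoding = fh.encoding
```

Passing encoding= explicitly is the way to stop the result depending on the environment at all.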
Even taking into account the fact that new features were still being added and not all focus was on porting, this doesn't really seem like a reasonable representation of what's going on; I have a suspicion that "10 years" of porting here does not entail nearly as much work as it seems.
The average few hundred to few thousand loc app, which should be 98% of all production code-bases will almost certainly port with no issue.
In python2, that's trivial. Whatever system you're on would normally be configured so that filename bytes dumped to the terminal would be displayed correctly, so you could just treat the strings as bytes and it would be fine.
In python3, it was a nightmare. No, you could not just decode from/encode to UTF-8, even if that was what your system used! Python had its own idea of what the encoding of the terminal was, and if you used the wrong one, it wouldn't let you print. And if you tried to convert from UTF-8 to whatever it thought the terminal was using, it would also break, because not all characters were representable. And your script could just not tell Python to treat the terminal as UTF-8, either; you had to start digging into locale settings, and if you tried to fix those, then _everything else_ would break, and nobody had any idea what the right thing to do was, because you were using an obscure OS (the latest macOS at the time).
I assume that it works better now.
What about codebases with python2 third party dependencies that don't work in python3? Now you have to port that entire library as well, or write it yourself while crossing your fingers that it is well documented and easy to work through.
What about codebases without decent test suites? I'd argue most production codebases don't have good test suites, or at least the most complex of code is usually poorly tested. You'll end up spending most of your time digging for regressions especially if your code creates large amounts of user interfaces.
What about code bases that were written by scientists, mathematicians, or other professionals who may not be as fluent in writing "good" code?
There are almost no relevant third-party libs that have not been ported at this stage. If they have not, they have probably been abandoned, and the client codebase has bigger issues. Same for uncovered code bases, and 'unprofessional' Python production code. That's hardly Python's fault.
Would love to see your point in action. For me personally, speaking strictly writing simple scripts, they pretty much translate fairly well. Regarding formatting, are you referring to the output?
1. it's free so it's OK to be user-hostile
2. if you don't like the direction, just fork/fix it
You don't have to fork it to fix it personally. You may also consider putting your money where your mouth is and organizing an effort to fund the change you want to see in the world. If you succeed the world will have additional value it wouldn't without it and owe you kudos. Everyone likes options. If you fail you ought to move on you have no basis for complaint. I think this is informative.
Open Source is Not About You
https://gist.github.com/richhickey/1563cddea1002958f96e7ba95...
If anyone can call anything anything, then how is it even possible for the consumer to make intelligent choices? Having it be called something else allows your consumers to make an informed choice about using it, rather than allowing you to incorrectly trade on the official project's reputation. Of course YOU might merely want someone to competently maintain Python 2.
Others might opt to do so badly and thus damage the actual Python brand. Worse yet, others might opt to make changes to projects that serve their nefarious needs, like folding in ads or data collection. Without a defining line between official and unofficial, how do we prevent such?
Call it cobra and brand it Python's cooler cousin if you like.
On the other end, most programs don't actually care what the data encoding is. They just move it.
Well, no, not really. You go read the docs and try to find out. Most of the time, there is a definitive encoding - if there weren't, a lot more things would be broken. Sometimes, it is not guaranteed, even though de facto that is the case - and this highlights broken interface specifications. When it is truly unknown, you explicitly treat it as raw bytes.
And the good thing about Python 3 is that it forces you to think about this. In Python 2, most of the time, data processing code can be hacked together, and it "just works", right until the point the input happens to include something unanticipated. Like, say, the word "naïve".
> On the other end, most programs don't actually care what the data encoding is. They just move it.
It doesn't necessarily mean that they get to dodge the bullet. In Python 2, if you read data from a file, you get raw bytes, but if you read data from parsed JSON, you get Unicode strings - because JSON itself is guaranteed to be Unicode. Guess what happens when the byte string you've read from the file, and the Unicode string you've read from a JSON HTTP response, are concatenated?
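In Python 2 that concatenation triggers an implicit ASCII decode of the byte string, which raises UnicodeDecodeError as soon as the bytes contain something non-ASCII. Python 3 rejects the mix up front instead. A sketch with stand-in data (file_bytes plays the part of the file read, and the JSON document is made up):

```python
import json

# Stand-ins for the two sources described above (both hypothetical).
file_bytes = "naïve".encode("utf-8")                      # what a binary read returns
json_text = json.loads('{"word": "caf\\u00e9"}')["word"]  # JSON strings are always str

try:
    combined = file_bytes + json_text   # bytes + str
except TypeError:
    combined = None                     # Python 3 refuses, whatever the content

# The fix is to decode the bytes once, explicitly, and work with text afterwards.
fixed = file_bytes.decode("utf-8") + " " + json_text
```

The Python 3 failure does not depend on the data, so it shows up in the first test run rather than the first time a user types an umlaut.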
However, the lack of GUI frontends is also really weird. I don't see why it'd be harder to support than any other DB, and afaik pg has gotten fairly popular..
From where do you derive the requirement to graciously work for free to serve your ends?
I wasn't commenting on how buggy my own code is. Which version of a language I'm using doesn't really affect that variable.
But I guess you know this, and are OK with the compromises involved. I'll stop here ;)
This is true, and if we were talking about code that is exposed to the world at large, then my stance might be different. However, the projects that I've used Python for are not exposed in that way.
Note that I'm talking about personal projects, not work-related ones. At work, I use whatever is required.
And the alternative is cargo cult "newer is better".
>Yes, it's expensive to upgrade from Python 2 to Python 3, but it's also expensive for the Python project to maintain 2 versions of Python indefinitely.
On the other hand, they could progressively enhance upon a backwards compatible single 2 version. JS manages to do that just fine, as does Java...
How do you define just fine? It's taken us many years to migrate ECMA versions only to have multiple incompatible runtimes.
And JS "The good parts" is like 1/10 of the full language so often it feels like a lot of pile on.
>as does Java
How are them generics?
ECMAScript versions are forward-compatible. Any valid ECMAScript 3 code is also valid 5.1, 2015, 2016, etc.
I'm not sure what migration you're talking about. If you mean using new language features before your runtime targets support them, that's kinda on you. Even so, the ecosystem has tons of robust solutions for supporting legacy interpreters. Most notably, Babel does a wonderful job of transpiling to lower language-version targets.
Besides the total domination of the web programming space, which is of course aided by it being the only option:
1) Used by choice even on the server and application development (where it was never the only option, and wasn't even preferable/viable before)
2) Fast pace of language development
3) A thriving package ecosystem with millions of packages
4) Adopted by all major companies
5) Three best-of-class runtimes (V8, JavaScriptCore, SpiderMonkey) by 3 major vendors, with performance that smokes any dynamic language that is not LuaJIT
6) Increasingly adopted as an embedded scripting language in all kind of apps
7) With a viable gateway into both a native trans-language runtime (webassembly) and a typed version of the language (typescript).
>How are them generics?
They're doing great. It's not that type erasure is that big of a deal, and Java might even get it with Valhalla eventually anyway. It's not a "backwards compatibility prevents this" issue (which is our topic here), it's a "no time devoted to add it in yet" issue.
You are moving the goalposts and ignoring the fact that Python3 still didn't deliver anything for most users.
Of course it's a "cargo cult" when someone disagrees with you.
> On the other hand, they could progressively enhance upon a backwards compatible single 2 version. JS manages to do that just fine, as does Java...
"Just fine"... that explains why so many shops are dropping support for straight Javascript and switching to TypeScript or CoffeeScript before that. And why Javascript is littered with band-aid libraries like Underscore that are needed to turn it into an effective development language.
Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.
No, it's obviously "backwards thinking", right?
>"Just fine"... that explains why so many shops are dropping support for straight Javascript and switching to TypeScript or CoffeeScript before that.
CoffeeScript was just adopted (and not that much in the first place) because it brought new syntax/features earlier. Now JS has been getting new syntax itself at a great pace and CoffeeScript just died off.
As for TypeScript, this is just Javascript + type annotations. Kinda like what Python is getting with 3.6 and mypy, but more useful and with actual tooling available. So not sure how "TypeScript or CoffeeScript" prove anything about JS not doing great.
>Likewise, Java development is slowly being superseded by Kotlin. Java is a mess, there are often 3-4 ways to do simple things and many of them are just terrible for performance.
Python has 10x+ worse performance, and more than 3-4 ways to do simple things (from package management to basic libs), most of which are just terrible for performance.
Compared to that, nobody has had any problem with Java performance for 15+ years...
And Kotlin is still insignificant except in the Android space where it's pushed, so there's that. Java sees an order of magnitude more usage.
Common Lisp has a backwards compatibility which goes into decades, and implementations like SBCL had no difficulties at all to absorb Unicode.
Racket even supports different language standards and completely different languages (such as Scheme and Algol) running on the same runtime. And both SBCL and Racket are compiled languages with a high-end GC which should make such things more difficult than CPython, which is purely interpreted and has a simpler GC.
But the incompatibility between Python 2 and Python 3 is perhaps only a symptom of a larger problem. The Python developers have decided that backwards compatibility is not that important any more. This is not a problem for companies like Dropbox, or for small start-ups, 95% of which will not even exist five years from now. It is, however, a huge problem for domains like scientific computing, where most code has no maintainers and even for very important code there is no budget or staff for maintenance:
https://blog.khinsen.net/posts/2017/11/16/a-plea-for-stabili...
Exactly: and that was a wrong decision for anybody but the developers of Python.
Everybody else prefers having something that works: "The improvements are welcome, but please allow us to run our old programs too, thank you, and allow us to use that new feature only once we need it."
It's an obvious expectation. We would also hate a new version of a word processor which wouldn't open our old documents. Or a new version of Photoshop which wouldn't open our old pictures. Or a new version of the browser where only the newest web pages are visible.
It follows that it was absolutely technically possible to have a new version of Python in which the old programs still work. It's the failure of the developers that they haven't made it.
Compare that decision of theirs with the policy of Linus Torvalds who insists that the old user programs should never break on newer kernels.
I think that's a tool selection problem, not just confined to the python world. If the language and libraries won't have a supported lifespan that matches with the maintenance budget of the projects using them then the wrong tool was chosen. If a project is expected to have a 10+ year lifespan of little to no maintenance then it needs to be built on languages/libraries that will have supported versions for that long.
Data science is doing just fine, in fact is leading the migration: https://www.jetbrains.com/research/python-developers-survey-...
I just don't think that's true.
This is a story about language upgrades. Those languages showed how to do it right.
Some hipsters hate "Java and Javascript". The world at large loves them.
At some point plain users hated Java applets and Java desktop apps, but those are not much of a thing any more. In the server space, very few of those who use it hate Java, and millions use it.
Javascript was knocked up over a week or so. Sure it implements concepts from Scheme and other languages but it was certainly not "designed reasonably well from the start". Otherwise we wouldn't have needed books such as Crockford's "Javascript The Good Parts" to help us understand areas of the language to avoid/misuse.
> Python's language was not solid (e.g., strings not unicode by default)
From the article:
"Python itself predates the first volume of the Unicode standard which came out in October 1991."
In any case, I think the jury is still out on whether Unicode in the primary string type is a good idea.
Python didn't get Unicode until later, so it had a chance to do it right - and it finally did, even on platforms like Windows where wchar_t is also 16-bit for historical reasons.
No, maintaining 2 versions of Python is much cheaper; it's only being done in one place, compared to the thousands and thousands of Python 2 code bases you'd have to convert.
It also only needs bug fixes, there are plenty of people/organisations out there that would be perfectly happy for the language to be unchanging.
It's extremely hard to keep compatibility with Python 2; many authors can't wait to drop support next year, and many already have.
Presumably any packages worth maintaining will have far more dependent projects, so it's still far less overall effort.
> It's extremely hard to keep compatibility with Python 2
So don't? I don't think most of the people dragging their feet on the upgrade need or even want new features. A stable python 2 branch with bug fixes and security patches would suffice for most and be ideal for many. Over time the bug fixes should trend to zero and there probably aren't a heap of security issues in python projects anyway.
Only if they name it something completely different from python or py-anything. Guido refuses to allow anyone to just step in to maintain py2.
Tauthon is a project that aims to keep compatibility with py2 while adding whatever features of py3 won't break py2, and to provide a maintained py2.
Lots of people learn and use these languages because:
1 - That's what they're taught at school.
2 - That's where many if not most programming jobs are.
3 - There are a bazillion libraries they can use, compared to other languages.
4 - JS is built in to browsers.
5 - They don't know any better.
6 - Marketing.
Also, many companies want their staff to develop in these languages because of the reasons on this list plus that's what most programmers know, so it's relatively easy to find employees.
"Data science" is a broad term but usually just means the application of numeric, and sometimes scientific, tools to commercial means. It is almost always done in companies. Typically, between such companies there is no open exchange of tools and methods, no exchange of knowledge, and no long-term use of generated codes. This is the reason why data science companies don't have the problems which Hinsen pointed out. But, they could become affected by a degrading suitability of Python for computational science, because their tools were initially developed by scientists.
int max = new Max(10, 5).intValue();
In any case, that's not some counter-argument, even if it points to a real wart.
It's "let me throw a random Java wart, as if it means something, and as if other languages don't have their own warts".
You could say then, "well, then Python is perhaps just not a good match for those pesky scientists".
And this brings up two more points:
* A lot of important tools and libraries in the Python ecosystem were developed by scientists. Numarray/Numpy is a good example.
* If the core Python developers don't have the intention to maintain a backwards-compatible language version for more than, say, 15 years, they should perhaps clearly state on the python.org main page something like: "great, as you are a scientist, we welcome your contribution, but Python might not be suitable for tools that support long-term research".
Not that I'm implying that 'popularity' is a good measure of anything.
Python is popular largely based on the fact that it's so approachable. It is the BASIC/ VB of modern times for whatever that is worth. It does scale up to larger projects and is frequently used for big scale stuff, but I suspect the fact that it's so ubiquitous has more to do with the fact that it's also easy to pick up and for companies to find people with Python dev skills (or train them up).
https://lkml.org/lkml/2012/12/23/75
"How long have you been a maintainer? And you still haven't learnt the first rule of kernel maintenance?
If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs."
Why do some python developers have to maintain installations of a whole handful of python versions just to ensure that their code is working? Why all the mess with pyenv, virtualenv, and so on? If the python developers, as well as the library developers would support backwards compatibility, this would not be necessary at all.
Exactly. The incompatibility mess continues when using different 3.x versions.
https://www.youtube.com/watch?v=oyLBGkS5ICk
Discussion:
About some of the design decisions in Guido's own words: http://python-history.blogspot.com/2009/02/adding-support-fo...
    def foo(self):
instead of a more logical:
    def self.foo():
to match the calling syntax.
Not saying Python fell on the wrong side of that line, just that it's an easy line to end up on the wrong side of.
Actually Python is one of my favorite programming languages, probably the language with the closest mapping to how I naturally think about a problem. I really like it. But I'm also willing to admit it has some warts, as does any language.
This was done intentionally, because "Explicit is better than implicit". It also has some uses, e.g. if you want to do this:
    class Foo:
        def inject_bar(self):
            def new_bar(self2):
                pass  # you can refer to both 'self' and 'self2' here
            other_object.bar = new_bar
It's rare, but it has its uses.
Too bad Python breaks this "commandment" pretty much whenever it wants to.
The second is not true: if you add methods with a double-underscore prefix (also a design decision), they do behave like private. The real method name is mangled to include the class name (`_ClassName__method`), so you won't get conflicts, and you can still access it for debugging purposes.
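A minimal sketch of how double-underscore names behave in CPython. The class and attribute names here are hypothetical, chosen only for illustration. It shows that a subclass defining the same `__name` does not clash with the parent, and that the mangled name stays reachable:

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance      # stored as _Account__balance

    def __audit(self):                # stored as _Account__audit
        return self.__balance >= 0

    def is_valid(self):
        return self.__audit()         # compiled as self._Account__audit()


class Savings(Account):
    def __audit(self):                # stored as _Savings__audit, no clash
        return False


acct = Savings(10)
print(acct.is_valid())                # Account's own __audit is still the one called
print(acct._Account__balance)         # the mangled name is reachable for debugging
```

The mangling is purely textual and happens at compile time inside the class body, so it prevents accidental clashes rather than enforcing access control.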
Perhaps more importantly, it makes even more sense in a language like Python where classes are (unlike many, especially statically-typed, class-based OO languages) first class objects to do what Python does, because of the relation of methods to classes. It also, to me, makes unbound/bound methods slightly more intuitive.
Now, I too was initially thrown by it because I'd used a bunch of OO languages that did it the other way first, and for quite a long time.
How do you define enums? with a superclass.
How do you define data classes? With a class decorator.
And then there's metaclasses too.
Separate from that, I think that would be a much bigger breaking change than you think. __getattr__ and __getattribute__ mean that self.y doesn't necessarily refer to a traditional attribute, so without self you could end up with either:
1. `x = y` where the expression `y` executes code
or
2. `x = y` where `x = ???` and `x = self.y` where `x = self.__getattr__('y')`.
That said, I do think the language could use something to reduce the size of __init__() since
    self.arg1 = arg1
    self.arg2 = arg2
    self.arg3 = arg3
    ...
can get pretty verbose.
I don't understand this. Aside from people seeing an OOP for the first time in their life, are they getting confused about where "this" comes from?
How would an implicit "self" a la "this" make it any less understandable where the data come from?
How is passing self making "navigating an unfamiliar code base much easier"? Aside from total newbs who see an OO codebase with an implicit instance variable for the first time?
    @dataclass
    class Foo:
        arg1: str
        arg2: int
        arg3: int = 5  # default value
I use them a lot in my 3.7+ projects. It helps reduce the boilerplate for typical classes. (Where for me, typical classes are a 1:1 mapping of inputs to attributes)
And if you need to compute some values at object initialization, then they have a __post_init__() hook you can use [2].
[1] https://docs.python.org/3/library/dataclasses.html [2] https://docs.python.org/3/library/dataclasses.html#post-init...
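A short sketch of the `__post_init__()` hook mentioned above, extending the same `Foo` example. The `total` field is an assumption added purely for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Foo:
    arg1: str
    arg2: int
    arg3: int = 5                      # default value
    total: int = field(init=False)     # computed, so not a constructor argument

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        self.total = self.arg2 + self.arg3


foo = Foo("a", 2)
print(foo)
```

The generated `__init__`, `__repr__` and `__eq__` replace the hand-written `self.argN = argN` boilerplate.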
Rust does the same thing - with self, it's an instance method, without, it's a static method.
https://www.geeksforgeeks.org/class-method-vs-static-method-...
Any function inside a class declaration becomes an instance method, with its first argument becoming the explicit receiver. You have to use @staticmethod to prevent that (or @classmethod to get the class as the first argument, instead of the instance).
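A minimal illustration of the three kinds of method described above. `Thing` and its method names are hypothetical:

```python
class Thing:
    def instance_method(self):
        return self               # receives the instance

    @classmethod
    def class_method(cls):
        return cls                # receives the class, even when called on an instance

    @staticmethod
    def static_method():
        return "no receiver"      # receives nothing implicitly


t = Thing()
print(t.instance_method() is t)
print(t.class_method() is Thing)
print(t.static_method())
```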
Furthermore, this behavior is not parse-time, but runtime. The "def" statement that defines a function produces a plain function object, regardless of whether it's inside a class or not. Once the class finishes defining, the function is still a function - which is why C.f gives you a plain function that can be called by passing "self" explicitly as an argument.
However, Python allows objects to provide special behavior for themselves whenever they're retrieved via a member of some class - that is, when you write something like x.y, after y is retrieved from x, it gets the opportunity to peek at x, and substitute itself with something else. Function objects in Python (here I mean specifically the type of objects created with "def" and "lambda", not just any callable) use this feature to convert themselves to bound methods. So, when you say x.f, after retrieving f, f itself is asked if it wants to substitute something else in its place - and it returns a new callable object m, such that m(y) calls f(x, y). That's what makes x.f(y) work.
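The substitution described above is the descriptor protocol: function objects implement `__get__`. A small sketch that performs the step by hand (class `C` and method `f` are illustrative names):

```python
class C:
    def f(self, y):
        return (self, y)


x = C()
plain = C.__dict__["f"]          # the function object that "def" produced
bound = plain.__get__(x, C)      # roughly what attribute lookup does for x.f

print(type(plain).__name__)      # function
print(type(bound).__name__)      # method
print(bound(1) == x.f(1) == C.f(x, 1))
```

The bound method keeps references to both halves: `bound.__self__` is the instance and `bound.__func__` is the original function.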
In fact, that's a key difference between Python and many other class-based OO languages.
I'm not sure whether that (and the associated need to make the receiver an explicit parameter in method definitions; it is conventionally, though not necessarily, named “self” in instance methods) is your problem, or if your problem is that (unlike some OO languages, though fewer than share the previous difference) you can't omit explicitly naming the receiver in references to its own instance variables in an instance method, making instance variable references syntactically distinct from local variable references.
When I upgrade to a new version of C# ... nothing happens.
Backwards compatibility is what made Microsoft the company it is. I think Python deserves all the crap it gets for 2 vs 3.
The question is rather whether it would have been better to gradually improve support for it in a Python2-esque way, rather than creating a discontinuity and a raft of new problems, some of which linger to this day.
Also, for many purposes, there is wide agreement that ASCII is still the way to go. Even if Americans vanished tomorrow, the majority of remaining programmers in the world would prefer to look at source code in English, which they already know, rather than a host of other languages, most of which they don't.
You do realize that C# was itself Microsoft's replacement for C++, right? And that when C# was released it had its own growing pains and long roll-out in spite of having the world's biggest corporation pushing it.
Python as a language is far older than C# so it had a lot more baggage than C# does.
If I were listing Python warts, I'd point to things like single-element tuples (1,), the `datetime` support (timezone-naive datetimes are an ambiguous disaster), or the CPython Global Interpreter Lock.
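Two of these warts are easy to demonstrate. The sketch below uses only the standard library; the variable names and dates are arbitrary:

```python
from datetime import datetime, timezone

not_a_tuple = (1)      # just the int 1 in parentheses
a_tuple = (1,)         # the trailing comma is what makes the tuple

naive = datetime(2020, 1, 1, 12, 0)                        # no tzinfo: which zone?
aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)   # explicit UTC

comparison_error = None
try:
    naive < aware      # ordering naive against aware is refused outright
except TypeError as exc:
    comparison_error = exc

print(type(not_a_tuple).__name__, type(a_tuple).__name__)
print(comparison_error)
```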
My editor supports snippets so moderate python boilerplate is not a problem.
So instead of having:
    this.x = y
People tend to do:
    x = y
Which isn't always the most clear that it's referencing an instance variable rather than a scoped variable, especially if you see it midway through a method.
    someInstance.foo()
When invoking a method from within the class, you still need a reference to the object the method will be invoked on. In python, the reference is passed implicitly as the first argument, and usually called self. That has nothing to do with the method name or signature itself, and calling the method self.foo() would result in calling it like
    someInstance.self.foo()
for cases when you aren't referencing the function from within the class, which would be even more confusing.
Sure there is: consistency. If member functions on an object that happens to be a class (i.e., methods) did magic transformations that member functions of other objects did not do, the mental overhead to understand Python code would be higher, and the ability to build custom general abstractions would be weaker.
It would perhaps make the simplest cases microscopically easier, but at the expense of making the already hard things more confusing and difficult to wrap with generalities.
Most statically-typed OOPLs don't have first-class classes that are just normal objects, so this isn't an issue because the things that enables aren't available; other dynamic languages may use models where methods aren't data members of classes (e.g., Ruby where methods are associated with classes, but not as instance members which, in Ruby, wouldn't be externally accessible; while Ruby classes are objects, the ability to hold methods is a special power of Ruby classes/modules compared to other objects, not just having instance members. This is one way Ruby's model is more complex than Python’s, and it definitely bites you in terms of seeking general solutions sometimes.)
It's true that languages are abstractions, but not all abstractions are useful.
    thing_instance.foo(bar, baz)
you're secretly calling
    Thing.foo(thing_instance, bar, baz)
Which succinctly explains where the self argument comes from.
    member this.foo(x, y) = ...
Again, "this" is just an identifier here, and doesn't have any special meaning.
It's more elegant firstly because it follows use, and secondly because it means that "def foo(x)" has the same meaning both inside and outside of a class declaration - it's just a function, and there's nothing special about its first argument. As it is, we need stuff like @staticmethod and @classmethod. It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.
You would need those even with your suggestion (except you could drop @staticmethod if you add another layer of magic so that methods declared without a leading identifier were assumed static; you'd still need @classmethod or some equivalent mechanism to distinguish which non-static methods were class vs instance methods.)
No it doesn't. Currently `def foo(self): pass` is called as `instance.foo()`. You're suggesting that `def self.foo(): pass` would be called as `instance.foo()`, except now it looks like self and instance are syntactically related in ways that they aren't.
> Again, "this" is just an identifier here, and doesn't have any special meaning.
But the grammar is no longer LL(1), and you have weird conditionally valid syntax, like `.` in a function name is valid only in a class block.
> "def foo(x)" has the same meaning both inside and outside of a class declaration
This is a stretch, especially since you're now optimizing for the uncommon case. Staticmethods are rare compared to instance methods (I'll go further and claim that static methods are an antipattern in python, modules are valid namespaces, you can stick a function in a module and couple it to a class also defined in that module and nothing bad will happen. Banning staticmethods entirely doesn't reduce expressiveness). Aligning staticmethods with functions, instead of aligning instance methods with functions (as python does currently) encourages you to do the wrong thing.
> classmethod
Your changes don't affect classmethod at all, if anything they'd make classmethod more of a special case. How do you signal that `self.foo()` takes `self` as the class instead of the instance?
> It's especially annoying when you have class attributes that happen to reference a function, because conversion from functions to methods is a runtime thing - so you have to remember to wrap those in staticmethod() as well.
What do you mean? Like
    class Foo:
        a = Foo.func()
        @staticmethod
        def func():
            return 1
I'll say again: staticmethods are an antipattern in python:
    def func():
        return 1
    class Foo:
        a = func()
works just as well, better in fact. Modules are great namespaces. Classes are more than namespaces, and if all you need is a namespace, you shouldn't use a class.
> because conversion from functions to methods is a runtime thing
I'd also quibble with this: it's a binding thing.
    class Foo:
        def foo(self):
            return 1
    a = Foo.foo(None)
will work, and if you check, type(Foo.foo) is still just `function`; it's only when you look `foo` up on an instance of Foo that the function is bound to that instance, and when that is done, the bound `foo` is converted to a method object. This was different in python2, where Foo.foo and instance.foo were both "instancemethod" objects, but in python3, Foo.foo is a plain old function, and instance.foo is a method.
Specifically this means that if you can get your hands on the `method` constructor (like with `type(instance.method)`), you can then do silly things like
    class A():
        def foo(self): pass
    instance = A()
    def f(self): return 5
    instance.func = type(instance.foo)(f, instance)  # bind f via the method constructor
    assert instance.func() == 5
and this will work. You'll have bound the function to the instance. Of course, if you stick an attribute on `instance` (or `A`), and reference `self.attribute` in the function, this will still work. (this also lets you do things like bind a given instance of a function to a different instance of the class, but that's because the method constructor is essentially just partial with some bookkeeping for class information)
But also, there's no reason to make those legal only inside classes. All it needs to do is make "def foo.bar" produce a different type of function, that has the method descriptor-producing behavior that is currently implemented directly on regular functions.
As far as less vs more common case - I think it's more important to optimize for obviousness and consistency. If "def foo" is a function, it should always be a function, and functions should behave the same in all contexts. They currently don't - given class C and its instance I, C.f is not the same object as I.f, and only one of those two is what "def" actually produced.
What I meant by function references inside classes is this:
    class Foo:
        pass
    Foo.bar = lambda: 123
    foo = Foo()
    print(foo.bar())
This blows up with "TypeError: <lambda>() takes 0 positional arguments but 1 was given", because lambda is of type "function", and it gets the magic treatment when it's read as a member of the instance. So you have to do this:
    Foo.bar = staticmethod(lambda: 123)
and even then this is only possible when you know that the value is going to end up as a class attribute. Sometimes, you do not - you pass a value to some public function somewhere, and it ends up stashed away as a class attribute internally. And it all works great, until you pass a value that just happened to be another function or lambda.
On the other hand, this only applies to objects of type "function", not all callables. So e.g. this is okay:
    Foo.bar = functools.partial(lambda x: x, 123)
because what partial() returns is not a function. Conversely, this means that you can't use partial() to define methods, which can be downright annoying at times. Suppose you have:
    class Foo:
        def frob(self, x, y): ...
and you want to define some helper methods for preset combinations of x and y. You'd think this would work:
    class Foo:
        def frob(self, x, y): ...
        frob_xyzzy = functools.partial(frob, x=1, y=2)
        frob_whammo = functools.partial(frob, x=3, y=4)
except it doesn't - while frob_xyzzy() and frob_whammo() both have the explicit "self" argument, they aren't "proper" functions, and thus that argument doesn't get treated as the implicit receiver:
    foo = Foo()
    foo.frob(x=0, y=0) # okay
    foo.frob_xyzzy() # TypeError: frob() missing 1 required positional argument: 'self'
    foo.frob_whammo(foo) # okay!
Which is to say, this all is a mess of special cases. You can argue that this all isn't really observable in the "common case" - the problem is that, as software grows more complex, the uncommon cases become common enough that you have to deal with them regularly, and then those inconsistencies add even more complexity into the mix that you have to deal with - just when you already thought you had your plate full.
Yes it is[0]. LL(1) grammars can still be recursive, they just can't change the parsing rules based on distant context.
> As far as less vs more common case - I think it's more important to optimize for obviousness and consistency
Yes, but having the "easiest" thing you do:
    class Foo:
        def bar():
            pass
silently do a usually unwanted thing (create a staticmethod) instead of an obviously wrong thing (raise an error) isn't obvious. It's building a footgun into the language.
The rest of your comment complains about inconsistencies of how python converts various callables to methods. This is a fairly valid and interesting complaint, but has nothing to do with syntax, it is solely a semantic complaint that would be solved by having class creation treat all attributes that are callables as functions. In fact, you could customize class creation yourself this way using __new__, no syntactic changes required.
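One way such a customization might look, as a rough sketch only. It uses `__init_subclass__` rather than a metaclass `__new__` (a substitution made here for brevity), and `BindCallables`, `_as_function` and `Scale` are hypothetical names, not anything from the standard library:

```python
class BindCallables:
    """Hypothetical base class: callables in a subclass body that lack
    __get__ are wrapped in plain functions so they bind like methods."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, value in list(vars(cls).items()):
            needs_wrapping = (
                callable(value)
                and not isinstance(value, type)      # leave nested classes alone
                and not hasattr(value, "__get__")    # already a descriptor
            )
            if needs_wrapping:
                setattr(cls, name, _as_function(value))


def _as_function(target):
    # A real function, so normal method binding applies to it.
    def method(self, *args, **kwargs):
        return target(self, *args, **kwargs)
    return method


class Scale:
    """A callable object that is not a function."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, obj, x):
        return obj.base * self.factor * x


class Foo(BindCallables):
    base = 10
    double = Scale(2)      # would not receive the instance without the wrapping


print(Foo().double(3))
```

This only covers attributes present when the class is created; callables assigned to the class later would need `__setattr__` on a metaclass or similar.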
> as software grows more complex, the uncommon cases become common enough that you have to deal with them regularly
While this is true, I think you vastly overestimate how common these constructs are. Like, you're in the realm of "this doesn't appear on github" levels of uncommon.
Personally, again, I think "you can't use partial() to define methods" is a very good thing: if you're doing this, you're into weird metaprogramming land. It's not any harder to, for example, write out
    class Foo:
        def frob(self, x, y): ...
        def frob_xyzzy(self): return self.frob(x=1, y=2)
        def frob_whammo(self): return self.frob(x=3, y=4)
unless you're doing weird metaprogrammy magic and then, as someone who does a lot of weird metaprogrammy magic:
1. You deserve what you get
2. You can invoke deeper magic to solve these problems
[0]: https://discuss.python.org/t/switch-pythons-parsing-tech-to-...
My complaint is of course more complicated than the syntax alone. I'm just saying that a distinctive syntax for self-as-receiver could be used to drive other changes that would make things more intuitive and self-consistent overall. And, of course, any discussion about "elegance" is going to be inherently subjective.
With respect to commonality of various constructs - this is all from personal experience writing Python code (for a project that is hosted on GitHub, by the way). It doesn't require particularly fancy code to trip that wire in general - it just requires code that tries to be generic, i.e. not make more assumptions than it needs to about the types of values that flow through it. Python classes break that genericity by treating functions, and only functions, in a special way whenever they flow through class attributes. This is particularly egregious in a language where every object is potentially callable, and non-function callables are very common; so functions really aren't all that special in general - except for that one case.
With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions? I prefer it over defs, not just because it's more concise and avoids repeating things, but because it's also clearer - when you see partial(), it immediately tells you that it's a simple alias, nothing else.
But regardless, it's a function that has a certain documented behavior, and common sense would dictate that this behavior works the same for methods as well as functions. That it doesn't is not an intentional limitation of partial() - it's an unfortunate quirk of the design of methods themselves. And if you don't know exactly how functions become methods in Python, you wouldn't have any reason to expect the behavior that it exhibits. That's why the docs for partial() have to spell it out explicitly: "partial objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up" - because that's not a reasonable default assumption!
The bigger problem is that every library that offers a generic wrapper callable has to add the same clause to its docs, because they're all affected in the same way. And if they don't document it, and you use, say, a decorator from a library - how do you know whether the fact that it returns a function and not some other callable is part of the contract that you can rely on, or an implementation detail? Conversely, whenever you implement a decorator, you have to be cognizant that changing the type of the return value from/to plain function can be a breaking change for your clients - and that is even less obvious.
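For the specific partial() case, the standard library does provide `functools.partialmethod`, which is designed to be used as a class attribute and binds like a method. A short sketch reusing the `frob` example from the thread (the return value is added here only so the result can be inspected):

```python
import functools


class Foo:
    def frob(self, x, y):
        return (self, x, y)

    # partialmethod is a descriptor, so the instance is still passed as self.
    frob_xyzzy = functools.partialmethod(frob, x=1, y=2)
    frob_whammo = functools.partialmethod(frob, x=3, y=4)


foo = Foo()
print(foo.frob_xyzzy()[1:])
print(foo.frob_whammo()[1:])
```

This addresses the aliasing use case, though not the broader point about third-party wrappers that return non-function callables.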
Let's take the following code snippet as an example:
    class Foo:
        def bar(val):
            foo = val
I claim that most of the time this is a mistake, and the author would have preferred `def bar(self, val): self.foo = val`. In current python, this will raise an exception when called. In your proposed python, this will silently do nothing, possibly leaving the instance in an invalid state. This is a footgun. I admit the example is contrived, but forgetting `self` is a thing I've seen happen, and having it fail loudly is preferable to having it do something likely unintended. Again, if someone wants to do the unusual thing, `@staticmethod` is still around.
> With partial() specifically, of course you can avoid that in this manner. But why would you, if it works with regular functions?
Simply: because I'd prefer it if functions look like functions. Understanding that `def x` is a callable is easier than trying to discern if `x = foo(other_thing)` results in x being callable or not, where it does for some values of `foo`, but not for others. Which isn't to say that python shouldn't make this change, I think I mostly agree with your complaint, I just probably wouldn't take advantage of it.
> My complaint is of course more complicated than the syntax alone.
To be frank, I don't see any connection between your syntactical suggestions and your semantic ones. They seem to be entirely orthogonal.