The Python Standard Library - Where Modules Go To Die

The Python Standard Library - Where Modules Go To Die (leancrew.com)

122 points by b14ck 14 years ago | 72 comments

nkoren 14 years ago |

I'm not really a Python programmer, but a hacker who occasionally has cause to pick up Python scripts and do stuff with them. Perhaps I've been unlucky, but every time I've done this, it's turned into a profoundly frustrating exercise. There have always been dependencies outside the standard library, and those have had dependencies -- which more often than not, are incompatible with whatever version of Python my environment is set up for. I've frequently run across scripts with dependencies that somehow only execute in mutually incompatible versions of Python, which always makes for an exceedingly aggravating day of programming.

As much as people love to bash PHP -- and I agree that it's pretty awful as a language -- its standard library is so comprehensive, backwards-compatible, and superbly-documented that I have never had a comparably aggravating experience with it. The same is true of Javascript: a language with warts, but whenever I try something, it Just Works.

Like I say, perhaps I've just been unlucky, but my distinct impression of Python has been that it's a beautiful language surrounded by a particularly problematic ecosystem of incompatible libraries and sparse documentation. I suspect that the Python community would benefit from paying less attention to the purity of the language, and a lot more attention to the quality of everything surrounding it.

espeed 14 years ago | |

There have always been dependencies outside the standard library, and those have had dependencies -- which more often than not, are incompatible with whatever version of Python my environment is set up for.

Create a virtual environment for the Python version you want to use, and install the package with pip (pip will install all the dependencies in your project's local environment)...

  $ mkdir myproj
  $ cd myproj
  $ virtualenv --python=python2.7 env
  $ source env/bin/activate
  (env)$ pip install somepackage

mas1n 14 years ago | | |

Or you can install virtualenvwrapper (http://www.doughellmann.com/projects/virtualenvwrapper/) and after a little configuration...

  $ mkvirtualenv env1
  (env1)$ pip install somepackage

Creating self-contained Python projects has never been so easy!

nkoren 14 years ago | | |

Yep, I learned this trick on my first go-around. The problem has occurred when I've tried to run scripts with dependencies which somehow are only available in mutually incompatible versions of Python. I'm not sure how this is possible (it's baffling -- it shouldn't be possible), but it's happened three out of four times that I've tried to use a Python script of any real consequence. Usually after half a day of futilely trying to find an environment which will actually accommodate all dependencies, I end up having to port everything to whatever version of Python appears to be the most common denominator. In all fairness to Python, this is relatively easy to do (except in one case, when I had to hire a Python-expert friend to do it), but it still means that what should've been a five-minute affair (less the dependency hell) turns into a full-day affair.

Like I say, I've probably just been unlucky. But at this point it's given me a pretty serious aversion to Python. Will probably have to get over that someday, I suppose.

anthonyb 14 years ago | |

I hate PHP's standard library - I find it to be weird and inconsistent, arguments are in a random order, etc. and Python to be not so bad. I suspect this is because you have a lot of PHP experience, and I have a lot of Python experience :)

virtualenv and pip go a long way towards fixing these problems in Python, much as PEAR and CPAN do for PHP and Perl.

slurgfest 14 years ago | |

Having used Python in anger for some years I have never encountered anything like this (scripts with mutually incompatible dependencies, really?) It sounds like you ran into some specific package which was poorly made or had a bad release, or perhaps just a package which should have had its dependencies pegged at specific versions?

I'm completely lost as to why you think this is a problem with Python or its standard library. (Except that this seems like an opportune venue to bash a perceived competitor to your favorites)

It was years before I even saw any need to use virtualenv (and that was because of misbehaving packages from Google and a desire never to modify PYTHONPATH again).

About documentation, again I feel strongly that you must have used some specifically bad package, and again I can't see how that is an inherent flaw in Python or its standard library.

"Pure" is one of the last things I'd call Python. For example, imperative constructs are jumbled with OO constructs and functional constructs.

fuzzix 14 years ago | |

Most frustrations I've had with Perl/Ruby/Python have been down to inconsistent, incomplete or out-of-date packaging by the software distribution.

I've found I can mitigate most of this by using perlbrew/cpanminus, pythonbrew/pip and rvm/gem.

This adds complications when it comes to deployment - you've added additional maintenance dependencies to your servers if you don't stick with what the package manager provides... Nothing insurmountable, but keeping an eye on updates (would usually use email to notify on dated versions) and maybe having your CI system always use the latest versions of everything are possible methods. I'd like to hear better ideas, to be honest.

My background is in both sysadmin work and development, and I think there is no reason we can't make this all simpler for everyone.

slurgfest 14 years ago | | |

Yes, I agree completely.

At least in Python, the problem occurs because distros make several mistakes which build on each other. (A) insist on generating their own packages for modules (and typically, taking forever to update them). (B) not providing any kind of isolation. (C) building on top of these non-isolated modules.

What I do these days is leave the system Python 100% for the use of the distro (at most, dependencies for minor command line scripts which don't have import relationships with code I'm working on). Development occurs inside virtualenvs, production apps run inside virtualenvs. I use pip to install, remove, pin versions.

Dependency versions are really part of the app; the right versions should be pip installable inside a virtualenv, and this is just something which should be done automatically on a deploy. I don't mean you need to include the whole code for your dependencies in your project, but pinning versions is important to not having to put out fires.

You really don't want any versions to advance automatically until you have had a chance at least to run unit tests - if things are working then pin the new version. Of course, if you are using a library which never suffers regressions or API changes then you don't need to do this.

In short, the dependency list and versions should be included with the project and managed by the developers so that deploys are really a matter of creating a virtualenv (with --no-site-packages) and running pip install -r requirements.txt to install the right stuff.
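
As a rough illustration of the pinning advice above, here is a minimal sketch that flags requirement lines which are not pinned to an exact version. The function name unpinned_requirements and the sample package versions are made up for this example, and the parsing is deliberately simplistic: it assumes one requirement per line and treats everything after a "#" as a comment, so it is not a full parser for pip's requirements format.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned with '=='.

    Hypothetical helper: a simplified reading of requirements.txt,
    not a complete implementation of pip's file format.
    """
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Illustrative contents only; the version numbers are invented.
    sample = "requests==0.11.1\nlxml>=2.3\n# a comment\n\nDjango\n"
    print(unpinned_requirements(sample))
```

A check like this could run before a deploy, so that a dependency only advances when someone deliberately changes the pinned version.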

heretohelp 14 years ago | |

I think you've really just been unlucky.

I've had the same experience before, but chiefly with Ruby.

Interestingly, my problems with this in both Python and Ruby have evaporated once I got in the habit of using virtualenv/rb-env/rvm for my development environment.

Python is about the cleanest/nicest experience I have in any language, for the record. Only language that comes close is Clojure.

Leiningen is...legendary.

xaa 14 years ago | | |

I would argue that Python's approach is FAR better than Clojure/Leiningen's "no batteries included" approach.

Suppose you want to do a very common task like parse some XML. In Clojure, the workflow is:

  1. Go to Github or Clojars, find the latest version number of clojure.data.xml

  2. Add this version number to your project.clj

  3. Run lein deps and restart the REPL

  4. Re-acquire whatever REPL data you had

In Python, it's:

  1. import xml.{sax,dom,etree}

And, paradoxically, the availability of all these different versions of libraries in Clojure leads to MORE conflicts between libraries than would otherwise be the case, not less. In Python, you may not agree that, say, the "os" or "subprocess" modules are optimal -- but by golly, they're consistent.
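
To make the "batteries included" side of the comparison concrete, here is a minimal sketch of parsing XML with nothing but the standard library's xml.etree.ElementTree. The sample document and the book_titles helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for the example.
SAMPLE = (
    "<library>"
    "<book id='1'><title>Dive Into Python</title></book>"
    "<book id='2'><title>Learning Python</title></book>"
    "</library>"
)


def book_titles(xml_text):
    """Return the text of every <title> that sits under a <book> element."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title") for book in root.iter("book")]


if __name__ == "__main__":
    print(book_titles(SAMPLE))
```

No install step or REPL restart is involved; the import works on a base Python installation.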

leephillips 14 years ago | | |

I've had the same experience before, but chiefly with Ruby.

Tell me about it: http://lee-phillips.org/badruby/

darklajid 14 years ago | | |

What is Leiningen offering over Maven? Isn't it based on the same stuff?

That said: Leiningen & npm _are_ really nice solutions for me.

chaostheory 14 years ago | |

I think the problem is that when you used it, the Python community probably still hadn't settled on one true dependency system like Gems / rvm for Ruby, and Maven for Java. It's been about four years since I've heavily used Python, but hopefully this has been fixed by now.

I don't have this problem on either Ruby or Java.

slurgfest 14 years ago | | |

Nope, pip and virtualenv are pretty much it. Use 'em

atdt 14 years ago |

The fact that the standard library is well-maintained, carefully debugged, and backward-compatible is a far stronger indicator of Python's awesomeness than the existence of shiny, new libraries. Hackers naturally gravitate toward high-visibility projects with brave horizons and bold scopes. By contrast, it is incredibly hard to find the motivation to update, for the umpteenth time, a warty API -- and that's precisely the reason why contributions of the latter sort are the truer test of the vitality of a language's ecosystem.

IgorPartola 14 years ago |

Agreed with the OP. The following is a shameless plug:

Python's ConfigParser module is a pain to use. It provides no validation, only supports a limited number of types of data you can retrieve, etc. Similarly, getopt vs optparse vs argparse is a mess. getopt is universal: not only is it going to be in all versions of Python, but it is also the same library available in virtually every other language. The problem with it is that it is not declarative, so you will typically see a giant if/elif statement that goes with it. argparse/optparse are better, but aren't universal even between versions of Python, though argparse has been backported and is available via pypi.
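
The difference between the imperative and declarative styles can be sketched with the two standard-library modules side by side. The option names (-v/--verbose, -o/--output) and both function names are assumptions chosen for the example.

```python
import argparse
import getopt


def parse_with_getopt(argv):
    """Imperative style: getopt returns (flag, value) pairs to dispatch by hand."""
    options = {"verbose": False, "output": None}
    opts, _args = getopt.getopt(argv, "vo:", ["verbose", "output="])
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            options["verbose"] = True
        elif flag in ("-o", "--output"):
            options["output"] = value
    return options


def parse_with_argparse(argv):
    """Declarative style: describe each option once and let argparse do the rest."""
    parser = argparse.ArgumentParser(description="hypothetical example tool")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-o", "--output", default=None)
    return vars(parser.parse_args(argv))


if __name__ == "__main__":
    argv = ["-v", "-o", "out.txt"]
    print(parse_with_getopt(argv))
    print(parse_with_argparse(argv))
```

With only two options the if/elif chain is tolerable; it grows by one branch for every option added, while the argparse version grows by one declaration.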

To unify all this into one convenient module, I ended up writing http://ipartola.github.com/groper/. groper lets you specify your parameters declaratively, and if you specify defaults, use them right away without having to create/modify a config file. It automatically figures out the priority of arguments: cmd > config > defaults. It also has some niceties such as the ability to automatically generate usage strings, give the user intelligent error messages, generate sample config files, etc.

slurgfest 14 years ago | |

I don't understand why I shouldn't be using argparse. Just using argparse means no mess of 'getopt vs optparse vs argparse' because I am not using all those other libraries. I don't see anything seriously wrong with argparse. How does it help me to use a third party module rather than argparse?

IgorPartola 14 years ago | | |

Using argparse is probably the safest approach. However, argparse does not work with config files; groper does. So if you have more than a half-dozen options, you should use groper (or something similar).
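
For readers who want the cmd > config > defaults ordering without a third-party module, one possible approach is to layer the standard library's configparser (named ConfigParser in Python 2) under argparse. This is only a sketch and is not how groper works internally; the "server" section, the host and port options, and the load_settings function are all assumptions made for the example.

```python
import argparse
import configparser

# Lowest-priority layer: hypothetical built-in defaults.
DEFAULTS = {"host": "localhost", "port": "8000"}


def load_settings(argv, config_text=""):
    """Merge settings with priority: command line > config file > defaults."""
    settings = dict(DEFAULTS)

    # Middle layer: INI-style config, passed as text to keep the example
    # self-contained (a real tool would more likely read a file).
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_section("server"):
        settings.update(config["server"])

    # Top layer: only flags the user actually supplied override anything.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port")
    args = parser.parse_args(argv)
    settings.update({k: v for k, v in vars(args).items() if v is not None})

    # Config values arrive as strings, so convert types at the end.
    settings["port"] = int(settings["port"])
    return settings


if __name__ == "__main__":
    ini = "[server]\nhost = example.org\nport = 8080\n"
    print(load_settings(["--port", "9000"], ini))
```

The sketch does no validation beyond the int() conversion, which is part of the gap the parent comment describes.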

bunnyhero 14 years ago |

The permalink for the article is http://www.leancrew.com/all-this/2012/04/where-modules-go-to... (the posted link is actually the home page of the blog).

agentultra 14 years ago |

I've heard core Python developers tell people not to worry about getting a module into the stdlib. The problem being that once the module is there it won't be able to change much. APIs have the exact same problem. If you change it, you're changing other people's software. Tight coupling.

Is it a terrible way to write software? Maybe... but perhaps that's a different discussion.

I think the requests library is amazing. It has a much simpler API than urllib/urllib2. Does it need to replace those modules in the stdlib? I hope not!

There are only three reasons I would write a module/package that depended solely on stdlib:

  1. The module/package would be distributed primarily through package management systems.

  2. The installation of my module needs to avoid depending on anything else outside of a base python installation.

  3. The module or package will need to be supported for a long time and will likely not be updated frequently.

The first case is because you can't control what versions of third-party libraries the package manager will make available. Some might run your setuptools script while others may not. It's just easier to live with the cruft/warts of the stdlib and be sure that they'll always be there.

The second case covers a rather specific situation. Modules and libraries written with this constraint are typically targeting one of two different kinds of developers. The first are beginners who may not know about development environments and versioning. The others are experienced developers who want a minimalist script for their little one-off utility. Both should require zero dependency installation if possible.

The final case is harder to define up front. If you're writing something that you expect to run for a long time and receive little maintenance (e.g. cron scripts, tools, etc.) then you don't want to deal with API updates breaking your code. Fire and forget is what a long-term stable API gets you.

chimeracoder 14 years ago |

> An overstatement, certainly, but with more than a germ of truth. Once a library is enshrined in the standard set, it can’t change radically because too many programs rely on it—and its bugs, idiosyncrasies, and complications—remaining stable.

That's a problem inherent in the standardization process, though - it's all but contradictory to have something be both 'standard' and 'continuously improving'.

Once something enters the standard, does anyone propose a better way of removing cruft without constantly deprecating everything, rendering the concept of a 'standard' somewhat meaningless?

lloeki 14 years ago | |

Things come and go, and we have seen a number of deprecations in Python already. urllib predates urllib2, while subprocess deprecates a number of things itself that really came from C. getopt was a port of the eponymous C library, which optparse was meant to replace, and which was itself deprecated in favor of argparse.

I would really not be surprised to see envoy, requests and so on come up in the standard lib at some point.

norswap 14 years ago |

The thing with the Python standard library is that it is crappily documented imho. I often can't make heads or tails of it, while I have a much simpler time with any other language (you name it: Java, Ruby, PHP, C, Scala, Lisp, ...).

aslewofmice 14 years ago | |

I felt that way about some of the standard documentation but was relieved to find: http://www.doughellmann.com/PyMOTW/

I also highly recommend checking out Doug Hellmann's book 'The Python Standard Library by Example'. He presents every (or almost every) standard library module with simple explanations and plenty of examples.

zeeg 14 years ago | | |

I've never seen this before, but that is an AMAZING improvement on the standard docs

chaostheory 14 years ago | | |

Nice post. This is way better than the standard docs. Having lots of examples really helps.

TazeTSchnitzel 14 years ago | |

The Python standard library has great documentation.

Except for some of the "batteries included" stuff. urllib2? Ouch.

norswap 14 years ago | | |

That was indeed the kind of stuff I had in mind.

eliben 14 years ago |

Yes, for a language as widely deployed and used as Python, retaining backwards compatibility and stability is more important than adding new and shiny tools to the stdlib at a faster pace. Users rely on the fact that a module in stdlib will remain there and will remain stable for a long time. More modules means more maintainers, and Python is an open-source project developed by volunteers. It's that simple.

I'm not sure what the solution this article proposes is. The tradeoff between "coolness" and "stability" is inherently difficult, and I'm sure Python is not the only language "suffering" from it.

After all, it's quite easy to install a new Python module, and not much harder to distribute it with your application (for web apps it's even easier), so what is the problem?

EvilTerran 14 years ago |

It's funny, I've had the opposite problem. I was trying to write an IRC bot in Python, noted there didn't seem to be a standard library module for the IRC protocol, and so found myself looking at this:

http://pypi.python.org/pypi?%3Aaction=search&term=IRC

That's 400+ results - at least 20 of which are actually IRC protocol modules. There's no way of telling how mature each one actually is 'til you download it. It turned out the first three I tried were undocumented, buggy, incomplete, or otherwise no good.

So I gave up on PyPI and hacked it as an xchat plugin instead.

----------------

Perhaps the way forward would be styling your package repo after, say, addons.mozilla.org -- add just enough community functionality (as in ratings/reviews/"times downloaded" counters/etc) to allow the occasional gems to rise to the top of the muck. Once one solution for a given problem has been established as the best (well, most popular), that'll get more eyeballs on its internals as well, and it'll only increase its lead until it's de facto standard -- but the possibility is still there for a newcomer to dethrone it if it's genuinely better. And meanwhile, both can exist side-by-side without causing ugly compatibility issues.

slurgfest 14 years ago | |

I believe that PyPI used to have some kind of popularity contest functionality that got killed.

I have to say I'm not sure that selecting the package you want to use is really the problem which PyPI needs to solve. It isn't the app store. That said, PyPI does provide a 'weight' in searches, which seems to track with popularity and freshness somehow.

EvilTerran 14 years ago | | |

The title text on "weight" says "Occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)". So, not popularity/freshness, just a rough metric for how well it matches your search.

Indeed, PyPI might not be the right place for a community rating system -- perhaps a site could be built on top of it to provide that sort of functionality.

zeeg 14 years ago |

This happens in every language. I don't think it's that big of a deal.

In all honesty, you could continue to maintain a package outside of stdlib, and just require a newer version (which gets installed via the standard packaging tools). This type of behavior isn't well defined in Python, but it's not unrealistic to think it could happen.

true_religion 14 years ago | |

How is it not well defined in Python?

zeeg 14 years ago | | |

Well it would work just fine, but you'd always end up requiring the external dependency even if you didn't need to.

For example, let's say there was a new urllib released (it's still called urllib). It's now version 2.0, but the stdlib version is 1.0.

If your package said "I need urllib==1.0", it would have no way of understanding that the version was already included within the standard library.

That said, it would download the correct package (assuming it existed) and work just fine.

100k 14 years ago |

This has kind of happened in Ruby, too.

Fortunately, Ruby gems are super easy to install and the standard library got some much-needed spring cleaning in 1.9.

Python could use the same. There have been many times where I've wanted to do some simple task that would be made easier with an external library (like Requests) but I'm not going to bother dealing with the Python module install pain for a one-off task.

atdt 14 years ago | |

> the Python module install pain for a one-off task.

"pip install requests" ?

RegEx 14 years ago | | |

Unfortunately, it seems the official documentation on installing Python modules[0] makes absolutely no mention of pip or even easy_install. Seems like something that should be there, right?

[0]: http://docs.python.org/install/

nickpresta 14 years ago | |

I tend to use pip, requirements file, and virtualenv. No real trouble.

lars512 14 years ago | |

I find the opposite. Installing a useful module for a once-off task is a no brainer, since I won't have to worry about whether I'm introducing some long-term dependency I'll have to maintain.

The real issue is in discovering that a better option exists in the first place.

100k 14 years ago | | |

And then you have to deploy it to the server or someone else's computer and it sucks.

slurgfest 14 years ago | |

Python has had a lot of cleaning in 3, but everyone is screaming about how 3 doesn't work just like 2 so I guess you just can't satisfy everyone.

lbotos 14 years ago | | |

If anyone was wondering (like I was) what was dropped from the standard lib, this looks to be the list:

http://www.python.org/dev/peps/pep-3108/

makecheck 14 years ago |

Every language's standard library needs a "current best practices" concept, even if it's just a well-maintained document and not something structural like a special namespace.

I think the Python "decorator" concept goes a long way toward cleaning up code. Basically you can add a decorator to a routine that you've deprecated so that it will complain if it's actually used (you can even include advice on what would be a good replacement call).
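
A minimal sketch of such a decorator, using the standard warnings and functools modules, might look like the following. The names deprecated, old_fetch, and new_fetch are hypothetical and exist only for this example.

```python
import functools
import warnings


def deprecated(replacement=None):
    """Mark a function as deprecated, optionally naming a replacement."""
    def decorator(func):
        message = f"{func.__name__}() is deprecated"
        if replacement:
            message += f"; use {replacement} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line,
            # not to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper
    return decorator


@deprecated(replacement="new_fetch()")
def old_fetch(url):
    """Hypothetical legacy function kept only for old callers."""
    return f"fetched {url}"


if __name__ == "__main__":
    # DeprecationWarning is hidden in many contexts, so enable it explicitly.
    warnings.simplefilter("always", DeprecationWarning)
    print(old_fetch("http://example.com"))
```

The function keeps working, so old code does not break, while anyone running with warnings enabled is pointed at the suggested replacement.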

As far as cleaning up what's installed as standard, it's not really practical to remove anything (the fact that it stays is one of the attractive things about Python in old code bases). What you can do though is define a preferred namespace, e.g. "preferred"; this would physically contain only those libraries that are recommended, and perhaps even forked copies of modules that only contain the functions that should be used. This gives programs the option to explicitly import from "preferred" and request purity over long-term stability.

daxelrod 14 years ago |

Permalink to this post: http://www.leancrew.com/all-this/2012/04/where-modules-go-to...

(The current article link goes to the front page of the author's blog.)