The Python Standard Library - Where Modules Go To Die (leancrew.com)
As much as people love to bash PHP -- and I agree that it's pretty awful as a language -- its standard library is so comprehensive, backwards-compatible, and superbly-documented that I have never had a comparably aggravating experience with it. The same is true of Javascript: a language with warts, but whenever I try something, it Just Works.
Like I say, perhaps I've just been unlucky, but my distinct impression of Python has been that it's a beautiful language surrounded by a particularly problematic ecosystem of incompatible libraries and sparse documentation. I suspect that the Python community would benefit from paying less attention to the purity of the language, and a lot more attention to the quality of everything surrounding it.
Create a virtual environment for the Python version you want to use, and install the package with pip (pip will install all the dependencies in your project's local environment)...
$ mkdir myproj
$ cd myproj
$ virtualenv --python=python2.7 env
$ source env/bin/activate
(env)$ pip install somepackage
$ mkvirtualenv env1
(env1)$ pip install somepackage
Creating self-contained Python projects has never been so easy!
Like I say, I've probably just been unlucky. But at this point it's given me a pretty serious aversion to Python. Will probably have to get over that someday, I suppose.
virtualenv and pip go a long way towards fixing these problems in Python, similar to how PEAR and CPAN work with PHP and Perl.
I'm completely lost as to why you think this is a problem with Python or its standard library. (Except that this seems like an opportune venue to bash a perceived competitor to your favorites)
It was years before I even saw any need to use virtualenv, (and that was because of misbehaving packages from Google and a desire never to modify PYTHONPATH again).
About documentation, again I feel strongly that you must have used some specifically bad package and again I can't say how I see that to be an inherent flaw with Python or its standard library.
"Pure" is one of the last things I'd call Python. For example, imperative constructs are jumbled with OO constructs and functional constructs.
I've found I can mitigate most of this by using perlbrew/cpanminus, pythonbrew/pip and rvm/gem.
This adds complications when it comes to deployment - you've added additional maintenance dependencies to your servers if you don't stick with what the package manager provides... Nothing insurmountable, but keeping an eye on updates (would usually use email to notify on dated versions) and maybe having your CI system always use the latest versions of everything are possible methods. I'd like to hear better ideas, to be honest.
My background is both sysadmin and development; I think there should be no reason we can't make this all simpler for everyone.
At least in Python, the problem occurs because distros make several mistakes which build on each other. (A) insist on generating their own packages for modules (and typically, taking forever to update them). (B) not providing any kind of isolation. (C) building on top of these non-isolated modules.
What I do these days is leave the system Python 100% for the use of the distro (at most, dependencies for minor command line scripts which don't have import relationships with code I'm working on). Development occurs inside virtualenvs, production apps run inside virtualenvs. I use pip to install, remove, pin versions.
Dependency versions are really part of the app; the right versions should be pip installable inside a virtualenv, and this is just something which should be done automatically on a deploy. I don't mean you need to include the whole code for your dependencies in your project, but pinning versions is important to not having to put out fires.
You really don't want any versions to advance automatically until you have had a chance at least to run unit tests - if things are working then pin the new version. Of course, if you are using a library which never suffers regressions or API changes then you don't need to do this.
In short, the dependency list and versions should be included with the project and managed by the developers so that deploys are really a matter of creating a virtualenv (with --no-site-packages) and running pip install -r requirements.txt to install the right stuff.
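To make the pinning idea concrete, here is a minimal sketch of a pre-deploy check that compares pinned versions against what is installed. The function names, package names, and version numbers are all hypothetical, and in practice pip itself does the installing; this only illustrates the "pin it, then verify it" step.

```python
def parse_pins(requirements_text):
    """Return {name: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            # An unpinned dependency is exactly what we want to catch.
            raise ValueError("unpinned requirement: %r" % line)
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Hypothetical requirements.txt content and installed versions.
REQUIREMENTS = """
# pinned after the unit tests passed
somepackage==1.0
otherpackage==2.3
"""
INSTALLED = {"somepackage": "1.0", "otherpackage": "2.2"}

pins = parse_pins(REQUIREMENTS)
mismatches = find_mismatches(pins, INSTALLED)
print(mismatches)
```

A deploy script could refuse to continue whenever the mismatch dictionary is non-empty.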
I've had the same experience before, but chiefly with Ruby.
Interestingly, my problems with this in both Python and Ruby have evaporated once I got in the habit of using virtualenv/rb-env/rvm for my development environment.
Python is about the cleanest/nicest experience I have in any language, for the record. Only language that comes close is Clojure.
Leiningen is...legendary.
Suppose you want to do a very common task like parse some XML. In Clojure, the workflow is:
1. Go to Github or Clojars, find the latest version number of clojure.data.xml
2. Add this version number to your project.clj
3. Lein deps and restart the repl
4. Re-acquire whatever REPL data you had
In Python, it's:
1. import xml.{sax,dom,etree}
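As a rough illustration of that single step, here is a sketch using xml.etree.ElementTree from the standard library. The sample document and the attribute name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload; any well-formed document would do.
SAMPLE = "<domains><domain name='example.com'/><domain name='example.org'/></domains>"

root = ET.fromstring(SAMPLE)
# Collect the 'name' attribute of every <domain> element in document order.
names = [element.get("name") for element in root.iter("domain")]
print(names)
```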
And, paradoxically, the availability of all these different versions of libraries in Clojure leads to MORE conflicts between libraries than would otherwise be the case, not less. In Python, you may not agree that, say, the "os" or "subprocess" modules are optimal -- but by golly, they're consistent.
Tell me about it: http://lee-phillips.org/badruby/
That said: Leiningen & npm _are_ really nice solutions for me.
I don't have this problem on either Ruby or Java.
Python's ConfigParser module is a pain to use. It provides no validation, only supports a limited number of types of data you can retrieve, etc. Similarly, getopt vs optparse vs argparse is a mess. getopt is universal: not only is it going to be in all versions of Python, but it is also the same library available in virtually every other language. The problem with it is that it is not declarative, so you will typically see a giant if/elif statement that goes with it. argparse/optparse are better, but aren't universal even between versions of Python, though argparse has been backported and is available via pypi.
To unify all this into one convenient module, I ended up writing http://ipartola.github.com/groper/. groper lets you specify your parameters declaratively, and if you specify defaults, use them right away without having to create/modify a config file. It automatically figures out the priority of arguments: cmd > config > defaults. It also has some niceties such as the ability to automatically generate usage strings, give the user intelligent error messages, generate sample config files, etc.
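For readers who want to see the cmd > config > defaults ordering in plain stdlib terms, here is a minimal sketch using argparse, configparser (the Python 3 name of ConfigParser), and collections.ChainMap. This is not groper's API; the option names, the "server" section, and the load_settings helper are assumptions made up for illustration.

```python
import argparse
import configparser
from collections import ChainMap

# Hypothetical defaults; values stay strings, as configparser returns strings.
DEFAULTS = {"host": "localhost", "port": "8080"}


def load_settings(argv, config_text):
    """Layer settings so that command line beats config file beats defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port")
    args = parser.parse_args(argv)
    # Keep only options the user actually passed on the command line.
    cmdline = {key: value for key, value in vars(args).items() if value is not None}

    config = configparser.ConfigParser()
    config.read_string(config_text)
    from_file = dict(config["server"]) if config.has_section("server") else {}

    # ChainMap looks keys up left to right, which gives the priority order.
    return ChainMap(cmdline, from_file, DEFAULTS)


settings = load_settings(
    ["--port", "9000"],
    "[server]\nhost = example.org\nport = 8000\n",
)
print(settings["host"], settings["port"])
```

Here the port comes from the command line and the host from the config text; with neither present, the defaults apply.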
Is it a terrible way to write software? Maybe... but perhaps that's a different discussion.
I think the requests library is amazing. It has a much more simple API than urllib/urllib2. Does it need to replace those modules in the stdlib? I hope not!
There are only three reasons I would write a module/package that depended solely on stdlib:
1. The module/package would be distributed primarily through package management systems.
2. The installation of my module needs to avoid depending on anything else outside of a base python installation.
3. The module or package will need to be supported for a long time and will likely not be updated frequently.
The first case is because you can't control what versions of third-party libraries the package manager will make available. Some might run your setuptools script while others may not. It's just easier to live with the cruft/warts of the stdlib and be sure that they'll always be there.
The second case covers a fairly unusual situation. Modules and libraries written with this constraint are typically targeting one of two different kinds of developers. The first are the beginners who may not know about development environments and versioning. The others are experienced developers who want a minimalist script for their little one-off utility. Both should require zero dependency installation if possible.
The final case is harder to define up front. If you're writing something that you expect to run for a long time and receive little maintenance (ie: cron scripts, tools, etc) then you don't want to deal with API updates breaking your code. Fire and forget is what a long-term stable API gets you.
That's a problem inherent in the standardization process, though - it's all but contradictory to have something be both 'standard' and 'continuously improving'.
Once something enters the standard, does anyone propose a better way of removing cruft without constantly deprecating everything, rendering the concept of a 'standard' somewhat meaningless?
I would really not be surprised to see envoy, requests and so on come up in the standard lib at some point.
I also highly recommend checking out Doug Hellmann's book 'The Python Standard Library by Example'. He presents every (or almost every) standard library module with simple explanations and plenty of examples.
Except for some of the "batteries included" stuff. urllib2? Ouch.
I'm not sure what the solution this article proposes is. The tradeoff between "coolness" and "stability" is inherently difficult, and I'm sure Python is not the only language "suffering" from it.
After all, it's quite easy to install a new Python module, and not much harder to distribute it with your application (for web apps it's even easier), so what is the problem?
http://pypi.python.org/pypi?%3Aaction=search&term=IRC
That's 400+ results - at least 20 of which are actually IRC protocol modules. There's no way of telling how mature each one actually is 'til you download it. It turned out the first three I tried were undocumented, buggy, incomplete, or otherwise no good.
So I gave up on PyPI and hacked it as an xchat plugin instead.
----------------
Perhaps the way forward would be styling your package repo after, say, addons.mozilla.org -- add just enough community functionality (as in ratings/reviews/"times downloaded" counters/etc) to allow the occasional gems to rise to the top of the muck. Once one solution for a given problem has been established as the best (well, most popular), that'll get more eyeballs on its internals as well, and it'll only increase its lead until it's de facto standard -- but the possibility is still there for a newcomer to dethrone it if it's genuinely better. And meanwhile, both can exist side-by-side without causing ugly compatibility issues.
I have to say I'm not sure that selecting the package you want to use is really the problem which PyPI needs to solve. It isn't the app store. That said, PyPI does provide a 'weight' in searches, which seems to track with popularity and freshness somehow.
Indeed, PyPI might not be the right place for a community rating system -- perhaps a site could be built on top of it to provide that sort of functionality.
In all honesty, you could continue to maintain a package outside of stdlib, and just require a newer version (which gets installed via the standard packaging tools). This type of behavior isn't well defined in Python, but it's not unrealistic to think it could happen.
For example, let's say there was a new urllib released (it's still called urllib). It's now version 2.0, but the stdlib version is 1.0.
If your package said "I need urllib==1.0", it would have no way of understanding that the version was already included within the standard library.
That said, it would download the correct package (assuming it existed) and work just fine.
Fortunately, Ruby gems are super easy to install and the standard library got some much-needed spring cleaning in 1.9.
Python could use the same. There have been many times where I've wanted to do some simple task that would be made easier with an external library (like Requests) but I'm not going to bother dealing with the Python module install pain for a one-off task.
"pip install requests" ?
The real issue is in discovering that a better option exists in the first place.
I think the Python "decorator" concept goes a long way toward cleaning up code. Basically you can add a decorator to a routine that you've deprecated so that it will complain if it's actually used (you can even include advice on what would be a good replacement call).
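A minimal sketch of such a decorator, using only functools and warnings from the standard library. The names deprecated, old_add, and new_add are hypothetical and exist only for this example.

```python
import functools
import warnings


def deprecated(replacement=None):
    """Mark a function as deprecated, optionally naming a replacement."""
    def decorator(func):
        message = "%s is deprecated" % func.__name__
        if replacement:
            message += "; use %s instead" % replacement

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement="new_add")
def old_add(a, b):
    return a + b


# Capture the warning so the complaint can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_add(1, 2)

print(result, caught[0].message)
```

The old routine keeps working, which preserves old code bases, while every call nudges the caller towards the replacement.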
As far as cleaning up what's installed as standard, it's not really practical to remove anything (the fact that it stays is one of the attractive things about Python in old code bases). What you can do though is define a preferred namespace, e.g. "preferred"; this would physically contain only those libraries that are recommended, and perhaps even forked copies of modules that only contain the functions that should be used. This gives programs the option to explicitly import from "preferred" and request purity over long-term stability.
(The current article link goes to the front page of the author's blog.)
To pick just one that bites Python programmers all the time: by default, the ssl library does not validate the server certificate at all. Not validating the certificate makes SSL/TLS almost useless. But this is still the default (see http://docs.python.org/dev/library/ssl.html#socket-creation, "CERT_NONE"), because the standard library is "stable".
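For comparison, a rough sketch of opting in to verification, assuming a Python new enough to have ssl.create_default_context (added in 3.4 and 2.7.9, so later than the docs linked above). The variable names are made up for this example, and nothing here opens a network connection.

```python
import ssl

# A context built this way verifies the server certificate and the hostname.
verified = ssl.create_default_context()
print(verified.verify_mode == ssl.CERT_REQUIRED, verified.check_hostname)

# Opting out has to be spelled out explicitly. check_hostname must be
# disabled first, otherwise setting CERT_NONE raises ValueError.
unverified = ssl.create_default_context()
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE
print(unverified.verify_mode == ssl.CERT_NONE)
```

Either context can then be passed to the calls that accept one, so the choice between validating and not validating is at least visible in the code.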
The certificate system surrounding SSL is a complete mess. It does virtually nothing other than trigger false positives for people who haven't paid the appropriate "security partner."
The very rare person who is actually using SSL for identification rather than just to establish an encrypted TCP connection, and therefore cares about certificates, can change the default.
PS: I know the standard response to this, that encryption without identification is useless, because without identification your counter-party might be Eve. In reality, in the real world, that doesn't happen. MITM attacks are extremely rare. And the real Eves on the net (phishers) can easily obtain signed certificates that will fool pretty much any end user.
Hardly. Backward-compatible? Ok, I'll agree to that one.
But seriously, if you want to see an example of a not-so-well-maintained or carefully-debugged standard library, I challenge you to go look at 'shutil'.
That's just one example of many in the standard library that needs some serious TLC. Want another? Go look at tarfile or subprocess.
And don't even get me started on the lack of documentation for many parts of the standard library; the source code is the only real documentation.
I've rarely heard anyone say this. Do you have an example of an area that is severely lacking in documentation? Maybe I don't use a wide array of modules, but I can't remember the last time as a user I had to dive into the source code.
I know the stdlib lacks examples in a lot of areas, often going for purely API coverage, which is welcome to change.
_domains_text = requests.get(API_URL + "/domainlist.xml").content
_domains_db = pyquery.PyQuery(_domains_text)
DOMAINS = [d.values()[0] for d in _domains_db('domain')]
jQuery = pyquery.PyQuery(someHTMLDocumentString)
So I can use jQuery like I'm used to. At this point, you can do
links = jQuery('a')
or whatever.
That in itself would not be reasonable. But it seems as if Ruby's own libraries broke going from 1.8.1 -> 1.8.7. Regardless of the number of years involved, that's kind of unexpected. But I'm not familiar with the Ruby world, and maybe that change in version number is considered major.
The whole experience left me with reduced confidence in Ruby stuff and I still avoid it and programs that use it. The comments below by people who know more about the Ruby ecosystem than I do don't give me any reason to change.
The problem is that the various packaged versions of Ruby are a shambles. RVM has a number of significant problems and difficulties (rbenv is both better and worse).
Right now getting a good, modern Ruby version on a standard modern computer is a big pain in the ass.
As long as that stays true, we will be dinged for not supporting old (but common) versions.
Years from now, when everybody has good 1.9 compatibility, things may be better. At least, if 2.0 doesn't have the same problems...
So many of the language features have changed in Ruby that a programmer will have to edit every file in her project, and upgrade every single dependency, just to get her application running on a new Ruby interpreter. Add that to the culture of "let's use as much 3rd party code as possible!" and the library writers emulating the core developers and changing their APIs all the time ... the process of upgrading an interpreter converges on "rewrite the entire application".
Which is what we do. And some percentage of those rewrites are in languages that don't have this problem.
edit: The Python docs front page[1] also notes the following:
A new documentation project, which will be merged into the Python documentation soon, covers creating, installing and distributing Python packages: http://guide.python-distribute.org/
[1] http://www.python.org/doc/
So I guess that they have acknowledged that the docs are suboptimal currently in this part.
Other than by introspecting which packages are installed, that is. Most of them will have a VERSION, __version__ or _version attribute which tells you.
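A small sketch of that kind of introspection, trying the attribute names mentioned above in order. The guess_version helper and the stand-in modules are hypothetical; a real package may expose none of these attributes, or store a tuple rather than a string.

```python
import types


def guess_version(module):
    """Return (attribute_name, value) for the first version attribute found."""
    for attr in ("VERSION", "__version__", "_version"):
        value = getattr(module, attr, None)
        if value is not None:
            return attr, value
    return None


# Stand-in modules so the example does not depend on any installed package.
fake_package = types.ModuleType("fakepkg")
fake_package.__version__ = "1.2.3"
bare_package = types.ModuleType("barepkg")

print(guess_version(fake_package))
print(guess_version(bare_package))
```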
Encryption without identification and authentication of your communication partners is useless. You may very well end up with a very secure link with the wrong communication partner (google 'man-in-the-middle-attack').
I agree that the (public) CA system is a mess; however, especially with machine-to-machine communication, it is very easy to generate, sign and use your own certificates. And contrary to popular belief, self-signed certificates are not any less secure than public CA signed ones. Both have their own use-case though.
If someone cares I'll be happy to explain the above points in more detail.
IF you assume that MITM-attacks are rare, you probably also assume that traffic snooping is rare (which is after all a form of a MITM-attack). If that's the case, why use encrypted communication channels at all?
Security is never perfect - it always is about adding layer upon layer to make the bar high enough that the remaining number of adversaries becomes more manageable.
Spoofing a site that is not using SSL is trivial. Using SSL with public CA signed certificates significantly raises the bar. Not to the 'perfect' level, but enough to make a real difference. Not checking the server certificate throws you back to the 'trivial' level.
"IF you assume that MITM-attacks are rare, you probably also assume that traffic snooping is rare (which is after all a form of a MITM-attack)"
Well, no! You can snoop traffic without being the MITM (WiFi, local network snooping, etc). Snooping is much easier.
As you said, security is never perfect, and 'security implementers' less so.
If the other part of the communication uses a self signed certificate (or signed by "Bob's SSL") well, I can try to convince them to change, but it will be hard.
Sure, I'll never accept a self signed key from my bank or e-commerce, but there are several other uses.
And when using APIs to connect over HTTPS, you should be able to tell them to ignore the certificate; far more often than not it doesn't matter, unless you don't trust your ISP.
The only thing you're protected against with encryption sans authentication is passive sniffing. If that's all you care about, fine, but do realize how limited the protection you gain is.
Either go all the way on security, or be obvious about not having any. Appearing secure when in actuality you're not is the worst option.
Really? NSA boxes in AT&T (and presumably other) switching stations suggest that for US traffic it's extremely common.
I think they're pretty common, even for fun and recreation (http://codebutler.com/firesheep). I know I could start reading people's emails in Starbucks with what's on my laptop now and the knowledge in my head, but if I wanted to mount a MITM attack I would need to do some research.