Python's New Package Landscape (andrewsforge.com)
edit: I requested an exemption but corp IT staff came back and said there's definitely been malware identified on that site. So... be careful with your clicks.
edit2: Well who knows where the malware alert is coming from, might be an ad or something.
Being able to select P2 or P3 environments is great.
Unfortunately it decided all my packages were in /var/mail.
No patience to debug it, so I gave up on it.
With its ability to install Python and non-Python packages (including binaries), conda is my go-to for managing project environments and dependencies. Between the bioconda[2] and conda-forge[3] channels, it meets the needs of many on the computational side of the biological sciences. Being able to describe a full execution environment with a yaml file is a huge win for replicable science.
1. https://conda.io/miniconda.html https://conda.io/docs/user-guide/install/index.html
Also, a conda package is not a replacement for a distutils/setuptools package. When building a conda package, one still calls setup.py. So every python conda package has to be a distutils/setuptools package anyway.
The fact that it's not a community-driven project might be one of the reasons.
Meanwhile, in a galaxy far away, people are also using buildout.
- there is Miniconda, which doesn't force you to install all the PyData packages.
- virtual envs and the packages they need are all defined in a simple yaml file.
- it works well with pip. So if a package isn't in the Conda repository, you can install it from pip. The annoyance here is that you must try conda, fail, and then try pip.
- you can easily clone envs. So you can have some base envs with your usual packages (or one for Python 2 and another for Python 3), and just clone them to start a new project.
I "feel" like it is overkill to use anaconda for things unrelated to 'data science' and the likes, but I'm not sure why I feel that way.
It kind of makes sense to use for other projects as well since you don't need to import all the things conda offers.
Do you have scipy/numpy/keras or cython somewhere in the deps? pipenv lock is slow, but not 20-30 mins slow unless there's a very very large download and/or a long compilation somewhere in there.
This wasn't for complex pipenv operations either. A simple command: pipenv run python main.py took progressively longer to execute.
I've never seen anything like that on a number of fairly large apps – a minute or two, at most. Are some of those dependencies extremely large or self-hosted somewhere other than PyPI?
I rely on it for pretty much everything and I didn't run into game breaking problems.
Anyway, for me the most annoying thing about the Python projects architecture (and about the whole Python perhaps) is that you can't split a module into multiple files so you have to either import everything manually all over the project trying to avoid circular imports or just put everything in a single huge source file - I usually choose the latter and I hate it. The way namespaces and scopes work in C# feels just so much better.
There's no concept of explicit versus implicit dependencies. You install one package, and end up with five dependencies locked at exact versions when you do `pip freeze`. Which of those was the one you installed, and which ones are just dependencies-of-dependencies?
If you're consistent and ALWAYS update your requirements.txt first with explicit versions and NEVER use `pip freeze` you might be okay, but it's more painful than most of the alternatives that let you separate those concepts.
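To make the explicit-versus-implicit distinction concrete, here is a minimal sketch that guesses which installed packages were asked for directly: it lists every distribution that no other installed distribution declares as a requirement. The helper names (`requirement_name`, `top_level`, `installed_requirements`) are made up for this illustration, and the approach is only a heuristic: it ignores extras and environment markers, and a package you installed on purpose that also happens to be someone else's dependency will be hidden.

```python
from __future__ import annotations

import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract a normalized project name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    name = match.group(0) if match else requirement.strip()
    # Treat runs of "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def top_level(requires_by_package: dict[str, list[str]]) -> list[str]:
    """Return the packages that nothing else in the mapping depends on."""
    depended_on = {
        requirement_name(req)
        for reqs in requires_by_package.values()
        for req in reqs
    }
    return sorted(
        name for name in requires_by_package
        if requirement_name(name) not in depended_on
    )


def installed_requirements() -> dict[str, list[str]]:
    """Build the mapping from whatever is installed in the current environment."""
    return {
        dist.metadata.get("Name"): list(dist.requires or [])
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    }
```

Calling `top_level(installed_requirements())` inside a virtualenv gives a rough list of candidates for the "things I actually installed" half of a requirements file.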
Interesting, this has never been a problem for me. I've built some large tools and while it isn't fast, it's always completed in a few minutes.
pipenv lock 5.65s user 0.29s system 77% cpu 7.639 total
i think 7.6 seconds is fine for an operation that you'd rarely do
it would probably take ages at work though. just opening a WSL terminal takes several seconds there, whereas it's predictably instantaneous (<100ms) on fedora linux at home
After trying to replace pip with Pipenv, we had to stop. The dependency resolution time for 20 declared dependencies (that in turn pull down > 100 components) takes well over 5 minutes. With poetry - it takes less than 33 seconds on a clean system. The times are consistent for both Ubuntu 16.04 and Mac OS X.
Our only goal is to get to the point we're now in - tracking dependencies, and separate dev requirements (like ipython and pdbpp) from our other requirements. Poetry made it fast, simple, and made me an addict.
Over two days, I moved our entire codebase and every single (active) personal project I had to poetry. I don't regret it :)
There's a certain joy working with tools when it's clear that the person making those tools actually cares about the developer and making it work well.
Man, the Python packaging ecosystem is one of those things which really bring me down regarding the state of Python, because there is such an extremely high barrier for breaking backwards compatibility and nothing really works.
The JS ecosystem is far better in this regard. Pipenv was most promising because it followed in Yarn's footsteps, but it didn't go all the way in replacing pip (which it really should have). So now there's still a bunch of stuff handled by pip, which pipenv does not / cannot know about, and this isn't really fixable.
The end result is that instead of telling people about pip + virtualenv, we now have pip, virtualenv and pipenv to talk about. And people who don't understand the full stack, and the exact role of each tool, can't really understand how to properly do the tasks we choose to recommend delegating to each one of them.
There's three separate-but-related use cases:
- "Installing a library" (npm install; pip install).
- "Publishing a library" (setup.py. Or Twine if you're using a tool. Both use setuptools.).
- "Deploying a Project", local dev or production (pipenv. Well, if it's configured with a pipfile, otherwise virtualenv, and who knows where your dependencies are, maybe requirements.txt. Pipenv does create a virtualenv anyway, so you can use that. Anyway you should be in docker, probably. Make sure you have pip installed systemwide. Yes I know it comes with python, but some distributions remove it from Python. Stop asking why, it's simple. What do you mean this uses Python 3.6 but there's only Python 3.5 available on Debian? Wait, no, don't install pyenv, that's not a good idea! COME BACK!)
The JS ecosystem manages to have two tools, both of which can do all of this. I don't know how we keep messing up when we have good prior work to look at.
Are you, or are you not, using explicit versions such as those supplied by pip freeze?
This article seems well-written and well-intentioned. Despite reading it, I don't know why I would not have loose dependencies in setup.py and concrete, pinned dependencies in requirements.txt. It's never felt hard to manage or to sync up - the hard part is wading through all the different tools and recommendations.
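As a small illustration of the "concrete, pinned dependencies in requirements.txt" half of that workflow, the sketch below splits requirement lines into exactly pinned entries (`name==version`) and everything else. The `split_pinned` helper and the `PINNED` pattern are assumptions made for this example, not part of pip, and the pattern is deliberately simple: it does not handle environment markers, hashes, or URL requirements.

```python
import re

# A line counts as pinned only if it is "name==version" with no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^*\s;]+$")


def split_pinned(lines):
    """Split requirements.txt lines into (pinned, loose) lists."""
    pinned, loose = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        (pinned if PINNED.match(line) else loose).append(line)
    return pinned, loose
```

Running this over a requirements.txt in CI and failing when `loose` is non-empty is one way to keep the file strictly pinned while setup.py stays loose.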
How does that work? How would someone else coming to work on your project use them?
> concrete, pinned dependencies in requirements.txt
How do you maintain that requirements.txt? And while that might work for applications, what do you do for libraries?
pip install -e .
in a virtual environment. I thought this was quite well-established. Is there a problem with it that I'm not aware of?
pip freeze > requirements.txt
for requirements.txt generation. For libraries just omit this? I'm not sure I understand the question. The article also mentions that several of the new tools aren't appropriate for libraries anyway.
For libraries, I've been using Poetry for molten[1] and pure setuptools for dramatiq[2] and, at least for my needs, pure setuptools seems to be the way to go.
In my limited experience, Clojure's Leiningen is a far more pleasant way to solve these problems. I'm sure there are many other examples in other languages, but in the few I've used, nothing comes close. Each project has versioned dependencies, and they stay in their own little playground. A REPL started from within the project finds everything. Switch directories to a different project, and that all works as expected, too. It's a dream.
I haven’t used Pipenv yet but it works with pyenv to create virtual envs with a specified Python version as well as all the correct packages.
Then I start reading (these comments)... mayyybe I should try Julia... or anything else, at least while I’m still getting started.
In the future — you have a set of entry points to your program, these are crawled by the language aware tool chain to identify and assemble all the requirements for the program (including 3rd party functionality). There’s no need for separate tools to manage packages, caches, and virtual environments — let’s just put all this logic into the compiler(s) — where necessary let the application describe the necessary state of the external world and empower language toolchains to ensure that it’s so ... let’s live in the future already ...
I have seen poetry is working on their bootstrapping story. I could not get their current solution to work on Ubuntu. Maybe what they are developing towards will work.
Often there is a very simple way, even in Python packaging and deploying. In my situation the easiest way began along these lines --
1. Install python3 for the local user from the source distribution (make sure you have compilers etc that the configure check lists out)
2. After compiling the sources and finishing with 'make install', make Python available in your local search path
3. And use pip with this magical --user flag as needed. No virtual env, conda, etc etc.
4. Leave HOMEPATH etc alone as this conflicts with the setup of the admin's system wide installs (when you su)
Things can go smoothly with pip alone.
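When relying on `--user` as in step 3, it can help to confirm where this particular interpreter keeps its user-scheme directories, and whether it will look there at all. A minimal sketch using the standard `site` and `sys` modules follows; the `user_install_locations` name is made up for this example, and the virtualenv check assumes `venv` or a recent virtualenv (older virtualenv releases signalled this differently).

```python
import site
import sys


def user_install_locations():
    """Report where `pip install --user` puts things for this interpreter."""
    return {
        # Directory that --user installs packages into.
        "user_site": site.getusersitepackages(),
        # Base of the user scheme; scripts land in a bin/Scripts folder under it.
        "user_base": site.getuserbase(),
        # Whether this interpreter actually adds the user site to sys.path.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


for key, value in user_install_locations().items():
    print(f"{key}: {value}")
```

If the scripts directory under `user_base` is not on your search path, console scripts installed with `--user` will not be found even though the packages import fine.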
For library development I target pure pip/setuptools but still use pipenv during development phase. There have been a few cases where pipenv had problems and I had to either remove my virtualenv and reinitialize it or even remove my pip-file/lockfile, but since I still have my setup.py it's not a big deal for me.
As for uploading etc I use twine but I wrap everything in a makefile to make handling easier.
A problem I noticed recently was a case where one of my developers used a tool which was implicitly installed in the testing environment since it was a subdependency of a testing tool but it was not installed into the production image. This resulted in "faulty" code passing the CI/CD and got automatically deployed to the live development environment where it broke (so it never reached staging). Caused a little bit of a headache before I found the cause.
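One way to catch this class of problem before deployment is to compare what the code imports against what the project declares. The sketch below is only an illustration under stated assumptions: `imported_top_level_modules` and `undeclared_imports` are hypothetical helpers, the caller has to supply the declared names and the modules to ignore (standard library, local packages), and import names do not always match distribution names (`yaml` versus PyYAML, for instance).

```python
import ast


def imported_top_level_modules(source):
    """Collect the top-level module names imported by a piece of source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are local.
            modules.add(node.module.split(".")[0])
    return modules


def undeclared_imports(source, declared, ignore=()):
    """Return imported modules that are neither declared nor explicitly ignored."""
    return sorted(imported_top_level_modules(source) - set(declared) - set(ignore))
```

For example, `undeclared_imports(open("app.py").read(), declared={"requests"}, ignore={"os", "sys"})` would flag a module that only works because a test tool happened to pull it in.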
That's what Go and tools like Bazel allow for: static builds, which force you to modularize the project into smaller independent components.
In case of static builds, the protocol between components is the C ABI, or an RPC protocol, but it could be a mesh of microservices too.
What is currently happening with the explosions of tools with Python is the result (take it with a grain of salt, only my opinion) of people only working with Python and not exploring enough outside of it
A Python project does not only depend on Python modules, but non-Python modules as well. Beyond Python, conda helps manage your other dependencies, like your database. I use Miniconda instead of Anaconda, to avoid the initial mega-download.
I recently wrote about it as a blog post: using conda within containers has solved almost every pain point we had with Python packaging and with getting things into production reliably.
Now that most projects have wheels, pip is pretty damn good.
The conda CLI is also just terrible. It's good for ad-hoc research, but for big deployments? No thanks, I've had enough pain using it.
The biggest pain point of Pipenv for me is that it cannot as yet selectively update a single dependency without updating the whole environment.
There are even languages like C++ where the community as a (w)hole has given up on that topic and instead opts for completely building every tool by building the underlying libraries up first manually.
Considering all this, who can actually beat Python at this point? Java maybe? Is Ruby still competing? How is NodeJS doing?
Currently with what I see around me (mostly Go and C++) I don't feel too bad about setuptools+pip+virtualenv anymore.
What should the alternative be now?
Edit: I'm reading about twine right now, but I cannot begin to comprehend why it's not bundled directly if this is what they are intending for us to use to upload packages.
The article mentions Pipsi is designed to make command-line apps globally accessible, and I'll try it out.
Additionally, adding git/src/package/module.py may be fine when you're using an IDE, but when browsing in a file manager, you must navigate 3 directories deep to even see any source files, which seems to be trending towards the inconvenience and pain of Java projects.
I just pin the project's direct dependencies in the setup.py file and install the folder directly. I know it might cause bugs with different developers (or the CI) using different versions of the upstream dependencies but I guess I trust the developers who create each library I'm using. The moment I directly import something from what used to be an upstream dependency, I pin it too.
So far this approach hasn't given me trouble, but I'll still take a look at poetry based on what I read in the comments here.
This makes the situation sound a lot more complex than it actually is by conflating separate layers: the system distribution issue is exactly the same for both Python and JS (if Debian ships an old v8, you need to install a newer one, perhaps using Docker to make that easy and isolated). Similarly, the question of whether you install the app using pip or pipenv is a different layer from whether you're using Docker or not, just as Docker is unrelated to the question of whether you use npm or yarn.
For a new project in 2018, you can simply say “Use pipenv. Deploy in Docker, using pipenv.” and it works as well as the JS world. People sometimes choose to make their projects too complicated or to manage things at the wrong level but that's a social problem which is hard to solve with tooling.
Most Node developers grew up with
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
or brew install node
or using one of the dozen other ways to install node. Distinct versions and per-project packages were the norm from day one. That was not true with Python.
As a developer I definitely see the advantage of this approach and realize that it trivially solves some very hard problems (and causes a whole bunch of different problems if you're running Windows...). As an end user I'm not super thrilled about the prospect of each tool I use coming in its own docker container and needing to spin up 20 different containers each time I want to do anything.
Agreed. You know you really blew it when even PHP does it better, and composer is unquestionably better than anything we've got in Python.
"pip install --user pkg" is good enough for beginners and writing libraries with few deps.
Part of the issue is how well Python integrates with non-Python dependencies. Before conda, when I wanted to upgrade some Python projects, I'd get errors complaining about my Fortran compiler. These days, most of the major projects upload precompiled binaries for major platforms to PyPI, but when it was just source code...
Using pip and virtualenv is usually fine, or you can use pipenv.
Perhaps “Primer” was the wrong word, but I believe the sentiment is valid. Reading the comments, there is simply no consensus. If code readability is important since code will be read more than written, packaging is important because, in many of the scenarios that count, code will be distributed more than it will be read.
It is simply frustrating that the typical response to comments like my original comment is some form of “it’s not as bad as you think”. Look, we have a problem here. A problem many other languages deem important enough to solve upfront. It’s been a problem for a long long time.
If the problem is that stdlib cannot move as fast as PyPI-related development requires, maybe that should be fixed, rather than trying to bypass all quality checks and then relying on obscure shared knowledge to navigate the ecosystem. Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.
> Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.
This is essentially what `setuptools` does, by putting a package on PyPI that monkeypatches/plugs in to the stdlib.
The reason for this is that right now, that command comes from `distutils`, which is part of the standard library. There is a huge disadvantage to bundling this functionality with your Python distribution, namely that it can only get upgraded when you upgrade your Python distribution. A lot of folks are still running versions of Python from several years ago, which is fine, but it means that they are missing out on anything new that's been added in the meantime.
For example, earlier this year, we released a new package metadata version which allows people to specify their package descriptions with Markdown. This required a new metadata field, which old versions of `distutils` know nothing about.
Upgrading `distutils` to support it would require that these changes go through the long process of making it into a Python release, and even then they would only be available to folks using the latest release.
Moving this functionality from `distutils` to a tool like `twine` means that new features can be made available nearly immediately (just have to make a release to PyPI) and that they're available to users on any Python distribution (just have to upgrade from PyPI).
The `distutils` standard library module comes from a time when we didn't have PyPI and thus, didn't have a better way to distribute this code to users. We have PyPI now though, so bundling `distutils` with Python is becoming less and less useful.
[1]: https://packaging.python.org/tutorials/packaging-projects/#u...
Recently I switched to pipenv because zappa insists on having virtualenv (as an app dev I never had any need for it, but it seems my case is an exception, as I almost never work on multiple apps in parallel). Pipenv does make version management a bit easier, but it wasn't difficult (for me) to begin with.
From talking with other developers I know my view is somewhat unorthodox, but I haven't encountered the problems they describe, or the pain hasn't been that big for me to embrace all the issues that come with virtualenvs.
Other people tried different approaches, like pipenv, and it is just that: a separate project trying to solve the same problem.
I don't know why / who said that pipenv is the official recommended way, if it is it should not be and I hope it is not.
The Python Packaging Authority says that pipenv is the first "recommended way" for managing application dependencies: https://packaging.python.org/guides/tool-recommendations/#ap...
Except if pipenv doesn't meet your needs. Then use pip.
Or, if you need cross-platform support, use buildout.
Or, if you are doing scientific computation, don't use any of those, use conda, Hashdist, or Spack.
Or if you need to create a package, use setuptools and twine.
So, no, pip and virtualenv don't solve the problem, because there are a lot of different problems and use-cases. I can say from my experience that conda is _the_ best solution to the problem for scientific work.
The `pip` package is not actually bundled with your Python distribution, instead the standard library has `ensurepip` which provides a means of bootstrapping a `pip` installation without `pip` itself. See [0].
> In fact, why not merge the twine functionality into pip?
This has been considered and still might happen, see [1], specifically the comment at [2].
[0] https://docs.python.org/3/library/ensurepip.html
[1] https://github.com/pypa/packaging-problems/issues/60
[2] https://github.com/pypa/packaging-problems/issues/60#issueco...
It is bundled, as mentioned in the link [0] you posted: "pip is an independent project with its own release cycle, and the latest available stable version is bundled with maintenance and feature releases of the CPython reference interpreter."
> the standard library has `ensurepip`
Ensurepip is for Python distributions, which are supposed to use it automatically to provide the bundled pip. See [3]: "Ensurepip is the mechanism that Python uses to bundle pip with Python." Basically it's the installer of the bundled pip. At least that's how I understand it.
> This has been considered and still might happen, see [1]
Note that while the users there all basically say the same thing (twine should be merged into pip as "pip publish") the (two out of three) PyPA devs say it "would be a major mistake" and they are "against adding pip publish". (Before starting offtopic rants against poetry...) I somehow doubt this will improve soon.
[3] https://mail.python.org/mm3/archives/list/distutils-sig@pyth...
Oh, so you weren't around when Requests went 2.0 backward-incompatible (because they changed .json() with .json, or the other way around, can't remember) and half of PyPI, with its happy-go-lucky ">=1.0", broke...?
Since then, most people have learnt that you pin first and ask questions later.
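The failure mode described here is easy to see with a toy version check: an open-ended `>=1.0` accepts a backward-incompatible 2.0 release, while a bounded range does not. The sketch below is a deliberately simplified, hand-rolled matcher for dotted numeric versions only; `parse_version` and `satisfies` are names invented for this example, and real tools follow the full PEP 440 rules (pre-releases, `~=`, wildcards), which this does not attempt.

```python
import operator

OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3); only plain dotted integers are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier such as '>=1.0,<2.0'."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", "!=", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                # Pad with zeros so that 1.0 and 1.0.0 compare as equal.
                width = max(len(candidate), len(bound))
                left = candidate + (0,) * (width - len(candidate))
                right = bound + (0,) * (width - len(bound))
                if not OPERATORS[symbol](left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("2.0.0", ">=1.0"))       # the open-ended range lets 2.0.0 in
print(satisfies("2.0.0", ">=1.0,<2.0"))  # the bounded range keeps it out
```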
If you configure the cache properly you might solve it, but yeah it's kinda dumb it has to do that just for locking.
This can make a difference of tens of minutes for some packages which have a 1 gigabyte (!!!) download.
Being able to define and install external dependencies (e.g. ImageMagick, libsodium, etc.) from a configuration file local to a project is something I missed the most, especially when I'm working on several projects at once.
Any examples of how Nix itself doesn't do what you need? One example I can think of: Nix doesn't support Windows.
So ignoring your requirements.txt, and potentially working with different versions of dependencies from the ones you were working with and encountering different bugs?
(Also managing your virtual environments "by hand" is tedious and error-prone when you're working on multiple projects).
> pip freeze > requirements.txt for requirements.txt generation.
The problem with this is that it's not reproducible - if two people try to run it they might get different results, and it's not at all obvious who should "win" when the time comes to merge. If you mess up the merge and re-run then maybe you get a different result again, and have to do all your testing etc. over again.
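To see how two developers' results can drift, a rough sketch like the following compares two `pip freeze` style outputs and reports every package whose version differs or that appears on only one side. `parse_freeze` and `diff_freeze` are hypothetical helpers for this illustration; they only understand plain `name==version` lines and skip editable installs and comments.

```python
def parse_freeze(text):
    """Map lower-cased package names to versions from `pip freeze` style text."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith(("#", "-")):
            name, version = line.split("==", 1)
            versions[name.strip().lower()] = version.strip()
    return versions


def diff_freeze(mine, theirs):
    """Return {name: (my_version, their_version)} for every disagreement."""
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

A `None` on one side means the package is missing from that environment entirely, which is usually the more surprising kind of difference during a merge.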
> For libraries just omit this?
Maybe, but then you'll face a lot of bug reports from people who end up running your library against different versions of upstream libraries from the ones that you tested against.
This is why I would not worry about pip freeze being non-reproducible. It is a manual step: upgrade our dependencies. Testing should happen all the time. If you are happy with the result of testing after upgrading dependencies, commit requirements.txt. I don't see new tools easing the burden of co-ordinating and testing dependency upgrades. Did I misunderstand them in this context?
I don't understand the concern for the library case. Pipenv doesn't address libraries. It seems to be an explicit goal of many people not to pin library dependencies. I'm asking what the new tools are solving - and again I can't see that they are solving this. Nothing is preventing you from pinning your library dependencies if you want (using old tools) but you'll probably get people complaining about being incompatible with other projects.
Agreed that developers should be moving the dependencies forward, but you want to do that as a deliberate action rather than by accident. E.g. if you want to consult another developer about a bug you're experiencing, you want them to be on the same versions of dependencies as you.
> This is why I would not worry about pip freeze being non-reproducible. It is a manual step: upgrade our dependencies. Testing should happen all the time.
It's a manual step, but you still want to be able to reproduce it. E.g. if a project is in maintenance mode, you want to be able to do an upgrade of one specific dependency without having to move onto new versions of everything else.
I don't work in Python any more so I don't know what the new tools do or don't do, I was just starting from your "I don't know why I would not have loose dependencies in setup.py and concrete, pinned dependencies in requirements.txt." and I know that workflow gave me a number of problems that I simply don't have when working in other languages. So I'm hoping that Python has caught up with the things that are known-working elsewhere, but maybe not.
The short answer to "why isn't this solved?" is "it's hard, and there's a lot to do". Development practices change over time, and the tooling continues to evolve with them. It's easy to see a broad survey like this and think that there's too much going on, but taken at a high level, the space is definitely trending in the right direction.
(Note: I'm not part of the PyPA, but I'm interested in this area and try to follow along from the outside.)
Part of the issue is due to the success of Python in very different niches. The likes of Rails or Node can concentrate on specific ecosystems, which account for the bulk of their users and have a limited set of scenarios they have to support; whereas Python users come from sysadmin to data-crunching to web to desktop development to games to to to...
So each packaging tool comes with certain ideas, usually a result of the author's experience; maybe they work very well in this or that scenario, but then they break badly on others and sizeable chunks of the community revolt. So a new tool comes around and the cycle starts again, but now people also want compatibility with the old tool.
I suspect part of the solution will require splits between niches. It already happened with Anaconda, which has basically become the standard in a particular group of users (academia / datascience). Since that came around, lamentations around building C libraries have substantially reduced (to be fair, the arrival of precompiled wheels on PyPI also helped). Some similarly-specialized tool might eventually emerge as standard for other niches.
Python developers are cats and they are pretty hard to herd at the best of times, which is unsurprising -- who would stick around a language that is almost 30 years old and was never promoted by any major vendor? Only hard-headed fools like myself.
https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-mi...
> Myth #5: conda doesn't work with virtualenv, so it's useless for my workflow
> Reality: You actually can install (some) conda packages within a virtualenv, but better is to use Conda's own environment manager: it is fully-compatible with pip and has several advantages over virtualenv.
> [...] the result seems to be fairly brittle – for example, trying to conda update python within the virtualenv fails in a very ungraceful and unrecoverable manner, seemingly related to the symlinks that underly virtualenv's architecture.
Doesn't sound like much of a myth then, if Conda's take on virtualenv is "you can technically do this, but everything will break ungracefully and unrecoverably, so please don't".
At the end, one of his conclusions is: "If you want to install Python packages within an Isolated environment, pip+virtualenv and conda+conda-env are mostly interchangeable". So don't change if you don't have to.
But he does give reasons why conda may be superior to virtualenv -- managing different versions of Python, tracking non-Python dependencies, true isolation of environments, etc.
Edit: correction, there are two examples that take longer, one at 3.5 minutes, one around 8 minutes. I don’t think it changes any takeaways though.
It’s funny to me that you would expect operations involving “conda install anaconda” to be fast though. 11 minutes seems perfectly fast for that and would be comparable to pip or anything else if doing that huge of a set of package installs.
That is not at all a “trivial environment” like you said previously.