One does not simply 'pip install' (ianwootten.co.uk)

84 points by ianwootten 3 years ago | 115 comments

lazka 3 years ago |

The next Debian/Ubuntu releases will no longer allow `pip install` outside of a venv: https://discuss.python.org/t/pep-668-marking-python-base-env...

You can still force it via `pip install --break-system-packages ...` if needed.
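For anyone who wants to check whether their interpreter is affected: as I read PEP 668, a distro marks its base environment by placing a file named EXTERNALLY-MANAGED in the stdlib directory. Below is a minimal sketch of that check. The function name is made up for illustration, and this is not pip's actual implementation.

```python
import sys
import sysconfig
from pathlib import Path


def is_externally_managed() -> bool:
    """Return True if this interpreter's base environment carries the PEP 668 marker."""
    # Inside a virtual environment the marker is not consulted,
    # so installs there are not blocked.
    if sys.prefix != sys.base_prefix:
        return False
    # PEP 668 puts the marker file in the stdlib directory of the default scheme.
    stdlib_dir = sysconfig.get_path("stdlib")
    return (Path(stdlib_dir) / "EXTERNALLY-MANAGED").is_file()


if __name__ == "__main__":
    print("externally managed:", is_externally_managed())
```

If this reports True, a plain `pip install` outside a venv is what the new releases will refuse.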

unxdfa 3 years ago | |

This makes me so happy. Back when we had Jenkins slaves, one of our devops guys set a pipeline up that pip installed different versions over the top of system packages causing weird intermittent failures everywhere. Different pipelines would be installing from different requirements files. I revoked sudo privs immediately for Jenkins (I didn't add them in the first place) and reprovisioned the whole build cluster, resulting in pipelines breaking consistently where they should have been breaking: trying to do stupid stuff.

Personally I only ever use the system python packages on Linux if I can get away with it. Saves a whole world of problems.

KronisLV 3 years ago | | |

> Back when we had Jenkins slaves, one of our devops guys set a pipeline up that pip installed different versions over the top of system packages causing weird intermittent failures everywhere.

Not everyone might like containers, but using them for CI seems like a good way to avoid situations like this, at least when viable (e.g. web development). You get to choose what container images you need for your build, do whatever is necessary inside of them, and they're essentially thrown away once you're done with what you need to do, cache aside. They also don't have any significant impact or dependencies on the system configuration, as long as you have some sort of a supported container runtime.

hnfong 3 years ago | | |

> I revoked sudo privs immediately for Jenkins (I didn't add them in the first place)

If you allowed sudo in your jenkins jobs you're morally barred from blaming python for screwing up the system.

wheelerof4te 3 years ago | |

The way it's meant to be.

On Linux, you either use the system packages via "apt install", or you use venvs.

EDIT: For context, I meant "managed" distros like Debian and Ubuntu.

qbasic_forever 3 years ago | | |

Nowhere in the official Python documentation (where 99% of new python users are going to go) does it warn or even talk about Linux and Debian specific issues like only using apt packaged versions of dependencies. It wasn't even until recent years that pip gave a hint or warning something might break in those setups. The situation with Python on Debian has been pretty bad IMHO with a cloistered group of people saying the status quo is just fine because it works for them exclusively.

Groxx 3 years ago | |

This is fantastic to hear. Hopefully this will be the beginning of a wave of other OSes doing the same.

For anyone on other systems who wants this kind of protection right now, pip has had this available for a few years at least:

    pip config set global.require-virtualenv True

I absolutely recommend doing it. Immediately.
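Roughly speaking, what that setting enforces is the check below: inside a venv, `sys.prefix` points at the venv while `sys.base_prefix` still points at the base installation. This is a sketch of the idea only, not pip's own code, and `refuse_outside_virtualenv` is a hypothetical guard you could put at the top of your own install scripts.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (or an old-style virtualenv)."""
    # Old versions of the virtualenv tool set sys.real_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib venv module makes sys.prefix point at the venv,
    # while sys.base_prefix keeps pointing at the base installation.
    return sys.prefix != sys.base_prefix


def refuse_outside_virtualenv() -> None:
    """Exit with an error unless a virtual environment is active (hypothetical guard)."""
    if not in_virtualenv():
        sys.exit("refusing to install: no virtual environment is active")


if __name__ == "__main__":
    print("in a virtual environment:", in_virtualenv())
```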

ziml77 3 years ago | |

Mixing pip with another package manager has always seemed weird to me. You're just asking for things to conflict and break.

I noticed with Homebrew that there was no way to untangle packages installed through pip and ones installed through Homebrew. After dealing with that mess once, I now make sure to use pip install --user. It can still cause things to break, but if that does happen it's at least easy to nuke the packages installed to my home directory.

TheRealPomax 3 years ago | |

Good. Now we just need to get pip itself updated so it refuses to run outside of a venv, and refuses to run unless invoked with "python -m pip" and we'll finally have something at least half decent.

And don't even get me started about how much better npm is at publishing packages, versus pip's refusal to add the same user friendliness.

ianwootten 3 years ago | |

PEP 704 is a recent proposal to require a virtual environment by default for any package installer - https://peps.python.org/pep-0704/. Again, you can opt-out if you want.

meitham 3 years ago | |

Hopefully that’s not going to be the case inside a container!

forgotpwd16 3 years ago | | |

Why? You can override it, and even if you couldn't, making a new venv is just a `python -m venv venv` away.

virtualritz 3 years ago |

> There’s no shortage of package management alternatives available for Python [...]

> How someone is meant to pick between these as a new developer is a mystery.

This.

Every time I get booked to look at some Python project, hours are usually wasted up front figuring out which dependency management solution was used and how, and with what 'special sauce' the respective developers deemed to be 'the right way' (or some library required because ... it just does).

As the author wrote: it seems common to omit the dependency setup in the Readme for Python projects.

I can understand why one would not mention this 'step' in a Rust or Node project but for Python it seems very much necessary.

davidmurdoch 3 years ago |

I'm convinced that there are very few python libraries that Just Work if you follow their installation instructions. I've never found one that didn't come with issues myself.

Complain about this to a Python dev and you'll be "Well actually"ied to oblivion and each and every one will have their own opinion-as-fact on the best practice for managing these -- totally unaware how antithetical Python development has become to The Zen of Python.

__MatrixMan__ 3 years ago | |

Yeah, the well-actually's are a problem. It's not all of us though.

Python devs know we have a problem, it's just hard to fix because "people developing apps and worrying about dependencies" is a rather small part of the python community. It's not like Java or something where everybody writing the language is a developer. Most are scientists or business people or students working in places like Anaconda or Jupyter. So it's really hard to get momentum behind an all-together-now solution.

I've slowly been gravitating toward Nix flakes so I can use it to pin to a project versions of all of the things you can't reliably install with pip alone (like python itself, or numpy, or postgres or whatever) and then have it read deps from poetry (via poetry2nix) for everything that "just works," but that's never gonna fly with the non-developer Python community. Hell, it probably won't even fly with half of the developers either, but it works well for me.

I think my situation is typical of python developers, which is why we have this problem. I think it'll stick around for a while because it's not like "just use a different language" is gonna fly with the non-dev crowd. They're going to expect somebody else to solve these problems for them.

(I may have a bias because my company offers OSS python apps in a SaaS form factor, so our support folk are the ones solving these problems--typically by either handling the virtualenv behind the scenes or by ensuring that users with conflicting dependencies are using different images).

aflukasz 3 years ago |

Slightly tangential, but... one also does not simply `pip download` if one does not want to execute code - https://github.com/pypa/pip/issues/1884.

I wanted to run guarddog on source packages. Only then build them locally and install. Turns out, `pip download` triggers code execution in fetched packages.

Somewhat surprising and in this day and age worth spreading awareness of.
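One partial mitigation, as far as I know, is to restrict pip to wheels: unpacking a wheel does not run the package's build code, whereas an sdist may need its build backend invoked just to produce metadata. The sketch below only builds the command and does not run it. The function name and the `downloads` directory are illustrative, and this does not help when you specifically want the source packages.

```python
import sys


def wheel_only_download_cmd(requirement: str, dest: str = "downloads") -> list[str]:
    """Build a `pip download` command that refuses source distributions."""
    return [
        sys.executable, "-m", "pip", "download",
        "--no-deps",             # fetch only the named requirement
        "--only-binary=:all:",   # never fall back to an sdist
        "--dest", dest,
        requirement,
    ]


if __name__ == "__main__":
    # "requests==2.31.0" is just an example requirement.
    print(" ".join(wheel_only_download_cmd("requests==2.31.0")))
```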

throwaway892238 3 years ago |

A Makefile makes this trivial:

  # Makefile
  all: venv frozen test
  
  venv:
   python3 -m venv venv
  
  frozen: venv
   [ -e frozen.txt ] || { echo "ERROR: run 'make update-frozen'"; exit 1 ; }
   ./venv/bin/pip install -r frozen.txt
  
  update-frozen: clean install-requirements freeze
  
  freeze:
   ./venv/bin/pip freeze > frozen.txt
  
  install-requirements: venv
   [ -e requirements.txt ] || { echo "ERROR: make a requirements.txt file"; exit 1 ; }
   ./venv/bin/pip install -r requirements.txt
  
  test:
   ./venv/bin/python3 run_tests.py
  
  clean:
   rm -rf venv

Put your package names in requirements.txt and run `make update-frozen`. To reinstall everything from frozen state, `make clean frozen`. (And replace the first space with a tab; HN is stripping my tabs out)

I know Pythonistas like to use Python for everything, but there are other tools out there that will make your life much simpler.

jasonpeacock 3 years ago |

The article talks about installing Python packages for development, but if you find yourself using `pip` to install Python tools/scripts then you should use `pipx` - it will properly sandbox those tools so they don't break (or be broken by) the system or other Pythons:

https://pypa.github.io/pipx/

KptMarchewa 3 years ago | |

The problem is that everyone has different problems with python packaging and everyone has a different idea of what you "should" do.

qbasic_forever 3 years ago | | |

The tool-specific use case with pipx is unique though, it's laser focused and perfect at the job of getting a python tool to users regardless of whatever wacky state their Python install is in. It's kind of separate from the issues of managing dependencies. It's a fantastic tool I wish more python documentation and users would embrace.

SAI_Peregrinus 3 years ago | | |

The core problem (as I see it) is that Linux distros tend not to have any firm distinction between "system" packages, "user" packages, and "development" packages (which are a subset of user packages). The system package manager installs everything globally, while also being considered the only approved/safe way to install packages.

Languages tend to try to get around this by providing their own package registries and build systems to use them (npm, pip, cargo, etc), and developer tools often include some sort of sandboxing to avoid interference from the system packages (venv, bazel, cargo, nix develop, etc).

For user packages a tool like Snap, home-manager, Flatpak, or AppImage seems necessary.

Python makes the problems very obvious, especially since it has so many package management systems, gets used for system packages, and gets used for user applications.

Groxx 3 years ago | | |

Not really.

Pipx uses the ecosystem standard of "make a venv" and it just exposes the binary entrypoint of what you installed.

It is exactly what everything says you should do, because everyone agrees. It just does it for you.

AndyKluger 3 years ago | |

If anyone's interested in a pipx clone with excellent tab completion, I would appreciate any feedback on pipz, a function of my zsh plugin for python environment and dependency management: zpy

https://github.com/andydecleyre/zpy

mark_l_watson 3 years ago |

While I agree with the author to not do global pip installs for every new project, I also don’t want to see text in every git repo README explaining Python package managers.

throwboatyface 3 years ago | |

The lack of one true package management approach is a failure of the language. OP is advocating for a saner default like npm, instead of the current venv + pip mess.

galleywest200 3 years ago | | |

I like venv/pip. I can blow out the directory when I am done with it. I do not need to remember what is installed where.

Compare this to my GOPATH/GOROOT which is insanely full of mods...gigabytes...

kcartlidge 3 years ago | | |

> instead of the current venv + pip mess

It isn't a mess: venv + pip is simple and (usually) sufficient.

Legacy/existing code or genuine justifications excepted, of course, there is no need to use anything else - even if an alternative is better, the use of alternatives is usually worse. Short of any massive technical reason, the best option is almost always to use the default option.

georgyo 3 years ago | | |

Npm, yarn, yarn2, pnp, pnpm, and more.

The only thing they have in common is package.json, but even then they can interpret things differently, such as workspaces.

And then node_modules, which packages should not rely on but do, forcing many other tools into compatibility mode, which often makes an install take a very long time.

Yes, the node ecosystem is very healthy.

wheelerof4te 3 years ago | | |

How is npm any saner?

Last week I had one colleague complain about his broken npm install. He had to manually install each module and its exact version.

A month before that, we had one broken old nodejs project which couldn't update itself cleanly.

AtlasBarfed 3 years ago | |

As a non-python person who has had hair-pulling experiences with python pip / pip3 / python2 / python3 / python-is-python2-or-python3, this was a revelation.

pipenv looks like what pip should have been.

Another story on HN is "what happened to Ruby" and that really crystallized what I don't like about python. I'm not a ruby programmer, but I have to admit how much fantastic software came out of Ruby.

Ruby was always fighting Java for some reason, it should have been fighting Python. If only Ruby had won THAT war.

robertlagrant 3 years ago | |

This is where npm gets it right. It's so much simpler to have the default install in a local folder, and then have an option to install globally if you like.

brenns10 3 years ago |

I appreciate the concern for new developers, but I really don't think it's a good solution to have every project readme describe pip, poetry, pipenv, and whatever other new hotness there is in the package management world. There's a reason that all the readmes describe pip installation: it's the lowest common denominator, present with every standard python install, and along with virtualenv (also standard) it can do most of the requirements for package management.

I think to help new developers, we could encourage documentation to briefly point to the official PyPA documents on the variety of options available. It would be better to focus on making that more accessible, rather than trying to throw the burden onto package maintainers to describe using their package with every new tool.

https://packaging.python.org/en/latest/key_projects/

alanng 3 years ago |

Omg this is so true! I installed a package globally, but then my interpreter was using another version of python, which doesn't have the installed package. It took me an hour to find out about this. What a waste of time.

savingsPossible 3 years ago | |

python -m pip install

then the 'pip' is running the same version as the 'python' command (I believe, can you check and comment later?)

(you'd still have to check your IDE if you are not running python from the CLI)
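A quick way to see the mismatch for yourself: the sketch below prints the interpreter that is running, the bare `pip` found on PATH (which may belong to a different Python, or be missing), and a command that is tied to the running interpreter. The function name is made up, and `requests` is only an example package.

```python
import shutil
import sys


def pip_install_cmd(*packages: str) -> list[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("interpreter:       ", sys.executable)
    # shutil.which returns None when no `pip` executable is on PATH.
    print("bare `pip` on PATH:", shutil.which("pip"))
    print("safer command:     ", " ".join(pip_install_cmd("requests")))
```

If the first two lines point into different installations, that is the situation alanng ran into.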

savrajsingh 3 years ago |

Pyenv virtualenv is my personal fav for this issue

sammy2255 3 years ago | |

I personally hate the venv shell, it's oddly confusing and I don’t understand how it works

lrobinovitch 3 years ago | |

+1. It's surprisingly common for pyenv and pyenv-virtualenv to go unmentioned, but they have always been my best experience.

jackhoy 3 years ago | |

Yes, I don't write a lot of Python but found that was the easiest way to get minimal repeatable builds and isolate them per project.

Link for anyone not familiar with pyenv/virtual env usage: https://www.jackhoy.com/web-applications/2017/02/12/setting-...

tyingq 3 years ago |

It's also interesting how things like AWS Lambdas, Graviton, etc, are exposing all the shortcomings of the various pip install, venv, poetry, etc, approaches.

It's not impossible to figure it out, but you end up spending a lot of time to come up with something that works locally, within containers, inside a CI/CD system, and then deployed out across things like Lambdas, or non x64 machines.

Then, after it's all working, upgrading the Python version, or an extension that has C code, etc, repeats some of the hard bits.

scarface74 3 years ago | |

At least with Lambda it really is easy, just use Serverless Application Model and when you do “sam build” choose “--use-container”. It will look at your CloudFormation template where you are referring to the local directory containing your source code and requirements.txt and download and build in the appropriate Docker container for your language, version and architecture.

It works great when you have native dependencies.

tyingq 3 years ago | | |

I assume that means container based Lambdas, which would have slower cold start times and maybe some other disadvantages, but yes, it would be simpler.

rgavuliak 3 years ago | |

Lambdas now work with Docker

mharig 3 years ago |

When I switched to Arch Linux, I learned that pip has a --user option to install Python packages in the home dir of the current user. This is essential to not interfere with the system install from the system package manager. I really had trouble with that in the past.

Furthermore, as I am now used to bleeding edge packages, I update all the outdated Python packages among my >450 installed ones at least once a week. When some packages get downgraded because of requirements, I ask: Do I need the package that caused the downgrade more often or with more of the packages in the main environment, or is this true for one or some of the downgraded packages?

According to the answer, I put the 'problematic' package(s) in a new or existing venv, and update the downgraded ones in the main environment, if necessary.

This work cannot be done by a package manager!

Costs me <10 minutes every week to keep the main environment up to date, a bit more if I want that for some or all venvs.

kgwgk 3 years ago |

> You might expect if I were to pip uninstall requests that I get back to a clean system, right?

Why would I expect that? If one day I install A and another day I install B, which depends on A, I wouldn’t expect to lose A if I were to uninstall B.

ziml77 3 years ago | |

If I didn't have A installed and then I install B which transitively installs A, then I expect that uninstalling B will also uninstall A. If only one system is managing the packages, then it is able to do this. It will have a record of the things I've explicitly installed so it knows what dependencies are safe to uninstall.

kgwgk 3 years ago | | |

I prefer a package manager that tells me that there are things that may be safe to uninstall to one that decides to uninstall things on its own.

Maybe I installed B, which installed A. Maybe sometime later I needed A and I didn’t do anything because it was already there. Seeing A disappear when I uninstall B may be unexpected.

ZGDUwpqEWpUZ 3 years ago | |

apt handles this by marking packages as manually installed. You and the author could both be happy with that solution but afaik pip doesn't currently store such information.
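To make the "manually installed" idea concrete, here is a toy sketch of how a package manager could decide what is safe to remove. It is a simplified model, not how apt or pip actually work, and the function and argument names are made up. It covers both cases discussed above: A pulled in only by B, and A also requested explicitly.

```python
def removable_after_uninstall(
    deps: dict[str, set[str]],
    manually_installed: set[str],
    target: str,
) -> set[str]:
    """Return the packages that become unneeded once `target` is uninstalled.

    deps maps every installed package to the set of packages it requires.
    manually_installed is the set of packages the user asked for explicitly.
    """
    keep_roots = set(manually_installed) - {target}
    # Everything reachable from the remaining manual packages must stay.
    needed: set[str] = set()
    stack = list(keep_roots)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(deps.get(pkg, ()))
    return set(deps) - needed


if __name__ == "__main__":
    deps = {"B": {"A"}, "A": set()}
    # A was only pulled in by B, so removing B frees both.
    print(sorted(removable_after_uninstall(deps, {"B"}, "B")))       # ['A', 'B']
    # A was also requested explicitly, so only B goes.
    print(sorted(removable_after_uninstall(deps, {"A", "B"}, "B")))  # ['B']
```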

404mm 3 years ago |

This post points out one of my struggles with python.

I am not a python developer but I use python heavily for some tooling. So all I need to do is to “distribute” my tools to other servers in a replicable and consistent manner, isolated from global packages.

Can you please help me understand two points?

1. If I use venv+pip to install some python app, do I have to “activate” that specific virtual environment before executing that tool or can I just simply call it by its path on the file system?

2. Are there any official guide rails for making venv-wrapped app accessible to other users on a server? Or just as simple as placing links to /usr/local/bin/ for example?

Groxx 3 years ago | |

1: usually you can just run the binary by its path. tbh I don't fully understand why it doesn't always work, but it's fairly rare, and most of the ones I can kinda-remember may have been during install time.

2: due to 1, symlinks often work. It's how I've installed all of my custom python binaries. Otherwise you'll very frequently see python "binaries" installed by e.g. Homebrew that are actually ~5 lines of minor environment prep and then running the actual binary - that's the only reliable way afaik.

Bonus answer to 2: pipx looks pretty decent.

404mm 3 years ago | | |

Thanks! I’ll check out pipx!

okasaki 3 years ago | |

1. You can call it directly by referencing the venv python exe, e.g. /path/to/your/venv/bin/python /path/to/your/script.py

404mm 3 years ago | | |

Nice, I think this is what I was looking for!
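To illustrate the "no activation needed" point, here is a small sketch using the stdlib `venv` module: it creates a throwaway venv, runs its interpreter by path, and shows that `sys.prefix` inside that process is the venv directory. The helper names are made up for illustration, and the `bin` versus `Scripts` layout is the usual POSIX versus Windows convention.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_seen_by(env_dir: Path) -> str:
    """Run the venv's interpreter by path, without activating, and report sys.prefix."""
    result = subprocess.run(
        [str(venv_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # with_pip=False keeps the demo quick; symlinks mirrors the CLI default.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        print("venv dir:  ", env_dir)
        print("sys.prefix:", prefix_seen_by(env_dir))
```

Scripts installed into the venv by pip normally point at that same interpreter, which is why symlinking them into /usr/local/bin often works.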

aflukasz 3 years ago |

Also, it's 2023 - please do not suggest approaches that do not use package hash based pinning. For example, use pip-tools.

Hopefully, in 2024, we will be able to say the same thing about signing via the sigstore ecosystem.
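For anyone unsure what hash-based pinning looks like in practice: each requirement line carries the expected SHA-256 of the artifact, and `pip install --require-hashes -r requirements.txt` rejects anything that does not match. Tools like pip-tools generate these lines for you (`pip-compile --generate-hashes`). The sketch below just shows the format for one local file. The helper names and the `demo` package are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name: str, version: str, artifact: Path) -> str:
    """Format one requirements.txt line in pip's hash-checking syntax."""
    return f"{name}=={version} --hash=sha256:{sha256_of(artifact)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a downloaded wheel or sdist.
        artifact = Path(tmp) / "demo-1.0.tar.gz"
        artifact.write_bytes(b"not a real package")
        print(pinned_requirement("demo", "1.0", artifact))
```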

Karellen 3 years ago |

> Lets say you use the same package again, but theres been a new release with some additional features. When you upgrade your global Python to use it, you now need to ensure every project you’ve done now works with it. That’s annoying and unlikely to happen, what you’ll be left with is a broken build.

Wait, what?

Don't python packages generally use `semver` versioning, and ensure that upgrades in the same major version are backwards-compatible?

And that different major versions are co-installable?

alexchantavy 3 years ago |

I saw this Twitter thread the other day (https://twitter.com/fchollet/status/1617704787235176449?s=46...) about similar problems, and some comments suggest using Docker. I couldn’t find any guides or ways to do this for a Python project; anyone here know more or has done this before?

rgavuliak 3 years ago | |

I was using a docker interpreter with Pycharm. It's fairly simple.

synergy20 3 years ago |

poetry breaks once in a while for me, so I am not using it these days.

pipenv used to be my first choice but it became inactive, seems it is actively under development again?

A few weeks ago there was a recommendation for PDM, but I have not really used it.

For now I am using the pip+venv approach.

By the way, you'd better do `python -m pip install` instead of `pip install`. I don't remember why anymore, but I did read an explanation of the difference somewhere and agreed at the time to prefer `python -m pip install`.

savingsPossible 3 years ago | |

I think there is an issue with different versions of python

If there are 2 installed, then "python" can refer to (say) python 3.10 and pip to python 3.9

using python -m pip makes you run pip with 3.10

atemerev 3 years ago |

I use Python for research. If I need some package, I simply want the latest version; pip install is usually fine.

If something depends explicitly on the fixed (old) version, that's when problems happen and I grudgingly remember how to use pyenv. But I like to use the most recent versions and most recent Python, and I like packages that share this bleeding edge approach.

dissent 3 years ago |

Article conflates global installation into the system python with global installation in general. Not everything is a project dependency. If you want, say, ipython, available everywhere, global installation is appropriate. You can get this without clobbering your system python by simply not using the system python for your projects.

yboris 3 years ago |

I've been a happy user of pipenv for several years (at work, in production) and still recommend it. You lock the versions you want independently of the requirements.txt so you can update just the packages you want without worrying about sub-dependencies. 10/10 recommend.

switch007 3 years ago |

Not to mention dependencies that compile C modules so you also need a compiler, headers etc

disgruntledphd2 3 years ago | |

You probably want conda if you're in this situation, as it basically solves for these issues (but doesn't have great docs for actually adding packages to it, unfortunately).

hungryforcodes 3 years ago |

npm certainly has a number of problems (at the end the article compares pip to npm) -- but until reading this article I didn't realize pip was so problematic. I also didn't realize it installed things globally.

So the solution is?

kcartlidge 3 years ago | |

> I also didn't realize it installed things globally

It doesn't. It's a subtle distinction but the 'blame' doesn't lie with pip. When you do a pip install it does it in the context of the python interpreter you're using.

If you use your global python you get an installation in a global context from pip. If you use a non-global python you get a non-global installation from pip. And this is what venv etc give you; a local interpreter, which means the associated pip installs in a local context (a separate one for each venv).
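You can see this context directly by asking the interpreter where its packages go. Run the sketch below once with your global python and once with a venv's python and compare the output. The function name is made up, and on some distros the reported directory for the system python may differ from where pip actually writes, so treat it as an approximation.

```python
import sys
import sysconfig


def install_target() -> dict[str, str]:
    """Where packages land for the interpreter running this code."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # "purelib" is the site-packages directory for pure-Python packages.
        "site_packages": sysconfig.get_path("purelib"),
    }


if __name__ == "__main__":
    for key, value in install_target().items():
        print(f"{key}: {value}")
```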

wildrhythms 3 years ago | |

I don't understand why pip doesn't do it like npm. Admittedly, I don't write Python code much, but "npm install xyz@1.2.3" simply installs to a node_modules/ folder in the current directory. Very easy to parse and nuke if I need to. I don't really understand how venv and its weird shell prompt are a better solution.

hungryforcodes 3 years ago | | |

Yeah. I often nuke the node_modules directory and start again.

zokier 3 years ago | |

> So the solution is?

Always use venv.

bentaber 3 years ago |

Is there a canonical example of how python projects should manage dependencies and sandboxing such that other developers can just clone, install, and get to work?

qbasic_forever 3 years ago | |

Put everything in a docker container/OCI image and have someone own managing and babysitting the build of that image for everyone else.

There really is no single tool or workflow for everything in the python world. What works for a simple source only python package can break horribly if you try using sophisticated scientific computing packages with numerous native dependencies (and then you realize you need conda or a whole other set of tools).

bayesian_horse 3 years ago |

Personally I usually found pip + venv (or conda) less painful than NPM, Nuget or whatever voodoo and goat sacrifices you have to do for C++.

bjd2385 3 years ago |

Poetry is pretty much the way to go.

realitysballs 3 years ago |

Well written article

sam_goody 3 years ago |

Obligatory XKCD: https://xkcd.com/1987/

nntwozz 3 years ago |

Just use Docker.

kkthxbb 3 years ago |

I'm not sure if I get the point of this article. So basically the author has learnt that there are different ways of managing packages in Python? I'm aware that this might be a problem in Python, but let's be serious guys, you only need to spend 5 mins to learn about venv/conda and you will never face any problem in a basic Python project. You don't have to write an article about that.

vlunkr 3 years ago | |

As an outsider, the ecosystem looks like a mess. The point of the article is to illustrate some of those issues.

Groxx 3 years ago | | |

As an insider (to some degree), the ecosystem looks like a mess because it is a mess.

It's entirely reasonable for people to have problems when using it.