How Python virtual environments work (snarky.ca)
Most of the complaints here ironically are from people using a bunch of tooling in lieu of, or as a replacement for vanilla python venvs and then hitting issues associated with those tools.
We've been using vanilla python venvs across our company for many years now, and in all our CI/CD pipelines and have had zero issues on the venv side of things. And this is while using libraries like numpy, scipy, torch/torchvision, etc.
I don’t drive python daily, but my other projects thank Python for that.
I mean, I think you genuinely believe that what you suggest is simple... so, I won't pretend to not understand how you might think that. I'll explain:
There's simplicity in performing and simplicity of understanding the process. It's simple to make more humans, it's very hard to understand how humans work. When you think about using pip with requirements.txt you are doing the simple to perform part, but you have no idea what stands behind that.
Unfortunately for you, what stands behind that is ugly and not at all simple. Well, you may say that sometimes it's necessary... but, in this case it's not. It's a product of multiple subsequent failures of people working on this system. A series of mistakes, misunderstandings, and bad designs set in motion processes that, in retrospect, became impossible to revert.
There aren't good ways to use Python, but even with what we have today, pip + requirements.txt is not anywhere near the best you can do, if you want simplicity. Do you want to know what's actually simple? Here:
Store links to Wheels of your dependencies in a file. You can even call it requirements.txt if you so want. Use curl or equivalent to download those wheels and extract them into what Python calls "platlib" (finding it is left as an exercise for the reader) removing everything in scripts and data catalogues. If you feel adventurous, you can put scripts into the same directory where Python binary is installed, but I wouldn't do that if I were you.
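For anyone who wants to try the "exercise for the reader", here is a minimal Python sketch of the idea, not a replacement for a real installer. It assumes the wheel has already been downloaded; `find_platlib` and `extract_wheel` are hypothetical helper names, and skipping the whole `<name>-<version>.data/` tree is a simplification (a real installer relocates parts of it rather than dropping them).

```python
import sysconfig
import zipfile
from pathlib import Path


def find_platlib() -> Path:
    """Return the directory Python calls "platlib" for the running interpreter."""
    return Path(sysconfig.get_path("platlib"))


def extract_wheel(wheel_path: Path, target_dir: Path) -> list[str]:
    """Unpack a wheel into target_dir, skipping the '<name>-<version>.data/' tree.

    Returns the archive member names that were extracted.
    """
    extracted = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for member in wheel.namelist():
            top_level = member.split("/", 1)[0]
            if top_level.endswith(".data"):
                # Scripts, data files and headers live here. This sketch drops
                # the whole tree, which is cruder than what pip does.
                continue
            wheel.extract(member, target_dir)
            extracted.append(member)
    return extracted
```

Calling `extract_wheel(Path("some.whl"), find_platlib())` would write into the running interpreter's platlib, so it is safer to experiment with a scratch directory first.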
Years of being in infra roles taught me that this is the most reliable way to have nightly builds running quietly and avoiding various "infra failures" due to how poorly Python infra tools behave.
("why not make everyone install their own venv and run pip install?" because, and here's the part that's going to blow your mind: because they shouldn't have to. The vast majority of packages don't need compilation, you just put the files in the right libs dir, and done. Your import works. Checking this kind of thing into version control, or moving it across disks, etc. etc. should be fine and expected. Python yelling about dependencies that do need to be (re)compiled for your os/python combination should be the exception, not the baseline)
Or just, y'know, rename the containing folder. Because last night I liked the name `foo` but this morning I realized I preferred `bar`, and I completely forgot that I had some python stuff inside and now it doesn't work and I have to recreate the whole venv!
The article says it is explicitly not designed for that: "One point I would like to make is how virtual environments are designed to be disposable and not relocatable."
That's because vanilla python venvs feel like a genius idea that wasn't thought out thoroughly; they feel as if there's something missing... So naturally there are lots of attempts at improvements, and people jump at those...
And when you think about it in bigger depth, venvs are themselves just another one of the solutions used to fix the horrible mess that is python's management of packages and sys.path...
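To make the sys.path point concrete, here is a small sketch for checking whether the running interpreter is in a venv and what it will search. The function names are made up for illustration; the `sys.prefix` versus `sys.base_prefix` comparison applies to environments created by the standard `venv` module.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the running interpreter was started from a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> dict:
    """Collect the values that decide where imports are looked up."""
    return {
        "in_venv": in_virtual_environment(),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "sys_path": list(sys.path),
    }
```

Printing `describe_interpreter()` inside and outside an activated environment shows which directories actually change.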
The "Zen of Python" says "There should be one-- and preferably only one --obvious way to do it.", so I can't understand why it's nowhere near as easy when it comes to Python's package management...
I actually find it amazing that the python community puts up with that. But I suppose fixing it is not that pressing now the language is widely adopted. It's not going to be anyone's priority to mess with that. It's a high-risk, low-reward sort of project.
Areas where I have felt a lot of pain is with legacy Ruby projects/bundler. Don't get me started on golang.
Can pip be made better? Sure. Should we have an attitude of disgust towards it? Heck no!
I agree the packaging and distribution setup in python is an absolute mess, but that's entirely unrelated to venvs. It's like bringing up how python uses whitespace instead of curly-braces.
There's also NodeJS's ability for dependencies to simultaneously use conflicting sub-dependencies.
The problem is fundamental in Python in that its runtime doesn't have a concept of a program or a library or a module (not to be confused with Python's modules, which is a special built-in type) etc. The only thing that exists in Python is a "Python system", i.e. an installation of Python with some packages.
Python systems aren't built to be shared between programs (especially so because it's undefined what a program is in Python), but, by any plausible definition of a program, venv doesn't help to solve the problem. This is also amplified by a bunch of tools that simply ignore the existence of venvs.
Here are some obvious problems venv doesn't even pretend to solve:
* A Python native module linking with shared objects outside of Python's lib subtree. Most comically, you can accidentally link a python module in one installation of Python with Python from a "wrong" location (and also a wrong version). And then wonder how it works on your computer in your virtual environment, but not on the server.
* venv provides no compile-time isolation. If you are building native Python modules, you are going to use system-wide installed headers, and pray that your system headers are compatible with the version of Python that's going to load your native modules.
* venv doesn't address PYTHONPATH or any "tricks" various popular libraries (s.a. pytest and setuptools) like to play with the path where Python searches for loadable code. So much so that people using these tools often use them contrary to how they should be used (probably in most cases that's what happens). Ironically, often even the authors of the tools don't understand the adverse effects of how the majority is using their tools in combination with venv.
* It's become a fashion to use venv when distributing Python programs (eg. there are tools that help you build DEB or RPM packages that rely on venv) and of course, a lot of bad things happen because of that. But, really, like I said before: it's not because of venv, it's because venv is the wrong fix for the actual problem. The problem nobody in Python community is bold enough to address.
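The PYTHONPATH item in the list above is easy to check for yourself: Python adds PYTHONPATH entries to sys.path whether or not a venv is active. A rough sketch follows, with hypothetical helper names and made-up example paths, that flags entries living outside a given environment prefix.

```python
import os
import sys


def pythonpath_entries(environ=None) -> list[str]:
    """Split PYTHONPATH into its entries; an unset or empty variable gives []."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def entries_outside_prefix(entries, prefix=sys.prefix) -> list[str]:
    """Return the entries that do not live under the given environment prefix."""
    prefix = os.path.abspath(prefix)
    outside = []
    for entry in entries:
        absolute = os.path.abspath(entry)
        try:
            common = os.path.commonpath([absolute, prefix])
        except ValueError:
            # Raised on Windows when the paths are on different drives.
            common = ""
        if common != prefix:
            outside.append(entry)
    return outside
```

Running `entries_outside_prefix(pythonpath_entries())` inside an activated venv lists the directories that can still leak code into it.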
What Python needs is a tool that understands your project structure and dependencies so the rest of your tools don't have to.
In other languages, that's called a build tool, which is why people have a hard time understanding that Python needs one.
Having taken a deep-dive into refactoring a large python app, I can confidently say that package management in python is a pain compared to other interpreted languages.
Trying to keep a Pygame/Numpy/Scipy project working has been a real struggle. I started it with Python 2 and ported to Python 3 some years ago. The whole Python 3 transition is a huge mess with every Python 3 point release breaking some things. No other interpreted language’s packaging system is so fucked up.
On a positive note: Lately I've liked using pdm instead of pip, and things seem to work quite a lot better. I evaluated Poetry, Flit and something else also.
I just commented about this on Twitter, when someone asked “Which programming language do you consider beginner's friendly?” https://twitter.com/peterhil/status/1633793218411126789
Guess in taking this stance we're both part of the problem... /s
If you're using docker then it's a lot easier I guess.
On top of that there are some large libraries that need to only be installed once per system because they're large, which you can do but does mess with dependency resolution, and god help you if you have multiple shadowing versions of the same library installed.
I wish it was simpler. I agree the underlying system is solid, but the fact that it doesn't solve some issues means we have multiple standards layered on top, which is itself a problem.
And great if you've been using vanilla venvs. Good for those that can. If I want hardware support for Apple's hardware I need to use fucking conda. Heaven help me if I want to combine that in a project with something that only uses pip.
The only gotcha I've had is to make sure you deactivate and reactivate the virtual environment after installing Jupyter (or iPython). Otherwise (depending on your shell) you might have the executable path to "jupyter" cached by your shell so "jupyter notebook" will be using different dependencies to what you think.
Even comparatively experienced programmers don't seem to know this and it causes a lot of subtle problems.
Here's some details on how bash caches paths: https://unix.stackexchange.com/questions/5609/how-do-i-clear...
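One way to sanity-check which executable would run is to do a fresh PATH lookup from Python and compare it with the environment prefix. This is only a sketch: `is_inside_prefix` and `check_command` are hypothetical names, and it does not inspect the shell's own cache, it just sidesteps it.

```python
import os
import shutil
import sys


def is_inside_prefix(executable_path: str, prefix: str) -> bool:
    """True when executable_path lives somewhere under prefix."""
    exe = os.path.abspath(executable_path)
    root = os.path.abspath(prefix)
    try:
        return os.path.commonpath([exe, root]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False


def check_command(name: str, prefix: str = sys.prefix) -> str:
    """Report where `name` resolves on PATH and whether that is inside prefix."""
    found = shutil.which(name)  # a fresh lookup that does not use the shell's cache
    if found is None:
        return f"{name}: not found on PATH"
    where = "inside" if is_inside_prefix(found, prefix) else "OUTSIDE"
    return f"{name}: {found} ({where} {prefix})"
```

For example, `print(check_command("jupyter"))` run with the venv's own python shows whether `jupyter` resolves to the environment or to some other install.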
It is very often not as simple as going to your target system, cloning the repo and running a single-line command that gives you the running software. This is what e.g. Rust's cargo would do.
The problem with python venvs is that when problems occur, they require a lot of deep knowledge very fast and that deep knowledge will not be available to the typical beginner. Even I as a decade long python dev will occasionally encounter stuff that takes longer to resolve than needed.
I would recommend using the virtualenvs.in-project setting so Poetry generates the venv in the project folder and not in some temporary user folder.
My experience with poetry has been great. I only disliked that they had auto-update on when locking files, but they changed the default.
(Seriously, I’ve gotten so fed up with Python package management that I just use CondaPkg.jl, which uses Julia’s package manager to take care of Python packages. It is just so much cleaner and easier to use than anything in Python.)
- use miniconda ONLY to create a folder structure to store packages and to specify a version of python (3.10 for example)
- use jazzband/pip-tools' "pip-compile" to create a frozen/pinned manifest for all my dependencies
- use pip install to actually install libraries (keeping things stock standard here)
- wrap all the above in a Makefile so I am spared remembering all the esoteric commands I need to pull this all together
in practice, this means once I have a project together I am:
- activating a conda environment
- occasionally using 'make update' to invoke pip-compile (adding new libraries or upgrading), and
- otherwise using 'make install' to install a known working dependency list.
Thanks!
Python: we’re going to force all packages from all projects and repos to be installed in a shared global environment, but since nobody actually wants that we will allow you to circumvent that by creating “virtual” environments you can maintain and have to deal with instead. Also remember to activate it before starting your editor or else lulz. And don’t use the same editor instance for multiple projects. Are you crazy???
Also: Python “just works”, unlike all those other silly languages.
Somebody IMO needs to get off their high horse. I can’t believe Python users are defending this nonsense for real. This must be a severe case of Stockholm-syndrome.
Discussion from 2021: https://news.ycombinator.com/item?id=25611307
All these different languages have their own approach and each then also user/global/multiple versions...it's just not worth figuring out
LXC creates environments, while Docker creates apps, is another way to say it.
Depends on task though I've got dockers and VMs in use too
python3.10 -m venv ./venv # or your favorite version
. ./venv/bin/activate
pip install pip-tools
Manage dependencies using pip-compile from pip-tools. Store direct dependencies in "requirements.in", and "freeze" all dependencies in "requirements.txt" for deployment:
. ./venv/bin/activate
pip-compile -U -o ./requirements.txt ./requirements.in
pip install -r ./requirements.txt
Is the author saying that relocating them will actually break things, or that it's just as easy to recreate them in a different location? Because I've moved my venv directories and everything still seemed to work OK. Did I just get lucky?
The real way to move venvs is to freeze the venv (i.e. make a requirements.txt) and then pip install -r requirements.txt to recreate the venv.
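If you are curious what the "freeze" step boils down to, here is a rough stand-in written against the standard library. It is a sketch, not what `pip freeze` actually does: the `freeze` function is a hypothetical name, and it ignores editable installs and packages installed from URLs.

```python
from importlib import metadata


def freeze() -> list[str]:
    """Return 'name==version' lines for every distribution visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Writing `"\n".join(freeze())` to a file gives something close to a requirements.txt for simple environments.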
This process is really the only thing about venvs that ever causes me trouble.
I've given up.
EDIT: also just finding myself reaching for go in most cases
FreeBSD ports are significantly closer to "what the repo has, localized" where it feels like linux apt/yum/flat is "what we think is the most convenient thing to bodge up from the base repo, but with our special sauce because <reasons>"
It seems that a virtual environment created by Poetry looks very similar, except that it doesn't contain an `include` directory. It contains:
* `bin` directory
* `lib/<python-version>/site-packages/` directory
* `pyvenv.cfg`
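You can reproduce this layout with the standard `venv` module and look at `pyvenv.cfg` directly. A minimal sketch, assuming a throwaway parent directory; `read_pyvenv_cfg` and `create_and_inspect` are made-up names, and the exact set of top-level entries varies by Python version and platform.

```python
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir: Path) -> dict[str, str]:
    """Parse the 'key = value' lines of pyvenv.cfg into a dict."""
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def create_and_inspect(parent: Path) -> tuple[list[str], dict[str, str]]:
    """Create parent/.venv without pip and return its top-level entries and config."""
    env_dir = parent / ".venv"
    venv.create(env_dir, with_pip=False)  # like `python -m venv --without-pip .venv`
    top_level = sorted(entry.name for entry in env_dir.iterdir())
    return top_level, read_pyvenv_cfg(env_dir)
```

Calling `create_and_inspect(Path(tempfile.mkdtemp()))` returns the directory listing plus the parsed config, including the `home` key that points back at the base interpreter.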
"Package management in python is so easy, just use [insert tool or workflow that's different to literally every other comment in the thread]."
I prefer to use a combination of pip-tools and pyenv for my projects
> So while you could install everything into the same directory as your own code (which you did, and thus didn't use src directory layouts for simplicity), there wasn't a way to install different wheels for each Python interpreter you had on your machine so you could have multiple environments per project (I'm glossing over the fact that back in my the day you also didn't have wheels or editable installs).
This is a single run-on sentence. Someone reading this, probably doesn't know what "wheels" means. If you are going to discount it anyway, why bring it up?
> Enter virtual environments. Suddenly you had a way to install projects as a group that was tied to a specific Python interpreter
I thought we were talking about dependencies? So is it just the interpreter or both or is there a typo?
> conda environments
I have no idea what those are. Do I care? Since the author is making a subtle distinction, reading about them might get me confused, so I've encountered another thing to skip over.
> As a running example, I'm going to assume you ran the command py -m venv --without-pip .venv in some directory on a Unix-based OS (you can substitute py with whatever Python interpreter you want
Wat? I don't know what venvs are. Can you maybe expand without throwing multi-arg commands at me? Maybe add this as a reference note, rather than inlining it into the information. Another thing to skip over.
> For simplicity I'm going to focus on the Unix case and not cover Windows in depth.
Don't cover Windows at all. Make a promise to maintain a separate doc in the future and get this one right first.
> (i.e. within .venv):
This is where you start. A virtual environment is a directory, with a purpose, which is baked into the ecosystem. Layout the purpose. Map the structure to those purposes. Dive into exceptional cases. Talk about how to create it and use it in a project. Talk about integrations and how these help speed up development.
I also skipped the plug for the microvenv project at the end, with a reference to VSCode.
Charitably, I will assume you are a non python user, and that's why this is a miss for you.
Anyways. Today I have to help scientists to deal with it. And... I didn't think it was possible to be worse than pip or other Python tools, but they convinced me it is. Conda is the worst Python program of note that I had ever dealt with. It's so spectacularly bad it's breathtaking. You can almost literally take a random piece of its source code and post it to user boards that make fun of bad code. When I have a bad day, I just open its code in a random file, and like that doctor who was happy running around the imaging room exclaiming "I'm so, so, so happy! I'm so unimaginably happy that I'm not this guy! (pointing at an X-ray in his hand)" I'm happy I'm not the author of this program. I would've just died of shame and depression if I was.
Simple in the sense that it's actually simple: the software you need can be installed with pip install, with precompiled binaries for your platform when necessary; it supports Python 3.something+; and all its dependencies are either >= version, or the version is >= x.y <= x+1.0
Then there's simple as in the software is actually incredibly useful but is an absolute nightmare of complicated dependency trees where only specific pinned minor versions work together, you need multiple incompatible compiler toolchains and distro packages, it only works if you have CUDA, precompiled binaries exist for some but not all and if you use the precompiled binaries then it changes the dependency story, if you want jupyter support that's a whole different thing AHHHHHHHHHHHHH
In that case some people with more time than sanity said fuck it we'll make it work and conda was born. For me it's a lifesaver when you want to use a piece of software but I wouldn't ever dare deploy production software with it without it being strongly isolated from everything else.
conda exists because it deals with an entirely different package registry and is a whole distro on its own (I don't know why people need that either, my vague impression is that science-y types want complete pushbutton installation, typing a command == fail, I guess, I don't know).
poetry exists because it does some kind of automated version management thing (as did pipenv), that I'm sure is nice but not something I've ever needed, but the buzz got out and it took over, people who have never typed "virtualenv" use poetry and they have no idea why.
Also on M1/M2 Macs some libraries (especially for ML) are only available through conda-forge.
Ironically, given the usual software dev experience on Windows vs. Unixy systems, this is not a problem with the standard install on Windows with the py launcher.
plain venv never advertised itself as a solution to this problem... I don't like this tool, but, sorry to say so, you are barking up the wrong tree.
Also, it's not hard to maintain different versions of Python on the same machine without conda. I can literally do this with my eyes closed w/o touching the mouse: it boils down to:
cd ~/src/cpython
git checkout 3.8
git fetch --all
git reset --hard origin/3.8
git clean -Xdf
./configure --with-ensurepip=install --with-system-ffi=yes
make
sudo make altinstall
Sorry. I've done this from memory and I kept one eye open. So, I kinda lied. But not really.
As long as you always remember to run exactly the right two commands every time you work on any project and never run the wrong ones, or run project A's commands in project B's terminal. There's no undo BTW, if you ever do that you've permanently screwed up your setup for project B and there's no way to get back to what you had before (your best bet is destroying and recreating project B's venv, but that will probably leave you with different versions of dependencies installed from what you had before).
(And as others have said, that doesn't cover multiple versions of Python, or native dependencies. But even if your project is pure python with only pure python dependencies that work on all versions, venv+pip is very easy to mess up and impossible to fix when you do)
If you need performance, just use native code.
I have these aliases in my .bashrc, and I can't remember the last time I had a major issue.
alias venv='rm -rf ./venv && virtualenv venv && source ./venv/bin/activate'
alias vact='source ./venv/bin/activate'
alias pinstall='source ./venv/bin/activate && pip install . && pip install -r ./requirements.txt && pip install -r ./test_requirements.txt'
I don't have all the fancy features, like automatically activating the virtualenv when I cd into the directory, but I've always found those to be a bigger headache than they are worth. And if I ever run into some incompatibility or duplicate library or something, I blow away the old venv and start fresh. It's a good excuse to get up and make a cup of tea.
To this day I'm not quite sure why the venv developers decided that sourcing was a good idea; all it does can be effectively replaced with
#!/bin/sh
export VIRTUAL_ENV="path to venv"
export PATH="$VIRTUAL_ENV/bin:$PATH"
unset PYTHONHOME
exec "$SHELL"
Just run this script to get into an "activated" shell. To deactivate, just press Ctrl+D. If you're really fancy, you can replace the last line with
exec "${@:-$SHELL}"
to run a command directly in the activated environment (and then deactivate it immediately).
That would be python2, in 3 it's "python -m venv venv" (first venv is package to run, second is directory to put it in)
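The environment-variable version of activation from the shell script a few lines up translates directly to Python, which is handy when launching tools from another program. This is a sketch under the same assumptions as that script (set VIRTUAL_ENV, prepend the venv's script directory to PATH, drop PYTHONHOME); `activated_env` and `run_in_venv` are hypothetical names.

```python
import os
import subprocess


def activated_env(venv_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with the venv 'activated'."""
    env = dict(os.environ if base_env is None else base_env)
    venv_dir = os.path.abspath(venv_dir)
    bin_dir = os.path.join(venv_dir, "Scripts" if os.name == "nt" else "bin")
    env["VIRTUAL_ENV"] = venv_dir
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)  # same effect as `unset PYTHONHOME`
    return env


def run_in_venv(venv_dir: str, command: list[str]) -> int:
    """Run command with the venv activated only for that child process."""
    return subprocess.run(command, env=activated_env(venv_dir)).returncode
```

Nothing needs deactivating afterwards, because the parent process environment is never modified.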
Otherwise yeah, it's the same and I also use it manually. Never had any problems.
[1] https://github.com/python-virtualenvwrapper/virtualenvwrappe...
It was a nightmare.
Never had a single problem, venv + pyenv is a great combo. As far as I can tell, like so many sources of frustration in tech, the issue typically lies with user error/not fully understanding the tool you're using. That isn't saying that there isn't room for improvement -- most notably, package management in Python flies in the face of "there should be one -- and preferably only one -- obvious way to do it" -- but the tools we have work quite well.
[1] https://github.com/pyenv/pyenv [2] https://github.com/pyenv/pyenv-virtualenv
Using virtualenv directly has also been my approach, and has not failed me yet.
I also used Poetry for one of my personal projects, and I liked what I saw.
I should learn to use venv properly
Thanks
I'm now curious whether there are languages out there that do have a really nice packaging system.
https://hpc.guix.info/blog/2021/09/whats-in-a-package/ does a good job of explaining why installing packages like that is a complete shitshow.
i dont need them demystified, i need someone smarter than me to just tell me what to do lol
It doesn't really matter, by the time you sit down and use it you'll find whatever that is, has also been deprecated and replaced by 2 more.
Dockerfile ;)
Python is the first language that I've used, where the user community is a major attraction, resulting in significant inertia. Replacing Python requires a new language and a new community. Also, the tools that helped build that community, such as Google and Stackoverflow, have (by some accounts) deteriorated.
If package management is that bad, then yeah, time to switch languages.
No high horse here. Like Sancho Panza, I have to be content with an ass. ;-)
Python is 30 years old and back then system packages were more common, that's it. It's been a bumpy ride and I'd prefer if there was a standard manager, but the community has produced some decent ones.
The python community has always tried to make devs happy. Pip has an amazing number of libraries. Virtual envs were included into the standard tools. Pipenv was integrated in the python github organization. The org doesn't hate virtual envs!
I don't do anything you mention, so there must be a simpler way.
* External state is changing such as project contributions
So it's not a one-off unless the project and dev environment are static. The real problem is different tooling doing different amounts of hand holding and automation. Your editor may configure some things automatically, brew may configure some things automatically, and so a set of instructions for setup or problem fixing could be voided without the end user knowing. So now you're off on an adventure of unknown duration wading through internet forums trying to find the resolution that works for you.
Ironically, using Docker to isolate the environmental changes is an approach some people use to avoid some of this esoteric crap.
My typical count is 2-3 per project per machine I work on (could be anywhere from one to one hundred). Then there's a different number of people who need to set up these environments on different machines too (and sometimes require my support).
So, the answer is: who knows?
> putting everything in a docker container
I think, you meant image, not container, but that's an easy mistake to make. And the answer is: both options are terrible. It's a choose your poison kind of thing.
> cool kids
My impression from working with the cool kids is that our industry selects for particular taste in t-shirts rather than knowledge or experience. I'm afraid that the correlation might swing into the negative, if we take either knowledge or experience vs coolness.
---
Most importantly: venv is not the problem. It's a bad fix for a problem. Bad as in it doesn't fix the problem actually, it pretends to. I mean, maybe it covers some 90% of the problem -- who knows, I didn't count. So, it kinda works for a lot of people. But, honestly, I'd prefer that the problem was fixed s.t. it doesn't require venv. It's kind of like discussing various dental prosthetics options: it's better to just have healthy teeth.
I have yet to come across a situation where I need a virtual environment at all. A lot of projects use it, but then lazy me just runs git clone && python3 clone/main.py and it works just fine, sometimes after an apt install python3-somedependency or two.
It always seemed weird to me to depend on such a specific version ("package foo can't be older than 7 months or newer than 4 months"), how even does one manage to use such obscure features that they were removed and so you need an older-than version?
And then there's the people in this thread (seemingly a majority when judging by top level comment loudness) that have trouble with virtual environments and so add another layer on top to manage that virtual environment thing. The heck. Please explain
Maybe everyone has come to think we need layers on layers on layers because management tools (like venv) are blogged and talked about, whereas it's a bit dull to write/talk about nothing (i.e. not using/needing such tools)? I genuinely wonder
Yes, absolute paths are hardcoded in several places.
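If you want to see where the path ends up, a quick scan of the environment's script directory is enough. A minimal sketch, assuming a venv created by the standard `venv` module; `files_mentioning_path` and `demo` are hypothetical names, and it only looks in `bin` (or `Scripts` on Windows), not in site-packages.

```python
import os
import tempfile
import venv


def files_mentioning_path(env_dir: str) -> list[str]:
    """List files in the venv's script directory that contain env_dir verbatim."""
    env_dir = os.path.abspath(env_dir)
    script_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
    needle = env_dir.encode()
    hits = []
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # skip the python symlinks and any directories
        with open(path, "rb") as handle:
            if needle in handle.read():
                hits.append(name)
    return hits


def demo() -> list[str]:
    """Create a throwaway venv (without pip) and report which scripts embed its path."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "venv")
        venv.create(env_dir, with_pip=False)
        return files_mentioning_path(env_dir)
```

Console scripts installed later by pip get a shebang pointing at the environment's interpreter too, which is one more place a moved directory goes stale.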
I actually have a use case for copying/relocating them (for https://apibakery.com), and simple search/replace of paths across the entire venv works, but I wouldn't recommend that as a best practice approach :-)
We ended up making new environments for each. Honestly it’s a bit of a mess.
I don’t know which problem python intended to solve by separating the two, but it doesn’t occur often in contemporary software engineering work.
Having said that, the point you make is valid and Poetry is a good option, but it feels so maddening having to learn about like seven different tools which all do more or less the same but not quite, and everyone and their mother having an opinion on which is the best. Doesn’t help that there’s an arbitrary yet blurry line where package managers end and environment managers begin.
Side note, there are way too many React/React-Native "router" type packages, and at least one of them breaks its entire API every update (I think https://reactrouter.com/en/main/upgrading/v5, how are they on version 6 of this). It's so bad that you can't even Google things anymore cause of the naming conflicts.
Beginner-friendly is to avoid all this complexity in the first place. I think it's a mistake to mention venvs to newbies until they are comfortable with python, paths, links, and environment vars.
Because if you need performance, you need to use native code.
That's just for managing your python installation and virtualenv though. You still need to manage your packages and for that you have options like requirements.txt, pipenv (not pyenv lol), Poetry, and others.
In practice, pip may not honor your requirements.txt the way you think it would. Even if you require exact versions of packages (which is something you shouldn't do for programs / libraries). This is because pip will install something first, with its dependencies, and then move to the next item, and then this item may or may not match what was already installed.
The reason you don't run into situations like this one often enough to be upset is because a lot of Python projects don't survive for very long. They become broken beyond repair after a few years of no maintenance. Where by maintenance I mean constant chasing of the most recent set of dependencies. Once you try to install an older project using pip and requirements.txt, it's going to explode...
I haven't tried out virtual environments yet.
I do remember python2-only being a problem back in the day, but this was solved... hmm, maybe in 2017 somewhen? At least for the packages I used then that had py3 issues before, like sklearn, numpy, and scapy come to mind. I think it more or less coincided with Debian deciding the next release was not going to ship Python 2. Somehow that made everyone 2to3, fix a few remaining bugs, release, done. I'm too young (I'm 30) to really have done much with Python 2 so I didn't have this legacy in my own code (besides a few early files), I've ~always just tried to find and install Python 3 versions of everything.
Hahaha... (rolls on the floor) Do you want to know why that is? No seriously? I'm not laughing at you as much as I'm laughing at Python now, but hey, well, anyways, do you want to know why that happened to you? I know you don't. But I'll tell you anyways!
Until quite recently, pip didn't give a rat's ass if the dependencies it installed were consistent. It would blink a message in the long stream of vomit it spills on the screen saying something like "you have package X installed of version Y, but package Z wants X of version Q, which will not be installed". And happily streamed more garbage to your screen.
It was an issue that was filed against pip for something like 12 years until it got resolved about a year or so ago. Even after it got resolved a lot of people tried to upgrade, saw that that would "break" their deployment, and rolled back to the latest broken version.
Things are sort of improving gradually since then, but we are light years away from the system working properly, and I know you don't want to know why, but I'll tell you anyways!
So, when for whatever reason pip doesn't find a dependency it thinks you need, a lot of packages, when they roll out their "releases", they upload also what Python calls "source release". Which should have never been treated as an installation option, but it is, and is treated like that by default. So, what will happen once pip finally gives up on finding a match, right, you guessed it! -- It's going to try to build it! Installing build dependencies along the way. What you get in the end is anyone's guess, but most likely, it's something broken because the developers who made this release didn't make a release specifically for your version.
Don't despair. There's a flag you can use with pip install that should prevent it from trying to build stuff. But two bad things will happen to you if you use it: in any non-trivial project your dependencies will irreparably break. And, who knows if that flag is implemented correctly... nobody in the real world is using that. So, who knows, maybe it'll format your hard drive along the way.
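The comment doesn't name the flag; presumably it is pip's `--only-binary :all:`, which tells pip to refuse source distributions. A small sketch of invoking it from Python, with hypothetical function names, that only builds and runs the command line:

```python
import subprocess
import sys


def binary_only_install_command(requirements_file: str) -> list[str]:
    """Build a pip command that installs from wheels only, never from sdists."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",
        "-r", requirements_file,
    ]


def install_binary_only(requirements_file: str) -> int:
    """Run the command and return pip's exit code."""
    return subprocess.run(binary_only_install_command(requirements_file)).returncode
```

If any pinned dependency has no wheel for your platform, the install fails instead of falling back to a build, which is exactly the trade-off the comment is complaining about.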
Python's problem isn't with source distributions as such, but with really bad metadata control (and excessively permissive approaches to metadata generation, like arbitrary Python setup files). Better metadata makes source-based language package management work just fine in every other language's ecosystem; much of the effort in Python packaging over the last ~8 years has been slowly turning Python in that direction.
One doesn't preclude the other. I'm not against having a mechanism for automating source installs (like this is done in, eg. RHEL-based distros), but it's insanity if you allow this to happen by default. You may not remember Bumblebee deleting /usr while running some innocuous-looking code during install, but things happen... really bad things...
Things don't need to happen all the time in order for them to be scary. It's enough to have possible catastrophic consequences, even if the event itself is rare.
> Better metadata makes source-based language package management work just fine in every other language's ecosystem
I haven't seen a single one, and I used dozens at this point. This is never a good idea. It's OK to do source installs for development, it's never a good idea to do source installs for deployment. It "works" in other places because of how it's presented (i.e. nobody expects this to be the method of software delivery to the end user). Like, eg. in Cargo (Rust): you, as a developer, download sources and build programs from all the sources on your computer, but your user gets a binary blob they put on the system path and run. It would be insanity and a security nightmare if users were supposed to compile program code before they could run it. The select few who can audit what's being downloaded and how it's been compiled would probably manage, the rest would become victims of all sorts of scams or just random failures propagating beyond their builds into their systems.
> much of the effort in Python packaging over the last ~8 years has been slowly turning Python in that direction.
I'm sorry, but PyPA is managed by clueless people. Whatever they do there only breeds more insanity over time. They neither have a general direction where they want to take the packaging system, nor do they understand the fine details of it. They are also bombarded by insane requirements for useless and harmful features, which they are often quick to implement... It's a circus what's going on there. I lost hope years ago, and now I've become an accelerationist. I just like to see it burn and people run around screaming while their backs are on fire. I get paid to fix this mess. So, PyPA's incompetence is my job security.
I've encountered them with other languages and they're the sort of thing where one time is more than enough to make me feel like it could get me fired; they're Never (with a capital N) okay imo
(Sorry, not a python expert)
That said, re-creating a venv shouldn't be hard and if it is you're doing something wrong in your development setup.
1. Copy my code out from the venv folder
2. Delete the venv folder
3. Make a new venv
4. Copy my code back into the new venv folder
5. Re-install dependencies
This doesn't take much longer than 60 seconds, but that's 55 seconds more than I want to spend. How is this a good process? It just makes me avoid using python (at least when I'd need anything outside the standard library).
Is there a simple command that will do this all for me?
Note that I don't typically have a git repository or similar set up because I use python for very simple semi-throw-away scripts. I just want to be able to rename the containing folder and have my script still work.
pip install <my_entire_universe_of_requirements>
pip freeze > constraints.txt
And now in any new environment: pip install <any_subset_of_requirements> -c constraints.txt
Now you can install prod requirements or dev requirements or whatever other combination of requirements you have, and you are guaranteed to get the exact same versions of those packages, no matter what your transitive dependencies are doing.
You can use pip-compile from pip-tools if you want the file to include exact hashes.
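To see what ends up in such a constraints file, here is a rough sketch that splits `pip freeze` output into plain `name==version` pins and everything else (editable installs, direct references). The function name, the sample text, and the `example.invalid` URL are made up for illustration, and the name handling is simplified to lowercasing:

```python
def parse_freeze(text):
    """Split `pip freeze` output into ({name: version}, [other lines])."""
    pins, skipped = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        is_editable = line.startswith("-e")
        is_direct_ref = " @ " in line
        if "==" in line and not is_editable and not is_direct_ref:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
        else:
            # Not a plain pin: keep it aside for manual review.
            skipped.append(line)
    return pins, skipped


SAMPLE = """\
numpy==1.26.4
-e git+https://example.invalid/repo.git#egg=mypkg
localpkg @ file:///tmp/localpkg
requests==2.31.0
"""

pins, skipped = parse_freeze(SAMPLE)
print(pins)
print(skipped)
```

The skipped lines are the ones worth looking at before reusing the file with `-c`.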
Also pip freeze does not emit a constraints file, it emits (mostly) a requirements file. This distinction is rarely important, but when it is, it can cause a lot of problems with this workflow. For example, a constraints file cannot include any information about which extras are installed, which pip freeze does by default. It also can't contain local or file dependencies, so if you have multiple projects that you're developing together it simply won't work. You also can't have installed the current project in editable mode if you want the simple "pip freeze" workflow to work correctly (although in practice that's not so difficult to work around).
Pip-tools does work a bit better, although the last time I used it, it considered the dependency chains for production and for development in isolation, which meant it would install different versions of some packages in production than in development (which was one of the big problems I was trying to solve).
From my experience trying basically every single option in the packaging ecosystem, there aren't really any solutions here. Even Poetry, which is pretty much best-in-class for actually managing dependencies, struggles with workspace-like installations and more complicated build scripts. Which is why I think pretty much every project seems to have its own, subtly unique build/dependency system.
Compare and contrast this with, say, NPM or Cargo, which in 95% of cases just do exactly what you need them to do, correctly, safely, and without having to think about it at all.
They're constraints, not dependencies. They don't need to be installed, and you can just update your requirements as you need and regenerate them.
> Also pip freeze does not emit a constraints file, it emits (mostly) a requirements file. This distinction is rarely important, but when it is, it can cause a lot of problems with this workflow. For example, a constraints file cannot include any information about which extras are installed, which pip freeze does by default
Pip freeze does not use extras notation, you just get extra packages listed as individual dependencies. Yes there is an important distinction between constraints and requirements but Pip freeze uses an intersecting subset of the notation.
> You also can't have installed the current project in editable mode if you want the simple "pip freeze" workflow to work correctly
That's why the workflow I gave to generate the constraints didn't use the -e flag, you generate the constraints separately and then can install however you want, editable or not.
> From my experience trying basically every single option in the packaging ecosystem, there aren't really any solutions here. Even Poetry, which is pretty much best-in-class for actually managing dependencies, struggles with workspace-like installations and more complicated build scripts. Which is why I think pretty much every project seems to have its own, subtly unique build/dependency system.
People have subtly different use cases that make a big impact on what option is best for them. But I've never been able to fit Poetry into any of my use cases completely, whereas a small shell script to generate constraints automatically out of my requirements has worked exceedingly well for pretty much every use case I've encountered.
It's still not great though, since that only pins version numbers, and not hashes.
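For reference, pip's hash-checking mode expects lines of the form `name==version --hash=sha256:<digest>`. A small sketch of producing such a line for an artifact you have already downloaded; the helper name and the package name, version, and path you pass in are placeholders:

```python
import hashlib
from pathlib import Path


def pinned_line_with_hash(name, version, artifact_path):
    """Build a requirements line that pins both the version and the file hash."""
    digest = hashlib.sha256(Path(artifact_path).read_bytes()).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"
```

A file made of such lines can then be installed with `pip install --require-hashes -r requirements.txt`. Tools like pip-compile generate this for you, so this is only to show what the pin consists of.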
You probably don't want to manually generate requirements.txt. Instead, list your project's immediate dependencies in the setup.cfg/setup.py file, install that in a venv, and then 'pip freeze' to get a requirements.txt file. To recreate this in a new system, create a venv there, and then 'pip install -c requirements.txt YOUR_PACKAGE'.
The whole thing is pretty finicky.
pip freeze > requirements.txt
if you want. The only catch is you should be using a similar architecture and Python version in both development and production.
This would also pin a few non-project dependencies such as `disttools` but that shouldn't be a problem.
Edit: TIL that pip constraints is a thing. See the comment posted by oblvious-earth for an even better approach.
Also doesn't play nice with publishing to custom pypi destinations (e.g. self-hosted Gitlab) in my experience. I could track down the issue but the code around was clearly a mess so that I gave up on that one.
Probably bad documentation/tutorials.
There is literally nothing about a venv that somehow magically makes it impossible to still work after relocation. Breaking the venv on relocation was a conscious choice that has been insisted on to this day for no good reason other than "a long history of not bothering to fix this nonsense is all the justification we need to continue not fixing this nonsense".
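What breaks on relocation is mostly plain text: the `activate` scripts and the shebang lines of installed console scripts embed the absolute path the venv was created at. A rough sketch that lists which files in a venv's script directory still mention a given path; the function name is hypothetical, and it only looks in `bin`/`Scripts`, not everywhere a path could hide:

```python
from pathlib import Path


def files_mentioning_path(venv_dir, old_path):
    """List files in the venv's script directory that embed `old_path`."""
    needle = str(old_path).encode()
    hits = []
    for sub in ("bin", "Scripts"):  # POSIX and Windows layouts
        script_dir = Path(venv_dir) / sub
        if not script_dir.is_dir():
            continue
        for entry in sorted(script_dir.iterdir()):
            if not entry.is_file():
                continue
            try:
                data = entry.read_bytes()
            except OSError:
                continue  # unreadable file, skip it
            if needle in data:
                hits.append(entry.name)
    return hits
```

Running it against a moved venv with the old location as `old_path` shows which scripts would need rewriting.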
...you know...
...fix virtual environments?
project
├─ venv
|  ╰─ ...
├─ pyproject.toml
╰─ project
   ├─ __init__.py
   ├─ __main__.py
   ╰─ app.py
Which means recreating the venv is as easy as removing the venv folder, creating a new venv, and running `pip install -e .` when using pyproject.toml or `pip install -r requirements.txt` when using a requirements file.
This of course doesn't quite solve the moving-the-folder issue, for which unfortunately there isn't an amazing solution currently. One thing you can do is have the venv somewhere else entirely. That way you can keep the venv in a fixed place so it doesn't break but still move the code to wherever you want to put it. For tiny scripts like yours you might be better served not using a venv, and just using `pip install --user` for all your packages. Which is a bit messy but has served me for years and years before I landed on the pattern I use now.
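The delete-and-recreate step can also be done from Python with the standard library's `venv` module, roughly what `python -m venv --clear` does. A minimal sketch; `recreate_venv` is a made-up helper name and the directory you pass is up to you:

```python
import venv
from pathlib import Path


def recreate_venv(env_dir, with_pip=True):
    """Wipe whatever is in `env_dir` and build a fresh virtual environment."""
    # clear=True deletes the existing contents of the target directory first.
    builder = venv.EnvBuilder(clear=True, with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir) / "pyvenv.cfg"
```

Dependencies still have to be reinstalled afterwards with the new environment's own pip, for example `venv/bin/python -m pip install -e .` on a POSIX system.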
Another "unfortunately" is that none of this stuff is documented very well. Writing a working pyproject.toml for example requires switching between the PEP introducing them, the pip documentation, and the setuptools documentation.
Also, you don't have to use them.
I asked what's wrong with a venv and I got a rant…
That's a pretty serious downside to the language. Virtual environments are needed to help people deal with that downside.
The normal, permanent, stuff gets installed system wide the normal way, with apt.
I didn’t brick the machine, just the ability to setup a typical python venv.
Also LiveCDs are a thing for about twenty years. Recovery has never been easier, even after hardware failure.
I don't use that as a reason to choose what I'll use in my projects, that's decided by the PTSD incurred from 7 years of php.
On Mac I simply use the official installers from python.org.
> Recovery has never been easier, even after hardware failure.
If you can use a LiveCD to repair it, it most likely wasn't a hardware failure to start.
Essentially, yes, and justifiably so. Try installing science-y Python packages on Windows written in C. When conda was created in 2012 this meant installing Visual Studio 2005 (for Python 2.7) which was hard to find on Microsoft's own website even back then.
Ah, I found your bug!
The Python Package Index now supports binary wheel files for all platforms, and SciPy is there https://pypi.org/project/scipy/ with a few dozen distros.
Is the problem solved yet?
Because often python is only one part of the puzzle, we also need a variety of third party libraries and maybe R as well (along with R packages). And we would rather not have to explain to another researcher how to install it all.
It started in a way similar to how Active Perl started. An attempt to build a reliable, curated and "batteries included" solution. (But it went downhill very quickly).
So, what were the problems:
* Most importantly, using Python on Windows is a pain (it still is, but less so). Many Python packages rely on native libraries which they don't provide. Even if they ship with some binary blob, that blob is often just a wrapper to the actual library. But, at the time Conda was created, even a binary blob was not an option. If you wanted to install stuff, you had to compile it. And, compiling on MS Windows is a world of pain. Especially, if you compile something for Python. Again, credit where it's due, Python sort of fixed some of these problems... but just a tiny bit. It still sucks a lot. So, conda is more of a MSYS2 really: it's a Linux user-space in Windows (firstly), and Python on top of that. But, it kind of also doesn't want to admit it, so, it still pretends it's Windows... but with a bunch of Linux-ey stuff...
* Secondly, pip was a joke until about a year ago. It was not really a package manager (it still isn't) and it couldn't properly solve dependencies (it sort of does today, but poorly because of source-distributions). Conda, when it worked, was at least somewhat consistent. (But usually, it just hangs). Also, conda is a package manager (however awful). I.e. with pip you can run install once, and then you can run a different install command again in the same environment, and god help you to make sure you have consistent dependencies between packages. Conda, in principle, should always keep your environment consistent. (But it hangs more often than not). Also, there's no such thing as installing from source with conda. If you want to use source: git clone, install conda-build and suffer through non-existent documentation and get help from non-existent community of people building conda packages.
* Conda provides a curated set of packages (a.k.a default channel). Anyone worth their salt ditches that channel the moment they install conda and installs everything from conda-forge (cuz its got more stuffs!) So, they tried curated. Didn't work.
---
In a sense, I think that what happened is that conda was too ambitious of a project, using wrong technology to implement it. It also didn't have good enough planning for the kind of ambition they had. So, every time they came to solve a real problem, they didn't have time, human resources and system resources to solve it well. They've accumulated technical debt at close-to-light speed and ended up being a huge mess nobody knows how to deal with.
Some of the obvious mistakes would be: creating too many releases of conflicting packages. They should've worked on some sort of LTS solution, where they release a set of packages with very permissive dependencies of very few versions that had been proven to work well together. Instead it's very common for conda packages to be very peculiar (and without real need to do so) about the versions of their dependencies.
Conda people didn't build good CI. They often release absolutely broken packages, and only discover it retrospectively from community input. (Tensorflow would be a great example). This creates a lot of problems with automatic updates.
Conda gave in to community pressure and decided to build integration with pip. It doesn't work well, and it's not possible to make it work well, but they added this and a lot of people instantly created dependencies on this feature.
Conda picked a bone with some of the infra tools outside of Python ecosystem. In particular with CMake. This just adds an extra aspect of pain trying to build conda packages / work with native libraries / wrappers. It might seem like a minor thing, but it prevented a lot of people who might otherwise release packages both for conda and PyPI from doing so. Often package that is released to conda is ported by someone who's not the package author. It also means that sometimes the names of packages differ between conda and PyPI for the same package.
----
NB. The number of commands you need to type when working with conda versus PyPI tools is not noticeably different. Conda is, perhaps, more organized, but is also more buggy due to trying to integrate with various shells in special ways, and failing quite a bit.
Yes, there's also other options like Yarn, which have typically had newer features and different approaches, but pretty much everything that works has been folded back into NPM itself. Unless you really want to live at the bleeding edge for some reason, NPM is perfectly sufficient for all your needs.
In contrast, the closest thing to that in the Python ecosystem is Poetry, which does a lot of things right, but is not supported by Python maintainers, and is still missing a handful of things here and there.
I'm not saying the JS ecosystem as a whole is perfect, but for packaging specifically, it's a lot better than Python.
That's a good way to never get vulnerabilities fixed.
It hardly seems like "the right thing" to me.
But when it comes to packages and "virtual envs" the Javascript ecosystem is leaps and bounds better.
That being said, the Javascript ecosystem has standardized tooling to handle that (npx) that Python doesn't (I wish pipx was part of standard pip), they just pick the convenient footgun approach.
It's one of the major sources of dissatisfaction with Python!
https://peps.python.org/pep-0582/
Sample comment from its discussion:
>> Are pip maintainers on board with this?
> Personally, no. I like the idea in principle, but in practice, as you say, it seems like a pretty major change in behaviour and something I’d expect to be thrashed out in far more detail before assuming it’ll “just happen”.
As if the several half-arsed official solutions already existing around packaging (the several ways to build and create packages) had deep thinking and design behind them...
I’ll often look for command line tools written in Rust, but not because of Rust fanboyism, but because I know I can just git clone the project and immediately start hacking on a new feature I need or a quick bug fix. In almost every other language I have to jump through one million hoops before I can build and run whatever it is, let alone have a nice developer experience (autocomplete, go to definition, etc).
In my experience, Java, Go, PHP, NodeJS have all got similar package management that works.
https://learn.microsoft.com/en-us/nuget/reference/nuget-exe-...
Rust's system is probably the next best.
ObjC/Swift packaging is a flaming disaster in practice, unless it's improved since I jumped that ship. Last time, I remember every single project having to rely on Cocoapods.
I see anyone sticking with CJS syntax the same way I see Python devs who continue writing 2.7 code by choice in new projects and not because they are maintaining older projects.
Sure tree-shaking and browser support are nice, but they didn't have to make the syntax this complicated to achieve that. Not an issue with other languages.
Swift got fixed over time (which is why every basic SO question has 20 different answers for each Swift version), but it still sucks, and so does UIKit, and Xcode. That whole toolchain has been relegated to being just a dependency behind React Native for me. I mean look at the shitfest involved just to get a substring https://stackoverflow.com/questions/39677330/how-does-string...
I'd even dare to say that Docker _is_ the answer to the Python's packaging problems, and might have never taken off without that "killer app" that is the Python packaging sh*tshow.
But... I'm not going to hold you in the dark: the reason and the major drive to have a packaging system is that you can define dependencies between packages s.t. users installing package can coordinate and install the stuff they need. Docker simply doesn't do that. You get images. They are completely independent. Whether any two images will work with each other is anyone's guess.
Docker doesn't solve Python's packaging problems by being a better packaging system. It solves it by sidestepping the issue of users installing packages themselves, and shipping the disk of the one machine where it once worked.
It's a bad linker.
(not a fan of the condescending tone in your post, btw)
I’ve always seen it recommended to use pyenv or just download directly from Python.org instead.
Python barely survived becoming the default scripting language on Red Hat and other Linux distros which was a major obstruction to the Python 3 transition. If the new cohort of pandas and scikit-learn users had not been such a force of nature we'd be talking today about Python the way we do about Perl.
Not installing venv is a serious crime on the part of Debian as beginners don’t need any more excuse to trash their system with site-local directories and other wrong answers for how to manage packages in Python.
The only time I've ever had a big mess with broken Python installs was after using brew on Linux - luckily killing off brew brought the system Python back fine. I'll grudgingly put up with brew on a Mac out of necessity, but keep it away from Linux.
That said, some Python packages rely on native code that you might find yourself brew-installing. That can be a nightmare.
Java package managers tend to install packages written in java
Go installs packages written in go, and maybe C using cgo
Cargo installs packages written in rust
php package managers install packages written in PHP, extra extensions are rare
etc
People having trouble with python are NOT having trouble with python. They are having trouble because they are trying to use packages that are just python bindings to absolutely massively complex c++ and Fortran libraries.
Often people using python don't even have a C compiler installed (let alone a fortran one for the scientific stuff), so pip blows up the first time they try to install a package that hasn't been pre-built for their system+python version.
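Whether a package "has been pre-built for your system+python version" is encoded in the wheel's filename as `name-version[-build]-pythontag-abitag-platformtag.whl`. A small sketch that pulls those tags apart; the function name and the example filename in the usage note are for illustration only:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For a file named `numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`, the `python` tag comes out as `cp311`. If none of the wheels a project published carries tags matching your interpreter and platform, pip falls back to the source distribution, which is where the missing compiler bites.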
pip install --user
Is one of them. If users don’t have venv they will trash their Python installation almost right away and probably join the many people who’ve left Python.
We found a bug on a Python project I worked on recently that only seemed to happen on certain machines. We couldn't reproduce it in a dev environment, and one machine that was affected suddenly stopped being affected after a while. It turns out the issue was a buggy dependency: one particular build of the project happened to have picked up the buggy version, but later builds used the fixed version and so didn't have a problem. So we'd only see the bug depending on which build the machine had last used, and if someone put a different build on there, it would reset that completely. On our development machines, we used slightly different builds that just happened not to have been affected.
Pinning dependencies wouldn't necessarily have prevented the bug in the first place - sometimes you just have buggy dependencies - but the debugging process would have gone much more quickly and smoothly with a consistent build environment. We could also have been much more confident that the bug wouldn't accidentally come back.
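One cheap way to make that kind of debugging faster is to compare what is actually installed on a machine against the pinned versions. A sketch using the standard library's `importlib.metadata`; the function names are made up, and `pins` is assumed to be a plain `{name: version}` dict taken from your lock or constraints file:

```python
from importlib import metadata


def installed_versions():
    """Map lowercased distribution names to the versions installed here."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[name.lower()] = dist.version
    return found


def find_drift(pins, installed):
    """Return {name: (pinned, actual)} for every package that differs.

    `actual` is None when the pinned package is not installed at all.
    """
    drift = {}
    for name, wanted in pins.items():
        actual = installed.get(name.lower())
        if actual != wanted:
            drift[name] = (wanted, actual)
    return drift
```

Calling `find_drift(pins, installed_versions())` on an affected machine and on an unaffected one would have pointed at the differing dependency directly.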
In particular, it also makes upgrading a much more complex task. For example, React releases new major versions on a semi regular basis, each one containing some breaking changes, but not many. Ideally there wouldn't be any, but breaking changes are inevitable with any tool as situations change and the problem space becomes better understood. But because the NPM ecosystem generally uses locked dependency lists, end users can upgrade at their leisure, either with small changes every so often, or only upgrading when there's a good reason to do so. Both sides can be fairly flexible in how they do things without worrying about breaking something accidentally.
Under a Linux distribution model however, those incremental breaking changes become essentially impossible. But that means that either projects accumulate cruft that can't ever be removed and makes maintainers' and users' lives more complex, or projects have to do occasional "break everything" releases à la Python 2/3 in order to regain order, which is also more work for everyone. There is a lot less flexibility on offer here.
I don't think these sorts of problems disqualify the Linux distribution model entirely - it does do a lot of things well, particularly when it comes to security and long-term care. But there's a set of tradeoffs at play here, and personally I'd rather accept more responsibility for the dependencies that I use, in exchange for having more flexibility in how I use them. And given the popularity of language-specific package repositories that work this way, I get the feeling that this is a pretty common sentiment.
Updating packages should be strictly left to the developer's discretion. That schedule is up to the developer using the packages, not upstream.
Not to mention that dependencies updating themselves whenever they like to "fix vulnerabilities" is a sure-fire way to break your program and introduce breakage and vulnerabilities in behavior...
The "weird one" of the remaining three is side effect imports which isn't all that weird when you realize you're not assigning it to anything. Functionally this is the same as calling a function rather than assigning a function to a value. eg `myFunction = function myfunction() { //stuff }` vs `myFunction()` and when you think about it like that it becomes significantly less weird but also something you rarely want to do - it's mostly used for polyfills. Good to know it exists but you can probably ignore that it does.
So now you're left with two: Default and Named. Use Default when you want the entire library - or almost the entire library. Use Named when you want specific pieces of the library. That's all there is to it really. If I want a specific function from a library - there's no reason to import the entire library. For a while you'd mix both Default and Named exports for React due to how transpiling worked - this React blog post explains it well: https://reactjs.org/blog/2020/09/22/introducing-the-new-jsx-... but you don't really have a reason to mix the two in modern codebases.
Named imports tend to be preferred because Default imports means giving a name to it which can result in inconsistencies across a codebase when many people are working on it. (eg: `import SumTwo from 'sumTwoNumbers.js'` vs `import AddTwo from 'sumTwoNumbers.js'`. A named import `import {SumTwoNumbers} from 'sumTwoNumbers.js'` solves this problem)
There's still one final little "gotcha" - there can only be a single Default export. Generally it's an object that contains "everything" but it doesn't need to be and those times are the only edge cases you'll run into though I can't say I ever have encountered this so it is a "theoretical" reason to avoid Default imports but I can't say it's ever been an issue in practice.
I guess I avoided a lot of this weirdness by basically only ever using the ES6 syntax and preferring Named imports (and not being stuck in the React ecosystem). CommonJS got to avoid some of the "weirdness" because it could pretend the Browser doesn't exist (and leave handling it to bundling tools). So I guess I'll capitulate and say it's a little weird but you can basically ignore it and use Named imports as the "One True Style".
The bigger thing is, I'm subject to however the deps I use want to export things, so they use a mix of those. Maybe in some cases you have to use `require` even if you don't want to, I forget.
If somebody wanted to package a modern Python application as Debian package they’d have to make a Debian package that contains several Python packages maybe as a venv or executable wheel, it is a solvable problem but a bit like comparing tic tac toe to 12d chess with supersymmetry in terms of the ambition of Linux distros if not the difficulty.
If you installed python packages as debs without any isolation you'd never be able to install two applications that depended on different versions of the same Python package.
The best thing about Java is that it is so xenophobic that nobody expects to have a working Java runtime installed with the OS so you never have to deal with a broken runtime installed with the OS. JAR files are remarkably self-sufficient so "write once, run anywhere" is way closer to the truth than it is on "DLL Hell" platforms like Windows and Linux.
Lol. It doesn't. How are you going to create Docker images? Run more Dockers inside Docker? And then Dockers all the way down? You still need to install the packages... just while creating an image. Makes absolutely no difference whether you do it inside or outside, you still need something that can install and coordinate packages. Docker is simply irrelevant to this problem.
Once. On a build server. Or a dev's workstation who had the time to fiddle with all the breakage that is python envs. This absolutely does sidestep the issue of distributing Python packages to your users.
We're talking about different problems here. You are talking about library dependencies for devs (and I understand that I'm a bit off topic for this thread)
I didn't ask where, I asked how. How will you know what packages you need to install? Are you going to go on pypi.org, download them one by one, unzip, look for requirements, then go to pypi.org again, download more, and so on, until you have enough packages and have figured out all the requirements? If so, I have very bad news for you: your single installation might take weeks, possibly months, or might in fact be so slow it will never end as new packages will be published faster than you can identify the version you need.
> This absolutely does sidestep the issue of distributing Python packages to your users.
No it doesn't... OMG. You cannot be serious? Do you really search and solve dependency problems w/o using any dedicated software? Like in your head? Or with pen and paper? For any non-trivial project the number of combinations you have to try before you find the one that works will be astronomical...
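To put a number on "astronomical", here is a toy sketch with an entirely made-up package index. It counts the raw search space and then brute-forces it. It assumes every package in the index must be installed and lists allowed versions explicitly, which real resolvers do not do:

```python
from itertools import product
from math import prod

# Hypothetical index: package -> version -> {dependency: allowed versions}
INDEX = {
    "app": {"1.0": {"lib": {"1.0", "2.0"}, "util": {"2.0"}}},
    "lib": {"1.0": {"util": {"1.0"}}, "2.0": {"util": {"2.0"}}},
    "util": {"1.0": {}, "2.0": {}},
}


def search_space(index):
    """Number of version combinations if one version is picked per package."""
    return prod(len(versions) for versions in index.values())


def is_consistent(index, choice):
    """Check that every chosen version's requirements are met by `choice`."""
    for name, version in choice.items():
        for dep, allowed in index[name][version].items():
            if choice[dep] not in allowed:
                return False
    return True


def brute_force_resolve(index):
    """Try every combination in order; return the first consistent one."""
    names = sorted(index)
    for combo in product(*(sorted(index[n]) for n in names)):
        choice = dict(zip(names, combo))
        if is_consistent(index, choice):
            return choice
    return None


print(search_space(INDEX))         # 4
print(brute_force_resolve(INDEX))  # {'app': '1.0', 'lib': '2.0', 'util': '2.0'}
```

Three packages give 4 combinations. Twenty packages with ten versions each give 10**20, which is why nobody does this by hand and why the work has to be done by a resolver at some point, inside a Docker build or outside it.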
> We're talking about different problems here.
Nah. You just don't understand the problem. It's similar, but not the same.
If it's that simple, how come no one already did all that work for us?