Overview of Python dependency management tools (modelpredict.com)
Tangentially related is the tool tox [1], which is often used to run a test suite inside of virtual environments created by venv, on multiple versions of Python managed by pyenv.
Now if only setuptools could work well without hackery...
0: https://github.com/rmpr/atbswp 1: https://wxpython.org/pages/downloads/ 2: https://github.com/RMPR/atbswp/blob/master/Makefile
works even on iPhone (for pure Python packages in Pythonista for iOS)
All I was trying is to combine tensorflow lite with opencv IIRC. Just look at the installation instructions:
- https://www.tensorflow.org/install/pip
- https://www.tensorflow.org/lite/guide/python
- https://docs.opencv.org/3.4/d2/de6/tutorial_py_setup_in_ubun... vs https://stackoverflow.com/a/52880211/938236
It works on macOS and Linux without any issues. Windows usually requires some extra steps to setup node-gyp
"There should be one - and preferably only one - obvious way to do it." - The Zen of Python; see also https://xkcd.com/1987/.
When it comes to package, environment and dependency management, I think that, ironically, the JavaScript environment is light years ahead; see: https://p.migdal.pl/2020/03/02/types-tests-typescript.html
On Linux, however, I do not see much benefit, unless you frequently install large binary C library based packages.
To me it feels cleaner to compile these packages from source. You are sure to have no glibc mismatches etc.
Conda, despite its advertising, does have library issues. C Libraries are shared between environments, compiling inside an environment can lead to surprising results when stale libraries are in the miniconda path.
All in all, it feels like a second OS shoehorned into the user's home directory. Compared to apt-get it is really slow and bloated.
It feels too intrusive on a Unix system.
Also, I'm not sure if the repositories are secured in any meaningful way.
It seems to me that Ruby, PHP, JS, and Rust communities have solved the problem.
I don't think there are any real Python devs who think dependency management is solved. However, why would you claim JavaScript has a good solution? The inconsistencies between Node and web dev are odd at best. Babel compilation is annoying and slow. Are we even standardized on webpack yet?
Can anyone say with a straight face that getting a new JS dev caught up on all the different parts needed to compile a JS program is a solved problem?
I don't fault either the Python or the JS ecosystem. As pioneers in dependency management, they went through a lot of trial and error. New languages like Rust benefited from it and that's ok.
You are intermixing dependency management with build tools. Webpack and Babel have very little to do with dependency management.
Node doesn't even need something like venv since module lookup is always local.
Also no problem with dependency hell. Each dependency can have its own private dependencies, even different versions of dependencies shared by sibling modules. Tools like yarn/npm can remove duplicates across a project.
I would not say JS solved it, but yarn (and even npm these days) seem to be superior to all python dependency management tools before pipenv. None of them had a proper lockfile, for example.
Maybe a noob question, but how come they are pioneers? Weren't there languages with package systems before Python and JS?
I think the environment system in Python is a confusing design flaw that could have been avoided with project-specific installations. I vastly prefer installing packages on a project-by-project basis. Python introduces dependency nightmares because two projects with different needs end up using the same central local package source unless you set up different environments. So when you install a package foo for project B, package bar might stop working for project A due to a dependency on an earlier version of foo.
pypi is really the de-facto package index, Pipenv/Poetry/Conda are all venv handlers (using the standard venv tools) + dependency graph, and using pip which is standard as well.
I would call this a small set of solutions (3), and they are all sane (any will do, just pick one).
I should have said that on Windows Conda is a much more understandable choice.
Another thing to note is that dependencies were not that complex in the 90s. The idea of dependency management started when there were too many dependencies to manage and the existing tools could not reliably reproduce builds. My guess is that all this package use took off after the dotcom boom in 2000, when everyone started building websites with Ruby, Python and eventually Node.js.
With that said, I should have included Yarn and Npm with their own problems. I can't remember how often I solved my dependency problems with rm -rf node_modules.
(I'm actually not sure if it is the average but my anecdotal experience is that it's an order of magnitude higher than python, and 1200 wouldn't be unusual).
Node projects tend to have a lot smaller dependency graphs.
I wish Python could abandon virtual envs. This is the most annoying part of setting up a project.
More details here: https://medium.com/knerd/the-nine-circles-of-python-dependen...
But I actually don't want it to be too smart. Better to keep your dependencies minimal and explicit, and manually specify older 'pkgC' if you need to. I have a few non-trivial services in production, the most complex one with 16 total dependencies + sub-dependencies. That is quite manageable.
So, I strongly recommend manually curating the most appropriate versions of the few tastefully chosen dependencies you really need. Then, pip+venv can easily reproduce that exact set of dependencies anytime. I also do something very similar to this with C applications, and Go. Sub-dependencies should be a big factor in how you choose your direct dependencies.
I don't even see what problems they solve, it seems like you end up with more problems using them.
I have used virtualenv/venv and pip to install dependencies for years and years, since I was a teen hacking around with Python. Packaging files with setup.py doesn't really seem that hard. I've published a few packages on pypi for my own personal use and it's not been too frustrating.
A lot of the issues people have with Python packaging seem like they can get replaced with a couple shell aliases. Dependency hell with too many dependencies becomes unruly in any package manager I've tried.
Is the "silent majority" just productive with the status quo and getting work done with Python behind the scenes? Why is my experience apparently so atypical?
When trying to run various existing Python programs to analyze biology data, I soon run into various problems. Is this a Conda thing, or can I use my own Python environment? Which version of Python will let me run the thing, and what libraries do I need? Does this break in that version?
Sometimes I feel that one kinda-OK way of doing things would be better than having 6 ways, one of which will suit my use case perfectly.
This problem is not unique to python.
Can you elaborate a bit there? I use conda because I like some of their features over standard virtualenv (being able to specify a python version when i create my venv) - but I've never had a problem running code in env's created by one vs. the other.
A lot of people who use Python, don't have the luxury of it being their "daily driver for years", so the conflicting documentation, decision paralysis and other problems that come with it end up being a huge time sink.
A lot of non-programmers are being forced to use Python for various automation tasks. A lot of the CAD-software that construction engineers use, support Python-plugins. Network admins that have been configuring switches and routers on CLI for decades now have to configure them using Python.
Look at "cargo" to see what the world could be like.
Still, it's worth keeping in mind that Rust was born 20 years after Python was. Python was being written before Mosaic, Netscape, and Yahoo! were around. I think it can be forgiven for failing to conceive of a perfect package management system in 1990s. There were bigger fish to fry back then, so to speak.
Over the decades (!) there have been many, well-documented attempts at coming up with a package management story. pip and virtualenv have been the obvious winners here for years.
So, in conclusion, again you're right. But 30 years of history produces a lot of "conflicting documentation". It's only the last 10 years or so, that people have fought over the superiority of one language's package management ecosystem or another.
That said... if you work on software that is distributed to less-technical users and have any number of dependencies, python package management is a nightmare. Specifying dependencies is just a minefield of bad results.
- If you specify a version that's too unbounded, users will often find themselves unable to install previous versions of your software with a simple `pip install foo==version`, because some dependency has revved in some incompatible way, or even worse specified a different dependency version that conflicts with another dependency. pip does a breadth-first search on dependencies and will happily resolve totally incompatible dependencies when a valid satisfying dependency exists.[1]
- If you specify a version with strict version bounds to avoid that problem, users will whine about not getting the newest version/conflicting packages that they also want to install. Obviously you just ignore them or explain it, but it's much more of a time sink than anyone wants.
- In theory you can use virtualenvs to solve that problem, but explaining how those work to a frustrated Windows user who just spent hours struggling to get Python installed and into their `PATH` is no fun for anyone. Python's made great strides here with their Windows installers, but it's frankly still amateur hour over there.
- Binary packages are hell. Wheels were supposed to make Conda obsolete but as a packager, it's no fun at all to have to build binary wheels for every Python version/OS/bitness combination. `manylinux` and the decline of 32-bit OSes has helped here, but it's still super painful. Having a hard time tracking down a Windows machine in your CI env that supports Python 3.9? Too bad, no wheels for them. When a user installs with the wrong version, Python spits out a big ugly error message about compilers because it found the sdist instead of a wheel. It's super easy as a maintainer to just make a mistake and not get a wheel uploaded and cut out some part of your user base from getting a valid update, and screw over everyone downstream.
- Heaven help you if you have to link with any C libraries you don't have control over and have shitty stability policies (looking at you, OpenSSL[2]). Users will experience your package breaking because of simple OS updates. Catalina made this about a million times worse on macos.
- Python has two setup libraries (`distutils` and `setuptools`) and on a project of any real complexity you'll find yourself importing both of them in your setup.py file. I guess I should be grateful it's just the two of them.
- Optional dependencies are very poorly implemented. It still isn't possible to say "users can opt-in to just a specific dependency, but by default get all options". This is such an obvious feature, instead you're supposed to write a post-install hook or something into distutils.
- Sometimes it feels like nobody in the python packaging ecosystem has ever written a project using PEP420 namespaces. It's been, what, 8 years now? and we're just starting to get real support. Ridiculous.
I could go on about this for days. Nothing makes me feel more like finding a new job in a language with a functioning dependency manager than finding out that someone updated a dependency's dependency's dependency and therefore I have to spend half my day tracking down obscure OS-specific build issues to add version bounds instead of adding actual features or fixing real bugs. I have to put tons of dependencies' dependencies into my package's setup.py, not because I care about the version, but because otherwise pip will just fuck it up every time for some percentage of my users.
[1] I am told that this is "in progress", and if you look at pip's codebase the current code is indeed in a folder marked "legacy".
[2] I 100% understand the OpenSSL team's opinion on this and as an open source maintainer I even support it to some degree, but man oh man is it a frustrating situation to be in from a user perspective. Similarly, as someone who cares about security, I understand Apple's perspective on the versioned dylib matter, but that doesn't make it suck any less to develop against.
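For what it's worth, the usual workaround for the optional-dependency point above is an aggregate "all" extra. A minimal sketch, assuming setuptools' `extras_require`; the package name `myapp`, the extra names and the requirement strings are made up for illustration:

```python
# Sketch of an "all" extra built from individual optional extras.
# The extra names and requirement strings below are hypothetical.
EXTRAS = {
    "yaml": ["pyyaml>=5.1"],
    "fast": ["ujson>=2.0"],
}


def with_all_extra(extras):
    """Return a copy of `extras` plus an "all" key that unions every extra."""
    combined = sorted({req for reqs in extras.values() for req in reqs})
    result = dict(extras)
    result["all"] = combined
    return result


EXTRAS_REQUIRE = with_all_extra(EXTRAS)

# In setup.py this would be passed as setup(..., extras_require=EXTRAS_REQUIRE).
# Users then opt in with `pip install "myapp[all]"`; a plain `pip install myapp`
# still installs none of the optional dependencies.
```

Note that this only gives "opt in to everything"; it does not give the "everything by default, opt in to less" behaviour asked for above.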
But that has been solved on Windows for quite a while hasn't it?
Python installs the "py" launcher on the path, which allows you to run whichever version you want of those you have installed. Just type "py" instead of "python". Or "py -3.5-32" to specifically run 32-bit Python 3.5, or "py -0" to list the available versions.
Or an error message like:
There's a version conflict. In order to resolve, try one of the following:
pip relax-dep package1 >= 1.0
pip relax-dep package2 >= 2.0
pip remove package3
And then you would want to have pip undo
(Just brainstorming here.)
* developers install with language packager
* in between install with OS package manager
* users install bundle
Those who have trouble with pip, gems, cabal, etc. should check other options first.
Wait, Bundler's Gemfile.lock has listed installed versions for at least ten years; what counts as "too unbounded" in pip?
I use yarn for managing javascript dependencies and do a lot of work with Cargo too. The community seems to love both these tools outside of slow compile and install times.
I'm sure Anaconda filled a niche at some point, but we have wheels now, can we all just agree to stop using Conda? What value does it actually bring now that makes it worth screwing up the standard distribution tools?
It's also possible that you use Python on Linux, where it is at least tolerable. Try again on Windows.
yes - the standard way of defining dependencies in Python is in setup.py, which has to be invoked as a Python script in order to work. this script may also need to read files from the rest of the project, so you do indeed need to download the whole package to determine its dependencies.
even if the Python community were to agree on a new configuration format tomorrow, there would still be a ton of packages out there that wouldn't migrate for years.
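A minimal sketch of why the dependency list can't be read without running code, assuming a setup.py that computes `install_requires` at run time. The `compute_install_requires` helper and the specific requirements are hypothetical:

```python
import sys


def compute_install_requires(platform=sys.platform, version=sys.version_info[:2]):
    """Build the dependency list the way a dynamic setup.py might."""
    requires = ["requests>=2.20"]  # hypothetical baseline dependency
    if platform == "win32":
        # Only known once the script runs on the target machine.
        requires.append("pywin32")
    if version < (3, 7):
        # Backport needed only on older interpreters.
        requires.append("dataclasses")
    return requires


# setup.py would end with something like:
#   setup(name="myapp", install_requires=compute_install_requires())
# so the answer can differ per machine and is not available as static metadata.
```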
I get backwards compatibility and open-source governance and bla-bla, but the reality is that this cannot be done by a third-party library author and needs to become part of the core stack, including proper support rather than just shipping a tool which covers 90% of cases. It's crazy that apart from venv and pip, nothing else comes with python and you're left on your own.
npm + the registry is part of node
apt-get + registry is part of a normal linux distro
bundler comes with ruby
This is a solved problem elsewhere. What we lack is a fully-supported, agreed upon, working DEFAULT choice, so people don't have to make their own choices. I don't know if not having that DEFAULT is a function of how the python community thinks or its diversity, but it's painful to watch. I've almost given up myself and seen many newcomers give up because of a trivial problem like this.
An alternative way to let people without a software engineering background play with Python data science and machine learning tools may be to provide pre-built Docker images with everything pre-installed, which may save them from configuration trouble.
Docker is also useful for learning about new programming languages without installing anything. With just one command, `$ docker run --rm -it julia-image`, one can get a Docker image containing a Go compiler, a Julia language installation, a Rust development environment and everything else. Docker is really a wonderful tool.
How do you approach this? How technical are people you prepare Docker images for?
If you have used pipenv for a complex project with a huge dependency tree, or used it for a long time, you have definitely run into a blocker issue with it. It is the worst package manager of all, and probably the reason why Python has such a bad reputation in this area. That's because its fundamentals are terrible.
Just go with Poetry. It's very stable, easy to use, has a superior dependency resolver and is way faster than Pipenv.
I've been meaning to take another look at Pipenv, but the huge pause without a release makes me nervous that it could happen again.
[0] https://johnfraney.ca/posts/2019/11/19/pipenv-poetry-benchma...
We actually used this in production before switching to poetry.
Dependency Management in Pip's Resolver: https://www.pythonpodcast.com/pip-resolver-dependency-manage...
* Developing and deploying production Python solutions
* Helping beginners run their first script
While it's great for beginners to use the same tools that are used in industry, I strongly believe that the problem nearly all of these tools face is that they can't decide whether they want to _manage_ complexity or _hide_ complexity.
You can't do both.
Some of them do a fairly good job at managing complexity. None of them do a good job of hiding it. The dream of getting Python to "just work" on any OS is close to impossible (online tools like repl.it are the closest I've found but introduce their own limitations). I recently saw a place force their beginner students onto Conda in Docker because getting people started with Conda was too hard. If you're battling with the complexity of your current layer of abstraction, sometimes it's better to start removing abstraction rather than adding more.
That said, I'm also a happy user of `pip` and `virtualenv` and while I'm sure that many people can use the others for more specific needs, I think defaulting to them because they aim to be "simpler" is nearly always a mistake. I still teach beginners to install packages system wide without touching venv at first - it's enough to get you through your first 2-3 years of programming usually.
I picked up Pipenv when a point-point release of a dependency broke a production deployment. Pipenv's dependency locking meant that I wouldn't get surprised like that again.
Part of why this topic comes up so much is the desire to run with a language before learning to walk with it, perhaps. I'm a big fan of Poetry, but I like it because I know what it gives me compared to vanilla pip and a setup.py file.
Installing dependencies at the OS level will get you far as a beginner. And when the time comes that you need a virtual environment, you'll probably know.
Pip feels like an outdated package manager, lacking essential functionality that package managers of other languages have implemented for years. For example, credential redacting in pip was only introduced in 2019, 8 years after its initial release!
Not to mention the global-first nature of pip (package is installed globally unless the user explicitly requests for a local installation). You can still install packages locally, but this only shows that pip was not built with environment reproducibility in mind. As a consequence, the need for additional environment tooling (like venv) arose, which increased the complexity of the local python setup.
Tools wrapped around pip are also under par. I cannot see why Pipenv is that resource intensive, leading to long and noisy builds (my machine gets close to exploding on a pipenv lock), with very fragile lock files. Debugging an unsuccessful locking in the CI of an enterprise project is a mystery that could take an entire week to solve. Its javascript counter-part (npm) does the exact same thing, faster and with less CPU usage.
Trusting the OS community, I understand that there would be very good reasons for Pipenv to perform like this, but as the consumer of a package managing tool all I see is the same generation of file hashes I see on npm, but with npm doing it way more efficiently. I really see value in the principles that Pipenv is promoting, but to me the developer experience of using it is suboptimal.
/home/foo/a/usr/bin/python3
/home/foo/b/usr/bin/python2
Python is so fast to compile and install that I just install as many throwaway Pythons as needed. I do not recall any isolation issues between those installs, unlike with conda or venv, which are both subtly broken on occasion.
But I dislike opaque automation in general.
I wouldn't call venv "opaque automation," there's not much magic going on there.
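To illustrate how little magic there is: a minimal sketch, assuming Python 3.3+ and the standard `venv` module, of how to tell whether the interpreter is running inside a virtual environment. `in_virtualenv` is just an illustrative helper name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv: the environment's prefix differs
    from the base interpreter's prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("in a venv:  ", in_virtualenv())
```

A venv is essentially a directory with a `pyvenv.cfg` file and its own site-packages; activating it mostly just puts that directory's interpreter first on `PATH`.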
For me, there are two main choices today:
* An ensemble of single-purpose tools: pip, venv, pip-tools, setuptools, twine, tox,
* An all-in-one tool, for example Poetry, Pipenv or Anaconda (or Miniconda).
I prefer the former approach, but if I had to choose an all-in-one tool it would be poetry.
One is that because Python has been around for so long, it's easy to find outdated or conflicting advice about how to manage Python packages.
I think it's important to stress that pyenv isn't strictly a dependency manager, too, and depending on your OS, isn't necessary. (Supported Python versions are in the AUR[0].)
A lot of pain from Python 2 -> 3 is that many operating systems were so slow to switch their default Python version to 3. Unless something has changed in the last month or so, Mac OS _still_ uses Python 2 as the default.
It's a shame to see Python take a beating for OS-level decisions.
False. Conda manages packages installed from PyPI. This is discussed under the Conda section, so I'm surprised the quoted line wound up in the article.
Any chance you could give me some reference so I can fix it in the original article?
Basically, use Conda to manage environments, use Pip to install packages. If you're using Conda to install anything, do that first.
With Pipenv, last year ownership switched from the owner of the Requests library to the PyPA, so it is more or less an officially blessed solution.
The only downside on this thread that I could understand so far is that it might be slow to install dependencies on larger projects, can't think of anything else.
I've been the person to document setting up development environments for others in macOS (and Homebrew) with a view to deploying in Linux, and pipenv (and pyenv, and Docker/docker-compose for setting up software context/datasets) definitely overall minimized the complexity for those configuring their dev environments.
(EDIT: documenting dev environments)
But it forks the ecosystem, twice:
First, Conda packages have to be maintained separately from PyPI packages.
Second, the "default" repo is maintained by Anaconda, but the community maintained Conda Forge repo is also separate, and officially the packages in one are not compatible with the packages in the other. (In practice they usually play nice).
Having three incompatible package repos is not ideal.
Glyph (of Twisted fame, whence pathlib IIRC) pointed this out ages ago: model your domain [in Python] with objects.
> It’s 2020, but reliably compiling software from source, on different computer setups, is still an unsolved problem. There are no good ways to manage different versions of compilers, different versions of libraries needed to compile the main program etc.
I wonder how much stuff like this has to do with python's popularity. When I have opaque issues like "libaslkdjfasf.so is angry with you and/or out to lunch and/or not doing expected things," it's the most frustrating part of programming. I'd pay devops people infinite money to not have to deal with installation/setup issues anymore.
The issue is that the lack of packaging in the C/C++ world spreads to all the other communities that depend on it.
I admit pyenv is nice for testing against different Python versions if necessary. But on my Linux systems I'm generally fine with just installing system packages and doing pip install --user for the odd package that is not in the repositories.
In other words, a program written for 3.3 obviously won't work in 2.7, but will a program written for 3.3 fail to run in 3.8?
If it runs fine, why the need for multiple interpreters? I'd think you'd get by just fine by having the latest 2.x and 3.x installed.
In particular, the way it effectively gets you to "I control the lock file; it will be altered when I explicitly request it and never as a side-effect of any other action".
For some reason many other languages' systems (which have had opportunities to learn from others' mistakes) don't seem to treat this as a requirement.
default: requirements-develop.txt
	pip install -r requirements-develop.txt

requirements.txt:
	pip-compile -v requirements.in

requirements-develop.txt: requirements.txt
	pip-compile -v requirements-develop.in
It's so much nicer to just write `make` than to do all the Poetry/Pipenv stuff, which I honestly feel adds nothing really useful to the workflow.
There's a decent summary of why someone might still prefer pip-tools even in a world where pipenv and poetry exist here: https://hynek.me/articles/python-app-deps-2018/
For my purposes, the primary downside of this approach is that adding dependencies takes slightly more effort, because you have to edit a file and then execute a shell command, rather than just executing a shell command. But managing dependencies takes up about 0.001% of my time, so this is not an area where I have much to gain by micro-optimizing my workflow.
I do, on the other hand, have a lot to lose by switching to something that's newer and shinier and less stable.
With that, I know the compiled requirements.txt will only have what I need. Now it is just pip install -r requirements.txt or pip-sync.
From the times I've looked at it (almost a year ago and older): pip-tools is / was the core of Pipenv... but it has been a fair distance ahead in terms of bug fixes and usable output for diagnosing conflicts. It seemed like pipenv forked from pip-tools a couple years prior, and didn't keep up.
Given that, I've been a happy user of v(irtual)env and pip-tools. Pipenv has remained on the "maybe some day, when it's better" side of things, since I think it does have potential.
I have used Pyinstaller. It's good but it's a bit too magical for my tastes.
I wouldn't use it over a clean venv management, but I have seen people who prefer using docker containers because they find venvs to not be clean/elegant.
I wonder why you prefer the former approach.
BTW, to comment on one of the points you made in the article, it's not that hard to run CUDA inside a container. It's less straightforward but quite well documented. You basically need nvidia-docker [1] on the host and start your containers with the runtime 'nvidia'. docker-compose still doesn't support it officially but there are workarounds. [2] I'm running it on ~50 instances in production and automated all the setup with ansible successfully.
All-in-one tools almost never do things exactly the way you want. They have a higher barrier to entry as well as a stronger lock in effect than smaller tools. If I fall out of love with venv, I can replace it with Docker. I can't just do that with an all-in-one tool.
Having said that, poetry is quite well designed and I do encourage junior developers to explore it for themselves instead of just doing what I do. If I was a junior developer today I might be quite glad for a single all-in-one tool that gets me on my feet with good practices from day one.
I was thinking of my latest install, which was CRISPRESSO2, which installs via docker or bioconda... I was able to get it going, but it took a bit on some systems.. (Python 2.7 old libraries.. etc..) Docker didn't seem to work.
I like virtual env, but sometimes I feel I have to have a new environment for each piece of software I'm running, which feels weird.
Ruby's bundler had these exact same issues 5 or so years ago. I remember attending a talk on Bundler given by its core devs and asking how they make dep resolution faster. Turns out that it was never really a solved problem there either; Bundler just uses a bunch of heuristics to avoid cases like the 18 minute `Pipfile.lock` described above.
Also, libraries with a binary component often have to be compiled against a specific version of Python.
Docker doesn't do anything Python specific on its own. It can be part of a pipeline but only with support from the Python specific tools which is what should be discussed in this kind of article.
Maybe I'm spoiled from working with cargo and npm (I have almost no ruby experience so I can't comment there), but both of them have way fewer such version conflicts in my experience. Obviously there are tradeoffs and I don't want the node_modules experience for my users, but often it seems that would be a much better experience than pip for everyone. With either of those, I just "npm install" or "cargo install" and all my dependencies end up there working.
You can generate a requirements.txt file using "pip freeze" on a functioning system, but then you have to figure out a way to point users at it instead of using "pip install myapp". Also you might have to do it for each OS since windows vs mac vs linux can have different package dependencies specified, and even if you don't do that, a dependency doing it means you have to account for it.
You can copy+paste the "pip freeze" output into your setup.py and add quotes+commas, but then you're back to breaking side-by-side packages.
So what am I, a developer trying to distribute my command-line application to less-technical users, supposed to do? Distribute two entirely different packages, "myapp-locked" and "myapp"? Tell people to install from a copy+pasted "requirements.txt" file? I've started distributing docker containers that have the application installed via the requirements.txt method, which is fucking stupid but at least the users of that complain less about versioning issues... until the day someone yanks a package I guess.
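For reference, the copy+paste step can at least be scripted. A rough sketch, assuming plain `name==version` lines as produced by `pip freeze`; `parse_frozen` is a hypothetical helper, and it deliberately skips comments and editable/option lines rather than trying to handle them:

```python
def parse_frozen(text):
    """Turn `pip freeze` output into a list suitable for install_requires."""
    requirements = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and editable/option lines such as "-e ...".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


# Example input; the pinned versions and the URL are placeholders.
FROZEN = """\
# produced by: pip freeze > requirements.txt
certifi==2020.4.5.1
-e git+https://example.invalid/repo.git#egg=localpkg
requests==2.23.0
"""

if __name__ == "__main__":
    print(parse_frozen(FROZEN))
```

This only automates the pinning; every requirement is still an exact `==` pin, so it does nothing about the side-by-side install conflicts described above.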
### Checklist
- [ ] I've read [CONTRIBUTING.md](https://github.com/xmonad/xmonad/blob/master/CONTRIBUTING.md)
- [ ] I tested my configuration with [xmonad-testing](https://github.com/xmonad/xmonad-testing)
I think it is a brilliant idea; I immediately checked the latest git versions. I assume you may add - [ ] I tested my application with [latest stable requirements.txt](...)
And something about triangulation and reporting to another repo too.
Sorry to hear about breaks on major versions. Ruby gems (libraries) freeze dependencies on major, sometimes minor, versions; example [0]. But applications are shipped with a Gemfile and Gemfile.lock [1], [2]. So `bundle install` is reproducible [3]:
> The presence of a `Gemfile.lock` in a gem's repository ensures that a fresh checkout of the repository uses the exact same set of dependencies every time. We believe this makes repositories more friendly towards new and existing contributors. Ideally, anyone should be able to clone the repo, run `bundle install`, and have passing tests. If you don't check in your `Gemfile.lock`, new contributors can get different versions of your dependencies, and run into failing tests that they don't know how to fix.
Yes: Docker, MSI, Flatpak, AppImage, whatever works for you and your users. It is sad that we can't easily statically compile scripting-language programs into one file.
[0] https://github.com/teamcapybara/capybara/blob/master/capybar...
[1] https://github.com/Shopify/example-ruby-app/blob/master/Gemf...
[2] https://github.com/Shopify/example-ruby-app/blob/master/Gemf...
This gets much much worse when a new version of Python comes out and we don't support it yet (because of the build system issues I mentioned). I spent several weeks teaching people how to uninstall 3.8 and install 3.7 before we finally got a functioning package out for 3.8.
https://firefox-source-docs.mozilla.org/setup/windows_build....
Although I don't handle user support for Python packages, if I did, that would be my go-to approach.
For the second, pip-compile computes the new requirements.txt (which is effectively the lockfile) from scratch, and pip-sync (not shown in that Makefile fragment) removes packages that are no longer listed there.
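To make the "remove packages that are no longer listed" part concrete, here is a minimal sketch of the comparison that a sync step has to perform. It is not pip-tools' actual implementation; `parse_pins` and `plan_sync` are hypothetical names, and the parser assumes plain `name==version` lines (no hashes, environment markers, or line continuations).

```python
def parse_pins(lines):
    """Parse 'name==version' lines from a fully pinned requirements file.

    Assumption: every requirement is a simple 'name==version' pin.
    Anything else (blank lines, comments, other specifiers) is skipped.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def plan_sync(installed, pins):
    """Return (to_install, to_remove) so that `installed` ends up equal to `pins`.

    `installed` and `pins` both map lowercase package names to version strings.
    """
    # Install anything missing or pinned to a different version.
    to_install = {n: v for n, v in pins.items() if installed.get(n) != v}
    # Remove anything installed that the lockfile no longer lists.
    to_remove = sorted(n for n in installed if n not in pins)
    return to_install, to_remove
```

With `installed = {"requests": "2.23.0", "six": "1.15.0"}` and a lockfile that pins only `requests==2.24.0`, the plan upgrades `requests` and removes `six`; a plain `pip install -r requirements.txt` would leave `six` in place.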
This page covers some of the differences between v1 and v2 of the engine, and particularly its impact on Python: https://pants.readme.io/docs/pants-v1-vs-v2 ... We're using Rust and haven't bootstrapped yet, so we also appreciate Cargo and think that there is a lot to learn there.
We'd love feedback (via any of these channels: https://pants.readme.io/docs/community) on how to make it even better. Thanks!
I’m going to dig into that 2.0 link though!
cargo is great because it manages the build flow and it's extensible (clippy). The global cache is a bit harder because of Rust package features (and other knobs like RUSTFLAGS), and it's not done by default, but as far as I know it's as easy as setting CARGO_TARGET_DIR.
Or is there any even greater hidden challenge from using setup.py?
I've used it like this - https://github.com/JaDogg/pydoro/blob/b1b3de38ac15b9254ef1be...
First of all, Python was created around 1989 yet Python 1.0 was released in 1994. Secondly, Python was a pretty obscure language until Python 2.0 (and even long after that...), released in 2000. So realistically, Python had "only" about 15 years of historical baggage :-)
Also, cargo can be ignored because it's "new", but there was a lot of prior art in the area of good programming language specific package managers. CPAN (Perl) was launched in 1993. Maven (Java) was launched in 2004.
Python just botched its package management story, that's it. Sometimes stuff happens just because it happens, there's no good excuse for how things are. Sad, but true.
Rust's 1.0 was 2015, which is indeed "20 years after Python was" at 1.0, so how is gen220's comment a rewrite?
I started using Python around 1.3, and advocating for its broader use (instead of Perl) by 1997. In 1998 I had a job using Python full-time. It was made easier because tools like SWIG already supported Python. Here's a talk I gave in 1999 - https://www.daylight.com/meetings/mug99/Dalke/index.html - and a writeup I did for Dr. Dobb's - https://www.drdobbs.com/cpp/making-c-extensions-more-pythoni... .
In 2000 I helped a company with the minor work to port their 1.5 code base to 2.x.
So I certainly didn't see it as obscure in the 1.x days.
But sure, I'm part of that environment so have a different view on things. If I use your definition, I'll argue that Rust is still "a pretty obscure language".
Rust is a lot better known because of internet fame, which didn't really exist to this magnitude until well after the dotcom crash.
My point is, Python's popularity took off in recent years, basically the last 10 years: https://insights.stackoverflow.com/trends?tags=python%2Cjava, primarily due to data science, machine learning, science in general.
Until at least 2005, it was known as a friendly scripting language with a few web frameworks which were not that popular (Django was first released in 2005), and as the language that was starting to be adopted by distributions for scripting tools (the first Ubuntu version was launched in 2004 and was one of the first distros to use it extensively). It wasn't really present for development work in most cases; DevOps was the domain of bash/Perl (for older stuff) or Ruby (for newer stuff).
People tend to forget how obscure Python was before 2000, compared to the mainstream language it is today. And I say that as someone who likes Python ;-)
Then there's the issue of Anaconda using its own package repos, so even if you do manage to figure out what packages an Anaconda-developed piece of software needs, you're getting a subtly or maybe not so subtly different version of it using standard pip, which creates the worst kind of hard-to-trace bugs.
Lastly, certain installations of Anaconda overwrite the system python version with its own (so you can just use numpy or whatever anywhere), causing a huge headache with other system software and making using the standard distribution tools even harder.
I get that it's convenient for scientists that just want to write scripts and have them work, but if you're creating any kind of collaborative software, especially if you'll be working with SW engineers down the line, avoid Conda at all costs.
How is that any different than using python.org python? You'd still be unaware of what versions to use.
> you're getting a subtly or maybe not so subtly different version of it using standard pip, which creates the worst kind of hard-to-trace bugs.
That's way more of a problem with pip. You have no idea what versions a pip package is pulling in until install and then what binary actually gets installed depends on your compilers.
> certain installations of Anaconda overwrite the system python version with its own (so you can just use numpy or whatever anywhere), causing a huge headache with other system software and making using the standard distribution tools even harder.
That's impossible unless one is actually copying binaries manually over the top of system binaries. You'd have to be root or use sudo to overwrite the system python manually. The whole point of isolation is to keep system python isolated and stable for system stability. That can happen if someone installs python from python.org and copies it into place.
> but if you're creating any kind of collaborative software, especially if you'll be working with SW engineers down the line, avoid Conda at all costs.
If you are working with SW engineers, you better know what versions you are pulling in, because you are going to be in serious pain using pip and trying to understand the provenance of your packages. Conda is way more powerful here for serious engineers to specify exact versions and reproducible and exact builds.
Because python.org doesn't ship with numpy, matplotlib, or any of those other packages. Anaconda does, which makes it possible to import those libraries in projects without explicitly listing them as dependencies.
> That's way more of a problem with pip. You have no idea what versions a pip package is pulling in until install and then what binary actually gets installed depends on your compilers.
What? The problem here is that conda has its own repos, which contain different packages than are contained in PyPI. What exactly do you mean by "no idea what versions a pip package is pulling"? You realize you can set versions, right? numpy==1.13.2. The problem is numpy 1.13 on Anaconda can be different than numpy 1.13 on PyPI.
> That's impossible unless one is actually copying binaries manually over the top of system binaries. You'd have to be root or use sudo to overwrite the system python manually. The whole point of isolation is to keep system python isolated and stable for system stability. That can happen if someone installs python from python.org and copies it into place.
This is just wrong. Anaconda overwrites the system python by messing with the user's $PATH regardless of whether you are in a conda environment or not (probably easy to disable this "feature" but I've seen a lot of people with this setup). This causes major headaches.
> If you are working with SW engineers, you better know what versions you are pulling in, because you are going to be in serious pain using pip and trying to understand the provenance of your packages. Conda is way more powerful here for serious engineers to specify exact versions and reproducible and exact builds.
I'm not sure why you think you can't specify exact versions with pip. Projects like pipfile take it even further. The issue with conda is its different package repos, not the ability to lock package versions.
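The point that the same version number can name different builds can be illustrated with content hashes. The sketch below is only an illustration: the byte strings are made-up placeholders rather than real package files, and `sha256_hex` and `matches_pin` are hypothetical helper names. pip itself supports `--hash=sha256:...` entries in a requirements file, which is the real-world counterpart of this check.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of an artifact's bytes as lowercase hex."""
    return hashlib.sha256(payload).hexdigest()


def matches_pin(payload: bytes, pinned_sha256: str) -> bool:
    """True only if the artifact's bytes match the pinned digest."""
    return sha256_hex(payload) == pinned_sha256.lower()


# Placeholder bytes standing in for two builds that both claim to be
# numpy 1.13.2 (assumption: real artifacts would be wheel or conda archives).
pypi_build = b"numpy 1.13.2 built for PyPI (placeholder bytes)"
conda_build = b"numpy 1.13.2 built for a conda channel (placeholder bytes)"

# Pinning the digest of one build rejects the other, even though the
# name and version strings are identical.
pin = sha256_hex(pypi_build)
```

Here `matches_pin(pypi_build, pin)` holds while `matches_pin(conda_build, pin)` does not, which is the distinction a bare `numpy==1.13.2` pin cannot express.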
This file contains a few good examples of `@rule`s that collectively partition python targets to generate `setup.py` files for them automatically: https://github.com/pantsbuild/pants/blob/5e4f123a1dbc47313fe...
I think your main problems are very naive users of conda. If you bring years of experience using pip, but use conda thoughtlessly, I can see your point.
If you don't want packages included, just use miniconda and install the ones you like. You could just create a new empty environment: `conda create -n py36 python=3.6`
Either way, it's completely reproducible.
When not using wheels, pip can be pulling in various versions of dependencies. Conda makes it easy to see all of them before they are dumped into your environment.
> Anaconda overwrites the system python by messing with the user's $PATH regardless
I understand what you are saying now. It's covering up the system python in the PATH, but it isn't overwritten. Using `type python` (or `which python`) will be correct 99% of the time.
> The issue with conda is its different package repos, not the ability to lock package versions.
I thought this was your major argument. "Collaboration is difficult" when in fact it is much, much easier. You are getting the same binary every time without slight differences in how it ends up compiled on the user's system.
To illustrate the point with an example, you could invent cargo for python yesterday or in 2005, but it wouldn't have solved the problem, because you would still have decades-worth of third-party libraries that wouldn't comply with py-cargo's packaging requirements.
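The difference between "covered up" and "overwritten" can be checked by listing every `python` on the PATH in lookup order. This is a minimal sketch for Unix-like systems; `find_on_path` is a hypothetical helper, and it ignores shell aliases, functions, and Windows `PATHEXT` handling.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order.

    The first entry is what the shell would run; later entries are
    shadowed, but still present on disk.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

If `find_on_path("python")` returns a conda path first and `/usr/bin/python` later, the system interpreter is shadowed rather than replaced, and scripts that call it by absolute path are unaffected.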
In contexts like these, it's the package manager with the fewest hard-asks (i.e. pip, or npm for node) that wins.
Go, for example, endured major controversies over migrating away from GOPATH-managed-with-third-party-dep-managers to go modules. Even though `go mod` would have been the best solution to start with from scratch, inertia and breaking changes are a real thing.
Rust is a pretty obscure language now in pretty much the same way that Python was an obscure language then.
Of course the world of programmers was smaller in the 1990s. But if your baseline is the entire world, then probably every programming language outside of Basic, C/C++, and Pascal was obscure in the 1990s. Just like Rust is now.
It feels very much like you have shifted baselines to determine what "obscure" means.
From my view, Python's popularity took off around 2000. That's when I no longer had to tell people what Python was, and when people in my field (cheminformatics) started shifting new code development from Perl to Python. It's also about when I co-founded the Biopython project for bioinformatics. And SWIG in the mid-1990s included Python support because Python was being used to steer supercomputing calculations at LANL.
So your statement that Python's popularity and use in science in general started only in 2010 sounds like revisionism which distorts the actual history with an artificial baseline.
You wrote "with a few web frameworks which were not that popular".
Ummm.... what? Zope was quite popular. The 2001 Python conference had its own Zope track, and the 2002 conference felt like it was 50% Zope programmers.
Quoting its Wikipedia entry, "Zope has been called a Python killer app, an application that helped put Python in the spotlight". One of the citations is from 2000, at https://web.archive.org/web/20000302033606/http://www.byte.c... , with "there's no killer app that leads people to Perl in the same way that Zope leads people to Python."