Underappreciated challenges with Python packaging (pypackaging-native.github.io)
From a quality-of-life standpoint, having the sdist install behind an opt-in flag by default for our package would be great. Unless you're a developer with a lot of -dev packages for imaging libraries already on your system, you're not going to be able to build from source. And even if the error that pops up is along the lines of
The headers or library files could not be found for {str(err)},
a required dependency when compiling Pillow from source.
Please see the install instructions at:
https://pillow.readthedocs.io/en/latest/installation.html
We still get people posting issues about Pillow failing to install.
Build farms would be nice. We've burned tons of time on it between Travis, GH Actions, and @cgohlke single-handedly making all of the Windows builds for the entire scientific Python community.
Ultimately, something like the debian packaging system is probably the best open source model for this. (though the splitting of the python standard library so that virtual envs aren't in the base install is a pita). Unstable gets a reasonably current set of packages, and crucially all of the underlying library dependencies are compiled together. It's also not _that_ hard to rebuild individual packages from source, in an automated fashion. (This may be what Conda is doing, but I've never looked in detail at their system)
I just have a single python file, meant to be treated as an executable (no package at present). There are a whole bunch of tests, but that's obviously separate. Any suggestions on modern best practices welcome!
With that, your entire package build (for all distribution types) should be reducible to `python -m build`. Here's an example of a full project doing everything with just `pyproject.toml`[3] (FD: my project).
[1]: https://github.com/pypa/flit
Or you can use setuptools, which is the package that enables old setup.py builds, as the backend with pyproject.toml. This has the advantage of being mature, unlikely to be abandoned, and possibly some familiarity if you've used it before. Even then, you can use build as the front end build tool.
There's also annoyances like the inability to install a script in the search path without implementing it as a module. Something setup.py doesn't require.
If you want a stand alone executable, I haven't found a good, single, cross platform tool for that yet... seems like there is a separate tool for each platform.
PyInstaller is cross platform, and arguably good.
    from setuptools import setup

    setup(
        name='foobar',
        scripts=['foo'],  # install a script from the current folder
        # ...
    )
A few years ago I had to start using twine to register and upload it to PyPI.
Better would be a basic pyproject.toml file along the lines of the following:
    [build-system]
    requires = ["setuptools"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "foobar"
    version = "0.0.1"
    dependencies = [
        "...",
    ]

    [project.scripts]
    foo = "foobar:main"
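For the `foo = "foobar:main"` entry above to work, `foobar` has to be importable and expose a callable named `main`. A minimal sketch, assuming a single top-level `foobar.py` that setuptools picks up as a module (the greeting logic and the `name` argument are invented for illustration):

```python
# foobar.py -- hypothetical module behind the `foo = "foobar:main"` entry point
import argparse


def main(argv=None):
    """Entry point; the generated `foo` wrapper calls this with no arguments."""
    parser = argparse.ArgumentParser(prog="foo")
    parser.add_argument("name", nargs="?", default="world")
    # argv=None makes argparse fall back to sys.argv[1:]
    args = parser.parse_args(argv)
    print(f"hello, {args.name}")
    return 0  # the generated wrapper passes this value to sys.exit()
```

After a `pip install .`, the installer should generate a small `foo` launcher in the environment's scripts directory that imports `foobar` and calls `main()`.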
See: https://setuptools.pypa.io/en/latest/userguide/quickstart.ht...
For step 2 you can use nuitka or similar, but if your audience is somewhat developer-oriented, you can also suggest they use pipx: https://github.com/pypa/pipx.
e.g. https://github.com/tpapastylianou/self-contained-runnable-py...
Shoutout to Material for MkDocs enabling the swanky theme and Markdown extensions. https://squidfunk.github.io/mkdocs-material/
The main problem with Python packaging is that it's often C/C++ packaging in disguise, across multiple OSes and CPU architectures, and that's far from being solved. Building such a Python wheel is essentially like building a "portable" (aka one you don't need to properly install into the system) Linux/Windows/macOS application. That comes with a variety of caveats and requires some specialized knowledge one wouldn't pick up playing around with just Python alone.
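The platform dependence shows up directly in wheel filenames, which encode a Python tag, an ABI tag, and a platform tag. A rough sketch of splitting a filename into those parts, following the naming convention from the wheel specification (the `foobar` filenames and helper names are hypothetical, and real tools should use a maintained parser rather than this):

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(name, version, build, python, abi, platform)


def is_pure_python(tags: WheelTags) -> bool:
    """A wheel with no ABI and no platform constraint contains no compiled code."""
    return tags.abi == "none" and tags.platform == "any"


native = parse_wheel_filename("foobar-0.0.1-cp311-cp311-manylinux_2_17_x86_64.whl")
pure = parse_wheel_filename("foobar-0.0.1-py3-none-any.whl")
print(native.platform, is_pure_python(native), is_pure_python(pure))
```

A project with compiled extensions needs one such file per interpreter, ABI, and platform combination it wants to support, which is where the build-farm burden comes from.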
I've been out of the loop for a long time, and would like to get an update on how things are in Python in 2023, but I'm not sure if there even is a consensus — what I can find by googling seems to be several kinda competing approaches. This seems surprising, because most "modern" languages seem to have a very well defined set of practices to deal with all of that stuff. Some languages already come with their built-in stuff (Go, Rust), others simply have well-known solutions (like, technically there still exist PEAR and PECL for PHP, but everyone just knows how to use composer, which solves both packaging and dependency-management problems, and it's also pretty clear what problems it doesn't solve).
For Python there seems to be like a dozen tools and I'm not sure which are outdated and not used by anyone, which are useless fancy wrappers (not used by anyone) and what is the actual go-to tool (if there is any) for all common cases. Dependency management, version locking, shipping an executable, environment separation for some local scripts, whether I should ever use pip install globally, etc.
Sdist is only one letter away from sadist.
https://nixos.org/manual/nixpkgs/stable/#python
Particularly section 17.27.1.2. Developing with Python. Combine that with a generic guide on using Flakes and that should get you started
https://packaging.python.org/en/latest/tutorials/packaging-p...
and the longer story is that this method has the flexibility to allow other implementations of packaging tools to be used, and so it fosters choice and competition in the ecosystem. In contrast, the old method of packaging was tied to a particular implementation.
- package with a `pyproject.toml` file configured to use modern tooling
- package with a `pyproject.toml` file configured to use traditional `setup.py` tooling
- package with traditional `setup.py` tooling
- package with poetry
- package with whatever, deploy with nuitka or pipx
- skip the packaging and deploy with PyInstaller
- skip the packaging and deploy with nuitka
Note that, unless the Python world has radically changed while I was looking away, the packaging does not ensure a simple way to deploy the package and its single script. I remember vividly `pipenv` crashing on me, so switching to venv+pip (or was it virtualenv+pip?) then setting up a bash wrapper to call the Python script with the right venv...
As to the question of whether you should ever use pip to install packages globally, the answer is almost always no. For command line tools, the best option IMO is pipx. The second best option is pip install --user.
If you're developing a library or application, you should always isolate it in a virtualenv, which is something poetry will handle for you when you run `poetry install`.
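A quick way to check whether the interpreter you are running is isolated is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch that assumes an environment created by `venv` or a recent `virtualenv`; the function names are made up for illustration:

```python
import sys


def in_virtualenv() -> bool:
    """True when sys.prefix points at an environment rather than the base install."""
    # Inside a venv, sys.prefix is the environment directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def describe_environment() -> str:
    if in_virtualenv():
        return f"virtual environment at {sys.prefix}"
    return f"base interpreter at {sys.prefix}"


print(describe_environment())
```

If this reports the base interpreter, a plain `pip install` would modify the global (or per-user) site-packages, which is the situation the advice above is trying to avoid.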
The problems it solves are very complex if one looks a little below the surface. It is solving different problems to the ecosystems that it's often compared to: golang, rust, java, js.
As a Python dev with experience since 2.6 I agree it has gotten better, but it is also rotten at its core. The problems Python has to solve there are hard, but this is why it should be priority number one to solve them elegantly and in a fashion so that 99% of Python users never have to worry about them.
Right now packaging and dependency management are my number one concern when I think about recommending the language to beginners. The things I needed to learn just to figure out a good way of developing something with dependencies and deploying it on another machine are just way too much. When I was going through this there was no official "this is how it is done" guide.
Now there is poetry for example. Poetry is good. But there are still hard edges when you need to deploy to a system without poetry on it for example.
Edit: I've just read another comment which I think pointed out the most pertinent fact - that python has often served as mere glue code for stuff written in lower level languages. This then results in an explosion of complexity as a python package not only has to package the python part, but has to be able to build and package the lower level stuff.
The one thing I understand is npm installs everything locally by default (unless you -g), and in Python it's hard to stay local even if you use a venv.
The TL;DR for Python packaging in 2022 is that, unless you're building CPython extensions, you can do everything through pyproject.toml with PyPA-maintained tooling.
However, the `scripts=[]` keyword is the key to the case above.
I never managed to get data files packaged into a dist elegantly... annoys me to this day.
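On the reading side, one approach is to keep data files inside the package directory and load them with `importlib.resources` (Python 3.9+) instead of building paths from `__file__`. This sketch only covers reading; getting the build backend to include the files is backend-specific and not shown, and `read_package_text` is a hypothetical helper:

```python
from importlib.resources import files


def read_package_text(package: str, resource: str) -> str:
    """Return the text of a file that ships inside an importable package."""
    # files() locates the package wherever it was installed, so this
    # works the same from a source checkout or from site-packages.
    return files(package).joinpath(resource).read_text(encoding="utf-8")


# Demonstrated against a standard-library package so the sketch runs anywhere;
# in a real project this would be something like ("foobar", "defaults.json").
text = read_package_text("json", "__init__.py")
print(len(text) > 0)
```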
I wouldn't have considered "how to get BLAS onto a system" to be a "Python packaging" issue, but for people who want to rely on it via scipy/numpy/whatever, it is.
You can also use `hatch new --init` to convert a `setup.py` (whether imperative or setup.cfg-backed) to a `pyproject.toml`.
PEP 621, which I mentioned, covers the format of `pyproject.toml` in detail. I also linked an example which, to the best of my knowledge, covers all current best practices for that file.
> There's also annoyances like the inability to install a script in the search path without implementing it as a module. Something setup.py doesn't require.
I'm not sure I'm following. A module in Python is just a Python file, so your script is a module. Are you saying that you can't distribute single-module packages with pyproject.toml? Because I don't think that's true.
I meant package. A directory with an __init__.py. You can't install standalone script.py (or a generated wrapper) as /usr/local/bin/script with a pyproject.
Did you look at it[1]?
> I meant package. A directory with an __init__.py. You can't install standalone script.py (or a generated wrapper) as /usr/local/bin/script with a pyproject.
I still don't think I understand what your expectation is here: a `pyproject.toml` is just a metadata specification. The only difference between it and `setup.py` is that the latter is arbitrary code.
There's an old, long deprecated way to use `setup.py`, namely `setup.py install`. But that's been discouraged in favor of `pip install` for years, which behaves precisely the same way with `pyproject.toml`. If you want to install a script into `/usr/local/bin`, `pip install` with a package specified in `pyproject.toml` will work just fine.
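Where the script actually lands depends on the environment pip installs into. A small sketch using the standard `sysconfig` module to show the scripts directory for the default install scheme and whether it is on `PATH` (distributions and `--user` installs can use a different scheme, so treat the result as indicative only; the function names are illustrative):

```python
import os
import sysconfig


def scripts_dir() -> str:
    """Directory where the default install scheme places executable scripts."""
    return sysconfig.get_path("scripts")


def scripts_dir_on_path() -> bool:
    """Check whether that directory appears among the PATH entries."""
    target = os.path.normcase(os.path.realpath(scripts_dir()))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(
        os.path.normcase(os.path.realpath(entry)) == target
        for entry in entries
        if entry
    )


print(scripts_dir(), scripts_dir_on_path())
```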
In the "we would rather people not use this but it's going to stay around for a long time" sense. I strongly doubt it will disappear within the next decade or two. There's a long tail of setup.py-based tools.
Last I checked pyproject.toml only supports the simplest of Python/C extensions. Anything fancy, like --with/--without compilation flags to enable/disable optional support, compiler-specific flags (in my case, to add OpenMP), compile-time code generation (like using yacc/lex), etc. requires a setup.py and a bunch of hacking.
> The difficulty of interfacing with distutils means that there aren’t many such systems right now, but to give a sense of what we’re thinking about see flit or bento.
Bento is dead. Flit doesn't handle extensions, and points instead to Enscons, which in turn depends on SCons - a build system I have no experience with.
Plus, I sell a source code license. My customers make wheels for their internal PyPI mirrors. I would need to consider how any change might affect them, without the experience to make that judgement.
It seems far easier for me to stay with setup.py than explore a PEP 517 alternative.
So far what I've seen is either people using something like Enscons, or a very complex build system like SciPy's where setuptools just doesn't work. I haven't seen much migration for smaller setup.py systems like mine .. but I also haven't been tracking that well enough.
Any pointers for how that would work?
Also this is a few lines of well understood Python, not exactly a huge investment, right? Do several lines even need to be future proof?
My bet is you’ll need to modify the toml solution more often than the setup.py in the next decade.
(I'll note that maturity is not strong evidence here: distutils is very mature, but is in the process of being deprecated and removed from Python entirely. I don't think that's likely to happen to setuptools, but the fact that behavioral PEPs now exist for all of these tools means that the decline/abandonment of any poses much less of an ecosystem risk.)
And I should've been more upfront about the real reason for suggesting setuptools: there seem to be a number of build tools that support pyproject.toml, including flit, poetry and setuptools (and I'm sure I've seen at least one other). For me, at least, when I was making a small library recently, it was an overwhelming choice for a part of my project that feels like just admin rather than core business logic. I came close to giving up and just using setup.py with `setup()`. At least setuptools with pyproject.toml is a choice that feels safe; it may not be the best, but it will certainly be good enough that I'm unlikely to regret it later, so I didn't need to spend a lot of time looking at the detailed pros and cons of all the choices.
The benefit of this is that your project now has metadata that tools like pip/poetry/etc. can use to figure out what is required (Python project wise) to build your project. For example, pip will create an isolated venv with setuptools and Cython for the project I listed when installing from the sdist. You can now also take advantage of `python -m build` to build this project rather than a setuptools specific incantation. This is universal across all build providers, so if you want to change to poetry in the future you can, with hopefully no build script changes.
My "setuptools specific incantation" is "pip install" or "pip install -e". I do have a setup.cfg.
The recommendation last year was "If you're building an application, use Poetry. If you're building a library, use Flit", and since my package is a library, I've never really considered poetry.
But! I'm switching from argparse to click - my first required dependency! - so within a month or so I'll be putting my toes into the pyproject.toml waters.
Thank you for your pointers. Isn't there also a way to specify the requirements for building the documentation? I didn't see it in your example.
Unlike Node, Python is essentially older than modern package management. When Python developers first decided to tackle distributing their code, `apt-get` did not yet exist.
Early approaches which stuck around way too long let any package do more or less anything at install time, and didn't bother with static metadata (can't figure out what your deps are except by attempting an install!). Subsequent solutions have struggled to build consensus in an already vast and mature ecosystem. Backwards compatibility means compatibility with bad conventions.
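For contrast, once a distribution carries static metadata, its declared dependencies can be read without running any of its code. A minimal sketch with the standard `importlib.metadata` module, which only sees distributions already installed in the current environment (`declared_dependencies` is a hypothetical helper):

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import Optional


def declared_dependencies(dist_name: str) -> Optional[list[str]]:
    """Return the requirement strings a distribution declares, or None if absent."""
    try:
        # requires() returns None when the metadata lists no dependencies.
        return requires(dist_name) or []
    except PackageNotFoundError:
        return None


for name in ("pip", "no-such-distribution-hypothetical"):
    print(name, declared_dependencies(name))
```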
In general, if a community doesn't agree on how to fix a problem, someone else will provide a solution at a higher layer. Now it's common to install a whole Docker image to run some Python code instead of a few Python deps.
My impression is that none of that resembled contemporary CPAN, and isn't really what I have in mind with the admittedly ambiguous phrase 'modern package management'.
But I'd love to hear more! The history of package management is very interesting to me. Tales of ancient but sophisticated package management systems are very welcome. :)
* Back when I used this, there was just psycopg2 which had the bin included.