Things I've learned about building CLI tools in Python (simonwillison.net)
I’ve been using Go to create projects, both big and small, since 2013.
Almost every time I attempt to build something even remotely complex with Python, I end up regretting it, especially when other people besides myself start using these programs. The main problem is the lack of assurance that the same program will function correctly on another person’s computer. With Go programs, it’s as simple as having a statically linked binary, and given the ease of cross-compilation, I’m very confident that what works on my machine will work on my coworker's or customer's computer as well.
You know how some people suggest that Shell scripts should not exceed a certain number of lines, because beyond that point, it’s better to create a Python, Ruby, PHP, or similar script? I experience a similar sentiment when working with Python. A few hundred lines may be acceptable, but anything larger than that, I believe, is better suited to be written in a compiled language.
Python has been my go-to language for a long time, but lately I've been noticing that I've been holding off on writing new tools with it because in the back of my mind I have this nagging feeling that making them robust and portable will take too much work, and so I don't even bother getting started.
It's this trap of yes, you get to ~99% pretty fast, but the last 1% (packaging/distribution) then takes forever.
But I'm still looking for a good alternative... Golang does the job—no question, but it doesn't spark joy for me.
But as a shell script grows, Python starts winning.
By the time you get to 1000 lines of Python, you are probably doing a lot of heavy lifting and it is probably non-trivial to change languages.
Of course, there are languages like Ruby and Perl that would cover both bases pretty well, but I'm not willing to introduce a third scripting language to most teams and projects I work on. Not to mention that those languages have their own issues.
As someone that has experience with static binaries since 1990, way before dynamic loading was a common option in modern computing, yeah it works on the other computer, provided the distribution is exactly the same, and all required files and network configurations are exactly the same.
For fun I just googled a template and tried: https://github.com/radix-ai/poetry-cookiecutter
And the result is quite good.
Your comment assumes that python cli scripts need to be single liners, but IIRC there are several tools that allow you to bundle a package into a single file like pex, shiv, and zipapp.
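For the standard-library option in that list, a minimal sketch of bundling with `zipapp` might look like the following. The `hello_cli` directory, `app.py`, and `main` are hypothetical names made up for illustration; `zipapp.create_archive` is the only real API being shown. The resulting `.pyz` still needs a Python interpreter on the target machine, so this covers the "single file" part, not the "no runtime required" part.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_pyz(workdir: Path) -> Path:
    """Create a tiny hypothetical app and bundle it into one .pyz file."""
    src = workdir / "hello_cli"  # hypothetical source directory
    src.mkdir()
    (src / "app.py").write_text(
        "def main():\n"
        "    print('hello from a single file')\n"
    )
    target = workdir / "hello_cli.pyz"
    # main="module:function" makes zipapp generate the __main__.py for us.
    zipapp.create_archive(src, target, main="app:main")
    return target


def run_pyz(pyz: Path) -> str:
    """Run the archive with the current interpreter and return its stdout."""
    done = subprocess.run(
        [sys.executable, str(pyz)], capture_output=True, text=True, check=True
    )
    return done.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pyz(build_pyz(Path(tmp))))
```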
But such is the reality of Python world. Every third-party library or tool you use is defective in some major and plenty of minor ways. And you have to be prepared to undo, fix, reimplement whatever you get, and be very, very selective about the tools and libraries you choose to live with.
But I have been looking around for a while for something that's more certain than `pip`, and unfortunately everything I've found (like Bazel or Buck) suffers from having to do a lot of futzing to use dependencies.
Pip doesn't really know how to install programs. Pyproject.toml is completely irrelevant to the problem. What pip can do is install (generated) files from the scripts section of the wheel it's installing into the directory for executables known to your Python environment. In most cases this directory will not be on the system path, and even if it is, you are better off not using this functionality. Instead, rely on your system's packaging tools to install files there, so that those tools can track them, deal with conflicts caused by upgrades / downgrades, remove them, audit them, etc.
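To check the "directory for executables" claim on your own machine, a small sketch along these lines may help. It asks `sysconfig` for the scripts directory of the running interpreter and compares it against `PATH`. The helper names are made up for this example, and it ignores `--user` installs, which use a different install scheme.

```python
import os
import sysconfig
from pathlib import Path


def scripts_dir() -> Path:
    """Directory where this Python environment puts console scripts."""
    return Path(sysconfig.get_path("scripts"))


def is_on_path(directory: Path) -> bool:
    """Check whether `directory` is one of the entries in PATH."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    target = os.path.normcase(os.path.abspath(directory))
    return any(
        os.path.normcase(os.path.abspath(entry)) == target
        for entry in entries
        if entry
    )


if __name__ == "__main__":
    d = scripts_dir()
    print(f"scripts directory: {d}")
    print(f"on PATH: {is_on_path(d)}")
```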
> virtualenv
Whoa, this fossil is still alive somewhere? I think you probably meant venv. virtualenv is a throwback to the Python 2 era. Not that it's bad because of that, but you should probably warn your readers about this detail.
> pip vs Bazel or Buck
Are you sure you understand what these tools are supposed to do? pip installs Python packages. Bazel and Buck build (mostly Java) packages. The analogue in Python world to Bazel and Buck would be SCons, maybe setuptools.
In other words, pip doesn't know how to build Python packages. Sometimes it wants to build them (which is bad, and you should never do that), but it never does it on its own -- it uses other tools to do that, and the tools could be anything, setuptools, CMake, MSVC, rustc... whatever the authors of that particular library chose to use to build it. In particular, pip could, in principle, call Bazel to build a package (would be a weird twist, but not impossible).
On the other hand, tools like Bazel or Buck would usually use something else to install packages, if those are needed during build, eg. Maven.
Further, if I am building using Python 3.11 features and you are stuck on Python 3.10 then you cannot install my Python CLI tool.
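One way to at least fail politely in that situation is a version guard at startup, sketched below. `MIN_VERSION` and both function names are hypothetical; the declarative equivalent is the `requires-python` field in the package metadata, which lets pip refuse the install up front. The guard itself has to stick to syntax the older interpreter can parse.

```python
import sys

MIN_VERSION = (3, 11)  # hypothetical minimum for this illustration


def version_ok(current=None, minimum=MIN_VERSION) -> bool:
    """Return True if `current` (major, minor) satisfies `minimum`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


def require_version(current=None, minimum=MIN_VERSION) -> None:
    """Exit with a readable message instead of a confusing error later."""
    if not version_ok(current, minimum):
        wanted = ".".join(map(str, minimum))
        sys.exit(f"This tool needs Python {wanted} or newer.")
```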
Python, IMO, has no niche anymore. A few hundred lines of Python is a hundred lines of Zsh, or the same few hundred lines of C++, and to top it off, there's the shit show of Python tooling for deployment. setup.py, requirements.txt, pyproject.toml… Fifteen files with overlapping contents in twelve different grammars (mild exaggeration), with new ones added every other year. Setuptools can't find your entrypoint…
I really hope they succeed.
[0]: https://vlang.io/
You know the tooling is bad and in the long term it will hurt, but the standard library and third party packages are just phenomenally productive and that’s a huge draw.
But the Python ecosystem seems to be such a disappointing mess that I just gave up on the whole idea. I'm learning JavaScript/TypeScript now and you can build CLI programs with Deno.
The part of the ecosystem that belongs to Node/Deno branch of the family tree also tends to promote bad practices (while insisting they're good practices), and that's before you get to the part where the runtimes themselves implement quirky/non-standard dialects and APIs. It's not a community that's known for being especially rational or having high standards for intellectual honesty.
If you really want to write stuff that will run on most people's computers, target the World Wide Wruntime: write standard JS that the browser won't choke on. You can do it in a way that people are allowed to run it from the command line if they want, but doing so is optional. Here's a 7-part tutorial that explains how: <https://triplescripts.org/example/>
https://www.npmjs.com/package/pkg
Unfortunately you also need to bundle all your code into a single file for it to work, but you can use any bundler (webpack, parcel, etc) you want at least
I mean for the longest time the AWS CLI used the python/pip installed in your own machine and it probably caused thousands of man-hours of wasted time.
It's possible, just not the norm.
It would clone repositories (microservices) and configure LXC containers.
The fact that the barrier for creation is so low means I'm even willing to do them to solve very niche problems in generalizable ways. [1] is common enough that a few people have starred it. [2] is niche enough that other Anki folks haven't used it AFAICT. [3] is likely something I'll never personally need again, even though Azure VM reservations not letting you customize your reminders for when they're about to expire is probably a costly mistake for a great many firms.
All started with this same starting methodology, because what I wanted was just a little too fiddly to want to hack together with my shell toolkit.
[1]: https://github.com/hiAndrewQuinn/finstem
By the way, argparse (and I assume click too) by default allows having positional arguments and switches in any order, i.e., both:
mycli pospara0 --switch --option A
mycli --switch --option A pospara0
work. This seems like nothing, but I've encountered many CLI utilities written in other languages (particularly Go and Node.js) that force you to put switches at the beginning, and I really hate that. I don't know if it's caused by their default/popular CLI libraries or what; maybe someone could enlighten me.
(Of course, in some cases like things like FFMPEG, the order absolutely matters; but it's not the case for 99% of utilities.)
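A quick sketch of the argparse half of that claim, reusing the `mycli` invocations from above (the parser definition is my guess at what such a CLI would declare, not anyone's actual tool):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mycli")
    parser.add_argument("pospara0")
    parser.add_argument("--switch", action="store_true")
    parser.add_argument("--option")
    return parser


parser = build_parser()
# Positional first, then switches.
first = parser.parse_args(["pospara0", "--switch", "--option", "A"])
# Switches first, then the positional.
second = parser.parse_args(["--switch", "--option", "A", "pospara0"])
# Both orderings produce the same namespace.
print(first == second)
```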
I pretty much use argparse for making all my CLI tools, but I don't know of an easy way of doing this single character flag thing. Is it possible/easy with argparse?
    >>> import argparse
    >>> p = argparse.ArgumentParser()
    >>> p.add_argument("--foo", "-f", action="store_true")
    >>> p.add_argument("--bar", "-b")
    >>> p.parse_args(["-fb", "baz"])
    Namespace(foo=True, bar='baz')
You can do short (one character) or long arguments with argparse directly:
    import argparse
    import sys
    parser = argparse.ArgumentParser(argument_default=None)
    parser.add_argument('-d', '--debug', action='store_true', help='debug flag')
I also do lots of other things, like long help with no args like this:
    # Print the full help and exit when the tool is run without arguments.
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)
I’ve only been programming for about 5 years in earnest. I held on to Python for dear life in the first days of my career, but have since transitioned to full-time C/C++ development, primarily in embedded and hardware interfacing applications. I feel like my large programs are much more manageable and maintainable now. Some of this is of course due to having grown as a programmer as well.
It seems like you get a lot of the benefit of static typing if you adopt it as a self-imposed constraint?
https://breadcrumbscollector.tech/mypy-how-to-use-it-in-my-p...
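As a rough illustration of typing as a self-imposed constraint: the code below runs fine as plain Python, and a checker such as mypy could additionally flag the commented-out call before the program ever runs. `Options` and `describe` are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    path: str
    retries: int = 3
    label: Optional[str] = None


def describe(opts: Options) -> str:
    """Render the options as one human-readable line."""
    label = opts.label or "unnamed"
    return f"{label}: {opts.path} (retries={opts.retries})"


# Python itself accepts the next line at runtime; a static checker would
# reject it because "three" is not an int:
# Options(path="data.csv", retries="three")
```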
It's a neat way to make powerful CLIs more accessible to less-technical users.
That makes the tool nicer to use. To me a CLI tool should stand alone ideally. Obviously that is not the trend as many things that are CLI are installed via node or npm.
I guess docker could solve most of the issues here
clap is a much better developer experience (IMO) and you end up with performant (no terrible cold starts) and strongly-typed code (where possible) without having to deal with building and distributing a Python CLI.
I will never forget falling in love with Python when I first started learning to program, but experiencing internal CLIs written in Python at scale is an experience I would encourage everyone to avoid unless UX and maintenance aren’t concerns.
How does HN provide tab-completion for CLI commands?
For those who don't know: Python has the argparse package, which ships with every Python distribution. It's much better in terms of organizing command-line arguments, easier to debug, and easier to extend (which is very rarely necessary).
Click is a third-party dependency. It's not solving any real problems. It's not like argparse had a problem and Click came to solve it. It's just that the author had too much spare time on their hands and decided to learn how to do something new. The author made some rookie mistakes along the way. He totally misunderstood how locales and encodings work, and for a while Click was a source of errors related to that. Maybe it still is, but fewer packages are using it? I don't know.
If anyone chooses to use Click over argparse, it only means lack of research. Following fads w/o any sort of independent thinking. Not someone I'd encourage to take advice from.
Btw, mitsuhiko created Flask, simonw co-created Django. Total rookies, I know.
Click is not an interface to or an improvement on argparse. It duplicates its core functionality. Compared to argparse it offers no tangible benefits and lots of downsides, while "improvements" like the mentioned progress bar are worth very little. They are poorly implemented, so if you wanted the real thing you'd have to do it differently, and they are unwanted for the most part. It's a very small niche where you want something half-baked and have already agreed to install third-party dependencies, but won't go all the way and use, e.g., Prompt Toolkit.
There's nothing commendable about Flask or Django. Both projects are hilariously bad. They are popular because of what they do, not because of how they do it. Web in general is one of those places nobody should go look for quality, but a crossbreed of Python and Web brings the worst of both worlds.
Same here. Why am I forced to remember or look up which order arguments should go in? There is no reason for that and they should be able to go in any order.
None whatsoever. Argparse is better all around. Click is just a worthless piece of software that nobody should be using.
As for the order of options / arguments: I think the reason is the historical implementations and use of getopt, which would be used in a switch inside a loop, and that (maybe unintentionally) made the order irrelevant. It's likely that other libraries implement parsers in a way that is sensitive to the order. Whether that's deliberate is hard to tell. There are definitely advantages to this approach too, but it's hard to know whether the authors sought out those advantages deliberately.
For instance, when options can take arguments (especially when they can take multiple arguments) they can be confused with sub-commands or the arguments to commands. Imposing ordering restrictions helps to resolve ambiguities as to what argument is being processed. On the other hand, you may claim that not imposing ordering on arguments prevents CLI authors from creating confusing interfaces where users can accidentally mix arguments to options with sub-commands or arguments to commands.
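The ambiguity is easy to reproduce with argparse itself. In this sketch (the `demo` parser, `command`, and `--tags` are hypothetical), an option that takes one or more values swallows the positional when it comes first, so the "any order" freedom quietly stops applying:

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("command")
    parser.add_argument("--tags", nargs="+")  # option with multiple values
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        # argparse prints its usage error to stderr before exiting.
        with contextlib.redirect_stderr(io.StringIO()):
            return build_parser().parse_args(argv)
    except SystemExit:
        return None


# Positional first: unambiguous, parses as expected.
ok = try_parse(["run", "--tags", "a", "b"])
# Option first: "run" is consumed as a third tag, so `command` is missing.
swallowed = try_parse(["--tags", "a", "b", "run"])
```

With this parser, users have to write the positional first or separate it with `--`, which is effectively an ordering rule.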
Python can be relatively easily embedded in a C program and its source code can be compiled to C. The problems come from Python modules that are built to use shared libraries. It's not impossible to solve, but it means that you'd have to find the source code for those modules and recompile them to link statically with those libraries. This could be quite an undertaking, and is probably not worth it, unless you want to learn more about build systems and build tools in general.
Finally, in some cases it's impossible due to the licensing. I.e. you may have a Python module that relies on a shared library with license that prohibits redistribution. In that case it's not a technical, but a legal problem. This, however, isn't unique to Python, and you'd face similar issues no matter the language you chose to use.
Re: Docker: in most cases this is not a solution to making command-line interfaces. I actually struggle to think in what case it is. You'd have to write a program with the command-line interface and then put it in the image for Docker to create a container from (which will usually make it very inconvenient to use, because Docker containers by default run such programs with a separate filesystem, user and network interfaces). This would make things like user identity, the user's data and, well, obviously, the network hard to access for the program while gaining you nothing of substance.
If you're interested in a language that's compiled and fast, but as easy and pleasant as Python, I'd recommend you take a look at Nim: https://nim-lang.org.
Nim has the cligen library to generate and parse arguments: https://github.com/c-blake/cligen
And to prove what Nim's capable of, here's a cool repo with 100+ CLI apps the cligen author wrote in Nim: https://github.com/c-blake/bu
Last I knew, the argparse backing most Py CLI solutions did not support such easier (for many) to read help text, but the PyUniverse is too vast to be sure without much related work searching.
I too would like it for things to just magically be good (I think pyinstaller seems kinda close but I dislike how it works based on scanning your code. PyOxidizer is another), but was just mentioning that pip is an alright distribution tool for a part of the population (one that uses Python)
Certainly not ideal, but I find it's unusual for a tool to require a super-new version. Plus, `pyenv` makes it easy enough to install multiple versions in parallel and run commands under specific versions.
Sure, it would be better if this whole process wasn't so complicated, but I find it's pretty workable overall.
Deno has a way to package up the necessary JS runtime and make a self-contained executable. I'm sure it's bloated as hell, but again I don't want to require a browser.
Do you have any examples of said "bad practices" and non-standard dialects? I'm building a server with Deno right now to provide a REST-style API for a mobile app (nothing fancier than CRUD and some push notifications). The contenders for me were PHP 8 and Deno. Since I wanted to learn JavaScript anyway, I went with Deno. So far I've had a decent experience.
There's an inexhaustible list. But here are some:
- `require`, `module.exports`, and `.mjs`
- `Buffer`
- Abusing arrow functions and generally going out of one's way to reimplement `this`, poorly
- Closures everywhere (and near zero regard for runtime consequences, i.e. perf incl. memory usage, or legibility of code)
- Abusing `===` (i.e. using it everywhere and yelling at you if you don't—even going so far as to write codestyle bots and other tooling that forces you to change occurrences of `==` to `===` e.g. to get the build to succeed); lines where `===` is used instead of `==` should ideally make up something like less than one half of 1% of your code (generous), if it ever occurs at all
- A whole slew of "My First Experience with Polymorphism and Types™" antipatterns that are unwisely encouraged like `function foo(x) { if (typeof(x) == "string") /* ... */ }` and naive use (i.e. misuse) of `instanceof`, plus a bunch of packages like is-uint8array and/or basically the entirety of the (non-standard) utils.types namespace
Let's see, what else... yes, I don't see the point of arrow functions. And the reliance on RTTI is just straight-up bad programming in any language.
So what would you choose to write a server in? I'm writing a fairly straightforward server to present a REST-style API and access a database for a mobile app. I'm doing all this alone, so presumably I'm going to have to rely on at least a few frameworks for "routing" and serialization because I don't think I have time to roll my own.
Does not compute.
I first got familiar with Python after Peter Norvig promised that Python can be an OK substitute for Common Lisp. That promise turned out to be a bald-faced lie, but learning some Python made me more employable, so I'm not complaining. I took my first steps with Python when Twisted was popular, there were "old-style" and "new-style" classes, you could raise whatever you wanted (not necessarily an exception), and setup.py files were written in such a way as to use distutils if setuptools wasn't installed.
I've also contributed to CPython (reported bugs mostly). Wrote a bunch of C, some C++, Rust and Go code that produces Python modules as well as contributed to pip, setuptools, conda... Again, mostly bug reports or small patches for specific bugs, but still.
At day job, my role is in infrastructure, which is mostly written in Python, so, I deal with stuff like Linux kernel to userspace interface, various system utilities, or cloud-related stuff, mostly OpenStack. Another aspect I'm involved with at day job is CI and packaging. Perhaps the utility I wrote that's seen the most use is one that deals with combining multiple wheels into a single wheel to speed up deployment. It's not sophisticated, but turned out to be very useful. Another popular utility is used to dismantle Linux storage so that it can be re-defined and re-assembled. What it does is it traverses /sys/block looking for various devices and connections between them, finds the right order in which these devices need to be stopped / removed / disassembled and does that. Again, this isn't very exciting, but turned out to be useful.
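The comment doesn't say how the teardown order is computed; one common way to get such an order is a topological sort, sketched here with the standard-library `graphlib` (Python 3.9+). The device names and the `holders` mapping are a hypothetical layout, standing in for what a walk over /sys/block/<dev>/holders might collect.

```python
from graphlib import TopologicalSorter

# Hypothetical layout: each device maps to the devices stacked on top of it.
holders = {
    "sda": ["md0"],
    "sdb": ["md0"],
    "md0": ["dm-0"],
    "dm-0": [],
}


def teardown_order(holders_by_device):
    """Order devices so every holder is stopped before the device under it."""
    # TopologicalSorter expects node -> predecessors. A device's holders
    # must be handled first, so they are its predecessors.
    return list(TopologicalSorter(holders_by_device).static_order())


if __name__ == "__main__":
    print(teardown_order(holders))
```

A cycle in the mapping would raise `graphlib.CycleError`, which is probably what you want from a tool that is about to take storage apart.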
What do you do?
I find it solves the installation problem really well: each installed program gets its own virtual environment, but the single binary is still added to your path.
I would never want that. This is the exact opposite of what an installer should do. The whole point of using Python is to rely on the previously installed stuff: both the runtime and other libraries. If I'm making a Python program, I package it as a DEB or RPM (for work, this is what we support). I'm not saying these are great tools or are pleasant to work with, but I find the end result to be acceptable.
Similarly, I guess I'd make an MSI, or whatever the modern way to install on MS Windows is, if I ever have to. I don't know what the equivalent is on other systems.
I wonder if pipx and venv could grow functionality based around symlinks that allowed installations using the exact same package version to avoid having two copies of the files?
A compression based file system hack might provide a better result though, since it could optimize differences between two dependency versions with only minor changes.
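For the symlink idea, the first step would be finding files that are byte-for-byte identical across environments. A sketch of just that detection step, with a made-up `find_duplicates` helper (what pipx or venv would then do with the groups is left open, as in the question above):

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is already a link.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Each returned group is a set of candidates that could share one copy on disk, for example by replacing all but one with a link.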
I honestly don't know why anyone would use that... as in what does Bazel do better than virtually anything else that can provide this functionality. But, I used to be an ops engineer in a big company which wanted everything to be Maven, regardless of whether it does it well or not. So we built and deployed with Maven a lot of weird and unrelated stuff.
Not impossible, but not anything I'd advise anyone to do on their free time.
Specifically wrt' the link you posted, if you look here: https://github.com/bazelbuild/rules_python/blob/main/python/... it says that only pure Python wheels are supported, but that's also a lie, they don't support half of the functionality of pure Python wheels.
So, definitely not worth using, since lots of functionality is simply not there.
1. I think it's the wrong goal.
2. It's easier to achieve that in ways that don't require bloat on my system.
It's much nicer for users if the library interfaces with its dependencies respecting the versioning rules: doesn't use undocumented, unreleased stuff, doesn't rely on undocumented side-effects etc. Only relies on what's tested and released. In this way, there's no need to be very selective about versions you have.
Unfortunately, this is not how the world is. The reality is that especially in popular environments like Python, you get a crapload of very low-quality libraries, with poorly defined dependencies, with people who don't understand the infra side of things and have convoluted dependency requirements. But, I usually try to fight back. If I absolutely have to have a library with convoluted requirements, I fork it and fix the nonsense. Or vendor it.
Another unfortunate and quite ironic side-effect of this situation: people are so dedicated to specifying nonsense requirements (e.g. including the patch version in a requirement specification even though Python doesn't even have a patching mechanism) that popular programs used to install libraries are optimized for these absurdly specific requirements. I.e. it's faster to install requirements with pip or conda if you give them the exact list, which saves the solver from putting in much work. This puts people who want to make good libraries at a disadvantage because their libraries will take forever to install.
Which leads me to the following conclusion: if I want super-precise requirements, I don't need pip or conda. I can simply curl -o the packages I need, and it will be much faster and a lot more reliable.
> without risk of upgrading a library in a way that breaks something else.
I'm not afraid. I usually know what I'm installing. If it breaks, I'll fix it. I actually want to know when and why it breaks, so this is also an anti-feature for me.
> A compression based file system
Do you mean a deduplication / CoW filesystem? Not sure why this is a hack. Compression in filesystems typically compresses individual blocks. It won't help you if different files have the same contents; the theoretical benefit comes from redundancy within a file, not from some files sharing contents.
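A toy illustration of that difference, under loud simplifications: it compresses whole in-memory "files" with zlib rather than filesystem blocks, and uses random bytes as a stand-in for content that has little internal redundancy. The file names are made up.

```python
import hashlib
import os
import zlib

payload = os.urandom(64 * 1024)  # high-entropy stand-in for a file
files = {"a.bin": payload, "b.bin": payload}  # two files, same contents

raw_total = sum(len(data) for data in files.values())

# Compressing each file on its own cannot exploit the fact that they match.
compressed_total = sum(len(zlib.compress(data)) for data in files.values())

# Deduplication stores each distinct content once, keyed by its hash.
unique = {hashlib.sha256(data).hexdigest(): data for data in files.values()}
dedup_total = sum(len(data) for data in unique.values())

print(raw_total, compressed_total, dedup_total)
```

Independent compression leaves the total roughly where it started, while deduplication halves it, which is the distinction the comment is drawing.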