How we rolled out one of the largest Python 3 migrations (blogs.dropbox.com)
I wish the authors of more Python tools would deploy standalone applications. I do not like having to maintain various sets of Python installers/package managers (because every Python tool seems to use a different installer). Especially on cloud servers that often lack a whole set of dependencies that Python developers just seem to take for granted.
I’m not a Python developer. I don’t have the time or the inclination to repackage various tools and untangle dependencies.
Given the choice between trying to figure out how to get multiple Python tools to behave together, or using another tool, I’ll almost always choose an alternative.
You install all 3rd party dependencies into some directory. The command line entry point is then a simple BASH script which sets the PYTHONPATH to the appropriate installation location and then does the appropriate exec call.
You then have a functionally portable python installation.
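For anyone who wants to see the shape of that launcher, here is a minimal sketch of the same idea in Python rather than Bash. The `vendor` directory name, the `myapp` module name, and the helper names `build_launch` and `launch` are all made up for illustration; the only real pieces are `PYTHONPATH`, `sys.executable`, and `os.execve`.

```python
import os
import sys


def build_launch(app_dir, module, argv):
    """Return (executable, args, env) for running `module` with bundled deps.

    Assumes third-party packages were installed into <app_dir>/vendor
    (a hypothetical layout, not something the comment above specifies).
    """
    vendor_dir = os.path.join(app_dir, "vendor")
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Put the bundled directory first so it wins over anything already set.
    env["PYTHONPATH"] = (
        vendor_dir if not existing else vendor_dir + os.pathsep + existing
    )
    args = [sys.executable, "-m", module] + list(argv)
    return sys.executable, args, env


def launch(app_dir, module, argv):
    """Replace the current process, like `exec` at the end of a shell script."""
    exe, args, env = build_launch(app_dir, module, argv)
    os.execve(exe, args, env)
```

An entry-point script would call something like `launch(os.path.dirname(os.path.abspath(__file__)), "myapp", sys.argv[1:])`. This still relies on some interpreter being present, so it is not a fully standalone binary.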
Really?
I think as soon as you start writing all caps, "trivial" has gone right out the window.
Trivial would be application dependencies managed by the system package manager.
I'm at the point where I won't touch a python app or library that can't be installed via Pacman. It's just not worth my time.
And... Python is still the only thing on Arch that gives me the shits every time it upgrades.
I find this project: https://github.com/Nuitka/Nuitka very interesting, but it's written and maintained by only a single person, and I never got it to work with any of my apps.
It can internally use an interpreter, JIT, bytecode or full native code like C. That's a different discussion. Just don't make it a pain to distribute.
More often than not, a typical developer's python environment tends towards https://xkcd.com/1987/
This is good for me since I deployed a Python-dependent app under the assumption that the system Python would be stable and reliable. It allows relatively complex things to be achieved with a tiny download package.
I’ve been prepared to adopt Python 3 for a while, but it just isn’t necessary when using system defaults.
>>> On the surface, the application would more closely resemble what the platform expects, while behind various libraries, teams would have more flexibility to use their choice of programming language or tooling.
I'm always fascinated by how the implementation of the core principles of an application is dictated by factors alien to it, such as OS, company organisation, etc. Therefore, the job of coding is often a small part compared to the amounts of trivialities, project management decisions, customer's ideas, corporate policies, etc. Although my soul is a coder's one, I always realize how much coding is just a small part of what I call application development.
I'd say that's a waste (if not abuse) of the system's resources and scheduling system.
We used to build desktop software using PyQt and freeze it. I wonder what that entire toolchain looks like in this new way.
The idea behind embedding is you might have a Python shell in a larger app. But you can also use it to tightly control the execution of the interpreter.
https://blog.pyston.org/2017/01/31/pyston-0-6-1-released-and...
A̵l̵s̵o̵ ̵a̵ ̵s̵m̵a̵l̵l̵ ̵h̵e̵a̵d̵s̵-̵u̵p̵:̵ ̵a̵t̵ ̵t̵h̵e̵ ̵e̵n̵d̵ ̵o̵f̵ ̵t̵h̵e̵ ̵f̵i̵r̵s̵t̵ ̵p̵a̵r̵a̵g̵r̵a̵p̵h̵,̵ ̵"̵v̵e̵n̵e̵r̵a̵b̵l̵e̵"̵ ̵s̵h̵o̵u̵l̵d̵ ̵b̵e̵ ̵"̵v̵u̵l̵n̵e̵r̵a̵b̵l̵e̵"̵ ̵(̵u̵n̵l̵e̵s̵s̵ ̵y̵o̵u̵ ̵m̵e̵a̵n̵ ̵w̵e̵ ̵s̵h̵o̵u̵l̵d̵ ̵l̵o̵o̵k̵ ̵b̵a̵c̵k̵ ̵a̵t̵ ̵t̵h̵e̵ ̵s̵a̵c̵r̵e̵d̵ ̵p̵y̵w̵i̵n̵3̵2̵ ̵l̵i̵b̵r̵a̵r̵y̵ ̵w̵i̵t̵h̵ ̵h̵o̵n̵o̵r̵)̵.̵
There is a reason why it is so much better than iCloud sync, Google Drive, Box or OneDrive.
The fact that you think it's small means they're probably doing something right!
Relevant: https://danluu.com/sounds-easy/
(FWIW I don't use Dropbox myself, but I definitely had people ask me why Google needed 3,000 employees back in the day. Apparently it now has nearly 90K employees.)
>There's also a wide body of research that's found that decreasing latency has a roughly linear effect on revenue over a pretty wide range of latencies for some businesses. Increasing performance also has the benefit of reducing costs.
I wish he had cited some of that research, because a Google search doesn't turn up much except for the Amazon study with the 100ms figure.
I'm especially interested in whether there's any research on engineering tools and their latency (long build times, etc.), which is chronically under-addressed in quite a few large corporations. I'm just wondering if there are studies that would make the case for me if I were to present this to management.
As other commenters have noted, a lot of their Python use for large scale systems was an artifact of history and available choices at the time, but from my experience during my time there and following as an outside observer since leaving, they seem to make reasonable infrastructure and language decisions for their core product.
The claim that Go is "perfect" doesn't make things better.
The user doesn't need to have python installed at all, and if they have 2.x instead of 3.x or 3.3 when you're expecting features that are only present in >= 3.5, it's no issue.
This may sound trivial, but it's a _huge_ deal, particularly when you need to deploy something that runs on multiple different OSes and versions of OSes.
Other than that, the "directory full of libs, binaries, and code" approach is a lot easier to package into something that will work well with the native package manager (e.g. an .msi for windows, etc).
Given what you're replying to, I'd say irony sure does.
Rust or native platform (Swift/C# .Net Native, depending) are going to be even more ideal for battery usage.
Proper algorithmic choices are even more important and paramount no matter what is used. It goes without saying that poorly implemented Rust can be bested by well implemented Python.
Would they do this? Does anything internal to macOS depend on Python 2.7?
If you look at the source in `/usr/bin/xattr` it does some work to deal with different versions of Python. All the work ultimately gets handled by the xattr module preinstalled with the system Python. This module has Apple's copyright in it and is different than the `xattr` module on pypi.
Wonder how this Python one-off in macOS came to be.
Edit: I see the motherlode is in place earlier in the thread "Especially when you could build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem"
https://slashdot.org/story/01/10/23/1816257/Apple-releases-i...
(that post is old enough to vote in this year's election).
I think that in terms of lines of code, my contribution to many projects was net negative.
if err != nil {
... error handling ...
}
Python just uses exceptions, and they bubble up. No need for this excessively verbose error checking.
Go is more verbose. It also gives you the wonder of the compiler telling you about doing stupid things. That doesn't make it a worse language.
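To make the contrast with the `if err != nil` pattern concrete, here is a small Python sketch of exceptions bubbling up. The function names and the config shape are invented for the example; the point is only that the intermediate function has no error-handling code and the caller at the top decides what to do.

```python
def parse_port(text):
    # int() raises ValueError on bad input; nothing here catches it.
    return int(text)


def load_config(raw):
    # No per-call error check: a KeyError from the lookup or a ValueError
    # from parse_port simply propagates to whoever called load_config.
    return {"port": parse_port(raw["port"])}


def main(raw):
    # The single place where errors are handled.
    try:
        return load_config(raw)
    except (KeyError, ValueError) as exc:
        return "bad config: " + type(exc).__name__
```

The trade-off is the one argued in this thread: less code at each call site, but nothing in `load_config`'s signature tells the reader which errors can come out of it.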
Watching files. Keeping backup of files. Keeping conflicts resolved. Watching Selective Sync files and folders. Watching Smart Sync files and folders. Notifications for synced files. Etc. etc.
There’s way more the client does than what I mention.
1. Python apps require a distro base image while Go can run on scratch
2. Python images ship with the full standard library; not just the bits you import
3. In Python, if you add a dependency only to use 1 function or variable, you still end up with the whole dependency in your Docker image, while I’m pretty sure Go’s linker strips unused code.
IMHO unacceptably large, but that's because Python doesn't know which parts of pandas it might need to execute the program.
On another note, I believe one thing that has been problematic for PyPy adoption is that it does not automatically work with C extensions or Cython, and generally if someone already had performance issues with CPython, they would have written some C/Cython extensions?
It's a tool that allows you to write an entire interpreter in RPython (a subset of Python) and then have it build a native binary with a free JIT compiler included, with the specifics of your language encoded within. The reference implementation for this project is a Python interpreter.
It's seriously, seriously cool.
On a server, you’re paying for that 10%. On a client, you’re not. If it was 10% for nearly free then sure - but maintaining a separate implementation of a language is costly.
Math textbooks have been doing this for decades, but it's leaked into common parlance with online discussion.
Of course this is not what most people think when they hear the word, so really it's a term of art that should probably be avoided when talking to non-mathematicians.
As I read through my old uni maths notes there are often wild leaps from a to e along with a little scrawl saying "trivially" or "obviously". They may have been true once, but god dammit 21 year old me was a knobber.
I have never had to support all of the mobile environments of Dropbox nor the scale, so I cannot claim that my 99% solution would ever meet their 99.999% requirements. But I have been able to package Python apps for Mac OSX, WinVista, Win7, Ubuntu, and CentOS at the same time using that strategy.
https://go.googlesource.com/proposal/+/master/design/go2draf...
See the bits about error handling and error values.
The go2 proposal shows that in this respect, he was woefully wrong.
This is an argument for type annotations to be integrated into every dynamically typed language, rather than tacked on via an external tool.
I have, and I wanted to like it. On its face it seems like it should be a lot better than Go's--after all, it supports generics and union types! But it falls over in trivial cases, like recursive types (i.e., there's no way to model tree structures such as JSON or linked lists). A few other hard/impossible/confusing things come to mind:
1. How do you declare a typevar for a certain scope? If I define a type parameter `T` for function `foo`, I only want `T` to be scoped to `foo`. I don't want the type checker getting confused with `T`s for other functions/classes/etc.
2. What is the signature for a function that takes args/kwargs?
3. It straight up doesn't work with popular libraries like SQLAlchemy (last I checked, these were simply not supported because the likes of SQLAlchemy are "too magical"--this is a fair take, but frustratingly limiting for users).
These are just a few because my memory is poor, but I run into these sorts of things by the dozens every time I try to use mypy. It's just not ready for prime time. Go's type system is limiting, but its limitations are much more predictable and even less limiting (it turns out recursive types and poor-man's union types are quite a bit better than first-class, non-recursive union types, for example).
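For questions 1 and 2 above, here is a short sketch of how the standard `typing` module expresses them, as far as its documentation describes it. The function names are invented for the example. It does not address the recursive-type or SQLAlchemy complaints.

```python
from typing import List, TypeVar

# One module-level TypeVar can be reused; each generic function that
# mentions T binds it independently at its own call sites.
T = TypeVar("T")


def first(items: List[T]) -> T:
    return items[0]


def last(items: List[T]) -> T:
    # This T is unrelated to the T in first(); a checker solves it per call.
    return items[-1]


def join_numbers(*args: int, **kwargs: str) -> str:
    # The annotation on *args is the type of each positional argument,
    # and the annotation on **kwargs is the type of each keyword value.
    sep = kwargs.get("sep", ",")
    return sep.join(str(a) for a in args)
```

So the declaration of `T` is shared, but its meaning is scoped to each function that uses it, which is admittedly not obvious from the syntax.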
A simpler alternative would be for NumPy and pandas to provide their features as subpackages, like Airflow does: https://airflow.readthedocs.io/en/latest/installation.html