Python at Scale: Strict Modules (instagram-engineering.com)
This may interest you. This is probably going to be the 'official' way to get what you're talking about.
mypy and mypyc are interesting but their compile-time checks and optimizations are still hampered by Python's dynamic language semantics.
In Smalltalk, for example, you can completely change the structure of a class by sending a become: message.
What I think is missing is a bit of more PyPy love, and the Truffle and OpenJ9 Python support efforts.
I think the killer language will be typescript with access to both the python and JavaScript ecosystems. We'll see what that looks like.
And of course if something changes the syntax, better anonymous functions will be the absolute first thing I would look for...
I have not used TypeScript, but looking at its documentation, the syntax for type annotations looks identical. Would you be willing to expand on why you think its approach is better / how it's different?
I think this is an extremely good idea. Python is horrible but forced on a huge number of developers because of its ecosystem ... I think a bridging layer from typescript to python could be built in a way similar to swift’s Python Interop — and I don’t think it would require any special language support ...
I think one could actually make a better/easier to use/more robust design than Swift by requiring all interactions with the Python interpreter from Node to be async.
We build our employee database, and from there our IDM, from a single XML file in a really shitty format + three txt files in even worse formats (they are single line output files from an old mainframe system predating SAP). We used to do it in a rather complicated Microsoft SSIS workflow with a lot of C# services. All in all it’s a 30 minute nightly runtime. I recently replaced it with around 500 lines of Python and a 1-5 minute runtime (sometimes at the beginning of a school year we’ll see changes to around 1000 positions).
Python eats the XML like it wasn’t shit. It takes things like terrible date formats, we’re talking the output of a SAP free-text box shitty, and ports them seamlessly into a SQL date field. This alone was a nightmare in C# and Python just does it.
Still, after two decades of strict types it feels dangerous.
The high-level-variant is a dynamic language with optional typing, which is good for scripting, fast prototyping, fast time-to-market, etc.
The low-level-variant is similar to the high-level-variant (same syntax, same features mostly, same documentation), but it has no garbage collector, typing is mandatory and it runs fast like C/C++/Rust. Compiled packages that are written in the low-level-variant can be used from the high-level-variant without additional effort at all. The tooling to achieve this comes with the language.
A language like this would be insane, IMHO.
At the same time, I’d love a stronger type system to avoid a bunch of the pitfalls that the dynamism of python has.
So count me in.
I don't know however if this approach could be extended to other domains - say making a web framework. Given, python classes let you do so much tinkering, any attempts to port existing code will probably need a lot of rewriting?
> I've been tracking nim, and would agree it's the most promising so far! I feel though that it's trying to be too flexible in many ways. Examples of this include allowing multiple different garbage collectors and encouraging heavy ast manipulation. I'm also afraid it is different enough to keep it from attracting a significant amount of developers from the Python community. Nonetheless, it's something I plan on using and contributing to, since it's the best option so far.
Though, now that another commenter pointed out mypyc: https://github.com/mypyc/mypyc I believe I'll invest my limited free-time in that project instead, as it will allow me to stay within the Python community and eco-system that I love so much.
Gives some good insight into where Nim is going in the future too.
What takes 3 lines in Python, takes 10-30 on Go.
- I hate its module system and package eco-system story.
- I don't like its syntax.
- I don't like its error handling.
- I'd much prefer gradual typing.
- I want to maintain the ability to use interactive interpreters.
- I don't like the fact that instead of being community driven it is Google driven.
But, anecdotally, I see go being used as a second language to Python more than anything else and at an ever accelerating rate.
Yes, this!
That's what I hate most about Django and some Flask apps: by importing a module, you implicitly create a database connection and trigger a lot of other magic, which means I can't import a constant defined in that module outside of `python manage.py`.
Also, as the article says further down, it suddenly becomes much harder to smoothly handle "the database is momentarily unavailable" (because someone has put the line that starts the database connection in the global scope of a module somewhere).
I much prefer frameworks/modules for which code is executed only once you invoke their "setup" function
In our codebase, we have pretty strict developer-enforced rules about not doing I/O at the module level, usually through the use of simple "Lazy" wrappers for module-level objects. I'd be curious to know what other approaches people have taken with Python here.
I always treated this a bit like single underscore private functions/methods, i.e., follow a convention that produces code that's easy to reason about, even if it's not strictly enforced by the language/compiler. So in practice this equates to separating out modules that mutate global state, and placing the majority of logic in "strict" modules that only declare a bunch of "pure" classes/routines. So the "non strict" code is really just a thin layer of wiring gluing everything together. For instance my Celery task files tend to be very thin.
my_db_conn: Lazy[DbConn] = Lazy(lambda: make_db_conn(...))
and MyPy will tell you if you're doing something silly when you try to use it.
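For readers who have not seen the pattern, here is a minimal sketch of what such a "Lazy" wrapper could look like. The `Lazy` class body, the `DbConn` stand-in, and the `make_db_conn` factory below are assumptions made for illustration, not the commenter's actual code, and this version is not thread-safe.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Defer building a value until the first call to get()."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._ready = False

    def get(self) -> T:
        # Run the factory once, on first use, and cache the result.
        if not self._ready:
            self._value = self._factory()
            self._ready = True
        return self._value  # type: ignore[return-value]


class DbConn:
    """Hypothetical stand-in for a real database connection class."""


created: List[DbConn] = []  # records every connection the factory builds


def make_db_conn() -> DbConn:
    # Hypothetical factory; a real one would do I/O here.
    conn = DbConn()
    created.append(conn)
    return conn


# Importing this module builds the wrapper only; no connection is made yet.
my_db_conn: Lazy[DbConn] = Lazy(make_db_conn)
```

Callers write `my_db_conn.get()` where they would otherwise use the module-level object, so the I/O happens at first use rather than at import.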
EDIT: After typing up this response and submitting I realize you were talking about their strict approach rather than ours. whoops :)
Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.
To be clear, I'm not saying microservices will solve all their problems or be less work. I'm just saying that with an equal level of effort, they would probably get more overall reliability by having multiple services, they'd be able to use multiple languages, whatever is suited to the task at hand, be able to deploy even more often with less risk, and be able to isolate these types of "change on import" behavior to a much smaller surface on any given deployment.
It's hard to know anything about the stdlib as it can be monkey patched, e.g. [1]
That said, you could solve this with diagnostics; calculate signatures of stdlib functions and classes to find any known safe ones that were patched. Run that check in your test suite to find problematic imports.
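A rough sketch of that diagnostic idea follows. Instead of computing signatures it uses a simpler swapped-in technique: snapshot object identities for a watch list, then report any name whose object has been replaced. The `WATCHED` table and the helper names are hypothetical, and the baseline is only trustworthy if it is taken before any patching import runs.

```python
import json
import os

# Hypothetical watch list: display name -> (module, attribute name).
WATCHED = {
    "json.dumps": (json, "dumps"),
    "os.getcwd": (os, "getcwd"),
}


def snapshot(watched):
    """Record the object currently bound to each watched attribute."""
    return {name: getattr(mod, attr) for name, (mod, attr) in watched.items()}


def find_patched(baseline, watched):
    """Return the sorted names whose objects differ from the baseline."""
    return sorted(
        name
        for name, (mod, attr) in watched.items()
        if getattr(mod, attr) is not baseline[name]
    )


baseline = snapshot(WATCHED)
clean = find_patched(baseline, WATCHED)

original_dumps = json.dumps
json.dumps = lambda obj, **kwargs: "patched"  # simulate a monkey patch
try:
    dirty = find_patched(baseline, WATCHED)
finally:
    json.dumps = original_dumps  # always restore the real function
```

In a test suite, the equivalent of `find_patched` would run after importing the modules under test and fail if the returned list is not empty.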
> If the utils module is strict, then we’d rely on the analysis of that module to tell us in turn whether log_to_network is safe.
I like this. It seems far more usable than proposals like adding const decorators.[2]
[1]: https://github.com/gevent/gevent/blob/master/src/gevent/monk...
[1] https://www.tedinski.com/2018/03/20/wizarding-vs-engineering...
There are currently maybe two ways to tackle this “problem”, without a strict mode:
1. Don’t import at the global module scope; but that’s a bit tedious.
2. Import with rename, like `import os as _os`, and then leave it to the principle of “we’re all consenting adults”. I.e. if anybody imports and used things that start with an underscore, it’s clearly their fault, not mine.
I think the first step here is to get away from the assumption that importing a module will have "interesting" side effects. This is not only a problem with Python...
I tend to create mini "dependency injection" frameworks that create a pattern for loading module code at some point well after import. This pattern tends to reduce to wrapping whatever code you have in the module in a function/closure instead of just running it whenever.
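One possible shape for such a pattern, as a sketch only: modules register their start-up work with a decorator, and nothing runs until the application calls an explicit `setup()`. The names `on_setup`, `setup`, `resources`, and `connect_database` are all hypothetical.

```python
from typing import Callable, Dict, List

_setup_hooks: List[Callable[[], None]] = []  # work deferred until setup()
resources: Dict[str, str] = {}               # stands in for real shared objects


def on_setup(func: Callable[[], None]) -> Callable[[], None]:
    """Register func to run later; importing the module only records it."""
    _setup_hooks.append(func)
    return func


@on_setup
def connect_database() -> None:
    # A real hook would open a connection; this one just records that it ran.
    resources["db"] = "connected"


def setup() -> None:
    """Run every registered hook, in registration order."""
    for hook in _setup_hooks:
        hook()


state_after_import = dict(resources)  # still empty: nothing has executed
setup()
state_after_setup = dict(resources)
```

The registration itself still happens at import time, but it only appends a function to a list, which is cheap and free of I/O.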
Again, I like the idea of enforcing constraints with code, but I don't think it's a substitute for educating developers to avoid certain patterns and giving them infrastructure that makes the alternative easy.
Millions of lines of code in a monolith. 20s start up time. Meta monkey patching. One unit test per process... Yikes!
Software architecture, anyone?
Maybe Instagram should get a copy of Michael Feathers' book...
I added these ideas here: https://github.com/perl11/cperl/issues/406
well, if you ask me to write language X, I would definitely make mistakes for the first couple of weeks/months/years, that is why you need code review, mentoring and education plans for your hires.
> Here’s another thing we often find developers doing at import time: fetching configuration from a network configuration source.
MY_CONFIG = get_config_from_network_service()
I am pretty sure this is an anti-pattern. If this code passed code review, you should make your review process stricter.
def myview(request):
    SomeClass.id = request.GET.get("id")
> Likely you’ve already spotted the problem
Well, yes, why would you do this? Why would this pass code review? Why do we have linters and other checks for dynamic languages?
> It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
It seems we are also blaming Python here for the shortcomings of a monolith, instead of chunking out specific business modules into separate services/micro-services.
To be honest, the strict mode seems interesting, but I believe the problems they seem to be facing can be solved by a couple of changes to their process and code:
- everyone gets a mentor if they are not experienced in python or django
- code review by at least two experienced Python developers (does not count if you have coded in Java for 20 years)
- teams should try to move their logic outside the monolith (it sounds like they have a monolith)
- write CI tests to measure how much time it takes to import a file, if it takes more than T(line count * LINE_PROCESSING_THRESHOLD) you have to fix your code.
- prepare config and load it before running the actual server, no network call for getting config
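The import-time check from the list above could be sketched roughly like this. The threshold value and the helper names are assumptions for illustration, and `colorsys` is used only as a small stdlib module that is safe to re-import. Submodules that are already cached in `sys.modules` are not re-imported, so the measurement is a lower bound.

```python
import importlib
import sys
import time
from pathlib import Path

# Hypothetical per-line budget in seconds; tune it for your own codebase.
LINE_PROCESSING_THRESHOLD = 0.001


def timed_import(module_name: str):
    """Import a module from scratch and return (module, elapsed seconds)."""
    sys.modules.pop(module_name, None)  # drop the cached copy so the import really runs
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


def import_within_budget(module_name: str, per_line: float = LINE_PROCESSING_THRESHOLD) -> bool:
    """Check that importing took no longer than line count * per_line seconds."""
    module, elapsed = timed_import(module_name)
    source = Path(module.__file__).read_text(encoding="utf-8")
    line_count = len(source.splitlines())
    return elapsed <= line_count * per_line


generous = import_within_budget("colorsys", per_line=1.0)
impossible = import_within_budget("colorsys", per_line=0.0)
```

Wall-clock timing is noisy on shared CI machines, so a real check would probably want a generous threshold or several repetitions.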
All in all, Python is suitable for big companies too. If you don't care about best practices, you will have problems even as a small startup, but in a big company it will make it impossible to move forward. The trick is to follow best practices and do code review regardless of company size.
Clearly, Instagram's solution saves them time. That means faster code reviews which incidentally makes them more accurate. Your post doesn't really make sense.
It's also important to... use pytest fixtures instead of arbitrarily patching around in tests.
> But if we moved the log_to_network call out into the outer log_calls function, [...] this would no longer compile as a strict module.
My current understanding is that the log_calls method would NOT get executed during module load time!?!
Why would having a side effect in this function violate the intention of __strict__ ?
That's incorrect. log_calls gets executed on import because it's a decorator, so equivalent to `hello_world = log_calls(hello_world)` at the top-level (which does also get executed).
log_to_network in the _wrapped() definition doesn't get executed until hello_world gets called; but outside of the definition of _wrapped does get executed.
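A small self-contained demonstration of that timing, using the names from the article (`log_calls`, `_wrapped`, `hello_world`, `log_to_network`). Here `log_to_network` is a stand-in that appends to a list rather than touching a network, which is an assumption made so the example can run anywhere.

```python
calls = []  # records side effects in order, standing in for network logging


def log_to_network(message):
    calls.append(message)


def log_calls(func):
    # This line runs when the decorator is applied, i.e. while the module is imported.
    log_to_network(f"decorating {func.__name__}")

    def _wrapped(*args, **kwargs):
        # This line runs only when the decorated function is actually called.
        log_to_network(f"calling {func.__name__}")
        return func(*args, **kwargs)

    return _wrapped


@log_calls
def hello_world():
    return "hello"


after_definition = list(calls)  # ['decorating hello_world']
hello_world()
after_call = list(calls)        # ['decorating hello_world', 'calling hello_world']
```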
Those optimizations won't mean much for CPython, since CPython doesn't try to run things fast, but for something like PyPy this could be a big deal.
The quote is probably wrong, but it is right in spirit.
That's bananas.
Nothing Instagram does requires that much code.
Also, that much Python code means you're doing it wrong.
Python is too expressive to require mega-LoC for that site.
You could implement an OS, relational DB, spreadsheet, and optimizing compiler all in less than that.
You are right in that it’s certainly a high LoC count for Python, but still...
(And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
There's nothing absurd about one of the most visited websites on earth to be a couple million LOC.
> So that's a third pain point for us. Mutable global state is not merely available in Python, it's underfoot everywhere you look: every module, every class, every list or dictionary or set attached to a module or class, every singleton object created at module level. It requires discipline and some Python expertise to avoid accidentally polluting global state at runtime of your program.
> One reasonable take might be that we’re stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
> But we’re past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there’s a lot more that we like about Python, and overall our developers enjoy working in Python. So it’s up to us to figure out how we can make Python work at this scale, and continue to work as we grow.
Those are literal quotes from the article. That is quite damning. How did they get to this point? By starting when Python was appropriate, and taking it day by day.
My guess (based on my experiences) is that companies wind up in this position from having inexperienced people building early versions of products instead of hiring experienced engineers (who are usually more expensive).
I would categorize it as a subset of dynamic typing, and that's what Wikipedia says too.
For me, it's not "status anxiety". It's simply not worth the effort.
The last couple static analysis tools I ran on my programs, I spent a while getting the tool to not-crash (because even though the authors obviously had a static analysis tool themselves, they either didn't bother to run it on their own code, or it wasn't good enough to find actual issues). These tools flagged only a couple issues, and almost all of them were places where it couldn't really cause any problems, but the type system was not strong enough for me to prove why it couldn't go bad. So I spent a while sorting through false-positives.
I'm not going to spend hours with a tool to find only a couple (real) bugs, which no user has ever reported seeing, and which I've gotten no automated crash reports about. I have much better uses for my time.
It does create an object that can (lazily) connect to the database, so it needs the required database drivers installed. It also needs the required information about _how_ to connect to the database, so it needs the settings loaded.
That's why you need to use `django.setup()` before, to tell it what settings to load. You should never be importing random Django models without this configured, simply because they cannot be used and will not work. We think an exception saying "don't do this, call django.setup()" at import time is less confusing than "Databases not configured" at runtime. Not that it would even reach that, because you might be using a field from a third party application that needs to be initialized (i.e. INSTALLED_APPS configured) or that relies on a configured setting (maybe an encrypted field that needs your SECRET_KEY available).
Stop making it hard, just write a management command. It's super easy.
Django _does_ have a "setup" function. You can't import and use Django database connections outside of a running application without it.
Flask also has a "run" method and does no i/o without it.
In practice, this means that any script that depends indirectly on Django code will incur a lengthy startup cost (from having to call setup()), and will fail to run if there's no database connection, even if the script itself doesn't need the db.
Yeah, now you'll have 10 interconnected services, 10x the complexity, and everything will have the ability to take down all or large parts of production, plus all the extra pain points of a distributed system...
You'll have added complexity with the network calls, which is why I said it wouldn't be any less work, just different work.
As you keep moving along, some things that depend on that first service will start calling the new service directly, and some will still call it in the monolith. But your tracking will tell you how often and who is doing that, so you can find out why.
In the meantime, nothing will break, because the monolith is still a pass through proxy to your service.
However, at their scale and with their engineering resources, I can only imagine an attitude of "we can make this work" (the monolith) is easier to justify. The same goes for the micro-services approach (except here you have to justify changing what has been working so far?)
I'd love to read more about the history behind this approach at Instagram.
This is precisely what gradually typed languages — like TypeScript, Flow, and typed Pythons — solve!
I talked about this on Software Engineering Radio last week: https://www.se-radio.net/2019/10/episode-384-boris-cherny-on....
You have the madness of thousands of developers flinging code at the universe due to the easiness of browsers, JS, and npm.
This results in great speed, but not great quality.
When your project/company now wants quality, you keep your code but transition to types. (In OSS space, Angular and Yarn projects have both done JS => TS migrations of some form.)
I’ve never worked on a program so small that readability didn’t matter. I consider it a crucial ingredient of expressiveness and development speed.
Though your perspective could explain a few of the more atrocious code bases I’ve seen.
There are too many people that have swallowed SOLID whole and can no longer see good engineering as a trade-off against other factors.
For example, being strict about having the smallest possible public API and making most methods private protects me from future breakage that might never be an issue (I might never upgrade) but forces me to copy/paste vast globs of your code into my own if I need access to something you didn't anticipate. (And that's assuming I have access to your source. Worst case is that I have to reimplement things that already exist in the code I'm interfacing with.)
Python got this right. Private methods are a weak or strong hint that you might want to think twice before calling them. But you're the boss at the end of the day.
I think this is why it's easy to point to a thousand things built in Python which people use every day (like Instagram), while in, say, Haskell, there are barely a handful (pandoc, the Facebook spam filter, etc.).
[0] https://martinfowler.com/bliki/DesignStaminaHypothesis.html
I've never heard of this before... I love it. Thanks for bringing this up.
I'm not familiar with this use of the term "expressiveness".
My understanding is that expressiveness (as per "On the expressive power of programming languages", Felleisen 1991 [0]) has to do with capabilities that a language has that separate it from another language. C is more expressive than Python in that it gives you direct access to memory management, whereas Python is more expressive than C in that it provides inheritance/OO. (These are just examples.)
Type safety, performance, and readability are all wholly separate from expressiveness, I think. A language's type system and performance benchmarks have nothing to do with the expressive power of a language outright, and "readability" is entirely subjective to begin with.
So: would you mind elaborating on what you mean, exactly, by "expressiveness of [a] language" here?
---
In fact, most of what you (and the linked article) are talking about has to do with the dynamic/static spectrum, not this "wizarding/engineering" spectrum you've coined (though I do kind of like the idea of that for discussing development methodologies).
The article is all about how the dynamically-typed nature of Python allowed for rapid iteration at the beginning of the Instagram project, but has since hindered further progress as they've grown larger. But now they feel they can't just rewrite it all in a statically-typed language because of the engineering overhead involved.
On this note, I want to go to your last point:
> I wish there was a language that let you move gradually from one end to the other, exactly when you need to.
With regard to the dynamic/static distinction, there are languages that allow you to move "gradually from one end to the other", and they are (aptly) called gradually-typed languages.
Gradual typing was introduced by Jeremy Siek and Walid Taha back in the mid-2000s [1]. In this discipline, you can have a statically-typed codebase with local dynamically-typed regions. You get all of the static guarantees for everywhere that they can be made, and dynamic regions impose runtime checks to ensure consistency. (This connects closely to contracts, which are primarily worked on by Robby Findler at Northwestern, I think.)
Unfortunately (to me), it seems like a lot of these languages are implemented in terms of existing dynamically-typed languages. For example, Sam Tobin-Hochstadt (Indiana) created Typed Racket, which is (of course) built upon Racket but provides a gradual typing discipline. Wherever possible, static types are checked, and everywhere else utilizes contracts to guarantee runtime consistency.
Anyway, all this is to say: the technology exists, technically, but is in its infancy. There's no doubt it'll be some time before it sees widespread use throughout industry. Sam wrote up a brief overview for the SIGPLAN Perspectives blog recently, if you're interested [2].
[0] https://www.sciencedirect.com/science/article/pii/0167642391...
[1] https://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typin...
[2] https://blog.sigplan.org/2019/07/12/gradual-typing-theory-pr...
I find that Python's OOP + functional aspects, combined with a good understanding of the language, hit a sweet spot here. One that simply can't be reached in c/cpp/go/java/haskell, and which is much easier to reach than in js/rust/other langs where I think it is possible.
The wizarding/engineering spectrum was coined by the article I've linked to[2]. I think the post is exactly about that, first Instagram was wizarding and they had a suitable language for wizarding, now they're engineering, but their language is still only good for wizarding.
As I've said in a sister comment, it's not just about static typing, but metaprogramming/macros/side effects everywhere etc. There's more to the expressiveness/powerfulness than just types. While gradual typing is certainly an improvement, I think we need more research in this direction.
In what parallel universe is Java not immensely popular or not used for greenfield projects?
Too many companies need devs but have engineers, or they need engineers but only have devs :/
(Except those people who claim software engineers aren’t real engineers)
This is a really interesting perspective to me. Coming from Python circles, I've heard too often how horrible JavaScript is as a language and how it's only used because the web has dictated it. Doing web development, I've used both, and generally am inclined to agree. I know TypeScript adds some niceties on top of it, but it still is stuck with JavaScript baggage. My perspective has always been that Python is by far the better language, which is why people have written that eco-system in it despite the fact it doesn't have a built-in monopoly of the browser.
How's that a problem?
Dynamic typing means that types can be changed arbitrarily at runtime, compared to statically typed languages which define all types at compile time.
Strong versus weak typing describes how often type coercions happen automatically: rarely or never under strong typing, routinely under weak typing. For instance, JS has some interesting behavior enabled by weak typing: `[] + [] -> ""`. Python, by contrast, rarely coerces things for you. The division operator in Python 2 was strongly typed, while they changed it to weak typing in Python 3 (in line with the practicality vs. purity convention).
In Python everything has a type, and you can't use a float as a list, for instance. It's correct to call it both strongly typed and dynamic, those are not antonyms.
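A quick illustration of both halves of that claim in Python 3, written as a sketch with made-up variable names. Rebinding a name to a value of another type is allowed (dynamic), mixing an int and a str in `+` raises instead of coercing (strong), and `/` on two ints returns a float while `//` keeps integer floor division.

```python
value = 1
kinds = [type(value).__name__]
value = "one"  # rebinding the same name to another type is fine: dynamic typing
kinds.append(type(value).__name__)

number = 1
text = "1"
try:
    number + text  # no implicit int/str coercion: strong typing
    coerced = True
except TypeError:
    coerced = False

true_division = 1 / 2    # Python 3: int / int gives a float
floor_division = 1 // 2  # floor division stays an int
```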
Merely "use the network to make calls between them instead of being local" will add 10 times the complexity -- you suddenly have a distributed system, latency, delays, parts that can be on or off, de-centralized configuration (which can also get out of sync), and so on.
meaning that you get to throw network and server errors into the mix of things that can go wrong, and you get the fun of tracing failures back 3 hops to a server that decides to take too long to run a process one day and times out a connection downstream.
it's horrible debugging stuff like this.
Why any enterprise would use C++ for standard "business" or "large scale" programming makes no sense to me.
Enterprises want stability, not speed to market. Most of their infrastructure changes slowly (as in features deployed once or twice a year maybe). They have stable support mechanisms for this, including long and complex processes of approval.
So while you might not write it as full stack C++, a couple of native libraries might be required as dependency, to access OS features, give some help to the AOT/JIT compilers, or in Java's case implementation of more machine friendly data structures.
If, however by i/o you mean the database portion of it then Django works without database configuration, too.
To be more specific: getattr, operator overloading, descriptors, heterogeneous dicts, decorators, etc.
Type checking and metaprogramming are fundamentally at odds [1]. Dynamic languages like Python have more of a focus on the latter. They later added type checking, but it comes at the "cost" of ruling out the more idiomatic metaprogramming and reflection features. In other words, static typing makes your source code bigger.
Well, optional typing to some degree lets you have the best of both worlds -- you can skip type checking of the hard parts. But optional typing doesn't let you compile your program to make it faster -- you need a fully-typed program for that.
----
I'm doing something similar to mypyc with https://www.oilshell.org/ (I actually visited Dropbox and chatted with them about it back in the spring.)
The difference is that I'm compiling Oil's Python source to C++ rather than to Python-C extension modules. So it doesn't depend on the Python runtime. It's not done but it's working well so far, and it's given me a lot of appreciation for which dynamic features Python programs actually use! (both my own and others)
Also note that mypyc was used to speed up mypy, which is a type checker. A type checker is a very particular kind of program that's different than 99% of the use cases of Python. So success on speeding it up is super impressive but it's not clear it generalizes.
The same is true for Oil -- my translation work doesn't generalize to arbitrary Python programs. Lots of people have died on that hill because it's really hard. You have a hard tradeoff between the kinds of Python programs you can support and the speedup you can give them. There are 10-20 projects over the last 2 decades at various points along that spectrum. In addition to mypyc, Oil's strategy was also inspired by Shed Skin, which is an impressive but mostly dormant Python-to-C++ compiler.
----
So in short I would say the problem is that nobody will be able to agree on a subset. You will have a lot of different fragments of Python geared toward particular use cases.
But Python will very often be more appealing than any of those fragments because it has a bigger ecosystem. One thing that I've appreciated more and more while designing a language is how much the network effects and inertia matter. It's why we're still using C and C++ after almost 50 years. I'm sure every day there is still a lot more C++ written than Go, Rust, Swift, and D combined, etc.
Python has a similar network effect and it will be around basically forever in its current form. Software doesn't really get rewritten or reduced -- more stuff just gets added on top.
[1] I wrote some posts about that tradeoff here: http://www.oilshell.org/blog/tags.html?tag=metaprogramming#m...
However, golang doesn't have that. So you get the danger of a dynamic language without the features that make it powerful.
Very dynamic code shouldn’t be easy; the happy path should encourage clear, simple code. By encouraging people to stay on the happy path, their code is more performant, maintainable, etc and it keeps the average code quality quite high across the ecosystem.
C has void *, writing generic code using it is hell. Enough so that people went through a lot of trouble creating C++ and later Rust to escape it.
I'd say the type casting from interface{} to whatever you assume is in there qualifies as different.
Pretty much every single aspect of these languages is different from what I can see, the only thing they have in common is included batteries, the rest is growing popularity and consequences thereof.
Try it a bit. It truly is enjoyable. Fifteen years of Python and I'm still enjoying TypeScript more.
Mypy is limited by annotations having to be compatible python syntax.
Do you mean like https://www.python.org/dev/peps/pep-0604/#id17 ?
> Inspired by Scala language [5], this proposal adds operator __or__() in the root type. With this new operator, it is possible to write int | str in place of Union[int,str]
This at least is being solved in Python: https://www.python.org/dev/peps/pep-0563/
> inline interfaces
Typed dicts in 3.8 look pretty similar at cursory glance https://www.python.org/dev/peps/pep-0589/
I'll definitely play around with it more!
// typescript
let t: {a: string[], b: number} | [string, string]
# python 3.8
from typing import List, Tuple, TypedDict, Union
class SomeTypedDict(TypedDict):
    a: List[str]
    b: Union[float, int]
t: Union[SomeTypedDict, Tuple[str, str]]
I had to google a bunch to figure out how to write the Python version, whereas the typescript one was completely natural to write. It takes one line and requires no imports. The interface is inlined. All of this also makes it more readable when you come across it eg. in an IDE tooltip.
I agree, but if you for instance look at the TypeScript comparison sub-thread, you'll see that all the issues with both the syntax and implementation of the type-system are being aggressively resolved, and likely will all be so by 3.9.
> Good way to constrain the dynamism so performance could be improved
Couldn't agree more!
> environment
I find poetry a joy to use. If you want to bypass venvs altogether, there's a lot of work to make that a reality, such as https://github.com/David-OConnor/pyflow.
> packaging
Python in 3.5 added complete zip app support, which has improved this dramatically from my perspective. Extended by things like shiv https://github.com/linkedin/shiv make it fairly complete.
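For anyone who has not tried it, this is roughly what the stdlib `zipapp` route looks like end to end: build a `.pyz` archive from a directory containing a `__main__.py`, then run it with the interpreter. The directory layout, file names, and the `build_and_run` helper are made up for the sketch, and tools like shiv add dependency bundling on top of this, which is not shown.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run() -> str:
    """Package a one-file app as a .pyz archive and return what it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "app"
        source.mkdir()
        # zipapp executes __main__.py from the archive when it is run.
        (source / "__main__.py").write_text(
            'print("hello from a zipapp")\n', encoding="utf-8"
        )
        target = Path(tmp) / "app.pyz"
        zipapp.create_archive(source, target)
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


output = build_and_run()
```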
> async/await
This is interesting to me. I prefer async/await in general, because it has become a standard across programming languages and I find it really easy to reason about. I also find channels to be too widely seen as a cure-all, when the only study so far has shown they actually led to an increased bug count. But I don't discount the value of real-parallelism, and am glad to see that Python has been pushing harder on that lately, with things like subinterpreters that allow bypassing the GIL on a single thread.
I'm happy to hear that; hopefully the efforts really do address these issues well.
> I find poetry a joy to use. If you want to bypass venvs altogether, there's a lot of work to make that a reality, such as https://github.com/David-OConnor/pyflow.
I'll have to check those out, but one inherent problem is that even if these tools really do solve my pain points, adopting them means I'm leaving my org on a relatively small island, isolated from the Python community. If these really are the holy grail, why isn't the broader Python community adopting them? Please don't take this as me looking for something wrong--whatever Python build tool I use, I'll eventually need support and there's a lot to be said for having a thriving community that has almost always run into my exact problem before.
> Python 3.5 added complete zipapp support, which has improved this dramatically from my perspective. Extensions like shiv https://github.com/linkedin/shiv make it fairly complete.
We're currently using this via pex. It mostly works, but we still run into problems occasionally (system dependencies, for example). Figuring out how to integrate these tools into the broader build process is another problem to solve--we're using `pants` which supports pex out of the box, but we're running into lots of bugs or other problems. I'll keep an eye on shiv.
> This is interesting to me. I prefer async/await in general, because it has become a standard across programming languages and I find it really easy to reason about. I also find channels to be too widely seen as a cure-all, when the only study so far has shown they actually led to an increased bug count. But I don't discount the value of real parallelism, and am glad to see that Python has been pushing harder on that lately, with things like subinterpreters that allow bypassing the GIL on a single thread.
My biggest issues with async/await are
(1) every package needs an async variant (async boto, async docker, etc etc). We work around this by running them in a thread pool executor, and I think that works, but I don't know if I'm holding the GIL unnecessarily and causing performance issues (fundamentally difficult to diagnose). This is roughly the "what color is my function" problem.
(2) it's really easy to starve the event loop by calling into something that transitively makes a sync call or otherwise just does a lot of CPU-heavy work. We've run into both kinds in production and they've been really hard to troubleshoot (because the requests that time out often aren't the ones that are actually causing the problems).
(3) dynamic typing means it's super easy to forget to await things. Tests should catch this, but we find ourselves writing tests _just_ to catch this (e.g., we now write tests for entrypoints that _just_ `await lib_function(params)`; we would normally not write tests for such simple functions, but now we have to). Static typing is the right way to solve this and mypy does, but mypy has too many other issues (at the moment) for our org.
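To make points (1) and (3) concrete, here is a small sketch. `blocking_fetch` is a hypothetical stand-in for a sync-only client call (not any real library); the first wrapper pushes it onto the default thread pool, and the "buggy" entrypoint shows what a forgotten `await` hands back to the caller.

```python
import asyncio
import inspect
import time


def blocking_fetch(delay: float) -> str:
    """Hypothetical sync-only call, standing in for a client without an async API."""
    time.sleep(delay)
    return "done"


async def fetch_in_thread(delay: float) -> str:
    # Run the blocking call in the default thread pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch, delay)


async def entrypoint() -> str:
    return await fetch_in_thread(0.01)


async def buggy_entrypoint():
    # Bug: missing `await`, so the caller gets a coroutine object instead of a str.
    return fetch_in_thread(0.01)


async def main():
    good = await entrypoint()
    bad = await buggy_entrypoint()
    is_coroutine = inspect.iscoroutine(bad)
    bad.close()  # silence the "coroutine was never awaited" warning
    return good, is_coroutine


result = asyncio.run(main())
```

Nothing fails at the point of the mistake; the error only surfaces later when something tries to use the coroutine as a string, which is why a type checker or a dedicated test is needed to catch it.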
One substantial criticism of goroutines is that they're less safe than async/await because you need to make sure the code you're running is threadsafe. I appreciate this criticism, but I think it's the right tradeoff for Go's performance aspirations (another great high-performance alternative is Rust's borrow checker, but that's the wrong tradeoff for Go's developer productivity aspirations).
I do too: It's Python and it's several million lines.
Metaphor: you've got three pallets of goods and have hired three trucks to move them. I don't have to know how you wrapped the pallets to know that you brought two too many trucks.
I don't have to know the details of what's included in "Instagram Server" et al. to make this call (obviously) based on my experience and first-hand knowledge of similar codebases. Frankly, I am kind of disappointed in the pushback I'm getting on this. The only reason to have a multi-million line Python project is for the entertainment of devs, or, worse yet, job security.
Let me put it this way, if the CTO of Instagram showed up here I would be willing to bet US$100,000 that I could reduce the Instagram code by 90% in six months. (Do you think the devs there would appreciate that? Even the one that got laid off as a result?)
If I sound cynical it's only because I've seen this sort of thing for myself. I'm not trying to say that the Instagram devs are dumb or nefarious, this kind of code happens organically and often despite our best efforts. But that code needs a diet. I'm sure of that.
- - - -
edit: In re:
> (And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
So, call it 50M LoC, what's your ratio for Python/C? Meaning, how many lines of C code are replaced, on average, by one line of Python?
And how feature-complete are we talking? POSIX? GCC targets a lot of languages and platforms, eh?
If you were going for an integrated system, like Oberon OS or a Smalltalk IDE, I think my claim is still plausible, eh?
:)
Classic HN comment!
I'm just putting some numbers on my confidence level.
> All code could be made leaner and smaller. What's your point?
That Instagram's code could be made much leaner and smaller. I feel like I've been very clear about that.
> Do I have to rewrite my programs in RPython?
> No, and you shouldn’t try. First and foremost, RPython is a language designed for writing interpreters. It is a restricted subset of Python. If your program is not an interpreter but tries to do “real things”, like use any part of the standard Python library or any 3rd-party library, then it is not RPython to start with. You should only look at RPython if you try to write your own interpreter.
Cheers.
It is a fool's errand to think Kotlin/Native would ever overtake Java.
Only on Android it might have a future, if Fuchsia never becomes a thing.
Google does all kinds of new Java work (Golang, contrary to myth, is just one of the languages Google uses for internal stuff, and niche at that), Amazon of course, most of Apple's backend services are Java, Twitter, AirBnB, Uber, LinkedIn, TripAdvisor, and tons of others use Java, and write greenfield stuff in it all the time...
Java projects get done all the time.
Additionally, even though it isn't standard Java, I have plenty of new apps running in my pocket.
> A caller can change sync function to async, breaking the functionality downstream.
I think you mean callee?
Are you talking about using a returned result? Because most languages permit ignoring return values.
If you want to check out promise use, check out https://tsetse.info/must-use-promises
Consider
``` const myIntValue = f(); ```
This code will silently break when f changes from sync to async.
I'm not a fan of the language, but Java has a huge number of developers, a rich mature ecosystem of software and is quite productive (enterprise patterns aside). It's a good sweet spot for most companies.
C++ is a hydra of complexity. Sure, it has its place, but it's not nearly as productive as Java for your typical web application.
Go is almost the opposite, so simple it lacks features like generics. The last time I used Go it had fundamental usability issues around dependency management (although I think recent versions have improved on vendoring a little).
Modern C++ is as productive as Java (probably even more productive). The main issue with C++ is recruitment: C++ engineers are rare because C++ is barely taught.
Just like MISRA tries to constrain C programmers from doing dumb things in the embedded world, "modern" C++ tries to do the same in the business world. But there isn't an easy way to enforce it, especially when you're outsourcing to some code sweatshop.
The problem with modern C++ is getting many devs to actually make use of it.
Many devs don't care to follow up on modern language X best practices, and would rather just type away at something that works.
Without specs, it will assume every type is 'any()', unless it has information to infer something more stringent. For instance, if it sees you add 5 to it somewhere, it will instead assume it is a number. Etc. Even if in practice it actually is a list of some kind (and so that addition of 5 will fail). Which, yes, ain't great. Hence why I said it was a gradual transition; it will catch provable errors (i.e., if you call append on that same variable as above, it will note that there is no type that allows both append, and + an integer, and error), but leave plenty of things uncaught that could have been caught had it known the type in question (via a type spec).
I have heard good stuff about "Gradualizer" though. It uses a gradual type system instead of Dialyzer's success typing.
And yes, knowing nothing else about their code base than A) It's in Python, and B) it's several million lines of code, I feel very confident that there is at least an order of magnitude too much of it. Instagram is just not doing anything that complicated.
(I should mention I specialize in maintaining and refactoring legacy Python code. I know what I'm talking about here.)
If you need several millions of lines of Python to do what Instagram server does, the code is bloated.
My bet is that they let too many Java devs loose on the code base, without experienced Python devs reviewing the commits and managing the deluge of unnecessary classes. I've seen it happen before.
If you think it’s just a bunch of static photos in a feed, you’re wrong and it’d be clear you know nothing about the service.
Name three?
Seriously, you're right, I've never used it. Wow me.
Only because they are so new. portray was built a few weeks ago and already has a thriving community building around it - but of course it's still a small drop of the whole ecosystem. Older tools I've built like isort, are now ingrained into the community: https://github.com/timothycrosley/isort, but that took years, even without major issues or complaints being present. It just takes people time to adopt new things.
    const myIntValue = f()
    // Error TS2365: Operator '+' cannot be applied to types 'Promise<number>' and '2'
    const myResult = myIntValue + 2

    async function f() {
      return 42
    }

Every mature enough language has a subset that you need to avoid, including Java. It is precisely because of this kind of thing that every company needs coding guidelines and proper static analysis tools.
> especially when you're outsourcing to some code sweatshop.
If you outsource your dev to cheap, other-side-of-the-world, low-quality engineers, then you deserve your problem, in any language.
I worked in the past for a company (embedded programming) that had an entire team of expensive engineers in Luxembourg just to fix the stupidities of another team of outsourced engineers in India.
They don't have the time or space to engage in a massive technical debt reduction program, they're too busy destroying Snapchat and other competitors, reacting to TikTok, implementing an entirely new IGTV video service that provides their customers (i.e. advertisers and marketers) the equivalent of youtube within the Instagram universe, etc.
I'm sure that every large internet service's codebase out there could be made much leaner and smaller. The question is whether that is worth their while.
You're totally right.
1) Super easy. That way doing it right is trivial. 2) Impossibly difficult so the only people who are doing it can be trusted to do it right.
To me go falls between those two. It’s real easy to say interface{} (indeed it is more difficult to make a non empty interface) but doing it in a way that is safe isn’t easy.
I don’t think expressive power is the point here, as they are both Turing-complete languages. It is more an issue of what trade-offs and compromises have been made.
These problems simply don't crop up in Go, or at least they're in a different ballpark in terms of frequency and severity. So yeah, Go lacks typesafe generics, but I'll make that tradeoff all day every day in exchange for the maintainability, performance, tooling, distribution, etc improvements that Go offers today. No contest.
Which is positive mind you, but they would be better served by adopting PyPy.
Code reviewers whose eyes glaze over at copy-pasted boilerplate blocks can more easily lose track of the whole, and miss an error which is obvious when the whole is expressed in 10 lines.
There is some optimal range of expressive density for comfortable use by humans. APL or K are likely above that level, and Go feels below it; not as low as COBOL, but still.
I have this feeling that you're probably not all that aware of 95% of what their code actually does, and thus probably not in a position to make judgements as to whether their code base is truly bloated relative to what it does.
    @dataclasses.dataclass  # or @attr.s
    class MyStruct:
        a: List[str]
        b: int|float

    t: MyStruct|Tuple[str,str]

But MyStruct will be an actual object that can be manipulated as an object. And if you want to accept any object that fits that interface, instead of just instances of MyStruct,

    class MyStructTmpl(Protocol):
        a: List[str]
        b: int|float

Then

    def f(thing: MyStructTmpl|Tuple[str,str]) -> bool:
        return True

    f(MyStruct(a=['a'], b=2))

would typecheck.

In JS having the typed-dict type makes sense because you're often working with arbitrary objects with who knows what attributes, but in python that isn't the case. There are fairly succinct and powerful tools (now, anyway) to define record types.
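For anyone who wants to try the dataclass/Protocol idea above on a current interpreter, here is a runnable variant. It is a sketch under two assumptions that are not in the comment: it spells the unions with `typing.Union` so it does not rely on the proposed `int|float` syntax, and it adds `@runtime_checkable` plus an `isinstance` check purely so the structural match can be observed at runtime (a static checker such as mypy would not need either).

```python
import dataclasses
from typing import List, Protocol, Tuple, Union, runtime_checkable


@dataclasses.dataclass
class MyStruct:
    a: List[str]
    b: Union[int, float]


@runtime_checkable
class MyStructTmpl(Protocol):
    # Any object exposing attributes `a` and `b` satisfies this protocol.
    a: List[str]
    b: Union[int, float]


def f(thing: Union[MyStructTmpl, Tuple[str, str]]) -> bool:
    # Runtime check added for illustration only: it tests attribute presence,
    # not the attribute types.
    return isinstance(thing, (MyStructTmpl, tuple))


accepts_struct = f(MyStruct(a=["a"], b=2))
accepts_tuple = f(("x", "y"))
rejects_int = f(42)  # type: ignore[arg-type]
```

The runtime check is deliberately shallow; the real value of the Protocol is that a static checker verifies the attribute types at the call site.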
I'm also not onboard the "it's a code smell, it doesn't matter" train. IMO if python adopted the typescript typing syntax we'd all be better for it.
I also forgot to mention the atrocious typing syntax for functions. Once again, typescript is a lot more succinct and readable.
I more or less agree with this, but then again, IMO you should be isolating the less controlled code behind a controlled api. And the marginal value of converting `_ConvertQueryResultDictToQueryResult(qr: Dict[str, Any]) -> QueryResult` to something that uses a typeddict instead (which may not be possible, since that function is probably generic) is low.
> I'm also not onboard the "it's a code smell, it doesn't matter" train.
Emphatically, this isn't what I'm saying. What I will say is that ergonomics encourage certain methods of development. From experience, I'm strongly against the pattern of using a dict as a weak struct. The best comparison I can give is tuple -> namedtuple -> attrs. Namedtuple has absolutely valid uses (when you need tuple semantics, usually for backwards compatibility). But people often use it for any record type, because it's easy and familiar. Dataclasses are usually better, and I'd be happier (and the average python code would be better) if the friction to add a dataclass was lower than the friction to add a namedtuple.
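A short sketch of what "tuple semantics" means in practice for the namedtuple vs. dataclass comparison above. The `PointNT` and `PointDC` names are made up for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass(frozen=True)
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

# Tuple semantics leak through: a namedtuple compares equal to a plain tuple,
# has a length, and unpacks positionally.
nt_equals_tuple = nt == (1, 2)
nt_length = len(nt)
x, y = nt

# A dataclass instance only compares equal to instances of its own class.
dc_equals_tuple = dc == (1, 2)
dc_equals_same = dc == PointDC(1, 2)
```

If callers never need indexing, unpacking, or tuple equality, those behaviors are accidental API surface, which is the argument for defaulting to a dataclass.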
Similarly, if the friction to use a dict in place of an object is much lower, people will be encouraged to use dicts in place of objects. This isn't a good thing. That doesn't mean that we absolutely shouldn't try to improve ergonomics across the board, but I'm a strong believer that the language should make doing the right thing easier than doing the wrong thing, and this is often (but not always!) the wrong thing.
I would say that dynamic metaprogramming is at odds with type checking (and optimizations, and in general understanding the behaviour of a program statically). But of course metaprogramming can be done perfectly fine in a statically typed language.
My blog posts give some more color on that, but also see:
https://discuss.ocaml.org/t/the-future-of-ppx/3766 (breakage)
https://words.steveklabnik.com/an-overview-of-macros-in-rust (breakage)
Also, OCaml has had at least 4 different metaprogramming systems -- ocamlmeta, camlp4, ppx, etc. Rust is getting its second one post-1.0, AFAIU.
There are lots of open problems related to metaprogramming and type system design because they interact heavily.
This blog post also hints at that: https://nim-lang.org/araq/v1.html
In contrast, in Lisp, metaprogramming is "just programming".
Also compare with C++ metaprogramming (whose syntax is certainly awful, although it has been continually improving), but works perfectly fine with its type system (in fact most metaprogramming in C++ is done via the type system).
On the other hand CPython also exposes the runtime internals to plugins and de facto that prevents the language from evolving and alternative implementations to gain a foothold, so the issue of exposing implementation details preventing language evolution is not restricted to static languages.
edit: the hard part of typing and metaprogramming is making sure your metaprograms are well typed, i.e. the generated program is guaranteed to typecheck. This is great, but it is not a strict requirement: if you are happy with syntax macros a la Rust or unconstrained templates a la C++, there is no particular issue. Your generated program will still be typechecked at compile time, which is still better than having a runtime error because of a bad metaprogram.
What bizarro bubble do you live in?
Yes, and each new version actually addresses that, one step at a time (see https://bugs.python.org/issue35813 for example).
Not to mention the language landscape is almost static at the top. Nobody's gonna come and take Python, Java, C, C++, JS, out in the next 10-15 years...
Only a huge self-blunder, like the Perl 5 -> 6 transition, and only at a much more volatile time (when paradigms change, e.g. when web dev moved on from CGI; Perl 5 had already lost the web framework scene to PHP, Rails, Django and the like even before losing its main niche back then, admin work) can do any serious damage to a top language...
Anyway, let's check in 5 years...
I agree with this!
> I don't think there is any sort of long-term future in anything "Python".
I disagree with this :)
I think Python has efficient IO concurrency built-in already with async, and I feel it is likely that it finds a way to work out CPU-bound concurrency long-term, as projects like subinterpreters with channel communication demonstrate.
Both seem rather hard to implement.
I predict that for CPU-intensive tasks, you'll keep using extensions in native code (like numpy or pytorch), or keep passing serialized objects through queues in multiprocessing setups.
As far as the JVM is concerned, Kotlin hype will be over in a couple of years, and it will get as much use as Scala, Clojure, Beanshell, Groovy enjoy nowadays.
Guest languages never get to own a platform, and with time all platform languages end up getting enough features that the large majority of developers never bother with extra tooling, debugging layer and idiomatic wrapper libraries of the guest languages.
And we know that because?
>Guest languages never get to own a platform
That depends on the platform, who is running it, and how. You couldn't have a worse steering than Oracle.
And most "guest languages" are smaller affairs, they don't have companies the size of Google choosing them for Android app development (a huge niche in itself). Or have first class support from the most popular IDE of the host environment.
Plus, anything is anecdotal, as we have so few cases of major parent/host language rivalries, and even fewer cases with similar dynamics, that there's no real prediction.
Scala was too complex for most Java-ers, too slow to compile, didn't have a Google pushing it to its platform devs but an insignificant company, etc. Clojure was a Lisp (= doomed), Beanshell and Groovy were from small, insignificant origins, and not pushed by anyone really mainstream the size of Google/FB/etc.
Kotlin doesn't have any of those issues.
Heck, even Elixir does quite well I hear.
UNIX and C, Web and JavaScript, Windows and .NET/C++, macOS and Objective-C/Swift, Android and J̶a̶v̶a̶/Kotlin/C++....
Google only cares to push Kotlin on Android, and it only matters because Google visibly doesn't want to move Java beyond the Java 8 subset that Android currently supports, so the choice is between a handicapped Java support or Kotlin.
Until there is a JVM written in Kotlin, and Kotlin gets first class support in all Java IDEs instead of being a tool to sell IntelliJ licenses, it is just yet another language that happens to target the JVM.
This is ignoring that Kotlin already has a couple of impedance mismatches with the JVM: sequences vs streams, lambdas vs SAM, co-routines vs fibers, inline classes vs data classes.
Elixir is doing well because many developers seem wary of learning Prolog/Erlang syntax.
They do up to the point where they differ philosophically. Java is never going to turn into a Clojure, nor is it going to adopt the type of dynamic scripting features Groovy offers.
Kotlin and Scala are more at risk in that way.
b) a way to see how other users interact with their posts, allowing comments, seeing views and other analytics, monetizing etc etc
c) Provides advertisers with the ability to place stories (stories are a stream of short-lived (24 hours) video/audio posts that users see) or posts (that can be static/video/audio), with links to external sites, purchasing direct links ("Shop now"/"Buy this") etc
Instagram is much more than a stream of user images.
That doesn't include all the "back office" stuff like spam/reporting/censorship/language translation etc etc.
It's exactly analogous to types being integrated in OCaml, Rust, etc. whereas in Python, JS, Ruby, and PHP static typing is an "add-on".
Scala also has a newer system https://scalameta.org/
C++ has dramatic new proposals 20+ years after templates, addressing really basic use cases:
https://herbsutter.com/2017/07/26/metaclasses-thoughts-on-ge...
The point is that no language has gotten it right so they keep introducing new systems and breaking old ones.
C++ templates also have the mistake where type checking is done after template expansion. That's why you get terrible error messages.
If you're not convinced, that's OK, but try watching the talk by Yaron Minsky here:
Maybe both should be in the global namespace...
All my projects now use poetry for the full build tooling and I love it. No setup.py is needed: just include any settings in the standard pyproject.toml file (example: https://github.com/timothycrosley/portray/blob/master/pyproj...), which can be generated with poetry's help using poetry init.
> text-editor support
I feel like Python with type hints (for all their current flaws) does give you this exactly.
> dependency management
Again, I think poetry solves the problems here very nicely.
> documentation generator
Personally, I like portray better than anything in the Golang world for this https://timothycrosley.github.io/portray/ I may be biased since I wrote it.
I write a lot of Python tools so I'm genuinely curious because if there were unfilled needs I would want to address them as one of my 52 projects: https://timothycrosley.com/
We have yet to try poetry in our org. I'm hesitant to stray off the well-trodden path, but it might be worth a shot. Any idea about installing packages with system dependencies? Packages like `pygraphviz` (which depends on the `graphviz` or `graphviz-devel` system library) have always given us a lot of trouble, for example.
> I feel like Python with type hints (for all their current flaws) does give you this exactly.
I've noticed that some editors try to use these hints, but they seem to have a hard time in many cases loading the modules. It's possible that the editor extensions (e.g., VS Code) are just buggy, but it's still a problem. Further, they require that all of your dependencies have annotations or type stubs.
> Personally, I like portray better than anything in the Golang world for this https://timothycrosley.github.io/portray/ I may be biased since I wrote it.
I haven't tried portray.
The killer thing about Go's documentation generation is that it uses type annotations and exposes them in the generated documentation. This is critical because 95% of the reason I'm looking at documentation (especially in Python) is because I need to know the type signatures (and often Python docs omit types, or the types are wrong or vague--e.g., "the type is 'binary'" with no indication if that means a bytestring or a BytesIO or what). This is table stakes for documentation systems in statically typed languages, but I have yet to find a Python tool that does this well. Further, `godoc.org` also generates links to types including across packages--this is _not_ table stakes for statically typed languages--so you just have to click the type name and it will take you to the docs for other packages. Further, there is no CI needed to build/publish your documentation; `godoc.org` just needs access to your repo on github or elsewhere (you can run your own godoc.org inside your corporate firewall). Another nice-to-have feature is that documentation is just comments; there's no formal/obscure syntax a la sphinx.
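For what it's worth, the raw material for this exists in Python when code is annotated. A minimal sketch, assuming a hypothetical `upload` function and a made-up `describe` helper (neither comes from any tool named in this thread), shows how the standard `inspect` and `typing` modules expose signatures that a documentation generator could render:

```python
import inspect
from typing import get_type_hints


def upload(name: str, payload: bytes, retries: int = 3) -> bool:
    """Hypothetical function used only to illustrate signature extraction."""
    return True


def describe(func) -> str:
    """Render the annotated signature plus the first docstring line."""
    signature = inspect.signature(func)
    doc_lines = (inspect.getdoc(func) or "").splitlines()
    summary = doc_lines[0] if doc_lines else ""
    return f"{func.__name__}{signature}\n    {summary}"


# Mapping of parameter names (and "return") to the annotated types.
hints = get_type_hints(upload)
rendered = describe(upload)
```

This only helps when the annotations are present and accurate, which is exactly the gap described above for unannotated dependencies.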
> I write a lot of Python tools so I'm genuinely curious because if there were unfilled needs I would want to address them as one of my 52 projects: https://timothycrosley.com/
Cool. I'll take a look!
Do you believe that to be incorrect? I think you're probably underestimating how widely used those languages are in massive, "boring" companies around the world. Rust may be the new cool kid, and may even be the future, but the number of companies around the world that have adopted Rust for anything significant today is minuscule.
It has Groovy above Ruby.
Again, Java is everywhere, and many Java shops have added Groovy to their workflow where it makes sense. Ruby is barely used outside a small number of tech companies.
Not in momentum, but there are tons of installations, and billions of lines of code in COBOL ever churning. If LOC was the main criterion (and not just a factor), and if COBOL projects were hosted on GitHub, most languages would be dwarfed by it.
And Groovy is semi-popular in the Java world, which is huge itself.
But as I said, TIOBE is very good in the top-10 languages, and for spotting new major contenders (by how they jump up spots).
It's not great for relative ranking of the longer tail of languages beyond the top-10 / top-20...
Boilerplate distracts from what is actually going on. I can generally identify code smells from the shape of python code (like, blur all the text so I can't read the words, and the shape of the blocks tells me everything I need), I can't do the same in go, because there's so much more indentation and visual stuff happening, and most of it (boilerplate error handling) isn't interesting.
Have a look at Haskell which goes to great lengths to eliminate boilerplate and I think you'll experience the opposite--Haskell becomes very difficult to read precisely because the code is so dense. Similarly, take the indentation, newlines, etc out of a JavaScript file or JSON blob (minify it, more or less) and see if it's more or less readable as a result. I think you'll find that visual structure is actually important.
> I think you'll find that visual structure is actually important.
I didn't say otherwise. What I did say is that go adds visual noise that isn't present in python. (And it is noise: the proposal to add try! shows that the error handling style is noisy. It can be basically entirely removed by an automated transformation.) Actual pattern matching like rust has, or even what Google C++ has with StatusOr and [1] our nonsense RETURN_IF_ERROR macros are better than what go does, and just as explicit (actually often more so, since it's more difficult to forget an error condition).
[1]: https://github.com/protocolbuffers/protobuf/blob/d0f91c863ae...
Naturally if the community finally gathers around PyPy, that might change.
In all 10 of the places it was adopted?
Compared to Python with the thousands of deployments "at banks and life science research labs" it's a no show....
Yeah, preference distributions are hard to assess. Either of us could be wrong.
> I didn't say otherwise. What I did say is that go adds visual noise that isn't present in python. (and it is noise: the proposal to add try! shows that the error handling style is noisy. It can be basically entirely removed by an automated transformation).
I’m glad we agree that terseness is not readability and visual structure is valuable. How do we meaningfully debate whether some boilerplate is noise or useful visual structure? Why is Python’s implicit propagation of errors elegant and beautiful visual structure while Go’s explicit error handling is ugly noise? Specifically how do we know that you aren’t prejudiced by your disproportionate experience with Python (even assuming my disproportionate experience with Python and preference for Go is an outlier)? What are the criteria?
(though, "competes" assumes Julia is getting anywhere close to Python's adoption in data science, but still)...