Django: Reformatted code with Black(github.com) |
Django: Reformatted code with Black(github.com) |
Some people make the case that it's easier to write single quotes (well, depending on the keyboard format anyway). For keyboards in the US standard you have to hold the Shift key to write a double quote. But the good thing about Black is that you can still write your code using single quote and when you run the command line utility it will fix/normalize the code to use double quotes.
Nowadays I got so used to it that I even write my Python code using double quotes. And looking at Python code using single quotes looks weird/unpleasant for me.
syslog('debug',"Just opened %s for output",filename)
While there's no semantic difference between single and double quote, in my code base, there is. And if black becomes very popular, why even support single quotes anymore?Technically single or double quotes have the exact same meaning in Python. What makes people use single quotes is probably other languages like PHP, Perl and Bash.
I know I've made it a habit to default to single quotes unless I know I need double quotes. So that might be where the habit comes from in the Django project. But it's not actually necessary in python so might as well use the most commonly used type of quote.
This was finally merged to the main branch today [2].
I suspect there are lots of other both open source and private projects that are also making the change now. This is a show of confidence in Black as the standard code formatter for Python.
0: https://github.com/django/deps/blob/main/accepted/0008-black...
SSort is currently used for several hundred kilobytes of python so I'm wary, but if I'm going to make a breaking change before 1.0 then I think this is likely to be it.
I tried it on one of my Django `admin.py` files and it created NameErrors.
class TestAdmin(admin.ModelAdmin):
list_filter = ("foo_method",)
def foo_method(self, obj):
return "something"
foo_method.short_description = "Foo method"
# It turned it into this:
class TestAdmin(admin.ModelAdmin):
list_filter = ("foo_method",)
# NameError
foo_method.short_description = "Foo method"
def foo_method(self, obj):
return "something"Which is preferable, and why?
def myfunc(): global globalvar str(globalvar)
globalvar='abc'
myfunc()
will be transfered to
globalvar='abc'
def myfunc(): global globalvar str(globalvar)
myfunc()
I understand why is it done but i dont want to have function definition block filled with this declaration of variables (which i do later) since it has no impact to my code and it makes is just a bit "cleaner". Dont tell me to not use global variables :D
What I don't much care for is reorder-python-imports, which I think is related to black (but don't quote me). For the sake of reducing merge conflicts it turns the innocuous
from typing import overload, List, Dict, Tuple, Option, Any
into
from typing import overload
from typing import List
from typing import Tuple
from typing import Option
from typing import Any
Ugh. Gross. Maybe I'm just lucky but I've never had a merge conflict due to an import line so the cure seems worse than the disease.
Edit: Just to be 100% clear: this is python-reorder-imports, not black. I thought they were related projects, though maybe I'm wrong. Regardless, black on its own won't reorder imports.
As a couple of examples, PHP has had a unified formatting standard since 2013 and Elixir has a formatter built into the language. Both languages need the formatter to be enabled by your IDE/CI and that's also the case for Black.
Python was always meant to look concise / beautiful... (MyPy has also made this trickier too)
- doesn't respect vertical space - sure, making the code fit on screen might be valuable (though the default width should be at least 120 characters, I mean we're in 2022 after all), but Black does it by blowing up the vertical space used by the code
- spurious changes in commits - if you happen to indent a block, Black will cause lines to break
- Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
The built in "facebook style" formating felt by far the most natural to me with the out of the box settings and no extra config.
Black gives generally nicer output, and also more predictable output because its folding algorithm is simpler. YAPF uses a global optimisation which makes it make very strange decisions sometimes. Black does too, but much less often.
There are also non-style problems with YAPF. It occasionally fails to produce stable output, i.e. yapf(yapf(x)) != yapf(x). In some cases it never stabilises - flip flopping between alternatives forever!
Finally it seems to have very bad worst case performance. On some long files it takes so long that we have to exclude them from formatting. Black has no issue.
In conclusion, don't use YAPF! Black is better in almost every way!
I figured yapf was not "new" which is why black won.
Starting about 5-6 years ago there was a push in the Python community to replace solved problems with new ones in what appears to me as chasing the JavaScript community.
Instead of consolidating on existing tools that worked well but had some rough edges to smooth out, numerous projects came about to reinvent the wheel.
...which is why I wish Black allowed more configuration. A team can often agree on a set of styles. Every team on the Python planet agreeing... now that's much harder
print(repr('some string'))
to print "some string"
instead of 'some string'
then that would remove the only hangup about Black that I have.For clarity, I'm hoping to open us discussion about how we're dealing with massive changesets like this that are difficult to review due chiefly to the breadth of it.
Update: Found it:
> How stable is Black’s style?
> Starting in 2022, the formatting output will be stable for the releases made in the same year
- The changeset is authored by a trusted committer but the committer's tools have been locally compromised.
- The public tool itself (e.g. black) has been compromised to automatically create vulnerabilities in difficult-to-review bits of code (a Ken Thompson hack).
My personal experience is that 1) in many cases you do benefit from taking a moment, going through your code and thinking about presentation. And 2) I find it not at all difficult to settle. A change either doesn't matter, then you just don't discuss it at all, or it is important, then you quickly agree on the best solution. (In the worst case, "best" means what the project lead finds prettier.) If you don't have a social mechanism to agree on something as basic as coding style, then your team probably has bigger problems.
I actually find robo-formated code annoying to read: Go code from a bloody beginner who doesn't know what they are doing looks exactly like carefully tended for, highly thought-out code. And in autoformatted Python, you for example cannot make formulas clearer by removing spaces around operators with higher precidence. Parentheses placement is dicated by how long words are and not by what logically belongs together, etc..
Simply give up your mind and you too can be free.
[1] https://github.com/django/django/pull/15387#issuecomment-103...
https://github.com/ipython/ipython/pull/12091/files
And here’s a GitLab issue requesting support for blame-ignore:
https://gitlab.com/gitlab-org/gitlab/-/issues/31423
I don’t think there’s a corresponding GitHub request, but maybe if GitLab adds this feature GitHub will have some incentive to follow suit.
A year later and it seems to be the default on all projects I'm working on and I'm loving it.
It's this one
> Or is the point to autoformat
This one is done with pre-commit (which should probably be named pre-push?) hooks
> and autocommit the formatted code?
I don't think this one is done, and I think it's undesirable
It's this. Ensures that anything merged to master keeps the formatting conventions established in the project.
It is way better to deal with ugly formatting as long as it is consistent than with discussions where to put a closing brace/bracket/paren.
https://pycqa.github.io/isort/docs/configuration/black_compa...
$ cat foo.py
from x import a, b, d, e
from x import c as C
$ isort foo.py
Fixing /tmp/foo.py
$ cat foo.py
from x import a, b
from x import c as C
from x import d, e from typing import (
overload,
List,
etc...,
)
would seem more sensible to me. I know you can make isort do that, I guess maybe not black. from typing import (
overload,
)
is silly, but I don't want: -from typing import overload
+from typing import (
+ overload,
+ List,
+)
when all I actually did (semantically) was: + List,In any case, you can wrap those in parentheses, in which case black will just enforce its usual tuple formatting: single line if it fits; one line per item if not, with a trailing comma.
edit: I tried it on a long line with a backslash break, and black wrapped the imports in parentheses like I suggested above. I wonder what causes the behaviour you see on your end.
Alt-enter over the would-be imported term to add an import, ctrl-alt-O to autoformat / autoremove (aka optimise) redundant ones.
You can then turn on folding to not see those imports much via Prefs -> Editor -> General -> Code Folding
from typing import *
because I get annoyed at having to change my imports each time I need a new type in my type annotations.
But even now I have to adapt on screen shares when my coworkers are using dark themes before sunset. Can't imagine what that would be like with different code formatting. So the next next step is to display their desktop with my theme.
IOW collaboration tools still have a long way to go.
Ostensibly you could craft a future where what is on disk is not what the user is actually editing - a-la the virtual DOM. And on read/save the developer's preference is used to transform the syntax into their ideal shape. This is trivial in a Lisp, but not so easy in other languages.
If this system results in syntactically identical code [1] it should not matter if it’s displaying for you differently [2] if it means you can read or write around or in it more comfortably it’s just a hairstyle.
I was asked to familiarize myself with Replit the other day and it seemed the editor defaulted to two spaces for Python. Two spaces?! I changed it to four.
A friend joined my session and began to code with me, their editor was in the default two space indentation. It was madness.
[1] This seems like is a decent sized presumption across many languages and versions.
[2] This seems like an interesting AI problem, showing code structures you’ve never used in your style you’ve never defined.
I think that making editors do this is within the realms of feasibility. Most support auto-formatting to your preferred style so it doesn't feel like a leap for it to format to your preferred style but keep the file on disk the project owner's preferred style. I haven't looked extensively to see if this already exists though but we chatted about this at work as I was advocating for use of prettier on a front-end project!
Example here: https://bignerdranch.com/blog/git-smudge-and-clean-filters-m...
IMO the next next step is, as others have discussed on HN, getting your version control to store and abstract syntax tree. tree-sitter could make this easier nowadays, but I think it'd need more invasive changes in Git than just using the filters.
See this HN thread https://news.ycombinator.com/item?id=28670372
An autoformatter removes 99% effort from formatting code, and that includes code actively being worked on. Autoformatters are incredibly useful.
A standardized format removes effort spent learning to read a new format. That's an hour per format at most.
I don't see any good reasons for an autoformatter to enforce a standard. A standard would work just as well if defined as a specific configuration.
Blacks value isn't autoformatter, it's preemptive discussion ender.
One hour my ass.
Are the mentioned issues resolved by now? E.g. the quadratic algorithm?
foo = (
spark
.read
.parquet(...)
.filter(...)
.withColumn(...)
)and turns it into
foo = spark.read.parquet(
...
).filter( ...
).withColumn( ...
)which feels harder to parse for me. I also never quite got on board with the trailing commas.
assert outputs.get("foo.bar.baz", "default") == pytest.approx(
time_recorder.time_taken, abs=0.0001
)
I get why it's done that, but I just don't think it helps humans read.
Part of the twisted beauty of PEP-008's narrow lines is that you're forced to extract (named) variables, or avoid overly indented code by extracting methods or applying higher level abstractions.In the last few years I find devs are happier to format and push to "sort that problem out", leaving the readability benefit of that thought process lost.
TL;DR writing readable code isn't just about getting the spaces and brackets right...
https://github.com/ipython/ipython/pull/12091/files
Unfortunately there isn’t support for it in GitHub or GitLab yet, but there’s at least a GitLab issue here requesting it:
To do this just highlight the block, right click, and choose Git > Show History for Selection.
The only real downside is you nuke your issue tracker at the same time.
> It would be nice if there was a kind of diffing algorithm that can diff code units syntactically across history.
There have been quite a few attempts at that though I've only seen them applied to resolving merge conflicts. It would be interesting to try them for blame too.
Probably best to just make a one time git user to do it.
This is fine with me--I think it makes sense to optimize for readability, and I can read a long vertical list of arguments a lot more readily than a long comma-delineated list.
> spurious changes in commits - if you happen to indent a block, Black will cause lines to break
Is this a generic argument against wrapping lines, or am I misunderstanding something?
> Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
I'm not following this either. If black automatically reformats your code over multiple lines, that doesn't suggest manual formatting. Maybe you're arguing that all code which produces a given AST should be formatted in the same way--this would be cool and I would agree, but black gets us 95% of the way there so to argue that it "fails" is to imply that "0%" and "<100%" are equivalent.
Even in 2022, some people don't have wide external monitors, sometimes like to view two files (or a diff) side-by-side, or need to use GitHub/BitBucket/etc. code viewer pages. Also, it's still difficult for humans to read long lines.
You cannot read things you can't see. If half a function is scrolled off the bottom of the screen because every function arg is on its own line .... its pretty annoying.
IMO, < 80 is ideal where possible with an absolute maximum of 99. I think Black's choice of 88 (plus maybe a little more in special cases) is quite good.
Yeah, this one drives me nuts too.
But I also despise long lines with a passion, I hate having to go to the right, and would much much rather scroll up and down with a consistent width, so that I can put multiple views next to each other.
[0] https://black.readthedocs.io/en/stable/the_black_code_style/...
Ah that's why `manage.py shell` now split json pasted on several lines, very annoying
Why did your team not implement a style guide? Not following style is not working as a team and this needs to be addressed.
The advantage of a tool like Black is that it avoids that constant bikeshedding and the fact that it actually does the work for you puts the conversation in a different light because the option which is the least work is just letting Black format the code. Whatever you pick for style, you really want automatic formatting to avoid it seeming like a chore.
But, it's a "rip the plaster off" kinda thing, because it should ensure a lot less churn, inconsistent code style, or arguments and reviews about formatting after this is merged. It frees up a lot of headspace and distractions in code reviews. I don't know about you, but when I did code reviews I'd always end up zooming in on code style issues - ' vs ", things on newlines or no, JS objects with stringed keys, etc.
Beyond `.git-blame-ignore-revs` (which is neat and TIL), in GitHub's web viewer, if you find the line you're interested in and see that the most recent PR is a reformat, you click the "view blame prior to this change" button. I think most blame viewers do (or at least should) have a feature like this.
I also had to turn off Black's quote normalisation otherwise it is really obvious which is which. Quote normalisation is another point in Black's favour.
I could put the survey up somewhere if anyone is interested.
The trailing comma thing seems inconsistent with handing over the formatting to black. Now I’m in charge of deciding if a list should be cleaned up into one line or not.
My point was that the user doesn't matter (vs. anything else about the commit) to me in any context that I see it.
And then I mentioned without reiterating the advice about hiding the commit from blame just as you did.
In any context where I see "OJFord committed 'Autoformat with black'" for this, it's not 'OJFord' that's the problem IMO.
Exactly. I have learned that having it formatted just like I want it is FAR FAR FAR less important than having the entire ecosystem share a single format.
[
Item1,
Item2
]
Is combined to one line, while [
Item1,
Item2,
]
Stays as multi line. Now I am once again in charge of formatting my code, by virtue of the comma. Does this stay multi line or is it short enough enough to combine? That should be for black to decide not me! $ usort diff foo.py
--- a/foo.py
+++ b/foo.py
@@ -1,2 +1 @@
-from x import a, b, d, e
-from x import c as C
+from x import a, b, c as C, d, e
https://usort.readthedocs.ioMaybe if I were allowed new syntax:
from typing:
import overload
import ListIt feels like we're trying to justify the continued employment of uncooperative, contrarian egoists. Pick a style and use it or they can go find another job to waste time debating nonsense.
I've had that happen a few times and the number of people who will volunteer opinions has consistently been at least an order of magnitude larger than those who are willing to contribute something like linter or formatter configuration.
assert outputs.get("foo.bar.baz", "default") == pytest.approx(
time_recorder.time_taken, abs=0.0001)
always feels a bit off and "unbalanced" to me. The opening paren doesn't have anything immediately following it, so it feels 'symmetric' that the closing paren shouldn't have anything preceding it.And also it feels like the open and closing parens should be on lines that start at the same indentation level. Honestly this I think does aid in readability a bit.
> Part of the twisted beauty of PEP-008's narrow lines is that you're forced to extract (named) variables, or avoid overly indented code by extracting methods or applying higher level abstractions.
This feels orthogonal? The line is wrapping either way, which might sufficiently annoy someone to extract things out a bit more. But IMO it feels like a bit of an anti-pattern to create abstractions on the basis of syntax as opposed to the structure of the program.
> writing readable code isn't just about getting the spaces and brackets right...
? I mean of course not, but that's what we're talking about in the context of formatters right? I think the real, major way auto-formatters help with readability is by getting people to stop wasting mental cycles on things like spaces and brackets so that they can focus on more important code organization concerns.
int foo() {
return 1; }
(This example actually breaks up vertically in my mind. As if it's just the number 1 being bracketed)Maybe it could be broken up differently though to avoid the lone paren.
assert (outputs.get("foo.bar.baz", "default") ==
pytest.approx(time_recorder.time_taken, abs=0.0001))Using the same three lines, you could instead assign each result to a temporary var:
x = outputs.get("foo.bar.baz", "default")
y = pytest.approx(time_recorder.time_taken, abs=0.0001)
assert x == y
If using an assert method, I think this looks okay too (although still a bit noisy): self.assertEqual(
outputs.get("foo.bar.baz", "default"),
pytest.approx(time_recorder.time_taken, abs=0.0001),
)
I find that if black produces ugly output, it's usually because of something that I could improve, and I appreciate the hint.if foo: if bar: do something else: do something else
how would you autoindent that?
Using an automatic formatter also can help reduce diff noise, which is something I've noticed on more active projects. Using Black drives close to zero the amount of time I spend confirming that someone didn't actually change functionality along with formatting. There are other ways to get this, of course, but it's so easy just to enable Black and never spend your time on it again.
> Parentheses placement is dicated by how long words are and not by what logically belongs together, etc..
This is actually a bit more subtle: Black will remove parentheses when they don't have any effect (e.g. `(32)` will become `3 2` because it covers the entire expression) but if you use them for only a subset of the expression they'll be preserved (e.g. `(1(32))` becomes `1 * (3 * 2)`.
They are also huge timesinks, people spend less time sometimes deciding which framework or language or cloud provider to use and which country to register their company in, than spending hours and hours on how some SICK ANIMAL forgot a trailing comma somewhere, or bikeshedding how many spaces to indent with, and who uses an ultrawide monitor and wants wide columns (but then again someone turns theirs into portrait mode, and wants narrow columns), etc. Gobs and gobs of time, totally disproportionate to the issue at hand!
Yes, black sometimes produces fugly code, but at least the bikeshedding can fucking stop. There are bugs to fix, for chrissake. Yes, some people are not thoughtful enough to format their code like a piece of poetry, so what? Should everyone take Typography 101?
There is also a huge cohort of programmers who, when doing reviews and finding nothing to nitpick, resort to criticizing, in great detail, the code formatting of pull requests. They fell they MUST leave some critique or else their review is incomplete and their peepee is small. I find it hugely enjoyable that a tool like black can deprive them of the joy of belittling someone else for microscopically minor things and concentrate on, say, the actual merit of a change.
I do quite often artisanally craft formatting of specific code sections to highlight the intent of the algorithm or design. I assume black would merrily destroy all my hard work.
I hope not!
> Sorting module- and from-imports separately also makes it much easier for many people to visually scan a block of imports.
For me it's the opposite. My brain skips over the irrelevant parts (i.e. import/from). Not moving the import also produces better git diffs
foo = (
spark.read.parquet(...)
.filter(...)
.withColumn(...)
)instead
of
being
split
up
into
too
many
lines.
The whole point of an opinionated formatter is to have opinions about these sorts of things.
> Second, where do you break a line if it's too long?
It depends on the context. Yeah, writing the algorithm to make these decisions is a little complex, but it's also well-understood.
> doesn't get in your way more than necessary
What is "necessary"? It seems like you're trying to say "it makes decisions on the things I think it should make decisions on" which is fine, but it's not like choosing between `struct {` and `struct{` is objectively more critical than line wrapping.
Keeping lines reasonably short is nice but doesn't need to be done strictly. It can wait until someone edits the code.
80
> where do you break a line if it's too long
Wherever a keyword or name ends, but does not exceed 80.
Gofmt has made opinionated decisions about everything, why stop at line breaks?
Black is opinionated, so it skips the debate entirely. We just use Black, not black with 120 characters lines, just Black.
To each their own I guess, but to me it just seems like pandora's box. Once you show that you are open to changes in the linting configuration, it makes the rules mutable and pretty much guarantees that at some point someone will say "how about we just change this one parameter in the linter", which will probably be agreed by the rest of the team, not because they actually agree but because they don't want to argue.
gofmt is also opiniated. Someone somewhere picked the defaults and maybe I disagree with those defaults, but in the grand scheme of things, all go code is more approachable because it all looks the same. That to me, is worth a lot more than having my favorite format be the one on top.
EDIT: If you want to discuss the actual merits of Black, you can read their style page that actually explains every choice they made: https://black.readthedocs.io/en/stable/the_black_code_style/...
What for? Just to be clear even Django itself isn't obviously using a configuration, what makes your team so special they need this?
With backwards sorting you know that, unless there is a cycle, you can always scroll up from a call site to find the definition or down from a definition to see where it is used. With forwards sorting you can scroll down to find a definition, unless the function was imported, or used as a decorator somewhere, or called by something that was used as a decorator, or used in some other way that I haven't thought of.
My personal experience is that this predictability is hugely useful. It almost entirely obviates the need for jump-to-definition within a module, and gives modules a very obvious shape and structure.
I just wish sometimes they were more configurable. For example, the Elixir formatter is quite opinionated on things but is generally not configurable.
For example, if I use the pipe example on Elixir's landing page and add comments to it (obviously a contrived example):
"Elixir" # string to get frequencies for
|> String.graphemes() # Get all graphemes (i.e., character units)
|> Enum.frequencies() # Get the number of occurrences of each grapheme
This gets formatted by the Elixir formatter to: # string to get frequencies for
"Elixir"
# Get all graphemes (i.e., character units)
|> String.graphemes()
# Get the number of occurrences of each grapheme
|> Enum.frequencies()
This is a rather strongly opinionated format stance, and as far as I know, it's not configurable.Taste is something we cannot objectively agree, and in fact, people will end up arguing even about this very statement, offering what they think is an objective measure.
Bikesheding, yes. Hours lost in meeting, chat debates, documentation to write, linting configuration. To be redone for each project, team, etc. Worse even in FOSS where everybody will come in a ticket and complain. And after while, you do it again, because the debate is never settle even in the same team or project.
There is not way to take a team of 3 people, choose a style, and make it so that they are all 100% happy with it.
Black took the road of gofmt: you can't chose. And it won because of that: it saved people time and energy.
People realize the cost was not worth the satisfaction, which you are unlikely to get anyway. Let's just move on to what matters, it's good enough. Pareto.
Sounds like a team problem, I've been on plenty of teams that use clang-format for c/c++ and there have never been any issues like this. Team players know that (almost all) arguing over formatting is not a good use of time. (edit: in case not clear, clang-format is extremely configurable. Set a default config and live with it forever, that's how those teams work.)
> Black took the road of gofmt: you can't chose. And it won because of that: it saved people time and energy.
I don't see how this follows, if a team was dysfunctional enough to be wasting hours and hours of time before, I can't imagine why that wouldn't continue. It just shifts from "let's change this flag in yapf" to "let's switch to yapf because black looks ugly and gives us no options".
You had a good team. I had a great mum. Some people have great doctors.
> It just shifts from "let's change this flag in yapf" to "let's switch to yapf because black looks ugly and gives us no options".
If the practice doesn't match the theory, I'd rather trust the practice.
Yes, and also too hard to set up. It's extremely dumb, but I'm much more likely to use something I can't configure, because if I can configure it, I'm going to want to, and it'll take forever to make all those choices.
"yapf -i --style=pep8" works great.
I wish people just used more locals though. I don't see what the problem is and it makes Sentry errors easier to debug.