Picturing Git: Conceptions and Misconceptions (biteinteractive.com)
It's unsurprising that people's mental model of git is incorrect. Git is not something people study at a conceptual level, it's something they learn recipes for in order to work on some project. Recipes like "how do I save all this work I just did" and "oh shit, everything is hosed, please give me a magic spell I can paste into my terminal to fix it".
I don't really blame people, since git itself does nothing to teach you how it works. Git is the definition of something you have to deal with in order to do something more important to you. Some people want to dig deep and understand how the system works: it's nice to sit near that person and ask them for help sometimes.
Saying "you should really understand more about git" is like saying "you should really study the tax code, it's important and it affects you whether you like it or not." True, but deeply irrelevant!
This is not essential complexity, it's just bad design that stuck.
Take a look at https://gitless.com/
If you just look at a summary of the commands, you will have an accurate mental model of what's going on:
gl init - create an empty repo or create one from an existing remote repo
gl status - show status of the repo
gl track - start tracking changes to files
gl untrack - stop tracking changes to files
gl diff - show changes to files
gl commit - record changes in the local repo
gl checkout - checkout committed versions of files
gl history - show commit history
gl branch - list, create, edit or delete branches
gl switch - switch branches
gl tag - list, create, or delete tags
gl merge - merge the divergent changes of one branch onto another
gl fuse - fuse the divergent changes of one branch onto another
gl resolve - mark files with conflicts as resolved
gl publish - publish commits upstream
gl remote - list, create, edit or delete remotes
To me this clearly demonstrates that the problem isn't that people aren't learning git, it's that git is bad to learn. Stash + Index + Working Tree isn't the right abstraction to present to people. Just say there is a working tree, and tracked and untracked files and snapshots. Done. Branches aren't particular commits but particular working trees on top of particular commits. Working on a feature and want to look at the main branch, but not ready to commit the changes yet? Well, just switch to the main branch, then switch back and pick up where you started. No need to know about an additional data structure called the stash.
Unfortunately this did not pick up enough steam. And because a lot of tools expose concepts from git's broken interface, you have to learn the git interface anyway...
gl merge - merge the divergent changes of one branch onto another
gl fuse - fuse the divergent changes of one branch onto another
Good while it lasted though
The list you provide sounded great until it came to gl switch. Why is there one specific operation for a branch that is NOT done via gl branch?
I don't understand what fuse is supposed to do from this at all. No idea whatsoever. Merge I get and anyone who has worked with any other versioning tool does conceptually.
Rebase most people seem to have a problem with but the abstract concept really isn't that hard. Just like cherry pick isn't really hard but somehow people have trouble with it. Though conceptually it really isn't hard either.
What really helped me the most with git was the realization that it's just a tree of commits with a bunch of labels. Labels have different types so to speak, like branch or tag, remote branches being special in a way etc. And obviously various commands can interact with these labels. Like a fetch updates the remote labels and moves them around on my local copy.
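The "commits plus labels" picture above can be sketched in a few lines of Python. This is only a toy model, not how git stores anything: the commit ids, the `fetch` helper and its parameters are made up for illustration, and only the `refs/heads`, `refs/tags` and `refs/remotes` naming follows git's conventions.

```python
# Toy model of "commits with labels"; every id and helper name here is hypothetical.
commits = {"c1": [], "c2": ["c1"]}          # commit id -> list of parent ids
labels = {
    "refs/heads/main": "c2",                # local branch
    "refs/tags/v1.0": "c1",                 # tag
    "refs/remotes/origin/main": "c2",       # remote-tracking branch
}


def fetch(commits, labels, remote_commits, remote_branches, remote="origin"):
    """Copy over commits we lack, then move only the remote-tracking labels."""
    for commit_id, parents in remote_commits.items():
        commits.setdefault(commit_id, list(parents))
    for branch, tip in remote_branches.items():
        labels[f"refs/remotes/{remote}/{branch}"] = tip


# Someone else pushed c3 and c4 to the remote's main branch.
remote_commits = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
fetch(commits, labels, remote_commits, {"main": "c4"})

print(labels["refs/remotes/origin/main"])   # the remote-tracking label moved
print(labels["refs/heads/main"])            # the local branch label did not
```

In this sketch a fetch never touches the local branch; moving `refs/heads/main` would be a separate step (a merge or rebase in real git).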
This is HN criticism #94238 on the terrible git CLI.
Okay, sure.
Would you kindly post your superior git CLI? Or at least the outline of it?
---
Snark aside, Git's popularity is not an accident. Bitbucket supported Mercurial too.
The official git handbook, freely available on the official git-scm site is not terribly long, and explains the internals on a conceptual level quite well.
I think the problem is that most people learning git land on some wordpress site of someone trying to flog a condensed and uninsightful shortcut to getting started with git for ad clicks, which only involves a series of commands without explaining the effects of those commands. This, combined with people's expectation that an SCM should take no thought whatsoever, causes most people who use git on a day-to-day basis to not really understand it at all.
Git needs to be introduced as a powerful data structure, kind of like how SQL is not a DB. Imagine someone explaining SQL without ever referring to the DB tables, rows and fields... only talking about git commits is like only talking about the result of a single query. You must understand the data structure to easily use the interface, otherwise the interface will be very confusing or you will be limited to "recipes"... after that you are just learning new variations on how to manipulate and navigate that structure (yes, the graph), and from this perspective people's complaints about the historical inconsistencies we have to put up with in git porcelain are moot.
So, I started reading through <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects> again to make sure I didn't have anything wrong.
But now there's no point in writing a blog post. Maybe I'll write one that just links to <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>.
It even has nice diagrams, which I think are essential for this kind of thing.
I used to despise git because it was so hard to learn. Then as an exercise I started writing my own code to read and write its underlying files and it finally dawned on me how simple the whole thing was.
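To give a flavour of how small that exercise can be, here is a minimal sketch of how a blob's object id and loose-object bytes are derived, following the object format described in the Git Internals chapter of Pro Git. It assumes the default SHA-1 object format; the function names are my own and not part of any git API.

```python
import hashlib
import zlib


def blob_payload(content: bytes) -> bytes:
    """Header 'blob <size>' plus a NUL byte, followed by the raw content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """Object id of a blob: SHA-1 over the header and content together."""
    return hashlib.sha1(blob_payload(content)).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """What ends up in the loose object file: the zlib-compressed payload."""
    return zlib.compress(blob_payload(content))


def loose_object_path(object_id: str) -> str:
    """Loose objects live under .git/objects/<first 2 hex chars>/<remaining 38>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_blob_id(b"test content\n")
print(oid)
print(loose_object_path(oid))
```

The id printed for `b"test content\n"` should match what `git hash-object --stdin` reports for the same bytes, which is an easy way to check the sketch against a real installation.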
Git's a very unusual piece of software; it's mind-bogglingly useful, the basic data structures and algorithms are perfectly matched to its job, and it has a UI that's a train wreck.
it's like coming into a forum for accountants where people bitch about having to learn tax code. please...
“All operations on a repository involve adding commits and/or manipulating the name resolution table.”
It may be simplified, but that statement alone, taken in context, is worth its weight in gold.
I'd say that's definitely the case but also a problem.
Sophisticated users mixed with people who just want to do a few simple things is a bad combination. I seem to remember that ClearCase had the same issues.
If you would like to know more about how to manipulate the git graph, take this excellent (and free) training:
https://learngitbranching.js.org/
To slowly level up, you can watch video demonstrations from Dan's git school. Dan provides 48 thirty-minute training videos:
https://www.youtube.com/watch?v=OZEGnam2M9s&list=PLu-nSsOS6F...
Being able to commit locally, examine changes, work with them and then push is something you might not think you need if your idea of a version control system is SVN.
But if you have learned Git or Mercurial or some other distributed system you would never go back to svn.
Once an SVN user discovers the magic of a staging area, stashes, or "git add -p" I don't know how they could claim SVN does anything better. All I remember from those days was how slow everything in SVN was. It felt like every command was backed by some horrible O(n^2) operation or really slow network connection.
git isn't hard. FFS, we shouldn't keep seeing these posts hitting HN every week. iptables? That's tough. DNS? No thanks. Managing package.json and keeping an app up-to-date? Git is nothing in comparison to the real challenges I face everyday.
If SVN is wonderful for you: Great! But that's not really relevant to the issue of using git effectively.
I feel like this is mostly accurate, to my knowledge, but reading this:
> I do not claim that this way of looking at Git represents absolute “facts” in any hard and fast or literal sense. But I contend that if you conceive of Git in the way that I’m going to suggest, if you substitute these conceptions of Git for any misconceptions you might have now, you’ll be a much happier and more fluid Git user.
…vexes me.
“Think of git like a bowl of peanuts and marshmallows” and other pointless, wrong metaphors about how git works are a dime a dozen.
Yet, here is someone who is clearly quite familiar with git, and they go to pains to point out they are simplifying and may not be correct in their explanations.
It's good to be humble, but ffs, git is too frigging complicated if the best you can get is a “probably wrong simplified mental model of how it works so you can be a bit more productive with it”.
I don't care:
- a simple meaningless metaphor that lets you be more productive? OK.
- an accurate description of how things actually work? OK.
…but pick one.
What I do not want is a possibly wrong complicated explanation of how git maybe works.
Based on the title, I was expecting a more in-depth study of user misconceptions about git, similar to the famous CogSci paper "Two Theories of Home Heat Control." Except with like, diagrams.
And now I want someone to make that happen.
1. Commits are immutable blobs that have one or more parents. Graphs, not trees. Anyone who uses trees for git commits misses the whole point and makes their (and their collaborators') lives complicated.
2. Tags are (mostly, best practice) immutable pointers to commits. Tags are "this is this thing FOREVER*."
3. Branches are named, mutable (by design) pointers to commits. Branches are "this is this thing FOR NOW. Later it'll be something else."
4. HEAD is a special "branch" that moves around automatically.
5. Origin is the local snapshot of the remote. Origin is "what did it look like when I last looked."
6. (fundamental but not critical) Remote is the current remote state (queried by RPC).
7. Index (aka stage) is where you put changes you want to make into commits. (this is somewhat simplified). Index is "My current and immediate plan. Scrub as needed."
That's (mostly, for non advanced use cases) it. Everything else are commands to query or manipulate the various state. Every action (until it becomes instinctual knowledge) should follow the same recipe: 1. Figure out the current state (current commit graph, relevant branches). 2. Figure out the target state (desired commit graph, new branches positions). 3. Mutate using ANY command you want.
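The model in the list above is small enough to write down directly. What follows is a hedged sketch, not git's implementation: the `Repo` class, its fields and the single-letter commit ids are invented for illustration, and the index and remotes are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    commits: dict = field(default_factory=dict)   # id -> tuple of parent ids (a graph, not a tree)
    branches: dict = field(default_factory=dict)  # name -> commit id, expected to move
    tags: dict = field(default_factory=dict)      # name -> commit id, expected to stay put
    head: str = "main"                            # name of the branch HEAD currently follows

    def tip(self):
        """Commit the current branch points at, or None in an empty repo."""
        return self.branches.get(self.head)

    def commit(self, commit_id, extra_parents=()):
        """Add a commit on top of the current branch and move that branch to it."""
        first = self.tip()
        parents = ((first,) if first else ()) + tuple(extra_parents)
        self.commits[commit_id] = parents
        self.branches[self.head] = commit_id


repo = Repo()
repo.commit("a")
repo.commit("b")
repo.tags["v1"] = repo.tip()             # the tag stays on "b" from now on
repo.branches["feature"] = repo.tip()    # a new branch is just another pointer
repo.head = "feature"
repo.commit("c")
repo.head = "main"
repo.commit("d")
repo.commit("m", extra_parents=[repo.branches["feature"]])  # merge commit: two parents

print(repo.commits["m"])   # parents of the merge commit
print(repo.branches)       # where each branch pointer ended up
```

Following the recipe from the comment: the current state is whatever `repo.commits` and `repo.branches` hold, the target state is the graph and pointer positions you want, and any sequence of mutations that gets from one to the other is acceptable.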
I think that's the issue really. Inexperienced dev / people who don't understand git look at commands as "this is how to do a thing". No. In Git there isn't "how to do the thing". It's exactly like writing code - so many ways to achieve the goal, just choose your own. It might be efficient and elegant, or bumbling and ugly, but it'll get there.
No, the problem is not with "how people use Git". The problem is with git. We've known for years how to make clear, concise interfaces that help people understand what's going to happen. Git does not have a clear, concise interface. That is its biggest problem and will continue to be until it is changed to have a clear, concise interface.
I would argue most things in technology are complex, and mental models are intentional ways to take something complex and turn it into something more simple. This article does not create meaningless metaphors.
Personally speaking, I find knowing and distinguishing among the 4 indexes to be essential to understanding git. Not including and really exploring that detail gives people an incorrect mental model of what's happening.
Marvelous, if the metaphors of the article helped you, but I empathize with the upstream poster's frustration. I believe that the content of the article is not medicine for the malaise it describes.
The command line interface to git is insanely complicated, confusing, and unnecessarily difficult to use, but this isn't a result of the git data model. It's definitely possible to give a complete and accurate description of the data model, even using examples from `git cat-file` to walk through the commit history by hand.
I've also got a simple demo that generates a complete repo with a commit. You can manipulate the resulting repo from git. There are 65 non-comment lines of code.
Here it is: https://orib.dev/ugit.py
Heck, "a monoid in the category of endofunctors" is simpler.
[1] Off the top of my head: the working tree, the index, the stash, the repo DAG, the local remote repo DAG, the remote repo DAG. Of course the branch labels are further state, and working with the commits directly is discouraged. Oh, and files can be either tracked or not, and they can either be ignored or not. And one isn't a subset of the other. And that also interacts with the various state transitions.
I can't (yet) reason about monoids easily. But I can reason about Git, even if I can't figure out the single command to change the state the way I want it and have to resort to multiple commands. I guess it's easier for me to think in graphs.
I could never understand what kind of twilight zone stashes go into or remember which stash is which when I had too many of them. So I never use stashes any more, I just make a branch instead.
I largely use git add -A, so I can pretend that the index does not exist.
3. Branches are named, mutable pointers to commits, that you can "ride". While you "ride" a branch it keeps moving to always point to your latest commit.
4. HEAD is an implicit branch that you "ride" at all times.
re "ride" - that's exactly what I'm trying to avoid. It's an additional concept that isn't needed to understand Git. You need to understand the model. The "ride" is an emergent property of the model and commands that you eventually understand, but not a core part.
Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
-Linus Torvalds
Anyway, it's not for everyone to get to understand git this way, I guess. Some people will just react "just tell me how to do X in git!"
Seriously, I too find the basic concepts of git quite simple. But whenever I want to do anything slightly out of the ordinary, I find myself wasting a lot of time searching the docs. In fact, I find the naming of commands and their options almost the opposite of intuitive, given my understanding of the basic model.
Then I read "git inside out" [1] (not to be confused with "git from the bottom up", which I think is not as good), had an "aha!" moment, my view changed and everything became clear and easy. Transformation from graph to graph is something I do every day, so why not in Git?
[1] https://www.slideshare.net/MichaelNadel/git-inside-out-57904...
That would be more applicable here. But I couldn't immediately find it, so I pasted this one instead, which is somewhat close but not perfectly related to the OP.
I think understanding what .git dir contains and represents is as important as understanding the tool.
It's not like you need to understand how the tool works internally, or how it's built.
Analogy would be that you want to understand how to use a hammer, sure, but also the characteristics of material you manipulate with it. You don't need to understand how the hammer is built.
Then there are those whom I'm sure should have the necessary experience (because they are C programmers, for example), but still don't seem to get it. These people I think just don't care. They don't care about version control and therefore it's irrelevant what git is trying to represent. They just want to get their code merged.
There are some funny neologisms like "check it out on the git", since they don't know what git actually is versus how to use it, but still.
It's simplified, but really, not that much.
Once open source projects started moving to it from older platforms (Sourceforge, Trac, etc), developers who might not have cared so much about VCS flavours followed.
Centralized version control systems like Subversion are oriented around "committer permissions". Giving someone commit access incurs friction (waiting for a human to respond) and risk (potentially unwanted changes).
None of the old platforms like SourceForge implemented the concept of cloning a whole repository. Even if you made your own clone (manually or otherwise), what good would that do? Your copy would diverge from the upstream and can never be automatically reconciled. You would be forever doomed to sending patches to upstream, and calculating diffs from upstream to apply to your repo. Or just replacing your clone with the upstream's history after your patch gets accepted.
Git natively supports multiple asynchronous repos representing the same "project" because its history is designed around a directed acyclic graph (DAG) structure. You don't need the upstream repo's permission to make your own clone and commit some changes. After you diverge from the upstream, if you solicit them with your branch and they accept, then both of you can converge once again - whereas this is impossible in a linear history model like SVN.
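The diverge-then-converge argument can be made concrete with a small graph computation. This is a simplified sketch rather than git's actual merge-base algorithm: the commit names and the `ancestors` and `merge_bases` helpers are hypothetical, and the repeated ancestor walks are fine for a toy graph but would be far too slow for a real history.

```python
def ancestors(parents, start):
    """All commits reachable from start via parent links, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Upstream history: base -> u1 -> u2. A clone diverges at u1 with f1 -> f2.
parents = {
    "base": [],
    "u1": ["base"],
    "u2": ["u1"],
    "f1": ["u1"],
    "f2": ["f1"],
}
print(merge_bases(parents, "u2", "f2"))   # where the two repos diverged

# Upstream accepts the fork's work by recording a commit with two parents.
parents["m"] = ["u2", "f2"]
print(merge_bases(parents, "m", "f2"))    # the fork's tip is now part of upstream
```

Because the merge commit records both parents, later reconciliations start from the fork's tip instead of the original divergence point, which is what lets the two repositories keep converging.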
If a tag has any attempt at immutability at the data structure level, I know nothing of it.
Once you've pushed a tag, no other clients will be willing to update their definition of that tag unless the users on those other devices force the issue.
So operationally, "tags are immutable once pushed" is a pretty reasonable way to look at things.
Remotely pushed branches of course also won't allow you to do anything but append without forcing on remote clients, so mutable and immutable isn't quite right, here.
So I guess I agree with your original contention, branches are mutable-and-you-can-ride where ride means "the remote client's porcelain will be happy with append mutations".
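The "append-only unless forced" behaviour discussed here boils down to an ancestry check. The sketch below is a toy model of that rule as described in this thread, not git's real update logic: `is_ancestor`, `accepts_update` and the `kind` parameter are names I made up for illustration.

```python
def is_ancestor(parents, maybe_ancestor, commit):
    """True if maybe_ancestor is reachable from commit by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def accepts_update(parents, kind, old_tip, new_tip, force=False):
    """Decide whether a client would move its copy of a label without complaint."""
    if force or old_tip is None or old_tip == new_tip:
        return True
    if kind == "tag":
        # An existing tag is treated as fixed: moving it needs force.
        return False
    # A branch may only move forward: the old tip must be in the new tip's history.
    return is_ancestor(parents, old_tip, new_tip)


parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
print(accepts_update(parents, "branch", "b", "c"))           # append: accepted
print(accepts_update(parents, "branch", "b", "x"))           # rewritten history: refused
print(accepts_update(parents, "tag", "b", "c"))              # moved tag: refused
print(accepts_update(parents, "tag", "b", "c", force=True))  # forced: accepted
```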
Sometimes it's reasonable to consider a tag immutable, though you should always checksum if you do.
gl commit a.foo b.bar
committing all but some files is done with gl commit -e a.foo b.bar
gl commit -p allows you to interactively commit parts of files.
gl doesn't take any abilities away (it's just git under the hood after all), it just exposes the abilities in sane ways.
If you actually look at the homepage of gitless you will also immediately see what fuse does:
I believe that by reading that one, not very long page, most people (including non-programmers) can use gl correctly most of the time. This is not the case for git.
BTW, gl branch is for creating/deleting branches, gl switch is for switching your working tree from one branch to another. These are very different things, why should they be under the same command?
For git, the last paragraph is a necessary but in no way sufficient step towards using it proficiently. Gitless is actually much closer to realizing that vision.
Seriously, people need to go back and teach beginners git to realize how bad it is. We have internalized so much of the bad design decisions in git that we don't notice them anymore.
I understand what gl branch "is supposed to do" but I don't see why gl switch is its own command given the other reasoning presented for why gl "is better".
I would say it is different. Probably very workable. Completely intuitive and the only reasonable way to do version control? Definitely not.
To me it's very very natural that git checkout will check out any commit I give it. How I specify that commit is up to me. It could be the commit hash. It could be a text label. That text label might on a logical level be a branch. Or a tag. Why do I need to switch branches with a special switch command when checkout handles this perfectly well?
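The idea that checkout only needs "some way of naming a commit" can be sketched as a single lookup function. This is a deliberate simplification: the ids and names are invented, and the lookup order used here (tags, then branches, then a unique id prefix) is an assumption of the sketch. Git's real rules are spelled out in the gitrevisions documentation and cover many more forms.

```python
def resolve(name, commit_ids, tags, branches):
    """Turn a tag name, branch name, or unique id prefix into a full commit id."""
    if name in tags:
        return tags[name]
    if name in branches:
        return branches[name]
    matches = [commit_id for commit_id in commit_ids if commit_id.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    raise LookupError(f"cannot resolve {name!r} to exactly one commit")


commit_ids = ["1a2b3c", "9f8e7d"]   # hypothetical, shortened ids
tags = {"v1.0": "1a2b3c"}
branches = {"main": "9f8e7d"}

for name in ("main", "v1.0", "9f8"):
    print(name, "->", resolve(name, commit_ids, tags, branches))
```

Whichever way the name is written, the result is a commit id, which is the only thing a checkout in this model would need.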
And no, my post did not say that you will understand the details of every command in the list from just looking at the list. It only said that the list demonstrates that a much more coherent, less stateful, simply better UI is possible. That you can not fully explain the difference of fuse and merge in one sentence summaries is not a counterexample to that.
I said you will have an accurate mental model of what's going on. From the summaries you can tell all state you interact with: Working Tree, Commits, Track/Untrack status. That's it. That mental model is perfectly sufficient to accurately predict what most anything will do. And crucially the explanations and mechanisms to achieve all the workflows people asked for in this comment thread can be achieved with these ingredients just fine.
It's what the "Git is easy! You just need to understand that it's a DAG!" crowd pretends git already is.
working directory - this is the project directory in the OS file structure
index - a.k.a the staging area
repository - in the .git directory
stash - a kind of scratch pad or clipboard for the developer
Understanding these different areas and how and why to move data into each is essential to understanding git
To be exactly clear, there are actually 4 staging areas within what you referenced as the index. This is indeed a detail most people do not worry about.
As for the first point, fine grained control for what goes into a commit, that's definitely a power user feature, but an important one of course. Again there are ways to achieve this without introducing new state (the index), for example by allowing to amend the last commit.
I wouldn't claim that gitless is a 100% complete git replacement for expert users. It just shows that git has way too much state exposed to users, and has confusing commands to make that state interact. Obviously we all learned git and use it successfully, so it's obviously not broken or anything, it's just worse than it could be (and the constant chorus of "it's so simple, just a DAG!" is a bit grating if you have to teach beginners regularly).
The gitless authors did do some research with users that backs up the claim that this is conceptually easier to use:
It's not a power-user feature, and it shouldn't be considered one. It should be taught as a standard part of any workflow: before committing, look at the changes you're about to add, and use hunk-staging features (e.g. trivial using Magit) to stage and commit unrelated changes separately.
For example, did you clean up some comments and docstrings while you were adding a new feature? Commit those improvements separately, so that if you need to revert the feature commit later, the improvements won't also be reverted. It also makes reviewing much easier, as each commit or patch, having its own purpose, can easily be reviewed separately, and attention can be focused on parts that need changing.
> Again there are ways to achieve this without introducing new state (the index), for example by allowing to amend the last commit.
Amending a commit does not serve the same purpose as staging files and hunks separately into the index.
It's my impression that few git users understand the value of the index, because few of them use porcelains that expose its power in simple ways. If I had only "git add -p" to use, I might not, either. But Magit is, well, like its name implies, like magic.
And your workflow of gradually building up an index of (parts of) files can be achieved by partial/amendable commits. You simply iteratively/interactively add files and partial files to your latest commit until you're done. Instead of building up the index and then committing it, you just build up the commit directly.
This also means you can interact with the "in progress commit" in the same way as with all other commits.
There is no need for having an index to realize what you want.
Another minor point: Your workflow _is_ a power user workflow in my world. Out of twenty people that have reason to use git, one has use for this workflow.
It seems we roughly agree that there is a lot of scope for improving git though. I looked at magit and it looks nice. It exposes all the moving parts in a user interface. I would prefer to just have fewer moving parts, but if they are there it's sensible to make them obvious (and it puts to rest the idea that all you need to understand is that the git data structure is a DAG...)
Other tools have none of this overly stateful bullshit. When I want a file to be included in the next commit, I don't want a silent, implicit copy to be whisked away into some internal storage the moment I flag it. There is NO reason why merges need to destroy the contents of that same storage area. The totally confusing semantics of the reset command with the three nonsensically named modes soft, mixed and hard are also created solely by this particular misfeature of a magical hidden storage area.
There is also no reason to even support destructive history editing. Immutable history is the correct choice. The mercurial evolve extension, for example, supports rebasing without destroying history.
Also, the combination of not being able to close branches instead of deleting them and not storing branch names in commits makes git history completely undecipherable when you have to go back more than a few merges. You might just as well throw it into the garbage bin.
Just take a look at competing tools (free and commercial) to see what's being innovated in this space and how this widespread obsession with git is in fact preventing much needed progress.
rofl. git is perfectly usable. proven by the hundreds of thousands of people using it daily. most of your 'complaints' here are not actually how git works in reality, and are just user error due to not bothering to learn the tool.
1. git never destroys your work inside a repository unless you told it to.
2. git doesn't 'whisk' things away implicitly. you committed them, that's pretty fucking explicit.
3. ummm every version control is stateful. wtf do you think any given version is? it's fucking state. SVN has internal state, git, hg, fossil....
4. the commands, sure naming is fucking HARD we know this. we also know git gained different features over time to accommodate different workflows. perfect? absolutely not but a rose by any other name would smell as sweet.
1. a) git's garbage collector runs without asking first. b) git's UI is so bad that users regularly end up in a state where they effectively asked git to destroy data without realizing it. It's often too implicit.
2. Ever modified a file between git add and git commit? Did these extra changes get committed? Some graphical clients try to hide aspects like this.
3. Unlike SVN, hg etc. git has superfluous extra state with complicated rules. The fact that the cache/index/staging area is inconsistently named three different things just illustrates how byzantine these rules are. And they can be done away with completely.
4. A rose by another name wouldn't smell as sweet - our senses of taste and smell are easily influenced by our other observations about an object. Human perception is weird. In the same vein, bad naming of features invites more usage errors.
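Point 2 in the list above is easy to reproduce in a toy model. The sketch assumes nothing beyond what the comment describes: staging a file copies its content at that moment, and a commit snapshots the staged copies. The `add` and `commit` functions and the file name are hypothetical stand-ins, not git's API.

```python
working_tree = {"notes.txt": "first draft"}
index = {}      # staged copies, taken at the moment of add()
commits = []    # each commit is a snapshot of the index


def add(path):
    """Stage a file by copying its current content into the index."""
    index[path] = working_tree[path]


def commit(message):
    """Record the index, not the working tree, as a new snapshot."""
    commits.append({"message": message, "files": dict(index)})


add("notes.txt")
working_tree["notes.txt"] = "first draft plus a late edit"   # edited after staging
commit("Add notes")

print(commits[-1]["files"]["notes.txt"])   # content as it was when staged
print(working_tree["notes.txt"])           # the late edit is still uncommitted
```

In this model the late edit is not lost, it just is not part of the commit, which is exactly the surprise the comment is pointing at.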
Do you use GitHub, GitLab, Bitbucket, gitea, ...? There's your central repo. If nothing else, it represents the “backup” facet of using an SCM.
You can already do that with TortoiseSVN or just through the command line
>stashes
SVN uses shelves
>git isn't hard
This feels like the "lisp is magic" argument that no one can seem to prove, despite how universal its proponents claim the law to be.
You are literally replying to a comment that describes a possible better CLI for git...
I maintain git stacks up well against other similarly mature/complex software (Nginx, AWS, Java), but it's a wonderful read nonetheless.
(And holy hell what a hard thing to search for...can't find the link.)
The trick is to remember it's called "git koans"