What is in that .git directory? (blog.meain.io)

269 points by Ivoah 2 years ago | 41 comments

charles_f 2 years ago |

By random chance I ended up in the git internals doc^1 today, also lovingly referred to as plumbing and porcelain. It's a fantastic read, very well explained. I wish all doc was written with such explicit care to be understood. It reads like a good friend is trying to explain you something.

What got me into that was a 51 GB ".pack" file that I wanted to understand. If you're wondering, those are pack files, and they are also what that "delta compression" message git prints is about^2. I don't have an explanation for the 51 GB file yet, though. I'm guessing something terrible happened before I joined, and people haven't found the courage to forgo the history just yet. But at least I got an entertaining read out of it.

^1: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...

^2: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

js2 2 years ago | |

Unpack the files (`git unpack-objects`). Maybe it was one large file that someone added, then deleted in a later commit. You'd have to rewrite history to get rid of it entirely. Alternatively, it might be a bunch of medium-sized files that were added and removed. It may take a little while to track down, but I'd start by unpacking.

This Stack Overflow thread looks like it contains a reasonable description of how to rewrite history to remove objects:

https://stackoverflow.com/questions/11050265/remove-large-pa...

It might be easier to declare repo bankruptcy. Seed a new repo from the existing repo's source files. Have the commit message point to the old repo. Stop using the old repo. Yes, you lose history and folks trying to perform repo archeology will have to jump to the old repo.

But rewriting history to remove large files can be equally awful, since references to git commit IDs tend to end up in places you don't expect, and when you rewrite history, you change the commit IDs.

Good luck.

charles_f 2 years ago | | |

Thanks! Yeah I plan to get to the bottom of it. I will probably propose to just keep a branch with full history somewhere (we need to keep history for auditability) and reset the main branch from a recent state.

MarkSweep 2 years ago | |

RE large pack files: you can remove unused objects with these commands:

    git repack -AFd
    git prune --expire now

Also related: the initial git clone from a TFS server (as of 2015) can include every object ever pushed to the server, even if it is on no current branch. So the above commands might save significant space locally. I'm not sure whether newer versions of TFS and DevOps improved this behavior.

schacon 2 years ago | |

> It reads like a good friend is trying to explain you something.

As the author of this document, I wanted to let you know that it made me happy to read this. Thank you for the kind words. :)

kaycebasques 2 years ago | |

> I wish all doc was written with such explicit care to be understood. It reads like a good friend is trying to explain you something.

Thanks for this insight. As a technical writer this is a helpful phrase for providing guidelines on how to write docs.

extraduder_ire 2 years ago | |

> I wish all doc was written with such explicit care to be understood. It reads like a good friend is trying to explain you something.

I think this is due to git's early history and the reputation it had for being incomprehensible and difficult to use. Lots and lots of work has been done by many people to make it more developer/user friendly. It really helped that its feature set made all of this work appealing. For example, learning git-blame and git-bisect made me want to use git for all of my projects, even if it takes time to explain how to use it.

glandium 2 years ago | |

Check the git verify-pack subcommand, particularly the -s and -v flags.

Forge36 2 years ago | |

I'm doing analysis with git-filter-repo --analyze

I've found 1 GB files in our repository (thankfully in a work-in-progress branch, so we're able to remove them before they reach main).

It lists everything by size.

conceptme 2 years ago | |

You could try bfg https://rtyley.github.io/bfg-repo-cleaner/

beezlewax 2 years ago | |

Do you have large image files, videos or other file formats that aren't plain text only that might cause git to store weird diffs/duplicates when you change them?

js2 2 years ago |

If you'd like a more in-depth treatment of the topic, let me suggest chapter 10 of the git book:

https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...

> But what gets sent to the other git repo? It is everything that is in objects and under refs.

Not everything under refs. Just the refs that you push. What gets pushed depends on how you configure git, what arguments you provide to `git push` and how the refspecs are configured for the remote under `.git/config`:

https://git-scm.com/book/en/v2/Git-Internals-The-Refspec

e.g., I regularly use `git push origin +HEAD:develop` to force push the checked out branch to a destination branch named `develop`.
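
To make the shape of that refspec concrete, here is a minimal sketch that splits a push refspec of the form `[+]<src>:<dst>` into its parts. The `Refspec` type and `parse_refspec` function are names made up for this illustration, not anything git ships, and the fallback for a missing `:<dst>` is a simplification: real git also consults the remote's push configuration.

```python
from typing import NamedTuple


class Refspec(NamedTuple):
    force: bool  # True when the refspec starts with "+"
    src: str     # what to push, e.g. "HEAD"
    dst: str     # ref to update on the remote, e.g. "develop"


def parse_refspec(spec: str) -> Refspec:
    """Split a push refspec of the form [+]<src>:<dst> (simplified)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    # Simplification: with no ":<dst>" part, reuse the source name.
    src, sep, dst = spec.partition(":")
    return Refspec(force, src, dst if sep else src)
```

With this, `parse_refspec("+HEAD:develop")` yields `force=True`, `src="HEAD"`, `dst="develop"`, which is the force push described above.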

A couple additional points not mentioned:

There are also tag objects. You create these with `git tag -a`. These are also called annotated tags. They carry their own message and point to a commit. Without `-a` you create a so-called lightweight tag which is just an entry under `refs/tags` pointing directly to a commit (as opposed to pointing to a tag object).

https://git-scm.com/docs/git-tag
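
The difference is visible on disk. The sketch below is only an illustration under some assumptions: the tag ref is a plain file under `refs/tags` (not in `packed-refs`), the object it names is still a loose object (not in a pack file), and the repo uses SHA-1 IDs. `object_type` and `tag_kind` are hypothetical helper names.

```python
import zlib
from pathlib import Path


def object_type(git_dir: Path, oid: str) -> str:
    """Return the type ("commit", "tag", ...) of a loose object."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    # A loose object starts with "<type> <size>\0" followed by its content.
    header = raw.split(b"\x00", 1)[0]
    return header.split(b" ", 1)[0].decode("ascii")


def tag_kind(git_dir: Path, name: str) -> str:
    """Classify refs/tags/<name> as "annotated" or "lightweight"."""
    oid = (git_dir / "refs" / "tags" / name).read_text().strip()
    # An annotated tag ref names a tag object; a lightweight one names
    # the commit directly.
    return "annotated" if object_type(git_dir, oid) == "tag" else "lightweight"
```

Calling `tag_kind(Path(".git"), "v1.0")` on a real repo only works while the assumptions above hold; after a `git gc` the ref and object may have moved into `packed-refs` and a pack file.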

All those loose objects get packed up into pack files periodically to save space and improve git's speed. You can manually run `git gc`, but git will also do so for you automatically once enough loose objects have accumulated. You'll find the pack files under `.git/objects/pack`:

https://git-scm.com/book/en/v2/Git-Internals-Packfiles
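
To see how much of a repo is still loose versus packed, a rough sketch like the following can walk `.git/objects` directly. `summarize_objects` is a hypothetical name, and the built-in `git count-objects -v` reports similar numbers more reliably. The sketch assumes the usual layout: loose objects in two-character subdirectories, pack files in `objects/pack`.

```python
from pathlib import Path


def summarize_objects(git_dir: Path) -> tuple[int, list[str]]:
    """Return (number of loose objects, sorted names of .pack files)."""
    objects = git_dir / "objects"
    loose = 0
    for sub in objects.iterdir():
        # Loose objects sit in directories named after the first two hex
        # digits of their ID; "pack" and "info" are not object directories.
        if sub.is_dir() and len(sub.name) == 2:
            loose += sum(1 for f in sub.iterdir() if f.is_file())
    pack_dir = objects / "pack"
    packs = sorted(p.name for p in pack_dir.glob("*.pack")) if pack_dir.is_dir() else []
    return loose, packs
```

Running `summarize_objects(Path(".git"))` before and after `git gc` should show the loose count dropping as objects move into a pack file.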

meain 2 years ago | |

> Not everything under refs. Just the refs that you push.

Ahh, thanks. I overlooked that detail. I've fixed it now. :D

p4bl0 2 years ago |

Nice post, thanks for sharing! I found that another way to learn about Git internals is to follow a step-by-step re-implementation of Git. It really was a very cool and efficient way for me to understand what's in the .git directory.

See for example the ugit [1] "build Git from scratch in Python" series for that.

[1] https://www.leshenko.net/p/ugit/

rollcat 2 years ago |

It's fairly easy to grab info from .git for your own purposes. For example, the program that generates my PS1 peeks there (without wasting precious cycles on shelling out to the git command) to find the current branch we're on:

https://github.com/rollcat/etc/blob/b2fd739/cmd/prompter/mai...
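
For anyone curious what that peek looks like, here is a minimal sketch in Python. It is not taken from the linked program; `current_branch` is a made-up name, and it assumes `.git` is a regular directory (not the `.git` file used by worktrees and submodules).

```python
from pathlib import Path
from typing import Optional


def current_branch(git_dir: Path) -> Optional[str]:
    """Return the checked-out branch name, or None on a detached HEAD."""
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        # HEAD is a symbolic ref such as "ref: refs/heads/main".
        return head[len(prefix):]
    # Otherwise HEAD holds a commit ID directly (detached HEAD).
    return None
```

`current_branch(Path(".git"))` reads one small file, so it is cheap enough to call on every prompt.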

mike_hock 2 years ago |

What's with the random bit flips in pieces that look like they would have been copied from the shell (i.e. likely not typos)?

objects/4c -> objects/5c

2023-07-02 -> 2024-07-02

meain 2 years ago | |

That is a typo, lemme go fix that :D

gv83 2 years ago |

just a random comment:

the

    .git/info/exclude

file acts as a personal, private .gitignore you don't have to commit
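
A small sketch of using it from a script, assuming a normal repo layout where `.git` is a directory. `add_private_ignore` is a hypothetical helper name; the file takes the same pattern syntax as `.gitignore`.

```python
from pathlib import Path


def add_private_ignore(repo: Path, pattern: str) -> bool:
    """Append pattern to .git/info/exclude unless it is already listed."""
    exclude = repo / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if pattern in text.splitlines():
        return False  # already ignored, nothing to do
    with exclude.open("a") as fh:
        # Make sure the new pattern starts on its own line.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(pattern + "\n")
    return True
```

For example, `add_private_ignore(Path("."), "scratch/")` hides a local scratch directory without touching the shared `.gitignore`.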

CableNinja 2 years ago | |

Nice! I didn't know about this. Feels like it's in an odd place. This is the first I've ever heard of it, so it must be that not many people use it (or admit to using it).

avgcorrection 2 years ago | | |

A more suggestive name like “private-ignore” would help.

wonderfuly 2 years ago |

The way I learn git internals is by experimenting: I execute a git command and watch the file changes happen in the .git directory. It's pretty fun. I actually wrote a simple CLI util to watch the changes: https://github.com/wong2/meowatch

rossant 2 years ago |

Great post. Git becomes much less mysterious once you know how it works internally.

olddustytrail 2 years ago |

I really don't like the "f" in that font. Very jarring.

crazygringo 2 years ago | |

Yup. As far as modern letterform conventions go, it's just plain wrong.

It's incredibly distracting and I can't imagine why anyone would ever choose to use it for code.

If you're doing some kind of cool alternative graphic design poster, then by all means go nuts! That's precisely where it's fun to play with different forms and be as "wrong" as you want.

But for something like code where legibility is the primary concern, it's a very unfortunate choice.

Our brain recognizes words not just by individual letters but by the shape of the entire word, and inserting a descender where we're not accustomed to one breaks our word-level recognition. It's not a neutral, aesthetic choice -- it literally makes it objectively harder to read, in a modern context.

meain 2 years ago | |

Haha, I have heard that from a lot of people. I actually really like that `f` for some reason.

ulrischa 2 years ago |

There is a trend toward all these hidden dot folders and files from apps. VS Code is another example. Personally, I do not like this. Couldn't there be another way to handle these config files?

dreamcompiler 2 years ago | |

If you don't like .git directories you can create a bare repo. That puts all of git's internal stuff at top level and makes it visible. But then you have to set up your working directory somewhere else.

https://stackoverflow.com/questions/7632454/how-do-you-use-g...

nnnnnande 2 years ago |

On the same topic, I usually refer back to this fantastic talk on how to add and commit a file without using git add or git commit: https://www.youtube.com/watch?v=mdvlu_R8EWE
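
As a taste of what doing it by hand involves, here is a minimal sketch of the very first step only: storing a file's content as a loose blob object, roughly what `git hash-object -w` does. It is not taken from the talk. `write_blob` is a made-up name, and the sketch assumes a SHA-1 repo. Staging and committing would additionally need an index entry, a tree object, and a commit object, none of which are shown.

```python
import hashlib
import zlib
from pathlib import Path


def write_blob(git_dir: Path, content: bytes) -> str:
    """Store content as a loose blob object and return its SHA-1 object ID."""
    # The ID is the SHA-1 of "blob <size>\0" followed by the raw content.
    data = b"blob " + str(len(content)).encode() + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()
    # The first two hex digits become the directory, the rest the file name.
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(zlib.compress(data))
    return oid
```

The returned ID for a given file should match what `git hash-object <file>` prints, which is an easy way to check the sketch against a real repo.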