What is in that .git directory? (blog.meain.io)
What got me into that was a 51 GB ".pack" file that I wanted to understand. If you wonder about those, they're pack files, and they're also what that "delta compression" message when you commit is about^2. I don't have an explanation for the 51 GB file yet, though. I'm guessing something terrible happened before I joined, and people haven't found the courage to forgo the history just yet. But at least I got an entertaining read out of it.
^1: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...
This Stack Overflow question looks like it contains a reasonable description of how to rewrite history to remove objects:
https://stackoverflow.com/questions/11050265/remove-large-pa...
It might be easier to declare repo bankruptcy. Seed a new repo from the existing repo's source files. Have the commit message point to the old repo. Stop using the old repo. Yes, you lose history and folks trying to perform repo archeology will have to jump to the old repo.
But rewriting history to remove large files can be equally awful, since references to git commit IDs tend to end up in places you don't expect, and rewriting history changes the commit IDs.
Good luck.
git repack -AFd
git prune --expire now
Also related, the initial git clone from a TFS server (as of 2015) can include every object ever pushed to the server, even if it is on no current branch. So the above commands might save significant space locally. I'm not sure whether newer versions of TFS and DevOps improved this behavior.
As the author of this document, I wanted to let you know that it made me happy to read this. Thank you for the kind words. :)
Thanks for this insight. As a technical writer this is a helpful phrase for providing guidelines on how to write docs.
I think this is due to git's early history and the reputation it had for being incomprehensible and difficult to use. Lots and lots of work has been done by many people to make it more developer/user friendly. It really helped that its feature-set made all of this work appealing. e.g. learning git-blame and git-bisect made me want to use git for all of my projects, even if it takes time to explain how to use it.
I've found 1 GB files in our repository (thankfully a work in progress, so we're able to remove them before they go to main).
It lists everything by size.
https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...
> But what gets sent to the other git repo? It is everything that is in objects and under refs.
Not everything under refs. Just the refs that you push. What gets pushed depends on how you configure git, what arguments you provide to `git push` and how the refspecs are configured for the remote under `.git/config`:
https://git-scm.com/book/en/v2/Git-Internals-The-Refspec
e.g., I regularly use `git push origin +HEAD:develop` to force push the checked out branch to a destination branch named `develop`.
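To make the refspec format concrete, here is a minimal Python sketch that splits a push refspec such as `+HEAD:develop` into its parts. The helper name `parse_refspec` and the dictionary it returns are made up for illustration; git's own parser handles more cases (wildcards, negative refspecs, validation) that are ignored here.

```python
def parse_refspec(spec: str) -> dict:
    """Split a refspec of the form [+]<src>[:<dst>] into its parts.

    Hypothetical helper for illustration only; it does not handle
    wildcards, negative refspecs, or any of git's validation.
    """
    # A leading "+" asks for a forced update of the destination.
    force = spec.startswith("+")
    if force:
        spec = spec[1:]

    src, colon, dst = spec.partition(":")
    return {
        "force": force,
        "src": src,
        # Without a colon there is no explicit destination; git then
        # chooses one according to its own rules and configuration.
        "dst": dst if colon else None,
        # An empty source with a destination (":branch") is the push
        # syntax for deleting the destination ref.
        "delete": bool(colon) and src == "",
    }


if __name__ == "__main__":
    print(parse_refspec("+HEAD:develop"))
```

Read this way, `+HEAD:develop` is a forced push of whatever `HEAD` points at to the remote branch `develop`.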
A couple additional points not mentioned:
There are also tag objects. You create these with `git tag -a`. These are also called annotated tags. They carry their own message and point to a commit. Without `-a` you create a so-called lightweight tag which is just an entry under `refs/tags` pointing directly to a commit (as opposed to pointing to a tag object).
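The difference is visible on disk. As a rough sketch, the Python below reads a tag ref and checks the type of the object it points to. It assumes the ref is a loose file under `refs/tags` (not listed only in `packed-refs`) and that the object is a loose object (not inside a pack file); the function names `loose_object_type` and `tag_kind` are invented for this example.

```python
import zlib
from pathlib import Path


def loose_object_type(git_dir: str, object_id: str) -> str:
    """Return the type of a loose object: "commit", "tag", "tree" or "blob".

    Assumes the object is stored loose, i.e. as a zlib-compressed file at
    objects/<first two hex digits>/<remaining hex digits>.
    """
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    # Every object starts with a header like b"commit 231\x00".
    header = raw.split(b"\x00", 1)[0]
    return header.split(b" ", 1)[0].decode()


def tag_kind(git_dir: str, tag_name: str) -> str:
    """Classify a tag as "annotated" or "lightweight".

    Assumes the tag exists as a loose ref file under refs/tags.
    """
    ref = Path(git_dir) / "refs" / "tags" / tag_name
    object_id = ref.read_text().strip()
    # An annotated tag's ref points at a tag object; a lightweight
    # tag's ref points straight at the tagged object (usually a commit).
    if loose_object_type(git_dir, object_id) == "tag":
        return "annotated"
    return "lightweight"
```

In a real repository you would call it as `tag_kind(".git", "v1.0")`, where `v1.0` stands in for one of your own tag names.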
https://git-scm.com/docs/git-tag
All those loose objects get packed up into pack files periodically to save space and improve git's speed. You can manually run `git gc` but git will do so for you automatically every so many commits. You'll find the pack files under `.git/objects/pack`.
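If you want to see how much of a repository is loose versus packed, a small Python sketch like the following walks `.git/objects` directly. The function name `summarize_objects` and its return shape are my own invention; `git count-objects -v` reports similar numbers without any scripting.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def summarize_objects(git_dir: str) -> dict:
    """Count loose objects and list pack files with their sizes in bytes."""
    objects = Path(git_dir) / "objects"

    loose = 0
    for sub in objects.iterdir():
        # Loose objects live in directories named after the first two
        # hex digits of their ID (e.g. objects/4c/...); "pack" and
        # "info" are skipped by this check.
        if sub.is_dir() and len(sub.name) == 2 and set(sub.name) <= HEX_DIGITS:
            loose += sum(1 for f in sub.iterdir() if f.is_file())

    packs = {}
    pack_dir = objects / "pack"
    if pack_dir.is_dir():
        for pack in sorted(pack_dir.glob("*.pack")):
            packs[pack.name] = pack.stat().st_size

    return {"loose_objects": loose, "packs": packs}


if __name__ == "__main__":
    # Assumes the script is run from the root of a git working copy.
    print(summarize_objects(".git"))
```

Running it before and after `git gc` should show the loose count drop while the pack sizes grow.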
Ahh, thanks. I overlooked that detail. I've fixed it now. :D
See for example the ugit [1] "build Git from scratch in Python" series for that.
https://github.com/rollcat/etc/blob/b2fd739/cmd/prompter/mai...
objects/4c -> objects/5c
2023-07-02 -> 2024-07-02
The `.git/info/exclude` file acts as a personal, private .gitignore you don't have to commit.
It's incredibly distracting and I can't imagine why anyone would ever choose to use it for code.
If you're doing some kind of cool alternative graphic design poster, then by all means go nuts! That's precisely where it's fun to play with different forms and be as "wrong" as you want.
But for something like code where legibility is the primary concern, it's a very unfortunate choice.
Our brain recognizes words not just by individual letters but by the shape of the entire word, and inserting a descender where we're not accustomed to one breaks our word-level recognition. It's not a neutral, aesthetic choice -- it makes the text objectively harder to read in a modern context.
https://stackoverflow.com/questions/7632454/how-do-you-use-g...
Another useful high-level option is git-sizer (https://github.com/github/git-sizer), which tries to expose a few trouble spots. There's not much that can be done if the repository is just big (a long history of wide working copies with lots of changes), but sometimes it's just that there are a bunch of large binary assets.
This may be more likely if the repository was converted from a centralised VCS where storing large assets or files is less of an issue, likewise the bad compression. Though obviously removing such large assets from the core repository still requires rewriting the entire thing.
https://github.blog/2020-12-21-get-up-to-speed-with-partial-...
To be clear, I was not suggesting deleting the old repo. Keep it for historical purposes, whether you rewrite or start fresh.
https://blog.isquaredsoftware.com/2018/11/git-js-history-rew...
I specifically was looking for techniques that would let me quickly iterate over ~15000 commits.
Granted, the repo I was working with was only a few GB, but hopefully there are some pieces there you can find useful.
In my experience you'll have references to the commits in a repo from outside of the repo: links from Slack, Jira, other repos, etc., to specific commit IDs. When you rewrite history, all of the commit IDs change. That's why I recommend archiving the original repo so as not to break any such references. Create the new repo, either rewritten or seeded from the old, in a new location.
It would be neat if git supported a "rewrite map" to allow it to redirect from one revision to another, sort of like how `git blame` can be configured to ignore revisions.
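To illustrate the idea only: git has no such feature as far as this thread establishes, so everything below is hypothetical. The sketch assumes a plain-text map with one `old-id new-id` pair per line (a format made up here) and shows how a lookup could redirect an old commit ID, including an unambiguous abbreviation of it, to its rewritten counterpart.

```python
def load_rewrite_map(text: str) -> dict:
    """Parse a hypothetical rewrite map: one "old-id new-id" pair per line.

    Blank lines and lines starting with "#" are ignored. This file
    format is an assumption made for the example, not something git reads.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        old_id, new_id = line.split()
        mapping[old_id] = new_id
    return mapping


def resolve(mapping: dict, commit_id: str) -> str:
    """Redirect an old commit ID to its rewritten ID, if the map knows it.

    Abbreviated IDs are accepted when they match exactly one old ID.
    Unknown or ambiguous IDs are returned unchanged.
    """
    if commit_id in mapping:
        return mapping[commit_id]
    matches = {new for old, new in mapping.items() if old.startswith(commit_id)}
    if len(matches) == 1:
        return matches.pop()
    return commit_id
```

A link checker for Slack or Jira references could run old IDs through something like `resolve` before pointing people at the new repository, falling back to the archived repo when no mapping exists.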