On undoing, fixing, or removing commits in git

Source: sethrobertson.github.io

105 points by DanielShir 12 years ago | 73 comments

sisk 12 years ago

Regarding losing data: it's as simple as diving into the reflog. In order to remove something from your history, you must do so very explicitly by walking your commit history, editing each one. There is an automated workflow to accomplish that (`filter-branch`) but it's definitely not a command anyone I know has committed to memory.

Accidental mutations can be undone either by `--abort`ing (if the command supports it) or by checking out an earlier revision from the reflog.

The GC in git is pretty conservative and, while it can be triggered manually, still makes you jump through some hoops to actually get rid of something. Steve Klabnik wrote about it[1] a little while back.

In certain cases, you don't have access to the reflog because a change wasn't made locally. Perhaps someone screwed up a remote you pull from and it destroyed your history. You can, even still, find, view, and re-associate orphaned objects. Yeah, it's not terribly intuitive and, again, not a workflow anyone has probably committed to memory, but the fact that you can recover from a disaster of that magnitude is pretty amazing.
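A rough sketch of that recovery, assuming a throwaway repository: the file name, the `topic` and `rescued` branch names, and the identity settings are all made up for illustration. `git fsck --no-reflogs` is used here to simulate the "no reflog available" case by ignoring the reflog when deciding what is reachable.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
base_branch=$(git symbolic-ref --short HEAD)

# Do some work on a side branch, then delete the branch so that
# no ref points at the commit any more.
git checkout -q -b topic
echo two > file.txt
git commit -q -am "topic work"
lost=$(git rev-parse HEAD)
git checkout -q "$base_branch"
git branch -q -D topic

# Ask fsck for commits that no ref reaches, ignoring the reflog.
found=$(git fsck --no-reflogs 2>/dev/null | awk '/^dangling commit/ {print $3}')

# Re-associate the orphaned commit by pointing a new branch at it.
git branch rescued "$found"
```

In a real repository `fsck` may list many dangling commits, so expect to inspect each candidate with `git show` before deciding which one to rescue.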

Git provides us developers with a set of tools, powerful tools, and that comes with a level of responsibility. I'd rather have the ability to responsibly clean my history than the alternative.

[1] - http://words.steveklabnik.com/git-history-modification-and-l...

mikeash 12 years ago

"Strongly consider taking a backup of your current working directory and .git to avoid any possibility of losing data as a result of the use or misuse of these instructions."

WTF?

What is the point of a version control system if you have to take backups of it to avoid losing data when performing certain operations?

I use git, I like git, but certain aspects of it are fundamentally broken.

gemma 12 years ago

No, that advice from the article is fundamentally broken. Outside of the garbage collection system (which runs by default after what, 30 days? 90?), Git doesn't delete committed content. Any commit you "lose" through rebasing, amending, resetting, etc. can always be recovered. It's a little more complicated than renaming a directory, sure, but it's important, and it's not something a Git tutorial should ignore.

Git IS safe, and ANYTHING involving changes to history can be undone without resorting to backups. Data loss can occur when you're mucking about with uncommitted changes, but that's a risk in most other version control systems as well.

prezjordan 12 years ago

Surprised to see no one in the comments has mentioned the reflog [0]. It's really very easy.

[0]: http://jscal.es/2013/08/05/seriously-the-reflog-isnt-that-sc...

crystaln 12 years ago

I'm not 100% sure this is true; however, it is also a fundamental flaw of git. There should be a way to remove commits permanently in order to remove mistakenly checked-in large files or private content.

It's also definitely not true with uncommitted changes, including gitignored files.

crazygringo 12 years ago

I understand your puzzlement; I found this confusing too at first. But then I realized it makes sense -- one of git's strengths is that you can rewrite the history. The "point" of a version control system, at least with git, is not backup which retains all history, but rather versioning which retains the history you want to retain.

Obviously, if you choose not to edit the history, then you never need to back up in this sense, and you're free to do that. But then you can't ever go back and change things (like removing accidentally committed passwords, etc.).

But if you choose to rewrite the history, and mess up, then you'll be glad you had a backup. And (in response to other comments), even if there are ways of still retrieving/fixing data, it's often easier to just restore from your backup, especially when you're trying out git commands for the first time, and you're not entirely sure if they'll work exactly how you expect. None of us are git experts from the beginning, and I've resorted to git backups numerous times when trying out a command for the first time, and then discovering it wasn't the right way.

simcop2387 12 years ago

An easy way to do that is the way I tend to do it: create a new branch based off the one you're rewriting history in, and it will keep all of the old commits for you even after you rewrite everything. That makes it really easy to restore later with `git reset` if you need to.
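Something like the following, as a minimal sketch in a scratch repository. The `backup` branch name, the file, and the identity settings are placeholders, and `commit --amend` stands in for whatever history rewrite you actually have in mind.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"

# Bookmark the current state before touching history.
git branch backup
before=$(git rev-parse HEAD)

# Rewrite history; amending creates a brand-new commit object.
git commit -q --amend -m "second, reworded"
rewritten=$(git rev-parse HEAD)

# Changed our mind: move the current branch back to the bookmark.
git reset -q --hard backup
after=$(git rev-parse HEAD)
```

Note that `git reset --hard` also discards uncommitted changes in the working tree, so commit or stash anything you care about first.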

zimbatm 12 years ago

Actually `git reflog` contains the HEAD history. Even after a rebase it's possible to check out an old commit (unless git has garbage-collected it).
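For anyone who hasn't tried it, here is a small sketch in a scratch repository (file name and identity settings are placeholders). It "loses" a commit with a hard reset instead of a rebase, purely to keep the example short, and then gets it back through the reflog.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Lose" the second commit by moving the branch back one step.
git reset -q --hard HEAD~1

# The reflog still records every position HEAD has held.
git reflog

# HEAD@{1} is where HEAD pointed one move ago, i.e. before the reset.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
```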

_ikke_ 12 years ago

Git is quite safe, and most operations that involve doing things to history can be undone. Unsafe operations happen when the working tree and uncommitted changes are involved.

Also, sometimes it's easier for a user to roll back to an older backup than to untangle the mess they have created.

Third, git itself is not a backup. When your repository gets corrupted, you're out of luck if you don't have backups for those files. So it's still good to take backups of your repositories.

mikeash 12 years ago

First, your use of the word "most" is inherently incompatible with the phrase "quite safe".

Second, why would a version control system make it so difficult to roll back to an old version that it's easier to restore from backup? This is insane.

Third, I'm well aware of this, and of course you should be making backups of your git repositories (and everything else). But those backups should be there to protect against hardware failure and other external data-loss events, not protect against git itself.

nknighthb 12 years ago

1) Why are you taking a be-careful-don't-blame-me passage from a random article written by some guy as gospel?

2) All version control systems are vulnerable to data loss if you mess around with them in unusual ways. Would you say svn was fundamentally broken if somebody told you to take a backup before you screwed around with the repo?

mikeash 12 years ago

1) It's the sort of thing I've heard many times from many people over the years.

2) The difference is that svn does not build this functionality into the main command line tool, and there is no culture of doing terrible things with svnadmin to edit svn repositories the way there is of doing terrible things with git to rewrite git history.

perlgeek 12 years ago

One piece of data that is easy to lose, with any version control system I've worked with so far, is uncommitted data. And that's also the only data I've lost with git so far, after using it for several years. (And yes, it was my own stupidity, saying 'git checkout .' and only noticing later that there was something I wanted to keep.)

The advice to take a backup doesn't hurt, and might be helpful if restoring the original state with git operations is more effort than restoring it from the backup.

oneandoneis2 12 years ago

You don't need to make backups - git won't lose your data. I lost all hope that the article might be worth reading at that line.

mateuszf 12 years ago

Yep, that's true. Restoring data is as simple as checking the latest changes using "git reflog".

mtdewcmu 12 years ago

I agree that git is both a great advance and seems fundamentally broken at the same time. One of git's advances is that it treats commits as snapshots of the entire tree rather than diffs[1]. A snapshot might as well be a tarball of the whole directory, except that git uses references to previous snapshots to store it efficiently. So in this aspect, git is like a backup tool plus compression. It's not quite a useful tool just for making compressed backups of source code, though, because data is buried in opaque internal files in the .git directory and can't be untangled from the commit history. You can't get at your data without going through git's tools, which means you might need to make your own backups in case git goes insane, and you can't use the backup functionality without creating indelible history.

I'm thinking that the repository could be moved out of the working directory and placed in its own file that's not invisible. If the repo were reified into a visible file, then repos would be portable and you could ftp them. The backup functionality could be separated from the history-tracking functionality, so you could make backups freely without adding noise to the commit history. A backup would basically be a tarball that you could append to a repo file, taking advantage of previous entries for compression. Commits, however they were implemented, could reference snapshots, but they needn't be 1:1.

[1] http://git-scm.com/book/ch1-3.html

eru 12 years ago

> I'm thinking that the repository could be moved out of the working directory and placed in its own file that's not invisible.

Symlinks are your friends.

> then repos would be portable and you could ftp them.

tar might come in handy.

> The backup functionality could be separated from the history-tracking functionality, so you could make backups freely without adding noise to the commit history. A backup would basically be a tarball that you could append to a repo file, taking advantage of previous entries for compression.

You can already do this. You can have commits without ancestors or descendants in your repository, and they will still benefit from delta compression.

mcv 12 years ago

Backups are of course always a good idea, but you don't need them specifically to work with git. Git is its own backup system. If you think you might do something potentially harmful, do it in a new branch. If something goes wrong, you can always throw it away.

If something has already gone wrong, and you didn't do it in a separate branch, you can still go back to a previous situation.

Rewriting history in any serious sense (beyond a local reset or rebase for stuff that hasn't been pushed to anyone else yet) is always a bad idea. History is history for a good reason.

Of course any existing commit can always be reverted; that's not rewriting history. A revert is simply a new commit.
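To make the distinction concrete, a small sketch in a scratch repository (file name and identity settings are placeholders): after `git revert`, the original commit is still in the history and a third commit has been added on top.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"
bad=$(git rev-parse HEAD)

# Undo the effect of the last commit by adding a new commit on top.
git revert --no-edit HEAD >/dev/null

# History now has three commits; the reverted one is still among them.
count=$(git rev-list --count HEAD)
git merge-base --is-ancestor "$bad" HEAD
```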

Estragon 12 years ago

It makes it fast and easy to back out if you screw anything up. Even if the data is still there, it can be complex to pull it back out and configure it the way it was when you started (as the commands in this tutorial demonstrate.) So a fast, easy snapshot before executing complex commands is a smart move.

zimbatm 12 years ago

git is safe but you have to know all the fancy commands like `git reflog`. I remember being puzzled by a merge conflict when I started learning git. I didn't know what it was and `git reset` or `git revert` weren't doing what I expected. All I wanted was to go back to the previous state. In the end it was easier to clone the repo and start over again.

mtdewcmu 12 years ago

You probably wanted `git merge --abort`. It's not very clear what the various states are that git can be in. There seems to be a 'fixing merge conflict' state, and it's hard to find documentation that warns you about this state and what your options are once you're in it.
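A minimal sketch of that state and the way out of it, using a scratch repository (the `feature` branch, the file, and the identity settings are invented for the example). The presence of `.git/MERGE_HEAD` is one way to tell that a merge is in progress.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# Change the same line on two branches to force a conflict.
git checkout -q -b feature
echo feature > file.txt
git commit -q -am "feature change"

git checkout -q "$base_branch"
echo mainline > file.txt
git commit -q -am "mainline change"
before=$(git rev-parse HEAD)

# The merge stops with a conflict; "|| true" keeps set -e from exiting.
git merge feature >/dev/null 2>&1 || true

# While the merge is unresolved, git keeps MERGE_HEAD around.
test -f .git/MERGE_HEAD

# Go back to the state from before the merge was started.
git merge --abort
after=$(git rev-parse HEAD)
```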

rebelidealist 12 years ago

Sigh. It seems to me that Git is unnecessarily complicated. I wonder what would have happened if GitHub had started with Hg.

rspeer 12 years ago

Thanks, this is a useful reference.

I am sad about some of these other comments, which I might paraphrase as "This doesn't help me, and it might help people who are less skilled than me who don't deserve to be helped, therefore it's worthless". It's apparently a common sentiment on this site, but it shouldn't be.

caipre 12 years ago

Usability note: after a few clicks through this (so my path had a few entries) I instinctively clicked up a few levels in the path expecting to be taken to that point. Instead, that entry was appended as another child.

crystaln 12 years ago

The inability to remove mistakenly checked-in large files and private data in any remotely easy way has always seemed like a major flaw in git.

pyre 12 years ago

Well, the solution to other systems seems to be "it's checked in, therefore it can never be un-checked-in, so deal with it!" (or at least this is the attitude of some vocal proponents of them).

ams6110 12 years ago

The flaw is in having this "private" data in a public repo to begin with. If your data are private, don't put your project on github.

crystaln 12 years ago

While I'm certain you and your organization have a perfect record of never checking inappropriate things into your git repository, mine does not. Even if all the employees at your company were perfect, there is still a chance of inappropriate information getting into the repository.

mcv 12 years ago

Rule number one: if you're not sure what you're doing, do it in a new branch. If things go wrong, you can always delete that branch.

And you can always make a branch out of a previous situation. Gitk/gitx make this particularly easy.

elwell 12 years ago

Sentence 2 has a typo: "or" -> "of".