More on Version Control

102 points by velmu 99 days ago | 41 comments

tcoff91 98 days ago |

So many tools are tightly coupled to git or just assume that everybody uses git. I think that it's hard for any new VCS to gain traction without good git compatibility.

For instance Pijul might very well be a lot better than git / jj. I wouldn't know, I haven't bothered trying it because all the projects I need to work in use git. But since jj has great git compatibility, I actually have been able to adopt it because of its git backend.

A new VCS that doesn't have git compatibility at its core is going to have a really hard time overcoming network effects.

collabs 98 days ago | |

You're talking about a new VCS but I don't know how even git with sha-256 will gain any traction because I don't see how you can support both...

chriswarbo 98 days ago | | |

Tooling can support both (e.g. don't assume all hashes have the length of a SHA1, etc.); but they can't be used together in one repo.
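To make the "don't assume all hashes have the length of a SHA1" point concrete, here is a minimal sketch of what length-agnostic tooling could look like. It relies only on the fact that full SHA-1 object IDs are 40 hex characters and full SHA-256 ones are 64; the `object_format` helper is a hypothetical name, not part of git, and abbreviated IDs are deliberately not handled.

```python
import re

# Hex digest lengths of the two object formats: SHA-1 and SHA-256.
HEX_LENGTHS = {40: "sha1", 64: "sha256"}
_HEX = re.compile(r"[0-9a-f]+")


def object_format(oid: str) -> str:
    """Guess the hash algorithm of a full object ID from its length.

    Hypothetical helper for illustration; abbreviated IDs are rejected.
    """
    if not _HEX.fullmatch(oid):
        raise ValueError(f"not a lowercase hex object ID: {oid!r}")
    try:
        return HEX_LENGTHS[len(oid)]
    except KeyError:
        raise ValueError(f"unexpected object ID length {len(oid)}") from None
```

A tool written this way accepts IDs from either kind of repo, but it still has to treat each repo as using exactly one format.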

esafak 98 days ago | |

The great thing about jj is that it is backend agnostic; it is the best gateway to newer version control systems.

e40 98 days ago | | |

Does jj write to the .git directory or use git’s libraries (I assume they exist)?

kelnos 98 days ago |

> The unsafe versions of these things literally throw out history and replace it with a fiction that whoever did the final operation wrote everything, or that the original author wrote something possibly very divergent from what they actually wrote.

When I'm rebasing my own work and editing history, that's exactly what I'm looking to accomplish, though.

rmunn 98 days ago | |

The team I work for tends to use GitHub's "Squash and merge" button a lot. I find it to be the best of both worlds: the `develop` branch gets a single commit per PR, with a summary of what was done (I always edit the summary commit message and reduce it from a copy of 20 commit messages down to just the most important 4-5, deleting the entirety of messages like "fixup" or "address review comments"). But the PR on GitHub also preserves the history of the commits, so anyone who wants to look at the messy commit history can follow the PR link and see the actual commits.

I'm sure there are other Git forges that would support a similar workflow, with a "Squash and Merge" button or equivalent, but my team hasn't felt any need to migrate away from GitHub so I've never yet investigated that in detail.

Only downside I've found to this workflow is that it would make it harder to migrate to a different Git forge in the future: unless you're very careful with the migration, the PR numbers are likely to be different (perhaps resetting at 1, even) and the other forge won't end up with the commits that are on GitHub's copy of the repo but no longer on any active branch (we also use the "auto-delete branches when you hit the merge button" option). But it would still be possible for a migration tool to handle this correctly: look at all PRs on GitHub, grab the commits from them, and migrate them to Merge Requests on the new forge.

anonymars 98 days ago | | |

It boggles my mind that instead of this being a UI projection, git instead ingrains a process where developers habitually destroy their history (and bisection options, and merge conflict resolution), therein loading an additional footgun that goes off every now and again when it turns out a now-squashed branch was the basis of (or merged into) some other branch

sfink 98 days ago |

I've read much of the HN discussion on the previous post, and skimmed the rest, but I didn't see a couple of things addressed:

First, how could you make this deal with copies and renames? It seems to me like the pure version of this would require a weave of your whole repository.

Second, how different is this from something like jujutsu? As in, of course it's different, your primary data structure is a weave. But jj keeps all of the old commits around for any change (and prevents git from garbage collecting them by maintaining refs from the op log). So in theory, you could replay the entire history of a file at a particular commit by tracing back through the evolog. That, plus the exact diff algorithm, seems like enough to recreate the weave at any point in time (well, at any commit), and so you could think of this whole thing as a caching layer on top of what jj already provides. I'm not saying you would want to implement it like that, but conceptually I don't see the difference and so there might be useful "in between" implementations to consider.
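A rough sketch of that replay idea, under heavy assumptions: the history is linear, each step is a full-file snapshot, and Python's `difflib` stands in for "the exact diff algorithm". This is not how jj or Cohen's system is implemented, it ignores merges entirely, and `build_weave` and `extract` are hypothetical names.

```python
from difflib import SequenceMatcher


def build_weave(snapshots):
    """Fold a linear list of snapshots (lists of lines) into one weave.

    Each weave entry is [text, versions], where versions is the set of
    snapshot indices in which that line is present.
    """
    weave = []
    for version, lines in enumerate(snapshots):
        # Weave positions of the lines that were present in the parent.
        live = [i for i, (_, seen) in enumerate(weave) if version - 1 in seen]
        parent = [weave[i][0] for i in live]
        matcher = SequenceMatcher(a=parent, b=lines, autojunk=False)
        inserts = []  # (weave position, new entries)
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == "equal":
                # Surviving lines are also present in this version.
                for i in live[a1:a2]:
                    weave[i][1].add(version)
            elif tag in ("insert", "replace"):
                position = live[a1] if a1 < len(live) else len(weave)
                new_entries = [[text, {version}] for text in lines[b1:b2]]
                inserts.append((position, new_entries))
            # "delete": the old lines simply never gain this version.
        # Apply from the back so earlier positions stay valid.
        for position, entries in reversed(inserts):
            weave[position:position] = entries
    return weave


def extract(weave, version):
    """Recover one snapshot by keeping the lines present in that version."""
    return [text for text, seen in weave if version in seen]
```

Rerunning `build_weave` with a different diff would give a different weave for the same snapshots, which is why the choice of diff has to be pinned down.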

In fact, you could even specify different diff algorithms for different commits if you really wanted to. Which would be a bit of a mess, because you'd have to store that and a weave would be a function of those diff algorithms and when they were used, but it would at least be possible. (Cohen's system could do this too, at the cost of tracking lots of stuff that currently it doesn't need or want to track.) I'm skeptical that this would be useful except in a very limited sense (eg you could switch diff algorithms and have all new commits use the new one, without needing to rebuild your entire repository). It breaks distributed scenarios -- everyone has to agree on which diff to use for each commit. It's just something that falls out of having the complete history available.

I'm cheating with jj a bit here, since normally you won't be pushing the evolog to remotes so in practice you probably don't have the complete history. In practice, when pushing to a remote you might want to materialize a weave or a weave-like "compiled history" and push that too/instead, just like in Cohen's model, if you really wanted to do this. And that would come with limitations on the diff used for history-less usage, since the weave has to assume a specific deterministic diff.

toomim 98 days ago | |

You can generically represent copies and renames (aka "moves") in a CRDT, OT or CTM using a Portal: https://braid.org/meeting-62/portals

rs545837 98 days ago |

Usually the whole discussion has been around line-level vs commit-level history, but there's a layer nobody's talking about, and I have been exploring it lately with https://github.com/Ataraxy-Labs/sem. It gives you entity-level version control. It parses your code into functions, classes, and methods using tree-sitter (12 languages so far), computes a structural hash for each entity, and builds a cross-file dependency graph. So `sem diff HEAD~1` doesn't give you "+3 -2 in tax.py", it gives you "calculate_tax signature changed, 47 dependents, 3 callers will break". The key insight is distinguishing signature changes from body changes.
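For a feel of the signature-vs-body distinction, here is a toy sketch. It is not sem's implementation: it swaps tree-sitter for Python's built-in `ast` module, handles Python functions only, keys entities by bare name (so same-named methods collide), and leaves out the dependency graph. `entity_hashes` and `classify` are hypothetical names.

```python
import ast
import hashlib


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def entity_hashes(source: str) -> dict:
    """Map each function name to (signature hash, body hash).

    ast.dump omits line numbers by default, and comments never reach
    the AST, so formatting-only edits leave both hashes unchanged.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            returns = ast.dump(node.returns) if node.returns else ""
            signature = ast.dump(node.args) + returns
            body = "".join(ast.dump(stmt) for stmt in node.body)
            result[node.name] = (_digest(signature), _digest(body))
    return result


def classify(old: str, new: str) -> dict:
    """Label each changed function; unchanged functions are omitted."""
    before, after = entity_hashes(old), entity_hashes(new)
    changes = {}
    for name in sorted(before.keys() | after.keys()):
        if name not in before:
            changes[name] = "added"
        elif name not in after:
            changes[name] = "removed"
        elif before[name][0] != after[name][0]:
            changes[name] = "signature changed"
        elif before[name][1] != after[name][1]:
            changes[name] = "body changed"
    return changes
```

Only the "signature changed" bucket would need a walk over callers; a "body changed" entry can leave them alone.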

kleiba 98 days ago |

> Git is very simple, reliable, and versatile, but it isn’t very functional.

Funny, I would probably swap the first and the last adjective in that sentence.

AndrewDucker 98 days ago | |

Git, the internals, are simple, but not very functional. Git, the porcelain, isn't simple, but is quite functional.

toomim 98 days ago | | |

The git porcelain is functional, like a toilet.

josephg 99 days ago |

> Oddly they don’t seem to have figured out the generation counting trick, which is something I did come up with over twenty years ago. Combining the two ideas is what allows for there to be no reference to commit ids in the history and have the entire algorithm be structural.

Can you say more about this? What exactly is this trick you’re talking about? What are the benefits?

ajb 98 days ago | |

Not the OP, but probably this: https://tonyg.github.io/revctrl.org/GenerationCounting.html

(That seems to be an archive of the old revctrl.org pages from a while back; most likely Bram Cohen has a blog somewhere explaining it in his own words - probably about 2003, at a guess)

sfink 98 days ago | |

https://github.com/bramcohen/manyana?tab=readme-ov-file#why-...

But someone may need to explain it to me.

SAI_Peregrinus 99 days ago |

Discussion on the previous post in this series: https://news.ycombinator.com/item?id=47478401

mrtesthah 99 days ago |

Someone make a TLA+ model for this bad boy

Surac 98 days ago |

For my part, I have not migrated to Git because I do not need the extra hoops like the staging area or syncing whole repos over the network. I always have a server at my side and can work with checkout/checkin. This implies a hard requirement on interface definitions, because people can't just alter them. Seeing people struggle with all the problems introduced by the Git way of working, I feel there is still a big market for not-Git. People are introduced to Git and stop asking whether there is something better suited to their workflow. Excuse my English please, I'm a non-native speaker.

ramon156 98 days ago | |

At least jj tries to fix this partly, and I feel like it's good enough for what I do. Most of my projects nowadays are solo, so I can keep it simple.

nolist_policy 98 days ago | |

Git is a decentralized version control system. Creating a new project is just a `git init` away.

Of course centralized VCS are less popular. You need to set up a server first, then wrangle with the server every time you create a new project -> fewer projects -> fewer users.