Full checkouts of large data repositories are problematic. In the video I present a workflow that does not require a full checkout of the datasets and still allows committing diff-based changes to Git. This applies naturally to new data, and it can also be applied to edits or deletions provided the old data is known. That requirement ensures the diffs are bidirectional, so Git history can be navigated both forward and backward, which is useful when caching snapshots. Feedback is most welcome!
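
To illustrate why the old data has to be known for edits and deletions, here is a minimal Python sketch of a bidirectional diff. It is not the toolset shown in the video: the `Change` record and the `apply_forward` / `apply_backward` helpers are hypothetical names, and a snapshot is assumed to be a plain dictionary mapping record keys to values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One record-level change (hypothetical format); None means 'record absent'."""
    key: str
    old: Optional[str]  # value before the change (None for newly added data)
    new: Optional[str]  # value after the change (None for a deletion)


def _set(snapshot: Dict[str, str], key: str, value: Optional[str]) -> None:
    # Write a value, or remove the key when the value is None.
    if value is None:
        snapshot.pop(key, None)
    else:
        snapshot[key] = value


def apply_forward(snapshot: Dict[str, str], diff: List[Change]) -> Dict[str, str]:
    """Return a new snapshot with the diff applied (older -> newer)."""
    result = dict(snapshot)
    for change in diff:
        if result.get(change.key) != change.old:
            raise ValueError(f"old value mismatch for {change.key!r}")
        _set(result, change.key, change.new)
    return result


def apply_backward(snapshot: Dict[str, str], diff: List[Change]) -> Dict[str, str]:
    """Return a new snapshot with the diff undone (newer -> older)."""
    result = dict(snapshot)
    for change in reversed(diff):
        if result.get(change.key) != change.new:
            raise ValueError(f"new value mismatch for {change.key!r}")
        # Undoing is only possible because the old value was recorded.
        _set(result, change.key, change.old)
    return result


snapshot = {"a": "1", "b": "2"}
diff = [
    Change("c", None, "3"),   # new data: no old value needed
    Change("a", "1", "10"),   # edit: old value must be known
    Change("b", "2", None),   # deletion: old value must be known
]
after = apply_forward(snapshot, diff)
restored = apply_backward(after, diff)
```

Because each change stores both sides, the same diff can move a cached snapshot in either direction. An addition needs no prior knowledge, while an edit or deletion recorded without its old value could only be replayed forward.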
On open source: should this toolset be open-sourced? If so, which license should I choose?