BorgBackup 2.0 supports Rclone – over 70 cloud providers in addition to SSH (borgbackup.readthedocs.io)
To use modern block-based backup programs for large databases and VM images (similar situation), you must use a very small block size for dedup to work well. For VM images, that's 4K. For databases, it's the page size, which is 4K for SQLite and 16K for InnoDB by default.
With very small block sizes, most block-based backup programs kind of fall over, and start downloading lots of data on each backup for the block index, use a lot of RAM for the index, or both. So it's important that you test programs with small block sizes if you expect high dedup across backups. Some backup programs allow you to set the block size on a per-file basis (HashBackup does), while others set it at the backup repo level.
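A back-of-the-envelope estimate shows why small blocks strain the index. The sketch below assumes one index entry per block and roughly 48 bytes per entry (a 32-byte hash plus some bookkeeping); that per-entry figure is a guess for illustration, not a measurement of any particular backup program.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3

def index_size_bytes(data_bytes, block_bytes, entry_bytes=48):
    """Estimate index size assuming one entry per block (ceiling division).

    entry_bytes=48 is an assumed figure, not taken from any real tool.
    """
    blocks = -(-data_bytes // block_bytes)
    return blocks * entry_bytes

for label, block in [("4 KiB", 4 * 1024), ("16 KiB", 16 * 1024), ("1 MiB", 1024 ** 2)]:
    size = index_size_bytes(1 * TIB, block)
    print(f"{label:>6} blocks: {size / GIB:.2f} GiB of index per TiB of data")
```

Under that assumption, 4 KiB blocks cost about 12 GiB of index per TiB of data, while 1 MiB blocks cost well under 0.1 GiB, which is why a program tuned for large blocks can struggle at page-sized ones.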
To back up a database, there are generally a few options:
1. Create an SQL text dump of the database and back that up. For this to dedup well, variable-sized blocks must be used, and the smaller the block size, the higher the dedup ratio.
2. Back up the database while running with a fixed block size equal to the db page size. You could lock the database and do the backup, but it's better to do two backup runs, the first with no locking, and the second with a read lock. The first backup cannot be restored because it would be inconsistent if any changes occur to the database during the backup. But it does not lock out any database users during the backup. The second backup will be much faster because the bulk of the database blocks have already been saved and only the changed blocks have to be re-saved. Since the second backup occurs with a read lock held, the second backup will be a consistent snapshot of the database.
3. The third way is to get the database write logs involved, which is more complex.
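Option 2 can be illustrated with a toy model. This is a minimal sketch, assuming a 4K page size and an in-memory dict standing in for the repository; the function names are made up for the example, and the read lock itself is not modeled.

```python
import hashlib

PAGE = 4096  # assumed page size (SQLite's default)

def backup(data, store, block_size=PAGE):
    """Split data into fixed-size blocks and save only blocks not yet in store.

    Returns (manifest of block hashes, number of newly stored blocks).
    """
    manifest, new_blocks = [], 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in store:
            store[digest] = block
            new_blocks += 1
        manifest.append(digest)
    return manifest, new_blocks

def restore(manifest, store):
    """Rebuild the data from a manifest of block hashes."""
    return b"".join(store[d] for d in manifest)

# A toy "database" of 1000 distinct pages.
db = bytearray(b"".join(i.to_bytes(4, "big") * (PAGE // 4) for i in range(1000)))
store = {}

# Pass 1: no lock held. Possibly inconsistent, but it seeds the store.
_, first_new = backup(bytes(db), store)

# Three pages change before the second pass starts.
for page in (10, 500, 999):
    db[page * PAGE] ^= 0xFF

# Pass 2: would run under a read lock; only the changed pages are new.
manifest, second_new = backup(bytes(db), store)
print(f"pass 1 stored {first_new} blocks, pass 2 stored {second_new}")
```

Because the block size matches the page size, a page that changes in place dirties exactly one block, so the locked second pass has very little to write.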
It is available as 2.0.0beta11, but not suitable for production yet.
"Beta" also means that there won't be repository migration code from beta N to N+1.
I've been using Duplicacy for a long while, and I've been pretty happy with it. But I'd love to switch to a full open-source solution (Duplicacy is proprietary with sources publicly available).
The main problem is that there currently aren't active Windows developers within the borgbackup project who continuously test and improve the Windows-specific code parts.
We recently got some working CI on Windows again, so at least that was fixed.
I would love it if there were some kind of "Linux Desktop co-op", with a couple of staff. Users pay membership dues, vote on apps/features, and some devs get paid to develop it, in addition to "resume fame" that can translate over to a higher paying gig. But something tells me the Linux Desktop is so small and nerd-focused that we'd just end up funding more RSS readers and chat clients.
rclone is mostly the work of one guy. You can donate to him if you'd like. Making a GUI for a complex, rapidly evolving CLI is not an easy thing to do. There's probably a hundred different attempts to make a good interface for ffmpeg, but you can't please everyone.
By contrast, Borg, Restic, Kopia (anything else?) use object storage, aka binary blobs, like S3 or R2 or OneDrive. They store both entire copies and small diffs on top of them, much like video codecs, or like git. You can look at the filesystem you've backed up as it was at a particular moment, and you may have a history of many such moments, say, daily snapshots for a month, stored economically, not as 30 full copies. And it is all encrypted on top. If your source FS supports snapshots (ZFS, XFS on LVM, BTRFS), your backups can be entirely consistent views of your filesystem, or of its relevant subtrees.
Prometheus alerts check that the latest backup is at most two hours old and that the filesystem is not reporting errors. This setup has been running for more than a year now and gives great peace of mind.
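The freshness rule boils down to one comparison. Here is a minimal sketch of it; the function name and the way the timestamp is obtained are hypothetical, and in a Prometheus setup the same check would normally live in an alert rule rather than in Python.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=2)  # threshold taken from the comment above

def backup_is_stale(last_backup, now, max_age=MAX_AGE):
    """Return True when the newest backup is older than max_age."""
    return now - last_backup > max_age

now = datetime(2024, 8, 1, 12, 0, tzinfo=timezone.utc)
print(backup_is_stale(now - timedelta(minutes=90), now))  # a recent backup
print(backup_is_stale(now - timedelta(hours=3), now))     # an overdue backup
```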
Mine is simpler: Syncthing with staggered versioning for important data, periodic Restic backups of the home directory (excluding caches), keeping several recent backups and a couple of older backups.
I've restored from these backups 4 times, both due to crashes and when moving to a new machine, without any adventures in the process.
It's not very helpful if you e.g. have a 1 GB file that gets 100 kB appended every day; Syncthing would store a new full-size copy in each version (immediately usable), while Borg / Restic / Kopia would only store the deltas (and would require slow mounting to access a particular version).
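The difference is easy to put numbers on. This sketch assumes every daily version is retained for 30 days and ignores compression, metadata, and chunk-boundary overhead, so treat the results as rough orders of magnitude rather than what either tool would report.

```python
GB = 1000 ** 3
KB = 1000

def full_copy_storage(base, daily_append, days):
    """Every retained version is a complete copy of the file as it was that day."""
    return sum(base + daily_append * d for d in range(days))

def delta_storage(base, daily_append, days):
    """First version stored in full; each later version adds only the appended data."""
    return base + daily_append * (days - 1)

days = 30
full = full_copy_storage(1 * GB, 100 * KB, days)
delta = delta_storage(1 * GB, 100 * KB, days)
print(f"full copies: {full / GB:.2f} GB, deltas: {delta / GB:.4f} GB")
```

With these assumptions the full-copy approach needs about 30 GB, while the delta approach stays close to 1 GB.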
Different tools for different jobs.
Then everything gets backed up to my local server which then syncs out to remote storage. It's great.
Can't wait for Borg 2 to hit stable. The transfer command solves so many problems
Not considering all the aspects of a BDR process is what leads to this problem. Not the tool.
Personally I automate restore testing with cron. I have a script that picks two random files from the filesystem: an old one (which should be in long term storage) and a new one (should be in the most recent backup run, more or less), and tries restoring them both and comparing md5sums to the live file. I like this for two reasons: 1. it's easy to alert when a cronjob fails, and 2. I always have a handy working snippet for restoring from backups when I inevitably forget how to use the tooling.
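A restore test along those lines might look like the sketch below. It is not the commenter's script: it uses SHA-256 instead of md5sum, splits files into an older and a newer half by mtime, and takes the actual restore step as a hypothetical `restore_file` callable, since that part depends on the backup tool.

```python
import hashlib
import os
import random
import shutil
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file, read in 1 MiB pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(1 << 20), b""):
            h.update(piece)
    return h.hexdigest()

def pick_old_and_new(root, rng):
    """Pick one random file from the older half (by mtime) and one from the newer half."""
    files = sorted((p for p in Path(root).rglob("*") if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    if len(files) < 2:
        raise ValueError("need at least two files to test")
    mid = len(files) // 2
    return rng.choice(files[:mid]), rng.choice(files[mid:])

def verify_restore(root, restore_file, rng=None):
    """restore_file(path) is a hypothetical callable returning the restored copy's path."""
    rng = rng or random.Random()
    return all(file_digest(p) == file_digest(restore_file(p))
               for p in pick_old_and_new(root, rng))

# Demo with stand-in restore functions instead of a real backup tool.
with tempfile.TemporaryDirectory() as tmp:
    live, restored = Path(tmp, "live"), Path(tmp, "restored")
    live.mkdir()
    restored.mkdir()
    for i in range(6):
        f = live / f"file{i}.txt"
        f.write_text(f"contents {i}")
        os.utime(f, (i * 86400, i * 86400))  # spread mtimes one day apart

    def good_restore(path):
        return Path(shutil.copy(path, restored / path.name))

    def broken_restore(path):
        out = restored / path.name
        out.write_text("corrupted")
        return out

    ok = verify_restore(live, good_restore, random.Random(0))
    bad = verify_restore(live, broken_restore, random.Random(0))

print(ok, bad)
```

Run from cron, a wrapper would exit non-zero when `verify_restore` returns False, so the usual cron failure alerting picks it up.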
IMO alerting is the trickiest part of the whole setup. I've never really gotten that down on my own.
Please tell me you verify your backups now and then?
[1]: https://borgbackup.readthedocs.io/en/stable/usage/check.html
Restoring one file from the backup works, but what if something else is corrupted?
Restoring the system from the image works, but what if some directory is not in the backup and you don't see that while testing?
Then one can't call it "set and forget", right?
Maybe try SeedVault?
Does anybody have a recommendation?
I briefly looked at restic and duplicati, but surprisingly none are as simple to use as I'd expect a dedicated backup tool to be (I don't need, and kinda don't want, a GUI; I'd like all configuration to be stored in a single config file I can just back up to a different location like everything else, and re-create on any new machine). More than that, I've read some scary stories about these tools fucking up their indexes so that data turns out to be non-restorable, which sounds insane, since this is something you must be absolutely sure your backup tool would never do no matter what, because what's even the point of making backups then.
However I found that the backends weren't well abstracted enough in v1 to make that easy.
However, for v2 Thomas Waldmann has made a nice abstracted interface, and the rclone code ended up being only <300 lines of Python, which only took an afternoon or two to make.
https://github.com/borgbackup/borgstore/blob/master/src/borg...
Borg working with object storage was not supported, though some people did use it that way. From my understanding, most would duplicate a repo and upload it instead of having borg directly write to or manipulate it. This could be problematic if the original repo was corrupt, as now the corruption would be duplicated. So this will make things much easier and allow for a more streamlined workflow. Having the tool support rclone instead of specific services seems like a wise and more future-proof choice to me.
Borg 2.0 beta (deduplicating backup program with compression and encryption) - https://news.ycombinator.com/item?id=40990425 - July 2024 (1 comment)
Borgctl – borgbackup without bash scripts - https://news.ycombinator.com/item?id=39289656 - Feb 2024 (1 comment)
BorgBackup: Deduplicating archiver with compression and encryption - https://news.ycombinator.com/item?id=34152369 - Dec 2022 (177 comments)
Emborg – Front-End to Borg Backup - https://news.ycombinator.com/item?id=30035308 - Jan 2022 (2 comments)
Deduplicating Archiver with Compression and Encryption - https://news.ycombinator.com/item?id=27939412 - July 2021 (71 comments)
BorgBackup: Deduplicating Archiver - https://news.ycombinator.com/item?id=21642364 - Nov 2019 (103 comments)
Borg – Deduplicated backup with compression and authenticated encryption - https://news.ycombinator.com/item?id=13149759 - Dec 2016 (1 comment)
BorgBackup (short: Borg) is a deduplicating backup program - https://news.ycombinator.com/item?id=11192209 - Feb 2016 (1 comment)
I've previously used Borg, but the inability to use anything other than local files or ssh as a backend became a problem for me. I switched to Restic around the time it gained compression support. So for my use-case of backing up various servers to an S3-compatible storage provider, Restic and Borg now seem to be equivalent.
Obviously I don't want to fix what isn't broken, but I'd also like to know what I'm missing out on by using Restic instead of Borg.
- unreleased code that is still in heavy development (borg2, especially the new repository code inside borg2).
- released code (restic) that has had practically proven "cloud support" for quite a while.
borg2 is using rclone for the cloud backend, so that part is at least quite proven, but the layers above that in borg2 are all quite fresh and not much optimized / debugged yet.
I've been using it with restic + rclone successfully for years. It's not very fast, but works.
> It is possible to get interactive SSH access, but this access is limited. It is not possible to have interactive access via port 22, but it is possible via port 23. There is no full shell. For example, it is not possible to use pipes or redirects. It is also not possible to execute uploaded scripts.
https://docs.hetzner.com/storage/storage-box/access/access-s...
https://docs.hetzner.com/storage/storage-box/access/access-s...
I currently use rsync to back up a set of directories on a drive to another drive and a remote service (rsync.net). It's been working great, but I'm not sure if my use-case is just simple enough where this is a good solution, or if I'm missing a big benefit of Borg. I do envy Borg's encryption, but the complexity of a new tool tied with the paranoia of me maybe screwing up all my data has had me on edge a bit to make the leap. I don't have a ton of data to back up, say about 5TB at the moment.
You could probably use rsync's hard linking to save space on the mail backup but I'm not sure you'd get it as small without faffing about.
Rsync is also very slow with lots of files, and doesn't deal with renamed files (will transfer again).
Concretely, if you inadvertently delete a file and this gets rsynced, you cannot use the backup to restore that file. With borg you can.
1. https://borgbackup.readthedocs.io/en/2.0.0b11/quickstart.htm... 2. https://rclone.org/crypt/
Borg is a different tool, for backup. It deduplicates, encrypts, snapshots, checksums, compresses, … source directories into a single repository. It doesn’t work with files, rather blocks of data. It includes commands for repository management, like searching data, pruning or merging snapshots, etc. You will then transfer or sync the repository to wherever you want, with a tool such as rsync/SSH or rclone. Rclone is now natively supported, so that you don’t need to store the repository locally and on remote, rather back up directly to remote.
Curious what you think is not right with their methods.
Should the storage provider provide support for encryption on their end? Would you not want to store the keys locally?
Overall it's a robust solution that isn't too painful to set up.
With rclone support built-in, the setup would be much easier.
https://borgbackup.readthedocs.io/en/stable/faq.html#can-i-c...
It is not "heavily" discouraged. But you have to pay extra attention to perform a clean copy of the borg repo's files, and ideally check both instances regularly for integrity. I would assume it's easy to forget validating the "cold storage" copy of your borg repo in practice.
That and being able to have multiple machines writing to a shared repository at the same time is handy. I have the kids' Windows computers both backing up to the same repo to save a bit of storage. (Now if only Kopia supported VSS on Windows without mucking around with dubious scripts.)
What I really like about Rustic is that it understands .gitignore natively so you can backup your entire workspace without dragging a lot of dependencies, compiled binaries, and other unnecessary data with you into your backups.
Note that if you use, say, the 2.11 version, you cannot upgrade to 2.12, you cannot go back to 1.X either. People like me were stuck, it turned out you have to discard the repo. Sometime later they better clarified this point:
>> Borg2 is currently in beta testing and might get major and/or breaking changes between beta releases (and there is no beta to next-beta upgrade code, so you will have to delete and re-create repos).
I have a 2.X repo. It’s working fine and backs up. I have a lot of snapshots in that repo. If someone knows how to transfer them to a 2.X version once it’s out of beta, let me know.
> Beta releases are only for testing on NEW repos - do not use for production.
rclone crypt also does encryption.
so far I think rclone has it all for me.
Think of grinding data in a big machine and removing the blocks that are redundant. Even if you keep only a single copy of every file, you can still get a significant space reduction.
rclone on its own is a syncing solution, not a backup.
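One way to find redundant blocks even after data shifts position is content-defined chunking, where block boundaries are chosen from the content itself. The sketch below uses a toy gear-style rolling hash with an assumed average chunk size of about 1 KiB; it is for illustration only and is not the chunker that Borg, Restic, or Kopia actually use.

```python
import hashlib
import random

_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # fixed random value per byte
MASK = (1 << 10) - 1  # cut when the low 10 bits are zero: ~1 KiB average chunk

def chunks(data):
    """Yield variable-sized chunks whose boundaries depend only on nearby bytes."""
    h, start = 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the low bits, so the cut decision
        # depends only on the last 10 bytes seen.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def cdc_digests(data):
    return {hashlib.sha256(c).digest() for c in chunks(data)}

def fixed_digests(data, size=1024):
    return {hashlib.sha256(data[i:i + size]).digest() for i in range(0, len(data), size)}

data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(200_000))
edited = b"X" + original  # insert a single byte at the front

a, b = cdc_digests(original), cdc_digests(edited)
fa, fb = fixed_digests(original), fixed_digests(edited)
print(f"content-defined: {len(a & b)} of {len(a)} chunks reused")
print(f"fixed-size:      {len(fa & fb)} of {len(fa)} blocks reused")
```

A one-byte insert shifts every fixed-size block, so nothing is reused, whereas the content-defined boundaries line up again right after the edit. A production chunker would also enforce minimum and maximum chunk sizes, which this sketch omits.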
I was using it for years on an external drive, but then I got a NAS, and did not want to fuss with community packages to get Borg working.
Kopia works fine, aside from the confusing GUI setup process, but it seems to be the least popular up and coming option.
Now it seems that this can directly target SFTP? I wonder what that means for the future of Kopia.
My reason to go with kopia was that previously you were not able to back up multiple hosts into the same repo without great inefficiencies. I'm not sure if they have resolved that yet. Another was its native S3 support, which I use with ceph.
A perhaps more superficial personal reason is that at least Go is a statically typed language, even if its type system isn't that great.
This is the summary of the major changes (more details are in the change log):
Does borg have the ability to split chunks over multiple repositories of varying sizes? For example, I might have just 15GB of Google Drive storage, whereas on others I might have 100GB available.
Having both together makes it easier to support this kind of use case.
Pcloud lifetime + Restic, all in one repo, to benefit from dedup.
You might want to look into kopia. It accomplishes the same task as restic, but handles configs in a way you might find more appealing. Further reading: https://news.ycombinator.com/item?id=34154052
Don't even bother with duplicati. I've tried to make it work so many times, but it's just a buggy mess that always fails. It's a shame too, because I really like the interface.
Useful backup tool comparison: https://github.com/deajan/backup-bench
I've been using it for quite a while now both for my personal projects and paid work and have had a good experience with it.
Borg 2 is still beta and Kopia is also there. But it's newer so I am testing it on another redundant backup on the same machine. I have space so why not?
Every once in a while I run an integrity check (with data) so I can trust that metadata and data are fine.
I have a "config file", which is really just a shell script to set up the environment (repository location, etc), run the desired backups, and execute the appropriate prune command to implement my desired retention schedule.
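The same idea (one file holding the repository location, what to back up, and the retention schedule) could be sketched in Python. This is an assumption-heavy illustration, not the commenter's script: it assumes restic is the tool, and the repository path, password file, paths, and retention counts are all made-up values. It only prints the commands unless `dry_run` is turned off.

```python
import os
import subprocess

# All values below are hypothetical placeholders.
CONFIG = {
    "repository": "/mnt/backup/restic-repo",
    "password_file": "/root/.restic-password",
    "paths": ["/home", "/etc"],
    "excludes": ["/home/*/.cache"],
    "keep": {"daily": 7, "weekly": 4, "monthly": 12},
}

def build_commands(cfg):
    """Build the restic backup and forget/prune command lines from the config."""
    backup = ["restic", "backup", *cfg["paths"]]
    for pattern in cfg["excludes"]:
        backup += ["--exclude", pattern]
    forget = ["restic", "forget", "--prune"]
    for period, count in cfg["keep"].items():
        forget += [f"--keep-{period}", str(count)]
    return [backup, forget]

def run(cfg, dry_run=True):
    """Print the commands, or execute them with the repository set via environment."""
    env = dict(os.environ,
               RESTIC_REPOSITORY=cfg["repository"],
               RESTIC_PASSWORD_FILE=cfg["password_file"])
    for cmd in build_commands(cfg):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)

run(CONFIG)
```

Keeping the settings in one dict means the file itself can be backed up and dropped onto a new machine unchanged.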
I've been using this setup for years with great success. I've never had to do a full restore, but my experience restoring individual files and directories has been fine.
Do you have any links related to the index corruption issue? I've never encountered it, but obviously a sample size of one isn't very useful.
This is cool. It sounds like I can set up restic to copy my backups to multiple S3 buckets, or even to an S3 bucket at the same time as a local drive using a union (https://rclone.org/union/) remote
I'm sure there are more thorough ways to do this kind of testing, but whatever level of confirmation you need, automating it should be viable, and then you only have to pay attention if/when something breaks.
On another VM, I used postfix to email the logs after each cronjob (failed or passed), which also works great.
Not an endorsement, just a happy user.
Note they're intended for backup use and therefore don't have guaranteed uptime or throughput.
It's annoying because if you have TBs of stuff that blows. I'm just curious what systems exist for incremental, encrypted backups that don't require full uploading new snapshots.
See here in the NOTE section. Re-reading this, it might be a limitation of Duplicity. https://www.rsync.net/resources/howto/duplicity.html
Duplicity is very old backup software that uses the "full + incremental" strategy on a file-by-file basis, like tape backup systems. The full backup must be restored first and then all of the incrementals. This becomes impractical over time, so as with tapes, you must periodically repeat the full backup so the incremental chains do not become too long.
Modern backup programs split files into blocks and keep track of data at the block level. You still do an initial full backup followed by incrementals, but block tracking allows you to restore any version of any file without restoring the full first and all following incrementals. The trade-off is in complexity: tracking blocks is more complex than tracking files.
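Block tracking can be sketched in a few lines. In this toy model (class and method names are made up, and the 4-byte block size is only there to keep the example readable), each snapshot is just a list of block hashes pointing into a shared block store, so any version restores directly without replaying a chain of incrementals.

```python
import hashlib

BLOCK = 4  # tiny block size so the example stays readable

class Repo:
    def __init__(self):
        self.blocks = {}     # hash -> block data, shared by all snapshots
        self.snapshots = []  # each snapshot is a list of block hashes

    def backup(self, data):
        """Store any new blocks and record a manifest; returns the snapshot id."""
        manifest = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)
            manifest.append(key)
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def restore(self, snapshot_id):
        """Rebuild one version from its own manifest, independent of other snapshots."""
        return b"".join(self.blocks[k] for k in self.snapshots[snapshot_id])

repo = Repo()
v0 = repo.backup(b"AAAABBBBCCCC")
v1 = repo.backup(b"AAAAXXXXCCCC")      # one block changed
v2 = repo.backup(b"AAAAXXXXCCCCDDDD")  # one block appended
print(repo.restore(v1), len(repo.blocks))
```

Three versions totalling ten blocks are held as five unique blocks, and restoring `v1` never touches `v0` or `v2`. The cost is the extra bookkeeping: the manifests and the block index have to stay intact for anything to be restorable.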
It has nothing to do with encryption.
It seems good to have another tool that you can set up, either manually or automatically, to run regularly and try to locate random files from your existing file system in the backups. Something like that, though that other tool might be broken as well, of course... :/
Relying on software with good defaults that a lot of people use is probably a relatively safe bet, combined with a second or third backup system (personally I use Backblaze and Time Machine).