Perkeep: personal storage system for life (perkeep.org)
From the home page (rather than the linked overview):
> Perkeep (née Camlistore) is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.
Things Perkeep believes:
+ Your data is entirely under your control
+ Open Source
+ Paranoid about privacy, everything private by default
+ No SPOF: don't rely on any single party (including yourself)
+ Your data should be alive in 80 years, especially if you are
How do they deal with obsolescence?
Software that used to exist 50 years ago doesn't run today, and most of those formats (if they aren't text formats) are either obsolete or completely unsupported. Emulators exist, but nobody actually uses them. Part of this is because software becomes obsolete over time, and part of it is because hardware becomes obsolete.
How are they going to make software today that will run on new computers in 80 years, or how will they make software and data formats backwards compatible for 80 years?
Sure, "nobody" (i.e. a negligible number of people) is running emulations of consumer software, esp. non-networked consumer software.
Networked backend server software, on the other hand, is run under emulation in production all the time. It's roughly 80% of the point of IBM's z/OS product line: to continue providing backward-compatibility with their mainframes all the way back through the early 70s, by shipping hardware that runs a hypervisor that can continue running those old workloads under (accelerated!) emulation, without changes. Any business running "a mainframe" these days isn't running on the original hardware (which has long since broken down without component replacement availability), but rather running modern hardware that's emulating their original mainframe.
I suspect that any p2p data-storage network that achieves importance and has data an archivist would care about living on it, would be given the same treatment (if people don't just consistently write new clients for it on new platforms.)
In 1968 nobody had personal computers, they were not a thing. ASCII is still really new, "files" aren't really a thing yet, the Multics system is under development and nobody has yet made the pun "Unix" let alone named an operating system.
What formats are you thinking of that weren't text formats but are now "obsolete or completely unsupported"? The Joint Technical Committee (home of JPEG, MPEG, and so on) isn't even an _idea_ yet, many of the people who'll form this committee are undergraduates or still in school. Machines aren't storing pictures, they're barely storing meaningful text, it's mostly numbers, big calculations.
If we ask about 40 years ago instead, things are hugely different. By this point Unix exists, ADVENT exists, ASCII has "won". There is no Internet, no X Window System yet, and there still isn't a Joint Technical Committee but already the documents, software and systems are familiar because we're still using them. At home there is Pong, and in pinball arcades the new Space Invaders, both are nicely emulated today.
Perkeep's format is basically chunks of files ("blobs") named after their sha256 hash, which can be reindexed as needed. So while the files stored may require software that could be gone, the files themselves will still be there in the worst case that the project disappears.
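To make the "blobs named after their hash" idea concrete, here is a minimal sketch in Python. It is not Perkeep's code or on-disk format: the chunk size, the "sha256-" prefix, the in-memory dict used as a store, and the function names are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hypothetical fixed chunk size, chosen only for this sketch


def blob_ref(data: bytes) -> str:
    """Name a blob after the SHA-256 hash of its bytes (prefix is an assumption)."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def put_file(store: dict, data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into chunks, store each under its hash, return the ordered refs."""
    refs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        ref = blob_ref(chunk)
        store[ref] = chunk  # identical chunks collapse into a single entry
        refs.append(ref)
    return refs


def get_file(store: dict, refs: list) -> bytes:
    """Reassemble a file from its ordered list of blob refs."""
    return b"".join(store[ref] for ref in refs)


def reindex(store: dict) -> dict:
    """Rebuild the name -> blob mapping from the raw blob contents alone."""
    return {blob_ref(blob): blob for blob in store.values()}
```

The point of reindex is the worst case described above: because every name is derived from the content, the mapping can be rebuilt from the blobs themselves with nothing more than a hash function.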
Disagree; I think the strongest example is DOSBox, through which DOS programs, of all things, are actually one of the least common denominators across an astonishingly wide set of platforms. Honestly, if I had to pick a format to use today that needed to hit as many platforms as possible and last as long as possible, I'd probably pick DOSBox, which is portable to Android, GNU/Linux, Darwin, NT, gaming consoles (at least Wii and Nintendo DS).... oh, Wikipedia actually has a better list: https://en.wikipedia.org/wiki/DOSBox#Ports
Anyways, I'll concede that emulators aren't as popular as native apps on most platforms, but they certainly hold their own, especially for archival purposes.
They're very popular for games, and can be used to get old files out of many popular computers of the 80s.
It presents a pretty compelling argument for why technology has a tendency to level out. I for one am confident that x86-64 binaries will still be running 50 years from now out of sheer inertia and the lack of any real practical jump in technology. (Computers of today are the 747s of 1969: good enough for almost everyone).
[0] http://idlewords.com/talks/web_design_first_100_years.htm
I tried Camlistore a bit a couple years ago and it was neat, but still pretty early. And then it looked like no development was happening on it. I would have helped, but I believe it's in Go, which I am not experienced with. Does this name change come with a new release? Is Perkeep 0.1 markedly different from the previously available public release of Camlistore?
My own (slightly out of date) comparison list: https://github.com/pjc50/pjc50.github.io/blob/master/secure-...
Maybe more relevant to private data, the builtin wiki makes a good personal knowledge database.
The next version of fossil will have a forum (seen already at https://fossil-scm.org/forum/forum ). With the time sorting for threads, that might be good for temporal data that you wouldn't want to put in a wiki.
For my purposes, I don't see any advantage of Perkeep over Fossil. I know when I use Fossil that I can trust my system and that I will always have control of my data, and that reduces my stress levels. I have enough things to worry about without worrying about my data disappearing.
I don't use the feature very often, but Fossil supports unversioned files, which allows me to delete a 500 MB file from the repo if I no longer need it.
An archive where you drag and drop your files, that can upload everything to S3 storage (no, not Amazon S3) and tag metadata to it, would be a dream. Right now there is no good solution for this, and in the beginning I took a deep look at Camlistore and hoped for a solution in it. (I looked at upspin, ipfs and other solutions as well.) If someone has a solution for this, or if Perkeep could be expanded (or has the option somehow hidden somewhere), I would be very happy if somebody could point me in the right direction.
Yeah that's a show stopper. There are just way too many scenarios where you need to delete something.
1. child porn
2. steganography
if I were the dev I'd add a 'list all' just to avoid anyone thinking the above were a good plan.
https://github.com/perkeep/perkeep/issues/792 https://github.com/perkeep/perkeep/issues/1076
Is it still the case that you can't delete anything? Although rarely needed, that seems like a showstopper these days. Irreversible actions are bad UI.
(1) Make sure that a chunk is unlinked from everywhere. (2) Overwrite it with random data, or plainly delete from store.
I suppose there's a way to relatively painlessly find out which chunks contain a particular item, and target them.
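A rough sketch of those two steps, in Python. This is not how Perkeep does it and uses none of its APIs: it assumes a hypothetical store (a dict of chunk ref -> bytes) and hypothetical manifests (a dict of item name -> list of chunk refs), and it simply drops the chunk rather than overwriting it.

```python
def delete_item(store: dict, manifests: dict, name: str) -> list:
    """Remove an item, then delete only the chunks nothing else still links to."""
    # Find which chunks make up the item and unlink the item itself.
    target_refs = set(manifests.pop(name))
    # Step 1: work out which chunks are still linked from the remaining items.
    still_linked = {ref for refs in manifests.values() for ref in refs}
    # Step 2: delete the chunks that are now unreferenced.
    deleted = sorted(target_refs - still_linked)
    for ref in deleted:
        del store[ref]
    return deleted
```

Chunks shared with another item survive, which is the price of deduplication: a delete is only safe once the reference check in step 1 comes back empty for that chunk.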
I'm having trouble finding one. The "Getting Started" page just says "run the daemon" and not much more. There are pages on how to set the many configuration options.
What if I just want to use Perkeep, or find out what the experience of using it is like? Is there a friendly walkthrough or tutorial? Or an introduction to the concepts one needs to understand as a user, not as a developer?
https://github.com/perkeep/perkeep/graphs/code-frequency
Anyone have a testimonial from the perspective of a user or hacker on it?
So it's basically a correct idea, but I want to know what is needed to make it work.
I remember the Palm Pilot tried to do this by pretending not to have files, and having "databases" instead. The result was that the palm-pilot database just became an obscure, inconvenient file format.
On the other hand, modern giant internet storage services do a pretty good job of "freeing" you from filenames, letting you get at photos, docs and other stuff.
On the other, other hand, there might be something about the personal aspect of Perkeep that makes it more like the Palm Pilot.
[1] https://www.nayuki.io/page/designing-better-file-organizatio... [2] https://news.ycombinator.com/item?id=16763235
Almost any database (including a filesystem) has a primary key, which can be thought of as a file-name. Filesystems are unusual in that ordinary users sometimes want to explicitly deal with the records (files) and their keys (names).
Note that Perkeep provides a FUSE interface, i.e. you can use files.
Being slightly less facetious, it depends on the filesystem. Files can easily disappear if, say, a disk crashes or there's a network outage.
Those problems can be avoided if we make backups and distribute copies across several disks and machines, but that gives us a synchronisation problem:
- If something gets renamed during an outage, how do we know that it was a rename rather than a brand new file?
- If we find that two nodes have different content in files with the same name/path, which one is "correct"?
- If we don't have much local storage (say, a netbook or a phone or a Raspberry Pi), how can we take part in the storage?
- How can we cache things to avoid remotely accessing the same data over and over?
- How can we keep data self-contained, i.e. without needing external metadata/keys/parity info/etc.?
These are hard problems, and Perkeep is a very promising solution to some of them.
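As an illustration of why content hashes help with the first two questions, here is a small Python sketch. It is a heuristic written for this comment, not Perkeep's sync logic; the function names and the snapshot format (path -> SHA-256 digest) are assumptions.

```python
import hashlib


def snapshot(files: dict) -> dict:
    """Map each path to the SHA-256 hex digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def classify_changes(old: dict, new: dict) -> dict:
    """Compare two snapshots, separating renames from new, removed and modified files."""
    old_by_hash = {}
    for path, digest in old.items():
        old_by_hash.setdefault(digest, []).append(path)

    renamed, added = [], []
    for path, digest in sorted(new.items()):
        if path in old:
            continue
        # Same content under a path that has vanished: treat it as a rename.
        candidates = [p for p in old_by_hash.get(digest, []) if p not in new]
        if candidates:
            renamed.append((candidates[0], path))
            old_by_hash[digest].remove(candidates[0])
        else:
            added.append(path)

    renamed_from = {src for src, _ in renamed}
    removed = sorted(p for p in old if p not in new and p not in renamed_from)
    # Same path, different content: flagged here, but not resolved.
    modified = sorted(p for p in old if p in new and old[p] != new[p])
    return {"renamed": renamed, "added": added, "removed": removed, "modified": modified}
```

Note that this only detects the "same path, different content" conflict. Deciding which side is "correct" still needs some extra policy (timestamps, signatures, keeping both), which is exactly the hard part.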
Can the Perkeep server be an SSH/SFTP login? Or is there a server-side component that would need to be running?
I've thought in the past about the intersection between (camlistore) and rsync.net but it's not obvious what that looks like ...
Furthermore, Dropbox uses folder structures, and can only sync folder-by-folder, and to have one folder synced requires EVERYTHING in that folder to be synced.
There are many other differences that are listed in the article.
It's sort of like automobiles in 1968 advertising how they are made with care and detail so they'll last, and made to be easy to work on so you can expect them to actually have people (or yourself) that know how to fix them decades later. People could easily come out and say most of what made a car in 1918 was very different to then, all the way down to the tires themselves. Industries that have had multiple decades of general use mature quite a bit, and people don't like to throw away stuff that works (or that they're fond of). We'll still have computers capable of running a von Neumann architecture in 50 years, whether through hardware or software, and that's assuming we can't just port/compile to newer systems if they aren't as extreme of departures.
I still occasionally play computer games written in the 1980s, generally through DOSBox or something similar. I think the most likely reason we have to lose access to running this software is if we lose access to running all software, in which case nobody will really care (not that I think that's remotely likely, just that it's the most likely scenario where that holds).
If you're not publishing anything, reverting the last change is easy and a rebase isn't that hard.