I believe that any successful long-term project needs some ground rules. These help make better decisions and find better solutions to challenges down the line.
Some of the ground rules that this project might have are:
1. Everything is a file
I have been playing with the idea of importing structured data into S3-compatible object storage, but that adds unneeded complexity, since most of the data stored in the vault will most likely not be structured (media such as videos, photos, music).
Second, looking at any big-brand personal data export (such as Google Takeout), all the personal data, even the structured kind, comes as files (at least as JSON files in the case of Google and Facebook).
Therefore, it only makes sense to abandon the S3 idea in favor of a new one (a rule-to-be): everything is a file. Your Strava runs, your Trello boards, everything has to be a file.
This is especially useful since, in 20 years, S3 might not be a very common technology, whereas filesystems have been around for decades and will (most likely) still be.
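To make the rule concrete, here is a minimal sketch of what "everything is a file" could look like for a structured record. The `save_record` helper, the `vault/<source>/<id>.json` layout and the activity fields are all assumptions for illustration, not a decided design.

```python
import json
import tempfile
from pathlib import Path


def save_record(vault: Path, source: str, record_id: str, record: dict) -> Path:
    """Write one structured record as a plain JSON file under vault/source/."""
    target = vault / source / f"{record_id}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return target


# Hypothetical Strava-like activity; the field names are made up for illustration.
run = {"type": "run", "distance_km": 10.2, "started_at": "2024-05-01T07:30:00Z"}

with tempfile.TemporaryDirectory() as tmp:
    path = save_record(Path(tmp), "strava", "run-0001", run)
    # Reading the file back needs nothing but a filesystem and a JSON parser.
    loaded = json.loads(path.read_text(encoding="utf-8"))
```

The point of the sketch is that the record stays readable with any text editor, with no storage service in between.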
2. Must be able to run on commodity hardware
Everyone has a computer, a laptop or some spare components lying around somewhere (or has friends who do). I have been researching HP MicroServers, refurbished computers, NAS devices and more.
But most of these solutions involve an investment of over $1000 (and probably constant maintenance). They also aren't commodity hardware: they're energy-efficient solutions that aren't available to everyone.
The final straw was when @aosan sent me an article about "digital preservation" by Vinton Cerf (one of the fathers of the Internet), in which he mentions old hardware and old software.
It only makes sense to reuse and repurpose old hardware that can fulfill the needs of personal storage.
As a side note, I'm looking into using ZFS as the storage file system, since it can easily span multiple devices and drives and add a bit of redundancy to the data.
3. Must have a database
A database will be needed to create indexes and embeddings, and to find and sort data in ways the filesystem alone cannot.
For this, PostgreSQL is the front-runner, as it (1) has SQL, (2) handles vector data (through the pgvector extension), so it's suitable for embeddings, and (3) supports full-text search.
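As a rough idea of how the database would sit next to the files, here is a small indexing sketch. Note the swap: it uses `sqlite3` from Python's standard library as a stand-in for PostgreSQL, only so that it runs without a server. The `files` table, its columns and the `build_index`/`search` helpers are assumptions, and embeddings and full-text search are left out.

```python
import mimetypes
import sqlite3
import tempfile
from pathlib import Path


def build_index(vault: Path, db: sqlite3.Connection) -> int:
    """Record every file in the vault (path, size, guessed type) in a table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size_bytes INTEGER, mime TEXT)"
    )
    rows = []
    for p in sorted(vault.rglob("*")):
        if p.is_file():
            mime, _ = mimetypes.guess_type(p.name)
            rows.append((p.relative_to(vault).as_posix(), p.stat().st_size, mime))
    db.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)


def search(db: sqlite3.Connection, term: str) -> list[str]:
    """Return indexed paths containing the term, sorted alphabetically."""
    cur = db.execute(
        "SELECT path FROM files WHERE path LIKE ? ORDER BY path", (f"%{term}%",)
    )
    return [row[0] for row in cur]


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "strava").mkdir()
    (vault / "strava" / "run-0001.json").write_text("{}", encoding="utf-8")
    (vault / "notes.txt").write_text("hello", encoding="utf-8")

    db = sqlite3.connect(":memory:")
    count = build_index(vault, db)
    hits = search(db, "strava")
    db.close()
```

The files remain the source of truth here; the index can be dropped and rebuilt from the vault at any time.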
4. Must have an interface
There should be an interface with a search box (at the top?) and a list/gallery of files & folders under it. It should be light and fast.
The interface should be able to render any file according to its type: if it's a picture, show a thumbnail; if it's a location, show it on a map; if it's a Strava bike trail, map it; if it's a text file, show an excerpt; if it's a random JSON, show some of its structure. The point is to get a quick grasp of the listed files based on their type.
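The type-based rendering above could start as a simple dispatch from file name to preview kind. This is only a sketch: the `preview_kind` function, the preview names and the choice of `.gpx`/`.kml` as "map" formats are assumptions, and a real interface would probably inspect file contents as well.

```python
import mimetypes
from pathlib import Path

# Assumed set of extensions that carry locations or trails.
MAP_SUFFIXES = {".gpx", ".kml"}


def preview_kind(filename: str) -> str:
    """Pick a preview style for a file based on its extension or MIME type."""
    suffix = Path(filename).suffix.lower()
    if suffix in MAP_SUFFIXES:
        return "map"
    if suffix == ".json":
        return "structure"
    mime, _ = mimetypes.guess_type(filename.lower())
    if mime is None:
        return "generic"
    if mime.startswith("image/"):
        return "thumbnail"
    if mime.startswith("text/"):
        return "excerpt"
    return "generic"


kinds = {name: preview_kind(name) for name in ["photo.JPG", "ride.gpx", "notes.txt"]}
```

Unknown types fall back to a generic listing, so adding a new renderer never breaks files that have none.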
5. Must be composable
Everyone is different, and everyone has different needs and tools.
Therefore, for this project to be successful, it must be flexible about which tools, flows, importers and other processes it uses.
Although I value convention over configuration, it should support overriding every setting in order to fulfill the needs of the many.
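One way to combine sensible conventions with full overridability is to merge user settings over a set of defaults. The sketch below assumes a nested dictionary of settings; the keys, values and the `merge_settings` helper are hypothetical.

```python
from copy import deepcopy

# Hypothetical defaults: the "convention" part.
DEFAULTS = {
    "storage": {"root": "/vault", "filesystem": "zfs"},
    "search": {"full_text": True},
    "importers": ["google_takeout"],
}


def merge_settings(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts merge key by key."""
    result = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_settings(result[key], value)
        else:
            # Scalars and lists replace the default outright.
            result[key] = deepcopy(value)
    return result


# Hypothetical user file: the "configuration" part, touching only what differs.
user = {"storage": {"root": "/mnt/old-laptop"}, "importers": ["strava", "trello"]}
settings = merge_settings(DEFAULTS, user)
```

A user who changes nothing gets the conventions as they are, and one who needs something different only states the difference.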
6. Must be open-source and only use open-source libraries & tools
Although some of the tools that people use will be proprietary or closed-source, the core system must be open-source and community-driven. I see no other way for it to exist beyond a few years.
I'm very interested to hear your thoughts, comments, questions and ideas about these, and any suggestions for ground rules that are missing.