Data loss, all links are gone #1292
Comments
I'll second this report. I use the API to add links rather often, and my earliest link is now from April 11, but I've been running Shaarli for a long time and had hundreds of links. |
Did the data loss occur after an update?
Every time that the datastore is altered, its whole content is written in Can you both look for any error in your HTTP server logs? |
I did not update recently. I'm on 0.10.0 and have been since late summer of last year. It looks like an error while handling the file. My error log from the morning things got wiped reads:
|
I've not updated since I installed Shaarli, which was sometime around August. The only Shaarli-related error I could find in the web server logs was this, but it is an old one:
But really, Shaarli does not take a copy of the original datastore before overwriting it? That is quite a trust to put in the write always succeeding (which it clearly does not). |
Hey,
The first section of the Upgrade and migration documentation page is called Backup your data for a reason! Making regular backups of data that is important to you is a good practice, regardless of the software being used, and regardless of its test coverage, data backend and codebase maturity. As a rule of thumb, power outages, disk failures and network issues have a fondness for happening at the worst of times.
I understand that losing data is frustrating. Were I cynical, I'd argue that this is quite a trust put on a piece of software you got for free and that is maintained by a handful of people, in their free time and out of good will. We've done our best to improve Shaarli's features, make it more stable, provide easy ways to install it through documentation and Docker packaging... Development has been slow-paced for a while, maybe it's time for new people to step in, if you feel it's a worthy tool to self-host and use on a regular basis? |
Well, I'm not interested in critiquing the project's features/architecture or its maintainers. I'd just like to provide enough information to help with a fix down the road. Is there any other diagnostic information I could provide that would be helpful? |
It may not be very helpful now, but things like this are why my backup policy retains daily, weekly and monthly backups for a while (realizing data was lost after all backups have been rotated feels bad). I also have a few manual HTML exports lying around. About providing more info: Can the date of the In short, can you track in your backup history at which point the datastore got corrupted/emptied? If you can do that, there may be a way to correlate the incident with something in your webserver/Shaarli logs (web interface login/interaction? API interaction? server/PHP errors? power loss? drive full/failure? other "usual" messages that could give some context?), provided you still have the logs from that time. |
As far as I know, data loss could result from:
Shaarli
The datastore format itself is quite simple (despite being a tad convoluted): links are stored in a PHP array that is serialized, gzipped and written to a file. Existing links are mainly altered when upgrading Shaarli, which runs an Updater to migrate configuration and bookmark data to new(er) formats. Note that changes to the bookmark format have been quite scarce over time. Updater operations are logged under Another possibility would be a bug in the REST API. We haven't had much feedback regarding its usage, so anything that could help highlight and reproduce a bug would be helpful :)
Shaarli + content format
There are occasional issues raised following a failure to bookmark a given link or import an existing collection (e.g. from browser-saved bookmarks). This is usually due to the bookmark's description containing special formatting (Markdown, HTML entities). This could result in:
User
There are a lot of possibilities here: file removal, erroneous permissions, forgetting to persist Docker data in a dedicated volume... The best place to look is the shell history for your user account. I'd recommend keeping a large number of entries (1000 - 10000) and logging timestamps alongside commands, so it's easy to find which commands were run during a given time range.
Erroneous web server permissions
The datastore could be overwritten or blanked if the user and group used by the web server (Apache HTTPD, Nginx + PHP-FPM) do not match those set on the filesystem. This is quite unlikely, as the web server would most probably crash or return errors because it is unable to read and/or write data from/to the storage medium.
Storage failure
Nothing we can really do here :( Recovery utilities might help though, if you have root/superuser access to the server. I highly recommend TestDisk to recover deleted files and corrupted data. |
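To make the format described above concrete, here is a minimal sketch of the round trip: serialize the array, compress it, base64-encode it and wrap it so the file is inert when served by PHP. The function names are hypothetical, and the exact prefix and suffix strings are an assumption rather than a quote from Shaarli's source.

```php
<?php
// Assumed wrapper strings: they turn the payload into a PHP comment so the
// file cannot be read over HTTP. The exact values are an assumption.
const DATASTORE_PREFIX = '<?php /* ';
const DATASTORE_SUFFIX = ' */ ?>';

/** Serialize, compress and wrap an array of links (hypothetical helper). */
function encodeDatastore(array $links): string
{
    return DATASTORE_PREFIX
        . base64_encode(gzdeflate(serialize($links)))
        . DATASTORE_SUFFIX;
}

/** Reverse of encodeDatastore(): unwrap, decode, decompress, unserialize. */
function decodeDatastore(string $raw): array
{
    $body = substr($raw, strlen(DATASTORE_PREFIX), -strlen(DATASTORE_SUFFIX));
    return unserialize(gzinflate(base64_decode($body)));
}

// Illustrative data only; real bookmarks carry more fields.
$links = [
    ['id' => 1, 'url' => 'https://example.com', 'title' => 'Example'],
];
$raw = encodeDatastore($links);
var_dump(decodeDatastore($raw) === $links);
```

Because the whole collection lives in one encoded blob, every change rewrites the entire file, which is why an interrupted write can lose all links at once rather than just the one being edited.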
And isn't that exactly what I did? Unfortunately for me, 30 days was not enough, and it was my fault not to make, for example, monthly rotated backups. The data was not important enough for me to care that much.
In fact, I trust open software more than closed software, as with open software we can usually discuss problems in the code and improve it. In this case it surprises me that this seems to be a known and critical problem, yet no action has been taken to fix it. Yeah yeah, this is open software and I could have read the source and decided not to use the software in the first place, but who really has time?
What I can provide is experience working with systems that have high data integrity requirements. Losing all data in one error is very bad. The way data is currently handled in Shaarli is very dangerous. I understand the philosophy behind a file-only system (without the need for external services) and I like it, but it needs to be properly implemented to be safe enough. I think Shaarli could benefit from something like SQLite, where inserting and deleting data is transactional and, in case of error, the data loss is not huge and usually only affects the data currently being handled. It is not my intention to be hateful and I'm sorry if you felt that way. |
#953 (use a RDBMS) requires major changes. I see your point about the way Shaarli writes the datastore being "unsafe" (no integrity verification after write, no rollback mechanism).
It was unheard of until this bug report. We are still unsure about what caused the problem in the first place. Personally I never encountered data loss with Shaarli in years. Yes hardware/OS-level problems can trash your data. Backups. @uuksu do you use the API? Were there any other changes at the moment the problem started? Any info that would help pinpoint the original cause would be welcome. |
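As an illustration of the two missing safeguards named above (integrity verification after a write, and a rollback mechanism), here is a minimal sketch. `writeVerified()` and the `.bak` naming are hypothetical and not part of Shaarli; the payload is assumed to be the already-encoded datastore string.

```php
<?php
/**
 * Write $payload to $file, read it back to verify it, and restore the
 * previous content if verification fails.
 * Hypothetical helper for illustration, not Shaarli's implementation.
 */
function writeVerified(string $file, string $payload): void
{
    $backup = $file . '.bak';
    $hadPrevious = is_file($file);

    // Keep a copy of the last known-good datastore before touching it.
    // If this copy fails (e.g. disk full), the original is left untouched.
    if ($hadPrevious && !copy($file, $backup)) {
        throw new RuntimeException("Could not back up $file");
    }

    $written = file_put_contents($file, $payload, LOCK_EX);
    $readBack = @file_get_contents($file);

    if ($written !== strlen($payload) || $readBack !== $payload) {
        // Roll back to the previous content when one exists.
        if ($hadPrevious) {
            copy($backup, $file);
        }
        throw new RuntimeException("Write verification failed for $file");
    }
}
```

The rollback itself is a write, so it can also fail on a full or failing disk; in that case the `.bak` file is still the last known-good copy and can be restored by hand.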
The mechanism by which Shaarli handles the data has always been known, right? It has been a critical weakness for as long as the handling has been implemented this way, but maybe my bug report and @averymd's are the first real symptoms of that design. It has been a disaster waiting to happen.
I had intended to use it for my own purposes and I was in the process of implementing a .NET client against the API. Due to time constraints, I only implemented the link retrieval side of the API in my client. I think the Android client uses the API, and that is what I used to share links from my phone to my computer. This is something I used pretty frequently. |
My backups aren't granular enough to be able to pinpoint the moment data was lost, but the next link I added was at 20:47:17 the same day (also using the REST API), and it was the first link remaining after the loss. I did load the UI of Shaarli at 20:18:27 (as restored background tabs in Chrome, I think, given the searches in the URLs), but did not edit any configuration between the 07:07:41 and 20:47:17 API posts. Also no attempts to import bookmarks or use any new third-party tools to create bookmarks. These days, probably 98% of my links come through the REST API. I built a little Express service last fall so that I can use IFTTT to store links from a variety of sources.
These are on shared hosting, so I don't have access to any other logs, and the ones I have only go back to 4/11, alas. With so many things looking fine across my setup, I'm leaning towards a momentary fluke/random disk write error like y'all mentioned, to be honest. It happens occasionally on this hosting provider if processes are being terminated by their proc monitor. Shaarli has never before gotten the monitor's Eye, but if they were clobbering some deadlocked stuff, mine might have gotten swept up even if it wasn't the cause of any problems. But I think I just have way too much smooth-sailing usage for this to be some uncaught bug in Shaarli. |
So maybe it could be related to #1132.
However, if you're using the Shaarlier app, it does not use the API. The REST API didn't exist when the app was written. As @virtualtam mentioned, there are quite a few possible sources of error. We can't pinpoint which one caused the errors here, and even if we could, it probably wouldn't be easily fixable with the current datastore system. What we can do, while working on a more reliable system (see @nodiscc's comment regarding the use of an RDBMS), is add more safeguards:
|
That's distinctly possible--IFTTT will occasionally end up with several items to post at a time, since it's not really listening/acting in realtime. When I went to restore lost items in a simple script, I needed to put several seconds between POSTs to avoid the datastore file being broken partway through the set. |
What we know so far:
What can be done (?):
I guess the missing issues can be opened. |
One thing to note is that the I have not tried to load a faulty datastore, but I don't think there is a check to see if the data is valid and it would probably fail silently and result in an empty LinkDB. |
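The silent failure suspected above could be avoided by a reader that refuses to treat a damaged file as "no links". The following is only a sketch under assumptions: the function name is hypothetical, the wrapper strings are assumed, and it does not reflect how LinkDB actually loads data.

```php
<?php
/**
 * Decode a datastore payload, raising an error for empty or damaged data
 * instead of returning an empty collection.
 * Hypothetical helper; the default prefix and suffix are assumptions.
 */
function decodeDatastoreStrict(
    string $raw,
    string $prefix = '<?php /* ',
    string $suffix = ' */ ?>'
): array {
    if ($raw === '') {
        throw new UnexpectedValueException('Datastore is empty (0 bytes)');
    }
    // A truncated write typically loses the closing wrapper.
    if (strncmp($raw, $prefix, strlen($prefix)) !== 0
        || substr($raw, -strlen($suffix)) !== $suffix
    ) {
        throw new UnexpectedValueException('Datastore wrapper is missing or truncated');
    }

    $body = substr($raw, strlen($prefix), -strlen($suffix));
    $compressed = base64_decode((string) $body, true);
    if ($compressed === false) {
        throw new UnexpectedValueException('Datastore is not valid base64');
    }
    $serialized = @gzinflate($compressed);
    if ($serialized === false) {
        throw new UnexpectedValueException('Datastore could not be decompressed');
    }
    $links = @unserialize($serialized);
    if (!is_array($links)) {
        throw new UnexpectedValueException('Datastore does not contain a link array');
    }
    return $links;
}
```

With a check like this, a 0-byte or half-written file stops the application with an explicit error, so the next save cannot overwrite the evidence with a fresh, nearly empty datastore.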
Hi, I experienced this same issue. I think this is a CRITICAL flaw with how data integrity is maintained. I was editing a note, and when I applied the changes I got a "Nothing found" message returned. I am using Shaarli 0.11.1 and my disk ran out of space. I guess this is what caused it to write a 0 byte datastore.php ? |
Out of disk space will cause some un-intuitive bugs like the impossibility to login (returns |
I experienced the same issue: adding a new link through a Firefox extension wiped my datastore. Afterwards, my datastore.php file had a size of 0 bytes. Interestingly, history.php also had a size of 0 bytes. After some tests using a 4 MB RAM disk to simulate a full disk, I successfully corrupted my datastore. Below is a proposal that uses a temporary file to guarantee that a valid datastore always exists.
public static function writeFlatDB($file, $content)
{
    if (is_file($file) && !is_writeable($file)) {
        // The datastore exists but is not writeable
        throw new IOException($file);
    }
    if (!is_writeable(dirname($file))) {
        // The datastore does not exist and its parent directory is not writeable
        throw new IOException(dirname($file));
    }
    // Generate a temporary filename by appending a timestamp with microsecond precision.
    // microtime() returns "0.12345600 1234567890": seconds start at offset 11,
    // and the 8 fractional digits start at offset 2.
    $mtime = microtime();
    $timestamp = substr($mtime, 11) . substr($mtime, 2, 8);
    $tempFile = $file . "." . $timestamp;
    // Compute the payload
    $payload = self::$phpPrefix . base64_encode(gzdeflate(serialize($content))) . self::$phpSuffix;
    $payloadLength = strlen($payload);
    // Write the payload to the temporary file and record how many bytes were written
    $bytesWritten = file_put_contents($tempFile, $payload);
    // Write the datastore to the temp file, copy it to the original file, then delete the temp file.
    // Stop at the first error: we should have a valid datastore at any point.
    if ($payloadLength !== $bytesWritten) {
        // The number of bytes written does not match the payload size
        if (is_file($tempFile)) {
            // Delete the temp file if it exists
            unlink($tempFile);
        }
        die(t("Incomplete write to temporary file $tempFile"));
    } elseif (!copy($tempFile, $file)) {
        // The copy raised an error
        die(t("Error when copying $tempFile to $file"));
    } else {
        // Everything is OK, the temp file can be deleted
        unlink($tempFile);
    }
    return $bytesWritten;
} |
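One possible refinement of the proposal above: `copy()` rewrites the target file in place, so a failure during the copy can still leave a partial datastore. A variant that replaces the file with `rename()` avoids this, since on POSIX filesystems a rename within the same filesystem is atomic. This is a minimal standalone sketch, not the fix that was merged; `writeAtomically()` is a hypothetical name and the payload is assumed to be the already-encoded datastore string.

```php
<?php
/**
 * Write $payload to a temporary file next to $file, then move it into place.
 * Hypothetical helper for illustration, not Shaarli's implementation.
 */
function writeAtomically(string $file, string $payload): int
{
    // Create the temp file in the same directory so that rename() stays on
    // one filesystem. Note: tempnam() may fall back to the system temp
    // directory if the target directory is not writable.
    $tempFile = tempnam(dirname($file), basename($file) . '.');
    if ($tempFile === false) {
        throw new RuntimeException("Could not create a temporary file for $file");
    }

    $bytesWritten = file_put_contents($tempFile, $payload);
    if ($bytesWritten !== strlen($payload)) {
        // Disk full or I/O error: the existing datastore has not been touched.
        @unlink($tempFile);
        throw new RuntimeException("Incomplete write to $tempFile");
    }

    // tempnam() creates the file with restrictive permissions; reuse the
    // permissions of the existing datastore when there is one.
    if (is_file($file)) {
        chmod($tempFile, fileperms($file) & 0777);
    }

    if (!rename($tempFile, $file)) {
        @unlink($tempFile);
        throw new RuntimeException("Could not move $tempFile to $file");
    }
    return $bytesWritten;
}
```

Readers then see either the old datastore or the new one, never a half-written file. Concurrent writers can still overwrite each other's changes, so this complements locking rather than replacing it.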
Adding security label because it affects data integrity |
Thanks!!! This works perfectly and is much better than nothing. If only I had known before. |
#1570 (adding a mutex on the datastore file) should fix this issue. I'm sorry for everyone who has lost data. |
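For readers unfamiliar with the idea, a file mutex serializes read-modify-write cycles so that several near-simultaneous requests (such as a burst of API POSTs) cannot interleave. The sketch below uses PHP's `flock()` on a separate lock file; it is an assumption-laden illustration, not the code from #1570. Both function names are hypothetical, and plain `serialize()` stands in for the real datastore encoding to keep the example short.

```php
<?php
/**
 * Run $operation while holding an exclusive lock tied to $file.
 * Hypothetical helper; the ".lock" suffix is an arbitrary choice.
 */
function withDatastoreLock(string $file, callable $operation)
{
    $handle = fopen($file . '.lock', 'c');
    if ($handle === false) {
        throw new RuntimeException("Could not open lock file for $file");
    }
    try {
        // Blocks until any other process holding the lock releases it.
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Could not lock $file");
        }
        return $operation();
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}

/** Example read-modify-write cycle performed entirely under the lock. */
function addLinkLocked(string $file, array $link): int
{
    return withDatastoreLock($file, function () use ($file, $link) {
        $links = [];
        if (is_file($file)) {
            $links = unserialize(file_get_contents($file));
            if (!is_array($links)) {
                throw new RuntimeException("Refusing to overwrite unreadable $file");
            }
        }
        $links[] = $link;
        file_put_contents($file, serialize($links));
        return count($links);
    });
}
```

`flock()` locks are advisory, so they only help if every code path that touches the datastore goes through the same lock.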
So, I have my own instance of Shaarli running on my server. The current version is v0.10.2. I use Shaarli pretty rarely, mainly to share links between my phone and computer.
It's been a while since I last logged in to Shaarli, but today I did and, to my surprise, all links are gone. datastore.php is an empty file right now. I have backups from the last 30 days and the datastore is empty in all of them, so it must have happened earlier without my noticing. All in all, I lost all of my data.
What could have caused this? Why would all links get wiped by themselves?