Data loss, all links are gone #1292

Closed
uuksu opened this issue Apr 14, 2019 · 23 comments
Labels: "bug" (it's broken!), "security"

Comments

@uuksu

uuksu commented Apr 14, 2019

So, I have my own instance of Shaarli running on my server. Current version is v0.10.2. I use Shaarli pretty rarely and mainly use it to share links between my phone and computer.

It's been a while since I last logged in to Shaarli, but today I did and to my surprise all links are gone. datastore.php is an empty file right now. I have backups from the last 30 days and in all of these the datastore is empty, so it must have happened earlier and I just have not noticed. All in all, I lost all of my data.

What could have caused this? Why would all links get wiped by themselves?

@averymd

averymd commented Apr 15, 2019

I'll second this report. I use the API to add links rather often, and my first link is now April 11, but I've been running Shaarli for a long time and had hundreds of links.

@ArthurHoaro
Member

Did the data loss occur after an update?

What could have caused this? Why would all links get wiped by themselves?

Every time the datastore is altered, its whole content is written to datastore.php. So either a bug made it write empty content, or an error occurred while writing the file (a full filesystem, for example).

Can you both look for any error in your HTTP server logs?

@ArthurHoaro ArthurHoaro added the "support" (installation and configuration issues) label Apr 15, 2019
@ArthurHoaro ArthurHoaro added this to the 0.11.0 milestone Apr 15, 2019
@averymd

averymd commented Apr 15, 2019

I did not update recently. I'm on 0.10.0 and have been since late summer of last year.

It looks like an error while handling the file. My error log from the morning things got wiped reads:

[Thu Apr 11 07:07:41 2019] [warn] [client 173.236.193.82] mod_fcgid: stderr: PHP Notice: unserialize(): Error at offset 577712 of 649411 bytes in /home/username/domain/application/FileUtils.php on line 77
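
(For context: PHP emits this notice when unserialize() is handed truncated or corrupted input. A minimal reproduction, independent of Shaarli:)

    <?php
    // Serialize an array, cut off the tail to mimic an interrupted write,
    // then try to read it back.
    $payload = serialize(['https://example.com' => 'a bookmark']);
    $truncated = substr($payload, 0, -10);
    $result = unserialize($truncated);
    // PHP Notice: unserialize(): Error at offset ... of ... bytes
    var_dump($result); // bool(false): nothing is recoverable from this copy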

@uuksu
Author

uuksu commented Apr 15, 2019

I've not updated since I installed Shaarli, which was sometime around August.

The only Shaarli-related error I could find in the web server logs was this one, but it's old:

[Tue Nov 13 16:01:38.441127 2018] [fcgid:warn] stderr: PHP Notice: Undefined offset: 1 in
/www/shaarli/application/Updater.php on line 423

But really, Shaarli does not take a copy of the original datastore before overwriting it? That's quite a lot of trust that the write always succeeds (which it clearly does not).

@virtualtam
Member

Hey,

But really, Shaarli does not take a copy of the original datastore before overwriting it?

The first section of the Upgrade and migration documentation page is called Backup your data for a reason!

Making regular backups of data that is important to you is a good practice, regardless of the software being used, and regardless of its test coverage, data backend and codebase maturity. As a rule of thumb, power outages, disk failures and network issues have a fondness for happening at the worst of times.

That's quite a lot of trust that the write always succeeds (which it clearly does not).

I understand that losing data is frustrating. Were I cynical, I'd argue that this is quite a trust put on a piece of software you got for free and that is maintained by a handful of people, on their free time and good will.

We've done our best to improve Shaarli's features, make it more stable, provide easy ways to install it through documentation and Docker packaging... Development has been slow-paced for a while, maybe it's time for new people to step in, if you feel it's a worthy tool to self-host and use on a regular basis?

@averymd

averymd commented Apr 15, 2019

Well, I'm not interested in critiquing the project's features/architecture or its maintainers. I'd just like to provide enough information to help with a fix down the road. Is there any other diagnostic information I could provide that would be helpful?

@nodiscc
Member

nodiscc commented Apr 15, 2019

It may not be very helpful now, but things like this are why my backup policy retains daily, weekly and monthly backups for a while (realizing data was lost only after all backups have rotated feels bad). I also have a few manual HTML exports lying around.

About providing more info:

Can the date of the [Thu Apr 11 07:07:41 2019] ... unserialize(): Error at offset message be linked to the moment data was lost? For example is there a wild difference (filesize? contents?) between datastore.php after this date and the previous backup?

In short, can you track in your backup history at which point the datastore got corrupted/emptied?

If you can do that, there may be a way to correlate the incident with something in your webserver/Shaarli logs (web interface login/interaction? API interaction? server/PHP errors? power loss? drive full/failure? other "usual" messages that could give some context?), provided you still have the logs from that time.

@virtualtam
Member

As far as I know, data loss could result from:

Shaarli

The datastore format is itself quite simple (albeit a tad convoluted): links are stored in a PHP array, which is serialized, gzipped and written to a file.
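
For illustration, a stripped-down sketch of that round trip (the real code lives in application/FileUtils.php; the file name, wrapper and error handling are simplified here):

    <?php
    // Writing: the whole bookmark array is serialized, compressed, and
    // base64-encoded, then wrapped in a PHP comment so the file yields
    // nothing if fetched directly over HTTP.
    $links = ['20190411_070741' => ['url' => 'https://example.com', 'title' => 'Example']];
    $payload = '<?php /* ' . base64_encode(gzdeflate(serialize($links))) . ' */ ?>';
    file_put_contents('datastore.php', $payload);

    // Reading reverses the steps: strip the 9-byte prefix and 6-byte suffix,
    // then decode. A truncated or empty file makes unserialize() fail with
    // the "Error at offset" notice quoted earlier in this thread.
    $raw = file_get_contents('datastore.php');
    $links = unserialize(gzinflate(base64_decode(substr($raw, 9, -6))));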

Existing links are mainly altered when upgrading Shaarli, which runs an Updater to migrate configuration and bookmark data to new(er) formats. Note that changes to the bookmark format have been quite scarce over time.

Updater operations are logged under data/updates.txt; additional information might be found under data/log.txt.

Another possibility would be a bug in the REST API. We haven't had much feedback regarding its usage, so anything that could help highlight and reproduce a bug would be helpful :)

Shaarli + content format

There are occasional issues raised following a failure to bookmark a given link or import an existing collection (e.g. from browser-saved bookmarks). This is usually due to the bookmark's description containing special formatting (Markdown, HTML entities).

This could result in:

  • the bookmark not being imported
  • the bookmark being imported, and Shaarli failing to write the resulting datastore (resulting in data loss?)
  • the bookmark being imported, Shaarli managing to write the datastore but failing to read it

User

There are a lot of possibilities here: file removal, erroneous permissions, forgetting to persist Docker data in a dedicated volume...

Your best bet is the shell history for your user account. I'd recommend keeping a large number of entries (1000-10000) and logging timestamps alongside commands, so it's easy to find which commands were run during a given time range.

Erroneous web server permissions

The datastore could be overwritten or blanked in case the user and group used by the web server (Apache HTTPD, Nginx + PHP-FPM) do not match those set on the filesystem. This is quite unlikely, as the web server would most probably crash or return errors because it is unable to read and/or write data from/to the storage medium.

Storage failure

Nothing we can really do here :( Recovery utilities might help though if you have root/superuser access to the server. I highly recommend TestDisk to recover deleted files and corrupted data.

@uuksu
Author

uuksu commented Apr 16, 2019

The first section of the Upgrade and migration documentation page is called Backup your data for a reason!

And isn't that exactly what I did? Unfortunately for me, 30 days was not enough, and it was my fault for not making, for example, monthly rotated backups. The data was not important enough for me to care that much.

I understand that losing data is frustrating. Were I cynical, I'd argue that this is quite a trust put on a piece of software you got for free and that is maintained by a handful of people, on their free time and good will.

In fact, I trust open software more than closed software, as with open software we can usually discuss problems in the code and improve them. In this case it surprises me that this seems to be a known and critical problem but no action has been taken to fix it. Yeah yeah, this is open software and I could have read the source and decided not to use the software in the first place, but who really has time?

We've done our best to improve Shaarli's features, make it more stable, provide easy ways to install it through documentation and Docker packaging... Development has been slow-paced for a while, maybe it's time for new people to step in, if you feel it's a worthy tool to self-host and use on a regular basis?

What I can provide is experience working with systems that have high data integrity requirements. Losing all data to one error is very bad. The way data is currently handled in Shaarli is very dangerous. I understand the philosophy behind a file-only system (no external services needed) and I like it, but it needs to be properly implemented to be safe enough. I think Shaarli could benefit from something like SQLite, where inserting and deleting data is transactional, and in case of error data loss is not huge and usually only affects the data currently being handled.
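
To illustrate what I mean by transactional (purely a sketch; Shaarli does not use SQLite, and this schema is made up):

    <?php
    // With SQLite, a failed insert rolls back and leaves existing rows
    // untouched, instead of rewriting the whole store on every change.
    $db = new PDO('sqlite:links.db');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS links (url TEXT, title TEXT)');
    $db->beginTransaction();
    try {
        $stmt = $db->prepare('INSERT INTO links (url, title) VALUES (?, ?)');
        $stmt->execute(['https://example.com', 'Example']);
        $db->commit();
    } catch (Exception $e) {
        $db->rollBack(); // existing links survive the failure
    }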

It's not my intention to be hateful, and I'm sorry if it came across that way.

@nodiscc
Member

nodiscc commented Apr 16, 2019

Shaarli could benefit from something like SQLite

#953 (use an RDBMS) requires major changes.
See also https://gitter.im/shaarli/Shaarli?at=5c72cac0ab952d308583f5b7

I see your point about the way Shaarli writes the datastore being "unsafe" (no integrity verification after write, no rollback mechanism).

seems to be a known and critical problem

It was unheard of until this bug report.

We are still unsure about what caused the problem in the first place. Personally, I have never encountered data loss with Shaarli in years of use. Yes, hardware/OS-level problems can trash your data. Hence backups.

@uuksu do you use the API? Were there any other changes at the moment the problem started? Any info that would help pinpoint the original cause would be welcome.

@uuksu
Author

uuksu commented Apr 16, 2019

It was unheard of until this bug report.

The mechanism by which Shaarli handles the data has always been known, right? This has always been critical with data handling implemented this way, but maybe my and @averymd's bug reports are the first real symptoms of that design. It has been a disaster waiting to happen.

@uuksu do you use the API? Were there any other changes at the moment the problem started? Any info that would help pinpoint the original cause would be welcome.

I had intentions to use it for my own purposes and was in the process of implementing a .NET client against the API. Due to time constraints, I only implemented the link retrieval side of the API in my client.

I think the Android client I used to share links from my phone to my computer uses the API. This is something I did pretty frequently.

@averymd

averymd commented Apr 16, 2019

Can the date of the [Thu Apr 11 07:07:41 2019] ... unserialize(): Error at offset message be linked to the moment data was lost? For example is there a wild difference (filesize? contents?) between datastore.php after this date and the previous backup?

My backups aren't granular enough to pinpoint the moment data was lost, but the next link I added was at 20:47:17 the same day (also using the REST API), and it was the first link remaining after the loss. I did load the Shaarli UI at 20:18:27 (as restored background tabs in Chrome, I think, given the searches in the URLs), but did not edit any configuration between the 07:07:41 and 20:47:17 API posts. There were also no attempts to import bookmarks or to use any new third-party tools to create bookmarks.

These days, probably 98% of my links come through the REST API. I built a little Express service last fall so that I can use IFTTT to store links from a variety of sources.

  1. That service doesn't show any errors in its Apache logs. It received a call at 07:07:40 that successfully created the link in Shaarli, then another at 20:47:17 that also worked.
  2. Sentry is attached to that service, and doesn't show any errors. I also log successes there: it shows 215 successful POST events in the last 90 days. (wow, I save a lot of links!)

If you can do that, there may be a way to correlate the incident with something in your webserver/Shaarli logs (web interface login/interaction? API interaction? server/PHP errors? power loss? drive full/failure? other "usual" messages that could give some context?), provided you still have the logs from that time.

These are on shared hosting, so I don't have access to any other logs, and the ones I have only go back to 4/11, alas.

With so many things looking fine across my setup, I'm leaning towards a momentary fluke/random disk write error like y'all mentioned, to be honest. It happens occasionally on this hosting provider if processes are being terminated by their proc monitor. Shaarli has never before gotten the monitor's Eye, but if they were clobbering some deadlocked stuff, mine might have gotten swept up even if it wasn't the cause of any problems.

But I think I just have way too much smooth-sailing usage for this to be some uncaught bug in Shaarli.

@ArthurHoaro
Member

These days, probably 98% of my links come through the REST API.

So maybe it could be related to #1132.

I think the Android client I used to share links from my phone to my computer uses the API. This is something I did pretty frequently.

However, if you're using the Shaarlier app, it does not use the API: the REST API didn't exist when the app was written.

As @virtualtam mentioned, there are quite a few possible sources of error. We can't pinpoint which one caused the errors here, and even if we could, it probably wouldn't be easily fixable with the current datastore system.

What we can do, while working on a more reliable system (see @nodiscc's comment regarding the usage of an RDBMS), is add more safeguards:

  1. Make sure that the serialization, gzip compression, etc. go well before writing the datastore.
  2. Instead of overwriting the datastore, write it to a new file, and delete the old one if the size is consistent. This would allow Shaarli to be in a kind of fail-safe mode if 2 datastore files are present. The downside of this solution is that it adds IO operations, which can also be sources of error (a rough sketch follows below).
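
A rough sketch of option 2, with a hypothetical helper name, using rename() (atomic on POSIX filesystems) rather than copy-then-delete:

    <?php
    // Hypothetical helper, not Shaarli's actual API: never overwrite the
    // datastore in place.
    function writeDatastoreSafely($file, $payload)
    {
        $tempFile = $file . '.tmp';
        // Write to a sibling file first: a full disk only damages the copy.
        if (file_put_contents($tempFile, $payload) !== strlen($payload)) {
            @unlink($tempFile);
            throw new Exception('Incomplete write, keeping the old datastore');
        }
        // Swap the files. rename() is atomic on POSIX filesystems, so readers
        // see either the old or the new datastore, never a half-written one.
        if (!rename($tempFile, $file)) {
            throw new Exception('Could not replace the datastore');
        }
    }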

@averymd

averymd commented Apr 22, 2019

These days, probably 98% of my links come through the REST API.

So maybe it could be related to #1132.

That's distinctly possible: IFTTT will occasionally end up with several items to post at a time, since it's not really listening/acting in real time. When I went to restore lost items with a simple script, I needed to put several seconds between POSTs to avoid the datastore file being broken partway through the set.

@nodiscc
Member

nodiscc commented May 17, 2019

What we know so far:

  • Corruption occurred (probably) between 07:07:41 and 20:47:17 on Thu Apr 11 for @averymd
  • Shaarli and client tools (API-based for the most part) do not report any activity like upgrades, new posts, or a simple visit during this interval.
  • Whether the bug is due to the modification at 20:47:17 or to an unrelated/underlying cause prior to that is unknown. Experience seems to show this is not reproducible/extremely rare, but the impact is high (need to restore backups, need to have a frequent/long-retention backup policy).
  • Datastore write operations are unsafe in case something goes wrong during linkdb generation or writing to disk.

What can be done (?):

  • Document a reliable backup procedure
  • Make sure that the serialization, gzip compression, etc. go well before writing the datastore. (how?)
  • Instead of overwriting the datastore, write it to a new file, and delete the old one if the size is consistent. (We need to define what size difference is acceptable; the size of the LinkDB should normally not change by more than 1, except for batch delete operations, as far as I know.)
  • Add a write lock to ApiMiddleware (possibly related: API bug when posting two links in parallel #1132)
  • Require an RDBMS, file-based like SQLite or client-server-based like MySQL/MariaDB/PostgreSQL (Use SQLite as storage for links, settings and plugin data #953)

I guess the missing issues can be opened.

@nodiscc
Member

nodiscc commented May 17, 2019

One thing to note is that the datastore.php file probably already existed; otherwise a new one would have been created with the default Welcome to Shaarli shaare.

I have not tried to load a faulty datastore, but I don't think there is a check that the data is valid; loading would probably fail silently and result in an empty LinkDB.
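
A minimal sketch of the kind of load-time guard this suggests (the decoding step and naming are simplified assumptions, not Shaarli's actual code):

    <?php
    // Hypothetical guard: if a non-empty datastore file cannot be decoded,
    // abort loudly instead of silently starting with an empty LinkDB (and
    // overwriting the possibly recoverable file on the next write).
    $file = 'data/datastore.php';
    $raw = file_get_contents($file);
    $links = @unserialize(gzinflate(base64_decode(substr($raw, 9, -6))));
    if ($links === false && filesize($file) > 0) {
        throw new Exception("Could not decode $file, refusing to start empty");
    }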

@ArthurHoaro
Member

Require an RDBMS, file-based like SQLite or client-server-based like MySQL/MariaDB/PostgreSQL (#953)

I'm really close to submitting a PR for #445 (it's working, I'm writing unit tests), which will introduce a proper service layer, allowing us to work towards using an RDBMS.

@ArthurHoaro ArthurHoaro modified the milestones: 0.11.0, 0.11.1 Jul 27, 2019
@ArthurHoaro ArthurHoaro modified the milestones: 0.11.1, 0.11.2 Aug 7, 2019
@daprofessa19

Hi,

I experienced this same issue. I think this is a CRITICAL flaw in how data integrity is maintained.

I was editing a note, and when I applied the changes I got a "Nothing found" message in return.

I am using Shaarli 0.11.1 and my disk ran out of space. I guess this is what caused it to write a 0-byte datastore.php?

@nodiscc
Member

nodiscc commented Apr 14, 2020

my disk ran out of space. I guess this is what caused it to write a 0-byte datastore.php

Running out of disk space will cause some unintuitive bugs, like being unable to log in (wrong username/password is returned) or, in your case, probably writing a 0-byte datastore, yes. Possible solutions to fix data-loss risks are listed above.

@DaarkMoon

I experienced the same issue: adding a new link through the Firefox extension wiped my datastore. Afterwards my datastore.php file had a size of 0 bytes. Interestingly enough, the history.php file also had a size of 0 bytes.

After some tests using a 4 MB RAM disk to simulate a full disk, I successfully corrupted my datastore.
Looking at the code, I see that except in 2 cases, the result of calling file_put_contents is never checked. In most cases there is no impact (test code) or a small impact (configuration loss), but in some cases, like in writeFlatDB, this leads to critical data loss (all links can be lost).

Below is a proposal to use a temporary file to guarantee we always have a valid datasore.

    public static function writeFlatDB($file, $content)
    {
        if (is_file($file) && !is_writable($file)) {
            // The datastore exists but is not writable
            throw new IOException($file);
        }

        if (!is_writable(dirname($file))) {
            // The datastore does not exist and its parent directory is not writable
            throw new IOException(dirname($file));
        }

        // Generate a temporary file name by appending a timestamp with microsecond precision
        $mtime = microtime();
        $timestamp = substr($mtime, 11) . substr($mtime, 2, 9);
        $tempFile = $file . '.' . $timestamp;

        // Compute the payload
        $payload = self::$phpPrefix . base64_encode(gzdeflate(serialize($content))) . self::$phpSuffix;
        $payloadLength = strlen($payload);

        // Write the payload to the temporary file, keeping the number of bytes written
        $bytesWritten = file_put_contents($tempFile, $payload);

        // Write the datastore to the temp file, copy it over the original, then delete
        // the temp file. Stop at the first error: there should be a valid datastore on
        // disk at any point in time.
        if ($payloadLength !== $bytesWritten) {
            // We did not write the expected number of bytes (file_put_contents also
            // returns false on failure, which fails this strict comparison)
            if (is_file($tempFile)) {
                // Delete the temp file if it exists
                unlink($tempFile);
            }
            die(t('Incomplete write to temporary file') . ": $tempFile");
        } elseif (!copy($tempFile, $file)) {
            // copy() raised an error
            die(t('Error when copying temporary file to the datastore') . ": $tempFile -> $file");
        } else {
            // Everything went fine: we can delete the temp file
            unlink($tempFile);
        }

        return $bytesWritten;
    }

@nodiscc nodiscc added the "bug" (it's broken!) and "security" labels and removed the "support" (installation and configuration issues) label May 1, 2020
@nodiscc
Member

nodiscc commented May 1, 2020

Adding the security label because it affects data integrity.

@andreworg

andreworg commented Aug 21, 2020

Below is a proposal to use a temporary file to guarantee we always have a valid datasore.

Thanks!!! This works perfectly and is much better than nothing. If only I had known before.
I think datasore was a typo, but it fits my mood now after having lost more than a year's worth of bookmarks.

@ArthurHoaro ArthurHoaro modified the milestones: 0.12.0, 0.12.1 Sep 3, 2020
@ArthurHoaro
Member

#1570 (adding a mutex on the datastore file) should fix this issue. I'm sorry for everyone who has lost data.
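
For the record, a minimal illustration of the mutex idea using PHP's flock(); the actual implementation in #1570 may differ:

    <?php
    // Hypothetical lock file path; the real fix may lock the datastore itself.
    $lock = fopen('data/datastore.lock', 'c');
    if ($lock !== false && flock($lock, LOCK_EX)) {
        // Exclusive section: read the datastore, apply the change, write it
        // back. Two concurrent writers can no longer interleave here.
        flock($lock, LOCK_UN);
        fclose($lock);
    }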
