Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Lock Support #186

Open
kiler129 opened this issue Apr 11, 2019 · 4 comments
Open

Add Lock Support #186

kiler129 opened this issue Apr 11, 2019 · 4 comments

Comments

@kiler129
Copy link
Contributor

Currently nothing prevents usercrontab from running two copies of presta:sitemaps:dump twice. This will cause race condition for index modification.

I think support for blocking FS-based lock type can be easily added using Symfony Lock, so that still you can generate multiple roots at once but the index operations are locked to only one instance at any given time.

WDYT?

@yann-eugone
Copy link
Member

Hello, Thank you for reporting this issue.

I usually rely on a system based lock which I write in my crontabs, something like :

0 * * * * flock -xn var/crontab/sitemap.lock bin/console presta:sitemaps:dump &>> var/crontab/sitemap.log

Adding a lock support could be possible, but I dislike adding optional dependencies : it's always painful to maintain.

Do you think that flock is doing the job for you ?

@kiler129
Copy link
Contributor Author

@yann-eugone Hi! Thank you for the response!

The flock is how we handle this currently, however we would like to speed up the generation process (because with millions of links it takes "a while") by splitting sitemap to multiple logical sections which can be generated independently.

The whole process of generation in this bundle, as far as I can see, is atomic and indeed multiple sections can be generated at the same time... almost :)
The only missing part is the index update - there's a fair chance of race condition while multiple sections wants to modify the file. That's why I suggested section-level lock where lock blocks until previous section finishes its index write (which should be under a second, so not noticeable). That's why file-based lock in cron doesn't really help in such case.

I agree on maintenance of the additional dependency. If so I was thinking about adding symfony/lock to suggests in composer.json and performing feature-detection like e.g. Symfony Framework Bundle does in bunch of places to adapt. WDYT?

@yann-eugone
Copy link
Member

You want to use the a section based lock to protect the sitemapindex, because you need to speed up your sitemap generation : dumping multiple sections in parallel.

Improving performance is always a good thing.
I'm just afraid of all possible errors this may introduce.
If you had to write such feature, what could be the start point (in the code I mean) ?

About the "optional dependency" thing.
We will have no choice to add a suggest in the composer.json, but the problem is always checking versions, because suggestions does not handle version requirements.
Also, the Lock component is pretty new (since 3.4). So if we go to this feature, we will need our own interface, with a default Null implementation + a symfony/lock adapter for it.

@kiler129
Copy link
Contributor Author

I poked around and you're right about the suggest problem, but I think the solution is simple.

The default dumper does a parallel-sensitive operations within these lines:

if (null !== $section) {
// Load current SitemapIndex file and add all sitemaps except those,
// matching section currently being regenerated to root
$index = $this->loadCurrentSitemapIndex($targetDir . '/' . $this->sitemapFilePrefix . '.xml');
foreach ($index as $key => $urlset) {
// cut possible _X, to compare base section name
$baseKey = preg_replace('/(.*?)(_\d+)?/', '\1', $key);
if ($baseKey !== $section) {
// we add them to root only, if we add them to $this->urlset
// deleteExistingSitemaps() will delete matching files, which we don't want
$this->getRoot()->addSitemap($urlset);
}
}
}
file_put_contents($this->tmpFolder . '/' . $this->sitemapFilePrefix . '.xml', $this->getRoot()->toXml());
$filenames[] = $this->sitemapFilePrefix . '.xml';
// if we came to this point - we can activate new files
// if we fail on exception eariler - old files will stay making Google happy
$this->activate($targetDir);

The lock should be acquired before, since collection & saving of XMLs can happen in the same time. The only sensitive operation is actually updating the index.
What I would do, to not provide the null-implementation of locking is simply make the dumper more modular, meaning separate the parallel-sensitive part to a protected method. From there it's a simple operation to provide two dumpers: lock-aware and lock-unaware (as it is now). The implementation can be either configured manually when people need it (since you need to configure Symfony Lock, you need to have it etc) or detected during container compilation.

Doing such detection sort of off-line during build allows for a great DX - you may get a warning during container compilation (from container extension) saying "hey, if you install Lock we will give you parallel compatibility, otherwise don't run two generations at the same time". As an improvement (which I think is out of scope for this issue) we may use a simple flock to at least detect (and prevent) running two instances at the same time if symfony/lock is not present/configured.

Of course the solution is not idiot-proof: you can still run the same section twice, or run a section + all sections at the same time and break the sitemap potentially.
All in all: I think it's an easy feature to implement, just shift some code in Dumper and provide an LockingDumper which simply wraps the index update function in-between lock calls.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants