Increasing archive throughput #901
It is a thing that has been discussed, but I can't say I have any schedule for implementing it.
Would it be possible to run two instances of plotman archive, each with its own config.yaml and a slightly different target definition?
Yes, sorry, I should have thought to mention that. It detects running transfers based on the site root, so if you just mount to two different directories, that would cut it. I don't think we actually have a configuration path override, though... Definitely a missing feature. I should also have mentioned https://github.com/rjsears/chia_plot_manager. The author uses plotman to plot and their own tooling to do "higher end" archiving (plus whatever other features it has).
Thanks for the info. I just need to double the plotman archive throughput at this point, so I will take a look at chia_plot_manager later. I am now able to run two instances of plotman archive, each working off a different tmp dir (dst drive) and a different site root.
Yes, it would be more convenient to allow a config file override.
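A config file override like the one requested here could look roughly like the following sketch. It is hypothetical and not plotman's actual CLI: the `--config` flag, the `PLOTMAN_CONFIG_PATH` environment variable, and the default path are placeholders chosen for illustration. The precedence is explicit flag, then environment variable, then default.

```python
import argparse
import os
import pathlib
from typing import Mapping, Optional, Sequence

# Placeholder default location; plotman's real default path may differ.
DEFAULT_CONFIG = pathlib.Path.home() / ".config" / "plotman" / "plotman.yaml"

# Hypothetical environment variable name, for illustration only.
ENV_VAR = "PLOTMAN_CONFIG_PATH"


def resolve_config_path(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> pathlib.Path:
    """Pick the config file: --config flag first, then ENV_VAR, then the default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config", type=pathlib.Path, default=None)
    # parse_known_args leaves other arguments (e.g. an "archive" subcommand) alone.
    args, _unknown = parser.parse_known_args(argv)
    if args.config is not None:
        return args.config
    if environ.get(ENV_VAR):
        return pathlib.Path(environ[ENV_VAR])
    return DEFAULT_CONFIG
```

With something along these lines, the two archive instances could hypothetically be started with different `--config` values instead of copying a file into the one location plotman reads before each launch.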
Actually, I think there is an issue there. When one instance is running rsync, the other instance doesn't start rsync. But perhaps because I changed the archive sleep time to 10s, each instance occasionally starts rsync within the same minute. I recall the archive.py code actually checks the transfer script name and argument list. Any suggestions?
plotman does check for the […]. To be clear, yes, we are talking about an annoying, hacky way to get (sort of) what you want. I'm not suggesting this is a good way for plotman to work.
That's exactly how I set this up: two config files with different site_roots (and different buffer drive paths and different log directories). The inconvenience is that I have to copy the desired config file to the only location plotman looks in before starting each plotman instance. The real problem, it seems, is that plotman still detects the rsync process run by the other instance most of the time. The check only seems to fail when the other rsync was started within the same minute. So I either see one rsync running (most of the time) or two rsync processes started at the same hour and minute. Is the code checking the command name (rsync in both instances) AND the site_root (which differs by one character between the instances)? With this setup and this problem, I have increased my archive throughput, but nowhere close to doubling it. The buffer drive Use% is still growing, although much more slowly than when only one instance was running.
See plotman/src/plotman/archive.py, lines 191 to 195, at commit 77f85e3.
What are the actual site roots? Is one just the other plus a character? Perhaps just share both complete config files.
Yes. The second site_root is the first with a 1 appended (${site_root}1). So I suppose it's getting a partial match by using startswith.
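To illustrate the partial-match problem, here is a simplified sketch (not plotman's actual archive.py code) of checking a running transfer's argument list against a site root. The paths `/mnt/archive` and `/mnt/archive1` are hypothetical stand-ins for the two site roots. A plain `str.startswith` check treats `/mnt/archive1/...` as belonging to `/mnt/archive`; comparing on a path-separator boundary avoids that.

```python
from typing import Sequence


def matches_prefix(args: Sequence[str], site_root: str) -> bool:
    """Naive check: any argument that starts with site_root counts as a match."""
    return any(arg.startswith(site_root) for arg in args)


def matches_path(args: Sequence[str], site_root: str) -> bool:
    """Boundary-aware check: the argument is site_root itself or a path inside it."""
    root = site_root.rstrip("/") or "/"
    prefix = root if root.endswith("/") else root + "/"
    return any(arg == root or arg.startswith(prefix) for arg in args)


# Argument list of an rsync started by the *other* instance (hypothetical paths).
other_rsync = ["rsync", "--remove-source-files",
               "/mnt/dst1/plot-k32-a.plot", "/mnt/archive1/plots/"]

print(matches_prefix(other_rsync, "/mnt/archive"))  # True: a false positive
print(matches_path(other_rsync, "/mnt/archive"))    # False
print(matches_path(other_rsync, "/mnt/archive1"))   # True
```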
Is there a function that does an exact match? Or I suppose I can change the site_root to something like appending 2 to it.
Yeah, for now, making it so that neither starts with the other seems best. I'm sure the code could be changed as well.
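Until the matching logic changes, a quick pre-flight check can confirm that no configured site root is a string prefix of another before both instances are started. This is a hypothetical helper, not part of plotman; the paths are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def prefix_conflicts(site_roots: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (prefix, longer) pairs where one site_root string-prefixes another."""
    conflicts = []
    for a, b in combinations(site_roots, 2):
        if b.startswith(a):
            conflicts.append((a, b))
        elif a.startswith(b):
            conflicts.append((b, a))
    return conflicts


print(prefix_conflicts(["/mnt/archive", "/mnt/archive1"]))    # [('/mnt/archive', '/mnt/archive1')]
print(prefix_conflicts(["/mnt/archive_a", "/mnt/archive_b"]))  # []
```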
OK. After renaming the two site_roots so that neither starts with the other, both plotman archive instances fire up an rsync regardless of the other instance. To summarize, I guess a few things would be good to have:
FYI, I'm using plotman archive only, and it's been working well.
Discussed in #900
Originally posted by jayhohoho2019 on August 8, 2021
Hello,
Which parameter, if any, controls the archive polling period? Believe it or not, 1 minute is now too long for me. Thanks.
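Making the polling period configurable amounts to replacing a fixed sleep with a parameter. The sketch below is a generic loop under that assumption; `run_archive_loop`, `step`, and `poll_seconds` are hypothetical names, and `step` stands in for one archive pass. Injecting `sleep` lets the loop be exercised without actually waiting.

```python
import time
from typing import Callable, Optional


def run_archive_loop(
    step: Callable[[], None],
    poll_seconds: float = 60.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call step() every poll_seconds; stop after max_iterations if given."""
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        step()  # one archive pass (hypothetical placeholder)
        iterations += 1
        sleep(poll_seconds)
    return iterations


# Example: three passes at a 10 s period, with sleep stubbed out.
passes = []
slept = []
run_archive_loop(lambda: passes.append("pass"), poll_seconds=10,
                 max_iterations=3, sleep=slept.append)
print(len(passes), slept)  # 3 [10, 10, 10]
```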
In addition to making the archive-mode polling period configurable, could we allow running multiple rsync processes in parallel? I'm referring only to the local_rsync mode at this point.
My issue is that, with a fast plotter in use, it now takes longer to rsync a plot from my dst drive (NVMe SSD) to the final archive HDD than to create a plot and save it to the dst drive, so the dst drive fills up after a while. Running 2 rsync processes to 2 archive HDDs would solve this.
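Running several local rsync transfers in parallel could be organized as one transfer per idle archive drive. The sketch below is hypothetical: the drive paths, the `busy_drives` list, and the use of rsync's `--remove-source-files` flag are assumptions, not plotman's actual local_rsync behavior. The planner is kept separate from process launching so it can be checked without running rsync.

```python
import subprocess
from typing import Dict, List, Sequence


def plan_transfers(
    pending_plots: Sequence[str],
    drives: Sequence[str],
    busy_drives: Sequence[str],
) -> Dict[str, str]:
    """Map each idle archive drive to one pending plot (at most one plot per drive).

    pending_plots should exclude plots that are already being transferred.
    """
    busy = set(busy_drives)
    idle = [d for d in drives if d not in busy]
    # zip stops at the shorter list, so no plot or drive is used twice.
    return dict(zip(idle, pending_plots))


def rsync_command(plot: str, drive: str) -> List[str]:
    # Assumption: --remove-source-files frees space on the dst drive after the copy.
    return ["rsync", "--remove-source-files", plot, drive.rstrip("/") + "/"]


def start_transfers(plan: Dict[str, str]) -> List[subprocess.Popen]:
    """Launch one rsync per planned drive; Popen returns immediately, so they run concurrently."""
    return [subprocess.Popen(rsync_command(plot, drive)) for drive, plot in plan.items()]


plan = plan_transfers(
    ["/mnt/nvme/a.plot", "/mnt/nvme/b.plot", "/mnt/nvme/c.plot"],
    ["/mnt/hdd1", "/mnt/hdd2", "/mnt/hdd3"],
    busy_drives=["/mnt/hdd2"],
)
print(plan)  # {'/mnt/hdd1': '/mnt/nvme/a.plot', '/mnt/hdd3': '/mnt/nvme/b.plot'}
```

Limiting each drive to one transfer at a time keeps HDD writes sequential, while adding drives scales total archive throughput.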