Flag for automatic Blocksize in commandline #171
Comments
What is the "best" though? Generally the ideal size/count depends on your preference for performance, error resilience and size of recovery (if a percentage wasn't specified).
A better explanation in the help may be useful. For a general overview, PAR2 is block based, in the sense that the data is broken up into blocks. A single error corrupts the entire block that contains it, meaning that larger blocks are less resilient against random errors.
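To make the block-size trade-off concrete, here is a minimal sketch (not par2cmdline code; the error offsets and block sizes are made-up illustration values). It counts how many blocks are hit by a few single-byte errors and how much data would have to be recovered as a result.

```python
def damaged_blocks(error_offsets, block_size):
    """Return the set of block indices hit by at least one corrupt byte."""
    return {offset // block_size for offset in error_offsets}


# Hypothetical example: three single-byte errors scattered across a ~1 MB file.
errors = [10, 300_000, 900_000]

for block_size in (4_096, 524_288):
    hit = damaged_blocks(errors, block_size)
    # Every damaged block must be rebuilt in full, however small the error.
    bytes_to_recover = len(hit) * block_size
    print(f"block size {block_size}: {len(hit)} damaged block(s), "
          f"{bytes_to_recover} bytes to recover")
```

With the small block size the three errors damage three blocks but only 12,288 bytes need rebuilding. With the large block size they damage two blocks, yet over 1 MB has to be rebuilt.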
From memory, there's no analysis, it just picks 2000 as the count. Personally I think it should be removed, as there's no reasoning behind the '2000' figure, and it can make people think that it's a good value to use (when there's really no reason to believe so). |
@animetosho Yeah, the absolute best scenario is for par2cmdline to select the appropriate block count / size based on the file structure and size. But yes, enhanced documentation in the man pages, with a formula to come up with a value for block count or size, would definitely help. Or even a flag for Low, Medium or High values. Since I am not specifying any switches (-s or -b), my block count would be 2000. Is there any formula that you use to come up with a value for block count / size? I would also like to know if 2000 was chosen as a best figure, and if there is a specific reason that we are not seeing. @BlackIkeEagle @mdnahas |
Firstly, I don't know what you mean by "file structure", but I think you're assuming that an "ideal" setting exists, when there's no such thing.
Like above, you'd have to define what these actually mean.
Amount of recovery data = recovery block count * block size
You need to supply two of the values above to work out the third.
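As a rough sketch of that relationship (the function names and the numbers are hypothetical, and this is not how par2cmdline is implemented), any one of the three values can be worked out from the other two. It ignores the packet headers and metadata that PAR2 files also contain, so real files come out slightly larger.

```python
def recovery_bytes(recovery_blocks, block_size):
    """Amount of recovery data = recovery block count * block size."""
    return recovery_blocks * block_size


def recovery_blocks_needed(wanted_bytes, block_size):
    """Recovery block count for a target amount of recovery data (rounded up)."""
    return -(-wanted_bytes // block_size)


def block_size_for(wanted_bytes, recovery_blocks):
    """Block size for a target amount of recovery data and a recovery block count."""
    return wanted_bytes // recovery_blocks


# Hypothetical numbers: 10% recovery data for 10 GB of input.
data_bytes = 10 * 10**9
wanted = data_bytes * 10 // 100              # 1 GB of recovery data

size = block_size_for(wanted, 200)           # pick 200 recovery blocks
count = recovery_blocks_needed(wanted, size)
print(size, count, recovery_bytes(count, size))
```

The PAR2 format also requires the block size to be a multiple of 4, which this sketch does not enforce.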
It's not. |
@animetosho Many thanks for the insights and guidance. Amount of recovery data = recovery block count * block size. That means, for a 10GB file, if I am setting a recovery record percentage of 10%
|
Yes, that's correct.
Each input block cannot contain more than one file, which means that you'll at least need a block for each file.
I can't really say with the given details. They have a lot of files, and if their files vary greatly in size, you'll get a lot of inefficiency due to files needing to be padded to block lengths. For example, say you have two files: a 1GB file and a 1B file. The minimum number of input blocks is 2, but let's say you choose to have 3 blocks instead. The 1GB file will be broken up into two blocks, each 500MB, whilst the 1B file will consume the third block. Since the block size chosen here is 500MB, the 1B file will consume a whole 500MB block, despite only being 1B in size.
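The padding effect in that example can be checked with a few lines of arithmetic. This is only an illustrative sketch using the numbers above (with 1GB taken as 10^9 bytes); the helper names are made up and it is not par2cmdline code.

```python
def blocks_per_file(file_sizes, block_size):
    """Each block holds data from one file only, so every file rounds up."""
    return [-(-size // block_size) for size in file_sizes]


def padding_bytes(file_sizes, block_size):
    """Bytes of padding needed to fill each file's last block."""
    return sum(
        blocks * block_size - size
        for size, blocks in zip(file_sizes, blocks_per_file(file_sizes, block_size))
    )


files = [10**9, 1]            # a 1GB file and a 1B file
block_size = 500 * 10**6      # 500MB blocks

print(blocks_per_file(files, block_size))   # blocks used by each file
print(padding_bytes(files, block_size))     # padding across all files
```

Here the 1GB file fills its two blocks exactly, while nearly all of the third block is padding.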
Assuming you specify a percentage, it doesn't matter, because the one you don't specify (count or size) will be calculated from the other.
As previously mentioned, it depends on your preference for performance and error resilience.
I recall the default in current versions is actually based on the amount of memory available on your system (which can be problematic if you have too much RAM available). What's ideal depends on the RAM you have available and how much you're willing to dedicate to par2cmdline. I probably wouldn't go past several GB though, as the benefit you get from reduced I/O quickly diminishes. |
@animetosho What I am specifying is the -r switch with a value of 10; I am not specifying either the size (-s) or the block count (-b). It would then default to a block count of 2000, and the size gets derived from the 10% (-r) and the default count of 2000 (-b). Is there any way for me to come up with the block count or size? |
I really don't understand what you're asking there. |
@animetosho Many thanks, I have learned quite a lot from this thread through your engagement. By any chance, is there a way, through the command line or by other means, to analyze already-created parity files to find out what block count and size they were created with? |
I don't recall par2cmdline offering any feature to show info of a PAR2. |
If I may interject from a 'help desk' perspective: @the123blackjack may be coming from the position of the archetypal (intelligent and domain-inexperienced) new user. From that perspective, par2cmdline is somewhat difficult to interact with. One cannot reliably do From that perspective:
Can be answered by "whatever is a (non-optimized) value that successfully creates the parity set without using an unexpected amount of disk space". Users in this category are not usually looking for the optimal number. They are looking for the "right now" number. In my experience this is normally because they are in step 2 of the journey that goes "I hopefully know enough to find a tool that does what I want" > running the command the first time and seeing what happens > checking whether it did what is wanted > starting a (probably endless) loop of { reading documentation > tweaking the options > refining the result > asking questions }. They know that there is an almost infinite number of tools that just will not accomplish what they need, and so those first couple of steps are to find out if a specific tool is worth looking into further. This type of user can actually be a big benefit because they can call attention to the challenges that potential users face before silently giving up (opening up an opportunity for the project maintainers to address them), and as they get more experience they can 'grow into' users who add significant value to the project itself. |
Not trying to be mean, but that's a lot of talk without providing any workable solution. I wouldn't describe 'best' as a "right now" number, but even if I use the latter, I have no clue how to approach that. Don't get me wrong - I'm all for making things friendlier to users and seriously value it myself. And yeah, I get that understanding these parameters does require some effort - something I'd wish wasn't necessary. |
I wasn’t trying to shout at all, but instead to first build a bridge for understanding. Until they reply I’m not sure whether @the123blackjack actually agrees with the perspective I’m sharing, but in my experience these kinds of discussions do help reach the solution. I’m all for helping provide a workable solution, so let me add this suggestion for a non-optimized auto mode: number of blocks = number of files The “minimum block size” would be at or above the threshold where processing times get significantly higher. I have no expertise here, but I suspect that there’s a reasonable number that’s already known. Notes:
|
Well firstly, appreciate the suggestion.
This almost certainly won't work because the settings likely conflict with each other. Note that 'number of blocks' and 'block size' are related, so you generally only specify one of those values and the other is derived.
Are you expecting the user to supply a percentage? Your earlier example excluded this. |
Aha. My understanding needs some refinement. I was going in the wrong direction based on the fact that each file needs its own block. Instead: block size as I mentioned, and then number of blocks is derived.
You’re right. Apologies for polluting the conversation. I’m not expecting a user to specify a percentage at first, though it is something that I expect they will tweak fairly soon in the learning process. I don’t know what par2cmdline currently does to determine the amount of parity data it adds, but whatever that is is fine. Mainly I brought that point in to highlight the likely user expectation about disk space used. The example above:
I think automatic values should check to see if a “reasonable” amount of disk space will be used and adjust accordingly, thus avoiding a “wtf?” user reaction. |
Assuming you mean "block size = size of the smallest file", and ignoring the minimum value part for now.
It requires the user to specify the amount.
I do think some warning could be added if efficiency falls below a certain threshold, although explaining it concisely may be a challenge. |
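For what it's worth, a warning like that could be as simple as the following sketch. Everything here is an assumption for illustration: the function names, the 50% threshold and the message wording are hypothetical, and nothing like this exists in par2cmdline as far as this thread establishes.

```python
def input_efficiency(file_sizes, block_size):
    """Fraction of the input block space that holds real data (not padding)."""
    total_blocks = sum(-(-size // block_size) for size in file_sizes)
    if total_blocks == 0:
        return 1.0  # nothing to protect, so nothing is wasted
    return sum(file_sizes) / (total_blocks * block_size)


def efficiency_warning(file_sizes, block_size, threshold=0.5):
    """Return a warning string if efficiency is below the (assumed) threshold."""
    efficiency = input_efficiency(file_sizes, block_size)
    if efficiency < threshold:
        return (f"Warning: only {efficiency:.0%} of block space holds real data; "
                f"a smaller block size may be more efficient.")
    return None


# Hypothetical input: one 1GB file plus 100 one-byte files, 100MB blocks.
files = [10**9] + [1] * 100
print(efficiency_warning(files, 100 * 10**6))
```

The hard part is not the arithmetic but choosing a threshold and wording that a first-time user can act on.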
I'm comparing the experience of A first time user can quite easily and simply call In the example of As mentioned above: the 2000 block default is arbitrary, but I suspect that there is a formula that can be found that will make a good enough estimate for 80% of use-cases. In the example of running create on one large file with a bunch of smaller ones: the user can run in to a block size that causes large PAR2 files with minimal protection. I'm not sure if the answer to this is simple (I suspect less simple than the 2000 check), but it could even be a warning of "This will result in 29GB of parity files and 1% protection. Proceed?". |
How so?
As mentioned earlier, you're going to have to be specific if you want to get anywhere. Unfortunately, you can't code based on wishful thinking alone. |
It already works. In my using it to learn more as a part of this thread the combined sizes of the PAR2 files have been <10% of the file they were created from while still providing meaningful protection (I did not track the percentage of protection for each, but minor (intentional) errors were corrected without errors)
While some might see this response as frustrated, I appreciate that you’re willing to take the hard stances to avoid wasting time. To be clear: I don’t yet have the facts that someone would need to make a ruleset. I have been asking direct questions and proposing ideas as a part of uncovering those facts. It also does not seem like there is such a thing as a “complete set of logic, with exact thresholds” because of how much variability there is for new users who wish to create parity files from their files. In this thread I do see progress towards finding a set of logic and formula for 80% of use-cases. At that point it could be proposed or submitted as a PR. We have already identified some edge cases, identified the necessary variables, and identified at least two rules that would be part of the final calculation. |
Hi @BlackIkeEagle @mdnahas
I mainly use par2 for creating parity for family photos and videos
This is an enhancement request for par2cmdline to choose the best possible block size and count based on an analysis of the file structure and sizes, as I (and other users) may be unsure of what should be set.
Command I currently use (without the -s or -b switches):
par2 create -r10 "parchivename" "FilesToCreateParity*.extention"
What i hope for is
Size is it based on analysis of file size or a default value.
Hoping for your valuable feedback and thoughts...