Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more specific return code / rc / error codes of the borg process #6756

Closed
ThomasWaldmann opened this issue Jun 6, 2022 · 37 comments
Closed
Assignees
Milestone

Comments

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Jun 6, 2022

Currently, borg has only a few, rather generic return codes: 0 ok, 1 warning, 2 error, >=128 for termination due to signals.

This could be improved (preferably with a breaking release) by moving some stuff to more specific codes. Everything that has no specific code assigned stays at the current code.

First task is to go through the issue tracker and link to all issues talking about (more specific) error codes from here.

Then it needs some planning: what makes sense, what does not, ...

When planning is finished and there is some consensus, implementation can follow.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jun 6, 2022

I'll update this list here with whatever we come up with:

# generic return codes used by borg
0 OK
1 WARNING
2 ERROR
# 3 .. 63 reserved for more specific errors
3 repository is locked
4 connection error  # e.g. ssh returned 255 (other return codes == ?)
5 quota limit reached / no space left in file system
...
# 64 .. 127 reserved for more specific warnings
...
128+N killed by signal N (N == 0..64 for posix signals / realtime signals?)
# 200 .. 255 reserved for informative RCs (like "0 OK", but more specifically telling something)

If a wrapper is unaware of a specific return code, it must treat it like 2 ERROR and notify the user.

An updated wrapper at least needs to implement a range check for the new error / warning / info ranges.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jun 6, 2022

I'll update this list with other related issues we find:

@fantasya-pbem
Copy link
Contributor

I'd like to suggest an enhancement of error codes.

We should at least reserve or define a broader range of codes that could be implemented in future versions, for example:

1-63: warning (1 meaning "general warning" as it is now)
64-127: error (64 meaning "general error")

(128+N stays unchanged)

If we don't want to implement more detailed return values yet, at least we would change rc 2 to 64.
Users would have to change their wrappers from rc 2 to rc >= 64 to recognize errors and could keep the logic for warning.

If we want, we could already define some sub-ranges, e.g. "network errors", "disk related", "quota related" and the like.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jun 6, 2022

No, rc 0 1 2 need to stay as is. There are enough free codes without changing these.

@fantasya-pbem I've updated #6756 (comment) .

@ThomasWaldmann
Copy link
Member Author

hmm, if we only reserve space for warnings/errors, we only have 0 for success / information. not sure if that is a problem / limitation in practice.

@fantasya-pbem
Copy link
Contributor

We could reserve the range from 128+64 onwards, maybe 200+ for possible success codes, if we ever have the need for this. Posix signals / real-time signals have max code 32 / 64.

@fantasya-pbem
Copy link
Contributor

fantasya-pbem commented Jun 6, 2022

This proposal collects possible error/warning cases that can occur. Some may not be detectable or make no sense with the current implementation.

Specific errors

3 unknown cli option
4 pattern syntax error
5 repository not found
6 archive not found
7 locked repo (taken from #5158)
10 borg init unsupported encryption mode or key algorithm
11+ more init or repo related
15 borg check issues could not be fixed
16+ more check related
20 borg create failed, checkpoint archive left
21+ more create related
30 repository quota reached
31 backup disk full
32 decryption failure
33 decompression failure
34 checksum failure (if applicable)
35 missing chunk
36 segment not found
37 hash not found in index
38 unsupported key
39+ more disk or data structure related
50 remote error due to network error like "broken pipe" (taken from #1424)
51+ more specific SSH errors, needs investigation (taken from #3939)

Specific warnings

64 cannot combine cli options
65 lock removed
66+ more lock related
70 borg check --max-duration stopped
71 borg check --repair fixed issues (taken from #3446)
72 repository check found issues (only useful if repo check issue prevents archive check)
73 archive check found issues (only useful if archive check issue prevents verify)
74 verify check found issues
75+ more check related
80 cache rebuilt occurred / cache out of sync
81+ more cache related
85 borg create could not find root path (inspired by #6708)
86 borg create could not read files
87 file changed during borg create (inspired by #6657)
88+ more create related

Comments welcome!

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jun 6, 2022

For errors (fatal stuff that leads to immediate termination) we can deal rather easily with them: raise an Exception, let the exit handler deal with the exception.

For warnings (stuff borg notices while processing, but does not immediately terminate), we could encounter multiple different warnings. How to deal with that? Collect them in a list? If len(warnings) > 1, just give rc 1, otherwise the more specific code? Same for informations.

@m3nu
Copy link
Contributor

m3nu commented Jun 6, 2022

No, rc 0 1 2 need to stay as is.

Thanks for the heads-up! We mostly use 0, 1 and 2 at the moment, but the new ones are for sure interesting to provide better feedback in the UI. Tracking adjustments here: borgbase/vorta#1349

@sophie-h
Copy link
Contributor

sophie-h commented Jun 8, 2022

At first glance, this seems redundant to message ids. Pika is already making use of them. Therefore I don't see the use case yet.

(I would like to add msg ids to a bunch of existing warnings though 😆)

@fantasya-pbem
Copy link
Contributor

fantasya-pbem commented Jun 9, 2022

Use case is for the shell people that don't want to parse JSON. Thank you for the message ids link, I wasn't aware of that yet. We could associate some of those IDs to return codes.

@fantasya-pbem
Copy link
Contributor

fantasya-pbem commented Jun 14, 2022

New errors and warnings tables, return codes and message IDs associated (best guess).

Code Message ID Error
3+ CLI/patterns
Unknown CLI option
Pattern syntax error
8+ Repository
Repository.AlreadyExists Repository {} already exists.
Repository.CheckNeeded Inconsistency detected. Please run “borg check {}”.
Repository.DoesNotExist Repository {} does not exist.
Repository.InsufficientFreeSpaceError Insufficient free space to complete transaction (required: {}, available: {}).
Repository.InvalidRepository {} is not a valid repository. Check repo config.
Repository.AtticRepository Attic repository detected. Please run “borg upgrade {}”.
Repository.ObjectNotFound Object with key {} not found in repository {}.
Repository quota reached
Backup disk full
Missing chunk
Segment not found
Hash not found in index
20+ Archive
Archive.AlreadyExists Archive {} already exists
Archive.DoesNotExist Archive {} does not exist
Archive.IncompatibleFilesystemEncodingError Failed to encode filename “{}” into file system encoding “{}”. Consider configuring the LANG environment variable.
borg create failed, checkpoint archive left
25+ Key/Security
KeyfileInvalidError Invalid key file for repository {} found in {}.
KeyfileMismatchError Mismatch between repository {} and key file {}.
KeyfileNotFoundError No key file for repository {} found in {}.
PassphraseWrong passphrase supplied in BORG_PASSPHRASE is incorrect
PasswordRetriesExceeded exceeded the maximum password retries
RepoKeyNotFoundError No key entry found in the config of repository {}.
UnsupportedManifestError Unsupported manifest envelope. A newer version is required to access this repository.
UnsupportedPayloadError Unsupported payload type {}. A newer version is required to access this repository.
NotABorgKeyFile This file is not a borg key backup, aborting.
RepoIdMismatch This key backup seems to be for a different backup repository, aborting.
UnencryptedRepo Keymanagement not available for unencrypted repositories.
UnknownKeyType Keytype {0} is unknown.
40+ Locking
LockError Failed to acquire the lock {}. (solves #5158)
LockErrorT Failed to acquire the lock {}. (solves #5158)
45+ Caching
Cache.CacheInitAbortedError Cache initialization aborted
Cache.EncryptionMethodMismatch Repository encryption method changed since last access, refusing to continue
Cache.RepositoryAccessAborted Repository access aborted
Cache.RepositoryIDNotUnique Cache is newer than repository - do you have multiple, independently updated repos with same ID?
Cache.RepositoryReplay Cache is newer than repository - this is either an attack or unsafe (multiple repos with same ID)
50+ Check
borg check issue could not be fixed
55+ Misc
Buffer.MemoryLimitExceeded Requested buffer size {} is above the limit of {}.
ExtensionModuleError The Borg binary extension modules do not seem to be properly installed
IntegrityError Data integrity error: {}
NoManifestError Repository has no manifest.
PlaceholderError Formatting Error: “{}”.format({}): {}({})
Decryption failure
Decompression failure
64+ Network/Remote
ConnectionClosed Connection closed by remote host (solves #1424)
InvalidRPCMethod RPC method {} is not valid
PathNotAllowed Repository path not allowed
RemoteRepository.RPCServerOutdated Borg server is too old for {}. Required version {}
UnexpectedRPCDataFormatFromClient Borg {}: Got unexpected RPC data format from client.
UnexpectedRPCDataFormatFromServer Got unexpected RPC data format from server: {}
80+ More specific SSH errors, needs investigation (taken from #3939)
Code Message ID Warning
100+ CLI/patterns
Cannot combine cli options
103+ Repository
Repository.ObjectNotFound Object with key {} not found in repository {}.
106+ Locking
Lock removed
110+ Caching
Cache rebuilt occurred / cache out of sync
115+ Check
borg check --max-duration stopped
borg check --repair fixed issues (taken from #3446)
Repository check found issues (only useful if repo check issue prevents archive check)
Archive check found issues (only useful if archive check issue prevents verify)
Verify check found issues
borg create could not find root path (inspired by #6708)
borg create could not read files
File changed during borg create (inspired by #6657)

@ThomasWaldmann
Copy link
Member Author

Is there some systematic in the Code (guess it means the return code, rc) column (except that >= 64 is a warning)?

Where do the numbers seen there come from?

@fantasya-pbem
Copy link
Contributor

Not yet. Now, after having integrated the message IDs, the codes have to be distributed more evenly. But first I want to check these IDs - are they all really errors that abort Borg or are some of them progress messages? Second, after we have found that the suggested groups make sense and we have defined all possible (and recognizable) error / warning conditions, we can distribute the return codes appropriately.

@fantasya-pbem
Copy link
Contributor

I found that most message IDs are just raised errors, but at least Repository.ObjectNotFound is used in a warning context, too (like in archive.delete() where this error is catched and just ignored). After reordering the groups there seem to exist much more errors than warnings, so I suggest to have the warnings starting with return code 100 (or at least 80) instead of 64.

@sophie-h
Copy link
Contributor

It's not unusual to have different warnings like "File changed during borg create" or "File not found" during one create. I'm not sure how return codes would handle that.

Overall, I'm a bit scared of the shell script that will handle all of these errors. Since the number of exit codes is quite limited I don't really see that this concept scales. I would suggest writing down the use case more specifically to see how this could actually work.

@ThomasWaldmann ThomasWaldmann added this to the 2.0.0b1 milestone Jul 16, 2022
@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0b1, 2.0.0b2 Jul 27, 2022
@ThomasWaldmann
Copy link
Member Author

@fantasya-pbem are you still working on this?

@fantasya-pbem
Copy link
Contributor

No, and nobody besides me seems to have interest, too. My "work" has been collecting possible error codes,the real work had to be done by developers of Borg. Feel free to close this issue.

@ThomasWaldmann
Copy link
Member Author

@fantasya-pbem Well, my impression was that there definitely is some interest (see also misc. other related tickets) and I was primarily waiting for you to finish your work, see #6756 (comment) .

I guess I or someone else could then implement this, but having a good plan first is essential. In case we get some funds on the bountysource org account, there could be even one or two bounties.

Personally I will be rather busy with #6987 for a while, so help here is appreciated!

@fantasya-pbem
Copy link
Contributor

fantasya-pbem commented Aug 23, 2022

See #6756 (comment). Also: It seems that warning return codes are less useful as there is likely more than one condition to be represented. Error codes make more sense here I guess, maybe we could concentrate on errors first. Then, the next task would be to check the code where these errors can occur: Is there an exception thrown / is Borg aborted? If so, the respective error code could be returned directly.

@ThomasWaldmann
Copy link
Member Author

ERRORs are rather simple, WARNINGs more difficult / unclear yet how to best deal with.

@real-yfprojects
Copy link

ERRORs are rather simple

I think you have to walk through the relevant parts of the code base and specify the return code for a the different problems resulting in an error. Or is a their a way to easily map existing exceptions and message ids to return codes in one place?

@ThomasWaldmann
Copy link
Member Author

@real-yfprojects yes. it's not necessarily an exception, can be also just an if+ return.

@mai-schlau
Copy link

Are there plans to have a own RC for the borg create message "file changed while we backed it up", which comes now up with RC1?

@ThomasWaldmann
Copy link
Member Author

@mai-schlau that is a warning (not: error) message and considering there might be multiple and different warnings (plus potentially even some real error), this is a more difficult case than the "real error" return codes and IIRC we did not discuss yet how to handle that.

@mai-schlau
Copy link

As we have multiple systems, which produce this "file changed while we backed it up" and which still have to be backuped it would be very helpful if these messages would produce a different RC than 1.
So it would be easier and more handy to distinguish this occurrence from others like 'no such path' which in our environment requests action.

But maybe, and this is only a simple suggestion, something like RC 1-1 for the "changed while" and a simple 1 for the rest would be an approach. Further RC1s like 'no such path' could also come up with RC 1-2 and maybe the list could be filled till 1-9.

Especially the RC1-1 for the "file changed while we backed it up", would make the messages and our monitoring a lot clearer ...

@borgbackup borgbackup deleted a comment from real-yfprojects Nov 9, 2023
@borgbackup borgbackup deleted a comment from real-yfprojects Nov 9, 2023
@borgbackup borgbackup deleted a comment from i1sm3ky Nov 9, 2023
@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Nov 9, 2023

idea about how to deal with warnings

As there can be multiple warnings in a borg run, it is difficult to map all this information to a few bits in a 8bit return code.

But what we could do is:

  • if there is an error, set error to its return code (> 0).
  • collect all warnings in a list.
  • to start simple, the elements would be numerical return codes.
  • later, if needed, the elements could also be (rc, warning message) tuples.
  • compute borg process rc like this:
def compute_rc(error, warnings):
    if error > 0:
        # there was a fatal error, return its error code
        return error
    if not warnings:
        # great: no error and also no warnings
        return 0
    rcs = set(warnings)
    if len(rcs) == 1:
        # easy: there was only one kind of warning, so we can be specific
        return rcs[0]
    # there were different kinds of warnings
    return 1  # generic warning rc, user has to look into the logs

We could think about whether it makes sense to have the same rc for different kinds of similar warnings, so that the specific warning rc case in that computation is used more frequently than the fallback to rc 1 ("find it out yourself!")

@ThomasWaldmann ThomasWaldmann self-assigned this Nov 9, 2023
@borgbackup borgbackup deleted a comment from real-yfprojects Nov 9, 2023
@borgbackup borgbackup deleted a comment from real-yfprojects Nov 9, 2023
@borgbackup borgbackup deleted a comment from i1sm3ky Nov 9, 2023
@sophie-h
Copy link
Contributor

sophie-h commented Nov 9, 2023

Couldn't see this being mentioned before: If any exit codes are added, I would strongly advice to not use exit codes that might appear independent of borg. Especially 128+n for signals like SEGFAULT.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Nov 9, 2023

@sophie-h sure, only 3..127 will be used (and 0, 1, 2 when nothing more specific is known).

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Nov 15, 2023

PR #7928 is not quite finished, but guess some early review / feedback would be helpful!

I did a PR against 1.2-maint, but guess that needs some good review and testing before we can decide whether it can go into 1.2.x releases - the change got a bit bigger than I thought...

Update:

I suspect that this has a slight potential to either break existing scripts or have some new bugs in the way errors and warnings are dealt with internally in borg - so maybe this is not suitable for a 1.2.x patch release which is expected to be rather stable and boring.

1.3.0* has some tags in the git repo, but that was only an early attempt of what later became borg 2.0.0*, so guess we better ignore/skip that version number to not confuse users or developers.

Thus, borg 1.4.0 could be the next near-future and rather stable release (borg2 will still take quite a while) and we could have some alpha / beta / rc phases first before releasing that. The increment in the minor number might also motivate more people to look into the changelog and maybe adapt their scripts for the new return codes. We could even use the new return codes by default, not sure.

What do you think about that?

@ThomasWaldmann
Copy link
Member Author

Related: #7080 (msgid for file changed warning)

@ThomasWaldmann
Copy link
Member Author

Guess everybody was too busy to comment, so I just went ahead and put this into first borg 1.4.0 milestone.

@ThomasWaldmann
Copy link
Member Author

borg 1.4 fixed by PR #7976 - this definitely will need practical testing!

@ThomasWaldmann ThomasWaldmann modified the milestones: 1.4.0a1, 2.0.0rc1 Jan 1, 2024
@ThomasWaldmann
Copy link
Member Author

borg2 / master branch now got all the changes forward-ported, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants