
ObjectService.Search server response is always 0 OK even on partial success #2721

Open
cthulhu-rider opened this issue Jan 17, 2024 · 15 comments
Labels: bug (Something isn't working), I3 Minimal impact, S2 Regular significance, U2 Seriously planned

Comments

@cthulhu-rider
Contributor

When executed over a multi-node container, the object SEARCH op can succeed only partially by design. The current server behavior looks buggy.

Steps to reproduce

With a NeoFS setup of at least 2 online storage nodes (N1 and N2), I did the following (a sketch of the commands follows the list):

  • created container C with a REP 1 CBF 2 storage policy (select any 2 storage nodes and store 1 replica of each object on them)
  • created and saved 2 objects: O1 ended up on N1, O2 on N2
  • searched for the C objects
  • shut down N2
  • searched again
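
A minimal sketch of those steps with neofs-cli (a sketch only: wallet/endpoint settings are assumed to live in neofs_cli.yaml, C/O1/O2 are placeholders, and exact flag names may vary between versions):

$ neofs-cli -c neofs_cli.yaml container create --policy "REP 1 CBF 2" --await
# prints the new container ID, used as C below

$ neofs-cli -c neofs_cli.yaml object put --cid C --file ./o1.bin   # one replica, assumed to land on N1 (O1)
$ neofs-cli -c neofs_cli.yaml object put --cid C --file ./o2.bin   # assumed to land on N2 (O2)

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m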

Expected Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
INCOMPLETE: number of unavailable container nodes = 1
$ echo $?
ERR_CODE

Current Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
$ echo $?
0

Possible Solution

Respond with a specific non-zero status code plus the number of unavailable container nodes; respond with 0 OK only when all container nodes have responded.
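
A sketch of how a client script could consume such a status, assuming the hypothetical non-zero code is propagated as the CLI exit status (the ERR_CODE placeholder above) and that credentials are supplied non-interactively:

neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m > results.txt
rc=$?
if [ "$rc" -ne 0 ]; then
    # hypothetical semantics: non-zero exit means some container nodes
    # did not respond, so results.txt may be missing matching objects
    echo "WARN: possibly incomplete search (exit code $rc)" >&2
fi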

Context

I dove into the SEARCH server code within #2692, but I don't think this really matters here.

Regression

No

@cthulhu-rider cthulhu-rider added the bug Something isn't working label Jan 17, 2024
@roman-khimov roman-khimov added U4 Nothing urgent S2 Regular significance I3 Minimal impact labels Jan 17, 2024
@roman-khimov
Member

We can say that the current SEARCH is best-effort. Even if you get this additional data, there is not a lot you can do. Especially if you're to consider the case of N2 leaving the network map after some period of time.

@cthulhu-rider
Contributor Author

Even if you get this additional data, there is not a lot you can do

Best-effort, yes, but the client needs to understand how the operation ended. As a client, I want to distinguish complete results from results that may be incomplete for any reason. That's what I expect from a responsible system. What to do with this information is a personal matter of each client.

If you mean not reporting how many nodes are unavailable: yeah, that's more like sugar for now. Being a free-text message, no application will want to parse its content. At the same time, it can be very useful in test/debug setups. Depends.

N2 leaving the network map after some period of time

That is another plane of the system: as the network map changes, the execution of the storage policy changes with it. This issue discusses the state transfer at op execution time.

@carpawell
Member

I want to distinguish complete results from results that may be incomplete for any reason

As a node, how do you distinguish a disk/shard failure? Or a temporarily unstable state caused by object migrations after another node goes offline/online? To me, any GET/PUT operation should then be fixed too (and I do not agree with that).

@cthulhu-rider
Contributor Author

any GET/PUT operation should then be fixed too

@carpawell, please clarify what fix you're talking about. GET/PUT cannot finish with partial success.

Let's focus on the proposed expected behavior of SEARCH first, then we'll develop the ideas further. SEARCH is currently the only best-effort op in the protocol.

@carpawell
Member

@carpawell, please clarify what fix you're talking about. GET/PUT cannot finish with partial success.

If you receive ObjectNotFound from a GET request, do you think the object is completely missing? Or do you expect that some nodes may be down and the object may become available later? The current API gives no info about which case it is. PUT's IncompleteObjectPut is also undetailed; REP 0 is also an incomplete object put.

SEARCH is currently the only best-effort op in the protocol

Basically, I mean that our protocol is best-effort across the board, so I cannot agree with you.

@cthulhu-rider
Contributor Author

@carpawell, you're talking about clarification of failure responses, while this issue is about partially successful ones. Neither PUT nor GET can end with partial success now. Different topics to me.

but since you asked

If you receive ObjectNotFound from a GET request, do you think the object is completely missing? Or do you expect that some nodes may be down and the object may become available later?

PUT and GET responsiveness could also be improved

Basically, I mean that our protocol is best-effort across the board

I wouldn't generalize like that. At the moment, only SEARCH is such an op.

@carpawell
Member

carpawell commented Jan 25, 2024

while this issue is about partially successful ones

GET and PUT can also be partially successful (every node is OK with the ACL rules, all of them tried to fetch the object but returned 404: does that mean the object does not exist? Can you be sure of that?)

At the moment, only SEARCH is such an op

So that is the point where we disagree. Why do you think it is the only one? SEARCH is an extremely generalized GET to me. If a node is up, it answers you; if a node is down, it won't. That is why we have REPs and that is why we are decentralized. A user can minimize data loss with replicas. How can a status help you?

@cthulhu-rider
Contributor Author

GET and PUT can also be partially successful

Gotcha, I should have clarified that by "partial success" I mean the details of the scenario described in this issue. Let's stick to the "best-effort" term. I agree that PUT and GET can reach partial success (2/3 replicas saved, or the header plus 50% of the payload received), but they never finish with a successful status the way the described SEARCH does. So, at the end of the day, only SEARCH is a best-effort op to me.

How can a status help you?

The status is the key point of this issue; I outlined this in the expected and current behavior sections. PUT and GET don't (and never will) behave like this. So we need to resolve SEARCH first, then we can think about other ops' responsiveness.

@carpawell
Member

carpawell commented Jan 25, 2024

The status is the key point of this issue; I outlined this in the expected and current behavior sections.

I understood.

So we need to resolve SEARCH first

Need for what?

@cthulhu-rider
Contributor Author

Need for what?

for this issue to be resolved

@carpawell
Member

carpawell commented Jan 26, 2024

for this issue to be resolved

And that is what I suggest discussing. Is it an issue? And how can a user use the suggested new info?

@AnnaShaleva AnnaShaleva added U2 Seriously planned and removed U4 Nothing urgent labels Oct 24, 2024
@AnnaShaleva
Member

The status is the key point of this issue; I outlined this in the expected and current behavior sections. PUT and GET don't (and never will) behave like this. So we need to resolve SEARCH first, then we can think about other ops' responsiveness.

Vote up, at least for including additional information in the SEARCH result about whether it's complete or not. NeoGo, as a user of SEARCH, needs to distinguish the outcomes of object search described in this issue: either the result includes all matching objects, or some objects may be missing. It's required by the NeoGo BlockFetcher/BlockUploader services, ref. nspcc-dev/neo-go#3645, and it's critical for correct application functioning.

Regarding the format of this feature: it may be a special response status code or just some special message; it's actually not important for NeoGo. We're fine with any source of knowledge about whether a request was partially or completely successful.

GET and PUT can also be partially successful

Here are some notes from real user (NeoGo) experience regarding GET: our usage pattern is to SEARCH for the set of objects matching some criteria, then GET every object. If a GET returns an error, we just retry several more times, because it's clear that the object really exists. So GET is less critical w.r.t. this issue. But of course, it would be beneficial for the client to know that GET failed for some reason other than the object not existing.
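
A minimal sketch of that search-then-get pattern as a shell loop (assumptions: OIDS was extracted beforehand from the object search output, C is a placeholder container ID, the retry count is arbitrary, and credentials are supplied non-interactively):

# fetch every object listed by SEARCH, retrying transient GET failures
for oid in $OIDS; do
    for attempt in 1 2 3; do
        # the object was listed by SEARCH, so a failed GET is treated as transient
        neofs-cli -c neofs_cli.yaml object get --cid C --oid "$oid" --file "./$oid.bin" && break
        sleep 1
    done
done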

@roman-khimov
Member

"Incomplete success" can be somewhat helpful in case of temporary problems (like in #2979). The original description here is more about permanent ones which just can't be solved (no number of retries would help). And notice that if some node is to go offline you'd get a "complete success" from the other existing one.

@AnnaShaleva
Member

AnnaShaleva commented Oct 24, 2024

The original description here is more about permanent ones which just can't be solved (no number of retries would help)

Even with that, BlockUploader may distinguish this result and stop its operation if such a problem is detected.

And notice that if some node is to go offline you'd get a "complete success" from the other existing one.

But this one is more critical. Do I understand correctly that, according to the NeoFS design, there's no way at all to learn that more matching objects exist if the node that stores them is offline?

@roman-khimov
Member

But this one is more critical. Do I understand correctly that, according to the NeoFS design, there's no way at all to learn that more matching objects exist if the node that stores them is offline?

If all nodes storing some object are offline, it's gone. And it's hard to blame NeoFS for this. Got 1 replica and the node is down? Sorry, nothing can help. Got three replicas and three nodes are down? Well... Synchronized meta solves a part of the problem (you'd know that the object does in fact exist), but it can't resurrect a lost object.
