
ObjectService.Search server response is always 0 OK even on partial success #2721

Open
cthulhu-rider opened this issue Jan 17, 2024 · 15 comments
Labels: bug (Something isn't working), I3 Minimal impact, S2 Regular significance, U2 Seriously planned

Comments

@cthulhu-rider
Contributor

When executed over a multi-node container, the object SEARCH op can succeed only partially by design. The current server behavior looks buggy.

Steps to reproduce

With a NeoFS setup of at least 2 online storage nodes (N1 and N2), I did the following (a sketch of the commands follows the list):

  • created container C with a REP 1 CBF 2 storage policy (select any 2 storage nodes and store 1 replica of each object on them)
  • created and saved 2 objects: O1 ended up on N1, O2 on N2
  • searched for the C objects
  • shut down N2
  • searched again
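
A minimal sketch of those steps with neofs-cli (a sketch only: wallet/endpoint settings are assumed to live in neofs_cli.yaml, C/O1/O2 are placeholders, and exact flag names may vary between versions):

$ neofs-cli -c neofs_cli.yaml container create --policy "REP 1 CBF 2" --await
# prints the new container ID, used as C below

$ neofs-cli -c neofs_cli.yaml object put --cid C --file ./o1.bin   # one replica, assumed to land on N1 (O1)
$ neofs-cli -c neofs_cli.yaml object put --cid C --file ./o2.bin   # assumed to land on N2 (O2)

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m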

Expected Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
INCOMPLETE: number of unavailable container nodes = 1
$ echo $?
ERR_CODE

Current Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
$ echo $?
0

Possible Solution

Respond with a specific non-zero status code plus the number of unavailable container nodes; respond with 0 OK only when all container nodes have responded.
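
A sketch of how a client script could consume such a status, assuming the hypothetical non-zero code is propagated as the CLI exit status (the ERR_CODE placeholder above) and that credentials are supplied non-interactively:

neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m > results.txt
rc=$?
if [ "$rc" -ne 0 ]; then
    # hypothetical semantics: non-zero exit means some container nodes
    # did not respond, so results.txt may be missing matching objects
    echo "WARN: possibly incomplete search (exit code $rc)" >&2
fi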

Context

I dove into the SEARCH server code within #2692, but I don't think this really matters here.

Regression

No

@cthulhu-rider cthulhu-rider added the bug Something isn't working label Jan 17, 2024
@roman-khimov roman-khimov added U4 Nothing urgent S2 Regular significance I3 Minimal impact labels Jan 17, 2024
@roman-khimov
Member

We can say that the current SEARCH is best-effort. Even if you get this additional data, there is not a lot you can do. Especially if you're to consider the case of N2 leaving the network map after some period of time.

@cthulhu-rider
Contributor Author

Even if you get this additional data, there is not a lot you can do

Best-effort, yes, but the client needs to understand how the operation ended. As a client, I want to distinguish complete results from results that may be incomplete for any reason. That's what I expect from a responsible system. What to do with this information is a personal matter of each client.

If you mean not reporting how many nodes are unavailable: yeah, that's more like sugar for now. Being a free-text message, no application will want to parse its content. At the same time, it can be very useful in test/debug setups. Depends.

N2 leaving the network map after some period of time

That is another plane of the system: as the network map changes, the execution of the storage policy changes with it. This issue discusses the state transfer at op execution time.

@carpawell
Member

I want to distinguish complete results from results that may be incomplete for any reason

As a node, how do you distinguish a disk/shard failure? Or a temporarily unstable state caused by object migrations after another node goes offline/online? To me, any GET/PUT operation should then be fixed too (and I do not agree with that).

@cthulhu-rider
Contributor Author

any GET/PUT operation should then be fixed too

@carpawell, please clarify what fix you're talking about. GET/PUT cannot finish with partial success.

Let's focus on the proposed expected behavior of SEARCH first, then we'll develop the ideas further. SEARCH is currently the only best-effort op in the protocol.

@carpawell
Member

@carpawell, please clarify what fix you're talking about. GET/PUT cannot finish with partial success.

If you receive ObjectNotFound from a GET request, do you think the object is completely missing? Or do you expect that some nodes may be down and the object may become available later? The current API gives no info about which case it is. PUT's IncompleteObjectPut is also undetailed; REP 0 is also an incomplete object put.

SEARCH is currently the only best-effort op in the protocol

Basically, I mean that our protocol is best-effort across the board, so I cannot agree with you.

@cthulhu-rider
Contributor Author

@carpawell, you're talking about clarification of failure responses, while this issue is about partially successful ones. Neither PUT nor GET can end with partial success now. Different topics to me.

but since you asked

If you receive ObjectNotFound from a GET request, do you think the object is completely missing? Or do you expect that some nodes may be down and the object may become available later?

PUT and GET responsiveness could also be improved

Basically, I mean that our protocol is best-effort across the board

I wouldn't generalize like that. At the moment, only SEARCH is such an op.

@carpawell
Member

carpawell commented Jan 25, 2024

while this issue is about partially successful ones

GET and PUT can also be partially successful (every node is OK with the ACL rules, all of them tried to fetch the object but returned 404: does that mean the object does not exist? Can you be sure of that?)

At the moment, only SEARCH is such an op

So that is the point where we disagree. Why do you think it is the only one? SEARCH is an extremely generalized GET to me. If a node is up, it answers you; if a node is down, it won't. That is why we have REPs and that is why we are decentralized. A user can minimize data loss with replicas. How can a status help you?

@cthulhu-rider
Contributor Author

GET and PUT can also be partially successful

Gotcha, I should have clarified that by "partial success" I mean the details of the scenario described in this issue. Let's stick to the "best-effort" term. I agree that PUT and GET can reach partial success (2/3 replicas saved, or the header plus 50% of the payload received), but they never finish with a successful status the way the described SEARCH does. So, at the end of the day, only SEARCH is a best-effort op to me.

How can a status help you?

The status is the key point of this issue; I outlined this in the expected and current behavior sections. PUT and GET don't (and never will) behave like this. So we need to resolve SEARCH first, then we can think about other ops' responsiveness.

@carpawell
Member

carpawell commented Jan 25, 2024

The status is the key point of this issue; I outlined this in the expected and current behavior sections.

I understood.

So we need to resolve SEARCH first

Need for what?

@cthulhu-rider
Contributor Author

Need for what?

for this issue to be resolved

@carpawell
Member

carpawell commented Jan 26, 2024

for this issue to be resolved

And that is what I suggest discussing. Is it an issue? And how can a user use the suggested new info?

@AnnaShaleva AnnaShaleva added U2 Seriously planned and removed U4 Nothing urgent labels Oct 24, 2024
@AnnaShaleva
Member

The status is the key point of this issue; I outlined this in the expected and current behavior sections. PUT and GET don't (and never will) behave like this. So we need to resolve SEARCH first, then we can think about other ops' responsiveness.

Vote up, at least for including additional information in the SEARCH result about whether it's complete or not. NeoGo, as a user of SEARCH, needs to distinguish the outcomes of object search described in this issue: either the result includes all matching objects, or some objects may be missing. It's required by the NeoGo BlockFetcher/BlockUploader services, ref. nspcc-dev/neo-go#3645, and it's critical for correct application functioning.

Regarding the format of this feature: it may be a special response status code or just some special message; it's actually not important for NeoGo. We're fine with any source of knowledge about whether a request was partially or completely successful.

GET and PUT can also be partially successful

Here are some notes from real user (NeoGo) experience regarding GET: our usage pattern is to SEARCH for the set of objects matching some criteria, then GET every object. If a GET returns an error, we just retry several more times, because it's clear that the object really exists. So GET is less critical w.r.t. this issue. But of course, it would be beneficial for the client to know that GET failed for some reason other than the object not existing.
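
A minimal sketch of that search-then-get pattern as a shell loop (assumptions: OIDS was extracted beforehand from the object search output, C is a placeholder container ID, the retry count is arbitrary, and credentials are supplied non-interactively):

# fetch every object listed by SEARCH, retrying transient GET failures
for oid in $OIDS; do
    for attempt in 1 2 3; do
        # the object was listed by SEARCH, so a failed GET is treated as transient
        neofs-cli -c neofs_cli.yaml object get --cid C --oid "$oid" --file "./$oid.bin" && break
        sleep 1
    done
done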

@roman-khimov
Member

"Incomplete success" can be somewhat helpful in case of temporary problems (like in #2979). The original description here is more about permanent ones which just can't be solved (no number of retries would help). And notice that if some node is to go offline you'd get a "complete success" from the other existing one.

@AnnaShaleva
Member

AnnaShaleva commented Oct 24, 2024

The original description here is more about permanent ones which just can't be solved (no number of retries would help)

Even with that, BlockUploader may distinguish this result and stop its operation if such a problem is detected.

And notice that if some node is to go offline you'd get a "complete success" from the other existing one.

But this one is more critical. Do I understand correctly that, according to the NeoFS design, there's no way at all to learn that more matching objects exist if the node that stores them is offline?

@roman-khimov
Member

But this one is more critical. Do I understand correctly that, according to the NeoFS design, there's no way at all to learn that more matching objects exist if the node that stores them is offline?

If all nodes storing some object are offline, it's gone. And it's hard to blame NeoFS for this. Got 1 replica and the node is down? Sorry, nothing can help. Got three replicas and three nodes are down? Well... Synchronized meta solves a part of the problem (you'd know that the object does in fact exist), but it can't resurrect a lost object.
