Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design & Implement Contract Deletion (OP_CONTRACT_DELETE?) #1927

Open
taoeffect opened this issue Apr 15, 2024 · 20 comments · May be fixed by #2495
Open

Design & Implement Contract Deletion (OP_CONTRACT_DELETE?) #1927

taoeffect opened this issue Apr 15, 2024 · 20 comments · May be fixed by #2495
Assignees
Labels
App:Backend App:Frontend Kind:Core Anything that changes or affects the fundamental core data structures & design of the application. Kind:Enhancement Improvements, new features, performance upgrades, etc. Level:Advanced Priority:High

Comments

@taoeffect
Copy link
Member

taoeffect commented Apr 15, 2024

Problem

Currently we have no way to delete a contract once it's been created.

This means unnecessary space will be used on the server.

Solution

Implement OP_CONTRACT_DELETE and use it when, for example, deleting chatrooms.

This opcode will go through and delete every message in a contract and all state (except attachments) associated with it.

Clients trying to sync the contract again should get 410 Gone errors and assume the contract has been deleted. Clients syncing contracts that never existed should return 404 not found.

Note: in the case of deleted chatrooms, the attachments in that chatroom must be handled separately. Deleting a contract doesn't delete the attachments inside of it, so those must be deleted separately.

EDIT: note this comment that mentions that attachments could be deleted as well.

@taoeffect taoeffect added Kind:Enhancement Improvements, new features, performance upgrades, etc. App:Frontend App:Backend Level:Advanced Kind:Core Anything that changes or affects the fundamental core data structures & design of the application. labels Apr 15, 2024
@taoeffect taoeffect changed the title Design & Implement OP_CONTRACT_DELETE Design & Implement Contract Deletion (OP_CONTRACT_DELETE?) Dec 11, 2024
@corrideat
Copy link
Member

Looking through this, I think that we could do this very similarly to how we do it with files. I also don't think that we should use OP_CONTRACT_DELETE because deletion is, IMHO, primarily a server operation, and, because of the nature of Chelonia, an action would likely be better suited for marking a contract as 'unusable' than an OPCODE would be. (Another issue with OP_CONTRACT_DELETE is that the SAK (which is for server auth) wouldn't come much into play.)

To illustrate this, imagine a contract (say, a group) with 3 members, Alice, Bob and Carol. Alice and Bob are online, and Carol is offline. Now, Alice issues OP_CONTRACT_DELETE. Bob receives this over the web socket and proceeds to delete the contract. Carol, when coming online, will never see OP_CONTRACT_DELETE (but might see 404 or 410 instead, suggesting that OP_CONTRACT_DELETE) was issued. The result is that, for neither Bob nor Carol did OP_CONTRACT_DELETE provide any 'advantage' over simply receiving a notification that the contract had been deleted.

From the server's perspective, OP_CONTRACT_DELETE would similarly require processing an opcode, adding it to the DB and broadcasting it, only for the contract to be immediately deleted.

So, I think we could just use the router with a shelter auth header to delete a contract and not bother with a 'virtual' opcode like OP_CONTRACT_DELETE.

Clients trying to sync the contract again should get 410 Gone errors and assume the contract has been deleted. Clients syncing contracts that never existed should return 404 not found.

This is possible, but I'm not sure that we can guarantee this behaviour. For many reasons could a deleted contract result in 404 (loss of state, new server, federation, legal reasons, moral reasons, etc).

I think that we challenging part of this task is deciding the correct client behaviour on notification of deletion / 404 / 410. While the contract is probably no longer usable, users might want to keep the data. Consider here too a malicious (or not) server that returns 410 for contracts that shouldn't be deleted, or contracts that are 'deleted' for non-payment.

@taoeffect
Copy link
Member Author

taoeffect commented Dec 12, 2024

To illustrate this, imagine a contract (say, a group) with 3 members, Alice, Bob and Carol. Alice and Bob are online, and Carol is offline. Now, Alice issues OP_CONTRACT_DELETE. Bob receives this over the web socket and proceeds to delete the contract. Carol, when coming online, will never see OP_CONTRACT_DELETE (but might see 404 or 410 instead, suggesting that OP_CONTRACT_DELETE) was issued. The result is that, for neither Bob nor Carol did OP_CONTRACT_DELETE provide any 'advantage' over simply receiving a notification that the contract had been deleted.

I do see this theory, however, it's only true if we decide to implement it that way.

The thing is, for some apps, it might very well be useful to know whether or not a contract was deleted.

After all, there's a difference between asking, "Hey, send contract XYZ", and the server saying, "That contract doesn't exist — AND NEVER EXISTED", and "that contract doesn't exist — ANYMORE (but it did at one point)".

Some apps might need to be able to distinguish between these two cases for whatever reason.

If we implement it as a separate API, then we won't be able to distinguish between those two cases, and we will artificially limit the types of apps that can be created with Chelonia because of it.

What we could do instead is use OP_CONTRACT_DELETE, and upon receiving it the server would do two things:

  1. It would delete all messages before it (and possibly any related attachments)
  2. It would keep that last message

Then whenever someone goes to sync that contract, the server will either send back that last message, or it will detect that the last message was OP_CONTRACT_DELETE and send back a 410. This will be a signal to clients to either:

  1. Delete their local copy of the contract (and perform any related cleanup, like deleting attachments, something that I think only some specific users are able to do and therefore actually do need this distinction between "404" and "410").
  2. Or if they didn't have a copy of that contract to begin with, perform any related UI actions to inform the user that the requested item no longer exists.

@corrideat
Copy link
Member

corrideat commented Dec 12, 2024

I get the distinction between 404 and 410 and its usefulness (even though I made another point about how this information might get lost), but I think maybe something wasn't clear.

I didn't say that the alternative to OP_CONTRACT_DELETE is delete all data and pretend the contract never existed. We could very well store in the server that contract ID xyz used to exist and no longer does.

What I did say is that I don't see, neither from the perspective of clients, nor from the perspective of the server, a situation where OP_CONTRACT_DELETE would be advantageous over just issuing an HTTP request that means 'delete this contract', similarly to what we do with files.

  1. It would delete all messages before it (and possibly any related attachments)
  2. It would keep that last message

I considered this, and it'd mostly be a waste of storage over simply storing a boolean flag. This is because that last message contains no useful information without the entire chain. It can be entirely fabricated, so it's unverifiable, and it's much more verbose than isGone: true.

So, to clarify my points were:

  1. I don't think we should create a new opcode for this
  2. The 404 / 410 distinction is useful (and we should support it), but may not always work reliably.
  3. Deleting data on clients can be lead to issues. This isn't a protocol issue though.

@taoeffect
Copy link
Member Author

What I did say is that I don't see, neither from the perspective of clients, nor from the perspective of the server, a situation where OP_CONTRACT_DELETE would be advantageous over just issuing an HTTP request that means 'delete this contract', similarly to what we do with files.

It's advantageous in that it's elegant. I just like how it looks. When writing out the documentation and description of the VM, we have something to pair OP_CONTRACT with. No additional API needs to be invented. When the differences are otherwise negligible, simplicity and beauty are significant considerations.

I considered this, and it'd mostly be a waste of storage over simply storing a boolean flag. This is because that last message contains no useful information without the entire chain. It can be entirely fabricated, so it's unverifiable, and it's much more verbose than isGone: true.

  • Where would the isGone: true be stored?
  • How much extra code branches would we have to introduce to handle this special data type?

Using OP_CONTRACT_DELETE:

  • Prevents creating a new API (simplicity / beauty)
  • Prevents introducing new code paths for handling a special non-SPMessage datatype
  • Adds an utterly negligible amount of overhead

@corrideat
Copy link
Member

corrideat commented Dec 12, 2024

  1. Where would the isGone: true be stored?

In chelonia/db. For example, we could replace the OP_CONTRACT with null. Another way is to delete all data and store a flag in the (server-side) contract state.

Note that with OP_CONTRACT_DELETE you still need to augment server side records to be able to return 410, at least if you actually delete data.

  1. How much extra code branches would we have to introduce to handle this special data type?

The exact number of extra branches as it'd take to implement 410 in any way, and possibly fewer.

  1. Using OP_CONTRACT_DELETE:
  1. Prevents creating a new API (simplicity / beauty)

Not sure about this. I'd say opcodes are APIs too, and also require branches to handle.

  1. Prevents introducing new code paths for handling a special non-SPMessage datatype

Sure, if you're talking about when it's received. However, see my response to the previous point. If you were referring to the isGone: true example, i.e., post facto, then, no, we still need the extra branch.

Even if we did store the OP_CONTRACT_DELETE (which we shouldn't), we need an extra branch to check for this message and return 410, or return this message. In particular, note that ${contractID}, corresponding to OP_CONTRACT will need to be updated to say that it's been deleted, and possibly point to the OP_CONTRACT_DELETE CID.

  1. Adds an utterly negligible amount of overhead

For processing, it's not too different from other approaches, true. For storage, it's utterly useless to store OP_CONTRACT_DELETE because it can't be verified.

@taoeffect
Copy link
Member Author

taoeffect commented Dec 12, 2024

For processing, it's not too different from other approaches, true. For storage, it's utterly useless to store OP_CONTRACT_DELETE because it can't be verified.

That's not true though. It can be verified by both the client and the server upon receiving it. It is signed using whatever are the latest keys, which both the server and the client have, so they can verify it if they've sync'd the contract before.

That is unlike simply storing { isGone: true }, which is only verifiable by the server, not the other clients.

So, if it's not a big deal to you, can we go with OP_CONTRACT_DELETE? It has the advantage of being verifiable client-side, and avoids the need to create a new RESTful API.

@corrideat
Copy link
Member

It can be verified by both the client and the server upon receiving it.

But it can't be verified by the client unless the client is online.

@corrideat
Copy link
Member

isn't that better than everyone being unable to detect the forgery?

I don't think so, because I don't think that the server saying the contract is deleted and being able to somehow verify this is particularly useful information, even if it can be verified.

Also, the server could unilaterally delete contracts for a variety of reasons and clients can do nothing about it, whether it can be verified or not.

I think that we're conflating two distinct but somehow related issues.

  1. It can be useful for clients to know that a contract is 'deleted' / 'disabled' and discard data or simply know that it shouldn't be updated.
  2. What we're talking about, which is exclusively a server / storage concern.

I'm not opposed to the idea of creating an opcode for 1, but by necessity it can only be verified if the contract chain itself isn't deleted. I'm somewhat opposed to the idea of using an opcode for 2, because 'verifying' at this point isn't very useful (if at all possible) since the data are gone, and it's only done because of reasons that concern the server (or the use thereof), not the clients.

If the server didn't exist at all, we'd probably have an opcode for 1 but not for 2, and maybe after some time clients would prune the entries from their cache, or they might not.

Also, the server could unilaterally delete contracts for a variety of reasons and clients can do nothing about it, whether it can be verified or not.

Are you seriously trying to argue that being able to detect whether or not the server is doing malicious things is not "particularly useful information"?

I wouldn't necessarily call deleting data malicious (depends on the circumstances (*)), and in any case it can't be reliably checked if it was 'malicious' or not (see, offline clients and the server claiming there was a key rotation; btw, the server can also make clients be 'offline'). Clients can verify already that a contract is gone, by getting a 404 or 410.

(*) For example, if the server charges for storage and storage isn't paid, I wouldn't call deleting data after a while malicious. If the server room is accidentally flooded and contracts are lost, it wouldn't be malicious. The larger point is 'what can clients do with this information' and I don't think they can do very much.

@taoeffect
Copy link
Member Author

(*) For example, if the server charges for storage and storage isn't paid, I wouldn't call deleting data after a while malicious. If the server room is accidentally flooded and contracts are lost, it wouldn't be malicious. The larger point is 'what can clients do with this information' and I don't think they can do very much.

I think this is the strongest argument against needing OP_CONTRACT_DELETE - the server deleting the contracts due to lack of payment. Since the server isn't able to issue OP_CONTRACT_DELETE, it needs a way to delete the contract without it, and also to inform clients (via 410 errors) that the contract was deleted.

@corrideat
Copy link
Member

Ok, so what about we do this:

  1. Add an opcode called OP_CONTRACT_FREEZE or OP_CONTRACT_DEACTIVATE that permanently 'locks' a contract, preventing any further updates.
  2. For deleting contracts, we use the regular shelter authorization header as we do for files. We could also make it so that a single request with both OP_CONTRACT_FREEZE and the shelter authorization header can be sent, and we disallow deleting non-frozen contracts programmatically.
  3. When a contract is deleted from the server, we keep a list of deleted contracts and return 410.

@taoeffect
Copy link
Member Author

@corrideat That sounds reasonable (with a vote in favor OP_CONTRACT_DISABLE) except I have a concern about that last part:

  1. When a contract is deleted from the server, we keep a list of deleted contracts and return 410.

Ideally we wouldn't have to maintain any separate list.

When a contract is deleted, the info should be stored using the existing "list", or rather KV mapping, of either the contractID => latestHash mapping, or the latestHash => latestEvent mapping, with the latter being basically what I wrote earlier about keeping the last event.

I feel like we have enough lists to manage and don't need to overcomplicate the server with any more, especially as this can be done without a new list.

@corrideat
Copy link
Member

Ideally we wouldn't have to maintain any separate list.

Ideally, I agree with you, but I don't think we can do this here, regardless of implementation. Let me explain why.

Currently, we store GIMessages indexed by CID, and we have a few other pieces of associated data, such as head=${contractID}, _private_hidx=${contractID}#${height} and _private_cheloniaState_${contractID} (plus a few other states for the contract state and the accounting information).

Now, let's say that a contract is deleted (at height d) and someone tries to sync it.

If syncing using the /file API, events from [0, d) will be missing, so trying to fetch the contract ID (first message) will result in 404. If syncing using the streaming API, we use chelonia/db/latestHEADinfo (head=${contractID} key, which potentially we could keep) and _private_hidx=${contractID}#${height} (which would be deleted).

Either way, implementing 410 requires at least a branch in each API endpoint that should return 410 to communicate that the contract has been deleted and that there's no data to fetch. Even if we kept the 'last' message that represents OP_CONTRACT_DISABLE (which as we established, is not guaranteed that it will exist), we do still need an extra check. In particular, we need, in very generic terms:

if (isContractDeleted(contractID)) {
  return 410
} else {
  // Existing logic
}

@corrideat
Copy link
Member

corrideat commented Dec 14, 2024

Incidentally, this raises another question that we haven't discussed --- what deletion means.

Say that I have a contract with messages with IDs A (the contract ID, OP_CONTRACT), B and C. Then the contract is deleted.

So far, it's been strongly implied that fetching A should return 410. But, what about B and C? Should fetching those messages also return 410? And if the answer is yes, what about DB keys like _private_hidx=${contractID}#${height}? Should we keep those too?

@taoeffect
Copy link
Member Author

So far, it's been strongly implied that fetching A should return 410. But, what about B and C? Should fetching those messages also return 410? And if the answer is yes, what about DB keys like _private_hidx=${contractID}#${height}? Should we keep those too?

I think the answer is probably "no" to both questions. Requesting B and C directly should return 404, and _private_hidx=${contractID}#${height} should be deleted.

@corrideat
Copy link
Member

Thanks for all of the feedback, @taoeffect!

I think I've now got enough information to start working on this, but I'll make a brief proposal first just in case there's something to adjust.

As we've discussed on Slack, we can postpone OP_CONTRACT_DISABLE/FREEZE/DELETE until a later point.

/deleteContract/{hash} endpoint

This endpoint would use the chel-shelter authentication strategy and will behave similarly to how file deletion works, and will also require that the ultimate owner is the one performing the request (i.e., the #sak of the ultimate owner must be used).

When a contract is deleted, we can store something under the head=${contractID} to indicate that a contract's been deleted. Based on our current code, I believe that storing a falsy value, such as null or false under that key would require the fewest changes in the server code to be able to return 410. All other resources related to that contract, including events after OP_CONTRACT, the server-side Chelonia state related to the contract, other contracts, files and internal indexing keys will be removed, as though the contract had never existed. The only thing that will remain will be what we store under head=${contractID} to represent a deleted contract.

In fact, because of all of the things that may need to be deleted, perhaps it's appropriate to return 202 Accepted for these requests and schedule deletion to be done later.

I'm assuming that deleted contracts can be reuploaded to the server, so what we store under head=${contractID} shouldn't prevent this from happening. I believe that, if we store a falsy value, we don't need to modify anything to make this work.

@taoeffect
Copy link
Member Author

@corrideat that sounds reasonable 👍

Some things to consider as you're implementing this:

  • How to handle file attachments? (Remove attachments when user removes chatroom #1913)
  • Making sure that if in the future we decide to go with your earlier comment, that the internal code can still be called in a DRY way to do the deletion via a different means
  • If you're going to schedule a "deletion job", consider implementing a generic jobs system and a way for clients to query the status of the job (i.e. in reference to your 202 Accepted comment)
    • Whether those jobs are stored in-memory or persisted. Please consider using/repurposing @snowteamer's chelonia.persistentActions for this

@corrideat
Copy link
Member

How to handle file attachments? (#1913)

File attachments are linked to chatrooms via the billableContractID parameter. It's therefore possible to delete files without many changes and have that work retroactively.

Making sure that if in the future we decide to go with your earlier comment, that the internal code can still be called in a DRY way to do the deletion via a different means

Using a separate task would help with this, but if you read the comment, the new opcode was not meant as a replacement for the endpoint for file deletion.

If you're going to schedule a "deletion job", consider implementing a generic jobs system and a way for clients to query the status of the job (i.e. in reference to your 202 Accepted comment)

Sure, I can look into that.

  • Whether those jobs are stored in-memory or persisted. Please consider using/repurposing @snowteamer's chelonia.persistentActions for this

Yeah, I think it could be done that way. We'd want the delete action to be persisted across server restarts.

@corrideat
Copy link
Member

corrideat commented Jan 2, 2025

I think that this could be a suitable algorithm to implement for deleting contracts:

  1. Read resource ID, recursively doing a deep-first search on _private_resources_{cid}.
  2. If contract:
    1. Read head={cid}
    2. Go through each _private_hidx={cid}#{number}
    3. Delete the content
    4. Delete _private_hidx={cid}#{number}
    5. (Delete cid)
    6. Delete _private_cheloniaState_{cid}
    7. Delete cid from _private_cheloniaState_index
    8. If contract has a namespace allocation, remove _private_rid_{name} and
      name={name}
    9. If cid in _private_billable_entities, delete cid from there
    10. If the contract has KV keys, delete them and the KV index.
  3. If file:
    1. Read manifest
    2. Delete each chunk
    3. Delete _private_deletionToken_{cid} and _private_size_{cid}
  4. Delete _private_owner_{cid}
  5. Delete resource from _private_resources_{parentCid}

Now, there are a few things that are non-optimal, which we don't need to handle right now, but that we should address at some point:

  1. We don't have a global 'resource type' identifier. This means that for deciding whether something is a file, a contract, or something else, we need to read the cid.
  2. Looking at Disable deleting account #2478, we'd ideally want account deletion to have some additional safeguards, ideally something that requires the contract password.

@corrideat
Copy link
Member

Regarding the "We don't have a global 'resource type' identifier." issue, there are two avenues for addressing this (which would also address #2115, or bring us close to that)

  1. For each resource, store a separate key including metadata such as resource type. For example, the key could be _private_rt_{cid} or _private_meta_{cid}. The advantage of this is that it wouldn't require recreating groups or contracts (but would potentially require a database migration). The disadvantage is that we'd need an extra lookup every time we need to access this information.
  2. Alternatively, the content type can be encoded in the CID itself. The advantage of this approach is that keys are self-describing and that no extra lookups are needed. The disadvantage is that potentially we'd need to recreate groups (technically, we don't, but it's easier this way than to keep backwards compatibility) and that the changes are more involved because we need to change the APIs we use to generate CIDs, and this change would affect both Group Income (mostly the Chelonia part of it) and the chel command.

@taoeffect
Copy link
Member Author

Yeah I think now's a good time to do this (2)

@taoeffect taoeffect added this to the Keys management milestone Jan 5, 2025
@corrideat corrideat linked a pull request Jan 6, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
App:Backend App:Frontend Kind:Core Anything that changes or affects the fundamental core data structures & design of the application. Kind:Enhancement Improvements, new features, performance upgrades, etc. Level:Advanced Priority:High
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants