Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to delete spent UTXOs from database #725

Conversation

vladimirfomene
Copy link
Contributor

@vladimirfomene vladimirfomene commented Aug 17, 2022

Description

We currently store spent UTXOs in the database with an is_spent field and we don't provide users with a way to delete these UTXOs. The issue here is that these could easily bloat the database or memory. This PR provides methods that can allow users to delete spent UTXOs from the database following certain criteria like size of spent utxos and number of confirmations.
This PR fixes issue #573

Notes to the reviewers

I haven't added this change to the change log and written test for it. I'm thinking we can use this PR as a starting point to discuss how best we can solve this issue. I think having an API that users can call manually is a great start towards solving this issue but it might also be necessary to implement an automatic version of this.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature
  • I've updated CHANGELOG.md

Copy link
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on we should have a utxo deleting API..

Below are few comments over the API and some internal codes..

src/database/keyvalue.rs Outdated Show resolved Hide resolved
src/database/memory.rs Outdated Show resolved Hide resolved
src/database/memory.rs Outdated Show resolved Hide resolved
src/database/mod.rs Outdated Show resolved Hide resolved
src/database/mod.rs Outdated Show resolved Hide resolved
@vladimirfomene vladimirfomene force-pushed the manually-deleting-spent-utxos branch 3 times, most recently from 19fa758 to 0fd7110 Compare August 24, 2022 12:12
Copy link
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went over the code, and here are some more thoughts and comments.. Sorry for being too verbose on approach.. But I think we can do some cool stuffs here..

  • It seems to me the the basic operations of getting and deleting an utxos are already part of the Database trait.. So heres a wild thought.. Why not have a default deleting utxo implementation in Database trait itself??

I am not sure if its possible, but have you tried that route?? Could be really interesting and users won't have to define it for their custom type anymore if it can be default implemented.. Will also help us concentrate the deletion logic in just one place in the whole codebase, and not have codes for each impls..

It will work iff, all the basic method calls you need to make for the selection and deletion are part of the Database trait..

I also feel the delete_spent_utxo_by_criteria() should use the delete_spent_utxo() call internally and use that function to delete selected utxo upon given criteria.. Should not need its own separate code to make delete calls.. Only needs to pass on a correct vec of utxos to delete..

The two criteria condition confirmation and threshold_size should be asserted together.. For example
if confirmation = 5, threshold_size = 10, after deleting the database should have <10 utxos, all with confirmation<5.. I am not very if thats what happening currently..

I feel this will separate the logic a lot, and reuse it where possible.. And code will be much shorter in that way.. And if we can actually do the default way, that would be the best thing to do in terms of maintainability..

  • To verify that our DelCriteria is working as expected we should also include a test for it too..

Comment on lines 424 to 425
//delete all spent utxos
return self.del_spent_utxos(None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about deleting only the excess utxos sorted by confirmations??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will deleting just the excess not mean they will have to do frequent cleanups?

src/database/memory.rs Outdated Show resolved Hide resolved
@vladimirfomene
Copy link
Contributor Author

vladimirfomene commented Sep 1, 2022

Thanks @rajarshimaitra for the great feedback! I have pushed some changes with the following:

  • Made the code more ergonomic.
  • Implemented the functionality as defaults methods on the Database trait.
  • Use del_spent_utxos in delete_spent_utxos_by_criteria.

What is left:
I have not added test because I believe a lot will still change on the current implementation. I have also not amended the code so that the two criteria can work in conjunction.

Copy link
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review ACK c1cae75

Some nits and comments..

/// UTXOs in the database.
fn del_spent_utxos(
&mut self,
spent_outpoints: Option<Vec<OutPoint>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to name it something like to_delete? All utxos in this context are "spent".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajarshimaitra don't you think it is more descriptive to keep del_spent_utxos?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he means using to_delete instead of spent_outpoints

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I meant..

Comment on lines 172 to 173
/// Delete UTXOs provided as params, if `spent_utxos` is empty it deletes all spent
/// UTXOs in the database.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this can sound clearer?

Suggested change
/// Delete UTXOs provided as params, if `spent_utxos` is empty it deletes all spent
/// UTXOs in the database.
/// Delete a list of spent utxos from database. Delete all spent utxos if list is `None`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will make the change

Comment on lines 206 to 272
if let Some(threshold_size) = criteria.threshold_size {
// if the threshold size is smaller than current size, delete everything
if threshold_size < spent_utxos.len() as u64 {
//delete all spent utxos
return self.del_spent_utxos(None);
}
}

let mut filtered_outpoints: Vec<OutPoint> = vec![];

// get utxos that matches criteria and delete them
match self.get_sync_time()? {
Some(sync_time) => {
for spent_utxo in spent_utxos {
let tx_details = self.get_tx(&spent_utxo.outpoint.txid, false)?;
let height = tx_details
.and_then(|details| details.confirmation_time)
.map(|conf| conf.height);
let confirmations = criteria.confirmations;
if height.is_some()
&& confirmations.is_some()
&& ((sync_time.block_time.height - height.unwrap())
> confirmations.unwrap())
{
filtered_outpoints.push(spent_utxo.outpoint);
}
}
Ok(self.del_spent_utxos(Some(filtered_outpoints))?)
}
None => Ok(vec![]),
}
}
Copy link
Contributor

@rajarshimaitra rajarshimaitra Sep 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not much sure on the criteria part.. If the wallet dev wants to delete utxos upon some criteria they should be able to do that on their own in their code. And such criteria in practice can be very diverse. It doesn't help for us to anticipate them and provide defaults. More options at such low level database API is more confusing..

But if it has to stay, I would expect a behavior from the structure something like below..

Both the criteria are defined as options. So they are both free to be set independent of each other. So when both is set I would expect an and of them, and when one is None it should not have any effect.

So there code here should do something like this, instead of deleting per each criteria.

  • Choose all utxos to delete by confirmation criteria.
  • Apply threshold criteria on remaining utxos and get how many more to delete, if any. If some, select them by decreasing confirmation time from utxo list.
  • Compile both of the two list together.
  • Delete those utxos..

Copy link
Member

@danielabrozzoni danielabrozzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, but this definitely needs some tests, showing how to delete by confirmations and/or by threshold size, and checking that the deletions are correct. This is also useful for reviewers to check if the code is correct and/or if the API itself is easy to understand

I have not added test because I believe a lot will still change on the current implementation.

You're probably right, on the other hand we don't want to merge something broken, and tests are the easiest way to find out if it's all ok

// if the threshold size is smaller than current size, delete everything
if threshold_size < spent_utxos.len() as u64 {
//delete all spent utxos
return self.del_spent_utxos(None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what Raj is saying below: you can't really delete all the utxos if there are more than threshold_size, you probably just want to delete the oldest ones (=first spent ones)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will go with Raj's suggestion

src/database/mod.rs Outdated Show resolved Hide resolved
@vladimirfomene vladimirfomene force-pushed the manually-deleting-spent-utxos branch 3 times, most recently from 952fc77 to c957755 Compare September 24, 2022 10:23
@vladimirfomene
Copy link
Contributor Author

@rajarshimaitra and @danielabrozzoni , I have amended the PR based on your recommendations. I have also added tests for the methods.

@danielabrozzoni
Copy link
Member

@vladimirfomene did you also fix #725 (comment) ? Looking at the code, it seems to me it hasn't been

@vladimirfomene
Copy link
Contributor Author

Yes I did

// call del_spent_utxos with None
let mut res = db.del_spent_utxos(None).unwrap();
res.sort_by(|a, b| a.txout.value.cmp(&b.txout.value));
// verify that the one spent utxo has been deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "verify that the spent ones has been deleted" is slightly easier to understand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.del_spent_utxos(Some(vec![
first_utxo.outpoint,
third_utxo.outpoint,
fourth_utxo.outpoint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here you should call del_spent_utxos with just 2 of the 3 utxos that have been spent, so that we can check that del_spent_utxos(Some(v)) deletes all the utxos in v and not all the spent ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if let Some(to_delete) = to_delete {
let deleted_utxos = to_delete
.iter()
.filter_map(|out| self.del_utxo(out).transpose())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also check that the to_delete txouts are effectively spent, otherwise we might have users accidentally deleting unspent txouts and then having problems with sync/balance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 215 to 216
let utxos_heights: Vec<(&LocalUtxo, &u32)> =
spent_utxos.iter().zip(txs_conf_heights.iter()).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you really zip here? I think there are cases where txs_conf_heights is shorter than spent_utxos:

  • if self.get_tx(&utxo.outpoint.txid, false).transpose() is None
  • if details.confirmation_time.as_ref().map(|conf| conf.height) is None

let mut failed_confirmation_criteria: Vec<(&LocalUtxo, &u32)> = vec![];

// Choose all utxos to delete by confirmation criteria.
if criteria.confirmations.is_some() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be an if let

}

// only select on threshold if there are spent utxos to be deleted
if spent_utxos.len() - to_delete.len() > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use spent_utxos.len() > to_delete.len() instead: if spent_utxos.len() - to_delete.len() is less than zero, I think Rust would panic (subtraction underflow)

}

// apply threshold criteria on spent utxos
if criteria.threshold_size.is_some() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same: I would use if let, so you don't have to unwrap below

Comment on lines 254 to 256
let qty_to_delete = spent_utxos.len() as u64
- criteria.threshold_size.unwrap()
- to_delete.len() as u64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could underflow as well... you might want to use https://doc.rust-lang.org/std/primitive.u32.html#method.saturating_sub to avoid problems

let qty_to_delete = spent_utxos.len() as u64
                               - to_delete.len() as u64
                               .saturanting_sub(criteria.threshold_size.unwrap());

(I didn't compile this code, it probably needs some more parenthesis to work)

// split utxos_heights pair based on confirmation criteria
let (pass, fail): (Vec<_>, Vec<_>) =
utxos_heights.iter().partition(|(_utxo, height)| {
(current_block_height.unwrap() - **height) >= criteria.confirmations.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use saturating_sub for current_block_height.unwrap() - **height, to avoid panics

We currently store spent UTXOs in the database
with an `is_spent` field and we don't provide
users with a way to delete these UTXOs. The
issue here is that these could easily bloat the
database or memory. This PR provides methods
that can allow users to delete spent UTXOs
from the database. This PR fixes issue bitcoindevkit#573

add verification
@vladimirfomene
Copy link
Contributor Author

I have removed the del_spent_utxos_by_criteria method and its test. We decided to remove the del_spent_utxos_by_criteria because we don't currently store data that tells us how many blocks ago a utxo was spent. We only store data that tells us how many blocks ago, a utxo was received. This makes it hard to calculate the number of blocks passed since a utxo was spent and therefore difficult to use that as a criteria for deletion.

This commit adds test for `del_spent_utxos`.
@notmandatory
Copy link
Member

notmandatory commented Sep 29, 2022

Hey sorry but I need to take this out of the 0.23 release, and there's a chance that with the up-coming bdk_core work we may not need it. But this PR is a still a good bit of work so let's leave it open until we have more info.

@notmandatory notmandatory removed this from the Release 0.23.0 Feature Freeze milestone Sep 29, 2022
@rajarshimaitra
Copy link
Contributor

rajarshimaitra commented Oct 15, 2022

My 2 sats would be to get it into bdk anyway even if we might not need it after bdk_core integration.. A full featured bdk_core integration would take a bit of time, and this can be useful feature to add for wallet devs anyway who might wanna stay with the older versions of bdk..

@danielabrozzoni
Copy link
Member

In my opinion getting this in might cause more disrupt than help:

  • It takes resources away from bdk-core (I can't work on bdk_core and review on bdk at the same time :))
  • We don't have proof that people need this right now
    I agree that it might be useful to get this one in for the ones that want to stay on bdk 0.something instead of switching to 1.0, but IMHO we should wait for the bdk_core integration to be almost ready before deciding this

@danielabrozzoni
Copy link
Member

Hey, we are in the process of releasing BDK 1.0, which will under the hood work quite differently from the current BDK. For this reason, I'm closing all the PRs that don't really apply anymore. If you think this is a mistake, feel free to rebase on master and re-open!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants