Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate and write stash including soft blocked versions #22826

Merged
merged 2 commits into from
Nov 13, 2024

Conversation

KevinMind
Copy link
Contributor

@KevinMind KevinMind commented Nov 5, 2024

Fixes: mozilla/addons#15050

Description

This PR updates the generate_and _write_stash method on MLBF class to consider the range of use cases covered in the linked jira issue document.

We have to add support for specifying a block_type when requesting a change count. This will be useful for creating type level filters as we need to know for each type if we have any changes and enough changes for a new filter.

We have to modify which guids are included in the now 3 different properties on the stash. This has to cover the set of use cases where blocks are added as hard or soft as well as moving from soft to hard and vice versa.

Context

When updating our bloom filter we are now going to create a unified stash which will allow us to enable support for adding/removing items from the soon to be soft block filter, but, to do so in a way that is backwards compatible with legacy fx clients that have no concept of soft blocking. This raises some edge cases that increased the complexity of this patch. Read the doc for more info

Testing

Look at the automated tests but essentially we should consider the following scenarios and you can independently verify them if you don't trust the tests.

Here is a detailed break down of the scenarios

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind force-pushed the soft-block-bloom-filter branch from 5d67b09 to dc97c8a Compare November 5, 2024 14:39
@KevinMind KevinMind changed the base branch from soft-block-bloom-filter to master November 5, 2024 14:41
@KevinMind KevinMind force-pushed the soft-block-stashes branch 2 times, most recently from d5652d0 to 3d8e95b Compare November 5, 2024 15:50
@KevinMind KevinMind marked this pull request as ready for review November 5, 2024 16:01
@KevinMind KevinMind requested review from a team and eviljeff and removed request for a team November 5, 2024 16:01
@KevinMind
Copy link
Contributor Author

@willdurand and @diox 2 remaining questions from my side.

  1. what logic should we use to determine the number of "changed" guids when including the differing handling of hard <-> soft blocks in legacy versus new clients. Some items are included in multiple stashes so should they equally double increase the counts? Seems intuitive enough but not sure if we care about that.
  2. It is important that the time used for the stash is earlier than the time used for the filter in the event that we create both. Currently we use a single generation time for both. Do we want to always make the filter generation time +1 milisecond of the stash? or vice versa or how exactly should we handle this?

@KevinMind KevinMind changed the title Soft-block-stashes Generate and write stash including soft blocked versions Nov 5, 2024
@KevinMind KevinMind marked this pull request as draft November 5, 2024 18:03
@KevinMind KevinMind force-pushed the soft-block-stashes branch 4 times, most recently from 7d932aa to f73aa2b Compare November 6, 2024 15:15
@KevinMind KevinMind requested review from willdurand and removed request for eviljeff November 6, 2024 15:16
@KevinMind KevinMind marked this pull request as ready for review November 6, 2024 15:17
@KevinMind KevinMind mentioned this pull request Nov 7, 2024
5 tasks
@KevinMind KevinMind force-pushed the soft-block-stashes branch 2 times, most recently from ff970f1 to d515b9d Compare November 7, 2024 17:10
@KevinMind KevinMind force-pushed the soft-block-stashes branch 2 times, most recently from 59de2e8 to 5f52ca5 Compare November 8, 2024 12:10
@KevinMind KevinMind requested a review from willdurand November 8, 2024 12:11
@KevinMind
Copy link
Contributor Author

@willdurand rebased

Copy link
Member

@willdurand willdurand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions.

src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Show resolved Hide resolved
Copy link
Member

@willdurand willdurand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall but given the number of suggestions, I'm going to request changes to get a chance to re-read the entire diff once updated. Given that most of the tests are in the single test_diff_all_possible_changes test method, I made numerous suggestions to get it better documented for our future selves (not that it wasn't well documented initially).

Another thing that I am wondering is that we do a lot of comparisons between a MLBF (so a base bloom filter) and changes. Do we have test coverage to generate a stash after a previous stash was created?

src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/mlbf.py Show resolved Hide resolved
src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
@KevinMind
Copy link
Contributor Author

@willdurand I'm not sure I get it but

Another thing that I am wondering is that we do a lot of comparisons between a MLBF (so a base bloom filter) and changes. Do we have test coverage to generate a stash after a previous stash was created?

Each time we run the cron job we check the current state of the database from the previous cache.json and if there are >0 < threshold number of changes we create a stash.

It's actually totally irrelevant if we created a stash the last time or not. It only matters what was the state of the DB during the last run. In the case where we don't have any filter to compare to, we cannot create a stash because we need to create a filter.

@willdurand
Copy link
Member

@willdurand I'm not sure I get it but

Another thing that I am wondering is that we do a lot of comparisons between a MLBF (so a base bloom filter) and changes. Do we have test coverage to generate a stash after a previous stash was created?

Each time we run the cron job we check the current state of the database from the previous cache.json and if there are >0 < threshold number of changes we create a stash.

It's actually totally irrelevant if we created a stash the last time or not. It only matters what was the state of the DB during the last run. In the case where we don't have any filter to compare to, we cannot create a stash because we need to create a filter.

You got it right, I think. Do we have test coverage that reads a previous cache.json file? Something a bit more end-to-end than calling individual methods.

Copy link
Member

@willdurand willdurand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc, thank you!

src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_mlbf.py Outdated Show resolved Hide resolved
),
}

assert second_mlbf.generate_and_write_stash(previous_mlbf=first_mlbf) == {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever change the default of the switch, this would fail. Maybe we should put this assertion in a with override_switch('mlbf-soft-blocks-enabled', active=False): block? Or we ensure the switch is disabled in a setUp method in this test file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will remove the switch before flipping the default, and even if we did, I'd just update the tests then.

@KevinMind KevinMind force-pushed the soft-block-stashes branch 2 times, most recently from 65031a1 to 70fc8a3 Compare November 13, 2024 14:41
Update src/olympia/blocklist/mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Simplify the test comment

Add waffle switch controlling stash

link explicitly to soft blocking

Better comment

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/tests/test_mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Apply suggestions from code review

Co-authored-by: William Durand <will+git@drnd.me>

Update src/olympia/blocklist/mlbf.py

Co-authored-by: William Durand <will+git@drnd.me>

Apply suggestions from code review

Co-authored-by: William Durand <will+git@drnd.me>
@KevinMind
Copy link
Contributor Author

@willdurand added the integration test for invalid cache, we already had for previously existing cache.

@KevinMind KevinMind merged commit ce66a9f into master Nov 13, 2024
31 checks passed
@KevinMind KevinMind deleted the soft-block-stashes branch November 13, 2024 17:13
}

if waffle.switch_is_active('mlbf-soft-blocks-enabled'):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheiße forgot to update the switch name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task]: Implement updated stashing strategy to stash soft as well as hard blocks
2 participants