Add a pubsub database implementation #4265

Merged: 3 commits, Feb 18, 2025

tybug (Member) commented Feb 9, 2025

TODO: write real docs for this? and figure out whether to put watchdog in test.in or tools.in?

tybug force-pushed the db-pubsub branch 3 times, most recently from 21cda3f to 473a7cb, February 9, 2025 20:18

Zac-HD (Member) left a comment

High-level feedback: I'm really excited to have a high-performing option for this!

Small design change to support that: we should notify only on save events (inc. move), and pass the listener the changed k:v pair(s), rather than just a "something changed" notification. That way listeners never need to rescan the database!

tybug (Member Author) commented Feb 10, 2025

Not on deletion also? We should definitely pass the k:v pairs (I realized this last night), but if we want consumers to be able to losslessly reconstruct the db state without rescanning, then we also need to fire deletion events, right?

move is a bit awkward with both a deletion and a save event, but consumers (hypofuzz) are going to be debouncing anyway, so I don't think it will impact much.

Zac-HD (Member) commented Feb 10, 2025

Yeah, good point. Let's make it a stream of ("save"/"delete", key, value).

Then our watcher can subscribe, scan once, and replay updates onto a cheap-to-query InMemoryExampleDatabase; and maybe even keep track of the keys for a lazily-updated derived view of deserialized contents.
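
For illustration, the subscribe / scan once / replay flow might look roughly like the sketch below. It is a minimal stand-in, not Hypothesis code: `InMemoryMirror`, `on_event`, and `fetch` are hypothetical names, and events are assumed to be flat `(type, key, value)` tuples as proposed above.

```python
from collections import defaultdict

# Assumed event shape from the discussion: ("save" or "delete", key, value).
Event = tuple[str, bytes, bytes]


class InMemoryMirror:
    """Hypothetical cheap-to-query view, kept current by replaying events."""

    def __init__(self, initial: dict[bytes, set[bytes]]) -> None:
        # "Scan once": copy whatever the backing database held at subscribe time.
        self.data: defaultdict[bytes, set[bytes]] = defaultdict(set)
        for key, values in initial.items():
            self.data[key] |= values

    def on_event(self, event: Event) -> None:
        # Replay a single change; no rescan of the backing database is needed.
        event_type, key, value = event
        if event_type == "save":
            self.data[key].add(value)
        elif event_type == "delete":
            self.data[key].discard(value)

    def fetch(self, key: bytes) -> set[bytes]:
        # Return a copy so callers cannot mutate the mirror by accident.
        return set(self.data.get(key, ()))


mirror = InMemoryMirror({b"k1": {b"v1"}})
for event in [
    ("save", b"k1", b"v2"),
    ("delete", b"k1", b"v1"),
    ("save", b"k2", b"v3"),
]:
    mirror.on_event(event)
```

In a real consumer, `on_event` would be the callback registered with the database's listener mechanism; the registration call is left out here because its exact signature is not shown in this thread.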

tybug (Member Author) commented Feb 10, 2025

hm, problem: DirectoryBasedExampleDatabase stores key paths lossily with a hash, with no way to go from hash -> key unless we track the reverse mapping in memory (which isn't complete in this case, because we might want to emit e.g. a save event for a file saved by another database accessing the same directory, where we never see the original key).

We could add a new .key_mapping json file in the directory for this? Backwards compatibility is fine, we just don't emit events we don't have a mapping for.

Edit: probably better to do one .key file per directory to avoid write races or rereading a large file

Zac-HD (Member) commented Feb 10, 2025

idea: just have a b"keys" key, which saves the value of each known key (similar to HypoFuzz, but more general). Then we can scan that at start-listening time, track the hash->key mapping, and we'll be notified about new keys when they're written!
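
A rough sketch of that idea, using a plain dict as a stand-in for the directory. Everything here is hypothetical: the `METAKEYS_NAME` constant, the `save` helper, the `KeyIndex` class, and the truncated SHA-384 hash, which is only an assumption about how keys map to directory names.

```python
import hashlib
from typing import Iterable, Optional

METAKEYS_NAME = b"keys"  # hypothetical: the special key that records known keys


def key_hash(key: bytes) -> str:
    # Assumption: a truncated hex digest stands in for the lossy on-disk hash.
    return hashlib.sha384(key).hexdigest()[:16]


def save(store: dict[bytes, set[bytes]], key: bytes, value: bytes) -> None:
    # Save the value, and also record the key itself as a value under b"keys".
    store.setdefault(key, set()).add(value)
    if key != METAKEYS_NAME:
        store.setdefault(METAKEYS_NAME, set()).add(key)


class KeyIndex:
    """Tracks hash -> key so filesystem events can be mapped back to keys."""

    def __init__(self) -> None:
        self.hash_to_key: dict[str, bytes] = {}

    def scan(self, known_keys: Iterable[bytes]) -> None:
        # Run at start-listening time, and again whenever b"keys" gains a value.
        for key in known_keys:
            self.hash_to_key[key_hash(key)] = key

    def resolve(self, dirname: str) -> Optional[bytes]:
        # None means "no mapping known": the listener simply emits no event.
        return self.hash_to_key.get(dirname)


store: dict[bytes, set[bytes]] = {}
save(store, b"test_a", b"v1")

index = KeyIndex()
index.scan(store[METAKEYS_NAME])
resolved = index.resolve(key_hash(b"test_a"))
unknown = index.resolve("0" * 16)
```

Because new keys arrive as ordinary saves under `b"keys"`, the listener learns about keys written by other processes through the same notification path it already watches.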

tybug (Member Author) commented Feb 11, 2025

that works wonderfully!

One other snag: DirectoryBasedExampleDatabase cannot propagate the value which was deleted, because the file is already gone by the time we get the event. I'm currently typing the event as type: Literal["delete"], key: bytes, value: Optional[bytes] where value=None indicates we don't know what the deleted value is, with the idea being hypofuzz would rescan just that key on deletions.
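
To make the `value=None` convention concrete, here is a hedged sketch of how a consumer might handle it. The type aliases, `apply_event`, and the `rescan` callback are illustrative names only, and the flat `(type, key, value)` tuple follows the description above rather than any confirmed final API.

```python
from typing import Callable, Literal, Optional, Union

# Illustrative aliases; a delete may carry None when the value is unknown.
SaveEvent = tuple[Literal["save"], bytes, bytes]
DeleteEvent = tuple[Literal["delete"], bytes, Optional[bytes]]
Event = Union[SaveEvent, DeleteEvent]


def apply_event(
    view: dict[bytes, set[bytes]],
    event: Event,
    rescan: Callable[[bytes], set[bytes]],
) -> None:
    event_type, key, value = event
    if event_type == "save":
        view.setdefault(key, set()).add(value)
    elif value is None:
        # Unknown deleted value: re-read just this key from the backing store.
        view[key] = set(rescan(key))
    else:
        view.setdefault(key, set()).discard(value)


# The backing store has already lost b"v1"; the local view has not caught up.
backing = {b"k": {b"v2"}}
view = {b"k": {b"v1", b"v2"}}
apply_event(view, ("delete", b"k", None), lambda key: backing.get(key, set()))
```

The rescan is scoped to a single key, so a deletion with an unknown value costs one fetch rather than a full database scan.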

tybug (Member Author) commented Feb 11, 2025

This was an enormous pain to test, but on the bright side I am confident in the implementation now, and have picked up on a few minor implementation savings (move with src == dest in redis) along the way. I half expect these tests to be flaky though...

tybug force-pushed the db-pubsub branch 8 times, most recently from 9e50226 to 2d451c7, February 13, 2025 00:03
tybug force-pushed the db-pubsub branch 3 times, most recently from 4cbdcd1 to c6c2837, February 16, 2025 06:17
("delete", (b"k2", b"v3")),
("save", (b"k1", b"v3")),
]
elif sys.platform.startswith("win"):
tybug (Member Author) commented:

the linux details for move here are weird, but not really harmful, just redundant/inefficient. I suspect this is at the os/watchdog level but don't have a linux machine to test further. We use os.renames for db.move with a fallback to save/delete, which may explain some of the divergent behavior.
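
For context, a rename-with-fallback move might look roughly like this. It is a sketch under assumptions, not the PR's code: `move_value_file` and its return strings are hypothetical, and only `os.renames` is taken from the comment above. A successful rename and the fallback touch the filesystem differently, which is one plausible reason a watcher would report different event sequences.

```python
import os
import tempfile
from pathlib import Path


def move_value_file(src_path: Path, dest_path: Path, value: bytes) -> str:
    """Hypothetical helper: move one value file, reporting the strategy used."""
    try:
        # os.renames creates missing destination directories and prunes
        # source directories that end up empty.
        os.renames(src_path, dest_path)
        return "rename"
    except OSError:
        # Fallback: save at the destination, then delete the source if present.
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        dest_path.write_bytes(value)
        src_path.unlink(missing_ok=True)
        return "save+delete"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "key_a" / "value"
    dest = root / "key_b" / "value"
    src.parent.mkdir()
    src.write_bytes(b"v3")

    first = move_value_file(src, dest, b"v3")
    dest_contents = dest.read_bytes()
    src_dir_pruned = not src.parent.exists()

    # The source file is now gone, so a second move takes the fallback path.
    second = move_value_file(src, root / "key_c" / "value", b"v3")
    fallback_contents = (root / "key_c" / "value").read_bytes()
```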

Zac-HD (Member) replied:

github "codespaces" might be helpful to debug?

Zac-HD (Member) left a comment

Looks good, comments below are non-blocking 🙂

("delete", (b"k2", b"v3")),
("save", (b"k1", b"v3")),
]
elif sys.platform.startswith("win"):
Zac-HD (Member) commented:

github "codespaces" might be helpful to debug?

tybug enabled auto-merge February 18, 2025 07:38
tybug merged commit 88cf256 into HypothesisWorks:master Feb 18, 2025
50 checks passed
tybug deleted the db-pubsub branch February 18, 2025 20:57
2 participants