Enhanced the cursor to be stateful #1091

Draft: wants to merge 2 commits into base: master


@crodas crodas commented Apr 18, 2024

Description

Fixes #147

To avoid duplicate results, push some state into the cursor.

The cursor uses a Bloom filter[1], a probabilistic data structure. In this data structure, false positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed. The more items are added, the higher the probability of false positives.

[1] https://en.wikipedia.org/wiki/Bloom_filter
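
To make the "possibly in set" / "definitely not in set" behaviour concrete, here is a minimal, dependency-free sketch of a Bloom filter. It is only an illustration: `TinyBloomFilter`, its default sizes, and the seeded FNV-1a-style hash are assumptions made for this example, not the API of the `bloom-filters` package or the code in this PR.

```javascript
// Illustrative Bloom filter; class name, sizes and hash choice are assumptions.
class TinyBloomFilter {
  constructor(bitCount = 1024, hashCount = 3) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // FNV-1a-style 32-bit hash; the seed perturbs the starting value so that
  // each of the hashCount hash functions maps a value to a different bit.
  static hash(value, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Bit positions that a value maps to, one per hash function.
  positions(value) {
    const str = String(value);
    const out = [];
    for (let seed = 0; seed < this.hashCount; seed++) {
      out.push(TinyBloomFilter.hash(str, seed) % this.bitCount);
    }
    return out;
  }

  // Adding sets bits; there is no way to clear them again (no removal).
  add(value) {
    for (const p of this.positions(value)) {
      this.bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // true  => "possibly in set" (may be a false positive)
  // false => "definitely not in set"
  has(value) {
    return this.positions(value).every(
      (p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0
    );
  }
}

const seenIds = new TinyBloomFilter();
for (const id of [101, 102, 103]) seenIds.add(id);

console.log(seenIds.has(102)); // true: an added item is never reported missing
console.log(seenIds.has(999)); // usually false, but a false positive is possible
```

Because every added item sets more bits, the chance that an unseen item happens to hit only set bits grows as the filter fills up, which is the false-positive trade-off described above.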

crodas and others added 2 commits April 18, 2024 13:34
Fixes stackernews#147


New and removed dependencies detected. Learn more about Socket for GitHub ↗︎

| Package | New capabilities | Transitives | Size | Publisher |
| --- | --- | --- | --- | --- |
| npm/bloom-filters@3.0.1 | None | +2 | 608 kB | callidon |
| npm/jest@29.7.0 | Transitive: environment, eval, filesystem, network, shell, unsafe | +159 | 20.1 MB | simenb |

🚮 Removed packages: npm/puppeteer@20.8.2


@huumn
Member

huumn commented Apr 18, 2024

This is going to require using a Bloom filter in Postgres queries. It doesn't look like you've gotten to that part in the PR yet, so I wanted to be sure to flag that.

As a "heads up" for a nice-to-have feature: it's going to be easy to overengineer a solution here. The bar this needs to clear is "is it worth a probabilistic data structure, another package, and new code to solve this minor inconvenience?" and "is this the simplest way to solve the problem, or merely the best solution that can be tacked on without understanding the rest of the code?"

@crodas
Author

crodas commented Apr 18, 2024

> This is going to require using a bloomfilter in postgres queries. It doesn't look like you've gotten to that part in the PR yet so I wanted to be sure to flag that.

That bit is still to be implemented. My original idea was to filter things out while altering as little of the existing logic as possible. AFAIK, there is no built-in Bloom filter for Postgres. My end goal was to avoid sending duplicate items, but still read them from the database.
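
A rough sketch of that idea, reading rows as before and dropping already-sent ones using state carried in the cursor, could look like the following. Everything here is hypothetical: `encodeCursor`, `decodeCursor`, `nextPage`, and the `{ offset, seen }` cursor shape are made up for illustration, and a plain `Set` of ids stands in for the Bloom filter (a Bloom filter would keep the cursor size bounded, at the cost of occasionally dropping an unseen item on a false positive).

```javascript
// Hypothetical helpers; names and cursor shape are assumptions, not this PR's code.
function encodeCursor(state) {
  return Buffer.from(JSON.stringify(state)).toString("base64");
}

function decodeCursor(cursor) {
  if (!cursor) return { offset: 0, seen: [] };
  return JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
}

// `rows` is whatever the unchanged query returned for this page.
function nextPage(rows, cursor) {
  const state = decodeCursor(cursor);
  const seen = new Set(state.seen); // stand-in for the Bloom filter
  const fresh = rows.filter((row) => !seen.has(row.id));
  for (const row of fresh) seen.add(row.id);
  return {
    items: fresh,
    cursor: encodeCursor({
      offset: state.offset + rows.length,
      seen: [...seen],
    }),
  };
}

// Ranking shifted between requests, so item 3 shows up on both pages.
const first = nextPage([{ id: 1 }, { id: 2 }, { id: 3 }], null);
const second = nextPage([{ id: 3 }, { id: 4 }, { id: 5 }], first.cursor);
console.log(second.items.map((row) => row.id)); // [ 4, 5 ]
```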

The same approach can be taken in the front end by simply refusing to render duplicate items. It could maybe even update the existing item in place instead of appending it. I'll give it some thought.
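
A minimal sketch of the front-end variant, assuming each item carries a unique `id`. The `mergeItems` name and the `score` field are hypothetical, chosen only for this example:

```javascript
// Merge a newly fetched page into the rendered list without duplicates.
// Items that are already present are updated in place; new ones are appended.
function mergeItems(existing, incoming) {
  const indexById = new Map(existing.map((item, i) => [item.id, i]));
  const merged = [...existing];
  for (const item of incoming) {
    const i = indexById.get(item.id);
    if (i === undefined) {
      indexById.set(item.id, merged.length);
      merged.push(item); // not rendered yet: append
    } else {
      merged[i] = item; // already rendered: refresh in place
    }
  }
  return merged;
}

const rendered = [{ id: 1, score: 10 }, { id: 2, score: 5 }];
const morePage = [{ id: 2, score: 7 }, { id: 3, score: 1 }];
console.log(mergeItems(rendered, morePage).map((item) => item.id)); // [ 1, 2, 3 ]
```

This keeps the existing queries untouched and needs no extra package, though duplicates are still sent over the wire.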

> For a nice to have thing, my "heads up" advice is that it's going to be easy to overengineer a solution here.

I agree. There is very little benefit, but it was a fun approach nonetheless.

Development

Successfully merging this pull request may close these issues.

Eliminate duplication that can happen after clicking More when ranking changes over time