[hma] Add optional distance parameter in match API #1573

Open
Dcallies opened this issue Mar 27, 2024 · 0 comments
Labels
hma Items related to the hasher-matcher-actioner system

Comments

@Dcallies
Contributor

Dcallies commented Mar 27, 2024

Currently the only match API is based on banks:

hash/content -> [banks]

Internally, that function first looks up the content ids that match and only then reduces them to banks. To prepare for a future implementation that returns more information, it might be useful to provide a test/debug API that returns the content ids directly.

hash/content -> [content_ids[id, ?distance]]
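To make the difference between the two shapes concrete, here is a minimal Python sketch. Everything in it is hypothetical and not HMA's actual code: `ContentMatch`, `match_content_ids`, `match_banks`, and the plain dict standing in for a signal index (exact hash lookup only) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContentMatch:
    """One entry of the proposed [content_ids[id, ?distance]] response (hypothetical)."""
    content_id: int
    distance: Optional[str] = None  # None when the index cannot report a distance


def match_content_ids(query_hash: str, index: Dict[str, List[int]]) -> List[ContentMatch]:
    """Debug-style lookup: hash -> matching content ids (exact match stands in for a real index)."""
    return [ContentMatch(content_id=cid) for cid in index.get(query_hash, [])]


def match_banks(
    query_hash: str, index: Dict[str, List[int]], bank_of: Dict[int, str]
) -> List[str]:
    """Current-style API: hash -> banks, built on top of the content-id lookup."""
    banks = {bank_of[m.content_id] for m in match_content_ids(query_hash, index)}
    return sorted(banks)
```

With `index = {"abc": [1, 2]}` and both ids in `"BANK_A"`, `match_banks` returns a single bank while `match_content_ids` returns two entries: the bank-level API drops exactly the per-content information this issue wants to expose.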

This could help inform a future implementation that uses the ids.

On Distance

SignalType has a concept of distance, which currently lives here.

However, not all indices can return a distance. PDQ does appear to, contrary to what I said in an earlier discussion: https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_index.py#L49-L61

So let's bubble the distance up to the REST API.

An index may return either a distance object that can be converted to a string, or an "empty distance" that stringifies to " - ".
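A minimal sketch of the two kinds of distance value described above. `MatchDistance` and `distance_for_json` are hypothetical names, not the python-threatexchange classes; the only behavior taken from this issue is that a distance is stringable and the empty one prints as " - ".

```python
from typing import Optional


class MatchDistance:
    """Hypothetical distance wrapper: stringable, with an 'empty' form."""

    EMPTY_STR = " - "

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value

    @classmethod
    def empty(cls) -> "MatchDistance":
        # Stands in for an index that cannot report a distance.
        return cls(None)

    def is_empty(self) -> bool:
        return self.value is None

    def __str__(self) -> str:
        return self.EMPTY_STR if self.value is None else str(self.value)


def distance_for_json(d: MatchDistance) -> Optional[str]:
    """Map the empty distance to null instead of leaking ' - ' into a REST response."""
    return None if d.is_empty() else str(d)
```

Emitting null for the empty case (rather than passing " - " through) is one possible choice; it keeps API clients from having to special-case a display string.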

Dealer's choice as to whether the API should:

  1. Always return the distance if the index provides one
  2. Optionally brute-force the distance from the matched id (look up the hash stored for that id and compare it with the query), controlled by a param (include_distance=True)
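Option 2 could look roughly like the sketch below. It assumes PDQ, where the distance between two 256-bit hashes (64 hex characters) is their Hamming distance, and assumes the service can look up the stored hash for each matched content id. `pdq_hamming`, `match_with_distance`, and `stored_hashes` are hypothetical names, not existing HMA functions.

```python
from typing import Dict, List, Optional, TypedDict


class MatchResult(TypedDict):
    content_id: int
    distance: Optional[int]  # None unless include_distance=True


def pdq_hamming(a_hex: str, b_hex: str) -> int:
    """Hamming distance between two PDQ hashes given as 64-char hex strings."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


def match_with_distance(
    query_hash: str,
    matched_ids: List[int],
    stored_hashes: Dict[int, str],  # hypothetical: content id -> stored PDQ hash
    include_distance: bool = False,
) -> List[MatchResult]:
    """Attach a brute-forced distance to each match only when asked for."""
    results: List[MatchResult] = []
    for cid in matched_ids:
        distance = (
            pdq_hamming(query_hash, stored_hashes[cid]) if include_distance else None
        )
        results.append({"content_id": cid, "distance": distance})
    return results
```

Because the extra lookup and comparison only run when `include_distance=True`, callers that just need ids pay nothing extra; option 1 would instead pass through whatever distance the index already returned.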

Followups

It's unclear what the "default" output of matching should be. Most potential users seem to be interested in the metadata of the match (not just the banks), so the bank content id may need to be returned in every case.

Dcallies changed the title from "Write issue for Distance on the match API" to "[hma] Add optional distance parameter in match API" Mar 27, 2024
Dcallies assigned Dcallies and juanmrad and unassigned juanmrad Mar 27, 2024
Dcallies added the hma label (Items related to the hasher-matcher-actioner system) Mar 27, 2024
Dcallies assigned juanmrad and unassigned Dcallies Mar 27, 2024