
Make contribution ID consistent from one WhatsApp chat import to another #129

Open
tcouch opened this issue Nov 8, 2024 · 2 comments
Collaborator

tcouch commented Nov 8, 2024

Currently contributionid is a UUID generated at random for each observation each time a WhatsApp chat is exported into Kapta Mobile. To deal with the problem of duplicates from repeated uploads, however, it would be better to generate the ID from attributes that are consistent from one export to the next. The uploadAPI uses contributionid as the filename in the S3 bucket, so with a stable ID we'd be overwriting/updating observations that have already been uploaded rather than creating new files.

@tcouch tcouch added the Priority label Nov 8, 2024
Collaborator Author

tcouch commented Nov 8, 2024

To make it unique, it'd have to be a combination of message.sender and message.datetime. I did consider including groupName, but that's too easy to change. I'm not sure what happens if a message sender changes their username. If that updates old messages in the chat, then even message.sender isn't reliable, but there isn't really an alternative.

There doesn't appear to be a built-in way to generate hashes, but this code snippet from a blog post looks like one way to do it:

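// Returns the SHA-256 digest of `text` as a lowercase hex string (Web Crypto API).
export const sha256 = async (text: string): Promise<string> => {
  // Encode the string as UTF-8 bytes, since digest() operates on binary data.
  const encoder = new TextEncoder();
  const data = encoder.encode(text);
  const hash = await crypto.subtle.digest('SHA-256', data);

  // Convert each byte of the ArrayBuffer into two hex characters.
  return Array.from(new Uint8Array(hash))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
};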
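
A minimal sketch of how the snippet above might be used to derive contributionid from message.sender and message.datetime. The `Message` shape and the `contributionKey` and `contributionId` helpers are hypothetical names for illustration, not existing Kapta Mobile code, and it assumes message.datetime is a `Date` and that `crypto.subtle` is available in the runtime:

```typescript
// Hypothetical message shape; the real Kapta Mobile type may differ.
interface Message {
  sender: string;
  datetime: Date;
}

// Same hashing approach as the snippet above (Web Crypto API).
const sha256 = async (text: string): Promise<string> => {
  const data = new TextEncoder().encode(text);
  const hash = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(hash))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
};

// Build a canonical string from the attributes that survive a re-export.
// JSON.stringify on an array keeps the two fields unambiguous, so
// ("ab", "c...") and ("a", "bc...") can never produce the same key.
const contributionKey = (message: Message): string =>
  JSON.stringify([message.sender, message.datetime.toISOString()]);

// Hash the canonical string to get a fixed-length, filename-safe ID.
const contributionId = (message: Message): Promise<string> =>
  sha256(contributionKey(message));
```

With this approach the same message exported twice maps to the same ID, and so to the same filename in the S3 bucket. Two messages from the same sender with an identical datetime would also map to the same ID, so this relies on that pair being unique in practice.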

Collaborator

acholyn commented Dec 13, 2024

Think we can mark this as closed now?
