[ENH] Upsert and a synthetic data set. #3253

rescrv · 2024-12-05T22:17:56Z

This introduces the upsert call and a synthetic data set capable of upsert.

github-actions · 2024-12-05T22:18:06Z

This lays out the scaffolding for upsert.

codetheweb · 2024-12-09T18:16:16Z

rust/load/src/bit_difference.rs

+            .iter()
+            .enumerate()
+            .filter_map(|(idx, word)| {
+                if embedding[idx >> 3] & (1 << (idx & 0x7)) != 0 {


some context on what this does would be helpful
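
For context, the expression in the hunk above reads like a standard bit test against a byte-packed bitmap: `idx >> 3` picks the byte and `idx & 0x7` picks the bit inside it. Below is a minimal sketch of that reading. It assumes `embedding` is a slice of bytes; `bit_is_set` and `selected_words` are hypothetical names for illustration and are not taken from the PR.

```rust
/// Returns true when bit `idx` is set in a byte-packed bitmap.
/// Assumption: bits are stored least-significant-bit first within each byte.
fn bit_is_set(bits: &[u8], idx: usize) -> bool {
    // idx >> 3 is idx / 8: the byte that holds the bit.
    // idx & 0x7 is idx % 8: the position of the bit inside that byte.
    bits[idx >> 3] & (1 << (idx & 0x7)) != 0
}

/// Hypothetical helper: keep only the words whose bit is set in `bits`.
/// Panics if `bits` holds fewer than `words.len()` bits.
fn selected_words<'a>(bits: &[u8], words: &[&'a str]) -> Vec<&'a str> {
    words
        .iter()
        .enumerate()
        .filter_map(|(idx, word)| {
            if bit_is_set(bits, idx) {
                Some(*word)
            } else {
                None
            }
        })
        .collect()
}
```

With ten words and the bitmap `[0b0000_0101, 0b0000_0010]`, bits 0, 2, and 9 are set, so the first, third, and tenth words are kept.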

codetheweb · 2024-12-09T18:17:45Z

rust/load/src/bit_difference.rs

+    pub fn embedding(&self) -> Vec<f32> {
+        let mut result = vec![];
+        let words = self.content.split_whitespace().collect::<Vec<_>>();
+        for word in WORDS.iter() {


what happens if the wordlist changes? or will created collections never be re-used across chroma load invocations?

rescrv · 2024-12-09

The words are set in the binary. They will "never" change.

codetheweb · 2024-12-09

right, they won't change for the same binary
I was asking if there would be a problem if we decided to change the wordlist in a new version or made it longer to simulate a larger embedding space

rescrv · 2024-12-10

I'd make that a new data set. Any limitations there?

codetheweb · 2024-12-10

no, that makes sense 👍
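
To illustrate why a changed wordlist is better treated as a new data set, here is a small sketch. It assumes the embedding has one dimension per wordlist entry, set to 1.0 when the document contains that word. That behavior is a guess from the visible lines of the hunk, not confirmed by the PR, and the four-entry `WORDS` constant and the free function `embedding` are stand-ins for illustration.

```rust
/// Hypothetical stand-in for the wordlist compiled into the binary.
const WORDS: [&str; 4] = ["apple", "banana", "cherry", "date"];

/// Assumed behavior: one dimension per wordlist entry,
/// 1.0 if the document contains the word and 0.0 otherwise.
fn embedding(content: &str, wordlist: &[&str]) -> Vec<f32> {
    let words = content.split_whitespace().collect::<Vec<_>>();
    wordlist
        .iter()
        .map(|w| if words.contains(w) { 1.0 } else { 0.0 })
        .collect()
}
```

Under this assumption, the length of the vector equals the length of the wordlist, and each position is tied to one specific word. Lengthening or reordering the list would change the dimension or the meaning of every position, so vectors produced with the old list would not be comparable to new ones.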

rescrv force-pushed the rescrv/upsert branch from 81caf05 to 6ac4b7b Compare December 6, 2024 21:21

rescrv requested a review from codetheweb December 6, 2024 21:21

[ENH] Upsert and a synthetic data set.

8b4037f

This lays out the scaffolding for upsert.

rescrv force-pushed the rescrv/upsert branch from 6ac4b7b to 8b4037f Compare December 6, 2024 22:51

codetheweb approved these changes Dec 9, 2024

incorporate reviewer feedback

6c66c4a

rescrv merged commit 0b267cf into main Dec 9, 2024
71 checks passed

rescrv deleted the rescrv/upsert branch December 9, 2024 22:58
