Skip to content

Conversation

@ammar-elsabe
Copy link
Contributor

4 days ago huggingface released tokenizers@v0.21.2, which changed the get_normalizers method to be an impl AsRef<[NormalizerWrapper]> for Sequence. Due to the version requirement in toktrie_hf_tokenizers (version = ">=0.20.0, <1.0.0"), in the absence of a lockfile cargo fetches v0.21.2 which will not compile

This was caught in EricLBuehler/mistral.rs#1523

`tokenizers::normalizers::Sequence::get_normalizers` has been changed to
`AsRef<[NormalizerWrapper]>`
@ammar-elsabe
Copy link
Contributor Author

I didn't include this in the PR but if huggingface is going to introduce api breaking changes with patches maybe the version constraint should be =0.21.2?

@mmoskal
Copy link
Member

mmoskal commented Jun 29, 2025

The main reason I had this very flexible dep is so that the tokenizer lib can be easily updated in mistral.rs (I think that's the only pure rust user of llg). But I think you're right - if they break it like this, we should probably pin it...

@ammar-elsabe
Copy link
Contributor Author

Maybe make it "0.21.2" and hope they follow semver guidelines from now on?

@dirvine
Copy link

dirvine commented Jul 3, 2025

This would be really helpful if we could progress it.

@hudson-ai
Copy link
Contributor

Maybe make it "0.21.2" and hope they follow semver guidelines from now on?

@ammar-elsabe did you mean to set it to version = ">=0.21.2, <1.0.0"? While my opinion on this isn't incredibly strong, I do lean a bit towards us trying to remain flexible by bumping the lower-end of the range and optimistically hoping they get a bit better at semver... If this bites us again, I'll be more on the "pin it" team.

@mmoskal
Copy link
Member

mmoskal commented Jul 3, 2025

I'm with @hudson-ai on this one - please set 0.21.2 as minimum and let's see!

@ammar-elsabe
Copy link
Contributor Author

@hudson-ai I actually set it to "0.21.2" on purpose, technically speaking SemVer considers any version change before 1.0.0 to be incompatible

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

cargo is a little more flexible, considering minor version bumps as incompatible pre-1.0.0 but patches are allowed

Versions are considered compatible if their left-most non-zero major/minor/patch component is the same. This is different from SemVer which considers all pre-1.0.0 packages to be incompatible.

I thought 0.21.2 := >=0.21.2, <0.22.0 would be a decent middleground that follows cargo's guidelines, regardless its your repository so up to you 🤷 , ill update it with whatever you confirm

Copy link
Contributor

@hudson-ai hudson-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't aware of cargo's flexibility on patch versions for pre-1.0.0. This means 0.21.2 := >=0.21.2, <0.22.0? I would agree that this is a reasonable middle-ground. @mmoskal what do you think?

@mmoskal mmoskal merged commit 9ec2355 into guidance-ai:main Jul 3, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants