Question

Hi,

While having a look at the generation code for the Florence-2 model, I noticed something odd. The original inference code uses the _encode_image method to create the image features. However, looking at the encode_image used in transformers.js, the postprocessing after the model's forward pass is missing. Here's a minimal reproducible example:

The feature differences are pretty big:

Am I missing something here, or is this a potential bug?
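A minimal sketch of what such a comparison can look like on the Python side. The attribute names are assumptions based on Florence-2's custom modeling_florence2.py (vision_tower and forward_features_unpool in particular are assumed here; _encode_image is the method referenced above):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the full-precision checkpoint with Florence-2's custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base",
    torch_dtype=torch.float32,
    trust_remote_code=True,
).eval()

# Deterministic input: an all-zero 768x768 "image" batch, so any difference
# comes from the code path rather than from image decoding.
pixel_values = torch.zeros(1, 3, 768, 768)

with torch.no_grad():
    # Full path: vision-tower forward pass plus the projection/normalization
    # postprocessing that appears to be missing in transformers.js.
    processed = model._encode_image(pixel_values)
    # Forward pass only (assumed entry point into the vision tower).
    raw = model.vision_tower.forward_features_unpool(pixel_values)

print("with postprocessing:", tuple(processed.shape), processed.abs().mean().item())
print("forward pass only:  ", tuple(raw.shape), raw.abs().mean().item())
```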
This might be due to image-reading differences between JavaScript and Python. Could you try passing the exact same data (e.g., an all-zero tensor) to see if the difference is there too? Also, remember to load the full-precision model in Transformers.js, as this could be another source of differences.
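One way to follow this suggestion from the Python side, as a sketch (the file name and the 768x768 input size are illustrative; the Transformers.js side would need to be fed the same all-zero tensor):

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base",
    torch_dtype=torch.float32,  # full precision, per the suggestion above
    trust_remote_code=True,
).eval()

# Same bytes on both runtimes: an all-zero tensor instead of a decoded image.
pixel_values = torch.zeros(1, 3, 768, 768)
with torch.no_grad():
    features = model._encode_image(pixel_values)

# Dump the features so they can be diffed element-wise against the
# Transformers.js output for the same zero input.
np.save("florence2_features_python.npy", features.cpu().numpy())
print(features.flatten()[:8])  # quick eyeball check
```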
To clear up any misunderstanding: the model I used was converted in full precision. Unfortunately, using the model in transformers.js is not an option for me, as my use case requires Python.