wrongly annotated configuration when saving a model that has a custom pipeline #28907

Closed
not-lain opened this issue Feb 7, 2024 · 9 comments · Fixed by #29004

Comments

@not-lain (Contributor) commented Feb 7, 2024

System Info

  • transformers version: 4.35.2
  • Platform: Linux-6.1.58+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (False)
  • Tensorflow version (GPU?): 2.15.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.8.0 (cpu)
  • Jax version: 0.4.23
  • JaxLib version: 0.4.23
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

# repo with custom pipeline
from transformers import AutoModelForImageClassification
model = AutoModelForImageClassification.from_pretrained("not-lain/MyRepo", trust_remote_code=True) 
model.push_to_hub("testpushfrommodel")

The resulting repo structure:

.
└── testpushfrommodel
    ├── config.json
    └── model.safetensors

The resulting config.json:

{
  "_name_or_path": "not-lain/MyRepo",
  "architectures": [
    "MnistModel"
  ],
  "auto_map": {
    "AutoConfig": "not-lain/MyRepo--MyConfig.MnistConfig",
    "AutoModelForImageClassification": "not-lain/MyRepo--MyModel.MnistModel"
  },
  "conv1": 10,
  "conv2": 20,
  "custom_pipelines": {
    "image-classification": {
      "impl": "MyPipe.MnistPipe",
      "pt": [
        "AutoModelForImageClassification"
      ],
      "tf": [],
      "type": "image"
    }
  },
  "model_type": "MobileNetV1",
  "torch_dtype": "float32",
  "transformers_version": "4.35.2"
}

The impl field is wrongly annotated.
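For illustration (a hedged sketch; the repo id below is a placeholder for wherever push_to_hub put the model), loading a pipeline from the newly pushed repo would resolve "MyPipe.MnistPipe" relative to that repo, which does not contain MyPipe.py, so the custom pipeline cannot be found:

from transformers import pipeline

# Placeholder repo id; substitute your own namespace. Because "impl" lacks the
# "not-lain/MyRepo--" prefix, transformers looks for MyPipe.py inside this repo
# and fails to locate the custom pipeline class.
pipe = pipeline(
    "image-classification",
    model="your-username/testpushfrommodel",
    trust_remote_code=True,
)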

Expected behavior

The output configuration should be:

{
  "_name_or_path": "not-lain/MyRepo",
  "architectures": [
    "MnistModel"
  ],
  "auto_map": {
    "AutoConfig": "not-lain/MyRepo--MyConfig.MnistConfig",
    "AutoModelForImageClassification": "not-lain/MyRepo--MyModel.MnistModel"
  },
  "conv1": 10,
  "conv2": 20,
  "custom_pipelines": {
    "image-classification": {
      "impl": "not-lain/MyRepo--MyPipe.MnistPipe",
      "pt": [
        "AutoModelForImageClassification"
      ],
      "tf": [],
      "type": "image"
    }
  },
  "model_type": "MobileNetV1",
  "torch_dtype": "float32",
  "transformers_version": "4.35.2"
}
@not-lain (Contributor, Author) commented Feb 7, 2024

To understand this better, try checking these two checkpoints:

# checkpoint that doesn't have a custom pipeline
model_without_custom_pipeline = AutoModelForImageClassification.from_pretrained("not-lain/MyRepo", trust_remote_code=True, revision="dba8d15072d743b6cb4a707246f801699897fb72")
model_without_custom_pipeline.push_to_hub("model_without_custom_pipeline_repo")
# checkpoint that has a custom pipeline
model_with_custom_pipeline = AutoModelForImageClassification.from_pretrained("not-lain/MyRepo", trust_remote_code=True, revision="4b57ca965d5070af9975666e2a0f584241991597")
model_with_custom_pipeline.push_to_hub("model_with_custom_pipeline_repo")

and compare the difference in configuration between them; a quick way to do the comparison is sketched below.
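A minimal sketch of such a comparison, assuming both repos above were pushed under your namespace (the username below is a placeholder):

import json
from huggingface_hub import hf_hub_download

# Download each pushed config.json and print its custom_pipelines entry (if any).
for repo_id in ("your-username/model_without_custom_pipeline_repo",
                "your-username/model_with_custom_pipeline_repo"):
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        config = json.load(f)
    print(repo_id, "->", config.get("custom_pipelines", "<no custom_pipelines key>"))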

@not-lain not-lain changed the title wrongly annotated configuration when saving a model that has a custom implementation wrongly annotated configuration when saving a model that has a custom pipeline Feb 7, 2024
@ArthurZucker (Collaborator)

cc @Rocketknight1 if you can have a look!

@not-lain (Contributor, Author) commented Feb 8, 2024

@Rocketknight1 I'm going to give you a helping hand: you can trace the custom_object_save function and check which files it gets called from.
I'll try to help out tomorrow, but I thought I'd post this in public to let you know where you can start; a quick way to locate the function is sketched below.
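For reference, a quick sketch for locating the function and its source, assuming it still lives in transformers.dynamic_module_utils:

import inspect

from transformers.dynamic_module_utils import custom_object_save

# Printing the source file gives a starting point for grepping its call sites
# (the config, model, and tokenizer saving paths).
print(inspect.getsourcefile(custom_object_save))
print(inspect.getsource(custom_object_save))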

@Rocketknight1 (Member) commented Feb 22, 2024

Hey! I've been reviewing this issue and the associated fix PR and the Pipeline.push_to_hub() PR. I wasn't actually that familiar with our custom pipeline code before now, but here's my summary of what's going on and what this fixes:

Current situation:

  • Right now, we have a guide on custom pipelines here. This was written by @Narsil during a big pipeline refactor 3 years ago.
  • The guide assumes that custom pipelines are only uploaded together with custom models (see the registration sketch after this list).
  • The uploaded config.json for custom pipelines does not give full paths to model repos, only a relative path inside the current repo
  • Because we designed the custom pipeline docs to be tightly connected with a custom model, there's no way to upload a custom pipeline independently (except to save files and call a bunch of Github/huggingface_hub commands manually)
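For context, this is roughly how the guide wires a custom pipeline to a model class at registration time (the pipeline class and task name below are illustrative placeholders, not taken from this issue):

from transformers import AutoModelForImageClassification, Pipeline
from transformers.pipelines import PIPELINE_REGISTRY

class MyCustomPipeline(Pipeline):
    # Minimal stub; a real pipeline implements these methods for its task.
    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, inputs):
        return inputs

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        return model_outputs

# Registration ties the custom pipeline to an Auto class; sharing then happens
# through the model/config that gets pushed, which is how custom_pipelines ends
# up inside config.json.
PIPELINE_REGISTRY.register_pipeline(
    "my-image-task",
    pipeline_class=MyCustomPipeline,
    pt_model=AutoModelForImageClassification,
    type="image",
)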

Proposed fixes:

This PR: Fix the config.json uploading so it can correctly point to model repos, instead of assuming that the model always lives in the pipeline repo

#29172: Add a push_to_hub method to Pipeline by inheriting from PushToHubMixin, so that pipelines can be directly pushed to the hub rather than as part of a custom model

Stuff I don't understand yet:

Right now, we don't really have a way to initialize a custom pipeline independently of a model. According to the docs, the only way to get a remote code pipeline is to initialize it with an associated model:

from transformers import pipeline

classifier = pipeline(model="{your_username}/test-dynamic-pipeline", trust_remote_code=True)

Therefore, I'm a bit confused about some of these PRs:

  • Is it actually possible to initialize a custom code pipeline that isn't associated with a model in the same repo?
  • The addition of Pipeline.push_to_hub() has the same issue - I think it makes sense in theory, but in practice custom pipelines seem to be tightly coupled to custom models, and therefore I'm not sure if it makes sense to upload custom pipelines independently!

The bit where I call for help because I'm not sure what to do:

cc @Narsil - can you clarify what the intention was with the custom pipeline code? Should it be possible for users to upload custom pipelines separately from models?

cc the core maintainers @amyeroberts and @ArthurZucker as well - even though they're not that big, I think these PRs touch very fundamental parts of the conceptual model in transformers. I think it makes a lot of sense to allow users to write custom pipelines that can point to arbitrary models, even in other repos. However, that's (AFAIK) not what the custom pipeline code does right now, and I think a lot of the code and docs haven't been touched for a long time! Therefore, I think we might need a more fundamental change than just tweaking config.json and adding push_to_hub() to custom pipelines if we want to make this work properly. However, I might be wrong - there are still some parts of our library here that I don't fully understand

@not-lain (Contributor, Author) commented Feb 22, 2024

As for this issue, I propose a fix which can be found in #29004.
TL;DR:
This issue only needs the impl field (if it exists) to be fixed when transferring from repo A to repo B (for example when fine-tuning a model or similar), meaning

the config should look like this

{
  (...),
  "auto_map": {
    "AutoConfig": "not-lain/MyRepo--MyConfig.MnistConfig",
    "AutoModelForImageClassification": "not-lain/MyRepo--MyModel.MnistModel"
  },
 (...),
  "custom_pipelines": {
    "image-classification": {
      "impl": "not-lain/MyRepo--MyPipe.MnistPipe",
      "pt": [
        "AutoModelForImageClassification"
      ],
      "tf": [],
      "type": "image"
    }
  },
(...)

instead of this

{
  (...),
  "auto_map": {
    "AutoConfig": "not-lain/MyRepo--MyConfig.MnistConfig",
    "AutoModelForImageClassification": "not-lain/MyRepo--MyModel.MnistModel"
  },
 (...),
  "custom_pipelines": {
    "image-classification": {
      "impl": "MyPipe.MnistPipe",
      "pt": [
        "AutoModelForImageClassification"
      ],
      "tf": [],
      "type": "image"
    }
  },
(...)

Also, this can be considered a separate issue from #29172, since the latter needs one minor extra check to test whether we are pushing the newly registered pipeline to the same repo that the model came from (I know, a pretty specific condition); a rough estimation of the latter's solution is in #29172 (comment).
Again, they are separate from each other, since #29172 needs to fix the auto_config field and this issue is about the custom_pipelines field (a minimal sketch of the idea for this issue is below). Hope this clears it up.
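A minimal sketch of the idea (illustrative only, not the actual #29004 implementation; the helper name is made up): when a config loaded from repo A is saved for repo B, any bare "Module.Class" impl under custom_pipelines gets qualified with the source repo id, mirroring what auto_map already does:

def qualify_custom_pipelines(config_dict: dict, source_repo_id: str) -> dict:
    # Illustrative helper, not a transformers API: prefix un-qualified pipeline
    # implementations ("MyPipe.MnistPipe") with the repo they came from, e.g.
    # "not-lain/MyRepo--MyPipe.MnistPipe", the same way auto_map entries are stored.
    for spec in config_dict.get("custom_pipelines", {}).values():
        impl = spec.get("impl", "")
        if impl and "--" not in impl:
            spec["impl"] = f"{source_repo_id}--{impl}"
    return config_dict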

@not-lain (Contributor, Author) commented Feb 22, 2024

  • Is it actually possible to initialize a custom code pipeline that isn't associated with a model in the same repo?
  • Yes, you can do that. This model can be a nice example of that (the code is in another repo, not-lain/CustomCodeForRMBG, and the weights are under briaai/RMBG-1.4); you can use this to check it out:
from transformers import pipeline

pipe = pipeline("image-segmentation", model="briaai/RMBG-1.4", revision="refs/pr/9", trust_remote_code=True)

numpy_mask = pipe("img_path") # outputs numpy mask
  • If you mean the model is defined in repo A (a normal model) and the pipeline in repo B (a custom pipeline), yes to that too; Sylvain has done that before in this repo: https://huggingface.co/sgugger/test-dynamic-pipeline
  • If you mean repo A has a custom model and repo B has a custom pipeline, I didn't try this yet, but I can generate a repo for you to work with if you want.
  • The addition of Pipeline.push_to_hub() has the same issue - I think it makes sense in theory, but in practice custom pipelines seem to be tightly coupled to custom models, and therefore I'm not sure if it makes sense to upload custom pipelines independently!

Nah, not really: push_to_hub needs a minor enhancement with the auto_config field, and we are not always dealing with custom models; refer to Sylvain's example (a sketch of loading it is below).
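For reference, a small sketch of loading that standalone custom-pipeline repo, assuming it is still available on the Hub:

from transformers import pipeline

# The custom pipeline code lives in the repo itself; the underlying model is a standard one.
classifier = pipeline(model="sgugger/test-dynamic-pipeline", trust_remote_code=True)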

The library as a whole is not consistent enough when working with custom architectures, which is why I reported these issues. Maybe we can work through every issue independently, one step at a time, stabilizing it.

@amyeroberts (Collaborator)

@Rocketknight1 Yes, I think this poses a more fundamental question about the behaviour we want from our pipelines.

The first question I have is: where should a pipeline live if we push it to the hub? Would it be under a model repo? If that's the case, can a pipeline and a model exist under the same repo? If so, do we have the same mapping behaviour that we do with e.g. tokenizers and configs? If the model and pipeline are under the same repo, should that pipeline be associated with that model by default? If a pipeline is uploaded by itself, I do think it should still have an associated default model, but I don't see why that couldn't be any model, i.e. its default could be a remote model or a model in the same repo.

Tbh, I think the current behaviour of coupling a model and a pipeline together is sensible: although pipelines within transformers are meant to be able to use any task-compatible model, I don't think we can enforce the same guarantees with custom pipelines.

@not-lain (Contributor, Author)

@amyeroberts can you read through this notebook to understand more:
https://colab.research.google.com/drive/16xOFUabjTZCaP1kEp44XV_dfEoan-_yZ?usp=sharing

If you still have more questions, my DMs are always open, or you can check back with @Rocketknight1 since I have explained most of this to him in DMs.

@huggingface huggingface deleted a comment from github-actions bot Mar 23, 2024
@huggingface huggingface deleted a comment from github-actions bot Apr 18, 2024
github-actions (bot):

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
