better snippets for KerasHub models #1021

martin-gorner · 2024-11-12T04:20:32Z

Added better "use this model" snippets for KerasHub models with:

The base class to instantiate when loading the model with from_preset
The generate() function, if any
A link to the keras.io documentation of the model
Alternative snippets for loading the components of the model (Tokenizer, etc.)

The data for this comes from parsing task.json or, if that file is not available, config.json and tokenizer.json, as well as the pipeline_tag in the metadata. The parsing is done in the model repo code.
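As a rough illustration of the kind of snippet described above, here is a minimal sketch of a generator that takes an already-resolved class name and emits the Python "use this model" text. The function name `kerasHubSnippet`, the input shape, and the `docsUrl` parameter are hypothetical and are not taken from this PR's code; only `from_preset` with an `hf://` handle and `generate()` are KerasHub calls.

```typescript
// Hypothetical input shape; the real snippet code in this PR may differ.
interface KerasHubSnippetInput {
	modelId: string; // Hub repo id, e.g. "some-user/some-model"
	className: string; // base class resolved from task.json or config.json
	docsUrl: string; // link to the keras.io page, supplied by the caller
	hasGenerate: boolean; // whether the class exposes generate()
}

// Builds the Python snippet shown to the user as a single string.
function kerasHubSnippet(input: KerasHubSnippetInput): string {
	const { modelId, className, docsUrl, hasGenerate } = input;
	const lines = [
		`# Documentation: ${docsUrl}`,
		"import keras_hub",
		"",
		`model = keras_hub.models.${className}.from_preset("hf://${modelId}")`,
	];
	if (hasGenerate) {
		// Only text-generation style tasks get a generate() call.
		lines.push(`output = model.generate("Keras is a", max_length=64)`);
	}
	return lines.join("\n");
}
```

Passing the documentation URL in from the caller avoids hard-coding any assumption about how keras.io pages are laid out.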

Screenshot attached for Llama3

martin-gorner · 2024-11-12T04:42:40Z

sister PR:
https://github.com/huggingface-internal/moon-landing/pull/11693

packages/tasks/src/model-libraries-snippets.ts

Wauplin

Left a few comments. Hopefully the logic can be simplified if KerasHub stabilizes its structure

Wauplin · 2024-11-18T14:49:55Z

packages/tasks/src/model-data.ts

+		keras_hub_task_json?: {
+			class_name: string;
+			alt_class_names?: string[];
+		};
+		keras_hub_config_json?: {
+			class_name: string;
+		};
+		keras_hub_tokenizer_json?: {
+			class_name: string;
+		};


Slight preference for a more concise structure:

Suggested change

-		keras_hub_task_json?: {
-			class_name: string;
-			alt_class_names?: string[];
-		};
-		keras_hub_config_json?: {
-			class_name: string;
-		};
-		keras_hub_tokenizer_json?: {
-			class_name: string;
-		};
+		keras_hub?: {
+			// relevant task.json content
+		};

From what I understand, task.json is the future-proof way of getting this info correctly, and reading it from config.json/tokenizer.json is more of a fallback for models published up to now. Is my assumption correct? If so, let's parse only task.json, so that we promote only the "correct" way.

In any case (no matter if the config comes from task.json, config.json or tokenizer.json) I think that having a single field with nested values is better rather than exposing 3 different high-level fields related to keras_hub.
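To make the single-field idea concrete, here is a minimal sketch of how the three parsed files could be folded into one nested `keras_hub` object on the parsing side. The names `ClassNameFile`, `KerasHubConfig`, `tokenizer_class_name` and `toKerasHubConfig` are hypothetical; the actual parsing lives in the model repo code and may look different.

```typescript
// Shape shared by the parsed task.json / config.json / tokenizer.json excerpts.
interface ClassNameFile {
	class_name: string;
	alt_class_names?: string[];
}

// Hypothetical single nested field, replacing the three top-level ones.
interface KerasHubConfig {
	class_name?: string;
	alt_class_names?: string[];
	tokenizer_class_name?: string;
}

// Prefers task.json; falls back to config.json for the base class.
function toKerasHubConfig(
	task?: ClassNameFile,
	config?: ClassNameFile,
	tokenizer?: ClassNameFile
): KerasHubConfig | undefined {
	if (!task && !config && !tokenizer) {
		return undefined; // not a KerasHub repo, or nothing parseable
	}
	return {
		class_name: task?.class_name ?? config?.class_name,
		alt_class_names: task?.alt_class_names,
		tokenizer_class_name: tokenizer?.class_name,
	};
}
```

With this shape, consumers read one field regardless of which file the values came from.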

Taking a look at this recent keras-hub model (https://huggingface.co/evandrarf/health-care-gemma2-kagglex/tree/main), I can see that task.json, config.json, tokenizer.json and preprocessor.json are all set, and the content of task.json is strictly a superset of the other 3. Do you know if the other files are kept for backward compatibility?

I agree with you that standardizing on a single config file would be best. Let me ask the Keras team.

Matt on the Keras team responded: task.json is not always present, and that is by design, not a legacy thing. I recommend we deploy the currently implemented logic while we continue the discussion with Matt and possibly simplify later.

I'd rather wait for a simplification before merging, unless it's time-sensitive.

Yes, and imo we can influence the standardization by supporting the simpler, single-file version that Wauplin mentions.

Wauplin · 2024-11-18T14:51:26Z

packages/tasks/src/model-libraries-snippets.ts

+	let class_name =
+		// If the model has a task.json config, then the base Task class is known
+		model.config?.keras_hub_task_json?.class_name ??
+		// If only a config.json is present, the base class will be a "backbone"
+		model.config?.keras_hub_config_json?.class_name;


Related to my comment above, if we can get rid of some logic by parsing only task.json, that would be for the best.
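For comparison, here is a minimal sketch of what the lookup might reduce to if only a single `keras_hub` field were exposed. The `ModelDataLike` type, the `resolveClassName` helper and the `"Backbone"` default are assumptions made for illustration; since task.json is not always present, some fallback of this kind would still be needed.

```typescript
// Hypothetical trimmed-down view of the model data, using the single nested field.
interface ModelDataLike {
	id: string;
	config?: {
		keras_hub?: {
			class_name?: string;
		};
	};
}

// Returns the class to instantiate, or a caller-chosen default when unknown.
// The "Backbone" default is an assumption, not behavior taken from this PR.
function resolveClassName(model: ModelDataLike, fallback: string = "Backbone"): string {
	return model.config?.keras_hub?.class_name ?? fallback;
}
```

The two-step `??` chain from the diff collapses to one optional access, which is the simplification being asked for.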

better snippets for KerasHub models

0020154

martin-gorner requested review from SBrandeis, gary149, Wauplin, julien-c, pcuenca and ngxson as code owners November 12, 2024 04:20

martin-gorner requested a review from enzostvs November 12, 2024 04:48

julien-c reviewed Nov 15, 2024

packages/tasks/src/model-libraries-snippets.ts (outdated)

julien-c reviewed Nov 15, 2024

packages/tasks/src/model-libraries-snippets.ts (outdated)

martin-gorner and others added 2 commits November 18, 2024 10:36

Merge branch 'main' into main

02ea2db

refactor to address stylistic comments

312c58b

Wauplin reviewed Nov 18, 2024



Wauplin Nov 18, 2024

Wauplin Nov 18, 2024

martin-gorner Nov 19, 2024

martin-gorner Nov 20, 2024

Wauplin Nov 20, 2024

julien-c Nov 20, 2024

Wauplin Nov 18, 2024
