Refactored ModelConfig object into a Marshmallow schema #2906

tgaddair · 2023-01-09T23:59:25Z

No description provided.

github-actions · 2023-01-10T01:43:18Z

Unit Test Results

        6 files ±0         6 suites ±0 4h 10m 11s ⏱️ - 56m 55s
  3 890 tests ±0   3 853 ✔️ ±0   37 💤 ±0 0 ❌ ±0
11 667 runs ±0 11 556 ✔️ ±0 111 💤 ±0 0 ❌ ±0

Results for commit ea0e305. ± Comparison against base commit 9d58f0f.

♻️ This comment has been updated with latest results.

ludwig/utils/config_utils.py

connor-mccorm

This is great, definitely a lot of terrific changes in here. Thanks for putting this together, Travis. I just had a few nits I noticed while going through this, but at a high level the logic all makes sense and I didn't spot any major errors. The one thing I'll call out is that we may need to double-check that no changes are needed on the Predibase side to reconcile these changes. It looks like the API has been preserved, so I don't expect anything here, but I remember running into issues the last time we made changes like these, so I just wanted to mention it. Overall, LGTM!

connor-mccorm · 2023-02-03T00:09:31Z

ludwig/schema/features/utils.py

-                    "description": "Type of the input feature",
-                },
-                "column": {"type": "string", "title": "column", "description": "Name of the column."},
+        "type": "object",


Since we're no longer treating this as an array with "minItems": 1, will this still validate that the user provided at least one feature?

Good question. It should get it from FeaturesTypeSelection, which has min_length=1 set by default. But let me check.

Yes, looks like it works. Added tests to verify.
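For reference, the constraint being checked here is the JSON-schema `minItems` keyword on the feature array. The sketch below enforces just that keyword with a hand-rolled helper so the expected behavior is easy to see. `check_min_items` and the sample schema are hypothetical; the actual config validation goes through a JSON schema validator rather than code like this.

```python
from typing import Any, Dict, List


def check_min_items(instance: List[Any], schema: Dict[str, Any]) -> None:
    """Hypothetical helper: enforce only the `type: array` and `minItems` keywords."""
    if schema.get("type") == "array" and not isinstance(instance, list):
        raise ValueError(f"Expected an array, got {type(instance).__name__}")
    min_items = schema.get("minItems", 0)
    if len(instance) < min_items:
        raise ValueError(f"{instance!r} should have at least {min_items} item(s)")


# Assumed shape of the input_features schema: an array with at least one entry.
input_features_schema = {"type": "array", "minItems": 1, "items": {"type": "object"}}

check_min_items([{"name": "x", "type": "binary"}], input_features_schema)  # passes

try:
    check_min_items([], input_features_schema)
except ValueError as e:
    print(e)  # [] should have at least 1 item(s)
```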

Awesome that it works, also verified it locally. I'm a little confused about how this works though... the custom schema here is no longer even an array so how does it end up being one in the final assembly (let alone how does the allOf end up in the right place)?

I think it's marshmallow-jsonschema magic.

@tgaddair I'm not so sure. Whatever is set in _json_type_mapping should override any kind of automatic logic. So while I might expect it to fix the type to be an array, I would not expect it to insert items, or to know to cherry-pick allOf to one level higher than it is in the final schema. But I actually wanted to talk about this in our next sync.

I just figured this out. Even though json_type_mapping is supposed to be the ultimate override method, marshmallow_jsonschema will change the type of the field if the class in question derives from a native marshmallow type. Since FeaturesTypeSelection ultimately derives from marshmallow.fields.List, the package forces its JSON schema into proper list/array format, in particular by taking the output of json_type_mapping and nesting it inside a standard list's JSON schema (that is where the surrounding {"type": "array", "items": {...}} comes from).

We are lucky that all of the feature's schema attributes and conditional logic indeed are supposed to be nested inside of items, as otherwise this would have caused a major error!
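The behavior described above can be simulated in plain Python. This is a hypothetical sketch, not marshmallow_jsonschema's code: `custom_feature_schema` stands in for the custom type mapping (which describes a single feature object, including its `allOf` conditionals), and `wrap_as_list_schema` mimics the package nesting that output under `items` because the field derives from `marshmallow.fields.List`. The feature types in the `enum` are illustrative, and the `minItems` value assumes the `min_length=1` default mentioned earlier in the thread.

```python
from typing import Any, Dict


def custom_feature_schema() -> Dict[str, Any]:
    """Hypothetical stand-in for a field's custom JSON type mapping: describes ONE feature."""
    return {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "type": {"type": "string", "enum": ["binary", "number"]},
        },
        "required": ["name", "type"],
        # Per-type conditional logic, written for a single feature object.
        "allOf": [
            {
                "if": {"properties": {"type": {"const": "binary"}}},
                "then": {"properties": {"encoder": {"type": "object"}}},
            }
        ],
    }


def wrap_as_list_schema(item_schema: Dict[str, Any], min_items: int = 1) -> Dict[str, Any]:
    """Mimic the observed behavior: nest the custom output under `items` of an array schema."""
    return {"type": "array", "items": item_schema, "minItems": min_items}


final_schema = wrap_as_list_schema(custom_feature_schema())
print(final_schema["type"])              # array
print("allOf" in final_schema)           # False
print("allOf" in final_schema["items"])  # True
```

This also shows why the outcome happens to be correct: the conditionals only make sense per feature, so nesting them under `items` puts them exactly where they belong.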

Very interesting! Nice discovery.

ludwig/schema/features/preprocessing/image.py

ludwig/schema/model_config.py

connor-mccorm · 2023-02-03T01:33:06Z

ludwig/schema/combiners/base.py

 from ludwig.api_annotations import DeveloperAPI
 from ludwig.schema import utils as schema_utils


 @DeveloperAPI
+@dataclass(repr=False, order=True)


Can we import and use ludwig_dataclass here instead?

from ludwig.schema.utils import ludwig_dataclass

@ludwig_dataclass

Done. What about BaseFeatureConfig and the other classes in that file? Does it matter that order=True isn't specified on those?

Yeah, we should use @ludwig_dataclass on those as well. Currently it's not a huge deal, since a lot of the parameters on those classes are filtered out or not shown for one reason or another. However, we will need it eventually, and having order=True doesn't hurt, so I think adding it is a great idea.
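A minimal sketch of what the suggested decorator amounts to, assuming `ludwig_dataclass` just applies the options from the original `@dataclass(repr=False, order=True)` line (the real helper in `ludwig.schema.utils` may do more). The config class and its fields are illustrative. It shows what `order=True` adds: instances compare field by field, as tuples in declaration order.

```python
from dataclasses import dataclass


def ludwig_dataclass(cls):
    """Hypothetical stand-in: apply the same options as the original decorator line."""
    return dataclass(repr=False, order=True)(cls)


@ludwig_dataclass
class BaseCombinerConfig:  # illustrative fields only
    type: str = "concat"
    dropout: float = 0.0


a = BaseCombinerConfig(dropout=0.1)
b = BaseCombinerConfig()
print(a > b)                   # True: ("concat", 0.1) > ("concat", 0.0)
print("object at" in repr(a))  # True: repr=False keeps the default object repr
```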

ksbrar

Super, super clean. A few nits... and questions 🤔

ksbrar · 2023-02-03T06:02:16Z

ludwig/config_validation/validation.py

+from ludwig.constants import MODEL_ECD, MODEL_TYPE, PREPROCESSING, SPLIT
+from ludwig.schema import utils as schema_utils
+
+# TODO(travis): figure out why we need these imports to avoid circular import error


Is this actually the case even with validation moved to a separate module?

Yeah, unfortunately so. I think the right fix here is to remove the circular import structure so the schema has no dependencies on the encoders, etc. But this is a larger refactor.

ksbrar · 2023-02-03T06:03:51Z

ludwig/config_validation/validation.py

+            error = e
+
+    if error is not None:
+        raise ValidationError(f"Failed to validate JSON schema for config. Error: {error.message}")
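The hunk above stores the caught exception in a variable and raises only after the `try` block. The thread doesn't say why; one plausible reason is sketched here. In Python 3 the name bound by `except ... as e` is deleted when the clause ends, so it has to be copied, and raising outside the handler means the new error is not chained to the original one as context. `SchemaError`, `require_input_features`, and this `validate_config` are hypothetical stand-ins, not Ludwig's code; only the error message is taken from the diff.

```python
from typing import Any, Callable, Dict, Optional


class SchemaError(Exception):
    """Hypothetical stand-in for the schema library's error, which carries a `.message`."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


class ValidationError(Exception):
    pass


def validate_config(config: Dict[str, Any], validator: Callable[[Dict[str, Any]], None]) -> None:
    error: Optional[SchemaError] = None
    try:
        validator(config)
    except SchemaError as e:
        # `e` is unbound once the except clause ends, so keep a reference to it.
        error = e

    if error is not None:
        # Raised outside the except block, so no implicit exception context is attached.
        raise ValidationError(f"Failed to validate JSON schema for config. Error: {error.message}")


def require_input_features(config: Dict[str, Any]) -> None:
    if not config.get("input_features"):
        raise SchemaError("'input_features' should be non-empty")


try:
    validate_config({"input_features": []}, require_input_features)
except ValidationError as exc:
    print(exc)
    print(exc.__context__)  # None
```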


ksbrar · 2023-02-03T06:23:02Z

ludwig/schema/features/binary_feature.py

+class BinaryInputFeatureConfig(BaseInputFeatureConfig, BinaryInputFeatureConfigMixin):
+    """BinaryInputFeatureConfig is a dataclass that configures the parameters used for a binary input feature."""
+
+    encoder: BaseEncoderConfig = None


I'm assuming that without setting = None here you get a bunch of dataclass parameter order initialization errors?

Right, I think it has to do with the "can't have a default param before a required param" stuff.
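The ordering rule mentioned above can be reproduced with the standard library. In this sketch `BaseInputFeatureConfig` is a simplified stand-in with a single defaulted field, and `dict` replaces `BaseEncoderConfig`; the point is only that a subclass field without a default cannot follow an inherited field that has one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseInputFeatureConfig:  # simplified stand-in with one defaulted field
    name: Optional[str] = None


# Without a default, the subclass field would follow the inherited default.
try:

    @dataclass
    class BrokenConfig(BaseInputFeatureConfig):
        encoder: dict

except TypeError as e:
    print(e)  # e.g. "non-default argument 'encoder' follows default argument"


@dataclass
class BinaryInputFeatureConfig(BaseInputFeatureConfig):
    encoder: Optional[dict] = None  # a default keeps the generated __init__ valid


print(BinaryInputFeatureConfig(name="x"))
```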

ludwig/schema/features/utils.py

ksbrar · 2023-02-03T06:38:20Z

ludwig/schema/hyperopt/scheduler.py

@@ -109,12 +105,14 @@ class CommonSchedulerOptions:
        ),
    )

+    max_t: int = max_t_alias()


max_t is not on every scheduler, e.g: https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#median-stopping-rule-tune-schedulers-medianstoppingrule

The reason I had to move it here is that we run some checks against it for the hyperopt defaults, but I can rework it.
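One possible rework, sketched under assumptions (not necessarily what the PR ended up doing): keep `max_t` out of the shared options, add it through a mixin only on schedulers that support it, and have the defaults logic look it up with `getattr`. All class names, the `time_attr` field, and the default value of 100 are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonSchedulerOptions:
    time_attr: str = "time_total_s"  # illustrative shared option


@dataclass
class MaxTMixin:
    max_t: int = 100  # placeholder default


@dataclass
class AsyncHyperbandSchedulerConfig(MaxTMixin, CommonSchedulerOptions):
    type: str = "async_hyperband"


@dataclass
class MedianStoppingRuleSchedulerConfig(CommonSchedulerOptions):
    type: str = "median_stopping_rule"


def get_max_t(scheduler) -> Optional[int]:
    """Defaults logic only reads max_t when the scheduler defines it."""
    return getattr(scheduler, "max_t", None)


print(get_max_t(AsyncHyperbandSchedulerConfig()))      # 100
print(get_max_t(MedianStoppingRuleSchedulerConfig()))  # None
```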

ludwig/schema/utils.py

ksbrar · 2023-02-03T07:06:49Z

ludwig/schema/features/utils.py

-                    "description": "Type of the input feature",
-                },
-                "column": {"type": "string", "title": "column", "description": "Name of the column."},
+        "type": "object",


Awesome that it works, also verified it locally. I'm a little confused about how this works though... the custom schema here is no longer even an array so how does it end up being one in the final assembly (let alone how does the allOf end up in the right place)?

arnavgarg1 · 2023-02-06T18:03:56Z

ludwig/schema/optimizers.py

-                    f"Invalid params for optimizer: {value}, expect dict with at least a valid `type` attribute."
+                    f"Invalid optimizer type: '{opt_type}', expected one of: {list(optimizer_registry.keys())}."


Much more useful error -> resolution messages, thanks for adding these in
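A small sketch of the kind of lookup that produces the new message. The registry contents and `get_optimizer_config_cls` are hypothetical, and `ValueError` stands in for whatever exception type the schema code actually raises; only the f-string is taken from the diff.

```python
from typing import Any, Dict

# Hypothetical registry: maps optimizer type names to config classes.
optimizer_registry: Dict[str, Any] = {"sgd": object, "adam": object, "adamw": object}


def get_optimizer_config_cls(params: Dict[str, Any]) -> Any:
    opt_type = params.get("type")
    if opt_type not in optimizer_registry:
        # Name the bad value and list the valid choices, so the fix is obvious.
        raise ValueError(
            f"Invalid optimizer type: '{opt_type}', expected one of: {list(optimizer_registry.keys())}."
        )
    return optimizer_registry[opt_type]


try:
    get_optimizer_config_cls({"type": "adma"})
except ValueError as e:
    print(e)  # Invalid optimizer type: 'adma', expected one of: ['sgd', 'adam', 'adamw'].
```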

ludwig/schema/optimizers.py

ludwig/schema/utils.py

arnavgarg1 · 2023-02-06T18:07:41Z

ludwig/utils/config_utils.py

-    else:
-        return get_default_decoder_type(feature[TYPE])
-
-
 def has_trainable_encoder(config: ModelConfig) -> bool:


nit: would it make sense to also move has_trainable_encoder and has_pretrained_encoder into ludwig/schema/model_types/utils.py?

Yes, we can do it in a follow-up.

arnavgarg1 · 2023-02-06T18:20:34Z

tests/ludwig/utils/test_config_utils.py

-from typing import Any, Dict, Optional
-
-import pytest
-
-from ludwig.utils.config_utils import merge_fixed_preprocessing_params
-
-


Should we just delete this test module? I'm assuming you've commented out these tests because these functions are only used within the ModelConfig object, and they seem to work?

I want to revisit in a follow-up to make it work with the new structure.

arnavgarg1

This cleans up all the schema implementations we've had by a lot, thanks for putting this together 👏

Only minor nits, but looks good otherwise!

tgaddair and others added 22 commits January 7, 2023 16:25

Improved error message

2947c2a

Added model type schemas

9a17830

Generalized TypeSelection

c3e00f6

Use default_factory

be29ccf

Features

06cd691

Fixed from_dict

7c3768a

Multiple input registries

fef3d74

Subselect features

e81e5fb

Custom encoders

73fb8de

Handle defaults

5ebe2c3

Added FeatureCollection

8f08ebe

Swap ModelConfig

4e8246c

Fixed to_dict

37b024c

Fixed preprocessing

456a24d

Cleanup

6a714ec

Fixed update_config_with_metadata

0d2b83f

Fixed validation metric

58a3e68

Fixed defaults

8f0f43a

[pre-commit.ci] auto fixes from pre-commit.com hooks

bde39a1

for more information, see https://pre-commit.ci

Added requires_equal_dimensions

2cb5920

Commented test

1aeb52d

Fixed proc_column

68780dd

tgaddair and others added 7 commits January 10, 2023 10:19

Fixed loss and decoders

bd18d9d

[pre-commit.ci] auto fixes from pre-commit.com hooks

265cd80

for more information, see https://pre-commit.ci

Refactored encoder schema

085c347

Refactored encoder schema

6296750

Fixed merge_with_defaults

676ef2d

Merge branch 'refactor-schema' of https://github.com/ludwig-ai/ludwig …

2c694b9

…into refactor-schema

Refactor build_outputs

fd940a8

tgaddair added 2 commits February 2, 2023 12:52

Fixed hyperopt defaults

5045560

Merge branch 'refactor-schema' of https://github.com/ludwig-ai/ludwig …

748cd4b

…into refactor-schema

tgaddair marked this pull request as ready for review February 2, 2023 22:35

tgaddair requested review from connor-mccorm, justinxzhao, arnavgarg1 and ksbrar February 2, 2023 22:35

arnavgarg1 reviewed Feb 3, 2023

ludwig/utils/config_utils.py (outdated, resolved)

connor-mccorm approved these changes Feb 3, 2023

tgaddair added 3 commits February 2, 2023 21:22

Added tests

8e3d0e5

Addressed comments

9a5c073

Added backend and ludwig_version

e079f3a

ksbrar approved these changes Feb 3, 2023

Fixed backend conversion

c256d65

arnavgarg1 reviewed Feb 6, 2023

ludwig/schema/optimizers.py (outdated, resolved)

arnavgarg1 reviewed Feb 6, 2023

ludwig/schema/utils.py (outdated, resolved)

arnavgarg1 reviewed Feb 6, 2023

arnavgarg1 approved these changes Feb 6, 2023

tgaddair added 5 commits February 7, 2023 12:00

Merge branch 'master' into refactor-schema

d75d032

Fixed ludwig_dataclass

5bb8567

Fixed hyperband config

ee5ecb9

Typos

745a8db

Fixed test

ea0e305

tgaddair merged commit ff3bb05 into master Feb 7, 2023

tgaddair deleted the refactor-schema branch February 7, 2023 22:51

ksbrar mentioned this pull request Feb 8, 2023

Feature: Data Augmentation for Image Input Features #2925

Merged

This was referenced Feb 28, 2023

fix: Hoist uniqueItemProperties to top of feature JSON schema #3159

Closed

fix: [REBASE] Hoist uniqueItemProperties to top of feature JSON schema #3183

Merged
