Project import generated by Copybara. (#107)
GitOrigin-RevId: a23c1817783b50e3eb626411cb222d74c60c578d

Co-authored-by: Snowflake Authors <noreply@snowflake.com>
sfc-gh-anavalos and Snowflake Authors authored Jul 11, 2024
1 parent f0ff796 commit 3cbf8f1
Showing 144 changed files with 5,190 additions and 793 deletions.
34 changes: 31 additions & 3 deletions CHANGELOG.md
@@ -1,6 +1,36 @@
# Release History

## 1.5.3
## 1.5.4

### Bug Fixes

- Model Registry (PrPr): Fix a 401 Unauthorized issue when deploying a model to SPCS.
- Feature Store: Downgrade exceptions to warnings for a few property setters in feature views. You can now set
`desc`, `refresh_freq`, and `warehouse` on draft feature views.
- Modeling: Fix an issue with calling `OrdinalEncoder` with `categories` as a dictionary and a pandas DataFrame.
- Modeling: Fix an issue with calling `OneHotEncoder` with `categories` as a dictionary and a pandas DataFrame.

### New Features

- Registry: Allow overriding `device_map` and `device` when loading huggingface pipeline models.
- Registry: Add `set_alias` method to `ModelVersion` instances to assign an alias to a model version.
- Registry: Add `unset_alias` method to `ModelVersion` instances to remove an alias from a model version.
- Registry: Add `partitioned_inference_api`, allowing users to create partitioned inference functions in registered
models. This enables model inference methods backed by table functions with vectorized process methods.
- Feature Store: Add three more columns, `refresh_freq`, `refresh_mode`, and `scheduling_state`, to the result of
`list_feature_views()`.
- Feature Store: `update_feature_view()` supports updating the description.
- Feature Store: Add new API `refresh_feature_view()`.
- Feature Store: Add new API `get_refresh_history()`.
- Feature Store: Add `generate_training_set()` API for generating table-backed feature snapshots.
- Feature Store: Add `DeprecationWarning` for `generate_dataset(..., output_type="table")`.
- Model Development: OrdinalEncoder supports a list of array-likes for `categories` argument.
- Model Development: OneHotEncoder supports a list of array-likes for `categories` argument.
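
The `OrdinalEncoder`/`OneHotEncoder` entries above accept `categories` as an explicit list of array-likes, one per input column. A minimal pure-Python sketch of what that means for ordinal encoding (illustrative only; it does not use snowflake-ml-python, and the function name and data are made up):

```python
# Illustrative sketch of "categories as a list of array-likes": each inner
# list fixes the category order for one column, and the ordinal code of a
# value is its position within that column's list.
def ordinal_encode(rows, categories):
    """Encode each column of `rows` using the explicit per-column category order."""
    lookups = [{cat: idx for idx, cat in enumerate(col)} for col in categories]
    return [[lookups[j][val] for j, val in enumerate(row)] for row in rows]

rows = [["low", "red"], ["high", "blue"], ["medium", "red"]]
categories = [["low", "medium", "high"], ["blue", "red"]]
print(ordinal_encode(rows, categories))  # [[0, 1], [2, 0], [1, 1]]
```

Passing the category order explicitly (rather than inferring it from the data) is what makes the encoding stable across datasets.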

## 1.5.3 (06-17-2024)

### Bug Fixes

@@ -9,8 +39,6 @@
- Registry: Fix an issue that leads to an incorrect result when using a pandas DataFrame with over 100,000 rows as
the input of the `ModelVersion.run` method in a stored procedure.

### Behavior Changes

### New Features

- Registry: Add support for TIMESTAMP_NTZ model signature data type, allowing timestamp input and output.
3 changes: 2 additions & 1 deletion bazel/environments/conda-env-snowflake.yml
@@ -51,7 +51,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -63,4 +63,5 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
3 changes: 2 additions & 1 deletion bazel/environments/conda-env.yml
@@ -56,7 +56,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -68,6 +68,7 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
- pip
- pip:
3 changes: 2 additions & 1 deletion bazel/environments/conda-gpu-env.yml
@@ -58,7 +58,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -70,6 +70,7 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
- pip
- pip:
2 changes: 2 additions & 0 deletions bazel/requirements/templates/meta.tpl.yaml
@@ -14,6 +14,8 @@ requirements:
- bazel >=6.0.0
run:
- python>=3.8,<3.12
run_constrained:
- openjpeg !=2.4.0=*_1 # [win]

about:
home: https://github.com/snowflakedb/snowflake-ml-python
7 changes: 4 additions & 3 deletions ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
version: 1.5.3
version: 1.5.4
requirements:
build:
- python
@@ -42,7 +42,7 @@ requirements:
- scikit-learn>=1.2.1,<1.4
- scipy>=1.9,<2
- snowflake-connector-python>=3.5.0,<4
- snowflake-snowpark-python>=1.15.0,<2
- snowflake-snowpark-python>=1.17.0,<2
- sqlparse>=0.4,<1
- typing-extensions>=4.1.0,<5
- xgboost>=1.7.3,<2
@@ -51,13 +51,14 @@ requirements:
- catboost>=1.2.0, <2
- lightgbm>=3.3.5,<5
- mlflow>=2.1.0,<2.4
- pytorch>=2.0.1,<3
- pytorch>=2.0.1,<2.3.0
- sentence-transformers>=2.2.2,<3
- sentencepiece>=0.1.95,<1
- shap==0.42.1
- tensorflow>=2.10,<3
- tokenizers>=0.10,<1
- torchdata>=0.4,<1
- transformers>=4.32.1,<5
- openjpeg !=2.4.0=*_1 # [win]
source:
path: ../../
2 changes: 2 additions & 0 deletions ci/targets/quarantine/prod3.txt
@@ -1,3 +1,5 @@
//tests/integ/snowflake/ml/model:deployment_to_snowservice_integ_test
//tests/integ/snowflake/ml/registry:model_registry_snowservice_integ_test
//tests/integ/snowflake/ml/model:spcs_llm_model_integ_test
//tests/integ/snowflake/ml/extra_tests:xgboost_external_memory_training_test
//tests/integ/snowflake/ml/lineage:lineage_integ_test
1 change: 1 addition & 0 deletions codegen/BUILD.bazel
@@ -7,6 +7,7 @@ filegroup(
srcs = [
"init_template.py_template",
"sklearn_wrapper_template.py_template",
"snowpark_pandas_autogen_test_template.py_template",
"transformer_autogen_test_template.py_template",
],
)
41 changes: 40 additions & 1 deletion codegen/build_file_autogen.py
@@ -5,6 +5,7 @@
python3 snowflake/ml/experimental/amauser/transformer/build_file_autogen.py
"""

import os
from dataclasses import dataclass, field
from typing import List
@@ -13,6 +14,7 @@
from absl import app

from codegen import sklearn_wrapper_autogen as swa
from snowflake.ml.snowpark_pandas import imports


@dataclass(frozen=True)
@@ -23,7 +25,10 @@ class ModuleInfo:


MODULES = [
ModuleInfo("sklearn.linear_model", ["OrthogonalMatchingPursuitCV", "QuantileRegressor"]),
ModuleInfo(
"sklearn.linear_model",
["OrthogonalMatchingPursuitCV", "QuantileRegressor"],
),
ModuleInfo(
"sklearn.ensemble",
[
@@ -170,6 +175,27 @@ def get_test_build_file_content(module: ModuleInfo, module_root_dir: str) -> str
)


def get_snowpark_pandas_test_build_file_content(module: imports.ModuleInfo, module_root_dir: str) -> str:
"""Generates the content of BUILD.bazel file for snowpark_pandas test directory of the given module.
Args:
module: Module information.
module_root_dir: Relative directory path of the module source code.
Returns:
Content of the BUILD.bazel file for the module's test directory.
"""
return (
'load("//codegen:codegen_rules.bzl", "autogen_snowpark_pandas_tests")\n'
f'load("//{module_root_dir}:estimators_info.bzl", "snowpark_pandas_estimator_info_list")\n'
'package(default_visibility = ["//snowflake/ml/snowpark_pandas"])\n'
"\nautogen_snowpark_pandas_tests(\n"
f' module = "{module.module_name}",\n'
f' module_root_dir = "{module_root_dir}",\n'
" snowpark_pandas_estimator_info_list=snowpark_pandas_estimator_info_list\n)"
)
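
The function above renders a BUILD.bazel file as a single string. A self-contained sketch of the same string assembly for a hypothetical module (reimplemented here rather than imported from the codegen package, so it can run standalone):

```python
# Minimal re-creation of the string assembled by
# get_snowpark_pandas_test_build_file_content above; the real function lives
# in codegen/build_file_autogen.py. Module name and path are example inputs.
def build_file_content(module_name: str, module_root_dir: str) -> str:
    return (
        'load("//codegen:codegen_rules.bzl", "autogen_snowpark_pandas_tests")\n'
        f'load("//{module_root_dir}:estimators_info.bzl", "snowpark_pandas_estimator_info_list")\n'
        'package(default_visibility = ["//snowflake/ml/snowpark_pandas"])\n'
        "\nautogen_snowpark_pandas_tests(\n"
        f'    module = "{module_name}",\n'
        f'    module_root_dir = "{module_root_dir}",\n'
        "    snowpark_pandas_estimator_info_list = snowpark_pandas_estimator_info_list,\n)"
    )

content = build_file_content("sklearn.linear_model", "tests/integ/sklearn/linear_model")
print(content)
```

Each generated BUILD file therefore only loads the shared macro and the per-module estimator list; all rule generation happens in the Starlark macro.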


def main(argv: List[str]) -> None:
del argv # Unused.

@@ -200,6 +226,19 @@ def main(argv: List[str]) -> None:
os.makedirs("/".join(test_build_file_path.split("/")[:-1]), exist_ok=True)
open(test_build_file_path, "w").write(test_build_file_content)

for module in imports.MODULES:
if len(module.exclude_list) > 0 and len(module.include_list) > 0:
raise ValueError(f"Both include_list and exclude_list can't be specified for module {module.module_name}!")

module_root_dir = swa.AutogenTool.module_root_dir(module.module_name)
test_build_file_path = os.path.join(TEST_OUTPUT_PATH, module_root_dir, "BUILD.bazel")

# Snowpandas test build file:
# Contains genrules and py_test rules for all the snowpark_pandas estimators.
test_build_file_content = get_snowpark_pandas_test_build_file_content(module, module_root_dir)
os.makedirs("/".join(test_build_file_path.split("/")[:-1]), exist_ok=True)
open(test_build_file_path, "w").write(test_build_file_content)
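
The loop above creates the parent directory and writes the generated file. A standalone sketch of that pattern, using `os.path.dirname` and a context manager (so the file handle is closed deterministically), writing into a temporary directory instead of the real `tests/integ` tree:

```python
import os
import tempfile

# Create the parent directories for a nested output path, write the file,
# then read it back to confirm the round trip. The path and content here are
# placeholders, not the real generated BUILD.bazel.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "tests", "integ", "BUILD.bazel")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("# generated file\n")
    with open(path) as f:
        content = f.read()
print(content)
```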


def get_estimators_info_file_content(module: ModuleInfo) -> str:
"""Returns information of all the transformer and estimator classes in the given module.
44 changes: 43 additions & 1 deletion codegen/codegen_rules.bzl
@@ -13,6 +13,9 @@ ESTIMATOR_TEMPLATE_BAZEL_PATH = "//codegen:sklearn_wrapper_template.py_template"
ESTIMATOR_TEST_TEMPLATE_BAZEL_PATH = (
"//codegen:transformer_autogen_test_template.py_template"
)
SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH = (
"//codegen:snowpark_pandas_autogen_test_template.py_template"
)
INIT_TEMPLATE_BAZEL_PATH = "//codegen:init_template.py_template"
SRC_OUTPUT_PATH = ""
TEST_OUTPUT_PATH = "tests/integ"
@@ -113,7 +116,7 @@ def autogen_tests_for_estimators(module, module_root_dir, estimator_info_list):
List of generated build rules for every class in the estimator_info_list
1. `genrule` with label `generate_test_<estimator-class-name-snakecase>` to auto-generate
integration test for the estimator's wrapper class.
2. `py_test` rule with label `test_<estimator-class-name-snakecase>` to build the auto-generated
2. `py_test` rule with label `<estimator-class-name-snakecase>_test` to build the auto-generated
test files from the `generate_test_<estimator-class-name-snakecase>` rule.
"""
cmd = get_genrule_cmd(
@@ -145,3 +148,42 @@ def autogen_tests_for_estimators(module, module_root_dir, estimator_info_list):
shard_count = 5,
tags = ["autogen"],
)

def autogen_snowpark_pandas_tests(module, module_root_dir, snowpark_pandas_estimator_info_list):
"""Generates `genrules` and `py_test` rules for every snowpark pandas estimator
List of generated build rules for every class in the snowpark_pandas_estimator_info_list
1. `genrule` with label `generate_test_snowpark_pandas_<estimator-class-name-snakecase>` to auto-generate
integration test for the estimator.
2. `py_test` rule with label `<estimator-class-name-snakecase>_snowpark_pandas_test` to build the auto-generated
test files from the `generate_test_snowpark_pandas_<estimator-class-name-snakecase>` rule.
"""
cmd = get_genrule_cmd(
gen_mode = "SNOWPARK_PANDAS_TEST",
template_path = SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH,
module = module,
output_path = TEST_OUTPUT_PATH,
)

for e in snowpark_pandas_estimator_info_list:
py_genrule(
name = "generate_test_snowpark_pandas_{}".format(e.normalized_class_name),
outs = ["{}_snowpark_pandas_test.py".format(e.normalized_class_name)],
tools = [AUTO_GEN_TOOL_BAZEL_PATH],
srcs = [SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH],
cmd = cmd.format(e.class_name),
tags = ["autogen_build"],
)

py_test(
name = "{}_snowpark_pandas_test".format(e.normalized_class_name),
srcs = [":generate_test_snowpark_pandas_{}".format(e.normalized_class_name)],
deps = [
"//snowflake/ml/snowpark_pandas:snowpark_pandas_lib",
"//snowflake/ml/utils:connection_params",
],
compatible_with_snowpark = False,
timeout = "long",
legacy_create_init = 0,
shard_count = 5,
tags = ["snowpark_pandas_autogen"],
)
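
The Starlark loop above derives one `genrule` name and one `py_test` name from each estimator's normalized (snake_case) class name. A small Python sketch of the naming scheme (the estimator names below are hypothetical examples):

```python
# For each normalized estimator class name, the macro emits a genrule named
# generate_test_snowpark_pandas_<name> and a py_test named
# <name>_snowpark_pandas_test.
estimators = ["linear_regression", "logistic_regression"]
genrules = ["generate_test_snowpark_pandas_{}".format(e) for e in estimators]
py_tests = ["{}_snowpark_pandas_test".format(e) for e in estimators]
print(genrules[0])  # generate_test_snowpark_pandas_linear_regression
print(py_tests[1])  # logistic_regression_snowpark_pandas_test
```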
5 changes: 3 additions & 2 deletions codegen/estimator_autogen_tool.py
@@ -39,9 +39,10 @@
flags.DEFINE_string(
"gen_mode",
None,
"Options: ['SRC', 'TEST']."
"Options: ['SRC', 'TEST', 'SNOWPARK_PANDAS_TEST']."
+ "SRC mode generates source code for snowflake wrapper for all the estimator objects in the given modules.\n"
+ "TEST mode generates integration tests for all the auto generated python wrappers in the given module.\n",
+ "TEST mode generates integration tests for all the auto generated python wrappers in the given module.\n"
+ "SNOWPARK_PANDAS_TEST mode generates snowpark pandas integration tests in the given module.\n",
)
flags.DEFINE_string(
"bazel_out_dir", None, "Takes bazel out directory as input to compute relative path to bazel-bin folder"
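
The source defines `gen_mode` with absl flags; as an illustration only, an argparse analog of the same three-valued option:

```python
import argparse

# argparse stand-in for the absl flags.DEFINE_string("gen_mode", ...) above;
# `choices` enforces the three accepted modes.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--gen_mode",
    choices=["SRC", "TEST", "SNOWPARK_PANDAS_TEST"],
    required=True,
    help="SRC generates wrapper source code; TEST generates integration tests; "
    "SNOWPARK_PANDAS_TEST generates snowpark pandas integration tests.",
)
args = parser.parse_args(["--gen_mode", "SNOWPARK_PANDAS_TEST"])
print(args.gen_mode)  # SNOWPARK_PANDAS_TEST
```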
14 changes: 10 additions & 4 deletions codegen/sklearn_wrapper_autogen.py
@@ -18,14 +18,16 @@
class GenMode(Enum):
SRC = "SRC"
TEST = "TEST"
SNOWPARK_PANDAS_TEST = "SNOWPARK_PANDAS_TEST"


class AutogenTool:
"""Tool to auto-generate estimator wrappers and integration test for estimator wrappers.
Args:
gen_mode: Possible values {GenMode.SRC, GenMode.TEST}. Tool generates source code for estimator
wrappers or integration tests for generated estimator wrappers based on the selected mode.
gen_mode: Possible values {GenMode.SRC, GenMode.TEST, GenMode.SNOWPARK_PANDAS_TEST}. Tool generates source code
for estimator wrappers or integration tests for generated estimator wrappers or snowpark_pandas based on the
selected mode.
template_path: Path to file containing estimator wrapper or test template code.
output_path : Path to the root of the destination folder to write auto-generated code.
class_list: Allow list of estimator classes. If specified, wrappers or tests will be generated for only
@@ -138,7 +140,8 @@ def _generate_src_files(
def _generate_test_files(
self, module_name: str, generators: Iterable[swg.WrapperGeneratorBase], skip_code_gen: bool = False
) -> List[str]:
"""Autogenerate integ tests for snowflake estimator wrappers for the given SKLearn or XGBoost module.
"""Autogenerate integ tests for snowflake estimator wrappers or snowpark_pandas for the given SKLearn or XGBoost
module.
Args:
module_name: Module name to process.
@@ -153,7 +156,10 @@ def _generate_test_files(

generated_files_list = []
for generator in generators:
test_output_file_name = os.path.join(self.output_path, generator.estimator_test_file_name)
if self.gen_mode == GenMode.TEST:
test_output_file_name = os.path.join(self.output_path, generator.estimator_test_file_name)
else:
test_output_file_name = os.path.join(self.output_path, generator.snowpark_pandas_test_file_name)
generated_files_list.append(test_output_file_name)
if skip_code_gen:
continue
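
The `_generate_test_files` change above picks between two candidate output file names based on `GenMode`. A runnable sketch of that dispatch (the argument names stand in for the real `WrapperGeneratorBase` attributes):

```python
from enum import Enum


# Same three-valued enum as in sklearn_wrapper_autogen.py above.
class GenMode(Enum):
    SRC = "SRC"
    TEST = "TEST"
    SNOWPARK_PANDAS_TEST = "SNOWPARK_PANDAS_TEST"


def pick_test_file(gen_mode, estimator_test_file_name, snowpark_pandas_test_file_name):
    """TEST mode uses the estimator test name; any other test mode falls
    through to the snowpark_pandas test name, mirroring the if/else above."""
    if gen_mode == GenMode.TEST:
        return estimator_test_file_name
    return snowpark_pandas_test_file_name


print(pick_test_file(GenMode.TEST, "ridge_test.py", "ridge_snowpark_pandas_test.py"))
# ridge_test.py
print(pick_test_file(GenMode.SNOWPARK_PANDAS_TEST, "ridge_test.py", "ridge_snowpark_pandas_test.py"))
# ridge_snowpark_pandas_test.py
```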
