Project import generated by Copybara. (#107)
GitOrigin-RevId: a23c1817783b50e3eb626411cb222d74c60c578d

Co-authored-by: Snowflake Authors <noreply@snowflake.com>
sfc-gh-anavalos and Snowflake Authors authored Jul 11, 2024
1 parent f0ff796 commit 3cbf8f1
Showing 144 changed files with 5,190 additions and 793 deletions.
34 changes: 31 additions & 3 deletions CHANGELOG.md
@@ -1,6 +1,36 @@
# Release History

## 1.5.3
## 1.5.4

### Bug Fixes

- Model Registry (PrPr): Fix a 401 Unauthorized issue when deploying a model to SPCS.
- Feature Store: Downgrade exceptions to warnings for a few property setters in feature views. You can now set
`desc`, `refresh_freq`, and `warehouse` on draft feature views.
- Modeling: Fix an issue with calling `OrdinalEncoder` with `categories` as a dictionary and a pandas DataFrame.
- Modeling: Fix an issue with calling `OneHotEncoder` with `categories` as a dictionary and a pandas DataFrame.

### New Features

- Registry: Allow overriding `device_map` and `device` when loading huggingface pipeline models.
- Registry: Add `set_alias` method to `ModelVersion` instances to assign an alias to a model version.
- Registry: Add `unset_alias` method to `ModelVersion` instances to remove an alias from a model version.
- Registry: Add `partitioned_inference_api`, allowing users to create partitioned inference functions in registered
models. This enables model inference methods backed by table functions with vectorized process methods.
- Feature Store: Add three more columns, `refresh_freq`, `refresh_mode`, and `scheduling_state`, to the result of
`list_feature_views()`.
- Feature Store: `update_feature_view()` supports updating the description.
- Feature Store: Add new API `refresh_feature_view()`.
- Feature Store: Add new API `get_refresh_history()`.
- Feature Store: Add `generate_training_set()` API for generating table-backed feature snapshots.
- Feature Store: Add `DeprecationWarning` for `generate_dataset(..., output_type="table")`.
- Model Development: OrdinalEncoder supports a list of array-likes for `categories` argument.
- Model Development: OneHotEncoder supports a list of array-likes for `categories` argument.
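
The `OrdinalEncoder`/`OneHotEncoder` entries above accept `categories` as an explicit list of array-likes, one per input column. A minimal pure-Python sketch of what that means for ordinal encoding (illustrative only; it does not use snowflake-ml-python, and the function name and data are made up):

```python
# Illustrative sketch of "categories as a list of array-likes": each inner
# list fixes the category order for one column, and the ordinal code of a
# value is its position within that column's list.
def ordinal_encode(rows, categories):
    """Encode each column of `rows` using the explicit per-column category order."""
    lookups = [{cat: idx for idx, cat in enumerate(col)} for col in categories]
    return [[lookups[j][val] for j, val in enumerate(row)] for row in rows]

rows = [["low", "red"], ["high", "blue"], ["medium", "red"]]
categories = [["low", "medium", "high"], ["blue", "red"]]
print(ordinal_encode(rows, categories))  # [[0, 1], [2, 0], [1, 1]]
```

Passing the category order explicitly (rather than inferring it from the data) is what makes the encoding stable across datasets.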

## 1.5.3 (06-17-2024)

### Bug Fixes

@@ -9,8 +39,6 @@
- Registry: Fix an issue that leads to an incorrect result when using a pandas DataFrame with over 100,000 rows as
the input of the `ModelVersion.run` method in a stored procedure.

### Behavior Changes

### New Features

- Registry: Add support for TIMESTAMP_NTZ model signature data type, allowing timestamp input and output.
3 changes: 2 additions & 1 deletion bazel/environments/conda-env-snowflake.yml
@@ -51,7 +51,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -63,4 +63,5 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
3 changes: 2 additions & 1 deletion bazel/environments/conda-env.yml
@@ -56,7 +56,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -68,6 +68,7 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
- pip
- pip:
3 changes: 2 additions & 1 deletion bazel/environments/conda-gpu-env.yml
@@ -58,7 +58,7 @@ dependencies:
- sentencepiece==0.1.99
- shap==0.42.1
- snowflake-connector-python==3.10.0
- snowflake-snowpark-python==1.15.0
- snowflake-snowpark-python==1.17.0
- sphinx==5.0.2
- sqlparse==0.4.4
- tensorflow==2.12.0
@@ -70,6 +70,7 @@ dependencies:
- types-requests==2.30.0.0
- types-toml==0.10.8.6
- typing-extensions==4.5.0
- werkzeug==2.2.2
- xgboost==1.7.3
- pip
- pip:
2 changes: 2 additions & 0 deletions bazel/requirements/templates/meta.tpl.yaml
@@ -14,6 +14,8 @@ requirements:
- bazel >=6.0.0
run:
- python>=3.8,<3.12
run_constrained:
- openjpeg !=2.4.0=*_1 # [win]

about:
home: https://github.com/snowflakedb/snowflake-ml-python
7 changes: 4 additions & 3 deletions ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
version: 1.5.3
version: 1.5.4
requirements:
build:
- python
@@ -42,7 +42,7 @@ requirements:
- scikit-learn>=1.2.1,<1.4
- scipy>=1.9,<2
- snowflake-connector-python>=3.5.0,<4
- snowflake-snowpark-python>=1.15.0,<2
- snowflake-snowpark-python>=1.17.0,<2
- sqlparse>=0.4,<1
- typing-extensions>=4.1.0,<5
- xgboost>=1.7.3,<2
@@ -51,13 +51,14 @@ requirements:
- catboost>=1.2.0, <2
- lightgbm>=3.3.5,<5
- mlflow>=2.1.0,<2.4
- pytorch>=2.0.1,<3
- pytorch>=2.0.1,<2.3.0
- sentence-transformers>=2.2.2,<3
- sentencepiece>=0.1.95,<1
- shap==0.42.1
- tensorflow>=2.10,<3
- tokenizers>=0.10,<1
- torchdata>=0.4,<1
- transformers>=4.32.1,<5
- openjpeg !=2.4.0=*_1 # [win]
source:
path: ../../
2 changes: 2 additions & 0 deletions ci/targets/quarantine/prod3.txt
@@ -1,3 +1,5 @@
//tests/integ/snowflake/ml/model:deployment_to_snowservice_integ_test
//tests/integ/snowflake/ml/registry:model_registry_snowservice_integ_test
//tests/integ/snowflake/ml/model:spcs_llm_model_integ_test
//tests/integ/snowflake/ml/extra_tests:xgboost_external_memory_training_test
//tests/integ/snowflake/ml/lineage:lineage_integ_test
1 change: 1 addition & 0 deletions codegen/BUILD.bazel
@@ -7,6 +7,7 @@ filegroup(
srcs = [
"init_template.py_template",
"sklearn_wrapper_template.py_template",
"snowpark_pandas_autogen_test_template.py_template",
"transformer_autogen_test_template.py_template",
],
)
41 changes: 40 additions & 1 deletion codegen/build_file_autogen.py
@@ -5,6 +5,7 @@
python3 snowflake/ml/experimental/amauser/transformer/build_file_autogen.py
"""

import os
from dataclasses import dataclass, field
from typing import List
@@ -13,6 +14,7 @@
from absl import app

from codegen import sklearn_wrapper_autogen as swa
from snowflake.ml.snowpark_pandas import imports


@dataclass(frozen=True)
@@ -23,7 +25,10 @@ class ModuleInfo:


MODULES = [
ModuleInfo("sklearn.linear_model", ["OrthogonalMatchingPursuitCV", "QuantileRegressor"]),
ModuleInfo(
"sklearn.linear_model",
["OrthogonalMatchingPursuitCV", "QuantileRegressor"],
),
ModuleInfo(
"sklearn.ensemble",
[
@@ -170,6 +175,27 @@ def get_test_build_file_content(module: ModuleInfo, module_root_dir: str) -> str
)


def get_snowpark_pandas_test_build_file_content(module: imports.ModuleInfo, module_root_dir: str) -> str:
"""Generates the content of BUILD.bazel file for snowpark_pandas test directory of the given module.
Args:
module: Module information.
module_root_dir: Relative directory path of the module source code.
Returns:
Content of the BUILD.bazel file for the module's test directory.
"""
return (
'load("//codegen:codegen_rules.bzl", "autogen_snowpark_pandas_tests")\n'
f'load("//{module_root_dir}:estimators_info.bzl", "snowpark_pandas_estimator_info_list")\n'
'package(default_visibility = ["//snowflake/ml/snowpark_pandas"])\n'
"\nautogen_snowpark_pandas_tests(\n"
f' module = "{module.module_name}",\n'
f' module_root_dir = "{module_root_dir}",\n'
" snowpark_pandas_estimator_info_list=snowpark_pandas_estimator_info_list\n)"
)
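
The function above renders a BUILD.bazel file as a single string. A self-contained sketch of the same string assembly for a hypothetical module (reimplemented here rather than imported from the codegen package, so it can run standalone):

```python
# Minimal re-creation of the string assembled by
# get_snowpark_pandas_test_build_file_content above; the real function lives
# in codegen/build_file_autogen.py. Module name and path are example inputs.
def build_file_content(module_name: str, module_root_dir: str) -> str:
    return (
        'load("//codegen:codegen_rules.bzl", "autogen_snowpark_pandas_tests")\n'
        f'load("//{module_root_dir}:estimators_info.bzl", "snowpark_pandas_estimator_info_list")\n'
        'package(default_visibility = ["//snowflake/ml/snowpark_pandas"])\n'
        "\nautogen_snowpark_pandas_tests(\n"
        f'    module = "{module_name}",\n'
        f'    module_root_dir = "{module_root_dir}",\n'
        "    snowpark_pandas_estimator_info_list = snowpark_pandas_estimator_info_list,\n)"
    )

content = build_file_content("sklearn.linear_model", "tests/integ/sklearn/linear_model")
print(content)
```

Each generated BUILD file therefore only loads the shared macro and the per-module estimator list; all rule generation happens in the Starlark macro.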


def main(argv: List[str]) -> None:
del argv # Unused.

@@ -200,6 +226,19 @@ def main(argv: List[str]) -> None:
os.makedirs("/".join(test_build_file_path.split("/")[:-1]), exist_ok=True)
open(test_build_file_path, "w").write(test_build_file_content)

for module in imports.MODULES:
if len(module.exclude_list) > 0 and len(module.include_list) > 0:
raise ValueError(f"Both include_list and exclude_list can't be specified for module {module.module_name}!")

module_root_dir = swa.AutogenTool.module_root_dir(module.module_name)
test_build_file_path = os.path.join(TEST_OUTPUT_PATH, module_root_dir, "BUILD.bazel")

# Snowpandas test build file:
# Contains genrules and py_test rules for all the snowpark_pandas estimators.
test_build_file_content = get_snowpark_pandas_test_build_file_content(module, module_root_dir)
os.makedirs("/".join(test_build_file_path.split("/")[:-1]), exist_ok=True)
open(test_build_file_path, "w").write(test_build_file_content)
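
The loop above creates the parent directory and writes the generated file. A standalone sketch of that pattern, using `os.path.dirname` and a context manager (so the file handle is closed deterministically), writing into a temporary directory instead of the real `tests/integ` tree:

```python
import os
import tempfile

# Create the parent directories for a nested output path, write the file,
# then read it back to confirm the round trip. The path and content here are
# placeholders, not the real generated BUILD.bazel.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "tests", "integ", "BUILD.bazel")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("# generated file\n")
    with open(path) as f:
        content = f.read()
print(content)
```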


def get_estimators_info_file_content(module: ModuleInfo) -> str:
"""Returns information of all the transformer and estimator classes in the given module.
44 changes: 43 additions & 1 deletion codegen/codegen_rules.bzl
@@ -13,6 +13,9 @@ ESTIMATOR_TEMPLATE_BAZEL_PATH = "//codegen:sklearn_wrapper_template.py_template"
ESTIMATOR_TEST_TEMPLATE_BAZEL_PATH = (
"//codegen:transformer_autogen_test_template.py_template"
)
SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH = (
"//codegen:snowpark_pandas_autogen_test_template.py_template"
)
INIT_TEMPLATE_BAZEL_PATH = "//codegen:init_template.py_template"
SRC_OUTPUT_PATH = ""
TEST_OUTPUT_PATH = "tests/integ"
@@ -113,7 +116,7 @@ def autogen_tests_for_estimators(module, module_root_dir, estimator_info_list):
List of generated build rules for every class in the estimator_info_list
1. `genrule` with label `generate_test_<estimator-class-name-snakecase>` to auto-generate
integration test for the estimator's wrapper class.
2. `py_test` rule with label `test_<estimator-class-name-snakecase>` to build the auto-generated
2. `py_test` rule with label `<estimator-class-name-snakecase>_test` to build the auto-generated
test files from the `generate_test_<estimator-class-name-snakecase>` rule.
"""
cmd = get_genrule_cmd(
@@ -145,3 +148,42 @@ def autogen_tests_for_estimators(module, module_root_dir, estimator_info_list):
shard_count = 5,
tags = ["autogen"],
)

def autogen_snowpark_pandas_tests(module, module_root_dir, snowpark_pandas_estimator_info_list):
"""Generates `genrules` and `py_test` rules for every snowpark pandas estimator
List of generated build rules for every class in the snowpark_pandas_estimator_info_list
1. `genrule` with label `generate_test_snowpark_pandas_<estimator-class-name-snakecase>` to auto-generate
integration test for the estimator.
2. `py_test` rule with label `<estimator-class-name-snakecase>_snowpark_pandas_test` to build the auto-generated
test files from the `generate_test_snowpark_pandas_<estimator-class-name-snakecase>` rule.
"""
cmd = get_genrule_cmd(
gen_mode = "SNOWPARK_PANDAS_TEST",
template_path = SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH,
module = module,
output_path = TEST_OUTPUT_PATH,
)

for e in snowpark_pandas_estimator_info_list:
py_genrule(
name = "generate_test_snowpark_pandas_{}".format(e.normalized_class_name),
outs = ["{}_snowpark_pandas_test.py".format(e.normalized_class_name)],
tools = [AUTO_GEN_TOOL_BAZEL_PATH],
srcs = [SNOWPARK_PANDAS_TEST_TEMPLATE_BAZEL_PATH],
cmd = cmd.format(e.class_name),
tags = ["autogen_build"],
)

py_test(
name = "{}_snowpark_pandas_test".format(e.normalized_class_name),
srcs = [":generate_test_snowpark_pandas_{}".format(e.normalized_class_name)],
deps = [
"//snowflake/ml/snowpark_pandas:snowpark_pandas_lib",
"//snowflake/ml/utils:connection_params",
],
compatible_with_snowpark = False,
timeout = "long",
legacy_create_init = 0,
shard_count = 5,
tags = ["snowpark_pandas_autogen"],
)
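
The Starlark loop above derives one `genrule` name and one `py_test` name from each estimator's normalized (snake_case) class name. A small Python sketch of the naming scheme (the estimator names below are hypothetical examples):

```python
# For each normalized estimator class name, the macro emits a genrule named
# generate_test_snowpark_pandas_<name> and a py_test named
# <name>_snowpark_pandas_test.
estimators = ["linear_regression", "logistic_regression"]
genrules = ["generate_test_snowpark_pandas_{}".format(e) for e in estimators]
py_tests = ["{}_snowpark_pandas_test".format(e) for e in estimators]
print(genrules[0])  # generate_test_snowpark_pandas_linear_regression
print(py_tests[1])  # logistic_regression_snowpark_pandas_test
```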
5 changes: 3 additions & 2 deletions codegen/estimator_autogen_tool.py
@@ -39,9 +39,10 @@
flags.DEFINE_string(
"gen_mode",
None,
"Options: ['SRC', 'TEST']."
"Options: ['SRC', 'TEST', 'SNOWPARK_PANDAS_TEST']."
+ "SRC mode generates source code for snowflake wrapper for all the estimator objects in the given modules.\n"
+ "TEST mode generates integration tests for all the auto generated python wrappers in the given module.\n",
+ "TEST mode generates integration tests for all the auto generated python wrappers in the given module.\n"
+ "SNOWPARK_PANDAS_TEST mode generates snowpark pandas integration tests in the given module.\n",
)
flags.DEFINE_string(
"bazel_out_dir", None, "Takes bazel out directory as input to compute relative path to bazel-bin folder"
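
The source defines `gen_mode` with absl flags; as an illustration only, an argparse analog of the same three-valued option:

```python
import argparse

# argparse stand-in for the absl flags.DEFINE_string("gen_mode", ...) above;
# `choices` enforces the three accepted modes.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--gen_mode",
    choices=["SRC", "TEST", "SNOWPARK_PANDAS_TEST"],
    required=True,
    help="SRC generates wrapper source code; TEST generates integration tests; "
    "SNOWPARK_PANDAS_TEST generates snowpark pandas integration tests.",
)
args = parser.parse_args(["--gen_mode", "SNOWPARK_PANDAS_TEST"])
print(args.gen_mode)  # SNOWPARK_PANDAS_TEST
```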
14 changes: 10 additions & 4 deletions codegen/sklearn_wrapper_autogen.py
@@ -18,14 +18,16 @@
class GenMode(Enum):
SRC = "SRC"
TEST = "TEST"
SNOWPARK_PANDAS_TEST = "SNOWPARK_PANDAS_TEST"


class AutogenTool:
"""Tool to auto-generate estimator wrappers and integration test for estimator wrappers.
Args:
gen_mode: Possible values {GenMode.SRC, GenMode.TEST}. Tool generates source code for estimator
wrappers or integration tests for generated estimator wrappers based on the selected mode.
gen_mode: Possible values {GenMode.SRC, GenMode.TEST, GenMode.SNOWPARK_PANDAS_TEST}. Tool generates source code
for estimator wrappers or integration tests for generated estimator wrappers or snowpark_pandas based on the
selected mode.
template_path: Path to file containing estimator wrapper or test template code.
output_path : Path to the root of the destination folder to write auto-generated code.
class_list: Allow list of estimator classes. If specified, wrappers or tests will be generated for only
@@ -138,7 +140,8 @@ def _generate_src_files(
def _generate_test_files(
self, module_name: str, generators: Iterable[swg.WrapperGeneratorBase], skip_code_gen: bool = False
) -> List[str]:
"""Autogenerate integ tests for snowflake estimator wrappers for the given SKLearn or XGBoost module.
"""Autogenerate integ tests for snowflake estimator wrappers or snowpark_pandas for the given SKLearn or XGBoost
module.
Args:
module_name: Module name to process.
@@ -153,7 +156,10 @@ def _generate_test_files(

generated_files_list = []
for generator in generators:
test_output_file_name = os.path.join(self.output_path, generator.estimator_test_file_name)
if self.gen_mode == GenMode.TEST:
test_output_file_name = os.path.join(self.output_path, generator.estimator_test_file_name)
else:
test_output_file_name = os.path.join(self.output_path, generator.snowpark_pandas_test_file_name)
generated_files_list.append(test_output_file_name)
if skip_code_gen:
continue
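
The `_generate_test_files` change above picks between two candidate output file names based on `GenMode`. A runnable sketch of that dispatch (the argument names stand in for the real `WrapperGeneratorBase` attributes):

```python
from enum import Enum


# Same three-valued enum as in sklearn_wrapper_autogen.py above.
class GenMode(Enum):
    SRC = "SRC"
    TEST = "TEST"
    SNOWPARK_PANDAS_TEST = "SNOWPARK_PANDAS_TEST"


def pick_test_file(gen_mode, estimator_test_file_name, snowpark_pandas_test_file_name):
    """TEST mode uses the estimator test name; any other test mode falls
    through to the snowpark_pandas test name, mirroring the if/else above."""
    if gen_mode == GenMode.TEST:
        return estimator_test_file_name
    return snowpark_pandas_test_file_name


print(pick_test_file(GenMode.TEST, "ridge_test.py", "ridge_snowpark_pandas_test.py"))
# ridge_test.py
print(pick_test_file(GenMode.SNOWPARK_PANDAS_TEST, "ridge_test.py", "ridge_snowpark_pandas_test.py"))
# ridge_snowpark_pandas_test.py
```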
