Contributing Guide

Build system

We use bazel as the build system.

Installation

Bazel

Install bazel, if not already done:

# This installs bazelisk in ~/go/bin/bazelisk
go install github.com/bazelbuild/bazelisk@latest

Add a shortcut to your ~/.bashrc (or equivalent):

if [ -f ~/go/bin/bazelisk ]; then
  alias bazel=~/go/bin/bazelisk
fi

Buildifier

This tool auto-formats BUILD.bazel files. Installation is similar:

go install github.com/bazelbuild/buildtools/buildifier@latest

Add a shortcut to your ~/.bashrc (or equivalent):

if [ -f ~/go/bin/buildifier ]; then
  alias buildifier=~/go/bin/buildifier
fi

Note: You may need to configure your editor to run this on save.

Build

To build the package, run:

> bazel build //:wheel

bazel can be run from anywhere in the monorepo and accepts either an absolute or a relative path. For example,

snowml> bazel build :wheel

You can build an entire sub-tree as follows:

> bazel build //snowflake/...

Notes when you add a new target in a BUILD.bazel file

  • Instead of using the py_binary, py_library, and py_test rules from bazel, use those from bazel/py_rules.bzl. For example, instead of

    py_library(
        name="my_lib",
        srcs=["my_lib.py"],
      )

    use the following instead

    load("//bazel:py_rules.bzl", "py_library")
    
    py_library(
        name="my_lib",
        srcs=["my_lib.py"],
      )
  • When using a genrule rule whose tool is a py_binary, use py_genrule from bazel/py_rules.bzl instead. For example, instead of

    py_binary(
        name="my_tool",
        srcs=["my_tool.py"],
      )
    
    genrule(
        name="generate_something",
        cmd="$(location :my_tool)",
        tools=[":my_tool"]
    )

    use the following instead

    load("//bazel:py_rules.bzl", "py_binary", "py_genrule")
    
    py_binary(
        name="my_tool",
        srcs=["my_tool.py"],
      )
    
    py_genrule(
        name="generate_something",
        cmd="$(location :my_tool)",
        tools=[":my_tool"]
    )
  • If the visibility of the target is not //visibility:public, make sure your target is visible to //bazel:snowml_public_common so that CI type checking works.

Type-check

mypy

We use mypy to type-check our Python source files. mypy is integrated into our bazel environment.

The version of MyPy is specified in conda-env.yml, just like other conda packages we depend on.

Invoke MyPy locally

bazel build --config=typecheck <your target>

Or you could run

./ci/type_check/type_check.sh -b <path_to_bazel>

You only need to specify -b <path_to_bazel> if your bazel is not in $PATH or is an alias.

Test

Similar to bazel build, bazel test can run any test target. It essentially builds the target, runs it, and reports PASSED or FAILED. You can also build and run separately.

TIP: If a test fails, bazel prints the path to a log file, which is executable. You do not need to open it via less or an editor; you can paste the path directly into the command line.

Integration tests are configured to run against an existing Snowflake account. To run tests locally, make sure that you have configured a SnowSQL config file in <HOME_DIR>/.snowsql/config (see Snowflake documentation for configuration options).

For example, to run all autogenerated tests locally:

# Run all autogenerated tests
bazel test //... --test_tag_filters=autogen

Coverage

An lcov coverage report can be generated by running

bazel coverage --combined_report=lcov <target pattern>

To get a human-readable report:

lcov --list $(bazel info output_path)/_coverage/_coverage_report.dat

To get an HTML report:

genhtml --output <output_dir> "$(bazel info output_path)/_coverage/_coverage_report.dat"

Both lcov and genhtml are part of the lcov project. To install it on macOS:

brew install lcov

The unit test coverage report is generated periodically by a GitHub workflow. You can download the report from the artifacts generated by the workflow runs.

Run

Another useful command is bazel run. It builds the target and then runs it directly, which is useful for debugging binaries.

Other commands

bazel is pretty powerful and has lots of other commands. Read more in the Bazel documentation.

Python dependencies

To introduce a third-party Python dependency, first check if it is available as a package in the Snowflake conda channel. Then modify requirements.yml, and run the following to re-generate all requirements files, including conda-env.yml:

bazel run --config=pre_build //bazel/requirements:sync_requirements

Then, your code can use the package as if it were "installed" in the Python environment.

Adding a new dependency

Please provide the following fields when adding a new record:

Package Name Fields

name: The name of the package. Set this if the package is available with the same name and is required in both PyPI and conda.

name_pypi: The name of the package in PyPI. Set this only to indicate that it is available in PyPI only. You can also set this along with name_conda if the package has different names in PyPI and conda.

name_conda: The name of the package in conda. Set this only to indicate that it is available in conda only. You can also set this along with name_pypi if the package has different names in PyPI and conda.

(At least one of these three fields should be set.)

Development Version Fields

dev_version: The version of the package to be pinned in the dev environment. Set this if the package is available with the same version and is required in both PyPI and conda.

dev_version_pypi: The version from PyPI to be pinned in the dev environment. Set this only to indicate that it is available in PyPI only. You can also set this along with dev_version_conda if the package has different versions in PyPI and conda.

dev_version_conda: The version from conda to be pinned in the dev environment. Set this only to indicate that it is available in conda only. You can also set this along with dev_version_pypi if the package has different versions in PyPI and conda.

(At least one of these three fields should be set.)

require_gpu: Set this to true if the package is only a requirement for the environment with GPUs.

Snowflake Anaconda Channel

from_channel: Set this if the package is not available in the Snowflake Anaconda Channel (https://repo.anaconda.com/pkgs/snowflake).

Version Requirements Fields (for snowflake-ml-python release)

version_requirements: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release. Set this if the package is available with the same name and required in both PyPI and conda.

version_requirements_pypi: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release via PyPI. Set this only to indicate that it is required by the PyPI release only. You can also set this along with version_requirements_conda if the package has different versions in PyPI and conda.

version_requirements_conda: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release via conda. Set this only to indicate that it is required by the conda release only. You can also set this along with version_requirements_pypi if the package has different versions in PyPI and conda.

(At least one of these three fields must be set to indicate that this package is a dependency of the release. If you don't want to constrain the version, set the field to an empty string.)

Extras Tags and Tags

requirements_extra_tags: Set this to indicate that the package is an extras dependency of snowflake-ml-python. This requirement will be added to all extras tags specified here, and an all extras tag will be auto-generated to include all extras requirements. All extras requirements will be labeled as run_constrained in conda's meta.yaml.

tags: Set tags to filter some of the requirements in specific cases. The current valid tags include:

  • deployment_core: Used by model deployment to indicate dependencies required to execute model deployment code on the server-side.
  • build_essential: Used to indicate the packages composing the build environment.

Example:

- name: pandas
  name_pypi: pandas-pypi-name
  dev_version: 1.2.0
  dev_version_pypi: 1.2.0-pypi
  version_requirements: ">=1.0.0"
  version_requirements_pypi: ">=1.0.0"
  from_channel: "conda-forge"
  requirements_extra_tags:
    - pandas
  tags:
    - deployment_core
    - build_essential

Unit Testing

Write Python unittest-style unit tests. pytest is allowed, but not recommended.

unittest

Use absl.testing.absltest as a drop-in replacement for unittest.

For example:

# instead of
# import unittest
from absl.testing import absltest

# instead of
# from unittest import TestCase, main
from absl.testing.absltest import TestCase, main

# Call main.
if __name__ == '__main__':
  absltest.main()

absltest provides better bazel integration, which produces a more detailed XML test report. The report is picked up by a GitHub workflow to provide a nice UI for test results.

pytest

Make each unit test file its own runnable py_test target and use the main() function provided by snowflake.ml.test_utils.pytest_driver.

For example:

from snowflake.ml.utils import pytest_driver

def test_case():
    assert some_feature()

if __name__ == "__main__":
    pytest_driver.main()

pytest_driver contains bazel integration that allows pytest to produce an XML test report.

Important Notes

When you add a new test file, always make sure it contains an if __name__ == "__main__": block; otherwise, the tests will not be executed by bazel. We have a test wrapper that makes the test fail if you forget this part.

Integration test

Test in Stored Procedures

To easily test whether your code works in a stored procedure, you can build on CommonTestBase in tests/integ/snowflake/ml/test_utils/common_test_base.py. An example of such a test can be found in tests/integ/snowflake/ml/_internal/file_utils_integ_test.py.

To write such a test (a sketch follows this list), you need to:

  1. Your test cannot have a parameter called _sproc_test_mode.
  2. Let your test case inherit from common_test_base.CommonTestBase.
  3. Remove all Snowpark Session creation in your test, and use self.session to access the session if needed.
  4. If you write your own setUp and tearDown method, remember to call super().setUp() or super().tearDown().
  5. Decorate your test method with common_test_base.CommonTestBase.sproc_test(). If you want your test to run only in a stored procedure rather than both locally and in a stored procedure, set local=False. If you don't want to test with caller's rights, set test_callers_rights=False. (Owner's rights stored procedures are always tested.)
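
Putting these steps together, a minimal sketch could look like the following. The class name, the test body, and the import path for common_test_base are illustrative assumptions; see file_utils_integ_test.py for a real example.

from absl.testing import absltest

# Assumed module path for the helper described above.
from tests.integ.snowflake.ml.test_utils import common_test_base


class MyFeatureSprocTest(common_test_base.CommonTestBase):
    def setUp(self) -> None:
        # Always chain to the base class so that self.session is set up.
        super().setUp()

    @common_test_base.CommonTestBase.sproc_test()
    def test_my_feature(self) -> None:
        # Use the session provided by the base class instead of creating one.
        result = self.session.sql("SELECT 1").collect()
        self.assertEqual(result[0][0], 1)


if __name__ == "__main__":
    absltest.main()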

Compatibility Test

To easily test whether your code is compatible with a previous version, you can build on CommonTestBase in tests/integ/snowflake/ml/test_utils/common_test_base.py. An example of such a test can be found in tests/integ/snowflake/ml/registry/model_registry_compat_test.py.

To write such a test (a sketch follows this list), you need to:

  1. Your test cannot have a parameter called _snowml_pkg_ver.

  2. Let your test case inherit from common_test_base.CommonTestBase.

  3. Remove all Snowpark Session creation in your test, and use self.session to access the session if needed.

  4. If you write your own setUp and tearDown method, remember to call super().setUp() or super().tearDown().

  5. Write a factory method in your test class that returns a tuple of a function and its arguments (as a tuple). The function will be run as a stored procedure in an environment with a previous version of the library.

    Note: Since the function will be created as a stored procedure, its first argument must be a Snowpark Session. The arguments tuple you provide via the factory method does not need to include the session object.

    Note: To avoid objects from the current environment affecting the result, the function is written out as a Python file and registered as a stored procedure instead of being pickled with cloudpickle. This means you cannot reference any object defined outside the function, and any imports must happen inside the function definition. Keep your prepare function as simple as possible.

  6. Decorate your test method with common_test_base.CommonTestBase.compatibility_test, providing the factory method you created in the step above, an optional version range to test against, and any additional package requirements.
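
Putting these steps together, a rough sketch could look like the following. The module paths, the keyword used to pass the factory method to the decorator, and the test body are assumptions; check common_test_base.py and model_registry_compat_test.py for the actual signatures.

from typing import Any, Callable, Tuple

from absl.testing import absltest
from snowflake import snowpark

# Assumed module path for the helper described above.
from tests.integ.snowflake.ml.test_utils import common_test_base


class MyFeatureCompatTest(common_test_base.CommonTestBase):
    def _prepare_with_old_version(self) -> Tuple[Callable[..., None], Tuple[Any, ...]]:
        # The returned function runs as a stored procedure in an environment with a
        # previous version of the library, so keep it self-contained: any imports it
        # needs must happen inside its own body.
        def prepare(sess: snowpark.Session, table_name: str) -> None:
            sess.sql(f"CREATE OR REPLACE TABLE {table_name} (A INT)").collect()

        # The session is supplied automatically; list only the remaining arguments.
        return prepare, ("MY_COMPAT_TEST_TABLE",)

    @common_test_base.CommonTestBase.compatibility_test(
        prepare_fn_factory=_prepare_with_old_version,  # keyword name is a guess
    )
    def test_my_feature(self) -> None:
        # Validate that the artifact created by the previous version still works.
        rows = self.session.sql("SELECT * FROM MY_COMPAT_TEST_TABLE").collect()
        self.assertIsNotNone(rows)


if __name__ == "__main__":
    absltest.main()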

pre-commit

Pull requests against the main branch are subject to pre-commit checks. Those checks enforce the code style.

You can make sure the checks pass by installing the pre-commit hooks in your local repo (instructions). The hooks are invoked when you commit locally and fix style violations in place. The minimum required pre-commit version is 3.4.0.

Tip: if you want to isolate those fixes, avoid the -a option in git commit. This way, the automated changes will remain unstaged.

Darglint

The darglint pre-commit hook lints docstrings to make sure they conform to the Google style guide for docstrings. Function docstrings must contain an "Args" section with input value descriptions, a "Returns" section describing the output, and a "Raises" section enumerating the exceptions the function can raise. Darglint ensures that all input args are present in the docstring and is sensitive to whitespace (e.g., args should be indented the correct number of spaces). Refer to the list of darglint error codes for guidance.
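
For example, a function docstring in this style (the function itself is just an illustration) looks like:

def divide(numerator: float, denominator: float) -> float:
    """Divide one number by another.

    Args:
        numerator: The value to be divided.
        denominator: The value to divide by.

    Returns:
        The quotient of the two values.

    Raises:
        ZeroDivisionError: If denominator is zero.
    """
    if denominator == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    return numerator / denominator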

Editors

VSCode

Here are a few good plugins to use:

  1. Python
  2. Pylance static checking
  3. Bazel
    • You need to configure buildifier in the settings to auto-format BUILD.bazel files
  4. Black Python Formatter
  5. Flake8 Linter