diff --git a/.github/config.yml b/.github/config.yml deleted file mode 100644 index 14cfdf6e8ae..00000000000 --- a/.github/config.yml +++ /dev/null @@ -1,33 +0,0 @@ -# Configuration for new-issue-welcome - https://github.com/behaviorbot/new-issue-welcome - -# Comment to be posted to on first time issues -newIssueWelcomeComment: > - 👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it. - -# Configuration for new-pr-welcome - https://github.com/behaviorbot/new-pr-welcome - -# Comment to be posted to on PRs from first time contributors in your repository -newPRWelcomeComment: > - 💖 Thanks for opening your first pull request! 💖 - We use [semantic commit messages](https://www.conventionalcommits.org/en/v1.0.0-beta.2/) to streamline the release process. - Before your pull request can be merged, you should **make sure your first commit and PR title** start with a semantic prefix. - This helps us to create release messages and credit you for your hard work! - - Examples of commit messages with semantic prefixes: - - - `fix: Fix LightGBM crashes with empty partitions` - - `feat: Make HTTP on Spark back-offs configurable` - - `docs: Update Spark Serving usage` - - `build: Add codecov support` - - `perf: improve LightGBM memory usage` - - `refactor: make python code generation rely on classes` - - `style: Remove nulls from CNTKModel` - - `test: Add test coverage for CNTKModel` - - Make sure to check out the [developer guide](https://github.com/Microsoft/SynapseML/blob/master/CONTRIBUTING.md) for guidance on testing your change. - -# Configuration for first-pr-merge - https://github.com/behaviorbot/first-pr-merge - -# Comment to be posted to on pull requests merged by a first time user -firstPRMergeComment: > - Congrats on merging your first pull request, we appreciate your support! 🎉🎉🎉 diff --git a/.github/workflows/acknowledge-new-prs.yml b/.github/workflows/acknowledge-new-prs.yml index f4ecbcbe0b6..01a94daddf1 100644 --- a/.github/workflows/acknowledge-new-prs.yml +++ b/.github/workflows/acknowledge-new-prs.yml @@ -14,4 +14,21 @@ jobs: Hey @${{ github.event.pull_request.user.login }} :wave:! Thank you so much for contributing to our repository :raised_hands:. Someone from SynapseML Team will be reviewing this pull request soon. - We appreciate your patience and contributions :100:! + + We use [semantic commit messages](https://www.conventionalcommits.org/en/v1.0.0-beta.2/) to streamline the release process. + Before your pull request can be merged, you should **make sure your first commit and PR title** start with a semantic prefix. + This helps us to create release messages and credit you for your hard work! + + Examples of commit messages with semantic prefixes: + + - `fix: Fix LightGBM crashes with empty partitions` + - `feat: Make HTTP on Spark back-offs configurable` + - `docs: Update Spark Serving usage` + - `build: Add codecov support` + - `perf: improve LightGBM memory usage` + - `refactor: make python code generation rely on classes` + - `style: Remove nulls from CNTKModel` + - `test: Add test coverage for CNTKModel` + + To test your commit locally, please follow our guide on [building from source](https://microsoft.github.io/SynapseML/docs/reference/developer-readme/). + Check out the [developer guide](https://github.com/Microsoft/SynapseML/blob/master/CONTRIBUTING.md) for additional guidance on testing your change. 
diff --git a/.github/workflows/label-add-triage.yml b/.github/workflows/add-triage-label.yml similarity index 90% rename from .github/workflows/label-add-triage.yml rename to .github/workflows/add-triage-label.yml index 05921e6d2b1..011722d66f2 100644 --- a/.github/workflows/label-add-triage.yml +++ b/.github/workflows/add-triage-label.yml @@ -1,4 +1,4 @@ -name: Label issues +name: Add Triage Label to New Issues on: issues: types: diff --git a/.github/workflows/semantic_pr.yaml b/.github/workflows/check-semantic-prs.yaml similarity index 100% rename from .github/workflows/semantic_pr.yaml rename to .github/workflows/check-semantic-prs.yaml diff --git a/.github/workflows/ci-publish-artifacts.yml b/.github/workflows/ci-publish-artifacts.yml deleted file mode 100644 index e1b355150c2..00000000000 --- a/.github/workflows/ci-publish-artifacts.yml +++ /dev/null @@ -1,69 +0,0 @@ -name: CI/Publish/Artifacts - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Publish: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - environment: ci - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'storage-key,nexus-un,nexus-pw,pgp-private,pgp-public,pgp-pw' # comma separated list of secret keys that need to be fetched from the Key Vault - id: GetKeyVaultSecrets - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Publish Artifact - shell: bash -l {0} - run: | - set -e - sbt packagePython - sbt publishBlob publishDocs publishR publishPython - sbt publishSigned - sbt genBuildInfo - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} - - name: Publish Badges - if: success() - shell: bash -l {0} - run: | - set -e - sbt publishBadges - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} diff --git a/.github/workflows/ci-publish-docker.yml b/.github/workflows/ci-publish-docker.yml deleted file mode 100644 index c0d849ce8f7..00000000000 --- a/.github/workflows/ci-publish-docker.yml +++ /dev/null @@ -1,58 +0,0 @@ -name: CI/Publish/Docker - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - PublishDocker: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Azure Container 
Registry Login - shell: bash -l {0} - run: az acr login --name mmlsparkmcr - - name: Get Docker Tag and Version - id: getDockerTagAndVersion - shell: bash -l {0} - run: | - echo '::set-output name=version::'$(sbt "core/version" | tail -2 | cut -d' ' -f2 | sed 's/\x1b\[[0-9;]*m//g' | head -n 1) - echo '::set-output name=gittag::'$(git tag -l --points-at HEAD) - - name: Demo Image Build - shell: bash -l {0} - run: docker build -f tools/docker/demo/Dockerfile --build-arg SYNAPSEML_VERSION=${{ steps.getDockerTagAndVersion.outputs.version }} -t mmlsparkmcr.azurecr.io/public/mmlspark/build-demo:${{ steps.getDockerTagAndVersion.outputs.version }} . - - name: Demo Image Push - shell: bash -l {0} - run: docker push mmlsparkmcr.azurecr.io/public/mmlspark/build-demo:${{ steps.getDockerTagAndVersion.outputs.version }} - - name: Minimal Image Build - shell: bash -l {0} - run: docker build -f tools/docker/minimal/Dockerfile --build-arg SYNAPSEML_VERSION=${{ steps.getDockerTagAndVersion.outputs.version }} -t mmlsparkmcr.azurecr.io/public/mmlspark/build-minimal:${{ steps.getDockerTagAndVersion.outputs.version }} . - - name: Minimal Image Push - shell: bash -l {0} - run: docker push mmlsparkmcr.azurecr.io/public/mmlspark/build-minimal:${{ steps.getDockerTagAndVersion.outputs.version }} - - name: Release Image Build - if: ${{ startsWith(steps.getDockerTagAndVersion.outputs.gittag, 'v') }} - shell: bash -l {0} - run: docker build -f tools/docker/demo/Dockerfile --build-arg SYNAPSEML_VERSION=${{ steps.getDockerTagAndVersion.outputs.version }} -t mmlsparkmcr.azurecr.io/public/mmlspark/release:${{ steps.getDockerTagAndVersion.outputs.version }} -t mmlsparkmcr.azurecr.io/public/mmlspark/release:latest . - - name: Release Image Push - build version - if: ${{ startsWith(steps.getDockerTagAndVersion.outputs.gittag, 'v') }} - shell: bash -l {0} - run: docker push mmlsparkmcr.azurecr.io/public/mmlspark/release:${{ steps.getDockerTagAndVersion.outputs.version }} - - name: Release Image Push - latest - if: ${{ startsWith(steps.getDockerTagAndVersion.outputs.gittag, 'v') }} - shell: bash -l {0} - run: docker push mmlsparkmcr.azurecr.io/public/mmlspark/release:latest diff --git a/.github/workflows/ci-publish-websiteautodeployment.yml b/.github/workflows/ci-publish-websiteautodeployment.yml deleted file mode 100644 index 8249839de49..00000000000 --- a/.github/workflows/ci-publish-websiteautodeployment.yml +++ /dev/null @@ -1,76 +0,0 @@ -name: CI/Publish/Website - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Test: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'gh-name,gh-email,gh-token' # comma separated list of secret keys that need to be fetched from the Key Vault - id: GetKeyVaultSecrets - - name: 'Install Node.js' - uses: actions/setup-node@v3 - with: - node-version: '16' - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Set up JDK 11 - uses: actions/setup-java@v3 - with: - java-version: '11' - distribution: 'temurin' - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: 
synapseml - - name: Convert notebooks to markdowns - shell: bash -l {0} - run: sbt convertNotebooks - - name: yarn install and build - shell: bash -l {0} - run: | - set -e - yarn install - cd website - yarn - yarn build - - name: yarn deploy - if: success() - shell: bash -l {0} - run: | - set -e - git config --global user.name "${GH_NAME}" - git config --global user.email "${GH_EMAIL}" - git checkout -b main - echo "machine github.com login ${GH_NAME} password ${GH_TOKEN}" > ~/.netrc - cd website - GIT_USER="${GH_NAME}" yarn deploy - env: - GH_NAME: ${{ steps.GetKeyVaultSecrets.outputs.gh-name }} - GH_EMAIL: ${{ steps.GetKeyVaultSecrets.outputs.gh-email }} - GH_TOKEN: ${{ steps.GetKeyVaultSecrets.outputs.gh-token }} diff --git a/.github/workflows/ci-release-sonatype.yml b/.github/workflows/ci-release-sonatype.yml deleted file mode 100644 index 76c9c1e1f50..00000000000 --- a/.github/workflows/ci-release-sonatype.yml +++ /dev/null @@ -1,98 +0,0 @@ -name: CI/Release/Sonatype - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Release: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Get Git Tag - id: getGitTag - shell: bash -l {0} - run: | - echo '::set-output name=gittag::'$(git tag -l --points-at HEAD) - - name: Generage CHANGELOG - shell: bash -l {0} - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - run: | - set -e - wget https://github.com/git-chglog/git-chglog/releases/download/0.8.0/git-chglog_linux_amd64 - chmod +x git-chglog_linux_amd64 - ./git-chglog_linux_amd64 -o CHANGELOG.md $TAG - - name: Create GitHub Release - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - id: create_release - uses: actions/create-release@v1 - env: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - with: - tag_name: ${{ steps.getGitTag.outputs.gittag }} - body_path: CHANGELOG.md - draft: true - release_name: SynapseML ${{ steps.getGitTag.outputs.gittag }} - - name: Azure Login - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Get Secrets from KeyVault - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'storage-key,nexus-un,nexus-pw,pgp-private,pgp-public,pgp-pw,pypi-api-token' # comma separated list of secret keys that need to be fetched from the Key Vault - id: GetKeyVaultSecrets - - name: Setup Python - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Setup Miniconda - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: publish python package to pypi - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - shell: bash -l {0} - run: | - set -e - sbt publishPypi - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} - PYPI-API-TOKEN: ${{ 
steps.GetKeyVaultSecrets.outputs.pypi-api-token }} - - name: publish jar package to maven central - if: startsWith(steps.getGitTag.outputs.gittag, 'v') - shell: bash -l {0} - run: | - set -e - sbt publishSigned - sbt sonatypeBundleRelease - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} diff --git a/.github/workflows/ci-scalastyle.yml b/.github/workflows/ci-scalastyle.yml deleted file mode 100644 index 16400c6495b..00000000000 --- a/.github/workflows/ci-scalastyle.yml +++ /dev/null @@ -1,29 +0,0 @@ -name: CI/Scala Style - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - ScalaStyle: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Set up JDK 11 - uses: actions/setup-java@v3 - with: - java-version: '11' - distribution: 'temurin' - - name: Run scalastyle - run: sbt scalastyle test:scalastyle diff --git a/.github/workflows/ci-test-synapsee2e.yml b/.github/workflows/ci-test-synapsee2e.yml deleted file mode 100644 index a7ed194af9e..00000000000 --- a/.github/workflows/ci-test-synapsee2e.yml +++ /dev/null @@ -1,65 +0,0 @@ -name: CI/Tests/Synapse E2E - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - SynapseE2E: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'storage-key,nexus-un,nexus-pw,pgp-private,pgp-public,pgp-pw' # comma separated list of secret keys that need to be fetched from the Key Vault - id: GetKeyVaultSecrets - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Publish Blob Artifacts - shell: bash -l {0} - run: | - set -e - jupyter nbconvert --to script ./notebooks/features/*/*.ipynb* - sbt packagePython - sbt publishBlob - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} - - name: Run E2E Tests - if: success() - shell: bash -l {0} - run: 'sbt "testOnly com.microsoft.azure.synapse.ml.nbtest.SynapseTests"' - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/test-reports/TEST-*.xml' - check_name: "SynapseE2E Test Results" - comment_title: "SynapseE2E Test Results" diff --git a/.github/workflows/ci-tests-databrickse2e.yml b/.github/workflows/ci-tests-databrickse2e.yml deleted file 
mode 100644 index 55c9c129921..00000000000 --- a/.github/workflows/ci-tests-databrickse2e.yml +++ /dev/null @@ -1,66 +0,0 @@ -name: CI/Tests/Databricks E2E - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - DatabricksE2E: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Get Secrets from Key Vault - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'storage-key,nexus-un,nexus-pw,pgp-private,pgp-public,pgp-pw' # comma separated list of secret keys that need to be fetched from the Key Vault - id: GetKeyVaultSecrets - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Publish Blob Artifacts - shell: bash -l {0} - run: | - set -e - sbt packagePython - sbt publishBlob - env: - STORAGE-KEY: ${{ steps.GetKeyVaultSecrets.outputs.storage-key }} - NEXUS-UN: ${{ steps.GetKeyVaultSecrets.outputs.nexus-un }} - NEXUS-PW: ${{ steps.GetKeyVaultSecrets.outputs.nexus-pw }} - PGP-PRIVATE: ${{ steps.GetKeyVaultSecrets.outputs.pgp-private }} - PGP-PUBLIC: ${{ steps.GetKeyVaultSecrets.outputs.pgp-public }} - PGP-PW: ${{ steps.GetKeyVaultSecrets.outputs.pgp-pw }} - - name: Run E2E Tests - if: success() - shell: bash -l {0} - run: 'sbt "testOnly com.microsoft.azure.synapse.ml.nbtest.DatabricksTests"' - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/test-reports/TEST-*.xml' - check_name: "DatabricksE2E Test Results" - comment_title: "DatabricksE2E Test Results" diff --git a/.github/workflows/ci-tests-python.yml b/.github/workflows/ci-tests-python.yml deleted file mode 100644 index 0d6007081f5..00000000000 --- a/.github/workflows/ci-tests-python.yml +++ /dev/null @@ -1,80 +0,0 @@ -name: CI/Tests/Python - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Test: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - strategy: - fail-fast: false - matrix: - project: ["core", "deepLearning", "lightgbm", "opencv", "vw", "cognitive"] - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Set up JDK 11 - uses: actions/setup-java@v3 - with: - java-version: '11' - distribution: 'temurin' - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Install Pip Package - shell: bash -l {0} - run: | - (timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) - sbt installPipPackage - sbt publishM2 - - name: Test Python Code - shell: bash -l {0} - run: (sbt "project ${{ matrix.project }}" coverage testPython) || (sbt "project ${{ matrix.project }}" coverage testPython) || (sbt "project ${{ matrix.project 
}}" coverage testPython) - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/python-test-*.xml' - check_name: "${{ matrix.project }} Unit Test Results" - comment_title: "${{ matrix.project }} Unit Test Results" - - name: Generate Codecov report - if: always() - shell: bash -l {0} - run: sbt coverageReport - - name: Get Codecov Secret from Key Vault - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'codecov-token' - id: GetKeyVaultSecrets - - name: Upload Coverage Report To Codecov.io - if: always() - shell: bash -l {0} - run: | - set -e - curl -s https://codecov.io/bash > .codecov - chmod +x .codecov - echo "Starting Codecov Upload" - ./.codecov -t ${{ steps.GetKeyVaultSecrets.outputs.codecov-token }} diff --git a/.github/workflows/ci-tests-r.yml b/.github/workflows/ci-tests-r.yml deleted file mode 100644 index 16da7b4c9a0..00000000000 --- a/.github/workflows/ci-tests-r.yml +++ /dev/null @@ -1,75 +0,0 @@ -name: CI/Tests/R - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Test: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Set up JDK 11 - uses: actions/setup-java@v3 - with: - java-version: '11' - distribution: 'temurin' - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Download Spark - shell: bash -l {0} - run: curl https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz -o spark-3.2.0-bin-hadoop3.2.tgz - - name: Test R code - shell: bash -l {0} - run: | - (timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) - (sbt coverage testR) || (echo "retrying" && sbt coverage testR) || (echo "retrying" && sbt coverage testR) - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/r-test-*.xml' - check_name: "R Test Results" - comment_title: "R Test Results" - - name: Generate Codecov report - if: always() - shell: bash -l {0} - run: sbt coverageReport - - name: Get Codecov Secret from Key Vault - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'codecov-token' - id: GetKeyVaultSecrets - - name: Upload Coverage Report To Codecov.io - if: always() - shell: bash -l {0} - run: | - set -e - curl -s https://codecov.io/bash > .codecov - chmod +x .codecov - echo "Starting Codecov Upload" - ./.codecov -t ${{ steps.GetKeyVaultSecrets.outputs.codecov-token }} diff --git a/.github/workflows/ci-tests-unit.yml b/.github/workflows/ci-tests-unit.yml deleted file mode 100644 index c79cae1df8f..00000000000 --- a/.github/workflows/ci-tests-unit.yml +++ /dev/null @@ -1,115 +0,0 @@ -name: CI/Tests/Unit - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Test: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - strategy: - fail-fast: false - matrix: - package: - - name: "automl" - - name: 
"cntk" - - name: "geospatial" - - name: "onnx" - - name: "cognitive.split1" - flaky: "true" - - name: "cognitive.split2" - ffmpeg: "true" - flaky: "true" - - name: "cognitive.split3" - ffmpeg: "true" - flaky: "true" - - name: "cognitive.split4" - flaky: "true" - - name: "core" - - name: "downloader" - - name: "explainers.split1" - - name: "explainers.split2" - - name: "explainers.split3" - - name: "exploratory" - - name: "featurize" - - name: "image" - - name: "io.split1" - flaky: "true" - - name: "io.split2" - flaky: "true" - - name: "isolationforest" - - name: "flaky" - flaky: "true" - - name: "lightgbm.split1" - flaky: "true" - - name: "lightgbm.split2" - flaky: "true" - - name: "lime" - - name: "opencv" - - name: "recommendation" - - name: "stages" - - name: "nn" - - name: "train" - - name: "vw" - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ secrets.AZURE_CREDENTIALS }} - - name: Setup repo - shell: bash -l {0} - run: | - (timeout 30s pip install requests) || (echo "retrying" && timeout 30s pip install requests) - (timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) - - name: Unit Test - shell: bash -l {0} - run: | - (${FFMPEG:-false} && sudo add-apt-repository ppa:jonathonf/ffmpeg-4 -y && \ - sudo apt-get update && sudo apt-get install ffmpeg libgstreamer1.0-0 \ - gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly -y) - export SBT_OPTS="-Xmx2G -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=2G -Xss2M -Duser.timezone=GMT" - (timeout 30m sbt coverage "testOnly com.microsoft.azure.synapse.ml.${PACKAGE}.**") || - (${FLAKY:-false} && timeout 30m sbt coverage "testOnly com.microsoft.azure.synapse.ml.${PACKAGE}.**") || - (${FLAKY:-false} && timeout 30m sbt coverage "testOnly com.microsoft.azure.synapse.ml.${PACKAGE}.**") - env: - PACKAGE: ${{ matrix.package.name }} - FFMPEG: ${{ matrix.package.ffmpeg }} - FLAKY: ${{ matrix.package.flaky }} - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/test-reports/TEST-*.xml' - check_name: "${{ matrix.package.name }} Unit Test Results" - comment_title: "${{ matrix.package.name }} Unit Test Results" - - name: Generate Codecov report - if: always() - shell: bash -l {0} - run: sbt coverageReport - - name: Get Codecov Secret from Key Vault - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'codecov-token' - id: GetKeyVaultSecrets - - name: Upload Coverage Report To Codecov.io - if: always() - shell: bash -l {0} - run: | - set -e - curl -s https://codecov.io/bash > .codecov - chmod +x .codecov - echo "Starting Codecov Upload" - ./.codecov -t ${{ steps.GetKeyVaultSecrets.outputs.codecov-token }} diff --git a/.github/workflows/ci-tests-websitesamples.yml b/.github/workflows/ci-tests-websitesamples.yml deleted file mode 100644 index 338aeaca2a3..00000000000 --- a/.github/workflows/ci-tests-websitesamples.yml +++ /dev/null @@ -1,74 +0,0 @@ -name: CI/Tests/Website Samples - -on: - pull_request: - branches: [ master ] - paths-ignore: - - 'docs/*' - - CODEOWNERS - - .gitignore - - README.md - - CONTRIBUTING.md - - '.github/**' - workflow_dispatch: - -jobs: - Test: - runs-on: ubuntu-18.04 - timeout-minutes: 0 - steps: - - uses: actions/checkout@master - with: - fetch-depth: 0 - - name: Azure Login - uses: Azure/login@v1 - with: - creds: ${{ 
secrets.AZURE_CREDENTIALS }} - name: Setup Python - uses: actions/setup-python@v4.2.0 - with: - python-version: 3.8.8 - - name: Set up JDK 11 - uses: actions/setup-java@v3 - with: - java-version: '11' - distribution: 'temurin' - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2.1.1 - with: - python-version: 3.8.8 - environment-file: environment.yml - activate-environment: synapseml - - name: Test Website Samples - shell: bash -l {0} - run: | - sbt packagePython - sbt publishBlob - (timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) - (sbt coverage testWebsiteDocs) - - name: Publish Test Results - uses: EnricoMi/publish-unit-test-result-action@v2 - if: always() - with: - files: '**/website-test-result.xml' - check_name: "Website Samples Test Results" - comment_title: "Website Samples Test Results" - - name: Generate Codecov report - if: always() - shell: bash -l {0} - run: sbt coverageReport - - name: Get Codecov Secret from Key Vault - uses: Azure/get-keyvault-secrets@v1 - with: - keyvault: "mmlspark-keys" - secrets: 'codecov-token' - id: GetKeyVaultSecrets - - name: Upload Coverage Report To Codecov.io - if: always() - shell: bash -l {0} - run: | - set -e - curl -s https://codecov.io/bash > .codecov - chmod +x .codecov - echo "Starting Codecov Upload" - ./.codecov -t ${{ steps.GetKeyVaultSecrets.outputs.codecov-token }} diff --git a/.github/workflows/remove-awaiting-response-label.yml b/.github/workflows/remove-awaiting-response-label.yml new file mode 100644 index 00000000000..548eb054771 --- /dev/null +++ b/.github/workflows/remove-awaiting-response-label.yml @@ -0,0 +1,12 @@ +name: Remove Awaiting Response Label + +on: [issue_comment] + +jobs: + remove_label: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: actions-ecosystem/action-remove-labels@v1 + with: + labels: "awaiting response" \ No newline at end of file diff --git a/notebooks/community/aisamples/AIsample - Time Series Forecasting.ipynb b/notebooks/community/aisamples/AIsample - Time Series Forecasting.ipynb index 4b8efa25241..cd3c5d3bd6f 100644 --- a/notebooks/community/aisamples/AIsample - Time Series Forecasting.ipynb +++ b/notebooks/community/aisamples/AIsample - Time Series Forecasting.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Training and Evaluating Time Series Forcasting Model" + "# Training and Evaluating Time Series Forecasting Model" ] }, { @@ -13,16 +13,16 @@ "source": [ "## Introduction \n", "\n", - "In this notebook, we will develop a program to forcast time series data which has seasonal cycles. We will use [NYC Property Sales data](https://www1.nyc.gov/site/finance/about/open-portal.page) range from 2003 to 2015 published by NYC Department of Finance on the [NYC Open Data Portal](https://opendata.cityofnewyork.us/). \n", + "In this notebook, we will develop a program to forecast time series data which has seasonal cycles. We will use [NYC Property Sales data](https://www1.nyc.gov/site/finance/about/open-portal.page) ranging from 2003 to 2015 published by NYC Department of Finance on the [NYC Open Data Portal](https://opendata.cityofnewyork.us/). \n", \n", - "The dataset is a record of every building sold in New York City property market during 13-year period. Please refer to [Glossary of Terms for Property Sales Files](https://www1.nyc.gov/assets/finance/downloads/pdf/07pdf/glossary_rsf071607.pdf) for definition of columns in the spreadsheet. 
The dataset looks like below:\n", + "The dataset is a record of every building sold in the New York City property market during a 13-year period. Please refer to [Glossary of Terms for Property Sales Files](https://www1.nyc.gov/assets/finance/downloads/pdf/07pdf/glossary_rsf071607.pdf) for definitions of the columns in the spreadsheet. The dataset looks like the sample below: \n", "\n", "|borouge|neighborhood|building_class_category|tax_class|block|lot|eastment|building_class_at_present|address|apartment_number|zip_code|residential_units|commercial_units|total_units|land_square_feet|gross_square_feet|year_built|tax_class_at_time_of_sale|building_class_at_time_of_sale|sale_price|sale_date|\n", "|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|\n", "|Manhattan|ALPHABET CITY|07 RENTALS - WALKUP APARTMENTS|0.0|384.0|17.0||C4|225 EAST 2ND STREET||10009.0|10.0|0.0|10.0|2145.0|6670.0|1900.0|2.0|C4|275000.0|2007-06-19|\n", "|Manhattan|ALPHABET CITY|07 RENTALS - WALKUP APARTMENTS|2.0|405.0|12.0||C7|508 EAST 12TH STREET||10009.0|28.0|2.0|30.0|3872.0|15428.0|1930.0|2.0|C7|7794005.0|2007-05-21|\n", "\n", - "We will build up a model to forcast monthly volume of property trade based on history data. In order to forecast, we will use [Facebook Prophet](https://facebook.github.io/prophet/), which provides fast and automated forecast procedure and handles seasonality well." + "We will build a model to forecast the monthly volume of property trade based on historical data. To forecast, we will use [Facebook Prophet](https://facebook.github.io/prophet/), which provides a fast and automated forecast procedure and handles seasonality well. " ] }, { @@ -31,9 +31,7 @@ "source": [ "## Install Prophet\n", "\n", - "Let's first install [Facebook Prophet](https://facebook.github.io/prophet/). It uses a decomposable time series model which consist of three main components: trend, seasonality and holidays. \n", - "For the trend part, Prophet assumes piece-wise constant rate of growth with automatic chagne point selection.\n", - "For seasonality part, Prophet models weekly and yearly seasonality using Fourier Series. Since we are using monthly data, so we won't have weekly seasonality and won't considering holidays." + "Let's first install [Facebook Prophet](https://facebook.github.io/prophet/). It uses a decomposable time series model which consists of three main components: trend, seasonality, and holidays. For the trend part, Prophet assumes a piece-wise constant rate of growth with automatic change point selection. For the seasonality part, Prophet models weekly and yearly seasonality using Fourier series. Since we are using monthly data, we will not have weekly seasonality and will not consider holidays." ] }, { @@ -58,7 +56,7 @@ "source": [ "### Download dataset and Upload to Lakehouse\n", "\n", - "There are 15 csv files containig property sales records from 5 boroughs in New York since 2003 to 2015. For your convenience, these files are compressed in `nyc_property_sales.tar` and is available in a public blob storage." + "There are 15 csv files holding property sales records from 5 boroughs in New York from 2003 to 2015. For your convenience, these files are compressed in `nyc_property_sales.tar` and are available in a public blob storage." 
] }, { @@ -86,7 +84,7 @@ " # ask user to add a lakehouse if no default lakehouse added to the notebook.\n", " # a new notebook will not link to any lakehouse by default.\n", " raise FileNotFoundError(\n", - " \"Default lakehouse not found, please add a lakehouse for the notebook.\"\n", + " \"Default lakehouse not found, please add a lakehouse and restart the session.\"\n", " )\n", "else:\n", " # check if the needed files are already in the lakehouse, try to download and unzip if not.\n", @@ -104,7 +102,7 @@ "source": [ "### Create Dataframe from Lakehouse\n", "\n", - "The `display` function print the dataframe and automatically gives chart views." + "The `display` function prints the dataframe and automatically gives chart views." ] }, { @@ -132,13 +130,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Type Conversion and Filtering\n", - "Lets do some necessary type conversion and filtering.\n", - "- Need convert sale prices to integers.\n", - "- Need exclude irregular sales data. For example, a $0 sale indicate ownership transfer without cash consideration.\n", - "- Exclude building types other than A class.\n", + "### Type Conversion and Filtering \n", + "Let us do some necessary type conversion and filtering. \n", + "- Need to convert sale prices to integers. \n", + "- Need to exclude irregular sales data. For example, a $0 sale indicates ownership transfer without cash consideration. \n", + "- Exclude building types other than A class. \n", "\n", - "The reason to choose only market of A class building for analysis is that seasonal effect is innegligible coeffient for A class building. The model we will use outperform many others in including seasonality, which is very common needs in time series analysis." + "The reason to choose only the market of class A buildings for analysis is that the seasonal effect cannot be ignored for class A buildings. The model we will use outperforms many others in including seasonality, which is quite a common need in time series analysis." ] }, { @@ -220,7 +218,7 @@ "metadata": {}, "source": [ "### Visualization\n", - "Now, let's take a look at the trend of property trade trend at NYC. The yearly seasonality is quite clear on the chosen building class. The peek buying seasons are usually spring and fall." + "Now, let us look at the trend of property trade in NYC. The yearly seasonality is quite clear in the chosen building class. The peak buying seasons are usually spring and fall." ] }, { @@ -293,8 +291,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's fit the model. We will choose to use 'multiplicative' seasonality, it means seasonality is no longer a constant additive factor like defualt assumed by Prophet. As you can see a in previous cell, we printed out the total property sale data per month, and the vibration amplitude is not consistant. It means using simple additive seasonlity won't fit the data well.\n", - "In addition, we will use Markov Chain Monte Carlo(MCMC) that gives mean of posteriori distribution. By default, Prophet uses Stan's L-BFGS to fit the model, which find a maximum a posteriori probability(MAP) estimate.\n" + "Now let us fit the model. We will choose to use 'multiplicative' seasonality, which means seasonality is no longer a constant additive factor as Prophet assumes by default. As you can see in a previous cell, we printed out the total property sale data per month, and the fluctuation amplitude is not consistent. It means using simple additive seasonality will not fit the data well. 
In addition, we will use Markov Chain Monte Carlo (MCMC), which gives the mean of the posterior distribution. By default, Prophet uses Stan's L-BFGS to fit the model, which finds a maximum a posteriori probability (MAP) estimate." ] }, { @@ -316,7 +313,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's use bulit-in functions in Prophet to show the model fitting results. The black dots are data points used to train the model. The blue line is the prediction and the light blue area shows uncertainty intervals." + "Let us use built-in functions in Prophet to show the model fitting results. The black dots are data points used to train the model. The blue line is the prediction, and the light blue area shows uncertainty intervals. " ] }, { @@ -334,7 +331,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Prophet assumes piece-wise constant growth, thus you can plot the change points of the trained model" + "Prophet assumes piece-wise constant growth, thus you can plot the change points of the trained model. " ] }, { @@ -374,7 +371,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can use Prophet's built in cross validation functionality to measure the forecast error on historical data. The below parameters means starting with 11 years of training data, then making predictions every 30 days within 1 year horizon." + "We can use Prophet's built-in cross validation functionality to measure the forecast error on historical data. The parameters below mean starting with 11 years of training data, then making predictions every 30 days within a 1-year horizon." ] }, {
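The multiplicative-seasonality fit with MCMC and the rolling cross-validation described in the notebook cells above map onto a short Prophet snippet. Below is a minimal sketch, not the notebook's exact code: it assumes the `prophet` package installed at the top of the notebook and a pandas DataFrame named `df_pandas` with Prophet's standard `ds`/`y` columns prepared earlier; the values `mcmc_samples=300` and the `4017 days` initial window (roughly 11 years) are illustrative.

```python
from prophet import Prophet  # older installs expose the same API under `fbprophet`
from prophet.diagnostics import cross_validation, performance_metrics

# Multiplicative seasonality: seasonal swings scale with the trend level instead
# of adding a constant offset. Setting mcmc_samples > 0 replaces the default
# MAP fit (Stan's L-BFGS) with full MCMC sampling over the posterior.
model = Prophet(seasonality_mode="multiplicative", mcmc_samples=300)
model.fit(df_pandas)  # df_pandas: assumed frame with `ds` (month) and `y` (sales volume)

# Rolling-origin cross-validation: train on roughly the first 11 years, then
# issue a forecast every 30 days, each over a 1-year horizon.
df_cv = cross_validation(
    model, initial="4017 days", period="30 days", horizon="365 days"
)
print(performance_metrics(df_cv).head())  # mse, rmse, mae, ... per horizon bucket
```

Multiplicative mode matches the observation above that the seasonal amplitude grows with the overall level of the series, which a constant additive seasonal term cannot capture.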