Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: binder runtime issues #1598

Merged
merged 18 commits into from
Aug 9, 2022

Conversation

ppruthi
Copy link
Contributor

@ppruthi ppruthi commented Aug 4, 2022

Related Issues/PRs

A few notebooks have runtime failures while running on mybinder.org. This PR fixes those issues.

What changes are proposed in this pull request?

Modifying jupyter notebooks in $REPO_ROOT/notebooks/features to run on mybinder.org as well as our test infrastrucuture.

How is this patch tested?

Our CI infra + manually running notebooks on mybinder.org

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.
  1. Find the corresponding markdown file for your new feature in website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
  3. Make sure the DocTable points to correct API link.
  4. Navigate to website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
  6. Make sure the WebsiteSamplesTests job pass in the pipeline.

AB#1914413

@github-actions
Copy link

github-actions bot commented Aug 4, 2022

Hey @ppruthi 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.
We appreciate your patience and contributions 💯!

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 4, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi ppruthi changed the title fix binder runtime issues fix: binder runtime issues Aug 4, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

Merging #1598 (a5a5d89) into master (c960c06) will decrease coverage by 1.38%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1598      +/-   ##
==========================================
- Coverage   83.68%   82.30%   -1.39%     
==========================================
  Files         310      292      -18     
  Lines       15823    15473     -350     
  Branches      752      752              
==========================================
- Hits        13242    12735     -507     
- Misses       2581     2738     +157     
Impacted Files Coverage Δ
...soft/azure/synapse/ml/cognitive/AudioStreams.scala 0.00% <0.00%> (-87.88%) ⬇️
...t/azure/synapse/ml/cognitive/SpeechToTextSDK.scala 18.03% <0.00%> (-72.55%) ⬇️
...crosoft/azure/synapse/ml/cognitive/SpeechAPI.scala 0.00% <0.00%> (-70.00%) ⬇️
...osoft/azure/synapse/ml/param/TypedArrayParam.scala 41.66% <0.00%> (-12.50%) ⬇️
...ft/azure/synapse/ml/core/env/StreamUtilities.scala 77.77% <0.00%> (-7.41%) ⬇️
...a/com/microsoft/azure/synapse/ml/nn/BallTree.scala 82.85% <0.00%> (-4.77%) ⬇️
...rosoft/azure/synapse/ml/param/EstimatorParam.scala 54.54% <0.00%> (-4.55%) ⬇️
...om/microsoft/azure/synapse/ml/param/MapParam.scala 72.72% <0.00%> (-4.55%) ⬇️
...rosoft/azure/synapse/ml/param/DataFrameParam.scala 60.71% <0.00%> (-3.58%) ⬇️
...oft/azure/synapse/ml/param/UntypedArrayParam.scala 59.37% <0.00%> (-3.13%) ⬇️
... and 22 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 4, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 5, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

"if platform == \"binder\":\n",
" from IPython import get_ipython\n",
"\n",
" !pip install matplotlib Pillow\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put these libraries in the binder image?

Comment on lines 93 to 94
"platform = current_platform()\n",
"if platform == \"synapse\":\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know its a PITA but can we replace all of the places in the notebooks where we have this silly project arcadia thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - that was the plan -- wanted to see if this works fine and does not regress on E2E tests

return "binder"
else:
return "unknown"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also want to add a findSecret API that will run through the gambit of looking in keyvaults, using databricks methods, and checking env vars before yelling and asking someone to replace that line with their key if they cant find anything

@ppruthi ppruthi requested a review from eisber as a code owner August 8, 2022 09:57
@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 8, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 8, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 12 to 41
def CurrentPlatform():
if os.environ.get("AZURE_SERVICE", None) == "Microsoft.ProjectArcadia":
return PLATFORM_SYNAPSE
elif "dbfs" in os.listdir("/"):
return PLATFORM_DATABRICKS
elif os.environ.get("BINDER_LAUNCH_HOST", None) is not None:
return PLATFORM_BINDER
else:
return PLATFORM_UNKNOWN


def RunningOnSynapse():
if CurrentPlatform() is PLATFORM_SYNAPSE:
return True
return False


def RunningOnBinder():
if CurrentPlatform() is PLATFORM_BINDER:
return True
return False


def RunningOnDatabricks():
if CurrentPlatform() is PLATFORM_DATABRICKS:
return True
return False


def PrintKeyWarning():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use snake_case here



def PrintKeyWarning():
if not RunningOnSynapse():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, throw an error here and refactor this to try to find a secret in a few places

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 8, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 8, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 50 to 53
raise RuntimeError(
"#### Please add your environment/service specific key(s) before running the notebook ####"
) from None
return ""
Copy link
Collaborator

@mhamilton723 mhamilton723 Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise RuntimeError(
"#### Please add your environment/service specific key(s) before running the notebook ####"
) from None
return ""
raise RuntimeError(f"Could not find {secret_name} in keyvault or overrides. If you are running this demo and would like to manually specify your key please add the override="YOUR_KEY_HERE" to the arguments of the find_secret method")
)

return current_platform() is PLATFORM_DATABRICKS


def get_platform_specific_secret(searchKey):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_platform_specific_secret(searchKey):
def find_secret(secret_name, keyvault=SECRET_STORE, override=None):


spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
return dbutils.secrets.get(scope=SECRET_STORE, key=searchKey)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return dbutils.secrets.get(scope=SECRET_STORE, key=searchKey)
return dbutils.secrets.get(scope=keyvault, key=secret_name)

if running_on_synapse():
from notebookutils.mssparkutils.credentials import getSecret

return getSecret(SECRET_STORE, searchKey)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return getSecret(SECRET_STORE, searchKey)
return getSecret(keyvault, secret_name)

@mhamilton723 mhamilton723 force-pushed the ms/ppruthi/fix-binder-run branch from 3122307 to c67cfb0 Compare August 8, 2022 23:42
@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 force-pushed the ms/ppruthi/fix-binder-run branch from c67cfb0 to 20c3744 Compare August 8, 2022 23:46
@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ppruthi
Copy link
Contributor Author

ppruthi commented Aug 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit c7a61ec into microsoft:master Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants