Description
For months, we have been hitting intermittent issues when uploading Databricks notebooks to a Workspace using the SDK. Occasionally, a notebook would be uploaded empty, then a job would run and succeed, but not do any work (because the notebook was empty). We believe we have traced this to an issue in the SDK: the workspace.upload method accepts a BinaryIO input, which is a streaming file-like interface. However, a stream in Python can only be read once unless it is rewound: after it has been read to the end, a second read returns empty bytes. This means that if the API call fails for any reason after the stream has been consumed, the retry will upload an empty notebook.
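The stream behavior described above can be shown without the SDK at all. This is a minimal sketch using only the standard library; the payload is an arbitrary example, not taken from our notebooks:

```python
import io

# A file-like object keeps a read position. After reading to the end,
# further reads return empty bytes until the position is reset.
stream = io.BytesIO(b"print('hello')")

first = stream.read()    # consumes the whole payload
second = stream.read()   # position is at EOF, so nothing is left

stream.seek(0)           # rewinding makes the payload readable again
third = stream.read()

print(first, second, third)
```

Any retry that reuses the same stream object without rewinding it is in the position of the second read.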
Reproduction
Run this in a fresh REPL session:
import databricks.sdk
from databricks.sdk.service.workspace import Language
import io
import logging
logging.basicConfig(level=logging.DEBUG)
w = databricks.sdk.WorkspaceClient(profile='your-profile-here')
Now, turn off network access so your connection times out and has to retry
After one or two retries on the failed network connection, which look like the following, re-enable network access.
The job will now complete, but the file will be blank.
Expected behavior
The file should not be blank.
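One way to get this behavior would be to rewind the stream before every attempt. The following is only a sketch of that idea, not the SDK's actual retry code: `call_with_rewind`, `flaky_send`, and the use of `ConnectionError` as the retryable error are all hypothetical names and assumptions chosen for illustration.

```python
import io
from typing import BinaryIO, Callable


def call_with_rewind(send: Callable[[BinaryIO], None],
                     stream: BinaryIO,
                     attempts: int = 3) -> None:
    """Call send(stream), rewinding to the starting offset before each attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not stream.seekable():
        raise ValueError("stream must be seekable to be retried safely")
    start = stream.tell()  # remember where the payload begins
    last_error = None
    for _ in range(attempts):
        stream.seek(start)  # undo any partial or full read from a failed attempt
        try:
            send(stream)
            return
        except ConnectionError as err:  # assumed stand-in for a retryable failure
            last_error = err
    raise last_error


# Simulated sender (hypothetical): fails once after consuming the body.
received = []
calls = {"n": 0}


def flaky_send(body: BinaryIO) -> None:
    data = body.read()
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated timeout")
    received.append(data)


call_with_rewind(flaky_send, io.BytesIO(b"print(1)"))
print(received)
```

Without the `seek` call, the second attempt in this simulation would send an empty body, which matches the blank file we see. On the caller's side, a workaround along the same lines might be to keep the content as bytes and build a fresh `io.BytesIO` for each upload attempt.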
Is it a regression?
It has been broken since at least 0.13. I tested it and it fails in 0.20 and 0.30.
Other Information
Additional context
This caused major data quality issues over a period of several months.