chore: Adding telemetry for the dataset metadata. This one is specially for … #1917

Merged

Conversation

saileshbaidya
Contributor

…adding count of the columns.

Related Issues/PRs

Task: 2348953

Close #2348953

What changes are proposed in this pull request?

Adding the dataset schema column count to the existing logs.

Briefly describe the changes included in this Pull Request.

How is this patch tested?

I will run the existing tests that call the logging APIs.

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.
  1. Find the corresponding markdown file for your new feature in website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
  3. Make sure the DocTable points to correct API link.
  4. Navigate to website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
  6. Make sure the WebsiteSamplesTests job pass in the pipeline.

@github-actions

Hey @saileshbaidya 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

@saileshbaidya saileshbaidya changed the title from "task: Adding telemetry for the dataset metadata. This one is specially for …" to "chore: Adding telemetry for the dataset metadata. This one is specially for …" Apr 15, 2023
@saileshbaidya
Contributor Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 1917 in repo microsoft/SynapseML

@saileshbaidya
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@saileshbaidya
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Apr 18, 2023

Codecov Report

Merging #1917 (94182c9) into master (0d0d10c) will increase coverage by 0.03%.
The diff coverage is 98.51%.

@@            Coverage Diff             @@
##           master    #1917      +/-   ##
==========================================
+ Coverage   86.77%   86.81%   +0.03%     
==========================================
  Files         301      301              
  Lines       15677    15783     +106     
  Branches      815      848      +33     
==========================================
+ Hits        13604    13702      +98     
- Misses       2073     2081       +8     
Impacted Files Coverage Δ
...se/ml/cognitive/translate/DocumentTranslator.scala 19.17% <0.00%> (-0.27%) ⬇️
.../azure/synapse/ml/automl/TuneHyperparameters.scala 78.31% <50.00%> (+0.26%) ⬆️
...re/synapse/ml/cognitive/CognitiveServiceBase.scala 81.62% <100.00%> (-0.10%) ⬇️
...ynapse/ml/cognitive/anomaly/AnomalyDetection.scala 81.19% <100.00%> (+0.16%) ⬆️
...gnitive/anomaly/MultivariateAnomalyDetection.scala 90.43% <100.00%> (-0.54%) ⬇️
...ynapse/ml/cognitive/form/FormOntologyLearner.scala 90.19% <100.00%> (+0.40%) ⬆️
...ure/synapse/ml/cognitive/openai/OpenAIPrompt.scala 85.48% <100.00%> (+0.23%) ⬆️
...zure/synapse/ml/cognitive/search/AzureSearch.scala 87.23% <100.00%> (+0.09%) ⬆️
.../synapse/ml/cognitive/speech/SpeechToTextSDK.scala 87.50% <100.00%> (+0.04%) ⬆️
...rosoft/azure/synapse/ml/automl/FindBestModel.scala 87.93% <100.00%> (+0.21%) ⬆️
... and 70 more

... and 1 file with indirect coverage changes


@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@@ -12,10 +12,11 @@ import scala.collection.mutable
 case class SynapseMLLogInfo(uid: String,
                             className: String,
                             method: String,
-                            buildVersion: String)
+                            buildVersion: String,
+                            columns: Int = -1)
Collaborator

In Scala we use Option types to denote a value that may or may not be present. You can add this without changing much by also adding an auxiliary "this" constructor, so callers can still pass columns as a plain Int.

Contributor Author

Will do. Agree, better to use language features.

Contributor Author

I have sent out a new iteration with the changes, @mhamilton723.
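
For readers following this thread, the Option-plus-auxiliary-constructor suggestion could look roughly like the sketch below. This is an illustration only, not the code that was merged: `OptionColumnsSketch`, `LogInfo`, and `describe` are hypothetical names, and the field list simply mirrors the diff above.

```scala
object OptionColumnsSketch {
  // Illustrative stand-in for the case class in the diff; not the merged code.
  final case class LogInfo(uid: String,
                           className: String,
                           method: String,
                           buildVersion: String,
                           columns: Option[Int]) {
    // Auxiliary "this" constructor: callers may pass a plain Int,
    // which is wrapped in Some before reaching the primary constructor.
    def this(uid: String, className: String, method: String, buildVersion: String, columns: Int) =
      this(uid, className, method, buildVersion, Some(columns))
  }

  // Renders the optional count without relying on a sentinel such as -1.
  def describe(info: LogInfo): String =
    info.columns.map(n => s"$n columns").getOrElse("column count not recorded")

  def main(args: Array[String]): Unit = {
    // `new` selects the auxiliary constructor because the last argument is an Int.
    val withCount = new LogInfo("uid_1", "DemoTransformer", "transform", "0.0.0", 12)
    // The companion `apply` corresponds to the primary constructor and takes an Option.
    val withoutCount = LogInfo("uid_1", "DemoTransformer", "fit", "0.0.0", None)
    println(describe(withCount))
    println(describe(withoutCount))
  }
}
```

Note that the auxiliary constructor is reached through `new`; the generated companion `apply` only covers the primary constructor.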

Collaborator

@mhamilton723 mhamilton723 left a comment

Minor nit on the argument type of columns: it should be Option[Int] instead of Int.

Comment on lines 73 to 83
+  def logFit[T](f: => T, columns: Int = -1): T = {
+    logVerb("fit", f, columns)
   }

-  def logTrain[T](f: => T): T = {
-    logVerb("train", f)
+  def logTrain[T](f: => T, columns: Int = -1): T = {
+    logVerb("train", f, columns)
   }

-  def logTransform[T](f: => T): T = {
-    logVerb("transform", f)
+  def logTransform[T](f: => T, columns: Int = -1): T = {
+    logVerb("transform", f, columns)
   }
Collaborator

@mhamilton723 mhamilton723 Apr 21, 2023

Unfortunately, we should probably still get rid of the -1 default argument here. Please remove the default argument from this function, since it should be provided in all cases, right?

Contributor Author

Did that. Please check.
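
As a rough illustration of what this thread asks for, the sketch below drops the `-1` default so that every call site has to pass the column count. It is a standalone approximation, not the merged change: `LogVerbSketch` and its in-memory `events` buffer are hypothetical, and the body of `logVerb` is assumed, since only its call sites appear in the diff.

```scala
import scala.collection.mutable

object LogVerbSketch {
  // Hypothetical in-memory sink standing in for the real logger.
  val events: mutable.ArrayBuffer[(String, Int)] = mutable.ArrayBuffer.empty

  // Records the verb and column count, then evaluates the by-name block once.
  def logVerb[T](verb: String, f: => T, columns: Int): T = {
    events += ((verb, columns))
    f
  }

  // No default value: omitting `columns` is now a compile error at the call site.
  def logFit[T](f: => T, columns: Int): T = logVerb("fit", f, columns)

  def logTransform[T](f: => T, columns: Int): T = logVerb("transform", f, columns)

  def main(args: Array[String]): Unit = {
    val header = Seq("id", "text", "label")
    val result = logTransform(header.map(_.toUpperCase), header.length)
    println(result)
    println(events)
  }
}
```

Without the default, a forgotten column count is caught by the compiler rather than silently logged as `-1`.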

Comment on lines 49 to 55
protected def logBase(methodName: String, columns:Int = -1): Unit = {
logBase(SynapseMLLogInfo(
uid,
getClass.toString,
methodName,
BuildInfo.version,
if (columns == -1) None else Some(columns)))
Collaborator

Maybe this can just take an Option[Int] too, because it isn't used much.

Contributor Author

Did that. Please check.
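
A minimal sketch of the shape suggested in this thread, where `logBase` accepts an `Option[Int]` directly and the `if (columns == -1) None else Some(columns)` conversion disappears. Everything here is an assumption for illustration: `LogBaseSketch`, `Logging`, `DemoStage`, the `logged` buffer, and the `buildVersion` placeholder (standing in for `BuildInfo.version`) are hypothetical and not taken from the merged code.

```scala
import scala.collection.mutable

object LogBaseSketch {
  final case class LogInfo(uid: String,
                           className: String,
                           method: String,
                           buildVersion: String,
                           columns: Option[Int])

  trait Logging {
    def uid: String

    // Hypothetical placeholder for BuildInfo.version.
    protected val buildVersion: String = "0.0.0-sketch"

    // Hypothetical sink so the sketch can be inspected without a real logger.
    val logged: mutable.ArrayBuffer[LogInfo] = mutable.ArrayBuffer.empty

    protected def logBase(info: LogInfo): Unit = {
      logged += info
      ()
    }

    // Takes the Option as-is: no sentinel value and no conversion step.
    protected def logBase(methodName: String, columns: Option[Int]): Unit =
      logBase(LogInfo(uid, getClass.toString, methodName, buildVersion, columns))
  }

  class DemoStage(val uid: String) extends Logging {
    // A call that knows the schema width passes Some(count).
    def transform(columnNames: Seq[String]): Unit =
      logBase("transform", Some(columnNames.length))

    // A call with no dataset in scope passes None.
    def close(): Unit = logBase("close", None)
  }

  def main(args: Array[String]): Unit = {
    val stage = new DemoStage("uid_1")
    stage.transform(Seq("id", "text"))
    stage.close()
    stage.logged.foreach(println)
  }
}
```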

@saileshbaidya
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@saileshbaidya
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 3c09702 into microsoft:master Apr 24, 2023
@saileshbaidya saileshbaidya deleted the saibai/TelemetryTask1893417 branch April 24, 2023 21:47