-
Notifications
You must be signed in to change notification settings - Fork 834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Causal DoubleMLEstimator (#8) #1715
feat: Causal DoubleMLEstimator (#8) #1715
Conversation
Add package 'com.microsoft.azure.synapse.ml.causal' and implementation LinearDMLEstimator
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
Hey @dylanw-oss 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:
To test your commit locally, please follow our guild on building from source. |
/azp run |
Commenter does not have sufficient privileges for PR 1715 in repo microsoft/SynapseML |
@serena-ruan, @mhamilton723, can anyone help to give me permission to run a pipeline? |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report
@@ Coverage Diff @@
## master #1715 +/- ##
==========================================
- Coverage 86.51% 85.94% -0.58%
==========================================
Files 273 276 +3
Lines 14420 14571 +151
Branches 769 754 -15
==========================================
+ Hits 12476 12523 +47
- Misses 1944 2048 +104
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Hi @dylanw-oss Thanks for this PR!! Could you raise a request to join this team: https://github.com/orgs/microsoft/teams/synapseml So that @mhamilton723 could add you in, the you can run /azp run to trigger the pipeline. |
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
} | ||
|
||
@DeveloperApi | ||
override def transformSchema(schema: StructType): StructType = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transform schema doesent look right, you sure this doesent add any info to the data dframe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LinearDMLEstimator transform does nothing by design and isn't supposed to be called by end user.
Previously, I set it throw exception, but it won't pass fuzzing testing, so I changed it to return the original dataset back, in this case I don't think we need transform schema, please correct me if I missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there actually is a way to actually use this model in a natural way and perform a regression. In particular you can think of this pipeline as estimating a prediction variable in two steps. The first is the debiasing operation where you map a dataframe to it's residuals. The second is the prediction of the target residuals.
To form the actual prediction target, you first use your baseline estimate of the target from step 1, then add your predicted residual from step 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give a little more info here:
First use your learned residual models to map the inputs to their residuals, then use your treatment effect model to map the residuals to the treatment. Then append that treadment as the prediction column. (If im missing something here perhaps we can chat to help clarify)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@memoryz , Jason, did you sync with our data scientist and if this is feasible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no objections, I'll set it as by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhamilton723, I discussed the feedback in depth with our data scientist @sarahshy, and she confirmed that there is no meaningful natural transformation we can do here. We can implement a natural transformation as you suggested, but the result won't be meaningful and interpretable. Therefore, I suggest we resolve this item as "by design". I can schedule a meeting with @sarahshy if you still have concerns.
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLParams.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/ResidualTransformer.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/ResidualTransformer.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/core/contracts/Params.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/microsoft/azure/synapse/ml/core/schema/SchemaConstants.scala
Show resolved
Hide resolved
website/docs/documentation/estimators/causal/_causalInferenceDML.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely work! Really excited to see this go in, please feel free to chat if any of these comments dont make sense!
core/src/main/scala/com/microsoft/azure/synapse/ml/causal/LinearDMLEstimator.scala
Outdated
Show resolved
Hide resolved
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now.
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
Acrolinx ScorecardsA minimum Acrolinx score of 80 is required. Click the scorecard links for each article to review the Acrolinx feedback on grammar, spelling, punctuation, writing style, and terminology.
More information about Acrolinx |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@mhamilton723 ready for merge. :) |
What changes are proposed in this pull request?
Add package 'com.microsoft.azure.synapse.ml.causal' and implementation LinearDMLEstimator
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?