Feature/spark expectations fixes #117
Merged
asingamaneni merged 16 commits into Nike-Inc:main from sudeep7978:feature/spark_expectations_fixes on Dec 2, 2024
…olumn to stats_detailed table for improved observability. Fix: Addressed duplicate check issues in row_dq for improved data quality. Feature: Added new column to stats_detailed table for enhanced observability.
…d table Improves visibility and aids in quick resolution of data quality issues.
keeping the read_me file as it is
Will update the documentation with the further release.
asingamaneni approved these changes on Dec 2, 2024
LGTM
This fix addresses the duplicate check logic in row_dq, which was not working as intended. Even with the expectations configured as described on the Spark Expectations Nike info page, the output did not match the expected results, which points to an inconsistency in the validation process. We have also added a new column to the stats_detailed table for observability, so that error records can be retrieved per column.
Description
We added the column name in writer.py and action.py so that it is written to both the stats_detailed table and the query_dq_output table. We also changed writer.py to correct the duplicate check, so that the correct error records are written to the stats_detailed table.
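To illustrate the intended behavior of a uniqueness (duplicate) rule that reports error records per column, here is a minimal plain-Python sketch. It is not the spark-expectations implementation in writer.py: the function name `find_duplicate_records` and the record fields (`rule`, `column_name`, `value`, `row`) are hypothetical, and a real row_dq check would run on a Spark DataFrame rather than a list of dicts.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_records(
    rows: List[Dict[str, Any]], column: str, rule: str
) -> List[Dict[str, Any]]:
    """Return an error record for every row whose `column` value occurs more than once.

    Hypothetical sketch: each error record carries the rule name and the
    column name, mirroring the idea of column-level error records. The field
    names are illustrative and are not the spark-expectations schema.
    """
    # Count how many times each value appears in the checked column.
    counts = Counter(row[column] for row in rows)
    # Flag every occurrence of a repeated value, including the first one.
    return [
        {"rule": rule, "column_name": column, "value": row[column], "row": row}
        for row in rows
        if counts[row[column]] > 1
    ]


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
errors = find_duplicate_records(rows, "id", "id_is_unique")
print(len(errors))  # 2
```

Flagging every occurrence of a duplicated value (rather than only the second and later ones) is one possible design choice; it makes all rows involved in a uniqueness violation visible in the error output.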
Related Issue
Link to the issue below
https://github.com/Nike-Inc/spark-expectations/issues/116
Motivation and Context
This fixes the duplicate (uniqueness) check in the DQ rules. Adding the column name also makes column-level error records available, which we can build on further for observability.
How Has This Been Tested?
This has been tested rigorously. This version of spark-expectations is currently running in various environments at Nike, in PSSP pipelines integrated with observability features such as alerts and dashboards.
We have also tested it locally with all possible combinations of rules and datasets.
Locally, it passed all 400 unit tests using `make cov` and `make test`, which indicates that these changes do not break existing functionality.
Screenshots (if appropriate):
Screenshot of the corrected duplicate-check result described in issue #116
Screenshots of all test cases passing (`make cov`)
Screenshots of all test cases passing (`make test`)
Types of changes
Checklist: