Feature/spark expectations fixes #117

Merged

Contributor

@sudeep7978 sudeep7978 commented Nov 18, 2024

This PR fixes the duplicate check logic in row_dq, which was not working as intended: even with the expectations implemented as outlined in the Spark Expectations Nike Info Page, the output did not match the expected results. It also adds a new column to stats_detailed for observability, so that error records can be retrieved column-wise.

Description

We added the column name in writer.py and action.py so that it appears in the stats_detailed table as well as in the query_dq_output table. We also changed writer.py so that the duplicate check is corrected and the right error records are written to the stats_detailed table.
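
For readers unfamiliar with the idea, here is a minimal sketch in plain Python of a uniqueness (duplicate) check that reports error records per column. It is only an illustration under stated assumptions, not the code in writer.py or action.py and not the Spark Expectations API: the function names, the rule names, and the `column_name` / `error_records` keys are all hypothetical.

```python
from collections import Counter

def find_duplicates(rows, column):
    """Return every row whose value in `column` occurs more than once."""
    counts = Counter(row[column] for row in rows)
    return [row for row in rows if counts[row[column]] > 1]

def error_counts_by_column(rows, rules):
    """Build one stats entry per uniqueness rule.

    `rules` is a list of (rule_name, column_name) pairs; both the shape
    and the output keys are assumptions made for this illustration.
    """
    stats = []
    for rule_name, column in rules:
        failed = find_duplicates(rows, column)
        stats.append(
            {"rule": rule_name, "column_name": column, "error_records": len(failed)}
        )
    return stats

# Hypothetical sample data: order_id 2 appears twice, customer_id is unique.
rows = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "b"},
    {"order_id": 2, "customer_id": "c"},
]
rules = [("order_id_unique", "order_id"), ("customer_id_unique", "customer_id")]
stats = error_counts_by_column(rows, rules)
```

Here both rows sharing `order_id` 2 are counted as error records for the `order_id` rule, while the `customer_id` rule reports none. Recording the column next to each count is what makes the errors readable column-wise.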

Related Issue

Link to the issue below
https://github.com/Nike-Inc/spark-expectations/issues/116

Motivation and Context

This fixes the duplicate check (uniqueness) issue with the dq rules. Adding the column also makes it possible to get column-level error records, which we can build on further from an observability point of view.

How Has This Been Tested?

This has been tested rigorously. This version of Spark Expectations is currently running in various environments in Nike PSSP pipelines, integrated with observability features such as alerts and dashboards.
We also tested it locally with all the possible combinations of rules and datasets.
Locally, it passed all 400 unit test cases using [make cov and make test]. This indicates that the change does not break anything else in the code.

Screenshots (if appropriate):

Result of the duplicate check after the fix, for the case explained in issue 116:
Screenshot 2024-11-19 at 12 28 28 AM
All test cases passing with [make cov]:
Screenshot 2024-11-19 at 12 37 58 AM
[make test] results:
Screenshot 2024-11-19 at 12 44 27 AM

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

…olumn to stats_detailed table for improved observability.

Fix: Addressed duplicate check issues in row_dq for improved data quality.
Feature: Added new column to stats_detailed table for enhanced observability.
…d table

Improves visibility and aids in quick resolution of data quality issues.
README.md (review thread outdated, resolved)
@asingamaneni asingamaneni self-requested a review December 2, 2024 16:13
Collaborator

@asingamaneni asingamaneni left a comment

LGTM

@asingamaneni asingamaneni merged commit 9500ed9 into Nike-Inc:main Dec 2, 2024
4 checks passed