Atum redesign #28
Comments
Makes sense.
I would also propose redesigning some parts so that they are immutable and in a functional style. What do you think?
Absolutely.
Not sure about the last one as it's described, particularly in regard to the changes above.
Yeah, it would probably be hard to implement an event that is sent last per dataset. But an event that is sent last during the lifetime of the application could be useful.
Fields such as Country should be made optional; only functional ones should be mandatory to include.
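The two comments above (immutable/functional redesign, and optional non-functional fields) could be combined in a design along these lines. This is only a hypothetical sketch, not Atum's actual API: the names Checkpoint, ControlMeasure, and withAdditionalInfo are illustrative assumptions.

```scala
// Hypothetical sketch, not Atum's actual API.
// Functional fields are mandatory; descriptive fields like country are
// optional; updates return a new immutable value instead of mutating
// global state.
final case class ControlMeasure(name: String, controlValue: BigDecimal)

final case class Checkpoint(
  name: String,                       // mandatory, functional
  measurements: Seq[ControlMeasure],  // mandatory, functional
  country: Option[String] = None,     // optional, descriptive
  additionalInfo: Map[String, String] = Map.empty
) {
  // Functional update: returns a modified copy, the original is untouched.
  def withAdditionalInfo(key: String, value: String): Checkpoint =
    copy(additionalInfo = additionalInfo + (key -> value))
}

object Demo extends App {
  val cp  = Checkpoint("raw", Seq(ControlMeasure("recordCount", BigDecimal(42))))
  val cp2 = cp.withAdditionalInfo("source", "landing")
  assert(cp.additionalInfo.isEmpty)                  // original unchanged
  assert(cp2.additionalInfo("source") == "landing")  // copy carries the update
}
```

Because every operation returns a new value, two checkpoints in the same application can never interfere through shared mutable state.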
Background
Currently, Atum relies on the global state of a Spark application. This complicates its use in jobs that are even slightly more complex than a single-dataframe pipeline. If a job has several dataframes and several reads/writes, and not every read and write is associated with control measurements, Atum will still try to process all dataframes as if they all required measurements.
The current workaround for such use cases is the disableControlMeasuresTracking() method, invoked before writing a dataframe that does not require control measurements.
Feature
Enable tracking per dataframe, i.e. df.enableControlMeasuresTracking() instead of spark.enableControlMeasuresTracking(). The same applies to switching off control measurements and to dataframe-level metadata (df.setAdditionalInfo(...)).
Additional context
Once the new design is confirmed, this issue can be converted to an epic and all subitems to tasks.
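The dataframe-scoped tracking proposed in the Feature section could be sketched as follows. This is a minimal illustration, not Atum's implementation: TrackedDf is a hypothetical wrapper (here holding a String in place of a real Spark DataFrame), and the implicit class merely mirrors the method names proposed in the issue.

```scala
// Hypothetical sketch of dataframe-scoped tracking (not Atum's actual API).
// Each TrackedDf carries its own tracking flag, so one dataframe can be
// tracked while another in the same application is not -- no global state.
object DfTrackingSketch {
  final case class TrackedDf[A](
    df: A, // stands in for a real Spark DataFrame in this sketch
    trackingEnabled: Boolean = false,
    additionalInfo: Map[String, String] = Map.empty
  )

  implicit class AtumOps[A](private val t: TrackedDf[A]) {
    def enableControlMeasuresTracking(): TrackedDf[A]  = t.copy(trackingEnabled = true)
    def disableControlMeasuresTracking(): TrackedDf[A] = t.copy(trackingEnabled = false)
    def setAdditionalInfo(key: String, value: String): TrackedDf[A] =
      t.copy(additionalInfo = t.additionalInfo + (key -> value))
  }
}

object Usage extends App {
  import DfTrackingSketch._
  val tracked   = TrackedDf(df = "ordersDf").enableControlMeasuresTracking()
  val untracked = TrackedDf(df = "auditDf") // unaffected by the line above
  assert(tracked.trackingEnabled)
  assert(!untracked.trackingEnabled)
}
```

Keeping the flag on the wrapper rather than on the SparkSession is what removes the need for the disableControlMeasuresTracking() workaround: a write of an untracked dataframe simply never consults a global switch.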