wizard: problems and improvements for etl dashboard and step updater #2365
Comments
Hi @paarriagadap, thanks a lot for reporting these issues! The issue of removing comments is already known (listed above). I'll try to fix that soon. If I understand correctly, the issues are related to the formatting of the written files (either the dag or the snapshot metadata files), but the step updater is behaving as expected. In other words, the code generated by the tool is correct, although not great in terms of style. Is that right? Currently, the step updater either (1) writes new steps in the dag, or (2) overwrites dependencies of existing steps. Steps with "latest" version correspond to case (2) because the step already exists, and its dependencies need to be updated. So you would prefer those updated "latest" steps to be moved to the bottom of the dag, as if they were new steps. Is that right? Please let me know if I misunderstood your issues. Thanks!
Hi @pabloarosado, yes, it's mostly formatting, and it would be better if the latest steps went to the bottom (or were replicated in both old and new steps). Thank you! A tiny one I found just now is that I had an additional script to extract the data from PIP in the snapshot folder (
Thanks for the clarification, I added the suggestion about "latest" steps to the list of improvements. Regarding
Should we allocate time for this during this cycle, @pabloarosado?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
One-liner
This tracking issue will list all problems and improvements for the ETL dashboard and StepUpdater.
Issues
steps_df (from StepUpdater) contains only active steps (since they are the only ones needed). However, "direct_usages" (and probably other dependency columns) also includes archive steps. This means that, in the Operations list, when adding direct usages, archive steps suddenly appear (and the dashboard also raises an IndexError when trying to access archive steps in steps_df). Fixed by "Let VersionTracker optionally ignore archive steps" #2448.
long_term_crop_yields to the Operations list.
long_term_crop_yields.
long_term_wheat_yields.
faostat_qcl.
Loading and processing analytics makes VersionTracker and StepUpdater slower, and analytics are only needed for the ETL dashboard. A solution would be to add an optional flag argument to StepUpdater and VersionTracker so that analytics are loaded only when requested: enable it for the dashboard, but not for updates. I tried doing this and the time difference was insignificant. Updating many steps (e.g. climate) is still very slow (even over 1 minute), but fetching analytics or not doesn't make a significant difference.
Improvements and ideas
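To illustrate the archive-step mismatch described under Issues, here is a minimal sketch in plain Python. It is not the actual VersionTracker or StepUpdater code: the function name `direct_usages_among_active`, the dictionary layout, and the step names are all hypothetical. It only shows the general idea of restricting "direct_usages" to active steps before anything tries to look them up among the active ones.

```python
from typing import Dict, List, Set


def direct_usages_among_active(
    step: str,
    direct_usages: Dict[str, List[str]],
    active_steps: Set[str],
) -> List[str]:
    """Return the direct usages of `step` that are themselves active steps.

    Hypothetical helper: archive steps listed as usages are dropped, so a
    later lookup restricted to active steps cannot fail on them.
    """
    return [usage for usage in direct_usages.get(step, []) if usage in active_steps]


# Toy data (illustrative names only, not real ETL steps).
active_steps = {"step_a", "step_b"}
direct_usages = {
    "step_a": ["step_b", "archived_step_c"],  # one usage is an archive step
    "step_b": [],
}

usages = direct_usages_among_active("step_a", direct_usages, active_steps)
print(usages)
```

With the toy data above, the archive usage is filtered out and only the active one remains. Whether filtering should happen when the dependency columns are built or when they are read is a design choice this sketch does not settle.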