stale_nodes_statement of Neo4jStalenessRemovalTask not returning any result #567

Closed
ayush-san opened this issue Jul 20, 2020 · 9 comments
Labels
status:completed Issue is completed and on master type:bug An unexpected problem or unintended behavior

Comments

@ayush-san
Contributor

ayush-san commented Jul 20, 2020

I am trying to use Neo4jStalenessRemovalTask to remove stale data from Neo4j. These are the configs I am passing to the task:

DEFAULT_TARGET_RELATIONS: List[str] = [
    "SCHEMA_OF",
    "CLUSTER",
    "COLUMN_OF",
    "TABLE_OF",
    "COLUMN",
    "CLUSTER_OF",
    "TABLE",
    "SCHEMA"
]

DEFAULT_TARGET_NODES: List[str] = [
    "Table",
    "Database",
    "Cluster",
    "Schema",
    "Watermark",
    "Column"
]

milliseconds_to_expire: int = 86400000 * 2
staleness_max_pct: int = 20

Expected Behavior

I pushed data to Amundsen three days ago, so when removing the stale data, the staleness percentage validation should have thrown an exception, but it didn't. Instead, the task ran successfully without deleting anything:

INFO - Deleting stale data of Watermark with batch size 100
INFO - Deleted 0 stale data of Watermark
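
The expectation above can be sketched in plain Python. This is only an illustration of the arithmetic, not the databuilder implementation: `is_stale` and `validate_staleness_pct` are hypothetical helpers, the fixed `now_ms` value is arbitrary, and the exact comparison the real task uses at the threshold may differ.

```python
MS_PER_DAY = 86_400_000


def is_stale(last_updated_epoch_ms, now_epoch_ms, ms_to_expire):
    """Hypothetical helper: a record is stale when it has no timestamp
    or when its timestamp is older than (now - ms_to_expire)."""
    if last_updated_epoch_ms is None:
        return True
    return last_updated_epoch_ms < now_epoch_ms - ms_to_expire


def validate_staleness_pct(stale_count, total_count, staleness_max_pct):
    """Hypothetical helper: raise when the stale share exceeds the limit."""
    if total_count == 0:
        return 0.0
    stale_pct = stale_count * 100 / total_count
    if stale_pct > staleness_max_pct:
        raise ValueError(
            "Staleness percentage {:.1f} exceeds the limit of {}".format(
                stale_pct, staleness_max_pct
            )
        )
    return stale_pct


# Values from the config above: expire after two days, allow at most 20% stale.
ms_to_expire = MS_PER_DAY * 2
staleness_max_pct = 20

now_ms = 1_595_203_200_000                 # arbitrary fixed "now" for the example
three_days_ago_ms = now_ms - 3 * MS_PER_DAY

# Ten records, all published three days ago, so all of them are stale.
records = [three_days_ago_ms] * 10
stale_count = sum(is_stale(ts, now_ms, ms_to_expire) for ts in records)

try:
    validate_staleness_pct(stale_count, len(records), staleness_max_pct)
except ValueError as error:
    print("Validation failed as expected:", error)
```

With every record older than the two-day window, the stale share is far above 20, which is why I expected an exception rather than a successful run with zero deletions.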

Current Behavior

The stale data removal task runs without deleting any data. I am seeing the following logs in the Airflow task:

INFO - Deleting stale data of Watermark with batch size 100
INFO - Deleted 0 stale data of Watermark

So I tried running the code of _validate_relation_staleness_pct() myself. There, stale_relations_statement does not return any result, but when I run the same statement in the Neo4j browser with the param replaced by the desired value, it returns the expected result.

If I use params in the Neo4j browser, however, the query returns no result there either.

image

Screenshots

image

image

Possible Solution

If I change the marker condition in _decorate_staleness() for the case where ms_to_expire is given, and update the params dict accordingly, then stale_nodes_statement returns the expected result.

Change in _decorate_staleness():

return statement.format(textwrap.dedent("""
n.publisher_last_updated_epoch_ms < (timestamp() - ${marker})
OR NOT EXISTS(n.publisher_last_updated_epoch_ms)""".format(marker=MARKER_VAR_NAME)))

Change in __init__():

self.marker = self.ms_to_expire
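
To make the proposed change easier to follow, here is a minimal standalone sketch of how the condition renders. It does not import databuilder: `decorate_staleness` is a free-function stand-in for the method, the value `"marker"` for `MARKER_VAR_NAME` is an assumption, and the `MATCH` statement template is hypothetical. The point is that `timestamp() - ...` stays in the query text and only a plain number is bound as the parameter, which matters because Cypher parameters are bound as values rather than spliced in as query text.

```python
import textwrap

MARKER_VAR_NAME = "marker"  # assumed parameter name, for illustration only


def decorate_staleness(statement: str) -> str:
    """Fill the statement's {} placeholder with the staleness condition.

    "${marker}".format(marker="marker") renders as "$marker", i.e. a
    Cypher parameter reference inside the query text.
    """
    condition = textwrap.dedent("""
        n.publisher_last_updated_epoch_ms < (timestamp() - ${marker})
        OR NOT EXISTS(n.publisher_last_updated_epoch_ms)""").format(
        marker=MARKER_VAR_NAME
    )
    return statement.format(condition)


# Hypothetical statement template with a single {} placeholder.
template = "MATCH (n:Watermark) WHERE {} RETURN count(*) AS count"
query = decorate_staleness(template)

# Only the number of milliseconds is passed as the parameter value.
params = {MARKER_VAR_NAME: 86400000 * 2}

print(query)
print(params)
```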

image
image

Your Environment

  • Amundsen version used: I am running the master code of lyft/amundsen
  • amundsen-databuilder version used: 2.6.4
  • Deployment (k8s or native): docker-compose-local.yml
@feng-tao
Member

cc @jinhyukchang

@feng-tao feng-tao added Project: Databuilder status:needs_triage For all issues that need to be processed type:bug An unexpected problem or unintended behavior labels Jul 21, 2020
@feng-tao
Member

I will take a look as well. Internally we have been using this approach to clean up stale metadata and haven't had many issues with it, but I will take a look at your example.

@ayush-san
Contributor Author

@feng-tao Were you able to review this issue?

@feng-tao
Member

@ayush-san Sorry, I have been too busy this week.

@ayush-san
Contributor Author

@feng-tao Should I create a PR with what I think is the fix for this problem so you can review it later, or should I wait for you to review the issue?

@feng-tao
Member

@ayush-san Yeah, that would be good as well, since I am overloaded for the next few weeks.

@ayush-san
Contributor Author

@feng-tao When can we release a new version of amundsen-databuilder?

@stale

stale bot commented May 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label May 6, 2021
@stale

stale bot commented May 28, 2021

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.

@stale stale bot closed this as completed May 28, 2021
@Golodhros Golodhros added status:completed Issue is completed and on master and removed status:needs_triage For all issues that need to be processed labels Dec 15, 2022