Property data corruption after graph rebuild #86

Open
xujiaxj opened this issue Oct 13, 2020 · 6 comments

@xujiaxj

xujiaxj commented Oct 13, 2020

RedisGraph version: 2.2.6
JRedisGraph version: 2.1.0

For simplicity, let's say we have two Java services talking to a RedisGraph server: one called GraphBuilder, the other called GraphQuerier. Both use the JRedisGraph client. Our GraphBuilder has a timer task that builds the graph once every minute. We use the MERGE command, so it is essentially a timer task that keeps "upserting" the graph.
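
As a rough illustration of such a periodic upsert, here is a minimal sketch. It is not our actual GraphBuilder code: the class name, the `runner` callback, and the `ON CREATE SET` / `ON MATCH SET` clauses are assumptions made for the example (the latter also assumes the server version supports them). Only the `EntityType` label and the `name`, `_created`, and `_updated` properties come from the output quoted in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Sketch of a periodic "upsert" task; all names here are hypothetical. */
class MergeUpsertSketch {

    // MERGE matches the node by name or creates it. The timestamp handling is an
    // assumption that mirrors the _created/_updated properties seen in the output.
    static final String UPSERT_QUERY =
        "MERGE (n:EntityType {name: $name}) "
      + "ON CREATE SET n._created = $now, n._updated = $now "
      + "ON MATCH SET n._updated = $now";

    /** Builds the parameter map for one upsert. */
    static Map<String, Object> upsertParams(String name, long nowMillis) {
        Map<String, Object> params = new HashMap<>();
        params.put("name", name);
        params.put("now", nowMillis);
        return params;
    }

    /** Runs the upsert once a minute. The runner stands in for a real client call. */
    static ScheduledExecutorService start(BiConsumer<String, Map<String, Object>> runner) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
            () -> runner.accept(UPSERT_QUERY, upsertParams("Service", System.currentTimeMillis())),
            0, 1, TimeUnit.MINUTES);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print instead of talking to a server so the sketch runs on its own.
        ScheduledExecutorService timer = start((q, p) -> System.out.println(q + " " + p));
        Thread.sleep(200);   // give the first run time to fire
        timer.shutdownNow();
    }
}
```

In a real service the runner would forward the query and parameters to the client library instead of printing them.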

Everything runs on K8s. When the Redis server crashes, its pod is restarted and the graph is gone, but the GraphBuilder rebuilds the graph in no time. When this happens, queries run on the GraphQuerier can return corrupted data. In particular, the property names are mixed up.

For instance, if the GraphQuerier has not been restarted since the Redis crash, it prints a list of properties with the wrong property names:

properties={name=Service, component=String, instance=String, memory_limits=String, pod=workload | deployment | statefulset | daemonset | replicaset, cpu_limits=String, access_mode=String, _created=1602617008984, _updated=1602617495780}

But if we restart the GraphQuerier, or deploy a new GraphQuerier pod, and query the same node on the same graph, we will get the correct result:

properties={name=Service, app=String, kube_service=String, namespace=String, lookup_workload=workload | deployment | statefulset | daemonset | replicaset, workload=String, workload_type=String, _created=1602617008984, _updated=1602617495780}

Note that for some properties we store the names of Java data types, such as String, as the values, so don't be confused by that.

The query we run to fetch the data is very simple, like

MATCH (n:EntityType {name:$param0}) WHERE (n._updated >= $param1 AND n._created <= $param2) RETURN n

Does the JRedisGraph client keep some form of cached mapping from property IDs to property names? If so, the mapping would be out of date once the graph is rebuilt.

We have no way to restart all our services every time Redis restarts, so this kind of data corruption undermines the foundation our software is built on.

@xujiaxj
Author

xujiaxj commented Oct 13, 2020

I forgot to mention: if we query the graph from the CLI, it returns the correct data.

graph.Query dev "match (n:EntityType{name:'Service'}) return n"

1) 1) "n"
2) 1) 1) 1) 1) "id"
            2) (integer) 350
         2) 1) "labels"
            2) 1) "EntityType"
         3) 1) "properties"
            2)  1) 1) "name"
                   2) "Service"
                2) 1) "_created"
                   2) (integer) 1602617008984
                3) 1) "_updated"
                   2) (integer) 1602617374081
                4) 1) "app"
                   2) "String"
                5) 1) "kube_service"
                   2) "String"
                6) 1) "namespace"
                   2) "String"
                7) 1) "workload"
                   2) "String"
                8) 1) "lookup_workload"
                   2) "workload | deployment | statefulset | daemonset | replicaset"
                9) 1) "workload_type"
                   2) "String"

@DvirDukhan
Contributor

@xujiaxj Thanks for reporting
JRedisGraph (as well as all our RedisGraph clients) maintains a client-side cache that maps property, label, and relationship-type IDs to their string values. As you correctly wrote, JRedisGraph sends the query with a --compact flag, which causes RedisGraph to return a compact representation of the result set containing only the properties' IDs. If JRedisGraph is missing any id<->string mapping, it then triggers a procedure call to complete its mapping.
We will check this and update you.
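
To make the mechanism concrete, here is a simplified model of that lookup path. It is a sketch, not JRedisGraph's actual code: the class, the `fetchAllNames` supplier (standing in for the procedure call), and the shape of the compact row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Simplified model of a client-side property-name cache (hypothetical names). */
class CompactDecodeSketch {

    private final List<String> cachedNames = new ArrayList<>();
    private final Supplier<List<String>> fetchAllNames; // stands in for the server procedure call
    int fetchCount = 0;

    CompactDecodeSketch(Supplier<List<String>> fetchAllNames) {
        this.fetchAllNames = fetchAllNames;
    }

    /** Resolves a property ID, asking the server only when the ID is not cached yet. */
    String nameOf(int propertyId) {
        if (propertyId >= cachedNames.size()) {
            fetchCount++;
            cachedNames.clear();
            cachedNames.addAll(fetchAllNames.get());
        }
        return cachedNames.get(propertyId);
    }

    /** Turns a compact row (propertyId -> value) into (propertyName -> value). */
    Map<String, Object> decode(Map<Integer, Object> compactRow) {
        Map<String, Object> decoded = new LinkedHashMap<>();
        for (Map.Entry<Integer, Object> e : compactRow.entrySet()) {
            decoded.put(nameOf(e.getKey()), e.getValue());
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<String> serverSchema = Arrays.asList("name", "_created", "_updated", "app");
        CompactDecodeSketch client = new CompactDecodeSketch(() -> serverSchema);

        Map<Integer, Object> row = new LinkedHashMap<>();
        row.put(0, "Service");
        row.put(3, "String");
        System.out.println(client.decode(row));                     // {name=Service, app=String}
        System.out.println("server lookups: " + client.fetchCount); // server lookups: 1
    }
}
```

The cache is filled on the first miss and reused afterwards, which is why only one lookup reaches the server in this run.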

@xujiaxj
Author

xujiaxj commented Oct 15, 2020

It looks like JRedisGraph refreshes its local cache only when it sees an ID higher than its current maximum. Presumably, this indicates that a new label, relationship type, or property has been created.

However, if we delete some entities and recreate a few in such a way that a smaller ID is reused, will we run into the same wrong-mapping problem? From the code, it looks like we might.

@DvirDukhan
Contributor

@xujiaxj
Don't confuse entity IDs with property/relationship/label IDs.
RedisGraph schema IDs take an "append-only" approach: for every property, label, or relationship type added, its id<->string mapping is appended to the respective mapping container.
JRedisGraph caches the schema mapping, meaning it has its own view (state) of the graph schema. Since mapping IDs are never reused, only appended, JRedisGraph refreshes its cache when it receives a property/label/relationship ID for which it holds no id<->string mapping.
Your issue deals with graph swapping (e.g., changing the graph key's value without client awareness).
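
A toy model of that scenario, under stated assumptions: the classes below are illustrative and are not RedisGraph or JRedisGraph code, and the property order in each schema is made up. It shows why a cache that refreshes only on unseen IDs stays valid while the schema is append-only, yet returns stale names once the graph key is replaced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a graph key being replaced behind a caching client (hypothetical names). */
class GraphSwapSketch {

    /** Server-side schema: a property ID is its position in an append-only list. */
    static class Schema {
        final List<String> names = new ArrayList<>();

        int idOf(String name) {
            int id = names.indexOf(name);
            if (id < 0) {
                names.add(name);
                id = names.size() - 1;
            }
            return id;
        }
    }

    /** Client cache that refreshes only when it sees an ID beyond what it holds. */
    static class CachingClient {
        final List<String> cache = new ArrayList<>();

        String nameOf(int id, Schema server) {
            if (id >= cache.size()) {
                cache.clear();
                cache.addAll(server.names);
            }
            return cache.get(id);
        }
    }

    public static void main(String[] args) {
        Schema before = new Schema();
        for (String p : Arrays.asList("name", "component", "instance")) before.idOf(p);

        CachingClient client = new CachingClient();
        client.nameOf(2, before);   // the cache now holds the old schema

        // The key is recreated and properties are registered in a different order.
        Schema after = new Schema();
        for (String p : Arrays.asList("name", "app", "kube_service")) after.idOf(p);

        int id = after.idOf("app");                                 // ID 1 in the new graph
        System.out.println("server: " + after.names.get(id));       // server: app
        System.out.println("client: " + client.nameOf(id, after));  // client: component
    }
}
```

Because ID 1 is already inside the cached range, the client never asks the server again and keeps answering with the old name.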

@arramos84

Hi @DvirDukhan, it looks like we ran into this data corruption issue in our prod environment, and it caused a complete outage. Looking at the metrics we collect from our Redis subscriptions in Redis Enterprise Cloud, we can see that all of the Redis instances we use were upgraded, which triggered the issue. Although the upgrade was applied in all of our environments, the issue occurred only in prod. The only difference is that prod has more graphs (5 in total); our dev environments have between 1 and 3.

Another thing to note is that the only graph which had no corruption issues was the first graph created for that redis instance. The other 4 graphs that were created at later dates had problems. Only restarting our services communicating with redis resolved the corruption issue.

Our subscription numbers are: #1403485 and #1335035. There are 3 redisgraph deployments: prod-monitoring-redisgraph, prod-redisgraph, and dev-redisgraph. prod-redisgraph is the only one that had the issue.

We are running JRedisGraph 2.3.0.

Thanks!

@manoja1

manoja1 commented Jul 6, 2021

Here is the time recorded for the RedisGraph update: 5:24 AM PT, Jul 4.

[Screenshot: Screen Shot 2021-07-06 at 12 47 57 PM]
