-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing issues in export #3682
Fixing issues in export #3682
Conversation
… is called concurrently, and adding locks to protect the counter would be too expensive
…e validation should be run on individual files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @gitlw and @manishrjain)
worker/export.go, line 554 at r1 (raw file):
separator = []byte(",\n") case "rdf": // the separator for RDF should be empty since the toRDF function already
minor: comments with uppercase and periods at the end.
Also for the comments below.
worker/export.go, line 573 at r1 (raw file):
if kv.Version == 1 { // only insert separator for data if hasDataBefore {
we are accessing hasBefore
from multiple go routines without any sort of synchronization. Is that safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)
worker/export.go, line 573 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
we are accessing
hasBefore
from multiple go routines without any sort of synchronization. Is that safe?
The Stream.Send function is only called from a single go routine. So it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @manishrjain)
worker/export.go, line 573 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
The Stream.Send function is only called from a single go routine. So it should be fine.
fixes #3610 #3478
As described in #3610, there are two main issues in the export implementation
the json output begins with a , with no entry before it
The current code uses the variable exporter.counter to track the current line number in the output, and tries to insert the separator ",\n" when the line number is not 1.
However, the problem is that the badger Stream.KeyToList function is called in parallel is multiple goroutines, and they would all be accessing the same shared exporter.counter variable, which means when checking if exporter.counter has the value of 1, other goroutines most likely has updated its value to be above 1, hence causing the first output to have an extra separator ",\n" prepended.
some predicate has weird output that does not confirm to its schema. For example, the name predicate has the type string, but the output would contain entries of the name predicate with an uid value.
Again, the reason is that the KeyToList function is called in parallel, and our current logic uses a shared exporter variable in the function. This PR changes it so that each go routine has its own local exporter variable.
I've verified that after the change, the json and rdf output are no longer messed up, and the exported rdf and json files can be re-imported into dgraph.
This change is