Dgraph exports incorrect data in JSON and RDF formats. #3610

Closed
danielmai opened this issue Jun 27, 2019 · 3 comments · Fixed by #3682 or #4773
Labels
kind/bug Something is broken. status/accepted We accept to investigate/work on it.

Comments

@danielmai
Contributor

What version of Dgraph are you using?

master dbd7540

Have you tried reproducing the issue with latest release?

Yes. This issue does not happen in v1.0.15.

Steps to reproduce the issue (command/config used to run Dgraph).

  • Run 1 Dgraph Zero and bulk load the 21-million movie data set (21million.rdf.gz and 21million.schema).
  • Run 1 Dgraph Alpha with the bulk loaded data.
  • Run a JSON export:
curl 'localhost:8080/admin/export?format=json'
  • Run an RDF export:
curl localhost:8080/admin/export

Use the live loader or bulk loader to import the exported files back into Dgraph. Neither works, because the exports contain invalid triples. These exports are incredibly messed up.

Actual result

Trying to load the JSON export shows this error:

2019/06/27 15:44:46 Expected JSON map start. Found: ,

The beginning of the JSON export contains a single line with just a comma:

$ zcat g01.json.gz | head
[
,
  {"uid":"0x45","wpt_description@en":"Tite Kubo"},
  {"uid":"0x45","rottentomatoes_id@fi":"Tite Kubo"},
  {"uid":"0x45","produced_by@zh":"久保带人"},
  {"uid":"0x45","produced_by@hu":"Kubo Tite"},
  {"uid":"0x45","produced_by@ca":"Tite Kubo"},
  {"uid":"0x45","produced_by@ko":"쿠보 타이토"},
  {"uid":"0x45","produced_by@pt":"Tite Kubo"},
  {"uid":"0x45","produced_by@no":"Tite Kubo"},
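A quick way to spot this symptom before attempting a reload is to scan the export for lines that hold nothing but a comma. The sketch below assumes the export is gzip-compressed text with one record per line, as in the `head` output above; the helper names `find_stray_commas` and `check_export` are hypothetical and not part of Dgraph.

```python
import gzip


def find_stray_commas(lines):
    """Return 1-based numbers of lines whose only content is a comma."""
    return [n for n, line in enumerate(lines, start=1) if line.strip() == ","]


def check_export(path):
    # Assumption: the export is a gzip file containing UTF-8 text,
    # one JSON record per line.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return find_stray_commas(fh)


sample = [
    "[",
    ",",
    '  {"uid":"0x45","wpt_description@en":"Tite Kubo"},',
    '  {"uid":"0x45","produced_by@pt":"Tite Kubo"}',
    "]",
]
print(find_stray_commas(sample))  # prints [2]
```

For a real file, `check_export("g01.json.gz")` would return the offending line numbers, or an empty list if none are found.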

Irrespective of the line with just the comma, when comparing the number of records in the RDF and JSON exports, there are triples missing from the JSON export.
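One rough way to compare the two exports is to count triples on each side. This is only a sketch under stated assumptions: every RDF triple sits on its own line ending in `.`, and in the JSON export each non-`uid` key contributes one triple per scalar value or one per element of a list value. Facets and other details are ignored, and the function names are hypothetical.

```python
import json


def count_rdf_triples(lines):
    """Count non-empty, non-comment lines that end with the '.' terminator."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and stripped.endswith("."):
            count += 1
    return count


def count_json_triples(records):
    """Count one triple per scalar value and one per element of a list value."""
    count = 0
    for record in records:
        for key, value in record.items():
            if key == "uid":
                continue  # the subject itself is not a triple
            count += len(value) if isinstance(value, list) else 1
    return count


rdf_sample = [
    '<0x45> <name> "Tite Kubo"@en .',
    "<0x45> <produced_by> <0x46> .",
    "<0x45> <produced_by> <0x47> .",
]
json_sample = json.loads(
    '[{"uid":"0x45","name@en":"Tite Kubo"},'
    ' {"uid":"0x45","produced_by":[{"uid":"0x46"},{"uid":"0x47"}]}]'
)
rdf_count = count_rdf_triples(rdf_sample)
json_count = count_json_triples(json_sample)
print(rdf_count, json_count)  # prints 3 3
```

If the two counts differ for exports taken from the same data, that matches the mismatch described above.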

And the JSON export has triples that don't make sense based on the initial data set, like these:

Schema:

name:string @index(hash,term,trigram,fulltext) @lang . 
cinematography:[uid] . 

Exported data, where name holds a uid edge and cinematography holds a language-tagged string:

  {"uid":"0x20572","name":[{"uid":"0x279e38"}]},
  ...
  {"uid":"0x7ef","cinematography@en":"You'll never laugh as long and as loud again as long as you live! The laughs come so fast and so furious you'll wish it would end before you collapse!"},
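Mismatches like these can be flagged mechanically by checking each exported value against the schema. The sketch below hard-codes the two schema entries shown above as a plain dictionary; the `find_type_mismatches` helper and the way it classifies values are assumptions made for illustration, not Dgraph behaviour.

```python
def is_edge(value):
    """True if the value looks like a uid edge: {"uid": ...} or a list of them."""
    if isinstance(value, dict):
        return "uid" in value
    if isinstance(value, list):
        return len(value) > 0 and all(
            isinstance(item, dict) and "uid" in item for item in value
        )
    return False


def find_type_mismatches(schema, records):
    """Return (uid, key) pairs whose value kind disagrees with the schema type."""
    mismatches = []
    for record in records:
        for key, value in record.items():
            if key == "uid":
                continue
            predicate = key.partition("@")[0]  # drop a language tag such as @en
            if predicate not in schema:
                continue  # unknown predicates are skipped in this sketch
            expects_edge = schema[predicate] in ("uid", "[uid]")
            if expects_edge != is_edge(value):
                mismatches.append((record["uid"], key))
    return mismatches


schema = {"name": "string", "cinematography": "[uid]"}
records = [
    {"uid": "0x20572", "name": [{"uid": "0x279e38"}]},
    {"uid": "0x7ef", "cinematography@en": "You'll never laugh as long and as loud again"},
    {"uid": "0x45", "name@en": "Tite Kubo"},
]
mismatches = find_type_mismatches(schema, records)
print(mismatches)
```

With the sample records, the first two entries are reported and the well-formed `name@en` record is not.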

The RDF export has lines like these:

<0x3f522> <name<0x1e16e4>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00> <0x4f48d> .
...
<0x13b174> <written_by\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00> <0x1226fc> .
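Lines like these can be caught with a strict pattern check on each exported triple. The regular expression below is a deliberately narrow assumption about what a clean export line looks like (uid subject, a predicate with no `<`, `>`, backslash, whitespace, or NUL, then a uid or quoted literal); it does not handle facets or other valid N-Quad features, and `invalid_rdf_lines` is a hypothetical helper.

```python
import re

# Subject uid, predicate without suspicious characters, uid or literal object.
VALID_LINE = re.compile(
    r'^<0x[0-9a-fA-F]+>\s+<[^<>\s\\\x00]+>\s+(?:<0x[0-9a-fA-F]+>|".*"\S*)\s+\.$'
)


def invalid_rdf_lines(lines):
    """Return 1-based numbers of non-empty lines that fail the strict pattern."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped and not VALID_LINE.match(stripped):
            bad.append(number)
    return bad


rdf_lines = [
    r"<0x3f522> <name<0x1e16e4>\x00\x00\x00\x00> <0x4f48d> .",
    "<0x13b174> <written_by> <0x1226fc> .",
    r"<0x13b174> <written_by\x00\x00\x00\x00> <0x1226fc> .",
    '<0x45> <name> "Tite Kubo"@en .',
]
bad_lines = invalid_rdf_lines(rdf_lines)
print(bad_lines)  # prints [1, 3]
```

The raw strings keep `\x00` as literal text, as it appears in the issue; the pattern also rejects a real NUL byte inside the predicate.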

Expected behaviour

The exported data should be valid input for importing back into Dgraph.

@danielmai danielmai added the kind/bug Something is broken. label Jun 27, 2019
@danielmai danielmai changed the title Export format issues for JSON and RDF Dgraph exports incorrect data in JSON and RDF formats. Jun 27, 2019
@danielmai
Contributor Author

#3478 seems related.

@gitlw gitlw self-assigned this Jun 28, 2019
@danielmai danielmai added this to the Dgraph v1.1 milestone Jul 12, 2019
@MichelDiz
Contributor

This is still true in

Dgraph version   : v1.2.0
Dgraph SHA-256   : ff6a20cdd76a03a37f916b039b82bf0120b1d8ffb89edba90cecbac0652cc207
Commit SHA-1     : 24b4b7439
Commit timestamp : 2020-01-27 15:53:31 -0800
Branch           : HEAD
Go version       : go1.13.5

command used:

curl 'localhost:8080/admin/export?format=json'

[Screenshot 2020-01-30 at 02:00:05]

@MichelDiz MichelDiz reopened this Jan 30, 2020
@MichelDiz MichelDiz added the status/accepted We accept to investigate/work on it. label Jan 30, 2020
@sleto-it sleto-it removed this from the Dgraph v1.1 milestone Feb 12, 2020
@MichelDiz
Contributor

MichelDiz commented Feb 13, 2020

Okay, it was easy to work out what is happening.

The JSON export is somehow not ignoring the deleted nodes.
As you can see in the screenshot below, the error happens after deleting one of the 3 nodes created.

[Screenshot 2020-02-12 at 23:59:51]

First, I exported the freshly loaded DB, and then ran a delete on 0x15:

{
  delete {
    <0x15> <initial_release_date> * .
  }
}

Then I exported it again:

curl 'localhost:8080/admin/export?format=json'

Boom! The ghost node appears, leaving its comma behind.
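The symptom is consistent with a writer that emits a separator for every node it visits, even when a node has nothing left to print. The sketch below is a hypothetical model of that class of bug, not Dgraph's actual export code: `render`, `export_buggy`, and `export_fixed` are invented names, and a record with only a `uid` key stands in for the deleted node.

```python
import json


def render(record):
    """Render a record as one line; a record with only 'uid' renders as empty."""
    if len(record) <= 1:
        return ""
    return "  " + json.dumps(record, separators=(",", ":"))


def export_buggy(records):
    # Every record gets a slot, so an empty one still leaves its separator.
    parts = [render(record) for record in records]
    return "[\n" + ",\n".join(parts) + "\n]"


def export_fixed(records):
    # Empty renderings are dropped before the separators are placed.
    parts = [part for part in map(render, records) if part]
    return "[\n" + ",\n".join(parts) + "\n]"


nodes = [
    {"uid": "0x15"},  # all data deleted: the "ghost" node
    {"uid": "0x16", "name@en": "A"},
    {"uid": "0x17", "name@en": "B"},
]
buggy = export_buggy(nodes)
fixed = export_fixed(nodes)
print(buggy)
print(json.loads(fixed))
```

In this model the buggy output starts with `[` followed by a line holding only `,`, which is the same shape as the broken export, while the fixed output parses as a two-element JSON array.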

The test was done with the version below:

Dgraph version   : v2.0.0-beta1
Dgraph SHA-256   : 178663a98a3d59879a3d5c42928c89eb5f83afc2bfc0093272941e7a53515847
Commit SHA-1     : 6fac5d7c4
Commit timestamp : 2020-01-30 14:45:54 +1100
Branch           : HEAD
Go version       : go1.13.7

4 participants