Bulk loader fails on some exports #3571

Closed
KevOrr opened this issue Jun 18, 2019 · 6 comments
Assignees
Labels
area/bulk-loader Issues related to bulk loading. area/import-export Issues related to data import and export. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.
Milestone

Comments

@KevOrr

KevOrr commented Jun 18, 2019

If you suspect this could be a bug, follow the template.

  • What version of Dgraph are you using?
    v1.0.15

  • Have you tried reproducing the issue with latest release?
    Yes

  • What is the hardware spec (RAM, OS)?
    Linux, 6GiB

  • Steps to reproduce the issue (command/config used to run Dgraph).

  1. Run this mutation:
{
  set {
    _:node <name> "me" (kind="<") .
  }
}
  2. Export the graph using curl http://localhost:8080/admin/export
  3. Start a new zero
  4. Run dgraph bulk on the export
  • Expected behaviour and actual result.
    The < character in the facet is exported as "\u003c" instead of "<" (also > is exported as "\u003e"). This causes dgraph bulk to choke as shown in the log below:
dgraph bulk log

Dgraph version   : v1.0.15
Commit SHA-1     : ff5ee1e2
Commit timestamp : 2019-05-30 15:46:55 -0700
Branch           : HEAD
Go version       : go1.12.5

For Dgraph official documentation, visit https://docs.dgraph.io. For discussions about Dgraph , visit https://discuss.dgraph.io. To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License. Copyright 2015-2018 Dgraph Labs, Inc.
{ "RDFDir": "db/less/g01.rdf.gz", "JSONDir": "", "SchemaFile": "db/less/g01.schema.gz", "DgraphsDir": "out", "TmpDir": "tmp", "NumGoroutines": 4, "MapBufSize": 67108864, "ExpandEdges": true, "SkipMapPhase": false, "CleanupTmp": true, "NumShufflers": 1, "Version": false, "StoreXids": false, "ZeroAddr": "localhost:5080", "HttpAddr": "localhost:8080", "IgnoreErrors": false, "CustomTokenizers": "", "MapShards": 1, "ReduceShards": 1 }
Connecting to zero at localhost:5080
badger 2019/06/18 14:20:33 INFO: All 0 tables opened in 0s
Processing file (1 out of 1): db/less/g01.rdf.gz
2019/06/18 14:20:33 Expected , or ) or text but found while lexing <_:uid1> "me" (kind="\u003c") .: Not a valid escape char: 'u'
while parsing line "<_:uid1> \"me\" (kind=\"\\u003c\") .\n"
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.rdfChunker.parse
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/chunk.go:110
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).run
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/mapper.go:124
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:207
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337
github.com/dgraph-io/dgraph/x.Wrap
    /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:91
github.com/dgraph-io/dgraph/x.Check
    /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:41
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).run
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/mapper.go:130
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:207
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337

PS: maybe this should be a separate issue, but the bulk loader also craps out on an empty file. E.g., create an empty graph, export, and try to bulk import it. I'd be happy to open a second issue if desired.

@MichelDiz
Contributor

A workaround for this would be to do JSON escaping and unescaping.

@KevOrr
Author

KevOrr commented Jun 20, 2019

Thanks for your reply. I'm not sure how JSON encoding the string will help. The string ">" will be encoded as "\">\"" and the exporter will still erroneously replace > with \u003e, which the importer doesn't understand.
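
A different stopgap, sketched here under assumptions, is to post-process the exported RDF before handing it to dgraph bulk: rewrite each \uXXXX escape as the literal character, which is the form the original mutation used. The function name unescapeUnicode is hypothetical, and the thread does not confirm that the bulk loader accepts every rewritten line. Escapes that would decode to a quote, a backslash, a control character or a surrogate half are left untouched so the quoting stays valid.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
)

// unescapeUnicode rewrites \uXXXX escapes in one exported line as literal
// characters. Other escapes (\", \\, \n, ...) are copied through unchanged.
func unescapeUnicode(line string) string {
	var out strings.Builder
	for i := 0; i < len(line); i++ {
		c := line[i]
		if c != '\\' || i+1 >= len(line) {
			out.WriteByte(c)
			continue
		}
		if line[i+1] == 'u' && i+6 <= len(line) {
			if n, err := strconv.ParseUint(line[i+2:i+6], 16, 16); err == nil {
				r := rune(n)
				// Keep the escape when decoding it could break the quoting.
				if r >= 0x20 && r != '"' && r != '\\' && !utf16.IsSurrogate(r) {
					out.WriteRune(r)
					i += 5 // skip "uXXXX"; the loop's i++ skips the backslash
					continue
				}
			}
		}
		// Copy the backslash and the character it escapes as a pair, so an
		// escaped backslash followed by "u003c" is not mistaken for an escape.
		out.WriteString(line[i : i+2])
		i++
	}
	return out.String()
}

func main() {
	in := `<_:uid1> <name> "me" (kind="\u003c") .`
	fmt.Println(unescapeUnicode(in)) // <_:uid1> <name> "me" (kind="<") .
}
```

To use it on a real export, one would decompress g01.rdf.gz, run each line through the function, and recompress the result; that wiring is omitted here.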

@danielmai
Contributor

This looks like an issue with how we handle facet strings in N-Quad triples. This same mutation works properly in the JSON mutation format:

{"uid":"_:node","name":"me","name|kind":"\u003c"}
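
For context on why the JSON form is accepted: in JSON, "\u003c" and "<" denote the same one-character string, so any conforming JSON decoder hands the facet value < to the rest of the system. A minimal Go sketch of that equivalence follows; facetValue is a hypothetical helper for illustration, not a Dgraph API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// facetValue decodes a JSON object and returns the string stored under key.
func facetValue(doc, key string) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &m); err != nil {
		return "", err
	}
	s, ok := m[key].(string)
	if !ok {
		return "", fmt.Errorf("key %q is missing or not a string", key)
	}
	return s, nil
}

func main() {
	// The raw string keeps the backslash, so the decoder sees the escape.
	doc := `{"uid":"_:node","name":"me","name|kind":"\u003c"}`
	v, err := facetValue(doc, "name|kind")
	if err != nil {
		panic(err)
	}
	fmt.Println(v == "<") // true
}
```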

@KevOrr
Author

KevOrr commented Jun 20, 2019

@danielmai that seems to produce the same RDF on export.

$ curl -X POST localhost:8080/mutate -H 'X-Dgraph-MutationType: json' -H 'X-Dgraph-CommitNow: true' -d  $'
    {
      "set": [
        {"uid":"_:node","name":"me","name|kind":"\u003c"}
      ]
    }'
$ curl localhost:8080/admin/export
$ gzip -dc <export>/g01.rdf.gz
<_:uid1> <name> "me"^^<xs:string> (kind="\u003c") .

Then starting a new zero and running dgraph bulk fails in the same way:


Dgraph version   : v1.0.15
Commit SHA-1     : ff5ee1e2
Commit timestamp : 2019-05-30 15:46:55 -0700
Branch           : HEAD
Go version       : go1.12.5

For Dgraph official documentation, visit https://docs.dgraph.io. For discussions about Dgraph , visit https://discuss.dgraph.io. To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License. Copyright 2015-2018 Dgraph Labs, Inc.

{ "RDFDir": "/exports/less/g01.rdf.gz", "JSONDir": "", "SchemaFile": "/exports/less/g01.schema.gz", "DgraphsDir": "out", "TmpDir": "tmp", "NumGoroutines": 4, "MapBufSize": 67108864, "ExpandEdges": true, "SkipMapPhase": false, "CleanupTmp": true, "NumShufflers": 1, "Version": false, "StoreXids": false, "ZeroAddr": "localhost:5082", "HttpAddr": "localhost:2600", "IgnoreErrors": false, "CustomTokenizers": "", "MapShards": 1, "ReduceShards": 1 }
Connecting to zero at localhost:5082
Error communicating with dgraph zero, retrying: rpc error: code = Unknown desc = Assigning IDs is only allowed on leader.
badger 2019/06/20 15:11:39 INFO: All 0 tables opened in 0s
Processing file (1 out of 1): /exports/less/g01.rdf.gz
2019/06/20 15:11:39 Expected , or ) or text but found while lexing <_:uid1> "me"^^ (kind="\u003c") .: Not a valid escape char: 'u'
while parsing line "<_:uid1> \"me\"^^ (kind=\"\\u003c\") .\n"
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.rdfChunker.parse
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/chunk.go:110
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).run
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/mapper.go:124
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:207
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337
github.com/dgraph-io/dgraph/x.Wrap
    /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:91
github.com/dgraph-io/dgraph/x.Check
    /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:41
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).run
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/mapper.go:130
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1
    /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:207
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337

@danielmai danielmai added the kind/bug Something is broken. label Jun 26, 2019
@campoy campoy added area/import-export Issues related to data import and export. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it. labels Sep 13, 2019
@campoy campoy added this to the Dgraph v1.1.1 milestone Sep 13, 2019
@campoy campoy added the area/bulk-loader Issues related to bulk loading. label Sep 13, 2019
@martinmr martinmr self-assigned this Nov 6, 2019
@martinmr
Contributor

martinmr commented Nov 6, 2019

The unicode issue was fixed by #4175 but it needs to be cherry-picked into the 1.0 release branch. I'll do just that.

I'll also look into the second issue (bulk loader fails on an empty graph).

@martinmr
Contributor

This has been fixed.

Also, the bulk loader fails with a more descriptive warning when no data is passed to it.
