Validation process appears to exhaust all available memory #31

Open
lucaswerkmeister opened this issue Jun 8, 2018 · 5 comments

@lucaswerkmeister
Collaborator

lucaswerkmeister commented Jun 8, 2018

I’ve reduced my shape expression to this fairly short snippet:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex: <http://example.com/>

ex:A {
  schema:description rdf:langString+;
  wdt:P106 @ex:B
}

ex:B {
  schema:description rdf:langString+;
  schema:name rdf:langString+;
  rdfs:label rdf:langString+
}

Let’s say you’ve saved this as /tmp/ex.shex. If you then try to validate this against Q19296, a fairly small Wikidata example item, using the command

bin/validate \
  --data 'http://www.wikidata.org/entity/Q19296' \
  --shex '/tmp/ex.shex' \
  --node 'http://www.wikidata.org/entity/Q19296' \
  --shape 'http://example.com/A'

then Node.js will fairly quickly (within less than ten seconds on my system) crash with a fatal out-of-memory error. (You probably don’t want to try this in a browser – for me, that hung up the whole system for a while.)

<--- Last few GCs --->

[28338:0x5573831c1570]     5324 ms: Scavenge 1383.4 (1410.8) -> 1383.3 (1412.3) MB, 3.9 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure                                                     
[28338:0x5573831c1570]     5329 ms: Scavenge 1384.6 (1412.3) -> 1384.5 (1413.3) MB, 4.1 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure                                                     
[28338:0x5573831c1570]     5335 ms: Scavenge 1386.8 (1414.5) -> 1386.8 (1416.0) MB, 3.7 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure                                                     


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2b39af19e589 <JSObject>
    0: builtin exit frame: concat(this=0x20b22af022d1 <JSArray[149730]>,0x20b22af02279 <JSArray[155]>,0x20b22af022d1 <JSArray[149730]>)                                                                           

    1: /* anonymous */ [0x1d8016002239] [/home/lucas/git/shex.js/lib/regex/threaded-val-nerr.js:~192] [pc=0x12bbaa51b858](this=0x78964086519 <JSGlobal Object>,nextThreads=0x20b22af022d1 <JSArray[149730]>,exprThread=0x19da8e58ced9 <Object map = 0x5c64157d91>)
  ...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x557381b65c1f [node]
 3: v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0x55738208f3b3 [node]
 6: 0x55738208f505 [node]
 7: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]                                                                                                      
 8: v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]                                                                          
 9: v8::internal::Heap::AllocateRawWithRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [node]                                                                                        
10: v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [node]
11: v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [node]                                                             
12: v8::internal::Factory::NewJSArrayStorage(v8::internal::Handle<v8::internal::JSArray>, int, int, v8::internal::ArrayStorageAllocationMode) [node]                                                              
13: v8::internal::Factory::NewJSArray(v8::internal::ElementsKind, int, int, v8::internal::ArrayStorageAllocationMode, v8::internal::PretenureFlag) [node]                                                         
14: v8::internal::ElementsAccessor::Concat(v8::internal::Isolate*, v8::internal::Arguments*, unsigned int, unsigned int) [node]                                                                                   
15: 0x557381d88649 [node]
16: 0x557381d8f78f [node]
17: 0x12bbaa109efd
Aborted (core dumped)
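
The GC log above shows the heap stalling at roughly 1.4 GB, which would be consistent with the process hitting V8's configured heap limit rather than the machine running out of RAM. That is an inference from the log, not something confirmed in this thread. A minimal sketch for checking the limit of the current Node.js process, using the built-in `v8` module:

```javascript
// Sketch: report the V8 heap limit the current Node.js process runs under.
const v8 = require('v8');

function heapLimitMiB() {
  // heap_size_limit is reported in bytes.
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`V8 heap limit: ${heapLimitMiB().toFixed(0)} MiB`);
```

The limit can be raised with Node's `--max-old-space-size` option (value in MiB), for example `node --max-old-space-size=4096 bin/validate …`. If memory use grows without bound, this would only delay the crash.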

Commenting out any one of the lines in the ShEx code makes the validation pass, so this doesn’t look like a problem with any particular part of the shape; rather, it feels like shex.js is just getting overwhelmed by the sheer number of labels and descriptions?

Any ideas what could be done here? :/
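
One detail in the stack trace may be worth noting: `concat` is called on a 149,730-element array (`nextThreads`), and the same array (same address) also appears as one of its arguments. The trace alone does not show what happens to the result, so the following is only a hypothetical illustration, not code from shex.js. It shows how few steps a list needs to reach that size if each step builds `list.concat(extra, list)`:

```javascript
// Hypothetical illustration only; this is not code from shex.js.
// Count how many steps it takes for a list to exceed `limit` when each step
// replaces the list with list.concat(extra, list), i.e. the list re-includes itself.
function stepsUntilExceeds(initialLength, extraLength, limit) {
  let length = initialLength;
  let steps = 0;
  while (length <= limit) {
    length = 2 * length + extraLength; // length of list.concat(extra, list)
    steps += 1;
  }
  return { steps, length };
}

// Sanity check of the length formula on real arrays: 3 + 2 + 3 elements.
const list = [1, 2, 3];
const extra = [4, 5];
const combinedLength = list.concat(extra, list).length;
console.log(combinedLength);

// Lengths taken from the trace: 155 extra elements, 149730 in the large array.
console.log(stepsUntilExceeds(1, 155, 149730));
```

If something like this is happening, the growth would be exponential in the number of steps, which might explain why removing any single line from the schema is enough to stay under the heap limit.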

@lucaswerkmeister
Collaborator Author

So I’ve spent some more time trying to arrive at shape expressions that don’t crash shex.js, and I found something odd. Given the data and schema in this gist, validating http://www.wikidata.org/entity/Q42 against http://www.wikidata.org/entity/Q5 will succeed – but only because I’ve commented out two wdt:P40 (“child”) links: with those lines not commented out, shex.js crashes. Is shex.js perhaps trying to validate the same nodes against the same shapes again and again, because of the circular references between child items (P40) and parent items (P22, P25)?
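
To make the question concrete, here is a minimal sketch of what avoiding repeated work on circular references could look like: each (node, shape) pair is checked at most once, and a pair that is still being checked further up the call stack is assumed to conform. All names and the data layout are hypothetical and are not the shex.js API; failure handling is deliberately simplified.

```javascript
// Hypothetical sketch; names and data layout are illustrative, not the shex.js API.
// graph:  Map from node id to { predicate: [neighbour ids] }
// shapes: shape name -> { predicate: shape name the neighbours must conform to }
function makeValidator(graph, shapes) {
  const results = new Map(); // "node|shape" -> true, false, or 'pending'
  let calls = 0;

  function validate(node, shape) {
    const key = `${node}|${shape}`;
    if (results.has(key)) {
      // Already decided, or currently being checked further up the call stack.
      // Treating 'pending' as success is an assumption made here for cycles.
      const cached = results.get(key);
      return cached === 'pending' ? true : cached;
    }
    calls += 1;
    results.set(key, 'pending');
    const edges = graph.get(node) || {};
    let ok = true;
    for (const [predicate, targetShape] of Object.entries(shapes[shape])) {
      for (const neighbour of edges[predicate] || []) {
        if (!validate(neighbour, targetShape)) ok = false;
      }
    }
    results.set(key, ok);
    return ok;
  }

  return { validate, callCount: () => calls };
}

// A parent with two children that each point back to the parent.
const graph = new Map([
  ['parent', { child: ['kid1', 'kid2'] }],
  ['kid1', { father: ['parent'] }],
  ['kid2', { father: ['parent'] }],
]);
const shapes = { human: { child: 'human', father: 'human' } };

const validator = makeValidator(graph, shapes);
const conforms = validator.validate('parent', 'human');
console.log(conforms, validator.callCount());
```

With the cache, the number of real checks is bounded by the number of distinct (node, shape) pairs; without it, the child/parent cycle would recurse indefinitely.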

@ericprud
Contributor

ericprud commented Sep 5, 2018

I tried reproducing this in shex-simple but I don't think I've got the right shapemap. I didn't see http://www.wikidata.org/entity/Q19296 in the gist data.

@lucaswerkmeister
Collaborator Author

Sorry, the example in the gist is completely independent from the one in the original bug report. The minimal schema from the issue description (ex:A, ex:B) causes an error when you let shex.js download the data for Q19296, but not when you use the data file from the gist (probably because the gist doesn’t contain label or description triples). The data file from the gist, on the other hand, produces an error only with the schema from the gist, and only if the wdt:P40 lines are uncommented.

@thadguidry
Contributor

thadguidry commented May 30, 2019

This is a lot easier to see if you use the latest Chrome and record a Performance profile while validating. After profiling, the source view highlights function call timings in ms.

[Screenshot: shex-simple js]

Each validated object uses some memory for the object itself plus the graph it retains, measured in bytes; how much depends on the object's properties (your mileage may vary). Take a look using Chrome's Memory allocation timeline and the Containment view's Retained Size column.

[Screenshot: Memory_Allocation_Timeline]
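
For runs under Node.js rather than Chrome, a rough equivalent is to sample `process.memoryUsage()` around the work being measured. This is a minimal sketch; `work` is a hypothetical stand-in for a validation call, and the numbers are approximate because garbage collection can run at any time.

```javascript
// Sketch: sample heap usage around a piece of work in Node.js.
// `work` is a hypothetical stand-in for a validation call.
function measureHeapGrowth(work) {
  const before = process.memoryUsage().heapUsed;
  const result = work();
  const after = process.memoryUsage().heapUsed;
  return { result, grownBytes: after - before };
}

const measurement = measureHeapGrowth(() => {
  // Retain a large array of small objects so the growth is visible.
  return new Array(1e6).fill(0).map((_, i) => ({ i }));
});

const grownMiB = (measurement.grownBytes / (1024 * 1024)).toFixed(1);
console.log(`${measurement.result.length} objects retained, heap grew by about ${grownMiB} MiB`);
```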

@ericprud
Contributor

ericprud commented Jun 1, 2019

Tx, @thadguidry! I'd never ventured that far to the right in the Chrome debugger.
I guess the issue is that I need to throw stuff away, in particular, error reports, once they're represented in the UI.
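
A minimal sketch of that idea, assuming results arrive as an array of entries with an optional, potentially large `report` object. The entry shape and the `render` callback are hypothetical and not taken from shex.js. Each entry is reduced to a short rendered string, and no reference to the original report is kept afterwards.

```javascript
// Hypothetical sketch; not shex.js code. Each result is rendered to a short
// string, and the (potentially large) report object is not retained afterwards.
function renderAndRelease(results, render) {
  const lines = [];
  while (results.length > 0) {
    const entry = results.pop(); // drop the array's reference to the entry
    lines.push(render(entry));   // keep only the rendered summary
  }
  return lines.reverse();        // restore the original order
}

const pending = [
  { node: 'n1', status: 'conformant', report: null },
  { node: 'n2', status: 'nonconformant', report: { errors: new Array(1000).fill('detail') } },
];

const rendered = renderAndRelease(pending, (e) =>
  `${e.node}: ${e.status}${e.report ? ` (${e.report.errors.length} errors)` : ''}`);

console.log(rendered, pending.length);
```

Once nothing else references the report objects, the garbage collector is free to reclaim them.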
