Provenance could potentially OOM the workspace #576
Comments
The unserialized memory hit could be mostly avoided by pulling the provenance data as BSON (assuming that's possible) and then converting the documents one at a time: deserialize to an in-memory object, make any necessary changes, serialize to JSON, and embed the result in a JsonTokenStream and UObject.
You can theoretically get raw BSON like this:

    private Map<ObjectId, Provenance> getProvenance(
            final Map<ResolvedObjectID, Map<String, Object>> vers)
            throws WorkspaceCommunicationException {
        final Map<ObjectId, Map<String, Object>> provIDs = new HashMap<>();
        for (final ResolvedObjectID id: vers.keySet()) {
            provIDs.put((ObjectId) vers.get(id).get(Fields.VER_PROV), vers.get(id));
        }
        final Map<ObjectId, Provenance> ret = new HashMap<>();
        final Document query = new Document(Fields.MONGO_ID,
                new Document("$in", provIDs.keySet()));
        try {
            // TODO MEM does this reduce memory usage if we store the provenance as a string?
            //          should only be deserializing BSON one object at a time vs. all of them
            final MongoCollection<RawBsonDocument> col = wsmongo.getCollection(
                    COL_PROVENANCE, RawBsonDocument.class);
            for (final RawBsonDocument rbd: col.find(query)) {
                // final BsonDocument bdoc = rbd.toBsonDocument(BsonDocument.class, null);
                final Document dbo = wsmongo.getCodecRegistry().get(Document.class)
                        .decode(rbd.asBsonReader(), DecoderContext.builder().build());
                final ObjectId oid = dbo.getObjectId(Fields.MONGO_ID);
                // rest of the method is the same

To return JTS-wrapped JSON strings rather than provenance objects we'd have to ignore the SDK compiled return classes and change the return type in
get_objects2 can return up to 10K objects, each with its own provenance, and provenance can be up to 1MB serialized. That's 10GB serialized, or 5-20x that (50-200GB) unserialized.
Save the size of the provenance in the provenance mongo doc. Before pulling the provenance, check the total size and throw an error if it's over some reasonable amount (100MB?).
This is pretty unlikely to ever cause a problem - most provenance is a few KB.
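A minimal sketch of the proposed size check, assuming each provenance mongo doc already stores its serialized size and that those sizes can be fetched cheaply before the provenance itself. The class name `ProvenanceSizeGuard`, the method `checkTotalSize`, and the 100MB constant are hypothetical and not part of the workspace codebase; the limit just mirrors the "100MB?" suggestion above.

```java
import java.util.List;

// Hypothetical sketch: none of these names exist in the workspace codebase.
final class ProvenanceSizeGuard {

    // Assumed limit, taken from the "100MB?" suggestion in this issue.
    static final long MAX_TOTAL_PROVENANCE_BYTES = 100_000_000L;

    private ProvenanceSizeGuard() {}

    /**
     * Sums the stored serialized sizes of the requested provenance documents.
     * Throws IllegalStateException as soon as the running total exceeds
     * maxBytes; otherwise returns the total.
     */
    static long checkTotalSize(final List<Long> sizesInBytes, final long maxBytes) {
        long total = 0;
        for (final long size : sizesInBytes) {
            if (size < 0) {
                throw new IllegalArgumentException("negative provenance size: " + size);
            }
            total += size;
            // Fail fast once the limit is crossed; no provenance has been pulled yet.
            if (total > maxBytes) {
                throw new IllegalStateException(String.format(
                        "Requested provenance exceeds %d bytes; request fewer objects",
                        maxBytes));
            }
        }
        return total;
    }
}
```

The check would run before the provenance query, so the worst case from the description (10K objects at 1MB each) is rejected without loading any provenance. In the real code the exception would presumably be one of the workspace's own exception types rather than `IllegalStateException`.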