Provenance could potentially OOM the workspace #576

Open
MrCreosote opened this issue Apr 10, 2022 · 2 comments

Comments

@MrCreosote
Member

MrCreosote commented Apr 10, 2022

get_objects2 can return up to 10K objects, each with its own provenance, and each provenance can be up to 1MB serialized. In the worst case that's 10GB serialized, or 5-20x that (50-200GB) unserialized.

Save the size of the provenance in the provenance Mongo document. Before pulling the provenance, check the total size and throw an error if it's over some reasonable limit (100MB?).
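A minimal sketch of the proposed check, assuming each provenance document stores its serialized size when it is saved, and that those sizes have already been read (for example by projecting only that field) into a map keyed by provenance ID. The class name, the exception type and the exact 100MB limit are hypothetical choices, not existing workspace code. With this limit, the worst case described above (10,000 provenance documents at 1MB each, about 10GB) is rejected before any provenance is pulled.

```java
import java.util.Map;

/**
 * Hypothetical helper: sums the per-document provenance sizes (assumed to be
 * stored in each provenance document at save time) and rejects the request
 * before any provenance is fetched if the total is too large.
 */
final class ProvenanceSizeGuard {

    // ~100MB, the "reasonable amount" floated in the issue; the exact value is an assumption.
    static final long DEFAULT_LIMIT_BYTES = 100L * 1024 * 1024;

    private final long limitBytes;

    ProvenanceSizeGuard(final long limitBytes) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limitBytes = limitBytes;
    }

    /**
     * @param storedSizes provenance ID -> serialized size in bytes, as read from
     *     the (hypothetical) stored size field.
     * @return the total size, if it is within the limit.
     * @throws IllegalStateException if the total exceeds the limit.
     */
    long check(final Map<String, Long> storedSizes) {
        long total = 0;
        for (final long size : storedSizes.values()) {
            total = Math.addExact(total, size);
            // Fail fast: stop summing as soon as the limit is exceeded.
            if (total > limitBytes) {
                throw new IllegalStateException(String.format(
                        "Provenance for %d objects exceeds the %d byte limit",
                        storedSizes.size(), limitBytes));
            }
        }
        return total;
    }
}
```

Summing stored sizes keeps the check cheap: only one number per provenance document needs to be read, and nothing large is deserialized until the total is known to be acceptable.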

This is pretty unlikely to ever cause a problem; most provenance is a few KB.

@MrCreosote
Member Author

The memory hit from unserialized provenance could be mostly avoided by pulling the provenance data as BSON (assuming that's possible) and then converting the documents serially, one at a time: decode each into an in-memory object, make any necessary changes, serialize it to JSON, and embed the result in a JsonTokenStream and UObject.
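The one-at-a-time conversion could look roughly like this generic sketch. `R` stands for the raw type (e.g. `RawBsonDocument`) and `D` for the decoded in-memory object; the class name and the three function parameters are hypothetical, and the final wrapping of each JSON string in a JsonTokenStream and UObject is not shown. The point is that only one decoded object is reachable at a time, while what accumulates is the comparatively compact JSON text.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: decode, fix up and re-serialize raw documents one at a time. */
final class SerialProvenanceConverter {

    static <R, D> List<String> convert(
            final Iterator<R> raw,              // e.g. a cursor over raw BSON documents
            final Function<R, D> decode,        // raw bytes -> in-memory object
            final UnaryOperator<D> fixUp,       // any changes the API response needs
            final Function<D, String> toJson) { // in-memory object -> JSON text
        final List<String> out = new ArrayList<>();
        while (raw.hasNext()) {
            // Decode a single raw document...
            final D decoded = decode.apply(raw.next());
            // ...apply the necessary changes...
            final D fixed = fixUp.apply(decoded);
            // ...and keep only the JSON string. The decoded object is no longer
            // referenced after this iteration, so it can be garbage collected.
            out.add(toJson.apply(fixed));
        }
        return out;
    }
}
```

With the MongoDB Java driver, `col.find(query).iterator()` returns a `MongoCursor` that could be passed as `raw` (and closed afterwards); the driver still fetches raw documents in batches, but decoding happens one document at a time.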

@MrCreosote
Member Author

You can theoretically get raw BSON like this in MongoWorkspaceDB:

```java
private Map<ObjectId, Provenance> getProvenance(
        final Map<ResolvedObjectID, Map<String, Object>> vers)
        throws WorkspaceCommunicationException {
    final Map<ObjectId, Map<String, Object>> provIDs = new HashMap<>();
    for (final ResolvedObjectID id: vers.keySet()) {
        provIDs.put((ObjectId) vers.get(id).get(Fields.VER_PROV), vers.get(id));
    }
    final Map<ObjectId, Provenance> ret = new HashMap<>();
    final Document query = new Document(Fields.MONGO_ID,
            new Document("$in", provIDs.keySet()));
    try {
        // TODO MEM does this reduce memory usage if we store the provenance as a string?
        // should only be deserializing BSON one object at a time vs. all of them
        final MongoCollection<RawBsonDocument> col = wsmongo.getCollection(
                COL_PROVENANCE, RawBsonDocument.class);
        for (final RawBsonDocument rbd: col.find(query)) {
            // final BsonDocument bdoc = rbd.toBsonDocument(BsonDocument.class, null);
            final Document dbo = wsmongo.getCodecRegistry().get(Document.class)
                    .decode(rbd.asBsonReader(), DecoderContext.builder().build());
            final ObjectId oid = dbo.getObjectId(Fields.MONGO_ID);
            // rest of the method is the same
```

To return JTS-wrapped (JsonTokenStream) JSON strings rather than provenance objects, we'd have to ignore the SDK-compiled return classes and change the return type in WorkspaceServer to the new class type or to Object (which is what JSONServerServlet expects anyway). Every time the server is recompiled the return types would be overwritten, so they'd have to be fixed again after each recompile.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant