Hello!
We're manipulating a large amount of data in the jsonlines format, which means we're looping through up to 100,000 lines or even more. At first we had a libsonnet file that was imported for every line, but we ended up manually inlining it, merging the library and our code into one big file. To avoid parsing the code for every line, I reimplemented the State object's parse_snippet logic and moved the parsing outside the loop.
The problem I'm having is that my RAM usage maxes out almost immediately. I suspect there's a memory leak somewhere in the library: as a control, I removed all references to jrsonnet, added a std::thread::sleep to simulate the processing time jrsonnet took, and wrote arbitrary data to the output instead (see the sketch below). The control run took the same amount of time but used nowhere near all of my RAM, so my conclusion is that it's jrsonnet.
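A sketch of that control experiment (the sleep duration and the dummy payload are placeholders, not the real values):

```rust
use std::{io::Write, thread, time::Duration};

// Control run: same loop shape, but with jrsonnet swapped out for a sleep
// of comparable duration and a fixed dummy line written to the output.
fn control_loop(lines: impl Iterator<Item = String>, out: &mut impl Write) -> std::io::Result<()> {
    for _line in lines {
        thread::sleep(Duration::from_micros(200)); // stand-in for evaluation time
        out.write_all(b"{\"dummy\":true}\n")?; // arbitrary data instead of manifested JSON
    }
    Ok(())
}
```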
I'm currently on 0.5.0-pre95 from Cargo, and I'm fully aware that it isn't a full release yet; this is merely to help make it even greater!
You'll just get the whole Rust file (it's part of a bigger library, so it won't necessarily work on its own; let me know if a minimal reproduction really is necessary):
```rust
use std::io::{self, BufRead, BufWriter, Write};

use jrsonnet_evaluator::{
    error::ErrorKind::ImportSyntaxError, evaluate, manifest, parser::LocExpr,
    trace::PathResolver, State,
};
use jrsonnet_parser::{IStr, ParserSettings, Source};
use jrsonnet_stdlib::ContextInitializer;
use log::{info, trace};
use serde::Deserialize;

use crate::{InputReader, Writer};

#[derive(Deserialize)]
pub struct JsonnetExporter {
    pub file: JsonnetLibrary,
    pub librarires: Vec<JsonnetLibrary>,
}

impl JsonnetExporter {
    pub fn new(file: JsonnetLibrary, library_contents: Vec<JsonnetLibrary>) -> Self {
        JsonnetExporter {
            file,
            librarires: library_contents,
        }
    }
}

#[derive(Deserialize)]
pub struct JsonnetTemplate {
    pub file: String,
    pub library_paths: Option<Vec<String>>,
}

/// A jsonnet library in code, not a filepath
#[derive(Deserialize)]
pub struct JsonnetLibrary(pub String);

pub fn export(
    input: &mut InputReader,
    output: &mut Writer,
    jsonnet: &JsonnetExporter,
) -> io::Result<()> {
    let mut out = BufWriter::new(output);

    // Merge libraries and file
    let code: Vec<&str> = jsonnet.librarires.iter().map(|v| v.0.as_str()).collect();
    let library_code = code.join("\n");
    let code = &jsonnet.file.0;
    trace!("Merging libraries and files");
    let merge = format!("{}{}", library_code, code);

    trace!("Parsing merged files");
    // Read merged code and parse it once, to avoid parsing for every item
    let source = get_source("<jsonnet_exporter>", &merge);
    let parsed = parse_snippet(&source, &merge).unwrap();

    info!("Started exporting via jsonnet");
    let mut buf = String::new();
    let state = State::default();
    while input.read_line(&mut buf).unwrap() > 0 {
        let ctx = ContextInitializer::new(state.clone(), PathResolver::Absolute);
        ctx.add_ext_code("item", &buf).unwrap();
        state.set_context_initializer(ctx);
        let res = evaluate(state.create_default_context(source.clone()), &parsed).unwrap();
        let output = res.manifest(manifest::StringFormat).unwrap();
        out.write_all(output.as_bytes())?;

        // Cleanup
        out.flush().unwrap();
        buf.clear();
    }
    info!("Finished exporting file");
    Ok(())
}

fn get_source(name: impl Into<IStr>, code: impl Into<IStr>) -> Source {
    Source::new_virtual(name.into(), code.into())
}

fn parse_snippet(
    source: &Source,
    code: impl Into<IStr>,
) -> jrsonnet_evaluator::error::Result<LocExpr> {
    let code = code.into();
    let parsed: LocExpr = jrsonnet_parser::parse(
        &code,
        &ParserSettings {
            source: source.clone(),
        },
    )
    .map_err(|e| ImportSyntaxError {
        path: source.clone(),
        error: Box::new(e),
    })?;
    Ok(parsed)
}
```
Nerglej changed the title from "Multiple evaluations of same source with different contexts" to "Multiple evaluations of same source with different contexts maxes out RAM" on Jun 27, 2024.
For long-running applications you need to collect garbage from time to time; it is not fully automatic. (There was a branch with automated GC based on Linux's memory-pressure information, but it is not stable and only works on Linux.) Use jrsonnet_gcmodule::collect_thread_cycles().

Note that collection happens per-thread: collecting garbage in one thread will not affect other threads.

You can check how many objects are allocated with jrsonnet_gcmodule::count_thread_tracked().
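A minimal sketch of how periodic collection could look in a loop like the one above (`COLLECT_EVERY` and `maybe_collect` are hypothetical names for illustration, not jrsonnet API):

```rust
// Collect cycles every COLLECT_EVERY processed lines; the interval is a
// tuning knob trading peak memory against collection overhead.
const COLLECT_EVERY: usize = 1_000;

fn maybe_collect(lines_seen: usize) {
    if lines_seen % COLLECT_EVERY == 0 {
        // Number of tracked objects currently alive on this thread.
        let tracked = jrsonnet_gcmodule::count_thread_tracked();
        // Break reference cycles left behind by previous evaluations.
        // Collection is per-thread, so this must run on the same thread
        // that performed the evaluations.
        let freed = jrsonnet_gcmodule::collect_thread_cycles();
        log::trace!("gc: {tracked} objects tracked, {freed} collected");
    }
}
```

In the export loop above, this would be called once per iteration with a running line counter, e.g. right after `out.flush()`.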
Okay, thank you! I ended up collecting the garbage every N loop iterations, and it helped a lot with memory usage. This should probably be documented somewhere 😊