Multiple evaluations of same source with different contexts maxes out RAM #171

Open
Nerglej opened this issue Jun 27, 2024 · 2 comments

Nerglej commented Jun 27, 2024

Hello!

We're processing a large amount of data in JSON Lines format, which means we're looping through up to 100,000 lines or even more. Originally we had a libsonnet file that was imported for every line, but we ended up manually inlining the libsonnet file, merging the library and the code into one big file. To avoid parsing the code for every line, I reimplemented the State object's parse_snippet logic and moved the parsing outside the loop.

The problem I'm having is that my RAM usage maxes out almost immediately. I suspect there's a memory leak somewhere in the library: I tried removing all references to jrsonnet, adding std::thread::sleep to simulate the processing time jrsonnet took, and writing arbitrary data to the output instead. That version took the same amount of time but didn't use anywhere near all of my RAM, so my conclusion is that it's jrsonnet.

I'm currently on 0.5.0-pre95 from Cargo, and I'm completely aware that it isn't a full release yet; this is merely to help make it even greater!

You'll just get the whole Rust file (it's part of a bigger library, so it won't necessarily work on its own; let me know if a minimal script really is necessary):

use std::io::{self, BufRead, BufWriter, Write};

use jrsonnet_evaluator::{
    error::ErrorKind::ImportSyntaxError, evaluate, manifest, parser::LocExpr, trace::PathResolver,
    State,
};
use jrsonnet_parser::{IStr, ParserSettings, Source};
use jrsonnet_stdlib::ContextInitializer;
use log::{info, trace};
use serde::Deserialize;

use crate::{InputReader, Writer};

#[derive(Deserialize)]
pub struct JsonnetExporter {
    pub file: JsonnetLibrary,
    pub libraries: Vec<JsonnetLibrary>,
}

impl JsonnetExporter {
    pub fn new(file: JsonnetLibrary, library_contents: Vec<JsonnetLibrary>) -> Self {
        JsonnetExporter {
            file,
            libraries: library_contents,
        }
    }
}

#[derive(Deserialize)]
pub struct JsonnetTemplate {
    pub file: String,
    pub library_paths: Option<Vec<String>>,
}

/// A jsonnet library in code, not a filepath
#[derive(Deserialize)]
pub struct JsonnetLibrary(pub String);

pub fn export(
    input: &mut InputReader,
    output: &mut Writer,
    jsonnet: &JsonnetExporter,
) -> io::Result<()> {
    let mut out = BufWriter::new(output);

    // Merge libraries and file
    let code: Vec<&str> = jsonnet.libraries.iter().map(|v| v.0.as_str()).collect();
    let library_code = code.join("\n");

    let code = &jsonnet.file.0;

    trace!("Merging libraries and files");

    let merge = format!("{}{}", library_code, code);

    trace!("Parsing merged files");

    // Read merged code and parse it, to avoid parsing for every item
    let source = get_source("<jsonnet_exporter>", &merge);
    let parsed = parse_snippet(&source, &merge).unwrap();

    info!("Started exporting via jsonnet");

    let mut buf = String::new();
    let state = State::default();

    while input.read_line(&mut buf).unwrap() > 0 {
        // Rebuild the context for each line so the `item` ext code can
        // change between evaluations.
        let ctx = ContextInitializer::new(state.clone(), PathResolver::Absolute);
        ctx.add_ext_code("item", &buf).unwrap();

        state.set_context_initializer(ctx);

        let res = evaluate(state.create_default_context(source.clone()), &parsed).unwrap();
        let output = res.manifest(manifest::StringFormat).unwrap();

        out.write_all(output.as_bytes())?;

        // Cleanup
        out.flush().unwrap();
        buf.clear();
    }

    info!("Finished exporting file");

    Ok(())
}

fn get_source(name: impl Into<IStr>, code: impl Into<IStr>) -> Source {
    Source::new_virtual(name.into(), code.into())
}

fn parse_snippet(
    source: &Source,
    code: impl Into<IStr>,
) -> jrsonnet_evaluator::error::Result<LocExpr> {
    let code = code.into();
    let parsed: LocExpr = jrsonnet_parser::parse(
        &code,
        &ParserSettings {
            source: source.clone(),
        },
    )
    .map_err(|e| ImportSyntaxError {
        path: source.clone(),
        error: Box::new(e),
    })?;

    Ok(parsed)
}
@Nerglej changed the title from "Multiple evaluations of same source with different contexts" to "Multiple evaluations of same source with different contexts maxes out RAM" on Jun 27, 2024
CertainLach (Owner) commented

For long-running applications you need to collect garbage sometimes; it is not fully automatic. (There was a branch with automated GC based on Linux's memory-pressure information, but it is not stable and only works on Linux.)
jrsonnet_gcmodule::collect_thread_cycles()

Note that collection happens per thread; collecting garbage in one thread will not affect other threads.
You can check how many objects are allocated with jrsonnet_gcmodule::count_thread_tracked().
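
For illustration, a minimal sketch combining the two calls named above (the 100_000 threshold is an arbitrary choice for this sketch, not a library default; trace! reuses the log import from the file above):

// Check how many gc-tracked objects are currently alive on this thread.
let tracked = jrsonnet_gcmodule::count_thread_tracked();
trace!("gc-tracked objects on this thread: {tracked}");

// Collect reference cycles once the count grows large. Collection only
// frees garbage owned by the current thread.
if tracked > 100_000 {
    jrsonnet_gcmodule::collect_thread_cycles();
}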

Nerglej (Author) commented Jul 3, 2024

Okay, thank you. I ended up collecting the garbage every x loops, and it helped a lot with memory usage. Thanks! This should probably be documented somewhere 😊
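
In terms of the export loop above, that approach looks roughly like this (a sketch, assuming a collection interval of 1,000 lines; the right interval depends on the workload and can be tuned against count_thread_tracked()):

let mut lines = 0usize;
while input.read_line(&mut buf).unwrap() > 0 {
    // ... build the context, evaluate, and write the result as before ...

    lines += 1;
    if lines % 1_000 == 0 {
        // Break the reference cycles accumulated by the last batch of
        // evaluations; collection is per-thread, as noted above.
        jrsonnet_gcmodule::collect_thread_cycles();
    }

    buf.clear();
}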
