[CT-2377] [Spike] Performance improvement to memory footprint #7281

Closed
nathaniel-may opened this issue Apr 5, 2023 · 2 comments · Fixed by #7371

Comments

@nathaniel-may
Contributor

nathaniel-may commented Apr 5, 2023

When a macro runs a call statement with fetch_result set to True, that result may be held in memory longer than it needs to be.

@nathaniel-may nathaniel-may added enhancement New feature or request performance labels Apr 5, 2023
@github-actions github-actions bot changed the title [Spike] Performance improvement to memory footprint [CT-2377] [Spike] Performance improvement to memory footprint Apr 5, 2023
@jtcohen6
Contributor

jtcohen6 commented Apr 5, 2023

@nathaniel-may Thanks for opening! (I'd been meaning to)

More context in this internal Slack thread

As a general rule, should we be garbage-collecting the context from each NodeRunner, after that node finishes compiling/executing?

@iknox-fa
Contributor

iknox-fa commented Apr 17, 2023

Spike resulted in a fix since it was easy-- but for future reference:

The issue at hand seems to be that ProviderContext and its subclasses have a dict attribute called sql_results where all of the results from statement calls get stored via the store_result and/or store_raw_result context functions. Over time the number of entries in that dict grows, and so does the memory used. The ideal solution would be to clear the dict when each calling macro completes, but unfortunately, determining the lifecycle of these classes once instantiated is a little tricky because it happens in the depths of the jinja code where we can't really affect the outcome (I think... I am def getting into language team things here).

That said, we can control what happens when we read the values via the load_result context function, so we can do something clever and remove the value from the dict as it's being returned. This means that <type>Context.sql_results will only ever get as big as the data you've stored but not yet assigned to a variable in jinja. For most use cases I think this works; however, it does mean that we can only ever call {% set foo = load_result(result_name) %} once per stored result.

Instinctually, that feels like a best practice anyway, but I'd be curious to hear what others think.
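
For future readers, the remove-on-read idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code merged in the fix: `SketchContext` is a hypothetical stand-in for a ProviderContext-like class, and its `store_result` and `load_result` methods are simplified to a name and a value.

```python
from typing import Any, Dict, Optional


class SketchContext:
    """Hypothetical stand-in for a ProviderContext-like class (not dbt's actual code)."""

    def __init__(self) -> None:
        # Results of statement calls, keyed by the name they were stored under.
        self.sql_results: Dict[str, Any] = {}

    def store_result(self, name: str, result: Any) -> None:
        # Simplified signature (an assumption): just a name and a value.
        self.sql_results[name] = result

    def load_result(self, name: str) -> Optional[Any]:
        # dict.pop returns the value and removes the entry in one step,
        # so the context drops its reference once the caller has the result.
        return self.sql_results.pop(name, None)


ctx = SketchContext()
ctx.store_result("main", [("row", 1)])

first = ctx.load_result("main")   # the stored result
second = ctx.load_result("main")  # None: the entry was removed by the first read
```

The trade-off is the one noted in the comment: a second read of the same name finds nothing, so a macro that needs the result more than once should keep it in the variable it was first assigned to.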
