Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf improvements #58

Merged
merged 7 commits into from
May 12, 2021
Merged

Perf improvements #58

merged 7 commits into from
May 12, 2021

Conversation

RickMoynihan
Copy link
Member

@RickMoynihan RickMoynihan commented May 11, 2021

I've seen these changes improve real world pipeline performance somewhere between 30-50%.

By far the most significant performance increase is the fix for #56.

Avoiding the work of preping length validations when there aren't any makes is a very small improvement (maybe 200-400ms quicker on a 15s) input, but I think it's at least sensible to try and avoid the cost/complexity of a feature if we're not using it (though the approach here could be improved further, by compiling a function to do the specific validations earlier).

The fix for #59 is a very small performance improvement on inputs I've tested (barely noticeable) but should the interning scares me and may cause GC problems in large pipelines, so better to avoid this altogether.

This allows easier REPL testing and profiling.
Pretty small performance improvement, just 200-400ms on my 1.4mb input
from ~10.4s to just under ~10s when running:

(time (inner-main ["-t" "out/hmrc-rts-small-area.csv" "-u" "out/hmrc-rts-small-area.csv-metadata.json" "-m" "minimal" "-o" "cube.nt"]))

However makes sense to avoid this work.

We could certainly avoid more work on things like this, but it would
be a bigger change.
On small to moderate inputs there is little difference in performance
for this fix.

e.g. the 4mb cn8.csv currently takes ~15.5 seconds to process with or
without this change.
@RickMoynihan RickMoynihan marked this pull request as ready for review May 12, 2021 11:56
Another small perf advantage on hmrc-rts-small-area.csv ~1% faster
Copy link
Contributor

@lkitching lkitching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@RickMoynihan
Copy link
Member Author

Thanks for checking @lkitching 🙇

@RickMoynihan RickMoynihan merged commit c79bcb1 into master May 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants