Optimize fixed-point conversion #1315

Open
3 tasks
rachitnigam opened this issue Dec 19, 2022 · 2 comments
Labels
C: fud (Calyx Driver), good first issue (Good issue to start contributing on Calyx), S: Available (Can be worked upon)

Comments

@rachitnigam
Contributor

Writing up some investigating I did in the json_to_dat code we have for converting floating-point numbers to fixed point. Currently, conversion to fixed-point is a very big overhead when working with big input files. For example, when converting the lenet input data file, we take almost 25 seconds to convert the largest data array.

Profiling points to this line of code: https://github.com/cucapra/calyx/blob/54c68896bb7aa1d7138d4cff05866052b1dbc62d/fud/fud/stages/verilator/numeric_types.py#L173

The Fraction class in Python attempts to convert the floating-point number into a rational by computing the numerator and denominator. Unfortunately, this computation is very slow because it attempts to provide a precise answer. Another code path exacerbates the problem: https://github.com/cucapra/calyx/blob/54c68896bb7aa1d7138d4cff05866052b1dbc62d/fud/fud/stages/verilator/json_to_dat.py#L135-L143

The code attempts to convert the float into fixed and uses the exception thrown from conversion to decide whether to round the number. Exception-based control flow is a bad idea and we should instead check if the number needs rounding before calling the conversion method. A quick (incorrect) change to the code shows that we can get back 9 seconds by doing this.
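A minimal sketch of what such an up-front check could look like, assuming the values arrive as Python floats (as they would after json.load) and the target format has frac_width fractional bits. The name needs_rounding is hypothetical and not part of fud, and the sketch does not check whether the integer part fits in the format.

```python
import math


def needs_rounding(value: float, frac_width: int) -> bool:
    """Return True if `value` cannot be stored exactly with `frac_width`
    fractional bits (hypothetical helper, not fud's API)."""
    # Scaling a float by a power of two is exact (barring overflow), so the
    # value is representable iff the scaled result has no fractional part.
    scaled = math.ldexp(value, frac_width)
    return not scaled.is_integer()


print(needs_rounding(0.5, 1))   # 0.5 * 2 is an integer
print(needs_rounding(0.25, 1))  # 0.25 * 2 is not
```

With a check like this, the caller can decide whether to round before calling the conversion routine instead of catching the exception afterwards.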

Unfortunately this leaves a lot of performance on the table which cannot be recovered without substantial changes. Specifically, conversion to Fraction is always going to be slow. For example, a tight loop of converting random float numbers to fixed-point shows that we can convert 100,000 numbers in 15 seconds. The trend seems to hold if we use gmpy or fxpmath libraries.
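For anyone who wants to reproduce a tight-loop measurement of this kind, here is a rough sketch. It times only the construction of Fraction objects from random floats, not fud's full fixed-point path, so the absolute numbers should not be expected to match the ones above. The function name and the value range are assumptions for illustration.

```python
import random
import time
from fractions import Fraction


def time_fraction_conversion(count: int, seed: int = 0) -> float:
    """Return the seconds spent building `count` Fractions from random floats."""
    rng = random.Random(seed)
    # Generate the inputs first so only the conversion is timed.
    values = [rng.uniform(-1.0, 1.0) for _ in range(count)]
    start = time.perf_counter()
    for v in values:
        Fraction(v)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_fraction_conversion(100_000)
    print(f"100,000 conversions took {elapsed:.3f} s")
```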

On the other hand, Rust's fixed library is faster: 10,000,000 numbers in 17 seconds. Unfortunately, for something like googlenet, which is significantly larger, even this will probably not be fast enough. I wonder if there is a fundamentally faster way to convert to fixed-point...

Anyways, here is the recommended course of action, in order of increasing difficulty:

  • Implement function to figure out if a number needs rounding and remove the exception-based rounding
  • Try to remove/reduce use of the Fraction class
  • Implement a json_to_dat conversion based on the fixed library
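As an illustration of the second item, here is a sketch of a float-to-fixed-point conversion that avoids Fraction entirely by scaling with a power of two and rounding to an integer. This is not fud's implementation: float_to_fixed and its parameters are hypothetical, the input is assumed to be a Python float, and the rounding mode (Python's round, which rounds halves to even) is an assumption that may differ from what json_to_dat does today.

```python
import math


def float_to_fixed(value: float, width: int, frac_width: int,
                   signed: bool = True) -> int:
    """Return the `width`-bit pattern for `value` with `frac_width`
    fractional bits (hypothetical helper, not fud's API)."""
    # Exact scaling by 2**frac_width, then round to the nearest integer.
    raw = round(math.ldexp(value, frac_width))
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if not lo <= raw <= hi:
        raise OverflowError(f"{value} does not fit in {width} bits")
    # Mask to obtain the two's-complement bit pattern for negative values.
    return raw & ((1 << width) - 1)


print(float_to_fixed(1.5, 8, 4))   # 1.5 * 16 = 24
print(float_to_fixed(-1.5, 8, 4))  # -24 as an 8-bit two's-complement pattern
```

Because everything after the scaling step is plain integer arithmetic, the same approach should also work for bit widths beyond 64.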
@rachitnigam added the S: Available (Can be worked upon), C: fud (Calyx Driver), and good first issue (Good issue to start contributing on Calyx) labels on Dec 19, 2022
@sampsyo
Contributor

sampsyo commented Dec 19, 2022

It's a good point that this data-conversion stuff could be better! With apologies for zooming out quite a bit, here are some thoughts I have been meaning to write down for a while:

  • I think converting between textual, human-readable formats like JSON and numerical data for a variety of formats is a big enough problem that it deserves a kind of standalone treatment. We have a hacky one-off built into fud, but this is a general issue that could be "done right" if given independent attention. I abandoned one attempt to do that a long time ago, but this was the spirit: https://github.com/sampsyo/samizdat
  • I also don't love the details of the particular JSON format we currently have. It always seemed weird that we stored the data type alongside the values, even though you can of course convert the same human-readable/decimal values to many different literal data types. Which is another reason to rethink this part of the pipeline altogether.
  • When looking into engines for actually doing the data conversions a while back, and also into hand-implementing it myself, I found that it's just pretty complicated to do in an arbitrary-precision way. It's probably not so bad if all the precisions you care about are sub-word (i.e., you guarantee your bit widths are like 64 or smaller or something, as the fixed crate assumes), but handling "big" numbers is just going to be slow IMO. Something like fixed could of course be useful as a "fast path" with a general-case fallback behind it.
  • On the other hand, if the goal is making the tools more usable (shortening the compile/edit/run cycle), the higher priority is probably to just cache the converted data. That is, there is no reason fud e foo.futil -s verilog.data my.dat should re-convert my.dat every time; during development, it tends to change a lot less than foo.futil.
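The caching idea in the last bullet could be sketched roughly as below. This is only an illustration: cached_convert, the cache directory layout, and the convert callable are all hypothetical and not part of fud. The cache is keyed on a hash of the input file's bytes, so an unchanged data file is converted once; a real version would probably also need to fold the target format parameters into the key.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_convert(src: Path, cache_dir: Path,
                   convert: Callable[[Path, Path], None]) -> Path:
    """Convert `src` via `convert(src, out)` unless a result for identical
    input bytes already exists in `cache_dir` (hypothetical helper)."""
    # Key on file contents rather than mtime so that touching the file
    # without changing it does not trigger a re-conversion.
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    out = cache_dir / digest
    if not out.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        convert(src, out)
    return out
```

A caller would pass the existing conversion routine as convert and read the converted data from the returned path.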

@rachitnigam
Contributor Author

Another arbitrary precision lib in rust: https://www.malachite.rs/
