
Implement static data slicing #948

Closed
sffc opened this issue Aug 11, 2021 · 2 comments · Fixed by #1480
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-large Size: A few weeks (larger feature, major refactoring) T-core Type: Required functionality

Comments

@sffc
Member

sffc commented Aug 11, 2021

I have been talking about data slicing via static code analysis for a long time. I did early experiments to demonstrate that this is possible. This issue is to track the eventual implementation of this feature.

How it will work:

  1. Dead code elimination produces an optimized binary
  2. We inspect the optimized binary to find all ResourceKeys
  3. We build a new data bundle with only the matched keys
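As a rough illustration of step 3, the sketch below assumes the full data set is held in memory as a map from key strings to serialized payloads, and that step 2 has already produced the set of keys found in the optimized binary. The `Bundle` type, the `slice_bundle` function, and key strings such as `plurals/ordinal@1` are hypothetical and are not ICU4X's datagen API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory data bundle: key string -> serialized payload bytes.
type Bundle = BTreeMap<String, Vec<u8>>;

/// Build a sliced bundle that keeps only the entries whose key
/// was found in the optimized binary (step 2's output).
fn slice_bundle(full: &Bundle, used_keys: &BTreeSet<String>) -> Bundle {
    full.iter()
        // Drop every key that dead code elimination left unreferenced.
        .filter(|(key, _)| used_keys.contains(key.as_str()))
        .map(|(key, payload)| (key.clone(), payload.clone()))
        .collect()
}
```

A `BTreeMap` keeps the sliced output in a deterministic order, which makes the generated bundle stable across runs.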

Step 2 is the tricky part, and some experimentation will be necessary. The solution I know will work is to conditionally annotate the data model of ResourceKey with a special value that can be found via strings(1) or a similar tool.
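A minimal sketch of step 2 under that annotation approach, assuming each key string is embedded in the binary wrapped in a hypothetical marker, `ICU4X_KEY[` ... `]`. The marker format and the `find_keys` function are illustrative only; the real encoding would be whatever the annotated ResourceKey ends up using. Like strings(1), it scans raw bytes, but it extracts only the text between markers.

```rust
/// Hypothetical marker bytes assumed to surround each key in the binary.
const START: &[u8] = b"ICU4X_KEY[";
const END: u8 = b']';

/// Scan raw binary bytes for marker-wrapped key strings.
/// Returns the distinct keys found, sorted.
fn find_keys(binary: &[u8]) -> Vec<String> {
    let mut keys = Vec::new();
    let mut i = 0;
    while i < binary.len() {
        if !binary[i..].starts_with(START) {
            i += 1;
            continue;
        }
        let body_start = i + START.len();
        // Look for the closing marker after the opening one.
        match binary[body_start..].iter().position(|&b| b == END) {
            Some(len) => {
                let body = &binary[body_start..body_start + len];
                // Skip anything that is not valid UTF-8; it cannot be a key.
                if let Ok(key) = std::str::from_utf8(body) {
                    keys.push(key.to_string());
                }
                i = body_start + len + 1;
            }
            // Unterminated marker at the end of the file: stop scanning.
            None => break,
        }
    }
    // A key may appear several times after inlining; keep one copy.
    keys.sort();
    keys.dedup();
    keys
}
```

Reading the optimized binary with `std::fs::read` and passing the bytes to `find_keys` would yield the key list that step 3 needs.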

sffc added T-core Type: Required functionality C-data-infra Component: provider, datagen, fallback, adapters S-epic Size: Major project (create smaller child issues) labels Aug 11, 2021
sffc added this to the ICU4X 0.5 milestone Aug 11, 2021
sffc self-assigned this Aug 11, 2021
@iainireland
Contributor

Is the expectation that users will build an optimized binary, inspect it offline, and emit a key file to feed into datagen, then check in the results of datagen? Or that this will be integrated into the build process in CI?

@sffc
Member Author

sffc commented Aug 12, 2021

Eventually I would like the tooling to be fairly transparent to clients. At first though it would probably be several manual steps in sequence.

sffc added S-large Size: A few weeks (larger feature, major refactoring) and removed S-epic Size: Major project (create smaller child issues) labels Sep 26, 2021
sffc linked a pull request Jan 27, 2022 that will close this issue
sffc closed this as completed Jan 27, 2022
3 participants