@CyberShadow
Member

This updates the distributed copy of DustMite to the latest version, including the improvements described in https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/

  • d785720 Add "indent" split mode
  • e77126f splitter: Speed up optimization
  • e0138ca dustmite: Preserve no-remove flags when applying "Concat" reductions
  • d501228 Don't move code marked as no-remove
  • 4d361cb dustmite: Use templates instead of delegates for dump
  • 772a8fb dustmite: Use lockingBinaryWriter
  • 621991b dustmite: Bulk adjacent writes
  • dbd493a splitter: Don't compile in a debug-only field
  • d19b15e Add coverage analysis and upload to Coveralls
  • 630cf9c Track each entity's parent entity
  • f56f6a4 splitter: Make Entity members used only by splitter private
  • be2c452 splitter: Fix Dscanner warning
  • 70d5503 splitter: Delete unused label
  • 0e788b5 Revert "Track each entity's parent entity"
  • 1f1f732 splitter: Don't dereference enum AAs at runtime
  • 3fea926 dustmite: Mark final classes as such
  • 02f8b2e splitter: Mark Entity class final
  • 2ca0522 Overhaul tree representation and edit algorithms
  • d9da7cf dustmite: Fail no-op concat reductions
  • d439fed dustmite: Remove the concatPerformed bodge
  • 0ec8fc5 dustmite: Log inapplicable reductions
  • 2e19085 dustmite: Fix a TODO
  • ad4124f dustmite: Start a new iteration after a successful Concat reduction
  • 6d0cd9f dustmite: Remove reduction == initialReduction lookahead hack
  • f197986 dustmite: Get rid of the "root" global
  • 690ab07 dustmite: Update the lookahead iterator's root according to predictions
  • 0dc5e04 dustmite: Remove special handling for the first lookahead step
  • 8b5f639 dustmite: Handle inapplicable reductions in lookahead
  • fd45d61 dustmite: Fix placement of --version in help text
  • bf407bc dustmite: Make descendant recounting incremental
  • 6878138 dustmite: Clean up test directories before writing to them
  • a269d25 dustmite: Distinguish zero and uninitialized digests in lookahead queue
  • 9eb4126 dustmite: Move lookahead saving and process creation into worker thread
  • 5034c01 polyhash: Initial commit
  • 3d28c6e polyhash: Add some comments
  • 751ea2b polyhash: Optimize calculating powers of p
  • f675253 polyhash: Use decreasing powers of p, instead of increasing
  • b1b76cd polyhash: Convert to an output range interface
  • 62d145b License under Boost Software License 1.0
  • 19f0200 polyhash: Add mod-q (with non-power-of-two q) support
  • 5b80b03 Unify incremental and full updates of computed Entity fields
  • f85acdf Switch to incremental polynomial hashing
  • 401d408 Work around DMD bug 20677
  • 575406e Speed up applyReduction.edit
  • 23a67fb Re-optimize incrementally for the Concat reduction
  • 80b7ba4 dustmite: Speed up removing dependencies under removed nodes
  • ec81973 Speed up address comparison
  • 26f2039 dustmite: Tweak tree initialization order
  • d5523e1 splitter: Clear hash for killed entities
  • 048a0fd Keep children of removed nodes in the tree
  • 48ed0a5 dustmite: Make findEntity traverse dead nodes
  • 196f5f7 dustmite: Improve dump formatting of redirects and dead entities
  • 404c8f9 dustmite: With both --trace and --dump, save dumps during trace
  • 72cd08c dustmite: Don't attempt to concatenate dead files
  • 53d3bf6 dustmite: Recalculate dead entities recursively too
  • c3d1215 dustmite: Traverse dead entities when editing them, too
  • 226a651 dustmite: Do not copy dead entities for editing
  • b8f2844 Maintain cached cumulative dependents per-node
  • 9f5a4f1 dustmite: Create less garbage during I/O
  • df752dc Maintain cached full content of each node
  • 4b165e6 Revert "Maintain cached full content of each node"
  • 965fbc3 dustmite: Speed up strategy iteration over dead nodes
  • 15d0a8f dustmite: Remove use of lazy arguments in address iteration
  • b5c1ec0 splitter: Fix lexing D raw string literals (r"...")
  • 2630496 dustmite: Fix "reduced to empty set" message
  • 9505bf6 dustmite: Fix recursion for dead nodes in recalculate
  • 6764e8d dustmite: Add --in-place
  • 3a76633 dustmite: Remove Reduction.target
  • c04c843 dustmite: Replace Reduction.address with an Address*
  • d2cfa23 dustmite: Allow the test function to take an array of reductions
  • 5e510c8 dustmite: Introduce Reduction.Type.Swap
  • d4303ca dustmite: Add fuzzing mode
  • 5fffd18 dustmite: Split up the DustMiteNoRemove string literals
  • 714ea99 dustmite: Allow --reduce-only and --no-remove rules to stack
  • ca18a07 dustmite: Add --remove switch
  • de92616 dustmite: Reorder --help text
  • 157b305 dustmite: Remove trailing punctuation from --help text
  • 6746464 Add --white-out option
  • 6705a94 dustmite: Update tagline
  • e76496f splitter: Make EntityRef.address const
  • 4cfed4c dustmite: Add debug=DETERMINISTIC_LOOKAHEAD
  • 214d000 dustmite: Add reduction application cache
  • e859e86 dustmite: Grow reduction application cache dynamically
  • fd3ad29 dustmite: Speed up dependency recalculation
  • a10ef7f dustmite: Fix crash with --whiteout + --trace
  • 256a651 dustmite: Speed up dumping
  • df42f62 dustmite: Add --max-steps
  • 886c6f2 dustmite: Make measure's delegate scoped
  • 732d0f1 dustmite: Add more performance timers
  • 05acf86 dustmite: Implement non-linear lookahead prediction
  • 0a7a937 dustmite: Improve prediction formula
  • 990b3bc splitter: Improve parsing of successive keyword-prefixed blocks
  • cb0855d dustmite: Make detection of suspicious files non-fatal

@dlang-bot
Contributor

Thanks for your pull request, @CyberShadow!

Bugzilla references

| Auto-close | Bugzilla | Severity | Description |
|---|---|---|---|
| | 20677 | normal | Compilation of bad inline asm in speculative template instantiation fails with no messages |

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + tools#419"

@Geod24
Member

Geod24 commented Nov 22, 2020

Why don't we use something like a submodule instead of manually updating from time to time?

@CyberShadow
Member Author

I can think of a few arguments in favor of the status quo:

  • There is some benefit of an additional layer of separation providing a stable version of DustMite vs. the more experimental version in the development repository;
  • Probably it wouldn't make sense to include the DustMite test suite together with just the implementation;
  • Submodules are clunky if you don't expect them or know how to use them, with some people completely advocating against their usage;
  • Technically, a submodule would also have to be bumped periodically.

The build script could be changed to download the code from the DustMite repository instead of us keeping a copy here, but then it would depend on network availability.

I'm also thinking of trying to make DustMite more appealing to use as a general-purpose tool, and rename it to just reduce, with a new command-line interface, leaving dustmite as a D-oriented wrapper around reduce.

@Geod24
Member

Geod24 commented Nov 23, 2020

> There is some benefit of an additional layer of separation providing a stable version of DustMite vs. the more experimental version in the development repository;

This works in synergy with:

> Technically, a submodule would also have to be bumped periodically.

So we wouldn't use the (newest) branch-follow feature, but instead of copying the code over, those updates would just be a submodule update. You are the one doing it anyways, it would just greatly simplify the process (no need to copy the list of commits for example).

> Probably it wouldn't make sense to include the DustMite test suite together with just the implementation;

Well, we ship binaries, so we only provide the implementation anyways, or am I missing something?

> Submodules are clunky if you don't expect them or know how to use them, with some people completely advocating against their usage;

We still have to teach some core people basic git flow. Does it mean they are right not to learn to properly use their tool? I don't think so. Also, as mentioned, you're the only one doing the update AFAIK. We could trivially add a check in the makefile that informs users to run a certain git command, if that's a concern.
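
A minimal sketch of the kind of check mentioned above, written as a shell function that a makefile recipe could call. It assumes only that a checked-out submodule has a `.git` entry at its root while an uninitialized one is an empty directory. The function name `check_submodule` and the `DustMite` path in the usage note are hypothetical; no such submodule exists in this repository.

```shell
#!/bin/sh
# Hypothetical helper: report whether a submodule directory has been checked out.
# The directory name is supplied by the caller; nothing here is specific to this repo.
check_submodule() {
    dir=$1
    # A checked-out submodule has a ".git" entry (file or directory) at its root.
    if [ -e "$dir/.git" ]; then
        return 0
    fi
    # Otherwise tell the user which git command to run and signal failure.
    echo "Submodule '$dir' is not checked out; run: git submodule update --init $dir" >&2
    return 1
}
```

A build step could then run something like `check_submodule DustMite || exit 1` before compiling, so that a missing checkout fails early with an actionable message.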

Member

@Geod24 Geod24 left a comment

Of course the discussion shouldn't block this PR tho.

@Geod24 Geod24 merged commit 39c73d8 into dlang:master Nov 23, 2020
@CyberShadow
Member Author

> Does it mean they are right not to learn to properly use their tool? I don't think so.

Not sure this is the best way to look at things. Even experienced users need to remember to use --recursive and run git submodule update after pulling. Seeing how rare these updates are, and considering that I don't mind continuing to do it this way, introducing submodules just for the sake of a rarely updated component seems excessive for now.

DustMite might at some point be expanded beyond the point of the build infrastructure available in this repository, such as if it were to begin using packages from the Dub repository, so we may yet require some change in how it is included with D in the future.

@ibuclaw
Member

ibuclaw commented Jul 14, 2021

Because of #383, this PR introduced a regression where dustmite segfaults/aborts with any -jN (higher than 0). This has been happening since the 2.095.0 release, so one of the commits referenced here introduced the issue in dustmite.

@CyberShadow
Member Author

It might have been

9eb4126 dustmite: Move lookahead saving and process creation into worker thread

or another commit part of the multiprocessing overhaul.

@ibuclaw
Member

ibuclaw commented Jul 14, 2021

> or another commit part of the multiprocessing overhaul.

Yeah, overhaul seems about right. Git bisect says (with the test dustmite -j1)

d785720...02f8b2e: OK
2ca0522...f197986: core.exception.RangeError@dustmite.d(1926): Range violation
690ab07...a269d25: OK
9eb4126...b8f2844: core.exception.OutOfMemoryError@src/core/exception.d(647): Memory allocation failed
9f5a4f1...cb0855d: Segmentation fault (core dumped)

Edit: with dustmite -j8

d785720...02f8b2e: OK
2ca0522...f197986: core.exception.RangeError@dustmite.d(1926): Range violation
690ab07...0dc5e04: Deadlock
8b5f639...a269d25: OK
9eb4126...b8f2844: core.exception.OutOfMemoryError@src/core/exception.d(647): Memory allocation failed
9f5a4f1...cb0855d: Segmentation fault (core dumped)
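
Results like these mix clean exits, uncaught exceptions, a deadlock, and a segfault, which is awkward for `git bisect run`: it treats exit code 0 as good, 1 to 127 (except 125, which means skip) as bad, and aborts on anything else, such as the 139 a shell reports for a segfault. The following is a sketch of a wrapper that normalizes those outcomes, not necessarily how the bisection above was performed. It assumes the `timeout` utility from GNU coreutils is available to catch hangs; the function names are hypothetical.

```shell
#!/bin/sh
# Map a test command's exit status to a code that `git bisect run` accepts:
# 0 means "good"; every other outcome (exception, timeout, crash) becomes 1, "bad".
bisect_verdict() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Run a command under a time limit (assumes GNU coreutils `timeout`),
# so that a deadlock counts as a failure instead of stalling the bisection.
run_bisect_step() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    return "$(bisect_verdict "$status")"
}
```

A bisect script could end with a call such as `run_bisect_step 60 ./run-test.sh`, where `./run-test.sh` is a hypothetical script that builds dustmite and runs it with the desired `-jN`, and then be handed to `git bisect run`.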
