Description
This PR adds a new feature to `tsv-filter`: marking each record as either passing the filter test or not.

Consider the following command, which identifies lines where the `Color` field is a primary color.

The above filters out all records not satisfying the test. However, it is often desirable to keep all the records and instead mark them to indicate the matches. The following command does this, adding a new field, `IsPrimaryColor`, populated with the values `1` or `0` to indicate pass or fail.

The label values can be customized using the `--label-values` option. To change the above to use `true` and `false`, run:

Implementation
Adding the label field is straightforward. In the main loop, instead of choosing whether or not to output a line, an indicator is appended to it. However, the additional conditional tests in the loop caused a performance degradation. This was partly due to the recent addition of the `--count` option, which counts the number of records satisfying the criteria. The performance degradation was minor for wide files with long lines, but substantial for narrow files.

To regain performance, the code was templatized to reduce the number of tests in the main loop. In addition, some changes were made to `BufferedOutputRange` to streamline that code, which had also gained some additional checks as part of the recent `--line-buffered` support. Between the two changes all the original performance was regained, and possibly a bit more.
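
As a rough illustration of the idea described above, the sketch below selects the output mode once, before the loop starts, so that the per-record loop carries no mode tests. It is not the actual implementation: `tsv-filter` is written in D and uses compile-time templates, whereas this Python sketch only imitates the effect by dispatching to a specialized loop function. All names (`filter_lines`, `label_lines`, `count_lines`, `run`, `is_primary`) and the sample set of primary colors are hypothetical and not taken from the `tsv-filter` source.

```python
import io

# Hypothetical sample data: which color names count as "primary".
PRIMARY = {"red", "yellow", "blue"}


def is_primary(line):
    """Hypothetical test: the first tab-separated field is a primary color."""
    return line.split("\t")[0] in PRIMARY


def filter_lines(lines, test, out):
    """Filter mode: write only the records that pass the test."""
    for line in lines:
        if test(line):
            out.write(line + "\n")


def label_lines(lines, test, out, pass_value="1", fail_value="0"):
    """Label mode: write every record with a pass/fail indicator appended."""
    for line in lines:
        out.write(line + "\t" + (pass_value if test(line) else fail_value) + "\n")


def count_lines(lines, test, out):
    """Count mode: write only the number of records that pass the test."""
    n = sum(1 for line in lines if test(line))
    out.write(f"{n}\n")


def run(lines, test, out, mode="filter", **opts):
    """Choose the specialized loop once, so the loop body has no mode checks."""
    loops = {"filter": filter_lines, "label": label_lines, "count": count_lines}
    loops[mode](lines, test, out, **opts)
```

Calling `run(lines, is_primary, out, mode="label", pass_value="true", fail_value="false")` mirrors the effect of customizing the label values. The design point is that the mode decision is made once per run rather than once per record, which matters most for narrow files where per-record overhead dominates.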