Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#13049 Refactored RecordTransformer & merged RecordEnricher #13086

Closed
wants to merge 14 commits into from

Conversation

deepthi912
Copy link
Collaborator

@deepthi912 deepthi912 commented May 6, 2024

  1. Removed RecordEnricher, RecordEnricherPipeline, RecordEnricherValidationConfig(directly using the boolean instead of having a class for this), RecordEnricherFactory and moved the methods into RecordTransformer.
  2. Moved some of the methods from RecordEnricherRegistry into RecordTransformer except static block ---> This will help in recording the details of type of ericher into a map while loading.
  3. Removed CLPEncodingEnricherFactory and CustomFunctionEnricherFactory classes and merged into respective EncodingEricher classes

@codecov-commenter
Copy link

codecov-commenter commented May 6, 2024

Codecov Report

Attention: Patch coverage is 40.15152% with 79 lines in your changes are missing coverage. Please review.

Project coverage is 62.16%. Comparing base (59551e4) to head (437c840).
Report is 521 commits behind head on master.

Files Patch % Lines
...ent/local/recordtransformer/RecordTransformer.java 24.39% 29 Missing and 2 partials ⚠️
...cord/enricher/function/CustomFunctionEnricher.java 0.00% 13 Missing ⚠️
...ache/pinot/segment/local/utils/IngestionUtils.java 33.33% 11 Missing and 1 partial ⚠️
...lugin/record/enricher/clp/CLPEncodingEnricher.java 0.00% 8 Missing ⚠️
...he/pinot/common/utils/config/TableConfigUtils.java 45.45% 4 Missing and 2 partials ⚠️
...he/pinot/segment/local/utils/TableConfigUtils.java 69.23% 1 Missing and 3 partials ⚠️
...l/realtime/converter/RealtimeSegmentConverter.java 77.77% 2 Missing ⚠️
...plugin/record/enricher/RecordEnricherRegistry.java 66.66% 0 Missing and 1 partial ⚠️
...creator/RecordReaderSegmentCreationDataSource.java 50.00% 0 Missing and 1 partial ⚠️
...t/creator/impl/SegmentIndexCreationDriverImpl.java 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13086      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.40%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2530      +94     
  Lines        133233   139028    +5795     
  Branches      20636    21544     +908     
============================================
+ Hits          82274    86420    +4146     
- Misses        44911    46136    +1225     
- Partials       6048     6472     +424     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.13% <40.15%> (+0.42%) ⬆️
java-21 62.04% <40.15%> (+0.41%) ⬆️
skip-bytebuffers-false 62.15% <40.15%> (+0.40%) ⬆️
skip-bytebuffers-true 62.03% <40.15%> (+34.30%) ⬆️
temurin 62.16% <40.15%> (+0.40%) ⬆️
unittests 62.15% <40.15%> (+0.40%) ⬆️
unittests1 46.67% <6.87%> (-0.23%) ⬇️
unittests2 27.82% <33.33%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to delete the record enricher related interfaces and general handling classes (e.g. factory, pipeline, registry) after merging it into record transformer package. Can you take a look and see if it is possible to further clean up the code?

@deepthi912
Copy link
Collaborator Author

Sure, I wanted to do that too. Will look and see where I can improve further

@deepthi912
Copy link
Collaborator Author

deepthi912 commented May 11, 2024

@Jackie-Jiang I noticed that registery, pipeline have static methods which are used by some of the classes. These static methods are calling RecordEnricherFactory which is again implemented by function and clp Enricher classes. I tried to check if I could squeeze in anywhere but no right spot. Not sure if by just removing RecordEnricher, we can make any big difference in refactoring though.

@Jackie-Jiang
Copy link
Contributor

Basically the idea is to re-do #12243 and integrate CLP transform as a RecordTransformer

@deepthi-912
Copy link

@Jackie-Jiang PR is ready for review.

@@ -78,7 +77,7 @@ public class FlinkSegmentWriter implements SegmentWriter {
private String _outputDirURI;
private Schema _schema;
private Set<String> _fieldsToRead;
private RecordEnricherPipeline _recordEnricherPipeline;
private RecordTransformer _recordEnricherPipeline;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing RecordEnricher into RecordTransformer, we should be able to put the RecordEnricher in front of the existing RecordTransformers as part of the default CompositeTransformer

import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import org.apache.pinot.spi.config.table.ingestion.EnrichmentConfig;
import org.apache.pinot.segment.local.recordtransformer.RecordTransformer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class RecordEnricherRegistry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to remove this registry and follow the existing way of creating RecordTransformer. Same for other places where 2 pipelines exist

@@ -275,7 +275,7 @@ public void deleteSegmentFile() {
private final int _partitionGroupId;
private final PartitionGroupConsumptionStatus _partitionGroupConsumptionStatus;
final String _clientId;
private final RecordEnricherPipeline _recordEnricherPipeline;
private final RecordTransformer _recordEnricherPipeline;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we should be able to integrate it into the TransformPipeline

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I will take a look into TransformPipeline. Didn't know this class existed. Will modify according to it.



/**
* The record transformer which takes a {@link GenericRow} and transform it based on some custom rules.
*/
public interface RecordTransformer extends Serializable {
final boolean GROOVY_DISABLED = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest keeping the interface simple. We need to add getInputColumns() from RecordEnricher, other method seems unnecessary. Take a look at CompositeTransformer for record transformer handling

@@ -113,6 +112,29 @@ public class SegmentIndexCreationDriverImpl implements SegmentIndexCreationDrive
private long _totalStatsCollectorTime = 0;
private boolean _continueOnError;

public static void persistCreationMeta(File indexDir, long crc, long creationTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are not related to this PR. I guess you might enabled Rearrange code option in IDE during auto reformat. Can you revert these unrelated changes? There are quite some unrelated auto reformat in several files

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I will look into this formatting issue.

@Jackie-Jiang
Copy link
Contributor

Merge #14601 instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants