-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#13049 Refactored RecordTransformer & merged RecordEnricher #13086
Conversation
deepthi912
commented
May 6, 2024
•
edited
Loading
edited
- Removed RecordEnricher, RecordEnricherPipeline, RecordEnricherValidationConfig(directly using the boolean instead of having a class for this), RecordEnricherFactory and moved the methods into RecordTransformer.
- Moved some of the methods from RecordEnricherRegistry into RecordTransformer except static block ---> This will help in recording the details of type of ericher into a map while loading.
- Removed CLPEncodingEnricherFactory and CustomFunctionEnricherFactory classes and merged into respective EncodingEricher classes
…moved RecordEnricher as it serves similar purpose as RecordTransformer.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13086 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.40%
+ Complexity 207 198 -9
============================================
Files 2436 2530 +94
Lines 133233 139028 +5795
Branches 20636 21544 +908
============================================
+ Hits 82274 86420 +4146
- Misses 44911 46136 +1225
- Partials 6048 6472 +424
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to delete the record enricher related interfaces and general handling classes (e.g. factory, pipeline, registry) after merging it into record transformer package. Can you take a look and see if it is possible to further clean up the code?
Sure, I wanted to do that too. Will look and see where I can improve further |
@Jackie-Jiang I noticed that registery, pipeline have static methods which are used by some of the classes. These static methods are calling RecordEnricherFactory which is again implemented by function and clp Enricher classes. I tried to check if I could squeeze in anywhere but no right spot. Not sure if by just removing RecordEnricher, we can make any big difference in refactoring though. |
Basically the idea is to re-do #12243 and integrate CLP transform as a |
…rom RecordEnricher package
@Jackie-Jiang PR is ready for review. |
@@ -78,7 +77,7 @@ public class FlinkSegmentWriter implements SegmentWriter { | |||
private String _outputDirURI; | |||
private Schema _schema; | |||
private Set<String> _fieldsToRead; | |||
private RecordEnricherPipeline _recordEnricherPipeline; | |||
private RecordTransformer _recordEnricherPipeline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After changing RecordEnricher
into RecordTransformer
, we should be able to put the RecordEnricher
in front of the existing RecordTransformer
s as part of the default CompositeTransformer
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.ServiceLoader; | ||
import org.apache.pinot.spi.config.table.ingestion.EnrichmentConfig; | ||
import org.apache.pinot.segment.local.recordtransformer.RecordTransformer; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
|
||
public class RecordEnricherRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to remove this registry and follow the existing way of creating RecordTransformer
. Same for other places where 2 pipelines exist
@@ -275,7 +275,7 @@ public void deleteSegmentFile() { | |||
private final int _partitionGroupId; | |||
private final PartitionGroupConsumptionStatus _partitionGroupConsumptionStatus; | |||
final String _clientId; | |||
private final RecordEnricherPipeline _recordEnricherPipeline; | |||
private final RecordTransformer _recordEnricherPipeline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we should be able to integrate it into the TransformPipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will take a look into TransformPipeline. Didn't know this class existed. Will modify according to it.
|
||
|
||
/** | ||
* The record transformer which takes a {@link GenericRow} and transform it based on some custom rules. | ||
*/ | ||
public interface RecordTransformer extends Serializable { | ||
final boolean GROOVY_DISABLED = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest keeping the interface simple. We need to add getInputColumns()
from RecordEnricher
, other method seems unnecessary. Take a look at CompositeTransformer
for record transformer handling
@@ -113,6 +112,29 @@ public class SegmentIndexCreationDriverImpl implements SegmentIndexCreationDrive | |||
private long _totalStatsCollectorTime = 0; | |||
private boolean _continueOnError; | |||
|
|||
public static void persistCreationMeta(File indexDir, long crc, long creationTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are not related to this PR. I guess you might enabled Rearrange code
option in IDE during auto reformat. Can you revert these unrelated changes? There are quite some unrelated auto reformat in several files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I will look into this formatting issue.
Merge #14601 instead |