PARQUET-2: Adding Type Persuasion for Primitive Types #3
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,7 @@ | |
|
|
||
| import static java.lang.String.format; | ||
| import static parquet.Log.DEBUG; | ||
| + import static parquet.hadoop.ParquetInputFormat.STRICT_TYPE_CHECKING; | ||
|
|
||
| class InternalParquetRecordReader<T> { | ||
| private static final Log LOG = Log.getLog(InternalParquetRecordReader.class); | ||
|
|
@@ -57,6 +58,7 @@ class InternalParquetRecordReader<T> { | |
| private ParquetFileReader reader; | ||
| private parquet.io.RecordReader<T> recordReader; | ||
| private UnboundRecordFilter recordFilter; | ||
| + private boolean strictTypeChecking; | ||
|
|
||
| private long totalTimeSpentReadingBytes; | ||
| private long totalTimeSpentProcessingRecords; | ||
|
|
@@ -106,7 +108,7 @@ private void checkRead() throws IOException { | |
| BenchmarkCounter.incrementTime(timeSpentReading); | ||
| LOG.info("block read in memory in " + timeSpentReading + " ms. row count = " + pages.getRowCount()); | ||
| if (Log.DEBUG) LOG.debug("initializing Record assembly with requested schema " + requestedSchema); | ||
| - MessageColumnIO columnIO = columnIOFactory.getColumnIO(requestedSchema, fileSchema); | ||
| + MessageColumnIO columnIO = columnIOFactory.getColumnIO(requestedSchema, fileSchema, strictTypeChecking); | ||
|
Member
The ParquetInputFormat has already read this setting from the conf.
Author
I looked into this one a bit and found that using the constructor results in
Currently, ParquetInputFormat and InternalParquetRecordReader read the
I'm not sure there's a clean way to do this.
-Dan
On Fri, Jun 20, 2014 at 9:26 PM, Julien Le Dem notifications@github.com
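For readers following the thread, here is a minimal sketch of the pattern the reviewer is pointing at: read the flag from the configuration once, then hand the resulting boolean to the reader instead of looking it up a second time. Everything in it is an assumption made for illustration. `Conf` is a simplified stand-in for Hadoop's `Configuration`, the key string is made up, and `Reader` and `createReader` are hypothetical names that do not appear in this pull request. Only the default of `true` mirrors the `getBoolean(STRICT_TYPE_CHECKING, true)` call in the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class StrictFlagSketch {
    // Hypothetical key; the real constant is ParquetInputFormat.STRICT_TYPE_CHECKING.
    static final String STRICT_TYPE_CHECKING = "example.strict.typing";

    /** Simplified stand-in for Hadoop's Configuration (an assumption, not the real class). */
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String raw = values.get(key);
            // An unset key falls back to the caller-supplied default.
            return raw == null ? defaultValue : Boolean.parseBoolean(raw);
        }
    }

    /** Hypothetical reader that receives the flag instead of re-reading the configuration. */
    static final class Reader {
        final boolean strictTypeChecking;

        Reader(boolean strictTypeChecking) {
            this.strictTypeChecking = strictTypeChecking;
        }
    }

    /** Reads the setting in exactly one place and passes the value on. */
    static Reader createReader(Conf conf) {
        boolean strict = conf.getBoolean(STRICT_TYPE_CHECKING, true);
        return new Reader(strict);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        System.out.println("unset: " + createReader(conf).strictTypeChecking);
        conf.set(STRICT_TYPE_CHECKING, "false");
        System.out.println("set to false: " + createReader(conf).strictTypeChecking);
    }
}
```

Whether the real classes can be wired this way is exactly what the author's reply questions, so treat this only as an illustration of the idea, not as a proposed change.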
|
||
| recordReader = columnIO.getRecordReader(pages, recordConverter, recordFilter); | ||
| startedAssemblingCurrentBlockAt = System.currentTimeMillis(); | ||
| totalCountLoadedSoFar += pages.getRowCount(); | ||
|
|
@@ -142,7 +144,7 @@ public void initialize(MessageType requestedSchema, MessageType fileSchema, | |
| this.recordConverter = readSupport.prepareForRead( | ||
| configuration, extraMetadata, fileSchema, | ||
| new ReadSupport.ReadContext(requestedSchema, readSupportMetadata)); | ||
|
|
||
| this.strictTypeChecking = configuration.getBoolean(STRICT_TYPE_CHECKING, true); | ||
| List<ColumnDescriptor> columns = requestedSchema.getColumns(); | ||
| reader = new ParquetFileReader(configuration, file, blocks, columns); | ||
| for (BlockMetaData block : blocks) { | ||
|
|
||
We should put the default implementation here:
I tried to put the default implementation in the abstract class, but the
maven enforcer plugin wouldn't allow me to do it. I assume removing the
abstract modifier is considered an interface change.
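As a rough illustration of the reviewer's suggestion, the sketch below shows an abstract base class that supplies a default (non-abstract) implementation, so existing subclasses keep compiling and only those that want different behaviour override it. All names here (`ColumnHandler`, `acceptsWidening`, `LegacyHandler`, `LenientHandler`) are hypothetical and do not come from the Parquet code base, and the sketch does not reproduce the compatibility check that the maven enforcer plugin applied.

```java
public class DefaultImplSketch {
    /** Hypothetical abstract base class with one defaulted and one abstract method. */
    static abstract class ColumnHandler {
        // Default implementation lives in the base class: subclasses written
        // before this method existed inherit it without any changes.
        boolean acceptsWidening() {
            return false;
        }

        abstract String name();
    }

    /** A subclass that is unaware of the new method and inherits the default. */
    static final class LegacyHandler extends ColumnHandler {
        @Override
        String name() {
            return "legacy";
        }
    }

    /** A subclass that opts in by overriding the default. */
    static final class LenientHandler extends ColumnHandler {
        @Override
        boolean acceptsWidening() {
            return true;
        }

        @Override
        String name() {
            return "lenient";
        }
    }

    public static void main(String[] args) {
        ColumnHandler[] handlers = { new LegacyHandler(), new LenientHandler() };
        for (ColumnHandler handler : handlers) {
            System.out.println(handler.name() + " accepts widening: " + handler.acceptsWidening());
        }
    }
}
```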
On Fri, Jun 20, 2014 at 9:21 PM, Julien Le Dem notifications@github.com
wrote: