Conversation

@lw-lin
Contributor

@lw-lin lw-lin commented Feb 2, 2016

The current Log class is intended to allow swapping out logger back-ends, but SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, which can handle formatting to avoid the cost of building log messages that won't be used.

We should deprecate the org.apache.parquet.Log class and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
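The cost argument can be illustrated with a self-contained sketch. `MiniLogger` below is a hypothetical stand-in for `org.slf4j.Logger`, reduced to the one point being made: with the parameterized API, the message string is only built when the level is enabled, whereas the old `Log.debug("a" + b)` pattern concatenates eagerly.

```java
// Hypothetical stand-in for org.slf4j.Logger (not the real API).
// formatCalls counts how often a message string was actually built.
public class MiniLogger {
    final boolean debugEnabled;
    int formatCalls = 0;

    MiniLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Substitute each {} placeholder with the next argument, SLF4J-style.
    String format(String fmt, Object... args) {
        formatCalls++;
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = fmt.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(fmt, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(fmt.substring(from)).toString();
    }

    void debug(String fmt, Object... args) {
        if (!debugEnabled) {
            return; // the old concatenation pattern would already have paid the cost here
        }
        System.out.println(format(fmt, args));
    }

    public static void main(String[] args) {
        MiniLogger off = new MiniLogger(false);
        off.debug("read {} records", 1_000_000); // builds nothing
        System.out.println("messages built while disabled: " + off.formatCalls);
    }
}
```

When debug is disabled, the only cost of the call is boxing the arguments; no string is ever assembled.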

@liancheng
Contributor

@proflin The last build failure was probably caused by Travis network issue. You may push a minor commit to trigger Travis again.

@rdblue Is there any way to trigger Travis without pushing a commit?

@julienledem
Member

@liancheng I think the owner of the PR can re-trigger the build from the Travis UI (you need to be logged in).

@julienledem
Member

This seems fine.
Can we have a perf test run to make sure this has no impact on perf?
I would think not, but I'd rather be sure, since the old if (DEBUG) was optimized out at compile time.
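The compile-time optimization mentioned here comes from the old `Log` class exposing flags such as `public static final boolean DEBUG`. A `static final boolean` initialized with a constant is a compile-time constant, so javac removes the guarded block from the bytecode entirely. A minimal sketch (class and method names here are illustrative, not from Parquet):

```java
// Sketch of constant folding on a compile-time-constant debug flag.
public class ConstantFolding {
    private static final boolean DEBUG = false; // compile-time constant

    static int guarded() {
        int work = 0;
        if (DEBUG) {
            // javac drops this entire block: it is statically unreachable,
            // so the guard costs nothing at runtime.
            work = expensiveDebugPrep();
        }
        return work;
    }

    static int expensiveDebugPrep() {
        return 42; // hypothetical costly message preparation
    }

    public static void main(String[] args) {
        System.out.println(guarded()); // prints 0; the branch was compiled away
    }
}
```

The flip side of this zero-cost guard is that debug logging can never be enabled without recompiling, which is part of the motivation for moving to SLF4J's runtime level checks.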

@lw-lin lw-lin closed this Mar 8, 2016
@lw-lin lw-lin reopened this Mar 8, 2016
lw-lin added 5 commits March 11, 2016 16:50
…T-401--Deprecate-Log-and-move-to-SLF4J-Logger

# Conflicts:
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
@lw-lin lw-lin changed the title Parquet-401: Deprecate Log and move to SLF4J Logger PARQUET-401: Deprecate Log and move to SLF4J Logger Mar 11, 2016
@lw-lin
Contributor Author

lw-lin commented Mar 31, 2016

hi @rdblue, finally this is ready for another (hopefully the last) round of review.

Changes since the first round:

  • many if (LOGGER.isXXXEnabled()) have been merged together;
  • some unnecessary if (LOGGER.isXXXEnabled()) have been removed;
  • the static final constants optimization has been applied to some more classes where necessary.
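The first bullet above can be sketched with a self-contained example. `isDebugEnabled()` here is a hypothetical stand-in that counts how often the level check runs, standing in for the logger's real check:

```java
// Sketch of merging two adjacent if (LOGGER.isDebugEnabled()) guards.
public class GuardMerging {
    static int levelChecks = 0;

    static boolean isDebugEnabled() {
        levelChecks++;       // count every level check
        return false;
    }

    static void separateGuards() {
        if (isDebugEnabled()) { /* ++indent; */ }
        if (isDebugEnabled()) { /* LOG.debug("..."); */ }
    }

    static void mergedGuard() {
        if (isDebugEnabled()) {
            // ++indent and the debug call now share a single level check
        }
    }

    public static void main(String[] args) {
        separateGuards();
        System.out.println("separate: " + levelChecks + " checks");
        levelChecks = 0;
        mergedGuard();
        System.out.println("merged:   " + levelChecks + " check");
    }
}
```

Merging the guards halves the number of level checks on the hot path without changing behavior.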

Also, things that are intentionally left out (let's fix them in follow-up PRs):

  • moving to the parameterized form of LOGGER.xxx(), i.e., replacing Log.debug("msg is " + msg + ", number is " + number) with LOGGER.debug("msg is {}, number is {}", msg, number);
  • fixing the SLF4J log level in the CI hadoop2 environment, i.e., the issue of an SLF4J binding leaking through Parquet's dependencies.

So, could you take another look at this? Thanks! :-)

…ate-Log-and-move-to-SLF4J-Logger

# Conflicts:
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
@lw-lin
Contributor Author

lw-lin commented Apr 19, 2016

two weeks' ping to some committer @rdblue :-)

@rdblue
Contributor

rdblue commented Apr 21, 2016

@lw-lin, thanks for working on this.

Looks like my previous comment about fixing the if (LOG.isDebugEnabled()) statements was misinterpreted. Sorry about that, I probably wasn't very clear.

You've replaced a lot of those calls with a static final constant at the top of the file. That's a bad thing because it means we can't turn on debug logging once the code is loaded. We should only do that when it matters for performance, which you said above was in MessageColumnIO, ColumnWriterV1, and ColumnWriterV2. The rest of the constants should be removed, and there shouldn't be unused constants like ERROR_ENABLED.

Most of the rest of the debug logging calls should not be wrapped by if (LOG.isDebugEnabled()) at all. Just call LOG.debug(...) and let the framework decide. The benefit of checking the log level is to avoid expensive operations to prepare a log message that is discarded at the current log level. Using the SLF4J formatting calls mostly avoids that.

The only time we should use LOG.isDebugEnabled() is when there's an expensive call that can't be handled inside the logger. An example is debugging the total memory used: we would inspect all of the write framework before calling into the logger, so guarding that with a level check is a good idea. But when you can simply pass arguments into the logger and the work is done there by calling toString on what you pass in, there's no need for the check.

I think getting the debug logging right is going to require going through the code and making sure each call makes sense, rather than transforming certain patterns with an IDE.
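The rule described above can be sketched as follows. The `Logger` interface mirrors the two `org.slf4j.Logger` methods used; `totalAllocatedMemory()` is a hypothetical expensive inspection of the write path, not a real Parquet method:

```java
// Sketch: when to guard with isDebugEnabled() and when not to.
public class GuardExamples {
    // Minimal stand-in for the relevant slice of org.slf4j.Logger.
    interface Logger {
        boolean isDebugEnabled();
        void debug(String format, Object... args);
    }

    static int memoryInspections = 0;

    static long totalAllocatedMemory() {
        memoryInspections++; // imagine this walks every column writer
        return 64L * 1024 * 1024;
    }

    static void logProgress(Logger log, int recordCount) {
        // Cheap arguments: no guard; the framework checks the level itself
        // and the only cost of a disabled call is boxing the argument.
        log.debug("wrote {} records", recordCount);

        // Expensive preparation: worth an explicit level check, so the
        // inspection never runs when DEBUG is off.
        if (log.isDebugEnabled()) {
            log.debug("allocated: {} bytes", totalAllocatedMemory());
        }
    }

    public static void main(String[] args) {
        Logger disabled = new Logger() {
            public boolean isDebugEnabled() { return false; }
            public void debug(String format, Object... args) { }
        };
        logProgress(disabled, 1000);
        System.out.println("expensive inspections: " + memoryInspections);
    }
}
```

With debug disabled, the unguarded call is essentially free while the guarded expensive inspection never runs at all.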

if (DEBUG_ENABLED) {
  ++indent;
}
if (DEBUG_ENABLED) {
Contributor


@lw-lin, it looks like there are still some if statements that can be combined. Please make sure you check through the code before the next round of review for these.

@julienledem
Member

@rdblue @lw-lin what's next for this PR?

@lw-lin
Contributor Author

lw-lin commented Aug 16, 2016

@rdblue @julienledem sorry for the late response -- oh I somehow missed @rdblue 's kind comments.

I'll update this within this week. Thanks @rdblue @julienledem !

@julienledem
Member

thank you @lw-lin !

lw-lin added 3 commits August 24, 2016 12:09
# Conflicts:
#	parquet-column/src/main/java/org/apache/parquet/column/values/boundedint/BitWriter.java
#	parquet-column/src/main/java/org/apache/parquet/column/values/boundedint/BoundedIntValuesReader.java
#	parquet-column/src/main/java/org/apache/parquet/column/values/boundedint/BoundedIntValuesWriter.java
#	parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBitPacking.java
#	parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
Fix 2
@lw-lin
Contributor Author

lw-lin commented Sep 6, 2016

This has been updated a lot -- @julienledem @rdblue would you take a look at your convenience?

@lw-lin
Contributor Author

lw-lin commented Sep 6, 2016

Plus, five rounds of PerfTest show that, compared to the Log.java approach, the slf4j-simple approach:

  • on average takes 0.87% longer in writing 1,000,000 records
  • on average takes 1.31% longer in reading 1,000,000 records

For detailed results please refer to https://docs.google.com/spreadsheets/d/1FLwD71WFmkEfqDWyo2pe1vfkk7BI6bJKMOnAzGCl5XI/edit#gid=1865972057

import org.junit.Assert;
import org.junit.Test;

public class TestLog {
Contributor Author


This class is no longer necessary.

@julienledem
Member

Thank you @lw-lin for this work. Sorry this particular PR didn't go through.
Patches that touch a lot of files are painful to maintain.
PR #369 implemented a similar change and has now been merged.
Thank you for your contribution @lw-lin !

@lw-lin
Contributor Author

lw-lin commented Oct 27, 2016

Thank you @julienledem . I like that PR too. Thank you also @rdblue @liancheng for the efforts you've put into this!

@lw-lin lw-lin closed this Oct 27, 2016