Skip to content

Conversation

@nielsbasjes
Copy link
Contributor

And make writing files less noisy

@nielsbasjes
Copy link
Contributor Author

I have no clue how to fix this error and what it has to do with my changes:

Current codebase is incompatible with version <1.7.0>. Version should be at least <2.0.0>.

Please advise.

@nielsbasjes
Copy link
Contributor Author

What do you want?

  1. Retain the Log class or
  2. Remove it and add it to the exclude list of "org.semver.enforcer.RequireSemanticVersioningConformance"?

@nielsbasjes
Copy link
Contributor Author

I already created PARQUET-725 for those failing AVRO tests.
At this point I do not yet see what I did to break this.

@nielsbasjes
Copy link
Contributor Author

The root cause is AVRO-1799 which surfaced after changing the logging setup.
For now I pushed a default log4j.properties that is set to INFO logging.
This passed the build. Set if to DEBUG and the build will fail because somewhere the 'toString' is called and as a consequence the records become different.

@nielsbasjes
Copy link
Contributor Author

@julienledem : Can you comment on this please? Thanks.

@julienledem
Copy link
Member

Hi @nielsbasjes
This looks fine to me.
Did you see #319 ?
One thing to check is whether perf is affected since the if (DEBUG) statements where totally removed by the compiler.

@nielsbasjes
Copy link
Contributor Author

I only saw #319 after I implemented most of the changes I did. Then I noticed that #319 appears to be 'stuck' so I went ahead with putting up this set of changes.

I did some performance testing and found that writing is slower (i'll look into that) and reading is the same as before.

@nielsbasjes
Copy link
Contributor Author

I did some performance tests to see what the differences are when focusing on MessageColumnIO which seems to be a bottleneck in the logging situation.
I used the PerfTest and changed it to only do a write of 1M records, 5 times in a row.

Just about each method in there has 2 calls to logging functions:

log("< MESSAGE END >");
printState();

The original code does this:

written 1,000,000 recs in 1,553 ms at 643,915 rec/s
written 1,000,000 recs in 1,154 ms at 866,551 rec/s
written 1,000,000 recs in 1,248 ms at 801,282 rec/s
written 1,000,000 recs in 1,235 ms at 809,716 rec/s
written 1,000,000 recs in 1,270 ms at 787,401 rec/s

Simply doing a if (LOG.isDebugEnabled()) in those methods shows a write performance of

written 1,000,000 recs in 2,466 ms at 405,515 rec/s
written 1,000,000 recs in 1,963 ms at 509,424 rec/s
written 1,000,000 recs in 2,636 ms at 379,362 rec/s
written 1,000,000 recs in 2,629 ms at 380,372 rec/s
written 1,000,000 recs in 2,019 ms at 495,294 rec/s

Adding a static final boolean DEBUG = LOG.isDebugEnabled() and then if(DEBUG) yields

written 1,000,000 recs in 1,769 ms at 565,291 rec/s
written 1,000,000 recs in 1,300 ms at 769,230 rec/s
written 1,000,000 recs in 1,365 ms at 732,600 rec/s
written 1,000,000 recs in 1,364 ms at 733,137 rec/s
written 1,000,000 recs in 1,339 ms at 746,825 rec/s

Same but now static final boolean DEBUG = false yields

written 1,000,000 recs in 1,769 ms at 565,291 rec/s
written 1,000,000 recs in 1,313 ms at 761,614 rec/s
written 1,000,000 recs in 1,289 ms at 775,795 rec/s
written 1,000,000 recs in 1,282 ms at 780,031 rec/s
written 1,000,000 recs in 1,253 ms at 798,084 rec/s

Moving the if (DEBUG) to the calling methods improves it even further

static final boolean DEBUG = LOG.isDebugEnabled()
if (DEBUG) log("< MESSAGE END >");
if (DEBUG) printState();

written 1,000,000 recs in 1,552 ms at 644,329 rec/s
written 1,000,000 recs in 1,237 ms at 808,407 rec/s
written 1,000,000 recs in 1,268 ms at 788,643 rec/s
written 1,000,000 recs in 1,260 ms at 793,650 rec/s
written 1,000,000 recs in 1,231 ms at 812,347 rec/s

static final boolean DEBUG = false
if (DEBUG) log("< MESSAGE END >");
if (DEBUG) printState();

written 1,000,000 recs in 1,653 ms at 604,960 rec/s
written 1,000,000 recs in 1,271 ms at 786,782 rec/s
written 1,000,000 recs in 1,270 ms at 787,401 rec/s
written 1,000,000 recs in 1,255 ms at 796,812 rec/s
written 1,000,000 recs in 1,231 ms at 812,347 rec/s

My conclusions:
Apparently there is no performance difference between these two

static final boolean DEBUG = false
static final boolean DEBUG = LOG.isDebugEnabled()

Apparently the if (DEBUG) outside the methods makes a big difference. I expected the JIT to inline it and optimize it out, but it looks like it doesn't.

So this is what I implemented now:

static final boolean DEBUG = LOG.isDebugEnabled()
if (DEBUG) log("< MESSAGE END >");
if (DEBUG) printState();

As you can see above this makes the performance pretty close to the original performance.

@nielsbasjes
Copy link
Contributor Author

The rebase/pull/merge seems to indicate a few files as 'changed by me' that I haven't changed.

@julienledem
Copy link
Member

In general this looks good to me.

When you do if (DEBUG) and DEBUG is constant of the literal false (not a runtime evaluated expression), the compiler (javac not JIT) will completely remove the if statement.
That is probably the source of the few remaining percents. I think the JIT treats final variable as not really constants and does not optimize based on them.

In some classes (ColumnWriter) we could leave a local private static final DEBUG = false to avoid debug logging in the hot path (logged for each value written)

@nielsbasjes
Copy link
Contributor Author

@julienledem : Please check the latest commits as they should implement the changes you mentioned.

@julienledem
Copy link
Member

Hi @nielsbasjes,
can you resolve the 2 conflicting files? I'll merge ASAP.

@nielsbasjes
Copy link
Contributor Author

@julienledem Done.

@asfgit asfgit closed this in df9d8e4 Oct 26, 2016
@julienledem
Copy link
Member

@nielsbasjes thanks for your patience. Merged!

@rdblue
Copy link
Contributor

rdblue commented Oct 26, 2016

Thanks @nielsbasjes! It's great to have this done!

rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
And make writing files less noisy

Author: Niels Basjes <nbasjes@bol.com>

Closes apache#369 from nielsbasjes/PARQUET-423-2 and squashes the following commits:

b31e30f [Niels Basjes] Merge branch 'master' of github.com:apache/parquet-mr into PARQUET-423-2
2d4db4b [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
49fcaa7 [Niels Basjes] PARQUET-423: Remove debug logging statements in high performance sections during build time
aaaf4a6 [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
745666e [Niels Basjes] Undo needless change
94e0c7a [Niels Basjes] PARQUET-423: Further optimize logging performance
b72f924 [Niels Basjes] PARQUET-423: Improved the performance
cb7eb61 [Niels Basjes] PARQUET-423: Workaround AVRO errors
7d161b3 [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
05d6a47 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
692ebfb [Niels Basjes] Undo needless change
f1ede3d [Niels Basjes] PARQUET-423: Further optimize logging performance
a0c6b59 [Niels Basjes] PARQUET-423: Improved the performance
67bef9b [Niels Basjes] PARQUET-423: Workaround AVRO errors
87cd64f [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
96d97d5 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
And make writing files less noisy

Author: Niels Basjes <nbasjes@bol.com>

Closes apache#369 from nielsbasjes/PARQUET-423-2 and squashes the following commits:

b31e30f [Niels Basjes] Merge branch 'master' of github.com:apache/parquet-mr into PARQUET-423-2
2d4db4b [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
49fcaa7 [Niels Basjes] PARQUET-423: Remove debug logging statements in high performance sections during build time
aaaf4a6 [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
745666e [Niels Basjes] Undo needless change
94e0c7a [Niels Basjes] PARQUET-423: Further optimize logging performance
b72f924 [Niels Basjes] PARQUET-423: Improved the performance
cb7eb61 [Niels Basjes] PARQUET-423: Workaround AVRO errors
7d161b3 [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
05d6a47 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
692ebfb [Niels Basjes] Undo needless change
f1ede3d [Niels Basjes] PARQUET-423: Further optimize logging performance
a0c6b59 [Niels Basjes] PARQUET-423: Improved the performance
67bef9b [Niels Basjes] PARQUET-423: Workaround AVRO errors
87cd64f [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
96d97d5 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017
And make writing files less noisy

Author: Niels Basjes <nbasjes@bol.com>

Closes apache#369 from nielsbasjes/PARQUET-423-2 and squashes the following commits:

b31e30f [Niels Basjes] Merge branch 'master' of github.com:apache/parquet-mr into PARQUET-423-2
2d4db4b [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
49fcaa7 [Niels Basjes] PARQUET-423: Remove debug logging statements in high performance sections during build time
aaaf4a6 [Niels Basjes] Merge branch 'PARQUET-423-2' of github.com:nielsbasjes/parquet-mr into PARQUET-423-2
745666e [Niels Basjes] Undo needless change
94e0c7a [Niels Basjes] PARQUET-423: Further optimize logging performance
b72f924 [Niels Basjes] PARQUET-423: Improved the performance
cb7eb61 [Niels Basjes] PARQUET-423: Workaround AVRO errors
7d161b3 [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
05d6a47 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
692ebfb [Niels Basjes] Undo needless change
f1ede3d [Niels Basjes] PARQUET-423: Further optimize logging performance
a0c6b59 [Niels Basjes] PARQUET-423: Improved the performance
67bef9b [Niels Basjes] PARQUET-423: Workaround AVRO errors
87cd64f [Niels Basjes] PARQUET-423: Restore the old (obsolete) Log class
96d97d5 [Niels Basjes] PARQUET-423: Replace old Log class with SLF4J Logging
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants