Change books to book in Hadoop InputFormat example #159

mattroberts297 · 2016-08-24T10:40:17Z

The example books.xml file does not contain a books element. It does
contain a catalog element and many book elements. Previously,
records.count() would return 0. Now, it would return 12.
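
To make the 0-versus-12 difference concrete, here is a minimal, Spark-free sketch of tag-delimited record counting. It is not the `XmlInputFormat` implementation: `TagCount`, `countRecords` and `sample` are hypothetical names, the sample is a two-book miniature of books.xml rather than the real file, and attributes on the start tag are deliberately not handled.

```scala
// Illustrative only: counts records delimited by a start tag and an end tag.
// Simplification (assumption): the start tag must match literally, so
// attributes such as <book id="..."> are not supported by this sketch.
object TagCount {
  def countRecords(xml: String, startTag: String, endTag: String): Int = {
    @annotation.tailrec
    def loop(from: Int, acc: Int): Int = {
      val start = xml.indexOf(startTag, from)
      if (start < 0) acc // no further start tag: done
      else {
        val end = xml.indexOf(endTag, start + startTag.length)
        if (end < 0) acc // unterminated record: stop counting
        else loop(end + endTag.length, acc + 1)
      }
    }
    loop(0, 0)
  }

  // Hypothetical miniature of books.xml: a catalog root holding two book rows.
  val sample: String =
    """<catalog>
      |  <book><title>A</title></book>
      |  <book><title>B</title></book>
      |</catalog>""".stripMargin
}
```

With this sample, `TagCount.countRecords(TagCount.sample, "<books>", "</books>")` finds nothing because no `books` element exists, while the `<book>` / `</book>` pair matches each row.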

HyukjinKwon · 2016-08-24T10:46:29Z

Nice catch! Could you please check if there are similar instances in the documentation?

codecov-io · 2016-08-24T10:59:05Z

Current coverage is 90.19% (diff: 100%)

Merging #159 into master will not change coverage

@@             master       #159   diff @@
==========================================
  Files            14         14          
  Lines           663        663          
  Methods         605        605          
  Messages          0          0          
  Branches         58         58          
==========================================
  Hits            598        598          
  Misses           65         65          
  Partials          0          0

Powered by Codecov. Last update 0c74b41...08c2beb

mattroberts297 · 2016-08-24T15:02:41Z

Ah yes, you're correct: there are a few other usages of <books> and </books> under the Features heading, but I think they should be changed to <catalog> and </catalog> respectively. I will update the PR.

mattroberts297 · 2016-08-24T15:23:49Z

Having looked at the diff of my update prior to committing, I think the usages of <books> and </books> under Features are currently correct, because the examples are given inline and appear before the example books.xml file is linked:

-* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<books> <book><book> ...</books>`, the appropriate value would be `book`. Default is `ROW`.
+* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<catalog> <book><book> ...</catalog>`, the appropriate value would be `book`. Default is `ROW`.

That said, they'd still be correct if I did update them, and they would then be consistent with books.xml. Completely up to you.

Also, in the API examples where the file is written out, the root element is changed from catalog to books. I think it's nice to demonstrate that this can be done, so it might be best to leave those as books. Unless your original intention was to keep the root element the same, of course!
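
For readers following along, this is a rough sketch of the read-then-write flow described above, where the rows stay `book` but the root element becomes `books` on output. It assumes spark-xml is on the classpath and that its `rowTag` and `rootTag` options behave as the README describes; `RootTagSketch`, `readBooks` and `writeBooks` are hypothetical names, not part of the library.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Illustrative only: helper names are hypothetical.
object RootTagSketch {
  // Treat each <book> element as one row, whatever the root element is called.
  def readBooks(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load(path)

  // Write the rows back out under a <books> root instead of <catalog>.
  def writeBooks(df: DataFrame, outputPath: String): Unit =
    df.write
      .format("com.databricks.spark.xml")
      .option("rootTag", "books")
      .option("rowTag", "book")
      .save(outputPath)
}
```

Keeping `rowTag` the same in both directions while changing only `rootTag` is what lets the written file differ from books.xml at the root without affecting the rows.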

HyukjinKwon · 2016-08-25T01:07:46Z

Yup, thanks. Let's just merge this one. Merging to master.

HyukjinKwon · 2016-08-25T01:10:15Z

README.md

+sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
 sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")

 val records = context.newAPIHadoopFile(


Oh, wait, would you mind changing `val records = context.newAPIHadoopFile(` to `val records = sc.newAPIHadoopFile(`? It'd be great if the typos in this example were fixed together!

Ah yes, great spot. Will do. There's a shorthand for this in Spark 1.6+, but I think it's best to just document that on this PR because I'm not sure about backwards compatibility:

sc.newAPIHadoopFile[LongWritable, Text, XmlInputFormat](path)

Previously, if a developer copied and pasted the solution, they would have had to rename context to sc (or vice versa) or create two variables. Now it "just works", provided their SparkContext is named sc.
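
Putting both fixes together, the corrected example might look like the sketch below. It assumes spark-xml is on the classpath and that `XmlInputFormat` lives in `com.databricks.spark.xml`; the `BookRecords` object and its method names are hypothetical wrappers added for illustration. The calls inside them are the ones quoted in this thread.

```scala
import com.databricks.spark.xml.XmlInputFormat
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Illustrative only: object and method names are hypothetical.
object BookRecords {
  // Use <book>...</book> as the record delimiters, matching books.xml.
  private def configure(sc: SparkContext): Unit = {
    sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
    sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
    sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")
  }

  // Long form, as in the README example after this PR.
  def load(sc: SparkContext, path: String): RDD[(LongWritable, Text)] = {
    configure(sc)
    sc.newAPIHadoopFile(
      path,
      classOf[XmlInputFormat],
      classOf[LongWritable],
      classOf[Text])
  }

  // Shorthand form mentioned above; it reads the same Hadoop configuration.
  def loadShort(sc: SparkContext, path: String): RDD[(LongWritable, Text)] = {
    configure(sc)
    sc.newAPIHadoopFile[LongWritable, Text, XmlInputFormat](path)
  }
}
```

In a Spark shell, `BookRecords.load(sc, "books.xml").count()` is the call that the PR description reports as returning 12 for the example file.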

HyukjinKwon · 2016-08-25T08:03:41Z

Yeap, I am going to merge this. Thanks.

This PR prepares the release for 0.3.4. It will include the changes below:

- Produces correct order of columns for nested rows when user specifies a schema (527b976)
- No value in nested struct causes arrayIndexOutOfBounds (19eb277)
- `compression` alias for `codec` option, #145
- Remove dead code, #144
- Fix nested element with name of parent bug, #161
- Minor documentation changes, #159 and #143
- Ignore comments even when they are surrounded by white space, #166

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #146 from HyukjinKwon/version-0.3.4.

Change books to book in Hadoop InputFormat example

34e60a9

HyukjinKwon reviewed Aug 25, 2016
Change context to sc in Hadoop InputFormat example

08c2beb

HyukjinKwon closed this in a50948e Aug 25, 2016

HyukjinKwon mentioned this pull request Aug 31, 2016

Changes for 0.3.4 release #146

Closed
