Change books to book in Hadoop InputFormat example #159
The example books.xml file does not contain a `books` element. It does contain a `catalog` element and many `book` elements. Previously, `records.count()` would return 0; now it returns 12.
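Why the tag matters can be shown with a deliberately simplified, hypothetical model of a start/end-tag record reader. It scans a string for each start tag and emits the text up to the next end tag as one record. This is not spark-xml's actual `XmlInputFormat`, which reads byte streams across input splits. The `TagSplitter` name and the sample XML are made up for illustration.

```scala
object TagSplitter {
  // Hypothetical sample shaped like books.xml: a <catalog> root with <book> children
  // and no <books> element anywhere.
  val sample: String =
    "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>"

  /** Returns each substring that starts with `startTag` and ends at the next `endTag`. */
  def records(xml: String, startTag: String, endTag: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    var from = xml.indexOf(startTag)
    while (from >= 0) {
      val end = xml.indexOf(endTag, from + startTag.length)
      if (end < 0) {
        from = -1 // no closing tag after this start tag: stop scanning
      } else {
        val stop = end + endTag.length
        out += xml.substring(from, stop)
        from = xml.indexOf(startTag, stop) // look for the next record after this one
      }
    }
    out.toList
  }
}
```

With `"<books>"`/`"</books>"` nothing matches, so the result is empty, mirroring the `records.count()` of 0 described in the PR. Switching to `"<book>"`/`"</book>"` yields one record per `book` element (two for this sample).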
Nice catch! Could you please check if there are similar instances in the documentation?
Current coverage is 90.19% (diff: 100%)

    @@            master   #159   diff @@
    ==========================================
      Files             14     14
      Lines            663    663
      Methods          605    605
      Messages           0      0
      Branches          58     58
    ==========================================
      Hits             598    598
      Misses            65     65
      Partials           0      0
Ah yes. You're correct, there are a few other usages of
Having looked at the diff of my update prior to committing, the usages of

    -* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<books> <book><book> ...</books>`, the appropriate value would be `book`. Default is `ROW`.
    +* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<catalog> <book><book> ...</catalog>`, the appropriate value would be `book`. Default is `ROW`.

That said, they'd still be correct if I did update them, and they'd be consistent with books.xml. Completely up to you. Also, in the API examples where the file is written out, the root element is changed from
Yup, thanks. Let's just merge this one. Thanks! Merging to master.
README.md
    sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
    sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")

    val records = context.newAPIHadoopFile(
Oh, wait, would you mind changing `val records = context.newAPIHadoopFile(` to `val records = sc.newAPIHadoopFile(`? It'd be great to fix these typos together in this example!
Ah yes, great spot. Will do. There's a shorthand for this in Spark 1.6+, but I think it's best to just document it on this PR because I'm not sure about backwards compatibility:

    sc.newAPIHadoopFile[LongWritable, Text, XmlInputFormat](path)

Previously, if a developer copied and pasted the solution, they would have had to rename `context` to `sc` (or vice versa) or create two variables. Now it "just works", provided their SparkContext is named `sc`.
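The shorthand `sc.newAPIHadoopFile[LongWritable, Text, XmlInputFormat](path)` avoids passing `classOf[...]` arguments because Spark's type-parameter overload takes implicit `ClassTag`s and recovers the `Class` objects from them. A stdlib-only sketch of that pattern follows. `ShorthandSketch`, `explicitForm`, and `shorthandForm` are hypothetical stand-ins that build a description string instead of an RDD, so the sketch runs without Spark or Hadoop on the classpath.

```scala
import scala.reflect.ClassTag

object ShorthandSketch {
  // Explicit form: the caller passes each Class object, in the same order as
  // sc.newAPIHadoopFile(path, classOf[F], classOf[K], classOf[V]).
  def explicitForm(path: String, fClass: Class[_], kClass: Class[_], vClass: Class[_]): String =
    s"$path: format=${fClass.getSimpleName}, key=${kClass.getSimpleName}, value=${vClass.getSimpleName}"

  // Shorthand form: only type parameters are given; the compiler supplies the
  // implicit ClassTags, and runtimeClass turns them back into Class objects.
  def shorthandForm[K, V, F](path: String)(implicit km: ClassTag[K], vm: ClassTag[V], fm: ClassTag[F]): String =
    explicitForm(path, fm.runtimeClass, km.runtimeClass, vm.runtimeClass)
}
```

For example, `ShorthandSketch.shorthandForm[java.lang.Long, String, java.lang.Integer]("books.xml")` produces the same string as the explicit call with `classOf[java.lang.Integer]`, `classOf[java.lang.Long]`, and `classOf[String]`; here those JDK types merely stand in for `XmlInputFormat`, `LongWritable`, and `Text`.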
Yeap, I am going to merge this. Thanks.
This PR prepares the release for 0.3.4. This will include the changes below:

- Produces correct order of columns for nested rows when user specifies a schema (527b976)
- No value in nested struct causes arrayIndexOutOfBounds (19eb277)
- `compression` alias for `codec` option, #145
- Remove dead code, #144
- Fix nested element with name of parent bug, #161
- Minor documentation changes, #159 and #143
- Ignore comments even when they are surrounded by white space, #166

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #146 from HyukjinKwon/version-0.3.4.