This repository was archived by the owner on Mar 24, 2025. It is now read-only.


@mattroberts297
Contributor

The example books.xml file does not contain a books element. It does
contain a catalog element and many book elements. Previously,
records.count() would return 0. Now, it would return 12.
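Why does a `<books>` start tag produce no records, while `<book>` produces one record per book? The plain-Scala sketch below is a simplified, hypothetical model of tag-delimited record splitting. It is not XmlInputFormat's actual implementation. `RecordCountSketch`, `countRecords` and the two-book `sampleCatalog` are made up for illustration. The sketch also assumes that a start tag may carry attributes (e.g. `<book id="...">`).

```scala
import scala.annotation.tailrec

// Hypothetical, simplified model of splitting XML into records by start/end tag.
// Assumption: this only approximates XmlInputFormat; it is not its real code.
object RecordCountSketch {
  // Illustrative two-book excerpt shaped like books.xml (titles are made up).
  val sampleCatalog: String =
    """<?xml version="1.0"?>
      |<catalog>
      |  <book id="bk101"><title>First</title></book>
      |  <book id="bk102"><title>Second</title></book>
      |</catalog>""".stripMargin

  /** Counts records that open with `<tag` (followed by '>' or whitespace) and close with `</tag>`. */
  def countRecords(xml: String, tag: String): Int = {
    val open = "<" + tag
    val close = "</" + tag + ">"

    @tailrec
    def loop(from: Int, acc: Int): Int = {
      val start = xml.indexOf(open, from)
      if (start < 0) acc // no more start tags: done
      else {
        val next = start + open.length
        // Reject longer tag names such as `<books` when looking for `<book`.
        val isWholeTag = next < xml.length &&
          (xml.charAt(next) == '>' || xml.charAt(next).isWhitespace)
        if (!isWholeTag) loop(next, acc)
        else {
          val end = xml.indexOf(close, next)
          // An unterminated record is not counted.
          if (end < 0) acc else loop(end + close.length, acc + 1)
        }
      }
    }

    loop(0, 0)
  }
}
```

In this model, searching the sample for `books` finds no start tag, so the count is 0. Searching for `book` matches each book element. The boundary check keeps `<book` from matching a longer tag name such as `<books>`.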

@HyukjinKwon
Member

HyukjinKwon commented Aug 24, 2016

Nice catch! Could you please check if there are similar instances in the documentation?

@codecov-io

codecov-io commented Aug 24, 2016

Current coverage is 90.19% (diff: 100%)

Merging #159 into master will not change coverage

```diff
@@             master       #159   diff @@
==========================================
  Files            14         14          
  Lines           663        663          
  Methods         605        605          
  Messages          0          0          
  Branches         58         58          
==========================================
  Hits            598        598          
  Misses           65         65          
  Partials          0          0          
```

Powered by Codecov. Last update 0c74b41...08c2beb

@mattroberts297
Contributor Author

Ah yes. You're correct: there are a few other usages of `<books>` and `</books>` under the Features heading, and I think they should be changed to `<catalog>` and `</catalog>` respectively. I will update the PR.

@mattroberts297
Contributor Author

Having looked at the diff of my update before committing, I think the usages of `<books>` and `</books>` under Features are currently correct. The examples are given inline, before the example books.xml file is linked to:

```diff
-* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<books> <book><book> ...</books>`, the appropriate value would be `book`. Default is `ROW`.
+* `rowTag`: The row tag of your xml files to treat as a row. For example, in this xml `<catalog> <book><book> ...</catalog>`, the appropriate value would be `book`. Default is `ROW`.
```

That said, they'd still be correct if I did update them, and they'd also be consistent with books.xml. Completely up to you.

Also, in the API examples where the file is written out, the root element is changed from catalog to books. I think it's nice to demonstrate that this can be done, so it might be best to leave those as books, unless your original intention was to keep the root element the same, of course!

@HyukjinKwon
Member

Yup, thanks. Let's just merge this one. Thanks! Merging to master.

README.md (outdated)

```scala
sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")

val records = context.newAPIHadoopFile(
```
Member


Oh, wait, would you mind changing `val records = context.newAPIHadoopFile(` to `val records = sc.newAPIHadoopFile(`? It'd be great if the typos in this example were fixed together!

Contributor Author

mattroberts297 commented Aug 25, 2016


Ah yes, great spot. Will do. There's a shorthand for this in Spark 1.6+, but I think it's best to just document it on this PR, because I'm not sure about backwards compatibility:

```scala
sc.newAPIHadoopFile[LongWritable, Text, XmlInputFormat](path)
```
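The shorthand works through its type parameters. As far as Spark's signature goes, the type-parameter overload of `newAPIHadoopFile` takes implicit `ClassTag`s and recovers the runtime classes from them. That way the caller no longer passes `classOf[...]` arguments by hand.

Below is a plain-Scala sketch of that pattern. It uses hypothetical stub classes in place of `LongWritable`, `Text` and `XmlInputFormat`, so it runs without Spark or Hadoop. `ShorthandSketch`, `explicitForm` and `shorthandForm` are illustrative names, not Spark APIs. The sketch also drops the `InputFormat` type bound.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for LongWritable, Text and XmlInputFormat.
class StubKey
class StubValue
class StubFormat

object ShorthandSketch {
  // Explicit form: the caller passes the Class objects by hand,
  // mirroring the (path, formatClass, keyClass, valueClass) argument order.
  def explicitForm(path: String, fClass: Class[_], kClass: Class[_], vClass: Class[_]): String =
    s"$path: format=${fClass.getName}, key=${kClass.getName}, value=${vClass.getName}"

  // Shorthand form: the compiler supplies a ClassTag for each type parameter,
  // and runtimeClass turns it back into the Class object the explicit form needs.
  def shorthandForm[K, V, F](path: String)
                            (implicit km: ClassTag[K], vm: ClassTag[V], fm: ClassTag[F]): String =
    explicitForm(path, fm.runtimeClass, km.runtimeClass, vm.runtimeClass)
}
```

Calling `ShorthandSketch.shorthandForm[StubKey, StubValue, StubFormat]("books.xml")` gives the same result as the explicit call with `classOf[StubFormat]`, `classOf[StubKey]` and `classOf[StubValue]`.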

Previously, if a developer copied and pasted the solution, they would have had to rename `context` to `sc` (or vice versa) or create two variables. Now it "just works", provided their SparkContext is named `sc`.
@HyukjinKwon
Member

Yeap, I am going to merge this. Thanks.

HyukjinKwon added a commit that referenced this pull request Sep 10, 2016
This PR prepares the release for 0.3.4.

This will include the changes below:

- Produce the correct column order for nested rows when the user specifies a schema (527b976)
- Fix an `ArrayIndexOutOfBoundsException` caused by a nested struct with no value (19eb277)
- Add `compression` as an alias for the `codec` option (#145)
- Remove dead code (#144)
- Fix a bug with nested elements that share their parent's name (#161)
- Minor documentation changes (#159 and #143)
- Ignore comments even when they are surrounded by white space (#166)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #146 from HyukjinKwon/version-0.3.4.