From 34e60a94ae31f6d99bc194dd631b90dee9473d06 Mon Sep 17 00:00:00 2001
From: Matt Roberts
Date: Wed, 24 Aug 2016 11:32:36 +0100
Subject: [PATCH 1/2] Change books to book in Hadoop InputFormat example

The example books.xml file does not contain a books element. It does
contain a catalog element and many book elements. Previously,
records.count() would return 0. Now, it returns 12.
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 48688e20..cf1bb2cf 100644
--- a/README.md
+++ b/README.md
@@ -475,8 +475,8 @@ which you may make direct use of as follows:
 import com.databricks.spark.xml.XmlInputFormat
 
 // This will detect the tags including attributes
-sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<books>")
-sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</books>")
+sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
+sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
 sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")
 
 val records = context.newAPIHadoopFile(

From 08c2bebc0250f1f701c3c0c9ad2e4f5e9b87ac59 Mon Sep 17 00:00:00 2001
From: Matt Roberts
Date: Thu, 25 Aug 2016 08:53:39 +0100
Subject: [PATCH 2/2] Change context to sc in Hadoop InputFormat example

Previously, a developer who copied and pasted the example would have
had to rename context to sc (or vice versa) or create two variables.
Now it "just works", provided their SparkContext is named sc.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cf1bb2cf..c7a2c6a9 100644
--- a/README.md
+++ b/README.md
@@ -479,7 +479,7 @@ sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
 sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
 sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")
 
-val records = context.newAPIHadoopFile(
+val records = sc.newAPIHadoopFile(
   path,
   classOf[XmlInputFormat],
   classOf[LongWritable],
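For reference, a minimal sketch of how the README example reads once both patches are applied. It assumes a SparkContext named sc and spark-xml on the classpath; the diff truncates after classOf[LongWritable], so the closing classOf[Text] argument (the standard value class for a text-based InputFormat) and the path value are assumptions, not part of the patch.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import com.databricks.spark.xml.XmlInputFormat

// Split records at each <book> element; the sample books.xml has a
// <catalog> root with many <book> children, so matching on <books>
// would yield zero records.
sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")

// Illustrative path, not taken from the patch.
val path = "books.xml"

// Read with the same `sc` configured above; mixing `context` and `sc`
// was the copy-paste trap the second patch removes.
val records = sc.newAPIHadoopFile(
  path,
  classOf[XmlInputFormat],
  classOf[LongWritable],
  classOf[Text])

records.count() // 12 with the sample books.xml
```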