Merged
Changes from all commits (306 commits)
d810415
[SPARK-17100] [SQL] fix Python udf in filter on top of outer join
Sep 19, 2016
e719b1c
[SPARK-17160] Properly escape field names in code-generated error mes…
JoshRosen Sep 20, 2016
26145a5
[SPARK-17163][ML] Unified LogisticRegression interface
sethah Sep 20, 2016
be9d57f
[SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
petermaxlee Sep 20, 2016
f039d96
Revert "[SPARK-17513][SQL] Make StreamExecution garbage-collect its m…
cloud-fan Sep 20, 2016
4a426ff
[SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext
apetresc Sep 20, 2016
d5ec5db
[SPARK-17502][SQL] Fix Multiple Bugs in DDL Statements on Temporary V…
gatorsmile Sep 20, 2016
eb004c6
[SPARK-17051][SQL] we should use hadoopConf in InsertIntoHiveTable
cloud-fan Sep 20, 2016
a6aade0
[SPARK-15698][SQL][STREAMING] Add the ability to remove the old Metad…
jerryshao Sep 20, 2016
9ac68db
[SPARK-17549][SQL] Revert "[] Only collect table size stat in driver …
yhuai Sep 20, 2016
7e418e9
[SPARK-17611][YARN][TEST] Make shuffle service test really test auth.
Sep 20, 2016
976f3b1
[SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata
petermaxlee Sep 21, 2016
1ea4991
[MINOR][BUILD] Fix CheckStyle Error
weiqingy Sep 21, 2016
e48ebc4
[SPARK-15698][SQL][STREAMING][FOLLW-UP] Fix FileStream source and sin…
jerryshao Sep 21, 2016
61876a4
[CORE][DOC] Fix errors in comments
wangmiao1981 Sep 21, 2016
d3b8869
[SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports ad…
yanboliang Sep 21, 2016
7654385
[SPARK-17595][MLLIB] Use a bounded priority queue to find synonyms in…
willb Sep 21, 2016
3977223
[SPARK-17617][SQL] Remainder(%) expression.eval returns incorrect res…
clockfly Sep 21, 2016
28fafa3
[SPARK-17599] Prevent ListingFileCatalog from failing if path doesn't…
brkyvz Sep 21, 2016
b366f18
[SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Posi…
Sep 21, 2016
57dc326
[SPARK-17219][ML] Add NaN value handling in Bucketizer
Sep 21, 2016
25a020b
[SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-…
HyukjinKwon Sep 21, 2016
dd7561d
[CORE][MINOR] Add minor code change to TaskState and Task
erenavsarogullari Sep 21, 2016
248922f
[SPARK-17590][SQL] Analyze CTE definitions at once and allow CTE subq…
viirya Sep 21, 2016
d7ee122
[SPARK-17418] Prevent kinesis-asl-assembly artifacts from being publi…
JoshRosen Sep 21, 2016
b4a4421
[SPARK-11918][ML] Better error from WLS for cases like singular input
srowen Sep 21, 2016
2cd1bfa
[SPARK-4563][CORE] Allow driver to advertise a different network addr…
Sep 21, 2016
9fcf1c5
[SPARK-17623][CORE] Clarify type of TaskEndReason with a failed task.
squito Sep 21, 2016
8c3ee2b
[SPARK-17512][CORE] Avoid formatting to python path for yarn and meso…
jerryshao Sep 21, 2016
7cbe216
[SPARK-17569] Make StructuredStreaming FileStreamSource batch generat…
brkyvz Sep 22, 2016
c133907
[SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and…
yanboliang Sep 22, 2016
6902eda
[SPARK-17315][FOLLOW-UP][SPARKR][ML] Fix print of Kolmogorov-Smirnov …
yanboliang Sep 22, 2016
3497ebe
[SPARK-17627] Mark Streaming Providers Experimental
marmbrus Sep 22, 2016
8bde03b
[SPARK-17494][SQL] changePrecision() on compact decimal should respec…
Sep 22, 2016
b50b34f
[SPARK-17609][SQL] SessionCatalog.tableExists should not check temp view
cloud-fan Sep 22, 2016
cb324f6
[SPARK-17425][SQL] Override sameResult in HiveTableScanExec to make R…
watermen Sep 22, 2016
3a80f92
[SPARK-17492][SQL] Fix Reading Cataloged Data Sources without Extendi…
gatorsmile Sep 22, 2016
de7df7d
[SPARK-17625][SQL] set expectedOutputAttributes when converting Simpl…
wzhfy Sep 22, 2016
646f383
[SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.
frreiss Sep 22, 2016
72d9fba
[SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for AFTSurv…
WeichenXu123 Sep 22, 2016
8a02410
[SQL][MINOR] correct the comment of SortBasedAggregationIterator.safe…
cloud-fan Sep 22, 2016
17b72d3
[SPARK-17365][CORE] Remove/Kill multiple executors together to reduce…
Sep 22, 2016
9f24a17
Skip building R vignettes if Spark is not built
shivaram Sep 22, 2016
85d609c
[SPARK-17613] S3A base paths with no '/' at the end return empty Data…
brkyvz Sep 22, 2016
3cdae0f
[SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python pr…
zsxwing Sep 22, 2016
0d63487
[SPARK-17616][SQL] Support a single distinct aggregate combined with …
hvanhovell Sep 22, 2016
f4f6bd8
[SPARK-16240][ML] ML persistence backward compatibility for LDA
GayathriMurali Sep 22, 2016
a166196
[SPARK-17569][SPARK-17569][TEST] Make the unit test added for work again
brkyvz Sep 22, 2016
79159a1
[SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAggregateExec
Sep 23, 2016
a4aeb76
[SPARK-17639][BUILD] Add jce.jar to buildclasspath when building.
Sep 23, 2016
947b8c6
[SPARK-16719][ML] Random Forests should communicate fewer trees on ea…
jkbradley Sep 23, 2016
62ccf27
[SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStre…
zsxwing Sep 23, 2016
5c5396c
[BUILD] Closes some stale PRs
HyukjinKwon Sep 23, 2016
90d5754
[SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulator API on top …
holdenk Sep 23, 2016
f89808b
[SPARK-17499][SPARKR][ML][MLLIB] make the default params in sparkR sp…
WeichenXu123 Sep 23, 2016
f62ddc5
[SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when…
zjffdu Sep 23, 2016
988c714
[SPARK-17643] Remove comparable requirement from Offset
marmbrus Sep 23, 2016
90a30f4
[SPARK-12221] add cpu time to metrics
jisookim0513 Sep 23, 2016
7c38252
[SPARK-17651][SPARKR] Set R package version number along with mvn
shivaram Sep 23, 2016
f3fe554
[SPARK-10835][ML] Word2Vec should accept non-null string array, in ad…
srowen Sep 24, 2016
248916f
[SPARK-17057][ML] ProbabilisticClassifierModels' thresholds should ha…
srowen Sep 24, 2016
7945dae
[MINOR][SPARKR] Add sparkr-vignettes.html to gitignore.
yanboliang Sep 24, 2016
de333d1
[SPARK-17551][SQL] Add DataFrame API for null ordering
xwu0226 Sep 25, 2016
59d87d2
[SPARK-17650] malformed url's throw exceptions before bricking Executors
brkyvz Sep 26, 2016
ac65139
[SPARK-17017][FOLLOW-UP][ML] Refactor of ChiSqSelector and add ML Pyt…
yanboliang Sep 26, 2016
50b89d0
[SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc
JustinPihony Sep 26, 2016
f234b7c
[SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF()
HyukjinKwon Sep 26, 2016
bde85f8
[SPARK-17649][CORE] Log how many Spark events got dropped in LiveList…
zsxwing Sep 26, 2016
8135e0e
[SPARK-17153][SQL] Should read partition data when reading new files …
viirya Sep 26, 2016
7c7586a
[SPARK-17652] Fix confusing exception message while reserving capacity
sameeragarwal Sep 26, 2016
00be16d
[Docs] Update spark-standalone.md to fix link
ammills01 Sep 26, 2016
93c743f
[SPARK-17577][FOLLOW-UP][SPARKR] SparkR spark.addFile supports adding…
yanboliang Sep 26, 2016
6ee2842
Fix two comments since Actor is not used anymore.
Sep 27, 2016
85b0a15
[SPARK-15962][SQL] Introduce implementation with a dense format for U…
kiszk Sep 27, 2016
7f16aff
[SPARK-17138][ML][MLIB] Add Python API for multinomial logistic regre…
WeichenXu123 Sep 27, 2016
6a68c5d
[SPARK-16757] Set up Spark caller context to HDFS and YARN
weiqingy Sep 27, 2016
5de1737
[SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchem…
HyukjinKwon Sep 27, 2016
2cac3b2
[SPARK-16516][SQL] Support for pushing down filters for decimal and t…
HyukjinKwon Sep 27, 2016
120723f
[SPARK-17682][SQL] Mark children as final for unary, binary, leaf exp…
rxin Sep 27, 2016
2ab24a7
[SPARK-17660][SQL] DESC FORMATTED for VIEW Lacks View Definition
gatorsmile Sep 27, 2016
67c7305
[SPARK-17677][SQL] Break WindowExec.scala into multiple files
rxin Sep 27, 2016
2f84a68
[SPARK-17618] Guard against invalid comparisons between UnsafeRow and…
JoshRosen Sep 27, 2016
e7bce9e
[SPARK-17056][CORE] Fix a wrong assert regarding unroll memory in Mem…
viirya Sep 27, 2016
b03b4ad
[SPARK-17666] Ensure that RecordReaders are closed by data source fil…
JoshRosen Sep 28, 2016
4a83395
[SPARK-17499][SPARKR][FOLLOWUP] Check null first for layers in spark.…
HyukjinKwon Sep 28, 2016
b2a7eed
[SPARK-17017][ML][MLLIB][ML][DOC] Updated the ml/mllib feature select…
lins05 Sep 28, 2016
2190037
[MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation
HyukjinKwon Sep 28, 2016
46d1203
[SPARK-17644][CORE] Do not add failedStages when abortStage for fetch…
scwf Sep 28, 2016
a6cfa3f
[SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan
ericl Sep 28, 2016
557d6e3
[SPARK-17713][SQL] Move row-datasource related tests out of JDBCSuite
ericl Sep 28, 2016
7d09232
[SPARK-17641][SQL] Collect_list/Collect_set should not collect null v…
hvanhovell Sep 28, 2016
7dfad4b
[SPARK-17710][HOTFIX] Fix ClassCircularityError in ReplSuite tests in…
weiqingy Sep 29, 2016
37eb918
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters b…
JoshRosen Sep 29, 2016
a19a1bb
[SPARK-16356][FOLLOW-UP][ML] Enforce ML test of exception for local/d…
yanboliang Sep 29, 2016
f7082ac
[SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement.
yanboliang Sep 29, 2016
b35b0db
[SPARK-17614][SQL] sparkSession.read() .jdbc(***) use the sql syntax …
srowen Sep 29, 2016
b2e9731
[MINOR][DOCS] Fix th doc. of spark-streaming with kinesis
maropu Sep 29, 2016
9582004
[DOCS] Reorganize explanation of Accumulators and Broadcast Variables
Sep 29, 2016
7f779e7
[SPARK-17648][CORE] TaskScheduler really needs offers to be an Indexe…
squito Sep 29, 2016
cb87b3c
[SPARK-17672] Spark 2.0 history server web Ui takes too long for a si…
wgtmac Sep 29, 2016
027dea8
[SPARK-17715][SCHEDULER] Make task launch logs DEBUG
bchocho Sep 29, 2016
fe33121
[SPARK-17699] Support for parsing JSON string columns
marmbrus Sep 29, 2016
566d7f2
[SPARK-17653][SQL] Remove unnecessary distincts in multiple unions
viirya Sep 29, 2016
4ecc648
[SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQL syntax
dongjoon-hyun Sep 29, 2016
29396e7
[SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix …
bwahlgreen Sep 29, 2016
3993ebc
[SPARK-17676][CORE] FsHistoryProvider should ignore hidden files
squito Sep 29, 2016
39eb3bb
[SPARK-17412][DOC] All test should not be run by `root` or any admin …
dongjoon-hyun Sep 29, 2016
2f73956
[SPARK-17697][ML] Fixed bug in summary calculations that pattern matc…
BryanCutler Sep 29, 2016
74ac1c4
[SPARK-17717][SQL] Add exist/find methods to Catalog.
hvanhovell Sep 30, 2016
1fad559
[SPARK-14077][ML] Refactor NaiveBayes to support weighted instances
zhengruifeng Sep 30, 2016
8e491af
[SPARK-14077][ML][FOLLOW-UP] Revert change for NB Model's Load to mai…
zhengruifeng Sep 30, 2016
f327e16
[SPARK-17738] [SQL] fix ARRAY/MAP in columnar cache
Sep 30, 2016
81455a9
[SPARK-17703][SQL] Add unnamed version of addReferenceObj for minor o…
ueshin Oct 1, 2016
a26afd5
[SPARK-15353][CORE] Making peer selection for block replication plugg…
shubhamchopra Oct 1, 2016
aef506e
[SPARK-17739][SQL] Collapse adjacent similar Window operators
dongjoon-hyun Oct 1, 2016
15e9bbb
[MINOR][DOC] Add an up-to-date description for default serialization …
dongjoon-hyun Oct 1, 2016
4bcd9b7
[SPARK-17740] Spark tests should mock / interpose HDFS to ensure that…
ericl Oct 1, 2016
af6ece3
[SPARK-17717][SQL] Add Exist/find methods to Catalog [FOLLOW-UP]
hvanhovell Oct 1, 2016
b88cb63
[SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement.
srowen Oct 1, 2016
f8d7fad
[SPARK-17509][SQL] When wrapping catalyst datatype to Hive data type …
Oct 2, 2016
76dc2d9
[SPARK-14914][CORE][SQL] Skip/fix some test cases on Windows due to l…
taoli91 Oct 2, 2016
de3f71e
[SPARK-17598][SQL][WEB UI] User-friendly name for Spark Thrift Server…
ajbozarth Oct 3, 2016
a27033c
[SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
jagadeesanas2 Oct 3, 2016
7bf9212
[SPARK-17073][SQL] generate column-level statistics
wzhfy Oct 3, 2016
1dd68d3
[SPARK-17718][DOCS][MLLIB] Make loss function formulation label note …
srowen Oct 3, 2016
1f31bda
[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch
Oct 3, 2016
d8399b6
[SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow _…
zero323 Oct 4, 2016
2bbecde
[SPARK-17753][SQL] Allow a complex expression as the input a value ba…
hvanhovell Oct 4, 2016
c571cfb
[SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentExc…
dongjoon-hyun Oct 4, 2016
b1b4727
[SPARK-17702][SQL] Code generation including too many mutable states …
ueshin Oct 4, 2016
d2dc8c4
[SPARK-17773] Input/Output] Add VoidObjectInspector
seyfe Oct 4, 2016
126baa8
[SPARK-17559][MLLIB] persist edges if their storage level is non in P…
Oct 4, 2016
8e8de00
[SPARK-17671][WEBUI] Spark 2.0 history server summary page is slow ev…
srowen Oct 4, 2016
7d51608
[SPARK-16962][CORE][SQL] Fix misaligned record accesses for SPARC arc…
sumansomasundar Oct 4, 2016
c17f971
[SPARK-17744][ML] Parity check between the ml and mllib test suites f…
zhengruifeng Oct 4, 2016
068c198
[SPARKR][DOC] minor formatting and output cleanup for R vignettes
felixcheung Oct 4, 2016
8d969a2
[SPARK-17549][SQL] Only collect table size stat in driver for cached …
Oct 4, 2016
a99743d
[SPARK-17495][SQL] Add Hash capability semantically equivalent to Hive's
tejasapatil Oct 5, 2016
c9fe10d
[SPARK-17658][SPARKR] read.df/write.df API taking path optionally in …
HyukjinKwon Oct 5, 2016
89516c1
[SPARK-17258][SQL] Parse scientific decimal literals as decimals
hvanhovell Oct 5, 2016
6a05eb2
[SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
dongjoon-hyun Oct 5, 2016
9df54f5
[SPARK-17239][ML][DOC] Update user guide for multiclass logistic regr…
sethah Oct 5, 2016
221b418
[SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of Bloc…
zsxwing Oct 5, 2016
5fd54b9
[SPARK-17758][SQL] Last returns wrong result in case of empty partition
hvanhovell Oct 5, 2016
9293734
[SPARK-17346][SQL] Add Kafka source for Structured Streaming
zsxwing Oct 5, 2016
b678e46
[SPARK-17346][SQL][TEST-MAVEN] Generate the sql test jar to fix the m…
zsxwing Oct 6, 2016
7aeb20b
[MINOR][ML] Avoid 2D array flatten in NB training.
yanboliang Oct 6, 2016
5e9f32d
[BUILD] Closing some stale PRs
HyukjinKwon Oct 6, 2016
92b7e57
[SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithmetic.
dongjoon-hyun Oct 6, 2016
79accf4
[SPARK-17798][SQL] Remove redundant Experimental annotations in sql.s…
rxin Oct 6, 2016
9a48e60
[SPARK-17780][SQL] Report Throwable to user in StreamExecution
zsxwing Oct 6, 2016
49d11d4
[SPARK-17803][TESTS] Upgrade docker-client dependency
ckadner Oct 6, 2016
3713bb1
[SPARK-17792][ML] L-BFGS solver for linear regression does not accept…
sethah Oct 7, 2016
bcaa799
[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list …
BryanCutler Oct 7, 2016
18bf9d2
[SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules
hvanhovell Oct 7, 2016
24097d8
[SPARK-17795][WEB UI] Sorting on stage or job tables doesn’t reload p…
ajbozarth Oct 7, 2016
2b01d3c
[SPARK-16960][SQL] Deprecate approxCountDistinct, toDegrees and toRad…
HyukjinKwon Oct 7, 2016
e56614c
[SPARK-16827] Stop reporting spill metrics as shuffle metrics
bchocho Oct 7, 2016
dd16b52
[SPARK-17800] Introduce InterfaceStability annotation
rxin Oct 7, 2016
cff5607
[SPARK-17707][WEBUI] Web UI prevents spark-submit application to be f…
srowen Oct 7, 2016
aa3a684
[SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelationProvider
HyukjinKwon Oct 7, 2016
bb1aaf2
[SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.
ScrapCodes Oct 7, 2016
9d8ae85
[SPARK-17665][SPARKR] Support options/mode all for read/write APIs an…
HyukjinKwon Oct 7, 2016
2badb58
[SPARK-15621][SQL] Support spilling for Python UDF
Oct 7, 2016
97594c2
[SPARK-17761][SQL] Remove MutableRow
hvanhovell Oct 7, 2016
94b24b8
[SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin
Oct 7, 2016
24850c9
[HOTFIX][BUILD] Do not use contains in Option in JdbcRelationProvider
HyukjinKwon Oct 8, 2016
471690f
[MINOR][ML] remove redundant comment in LogisticRegression
wangmiao1981 Oct 8, 2016
362ba4b
[SPARK-17793][WEB UI] Sorting on the description on the Job or Stage …
ajbozarth Oct 8, 2016
4201ddc
[SPARK-17768][CORE] Small (Sum,Count,Mean)Evaluator problems and subo…
srowen Oct 8, 2016
8a6bbe0
[MINOR][SQL] Use resource path for test_script.sh
weiqingy Oct 8, 2016
26fbca4
[SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable …
jiangxb1987 Oct 10, 2016
1659003
[SPARK-17741][SQL] Grammar to parse top level and nested data fields …
jiangxb1987 Oct 10, 2016
23ddff4
[SPARK-17338][SQL] add global temp view
cloud-fan Oct 10, 2016
7e16c94
[HOT-FIX][SQL][TESTS] Remove unused function in `SparkSqlParserSuite`
jiangxb1987 Oct 10, 2016
4bafaca
[SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing
dhruve Oct 10, 2016
689de92
[SPARK-17830] Annotate spark.sql package with InterfaceStability
rxin Oct 10, 2016
3f8a022
[SPARK-17828][DOCS] Remove unused generate-changelist.py
a-roberts Oct 10, 2016
29f186b
[SPARK-14082][MESOS] Enable GPU support with Mesos
tnachen Oct 10, 2016
03c4020
[SPARK-14610][ML] Remove superfluous split for continuous features in…
sethah Oct 11, 2016
d5ec4a3
[SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite
Oct 11, 2016
90217f9
[SPARK-16896][SQL] Handle duplicated field names in header consistent…
HyukjinKwon Oct 11, 2016
19a5bae
[SPARK-17816][CORE] Fix ConcurrentModificationException issue in Bloc…
seyfe Oct 11, 2016
0c0ad43
[SPARK-17719][SPARK-17776][SQL] Unify and tie up options in a single …
HyukjinKwon Oct 11, 2016
b515768
[SPARK-17844] Simplify DataFrame API for defining frame boundaries in…
rxin Oct 11, 2016
19401a2
[SPARK-15957][ML] RFormula supports forcing to index label
yanboliang Oct 11, 2016
658c714
[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13
BryanCutler Oct 11, 2016
7388ad9
[SPARK-17338][SQL][FOLLOW-UP] add global temp view
cloud-fan Oct 11, 2016
3694ba4
[SPARK-17864][SQL] Mark data type APIs as stable (not DeveloperApi)
rxin Oct 11, 2016
c8c0906
[SPARK-17821][SQL] Support And and Or in Expression Canonicalize
viirya Oct 11, 2016
75b9e35
[SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSource…
zsxwing Oct 11, 2016
07508bd
[SPARK-17817][PYSPARK] PySpark RDD Repartitioning Results in Highly S…
viirya Oct 11, 2016
23405f3
[SPARK-15153][ML][SPARKR] Fix SparkR spark.naiveBayes error when labe…
yanboliang Oct 11, 2016
5b77e66
[SPARK-17387][PYSPARK] Creating SparkContext() from python without sp…
zjffdu Oct 11, 2016
b9a1471
[SPARK-17720][SQL] introduce static SQL conf
cloud-fan Oct 12, 2016
299eb04
Fix hadoop.version in building-spark.md
apivovarov Oct 12, 2016
b512f04
[SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document…
sarutak Oct 12, 2016
c264ef9
[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group…
koeninger Oct 12, 2016
8d33e1e
[SPARK-11560][MLLIB] Optimize KMeans implementation / remove 'runs'
srowen Oct 12, 2016
8880fd1
[SPARK-14761][SQL] Reject invalid join methods when join columns are …
Oct 12, 2016
d5580eb
[SPARK-17884][SQL] To resolve Null pointer exception when casting fro…
priyankagar Oct 12, 2016
5cc503f
[SPARK-17790][SPARKR] Support for parallelizing R data.frame larger t…
falaki Oct 12, 2016
f8062b6
[SPARK-17840][DOCS] Add some pointers for wiki/CONTRIBUTING.md in REA…
srowen Oct 12, 2016
eb69335
[BUILD] Closing stale PRs
srowen Oct 12, 2016
47776e7
[SPARK-17850][CORE] Add a flag to ignore corrupt files
zsxwing Oct 12, 2016
9ce7d3e
[SPARK-17675][CORE] Expand Blacklist for TaskSets
squito Oct 12, 2016
f9a56a1
[SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition …
koeninger Oct 12, 2016
6f20a92
[SPARK-17845] [SQL] More self-evident window function frame boundary API
rxin Oct 12, 2016
0d4a695
[SPARK-17745][ML][PYSPARK] update NB python api - add weight col para…
WeichenXu123 Oct 13, 2016
21cb59f
[SPARK-17835][ML][MLLIB] Optimize NaiveBayes mllib wrapper to elimina…
yanboliang Oct 13, 2016
edeb51a
[SPARK-17876] Write StructuredStreaming WAL to a stream instead of ma…
brkyvz Oct 13, 2016
064d665
[SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicates
viirya Oct 13, 2016
7222a25
minor doc fix for Row.scala
david-weiluo-ren Oct 13, 2016
6f2fa6c
[SPARK-11272][WEB UI] Add support for downloading event logs from His…
ajbozarth Oct 13, 2016
db8784f
[SPARK-17899][SQL] add a debug mode to keep raw table properties in H…
cloud-fan Oct 13, 2016
7bf8a40
[SPARK-17686][CORE] Support printing out scala and java version with …
jerryshao Oct 13, 2016
0a8e51a
[SPARK-17657][SQL] Disallow Users to Change Table Type
gatorsmile Oct 13, 2016
04d417a
[SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability
rxin Oct 13, 2016
84f149e
[SPARK-17827][SQL] maxColLength type should be Int for String and Binary
robbinspg Oct 13, 2016
08eac35
[SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource…
zsxwing Oct 13, 2016
7106866
[SPARK-17731][SQL][STREAMING] Metrics for structured streaming
tdas Oct 13, 2016
adc1124
[SPARK-17661][SQL] Consolidate various listLeafFiles implementations
petermaxlee Oct 13, 2016
9dc0ca0
[SPARK-17368][SQL] Add support for value class serialization and dese…
jodersky Oct 14, 2016
44cbb61
[SPARK-15957][FOLLOW-UP][ML][PYSPARK] Add Python API for RFormula for…
yanboliang Oct 14, 2016
8543996
[SPARK-17927][SQL] Remove dead code in WriterContainer.
rxin Oct 14, 2016
6c29b3d
[SPARK-17925][SQL] Break fileSourceInterfaces.scala into multiple pieces
rxin Oct 14, 2016
2fb12b0
[SPARK-17903][SQL] MetastoreRelation should talk to external catalog …
cloud-fan Oct 14, 2016
1db8fea
[SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/…
yanboliang Oct 14, 2016
a1b136d
[SPARK-14634][ML] Add BisectingKMeansSummary
zhengruifeng Oct 14, 2016
c8b612d
[SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest a…
Oct 14, 2016
28b645b
[SPARK-17855][CORE] Remove query string from jar url
invkrh Oct 14, 2016
7486442
[SPARK-17073][SQL][FOLLOWUP] generate column-level statistics
Oct 14, 2016
a0ebcb3
[DOC] Fix typo in sql hive doc
dhruve Oct 14, 2016
fa37877
Typo: form -> from
ash211 Oct 14, 2016
05800b4
[TEST] Ignore flaky test in StreamingQueryListenerSuite
tdas Oct 14, 2016
de1c1ca
[SPARK-17941][ML][TEST] Logistic regression tests should use sample w…
sethah Oct 14, 2016
7ab8624
[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Cr…
dilipbiswal Oct 14, 2016
522dd0d
Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat…
yhuai Oct 14, 2016
da9aeb0
[SPARK-17863][SQL] should not add column into Distinct
Oct 14, 2016
5aeb738
[SPARK-16063][SQL] Add storageLevel to Dataset
Oct 14, 2016
f00df40
[SPARK-11775][PYSPARK][SQL] Allow PySpark to register Java UDF
zjffdu Oct 14, 2016
72adfbf
[SPARK-17900][SQL] Graduate a list of Spark SQL APIs to stable
rxin Oct 14, 2016
2d96d35
[SPARK-17946][PYSPARK] Python crossJoin API similar to Scala
srinathshankar Oct 15, 2016
6ce1b67
[SPARK-16980][SQL] Load only catalog table partition metadata require…
Oct 15, 2016
36d81c2
[SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc
tae-jun Oct 15, 2016
ed14633
[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across exe…
Oct 16, 2016
72a6e7a
Revert "[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks ac…
rxin Oct 16, 2016
59e3eb5
[SPARK-17819][SQL] Support default database in connection URIs for Sp…
dongjoon-hyun Oct 17, 2016
e18d02c
[SPARK-17947][SQL] Add Doc and Comment about spark.sql.debug
gatorsmile Oct 17, 2016
56b0f5f
[MINOR][SQL] Add prettyName for current_database function
weiqingy Oct 17, 2016
4 changes: 1 addition & 3 deletions .github/PULL_REQUEST_TEMPLATE
@@ -2,11 +2,9 @@

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
1 change: 1 addition & 0 deletions .gitignore
@@ -24,6 +24,7 @@
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
R/pkg/vignettes/sparkr-vignettes.html
build/*.jar
build/apache-maven*
build/scala*
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
30 changes: 28 additions & 2 deletions R/create-docs.sh
@@ -17,17 +17,26 @@
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr-vignettes.html

set -o pipefail
set -e

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# Required for setting SPARK_SCALA_VERSION
. "${SPARK_HOME}"/bin/load-spark-env.sh

echo "Using Scala $SPARK_SCALA_VERSION"

pushd $FWDIR

# Install the package (this will also generate the Rd files)
@@ -43,4 +52,21 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
3 changes: 3 additions & 0 deletions R/pkg/NAMESPACE
@@ -336,6 +336,9 @@ export("as.DataFrame",
"read.parquet",
"read.text",
"spark.lapply",
"spark.addFile",
"spark.getSparkFilesRootDirectory",
"spark.getSparkFiles",
"sql",
"str",
"tableToDF",
61 changes: 45 additions & 16 deletions R/pkg/R/DataFrame.R
@@ -55,6 +55,19 @@ setMethod("initialize", "SparkDataFrame", function(.Object, sdf, isCached) {
.Object
})

#' Set options/mode and then return the write object
#' @noRd
setWriteOptions <- function(write, path = NULL, mode = "error", ...) {
options <- varargsToStrEnv(...)
if (!is.null(path)) {
options[["path"]] <- path
}
jmode <- convertToJSaveMode(mode)
write <- callJMethod(write, "mode", jmode)
write <- callJMethod(write, "options", options)
write
}

#' @export
#' @param sdf A Java object reference to the backing Scala DataFrame
#' @param isCached TRUE if the SparkDataFrame is cached
@@ -727,6 +740,8 @@ setMethod("toJSON",
#'
#' @param x A SparkDataFrame
#' @param path The directory where the file is saved
#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
#' @rdname write.json
@@ -743,8 +758,9 @@ setMethod("toJSON",
#' @note write.json since 1.6.0
setMethod("write.json",
signature(x = "SparkDataFrame", path = "character"),
function(x, path) {
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "json", path))
})

@@ -755,6 +771,8 @@ setMethod("write.json",
#'
#' @param x A SparkDataFrame
#' @param path The directory where the file is saved
#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
#' @aliases write.orc,SparkDataFrame,character-method
@@ -771,8 +789,9 @@ setMethod("write.json",
#' @note write.orc since 2.0.0
setMethod("write.orc",
signature(x = "SparkDataFrame", path = "character"),
function(x, path) {
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "orc", path))
})

@@ -783,6 +802,8 @@ setMethod("write.orc",
#'
#' @param x A SparkDataFrame
#' @param path The directory where the file is saved
#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
#' @rdname write.parquet
@@ -800,8 +821,9 @@ setMethod("write.orc",
#' @note write.parquet since 1.6.0
setMethod("write.parquet",
signature(x = "SparkDataFrame", path = "character"),
function(x, path) {
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "parquet", path))
})

@@ -825,6 +847,8 @@ setMethod("saveAsParquetFile",
#'
#' @param x A SparkDataFrame
#' @param path The directory where the file is saved
#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
#' @aliases write.text,SparkDataFrame,character-method
@@ -841,8 +865,9 @@ setMethod("saveAsParquetFile",
#' @note write.text since 2.0.0
setMethod("write.text",
signature(x = "SparkDataFrame", path = "character"),
function(x, path) {
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "text", path))
})

@@ -2608,7 +2633,7 @@ setMethod("except",
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
#' @aliases write.df,SparkDataFrame,character-method
#' @aliases write.df,SparkDataFrame-method
#' @rdname write.df
#' @name write.df
#' @export
@@ -2622,21 +2647,25 @@ setMethod("except",
#' }
#' @note write.df since 1.4.0
setMethod("write.df",
signature(df = "SparkDataFrame", path = "character"),
function(df, path, source = NULL, mode = "error", ...) {
signature(df = "SparkDataFrame"),
function(df, path = NULL, source = NULL, mode = "error", ...) {
if (!is.null(path) && !is.character(path)) {
stop("path should be charactor, NULL or omitted.")
}
if (!is.null(source) && !is.character(source)) {
stop("source should be character, NULL or omitted. It is the datasource specified ",
"in 'spark.sql.sources.default' configuration by default.")
}
if (!is.character(mode)) {
stop("mode should be charactor or omitted. It is 'error' by default.")
}
if (is.null(source)) {
source <- getDefaultSqlSource()
}
jmode <- convertToJSaveMode(mode)
options <- varargsToEnv(...)
if (!is.null(path)) {
options[["path"]] <- path
}
write <- callJMethod(df@sdf, "write")
write <- callJMethod(write, "format", source)
write <- callJMethod(write, "mode", jmode)
write <- callJMethod(write, "options", options)
write <- callJMethod(write, "save", path)
write <- setWriteOptions(write, path = path, mode = mode, ...)
write <- handledCallJMethod(write, "save")
})

#' @rdname write.df
Expand Down Expand Up @@ -2691,7 +2720,7 @@ setMethod("saveAsTable",
source <- getDefaultSqlSource()
}
jmode <- convertToJSaveMode(mode)
options <- varargsToEnv(...)
options <- varargsToStrEnv(...)

write <- callJMethod(df@sdf, "write")
write <- callJMethod(write, "format", source)
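For context, a minimal sketch of how the writer-side changes above would be called from SparkR; this is illustrative rather than part of the patch: the SparkDataFrame `df`, the /tmp output paths, and the csv option are hypothetical, and an active SparkR session is assumed.

# Sketch only: assumes sparkR.session() has already been called; paths are placeholders
df <- createDataFrame(data.frame(name = c("a", "b"), value = c(1, 2)))

# mode defaults to "error"; "append", "overwrite" and "ignore" can now be passed directly
write.json(df, "/tmp/out-json", mode = "overwrite")
write.parquet(df, "/tmp/out-parquet", mode = "append")

# write.df now takes path optionally and forwards extra named arguments as string options
write.df(df, path = "/tmp/out-csv", source = "csv", mode = "overwrite", header = "true")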
42 changes: 30 additions & 12 deletions R/pkg/R/SQLContext.R
@@ -328,6 +328,7 @@ setMethod("toDF", signature(x = "RDD"),
#' It goes through the entire dataset once to determine the schema.
#'
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' @return SparkDataFrame
#' @rdname read.json
#' @export
Expand All @@ -341,11 +342,13 @@ setMethod("toDF", signature(x = "RDD"),
#' @name read.json
#' @method read.json default
#' @note read.json since 1.6.0
read.json.default <- function(path) {
read.json.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the text file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "json", paths)
dataFrame(sdf)
}
@@ -405,16 +408,19 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
#' Loads an ORC file, returning the result as a SparkDataFrame.
#'
#' @param path Path of file to read.
#' @param ... additional external data source specific named properties.
#' @return SparkDataFrame
#' @rdname read.orc
#' @export
#' @name read.orc
#' @note read.orc since 2.0.0
read.orc <- function(path) {
read.orc <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the ORC file path
path <- suppressWarnings(normalizePath(path))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "orc", path)
dataFrame(sdf)
}
@@ -430,11 +436,13 @@ read.orc <- function(path) {
#' @name read.parquet
#' @method read.parquet default
#' @note read.parquet since 1.6.0
read.parquet.default <- function(path) {
read.parquet.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the Parquet file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "parquet", paths)
dataFrame(sdf)
}
@@ -467,6 +475,7 @@ parquetFile <- function(x, ...) {
#' Each line in the text file is a new row in the resulting SparkDataFrame.
#'
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' @return SparkDataFrame
#' @rdname read.text
#' @export
@@ -479,11 +488,13 @@ parquetFile <- function(x, ...) {
#' @name read.text
#' @method read.text default
#' @note read.text since 1.6.1
read.text.default <- function(path) {
read.text.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the text file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "text", paths)
dataFrame(sdf)
}
@@ -771,8 +782,15 @@ dropTempView <- function(viewName) {
#' @method read.df default
#' @note read.df since 1.4.0
read.df.default <- function(path = NULL, source = NULL, schema = NULL, na.strings = "NA", ...) {
if (!is.null(path) && !is.character(path)) {
stop("path should be charactor, NULL or omitted.")
}
if (!is.null(source) && !is.character(source)) {
stop("source should be character, NULL or omitted. It is the datasource specified ",
"in 'spark.sql.sources.default' configuration by default.")
}
sparkSession <- getSparkSession()
options <- varargsToEnv(...)
options <- varargsToStrEnv(...)
if (!is.null(path)) {
options[["path"]] <- path
}
@@ -784,16 +802,16 @@ read.df.default <- function(path = NULL, source = NULL, schema = NULL, na.string
}
if (!is.null(schema)) {
stopifnot(class(schema) == "structType")
sdf <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sparkSession, source,
schema$jobj, options)
sdf <- handledCallJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sparkSession,
source, schema$jobj, options)
} else {
sdf <- callJStatic("org.apache.spark.sql.api.r.SQLUtils",
"loadDF", sparkSession, source, options)
sdf <- handledCallJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sparkSession,
source, options)
}
dataFrame(sdf)
}

read.df <- function(x, ...) {
read.df <- function(x = NULL, ...) {
dispatchFunc("read.df(path = NULL, source = NULL, schema = NULL, ...)", x, ...)
}

Expand All @@ -805,7 +823,7 @@ loadDF.default <- function(path = NULL, source = NULL, schema = NULL, ...) {
read.df(path, source, schema, ...)
}

loadDF <- function(x, ...) {
loadDF <- function(x = NULL, ...) {
dispatchFunc("loadDF(path = NULL, source = NULL, schema = NULL, ...)", x, ...)
}

@@ -835,7 +853,7 @@ loadDF <- function(x, ...) {
#' @note createExternalTable since 1.4.0
createExternalTable.default <- function(tableName, path = NULL, source = NULL, ...) {
sparkSession <- getSparkSession()
options <- varargsToEnv(...)
options <- varargsToStrEnv(...)
if (!is.null(path)) {
options[["path"]] <- path
}
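A corresponding sketch for the reader-side changes; again illustrative rather than part of the patch: the file paths are placeholders, and the option names (allowComments, header, inferSchema) are ordinary data source options used only as examples of what the new `...` arguments forward.

# Sketch only: assumes an active SparkR session; paths are placeholders
people <- read.json("/tmp/people.json", allowComments = "true")

# read.df now validates path/source types and forwards named arguments as data source options
flights <- read.df("/tmp/flights.csv", source = "csv", header = "true", inferSchema = "true")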