658 commits
7197a7b
[SPARK-18993][BUILD] Unable to build/compile Spark in IntelliJ due to…
srowen Dec 28, 2016
80d583b
[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming r…
tdas Dec 28, 2016
47ab4af
[SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, sectio…
adesharatushar Dec 29, 2016
20ae117
[SPARK-19016][SQL][DOC] Document scalable partition handling
liancheng Dec 30, 2016
3483def
[SPARK-19050][SS][TESTS] Fix EventTimeWatermarkSuite 'delay in months…
zsxwing Jan 1, 2017
63857c8
[MINOR][DOC] Minor doc change for YARN credential providers
viirya Jan 2, 2017
517f398
[SPARK-18379][SQL] Make the parallelism of parallelPartitionDiscovery…
Nov 15, 2016
d489e1d
[SPARK-19041][SS] Fix code snippet compilation issues in Structured S…
lw-lin Jan 2, 2017
94272a9
[SPARK-19028][SQL] Fixed non-thread-safe functions used in SessionCat…
gatorsmile Dec 31, 2016
7762550
[SPARK-19048][SQL] Delete Partition Location when Dropping Managed Pa…
gatorsmile Jan 3, 2017
1ecf1a9
[SPARK-18877][SQL][BACKPORT-2.1] `CSVInferSchema.inferField` on Decima…
dongjoon-hyun Jan 4, 2017
4ca1788
[SPARK-19033][CORE] Add admin acls for history server
jerryshao Jan 6, 2017
ce9bfe6
[SPARK-19083] sbin/start-history-server.sh script use of $@ without q…
Jan 6, 2017
ee735a8
[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guid…
tdas Jan 6, 2017
86b6621
[SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPri…
wangmiao1981 Jan 7, 2017
c95b585
[SPARK-19106][DOCS] Styling for the configuration docs is broken
srowen Jan 7, 2017
ecc1622
[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP T…
dongjoon-hyun Jan 8, 2017
8690d4b
[SPARK-19127][DOCS] Update Rank Function Documentation
bllchmbrs Jan 9, 2017
8779e6a
[SPARK-19126][DOCS] Update Join Documentation Across Languages
bllchmbrs Jan 9, 2017
80a3e13
[SPARK-18903][SPARKR][BACKPORT-2.1] Add API to get SparkUI URL
felixcheung Jan 9, 2017
3b6ac32
[SPARK-18952][BACKPORT] Regex strings not properly escaped in codegen…
brkyvz Jan 9, 2017
65c866e
[SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB
lw-lin Jan 10, 2017
69d1c4c
[SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` c…
dongjoon-hyun Jan 10, 2017
e0af4b7
[SPARK-19113][SS][TESTS] Set UncaughtExceptionHandler in onQueryStart…
zsxwing Jan 10, 2017
81c9430
[SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3
srowen Jan 10, 2017
230607d
[SPARK-19140][SS] Allow update mode for non-aggregation streaming que…
zsxwing Jan 11, 2017
1022049
[SPARK-19133][SPARKR][ML][BACKPORT-2.1] fix glm for Gamma, clarify gl…
felixcheung Jan 11, 2017
82fcc13
[SPARK-19130][SPARKR] Support setting literal value as column implicitly
felixcheung Jan 11, 2017
0b07634
[SPARK-19158][SPARKR][EXAMPLES] Fix ml.R example fails due to lack of…
yanboliang Jan 12, 2017
9b9867e
[SPARK-18857][SQL] Don't use `Iterator.duplicate` for `incrementalCol…
dongjoon-hyun Jan 10, 2017
616a78a
[SPARK-18969][SQL] Support grouping by nondeterministic expressions
cloud-fan Jan 12, 2017
042e32d
[SPARK-19055][SQL][PYSPARK] Fix SparkSession initialization when Spar…
viirya Jan 12, 2017
23944d0
[SPARK-17237][SQL] Remove backticks in a pivot result schema
maropu Jan 12, 2017
0668e06
Fix missing close-parens for In filter's toString
ash211 Jan 13, 2017
b2c9a2c
[SPARK-18687][PYSPARK][SQL] Backward compatibility - creating a Dataf…
vijoshi Jan 13, 2017
2c2ca89
[SPARK-19178][SQL] convert string of large numbers to int should retu…
cloud-fan Jan 13, 2017
ee3642f
[SPARK-18335][SPARKR] createDataFrame to support numPartitions parameter
felixcheung Jan 13, 2017
5e9be1e
[SPARK-19180] [SQL] the offset of short should be 2 in OffHeapColumn
Jan 13, 2017
db37049
[SPARK-19120] Refresh Metadata Cache After Loading Hive Tables
gatorsmile Jan 15, 2017
bf2f233
[SPARK-19092][SQL][BACKPORT-2.1] Save() API of DataFrameWriter should…
gatorsmile Jan 16, 2017
4f3ce06
[SPARK-19082][SQL] Make ignoreCorruptFiles work for Parquet
viirya Jan 16, 2017
9758905
[SPARK-19232][SPARKR] Update Spark distribution download cache locati…
felixcheung Jan 16, 2017
f4317be
[SPARK-18905][STREAMING] Fix the issue of removing a failed jobset fr…
CodingCat Jan 17, 2017
2ff3669
[SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port…
HyukjinKwon Jan 17, 2017
13986a7
[SPARK-19065][SQL] Don't inherit expression id in dropDuplicates
zsxwing Jan 17, 2017
3ec3e3f
[SPARK-19129][SQL] SessionCatalog: Disallow empty part col values in …
gatorsmile Jan 17, 2017
29b954b
[SPARK-19066][SPARKR][BACKPORT-2.1] LDA doesn't set optimizer correctly
wangmiao1981 Jan 18, 2017
77202a6
[SPARK-19231][SPARKR] add error handling for download and untar for S…
felixcheung Jan 18, 2017
047506b
[SPARK-19113][SS][TESTS] Ignore StreamingQueryException thrown from a…
zsxwing Jan 18, 2017
4cff0b5
[SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon…
lw-lin Jan 18, 2017
7bc3e9b
[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error check…
cloud-fan Dec 20, 2016
482d361
[SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in S…
tdas Jan 20, 2017
4d286c9
[SPARK-18589][SQL] Fix Python UDF accessing attributes from both side…
Jan 21, 2017
6f0ad57
[SPARK-19267][SS] Fix a race condition when stopping StateStore
zsxwing Jan 21, 2017
8daf10e
[SPARK-19155][ML] MLlib GeneralizedLinearRegression family and link s…
yanboliang Jan 22, 2017
1e07a71
[SPARK-19155][ML] Make family case insensitive in GLM
actuaryzhang Jan 23, 2017
ed5d1e7
[SPARK-19306][CORE] Fix inconsistent state in DiskBlockObject when ex…
jerryshao Jan 23, 2017
4a2be09
[SPARK-9435][SQL] Reuse function in Java UDF to correctly support exp…
HyukjinKwon Jan 24, 2017
570e5e1
[SPARK-19268][SS] Disallow adaptive query execution for streaming que…
zsxwing Jan 24, 2017
9c04e42
[SPARK-18823][SPARKR] add support for assigning to column
felixcheung Jan 24, 2017
d128b6a
[SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm failing in edge case
imatiach-msft Jan 23, 2017
b94fb28
[SPARK-19017][SQL] NOT IN subquery with more than one column may retu…
nsyca Jan 24, 2017
c133787
[SPARK-19330][DSTREAMS] Also show tooltip for successful batches
lw-lin Jan 25, 2017
e2f7739
[SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide
Jan 25, 2017
f391ad2
[SPARK-18750][YARN] Avoid using "mapValues" when allocating containers.
Jan 25, 2017
af95455
[SPARK-18863][SQL] Output non-aggregate expressions without GROUP BY …
nsyca Jan 25, 2017
c9f075a
[SPARK-19307][PYSPARK] Make sure user conf is propagated to SparkCont…
Jan 25, 2017
97d3353
[SPARK-18750][YARN] Follow up: move test to correct directory in 2.1 …
Jan 25, 2017
a5c10ff
[SPARK-19064][PYSPARK] Fix pip installing of sub components
holdenk Jan 25, 2017
0d7e385
[SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD
tdas Jan 26, 2017
b12a76a
[SPARK-19338][SQL] Add UDF names in explain
maropu Jan 26, 2017
59502bb
[SPARK-19220][UI] Make redirection to HTTPS apply to all URIs. (branc…
Jan 27, 2017
ba2a5ad
[SPARK-18788][SPARKR] Add API for getNumPartitions
felixcheung Jan 27, 2017
4002ee9
[SPARK-19333][SPARKR] Add Apache License headers to R files
felixcheung Jan 27, 2017
9a49f9a
[SPARK-19324][SPARKR] Spark JVM stdout output is getting dropped in S…
felixcheung Jan 27, 2017
445438c
[SPARK-19396][DOC] JDBC Options are Case In-sensitive
gatorsmile Jan 30, 2017
07a1788
[SPARK-19406][SQL] Fix function to_json to respect user-provided options
gatorsmile Jan 31, 2017
e43f161
[BACKPORT-2.1][SPARKR][DOCS] update R API doc for subset/extract
felixcheung Jan 31, 2017
d35a126
[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime me…
brkyvz Feb 1, 2017
61cdc8c
[SPARK-19410][DOC] Fix brokens links in ml-pipeline and ml-tuning
zhengruifeng Feb 1, 2017
f946464
[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED
Feb 1, 2017
7c23bd4
[SPARK-19432][CORE] Fix an unexpected failure when connecting timeout
zsxwing Feb 2, 2017
f55bd4c
[SPARK-19472][SQL] Parser should not mistake CASE WHEN(...) for a fun…
hvanhovell Feb 6, 2017
62fab5b
[SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting…
uncleGen Feb 7, 2017
dd1abef
[SPARK-19444][ML][DOCUMENTATION] Fix imports not being present in doc…
anshbansal Feb 7, 2017
e642a07
[SPARK-18682][SS] Batch Source for Kafka
Feb 7, 2017
706d6c1
[SPARK-19499][SS] Add more notes in the comments of Sink.addBatch()
CodingCat Feb 8, 2017
4d04029
[MINOR][DOC] Remove parenthesis in readStream() on kafka structured s…
manugarri Feb 8, 2017
71b6eac
[SPARK-18609][SPARK-18841][SQL][BACKPORT-2.1] Fix redundant Alias rem…
hvanhovell Feb 8, 2017
502c927
[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operation…
tdas Feb 8, 2017
b3fd36a
[SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.…
zsxwing Feb 9, 2017
a3d5300
[SPARK-19509][SQL] Grouping Sets do not respect nullable grouping col…
Feb 9, 2017
ff5818b
[SPARK-19512][BACKPORT-2.1][SQL] codegen for compare structs fails #1…
bogdanrdc Feb 10, 2017
7b5ea00
[SPARK-19543] from_json fails when the input row is empty
brkyvz Feb 10, 2017
e580bb0
[SPARK-18717][SQL] Make code generation for Scala Map work with immut…
aray Dec 13, 2016
173c238
[SPARK-19342][SPARKR] bug fixed in collect method for collecting time…
titicaca Feb 12, 2017
06e77e0
[SPARK-19319][BACKPORT-2.1][SPARKR] SparkR Kmeans summary returns err…
wangmiao1981 Feb 12, 2017
fe4fcc5
[SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers s…
lw-lin Feb 13, 2017
a3b6751
[SPARK-19574][ML][DOCUMENTATION] Fix Liquid Exception: Start indices …
gatorsmile Feb 13, 2017
ef4fb7e
[SPARK-19506][ML][PYTHON] Import warnings in pyspark.ml.util
zero323 Feb 13, 2017
c5a7cb0
[SPARK-19542][SS] Delete the temp checkpoint if a query is stopped wi…
zsxwing Feb 13, 2017
328b229
[SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorC…
zsxwing Feb 13, 2017
2968d8c
[HOTFIX][SPARK-19542][SS]Fix the missing import in DataStreamReaderWr…
zsxwing Feb 13, 2017
5db2347
[SPARK-19529] TransportClientFactory.createClient() shouldn't call aw…
JoshRosen Feb 13, 2017
7fe3543
[SPARK-19520][STREAMING] Do not encrypt data written to the WAL.
Feb 13, 2017
c8113b0
[SPARK-19585][DOC][SQL] Fix the cacheTable and uncacheTable api call …
skambha Feb 14, 2017
f837ced
[SPARK-19501][YARN] Reduce the number of HDFS RPCs during YARN deploy…
jongwook Feb 14, 2017
7763b0b
[SPARK-19387][SPARKR] Tests do not run with SparkR source package in …
felixcheung Feb 14, 2017
8ee4ec8
[SPARK-19584][SS][DOCS] update structured streaming documentation aro…
Feb 15, 2017
6c35399
[SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column
felixcheung Feb 15, 2017
88c43f4
[SPARK-19599][SS] Clean up HDFSMetadataLog
zsxwing Feb 16, 2017
b9ab4c0
[SPARK-19604][TESTS] Log the start of every Python test
yhuai Feb 15, 2017
db7adb6
[SPARK-19603][SS] Fix StreamingQuery explain command
zsxwing Feb 16, 2017
252dd05
[SPARK-19399][SPARKR][BACKPORT-2.1] fix tests broken by merge
felixcheung Feb 16, 2017
55958bc
[SPARK-19622][WEBUI] Fix a http error in a paged table when using a `…
stanzhai Feb 17, 2017
6e3abed
[SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap
Feb 17, 2017
b083ec5
[SPARK-19517][SS] KafkaSource fails to initialize partition offsets
vitillo Feb 17, 2017
7c371de
[SPARK-19646][CORE][STREAMING] binaryRecords replicates records in sc…
srowen Feb 20, 2017
c331674
[SPARK-19646][BUILD][HOTFIX] Fix compile error from cherry-pick of SP…
srowen Feb 20, 2017
6edf02a
[SPARK-19626][YARN] Using the correct config to set credentials updat…
yaooqinn Feb 21, 2017
9a890b5
[SPARK-19617][SS] Fix the race condition when starting and stopping a…
zsxwing Feb 22, 2017
21afc45
[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1).
Feb 22, 2017
d30238f
[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[…
actuaryzhang Feb 23, 2017
43084b3
[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields…
hvanhovell Feb 23, 2017
66a7ca2
[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculatin…
maropu Feb 24, 2017
6da6a27
[SPARK-19707][CORE] Improve the invalid path check for sc.addJar
jerryshao Feb 24, 2017
ed9aaa3
[SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-cl…
jerryshao Feb 24, 2017
97866e1
[MINOR][DOCS] Fixes two problems in the SQL programing guide page
boazmohar Feb 25, 2017
20a4329
[SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala imp…
BryanCutler Feb 26, 2017
04fbb9e
[SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to h…
Feb 26, 2017
4b4c3bf
[SPARK-19748][SQL] refresh function has a wrong order to do cache inv…
windpiger Feb 28, 2017
947c0cd
[SPARK-19677][SS] Committing a delta file atop an existing one should…
vitillo Feb 28, 2017
d887f75
[SPARK-19769][DOCS] Update quickstart instructions
elmiko Feb 28, 2017
f719ccc
[SPARK-19572][SPARKR] Allow to disable hive in sparkR shell
zjffdu Mar 1, 2017
bbe0d8c
[SPARK-19766][SQL] Constant alias columns in INNER JOIN should not be…
stanzhai Mar 1, 2017
27347b5
[SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio …
Mar 1, 2017
3a7591a
[SPARK-19750][UI][BRANCH-2.1] Fix redirect issue from http to https
jerryshao Mar 3, 2017
1237aae
[SPARK-19779][SS] Delete needless tmp file after restart structured s…
gf53520 Mar 3, 2017
accbed7
[SPARK-19797][DOC] ML pipeline document correction
ymwdalex Mar 3, 2017
da04d45
[SPARK-19774] StreamExecution should call stop() on sources when a st…
brkyvz Mar 3, 2017
664c979
[SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite do…
zsxwing Mar 4, 2017
ca7a7e8
[SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should…
uncleGen Mar 6, 2017
fd6c6d5
[SPARK-19719][SS] Kafka writer for both structured streaming and batc…
Mar 7, 2017
711addd
[SPARK-19561] [PYTHON] cast TimestampType.toInternal output to long
Mar 7, 2017
551b7bd
[SPARK-19857][YARN] Correctly calculate next credential update time.
Mar 8, 2017
cbc3700
Revert "[SPARK-19561] [PYTHON] cast TimestampType.toInternal output t…
cloud-fan Mar 8, 2017
3b648a6
[SPARK-19859][SS] The new watermark should override the old one
zsxwing Mar 8, 2017
0ba9ecb
[SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe
BryanCutler Mar 8, 2017
320eff1
[SPARK-18055][SQL] Use correct mirror in ExpresionEncoder
marmbrus Mar 8, 2017
f6c1ad2
[SPARK-19813] maxFilesPerTrigger combo latestFirst may miss old files…
brkyvz Mar 8, 2017
3457c32
Revert "[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful o…
zsxwing Mar 8, 2017
78cc572
[MINOR][SQL] The analyzer rules are fired twice for cases when Analys…
dilipbiswal Mar 9, 2017
00859e1
[SPARK-19874][BUILD] Hide API docs for org.apache.spark.sql.internal
zsxwing Mar 9, 2017
0c140c1
[SPARK-19859][SS][FOLLOW-UP] The new watermark should override the ol…
uncleGen Mar 9, 2017
2a76e24
[SPARK-19561][SQL] add int case handling for TimestampType
Mar 9, 2017
ffe65b0
[SPARK-19861][SS] watermark should not be a negative time.
uncleGen Mar 9, 2017
a59cc36
[SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource
brkyvz Mar 10, 2017
f0d50fd
[SPARK-19891][SS] Await Batch Lock notified on stream execution exit
Mar 10, 2017
5a2ad43
[SPARK-19893][SQL] should not run DataFrame set operations with map type
cloud-fan Mar 11, 2017
e481a73
[SPARK-19611][SQL] Introduce configurable table schema inference
Mar 11, 2017
f9833c6
[DOCS][SS] fix structured streaming python example
uncleGen Mar 12, 2017
8c46080
[SPARK-19853][SS] uppercase kafka topics fail when startingOffsets ar…
uncleGen Mar 13, 2017
4545782
[SPARK-19933][SQL] Do not change output of a subquery
hvanhovell Mar 14, 2017
a0ce845
[SPARK-19887][SQL] dynamic partition keys can be null or empty string
cloud-fan Mar 15, 2017
80ebca6
[SPARK-19944][SQL] Move SQLConf from sql/core to sql/catalyst (branch…
rxin Mar 15, 2017
0622546
[SPARK-19872] [PYTHON] Use the correct deserializer for RDD construct…
HyukjinKwon Mar 15, 2017
9d032d0
[SPARK-19329][SQL][BRANCH-2.1] Reading from or writing to a datasourc…
windpiger Mar 16, 2017
4b977ff
[SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQ…
gatorsmile Mar 17, 2017
710b555
[SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch…
lw-lin Mar 17, 2017
5fb7083
[SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests mor…
zsxwing Mar 17, 2017
780f606
[SQL][MINOR] Fix scaladoc for UDFRegistration
jaceklaskowski Mar 18, 2017
b60f690
[SPARK-18817][SPARKR][SQL] change derby log output to temp dir
felixcheung Mar 19, 2017
af8bf21
[SPARK-19994][SQL] Wrong outputOrdering for right/full outer smj
Mar 20, 2017
d205d40
[SPARK-17204][CORE] Fix replicated off heap storage
Mar 21, 2017
c4c7b18
[SPARK-19912][SQL] String literals should be escaped for Hive metasto…
dongjoon-hyun Mar 21, 2017
a88c88a
[SPARK-20017][SQL] change the nullability of function 'StringToMap' f…
zhaorongsheng Mar 21, 2017
5c18b6c
[SPARK-19237][SPARKR][CORE] On Windows spark-submit should handle whe…
felixcheung Mar 21, 2017
9dfdd2a
clarify array_contains function description
lwwmanning Mar 21, 2017
a04428f
[SPARK-19980][SQL][BACKPORT-2.1] Add NULL checks in Bean serializer
maropu Mar 22, 2017
30abb95
Preparing Spark release v2.1.1-rc1
pwendell Mar 22, 2017
c4d2b83
Preparing development version 2.1.2-SNAPSHOT
pwendell Mar 22, 2017
277ed37
[SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it wa…
yanboliang Mar 22, 2017
56f997f
[SPARK-20021][PYSPARK] Miss backslash in python code
uncleGen Mar 22, 2017
af960e8
[SPARK-19970][SQL][BRANCH-2.1] Table owner should be USER instead of …
dongjoon-hyun Mar 23, 2017
92f0b01
[SPARK-19959][SQL] Fix to throw NullPointerException in df[java.lang…
kiszk Mar 24, 2017
d989434
[SPARK-19674][SQL] Ignore driver accumulator updates don't belong to …
carsonwang Mar 25, 2017
b6d348e
[SPARK-20086][SQL] CollapseWindow should not collapse dependent adjac…
hvanhovell Mar 26, 2017
4056191
[SPARK-20102] Fix nightly packaging and RC packaging scripts w/ two m…
JoshRosen Mar 27, 2017
4bcb7d6
[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuin…
jerryshao Mar 28, 2017
fd2e406
[SPARK-20125][SQL] Dataset of type option of map does not work
cloud-fan Mar 28, 2017
e669dd7
[SPARK-14536][SQL][BACKPORT-2.1] fix to handle null value in array ty…
sureshthalamati Mar 28, 2017
02b165d
Preparing Spark release v2.1.1-rc2
pwendell Mar 28, 2017
4964dbe
Preparing development version 2.1.2-SNAPSHOT
pwendell Mar 28, 2017
3095480
[SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails…
facaiy Mar 28, 2017
f8c1b3e
[SPARK-20134][SQL] SQLMetrics.postDriverMetricUpdates to simplify dri…
rxin Mar 29, 2017
103ff54
[SPARK-20059][YARN] Use the correct classloader for HBaseCredentialPr…
jerryshao Mar 29, 2017
6a1b2eb
[SPARK-20164][SQL] AnalysisException not tolerant of null query plan.
kunalkhamar Mar 31, 2017
e3cec18
[SPARK-20084][CORE] Remove internal.metrics.updatedBlockStatuses from…
rdblue Mar 31, 2017
968eace
[SPARK-19999][BACKPORT-2.1][CORE] Workaround JDK-8165231 to identify …
kiszk Apr 2, 2017
ca14410
[SPARK-20197][SPARKR][BRANCH-2.1] CRAN check fail with package instal…
felixcheung Apr 3, 2017
77700ea
[MINOR][DOCS] Replace non-breaking space to normal spaces that breaks…
HyukjinKwon Apr 3, 2017
f9546da
[SPARK-20190][APP-ID] applications//jobs' in REST API, status should b…
Apr 4, 2017
00c1248
[SPARK-20191][YARN] Create wrapper for RackResolver so tests can overr…
Apr 4, 2017
efc72dc
[SPARK-20042][WEB UI] Fix log page buttons for reverse proxy mode
okoethibm Apr 5, 2017
2b85e05
[SPARK-20223][SQL] Fix typo in tpcds q77.sql
Apr 5, 2017
fb81a41
[SPARK-20214][ML] Make sure converted csc matrix has sorted indices
viirya Apr 6, 2017
7791120
[SPARK-20218][DOC][APP-ID] applications//stages' in REST API, add desc…
Apr 7, 2017
fc242cc
[SPARK-20246][SQL] should not push predicate down through aggregate w…
cloud-fan Apr 8, 2017
658b358
[SPARK-20262][SQL] AssertNotNull should throw NullPointerException
rxin Apr 8, 2017
43a7fca
[SPARK-20260][MLLIB] String interpolation required for error message
Apr 9, 2017
1a73046
[SPARK-20264][SQL] asm should be non-test dependency in sql/core
rxin Apr 10, 2017
bc7304e
[SPARK-20280][CORE] FileStatusCache Weigher integer overflow
bogdanrdc Apr 10, 2017
489c1f3
[SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 3…
zsxwing Apr 10, 2017
b26f2c2
[SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values …
Dec 6, 2016
f40e44d
[SPARK-20270][SQL] na.fill should not change the values in long or in…
Apr 10, 2017
8eb71b8
[SPARK-17564][TESTS] Fix flaky RequestTimeoutIntegrationSuite.further…
zsxwing Apr 11, 2017
03a42c0
[SPARK-18555][MINOR][SQL] Fix the @since tag when backporting from 2.…
dbtsai Apr 11, 2017
46e212d
[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to N…
Apr 12, 2017
b2970d9
[MINOR][DOCS] Fix spacings in Structured Streaming Programming Guide
dongjinleekr Apr 12, 2017
dbb6d1b
[SPARK-20296][TRIVIAL][DOCS] Count distinct error message for streaming
jtoka Apr 12, 2017
7e0ddda
[SPARK-20304][SQL] AssertNotNull should not include path in string re…
rxin Apr 12, 2017
be36c2f
[SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBacke…
zsxwing Apr 13, 2017
98ae548
[SPARK-19924][SQL][BACKPORT-2.1] Handle InvocationTargetException for…
gatorsmile Apr 13, 2017
bca7ce2
[SPARK-19946][TESTS][BACKPORT-2.1] DebugFilesystem.assertNoOpenStream…
bogdanrdc Apr 13, 2017
6f715c0
[SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams thread race
bogdanrdc Apr 10, 2017
2ed19cf
Preparing Spark release v2.1.1-rc3
pwendell Apr 14, 2017
2a3e50e
Preparing development version 2.1.2-SNAPSHOT
pwendell Apr 14, 2017
efa11a4
[SPARK-20335][SQL][BACKPORT-2.1] Children expressions of Hive UDF imp…
gatorsmile Apr 17, 2017
7aad057
[SPARK-20349][SQL] ListFunctions returns duplicate functions after us…
gatorsmile Apr 17, 2017
db9517c
[SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.
jodersky Apr 17, 2017
622d7a8
[HOTFIX] Fix compilation.
rxin Apr 17, 2017
3808b47
[SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate …
gatorsmile Apr 18, 2017
a4c1ebc
[SPARK-17647][SQL][FOLLOWUP][MINOR] fix typo
felixcheung Apr 18, 2017
171bf65
[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin …
koertkuipers Apr 19, 2017
9e5dc82
[MINOR][SS] Fix a missing space in UnsupportedOperationChecker error …
zsxwing Apr 20, 2017
66e7a8f
[SPARK-20409][SQL] fail early if aggregate function in GROUP BY
cloud-fan Apr 20, 2017
fb0351a
Small rewording about history server use case
dud225 Apr 21, 2017
ba50580
[SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'Enabling/disabl…
bogdanrdc Apr 22, 2017
d99b49b
[SPARK-20450][SQL] Unexpected first-query schema inference cost with …
ericl Apr 24, 2017
4279665
[SPARK-20451] Filter out nested mapType datatypes from sort order in …
sameeragarwal Apr 25, 2017
65990fc
[SPARK-20455][DOCS] Fix Broken Docker IT Docs
original-brownbear Apr 25, 2017
2d47e1a
[SPARK-20404][CORE] Using Option(name) instead of Some(name)
szhem Apr 25, 2017
359382c
[SPARK-20239][CORE][2.1-BACKPORT] Improve HistoryServer's ACL mechanism
jerryshao Apr 25, 2017
267aca5
Preparing Spark release v2.1.1-rc4
pwendell Apr 25, 2017
8460b09
Preparing development version 2.1.2-SNAPSHOT
pwendell Apr 25, 2017
6696ad0
[SPARK-20439][SQL][BACKPORT-2.1] Fix Catalog API listTables and getTa…
gatorsmile Apr 26, 2017
5131b0a
[SPARK-20496][SS] Bug in KafkaWriter Looks at Unanalyzed Plans
Apr 28, 2017
868b4a1
[SPARK-20517][UI] Fix broken history UI download link
jerryshao May 1, 2017
5915588
[SPARK-20540][CORE] Fix unstable executor requests.
rdblue May 1, 2017
d10b0f6
[SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkCo…
cloud-fan May 3, 2017
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -7,4 +7,4 @@
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

-Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
+Please review http://spark.apache.org/contributing.html before opening a pull request.
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
+python/deps
+python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,12 +1,12 @@
## Contributing to Spark

*Before opening a pull request*, review the
-[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
+[Contributing to Spark guide](http://spark.apache.org/contributing.html).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
+- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

-This product includes/uses org.json (http://www.json.org/java/index.html),
-Copyright (c) 2002 JSON.org
-
This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <mwp1@cornell.edu>.
91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we use the `devtools` package. Please work with the
`dev@spark.apache.org` community and the R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
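As a quick pre-flight sketch (these exact commands are a suggestion, not part of the official process), run from the `SPARK_HOME/R` directory:

```sh
# Confirm the version recorded in the package metadata.
grep Version pkg/DESCRIPTION

# Dry run: list untracked files and directories that would pollute the source package.
git clean -d -n
```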

Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips the manual and vignette/PDF checks - it is therefore preferable to run `R CMD check` on a manually built source package before uploading a release. Also note that for the CRAN checks of PDF vignettes to succeed, the `qpdf` tool must be installed (to install it, e.g. `yum -q -y install qpdf`).
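For example, a manual check could look like the following sketch (assuming `qpdf` is installed and the commands are run from the `SPARK_HOME/R` directory; the exact flags are a suggestion):

```sh
# Build the source package, then run the full CRAN check,
# without the --no-manual --no-vignettes shortcuts used by run-tests.sh.
R CMD build pkg
R CMD check --as-cran SparkR_*.tar.gz
```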

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of any `WARNING` (there should be none) or `NOTE` entries. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, e.g. to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.

The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

With "2.1.0" replaced with the version of SparkR.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
10 changes: 5 additions & 5 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
-sc <- sparkR.init(master="local")
+sparkR.session()
```

#### Making changes to SparkR

-The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
+The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
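For R-only changes, the edit-test loop is then simply (a minimal sketch using the two scripts mentioned above, run from `SPARK_HOME`):

```bash
./R/install-dev.sh   # re-install the SparkR package after editing R sources
./R/run-tests.sh     # run the existing unit tests
```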

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.
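As a sketch, installing the prerequisite R packages and generating the docs might look like this (package names as listed above; the CRAN mirror URL is an assumption):

```bash
# Install the R packages the doc script depends on, then generate the Rd and HTML docs.
Rscript -e 'install.packages(c("devtools", "knitr", "rmarkdown"), repos = "https://cloud.r-project.org")'
./R/create-docs.sh   # run from SPARK_HOME
```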

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
50 changes: 44 additions & 6 deletions R/check-cran.sh
@@ -34,13 +34,30 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
-echo "USING R_HOME = $R_HOME"
+echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

-# Build the latest docs
+# Install the package (this is required for code in vignettes to run when building it later)
+# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh

-# Build a zip file containing the source package
-"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+# Build source package with vignettes
+SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
+. "${SPARK_HOME}"/bin/load-spark-env.sh
+if [ -f "${SPARK_HOME}/RELEASE" ]; then
+SPARK_JARS_DIR="${SPARK_HOME}/jars"
+else
+SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ -d "$SPARK_JARS_DIR" ]; then
+# Build a zip file containing the source package with vignettes
+SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
+
+find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
+else
+echo "Error Spark JARs not found in $SPARK_HOME"
+exit 1
+fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +71,32 @@ fi

if [ -n "$NO_MANUAL" ]
then
-CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
+CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

-"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
+then
+"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+else
+# This will run tests and/or build vignettes, and require SPARK_HOME
+SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
+fi
+
+# Install source package to get it to generate vignettes rds files, etc.
+if [ -n "$CLEAN_INSTALL" ]
+then
+echo "Removing lib path and installing from source package"
+LIB_DIR="$FWDIR/lib"
+rm -rf $LIB_DIR
+mkdir -p $LIB_DIR
+"$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR
+
+# Zip the SparkR package so that it can be distributed to worker nodes on YARN
+pushd $LIB_DIR > /dev/null
+jar cfM "$LIB_DIR/sparkr.zip" SparkR
+popd > /dev/null
+fi

popd > /dev/null
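For reference, the script above is driven by environment variables; a usage sketch (variable names as they appear in the script, invocation path assumed):

```sh
# Fast check: skip tests plus the manual and vignette checks.
NO_TESTS=1 NO_MANUAL=1 ./R/check-cran.sh

# Full check, then reinstall SparkR from the built source package and re-zip it for YARN.
CLEAN_INSTALL=1 ./R/check-cran.sh
```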
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

-# Find Spark jars.
-if [ -f "${SPARK_HOME}/RELEASE" ]; then
-SPARK_JARS_DIR="${SPARK_HOME}/jars"
-else
-SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
-fi
-
-# Only create vignettes if Spark JARs exist
-if [ -d "$SPARK_JARS_DIR" ]; then
-# render creates SparkR vignettes
-Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'
-
-find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
-else
-echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
-fi
-
popd
2 changes: 1 addition & 1 deletion R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
-echo "USING R_HOME = $R_HOME"
+echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
+^cran-comments\.md$
+^NEWS\.md$
+^README\.Rmd$
^src-native$
^html$
12 changes: 7 additions & 5 deletions R/pkg/DESCRIPTION
@@ -1,26 +1,27 @@
Package: SparkR
Type: Package
+Version: 2.1.2
Title: R Frontend for Apache Spark
-Version: 2.0.0
-Date: 2016-08-27
-Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "shivaram@cs.berkeley.edu"),
person("Xiangrui", "Meng", role = "aut",
email = "meng@databricks.com"),
+person("Felix", "Cheung", role = "aut",
+email = "felixcheung@apache.org"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
-License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
+BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
knitr,
rmarkdown,
testthat,
+e1071,
survival
+Description: The SparkR package provides an R frontend for Apache Spark.
+License: Apache License (== 2.0)
Collate:
'schema.R'
'generics.R'
@@ -48,3 +49,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
+VignetteBuilder: knitr