Merged

208 commits
601a237
[SPARK-9825][YARN] Do not overwrite final Hadoop config entries.
Jul 14, 2017
2d968a0
[SPARK-21421][SS] Add the query id as a local property to allow sourc…
zsxwing Jul 14, 2017
ac5d5d7
[SPARK-21344][SQL] BinaryType comparison does signed byte array compa…
kiszk Jul 15, 2017
74ac1fb
[SPARK-21267][DOCS][MINOR] Follow up to avoid referencing programming…
srowen Jul 15, 2017
69e5282
[SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid f…
yanboliang Jul 15, 2017
fd52a74
[SPARK-19810][SPARK-19810][MINOR][FOLLOW-UP] Follow-ups from to remov…
srowen Jul 17, 2017
e398c28
[SPARK-21354][SQL] INPUT FILE related functions do not support more t…
gatorsmile Jul 17, 2017
4ce735e
[SPARK-21394][SPARK-21432][PYTHON] Reviving callable object/partial f…
HyukjinKwon Jul 17, 2017
7047f49
[SPARK-21221][ML] CrossValidator and TrainValidationSplit Persist Nes…
ajaysaini725 Jul 17, 2017
0e07a29
[SPARK-21321][SPARK CORE] Spark very verbose on shutdown
Jul 17, 2017
5346507
[SPARK-21377][YARN] Make jars specify with --jars/--packages load-abl…
jerryshao Jul 17, 2017
9d8c831
[SPARK-21409][SS] Expose state store memory usage in SQL metrics and …
tdas Jul 17, 2017
a8c6d0f
[MINOR] Improve SQLConf messages
gatorsmile Jul 18, 2017
7aac755
[SPARK-21410][CORE] Create less partitions for RangePartitioner if RD…
apapi Jul 18, 2017
e9faae1
[SPARK-21409][SS] Follow up PR to allow different types of custom met…
tdas Jul 18, 2017
5952ad2
[SPARK-21444] Be more defensive when removing broadcasts in MapOutput…
JoshRosen Jul 18, 2017
0be5fb4
[SPARK-21332][SQL] Incorrect result type inferred for some decimal ex…
Jul 18, 2017
26cd2ca
[SPARK-21445] Make IntWrapper and LongWrapper in UTF8String Serializable
brkyvz Jul 18, 2017
e26dac5
[SPARK-21415] Triage scapegoat warnings, part 1
srowen Jul 18, 2017
d3f4a21
[SPARK-15526][ML][FOLLOWUP] Make JPMML provided scope to avoid includ…
srowen Jul 18, 2017
cde64ad
[SPARK-21411][YARN] Lazily create FS within kerberized UGI to avoid t…
jerryshao Jul 18, 2017
264b0f3
[SPARK-21408][CORE] Better default number of RPC dispatch threads.
Jul 18, 2017
f18b905
[SPARK-21457][SQL] ExternalCatalog.listPartitions should correctly ha…
cloud-fan Jul 18, 2017
84f1b25
[SPARK-21462][SS] Added batchId to StreamingQueryProgress.json
tdas Jul 18, 2017
81c99a5
[SPARK-21435][SQL] Empty files should be skipped while write to file
xuanyuanking Jul 19, 2017
ae253e5
[SPARK-21273][SQL][FOLLOW-UP] Propagate logical plan stats using visi…
gatorsmile Jul 19, 2017
46307b2
[SPARK-21401][ML][MLLIB] add poll function for BoundedPriorityQueue
Jul 19, 2017
4eb081c
[SPARK-21414] Refine SlidingWindowFunctionFrame to avoid OOM.
Jul 19, 2017
6b6dd68
[SPARK-21441][SQL] Incorrect Codegen in SortMergeJoinExec results fai…
DonnyZone Jul 19, 2017
70fe99d
[SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingT…
tdas Jul 19, 2017
ef61775
[SPARK-21243][Core] Limit no. of map outputs in a shuffle fetch
dhruve Jul 19, 2017
c972918
[SPARK-21446][SQL] Fix setAutoCommit never executed
DFFuture Jul 19, 2017
c42ef95
[SPARK-21456][MESOS] Make the driver failover_timeout configurable
susanxhuynh Jul 19, 2017
8cd9cdf
[SPARK-21333][DOCS] Removed invalid joinTypes from javadoc of Dataset…
coreywoodfield Jul 19, 2017
2c9d5ef
[SPARK-21463] Allow userSpecifiedSchema to override partition inferen…
brkyvz Jul 19, 2017
b7a40f6
[SPARK-16542][SQL][PYSPARK] Fix bugs about types that result an array…
zasdfgbnm Jul 20, 2017
5b61cc6
[MINOR][DOCS] Fix some missing notes for Python 2.6 support drop
HyukjinKwon Jul 20, 2017
256358f
[SPARK-21477][SQL][MINOR] Mark LocalTableScanExec's input data transient
gatorsmile Jul 20, 2017
5d1850d
[MINOR][ML] Reorg RFormula params.
yanboliang Jul 20, 2017
cb19880
[SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow …
ueshin Jul 20, 2017
da9f067
[SPARK-19531] Send UPDATE_LENGTH for Spark History service
dosoft Jul 20, 2017
03367d7
[SPARK-21142][SS] spark-streaming-kafka-0-10 should depend on kafka-c…
timvw Jul 20, 2017
3ac6093
[SPARK-10063] Follow-up: remove dead code related to an old output co…
cloud-fan Jul 20, 2017
c57dfae
[MINOR][SS][DOCS] Minor doc change for kafka integration
viirya Jul 21, 2017
2f14684
[SPARK-21472][SQL][FOLLOW-UP] Introduce ArrowColumnVector as a reader…
ueshin Jul 21, 2017
113399b
[SPARK-19810][BUILD][FOLLOW-UP] jcl-over-slf4j dependency needs to be…
srowen Jul 21, 2017
cc00e99
[SPARK-21434][PYTHON][DOCS] Add pyspark pip documentation.
holdenk Jul 21, 2017
ccaee5b
[SPARK-10063] Follow-up: remove a useless test related to an old outp…
cloud-fan Jul 23, 2017
cecd285
[SPARK-20904][CORE] Don't report task failures to driver during shutd…
Jul 23, 2017
2a53fbf
[SPARK-20871][SQL] limit logging of Janino code
Jul 23, 2017
a4eac8b
[MINOR] Remove **** in test case names in FlatMapGroupsWithStateSuite
rxin Jul 23, 2017
481f079
[SPARK-21512][SQL][TEST] DatasetCacheSuite needs to execute unpersist…
kiszk Jul 23, 2017
8666433
[SPARK-17528][SQL][FOLLOWUP] remove unnecessary data copy in object h…
cloud-fan Jul 24, 2017
b09ec92
[SPARK-21502][MESOS] fix --supervise for mesos in cluster mode
skonto Jul 24, 2017
7f29505
[SPARK-21516][SQL][TEST] Overriding afterEach() in DatasetCacheSuite …
kiszk Jul 25, 2017
4f77c06
[SPARK-20855][Docs][DStream] Update the Spark kinesis docs to use the…
yashs360 Jul 25, 2017
996a809
[SPARK-21498][EXAMPLES] quick start -> one py demo have some bug in code
lizhaoch Jul 25, 2017
799e131
[SPARK-21175] Reject OpenBlocks when memory shortage on shuffle service.
Jul 25, 2017
8de080d
[SPARK-21383][YARN] Fix the YarnAllocator allocates more Resource
Jul 25, 2017
06a9793
[SPARK-21447][WEB UI] Spark history server fails to render compressed
Jul 25, 2017
9b4da7b
[SPARK-21491][GRAPHX] Enhance GraphX performance: breakOut instead of…
SereneAnt Jul 25, 2017
ebc24a9
[SPARK-20586][SQL] Add deterministic to ScalaUDF
gatorsmile Jul 26, 2017
300807c
[SPARK-21494][NETWORK] Use correct app id when authenticating to exte…
Jul 26, 2017
1661263
[SPARK-21517][CORE] Avoid copying memory when transfer chunks remotely
caneMi Jul 26, 2017
ae4ea5f
[SPARK-21524][ML] unit test fix: ValidatorParamsSuiteHelpers generate…
YY-OnCall Jul 26, 2017
cf29828
[SPARK-20988][ML] Logistic regression uses aggregator hierarchy
sethah Jul 26, 2017
60472db
[SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built…
HyukjinKwon Jul 26, 2017
cfb25b2
[SPARK-21530] Update description of spark.shuffle.maxChunksBeingTrans…
Jul 27, 2017
ebbe589
[SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a multiple of 8
kiszk Jul 27, 2017
2ff35a0
[SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and add ArrayTyp…
ueshin Jul 27, 2017
ddcd2e8
[SPARK-19270][ML] Add summary table to GLM summary
actuaryzhang Jul 27, 2017
9f5647d
[SPARK-21319][SQL] Fix memory leak in sorter
cloud-fan Jul 27, 2017
f44ead8
[SPARK-21538][SQL] Attribute resolution inconsistency in the Dataset API
Jul 27, 2017
a5a3189
[SPARK-21306][ML] OneVsRest should support setWeightCol
facaiy Jul 28, 2017
63d168c
[MINOR][BUILD] Fix current lint-java failures
srowen Jul 28, 2017
7846809
[SPARK-21553][SPARK SHELL] Add the description of the default value o…
Jul 28, 2017
69ab0e4
[SPARK-21541][YARN] Spark Logs show incorrect job status for a job th…
Jul 28, 2017
0ef9fe6
Typo in comment
nahoj Jul 28, 2017
b56f79c
[SPARK-20090][PYTHON] Add StructType.fieldNames in PySpark
HyukjinKwon Jul 29, 2017
c143820
[SPARK-21508][DOC] Fix example code provided in Spark Streaming Docum…
Jul 29, 2017
60e9b2b
[SPARK-21357][DSTREAMS] FileInputDStream not remove out of date RDD
shaofei007 Jul 29, 2017
9c8109e
[SPARK-21555][SQL] RuntimeReplaceable should be compared semantically…
viirya Jul 29, 2017
92d8563
[SPARK-19451][SQL] rangeBetween method should accept Long value as bo…
jiangxb1987 Jul 29, 2017
6550086
[SPARK-20962][SQL] Support subquery column aliases in FROM clause
maropu Jul 29, 2017
51f99fb
[SQL] Fix typo in DataframeWriter doc
Jul 30, 2017
d79816d
[SPARK-21297][WEB-UI] Add count in 'JDBC/ODBC Server' page.
Jul 30, 2017
6830e90
[MINOR][DOC] Replace numTasks with numPartitions in programming guide
polarker Jul 30, 2017
f1a798b
[MINOR] Minor comment fixes in merge_spark_pr.py script
HyukjinKwon Jul 31, 2017
44e501a
[SPARK-19839][CORE] release longArray in BytesToBytesMap
Jul 31, 2017
106eaa9
[SPARK-21575][SPARKR] Eliminate needless synchronization in java-R se…
SereneAnt Jul 31, 2017
6b186c9
[SPARK-18950][SQL] Report conflicting fields when merging two StructT…
jiayue-zhang Aug 1, 2017
9570e81
[SPARK-21381][SPARKR] SparkR: pass on setHandleInvalid for classifica…
wangmiao1981 Aug 1, 2017
110695d
[SPARK-21589][SQL][DOC] Add documents about Hive UDF/UDTF/UDAF
maropu Aug 1, 2017
5fd0294
[SPARK-21475][CORE] Use NIO's Files API to replace FileInputStream/Fi…
jerryshao Aug 1, 2017
253a07e
[SPARK-21388][ML][PYSPARK] GBTs inherit from HasStepSize & LInearSVC …
zhengruifeng Aug 1, 2017
97ccc63
[SPARK-21585] Application Master marking application status as Failed…
Aug 1, 2017
b133501
[SPARK-21522][CORE] Fix flakiness in LauncherServerSuite.
Aug 1, 2017
6735433
[SPARK-20079][YARN] Fix client AM not allocating executors after rest…
Aug 1, 2017
74cda94
[SPARK-21592][BUILD] Skip maven-compiler-plugin main and test compila…
gslowikowski Aug 1, 2017
b1d59e6
[SPARK-21593][DOCS] Fix 2 rendering errors on configuration page
srowen Aug 1, 2017
58da1a2
[SPARK-21339][CORE] spark-shell --packages option does not add jars t…
Aug 1, 2017
77cc0d6
[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry
BryanCutler Aug 1, 2017
4cc704b
[CORE][MINOR] Improve the error message of checkpoint RDD verification
gatorsmile Aug 2, 2017
14e7575
[SPARK-21578][CORE] Add JavaSparkContextSuite
dongjoon-hyun Aug 2, 2017
845c039
[SPARK-20601][ML] Python API for Constrained Logistic Regression
zero323 Aug 2, 2017
7f63e85
[SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats
zsxwing Aug 2, 2017
9456176
[SPARK-21490][CORE] Make sure SparkLauncher redirects needed streams.
Aug 2, 2017
0d26b3a
[SPARK-21546][SS] dropDuplicates should ignore watermark when it's no…
zsxwing Aug 2, 2017
7c206dd
[SPARK-21615][ML][MLLIB][DOCS] Fix broken redirect in collaborative f…
Aug 3, 2017
f13dbb3
[SPARK-21604][SQL] if the object extends Logging, i suggest to remove…
Aug 3, 2017
3221470
[SPARK-21611][SQL] Error class name for log in several classes.
Aug 3, 2017
e7c59b4
[SPARK-21605][BUILD] Let IntelliJ IDEA correctly detect Language leve…
baibaichen Aug 3, 2017
97ba491
[SPARK-21602][R] Add map_keys and map_values functions to R
HyukjinKwon Aug 3, 2017
13785da
[SPARK-21599][SQL] Collecting column statistics for datasource tables…
dilipbiswal Aug 3, 2017
bb7afb4
[SPARK-20713][SPARK CORE] Convert CommitDenied to TaskKilled.
Aug 3, 2017
dd72b10
Fix Java SimpleApp spark application
christiam Aug 3, 2017
e3967dc
[SPARK-21254][WEBUI] History UI performance fixes
2ooom Aug 4, 2017
25826c7
[SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC tab…
aray Aug 4, 2017
1347b2a
[SPARK-21633][ML][PYTHON] UnaryTransformer in Python
ajaysaini725 Aug 4, 2017
231f672
[SPARK-21205][SQL] pmod(number, 0) should be null.
wangyum Aug 4, 2017
5ad1796
[SPARK-21634][SQL] Change OneRowRelation from a case object to case c…
rxin Aug 4, 2017
6cbd18c
[SPARK-21374][CORE] Fix reading globbed paths from S3 into DF with di…
zsxwing Aug 5, 2017
894d5a4
[SPARK-21580][SQL] Integers in aggregation expressions are wrongly ta…
10110346 Aug 5, 2017
3a45c7f
[INFRA] Close stale PRs
HyukjinKwon Aug 5, 2017
ba327ee
[SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples and arguments s…
HyukjinKwon Aug 5, 2017
dcac1d5
[SPARK-21640] Add errorifexists as a valid string for ErrorIfExists s…
Aug 5, 2017
41568e9
[SPARK-21637][SPARK-21451][SQL] get `spark.hadoop.*` properties from …
yaooqinn Aug 6, 2017
990efad
[SPARK-20963][SQL] Support column aliases for join relations in FROM …
maropu Aug 6, 2017
1ba967b
[SPARK-21588][SQL] SQLContext.getConf(key, null) should return null
vinodkc Aug 6, 2017
d4e7f20
[SPARKR][BUILD] AppVeyor change to latest R version
felixcheung Aug 6, 2017
10b3ca3
[SPARK-21574][SQL] Point out user to set hive config before SparkSess…
wangyum Aug 6, 2017
74b4784
[SPARK-20963][SQL][FOLLOW-UP] Use UnresolvedSubqueryColumnAliases for…
maropu Aug 6, 2017
55aa4da
[SPARK-21622][ML][SPARKR] Support offset in SparkR GLM
actuaryzhang Aug 6, 2017
438c381
Add "full_outer" name to join types
BartekH Aug 6, 2017
39e044e
[MINOR][BUILD] Remove duplicate test-jar:test spark-sql dependency fr…
srowen Aug 6, 2017
534a063
[SPARK-21621][CORE] Reset numRecordsWritten after DiskBlockObjectWrit…
ConeyLiu Aug 7, 2017
663f30d
[SPARK-13041][MESOS] Adds sandbox uri to spark dispatcher ui
skonto Aug 7, 2017
1426eea
[SPARK-21623][ML] fix RF doc
Aug 7, 2017
8b69b17
[SPARK-21544][DEPLOY][TEST-MAVEN] Tests jar of some module should not…
caneGuy Aug 7, 2017
bbfd6b5
[SPARK-21647][SQL] Fix SortMergeJoin when using CROSS
gatorsmile Aug 7, 2017
4f7ec3a
[SPARK][DOCS] Added note on meaning of position to substring function
maclockard Aug 7, 2017
cce25b3
[SPARK-21565][SS] Propagate metadata in attribute replacement.
Aug 7, 2017
baf5cac
[SPARK-21648][SQL] Fix confusing assert failure in JDBC source when p…
gatorsmile Aug 7, 2017
fdcee02
[SPARK-21542][ML][PYTHON] Python persistence helper functions
ajaysaini725 Aug 8, 2017
f763d84
[SPARK-19270][FOLLOW-UP][ML] PySpark GLR model.summary should return …
yanboliang Aug 8, 2017
312bebf
[SPARK-21640][FOLLOW-UP][SQL] added errorifexists on IllegalArgumentE…
Aug 8, 2017
ee13041
[SPARK-21567][SQL] Dataset should work with type alias
viirya Aug 8, 2017
08ef7d7
[MINOR][R][BUILD] More reliable detection of R version for Windows in…
HyukjinKwon Aug 8, 2017
979bf94
[SPARK-20655][CORE] In-memory KVStore implementation.
Aug 8, 2017
2c1bfb4
[SPARK-21671][CORE] Move kvstore to "util" sub-package, add private a…
Aug 8, 2017
fb54a56
[SPARK-20433][BUILD] Bump jackson from 2.6.5 to 2.6.7.1
srowen Aug 9, 2017
6edfff0
[SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the…
zsxwing Aug 9, 2017
031910b
[SPARK-21608][SPARK-9221][SQL] Window rangeBetween() API should allow…
jiangxb1987 Aug 9, 2017
f016f5c
[SPARK-21503][UI] Spark UI shows incorrect task status for a killed E…
Aug 9, 2017
ae8a2b1
[SPARK-21176][WEB UI] Use a single ProxyServlet to proxy all workers …
aosagie Aug 9, 2017
b35660d
[SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in …
WeichenXu123 Aug 9, 2017
6426adf
[SPARK-21663][TESTS] test("remote fetch below max RPC message size") …
wangjiaochun Aug 9, 2017
83fe3b5
[SPARK-21665][CORE] Need to close resources after use
vinodkc Aug 9, 2017
b78cf13
[SPARK-21276][CORE] Update lz4-java to the latest (v1.4.0)
maropu Aug 9, 2017
2d799d0
[SPARK-21504][SQL] Add spark version info into table metadata
gatorsmile Aug 9, 2017
0fb7325
[SPARK-21587][SS] Added filter pushdown through watermarks.
Aug 9, 2017
c06f3f5
[SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator
peay Aug 9, 2017
84454d7
[SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None
jiayue-zhang Aug 10, 2017
95ad960
[SPARK-21669] Internal API for collecting metrics/stats during FileFo…
adrian-ionescu Aug 10, 2017
ca69558
[SPARK-21638][ML] Fix RF/GBT Warning message error
Aug 10, 2017
584c7f1
[SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog
rxin Aug 11, 2017
2387f1e
[SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Detail…
yaooqinn Aug 11, 2017
0377338
[SPARK-21519][SQL] Add an option to the JDBC data source to initializ…
LucaCanali Aug 11, 2017
9443999
[SPARK-21595] Separate thresholds for buffering and spilling in Exter…
tejasapatil Aug 11, 2017
7f16c69
[SPARK-19122][SQL] Unnecessary shuffle+sort added if join predicates …
tejasapatil Aug 11, 2017
da8c59b
[SPARK-12559][SPARK SUBMIT] fix --packages for stand-alone cluster mode
skonto Aug 11, 2017
b0bdfce
[MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12
srowen Aug 12, 2017
35db3b9
[SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only …
ajaysaini725 Aug 12, 2017
c0e333d
[SPARK-21709][BUILD] sbt 0.13.16 and some plugin updates
Aug 12, 2017
5596ce8
[MINOR][SQL] Additional test case for CheckCartesianProducts rule
Aug 14, 2017
34d2134
[SPARK-21176][WEB UI] Format worker page links to work with proxy
aosagie Aug 14, 2017
6847e93
[SPARK-21563][CORE] Fix race condition when serializing TaskDescripti…
ash211 Aug 14, 2017
0fcde87
[SPARK-21658][SQL][PYSPARK] Add default None for value in na.replace …
chihhanyu Aug 14, 2017
0326b69
[MINOR][SQL][TEST] no uncache table in joinsuite test
heary-cao Aug 14, 2017
fbc2692
[SPARK-19471][SQL] AggregationIterator does not initialize the genera…
DonnyZone Aug 14, 2017
282f00b
[SPARK-21696][SS] Fix a potential issue that may generate partial sna…
zsxwing Aug 14, 2017
4c3cf1c
[SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are…
viirya Aug 15, 2017
0422ce0
[SPARK-21724][SQL][DOC] Adds since information in the documentation o…
HyukjinKwon Aug 15, 2017
12411b5
[SPARK-21732][SQL] Lazily init hive metastore client
zsxwing Aug 15, 2017
bc99025
[SPARK-19471][SQL] AggregationIterator does not initialize the genera…
DonnyZone Aug 15, 2017
14bdb25
[SPARK-18464][SQL][FOLLOWUP] support old table which doesn't store sc…
cloud-fan Aug 15, 2017
cba826d
[SPARK-17742][CORE] Handle child process exit in SparkLauncher.
Aug 15, 2017
3f958a9
[SPARK-21731][BUILD] Upgrade scalastyle to 0.9.
Aug 15, 2017
42b9eda
[MINOR] Fix a typo in the method name `UserDefinedFunction.asNonNullabe`
jiangxb1987 Aug 15, 2017
9660831
[SPARK-21712][PYSPARK] Clarify type error for Column.substr()
nchammas Aug 16, 2017
07549b2
[SPARK-19634][ML] Multivariate summarizer - dataframes API
WeichenXu123 Aug 16, 2017
8c54f1e
[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
dongjoon-hyun Aug 16, 2017
8321c14
[SPARK-21723][ML] Fix writing LibSVM (key not found: numFeatures)
Aug 16, 2017
0bb8d1f
[SPARK-13969][ML] Add FeatureHasher transformer
Aug 16, 2017
adf005d
[SPARK-21656][CORE] spark dynamic allocation should not idle timeout …
Aug 16, 2017
1cce1a3
[SPARK-21603][SQL] The wholestage codegen will be much slower then th…
eatoncys Aug 16, 2017
7add4e9
[SPARK-21738] Thriftserver doesn't cancel jobs when session is closed
mgaido91 Aug 16, 2017
a0345cb
[SPARK-21680][ML][MLLIB] optimize Vector compress
Aug 16, 2017
b8ffb51
[SPARK-3151][BLOCK MANAGER] DiskStore.getBytes fails for files larger…
Aug 17, 2017
a45133b
[SPARK-21743][SQL] top-most limit should not cause memory leak
cloud-fan Aug 17, 2017
d695a52
[SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip ad…
akitanaka Aug 17, 2017
b83b502
[SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive …
yaooqinn Aug 17, 2017
ede46cf
Merge remote-tracking branch 'origin/master' into palantir-master
ash211 Aug 17, 2017
a958ea8
Resolve conflicts from merge with apache
ash211 Aug 17, 2017
cf576cf
Update rendered dependencies
ash211 Aug 18, 2017
d95d341
Scalastyle fixes
ash211 Aug 18, 2017
00a5e85
Checkstyle fix
ash211 Aug 18, 2017
bd08934
Downgrade Netty from 4.1.x series to 4.0.x series
ash211 Aug 21, 2017
2 changes: 2 additions & 0 deletions .gitignore
@@ -47,6 +47,8 @@ dev/pr-deps/
dist/
docs/_site
docs/api
sql/docs
sql/site
lib_managed/
lint-r-report.log
log/
2 changes: 2 additions & 0 deletions R/pkg/NAMESPACE
@@ -286,6 +286,8 @@ exportMethods("%<=>%",
"lower",
"lpad",
"ltrim",
"map_keys",
"map_values",
"max",
"md5",
"mean",
2 changes: 1 addition & 1 deletion R/pkg/R/DataFrame.R
@@ -593,7 +593,7 @@ setMethod("cache",
#'
#' Persist this SparkDataFrame with the specified storage level. For details of the
#' supported storage levels, refer to
#' \url{http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence}.
#' \url{http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence}.
#'
#' @param x the SparkDataFrame to persist.
#' @param newLevel storage level chosen for the persistance. See available options in
2 changes: 1 addition & 1 deletion R/pkg/R/RDD.R
@@ -227,7 +227,7 @@ setMethod("cacheRDD",
#'
#' Persist this RDD with the specified storage level. For details of the
#' supported storage levels, refer to
#'\url{http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence}.
#'\url{http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence}.
#'
#' @param x The RDD to persist
#' @param newLevel The new storage level to be assigned
33 changes: 32 additions & 1 deletion R/pkg/R/functions.R
@@ -195,7 +195,10 @@ NULL
#' head(tmp2)
#' head(select(tmp, posexplode(tmp$v1)))
#' head(select(tmp, sort_array(tmp$v1)))
#' head(select(tmp, sort_array(tmp$v1, asc = FALSE)))}
#' head(select(tmp, sort_array(tmp$v1, asc = FALSE)))
#' tmp3 <- mutate(df, v3 = create_map(df$model, df$cyl))
#' head(select(tmp3, map_keys(tmp3$v3)))
#' head(select(tmp3, map_values(tmp3$v3)))}
NULL

#' Window functions for Column operations
@@ -3055,6 +3058,34 @@ setMethod("array_contains",
column(jc)
})

#' @details
#' \code{map_keys}: Returns an unordered array containing the keys of the map.
#'
#' @rdname column_collection_functions
#' @aliases map_keys map_keys,Column-method
#' @export
#' @note map_keys since 2.3.0
setMethod("map_keys",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "map_keys", x@jc)
column(jc)
})

#' @details
#' \code{map_values}: Returns an unordered array containing the values of the map.
#'
#' @rdname column_collection_functions
#' @aliases map_values map_values,Column-method
#' @export
#' @note map_values since 2.3.0
setMethod("map_values",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "map_values", x@jc)
column(jc)
})

#' @details
#' \code{explode}: Creates a new row for each element in the given array or map column.
#'
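For reference, a minimal SparkR sketch of the two collection functions added above (`map_keys` and `map_values`), assuming a running Spark session; the toy columns used here are illustrative and not part of the change:

```r
library(SparkR)
sparkR.session()

# Build a map column from a string column and a numeric column,
# then extract its keys and values (both return unordered arrays).
df <- createDataFrame(data.frame(model = c("mazda", "datsun"), cyl = c(4, 6)))
tmp <- mutate(df, m = create_map(df$model, df$cyl))
head(select(tmp, map_keys(tmp$m)))
head(select(tmp, map_values(tmp$m)))
```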
10 changes: 10 additions & 0 deletions R/pkg/R/generics.R
@@ -1213,6 +1213,16 @@ setGeneric("lpad", function(x, len, pad) { standardGeneric("lpad") })
#' @name NULL
setGeneric("ltrim", function(x) { standardGeneric("ltrim") })

#' @rdname column_collection_functions
#' @export
#' @name NULL
setGeneric("map_keys", function(x) { standardGeneric("map_keys") })

#' @rdname column_collection_functions
#' @export
#' @name NULL
setGeneric("map_values", function(x) { standardGeneric("map_values") })

#' @rdname column_misc_functions
#' @export
#' @name NULL
49 changes: 40 additions & 9 deletions R/pkg/R/mllib_classification.R
@@ -69,6 +69,11 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
#' @param aggregationDepth The depth for treeAggregate (greater than or equal to 2). If the dimensions of features
#' or the number of partitions are large, this param could be adjusted to a larger size.
#' This is an expert parameter. Default value should be good for most cases.
#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
#' column of string type.
#' Supported options: "skip" (filter out rows with invalid data),
#' "error" (throw an error), "keep" (put invalid data in a special additional
#' bucket, at index numLabels). Default is "error".
#' @param ... additional arguments passed to the method.
#' @return \code{spark.svmLinear} returns a fitted linear SVM model.
#' @rdname spark.svmLinear
@@ -98,7 +103,8 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
#' @note spark.svmLinear since 2.2.0
setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, regParam = 0.0, maxIter = 100, tol = 1E-6, standardization = TRUE,
threshold = 0.0, weightCol = NULL, aggregationDepth = 2) {
threshold = 0.0, weightCol = NULL, aggregationDepth = 2,
handleInvalid = c("error", "keep", "skip")) {
formula <- paste(deparse(formula), collapse = "")

if (!is.null(weightCol) && weightCol == "") {
@@ -107,10 +113,12 @@ setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formu
weightCol <- as.character(weightCol)
}

handleInvalid <- match.arg(handleInvalid)

jobj <- callJStatic("org.apache.spark.ml.r.LinearSVCWrapper", "fit",
data@sdf, formula, as.numeric(regParam), as.integer(maxIter),
as.numeric(tol), as.logical(standardization), as.numeric(threshold),
weightCol, as.integer(aggregationDepth))
weightCol, as.integer(aggregationDepth), handleInvalid)
new("LinearSVCModel", jobj = jobj)
})

@@ -218,6 +226,11 @@ function(object, path, overwrite = FALSE) {
#' @param upperBoundsOnIntercepts The upper bounds on intercepts if fitting under bound constrained optimization.
#' The bound vector size must be equal to 1 for binomial regression, or the number
#' of classes for multinomial regression.
#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
#' column of string type.
#' Supported options: "skip" (filter out rows with invalid data),
#' "error" (throw an error), "keep" (put invalid data in a special additional
#' bucket, at index numLabels). Default is "error".
#' @param ... additional arguments passed to the method.
#' @return \code{spark.logit} returns a fitted logistic regression model.
#' @rdname spark.logit
@@ -257,7 +270,8 @@ setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula")
tol = 1E-6, family = "auto", standardization = TRUE,
thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL,
handleInvalid = c("error", "keep", "skip")) {
formula <- paste(deparse(formula), collapse = "")
row <- 0
col <- 0
@@ -304,6 +318,8 @@ setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula")
upperBoundsOnCoefficients <- as.array(as.vector(upperBoundsOnCoefficients))
}

handleInvalid <- match.arg(handleInvalid)

jobj <- callJStatic("org.apache.spark.ml.r.LogisticRegressionWrapper", "fit",
data@sdf, formula, as.numeric(regParam),
as.numeric(elasticNetParam), as.integer(maxIter),
@@ -312,7 +328,8 @@ setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula")
weightCol, as.integer(aggregationDepth),
as.integer(row), as.integer(col),
lowerBoundsOnCoefficients, upperBoundsOnCoefficients,
lowerBoundsOnIntercepts, upperBoundsOnIntercepts)
lowerBoundsOnIntercepts, upperBoundsOnIntercepts,
handleInvalid)
new("LogisticRegressionModel", jobj = jobj)
})

@@ -394,7 +411,12 @@ setMethod("write.ml", signature(object = "LogisticRegressionModel", path = "char
#' @param stepSize stepSize parameter.
#' @param seed seed parameter for weights initialization.
#' @param initialWeights initialWeights parameter for weights initialization, it should be a
#' numeric vector.
#' numeric vector.
#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
#' column of string type.
#' Supported options: "skip" (filter out rows with invalid data),
#' "error" (throw an error), "keep" (put invalid data in a special additional
#' bucket, at index numLabels). Default is "error".
#' @param ... additional arguments passed to the method.
#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model.
#' @rdname spark.mlp
@@ -426,7 +448,8 @@ setMethod("write.ml", signature(object = "LogisticRegressionModel", path = "char
#' @note spark.mlp since 2.1.0
setMethod("spark.mlp", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, layers, blockSize = 128, solver = "l-bfgs", maxIter = 100,
tol = 1E-6, stepSize = 0.03, seed = NULL, initialWeights = NULL) {
tol = 1E-6, stepSize = 0.03, seed = NULL, initialWeights = NULL,
handleInvalid = c("error", "keep", "skip")) {
formula <- paste(deparse(formula), collapse = "")
if (is.null(layers)) {
stop ("layers must be a integer vector with length > 1.")
@@ -441,10 +464,11 @@ setMethod("spark.mlp", signature(data = "SparkDataFrame", formula = "formula"),
if (!is.null(initialWeights)) {
initialWeights <- as.array(as.numeric(na.omit(initialWeights)))
}
handleInvalid <- match.arg(handleInvalid)
jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper",
"fit", data@sdf, formula, as.integer(blockSize), as.array(layers),
as.character(solver), as.integer(maxIter), as.numeric(tol),
as.numeric(stepSize), seed, initialWeights)
as.numeric(stepSize), seed, initialWeights, handleInvalid)
new("MultilayerPerceptronClassificationModel", jobj = jobj)
})

@@ -514,6 +538,11 @@ setMethod("write.ml", signature(object = "MultilayerPerceptronClassificationMode
#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
#' operators are supported, including '~', '.', ':', '+', and '-'.
#' @param smoothing smoothing parameter.
#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
#' column of string type.
#' Supported options: "skip" (filter out rows with invalid data),
#' "error" (throw an error), "keep" (put invalid data in a special additional
#' bucket, at index numLabels). Default is "error".
#' @param ... additional argument(s) passed to the method. Currently only \code{smoothing}.
#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model.
#' @rdname spark.naiveBayes
@@ -543,10 +572,12 @@ setMethod("write.ml", signature(object = "MultilayerPerceptronClassificationMode
#' }
#' @note spark.naiveBayes since 2.0.0
setMethod("spark.naiveBayes", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, smoothing = 1.0) {
function(data, formula, smoothing = 1.0,
handleInvalid = c("error", "keep", "skip")) {
formula <- paste(deparse(formula), collapse = "")
handleInvalid <- match.arg(handleInvalid)
jobj <- callJStatic("org.apache.spark.ml.r.NaiveBayesWrapper", "fit",
formula, data@sdf, smoothing)
formula, data@sdf, smoothing, handleInvalid)
new("NaiveBayesModel", jobj = jobj)
})

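A short sketch of how the new handleInvalid argument added to the SparkR classification wrappers above might be used, assuming a running Spark session; the data frame and column names are invented for illustration:

```r
library(SparkR)
sparkR.session()

# Toy training data with a string label column.
train <- createDataFrame(data.frame(
  label = c("yes", "no", "yes", "no"),
  f1 = c(1.0, 0.5, 1.2, 0.3)))

# "keep" puts invalid/unseen string values into an extra index bucket instead of
# raising an error; the default remains "error", and "skip" drops such rows.
model <- spark.naiveBayes(train, label ~ f1, smoothing = 1.0, handleInvalid = "keep")
head(predict(model, train))
```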
22 changes: 18 additions & 4 deletions R/pkg/R/mllib_regression.R
@@ -76,6 +76,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' "frequencyDesc", "frequencyAsc", "alphabetDesc", and "alphabetAsc".
#' The default value is "frequencyDesc". When the ordering is set to
#' "alphabetDesc", this drops the same category as R when encoding strings.
#' @param offsetCol the offset column name. If this is not set or empty, we treat all instance offsets
#' as 0.0. The feature specified as offset has a constant coefficient of 1.0.
#' @param ... additional arguments passed to the method.
#' @aliases spark.glm,SparkDataFrame,formula-method
#' @return \code{spark.glm} returns a fitted generalized linear model.
@@ -127,7 +129,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
"alphabetDesc", "alphabetAsc")) {
"alphabetDesc", "alphabetAsc"),
offsetCol = NULL) {

stringIndexerOrderType <- match.arg(stringIndexerOrderType)
if (is.character(family)) {
@@ -159,12 +162,19 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
weightCol <- as.character(weightCol)
}

if (!is.null(offsetCol)) {
offsetCol <- as.character(offsetCol)
if (nchar(offsetCol) == 0) {
offsetCol <- NULL
}
}

# For known families, Gamma is upper-cased
jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
"fit", formula, data@sdf, tolower(family$family), family$link,
tol, as.integer(maxIter), weightCol, regParam,
as.double(var.power), as.double(link.power),
stringIndexerOrderType)
stringIndexerOrderType, offsetCol)
new("GeneralizedLinearRegressionModel", jobj = jobj)
})

@@ -192,6 +202,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
#' "frequencyDesc", "frequencyAsc", "alphabetDesc", and "alphabetAsc".
#' The default value is "frequencyDesc". When the ordering is set to
#' "alphabetDesc", this drops the same category as R when encoding strings.
#' @param offsetCol the offset column name. If this is not set or empty, we treat all instance offsets
#' as 0.0. The feature specified as offset has a constant coefficient of 1.0.
#' @return \code{glm} returns a fitted generalized linear model.
#' @rdname glm
#' @export
@@ -209,10 +221,12 @@ setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDat
function(formula, family = gaussian, data, epsilon = 1e-6, maxit = 25, weightCol = NULL,
var.power = 0.0, link.power = 1.0 - var.power,
stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
"alphabetDesc", "alphabetAsc")) {
"alphabetDesc", "alphabetAsc"),
offsetCol = NULL) {
spark.glm(data, formula, family, tol = epsilon, maxIter = maxit, weightCol = weightCol,
var.power = var.power, link.power = link.power,
stringIndexerOrderType = stringIndexerOrderType)
stringIndexerOrderType = stringIndexerOrderType,
offsetCol = offsetCol)
})

# Returns the summary of a model produced by glm() or spark.glm(), similarly to R's summary().
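To illustrate the new offsetCol parameter on spark.glm/glm documented above, a minimal sketch assuming a running Spark session (the data is invented): the offset column enters the linear predictor with a fixed coefficient of 1.0, the usual pattern being a log-exposure offset in a Poisson model.

```r
library(SparkR)
sparkR.session()

# Counts observed over different exposure times; log(exposure) goes in as an offset.
d <- data.frame(counts = c(2, 8, 3, 9),
                x = c(0, 1, 0, 1),
                exposure = c(1, 2, 1, 2))
d$logExposure <- log(d$exposure)
df <- createDataFrame(d)

# The "logExposure" column is held at coefficient 1.0 rather than estimated.
model <- spark.glm(df, counts ~ x, family = "poisson", offsetCol = "logExposure")
summary(model)
```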