Merged
979 commits
fdde7d0
[SPARK-16348][ML][MLLIB][PYTHON] Use full classpaths for pyspark ML J…
jkbradley Jul 6, 2016
d0d2850
[SPARK-16286][SQL] Implement stack table generating function
dongjoon-hyun Jul 6, 2016
ec18cd0
[SPARK-16389][SQL] Remove MetastoreRelation from SparkHiveWriterConta…
gatorsmile Jul 6, 2016
ec79183
[SPARK-16340][SQL] Support column arguments for `regexp_replace` Data…
dongjoon-hyun Jul 6, 2016
5f34204
[SPARK-16339][CORE] ScriptTransform does not print stderr when outstr…
tejasapatil Jul 6, 2016
5497242
[SPARK-16249][ML] Change visibility of Object ml.clustering.LDA to pu…
YY-OnCall Jul 6, 2016
7e28fab
[SPARK-16388][SQL] Remove spark.sql.nativeView and spark.sql.nativeVi…
rxin Jul 6, 2016
909c6d8
[SPARK-16307][ML] Add test to verify the predicted variances of a DT …
MechCoder Jul 6, 2016
21eadd1
[SPARK-16229][SQL] Drop Empty Table After CREATE TABLE AS SELECT fails
gatorsmile Jul 6, 2016
478b71d
[SPARK-15591][WEBUI] Paginate Stage Table in Stages tab
nblintao Jul 6, 2016
23eff5e
[SPARK-15979][SQL] Renames CatalystWriteSupport to ParquetWriteSupport
liancheng Jul 6, 2016
b131042
[DOC][SQL] update out-of-date code snippets using SQLContext in all d…
WeichenXu123 Jul 6, 2016
4e14199
[MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark docum…
HyukjinKwon Jul 6, 2016
480357c
[SPARK-16304] LinkageError should not crash Spark executor
petermaxlee Jul 6, 2016
4f8ceed
[SPARK-16371][SQL] Do not push down filters incorrectly when inner na…
HyukjinKwon Jul 6, 2016
040f6f9
[SPARK-15740][MLLIB] Word2VecSuite "big model load / save" caused OOM…
tmnd1991 Jul 6, 2016
a8f89df
[SPARK-16379][CORE][MESOS] Spark on mesos is broken due to race condi…
srowen Jul 6, 2016
9c04199
[MESOS] expand coarse-grained mode docs
Jul 6, 2016
8e3e4ed
[SPARK-16371][SQL] Two follow-up tasks
rxin Jul 6, 2016
b8ebf63
[SPARK-16212][STREAMING][KAFKA] apply test tweaks from 0-10 to 0-8 as…
koeninger Jul 6, 2016
44c7c62
[SPARK-16021] Fill freed memory in test to help catch correctness bugs
ericl Jul 6, 2016
34283de
[SPARK-14839][SQL] Support for other types for `tableProperty` rule i…
HyukjinKwon Jul 7, 2016
42279bf
[SPARK-16374][SQL] Remove Alias from MetastoreRelation and SimpleCata…
gatorsmile Jul 7, 2016
69f5391
[SPARK-16398][CORE] Make cancelJob and cancelStage APIs public
MasterDDT Jul 7, 2016
4b5a72c
[SPARK-16021][TEST-MAVEN] Fix the maven build
zsxwing Jul 7, 2016
ce3ea96
[SPARK-15885][WEB UI] Provide links to executor logs from stage detai…
tmagrino Jul 7, 2016
ab05db0
[SPARK-16368][SQL] Fix Strange Errors When Creating View With Unmatch…
gatorsmile Jul 7, 2016
986b251
[SPARK-16400][SQL] Remove InSet filter pushdown from Parquet
rxin Jul 7, 2016
4c6f00d
[SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix
yinxusen Jul 7, 2016
6343f66
[SPARK-16399][PYSPARK] Force PYSPARK_PYTHON to python
MechCoder Jul 7, 2016
a04cab8
[SPARK-16174][SQL] Improve `OptimizeIn` optimizer to remove literal r…
dongjoon-hyun Jul 7, 2016
0f7175d
[SPARK-16350][SQL] Fix support for incremental planning in wirteStrea…
lw-lin Jul 7, 2016
28710b4
[SPARK-16415][SQL] fix catalog string error
adrian-wang Jul 7, 2016
f4767bc
[SPARK-16310][SPARKR] R na.string-like default for csv source
felixcheung Jul 7, 2016
6aa7d09
[SPARK-16425][R] `describe()` should not fail with non-numeric columns
dongjoon-hyun Jul 8, 2016
5bce458
[SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrigger
tdas Jul 8, 2016
dff73bf
[SPARK-16052][SQL] Improve `CollapseRepartition` optimizer for Repart…
dongjoon-hyun Jul 8, 2016
8228b06
[SPARK-16436][SQL] checkEvaluation should support NaN
petermaxlee Jul 8, 2016
a54438c
[SPARK-16285][SQL] Implement sentences SQL functions
dongjoon-hyun Jul 8, 2016
255d74f
[SPARK-16369][MLLIB] tallSkinnyQR of RowMatrix should aware of empty …
yinxusen Jul 8, 2016
38cf8f2
[SPARK-13638][SQL] Add quoteAll option to CSV DataFrameWriter
jurriaan Jul 8, 2016
67e085e
[SPARK-16420] Ensure compression streams are closed.
rdblue Jul 8, 2016
142df48
[SPARK-16429][SQL] Include `StringType` columns in `describe()`
dongjoon-hyun Jul 8, 2016
f5fef69
[SPARK-16281][SQL] Implement parse_url SQL function
janplus Jul 8, 2016
60ba436
[SPARK-16453][BUILD] release-build.sh is missing hive-thriftserver fo…
yhuai Jul 8, 2016
3b22291
[SPARK-16387][SQL] JDBC Writer should use dialect to quote field names.
dongjoon-hyun Jul 8, 2016
fd6e8f0
[SPARK-13569][STREAMING][KAFKA] pattern based topic subscription
koeninger Jul 9, 2016
6cef018
[SPARK-16376][WEBUI][SPARK WEB UI][APP-ID] HTTP ERROR 500 when using …
srowen Jul 9, 2016
d8b06f1
[SPARK-16432] Empty blocks fail to serialize due to assert in Chunked…
ericl Jul 9, 2016
b1db26a
[SPARK-11857][MESOS] Deprecate fine grained
Jul 9, 2016
7374e51
[SPARK-16401][SQL] Data Source API: Enable Extending RelationProvider…
gatorsmile Jul 9, 2016
f12a38b
[SPARK-15467][BUILD] update janino version to 3.0.0
kiszk Jul 11, 2016
52b5bb0
[SPARK-16476] Restructure MimaExcludes for easier union excludes
rxin Jul 11, 2016
82f0874
[SPARK-16318][SQL] Implement all remaining xpath functions
petermaxlee Jul 11, 2016
e226278
[SPARK-16355][SPARK-16354][SQL] Fix Bugs When LIMIT/TABLESAMPLE is No…
gatorsmile Jul 11, 2016
9cb1eb7
[SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide …
keypointt Jul 11, 2016
7ac79da
[SPARK-16459][SQL] Prevent dropping current database
dongjoon-hyun Jul 11, 2016
ffcb6e0
[SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
rxin Jul 11, 2016
840853e
[SPARK-16458][SQL] SessionCatalog should support `listColumns` for te…
dongjoon-hyun Jul 11, 2016
2ad031b
[SPARKR][DOC] SparkR ML user guides update for 2.0
yanboliang Jul 11, 2016
7f38b9d
[SPARK-16144][SPARKR] update R API doc for mllib
felixcheung Jul 11, 2016
b4fbe14
[SPARK-16349][SQL] Fall back to isolated class loader when classes no…
Jul 11, 2016
9e2c763
[SPARK-16114][SQL] structured streaming event time window example
jjthomas Jul 12, 2016
05d7151
[MINOR][STREAMING][DOCS] Minor changes on kinesis integration
keypointt Jul 12, 2016
91a443b
[SPARK-16433][SQL] Improve StreamingQuery.explain when no data arrives
zsxwing Jul 12, 2016
e50efd5
[SPARK-16430][SQL][STREAMING] Fixed bug in the maxFilesPerTrigger in …
tdas Jul 12, 2016
9cc74f9
[SPARK-16488] Fix codegen variable namespace collision in pmod and pa…
sameeragarwal Jul 12, 2016
b1e5281
[SPARK-12639][SQL] Mark Filters Fully Handled By Sources with *
RussellSpitzer Jul 12, 2016
c9a6762
[SPARK-16199][SQL] Add a method to list the referenced columns in dat…
petermaxlee Jul 12, 2016
fc11c50
[MINOR][ML] update comment where is inconsistent with code in ml.regr…
WeichenXu123 Jul 12, 2016
5b28e02
[SPARK-16189][SQL] Add ExternalRDD logical plan for input with RDD to…
ueshin Jul 12, 2016
6cb75db
[SPARK-16470][ML][OPTIMIZER] Check linear regression training whether…
WeichenXu123 Jul 12, 2016
5ad68ba
[SPARK-15752][SQL] Optimize metadata only query that has an aggregate…
lianhuiwang Jul 12, 2016
c377e49
[SPARK-16489][SQL] Guard against variable reuse mistakes in expressio…
rxin Jul 12, 2016
d513c99
[SPARK-16414][YARN] Fix bugs for "Can not get user config when callin…
sharkdtu Jul 12, 2016
68df47a
[SPARK-16405] Add metrics and source for external shuffle service
lovexi Jul 12, 2016
7f96886
[SPARK-16119][SQL] Support PURGE option to drop table / partition.
Jul 12, 2016
56bd399
[SPARK-16284][SQL] Implement reflect SQL function
petermaxlee Jul 13, 2016
1c58fa9
[SPARK-16514][SQL] Fix various regex codegen bugs
ericl Jul 13, 2016
772c213
[SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examples
Jul 13, 2016
c190d89
[SPARK-15889][STREAMING] Follow-up fix to erroneous condition in Stre…
srowen Jul 13, 2016
f156136
[SPARK-16375][WEB UI] Fixed misassigned var: numCompletedTasks was as…
ajbozarth Jul 13, 2016
f73891e
[MINOR] Fix Java style errors and remove unused imports
keypointt Jul 13, 2016
83879eb
[SPARK-16439] Fix number formatting in SQL UI
Jul 13, 2016
bf107f1
[SPARK-16438] Add Asynchronous Actions documentation
phalodi Jul 13, 2016
3d6f679
[MINOR][YARN] Fix code error in yarn-cluster unit test
sharkdtu Jul 13, 2016
51ade51
[SPARK-16440][MLLIB] Undeleted broadcast variables in Word2Vec causin…
srowen Jul 13, 2016
ea06e4e
[SPARK-16469] enhanced simulate multiply
uzadude Jul 13, 2016
f376c37
[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown pre…
jiangxb1987 Jul 13, 2016
d8220c1
[SPARK-16435][YARN][MINOR] Add warning log if initialExecutors is les…
jerryshao Jul 13, 2016
01f09b1
[SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotatio…
jkbradley Jul 13, 2016
0744d84
[SPARK-16531][SQL][TEST] Remove timezone setting from DataFrameTimeWi…
brkyvz Jul 13, 2016
51a6706
[SPARK-16114][SQL] updated structured streaming guide
jjthomas Jul 13, 2016
b4baf08
[SPARKR][MINOR] R examples and test updates
felixcheung Jul 13, 2016
fb2e8ee
[SPARKR][DOCS][MINOR] R programming guide to include csv data source …
felixcheung Jul 13, 2016
c5ec879
[SPARK-16482][SQL] Describe Table Command for Tables Requiring Runtim…
gatorsmile Jul 13, 2016
a5f51e2
[SPARK-16485][ML][DOC] Fix privacy of GLM members, rename sqlDataType…
jkbradley Jul 13, 2016
9c53057
[SPARK-16536][SQL][PYSPARK][MINOR] Expose `sql` in PySpark Shell
dongjoon-hyun Jul 14, 2016
39c836e
[SPARK-16503] SparkSession should provide Spark version
lw-lin Jul 14, 2016
db7317a
[SPARK-16448] RemoveAliasOnlyProject should not remove alias with met…
cloud-fan Jul 14, 2016
252d4f2
[SPARK-16500][ML][MLLIB][OPTIMIZER] add LBFGS convergence warning for…
WeichenXu123 Jul 14, 2016
e3f8a03
[SPARK-16403][EXAMPLES] Cleanup to remove unused imports, consistent …
BryanCutler Jul 14, 2016
c4bc2ed
[SPARK-14963][MINOR][YARN] Fix typo in YarnShuffleService recovery fi…
jerryshao Jul 14, 2016
b7b5e17
[SPARK-16505][YARN] Optionally propagate error during shuffle service…
Jul 14, 2016
1b5c9e5
[SPARK-16530][SQL][TRIVIAL] Wrong Parser Keyword in ALTER TABLE CHANG…
gatorsmile Jul 14, 2016
56183b8
[SPARK-16543][SQL] Rename the columns of `SHOW PARTITION/COLUMNS` com…
dongjoon-hyun Jul 14, 2016
093ebbc
[SPARK-16509][SPARKR] Rename window.partitionBy and window.orderBy to…
sun-rui Jul 14, 2016
12005c8
[SPARK-16538][SPARKR] fix R call with namespace operator on SparkSess…
felixcheung Jul 14, 2016
c576f9f
[SPARK-16529][SQL][TEST] `withTempDatabase` should set `default` data…
dongjoon-hyun Jul 14, 2016
31ca741
[SPARK-16528][SQL] Fix NPE problem in HiveClientImpl
jacek-lewandowski Jul 14, 2016
91575ca
[SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running o…
jerryshao Jul 14, 2016
01c4c1f
[SPARK-16553][DOCS] Fix SQL example file name in docs
shivaram Jul 14, 2016
972673a
[SPARK-16555] Work around Jekyll error-handling bug which led to sile…
JoshRosen Jul 14, 2016
2e4075e
[SPARK-16557][SQL] Remove stale doc in sql/README.md
rxin Jul 15, 2016
1832423
[SPARK-16546][SQL][PYSPARK] update python dataframe.drop
WeichenXu123 Jul 15, 2016
71ad945
[SPARK-16426][MLLIB] Fix bug that caused NaNs in IsotonicRegression
neggert Jul 15, 2016
5ffd5d3
[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLl…
jkbradley Jul 15, 2016
611a8ca
[SPARK-16538][SPARKR] Add more tests for namespace call to SparkSessi…
felixcheung Jul 15, 2016
b2f24f9
[SPARK-16230][CORE] CoarseGrainedExecutorBackend to self kill if ther…
tejasapatil Jul 15, 2016
a1ffbad
[SPARK-16582][SQL] Explicitly define isNull = false for non-nullable …
sameeragarwal Jul 16, 2016
5ec0d69
[SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will…
srowen Jul 16, 2016
4167304
[SPARK-16112][SPARKR] Programming guide for gapply/gapplyCollect
Jul 16, 2016
c33e4b0
[SPARK-16507][SPARKR] Add a CRAN checker, fix Rd aliases
shivaram Jul 17, 2016
7b84758
[SPARK-16584][SQL] Move regexp unit tests to RegexpExpressionsSuite
rxin Jul 17, 2016
d27fe9b
[SPARK-16027][SPARKR] Fix R tests SparkSession init/stop
felixcheung Jul 18, 2016
480c870
[SPARK-16588][SQL] Deprecate monotonicallyIncreasingId in Scala/Java
rxin Jul 18, 2016
a529fc9
[MINOR][TYPO] fix fininsh typo
WeichenXu123 Jul 18, 2016
8ea3f4e
[SPARK-16055][SPARKR] warning added while using sparkPackages with sp…
krishnakalyan3 Jul 18, 2016
2877f1a
[SPARK-16351][SQL] Avoid per-record type dispatch in JSON when writing
HyukjinKwon Jul 18, 2016
96e9afa
[SPARK-16515][SQL] set default record reader and writer for script tr…
adrian-wang Jul 18, 2016
75f0efe
[SPARKR][DOCS] minor code sample update in R programming guide
felixcheung Jul 18, 2016
ea78edb
[SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to check generated S…
dongjoon-hyun Jul 19, 2016
c4524f5
[HOTFIX] Fix Scala 2.10 compilation
rxin Jul 19, 2016
69c7730
[SPARK-16615][SQL] Expose sqlContext in SparkSession
rxin Jul 19, 2016
e5fbb18
[MINOR] Remove unused arg in als.py
zhengruifeng Jul 19, 2016
1426a08
[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update
liancheng Jul 19, 2016
6ee40d2
[DOC] improve python doc for rdd.histogram and dataframe.join
mortada Jul 19, 2016
556a943
[MINOR][BUILD] Fix Java Linter `LineLength` errors
dongjoon-hyun Jul 19, 2016
21a6dd2
[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant de…
keypointt Jul 19, 2016
6caa220
[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar
ahmed-mahran Jul 19, 2016
8310c07
[SPARK-16600][MLLIB] fix some latex formula syntax error
WeichenXu123 Jul 19, 2016
6c4b9f4
[SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are…
srowen Jul 19, 2016
5d92326
[SPARK-16478] graphX (added graph caching in strongly connected compo…
Jul 19, 2016
6708914
[SPARK-16494][ML] Upgrade breeze version to 0.12
yanboliang Jul 19, 2016
0bd76e8
[SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(co…
lw-lin Jul 19, 2016
162d04a
[SPARK-16602][SQL] `Nvl` function should support numeric-string cases
dongjoon-hyun Jul 19, 2016
2ae7b88
[SPARK-15705][SQL] Change the default value of spark.sql.hive.convert…
yhuai Jul 19, 2016
004e29c
[SPARK-14702] Make environment of SparkLauncher launched process more…
Jul 20, 2016
9674af6
[SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refres…
WeichenXu123 Jul 20, 2016
fc23263
[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to Sp…
shivaram Jul 20, 2016
75146be
[SPARK-16632][SQL] Respect Hive schema when merging parquet schema.
Jul 20, 2016
0dc79ff
[SPARK-16440][MLLIB] Destroy broadcasted variables even on driver
Jul 20, 2016
95abbe5
[SPARK-15923][YARN] Spark Application rest api returns 'no such app: …
weiqingy Jul 20, 2016
4b079dc
[SPARK-16613][CORE] RDD.pipe returns values for empty partitions
srowen Jul 20, 2016
b9bab4d
[SPARK-15951] Change Executors Page to use datatables to support sort…
kishorvpatil Jul 20, 2016
e3cd5b3
[SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor.
Jul 20, 2016
e651900
[SPARK-16344][SQL] Decoding Parquet array of struct with a single fie…
liancheng Jul 20, 2016
75a06aa
[SPARK-16272][CORE] Allow config values to reference conf, env, syste…
Jul 21, 2016
cfa5ae8
[SPARK-16644][SQL] Aggregate should not propagate constraints contain…
cloud-fan Jul 21, 2016
1bf13ba
[MINOR][DOCS][STREAMING] Minor docfix schema of csv rather than parqu…
holdenk Jul 21, 2016
864b764
[SPARK-16226][SQL] Weaken JDBC isolation level to avoid locking when …
srowen Jul 21, 2016
8674054
[SPARK-16632][SQL] Use Spark requested schema to guide vectorized Par…
liancheng Jul 21, 2016
6203668
[SPARK-16640][SQL] Add codegen for Elt function
viirya Jul 21, 2016
69626ad
[SPARK-16632][SQL] Revert PR #14272: Respect Hive schema when merging…
liancheng Jul 21, 2016
235cb25
[SPARK-16194] Mesos Driver env vars
Jul 21, 2016
9abd99b
[SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable
yhuai Jul 21, 2016
46f80a3
[SPARK-16334] Maintain single dictionary per row-batch in vectorized …
sameeragarwal Jul 21, 2016
df2c6d5
[SPARK-16287][SQL] Implement str_to_map SQL function
techaddict Jul 22, 2016
94f14b5
[SPARK-16556][SPARK-16559][SQL] Fix Two Bugs in Bucket Specification
gatorsmile Jul 22, 2016
e1bd70f
[SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be…
jaceklaskowski Jul 22, 2016
2c72a44
[SPARK-16487][STREAMING] Fix some batches might not get marked as ful…
ahmed-mahran Jul 22, 2016
b4e16bd
[GIT] add pydev & Rstudio project file to gitignore list
WeichenXu123 Jul 22, 2016
6c56fff
[SPARK-16650] Improve documentation of spark.task.maxFailures
Jul 22, 2016
47f5b88
[SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description…
dongjoon-hyun Jul 22, 2016
e10b874
[SPARK-16622][SQL] Fix NullPointerException when the returned value o…
viirya Jul 23, 2016
25db516
[SPARK-16561][MLLIB] fix multivarOnlineSummary min/max bug
WeichenXu123 Jul 23, 2016
ab6e4ae
[SPARK-16662][PYSPARK][SQL] fix HiveContext warning bug
WeichenXu123 Jul 23, 2016
86c2752
[SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
cloud-fan Jul 23, 2016
53b2456
[SPARK-16380][EXAMPLES] Update SQL examples and programming guide for…
liancheng Jul 23, 2016
e3c7039
[MINOR] Close old PRs that should be closed but have not been
srowen Jul 24, 2016
d6795c7
[SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
lw-lin Jul 24, 2016
cc1d2dc
[SPARK-16463][SQL] Support `truncate` option in Overwrite mode for JD…
dongjoon-hyun Jul 24, 2016
37bed97
[PYSPARK] add picklable SparseMatrix in pyspark.ml.common
WeichenXu123 Jul 24, 2016
23e047f
[SPARK-16416][CORE] force eager creation of loggers to avoid shutdown…
Jul 24, 2016
1221ce0
[SPARK-16645][SQL] rename CatalogStorageFormat.serdeProperties to pro…
cloud-fan Jul 25, 2016
daace60
[SPARK-5581][CORE] When writing sorted map output file, avoid open / …
bchocho Jul 25, 2016
468a3c3
[SPARK-16699][SQL] Fix performance bug in hash aggregate on long stri…
ooq Jul 25, 2016
68b4020
[SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First a…
liancheng Jul 25, 2016
7ffd99e
[SPARK-16674][SQL] Avoid per-record type dispatch in JDBC when reading
HyukjinKwon Jul 25, 2016
d27d362
[SPARK-16660][SQL] CreateViewCommand should not take CatalogTable
cloud-fan Jul 25, 2016
64529b1
[SPARK-16691][SQL] move BucketSpec to catalyst module and use it in C…
cloud-fan Jul 25, 2016
d6a5217
[SPARK-16668][TEST] Test parquet reader for row groups containing bot…
sameeragarwal Jul 25, 2016
79826f3
[SPARK-16698][SQL] Field names having dots should be allowed for data…
HyukjinKwon Jul 25, 2016
7ea6d28
[SPARK-16703][SQL] Remove extra whitespace in SQL generation for wind…
liancheng Jul 25, 2016
b73defd
[SPARKR][DOCS] fix broken url in doc
felixcheung Jul 25, 2016
ad3708e
[SPARK-16653][ML][OPTIMIZER] update ANN convergence tolerance param d…
WeichenXu123 Jul 25, 2016
dd784a8
[SPARK-16685] Remove audit-release scripts.
rxin Jul 25, 2016
978cd5f
[SPARK-15271][MESOS] Allow force pulling executor docker images
philipphoffmann Jul 25, 2016
3b6e1d0
[SPARK-16485][DOC][ML] Fixed several inline formatting in ml features…
lins05 Jul 25, 2016
fc17121
Revert "[SPARK-15271][MESOS] Allow force pulling executor docker images"
JoshRosen Jul 25, 2016
cda4603
[SQL][DOC] Fix a default name for parquet compression
maropu Jul 25, 2016
f5ea7fe
[SPARK-16166][CORE] Also take off-heap memory usage into consideratio…
jerryshao Jul 25, 2016
12f490b
[SPARK-16715][TESTS] Fix a potential ExprId conflict for Subexpressio…
zsxwing Jul 25, 2016
c979c8b
[SPARK-14131][STREAMING] SQL Improved fix for avoiding potential dead…
tdas Jul 25, 2016
db36e1e
[SPARK-15590][WEBUI] Paginate Job Table in Jobs tab
nblintao Jul 26, 2016
e164a04
[SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextS…
zsxwing Jul 26, 2016
3fc4566
[SPARK-16678][SPARK-16677][SQL] Fix two View-related bugs
gatorsmile Jul 26, 2016
ba0aade
Fix description of spark.speculation.quantile
nwbvt Jul 26, 2016
8a8d26f
[SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS q…
dongjoon-hyun Jul 26, 2016
f99e34e
[SPARK-16724] Expose DefinedByConstructorParams
marmbrus Jul 26, 2016
815f3ee
[SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues relat…
yhuai Jul 26, 2016
7b06a89
[SPARK-16686][SQL] Remove PushProjectThroughSample since it is handle…
viirya Jul 26, 2016
6959061
[SPARK-16706][SQL] support java map in encoder
cloud-fan Jul 26, 2016
03c2743
[TEST][STREAMING] Fix flaky Kafka rate controlling test
tdas Jul 26, 2016
3b2b785
[SPARK-16675][SQL] Avoid per-record type dispatch in JDBC when writing
HyukjinKwon Jul 26, 2016
4c96955
[SPARK-16697][ML][MLLIB] improve LDA submitMiniBatch method to avoid …
WeichenXu123 Jul 26, 2016
a2abb58
[SPARK-16663][SQL] desc table should be consistent between data sourc…
cloud-fan Jul 26, 2016
0869b3a
[SPARK-15271][MESOS] Allow force pulling executor docker images
philipphoffmann Jul 26, 2016
0b71d9a
[SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue si…
dhruve Jul 26, 2016
738b4cc
[SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGenerator
ooq Jul 27, 2016
5b8e848
[SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
dongjoon-hyun Jul 27, 2016
ef0ccbc
[SPARK-16729][SQL] Throw analysis exception for invalid date casts
petermaxlee Jul 27, 2016
3c3371b
[MINOR][ML] Fix some mistake in LinearRegression formula.
yanboliang Jul 27, 2016
045fc36
[MINOR][DOC][SQL] Fix two documents regarding size in bytes
viirya Jul 27, 2016
7e8279f
[SPARK-15254][DOC] Improve ML pipeline Cross Validation Scaladoc & PyDoc
krishnakalyan3 Jul 27, 2016
70f846a
[SPARK-5847][CORE] Allow for configuring MetricsSystem's use of app I…
markgrover Jul 27, 2016
bc4851a
[MINOR][DOC] missing keyword new
Jul 27, 2016
b14d7b5
[SPARK-16110][YARN][PYSPARK] Fix allowing python version to be specif…
KevinGrealish Jul 27, 2016
11d427c
[SPARK-16730][SQL] Implement function aliases for type casts
petermaxlee Jul 28, 2016
5c2ae79
[SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQ…
dongjoon-hyun Jul 28, 2016
762366f
[SPARK-16552][SQL] Store the Inferred Schemas into External Catalog T…
gatorsmile Jul 28, 2016
9ade77c
[SPARK-16639][SQL] The query with having condition that contains grou…
viirya Jul 28, 2016
1178d61
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
sylvinus Jul 28, 2016
3fd39b8
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on O…
sameeragarwal Jul 28, 2016
274f3b9
[SPARK-16772] Correct API doc references to PySpark classes + formatt…
nchammas Jul 28, 2016
d1d5069
[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
Jul 29, 2016
0557a45
[SPARK-16750][ML] Fix GaussianMixture training failed due to feature …
yanboliang Jul 29, 2016
04a2c07
[SPARK-16751] Upgrade derby to 10.12.1.1
a-roberts Jul 29, 2016
266b92f
[SPARK-16637] Unified containerizer
Jul 29, 2016
2c15323
[SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
sundapeng Jul 29, 2016
2182e43
[SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API d…
nchammas Jul 29, 2016
bbc2475
[SPARK-16748][SQL] SparkExceptions during planning should not wrapped…
tdas Jul 30, 2016
0dc4310
[SPARK-16694][CORE] Use for/foreach rather than map for Unit expressi…
srowen Jul 30, 2016
bce354c
[SPARK-16696][ML][MLLIB] destroy KMeans bcNewCenters when loop finish…
WeichenXu123 Jul 30, 2016
a6290e5
[SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to…
BryanCutler Jul 30, 2016
957a8ab
[SPARK-16818] Exchange reuse incorrectly reuses scans over different …
ericl Jul 31, 2016
7c27d07
[SPARK-16812] Open up SparkILoop.getAddedJars
rxin Jul 31, 2016
064d91f
[SPARK-16813][SQL] Remove private[sql] and private[spark] from cataly…
rxin Jul 31, 2016
301fb0d
[SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogC…
cloud-fan Aug 1, 2016
579fbcf
[SPARK-16805][SQL] Log timezone when query result does not match
rxin Aug 1, 2016
5 changes: 5 additions & 0 deletions .gitignore
@@ -17,6 +17,7 @@
 .idea/
 .idea_modules/
 .project
+.pydevproject
 .scala_dependencies
 .settings
 /lib/
@@ -77,3 +78,7 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
+*.Rproj
+*.Rproj.*

3 changes: 2 additions & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
 (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
 (The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
-(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
+(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.1 - http://py4j.sourceforge.net/)
 (Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
 (BSD licence) sbt and sbt-launch-lib.bash
 (BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
 (MIT License) blockUI (http://jquery.malsup.com/block/)
 (MIT License) RowsGroup (http://datatables.net/license/mit)
 (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
2 changes: 1 addition & 1 deletion NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
12 changes: 6 additions & 6 deletions R/DOCUMENTATION.md
@@ -1,12 +1,12 @@
 # SparkR Documentation
 
-SparkR documentation is generated using in-source comments annotated using using
-`roxygen2`. After making changes to the documentation, to generate man pages,
+SparkR documentation is generated by using in-source comments and annotated by using
+[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation and generating man pages,
 you can run the following from an R console in the SparkR home directory
 
-library(devtools)
-devtools::document(pkg="./pkg", roclets=c("rd"))
+```R
+library(devtools)
+devtools::document(pkg="./pkg", roclets=c("rd"))
+```
 You can verify if your changes are good by running
 
 R CMD check pkg/
32 changes: 18 additions & 14 deletions R/README.md
@@ -1,12 +1,13 @@
 # R on Spark
 
 SparkR is an R package that provides a light-weight frontend to use Spark from R.
 
 ### Installing sparkR
 
 Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
 By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
 Example:
-```
+```bash
 # where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
 export R_HOME=/home/username/R
 ./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
-```
-build/mvn -DskipTests -Psparkr package
+
+```bash
+build/mvn -DskipTests -Psparkr package
 ```

#### Running sparkR
@@ -37,8 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th

#### Using SparkR from RStudio

-If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
-```
+If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+```R
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
@@ -55,23 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis

#### Generating documentation

-The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
+The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also, `R/DOCUMENTATION.md`

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/spark-submit <filename> <args>`. For example:

-./bin/spark-submit examples/src/main/r/dataframe.R
-
-You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
-
-R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
-./R/run-tests.sh
+```bash
+./bin/spark-submit examples/src/main/r/dataframe.R
+```
+You can also run the unit tests for SparkR by running. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
+```bash
+R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
+./R/run-tests.sh
+```

### Running on YARN

The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
-```
+```bash
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```
20 changes: 20 additions & 0 deletions R/WINDOWS.md
@@ -11,3 +11,23 @@ include Rtools and R in `PATH`.
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`

## Unit tests

To run the SparkR unit tests on Windows, the following steps are required —assuming you are in the Spark root directory and do not have Apache Hadoop installed already:

1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and `mkdir hadoop`.

2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).

3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.

4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.

5. Run the SparkR unit tests with the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:

```
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
```
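Steps 3 and 4 above can be sanity-checked before launching the tests. The helper below is a sketch (the function name `check_hadoop_home` is mine, not part of the Spark scripts), runnable in a POSIX shell such as Git Bash on Windows:

```shell
#!/bin/sh
# check_hadoop_home: verify that the directory named by HADOOP_HOME
# contains the winutils binaries the SparkR tests need on Windows.
# Returns 0 when both files are present, 1 otherwise.
check_hadoop_home() {
  dir="${1:-$HADOOP_HOME}"
  for f in winutils.exe hadoop.dll; do
    if [ ! -f "$dir/bin/$f" ]; then
      echo "Missing $dir/bin/$f" >&2
      return 1
    fi
  done
  echo "HADOOP_HOME looks good: $dir"
}
```

If this reports a missing file, revisit step 3 before setting `HADOOP_HOME`.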

52 changes: 52 additions & 0 deletions R/check-cran.sh
@@ -0,0 +1,52 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -o pipefail
set -e

FWDIR="$(cd "$(dirname "$0")" && pwd)"
pushd "$FWDIR" > /dev/null

if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
# if system wide R_HOME is not found, then exit
if ! command -v R > /dev/null; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
"$FWDIR/create-docs.sh"

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build "$FWDIR/pkg"

# Run check as-cran.
# TODO(shivaram): Remove the skip tests once we figure out the install mechanism

VERSION=$(grep Version "$FWDIR/pkg/DESCRIPTION" | awk '{print $NF}')

"$R_SCRIPT_PATH/"R CMD check --as-cran --no-tests SparkR_"$VERSION".tar.gz

popd > /dev/null
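The `VERSION` line in check-cran.sh relies on a grep/awk pipeline over the package `DESCRIPTION` file. A minimal standalone sketch of the same extraction (the file contents below are a reduced example, not the full SparkR DESCRIPTION):

```shell
#!/bin/sh
# Extract the Version field from an R package DESCRIPTION file,
# mirroring the pipeline used in check-cran.sh.
desc=$(mktemp)
cat > "$desc" <<'EOF'
Package: SparkR
Version: 2.0.0
License: Apache License (== 2.0)
EOF

# awk '{print $NF}' keeps the last whitespace-separated field,
# i.e. the version number after the "Version:" label.
VERSION=$(grep Version "$desc" | awk '{print $NF}')
echo "$VERSION"
```

This yields the version string used to name the `SparkR_<version>.tar.gz` artifact that `R CMD check` consumes.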
7 changes: 6 additions & 1 deletion R/install-dev.sh
@@ -38,7 +38,12 @@ pushd $FWDIR > /dev/null
if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
# if system wide R_HOME is not found, then exit
if ! command -v R > /dev/null; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
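The R-lookup fallback this diff adds to install-dev.sh (and check-cran.sh above) can be sketched as a standalone helper. This is an illustrative sketch: the function name `resolve_r_bin` is mine, not part of the Spark scripts.

```shell
#!/bin/sh
# resolve_r_bin: locate the directory holding the R executable,
# preferring an explicit R_HOME and falling back to the PATH,
# mirroring the logic in install-dev.sh / check-cran.sh.
resolve_r_bin() {
  if [ -n "$R_HOME" ]; then
    echo "$R_HOME/bin"
  elif command -v R > /dev/null 2>&1; then
    dirname "$(command -v R)"
  else
    echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed." >&2
    return 1
  fi
}
```

Factoring the lookup this way keeps the error message and exit behavior identical across both scripts.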
5 changes: 5 additions & 0 deletions R/pkg/.Rbuildignore
@@ -0,0 +1,5 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^src-native$
^html$
8 changes: 3 additions & 5 deletions R/pkg/DESCRIPTION
@@ -1,20 +1,18 @@
Package: SparkR
Type: Package
Title: R frontend for Spark
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2013-09-09
Date: 2016-07-07
Author: The Apache Software Foundation
Maintainer: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Imports:
methods
Depends:
R (>= 3.0),
methods,
Suggests:
testthat,
e1071,
survival
Description: R frontend for Spark
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
'schema.R'
36 changes: 31 additions & 5 deletions R/pkg/NAMESPACE
@@ -6,10 +6,16 @@ importFrom(methods, setGeneric, setMethod, setOldClass)
#useDynLib(SparkR, stringHashCode)

# S3 methods exported
export("sparkR.session")
export("sparkR.init")
export("sparkR.stop")
export("sparkR.session.stop")
export("sparkR.conf")
export("print.jobj")

export("sparkRSQL.init",
"sparkRHive.init")

# MLlib integration
exportMethods("glm",
"spark.glm",
@@ -45,6 +51,7 @@ exportMethods("arrange",
"corr",
"covar_samp",
"covar_pop",
"createOrReplaceTempView",
"crosstab",
"dapply",
"dapplyCollect",
@@ -61,6 +68,8 @@ exportMethods("arrange",
"filter",
"first",
"freqItems",
"gapply",
"gapplyCollect",
"group_by",
"groupBy",
"head",
@@ -79,6 +88,7 @@ exportMethods("arrange",
"orderBy",
"persist",
"printSchema",
"randomSplit",
"rbind",
"registerTempTable",
"rename",
@@ -99,6 +109,7 @@ exportMethods("arrange",
"summary",
"take",
"transform",
"union",
"unionAll",
"unique",
"unpersist",
@@ -109,6 +120,7 @@ exportMethods("arrange",
"write.df",
"write.jdbc",
"write.json",
"write.orc",
"write.parquet",
"write.text",
"write.ml")
@@ -185,6 +197,8 @@ exportMethods("%in%",
"isNaN",
"isNotNull",
"isNull",
"is.nan",
"isnan",
"kurtosis",
"lag",
"last",
@@ -208,6 +222,7 @@ exportMethods("%in%",
"mean",
"min",
"minute",
"monotonically_increasing_id",
"month",
"months_between",
"n",
@@ -220,6 +235,7 @@ exportMethods("%in%",
"over",
"percent_rank",
"pmod",
"posexplode",
"quarter",
"rand",
"randn",
@@ -248,6 +264,7 @@ exportMethods("%in%",
"skewness",
"sort_array",
"soundex",
"spark_partition_id",
"stddev",
"stddev_pop",
"stddev_samp",
@@ -281,22 +298,22 @@ exportMethods("%in%",

exportClasses("GroupedData")
exportMethods("agg")

export("sparkRSQL.init",
"sparkRHive.init")
exportMethods("pivot")

export("as.DataFrame",
"cacheTable",
"clearCache",
"createDataFrame",
"createExternalTable",
"dropTempTable",
"dropTempView",
"jsonFile",
"loadDF",
"parquetFile",
"read.df",
"read.jdbc",
"read.json",
"read.orc",
"read.parquet",
"read.text",
"spark.lapply",
@@ -324,5 +341,14 @@ export("partitionBy",
"rowsBetween",
"rangeBetween")

export("window.partitionBy",
"window.orderBy")
export("windowPartitionBy",
"windowOrderBy")

S3method(print, jobj)
S3method(print, structField)
S3method(print, structType)
S3method(print, summary.GeneralizedLinearRegressionModel)
S3method(structField, character)
S3method(structField, jobj)
S3method(structType, jobj)
S3method(structType, structField)