Closed
Changes from all commits
743 commits
647c686
SPARK-2757 [BUILD] [STREAMING] Add Mima test for Spark Sink after 1.1…
srowen Jan 1, 2015
9520a3b
[SPARK-5038] Add explicit return type for implicit functions.
rxin Jan 1, 2015
1e06a41
[HOTFIX] Bind web UI to ephemeral port in DriverSuite
JoshRosen Jan 1, 2015
53722b9
[SPARK-3325][Streaming] Add a parameter to the method print in class …
watermen Jan 2, 2015
cd4e844
Fixed typos in streaming-kafka-integration.md
Jan 2, 2015
2db89cf
[SPARK-5058] Updated broken links
sigmoidanalytics Jan 4, 2015
37a5415
[SPARK-794][Core] Remove sleep() in ClusterScheduler.stop
Jan 4, 2015
6cf67a7
[SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs
tigerquoll Jan 4, 2015
37d7d5c
[SPARK-4631] unit test for MQTT
Jan 5, 2015
3fc9497
[SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs
JoshRosen Jan 5, 2015
3dbf4f2
[SPARK-5067][Core] Use '===' to compare well-defined case class
zsxwing Jan 5, 2015
5583c3b
[SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagSch…
zsxwing Jan 5, 2015
ed6dc94
[SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite
zsxwing Jan 5, 2015
136141c
[SPARK-5074][Core] Fix a non-deterministic test failure
zsxwing Jan 5, 2015
2bcf38f
[SPARK-4688] Have a single shared network timeout in Spark
varunsaxena Jan 5, 2015
618d9d5
[SPARK-5057] Log message in failed askWithReply attempts
WangTaoTheTonic Jan 5, 2015
a81b624
[SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos enviro…
jongyoul Jan 5, 2015
dda9d6f
[SPARK-5089][PYSPARK][MLLIB] Fix vector convert
freeman-lab Jan 5, 2015
2ce91a3
[SPARK-5093] Set spark.network.timeout to 120s consistently.
rxin Jan 5, 2015
dbd12d5
[SPARK-5040][SQL] Support expressing unresolved attributes using $"at…
rxin Jan 5, 2015
6ef3660
SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
Jan 6, 2015
a305931
[SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() …
JoshRosen Jan 6, 2015
8373706
[Minor] Fix comments for GraphX 2D partitioning strategy
Jan 6, 2015
b491f13
SPARK-4159 [CORE] Maven build doesn't run JUnit test suites
srowen Jan 6, 2015
8c1a588
SPARK-5017 [MLlib] - Use SVD to compute determinant and inverse of co…
tgaloppo Jan 6, 2015
e8f7484
[SPARK-5050][Mllib] Add unit test for sqdist
viirya Jan 6, 2015
977cc31
[SPARK-5099][Mllib] Simplify logistic loss function
viirya Jan 7, 2015
718750c
[YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA
SaintBacchus Jan 7, 2015
84182f0
[SPARK-2165][YARN]add support for setting maxAppAttempts in the Appli…
WangTaoTheTonic Jan 7, 2015
0e703eb
[SPARK-2458] Make failed application log visible on History Server
tsudukim Jan 7, 2015
d1e87b3
[SPARK-5128][MLLib] Add common used log1pExp API in MLUtils
Jan 7, 2015
60fde12
[SPARK-5132][Core]Correct stage Attempt Id key in stageInfofromJson
suyanNone Jan 7, 2015
65c9e10
[SPARK-5126][Core] Verify Spark urls before creating Actors so that i…
zsxwing Jan 8, 2015
536b82f
[SPARK-5116][MLlib] Add extractor for SparseVector and DenseVector
coderxiang Jan 8, 2015
0114e81
SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
sryza Jan 8, 2015
46dca8c
[SPARK-4917] Add a function to convert into a graph with canonical ed…
maropu Jan 8, 2015
60b9227
[SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in st…
liyezhang556520 Jan 8, 2015
a9940b5
[Minor] Fix the value represented by spark.executor.id for consistency.
sarutak Jan 8, 2015
b4fb97d
[SPARK-5130][Deploy]Take yarn-cluster as cluster mode in spark-submit
WangTaoTheTonic Jan 8, 2015
31d6715
Document that groupByKey will OOM for large keys
Jan 8, 2015
854319e
SPARK-5148 [MLlib] Make usersOut/productsOut storagelevel in ALS conf…
zeitos Jan 8, 2015
d9cad94
[SPARK-4973][CORE] Local directory in the driver of client-mode conti…
sarutak Jan 8, 2015
b14068b
[SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling t…
rnowling Jan 8, 2015
5a1b7a9
[SPARK-4048] Enhance and extend hadoop-provided profile.
Jan 9, 2015
013e031
[SPARK-5122] Remove Shark from spark-ec2
nchammas Jan 9, 2015
8a95a3e
[SPARK-5169][YARN]fetch the correct max attempts
WangTaoTheTonic Jan 9, 2015
82f1259
[Minor] Fix test RetryingBlockFetcherSuite after changed config name
aarondav Jan 9, 2015
2f2b837
SPARK-5136 [DOCS] Improve documentation around setting up Spark Intel…
srowen Jan 9, 2015
37fea2d
HOTFIX: Minor improvements to make-distribution.sh
pwendell Jan 9, 2015
0a3aa5f
[SPARK-1143] Separate pool tests into their own suite.
kayousterhout Jan 9, 2015
d2a450c
[SPARK-5145][Mllib] Add BLAS.dsyr and use it in GaussianMixtureEM
viirya Jan 9, 2015
831a0d2
[SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688
jongyoul Jan 9, 2015
40d8a94
[SPARK-5015] [mllib] Random seed for GMM + make test suite deterministic
jkbradley Jan 9, 2015
7884948
[SPARK-1953][YARN]yarn client mode Application Master memory size is …
WangTaoTheTonic Jan 9, 2015
a4f1946
[SPARK-4737] Task set manager properly handles serialization errors
mccheah Jan 9, 2015
30f7f17
[DOC] Fixed Mesos version in doc from 0.18.1 to 0.21.0
sarutak Jan 9, 2015
a675d98
[Minor] Fix import order and other coding style
Jan 9, 2015
37a27b4
[SPARK-4990][Deploy]to find default properties file, search SPARK_CON…
WangTaoTheTonic Jan 10, 2015
0a9c325
[SPARK-4406] [MLib] FIX: Validate k in SVD
MechCoder Jan 10, 2015
29534b6
[SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableExc…
luogankun Jan 10, 2015
5d2bb0f
[SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
alexoss68 Jan 10, 2015
cf5686b
[SPARK-4943][SQL] Allow table name having dot for db/catalog
alexoss68 Jan 10, 2015
37a7955
[SPARK-4574][SQL] Adding support for defining schema in foreign DDL c…
scwf Jan 10, 2015
94b489f
[SPARK-4861][SQL] Refactory command in spark sql
scwf Jan 10, 2015
447f643
SPARK-4963 [SQL] Add copy to SQL's Sample operator
Jan 10, 2015
63729e1
[SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE cl…
marmbrus Jan 10, 2015
6687ee8
[SPARK-4692] [SQL] Support ! boolean logic operator like NOT
YanTangZhai Jan 10, 2015
dbbd5f5
[SPARK-5181] do not print writing WAL log when WAL is disabled
CodingCat Jan 10, 2015
04da703
[Minor]Resolve sbt warnings during build (MQTTStreamSuite.scala).
witgo Jan 10, 2015
c9b4a7d
[SPARK-4871][SQL] Show sql statement in spark ui when run sql with sp…
scwf Jan 11, 2015
afaab62
[SPARK-5029][SQL] Enable from follow multiple brackets
scwf Jan 11, 2015
865988e
[SPARK-5032] [graphx] Remove GraphX MIMA exclude for 1.3
jkbradley Jan 11, 2015
c9f4166
[SPARK-5073] spark.storage.memoryMapThreshold have two default value
Lewuathe Jan 11, 2015
18823b9
[SPARK-4951][Core] Fix the issue that a busy executor may be killed
zsxwing Jan 12, 2015
29f6893
[SPARK-4033][Examples]Input of the SparkPi too big causes the emptio…
SaintBacchus Jan 12, 2015
f2f69fc
SPARK-5018 [MLlib] [WIP] Make MultivariateGaussian public
tgaloppo Jan 12, 2015
483114c
[SPARK-5200] Disable web UI in Hive ThriftServer tests
JoshRosen Jan 12, 2015
b0b8dd8
[SPARK-5102][Core]subclass of MapStatus needs to be registered with Kryo
lianhuiwang Jan 12, 2015
bb65b1b
SPARK-4159 [BUILD] Addendum: improve running of single test after ena…
srowen Jan 12, 2015
5b35b48
[SPARK-5078] Optionally read from SPARK_LOCAL_HOSTNAME
marmbrus Jan 12, 2015
c8723ed
SPARK-5172 [BUILD] spark-examples-***.jar shades a wrong Hadoop distr…
srowen Jan 12, 2015
a9a7910
[SPARK-4999][Streaming] Change storeInBlockManager to false by default
jerryshao Jan 12, 2015
22251f0
[SPARK-5049][SQL] Fix ordering of partition columns in ParquetTableScan
marmbrus Jan 12, 2015
1f45125
[SPARK-5138][SQL] Ensure schema can be inferred from a namedtuple
mulby Jan 13, 2015
6ddb11c
[SPARK-5006][Deploy]spark.port.maxRetries doesn't work
WangTaoTheTonic Jan 13, 2015
0b62aef
[SPARK-4697][YARN]System properties should override environment varia…
WangTaoTheTonic Jan 13, 2015
6b269b0
[SPARK-5131][Streaming][DOC]: There is a discrepancy in WAL implement…
uncleGen Jan 13, 2015
b883fc6
[SPARK-5223] [MLlib] [PySpark] fix MapConverter and ListConverter in …
Jan 13, 2015
b4130e8
[SPARK-4912][SQL] Persistent tables for the Spark SQL data sources api
yhuai Jan 13, 2015
adaee02
[SPARK-5168] Make SQLConf a field rather than mixin in SQLContext
rxin Jan 13, 2015
e804588
[SPARK-5123][SQL] Reconcile Java/Scala API for data types.
rxin Jan 14, 2015
d860ab4
[SPARK-5167][SQL] Move Row into sql package and make it usable for Java.
rxin Jan 14, 2015
f92e15e
[SPARK-5248] [SQL] move sql.types.decimal.Decimal to sql.types.Decimal
adrian-wang Jan 14, 2015
85a7b47
[SPARK-5211][SQL]Restore HiveMetastoreTypes.toDataType
yhuai Jan 14, 2015
5ef37e9
[SQL] some comments fix for GROUPING SETS
adrian-wang Jan 14, 2015
afaa960
[SPARK-2909] [MLlib] [PySpark] SparseVector in pyspark now supports i…
MechCoder Jan 14, 2015
3ebb743
[SPARK-5228][WebUI] Hide tables for "Active Jobs/Completed Jobs/Faile…
sarutak Jan 14, 2015
7d972df
[SPARK-4014] Add TaskContext.attemptNumber and deprecate TaskContext.…
JoshRosen Jan 14, 2015
4aeaa2b
[SPARK-5235] Make SQLConf Serializable
alexbaretta Jan 14, 2015
3d7ff1f
[SPARK-5234][ml]examples for ml don't have sparkContext.stop
Jan 14, 2015
efe78ab
[SPARK-5254][MLLIB] Update the user guide to position spark.ml better
mengxr Jan 15, 2015
2a5076d
[SPARK-5193][SQL] Tighten up SQLContext API
rxin Jan 15, 2015
721abc3
[SPARK-5254][MLLIB] remove developers section from spark.ml guide
mengxr Jan 15, 2015
e95fb7a
[SPARK-5193][SQL] Tighten up HiveContext API
rxin Jan 15, 2015
ab895e1
[SPARK-5224] [PySpark] improve performance of parallelize list/ndarray
Jan 15, 2015
f08c4f3
[SPARK-5274][SQL] Reconcile Java and Scala UDFRegistration.
rxin Jan 16, 2015
2340976
[Minor] Fix tiny typo in BlockManager
sarutak Jan 16, 2015
a88bb28
[SPARK-4857] [CORE] Adds Executor membership events to SparkListener
Jan 16, 2015
ce15195
[SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
Jan 16, 2015
eac9c7c
[SPARK-1507][YARN]specify # cores for ApplicationMaster
WangTaoTheTonic Jan 16, 2015
0eae5ec
[SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRD…
advancedxy Jan 16, 2015
51a4e1e
[DOCS] Fix typo in return type of cogroup
srowen Jan 16, 2015
98840b6
[SPARK-5231][WebUI] History Server shows wrong job submission time.
sarutak Jan 16, 2015
e130d82
[WebUI] Fix collapse of WebUI layout
sarutak Jan 16, 2015
b38ecc8
[SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing t…
Jan 16, 2015
be38374
[SPARK-733] Add documentation on use of accumulators in lazy transfor…
Jan 16, 2015
55ade8d
[SPARK-4937][SQL] Adding optimization to simplify the And, Or condit…
scwf Jan 16, 2015
29e3f73
[SPARK-5193][SQL] Remove Spark SQL Java-specific API.
rxin Jan 17, 2015
5329804
[SQL][minor] Improved Row documentation.
rxin Jan 17, 2015
02d799e
[SPARK-4937][SQL] Comment for the newly optimization rules in `Boolea…
scwf Jan 17, 2015
fb918f6
[SPARK-5096] Use sbt tasks instead of vals to get hadoop version
marmbrus Jan 18, 2015
f4f646d
[SQL][Minor] Added comments and examples to explain BooleanSimplifica…
rxin Jan 18, 2015
c229be5
[HOTFIX]: Minor clean up regarding skipped artifacts in build files.
pwendell Jan 18, 2015
e56d0a3
[SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type.
rxin Jan 18, 2015
cfe78ce
[SQL][Minor] Update sql doc according to data type APIs changes
scwf Jan 18, 2015
b2ee0a3
[SQL][minor] Put DataTypes.java in java dir.
rxin Jan 19, 2015
3b640c9
[SQL] fix typo in class description
Jan 19, 2015
fdf83c5
SPARK-5217 Spark UI should report pending stages during job execution…
ScrapCodes Jan 19, 2015
186d044
[SPARK-3288] All fields in TaskMetrics should be private and use gett…
Jan 19, 2015
51eed7e
[SPARK-5088] Use spark-class for running executors directly
jongyoul Jan 19, 2015
0d2eca2
[SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory…
hhbyyh Jan 19, 2015
13be6ee
[SPARK-5284][SQL] Insert into Hive throws NPE when a inner complex ty…
yhuai Jan 19, 2015
a9ed74c
[SPARK-5286][SQL] Fail to drop an invalid table when using the data s…
yhuai Jan 19, 2015
c5fca46
[SPARK-4504][Examples] fix run-example failure if multiple assembly j…
gvramana Jan 19, 2015
7dc4feb
[SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop
zsxwing Jan 20, 2015
cfdab68
SPARK-5270 [CORE] Provide isEmpty() function in RDD API
srowen Jan 20, 2015
80066e1
[SQL][minor] Add a log4j file for catalyst test.
rxin Jan 20, 2015
60ca928
[SPARK-4803] [streaming] Remove duplicate RegisterReceiver message
ilayaperumalg Jan 20, 2015
3e1b1de
[SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException
jongyoul Jan 20, 2015
ec4c11a
[SQL][Minor] Refactors deeply nested FP style code in BooleanSimplifi…
liancheng Jan 20, 2015
5ed73b8
SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3…
jacek-lewandowski Jan 20, 2015
bc22142
[SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext.
sarutak Jan 20, 2015
97b4e88
SPARK-5019 [MLlib] - GaussianMixtureModel exposes instances of Multiv…
tgaloppo Jan 20, 2015
d38ac49
[SPARK-5287][SQL] Add defaultSizeOf to every data type.
yhuai Jan 20, 2015
a3f50b7
[SPARK-5323][SQL] Remove Row's Seq inheritance.
rxin Jan 20, 2015
d65133a
[SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inef…
hhbyyh Jan 20, 2015
ae39101
[SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, …
sarutak Jan 21, 2015
3613b8c
[SPARK-5275] [Streaming] include python source code
Jan 21, 2015
4d94974
[HOTFIX] Update pom.xml to pull MapR's Hadoop version 2.4.1.
rkannan82 Jan 21, 2015
241fdf2
[SPARK-5297][Streaming] Fix Java file stream type erasure problem
jerryshao Jan 21, 2015
26c5be7
[SPARK-5336][YARN]spark.executor.cores must not be less than spark.ta…
WangTaoTheTonic Jan 21, 2015
9b0f51d
SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in Ya…
sryza Jan 21, 2015
ede0680
[MLlib] [SPARK-5301] Missing conversions and operations on IndexedRow…
Jan 21, 2015
2ef2e21
[SPARK-4749] [mllib]: Allow initializing KMeans clusters using a seed
str-janus Jan 21, 2015
5b72d30
[SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT gra…
Jan 21, 2015
7f04f9a
[SPARK-5244] [SQL] add coalesce() in sql parser
adrian-wang Jan 21, 2015
5e6aa8a
[SPARK-5009] [SQL] Long keyword support in SQL Parsers
chenghao-intel Jan 21, 2015
294ed8f
Revert "[SPARK-5244] [SQL] add coalesce() in sql parser"
JoshRosen Jan 21, 2015
979fe78
[SQL] [Minor] Remove deprecated parquet tests
liancheng Jan 21, 2015
b2cca4c
[SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job des…
scwf Jan 21, 2015
943055b
[SPARK-5355] make SparkConf thread-safe
Jan 22, 2015
c2179b0
[SPARK-5202] [SQL] Add hql variable substitution support
chenghao-intel Jan 22, 2015
08547ac
[SPARK-3424][MLLIB] cache point distances during k-means|| init
mengxr Jan 22, 2015
95f4a8b
[SPARK-5317]Set BoostingStrategy.defaultParams With Enumeration Algo.…
Peishen-Jia Jan 22, 2015
5b5a1ef
[SPARK-5147][Streaming] Delete the received data WAL log periodically
tdas Jan 22, 2015
2524228
[SPARK-5365][MLlib] Refactor KMeans to reduce redundant data
viirya Jan 22, 2015
9598ede
SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAll…
sryza Jan 22, 2015
917b7ae
[SPARK-5233][Streaming] Fix error replaying of WAL introduced bug
jerryshao Jan 23, 2015
24f2bdc
[SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug
jerryshao Jan 23, 2015
812e345
[SPARK-3541][MLLIB] New ALS implementation with improved storage
mengxr Jan 23, 2015
7781750
[SPARK-5063] More helpful error messages for several invalid operations
JoshRosen Jan 24, 2015
9a057b7
[SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a p…
maropu Jan 24, 2015
c759171
[SPARK-5058] Part 2. Typos and broken URL
jongyoul Jan 24, 2015
49889ea
[SPARK-5214][Test] Add a test to demonstrate EventLoop can be stopped…
zsxwing Jan 24, 2015
1952c91
Add comment about defaultMinPartitions
idanz Jan 25, 2015
6a2ba7d
Update the statsd parser to expect these new signs in the format, whi…
airhorns Jan 25, 2015
dfa8c42
Added support for HA namenode
angelini Feb 4, 2015
c6ac972
Merge pull request #48 from Shopify/support_ha_nn
angelini Feb 4, 2015
a970f6e
Merge pull request #47 from Shopify/orens_problematic_bump
orenmazor Feb 9, 2015
03aa82b
Revert "Merge pull request #47 from Shopify/orens_problematic_bump"
orenmazor Feb 9, 2015
0d30b5f
use release path for uploading jar, not current
orenmazor Feb 9, 2015
aba41d2
Merge pull request #49 from Shopify/use_releases_path
orenmazor Feb 9, 2015
f37de6c
Revert "Revert "Merge pull request #47 from Shopify/orens_problematic…
orenmazor Feb 9, 2015
b089305
Revert "Revert "Revert "Merge pull request #47 from Shopify/orens_pro…
orenmazor Feb 9, 2015
133bfed
build local spark if one does not already exist
orenmazor Feb 10, 2015
d6027fb
make the modified binary path match the existing schema for production
orenmazor Feb 10, 2015
15652c8
DRY up the local sha value
orenmazor Feb 10, 2015
558d3a1
Merge pull request #51 from Shopify/fix_test_spark_jar
orenmazor Feb 12, 2015
bd71ee3
Keep more INFO level spark logs around
airhorns Feb 20, 2015
2a9d540
Revert "Merge pull request #23 from Shopify/profile-refactor"
airhorns Feb 24, 2015
8095336
Merge remote-tracking branch 'apache/master'
airhorns Feb 24, 2015
48711b3
Update the spark-defaults.conf to be closer to Starscream
airhorns Feb 24, 2015
8cbf07a
deploy to reportify-etl4.chi
snormore Feb 24, 2015
fad7111
Fix CI detection in script/setup
airhorns Feb 24, 2015
b28f99b
Merge pull request #53 from Shopify/deploy-reportify-etl4
snormore Feb 24, 2015
17070be
Merge remote-tracking branch 'apache/master'
airhorns Feb 26, 2015
9eda8df
Merge remote-tracking branch 'apache/master'
airhorns Mar 5, 2015
6ca71c0
Reset cap file to not include any stuff for testing jars so it works
airhorns Mar 9, 2015
bd4e2c9
Rid ourselves of the doubledot statsd metrics
airhorns Mar 9, 2015
aa0b682
Merge remote-tracking branch 'apache/master'
airhorns Mar 9, 2015
0874b1c
Catch and properly scrub fenced driver statsd logs
airhorns Mar 10, 2015
14059a5
Merge remote-tracking branch 'apache/master'
angelini Mar 10, 2015
0ae47d3
Remove old broken nodes from Capfile
angelini Mar 10, 2015
f0e2c23
Remove platfora2 from Capfile
angelini Mar 10, 2015
0694e4b
Merge remote-tracking branch 'apache/master'
airhorns Mar 23, 2015
91dbbbd
Fix Statsd sink
yagnik Mar 26, 2015
566b871
Merge pull request #54 from Shopify/Statsd-executor-sink
yagnik Mar 27, 2015
7bf9920
Merge remote-tracking branch 'apache/master'
airhorns Mar 27, 2015
0efbd34
Update script/setup to use the built in mvn package so that zinc is u…
airhorns Mar 31, 2015
973ec0e
Update script/get config to replace the entire stanza for the topolog…
airhorns Mar 31, 2015
4bc6653
Merge remote-tracking branch 'apache/master'
airhorns Apr 1, 2015
5413b9c
add jdk1.6 repackage support for jdk1.7 build
zhzhan Apr 22, 2015
2e1638d
Compile and then repackage with java 6 on packserv
airhorns Apr 25, 2015
df4da60
Merge pull request #56 from Shopify/package_with_java_6
airhorns Apr 27, 2015
263178b
Merge remote-tracking branch 'apache/master'
airhorns Apr 27, 2015
3d7ca92
Make make-distribution.sh output verbose
airhorns Apr 27, 2015
78a7029
Revert "Merge remote-tracking branch 'apache/master'"
airhorns Apr 27, 2015
344df1d
Make packserv fix its perms issues until we figure out why they are b…
airhorns Apr 27, 2015
225e7be
Try removing hacky and wrong permissions after packserv land fix
airhorns Apr 28, 2015
cdd5f5b
Revert "Revert "Merge remote-tracking branch 'apache/master'""
airhorns May 4, 2015
bf0d0b9
Merge remote-tracking branch 'apache/master'
airhorns May 4, 2015
8c9f124
Bump to Hadoop 2.6.0 for CDH 5.4
airhorns May 14, 2015
c23e838
Merge remote-tracking branch 'apache/master'
airhorns May 20, 2015
90314d7
Merge remote-tracking branch 'apache/master' into packserv
airhorns May 20, 2015
a2179de
Merge remote-tracking branch 'apache/master' into packserv
airhorns May 28, 2015
fb7e39e
Make script/setup use the OS X Java 1.6 jar tool, and clean before it…
airhorns May 28, 2015
af7fa68
Move the shade plugin above the ant-run plugin for pyspark so that wh…
airhorns May 28, 2015
3ef03ea
Merge remote-tracking branch 'apache/master' into packserv
airhorns Jun 1, 2015
8d2dc5a
Don't set spark-defaults.conf any more so that Starscream and the lik…
airhorns Jun 1, 2015
b68a83b
Neuter spark-env.sh to not include the YARN conf from the cluster unl…
airhorns Jun 2, 2015
85db5cc
Do not shrink shuffle batch size
angelini Jun 9, 2015
044b478
Merge pull request #57 from Shopify/do_not_shrink_shuffle_batch_size
angelini Jun 9, 2015
0292ed3
Merge remote-tracking branch 'apache/master' into bump_june_16
airhorns Jun 16, 2015
dafce1b
Use the built in libexec helpers for OS X Java homes to find java 6
airhorns Jun 16, 2015
cfc39d6
Revert "Merge pull request #56 from Shopify/package_with_java_6"
airhorns Jun 19, 2015
c5586de
Stop repackaging with java 6 now that Pyspark makes it to yarn as it'…
airhorns Jun 19, 2015
fe80af6
Upload py4j and pyspark as zips into the sparkles directory for inclu…
airhorns Jun 19, 2015
0bc709d
Build new spark with Java 1.8
airhorns Jun 19, 2015
8d43a5f
Revert "Merge pull request #57 from Shopify/do_not_shrink_shuffle_bat…
airhorns Jun 23, 2015
ef52210
Merge remote-tracking branch 'apache/master' into bump_june_16
airhorns Jun 23, 2015
c4e7b2a
add hadoop-misc4 to spark deploy list
orenmazor Jul 7, 2015
f612ccd
Merge pull request #58 from Shopify/add_hadoop_misc_to_deploy_targets
orenmazor Jul 7, 2015
025df45
Merge branch 'master' of github.com:apache/spark
kevincox Jul 21, 2015
217263f
Merge branch 'master' of github.com:apache/spark
kevincox Jul 23, 2015
a6fb8aa
change file
Jul 28, 2015
afd862c
Merge branch 'master' of github.com:apache/spark into 2015-07-28
kevincox Jul 28, 2015
d6d7f2e
Merge pull request #59 from Shopify/2015-07-28
kevincox Jul 28, 2015
0f0e0ef
Upgrade pyrolite to 4.9
angelini Aug 4, 2015
9c18437
Merge pull request #60 from Shopify/upgrade_pyrolite
angelini Aug 5, 2015
385fceb
Merge branch 'negativeExecutor1' of github.com:KaiXinXiaoLei/spark in…
kevincox Aug 10, 2015
3 changes: 3 additions & 0 deletions .gitignore
@@ -62,6 +62,9 @@ unit-tests.log
ec2/lib/
rat-results.txt
scalastyle.txt
conf/spark-defaults.conf.bak
conf/*.conf
conf/conf.cloudera.yarn
scalastyle-output.xml
R-unit-tests.log
R/unit-tests.out
69 changes: 69 additions & 0 deletions Capfile
@@ -0,0 +1,69 @@
require 'bundler/setup'
require 'capistrano_recipes/deploy/packserv'

set :application, "spark"
set :user, "deploy"
set :shared_work_path, "/u/apps/spark/shared/work"
set :shared_logs_path, "/u/apps/spark/shared/log"
set :shared_conf_path, "/u/apps/spark/shared/conf"
set :spark_jar_path, "hdfs://hadoop-production/user/sparkles"
set :gateway, nil
set :keep_releases, 5
set :branch, fetch(:branch, `git symbolic-ref --short HEAD`.gsub("\s",""))

DATANODES = (2..48).map {|i| "dn%02d.chi.shopify.com" % i }
OTHERNODES = ["hadoop-etl1.chi.shopify.com", "hadoop-misc4.chi.shopify.com", "spark-etl1.chi.shopify.com", "reportify-etl4.chi.shopify.com"]
BROKEN = [] # Nodes that are down; don't try to send code to them

task :production do
  role :app, *(DATANODES + OTHERNODES - BROKEN)
  role :history, "hadoop-rm.chi.shopify.com"
  role :uploader, "spark-etl1.chi.shopify.com"
end

namespace :deploy do
  task :cleanup do
    count = fetch(:keep_releases, 5).to_i
    run "ls -1dt /u/apps/spark/releases/* | tail -n +#{count + 1} | xargs rm -rf"
  end

  task :upload_to_hdfs, :roles => :uploader, :on_no_matching_servers => :continue do
    run "hdfs dfs -copyFromLocal -f #{release_path}/lib/spark-assembly-*.jar #{fetch(:spark_jar_path)}/spark-assembly-#{fetch(:sha)}.jar"
    run "hdfs dfs -copyFromLocal -f #{release_path}/python/lib/pyspark.zip #{fetch(:spark_jar_path)}/pyspark-#{fetch(:sha)}.zip"
    run "hdfs dfs -copyFromLocal -f #{release_path}/python/lib/py4j-*.zip #{fetch(:spark_jar_path)}/py4j-#{fetch(:sha)}.zip"
  end

  task :prevent_gateway do
    set :gateway, nil
  end

  task :symlink_shared do
    run "ln -nfs #{shared_work_path} #{release_path}/work"
    run "ln -nfs #{shared_logs_path} #{release_path}/logs"
    run "rm -rf #{release_path}/conf && ln -nfs #{shared_conf_path} #{release_path}/conf"
  end

  task :remind_us_to_update_starscream do
    puts "****************************************************************"
    puts "*"
    puts "* Remember to update starscream/conf/config.yml"
    puts "*"
    puts "* spark_production"
    puts "* conf_options:"
    puts "* <<: *spark_remote"
    puts "* spark.yarn.jar: \"#{fetch(:spark_jar_path)}/spark-assembly-\033[31m#{fetch(:sha)}\033[0m.jar\""
    puts "*"
    puts "****************************************************************"
  end

  task :restart do
  end

  after 'deploy:initialize_variables', 'deploy:prevent_gateway' # capistrano recipes packserv deploy always uses a gateway
  before 'deploy:symlink_current', 'deploy:symlink_shared'
  before 'deploy:test_spark_jar', 'deploy:initialize_variables'
  before 'deploy:upload_to_hdfs', 'deploy:initialize_variables'
  after 'deploy:unpack', 'deploy:upload_to_hdfs'
  after 'deploy:restart', 'deploy:cleanup'
  after 'deploy:cleanup', 'deploy:remind_us_to_update_starscream'
end
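
For reference, a deploy driven by this Capfile would typically be kicked off as below — a sketch, assuming the stock Capistrano 2 CLI, where `-S branch=...` overrides the `branch` variable before the `git symbolic-ref` default kicks in:

    bundle exec cap production deploy
    bundle exec cap production deploy -S branch=my-feature-branch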
7 changes: 7 additions & 0 deletions Gemfile
@@ -0,0 +1,7 @@
# A sample Gemfile
source "https://rubygems.org"

group :deploy do
  gem 'capistrano', '~> 2'
  gem 'capistrano-recipes', git: "git@github.com:Shopify/capistrano-recipes", ref: '57bd4ed4accc5561d4774ec2f072bb71bd1b2ea7'
end
34 changes: 34 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,34 @@
GIT
  remote: git@github.com:Shopify/capistrano-recipes
  revision: 57bd4ed4accc5561d4774ec2f072bb71bd1b2ea7
  ref: 57bd4ed4accc5561d4774ec2f072bb71bd1b2ea7
  specs:
    capistrano-recipes (1.1.0)
      capistrano (~> 2.15.5)
      json (>= 1.8.1)

GEM
  remote: https://rubygems.org/
  specs:
    capistrano (2.15.5)
      highline
      net-scp (>= 1.0.0)
      net-sftp (>= 2.0.0)
      net-ssh (>= 2.0.14)
      net-ssh-gateway (>= 1.1.0)
    highline (1.6.21)
    json (1.8.1)
    net-scp (1.1.2)
      net-ssh (>= 2.6.5)
    net-sftp (2.1.2)
      net-ssh (>= 2.6.5)
    net-ssh (2.8.0)
    net-ssh-gateway (1.2.0)
      net-ssh (>= 2.6.5)

PLATFORMS
  ruby

DEPENDENCIES
  capistrano (~> 2)
  capistrano-recipes!
93 changes: 10 additions & 83 deletions README.md
@@ -1,14 +1,9 @@
# Apache Spark
# Shopify's Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, and Python, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.

<http://spark.apache.org/>
Spark is a fast and general cluster computing system for Big Data.

This is Shopify's clone, carrying Shopify-specific customizations, mostly
surrounding configuration.

## Online Documentation

@@ -17,82 +12,14 @@ guide, on the [project web page](http://spark.apache.org/documentation.html)
and [project wiki](https://cwiki.apache.org/confluence/display/SPARK).
This README file only contains basic setup instructions.

## Building Spark

Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

./bin/spark-shell

Try the following command, which should return 1000:

scala> sc.parallelize(1 to 1000).count()

## Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

./bin/pyspark

And run the following command, which should also return 1000:

>>> sc.parallelize(range(1000)).count()

## Example Programs

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

./bin/run-example SparkPi

will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
"yarn-cluster" or "yarn-client" to run on YARN, and "local" to run
locally with one thread, or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the `examples`
package. For instance:

MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.

## Running Tests

Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools).
## Building Shopify Spark

## A Note About Hadoop Versions
You can build Shopify Spark using `script/setup`, or continuously and incrementally using `script/watch`.

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
## Testing Shopify Spark

Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions. See also
["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
for guidance on building a Spark application that works with a particular
distribution.
To test a Shopify Spark build, assemble the Spark jar with `script/setup` or Maven, then unset the `spark.yarn.jar` property in `defaults.conf` or in the configuration of the application you are using. Spark will then upload your local assembly to your YARN application's staging directory; no deploy is involved.
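
For example, with `spark.yarn.jar` unset, a hypothetical test run against YARN looks like this (`yarn-client` and the bundled Pi example are standard Spark; the rest of your application's config is assumed):

    ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py 10

With no preassembled jar configured, Spark should ship the locally built assembly to the application's staging directory on its own.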

## Configuration
## Deploying Shopify Spark

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
The cap deploy script is only for deploying Shopify Spark to production. To deploy, execute `bundle exec cap production deploy`.
1 change: 1 addition & 0 deletions SHOPIFY_HADOOP_OPTIONS
@@ -0,0 +1 @@
-Phadoop-2.4 -Dhadoop.version=2.6.0 -Pyarn -Phive
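
These flags are presumably consumed by `script/setup`; a minimal manual build using the same options might look like:

    build/mvn $(cat SHOPIFY_HADOOP_OPTIONS) -DskipTests clean package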
42 changes: 21 additions & 21 deletions assembly/pom.xml
@@ -92,27 +92,6 @@
          <skip>true</skip>
        </configuration>
      </plugin>
      <!-- zip pyspark archives to run python application on yarn mode -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <target>
            <delete dir="${basedir}/../python/lib/pyspark.zip"/>
            <zip destfile="${basedir}/../python/lib/pyspark.zip">
              <fileset dir="${basedir}/../python/" includes="pyspark/**/*"/>
            </zip>
          </target>
        </configuration>
      </plugin>
      <!-- Use the shade plugin to create a big JAR with all the dependencies -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
@@ -162,6 +141,27 @@
          </execution>
        </executions>
      </plugin>
      <!-- zip pyspark archives to run python application on yarn mode -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <target>
            <delete dir="${basedir}/../python/lib/pyspark.zip"/>
            <zip destfile="${basedir}/../python/lib/pyspark.zip">
              <fileset dir="${basedir}/../python/" includes="pyspark/**/*"/>
            </zip>
          </target>
        </configuration>
      </plugin>
    </plugins>
  </build>

1 change: 1 addition & 0 deletions conf/java-opts
@@ -0,0 +1 @@
-Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Djava.security.krb5.conf=/dev/null
22 changes: 22 additions & 0 deletions conf/log4j.properties
@@ -0,0 +1,22 @@
# Root logger at INFO, routed to both the console and file appenders
log4j.rootCategory=INFO, console, file

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Threshold=WARN
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings for the file appender, which captures more verbose output
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/tmp/spark.log
log4j.appender.file.MaxFileSize=20MB
log4j.appender.file.Threshold=INFO
log4j.appender.file.MaxBackupIndex=1
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
3 changes: 3 additions & 0 deletions conf/spark-defaults.conf
@@ -0,0 +1,3 @@
# Shopify doesn't use defaults here and instead lets all the clients specify their own set of defaults.
# This way, each client can set defaults appropriate to it, as well as change those defaults based on the environment.
# Clients also then don't have to care about a set of overridden values that differs from the defaults listed in the docs.
31 changes: 31 additions & 0 deletions conf/spark-env.sh
@@ -0,0 +1,31 @@
#!/usr/bin/env bash

echoerr() { echo "$@" 1>&2; }
FWDIR="$(cd `dirname $0`/..; pwd)"

if [ "$(uname)" == "Darwin" ]; then
  case "$PYTHON_ENV" in
    'remote_development')
      echoerr "Sparkify: Connecting to chicago spark cluster ..."
      # Figure out the local IP to bind spark to for shell <-> master communication
      vpn_interface=tap0
      get_ip_command="ifconfig $vpn_interface 2>&1 | grep 'inet' | awk '{print \$2}'"
      if ifconfig $vpn_interface > /dev/null 2>&1; then
        export SPARK_LOCAL_IP=`bash -c "$get_ip_command"`
      else
        echoerr "ERROR: could not find a VPN interface to connect to the Shopify Spark Cluster! Please connect your VPN client! See https://vault-unicorn.shopify.com/VPN---Servers ."
        exit 1
      fi

      export HADOOP_CONF_DIR=$FWDIR/conf/conf.cloudera.yarn
      ;;
    'test'|'development')
      export SPARK_LOCAL_IP=127.0.0.1
      ;;
  esac
fi

if which ipython > /dev/null; then
  export IPYTHON=1
fi
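
A hypothetical way to exercise the remote-development branch of this script on OS X (assuming the calling environment exports `PYTHON_ENV`, as the case statement above implies):

    PYTHON_ENV=remote_development ./bin/pyspark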
2 changes: 1 addition & 1 deletion core/pom.xml
@@ -380,7 +380,7 @@
    <dependency>
      <groupId>net.razorvine</groupId>
      <artifactId>pyrolite</artifactId>
      <version>4.4</version>
      <version>4.9</version>
      <exclusions>
        <exclusion>
          <groupId>net.razorvine</groupId>