Configuration
Ondřej Moravčík edited this page Jun 12, 2015
There are several ways to configure ruby-spark and Spark. The configuration can only be changed before the context is created. After that, it is read-only.
Using an environment variable:

```
SPARK_RUBY_SERIALIZER="oj" ruby-spark shell
```
Using a properties file, for example `conf.conf` with this content:

```
# This is just a comment
spark.ruby.serializer oj
spark.ruby.serializer.batch_size 4096
spark.ruby.executor.options -W1
```

The file can be loaded from the shell or from Ruby:

```
# For shell
ruby-spark shell --properties-file conf.conf
# In ruby
Spark.config.from_file(FILE_PATH)
```
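To make the file format concrete, here is a minimal sketch that splits such a file into key/value pairs: lines starting with `#` are comments, and each other line is a key followed by whitespace and a value. The helper `parse_properties` is hypothetical and is not ruby-spark's own parser; it only illustrates how the example file above breaks down.

```ruby
# Hypothetical helper, not part of ruby-spark: parses the
# "key value" properties format shown above into a Hash of strings.
def parse_properties(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    # Skip blank lines and comments starting with '#'
    next if line.empty? || line.start_with?('#')

    # The key is the first token; the rest of the line is the value
    key, value = line.split(/\s+/, 2)
    props[key] = value.to_s
  end
end

conf = <<~CONF
  # This is just a comment
  spark.ruby.serializer oj
  spark.ruby.serializer.batch_size 4096
  spark.ruby.executor.options -W1
CONF

props = parse_properties(conf)
puts props['spark.ruby.serializer']            # prints "oj"
puts props['spark.ruby.serializer.batch_size'] # prints "4096"
```

All values come back as strings here; any conversion (for example of `batch_size` to an integer) would be up to the consumer.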
The file `~/.ruby-spark.conf` contains the default configuration and is always loaded. It is created automatically during the first run and holds, for example, the path to the target folder.
Configuring from Ruby code must be done before the context is started:

```ruby
Spark.config do
  set_app_name 'RubySpark'
  set_master 'local[*]'
  set 'spark.ruby.serializer', 'oj'
  set 'spark.ruby.serializer.batch_size', 100
end

# Or set a single option directly
Spark.config.set('spark.ruby.serializer.batch_size', 100)
```
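The block form and the read-only rule can be illustrated with a small self-contained mock. `SketchConfig`, `configure`, `get` and `start!` are hypothetical names, not ruby-spark's API, and the `instance_eval`-based DSL is an assumption about how such configuration blocks commonly work in Ruby, not a description of ruby-spark's internals. The keys `spark.app.name` and `spark.master` are the standard Spark keys, assumed here as targets for `set_app_name` and `set_master`.

```ruby
# Hypothetical mock, not ruby-spark's implementation: a config object
# with a block DSL that becomes read-only once the "context" starts.
class SketchConfig
  def initialize
    @settings = {}
    @started = false
  end

  # The block is evaluated with this object as self, so bare calls
  # such as `set 'key', value` inside the block reach these methods.
  def configure(&block)
    instance_eval(&block) if block
    self
  end

  def set(key, value)
    raise 'Configuration is read-only once the context is started' if @started
    # Values are stored as strings, as they would be in a properties file
    @settings[key.to_s] = value.to_s
    self
  end

  def set_app_name(name)
    set('spark.app.name', name)   # assumed key
  end

  def set_master(master)
    set('spark.master', master)   # assumed key
  end

  def get(key)
    @settings[key.to_s]
  end

  # Stands in for creating the context; afterwards nothing can change
  def start!
    @started = true
    @settings.freeze
    self
  end
end

config = SketchConfig.new
config.configure do
  set_app_name 'RubySpark'
  set_master 'local[*]'
  set 'spark.ruby.serializer', 'oj'
end
config.set('spark.ruby.serializer.batch_size', 100)
config.start!

begin
  config.set('spark.ruby.serializer', 'marshal')
rescue RuntimeError => e
  puts e.message  # prints "Configuration is read-only once the context is started"
end
puts config.get('spark.ruby.serializer')  # prints "oj"
```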
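A serializer can also be specified for a single RDD when it is created (here `1..10` split into 3 partitions, with `serializer` being the serializer to use):

```ruby
sc.parallelize(1..10, 3, serializer)
```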
Available options:

| Key | Default value | Description |
|---|---|---|
| `spark.ruby.serializer` | `marshal` | Default serializer. `marshal`: Ruby's default (slowest, but can serialize everything); `oj`: faster than marshal, but does not work on JRuby; `message_pack`: fastest, but cannot serialize large numbers and some objects. |
| `spark.ruby.serializer.batch_size` | `1024` | Number of items that are serialized and sent as one item. Use `auto` to have the size calculated automatically. |
| `spark.ruby.serializer.compress` | `false` | Compress serialized bytes. |
| `spark.ruby.worker.type` | `process` | Type of workers. `process`: new workers are created with fork; `thread`: each worker is represented as a thread. |
| `spark.ruby.executor.command` | `%s` | Command template for Ruby script execution, useful if you are using a Ruby version manager. The template must contain `%s`, which represents the original ruby command. Example for rbenv: `bash --norc -i -c "export HOME=/home/user; cd; source .bashrc; %s"` |
| `spark.ruby.executor.options` | | Ruby options for scripts (for example `-W1`). |
| `spark.ruby.executor.env.[VARIABLE]` | | Environment variables for Ruby scripts. |
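Two of the options in the table can be illustrated without ruby-spark itself. The sketch below assumes that batching means grouping consecutive items into slices of `batch_size` and serializing each slice as one unit with Ruby's `Marshal` (matching the default `marshal` serializer), and that the `%s` in `spark.ruby.executor.command` is a plain string substitution. Using `format` for that substitution, and the command `ruby worker.rb`, are hypothetical choices for illustration, not ruby-spark's actual code.

```ruby
# spark.ruby.serializer.batch_size (sketch): group items into batches
# and serialize each batch as a single unit with Marshal.
def serialize_in_batches(items, batch_size)
  items.each_slice(batch_size).map { |batch| Marshal.dump(batch) }
end

chunks = serialize_in_batches((1..10).to_a, 4)
puts chunks.size                        # prints 3 (4 + 4 + 2 items)
puts Marshal.load(chunks.last).inspect  # prints [9, 10]

# spark.ruby.executor.command (sketch): '%s' in the template is replaced
# by the original ruby command; format does printf-style substitution.
template = 'bash --norc -i -c "export HOME=/home/user; cd; source .bashrc; %s"'
ruby_command = 'ruby worker.rb' # hypothetical command
puts format(template, ruby_command)
# prints: bash --norc -i -c "export HOME=/home/user; cd; source .bashrc; ruby worker.rb"
```

Larger batches mean fewer serialized units to send, at the cost of each unit holding more items in memory at once.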