Merge branch 'master' into add-slave
* master:
  0.6.0 dev begins
  add some minor steps
  update standalone version in example
  this is 0.5.0
  upgrade dependencies (nchammas#128)
  use latest Amazon Linux AMI
  rephrase note about future Windows support
  remove note about squashing PR commits
  up default Spark version to 1.6.2
  add CHANGES for spark download source and additional security groups
  rename some internals related to security groups
  Resolve nchammas#72 add --ec2-security-group flag support (nchammas#112)
  added HADOOP_LIBEXEC_DIR env var (nchammas#127)
  Add option to download Spark from a custom URL (nchammas#125)
  add custom Hadoop URL change; reformat Markdown links
Soyeon Baek committed Jul 22, 2016
2 parents 648e2cf + 1e15d1a commit a311203
Showing 15 changed files with 205 additions and 65 deletions.
125 changes: 102 additions & 23 deletions CHANGES.md
@@ -1,52 +1,131 @@
# Change Log

## [Unreleased]

## [Unreleased](https://github.com/nchammas/flintrock/compare/v0.4.0...master)
Nothing notable yet.

[Unreleased]: https://github.com/nchammas/flintrock/compare/v0.5.0...master

## [0.5.0] - 2016-07-20

[0.5.0]: https://github.com/nchammas/flintrock/compare/v0.4.0...v0.5.0

### Added

* [#118]: You can now specify `--hdfs-download-source` (or the
equivalent in your config file) to tell Flintrock to download Hadoop
from a specific URL when launching your cluster.
* [#125]: You can now specify `--spark-download-source` (or the
equivalent in your config file) to tell Flintrock to download Spark
from a specific URL when launching your cluster.
* [#112]: You can now specify `--ec2-security-group` to associate
additional security groups with your cluster on launch.

[#118]: https://github.com/nchammas/flintrock/pull/118
[#125]: https://github.com/nchammas/flintrock/pull/125
[#112]: https://github.com/nchammas/flintrock/pull/112
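
For illustration, a launch that combines the three new options might look
like the sketch below. The download URLs and security group names are
placeholders (each URL must contain a `{v}` version template, as the updated
`config.yaml.template` later in this diff notes), and the remaining flags
follow the README example.

```sh
# Hypothetical values throughout; substitute your own URLs, group names,
# key name, and identity file.
flintrock launch test-cluster \
    --num-slaves 1 \
    --spark-version 1.6.2 \
    --spark-download-source "https://www.example.com/files/spark/{v}/spark-{v}.tar.gz" \
    --hdfs-download-source "https://www.example.com/files/hadoop/{v}/hadoop-{v}.tar.gz" \
    --ec2-security-group group-name1 \
    --ec2-security-group group-name2 \
    --ec2-key-name key_name \
    --ec2-identity-file /path/to/key.pem
```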

### Changed

* [#103](https://github.com/nchammas/flintrock/pull/103): Flintrock now opens port 7077 so local
clients like Apache Zeppelin can connect directly to the Spark master on the cluster.
* [#103], [#114]: Flintrock now opens ports 6066 and 7077 so local
clients like Apache Zeppelin can connect directly to the Spark
master on the cluster.
* [#122]: Flintrock now automatically adds executables like
`spark-submit`, `pyspark`, and `hdfs` to the default `PATH`, so
they're available to call right when you log in to the cluster.

[#103]: https://github.com/nchammas/flintrock/pull/103
[#114]: https://github.com/nchammas/flintrock/pull/114
[#122]: https://github.com/nchammas/flintrock/pull/122
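
To see what these changes mean in practice, here is a hedged sketch. The
master hostname is a placeholder for your cluster's real public DNS name,
and `flintrock login` is mentioned only as one way to get a shell on the
master.

```sh
# From your local machine: talk to the Spark standalone master directly
# over the newly opened port 7077 (6066 is the REST submission port).
spark-shell --master spark://ec2-203-0-113-10.compute-1.amazonaws.com:7077

# On the cluster itself (for example after `flintrock login test-cluster`),
# the executables added to the default PATH are immediately available:
spark-submit --version
hdfs version
```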

## [0.4.0](https://github.com/nchammas/flintrock/compare/v0.3.0...v0.4.0) - 2016-03-27
## [0.4.0] - 2016-03-27

[0.4.0]: https://github.com/nchammas/flintrock/compare/v0.3.0...v0.4.0

### Added

* [#98](https://github.com/nchammas/flintrock/pull/98), [#99](https://github.com/nchammas/flintrock/pull/99): You can now specify `latest` for `--spark-git-commit` and Flintrock will automatically build Spark on your cluster at the latest commit. This feature is only available for Spark repos hosted on GitHub.
* [#94](https://github.com/nchammas/flintrock/pull/94): Flintrock now supports launching clusters into non-default VPCs.
* [#98], [#99]: You can now specify `latest` for `--spark-git-commit`
and Flintrock will automatically build Spark on your cluster at the
latest commit. This feature is only available for Spark repos
hosted on GitHub.
* [#94]: Flintrock now supports launching clusters into non-default
VPCs.

[#94]: https://github.com/nchammas/flintrock/pull/94
[#98]: https://github.com/nchammas/flintrock/pull/98
[#99]: https://github.com/nchammas/flintrock/pull/99
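
As an example, a launch that exercises both additions might look like this;
the VPC and subnet IDs are placeholders, and `latest` only works for Spark
repositories hosted on GitHub.

```sh
# Hypothetical IDs; substitute your own VPC, subnet, key name, and key file.
flintrock launch dev-cluster \
    --num-slaves 1 \
    --spark-git-commit latest \
    --ec2-vpc-id vpc-1a2b3c4d \
    --ec2-subnet-id subnet-1a2b3c4d \
    --ec2-key-name key_name \
    --ec2-identity-file /path/to/key.pem
```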

### Changed

* [#86](https://github.com/nchammas/flintrock/pull/86): Flintrock now correctly catches when spot requests fail and bubbles up an appropriate error message.
* [#93](https://github.com/nchammas/flintrock/pull/93), [#97](https://github.com/nchammas/flintrock/pull/97): Fixed the ability to build Spark from git. (It was broken for recent commits.)
* [#96](https://github.com/nchammas/flintrock/pull/96), [#100](https://github.com/nchammas/flintrock/pull/100): Flintrock launches should now work correctly whether the default Python on the cluster is Python 2.7 or Python 3.4+.
* [#86]: Flintrock now correctly catches when spot requests fail and
bubbles up an appropriate error message.
* [#93], [#97]: Fixed the ability to build Spark from git. (It was
broken for recent commits.)
* [#96], [#100]: Flintrock launches should now work correctly whether
the default Python on the cluster is Python 2.7 or Python 3.4+.

[#86]: https://github.com/nchammas/flintrock/pull/86
[#93]: https://github.com/nchammas/flintrock/pull/93
[#96]: https://github.com/nchammas/flintrock/pull/96
[#97]: https://github.com/nchammas/flintrock/pull/97
[#100]: https://github.com/nchammas/flintrock/pull/100

## [0.3.0](https://github.com/nchammas/flintrock/compare/v0.2.0...v0.3.0) - 2016-02-14
## [0.3.0] - 2016-02-14

[0.3.0]: https://github.com/nchammas/flintrock/compare/v0.2.0...v0.3.0

### Changed

* [`eca59fc`](https://github.com/nchammas/flintrock/commit/eca59fc0052874d9aa48b7d4d7d79192b5e609d1), [`3cf6ee6`](https://github.com/nchammas/flintrock/commit/3cf6ee64162ceaac6429d79c3bc6ef25988eaa8e): Tweaked a few things so that Flintrock can launch 200+ node clusters without hitting certain limits.
* [`eca59fc`], [`3cf6ee6`]: Tweaked a few things so that Flintrock
can launch 200+ node clusters without hitting certain limits.

[`eca59fc`]: https://github.com/nchammas/flintrock/commit/eca59fc0052874d9aa48b7d4d7d79192b5e609d1
[`3cf6ee6`]: https://github.com/nchammas/flintrock/commit/3cf6ee64162ceaac6429d79c3bc6ef25988eaa8e

## [0.2.0](https://github.com/nchammas/flintrock/compare/v0.1.0...v0.2.0) - 2016-02-07
## [0.2.0] - 2016-02-07

### Added
[0.2.0]: https://github.com/nchammas/flintrock/compare/v0.1.0...v0.2.0

* [`b00fd12`](https://github.com/nchammas/flintrock/commit/b00fd128f36e0a05dafca69b26c4d1b190fa42c9): Added `--assume-yes` option to the `launch` command. Use `--assume-yes` to tell Flintrock to automatically destroy the cluster if there are problems during launch.
### Added

### Changed
* [`b00fd12`]: Added `--assume-yes` option to the `launch` command.
Use `--assume-yes` to tell Flintrock to automatically destroy the
cluster if there are problems during launch.

* [#69](https://github.com/nchammas/flintrock/pull/69): Automatically retry Hadoop download from flaky Apache mirrors.
* [`0df7004`](https://github.com/nchammas/flintrock/commit/0df70043f3da215fe699165bc961bd0c4ba4ea88): Delete unneeded security group after a cluster is destroyed.
* [`244f734`](https://github.com/nchammas/flintrock/commit/244f7345696d1b8cec1d1b575a304b9bd9a77840): Default HDFS not to install. Going forward, Spark will be the only service that Flintrock installs by default. Defaults can easily be changed via Flintrock's config file.
* [`de33412`](https://github.com/nchammas/flintrock/commit/de3341221ca8d57f5a465b13f07c8e266ae11a59): Flintrock installs services, not modules. The terminology has been updated accordingly throughout the code and docs. Update your config file to use `services` instead of `modules`. **Warning**: Flintrock will have problems managing existing clusters that were launched with versions of Flintrock from before this change.
* [#73](https://github.com/nchammas/flintrock/pull/73): Major refactoring of Flintrock internals.
* [#74](https://github.com/nchammas/flintrock/pull/74): Flintrock now catches common configuration problems upfront and provides simple error messages, instead of barfing out errors from EC2 or launching broken clusters.
* [`bf766ba`](https://github.com/nchammas/flintrock/commit/bf766ba48f12a8752c2e32f9b3daf29501c30866): Fixed a bug in how Flintrock polls SSH availability from Linux. Cluster launches now work from Linux as intended.
[`b00fd12`]: https://github.com/nchammas/flintrock/commit/b00fd128f36e0a05dafca69b26c4d1b190fa42c9
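
A minimal sketch of the new option in use (all other launch settings are
placeholders taken from the README example):

```sh
# With --assume-yes, Flintrock destroys the cluster automatically if the
# launch hits a problem, instead of prompting you about it.
flintrock launch test-cluster \
    --assume-yes \
    --num-slaves 1 \
    --ec2-key-name key_name \
    --ec2-identity-file /path/to/key.pem
```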

### Changed

## [0.1.0](https://github.com/nchammas/flintrock/releases/tag/v0.1.0) - 2015-12-11
* [#69]: Automatically retry Hadoop download from flaky Apache
mirrors.
* [`0df7004`]: Delete unneeded security group after a cluster is
destroyed.
* [`244f734`]: HDFS is no longer installed by default. Going forward,
Spark will be the only service that Flintrock installs by default.
Defaults can easily be changed via Flintrock's config file.
* [`de33412`]: Flintrock installs services, not modules. The
terminology has been updated accordingly throughout the code and
docs. Update your config file to use `services` instead of
`modules`. **Warning**: Flintrock will have problems managing
existing clusters that were launched with versions of Flintrock from
before this change.
* [#73]: Major refactoring of Flintrock internals.
* [#74]: Flintrock now catches common configuration problems upfront
and provides simple error messages, instead of barfing out errors
from EC2 or launching broken clusters.
* [`bf766ba`]: Fixed a bug in how Flintrock polls SSH availability
from Linux. Cluster launches now work from Linux as intended.

[#69]: https://github.com/nchammas/flintrock/pull/69
[`0df7004`]: https://github.com/nchammas/flintrock/commit/0df70043f3da215fe699165bc961bd0c4ba4ea88
[`244f734`]: https://github.com/nchammas/flintrock/commit/244f7345696d1b8cec1d1b575a304b9bd9a77840
[`de33412`]: https://github.com/nchammas/flintrock/commit/de3341221ca8d57f5a465b13f07c8e266ae11a59
[#73]: https://github.com/nchammas/flintrock/pull/73
[#74]: https://github.com/nchammas/flintrock/pull/74
[`bf766ba`]: https://github.com/nchammas/flintrock/commit/bf766ba48f12a8752c2e32f9b3daf29501c30866
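
Since HDFS no longer installs by default, you now have to ask for it
explicitly. The sketch below assumes the `--install-hdfs` and
`--hdfs-version` flags (mirroring the `--install-spark` option seen later
in this diff); the config file is the other place to change the default.

```sh
# Flag names assumed here; adjust to the options your Flintrock version accepts.
flintrock launch test-cluster \
    --install-hdfs \
    --hdfs-version 2.7.2 \
    --num-slaves 1 \
    --ec2-key-name key_name \
    --ec2-identity-file /path/to/key.pem
```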

## [0.1.0] - 2015-12-11

[0.1.0]: https://github.com/nchammas/flintrock/releases/tag/v0.1.0

* Initial release.
2 changes: 0 additions & 2 deletions CONTRIBUTING.md
@@ -98,8 +98,6 @@ When building something new, don't just consider the value it will provide. Cons

Make sure each pull request you submit captures a single coherent idea. This limits the scope of any given pull request and makes it much easier for a reviewer to understand what you are doing and give precise feedback. Don't mix logically independent changes in the same request if they can be submitted separately.

After you and the reviewers agree that a pull request is ready to be accepted, you will be asked to squash your commits into one before your change is merged in. This helps us ensure that every commit in Flintrock's history represents a working state, and makes changes easier to browse through and understand.

#### Expect many revisions

If you are adding or touching lots of code, then be prepared to go through many rounds of revisions before your pull request is accepted. This is normal, especially as you are still getting acquainted with the project's standards and style.
18 changes: 13 additions & 5 deletions README.md
@@ -18,7 +18,7 @@ Here's a quick way to launch a cluster on EC2, assuming you already have an [AWS
```sh
flintrock launch test-cluster \
--num-slaves 1 \
--spark-version 1.6.1 \
--spark-version 1.6.2 \
--ec2-key-name key_name \
--ec2-identity-file /path/to/key.pem \
--ec2-ami ami-08111162 \
@@ -58,9 +58,17 @@ That's not all. Flintrock has a few more [features](#features) that you may find

## Installation

Before using Flintrock, take a quick look at the [copyright](https://github.com/nchammas/flintrock/blob/master/COPYRIGHT) notice and [license](https://github.com/nchammas/flintrock/blob/master/LICENSE) and make sure you're OK with their terms.
Before using Flintrock, take a quick look at the
[copyright](https://github.com/nchammas/flintrock/blob/master/COPYRIGHT)
notice and [license](https://github.com/nchammas/flintrock/blob/master/LICENSE)
and make sure you're OK with their terms.

**Flintrock requires Python 3.4 or newer**, unless you are using one of our **standalone packages**. Flintrock has been thoroughly tested only on OS X, but it should run on all POSIX systems. We have plans to [add Windows support](https://github.com/nchammas/flintrock/issues/46) in the future, too.
**Flintrock requires Python 3.4 or newer**, unless you are using one
of our **standalone packages**. Flintrock has been thoroughly tested
only on OS X, but it should run on all POSIX systems.
A motivated contributor should be able to add
[Windows support](https://github.com/nchammas/flintrock/issues/46)
without too much trouble, too.

### Release version

@@ -91,7 +99,7 @@ unzip it to a location of your choice, and run the `flintrock` executable inside
For example:

```sh
flintrock_version="0.4.0"
flintrock_version="0.5.0"

curl --location --remote-name "https://github.com/nchammas/flintrock/releases/download/v$flintrock_version/Flintrock-$flintrock_version-standalone-OSX-x86_64.zip"
unzip -q -d flintrock "Flintrock-$flintrock_version-standalone-OSX-x86_64.zip"
@@ -186,7 +194,7 @@ provider: ec2

services:
spark:
version: 1.6.1
version: 1.6.2

launch:
num-slaves: 1
2 changes: 1 addition & 1 deletion flintrock/__init__.py
@@ -1,2 +1,2 @@
# See: https://packaging.python.org/en/latest/distributing/#standards-compliance-for-interoperability
__version__ = '0.5.0.dev0'
__version__ = '0.6.0.dev0'
15 changes: 12 additions & 3 deletions flintrock/config.yaml.template
@@ -1,12 +1,18 @@
services:
spark:
version: 1.6.1
version: 1.6.2
# git-commit: latest # if not 'latest', provide a full commit SHA; e.g. d6dc12ef0146ae409834c78737c116050961f350
# git-repository: # optional; defaults to https://github.com/apache/spark
# optional; defaults to download from the official Spark S3 bucket
# - must contain a {v} template corresponding to the version
# - Spark must be pre-built
# - must be a tar.gz file
# download-source: "https://www.example.com/files/spark/{v}/spark-{v}.tar.gz"
hdfs:
version: 2.7.2
# optional; defaults to download from a dynamically selected Apache mirror
# must contain a {v} template corresponding to the version; must be a .tar.gz file
# - must contain a {v} template corresponding to the version
# - must be a .tar.gz file
# download-source: "https://www.example.com/files/hadoop/{v}/hadoop-{v}.tar.gz"

provider: ec2
@@ -18,14 +24,17 @@ providers:
instance-type: m3.medium
region: us-east-1
# availability-zone: <name>
ami: ami-08111162 # Amazon Linux, us-east-1
ami: ami-6869aa05 # Amazon Linux, us-east-1
user: ec2-user
# ami: ami-61bbf104 # CentOS 7, us-east-1
# user: centos
# spot-price: <price>
# vpc-id: <id>
# subnet-id: <id>
# placement-group: <name>
# security-groups:
# - group-name1
# - group-name2
tenancy: default # default | dedicated
ebs-optimized: no # yes | no
instance-initiated-shutdown-behavior: terminate # terminate | stop
40 changes: 36 additions & 4 deletions flintrock/ec2.py
@@ -317,7 +317,33 @@ def check_network_config(*, region_name: str, vpc_id: str, subnet_id: str):
)


def get_or_create_ec2_security_groups(
def get_security_groups(
*,
vpc_id,
region,
security_group_names) -> "List[boto3.resource('ec2').SecurityGroup]":
ec2 = boto3.resource(service_name='ec2', region_name=region)

groups = list(
ec2.security_groups.filter(
Filters=[
{'Name': 'group-name', 'Values': security_group_names},
{'Name': 'vpc-id', 'Values': [vpc_id]},
]))

found_group_names = [group.group_name for group in groups]
missing_group_names = set(security_group_names) - set(found_group_names)
if missing_group_names:
raise Error(
"Could not find the following security group{s}: {groups}"
.format(
s='' if len(missing_group_names) == 1 else 's',
groups=', '.join(list(missing_group_names))))

return groups


def get_or_create_flintrock_security_groups(
*,
cluster_name,
vpc_id,
@@ -511,6 +537,7 @@ def launch(
availability_zone,
ami,
user,
security_groups,
spot_price=None,
vpc_id,
subnet_id,
@@ -547,10 +574,15 @@
v=vpc_id))

try:
security_groups = get_or_create_ec2_security_groups(
flintrock_security_groups = get_or_create_flintrock_security_groups(
cluster_name=cluster_name,
vpc_id=vpc_id,
region=region)
user_security_groups = get_security_groups(
vpc_id=vpc_id,
region=region,
security_group_names=security_groups)
security_group_ids = [sg.id for sg in user_security_groups + flintrock_security_groups]
block_device_mappings = get_ec2_block_device_mappings(
ami=ami,
region=region)
@@ -585,7 +617,7 @@
'Placement': {
'AvailabilityZone': availability_zone,
'GroupName': placement_group},
'SecurityGroupIds': [sg.id for sg in security_groups],
'SecurityGroupIds': security_group_ids,
'SubnetId': subnet_id,
'IamInstanceProfile': {
'Name': instance_profile_name},
@@ -634,7 +666,7 @@
'AvailabilityZone': availability_zone,
'Tenancy': tenancy,
'GroupName': placement_group},
SecurityGroupIds=[sg.id for sg in security_groups],
SecurityGroupIds=security_group_ids,
SubnetId=subnet_id,
IamInstanceProfile={
'Name': instance_profile_name},
13 changes: 12 additions & 1 deletion flintrock/flintrock.py
@@ -186,6 +186,10 @@ def cli(cli_context, config, provider):
@click.option('--install-spark/--no-install-spark', default=True)
@click.option('--spark-version',
help="Spark release version to install.")
@click.option('--spark-download-source',
help="URL to download a release of Spark from.",
default='https://s3.amazonaws.com/spark-related-packages/spark-{v}-bin-hadoop2.6.tgz',
show_default=True)
@click.option('--spark-git-commit',
help="Git commit to build Spark from. "
"Set to 'latest' to build Spark from the latest commit on the "
@@ -206,6 +210,10 @@ def cli(cli_context, config, provider):
@click.option('--ec2-availability-zone', default='')
@click.option('--ec2-ami')
@click.option('--ec2-user')
@click.option('--ec2-security-group', 'ec2_security_groups',
multiple=True,
help="Additional security groups names to assign to the instances. "
"You can specify this option multiple times.")
@click.option('--ec2-spot-price', type=float)
@click.option('--ec2-vpc-id', default='', help="Leave empty for default VPC.")
@click.option('--ec2-subnet-id', default='')
@@ -227,6 +235,7 @@ def launch(
spark_version,
spark_git_commit,
spark_git_repository,
spark_download_source,
assume_yes,
ec2_key_name,
ec2_identity_file,
@@ -235,6 +244,7 @@
ec2_availability_zone,
ec2_ami,
ec2_user,
ec2_security_groups,
ec2_spot_price,
ec2_vpc_id,
ec2_subnet_id,
@@ -289,7 +299,7 @@ def launch(
services += [hdfs]
if install_spark:
if spark_version:
spark = Spark(version=spark_version)
spark = Spark(version=spark_version, download_source=spark_download_source)
elif spark_git_commit:
print(
"Warning: Building Spark takes a long time. "
@@ -315,6 +325,7 @@
availability_zone=ec2_availability_zone,
ami=ec2_ami,
user=ec2_user,
security_groups=ec2_security_groups,
spot_price=ec2_spot_price,
vpc_id=ec2_vpc_id,
subnet_id=ec2_subnet_id,