154 changes: 154 additions & 0 deletions docs/ambari-dev/bigtop-guide.md
@@ -0,0 +1,154 @@
---
title: Compiling Components for Ambari Bigtop Stack
---

# Compiling Components for Ambari Bigtop Stack

## Introduction

Apache Bigtop is designed for infrastructure engineers and data scientists seeking comprehensive packaging, testing, and configuration of leading open-source big data components. Bigtop supports a wide range of components, including but not limited to Hadoop, HBase, and Spark. This guide focuses on compiling components for the **Ambari Bigtop Stack**.

## Use Cases for Apache Bigtop

1. **Simplified Package Building**: Bigtop provides pre-configured Docker images for building RPM or DEB packages of big data components on different operating systems, which makes package builds quick and straightforward.

2. **Dependency Management**: Bigtop bundles the complex dependencies required at build time and patches the component source code to fix common compilation errors. Users therefore no longer need to troubleshoot upstream releases that fail to compile or set up complex build environments themselves.

3. **Apache Ambari Support**: Bigtop supports Apache Ambari, so users can package big data software that is compatible with Ambari and meets its installation requirements.

## Getting Started with Bigtop

This guide uses the official Bigtop 3.3.0 release as an example, with CentOS 7 as the build operating system. The same steps apply to other operating systems and versions.

### Prerequisites

- Linux environment
- Docker installed on your system
- Git

### Step-by-Step Guide

#### 1. Create a Development Directory

```bash
mkdir -p ~/dev
```

#### 2. Clone Bigtop Repository

```bash
cd ~/dev/
git clone https://github.com/apache/bigtop.git
```

#### 3. Switch to Version 3.3.0

```bash
cd bigtop
git checkout release-3.3.0
```

#### 4. Pull the Bigtop CentOS 7 Compilation Environment Image

```bash
# If you need to compile for other operating systems or architectures (e.g., ARM),
# you can search for the corresponding Bigtop version in the image repository
# https://hub.docker.com/r/bigtop/slaves/tags
docker pull bigtop/slaves:3.3.0-centos-7
```
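If you build on several architectures, a small helper can pick the image name for you. The function below is a hypothetical sketch: it assumes tags follow a `<version>-<os>` pattern with an `-aarch64` suffix for ARM images, so confirm the exact tags on the Docker Hub page linked above before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the bigtop/slaves image name for a Bigtop
# version, OS and CPU architecture (defaults to this machine's `uname -m`).
# ASSUMPTION: x86_64 tags have no suffix and ARM tags end in "-aarch64";
# check https://hub.docker.com/r/bigtop/slaves/tags for the real tag list.
bigtop_image() {
  local version="$1" os="$2" arch="${3:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "bigtop/slaves:${version}-${os}" ;;
    aarch64|arm64) echo "bigtop/slaves:${version}-${os}-aarch64" ;;
    *)             echo "bigtop/slaves:${version}-${os}-${arch}" ;;
  esac
}

# Example: docker pull "$(bigtop_image 3.3.0 centos-7)"
```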

#### 5. Launch the Container

**Scenario 1**: If you have compiled big data components locally before and have a Maven repository cache, map that directory to the container's default Maven local repository (`/root/.m2/repository`) to avoid downloading the packages again.

For example, if your local Maven repository directory is `/data/repository`:

```bash
cd ~/dev/bigtop
docker run -d -it --network host -v `pwd`:/ws -v /data/repository:/root/.m2/repository --workdir /ws --name bigtopr bigtop/slaves:3.3.0-centos-7
```

**Scenario 2**: If you don't have a local Maven cache, or are unsure whether you do, still map a host directory into the container so that downloaded dependencies persist across builds. Otherwise the Maven cache is lost when the container is deleted, and downloading dependencies is the most time-consuming stage of a rebuild.

```bash
mkdir -p ~/m2/repository
cd ~/dev/bigtop
docker run -d -it --network host -v `pwd`:/ws -v ~/m2/repository:/root/.m2/repository --workdir /ws --name bigtopr bigtop/slaves:3.3.0-centos-7
```
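Because the container keeps its own state, it is convenient to restart an existing container instead of creating a new one each time. The hypothetical `start_bigtop` function below sketches that idea: it creates the host cache directory, restarts the container if one with the given name already exists, and otherwise runs it with the same options as the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the Bigtop build container, or restart it if a
# container with that name already exists, so the Maven cache is reused.
# Usage (from the Bigtop checkout): start_bigtop NAME IMAGE CACHE_DIR
start_bigtop() {
  local name="$1" image="$2" cache="$3"
  mkdir -p "$cache"   # the Maven cache lives on the host, not in the container
  if docker container inspect "$name" >/dev/null 2>&1; then
    docker start "$name"
  else
    docker run -d -it --network host \
      -v "$(pwd)":/ws -v "$cache":/root/.m2/repository \
      --workdir /ws --name "$name" "$image"
  fi
}

# Example:
# cd ~/dev/bigtop && start_bigtop bigtopr bigtop/slaves:3.3.0-centos-7 ~/m2/repository
```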

#### 6. Modify Maven Repository Settings (Optional)

You can configure Maven to use mirrors that are faster for your location. This step is optional but can significantly improve download speeds.

1. Enter the container:

   ```bash
   docker exec -it bigtopr /bin/bash
   ```

2. Edit the Maven settings file:

   ```bash
   vi /usr/local/maven/conf/settings.xml
   ```

3. Add mirror entries inside the `<mirrors>` section of `<settings>`. The URL below is Maven Central itself; replace it with a mirror close to your location:

   ```xml
   <mirrors>
     <mirror>
       <id>central-mirror</id>
       <mirrorOf>central</mirrorOf>
       <name>Central Repository Mirror</name>
       <url>https://repo1.maven.org/maven2/</url>
     </mirror>
     <!-- Add other mirrors as needed -->
   </mirrors>
   ```
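Editing the global file inside the image works, but Maven also reads a user-level `~/.m2/settings.xml`. The hypothetical `write_mirror_settings` function below sketches that alternative; the mirror URL is a placeholder you supply, and the function overwrites any existing file at the target path. Inside the container `~/.m2` is `/root/.m2`, and only its `repository` subdirectory is mounted from the host in the commands above, so a settings file written there does not survive deleting the container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a user-level Maven settings.xml that mirrors
# "central" to the given URL. WARNING: overwrites the target file.
# Usage: write_mirror_settings MIRROR_URL [FILE]   (FILE defaults to ~/.m2/settings.xml)
write_mirror_settings() {
  local url="$1" out="${2:-$HOME/.m2/settings.xml}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <mirror>
      <id>central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <name>Central Repository Mirror</name>
      <url>${url}</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```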

#### 7. Compile Big Data Components

Enter your running container:

```bash
docker exec -it bigtopr /bin/bash
```

Compile components:

```bash
. /etc/profile.d/bigtop.sh
./gradlew flink-clean flink-pkg -PparentDir=/usr/bigtop -PpkgSuffix -PbuildThreads=2C repo
```

**Explanation of compilation parameters**:

- `flink-clean flink-pkg`: Removes previous Flink build artifacts, then builds the Flink packages. Replace `flink` with the component you want to build.
- `repo`: Creates a local package repository from the built packages.
- `-PparentDir=/usr/bigtop`: Changes the default installation path of the packages so that Bigtop-built packages follow Ambari installation specifications.
- `-PpkgSuffix`: Adds the Bigtop version number to the package names (e.g., hadoop_3_3_0), as required by the Ambari Bigtop stack services.
- `-PbuildThreads=2C`: Sets the number of compilation threads; `2C` means two threads per CPU core.
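To build several components in one Gradle invocation, the task list can be generated. The hypothetical `bigtop_tasks` helper below is a sketch that repeats the `-clean`/`-pkg` pattern for each component name and appends the flags and `repo` task from the command above; it assumes the names you pass match Bigtop's Gradle task prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn component names into the Gradle arguments used
# above, e.g. "flink" -> "flink-clean flink-pkg <Ambari flags> repo".
bigtop_tasks() {
  local c
  local -a tasks=()
  for c in "$@"; do
    tasks+=("${c}-clean" "${c}-pkg")
  done
  echo "${tasks[*]} -PparentDir=/usr/bigtop -PpkgSuffix -PbuildThreads=2C repo"
}

# Inside the container (output is intentionally word-split):
# . /etc/profile.d/bigtop.sh
# ./gradlew $(bigtop_tasks zookeeper hadoop)
```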

## Parallel Compilation for Improved Performance

A pull request that adds parallel compilation to speed up the build has been submitted to the community and is currently under review. Once it is merged, all Java components in Bigtop can be compiled in parallel; the feature is expected in releases after Bigtop 3.3.1.

Build times before and after parallel compilation (measured with all dependencies already downloaded):

| Component | Before (mm:ss) | After (mm:ss) |
|-----------|----------------|---------------|
| Alluxio   | 21:00          | 07:43         |
| Hive      | 05:33          | 03:04         |
| HBase     | 06:18          | 02:55         |
| ZooKeeper | 01:25          | 00:35         |
| Livy      | 03:29          | 03:12         |
| Phoenix   | 11:23          | 05:32         |
| Zeppelin  | 14:15          | 13:19         |
| Flink     | 36:27          | 14:16         |
| Hadoop    | 50:00          | 16:00         |
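The speedups in this table can be checked with a few lines of shell. This sketch assumes the times are written as mm:ss (21min as 21:00, 35s as 00:35) and divides the before time by the after time for each component.

```bash
#!/usr/bin/env bash
# Convert mm:ss to seconds; awk parses "08" as decimal, avoiding bash's octal trap.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Speedup = before / after, printed with two decimals.
speedup() {
  awk -v b="$(to_seconds "$1")" -v a="$(to_seconds "$2")" 'BEGIN { printf "%.2f\n", b / a }'
}

while read -r name before after; do
  printf '%-10s %sx\n' "$name" "$(speedup "$before" "$after")"
done <<'EOF'
Alluxio 21:00 07:43
Hive 05:33 03:04
HBase 06:18 02:55
ZooKeeper 01:25 00:35
Livy 03:29 03:12
Phoenix 11:23 05:32
Zeppelin 14:15 13:19
Flink 36:27 14:16
Hadoop 50:00 16:00
EOF
```

Most components come out between about 2x and 3x, Hive at roughly 1.8x, and Livy and Zeppelin close to 1.1x.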

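Example of a parallel compilation run, using a Bigtop checkout that includes the parallel compilation change:

```bash
# Docker options such as --cpus must come before the image name
docker run -d -it --network host --cpus 16 -v `pwd`:/ws -v /data/repository:/root/.m2/repository --workdir /ws --name bigtop bigtop/slaves:trunk-centos-7
# Enter the container, then build inside it
docker exec -it bigtop /bin/bash
source /etc/profile.d/bigtop.sh
./gradlew alluxio-clean alluxio-pkg -PcompileThreads=2C
```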
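To collect comparable numbers on your own hardware, you can wrap the build command in a timer. The `timed` and `format_duration` helpers below are a hypothetical sketch built on bash's `SECONDS` counter; they print the elapsed time in the same mm:ss style as the table and preserve the wrapped command's exit status.

```bash
#!/usr/bin/env bash
# Format a number of seconds as mm:ss (minutes may exceed 59).
format_duration() {
  printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Run a command, report its wall-clock time on stderr, return its status.
timed() {
  local start=$SECONDS status
  "$@"
  status=$?
  echo "elapsed $(format_duration $(( SECONDS - start )))" >&2
  return "$status"
}

# Inside the container, after sourcing /etc/profile.d/bigtop.sh:
# timed ./gradlew alluxio-clean alluxio-pkg -PcompileThreads=2C
```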

Parallel compilation makes most components roughly 2-3x faster (Hive about 1.8x), while Livy and Zeppelin gain little. The effect is even larger on an initial build: for example, Hadoop's initial compilation time dropped from 3 hours to 1 hour.
126 changes: 126 additions & 0 deletions docs/ambari-dev/building-from-source.md
@@ -0,0 +1,126 @@
---
title: Building from Source
---

# Building Apache Ambari from Source

This guide explains how to build Apache Ambari 3.0 and its related subprojects from source code.

## Prerequisites

Before you begin, ensure you have the following requirements installed:

### System Requirements
- Operating System: Rocky Linux 8 or 9 (recommended)
- Python 3 Development Tools (`python3-devel`)

### Java Requirements
- Ambari Main Project: JDK 17
- Ambari Metrics: JDK 8
- Ambari Infra: JDK 8
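Since the main project and the Metrics/Infra projects need different JDKs, a quick pre-build check can catch a mismatched `java` on the `PATH`. The helpers below are a hypothetical sketch: the project names are the repository names used in this guide, and the version parsing assumes the usual first line of `java -version` output (for example `openjdk version "17.0.9"`).

```bash
#!/usr/bin/env bash
# Map a repository name to the JDK major version listed above.
required_jdk() {
  case "$1" in
    ambari) echo 17 ;;
    ambari-metrics|ambari-infra) echo 8 ;;
    *) return 1 ;;
  esac
}

# Parse the major version: 'openjdk version "1.8.0_392"' -> 8, '"17.0.9"' -> 17.
jdk_major() {
  local v
  v=$(echo "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy 1.x numbering used by JDK 8 and earlier
  esac
  echo "${v%%[._]*}"
}

# Fail with a message if the java on PATH does not match the project.
check_jdk() {
  local want have
  want=$(required_jdk "$1") || { echo "unknown project: $1" >&2; return 1; }
  have=$(jdk_major "$(java -version 2>&1 | head -n 1)")
  [ "$have" = "$want" ] || { echo "$1 needs JDK $want, found $have" >&2; return 1; }
}

# Example: check_jdk ambari-metrics && mvn -T 2C clean install -DskipTests
```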

## Building Ambari Main Project

### 1. Clone the Repository
```bash
git clone git@github.com:apache/ambari.git
cd ambari
```

### 2. Build Options

#### Build Without RPM
To build Ambari without creating RPM packages:
```bash
mvn -B -T 2C clean install package \
-Drat.skip=true \
-DskipTests \
-Dmaven.test.skip=true \
-Dfindbugs.skip=true \
-Dcheckstyle.skip=true
```

#### Build With RPM
To build Ambari and create RPM packages:
```bash
mvn -B -T 2C clean install package rpm:rpm \
-Drat.skip=true \
-DskipTests \
-Dmaven.test.skip=true \
-Dfindbugs.skip=true \
-Dcheckstyle.skip=true
```

The RPM packages will be generated at:
- Ambari Agent: `ambari/ambari-agent/target/rpm/ambari-agent/RPMS/x86_64/ambari-agent-3.0.0.0-SNAPSHOT.x86_64.rpm`
- Ambari Server: `ambari/ambari-server/target/rpm/ambari-server/RPMS/x86_64/ambari-server-3.0.0.0-SNAPSHOT.x86_64.rpm`
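After an RPM build, a short script can confirm that both packages were produced. This hypothetical `check_ambari_rpms` sketch checks the two paths listed above and is meant to be run from the directory containing the `ambari` checkout; the version string defaults to `3.0.0.0-SNAPSHOT` and must be adjusted for other versions.

```bash
#!/usr/bin/env bash
# Hypothetical post-build check: report whether the agent and server RPMs exist.
# Returns non-zero if either file is missing.
check_ambari_rpms() {
  local version="${1:-3.0.0.0-SNAPSHOT}" missing=0 f
  for f in \
    "ambari/ambari-agent/target/rpm/ambari-agent/RPMS/x86_64/ambari-agent-${version}.x86_64.rpm" \
    "ambari/ambari-server/target/rpm/ambari-server/RPMS/x86_64/ambari-server-${version}.x86_64.rpm"; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```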

## Building Ambari Metrics

:::tip Performance Optimization
To significantly improve build performance, download binary dependencies locally before building:

1. Create a local directory for dependencies:

   ```bash
   mkdir -p /ws/dl/
   ```

2. Download the required binary files:

   ```bash
   wget -P /ws/dl/ http://repo.bigtop.apache.org.s3.amazonaws.com/bigtop-stack-binary/3.2.0/centos-7/x86_64/hbase-2.4.13-bin.tar.gz
   wget -P /ws/dl/ http://repo.bigtop.apache.org.s3.amazonaws.com/bigtop-stack-binary/3.2.0/centos-7/x86_64/hadoop-3.3.4.tar.gz
   wget -P /ws/dl/ https://dl.grafana.com/oss/release/grafana-11.1.4.linux-amd64.tar.gz
   wget -P /ws/dl/ http://repo.bigtop.apache.org.s3.amazonaws.com/bigtop-stack-binary/3.2.0/centos-7/x86_64/phoenix-hbase-2.4-5.1.2-bin.tar.gz
   ```

3. After cloning the repository (see below), point the existing properties in the ambari-metrics `pom.xml` to the local files:

   ```xml
   <!-- Update these properties to use local files -->
   <properties>
     <hbase.tar>file:///ws/dl/hbase-2.4.13-bin.tar.gz</hbase.tar>
     <hadoop.tar>file:///ws/dl/hadoop-3.3.4.tar.gz</hadoop.tar>
     <grafana.tar>file:///ws/dl/grafana-11.1.4.linux-amd64.tar.gz</grafana.tar>
     <phoenix.tar>file:///ws/dl/phoenix-hbase-2.4-5.1.2-bin.tar.gz</phoenix.tar>
   </properties>
   ```

This optimization saves significant time during repeated builds by avoiding large downloads.
:::
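If you prefer not to edit `pom.xml`, the same properties can usually be supplied on the Maven command line, because `-D` user properties take precedence over values in the POM's `<properties>` section. The `local_tar_args` helper below is a hypothetical sketch that assumes the property names and file names from the tip above; verify that your ambari-metrics version uses the same properties.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print -D overrides that point the ambari-metrics
# tarball properties at files in a local directory (default /ws/dl).
# ASSUMPTION: property and file names match the pom.xml snippet in the tip.
local_tar_args() {
  local dl="${1:-/ws/dl}"
  printf '%s %s %s %s\n' \
    "-Dhbase.tar=file://${dl}/hbase-2.4.13-bin.tar.gz" \
    "-Dhadoop.tar=file://${dl}/hadoop-3.3.4.tar.gz" \
    "-Dgrafana.tar=file://${dl}/grafana-11.1.4.linux-amd64.tar.gz" \
    "-Dphoenix.tar=file://${dl}/phoenix-hbase-2.4-5.1.2-bin.tar.gz"
}

# Example (the directory must not contain spaces, since the output is word-split):
# mvn -T 2C clean install -DskipTests $(local_tar_args /ws/dl)
```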

### 1. Clone the Repository
```bash
git clone git@github.com:apache/ambari-metrics.git
cd ambari-metrics
```

### 2. Build Options

#### Build Without RPM
To build Ambari Metrics without creating RPM packages:
```bash
mvn -T 2C clean install -DskipTests
```

#### Build With RPM
To build Ambari Metrics and create RPM packages:
```bash
mvn -T 2C clean install -DskipTests -Dbuild-rpm
```

To locate the generated RPM packages:
```bash
find ./ -name "*.rpm"
```

## Building Ambari Infra

### 1. Clone the Repository
```bash
git clone git@github.com:apache/ambari-infra.git
cd ambari-infra
```

### 2. Build RPM Package
```bash
make rpm
```
22 changes: 16 additions & 6 deletions docs/ambari-dev/how-to-contribute.md
@@ -36,11 +36,21 @@ Repeat these steps for all the branches that needs to be synced with the remote.

Apache Ambari uses JIRA to track issues, including bugs and improvements, and uses GitHub pull requests to manage code reviews and merges. Major design changes are discussed in JIRA, and implementation changes are discussed in the pull request once it is created.

:::note Important Changes to JIRA Registration
* JIRA registration is currently closed to the public
* To get a JIRA account:
  1. Register on [Apache JIRA](https://issues.apache.org/jira)
  2. Contact a PMC member to approve your registration
* Alternatively, you can:
  1. Submit your pull request first
  2. Community members will help create the corresponding JIRA ticket for you
:::

* Find an existing Apache JIRA that the change pertains to
  * Do not create a new JIRA if the change is minor and relates to an existing JIRA; add to the existing discussion and work instead
  * Look for existing pull requests that are linked from the JIRA, to understand if someone is already working on the JIRA

* If the change is new and you have JIRA access, then create a new JIRA:
  * Provide a descriptive Title
  * Write a detailed Description. For bug reports, this should ideally include a short reproduction of the problem. For new features, it may include a design document.
  * Fill the required fields:
@@ -49,11 +59,11 @@ Apache Ambari uses JIRA to track issues including bugs and improvements, and use
      * Blocker: pointless to release without this change as the release would be unusable to a large minority of users
      * Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult
      * Major: a small minority of users are missing important functionality without this, and there is a workaround
      * Minor: a niche use case is missing some support, but it does not affect usage or is easily worked around
      * Trivial: a nice-to-have change but unlikely to be any problem in practice otherwise
    * Component. Choose the components that are affected by this change. Choose from Ambari Components
    * Affects Version. For Bugs, assign at least one version that is known to exhibit the problem or need the change
  * Do not include a patch file; pull requests are used to propose the actual change.

* If you don't have JIRA access:
  * Submit your pull request first
  * In the PR description, clearly describe the issue or improvement
  * A community member will create a JIRA ticket and link it to your PR

### Pull Request
