import ParallelMode #166

Merged
merged 4 commits into from
Mar 24, 2023
2 changes: 1 addition & 1 deletion docs/CONCEPTS/tensor_model_parallelism.html
Original file line number Diff line number Diff line change
Expand Up @@ -296,7 +296,7 @@ <h2> Contents </h2>
<section class="tex2jax_ignore mathjax_ignore" id="concept-of-tensor-model-parallelism">
<h1>Concept of Tensor Model Parallelism<a class="headerlink" href="#concept-of-tensor-model-parallelism" title="Permalink to this heading">#</a></h1>
<ul class="simple">
<li><p>Authors: Kichang Yang, Kevin Ko</p></li>
<li><p>Authors: Kichang Yang, Kevin Ko, Minho Ryu</p></li>
</ul>
<p><strong>Tensor Model Parallelism</strong> makes it possible to train larger models by partitioning the parameter tensors into multiple dimensions.
We support 1D, 2D, 2.5D, and 3D tensor partitioning algorithms which make tensor parallel training more efficient.</p>
Expand Down
2 changes: 1 addition & 1 deletion docs/CONCEPTS/tp/1d_parallel_algorithm.html
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,7 @@ <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">#
<p>Use <code class="docutils literal notranslate"><span class="pre">ParallelMode.TENSOR_1D</span></code> as a parameter of <code class="docutils literal notranslate"><span class="pre">tensor_parallel_mode</span></code>. Model weight should be divisible by <code class="docutils literal notranslate"><span class="pre">tp_size</span></code>.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># model = defined in section 2.2</span>

<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">4</span>
Expand Down
2 changes: 1 addition & 1 deletion docs/CONCEPTS/tp/2d_parallel_algorithm.html
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ <h1>2D parallel (SUMMA) algorithm<a class="headerlink" href="#d-parallel-summa-a
<section id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">#</a></h2>
<p>Use <code class="docutils literal notranslate"><span class="pre">ParallelMode.TENSOR_2D</span></code> as a parameter of <code class="docutils literal notranslate"><span class="pre">tensor_parallel_mode</span></code>. Since the algorithm splits model along both rows and columns, <code class="docutils literal notranslate"><span class="pre">tp_size</span></code> should be a <strong>square of positive integer</strong>.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">4</span>
Expand Down
2 changes: 1 addition & 1 deletion docs/CONCEPTS/tp/2p5d_parallel_algorithm.html
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">#
It is recommended to set <code class="docutils literal notranslate"><span class="pre">tp_depth</span></code> to more than 1, as the algorithm becomes identical to the 2D algorithm if <code class="docutils literal notranslate"><span class="pre">tp_depth</span></code> is 1.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># model = defined in section 2.2</span>

<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">8</span>
Expand Down
2 changes: 1 addition & 1 deletion docs/CONCEPTS/tp/3d_parallel_algorithm.html
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,7 @@ <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">#
<p>Use <code class="docutils literal notranslate"><span class="pre">ParallelMode.TENSOR_3D</span></code> as a parameter of <code class="docutils literal notranslate"><span class="pre">tensor_parallel_mode</span></code>. <code class="docutils literal notranslate"><span class="pre">tp_size</span></code> should be a <strong>cubic of positive integer</strong>.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># model = defined in section 2.2</span>

<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">8</span>
Expand Down
6 changes: 3 additions & 3 deletions docs/TUTORIALS/tensor_model_parallelism.html
Original file line number Diff line number Diff line change
Expand Up @@ -313,7 +313,7 @@ <h2> Contents </h2>
<section class="tex2jax_ignore mathjax_ignore" id="tensor-model-parallelism-tutorial">
<h1>Tensor Model Parallelism Tutorial<a class="headerlink" href="#tensor-model-parallelism-tutorial" title="Permalink to this heading">#</a></h1>
<ul class="simple">
<li><p>Authors: Kichang Yang, Kevin Ko</p></li>
<li><p>Authors: Kichang Yang, Kevin Ko, Minho Ryu</p></li>
</ul>
<p><img alt="260461C3-EA3B-405C-9B34-05BA3C781161.png" src="../_images/260461C3-EA3B-405C-9B34-05BA3C781161.png" /></p>
<p><strong>Tensor Model Parallelism</strong>
Expand Down Expand Up @@ -409,7 +409,7 @@ <h3>1.2. Parallelize the model<a class="headerlink" href="#parallelize-the-model
<li><p><code class="docutils literal notranslate"><span class="pre">pipeline_parallel_size</span></code> must be 1 if you want to use <code class="docutils literal notranslate"><span class="pre">tensor_parallel</span></code> algorithm ( mixing PP and PP will be supported in later version.)</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">oslo</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">4</span>
Expand Down Expand Up @@ -480,7 +480,7 @@ <h3>2.2. Create model, optimizer and tokenizer<a class="headerlink" href="#creat
<h3>2.3. Parallelize the model<a class="headerlink" href="#id1" title="Permalink to this heading">#</a></h3>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># model = defined in section 2.2</span>

<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span>
<span class="kn">from</span> <span class="nn">oslo</span> <span class="kn">import</span> <span class="n">ParallelContext</span><span class="p">,</span> <span class="n">ParallelMode</span>
<span class="kn">from</span> <span class="nn">oslo.torch.nn.parallel</span> <span class="kn">import</span> <span class="n">TensorParallel</span>

<span class="n">tp_size</span> <span class="o">=</span> <span class="mi">4</span>
Expand Down
2 changes: 1 addition & 1 deletion docs/_sources/CONCEPTS/tensor_model_parallelism.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Concept of Tensor Model Parallelism
- Authors: Kichang Yang, Kevin Ko
- Authors: Kichang Yang, Kevin Ko, Minho Ryu

**Tensor Model Parallelism** makes it possible to train larger models by partitioning the parameter tensors into multiple dimensions.
We support 1D, 2D, 2.5D, and 3D tensor partitioning algorithms which make tensor parallel training more efficient.
Expand Down
2 changes: 1 addition & 1 deletion docs/_sources/CONCEPTS/tp/1d_parallel_algorithm.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Use `ParallelMode.TENSOR_1D` as a parameter of `tensor_parallel_mode`. Model wei
```python
# model = defined in section 2.2

from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 4
Expand Down
2 changes: 1 addition & 1 deletion docs/_sources/CONCEPTS/tp/2d_parallel_algorithm.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ The result is a matrix $Y$ that is the product of $X$ and $A$.
Use `ParallelMode.TENSOR_2D` as a parameter of `tensor_parallel_mode`. Since the algorithm splits model along both rows and columns, `tp_size` should be a **square of positive integer**.

```python
from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 4
Expand Down
2 changes: 1 addition & 1 deletion docs/_sources/CONCEPTS/tp/2p5d_parallel_algorithm.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ It is recommended to set `tp_depth` to more than 1, as the algorithm becomes ide
```python
# model = defined in section 2.2

from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 8
Expand Down
2 changes: 1 addition & 1 deletion docs/_sources/CONCEPTS/tp/3d_parallel_algorithm.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Use `ParallelMode.TENSOR_3D` as a parameter of `tensor_parallel_mode`. `tp_size`
```python
# model = defined in section 2.2

from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 8
Expand Down
6 changes: 3 additions & 3 deletions docs/_sources/TUTORIALS/tensor_model_parallelism.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Tensor Model Parallelism Tutorial
- Authors: Kichang Yang, Kevin Ko
- Authors: Kichang Yang, Kevin Ko, Minho Ryu

![260461C3-EA3B-405C-9B34-05BA3C781161.png](image/260461C3-EA3B-405C-9B34-05BA3C781161.png)

Expand Down Expand Up @@ -86,7 +86,7 @@ Here is some explain about arguments to parallel_context.

```python
import oslo
from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 4
Expand Down Expand Up @@ -157,7 +157,7 @@ tokenizer.pad_token = tokenizer.eos_token
```python
# model = defined in section 2.2

from oslo import ParallelContext
from oslo import ParallelContext, ParallelMode
from oslo.torch.nn.parallel import TensorParallel

tp_size = 4
Expand Down
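Note on the `tp_size` constraints stated in the touched docs: the 2D page says `tp_size` should be a square of a positive integer and the 3D page says it should be a cube. The sketch below is not part of this PR and does not use the OSLO API. It is a minimal, standalone illustration of those two checks, and the helper names `is_perfect_power` and `valid_tp_size` are hypothetical. The 2.5D case is left out because its exact constraint is not visible in this diff.

```python
def is_perfect_power(n: int, k: int) -> bool:
    """Return True if n == r**k for some positive integer r."""
    if n < 1:
        return False
    # Float estimate of the k-th root; neighbours are checked to absorb rounding error.
    r = round(n ** (1.0 / k))
    return any((r + d) >= 1 and (r + d) ** k == n for d in (-1, 0, 1))


def valid_tp_size(dims: int, tp_size: int) -> bool:
    """Hypothetical helper: dims=2 mirrors the TENSOR_2D rule (perfect square),
    dims=3 mirrors the TENSOR_3D rule (perfect cube)."""
    if dims not in (2, 3):
        raise ValueError("only the 2D and 3D constraints are sketched here")
    return is_perfect_power(tp_size, dims)
```

With the values used in the docs, `valid_tp_size(2, 4)` and `valid_tp_size(3, 8)` both pass, while `tp_size = 8` would be rejected for the 2D mode.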