Feat straighten compress memory #9094

Merged 50 commits on Oct 7, 2022
Changes from 42 commits
Commits
7265472
An initial implementation of linear programming primal matrix
Yipeng1994 Aug 26, 2022
cb18165
Coding for the revised simplex method
Yipeng1994 Aug 29, 2022
46b627f
Finish coding for the phase 1
Yipeng1994 Aug 30, 2022
0a60ef1
Fix bug.
Yipeng1994 Aug 30, 2022
ee5f80c
Drive the artificial variables out in phase 1
Yipeng1994 Sep 1, 2022
1735099
Bland's rule and bug fix
Yipeng1994 Sep 1, 2022
c7dda00
Adjust the mapping between the basic variables and compact columns
Yipeng1994 Sep 1, 2022
6a1d769
No columns removed while driving artificial variables out.
Yipeng1994 Sep 2, 2022
1868586
Implement the phase 2 of the revised simplex method.
Yipeng1994 Sep 2, 2022
a39f5a1
Update is_solved status and original problem recovery.
Yipeng1994 Sep 2, 2022
9a5481e
Rows and artificial columns activation
Yipeng1994 Sep 2, 2022
df639d6
An initial implementation of mixed-integer programming
Yipeng1994 Sep 5, 2022
c5ea872
Try to assemble the original problem but fail due to the massive excl…
Yipeng1994 Sep 5, 2022
e933357
Steal initial position from current setting
Yipeng1994 Sep 7, 2022
08396fd
Compute the optimal cost from the compact relationship
Yipeng1994 Sep 7, 2022
c6d3ece
Move to a neighbor status and compute the cost
Yipeng1994 Sep 8, 2022
c7c592c
Find the smallest cost and actually move to that status
Yipeng1994 Sep 9, 2022
664e656
Check conflict after the adjustment.
Yipeng1994 Sep 13, 2022
d41723e
Generate a compact position from nothing
Yipeng1994 Sep 14, 2022
f06f7f0
Straighten for memory
Yipeng1994 Sep 14, 2022
aa23e66
Update the offset
Yipeng1994 Sep 14, 2022
82193ce
Add a demo for using the revised simplex method
Yipeng1994 Sep 14, 2022
603714a
Remove the linear programming part
Yipeng1994 Sep 14, 2022
be401dc
Merge branch 'master' into feat-streighten_compress-memory
Yipeng1994 Sep 15, 2022
51c6a17
Recompute the compact relationship after moving to a new status
Yipeng1994 Sep 19, 2022
82e773c
Merge branch 'feat-streighten_compress-memory' of github.com:Oneflow-…
Yipeng1994 Sep 19, 2022
023a4ad
Rename
Yipeng1994 Sep 19, 2022
7f618b1
Code clean up
Yipeng1994 Sep 19, 2022
efd9df8
Merge branch 'master' into feat-streighten_compress-memory
Yipeng1994 Sep 19, 2022
1c1f9a0
Set the tag for the straighten algorithm
Yipeng1994 Sep 19, 2022
ac04bd5
Code clean up
Yipeng1994 Sep 19, 2022
f219851
An attempt to explore the dependency between consumer nodes of a register
Yipeng1994 Sep 20, 2022
50c88db
Revert "An attempt to explore the dependency between consumer nodes of…
Yipeng1994 Sep 20, 2022
4c346e6
Compute the lower bound and only execute
Yipeng1994 Sep 20, 2022
175198b
Pre-compute and store the memory size for registers
Yipeng1994 Sep 20, 2022
a0f403c
Use pre-stored total register num
Yipeng1994 Sep 28, 2022
bd6867d
Limit the maximum iteration step
Yipeng1994 Sep 28, 2022
7a067be
Use VLOG(3) instead of std::cout
Yipeng1994 Sep 28, 2022
2a187c4
Merge branch 'master' into feat-streighten_compress-memory
Yipeng1994 Sep 28, 2022
67390b4
Change interface
Yipeng1994 Sep 30, 2022
40398ee
Package up memory share strategy interfaces
Yipeng1994 Sep 30, 2022
ddd00b0
Address comments
Yipeng1994 Sep 30, 2022
f72d7e3
Address comments
Yipeng1994 Sep 30, 2022
ea99250
Merge branch 'master' into feat-streighten_compress-memory
Yipeng1994 Sep 30, 2022
a9d619a
Merge branch 'master' into feat-streighten_compress-memory
mergify[bot] Sep 30, 2022
24c0e99
Of format
Yipeng1994 Sep 30, 2022
4733e4e
Merge branch 'master' into feat-streighten_compress-memory
mergify[bot] Sep 30, 2022
180d47c
Merge branch 'master' into feat-streighten_compress-memory
Yipeng1994 Oct 7, 2022
f30b0af
Fix bug lower bound = 0
Yipeng1994 Oct 7, 2022
5035652
Merge branch 'feat-streighten_compress-memory' of github.com:Oneflow-…
Yipeng1994 Oct 7, 2022
101 changes: 80 additions & 21 deletions oneflow/core/graph/straighten_nodes.cpp
@@ -13,12 +13,16 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <memory>
#include "oneflow/core/graph/straighten_nodes.h"
#include "oneflow/core/common/shape.h"
#include "oneflow/core/graph/op_graph.h"
#include "oneflow/core/graph/task_graph.h"
#include "oneflow/core/graph/task_node.h"
#include "oneflow/core/job/job_desc.h"
#include "oneflow/core/common/protobuf.h"
#include "oneflow/core/job/task.pb.h"
#include "oneflow/core/register/runtime_register_desc.h"

namespace oneflow {

@@ -39,6 +43,7 @@ class TopoStruct {
bool on_mainstem = false;
int32_t counter = 0;
int32_t min_distance2transfer = -1;
int64_t memory_increment = -1;
TopoStruct* next_same_node = nullptr;
// We can have some other nodes in it for example
// SbpNode<NdSbpSignature>* node;
@@ -55,16 +60,46 @@ class TopoStruct {
// The minimum computation distance from the beginning of this op to the next transfer
int32_t GetMinDistance2Transfer(HashMap<TaskNode*, TopoStruct>* task_node2topo_struct);

// Memory increment = (memory of out registers) - (memory of in registers)
void GetMeomoryIncrement();

// TODO: We might design more deciding parameters and choose the right combination of them in the
// future.

// deciding parameter
// i = 0: those with small tributary layers go first
// i = 1: those with small minimum distance to transfer go first
// i = 2: first in first out
- // i = 3: those with large tributary layers go first
- // i = 4: those with long distance to transfer go first
- // i = 5: last in first out
- int32_t GetDecidingParameter(int32_t i) const;
+ // i = 3: those with small memory increment go first
+ // i = 4: those with large tributary layers go first
+ // i = 5: those with long distance to transfer go first
+ // i = 6: last in first out
+ // i = 7: those with large memory increment go first
+ int64_t GetDecidingParameter(int32_t i) const;
};

// NOTE: Keep this code for future debugging
// static std::vector<int64_t> decide_parameters({ParseIntegerFromEnv("Parameter0", 3),
// ParseIntegerFromEnv("Parameter1", 0),
// ParseIntegerFromEnv("Parameter2", 3)});
// The best parameter set for saving time is {6, 4}
// The best parameter set for saving memory is {3, 0}
static std::vector<int64_t> decide_parameters;

// SAT, a.k.a. Scholastic Aptitude Test, is the college admission test in the United States of
// America.
void InitDecideParameters(StraightenAlgorithmTag sat) {
decide_parameters.clear();
if (sat == StraightenAlgorithmTag::kCompressMemory) {
decide_parameters.push_back(3);
decide_parameters.push_back(0);
} else {
// sat==StraightenAlgorithmTag::kOverlap4ModelParallelism
decide_parameters.push_back(6);
decide_parameters.push_back(4);
}
}
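
The tag-to-priority mapping above can be sketched as a pure function. This is a minimal sketch rather than the PR's code: the `Tag` enum and `DecideParametersFor` are hypothetical stand-ins, and returning the list by value (instead of filling the file-level `decide_parameters`) is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the StraightenAlgorithmTag values named in the diff.
enum class Tag { kOverlap4ModelParallelism, kCompressMemory };

// Returns the ordered list of deciding-parameter indices for a tag.
// Earlier entries take priority; later entries only break ties.
std::vector<int64_t> DecideParametersFor(Tag tag) {
  if (tag == Tag::kCompressMemory) {
    // 3: small memory increment first, then 0: small tributary layers first.
    return {3, 0};
  }
  // 6: last in first out, then 4: large tributary layers first.
  return {6, 4};
}
```

Returning a fresh vector avoids shared mutable state between compilations, at the cost of passing the list to the comparator explicitly.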

// move the head from source to target
void MoveFrontBetweenMaps(std::map<int32_t, TopoStruct*>& source,
std::map<int32_t, TopoStruct*>& target) {
@@ -198,23 +233,50 @@ int32_t TopoStruct::GetMinDistance2Transfer(HashMap<TaskNode*, TopoStruct>* task
return min_distance2transfer;
}

// Memory increment = (memory of out registers) - (memory of in registers)
void TopoStruct::GetMeomoryIncrement() {
if (memory_increment < 0) {
memory_increment = 0;
for (const auto& produced_register : node->produced_regsts()) {
if (produced_register.second->enable_reuse_mem()) {
RegstDescProto temp_proto;
produced_register.second->ToProto(&temp_proto);
memory_increment += RtRegstDesc(temp_proto).TotalMainByteSize4AllRegst();
}
}
for (const auto& consumed_register_list : node->consumed_regsts()) {
for (const auto& consumed_register : consumed_register_list.second) {
if (consumed_register->enable_reuse_mem()) {
RegstDescProto temp_proto;
consumed_register->ToProto(&temp_proto);
memory_increment -= RtRegstDesc(temp_proto).TotalMainByteSize4AllRegst()
/ consumed_register->consumers().size();
}
}
}
}
}
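
The arithmetic of the memory increment can be checked on plain numbers. The following is a sketch under stated assumptions: `Regst` is a hypothetical, flattened stand-in for a register descriptor (byte size, reuse flag, consumer count), and `MemoryIncrement` is an illustrative free function, not part of OneFlow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified register: only the fields the formula needs.
struct Regst {
  int64_t bytes;           // total byte size of the register
  bool reusable;           // mirrors enable_reuse_mem() in the diff
  int64_t consumer_count;  // number of nodes consuming this register
};

// Memory increment = (memory of out registers) - (share of in registers).
// Each consumer is charged an equal integer share of a consumed register.
int64_t MemoryIncrement(const std::vector<Regst>& produced,
                        const std::vector<Regst>& consumed) {
  int64_t increment = 0;
  for (const Regst& r : produced) {
    if (r.reusable) { increment += r.bytes; }
  }
  for (const Regst& r : consumed) {
    assert(r.consumer_count > 0);  // a consumed register has at least one consumer
    if (r.reusable) { increment -= r.bytes / r.consumer_count; }
  }
  return increment;
}
```

With one reusable 100-byte output, a 90-byte input shared by three consumers, and a 40-byte input with a single consumer, the increment is 100 - 30 - 40 = 30. A node that only consumes reusable registers gets a negative increment, so it sorts ahead of allocating nodes when small increments go first.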

// deciding parameter
// i = 0: those with small tributary layers go first
// i = 1: those with small minimum distance to transfer go first
// i = 2: first in first out
// i = 3: those with large tributary layers go first
// i = 4: those with long distance to transfer go first
// i = 5: last in first out
int32_t TopoStruct::GetDecidingParameter(int32_t i) const {
int32_t sign = 1;
if (i >= 3) {
i -= 3;
// i = 3: those with small memory increment go first
// i = 4: those with large tributary layers go first
// i = 5: those with long distance to transfer go first
// i = 6: last in first out
// i = 7: those with large memory increment go first
int64_t TopoStruct::GetDecidingParameter(int32_t i) const {
int64_t sign = 1;
if (i >= 4) {
i -= 4;
sign = -1;
}
switch (i) {
case 0: return sign * tributary_layer;
case 1: return sign * min_distance2transfer;
case 2: return sign * min_layer;
case 3: return sign * memory_increment;
}
return 0;
}
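
The index encoding can be read as "field `i % 4`, negated when `i >= 4`", so indices `k` and `k + 4` sort the same field in opposite directions. A minimal sketch of that reading, assuming a hypothetical `NodeStats` struct in place of `TopoStruct`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in holding only the four fields the selector reads.
struct NodeStats {
  int64_t tributary_layer;
  int64_t min_distance2transfer;
  int64_t min_layer;
  int64_t memory_increment;
};

// Indices 0..3 pick a field; indices 4..7 pick the same field negated,
// which reverses the order when smaller values are scheduled first.
int64_t DecidingParameter(const NodeStats& n, int32_t i) {
  assert(i >= 0 && i < 8);
  const int64_t sign = (i >= 4) ? -1 : 1;
  switch (i % 4) {
    case 0: return sign * n.tributary_layer;
    case 1: return sign * n.min_distance2transfer;
    case 2: return sign * n.min_layer;
    default: return sign * n.memory_increment;
  }
}
```

Widening the return type to `int64_t` matters once byte counts are among the fields, since register sizes can exceed the `int32_t` range.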
@@ -264,6 +326,7 @@ void StraightenNodes(TaskGraph* task_graph, std::vector<TaskNode*>* ordered_task
task_graph->TopoForEachNode([&](TaskNode* node) {
auto& topo_struct = task_node2topo_struct[node];
topo_struct.node = node;
topo_struct.GetMeomoryIncrement();
if (node->in_edges().empty()) {
topo_struct.min_layer = 0;
} else {
@@ -323,21 +386,17 @@ void StraightenNodes(TaskGraph* task_graph, std::vector<TaskNode*>* ordered_task
// Generate other parameters in the topological data structure
FindMainstem(&task_node2topo_struct);

- VLOG(3) << "Straightening order: " << 5 << ", " << 3;
+ // Decide which node should run first
+ InitDecideParameters(GlobalJobDesc().job_conf().straighten_algorithm_tag_in_task_graph());
+ VLOG(3) << "Straightening order: ";
+ for (int32_t decide_parameter : decide_parameters) { VLOG(3) << decide_parameter; }

// Order in the waiting sets
- // Decide which node should run first
struct comp {
bool operator()(const TopoStruct* a, const TopoStruct* b) const {
- // NOTE: Leave these code for debugging in the future
- // static std::vector<int64_t> decide_parameters({ParseIntegerFromEnv("Parameter0", 5),
- // ParseIntegerFromEnv("Parameter1", 3),
- // ParseIntegerFromEnv("Parameter2", 5)});
- // The best parameter set is {5, 3}
- static std::vector<int64_t> decide_parameters({5, 3});
for (int32_t decide_parameter : decide_parameters) {
- int32_t decide_parameter_a = a->GetDecidingParameter(decide_parameter);
- int32_t decide_parameter_b = b->GetDecidingParameter(decide_parameter);
+ int64_t decide_parameter_a = a->GetDecidingParameter(decide_parameter);
+ int64_t decide_parameter_b = b->GetDecidingParameter(decide_parameter);
if (decide_parameter_a != decide_parameter_b) {
return decide_parameter_a < decide_parameter_b;
}
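
The loop above is a lexicographic comparison: the first deciding parameter that differs determines the order, and later ones only break ties. A self-contained sketch of the same idea follows; the `Node` struct, the `Key` alias, `CompressMemoryOrder`, and the final tie-break on `id` are all assumptions for illustration, since the rest of the original comparator is not shown in this hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical node with just enough fields for the example.
struct Node {
  int32_t id;
  int64_t memory_increment;
  int64_t tributary_layer;
};

using Key = std::function<int64_t(const Node&)>;

// Compares nodes key by key; the first differing key decides.
struct LexLess {
  std::vector<Key> keys;
  bool operator()(const Node* a, const Node* b) const {
    for (const Key& key : keys) {
      const int64_t ka = key(*a);
      const int64_t kb = key(*b);
      if (ka != kb) { return ka < kb; }
    }
    // Assumed tie-break so that distinct nodes never compare as equivalent.
    return a->id < b->id;
  }
};

// Mirrors the {3, 0} priority: memory increment first, tributary layer second.
LexLess CompressMemoryOrder() {
  LexLess less;
  less.keys.push_back([](const Node& n) { return n.memory_increment; });
  less.keys.push_back([](const Node& n) { return n.tributary_layer; });
  return less;
}
```

An object built this way can serve as the comparator of an ordered container such as `std::set<const Node*, LexLess>`, provided it stays a strict weak ordering; the `id` tie-break is what guarantees that here.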
23 changes: 17 additions & 6 deletions oneflow/core/graph/task_graph.cpp
@@ -16,6 +16,7 @@ limitations under the License.
#include "oneflow/core/graph/task_graph.h"
#include "oneflow/core/common/util.h"
#include "oneflow/core/graph/inplace_lbi_graph.h"
#include "oneflow/core/job/job_desc.h"
#include "oneflow/core/register/blob_desc.h"
#include "oneflow/core/job/global_for.h"
#include "oneflow/core/operator/variable_op.h"
@@ -31,6 +32,7 @@ limitations under the License.
#include "oneflow/core/graph/task_stream_index_manager.h"
#include "oneflow/core/ep/include/primitive/memcpy.h"
#include "oneflow/core/graph/straighten_nodes.h"
#include "oneflow/core/register/runtime_register_desc.h"
#include "oneflow/core/common/env_var/env_var.h"

namespace oneflow {
@@ -425,7 +427,7 @@ void ForEachOpGraphNecessaryCtrlEdge(

} // namespace

- TaskGraph::TaskGraph(bool enable_straighten_algorithm) {
+ TaskGraph::TaskGraph() {
OpGraph* op_graph = Singleton<OpGraph>::Get();
sub_tsk_gph_builder_ctx_.reset(new SubTskGphBuilderCtx(this));
boxing_logger_ = CreateBoxingLogger();
@@ -456,11 +458,6 @@ TaskGraph::TaskGraph(bool enable_straighten_algorithm) {
}
});

- if (enable_straighten_algorithm && GlobalProcessCtx::WorldSize() > 1) {
- StraightenNodes(this, &ordered_task_nodes_);
- } else {
- SetOrderInGraphForEachNode();
- }
if (Singleton<ResourceDesc, ForSession>::Get()->enable_debug_mode()) { ToDotWithAutoFilePath(); }
}

@@ -864,4 +861,18 @@ void TaskGraph::BuildTaskPath(TaskNode* src_node, TaskNode* dst_node, const Logi
ConnectWithLbi(proxy_node, dst_node, lbi);
}

void TaskGraph::DecideExecutionOrder() {
// For one machine with no transfer available, the straighten algorithm for overlaps consumes a lot
// of memory
StraightenAlgorithmTag straighten_algorithm_tag =
GlobalJobDesc().job_conf().straighten_algorithm_tag_in_task_graph();
if (straighten_algorithm_tag == StraightenAlgorithmTag::kCompressMemory
|| (straighten_algorithm_tag == StraightenAlgorithmTag::kOverlap4ModelParallelism
&& GlobalProcessCtx::WorldSize() > 1)) {
StraightenNodes(this, &ordered_task_nodes_);
} else {
SetOrderInGraphForEachNode();
}
}

} // namespace oneflow
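
The branch in `DecideExecutionOrder` reduces to a small predicate over the tag and the world size. A minimal sketch of that decision, assuming a hypothetical `Tag` enum whose `kOther` value stands for any tag not named in the diff, and a hypothetical free function `ShouldStraighten`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in; kOther represents any tag not named in the diff.
enum class Tag { kOther, kOverlap4ModelParallelism, kCompressMemory };

// True when the straighten algorithm should order the nodes,
// false when the default graph order should be used instead.
bool ShouldStraighten(Tag tag, int64_t world_size) {
  assert(world_size >= 1);
  // Memory compression is useful even on a single process.
  if (tag == Tag::kCompressMemory) { return true; }
  // Overlapping transfer with computation needs more than one process.
  return tag == Tag::kOverlap4ModelParallelism && world_size > 1;
}
```

Keeping the condition in a side-effect-free function like this makes each tag and world-size combination easy to check in isolation.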
3 changes: 2 additions & 1 deletion oneflow/core/graph/task_graph.h
@@ -43,11 +43,12 @@ class TaskGraph final : public Graph<TaskNode, TaskEdge> {
OF_DISALLOW_COPY_AND_MOVE(TaskGraph);
~TaskGraph() override;

- explicit TaskGraph(bool enable_straighten_algorithm);
+ explicit TaskGraph();

const char* TypeName() const override { return "TaskGraph"; }
void RemoveEmptyRegsts();
void MergeChainAndAddOrderingCtrlEdgeInSameChain();
void DecideExecutionOrder();

void EnableInplaceMemSharing(const std::function<bool(const std::string&, const std::string&)>&
IsOpNameDataOrCtrlReachable);
6 changes: 3 additions & 3 deletions oneflow/core/job/compiler.cpp
@@ -58,18 +58,18 @@ void Compiler::Compile(Job* job, Plan* plan) const {

// Step2: build task_gph.
// TODO(levi): we can rewrite this part of code in visitor pattern.
- auto task_gph =
- std::make_unique<TaskGraph>(job->job_conf().enable_straighten_algorithm_in_task_graph());
+ auto task_gph = std::make_unique<TaskGraph>();
using std::placeholders::_1;
task_gph->ForEachNode(std::bind(&TaskNode::ProduceAllRegstsAndBindEdges, _1));
task_gph->ForEachNode(std::bind(&TaskNode::ConsumeAllRegsts, _1));
task_gph->ForEachNode(std::bind(&TaskNode::PinConsumedRegst, _1));
task_gph->TopoForEachNode(&TaskNode::Build);
task_gph->RemoveEmptyRegsts();
+ task_gph->TopoForEachNode(&TaskNode::InferTimeShapeIfMeaningful);
+ task_gph->DecideExecutionOrder();
task_gph->MergeChainAndAddOrderingCtrlEdgeInSameChain();
auto IsReachable = Singleton<OpGraph>::Get()->MakePredicatorIsOpNameDataOrCtrlReachable();
if (job_desc.enable_inplace()) { task_gph->EnableInplaceMemSharing(IsReachable); }
- task_gph->TopoForEachNode(&TaskNode::InferTimeShapeIfMeaningful);
task_gph->ForEachEdge([&](TaskEdge* task_edge) { task_edge->CheckRegstLbiValid(); });

// Step3: put information from task_gph into plan.