Fixed the MPI initialization issue (#207)
* Bring v1.0 to the most recent commit  (#202)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* sync up (#205)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* Bring v1.0 to the most recent commit  (#202) (#203)

* Request changes from MLPerf Storage (#199)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* Fixed potential insufficient samples due to num_files is not divisible by comm.size (#200)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* recovered back dlio_profiler

* fixed potential not enough samples

* Update tf_reader.py

* Mlperf requests (#201)

* added au metric to the configuration file; set shuffling and shuffle buffer size to be 2 for cosmoflow

* removed dependencies on dlioprofiler

* fixed bugs

* fixed issue with dlio_profiler

* bring back dlio_profiler_py

* Fix requirements file (#204)

Signed-off-by: Johnu George <johnugeorge109@gmail.com>

---------

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Co-authored-by: Johnu George <johnugeorge109@gmail.com>

* barrier in the beginning

* fixed bugs

* fixed MPI initialization issue

---------

Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Co-authored-by: Johnu George <johnugeorge109@gmail.com>
zhenghh04 and johnugeorge authored Jun 12, 2024
1 parent 01283ab commit 3c27260
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions dlio_benchmark/main.py
@@ -16,7 +16,8 @@
 """
 import os
 import math
-
+from mpi4py import MPI
+comm = MPI.COMM_WORLD
 import logging
 from time import time, sleep
 import json
@@ -49,9 +50,8 @@
 from dlio_benchmark.utils.utility import Profile, PerfTrace
 
 dlp = Profile(MODULE_DLIO_BENCHMARK)
-from mpi4py import MPI
 # To make sure the output folder is the same in all the nodes. We have to do this.
-MPI.COMM_WORLD.Barrier()
+comm.Barrier()
 import hydra
 
 class DLIOBenchmark(object):
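
The pattern in this change (import mpi4py first, keep one module-level communicator, and call a barrier before the remaining setup) can be sketched in isolation as below. This is a minimal illustration, not the code in dlio_benchmark/main.py. The `_SingleProcessComm` class and the `wait_for_all_ranks` helper are hypothetical names added here so the sketch also runs where mpi4py is not installed.

```python
# By default, importing mpi4py.MPI initializes MPI as a side effect,
# so placing this import first makes MPI start before anything else.
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    class _SingleProcessComm:
        """Hypothetical stand-in used only when mpi4py is unavailable."""

        def Get_rank(self):
            return 0

        def Get_size(self):
            return 1

        def Barrier(self):
            # Nothing to wait for with a single process.
            pass

    comm = _SingleProcessComm()

import logging  # other imports follow the MPI import


def wait_for_all_ranks():
    """Block until every rank reaches this point, then report rank and size."""
    comm.Barrier()
    return comm.Get_rank(), comm.Get_size()


rank, size = wait_for_all_ranks()
logging.basicConfig(level=logging.INFO)
logging.info("rank %d of %d passed the barrier", rank, size)
```

Reusing the single `comm` object, as the diff does, avoids a second late import of mpi4py and keeps every later `Barrier()` call on the same communicator.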
