
MP DataLoader Improvements #742

Merged: 25 commits into awslabs:master on Apr 17, 2020

Conversation

@AaronSpieler (Contributor) commented on Apr 3, 2020

Description of changes:

  • dataset iteration is now cache aligned: each worker is assigned a contiguous subset of the dataset instead of a modulo-based (strided) selection (illustrated in the sketch below)
  • switched to the default Pool start method (currently fork on Linux), which significantly improves initialisation times on Linux systems
  • added caching support for FileDataset
  • added support for num_batches_for_shuffling, with a default of 8
  • added much more typing and documentation
  • some refactoring

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
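As a rough illustration of the first point (function names are illustrative, not the PR's actual code): with the modulo-based scheme each worker picks every num_workers-th entry, while with the new scheme each worker reads one contiguous chunk, which keeps reads sequential and cache friendly.

from typing import List, Sequence, TypeVar

T = TypeVar("T")


def modulo_assignment(dataset: Sequence[T], worker_id: int, num_workers: int) -> List[T]:
    # old scheme: every num_workers-th entry, i.e. strided access over the whole dataset
    return [entry for i, entry in enumerate(dataset) if i % num_workers == worker_id]


def chunk_assignment(dataset: Sequence[T], worker_id: int, num_workers: int) -> List[T]:
    # new scheme: one contiguous subset per worker (sequential, read/cache friendly)
    chunk_size = -(-len(dataset) // num_workers)  # ceiling division
    start = worker_id * chunk_size
    return list(dataset[start : start + chunk_size])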

@AaronSpieler (Contributor, Author) commented:

@vafl maybe you could check whether you run into problems, or see improvements?

@AaronSpieler AaronSpieler changed the title Improvements MP DataLoader Improvements Apr 3, 2020
@AaronSpieler AaronSpieler requested a review from vafl April 3, 2020 19:55
@codecov-io commented:

Codecov Report

Merging #742 into master will increase coverage by 1.05%.
The diff coverage is 91.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #742      +/-   ##
==========================================
+ Coverage   84.45%   85.51%   +1.05%     
==========================================
  Files         171      171              
  Lines       10817    10862      +45     
==========================================
+ Hits         9136     9289     +153     
+ Misses       1681     1573     -108     
Impacted Files Coverage Δ
src/gluonts/dataset/jsonl.py 84.61% <82.60%> (-0.39%) ⬇️
src/gluonts/dataset/parallelized_loader.py 88.72% <92.53%> (-0.03%) ⬇️
src/gluonts/dataset/common.py 92.78% <100.00%> (+9.45%) ⬆️
src/gluonts/dataset/loader.py 100.00% <100.00%> (ø)
src/gluonts/support/util.py 94.23% <0.00%> (+3.20%) ⬆️
src/gluonts/dataset/artificial/_base.py 67.25% <0.00%> (+26.02%) ⬆️

@lostella lostella added this to the v0.5 milestone Apr 7, 2020
@AaronSpieler (Contributor, Author) commented on Apr 7, 2020

[Outdated]

So for WaveNetEstimator I get 4x performance with num_batches_for_shuffling=1, and 3x with num_batches_for_shuffling=8.

The question is whether to keep the default at 8 or change it to 1. So far this argument's default was 8; on the other hand, there was a bug in the previous data loader code (before multiprocessing), namely:

    @property
    def stream(self) -> Iterable:
        s = self.transform(self.dataset, is_train=self.is_train)
        if self.shuffle_for_training:
            # NOTE: num_batches_for_shuffling is passed as the shuffle window
            # size here, which is the bug described below
            return shuffler(s, self.num_batches_for_shuffling)
        return s


def shuffler(stream: Iterable[T], batch_size: int) -> Iterator[T]:
    """Modifies a stream by shuffling items in windows.

    It continuously takes `batch_size` elements from the stream and yields
    the elements of each batch in random order."""
    for batch in batcher(stream, batch_size):
        random.shuffle(batch)
        yield from batch

which means that only num_batches_for_shuffling samples were shuffled at a time, not batch_size * num_batches_for_shuffling, which is most likely even worse than having num_batches_for_shuffling=1.

And model training seems to have worked fine that way too. However, I can imagine that fixing this bug could now yield better performance in some cases. A sketch of the intended behaviour is below.
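For reference, a minimal sketch of that intended window shuffling (illustrative only, not the PR's code; batcher here is just a helper that groups an iterator into fixed-size lists):

import random
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batcher(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    # group the stream into lists of at most `size` elements
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def window_shuffler(
    stream: Iterable[T], batch_size: int, num_batches_for_shuffling: int
) -> Iterator[T]:
    # shuffle within windows of batch_size * num_batches_for_shuffling samples
    for window in batcher(stream, batch_size * num_batches_for_shuffling):
        random.shuffle(window)
        yield from window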

Any preference @vafl ?

@AaronSpieler (Contributor, Author) commented on Apr 7, 2020

[Outdated]

Regarding read speeds: if I use m4_yearly and run the following code (not an exact measurement, but stark differences would show up):

import timeit
# `dataset` is assumed to be an already loaded GluonTS dataset (here: m4_yearly)
num_iter = 100
print(f"Runtime: {timeit.timeit(lambda: list(dataset.train), number=num_iter) / num_iter}")

I consistently get the following results:

Chunk: 0.04972371574986027
Modulo: 0.05123165169003187

However, while the difference in speed is measurable, it's also negligible.

@AaronSpieler (Contributor, Author) commented on Apr 7, 2020

[Outdated]

And setting cache=True in the FileDataset yields a significant boost in performance (on WaveNet). I get essentially the same performance as if the dataset had been converted to a list beforehand, except that there is no significant performance hit at the beginning of training (caused by the dataset having to be copied to all the workers), and overall less data is cached in memory than with a pre-built list (only 1/num_workers as much is cached in total, since each worker only caches its own subset).

For example, for this configuration on the electricity dataset:

# imports as of the GluonTS version at the time of this PR (paths assumed);
# `meta` and `train_01` come from the surrounding electricity-dataset setup
from gluonts.model.wavenet import WaveNetEstimator
from gluonts.trainer import Trainer

estim = WaveNetEstimator(
    freq=meta.freq,
    prediction_length=24,
    seasonality=48,
    trainer=Trainer(
        batch_size=32,
        epochs=7,
        hybridize=True,
        learning_rate=0.01,
        num_batches_per_epoch=400,
    ),
)
estim.train(train_01, num_workers=8, num_batches_for_shuffling=1, shuffle_for_training=True)

I get roughly 17 it/sec uncached vs. 24 it/sec cached.
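As a rough sketch of enabling this (the path and frequency are placeholders, and cache is assumed to be the keyword this PR adds to FileDataset):

from pathlib import Path

from gluonts.dataset.common import FileDataset

# placeholder path/freq; cache=True keeps decoded entries in memory
# after the first pass over the files
dataset = FileDataset(path=Path("datasets/electricity/train"), freq="H", cache=True)

first_pass = list(dataset)   # reads and decodes from disk
second_pass = list(dataset)  # served from the in-memory cache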

@AaronSpieler (Contributor, Author) commented on Apr 7, 2020

With the last commit I changed the functionality of num_batches_for_shuffling to probabilistically picking every num_batches_for_shuffling-th entry (see the sketch below), which:

  • massively simplifies the code
  • improves performance in the case of num_batches_for_shuffling=8: in my testing it is basically as performant as num_batches_for_shuffling=1, a 50% improvement over the previous implementation
  • leaves the caching mechanism unaffected
  • "drawback": we can't guarantee (in fact it's unlikely) that every data entry is processed exactly once within num_batches_for_shuffling passes. (But the InstanceSplitter is also probabilistic for training at the moment, so that would be the case anyway.)

In order to achieve this, I had to switch back to the cache-aligned data iteration, because for the non-cached FileDataset the difference between the techniques is clearly noticeable (since num_batches_for_shuffling times as much data now has to be read from the dataset, i.e. from disk).
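A minimal sketch of the probabilistic selection idea (illustrative only, not the PR's actual code): each entry is emitted with probability 1/num_batches_for_shuffling, so on average one pass over num_batches_for_shuffling times as much data yields a pseudo-shuffled stream of the same size.

import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def probabilistic_picker(stream: Iterable[T], num_batches_for_shuffling: int) -> Iterator[T]:
    # emit each entry with probability 1 / num_batches_for_shuffling;
    # no guarantee that every entry appears exactly once within
    # num_batches_for_shuffling passes over the data
    for entry in stream:
        if random.random() < 1.0 / num_batches_for_shuffling:
            yield entry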

@AaronSpieler (Contributor, Author) commented:

With this I think every feature that was missing (that I am aware of) has been added, and I have tested every design choice that I thought needed validation.

@lostella (Contributor) commented:

I have some minor comments; see inline.

Inline review comments on: src/gluonts/dataset/loader.py, src/gluonts/dataset/common.py, src/gluonts/dataset/jsonl.py, src/gluonts/dataset/parallelized_loader.py
@vafl (Contributor) left a comment:

Looks good, thanks!

Inline review comments on: src/gluonts/dataset/common.py, src/gluonts/dataset/jsonl.py
@vafl previously approved these changes Apr 16, 2020
Inline review comment on: src/gluonts/dataset/jsonl.py
@lostella (Contributor) commented:

I merged upstream changes to let the tests run again; there was a timeout in one test which I want to make sure is not occurring consistently. @AaronSpieler any idea where these timeouts come from?

@AaronSpieler (Contributor, Author) commented:

The Windows timeout error? I don't know; it should not be multiprocessing-related, since on Windows we warn: "You have set num_workers to a non zero value, however, currently multiprocessing is not supported on windows and therefore num_workers will be set to 0."

@AaronSpieler (Contributor, Author) commented:

Should be fixed now. The error came from multiprocessing evaluation being enabled on Windows. I think it was shadowed by the recurring Windows errors we have been getting from other commits.
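For context, a minimal sketch of the kind of platform guard involved (illustrative, not the exact code in the PR):

import logging
import sys

logger = logging.getLogger(__name__)


def effective_num_workers(num_workers: int) -> int:
    # multiprocessing data loading is not supported on Windows,
    # so fall back to the single-process path there
    if num_workers > 0 and sys.platform == "win32":
        logger.warning(
            "You have set num_workers to a non zero value, however, currently "
            "multiprocessing is not supported on windows and therefore "
            "num_workers will be set to 0."
        )
        return 0
    return num_workers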

@AaronSpieler (Contributor, Author) commented:

So Windows is still failing; however, it has nothing to do with multiprocessing, so I think we could at some point re-enable multiprocessing evaluation for Windows. In any case, the error is now more clearly visible because it no longer happens inside the worker processes:

File "D:\a\gluon-ts\gluon-ts\test\model\conftest.py", line 100, in test_accuracy
    evaluator=Evaluator(calculate_owa=statsmodels is not None),
  File "d:\a\gluon-ts\gluon-ts\src\gluonts\evaluation\backtest.py", line 224, in backtest_metrics
    ts_it, forecast_it, num_series=maybe_len(test_dataset)
  File "d:\a\gluon-ts\gluon-ts\src\gluonts\evaluation\_base.py", line 178, in __call__
    for ts, forecast in it:
  File "c:\hostedtoolcache\windows\python\3.6.8\x64\lib\site-packages\tqdm\std.py", line 1127, in __iter__
    for obj in iterable:
  File "d:\a\gluon-ts\gluon-ts\src\gluonts\model\predictor.py", line 317, in predict
    num_samples=num_samples,
  File "d:\a\gluon-ts\gluon-ts\src\gluonts\model\forecast_generator.py", line 204, in __call__
    outputs = prediction_net(*inputs).asnumpy()
  File "c:\hostedtoolcache\windows\python\3.6.8\x64\lib\site-packages\mxnet\ndarray\ndarray.py", line 2535, in asnumpy
    ctypes.c_size_t(data.size)))

@lostella (Contributor) left a comment:

Looks good! Thanks!

@lostella lostella merged commit 0143054 into awslabs:master Apr 17, 2020
@AaronSpieler AaronSpieler deleted the mp_data_loader_updates_V2 branch July 16, 2020 14:34