[WIP] parallel perceptron training #10

Open · wants to merge 1 commit into base: master

Conversation

@kmike (Collaborator) commented on Aug 11, 2013

I tried to implement the "iterative parameter mixing" strategy for distributed training of the structured perceptron:

Ryan McDonald, Keith Hall, and Gideon Mann (2010). Distributed training strategies for the structured perceptron. NAACL 2010.

The idea is the following:

  • training data is split into N "shards" (this happens only once);
  • for each shard a OneEpochPerceptron is created; this could happen on a different machine;
  • all OneEpochPerceptrons start with the same weights (but with different training data);
  • at the end of each iteration, the learned weights from the different perceptrons are collected and mixed together; the mixed values are passed to all perceptrons on the next iteration, so all perceptrons receive the same state again.

So communication should involve only transferring the learned weights, and each shard can keep its own training data. A minimal sketch of this loop is given below.
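To make the idea concrete, here is a rough sketch of the iterative parameter mixing loop; train_one_epoch and the shard layout are illustrative placeholders, not the actual OneEpochPerceptron API:

import numpy as np

def iterative_parameter_mixing(shards, n_iter, n_classes, n_features, train_one_epoch):
    # shards: list of (X, y, lengths) triples, split once before training
    w = np.zeros((n_classes, n_features))
    for _ in range(n_iter):
        # every shard starts the epoch from the same mixed weights
        per_shard_w = [train_one_epoch(X, y, lengths, w.copy())
                       for (X, y, lengths) in shards]
        # uniform mixing: average the weights learned on each shard
        w = np.mean(per_shard_w, axis=0)
    return w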

ParallelStructuredPerceptron is an attempt to reimplement StructuredPerceptron in terms of OneEpochPerceptrons. It has an n_jobs parameter, and ideally it should use multiprocessing or multithreading (numpy/scipy release the GIL, and the bottleneck is the dot product, isn't it?) for faster training. But I didn't manage to make multiprocessing work without copying each shard's X/y/lengths on every iteration, so n_jobs = N currently just creates N OneEpochPerceptrons and trains them sequentially.
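One possible way around the copying problem (a sketch only, with illustrative names, not code from this PR): keep one long-lived worker process per shard, send it its data once, and exchange only weight arrays through a Pipe on each iteration.

import numpy as np
from multiprocessing import Process, Pipe

def shard_worker(conn, X, y, lengths, train_one_epoch):
    # The shard data is sent to this process once; afterwards only weights travel.
    while True:
        w = conn.recv()               # mixed weights for the next epoch
        if w is None:                 # shutdown signal
            break
        conn.send(train_one_epoch(X, y, lengths, w))

def train_with_workers(shards, n_iter, w0, train_one_epoch):
    pipes, procs = [], []
    for X, y, lengths in shards:
        parent, child = Pipe()
        p = Process(target=shard_worker,
                    args=(child, X, y, lengths, train_one_epoch))
        p.start()
        pipes.append(parent)
        procs.append(p)
    w = w0
    for _ in range(n_iter):
        for conn in pipes:
            conn.send(w)
        w = np.mean([conn.recv() for conn in pipes], axis=0)
    for conn in pipes:
        conn.send(None)
    for p in procs:
        p.join()
    return w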

Ideally, I want OneEpochPerceptron to be easy to use with IPython.parallel in a distributed environment, and ParallelStructuredPerceptron to be easy to use on a single machine.

Issues with current implementation:

  • "parallel" part is not implemented in ParallelStructuredPerceptron (I'm not very versed with multiprocessing/joblib/... and I don't know how to make it work without copying training data on each iteration - ideas are welcome);
  • code duplication in SequenceShards vs SequenceKFold;
  • code duplication in OneEpochPerceptron vs StructuredPerceptron vs ParallelStructuredPerceptron;
  • OneEpochPerceptron uses 'transform' method to learn updated weights;
  • I don't understand original classes/class_range/n_classes code so maybe I broke something here, and there is also code duplication;
  • parameters are mixed uniformly - mixing strategy that takes loss in account is not implemented;
  • not sure about class names and code organization.
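For reference, a sketch of what non-uniform mixing could look like; following McDonald et al. (2010), one natural choice is mixture coefficients proportional to the loss (e.g., the number of mistakes) each shard incurred during the epoch. The names here are illustrative, not part of the PR:

import numpy as np

def mix_weights(per_shard_w, per_shard_loss):
    # per_shard_w: list of weight arrays, one per shard
    # per_shard_loss: e.g. the number of mistakes each shard made this epoch
    loss = np.asarray(per_shard_loss, dtype=float)
    if loss.sum() == 0:
        mu = np.full(len(per_shard_w), 1.0 / len(per_shard_w))  # fall back to uniform mixing
    else:
        mu = loss / loss.sum()
    return np.tensordot(mu, np.asarray(per_shard_w), axes=1)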

The sequence_ids shuffling method is changed so that ParallelStructuredPerceptron and StructuredPerceptron learn exactly the same weights given the same random_state.

With n_jobs=1, ParallelStructuredPerceptron is about 10% slower than StructuredPerceptron on my data; I think we could merge these classes when (and if) ParallelStructuredPerceptron is ready.

@kmike mentioned this pull request on Aug 11, 2013
from __future__ import absolute_import
import numpy as np

class SequenceShards(object):
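    """Split the training data into N "shards" (done once, before training)."""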
Owner:

I still have to read this through, but can't this "sharding" be reduced to SequenceKFold somehow? It looks awfully similar.

Collaborator Author (kmike):

I think "SequenceKFold" can (and should) be implemented using "sharding", but not the other way around. It is a copy-pasted version of SequenceKFold indeed, with some unnecessary stuff removed.
