[WIP] parallel perceptron training #10

Open · wants to merge 1 commit into base: master

Conversation

@kmike (Collaborator) commented on Aug 11, 2013

I tried to implement the "iterative parameter mixing" strategy for distributed training of the structured perceptron:

Ryan McDonald, Keith Hall, and Gideon Mann (2010). Distributed training strategies for the structured perceptron. NAACL 2010.

The idea is the following:

  • training data is split into N "shards" (this happens only once);
  • for each shard a OneEpochPerceptron is created; this could happen on a different machine;
  • all OneEpochPerceptrons start with the same weights (but with different training data);
  • at the end of each iteration, the learned weights from the different perceptrons are collected and mixed together; the mixed values are passed to all perceptrons on the next iteration, so all perceptrons receive the same state again.

So communication should involve only transferring the learned weights, and each shard can keep its own training data. A minimal sketch of this loop is given below.
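To make the idea concrete, here is a rough sketch of the iterative parameter mixing loop; train_one_epoch and the shard layout are illustrative placeholders, not the actual OneEpochPerceptron API:

import numpy as np

def iterative_parameter_mixing(shards, n_iter, n_classes, n_features, train_one_epoch):
    # shards: list of (X, y, lengths) triples, split once before training
    w = np.zeros((n_classes, n_features))
    for _ in range(n_iter):
        # every shard starts the epoch from the same mixed weights
        per_shard_w = [train_one_epoch(X, y, lengths, w.copy())
                       for (X, y, lengths) in shards]
        # uniform mixing: average the weights learned on each shard
        w = np.mean(per_shard_w, axis=0)
    return w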

ParallelStructuredPerceptron is an attempt to reimplement StructuredPerceptron in terms of OneEpochPerceptrons. It has an n_jobs parameter, and ideally it should use multiprocessing or multithreading (numpy/scipy release the GIL, and the bottleneck is the dot product, isn't it?) for faster training. But I didn't manage to make multiprocessing work without copying each shard's X/y/lengths on every iteration, so n_jobs = N currently just creates N OneEpochPerceptrons and trains them sequentially.
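One possible way around the copying problem (a sketch only, with illustrative names, not code from this PR): keep one long-lived worker process per shard, send it its data once, and exchange only weight arrays through a Pipe on each iteration.

import numpy as np
from multiprocessing import Process, Pipe

def shard_worker(conn, X, y, lengths, train_one_epoch):
    # The shard data is sent to this process once; afterwards only weights travel.
    while True:
        w = conn.recv()               # mixed weights for the next epoch
        if w is None:                 # shutdown signal
            break
        conn.send(train_one_epoch(X, y, lengths, w))

def train_with_workers(shards, n_iter, w0, train_one_epoch):
    pipes, procs = [], []
    for X, y, lengths in shards:
        parent, child = Pipe()
        p = Process(target=shard_worker,
                    args=(child, X, y, lengths, train_one_epoch))
        p.start()
        pipes.append(parent)
        procs.append(p)
    w = w0
    for _ in range(n_iter):
        for conn in pipes:
            conn.send(w)
        w = np.mean([conn.recv() for conn in pipes], axis=0)
    for conn in pipes:
        conn.send(None)
    for p in procs:
        p.join()
    return w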

Ideally, I want OneEpochPerceptron to be easy to use with IPython.parallel in a distributed environment, and ParallelStructuredPerceptron to be easy to use on a single machine.

Issues with current implementation:

  • "parallel" part is not implemented in ParallelStructuredPerceptron (I'm not very versed with multiprocessing/joblib/... and I don't know how to make it work without copying training data on each iteration - ideas are welcome);
  • code duplication in SequenceShards vs SequenceKFold;
  • code duplication in OneEpochPerceptron vs StructuredPerceptron vs ParallelStructuredPerceptron;
  • OneEpochPerceptron uses 'transform' method to learn updated weights;
  • I don't understand original classes/class_range/n_classes code so maybe I broke something here, and there is also code duplication;
  • parameters are mixed uniformly - mixing strategy that takes loss in account is not implemented;
  • not sure about class names and code organization.
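For reference, a sketch of what non-uniform mixing could look like; following McDonald et al. (2010), one natural choice is mixture coefficients proportional to the loss (e.g., the number of mistakes) each shard incurred during the epoch. The names here are illustrative, not part of the PR:

import numpy as np

def mix_weights(per_shard_w, per_shard_loss):
    # per_shard_w: list of weight arrays, one per shard
    # per_shard_loss: e.g. the number of mistakes each shard made this epoch
    loss = np.asarray(per_shard_loss, dtype=float)
    if loss.sum() == 0:
        mu = np.full(len(per_shard_w), 1.0 / len(per_shard_w))  # fall back to uniform mixing
    else:
        mu = loss / loss.sum()
    return np.tensordot(mu, np.asarray(per_shard_w), axes=1)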

The sequence_ids shuffling method is changed so that ParallelStructuredPerceptron and StructuredPerceptron learn exactly the same weights given the same random_state.

With n_jobs=1, ParallelStructuredPerceptron is about 10% slower than StructuredPerceptron on my data; I think we could merge these classes when (and if) ParallelStructuredPerceptron is ready.

@kmike mentioned this pull request on Aug 11, 2013
from __future__ import absolute_import
import numpy as np

class SequenceShards(object):
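    """Split the training data into N "shards" (done once, before training)."""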
Owner:

I still have to read this through, but can't this "sharding" be reduced to SequenceKFold somehow? It looks awfully similar.

Collaborator Author (kmike):

I think "SequenceKFold" can (and should) be implemented using "sharding", but not the other way around. It is a copy-pasted version of SequenceKFold indeed, with some unnecessary stuff removed.
