
Rugby Analytics commit #764

Closed

wants to merge 14 commits into from

Conversation

springcoil
Contributor

Second rugby analytics commit

@jsalvatier
Member

Cool! Thanks :)

A few comments.

You don't need the first model = pm3.Model() since you already use the with statement.
I would fit a slightly different model. I don't think you need to de-mean atts and defs. Or maybe I'm misunderstanding why you've done that.

Also, you don't seem to be using away_theta?

with pm3.Model() as model:
    # global model parameters
    home      = pm3.Normal('home', 0, .0001)
    tau_att   = pm3.Gamma('tau_att', .1, .1)
    tau_def   = pm3.Gamma('tau_def', .1, .1)
    intercept = pm3.Normal('intercept', 0, .0001)

    # team-specific model parameters
    atts = pm3.Normal('atts',
                      mu=0,
                      tau=tau_att,
                      shape=num_teams)

    # named mean_defs so it doesn't clobber the home-advantage term above
    mean_defs = pm3.Normal('mean_defs', 0, .0001)
    defs = pm3.Normal('defs',
                      mu=mean_defs,
                      tau=tau_def,
                      shape=num_teams)

    home_theta = tt.exp(intercept + atts[away_team] + defs[home_team])

    # likelihood of observed data
    home_points = pm3.Poisson('home_points', mu=home_theta, observed=observed_home_goals)
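Since the thread asks where away_theta went: in the Baio and Blangiardo-style football model this example appears to reproduce, each match gets two Poisson rates, one per side. A small numpy sketch of just the two rate computations (toy numbers; the paper pairs each side's attack with the opponent's defence, and the PyMC3 wiring is omitted):

```python
import numpy as np

# Toy data: 3 teams, 2 matches (all numbers made up for illustration).
num_teams = 3
home_team = np.array([0, 1])   # index of the home side in each match
away_team = np.array([1, 2])   # index of the away side

intercept, home = 0.1, 0.2     # stand-ins for the sampled scalars
atts = np.array([0.3, 0.0, -0.3])   # per-team attack strengths
defs = np.array([-0.2, 0.0, 0.2])   # per-team defence strengths

# The home side's scoring rate combines its own attack with the away
# side's defence (plus the home advantage); away_theta mirrors it with
# the roles swapped and no home advantage.
home_theta = np.exp(intercept + home + atts[home_team] + defs[away_team])
away_theta = np.exp(intercept + atts[away_team] + defs[home_team])
```

Both rate vectors would then feed their own Poisson likelihood (home_points and away_points), which is the part the current notebook drops.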

@springcoil
Contributor Author

The demeaning was necessary when reproducing the paper. It might just be a hack, though.
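For readers following along: the de-meaning in question is the sum-to-zero centering the paper uses to identify the team effects (the intercept absorbs any shared offset, so raw attack/defence vectors are only identified up to a constant). A small numpy sketch of just that transformation, outside any PyMC3 model (values are made up):

```python
import numpy as np

# Unconstrained ("starred") team effects -- placeholder values.
atts_star = np.array([0.5, -0.1, 0.2])
defs_star = np.array([-0.3, 0.4, 0.1])

# Subtracting the mean pins the effects to sum to zero, removing the
# location that would otherwise trade off against the intercept.
atts = atts_star - atts_star.mean()
defs = defs_star - defs_star.mean()
```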

Peadar Coyle
Skype: springcoilarch
www.twitter.com/springcoil
peadarcoyle.wordpress.com

@springcoil
Contributor Author

The away theta should be used, but maybe I deleted it. I will look tomorrow.


@jsalvatier
Member

Ah, gotcha, it's reproducing a paper. That makes a lot of sense. Maybe add a comment that you might not want to do that.


@springcoil
Contributor Author

I will add a comment. Something about the mean of the teams affected the results; I forget the details.

I will flesh it out :)


@springcoil
Contributor Author

I know this is still on me, but I've been busy lately. I'll get back to it soon.

@springcoil
Contributor Author

I made some updates based on what you said John. I think it should be ready to be merged soon. Let me know if you need any other documentation.

@springcoil
Contributor Author

It passed the checks and the documentation is ready. Can someone like @jsalvatier or @twiecki merge this?

@twiecki
Member

twiecki commented Aug 14, 2015

Thanks! Isn't there more description in PP for hackers or a blog post we could include here? Would make the example much stronger.

@springcoil
Contributor Author

Hey Thomas,
Yeah sure - I'll dive into my blog posts and turn the .py file into something with more substance.

I agree the example could be made much stronger.


@twiecki
Member

twiecki commented Aug 14, 2015

We probably don't even need the .py file, just the IPython notebook, which we could then convert to a .py.

@springcoil
Contributor Author

I updated the IPython notebook. I thought I had correctly used rebase, but it seems I haven't, so I'll need to squash these commits. I used git fetch upstream and git rebase -i origin master.

Fix typos in backend documentation

re parallelize tests

move deps back where they belong

fix parallel tests

typo

upgrade to pymc3 Beta!

add link for NUTS

commas

set transparency for histograms

make transforms classes

typo

add basic transform tests

actually basic test transforms

categorical foolings

improvements for simplex, but still broken

test and fix simplex jacobian

fix logtransform

test all jacobians

check transforms are going the right way

fix categorical

rename transforms

test and fix sum_to_1

fix unit continuous

try docker builds

remove sudo

Revert "remove sudo"

This reverts commit f2ce835.

Revert "try docker builds"

This reverts commit b4d43ad.

added some docstrings

don't import nonexistant things

Combine chains by default

Make all methods for accessing variable values concatenate chains by
default, because this is likely to be the desired output most of the
time.

See #758 and #759.

Simplify single-trace __getitem__ method

The get_values fallback in the single-trace __getitem__ method was
present mostly as leftover code from a previous design.  MultiTrace is
the only user-facing trace object, so there is not much benefit of
letting __getitem__ fall back to get_values for convenience.

Allow trace variable values to be sliced

Extend indexing of trace objects to support an additional slice object
that specifies the burn and thin arguments of get_values:

    >>> trace[x, start::step]

Above differs from

    >>> trace[x][start::step]

because the second form operates on an array that is the combination
of all chains.

BaseTrace: Remove chain keyword argument to point

This is stale code.  The current BaseTrace only deals with a for a
chain argument in the point method.  The point method signatures in
the children classes are already correct.

MultiTrace: Rework docstring

* Add more information about getting variable values.

* Remove parameter information, which isn't relevant because users get
  back an initialized instance from sampling.

backends/base.py: Fix docstring typo

Differentiate names for single-chain traces

Previously, the variable name "trace" was confusingly used to refer to
both BaseTrace instances and MultiTrace instances, making readers
infer which type it was from context.

Change code to use the following conventions:

* Use "strace" for variable names of BaseTrace instances to indicate a
  single-chain trace (where "single-chain" trace refers to an object
  derived from BaseTrace, but not a MultiTrace object with only one
  chain).

* MultiTrace instances are called either "mtrace" or "trace".

All changes in this commit are purely variable renames and are not
user-facing.

Added alpha as an argument to traceplot

NDArray: Rename variable

Rename variable to make it clear that it refers to values for a single
model variable rather than a trace object.

#771 (comment)

Remove gridspec import check

Commit 3c3273d added a check on the gridspec import to support
older versions of matplotlib.  However, the minimum version that is
now specified (1.2.1) has gridspec, so this check is no longer needed.

combine leapfrog and energy computations

combine metropolis density comparison

cleanup things

revert some unintentional changes

avoid infinite recursion

actually fix arraystep

work with tensors too

shapes have to be ints

fix lkjcorr distribution

fastarraystep->arraystepshared

DOC Remove comment about TransformedVar in stoch vol example.

Removed TransformedVar from profiling example

Added standard normal cdf

Added ExGaussian distribution and (rubbish) tests

Added 2-parameter Inverse Gaussian distribution

Added 3-parameter shifted inverse gaussian.

Fix for phi parametrization in the inverse gaussian

Added newline at end of file.

Improved handling of alternative parametrization for inverse gaussian.

Added mean attribute to inverse gaussian.

Added switch to ExGaussian logp and missing self to method random.

Moved inverse gaussian stuff (and tests) to Wald.

Added condition to bound function in Wald logp

Fixed Wald logp

Fixed typos and removed whitespace

Remove unnecssary size checks from Wald/exGaussian random

Added disc_vars property to complement cont_vars. Might be useful for assigning discretes to a Metropolis sampler, for example.

MultiTrace: Check output before returning slice

Out-of-memory backends give a warning when the user tries to slice them,
so the result may be a list of Nones.

re: #790

Support slicing in SQlite and Text backends

Return a NDArray slice instead of warning when an out-of-memory backend
is sliced.

Fixes #790

ENH Add example of a Gaussian Mixture Model.

Add author

fix inappropriate summing

fix test for elemwise transform jacobian dets

remove debug statement
@springcoil
Contributor Author

Anyone have any idea why this build is failing?

@twiecki
Member

twiecki commented Aug 22, 2015

#803

@twiecki
Member

twiecki commented Aug 22, 2015

But also, something seems to have gone wrong with the rebase.

git checkout mybranch
git rebase master
git push -f origin mybranch
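To make the suggested sequence concrete, here is a self-contained toy run of it (every repo, remote, and branch name below is a throwaway stand-in created on the spot, not the real pymc3 remotes). Note that the earlier git rebase -i origin master (no slash) asks git to rebase the branch master onto origin itself, which is probably not what was intended:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"

# stand-in for the upstream pymc3 repo
git init -q -b master upstream
git -C upstream -c user.name=u -c user.email=u@x commit -q --allow-empty -m "upstream base"

# contributor clone with a feature branch ("mybranch" in the comment)
git clone -q upstream work && cd work
git config user.name c && git config user.email c@x
git checkout -q -b mybranch
echo notebook > rugby.ipynb
git add rugby.ipynb && git commit -q -m "rugby notebook"

# meanwhile upstream gains a commit
echo fix > ../upstream/fix.txt
git -C ../upstream add fix.txt
git -C ../upstream -c user.name=u -c user.email=u@x commit -q -m "upstream moves on"

# the suggested fix: fetch, rebase onto the remote master, then force-push
git fetch -q origin
git rebase -q origin/master mybranch    # replays only the feature commit
# git push -f origin mybranch           # force-push step, skipped in the toy

subjects=$(git log --format=%s | tr '\n' '|')
echo "$subjects"
```

After the rebase, the branch history is just the single feature commit on top of the updated master, which is what the clean PR should show.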

@springcoil
Contributor Author

Opening a new branch :)

springcoil and others added 5 commits August 23, 2015 16:45
Second commit

removed get_ipython

Update of the ipython notebook with more information

I added a slight update to include more tutorial material

slight update 2

Use Text as the example backend in docstring

readme update

update readme.md
@springcoil
Contributor Author

This is a lot of commits - can anyone advise me on how to properly rebase this?

@springcoil
Contributor Author

Hmm I tried rebasing etc and it didn't work...

@kyleam
Contributor

kyleam commented Aug 24, 2015

Hmm I tried rebasing etc and it didn't work...

It seems like things got into a pretty messy state. I've pushed what (I think) was the intended state to km/rubgy-fix.

So I'd suggest you:

  • Back up your local rugby_analytics branch from PR #807 (Rugby analytics: Another commit) to another branch.

  • Fetch from the pymc3 repo.

  • With rugby_analytics as your current branch, reset it to the new branch I pushed:

    git reset --hard /km/rubgy-fix

  • Re-run the notebook and update the commit with 'git commit --amend'.

  • Force push to the pymc3 repo. This will update PR #807.
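As a sanity check of that recipe, here is a disposable end-to-end run of the same steps in a throwaway repo (the tag good stands in for the pushed km/rubgy-fix branch; every other name is invented for the demo too):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master repo && cd repo
git config user.name c && git config user.email c@x
echo base > f && git add f && git commit -q -m "good state"
git tag good                          # stand-in for the fixed branch kyleam pushed
echo mess > f && git commit -qam "messy rebase result"
git checkout -q -b rugby_analytics

git branch rugby_backup               # 1. back up the current branch
git reset -q --hard good              # 2. reset it to the pushed fix
echo rerun > nb && git add nb
git commit -q --amend -m "good state, notebook re-run"   # 3. amend after re-running
# git push -f origin rugby_analytics  # 4. force-push to update the PR

state=$(git log --format=%s | tr '\n' '|')
echo "$state"
```

The backup branch keeps the messy history reachable, so the hard reset and amend are safe to experiment with.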

@kyleam
Contributor

kyleam commented Aug 24, 2015

Doh, please look under the folded text for my last comment. GitHub thinks the rest of my email is my signature.
