Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transform objects for temporal point processes (#294) #341

Merged
merged 1 commit into from
Dec 7, 2019

Conversation

canerturkmen
Copy link
Contributor

Issue #, if available:

This is the first PR setting up the infrastructure for temporal point processes in GluonTS (#294).

Description of changes:

Put simply it introduces the ContinuousTimeInstanceSplitter object, which somewhat replicates the logic of an InstanceSplitter for but for point processes in continuous time. I thought this was a good first PR for me, and a good place to start the process of bringing in TPPs.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@canerturkmen
Copy link
Contributor Author

I understand both test fails are related to building docs on the GPU instance. Can't tell if they're related to the PR.

@lostella
Copy link
Contributor

Can't tell if they're related to the PR.

They aren't, we will look into the CI configuration. And thanks for the PR!

@@ -155,6 +155,42 @@ def __call__(self, ts: np.ndarray, a: int, b: int) -> np.ndarray:
return np.array([b])


class ContinuousTimePointSampler:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This abstraction looks very similar to the InstanceSampler: https://github.com/awslabs/gluon-ts/blob/a97518d436dbb8e2cb6e2958d4ab899e5fa43a9d/src/gluonts/transform.py#L94

The only difference seems the num_instances constructor argument here. I'm wondering if the two could be condensed somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in continuous time you sample between points in time (float), as opposed to indices (int).

@timoschowski timoschowski added this to the v0.4 milestone Oct 9, 2019
@jaheba jaheba modified the milestones: v0.4, v0.5 Oct 24, 2019
@codecov-io
Copy link

codecov-io commented Nov 18, 2019

Codecov Report

Merging #341 into master will increase coverage by 0.07%.
The diff coverage is 96.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #341      +/-   ##
==========================================
+ Coverage    83.7%   83.78%   +0.07%     
==========================================
  Files         159      159              
  Lines        9403     9458      +55     
==========================================
+ Hits         7871     7924      +53     
- Misses       1532     1534       +2
Impacted Files Coverage Δ
src/gluonts/transform.py 89.69% <96.42%> (+0.6%) ⬆️

@canerturkmen
Copy link
Contributor Author

@lostella Is this a good time to bump this? :)

@lostella
Copy link
Contributor

@lostella Is this a good time to bump this? :)

Definitey :-) will go through it asap

@canerturkmen
Copy link
Contributor Author

May I suggest @lovvge as a reviewer as well?

@lostella lostella requested a review from lovvge November 25, 2019 20:24
src/gluonts/transform.py Outdated Show resolved Hide resolved
@canerturkmen canerturkmen force-pushed the rmtpp_transform branch 2 times, most recently from 84da906 to 5bb91fc Compare November 25, 2019 21:51
src/gluonts/transform.py Show resolved Hide resolved
length of the interval seen before making prediction
future_interval_length
length of the interval that must be predicted
train_sampler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. It says "does not accept time-series fields" above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm making it a little more verbose now, but the intention was that the Transformation does not take time_series_fields (not target_field), which I remember were intended for features?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the names are not defined in the transformation chain but in the estimator so it does not really matter.

src/gluonts/transform.py Outdated Show resolved Hide resolved
src/gluonts/transform.py Show resolved Hide resolved

for future_start in sampling_times:

r = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? You could modify your data copy (d) directly without introducing a new dict, no? This would avoid introducing a new variable and the adding the other fields later. Either way, r should be of type DataEntry if this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d was initialized outside of the loop here (this may be a spurious copy, now that I look at it). ris the output dict.

I'm not a fan of the style in InstanceSplitter where d is "copied" once per each output. However this is (probably?) a shallow copy. Moreover, features an explicit

del d[ts_field]

I thought this style was a little safer, or maybe I'm missing something here?

In any case, ditto for DataEntry 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I would regard this as safer is if you have to do multiple transformations that rely on one original entry on in the DataEntry (which would make sense to have it immutable in that case). I'm not a fan because it introduces more code than necessary, but tastes differ :-).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I agree. I thought in flatmap_transformation this is indeed the intention that multiple entries derive from a single input. Hence the extra code, to stress that the input is immutable.

Just crossing the t's here to make sure :)

src/gluonts/transform.py Outdated Show resolved Hide resolved
canerturkmen pushed a commit to canerturkmen/gluon-ts that referenced this pull request Dec 6, 2019
canerturkmen pushed a commit to canerturkmen/gluon-ts that referenced this pull request Dec 6, 2019
@canerturkmen canerturkmen merged commit fbcb244 into awslabs:master Dec 7, 2019
@canerturkmen canerturkmen deleted the rmtpp_transform branch December 7, 2019 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants