
Understand the dataset dimension #28

Closed

clsx524 opened this issue Nov 9, 2020 · 6 comments

Comments

@clsx524

clsx524 commented Nov 9, 2020

I am using the npz_check function to generate the npz file. Before it dumps the data to npz, I printed out the dimensions of R, Z and X: R: (7500, 19), X: (7500, 8, 672), Z: (7500, 18, 672). There are 7500 rows and 672 entries per time series, as described by the challenge, and 19, 8 and 18 are the numbers of labels for R, X and Z defined in labels.json. But I am wondering why R is not defined with 672 entries. Is there any particular reason to define it like this?

The npz_check function and these variables are defined in this file: https://github.com/maxjcohen/ozechallenge_benchmark/blob/master/src/utils.py#L218

@maxjcohen
Owner

Hi,

Any issue regarding the data challenge should be posted in the data challenge repo that you just linked, to keep things tidy.

Now regarding your question: R contains characteristics of the building that do not evolve over time, so there is no reason to add the 672-step time dimension. In the preprocessing of the benchmark, and of this repo, I tile R in order to easily concatenate all input tensors; this is just an implementation trick.
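For concreteness, here is a minimal sketch of that trick in NumPy, using the shapes quoted in the question above (the array names and random data are illustrative, not the benchmark's exact code):

```python
import numpy as np

R = np.random.rand(7500, 19)        # per-building characteristics, constant over time
Z = np.random.rand(7500, 18, 672)   # time-varying inputs, 672 steps

# Repeat R along a new time axis so it lines up with Z...
R_tiled = np.tile(R[:, :, np.newaxis], (1, 1, 672))  # -> (7500, 19, 672)

# ...then concatenate along the feature axis.
inputs = np.concatenate([R_tiled, Z], axis=1)        # -> (7500, 37, 672)
```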

@clsx524
Author

clsx524 commented Nov 9, 2020

Ah, I missed the tile function in this repo. Now I get the right dimension for R.

In the train notebook of this repo, the x and y read from the dataset have dimensions (7500, 18, 691) and (7500, 8, 672) respectively. I understand it as: there are 7500 rows, 18 labels for Z, 672 + 19 (latent labels for R) = 691 entries in one time series, and 8 labels to predict.

When it comes to calculating the loss at loss = loss_function(y.to(device), netout), netout is supposed to have the same dimensions as y, which is (7500, 8, 672). So my question is: where in the network does the transformation from 18 to 8 happen?

Btw, great work, and I appreciate that you open-sourced it.

@maxjcohen
Owner

I think you misunderstood the role of R here: the 19 variables are not to be added to the time dimension. If you choose to tile R, you would obtain a tensor of shape (7500, 19, 672), which you could then concatenate with Z to obtain a (7500, 19+18, 672) tensor.

To understand how the Transformer converts a tensor with 18 features to 8, you should take a look at the original paper, or one of the detailed analyses: The Annotated Transformer and The Illustrated Transformer.
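In practice, the feature-dimension change typically happens in plain linear layers around the attention blocks, not in the attention itself. A hedged sketch in PyTorch (the module names embed/head and the d_model value are illustrative, not this repo's actual classes; 37 = 19 tiled R features + 18 Z features, per the comment above):

```python
import torch
import torch.nn as nn

d_input, d_model, d_output = 37, 64, 8

# Input embedding: project each time step's feature vector into d_model.
embed = nn.Linear(d_input, d_model)
# Output head: project back down to the 8 target variables.
head = nn.Linear(d_model, d_output)

x = torch.rand(16, 672, d_input)   # (batch, time, features)
h = embed(x)                       # (16, 672, 64) -- attention layers operate here
netout = head(h)                   # (16, 672, 8), matching y's feature dimension
```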

@clsx524
Author

clsx524 commented Nov 14, 2020

Thanks. I figured out the issue.

@diegoquintanav

@maxjcohen did you see any improvement during training by providing the time-independent sequences in R? As I see it, the distribution of attention weights should be uniform in these sequences. Can you share more about the intuition behind this? Thanks!

@maxjcohen
Owner

Hi, I added the variables contained in R at each time step to simplify the implementation. Otherwise, the input vector would have had a different dimension at different time steps.

I haven't looked at the weights yet, but I agree they should be uniform, although you could argue that some cyclic patterns could appear. For instance, the "window area" variable carries most of its value during sunny hours, which could show up in the Transformer's weights.
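If someone wants to check that intuition, one way (a sketch only, assuming a model built on PyTorch's torch.nn.MultiheadAttention; this is not the repo's actual API) is to ask the attention module to return its weights and look at how uniform each row is:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.rand(16, 672, 64)  # (batch, time, d_model)

# need_weights=True returns the attention map, averaged over heads:
# attn_weights has shape (batch, query_time, key_time).
_, attn_weights = attn(x, x, x, need_weights=True)

# Average over the batch, then check the spread along the key/time axis:
# near-uniform attention rows have a standard deviation close to zero,
# while cyclic (e.g. daily) patterns would show up as structured rows.
mean_weights = attn_weights.mean(dim=0)  # (672, 672)
print(mean_weights.std(dim=1))
```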
