
Understand the dataset dimension #28

Closed

clsx524 opened this issue Nov 9, 2020 · 6 comments

Comments

@clsx524

clsx524 commented Nov 9, 2020

I am using the npz_check function to generate the npz file. Before it dumps the data to npz, I printed out the dimensions of R, Z and X: R: (7500, 19), X: (7500, 8, 672), Z: (7500, 18, 672). There are 7500 rows and 672 entries per time series, as described by the challenge, and 19, 8 and 18 are the numbers of labels for R, X and Z defined in labels.json. But I am wondering why R is not defined with 672 entries. Is there any particular reason to define it like this?

The npz_check function and these variables are defined in this file: https://github.com/maxjcohen/ozechallenge_benchmark/blob/master/src/utils.py#L218

@maxjcohen
Owner

Hi,

Any issue regarding the data challenge should be posted in the data challenge repo that you just linked, to keep things tidy.

Now regarding your question: R contains characteristics of the building that do not evolve over time, so there is no reason to add the 672-step time dimension. In the preprocessing of the benchmark, and of this repo, I tile R in order to easily concatenate all input tensors; this is just an implementation trick.
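For concreteness, here is a minimal sketch of that trick in NumPy, using the shapes quoted in the question above (the array names and random data are illustrative, not the benchmark's exact code):

```python
import numpy as np

R = np.random.rand(7500, 19)        # per-building characteristics, constant over time
Z = np.random.rand(7500, 18, 672)   # time-varying inputs, 672 steps

# Repeat R along a new time axis so it lines up with Z...
R_tiled = np.tile(R[:, :, np.newaxis], (1, 1, 672))  # -> (7500, 19, 672)

# ...then concatenate along the feature axis.
inputs = np.concatenate([R_tiled, Z], axis=1)        # -> (7500, 37, 672)
```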

@clsx524
Author

clsx524 commented Nov 9, 2020

Ah, I missed the tile function in this repo. Now I get the right dimension for R.

In the train notebook of this repo, the x and y read from the dataset have dimensions (7500, 18, 691) and (7500, 8, 672) respectively. I understand it as: there are 7500 rows, 18 labels for Z, 672 + 19 (latent labels for R) = 691 entries in one time series, and 8 labels to predict.

When it comes to calculating the loss at loss = loss_function(y.to(device), netout), netout is supposed to have the same dimensions as y, which is (7500, 8, 672). So my question is: where in the network does the transformation from 18 to 8 happen?

Btw, great work, and I appreciate that you open-sourced it.

@maxjcohen
Owner

I think you misunderstood the role of R here: the 19 variables are not to be added to the time dimension. If you choose to tile R, you would obtain a tensor of shape (7500, 19, 672), which you could then concatenate with Z to obtain a (7500, 19+18, 672) tensor.

To understand how the Transformer converts a tensor with 18 features to 8, you should take a look at the original paper, or one of the detailed analyses: The Annotated Transformer and The Illustrated Transformer.
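In practice, the feature-dimension change typically happens in plain linear layers around the attention blocks, not in the attention itself. A hedged sketch in PyTorch (the module names embed/head and the d_model value are illustrative, not this repo's actual classes; 37 = 19 tiled R features + 18 Z features, per the comment above):

```python
import torch
import torch.nn as nn

d_input, d_model, d_output = 37, 64, 8

# Input embedding: project each time step's feature vector into d_model.
embed = nn.Linear(d_input, d_model)
# Output head: project back down to the 8 target variables.
head = nn.Linear(d_model, d_output)

x = torch.rand(16, 672, d_input)   # (batch, time, features)
h = embed(x)                       # (16, 672, 64) -- attention layers operate here
netout = head(h)                   # (16, 672, 8), matching y's feature dimension
```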

@clsx524
Author

clsx524 commented Nov 14, 2020

Thanks. I figured out the issue.

@diegoquintanav

@maxjcohen did you see any improvement during training by providing the time-independent sequences in R? As I see it, the distribution of attention weights should be uniform in these sequences. Can you share more about the intuition behind this? Thanks!

@maxjcohen
Owner

Hi, I added the variables contained in R at each time step to simplify the implementation. Otherwise, the input vector would have had a different dimension at different time steps.

I haven't looked at the weights yet, but I agree they should be uniform, although you could argue that some cyclic patterns could appear. For instance, the "window area" variable carries most of its value during sunny hours, which could show up in the Transformer's weights.
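If someone wants to check that intuition, one way (a sketch only, assuming a model built on PyTorch's torch.nn.MultiheadAttention; this is not the repo's actual API) is to ask the attention module to return its weights and look at how uniform each row is:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.rand(16, 672, 64)  # (batch, time, d_model)

# need_weights=True returns the attention map, averaged over heads:
# attn_weights has shape (batch, query_time, key_time).
_, attn_weights = attn(x, x, x, need_weights=True)

# Average over the batch, then check the spread along the key/time axis:
# near-uniform attention rows have a standard deviation close to zero,
# while cyclic (e.g. daily) patterns would show up as structured rows.
mean_weights = attn_weights.mean(dim=0)  # (672, 672)
print(mean_weights.std(dim=1))
```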
