Where can I download the dataset? #2

datong-new · 2020-01-17T14:29:43Z

Hello, thanks for the wonderful work!

Can you give more details about the dataset? And where can I download the dataset?

Thank you!

maxjcohen · 2020-01-17T17:36:31Z

Hi,

You can find a sample of the dataset, along with a brief description, published as an open data challenge in csv format. To use the notebooks, you will have to convert it to npz format or use a custom PyTorch dataset (see the challenge demo repo).

Bests

HuskyLens · 2020-02-03T05:10:26Z

Hi,
Could you share the npz file? The data structure from the Open Data Challenge seems different from yours.
See the difference:
Yours
Origin

maxjcohen · 2020-02-06T15:07:33Z

Hi, I can't share an npz file containing any data other than what was uploaded to the data challenge, as it would go against the very rules of the challenge.
The structure of the labels is different, but that shouldn't be an issue if you just want to convert the csv dataset to npz, as the code was written with these possible modifications in mind. Just load the csv with the OzeDataset class, and export R, Z and X using np.savez. You're aiming at this kind of data structure.

francisduan · 2020-03-25T22:24:34Z

Hi, do you have any code that could convert the csv to npz? I am not sure what we should include in the npz.

maxjcohen · 2020-04-03T11:24:39Z

Once again, all the needed information is present in the challenge benchmark repo, but to prevent further questions about the dataset, I have drafted a function to convert csv to npz.
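The drafted function is not reproduced in this thread. As a rough idea of what such a conversion involves, here is a minimal sketch, not the author's code. It assumes that labels.json maps "R", "Z" and "X" to lists of names, that R names are plain csv columns (as the traceback further down shows with x[labels["R"]]), and that each time series variable spans K = 672 columns named f"{variable}_{k}"; the column naming and the function name csv2npz_sketch are hypothetical, so check them against the actual challenge files.

```python
import json

import numpy as np
import pandas as pd

# Assumption: each time series spans K consecutive columns named f"{name}_{k}".
K = 672


def csv2npz_sketch(x_csv, y_csv, labels_json, output_path, time_steps=K):
    """Hypothetical csv -> npz converter (not the author's exact function)."""
    with open(labels_json, "r") as stream:
        labels = json.load(stream)

    x = pd.read_csv(x_csv)  # inputs: building characteristics + commands
    y = pd.read_csv(y_csv)  # outputs: observations

    # R: static building characteristics, one column per characteristic.
    R = x[labels["R"]].values

    # Columns are grouped by variable, then by time step, so the flat rows
    # reshape naturally to (n_samples, n_variables, time_steps).
    z_cols = [f"{name}_{k}" for name in labels["Z"] for k in range(time_steps)]
    x_cols = [f"{name}_{k}" for name in labels["X"] for k in range(time_steps)]
    m = len(x)
    Z = x[z_cols].values.reshape((m, len(labels["Z"]), time_steps))
    X = y[x_cols].values.reshape((m, len(labels["X"]), time_steps))

    # Move time to axis 1: (n_samples, time_steps, n_variables).
    Z = Z.transpose(0, 2, 1)
    X = X.transpose(0, 2, 1)

    np.savez(output_path, R=R, Z=Z, X=X)
    return R, Z, X
```

If your loader expects time on the last axis instead, drop the two transpose calls.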

DanielAtKrypton · 2020-05-01T07:37:26Z

Dear @maxjcohen, I joined challenge 28 and downloaded the following files:

x_train_LsAZgHU.csv
y_train_EFo1WyE.csv
x_test_QK7dVsy.csv

Then I copied the csv2npz script to the utils folder within the project.
Then I created and ran the following Python script in the project's root folder:

from src.utils.csv2npz import csv2npz

csv2npz('datasets/x_train_LsAZgHU.csv', 'datasets/y_train_EFo1WyE.csv')

Unfortunately, it failed with the following error:

Traceback (most recent call last):
  File "/home/<username>/Workspaces/Python/transformer/generateNpz.py", line 3, in <module>
    csv2npz('datasets/x_train_LsAZgHU.csv', 'datasets/y_train_EFo1WyE.csv')
  File "/home/<username>/Workspaces/Python/transformer/src/utils/csv2npz.py", line 21, in csv2npz
    R = x[labels["R"]].values
  File "/home/<username>/.virtualenvs/.env/lib/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/home/<username>/.virtualenvs/.env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1553, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/home/<username>/.virtualenvs/.env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1646, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['initial_temperature', 'roof_1_thickness_3'] not in index"

maxjcohen · 2020-05-02T08:01:29Z

Hi, this error means that the indices "initial_temperature" and "roof_1_thickness_3" are not present in the challenge dataset. Indeed, the original labels.json does not list these values, because they were not intended to be used in the challenge.

In order to solve your error, I recommend using the original labels file from the benchmark repo.
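If you would rather keep a labels.json that lists extra characteristics, a minimal sketch of dropping the R labels that are missing from the loaded csv, and reporting them, instead of failing with a KeyError. The helper name keep_available_labels is hypothetical, and it only filters the "R" entry, which is where the KeyError above was raised.

```python
def keep_available_labels(labels, columns, key="R"):
    """Hypothetical helper: drop entries of labels[key] missing from `columns`.

    Returns a filtered copy of `labels` and the list of dropped names;
    the original dict is left untouched.
    """
    available = set(columns)
    kept = [name for name in labels[key] if name in available]
    dropped = [name for name in labels[key] if name not in available]
    filtered = dict(labels)  # shallow copy so the caller's labels are unchanged
    filtered[key] = kept
    return filtered, dropped
```

Usage, with x being the DataFrame returned by pd.read_csv: labels, dropped = keep_available_labels(labels, x.columns).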

DanielAtKrypton · 2020-05-03T06:39:06Z

I created pull request #6 with some improvements I have made so far; it might be worth merging. @maxjcohen, please advise.

jshyou · 2021-01-07T20:58:46Z

I am looking at your project and trying to process a different dataset. If convenient, could you describe the data format so I can process data beyond the challenge dataset? Thanks.

maxjcohen · 2021-01-18T08:32:30Z

Hi, there is no particular data format to use with the Transformer besides the input shape specified in the documentation.

We currently handle our data using the OzeDataset class, which inherits from PyTorch's Dataset class. As the format here is a bit specific, I encourage you to write your own class inheriting from Dataset that fits your data, and feed it to the Transformer.
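A minimal sketch of such a class, assuming your data is already in NumPy arrays shaped (n_samples, time_steps, d_input) for inputs and (n_samples, time_steps, d_output) for targets. The class name ArrayTimeSeriesDataset is hypothetical. The import falls back to a plain base class only so the sketch runs without PyTorch; in practice you would subclass torch.utils.data.Dataset.

```python
import numpy as np

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch installed
    Dataset = object


class ArrayTimeSeriesDataset(Dataset):
    """Hypothetical map-style dataset over pre-shaped arrays.

    inputs:  (n_samples, time_steps, d_input)
    targets: (n_samples, time_steps, d_output)
    """

    def __init__(self, inputs, targets):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same number of samples")
        # float32 matches the default dtype of PyTorch model weights.
        self._x = np.asarray(inputs, dtype=np.float32)
        self._y = np.asarray(targets, dtype=np.float32)

    def __len__(self):
        return len(self._x)

    def __getitem__(self, idx):
        # One sample: (time_steps, d_input) input and (time_steps, d_output) target.
        return self._x[idx], self._y[idx]
```

Wrapped in torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True), the default collate function stacks samples into batches of shape (8, time_steps, d_input).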

jiange91 · 2021-02-25T07:30:33Z

Hi, thanks for the reference for the helpful data loading function. Just one minor tip here.

The original data loader uses X.values.reshape((m, -1, K)), where m is the number of observations and K is the length of the time series. However, a typical LSTM or Transformer model accepts an input tensor of shape (batch, time series length, num_features). Thus reshaping to (m, K, -1) is recommended. The same goes for the variable "Z" (I have to point out that the naming is quite confusing at first glance).
X = X.values.reshape((m, K, -1))
Z = Z.values.reshape((m, K, -1))
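One caveat worth checking: reshape only regroups values in memory order; it does not swap axes. Assuming the csv columns are grouped by variable (which is what the original reshape((m, -1, K)) implies), getting from (m, n_vars, K) to (m, K, n_vars) needs a transpose. A toy check with m = 1, two hypothetical variables a and b, and K = 3:

```python
import numpy as np

# Assumption: the row stores variable a's K values, then variable b's K values.
m, K = 1, 3
row = np.array([[10, 11, 12, 20, 21, 22]])  # a_0 a_1 a_2 b_0 b_1 b_2

# Reshape to (m, n_vars, K), then swap the last two axes -> (m, K, n_vars).
correct = row.reshape((m, -1, K)).transpose(0, 2, 1)
print(correct[0])
# [[10 20]
#  [11 21]
#  [12 22]]

# Reshaping straight to (m, K, -1) keeps memory order and mixes the variables.
scrambled = row.reshape((m, K, -1))
print(scrambled[0])
# [[10 11]
#  [12 20]
#  [21 22]]
```

Both arrays have the same shape, so the mix-up does not raise an error; it only shows up as meaningless time series.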

For labels.json, I deleted "week" and "light_blabla_mask" (I can't remember the exact name, but the error message alerted me that this index was not found). You can also refer to the data specification on the challenge website https://challengedata.ens.fr/participants/challenges/28/ to modify your labels.json.

My final input tensor shape is (8, 672, 18): a batch size of 8, 672 time steps, and 18 features (ignoring the room parameters). (2021/2/25)

maxjcohen · 2021-02-27T10:31:22Z

By default, LSTM in PyTorch accepts a tensor of shape (time series length, batch, num_features); see the docs.

diegoquintanav · 2021-04-19T16:52:16Z

I managed to get a .npz file using the labels.json from https://raw.githubusercontent.com/maxjcohen/ozechallenge_benchmark/master/labels.json and the code from https://gist.github.com/diegoquintanav/050765be2ff3f4cfcf7c25da645cfcc2

However, the notebook at https://timeseriestransformer.readthedocs.io/en/latest/notebooks/trainings/training_2020_06_27__164648.html#Load-dataset uses a dataset that has (I think) 25k rows, while the one downloaded from the ozechallenge has 7,500:

$ wc -l dataset/x_train_LsAZgHU.csv 
7501 dataset/x_train_LsAZgHU.csv

If I change the splits to dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (5500, 1000, 1000)), I hit an error in the cell that does the training:

[Epoch   1/30]:   0%|          | 0/5500 [00:00<?, ?it/s]

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-4b3396332a6c> in <module>
     12 
     13             # Propagate input
---> 14             netout = net(x.to(device))
     15 
     16             # Comupte loss

~/.pyenv/versions/anaconda3-5.3.1/envs/tfm/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/code/notebooks/transformers/transformer/tst/transformer.py in forward(self, x)
    123 
    124         # Embeddin module
--> 125         encoding = self._embedding(x)
    126 
    127         # Add position encoding

~/.pyenv/versions/anaconda3-5.3.1/envs/tfm/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.pyenv/versions/anaconda3-5.3.1/envs/tfm/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     92 
     93     def forward(self, input: Tensor) -> Tensor:
---> 94         return F.linear(input, self.weight, self.bias)
     95 
     96     def extra_repr(self) -> str:

~/.pyenv/versions/anaconda3-5.3.1/envs/tfm/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)
   1754 
   1755 

RuntimeError: mat1 dim 1 must match mat2 dim 0

What is 'datasets/dataset_57M.npz'? And what are X, R and Z? Thanks!

maxjcohen · 2021-04-20T07:57:03Z

Hi, the dataset from the challenge and the one I'm using in this repo are quite different, which is why the dimensions don't match. If you want to use this Transformer for the challenge, you'll have to make a few adjustments.

As for your question about X, R and Z, you can check #28.

diegoquintanav · 2021-04-20T08:48:22Z

Hi!, thanks for answering.

Can you tell me more about the differences? For example, what are the shapes of X, R, and Z in dataset_57M.npz? Also, I'm lost when you say that

If you want to use this Transformer for the challenge, you'll have to make a few adjustments.

Isn't that what this repo does? In the readme, you say that the dataset used to train this Transformer is the one from the challenge, but that does not seem to be the case. Can you tell me more about the adjustments needed?

maxjcohen · 2021-05-03T07:18:57Z

The variables X, R and Z are specific to the challenge dataset and completely independent from the Transformer model. They simply describe the dataset, which has 2 inputs instead of the usual one:

R contains the characteristics of the building, which don't change with time, and are concatenated with Z to serve as input. Shape should be (n_samples, n_characteristics).
Z contains the input time series. Shape should be (n_samples, time_steps, n_input_variables).
X contains the output time series. Shape should be (n_samples, time_steps, n_output_variables).
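Following that description, a minimal sketch of how R and Z could be combined into a single input array, with the static characteristics repeated at every time step. The helper name build_inputs, the Z-then-R concatenation order, and the names d_input/d_output (the input and output feature sizes given to the Transformer) are assumptions; check the repository's documentation. The last dimension of the input has to equal d_input; otherwise the first Linear (embedding) layer fails with a shape error like the "mat1 dim 1 must match mat2 dim 0" reported above.

```python
import numpy as np


def build_inputs(R, Z, X):
    """Hypothetical helper: tile the static R along time and concatenate with Z.

    R: (n_samples, n_characteristics)
    Z: (n_samples, time_steps, n_input_variables)
    X: (n_samples, time_steps, n_output_variables)
    Returns (inputs, targets, d_input, d_output).
    """
    n_samples, time_steps, _ = Z.shape
    if R.shape[0] != n_samples or X.shape[:2] != (n_samples, time_steps):
        raise ValueError(f"inconsistent shapes: R{R.shape}, Z{Z.shape}, X{X.shape}")
    # Repeat the building characteristics at every time step.
    R_tiled = np.repeat(R[:, np.newaxis, :], time_steps, axis=1)
    # Shape: (n_samples, time_steps, n_input_variables + n_characteristics).
    inputs = np.concatenate((Z, R_tiled), axis=-1)
    d_input = inputs.shape[-1]
    d_output = X.shape[-1]
    return inputs, X, d_input, d_output
```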

The original dataset from the challenge has been modified; for instance, some variables were removed from R, some were added to Z, etc. But the content is roughly the same and should be sufficient for trying out the Transformer. All changes can be found in the labels.json files.

Please keep in mind that the dataset dataset_57M.npz is not available for download.

inkyusa · 2021-05-12T09:51:48Z

Thanks to the author for the great intuitions and efforts.

For those who may have issues related to the dataset, you might want to try this version, which I slightly modified according to the author's suggestions:
https://github.com/afters-cool/transformer

and dataset
https://github.com/afters-cool/transformer/releases/tag/v0.0.1

You can check some plots produced by the code above (I don't know whether they are correct or not):
https://github.com/afters-cool/transformer/tree/master/assets

Hope this helps someone.

sarraAyed · 2021-10-13T16:17:46Z

The challenge dataset contains files named x_train and y_train. Do they complement each other, or is one of them enough?
Also, if my data are already in a csv file, can't I just divide them into train, test and validation sets directly and use them?

maxjcohen · 2021-10-15T09:55:50Z

Hi, yes, they complement each other: x_train contains the commands (input vectors), while y_train contains the observations (output vectors). You are, of course, free to divide your data however you want.
In the future, please keep discussions about the challenge in the challenge repo.
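Because x_train holds the inputs and y_train the matching observations, any split has to use the same row indices for both files so each input stays paired with its output. A minimal sketch with NumPy, assuming the rows appear in the same order in both files; split_indices and its default fractions are hypothetical choices, not something the repo prescribes.

```python
import numpy as np


def split_indices(n_samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Hypothetical helper: shuffle sample indices and cut them into train/val/test."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible split
    order = rng.permutation(n_samples)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    # Remaining indices (after train and val) go to the test set.
    return np.split(order, [n_train, n_train + n_val])
```

Usage: train_idx, val_idx, test_idx = split_indices(len(x_train)), then select x_train.iloc[train_idx] and y_train.iloc[train_idx] (and likewise for the other two sets).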

gaoyanfei1 · 2023-04-10T12:37:56Z

Thank you for your work!

chrismen · 2023-05-17T02:16:33Z

I am new to Transformer methods. Can the package accept csv files directly instead of .npz files?

maxjcohen · 2023-06-02T07:54:47Z

In this repo, we define a Transformer model that takes Tensors as inputs; see the documentation. We present examples that load data from .npz files, but you can load data however you want.

yyldtc · 2024-04-09T02:22:48Z

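Could you give a detailed explanation of the dataset part? I have already downloaded the two files, dataset.npz and labels.json, and placed them in the directory, but I still can't run the code.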

maxjcohen · 2024-04-10T15:32:08Z

Hi @yyldtc, from what I was able to translate from your message, something is still not working with the dataset. Could you detail the error you got in a new issue? I'll take a look.
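When reporting a loading problem, it helps to include what the .npz file actually contains. A minimal sketch (describe_npz is a hypothetical helper) that lists the shape of each stored array and which of the expected keys are missing; R, Z and X are the names used in this thread.

```python
import numpy as np


def describe_npz(path, expected=("R", "Z", "X")):
    """Hypothetical helper: return {array name: shape} and the missing expected keys."""
    with np.load(path) as data:  # NpzFile closes the underlying file on exit
        shapes = {name: data[name].shape for name in data.files}
    missing = [name for name in expected if name not in shapes]
    return shapes, missing
```

Printing the result of describe_npz("dataset.npz") alongside the traceback makes shape mismatches much easier to spot.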

maxjcohen added the "dataset" label (Issue with downloading or loading the dataset) Jan 17, 2020

maxjcohen closed this as completed Jan 20, 2020

maxjcohen mentioned this issue Apr 3, 2020: dataset #3 (Closed)

maxjcohen mentioned this issue May 2, 2020: Where to download the dataset? #4 (Closed)

maxjcohen mentioned this issue Jun 11, 2020: transformer (forward pass) #10 (Closed)

maxjcohen mentioned this issue Jul 3, 2020: Link to datasets? #14 (Closed)

maxjcohen mentioned this issue Jul 13, 2020: About params settings when using dataset_CAPT_v7.npz #16 (Closed)

maxjcohen pinned this issue Jul 17, 2020

hongjianyuan mentioned this issue Jul 21, 2020: runtimeerror #17 (Closed)

Cyh294 mentioned this issue Oct 29, 2020: RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask' #26 (Closed)

maxjcohen mentioned this issue Apr 25, 2022: Hello, thanks for your great works, I'm confused with the dataset. #54 (Open)
