
RNN Shakespeare example giving error #44

Open
PoLabs opened this issue Nov 13, 2017 · 7 comments

@PoLabs

PoLabs commented Nov 13, 2017

This code in the tutorial leads to the error:

```r
model <- mx.lstm(X.train, X.val, 
                 ctx=mx.gpu(),
                 num.round=num.round, 
                 update.period=update.period,
                 num.lstm.layer=num.lstm.layer, 
                 seq.len=seq.len,
                 num.hidden=num.hidden, 
                 num.embed=num.embed, 
                 num.label=vocab,
                 batch.size=batch.size, 
                 input.size=vocab,
                 initializer=mx.init.uniform(0.1), 
                 learning.rate=learning.rate,
                 wd=wd,
                 clip_gradient=clip_gradient)
```

Error:
```
Error in check.data(train.data, batch.size, TRUE) : could not find function "check.data"
```
Is this function from an unspecified package?

Additionally, several files referenced in the tutorial are missing: rnn_model.R, rnn.R, lstm.R, etc.

best,

PoLabs changed the title from "RNN Shakespeare example giving eror" to "RNN Shakespeare example giving error" on Nov 13, 2017.
@jeremiedb
Contributor

The RNN API has recently been reworked to make it easier to handle diverse use cases, such as time series or seq-to-one models, and to make the symbolic graph compatible with the general feedforward model training function. I've made a few examples that aim to cover broader use cases. Don't hesitate to open a request for changes or for additional features that are still missing.

@PoLabs
Author

PoLabs commented Apr 15, 2018

Did this change again? I'm getting an error that the function rnn.graph cannot be found.

I really appreciate you posting tutorials, but I always find the sine/cosine wave example absolutely batty; it's just difficult to understand. Something simple, like a time series of medical or financial data, would be easier to grasp. I'm having a hard time following your iterator setup.

@jeremiedb
Contributor

It should still be there (https://github.com/apache/incubator-mxnet/blob/master/R-package/R/rnn.graph.R#L14). What version of the package are you using?

The rnn.graph function serves as a graph-builder helper. I'm not aware of a one-size-fits-all solution, but I'd be glad to improve the tutorials based on suggestions.
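Both the version question and the earlier "could not find function" errors come down to what the installed package actually provides. One quick way to check is to query the package namespace directly. The base-R sketch below does that. `check_fn` is a hypothetical helper name, and it makes no assumption about whether `rnn.graph` or `check.data` is exported in any particular mxnet release.

```r
# Hypothetical helper: report a package's installed version, whether a
# function exists in its namespace, and whether that function is exported.
check_fn <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(list(installed = FALSE, version = NA_character_,
                exists = FALSE, exported = FALSE))
  }
  ns <- asNamespace(pkg)
  list(
    installed = TRUE,
    version   = as.character(utils::packageVersion(pkg)),
    exists    = exists(fn, envir = ns, inherits = FALSE),  # internal or exported
    exported  = fn %in% getNamespaceExports(pkg)           # callable without :::
  )
}

# In this thread's context (only meaningful if mxnet is installed):
# check_fn("mxnet", "rnn.graph")
```

A function can exist in a namespace without being exported. In that case, calling it by its bare name gives "could not find function" even though the package is loaded.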

@PoLabs
Author

PoLabs commented Apr 16, 2018

Right on, your tutorial ran great on my updated instance. From trying to adapt it to my project, I think some extra comments could help a lot. For instance:

What is the 'samples' variable? Is it single points from 192 waves at a given time-step? This might be analogous to patient laboratory values or financial indicators at a given time-step. 'seq_len' seems to be the number of observations/time-steps.

I'm getting errors creating mx.io.arrayiter, so it's possible my data structure is wrong. Currently it is a data frame with 68 variables (x) and 20k observations (y). The end goal is to predict the yth observation for any given x variable.

@jeremiedb
Copy link
Contributor

Hi, the documentation for these tutorials is admittedly very scarce. I've added some comments and tried to remove some ambiguities in the CPU tutorial: file:///C:/Data/GitHub/mxnet_R_bucketing/docs/TimeSeries_CPU.html

"Samples" refers to the number of independent time series; it has been renamed to "n" in the update. As for data dimensions, there is an important difference. In a normal regression problem we would feed the network an array of size [num_features X batch_size] with a target of size [batch_size]. In a time-series model, the features are [num_features X seq_length X batch_size] and the target is [seq_length X batch_size], since each time series has seq_length observations.
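A single long multivariate series, like the data frame described earlier (rows are time steps, columns are variables), can be put into this layout by cutting it into windows. The base-R sketch below does this. The helper name `make_windows`, the choice of non-overlapping windows, and the next-step label are assumptions for illustration, not the tutorial's preprocessing. The resulting `x` and `y` have the shapes that the thread later passes to `mx.io.arrayiter`.

```r
# Hypothetical helper (not part of mxnet): cut a (time x feature) matrix into
# n non-overlapping windows of length seq_length, using the layout above:
#   x: [num_features, seq_length, n]    y: [seq_length, n]
# ASSUMPTION: the label at each step is column `target` one step ahead.
# ASSUMPTION: all columns are numeric (as.matrix on mixed types gives character).
make_windows <- function(m, seq_length, target = 1) {
  m <- as.matrix(m)
  num_features <- ncol(m)
  # keep one spare row so the last step of the last window still has a label
  n <- (nrow(m) - 1) %/% seq_length
  x <- array(NA_real_, dim = c(num_features, seq_length, n))
  y <- array(NA_real_, dim = c(seq_length, n))
  for (i in seq_len(n)) {
    rows <- ((i - 1) * seq_length + 1):(i * seq_length)
    x[, , i] <- t(m[rows, , drop = FALSE])  # features x time steps
    y[, i] <- m[rows + 1, target]           # next-step target values
  }
  list(x = x, y = y)
}

# With a numeric data frame df of 68 columns (df is hypothetical here):
# w <- make_windows(df, seq_length = 20, target = 1)
# dim(w$x)  # 68 x 20 x n
# dim(w$y)  # 20 x n
```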

@PoLabs
Author

PoLabs commented May 10, 2018

These clarifications helped a ton! I've made it to the training step with my silly minute-to-minute cryptocurrency data, but I think I'm having issues with '@param seq_len int, number of time steps to unroll.' My thought is that it would be the same as the length of the sequences it's being fed (100 for your example, 20 for mine).

Working with 1,000 sequences of 20 minute-level observations each:

```r
batch.size = 40
train.data <- mx.io.arrayiter(data = x[,,1:800, drop = F], label = y[, 1:800], 
                              batch.size = batch.size, shuffle = TRUE)
eval.data <- mx.io.arrayiter(data = x[,,800:1000, drop = F], label = y[, 800:1000], 
                             batch.size = batch.size, shuffle = FALSE)
```
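As posted, `1:800` and `800:1000` share index 800, so one sequence ends up in both the training and the evaluation set. Also, `800:1000` contains 201 sequences, which is not a multiple of `batch.size = 40`. This is likely separate from the shape errors reported below, but it is easy to avoid. The base-R sketch below does so. `split_indices` is a hypothetical helper, and it assumes nothing about how `mx.io.arrayiter` treats a final partial batch.

```r
# Sketch: disjoint train/eval index sets, plus a check of whether each set
# splits into whole batches.
split_indices <- function(n, n_train, batch_size) {
  stopifnot(n_train >= 1, n_train < n)
  train_idx <- seq_len(n_train)
  eval_idx <- seq.int(n_train + 1, n)  # starts after n_train: no shared index
  list(train = train_idx,
       eval = eval_idx,
       train_whole_batches = length(train_idx) %% batch_size == 0,
       eval_whole_batches = length(eval_idx) %% batch_size == 0)
}

s <- split_indices(n = 1000, n_train = 800, batch_size = 40)
length(s$train)  # 800 sequences, i.e. 20 batches of 40
length(s$eval)   # 200 sequences, i.e. 5 batches of 40
# The indices would then replace 1:800 and 800:1000 in the iterator calls,
# e.g. data = x[, , s$eval, drop = F], label = y[, s$eval].
```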

Going straight from your tutorial:

```r
symbol <- rnn.graph.unroll(seq_len = 2, 
                           num_rnn_layer =  1, 
                           num_hidden = 50,
                           input_size = NULL,
                           num_embed = NULL, 
                           num_decode = 1,
                           masking = F, 
                           loss_output = "linear",
                           dropout = 0.2, 
                           ignore_label = -1,
                           cell_type = "lstm",
                           output_last_state = F,
                           config = "one-to-one")
system.time(model <- mx.model.buckets(symbol = symbol,
                                      train.data = train.data, 
                                      eval.data = eval.data, 
                                      num.round = 250, ctx = ctx, verbose = TRUE,
                                      metric = mx.metric.mse.seq, 
                                      initializer = initializer, optimizer = optimizer, 
                                      batch.end.callback = NULL, 
                                      epoch.end.callback = epoch.end.callback))
```
```
Error in sym_ini$infer.shape(input.shape) : 
  Error in operator split13: [20:28:42] c:\jenkins\workspace\mxnet\mxnet\src\operator\./slice_channel-inl.h:216: Check failed: ishape[real_axis] == static_cast<size_t>(param_.num_outputs) (20 vs. 2) If squeeze axis is True, the size of the sliced axis must be the same as num_outputs. Input shape=[20,40,1], axis=0, num_outputs=2.
```

It looks like the issue is with the 'seq_len = 2', trying 'seq_len = 1' produced a similar error, but seq_len=20 gave:

```
Error in sym_ini$infer.shape(input.shape) : 
  Error in operator loss: Shape inconsistent, Provided=[760], inferred shape=[800,1]
```

(Running on the prebuilt mxnet_1.2.0 CPU package for Windows, although the documentation says it requires CUDA? Thanks again!)

@jeremiedb
Contributor

Correct, the seq_len parameter should be set to the sequence length, so 20 in your case. In the tutorial it was shown as 2 to make it possible to visualize the resulting graph, but the model is actually run with seq_len = 100. I'll make the doc more explicit.
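The first error message in the thread reports that the split operator received an axis of length 20 but was asked for `num_outputs = 2` pieces, matching the data's 20 time steps against `seq_len = 2`. The base-R sketch below only mimics that constraint to make it concrete. `unroll_steps` is a hypothetical name, and this is not how mxnet implements unrolling.

```r
# Base-R analogue only (mxnet's split operator is not called here): slice the
# time axis of a [num_features, time_steps, batch] array into one
# [num_features, batch] matrix per step, refusing when the axis length does
# not equal the requested number of pieces, as in the error message.
unroll_steps <- function(x, seq_len_param) {
  steps <- dim(x)[2]
  if (steps != seq_len_param) {
    stop(sprintf("time axis has %d steps but seq_len = %d",
                 steps, seq_len_param))
  }
  lapply(seq_len(steps),
         function(step) array(x[, step, ], dim = dim(x)[c(1, 3)]))
}

x <- array(rnorm(1 * 20 * 40), dim = c(1, 20, 40))  # 1 feature, 20 steps, batch 40
length(unroll_steps(x, 20))  # 20 pieces, one per time step
# unroll_steps(x, 2)         # stops: time axis has 20 steps but seq_len = 2
```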

Still, it seems like there's a remaining shape issue. I would look at the actual dimensions of the data and labels fed to the iterator, and at the iterator's output as well, to confirm the shapes fed to the network. Generating the graph as in the tutorial (with a small seq_len, otherwise it will take forever to render) should also help identify where the glitch is.
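One way to follow this advice is to check the dimensions before any iterator is built. The base-R sketch below does that. `check_shapes` is a hypothetical helper, and its expected-label-size rule is an inference from the thread rather than documented mxnet behaviour. With `seq_len = 20` and `batch.size = 40`, 20 × 40 = 800 matches the inferred shape [800,1] in the second error. The provided 760 equals 19 × 40, which would be consistent with labels carrying 19 time steps instead of 20.

```r
# Hypothetical pre-flight check before building mx.io.arrayiter objects:
# data should be [num_features, seq_len, n] and labels [seq_len, n].
check_shapes <- function(x, y, seq_len_param, batch_size) {
  dx <- dim(x)
  dy <- dim(y)
  problems <- character(0)
  if (length(dx) != 3) problems <- c(problems, "x is not a 3-d array")
  if (length(dy) != 2) problems <- c(problems, "y is not a 2-d array")
  if (length(problems) == 0) {
    if (dx[2] != seq_len_param)
      problems <- c(problems, sprintf("x has %d time steps, seq_len is %d",
                                      dx[2], seq_len_param))
    if (dy[1] != seq_len_param)
      problems <- c(problems, sprintf("y has %d time steps, seq_len is %d",
                                      dy[1], seq_len_param))
    if (dx[3] != dy[2])
      problems <- c(problems, sprintf("x has %d sequences, y has %d",
                                      dx[3], dy[2]))
  }
  # label elements per batch under the seq-to-seq assumption above
  list(ok = length(problems) == 0,
       problems = problems,
       expected_label_size = seq_len_param * batch_size)
}

# check_shapes(x, y, seq_len_param = 20, batch_size = 40)
```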
