Fix Example Dataset Generation #33

Open: wants to merge 4 commits into base: master
31 changes: 16 additions & 15 deletions README.md
@@ -1,9 +1,9 @@
# Origin

The original RNNLIB is hosted at http://sourceforge.net/projects/rnnl
while this "fork" was created to reproduce the results for
online handwriting prediction and synthesis reported in
http://arxiv.org/abs/1308.0850. The latter is by now Alex Graves's classic
paper on LSTM networks, showing what an RNN can learn about the
structure present in sequential input.

@@ -26,49 +26,50 @@ In addition, the following python packages are needed for the auxiliary scripts
* SciPy
* PyLab
* PIL
* netCDF4

And this package is needed to create and manipulate netcdf data files with python, and to run the experiments in the 'examples' directory:

* ScientificPython (NOT Scipy)
The packages required to generate the example datasets can be installed via the
requirements file in the `examples/online_prediction` directory. Just execute
`pip install -r requirements.txt`
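
After installing, it can help to confirm that the packages are actually importable. The following is only a minimal sketch: the import names in `REQUIRED_MODULES` are assumed from the package list above and are not taken from the requirements file itself, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages listed above (not read from requirements.txt).
REQUIRED_MODULES = ["scipy", "pylab", "PIL", "netCDF4"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing python packages:", ", ".join(missing))
else:
    print("All listed python packages are importable.")
```

An empty result only means the modules can be found; it does not check their versions.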

To build RNNLIB do

$ cmake -DCMAKE_BUILD_TYPE=Release .
$ cmake --build .

Cmake run creates the binary files 'rnnlib', 'rnnsynth' and 'gradient_check' in the current directory.

It is recommended that you add the directory containing the 'rnnlib' binary to your path,
as otherwise the tools in the 'utilities' directory will not work.
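
Whether the binary is visible on the path can be checked from python before running the utilities. This is a minimal sketch using the standard library; `find_on_path` is a hypothetical helper name and is not part of RNNLIB.

```python
import shutil


def find_on_path(binary_name="rnnlib"):
    """Return the full path of `binary_name` if it is found on PATH, else None."""
    return shutil.which(binary_name)


location = find_on_path()
if location is None:
    print("'rnnlib' is not on PATH; add the build directory to PATH first.")
else:
    print("Found 'rnnlib' at", location)
```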

Project files for integrated development environments can be generated by cmake. Run cmake --help
to get the list of supported IDEs.


# Handwriting synthesis

Step into examples/online_prediction and go through the few steps below to prepare the
training data, train the model and eventually plot the results of the synthesis.

## Downloading online handwriting dataset

Start by registering and downloading the pen stroke data from
http://www.iam.unibe.ch/~fkiwww/iamondb/data/lineStrokes-all.tar.gz
Text labels for the strokes can be found here:
http://www.iam.unibe.ch/~fkiwww/iamondb/data/ascii-all.tar.gz
Then unzip ./lineStrokes and ./ascii under examples/online_prediction.
The data format in the downloaded files cannot be used as is:
it requires further preprocessing to convert pen coordinates to offsets from
the previous point and merge them into a single file in netcdf format.
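
The offset conversion mentioned above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code used by the preprocessing script: the input layout (a list of strokes, each a list of `(x, y)` points), the zero offset for the very first point, and the name `strokes_to_offsets` are all hypothetical.

```python
def strokes_to_offsets(strokes):
    """Convert absolute pen coordinates into (dx, dy, end_of_stroke) triples.

    `strokes` is a list of strokes; each stroke is a list of (x, y) points.
    The first point of the whole sequence gets a zero offset (an assumption).
    """
    points = []
    previous = None
    for stroke in strokes:
        for index, (x, y) in enumerate(stroke):
            if previous is None:
                dx, dy = 0.0, 0.0
            else:
                dx, dy = x - previous[0], y - previous[1]
            # Flag the last point of every stroke (pen lifted afterwards).
            end_of_stroke = 1 if index == len(stroke) - 1 else 0
            points.append((dx, dy, end_of_stroke))
            previous = (x, y)
    return points
```

Offsets carry across stroke boundaries here, so the first point of a new stroke is measured from the last point of the previous one.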

## Preparing the training data

Run ./build_netcdf.sh to split the dataset into training and validation sets.
The same script does all necessary preprocessing, including normalisation
of the input, and creates the corresponding online.nc and online_validation.nc
files for use with rnnlib.
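
How the script normalises the input is not shown here. As an illustration only, one common choice is to scale the x and y offsets to zero mean and unit standard deviation while leaving the end-of-stroke flag untouched. The sketch below assumes that choice; `normalise_offsets` is a hypothetical name.

```python
import statistics


def normalise_offsets(points):
    """Scale dx and dy to zero mean and unit standard deviation.

    `points` is a non-empty list of (dx, dy, end_of_stroke) triples.
    The end-of-stroke flag is passed through unchanged.
    """
    dxs = [p[0] for p in points]
    dys = [p[1] for p in points]
    mean_dx, mean_dy = statistics.fmean(dxs), statistics.fmean(dys)
    # Population standard deviation; fall back to 1.0 to avoid dividing by zero.
    std_dx = statistics.pstdev(dxs) or 1.0
    std_dy = statistics.pstdev(dys) or 1.0
    return [
        ((dx - mean_dx) / std_dx, (dy - mean_dy) / std_dy, end)
        for dx, dy, end in points
    ]
```

In practice the mean and deviation would be computed on the training set and reused for the validation set, so that both files share the same scale.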

Each point in the input sequences from online.nc consists of three numbers:
the x and y offset from the previous point, and the binary end-of-stroke feature.

## Gradient check
@@ -101,7 +102,7 @@ The best solution found is stored in synth1d@<time step>.best_loss.save file

Best loss error from step 1 is expected to be around -1080 nats and it can be further
improved (ca. 10%) by using weight regularisation. The loss error goes up and down during
training, unlike in Step 1. Therefore one must be more patient before declaring early stopping and
wait for 20 epochs with loss worse than the best result so far. Rnnlib has an implementation
of the MDL regulariser, which is used in this step. The command line is as follows:

2 changes: 1 addition & 1 deletion examples/arabic_offline_handwriting/arabic_offline.py
@@ -167,7 +167,7 @@ def convertToPrimaries (labelString):
offset += 1

#create a new .nc file
-file = netcdf_helpers.NetCDFFile(outputFilename, 'w')
+file = netcdf_helpers.netcdf_file(outputFilename, 'w')

#create the dimensions
netcdf_helpers.createNcDim(file,'numSeqs',len(seqLengths))
2 changes: 1 addition & 1 deletion examples/arabic_online_handwriting/arabic_online.py
@@ -69,7 +69,7 @@
print labels

#create a new .nc file
-file = netcdf_helpers.NetCDFFile(ncFilename, 'w')
+file = netcdf_helpers.netcdf_file(ncFilename, 'w')

#create the dimensions
netcdf_helpers.createNcDim(file,'numSeqs',len(seqLengths))
2 changes: 1 addition & 1 deletion examples/farsi_offline_handwriting/farsi_chars.py
@@ -89,7 +89,7 @@
print labels

#create a new .nc file
-file = netcdf_helpers.NetCDFFile(ncFilename, 'w')
+file = netcdf_helpers.netcdf_file(ncFilename, 'w')

#create the dimensions
netcdf_helpers.createNcDim(file,'numSeqs',len(seqLengths))
2 changes: 1 addition & 1 deletion examples/online_prediction/check.config
@@ -1,6 +1,6 @@
task prediction
hiddenType lstm
-trainFile one_line.nc
+trainFile online.nc
dataFraction 1
maxTestsNoBest 20
hiddenSize 1
2 changes: 1 addition & 1 deletion examples/online_prediction/check3x1.config
@@ -1,6 +1,6 @@
task prediction
hiddenType lstm
-trainFile one_line.nc
+trainFile online.nc
dataFraction 1
maxTestsNoBest 20
hiddenSize 1,1,1
Empty file modified examples/online_prediction/env.sh
100644 → 100755
Empty file.