Conv1d and batchnorm1d layers #209
Open: AnirudhBHarish wants to merge 4 commits into microsoft:master from AnirudhBHarish:master
+2,270 −3
@@ -0,0 +1,104 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#ifndef __DSCNN_H__
#define __DSCNN_H__

// Function pointer for the Conv layer to be passed as a parameter. (conv1d or conv1d_lr only).
typedef int (*conv_layer)(float*, unsigned, unsigned, const float*,
                          unsigned, unsigned, unsigned, unsigned,
                          const void*, unsigned, unsigned);

/**
 * @brief Model definition for the 1D Convolution block applied before the RNN.
 * @brief sub-layers : batchnorm1d -> conv1d_lr.
 * @param[out] output_signal pointer to the final output signal, minimum size = out_time * in_channels. out_time has to be calculated based on the reduction from all the conv and pool layers.
 * @param[in] input_signal pointer to the input signal. size = in_time * in_channels.
 * @param[in] cnn function pointer for the CNN layer. (any of the conv layers can be passed with appropriate params).
 * @param[in] in_time number of time steps in the input_signal.
 * @param[in] in_channels number of input channels.
 * @param[in] mean pointer to the mean for the batch normalization, size = in_channels. Pass NULL/0 for affine_config = 2.
 * @param[in] var pointer to the variance for the batch normalization, size = in_channels. Pass NULL/0 for affine_config = 2.
 * @param[in] affine_config whether the affine operations are applied.
 *            if affine_config = 0, then only mean and var are used.
 *            if affine_config = 1, then mean, var, gamma and beta are used for the final computation.
 *            if affine_config = 2, then only gamma and beta are used. gamma = original_gamma/sqrt(var), beta = original_beta - original_gamma * mean/sqrt(var).
 *            Note: Use affine_config = 2 for faster calculations. The new gamma and beta need to be pre-computed, stored and passed.
 * @param[in] gamma pointer to the scaling factors for the post-norm affine operation, size = in_channels. Pass NULL/0 for affine_config = 0.
 * @param[in] beta pointer to the offsets for the post-norm affine operation, size = in_channels. Pass NULL/0 for affine_config = 0.
 * @param[in] in_place in-place computation flag for the batchnorm (storage efficient).
 * @param[in] cnn_hidden hidden state/out_channels dimension for the low-rank CNN. The final channel size of this block.
 * @param[in] cnn_padding padding for the low-rank CNN layer. Note: applied to both sides of the input.
 * @param[in] cnn_kernel_size kernel size of the low-rank CNN.
 * @param[in] cnn_params weights, bias and other essential parameters for the low-rank CNN.
 * @param[in] cnn_stride stride factor for the low-rank CNN.
 * @param[in] cnn_activation an integer to choose the type of activation function.
 *            0: none.
 *            1: sigmoid.
 *            2: tanh.
 *            3: relu.
 */
int phon_pred_lr_cnn(float* output_signal, float* input_signal,
  conv_layer cnn, unsigned in_time, unsigned in_channels,
  const float* const mean, const float* const var,
  unsigned affine_config, const float* const gamma, const float* const beta, unsigned in_place,
  unsigned cnn_hidden, unsigned cnn_padding, unsigned cnn_kernel_size,
  const void* cnn_params, unsigned cnn_stride, unsigned cnn_activation);

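/* Illustration (not part of the original header): how the pre-computed gamma and beta
 * for affine_config = 2 could be derived from the trained batchnorm parameters, using the
 * folding formula quoted in the comment above. A minimal sketch; the helper name
 * fold_batchnorm_params and the eps argument are hypothetical, and <math.h> is needed for sqrtf. */
#include <math.h>  // for sqrtf; illustration only

static void fold_batchnorm_params(float* gamma, float* beta,
  const float* mean, const float* var,
  unsigned in_channels, float eps) {
  for (unsigned c = 0; c < in_channels; c++) {
    // new_gamma = gamma / sqrt(var); new_beta = beta - gamma * mean / sqrt(var)
    float inv_std = 1.0f / sqrtf(var[c] + eps);
    beta[c] -= gamma[c] * mean[c] * inv_std;  // uses the original gamma before it is overwritten
    gamma[c] *= inv_std;
  }
}
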
/**
 * @brief Model definition for the 1D Convolution block applied after the RNN.
 * @brief sub-layers : custom nonlinearity (semi_sigmoid_tanh) -> batchnorm1d -> conv1d_depth -> conv1d_lr -> avgpool1d.
 * @param[out] output_signal pointer to the final output signal, minimum size = out_time * in_channels. out_time has to be calculated based on the reduction from all the conv and pool layers.
 * @param[in] input_signal pointer to the input signal. size = in_time * in_channels.
 * @param[in] point_cnn function pointer for the point-wise CNN. (any of the conv layers can be passed with appropriate params).
 * @param[in] in_time number of time steps in the input.
 * @param[in] in_channels number of input channels.
 * @param[in] mean pointer to the mean for the batch normalization, size = in_channels. Pass NULL/0 for affine_config = 2.
 * @param[in] var pointer to the variance for the batch normalization, size = in_channels. Pass NULL/0 for affine_config = 2.
 * @param[in] affine_config whether the affine operations are applied.
 *            if affine_config = 0, then only mean and var are used.
 *            if affine_config = 1, then mean, var, gamma and beta are used for the final computation.
 *            if affine_config = 2, then only gamma and beta are used. gamma = original_gamma/sqrt(var), beta = original_beta - original_gamma * mean/sqrt(var).
 *            Note: Use affine_config = 2 for faster calculations. The new gamma and beta need to be pre-computed, stored and passed.
 * @param[in] gamma pointer to the scaling factors for the post-norm affine operation, size = in_channels. Pass NULL/0 for affine_config = 0.
 * @param[in] beta pointer to the offsets for the post-norm affine operation, size = in_channels. Pass NULL/0 for affine_config = 0.
 * @param[in] in_place in-place computation flag for the batchnorm (storage efficient).
 * @param[in] depth_cnn_padding padding for the depth CNN layer. Note: applied to both sides of the input to the depth CNN.
 * @param[in] depth_cnn_kernel_size kernel size of the depth CNN.
 * @param[in] depth_cnn_params weights, bias and other essential parameters used to describe the depth CNN.
 * @param[in] depth_cnn_stride stride factor for the depth CNN.
 * @param[in] depth_cnn_activation an integer to choose the type of activation function.
 *            0: none.
 *            1: sigmoid.
 *            2: tanh.
 *            3: relu.
 * @param[in] point_cnn_hidden hidden state/out_channels dimension for the point CNN. The final channel size of this block.
 * @param[in] point_cnn_padding padding for the point CNN layer. Note: applied to both sides of the input to the point CNN.
 * @param[in] point_cnn_kernel_size kernel size of the point CNN.
 * @param[in] point_cnn_params weights, bias and other essential parameters used to describe the point CNN.
 * @param[in] point_cnn_stride stride factor for the point CNN.
 * @param[in] point_cnn_activation an integer to choose the type of activation function.
 *            0: none.
 *            1: sigmoid.
 *            2: tanh.
 *            3: relu.
 * @param[in] pool_padding padding for the pool layer. Note: applied to both sides of the input to the pool.
 * @param[in] pool_kernel_size kernel size of the pool.
 * @param[in] pool_stride stride factor for the pool.
 * @param[in] pool_activation an integer to choose the type of activation function.
 *            0: none.
 *            1: sigmoid.
 *            2: tanh.
 *            3: relu.
 */
int phon_pred_depth_point_lr_cnn(float* output_signal, float* input_signal,
  conv_layer point_cnn, unsigned in_time, unsigned in_channels,
  const float* const mean, const float* const var,
  unsigned affine_config, const float* const gamma, const float* const beta, unsigned in_place,
  unsigned depth_cnn_padding, unsigned depth_cnn_kernel_size,
  const void* depth_cnn_params, unsigned depth_cnn_stride, unsigned depth_cnn_activation,
  unsigned point_cnn_hidden, unsigned point_cnn_padding, unsigned point_cnn_kernel_size,
  const void* point_cnn_params, unsigned point_cnn_stride, unsigned point_cnn_activation,
  unsigned pool_padding, unsigned pool_kernel_size, unsigned pool_stride, unsigned pool_activation);

#endif
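Both block functions leave out_time to the caller: the doc comments only say it must account for the reduction from every conv and pool layer. Below is a minimal sketch of that bookkeeping, assuming each stage follows the usual 1D output-length rule, out = (in + 2*padding - kernel_size)/stride + 1; that rule and the helper names conv_out_time/dscnn_out_time are assumptions and should be checked against the library's conv and pool implementations.

// Hypothetical helper: output length of one conv/pool stage under the standard 1D rule.
static unsigned conv_out_time(unsigned in_time, unsigned padding,
                              unsigned kernel_size, unsigned stride) {
  return (in_time + 2 * padding - kernel_size) / stride + 1;
}

// Hypothetical helper: chain the depth -> point -> avgpool stages of the post-RNN block.
static unsigned dscnn_out_time(unsigned in_time,
  unsigned depth_padding, unsigned depth_kernel, unsigned depth_stride,
  unsigned point_padding, unsigned point_kernel, unsigned point_stride,
  unsigned pool_padding, unsigned pool_kernel, unsigned pool_stride) {
  unsigned t = in_time;
  t = conv_out_time(t, depth_padding, depth_kernel, depth_stride);  // conv1d_depth
  t = conv_out_time(t, point_padding, point_kernel, point_stride);  // conv1d_lr (point-wise)
  t = conv_out_time(t, pool_padding, pool_kernel, pool_stride);     // avgpool1d
  return t;  // size output_signal for t time steps times the block's output channel count
}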
@@ -0,0 +1,105 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#ifndef __RNN_BRICKED_H__
#define __RNN_BRICKED_H__

/* All the matrices are stored in the row-major format.

   NOTES for using the layers.

   -> Single-directional computation.
      While using the bricked fastgrnn layers, the user needs to adhere to the following two constraints:
      1) in_time % hop = 0.
      2) fwd_window % hop = 0 and bwd_window % hop = 0.
      Violating either constraint (1 or 2) will cause segmentation faults.
      The layers first compute all the Wx steps and then compute Uh for all the windows in parallel;
      hence constraints 1 & 2 must hold.

   -> Bi-directional computation.
      For bi-directional cases, two additional constraints need to be followed:
      A) sample_first_brick and sample_last_brick = 1.
      B) An offset of rnn_hidden needs to be added to the output_signal pointer during the backward function call.
      Each function only processes its own context (forward/backward); the other context needs to be called separately.
      E.g.: 1st step -> forward(output, ..., input, ..., bi-direction=1, ...).
            2nd step -> backward(output + rnn_hidden, ..., input, ..., bi-direction=1, ...).
      (A usage sketch of this call pattern follows the function declarations below.)
      The two extra constraints (A & B) apply only to bi-directional cases and can be ignored if only forward (or only backward) is used.
      Violating them causes index mismatches or data corruption:
      if the first (last) brick is not sampled, the first few (last few) time steps will be missing in the forward (backward) result;
      if the offset is not passed during the backward function call, the backward pass will overwrite the forward result (bi-directional case only).
*/

/**
 * @brief Model parameters for the low-rank bricked FastGRNN layer.
 * @var W1 pointer to the first low-rank component of W. shape = [rank * in_dims].
 * @var W2 pointer to the second low-rank component of W. shape = [rnn_hidden * rank].
 * @var wRank rank of the W matrix.
 * @var U1 pointer to the first low-rank component of U. shape = [rank * rnn_hidden].
 * @var U2 pointer to the second low-rank component of U. shape = [rnn_hidden * rank].
 * @var uRank rank of the U matrix.
 * @var Bg pointer to the bias for the sigmoid.
 * @var Bh pointer to the bias for the tanh.
 * @var sigmoid_zeta first weight parameter for the update from the input at the next step.
 * @var sigmoid_nu second weight parameter for the update from the input at the next step.
 * @var block_size_w_to_lr block/tile size for the cache. Used for the tiled MatMul W1 * x.
 * @var block_size_w_from_lr block/tile size for the cache. Used for the tiled MatMul W2 * (W1 * x).
 * @var block_size_u_to_lr block/tile size for the cache. Used for the tiled MatMul U1 * h.
 * @var block_size_u_from_lr block/tile size for the cache. Used for the tiled MatMul U2 * (U1 * h).
 */
typedef struct BrickedFastGRNN_LR_Params {
  float* W1;
  float* W2;
  unsigned wRank;
  float* U1;
  float* U2;
  unsigned uRank;
  float* Bg;
  float* Bh;
  float sigmoid_zeta;
  float sigmoid_nu;
  unsigned block_size_w_to_lr;
  unsigned block_size_w_from_lr;
  unsigned block_size_u_to_lr;
  unsigned block_size_u_from_lr;
} BrickedFastGRNN_LR_Params;

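/* Illustration (not part of the original header): the low-rank factorization implied by the
 * shapes above. W (rnn_hidden x in_dims) is stored as W2 (rnn_hidden x wRank) times
 * W1 (wRank x in_dims), so Wx is evaluated as W2 * (W1 * x); the same pattern applies to U.
 * The sketch below is a plain, untiled version of that two-step product, whereas the library's
 * kernels presumably use the block_size_* fields for cache-tiled MatMuls. The helper name
 * lowrank_matvec and the caller-provided scratch buffer are hypothetical. */
static void lowrank_matvec(float* out, const float* W1, const float* W2,
  const float* x, float* scratch,  // scratch must hold `rank` floats
  unsigned out_dims, unsigned rank, unsigned in_dims) {
  // scratch = W1 * x, with W1 stored row-major as [rank x in_dims]
  for (unsigned r = 0; r < rank; r++) {
    float acc = 0.0f;
    for (unsigned c = 0; c < in_dims; c++)
      acc += W1[r * in_dims + c] * x[c];
    scratch[r] = acc;
  }
  // out = W2 * scratch, with W2 stored row-major as [out_dims x rank]
  for (unsigned o = 0; o < out_dims; o++) {
    float acc = 0.0f;
    for (unsigned r = 0; r < rank; r++)
      acc += W2[o * rank + r] * scratch[r];
    out[o] = acc;
  }
}
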
/** Forward bricking and application of the forward RNN for an input signal.
 * @param[out] output_signal pointer to the output signal. size = out_time * rnn_hidden.
 * @param[in] rnn_hidden output dimension for the current cell.
 * @param[in] input_signal pointer to the input signal. size = in_time * in_dims.
 * @param[in] in_time number of input time steps.
 * @param[in] in_dims input dimensions.
 * @param[in] window window length for each brick. For the final brick, the leftover time steps are used (the last brick need not be window in length).
 * @param[in] hop hop distance between bricks.
 * @param[in] params pointer to the parameters for the RNN.
 * @param[in] bi_direction determines if the output is for a bi-directional RNN.
 * @param[in] sample_first_brick determines if the 1st brick should also be sampled.
 *            -> if = 0, only the last hidden state of each brick is sampled. out_time = (in_time-window)/hop + 1.
 *            -> if = 1, for the 1st brick, we sample at every hop index (similar to ::hop). For all the bricks (including the 1st) we sample the final hidden state. out_time = in_time/hop + 1.
 */
int forward_bricked_fastgrnn_lr(float* output_signal, unsigned rnn_hidden,
  float* input_signal, unsigned in_time, unsigned in_dims,
  unsigned window, unsigned hop, const void* params,
  unsigned bi_direction, unsigned sample_first_brick);

/** Backward bricking and application of the backward RNN for an input signal.
 * @param[out] output_signal pointer to the output signal. size = out_time * rnn_hidden.
 * @param[in] rnn_hidden output dimension for the current cell.
 * @param[in] input_signal pointer to the input signal. size = in_time * in_dims.
 * @param[in] in_time number of input time steps.
 * @param[in] in_dims input dimensions.
 * @param[in] window window length for each brick. For the final brick, the leftover time steps are used (the last brick need not be window in length).
 * @param[in] hop hop distance between bricks.
 * @param[in] params pointer to the parameters for the RNN.
 * @param[in] bi_direction determines if the output is for a bi-directional RNN.
 * @param[in] sample_last_brick determines if the last brick should also be sampled.
 *            -> if = 0, only the first (last in reverse) hidden state of each brick is sampled. out_time = (in_time-window)/hop + 1.
 *            -> if = 1, for the last brick, we sample at every hop index in reverse (similar to ::hop in reverse). For all the bricks (including the last) we sample the first hidden state (last in reverse). out_time = in_time/hop + 1.
 */
int backward_bricked_fastgrnn_lr(float* output_signal, unsigned rnn_hidden,
  float* input_signal, unsigned in_time, unsigned in_dims,
  unsigned window, unsigned hop, const void* params,
  unsigned bi_direction, unsigned sample_last_brick);

#endif
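A minimal usage sketch of the bi-directional call pattern described in the notes above, assuming this header is included: the forward and backward passes are invoked separately and the backward output is written at an offset of rnn_hidden. The driver name run_bidirectional, the calloc-based buffer, and the assumption that the bi-directional output interleaves forward and backward states per time step (hence the 2 * rnn_hidden allocation) are all illustrative, not taken from the PR; out_time follows the sample_first_brick/sample_last_brick = 1 formula quoted in the comments.

#include <stdlib.h>

// Hypothetical driver: run a bi-directional bricked FastGRNN over one input buffer.
// Constraints from the notes: in_time % hop == 0, window % hop == 0,
// sample_first_brick = sample_last_brick = 1, backward output offset by rnn_hidden.
float* run_bidirectional(float* input_signal, unsigned in_time, unsigned in_dims,
  unsigned window, unsigned hop, unsigned rnn_hidden,
  const BrickedFastGRNN_LR_Params* fwd_params,
  const BrickedFastGRNN_LR_Params* bwd_params) {
  unsigned out_time = in_time / hop + 1;  // sample_first/last_brick = 1 case
  float* output = (float*) calloc((size_t) out_time * 2 * rnn_hidden, sizeof(float));
  if (output == NULL) return NULL;
  // 1st step: forward context fills the first rnn_hidden entries of each time step.
  forward_bricked_fastgrnn_lr(output, rnn_hidden,
    input_signal, in_time, in_dims,
    window, hop, fwd_params, 1, 1);
  // 2nd step: backward context writes at an offset of rnn_hidden (constraint B).
  backward_bricked_fastgrnn_lr(output + rnn_hidden, rnn_hidden,
    input_signal, in_time, in_dims,
    window, hop, bwd_params, 1, 1);
  return output;  // caller frees
}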