Commit

Merge pull request #17967 from SamuelMarks:keras.layers.preprocessing…
…-defaults-to

PiperOrigin-RevId: 527024756
tensorflower-gardener committed Apr 25, 2023
2 parents 08f9b1a + a1925ec commit cb1e1a0
Showing 9 changed files with 56 additions and 50 deletions.
3 changes: 2 additions & 1 deletion keras/layers/preprocessing/category_encoding.py
Original file line number Diff line number Diff line change
@@ -90,7 +90,7 @@ class CategoryEncoding(base_layer.Layer):
inputs to the layer must be integers in the range `0 <= value <
num_tokens`, or an error will be thrown.
output_mode: Specification for the output of the layer.
Defaults to `"multi_hot"`. Values can be `"one_hot"`, `"multi_hot"` or
Values can be `"one_hot"`, `"multi_hot"` or
`"count"`, configuring the layer as follows:
- `"one_hot"`: Encodes each individual element in the input into an
array of `num_tokens` size, containing a 1 at the element index. If
@@ -105,6 +105,7 @@ class CategoryEncoding(base_layer.Layer):
- `"count"`: Like `"multi_hot"`, but the int array contains a count of
the number of times the token at that index appeared in the sample.
For all output modes, currently only output up to rank 2 is supported.
Defaults to `"multi_hot"`.
sparse: Boolean. If true, returns a `SparseTensor` instead of a dense
`Tensor`. Defaults to `False`.
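The three `output_mode` values described in this hunk can be sketched in plain Python. This is an illustration of the documented behavior, not the Keras implementation: the function name `category_encode` is hypothetical, it handles a single sample given as a list of integers, and it ignores the `sparse` option.

```python
from collections import Counter


def category_encode(sample, num_tokens, output_mode="multi_hot"):
    """Encode one sample (a list of ints) following the documented modes.

    Hypothetical helper for illustration; the default mirrors the
    documented default of "multi_hot".
    """
    for value in sample:
        if not 0 <= value < num_tokens:
            raise ValueError(f"value {value} is outside [0, {num_tokens})")

    if output_mode == "one_hot":
        # One row per element, with a single 1 at the element's index.
        return [
            [1 if index == value else 0 for index in range(num_tokens)]
            for value in sample
        ]

    counts = Counter(sample)
    if output_mode == "count":
        # How many times each token index appears in the sample.
        return [counts.get(index, 0) for index in range(num_tokens)]
    if output_mode == "multi_hot":
        # 1 wherever the token index appears at least once.
        return [1 if counts.get(index, 0) else 0 for index in range(num_tokens)]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


print(category_encode([0, 1, 1, 3], num_tokens=4))
print(category_encode([0, 1, 1, 3], num_tokens=4, output_mode="count"))
```

Calling the helper without `output_mode` exercises the default that this commit moves to the end of the argument description.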
7 changes: 4 additions & 3 deletions keras/layers/preprocessing/discretization.py
@@ -164,8 +164,8 @@ class Discretization(base_preprocessing_layer.PreprocessingLayer):
0.01). Higher values of epsilon increase the quantile approximation, and
hence result in more unequal buckets, but could improve performance
and resource consumption.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, or
output_mode: Specification for the output of the layer. Values can be
`"int"`, `"one_hot"`, `"multi_hot"`, or
`"count"` configuring the layer as follows:
- `"int"`: Return the discretized bin indices directly.
- `"one_hot"`: Encodes each individual element in the input into an
@@ -180,9 +180,10 @@ class Discretization(base_preprocessing_layer.PreprocessingLayer):
will be `(..., num_tokens)`.
- `"count"`: As `"multi_hot"`, but the int array contains a count of
the number of times the bin index appeared in the sample.
Defaults to `"int"`.
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`,
and `"count"` output modes. If True, returns a `SparseTensor` instead of
a dense `Tensor`. Defaults to False.
a dense `Tensor`. Defaults to `False`.
Examples:
8 changes: 4 additions & 4 deletions keras/layers/preprocessing/hashed_crossing.py
@@ -51,15 +51,15 @@ class HashedCrossing(base_layer.Layer):
Args:
num_bins: Number of hash bins.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, or `"one_hot"` configuring the layer as
follows:
output_mode: Specification for the output of the layer. Values can be
`"int"`, or `"one_hot"` configuring the layer as follows:
- `"int"`: Return the integer bin indices directly.
- `"one_hot"`: Encodes each individual element in the input into an
array the same size as `num_bins`, containing a 1 at the input's bin
index.
Defaults to `"int"`.
sparse: Boolean. Only applicable to `"one_hot"` mode. If True, returns a
`SparseTensor` instead of a dense `Tensor`. Defaults to False.
`SparseTensor` instead of a dense `Tensor`. Defaults to `False`.
**kwargs: Keyword arguments to construct a layer.
Examples:
18 changes: 9 additions & 9 deletions keras/layers/preprocessing/hashing.py
@@ -109,17 +109,16 @@ class Hashing(base_layer.Layer):
bin, so the effective number of bins is `(num_bins - 1)` if `mask_value`
is set.
mask_value: A value that represents masked inputs, which are mapped to
index 0. Defaults to None, meaning no mask term will be added and the
hashing will start at index 0.
index 0. `None` means no mask term will be added and the
hashing will start at index 0. Defaults to `None`.
salt: A single unsigned integer or None.
If passed, the hash function used will be SipHash64, with these values
used as an additional input (known as a "salt" in cryptography).
These should be non-zero. Defaults to `None` (in that
case, the FarmHash64 hash function is used). It also supports
tuple/list of 2 unsigned integer numbers, see reference paper for
details.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, or
These should be non-zero. If `None`, uses the FarmHash64 hash function.
It also supports tuple/list of 2 unsigned integer numbers, see
reference paper for details. Defaults to `None`.
output_mode: Specification for the output of the layer. Values can be
`"int"`, `"one_hot"`, `"multi_hot"`, or
`"count"` configuring the layer as follows:
- `"int"`: Return the integer bin indices directly.
- `"one_hot"`: Encodes each individual element in the input into an
@@ -134,9 +133,10 @@ class Hashing(base_layer.Layer):
will be `(..., num_tokens)`.
- `"count"`: As `"multi_hot"`, but the int array contains a count of
the number of times the bin index appeared in the sample.
Defaults to `"int"`.
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`,
and `"count"` output modes. If True, returns a `SparseTensor` instead of
a dense `Tensor`. Defaults to False.
a dense `Tensor`. Defaults to `False`.
**kwargs: Keyword arguments to construct a layer.
Input shape:
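The `mask_value` behavior documented above (masked inputs map to index 0, leaving `num_bins - 1` effective bins) can be sketched as follows. This is only an illustration under stated assumptions: `hash_to_bin` is a hypothetical name, and SHA-256 from the standard library stands in for the FarmHash64 and SipHash64 functions named in the docstring, so the bin assignments will not match Keras.

```python
import hashlib


def hash_to_bin(value, num_bins, mask_value=None):
    """Map a value to a bin index, reserving bin 0 for `mask_value` if set.

    Hypothetical helper; SHA-256 is a stand-in hash, not the one Keras uses.
    """
    if mask_value is not None and value == mask_value:
        return 0  # masked inputs always land in index 0

    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")

    if mask_value is None:
        # No mask term: hashing starts at index 0 and uses every bin.
        return hashed % num_bins
    # Bin 0 is reserved, so only num_bins - 1 bins remain for real inputs.
    return hashed % (num_bins - 1) + 1


print([hash_to_bin(token, 3, mask_value="") for token in ["", "A", "B", "C"]])
```

With the default `mask_value=None`, no bin is reserved and indices start at 0, which is the case the reworded docstring spells out.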
22 changes: 11 additions & 11 deletions keras/layers/preprocessing/image_preprocessing.py
@@ -65,9 +65,9 @@ class Resizing(base_layer.Layer):
height: Integer, the height of the output shape.
width: Integer, the width of the output shape.
interpolation: String, the interpolation method.
Defaults to `"bilinear"`.
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
Defaults to `"bilinear"`.
crop_to_aspect_ratio: If True, resize the images without aspect
ratio distortion. When the original aspect ratio differs
from the target aspect ratio, the output image will be
@@ -420,9 +420,9 @@ class RandomFlip(base_layer.BaseRandomLayer):
Args:
mode: String indicating which flip mode to use. Can be `"horizontal"`,
`"vertical"`, or `"horizontal_and_vertical"`. Defaults to
`"horizontal_and_vertical"`. `"horizontal"` is a left-right flip and
`"vertical"` is a top-bottom flip.
`"vertical"`, or `"horizontal_and_vertical"`. `"horizontal"` is a
left-right flip and `"vertical"` is a top-bottom flip. Defaults to
`"horizontal_and_vertical"`
seed: Integer. Used to create a random seed.
"""

@@ -1055,9 +1055,9 @@ class RandomZoom(base_layer.BaseRandomLayer):
result in an output
zooming out between 20% to 30%.
`width_factor=(-0.3, -0.2)` result in an
output zooming in between 20% to 30%. Defaults to `None`,
output zooming in between 20% to 30%. `None` means
i.e., zooming vertical and horizontal directions
by preserving the aspect ratio.
by preserving the aspect ratio. Defaults to `None`.
fill_mode: Points outside the boundaries of the input are
filled according to the given mode
(one of `{"constant", "reflect", "wrap", "nearest"}`).
@@ -1377,9 +1377,9 @@ class RandomBrightness(base_layer.BaseRandomLayer):
will be used for upper bound.
value_range: Optional list/tuple of 2 floats
for the lower and upper limit
of the values of the input data. Defaults to [0.0, 255.0].
Can be changed to e.g. [0.0, 1.0] if the image input
has been scaled before this layer.
of the values of the input data.
To make no change, use [0.0, 1.0], e.g., if the image input
has been scaled before this layer. Defaults to [0.0, 255.0].
The brightness adjustment will be scaled to this range, and the
output values will be clipped to this range.
seed: optional integer, for fixed RNG behavior.
@@ -1539,9 +1539,9 @@ class RandomHeight(base_layer.BaseRandomLayer):
`factor=0.2` results in an output with
height changed by a random amount in the range `[-20%, +20%]`.
interpolation: String, the interpolation method.
Defaults to `"bilinear"`.
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
Defaults to `"bilinear"`.
seed: Integer. Used to create a random seed.
Input shape:
@@ -1661,9 +1661,9 @@ class RandomWidth(base_layer.BaseRandomLayer):
`factor=0.2` results in an output with width changed
by a random amount in the range `[-20%, +20%]`.
interpolation: String, the interpolation method.
Defaults to `bilinear`.
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
Defaults to `bilinear`.
seed: Integer. Used to create a random seed.
Input shape:
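The `value_range` wording in the `RandomBrightness` hunk says the brightness adjustment is scaled to the range and the output is clipped to it. A minimal sketch of that arithmetic, assuming the adjustment is a fraction of the range width added to every pixel; `adjust_brightness` and `delta_fraction` are hypothetical names, and the random sampling of the factor is left out:

```python
def adjust_brightness(pixels, delta_fraction, value_range=(0.0, 255.0)):
    """Shift pixel values by a fraction of the value range, then clip.

    Hypothetical helper for illustration; `delta_fraction` plays the role
    of an already-sampled brightness factor in [-1.0, 1.0].
    """
    low, high = value_range
    # Scale the adjustment to the width of the value range.
    delta = delta_fraction * (high - low)
    # Clip each adjusted value back into [low, high].
    return [min(max(pixel + delta, low), high) for pixel in pixels]


# Default range, for unscaled images.
print(adjust_brightness([0.0, 100.0, 250.0], 0.5))
# Range for images already scaled to [0.0, 1.0].
print(adjust_brightness([0.25, 0.75], 0.5, value_range=(0.0, 1.0)))
```

The second call shows why `value_range` matters: the same factor on a `[0.0, 1.0]` image would saturate every pixel if the default `[0.0, 255.0]` were left in place.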
11 changes: 6 additions & 5 deletions keras/layers/preprocessing/index_lookup.py
@@ -134,10 +134,10 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
`"tf_idf"`, this argument must be supplied.
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
map indices to vocabulary items instead of mapping vocabulary items to
indices. Default to False.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
or `"tf_idf"` configuring the layer as follows:
indices. Defaults to `False`.
output_mode: Specification for the output of the layer. Values can be
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
configuring the layer as follows:
- `"int"`: Return the raw integer indices of the input tokens.
- `"one_hot"`: Encodes each individual element in the input into an
array the same size as the vocabulary, containing a 1 at the element
@@ -153,6 +153,7 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
the number of times the token at that index appeared in the sample.
- `"tf_idf"`: As `"multi_hot"`, but the TF-IDF algorithm is applied to
find the value in each token slot.
Defaults to `"int"`.
pad_to_max_tokens: Only valid when `output_mode` is `"multi_hot"`,
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
padded to `max_tokens` even if the number of unique tokens in the
@@ -161,7 +162,7 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
False.
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`, `"count"`
and `"tf-idf"` output modes. If True, returns a `SparseTensor` instead
of a dense `Tensor`. Defaults to False.
of a dense `Tensor`. Defaults to `False`.
"""

def __init__(
19 changes: 10 additions & 9 deletions keras/layers/preprocessing/integer_lookup.py
@@ -71,18 +71,18 @@ class IntegerLookup(index_lookup.IndexLookup):
only be specified when adapting the vocabulary or when setting
`pad_to_max_tokens=True`. If None, there is no cap on the size of the
vocabulary. Note that this size includes the OOV and mask tokens.
Defaults to None.
Defaults to `None`.
num_oov_indices: The number of out-of-vocabulary tokens to use. If this
value is more than 1, OOV inputs are modulated to determine their OOV
value. If this value is 0, OOV inputs will cause an error when calling
the layer. Defaults to 1.
the layer. Defaults to `1`.
mask_token: An integer token that represents masked inputs. When
`output_mode` is `"int"`, the token is included in vocabulary and mapped
to index 0. In other output modes, the token will not appear in the
vocabulary and instances of the mask token in the input will be dropped.
If set to None, no mask term will be added. Defaults to None.
If set to None, no mask term will be added. Defaults to `None`.
oov_token: Only used when `invert` is True. The token to return for OOV
indices. Defaults to -1.
indices. Defaults to `-1`.
vocabulary: Optional. Either an array of integers or a string path to a
text file. If passing an array, can pass a tuple, list, 1D numpy array,
or 1D tensor containing the integer vocabulary terms. If passing a file
@@ -98,10 +98,10 @@ class IntegerLookup(index_lookup.IndexLookup):
`"tf_idf"`, this argument must be supplied.
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
map indices to vocabulary items instead of mapping vocabulary items to
indices. Default to False.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
or `"tf_idf"` configuring the layer as follows:
indices. Defaults to `False`.
output_mode: Specification for the output of the layer. Values can be
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
configuring the layer as follows:
- `"int"`: Return the vocabulary indices of the input tokens.
- `"one_hot"`: Encodes each individual element in the input into an
array the same size as the vocabulary, containing a 1 at the element
Expand All @@ -119,6 +119,7 @@ class IntegerLookup(index_lookup.IndexLookup):
find the value in each token slot.
For `"int"` output, any shape of input and output is supported. For all
other output modes, currently only output up to rank 2 is supported.
Defaults to `"int"`.
pad_to_max_tokens: Only applicable when `output_mode` is `"multi_hot"`,
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
padded to `max_tokens` even if the number of unique tokens in the
Expand All @@ -127,7 +128,7 @@ class IntegerLookup(index_lookup.IndexLookup):
False.
sparse: Boolean. Only applicable when `output_mode` is `"multi_hot"`,
`"count"`, or `"tf_idf"`. If True, returns a `SparseTensor` instead of a
dense `Tensor`. Defaults to False.
dense `Tensor`. Defaults to `False`.
Examples:
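How `num_oov_indices` and `mask_token` interact in `"int"` output mode can be sketched in plain Python. The index layout used here (mask at index 0, then the OOV slots, then the vocabulary) and the use of a modulo to pick the OOV slot are a reading of the docstring's "modulated" wording, so treat them as assumptions; `integer_lookup` is a hypothetical function, not the Keras layer.

```python
def integer_lookup(values, vocabulary, num_oov_indices=1, mask_token=None):
    """Map integer tokens to indices in "int" output mode (sketch only).

    Assumed layout: [mask (optional)] + [OOV slots] + [vocabulary terms].
    """
    oov_start = 0 if mask_token is None else 1
    vocab_start = oov_start + num_oov_indices
    index_of = {token: vocab_start + i for i, token in enumerate(vocabulary)}

    indices = []
    for value in values:
        if mask_token is not None and value == mask_token:
            indices.append(0)  # the mask token is mapped to index 0
        elif value in index_of:
            indices.append(index_of[value])
        elif num_oov_indices == 0:
            # With no OOV slots, unknown inputs are an error.
            raise ValueError(f"out-of-vocabulary value: {value}")
        else:
            # Assumption: the OOV slot is chosen by a modulo of the value.
            indices.append(oov_start + value % num_oov_indices)
    return indices


vocab = [12, 36, 1138, 42]
print(integer_lookup([12, 1138, 42, 7], vocab))
print(integer_lookup([0, 12, 7], vocab, mask_token=0))
```

The defaults in the signature match the ones this hunk documents: `num_oov_indices` of `1` and `mask_token` of `None`.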
3 changes: 2 additions & 1 deletion keras/layers/preprocessing/normalization.py
@@ -52,11 +52,12 @@ class Normalization(base_preprocessing_layer.PreprocessingLayer):
example, if shape is `(None, 5)` and `axis=1`, the layer will track 5
separate mean and variance values for the last axis. If `axis` is set
to `None`, the layer will normalize all elements in the input by a
scalar mean and variance. Defaults to -1, where the last axis of the
scalar mean and variance. When `-1` the last axis of the
input is assumed to be a feature dimension and is normalized per
index. Note that in the specific case of batched scalar inputs where
the only axis is the batch axis, the default will normalize each index
in the batch separately. In this case, consider passing `axis=None`.
Defaults to `-1`.
mean: The mean value(s) to use during normalization. The passed value(s)
will be broadcast to the shape of the kept axes above; if the value(s)
cannot be broadcast, an error will be raised when this layer's
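The difference between the default `axis=-1` and `axis=None` described in this hunk can be sketched for a 2D batch in plain Python. `mean_and_variance` is a hypothetical helper that supports only those two settings, and it assumes population variance (dividing by the number of elements).

```python
def mean_and_variance(rows, axis=-1):
    """Compute the statistics a normalization step would track.

    Hypothetical helper: `rows` is a list of equal-length feature lists.
    axis=-1 gives one mean/variance per feature; axis=None gives scalars.
    """
    if axis is None:
        # Treat every element in the input as one population.
        flat = [x for row in rows for x in row]
        mean = sum(flat) / len(flat)
        variance = sum((x - mean) ** 2 for x in flat) / len(flat)
        return mean, variance
    if axis != -1:
        raise NotImplementedError("this sketch only covers axis=-1 and axis=None")

    # One column per index of the last axis (the feature dimension).
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    variances = [
        sum((x - m) ** 2 for x in col) / len(col)
        for col, m in zip(columns, means)
    ]
    return means, variances


batch = [[0.0, 10.0], [2.0, 30.0]]
print(mean_and_variance(batch))             # per-feature statistics
print(mean_and_variance(batch, axis=None))  # a single scalar mean and variance
```

For a batch of shape `(None, 2)`, the default tracks two mean and variance pairs, while `axis=None` collapses them to one.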
15 changes: 8 additions & 7 deletions keras/layers/preprocessing/string_lookup.py
@@ -68,11 +68,11 @@ class StringLookup(index_lookup.IndexLookup):
only be specified when adapting the vocabulary or when setting
`pad_to_max_tokens=True`. If None, there is no cap on the size of the
vocabulary. Note that this size includes the OOV and mask tokens.
Defaults to None.
Defaults to `None`.
num_oov_indices: The number of out-of-vocabulary tokens to use. If this
value is more than 1, OOV inputs are hashed to determine their OOV
value. If this value is 0, OOV inputs will cause an error when calling
the layer. Defaults to 1.
the layer. Defaults to `1`.
mask_token: A token that represents masked inputs. When `output_mode` is
`"int"`, the token is included in vocabulary and mapped to index 0. In
other output modes, the token will not appear in the vocabulary and
@@ -93,10 +93,10 @@ class StringLookup(index_lookup.IndexLookup):
`"tf_idf"`, this argument must be supplied.
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
map indices to vocabulary items instead of mapping vocabulary items to
indices. Default to False.
output_mode: Specification for the output of the layer. Defaults to
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
or `"tf_idf"` configuring the layer as follows:
indices. Defaults to `False`.
output_mode: Specification for the output of the layer. Values can be
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
configuring the layer as follows:
- `"int"`: Return the raw integer indices of the input tokens.
- `"one_hot"`: Encodes each individual element in the input into an
array the same size as the vocabulary, containing a 1 at the element
Expand All @@ -114,6 +114,7 @@ class StringLookup(index_lookup.IndexLookup):
find the value in each token slot.
For `"int"` output, any shape of input and output is supported. For all
other output modes, currently only output up to rank 2 is supported.
Defaults to `"int"`
pad_to_max_tokens: Only applicable when `output_mode` is `"multi_hot"`,
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
padded to `max_tokens` even if the number of unique tokens in the
Expand All @@ -122,7 +123,7 @@ class StringLookup(index_lookup.IndexLookup):
False.
sparse: Boolean. Only applicable when `output_mode` is `"multi_hot"`,
`"count"`, or `"tf_idf"`. If True, returns a `SparseTensor` instead of a
dense `Tensor`. Defaults to False.
dense `Tensor`. Defaults to `False`.
encoding: Optional. The text encoding to use to interpret the input
strings. Defaults to `"utf-8"`.