Tokenizer and config
LysandreJik committed Oct 27, 2020
1 parent d97e9d1 commit 3b480c3
Showing 2 changed files with 17 additions and 11 deletions.
13 changes: 9 additions & 4 deletions src/transformers/configuration_deberta.py
@@ -12,24 +12,29 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" DeBERTa model configuration """
""" DeDeBERTaa model configuration """

from .configuration_utils import PretrainedConfig
from .utils import logging


logger = logging.get_logger(__name__)

DEBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"microsoft/deberta-base": "https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/deberta-base/config.json",
"microsoft/deberta-large": "https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/deberta-large/config.json",
}


class DebertaConfig(PretrainedConfig):
r"""
:class:`~transformers.DebertaConfig` is the configuration class to store the configuration of a
:class:`~transformers.DebertaModel`.
This is the configuration class to store the configuration of a :class:`~transformers.DebertaModel` or a
:class:`~transformers.TFDebertaModel`. It is used to instantiate a DeBERTa model according to the specified
arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
configuration to that of the DeBERTa `bert-base-uncased <https://huggingface.co/bert-base-uncased>`__ architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Arguments:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
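For context, a minimal sketch of how the configuration class touched by this diff is typically used, assuming the public DebertaConfig and DebertaModel classes exported by the library (this usage is not part of the commit itself):

from transformers import DebertaConfig, DebertaModel

# Instantiating with defaults yields a configuration similar to that of
# microsoft/deberta-base (e.g. vocab_size defaults to 30522, per the docstring above).
configuration = DebertaConfig()

# Initializing a model from the configuration produces randomly initialized weights.
model = DebertaModel(configuration)

# The configuration can be read back from the model.
configuration = model.config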
15 changes: 8 additions & 7 deletions src/transformers/tokenization_deberta.py
@@ -581,7 +581,7 @@ def convert_tokens_to_string(self, tokens):
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
"""
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
adding special tokens. A BERT sequence has the following format:
adding special tokens. A DeBERTa sequence has the following format:
- single sequence: [CLS] X [SEP]
- pair of sequences: [CLS] A [SEP] B [SEP]
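To illustrate the format documented above, here is a self-contained sketch of what build_inputs_with_special_tokens produces. The token IDs are hypothetical placeholders; the real method uses the tokenizer's own CLS and SEP token IDs:

# Hypothetical special-token IDs, for illustration only.
CLS, SEP = 1, 2

def build_inputs_with_special_tokens(token_ids_0, token_ids_1=None):
    # Single sequence: [CLS] X [SEP]
    if token_ids_1 is None:
        return [CLS] + token_ids_0 + [SEP]
    # Pair of sequences: [CLS] A [SEP] B [SEP]
    return [CLS] + token_ids_0 + [SEP] + token_ids_1 + [SEP]

assert build_inputs_with_special_tokens([7, 8]) == [1, 7, 8, 2]
assert build_inputs_with_special_tokens([7, 8], [9]) == [1, 7, 8, 2, 9, 2]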
@@ -608,14 +608,15 @@ def get_special_tokens_mask(self, token_ids_0, token_ids_1=None, already_has_spe
special tokens using the tokenizer ``prepare_for_model`` or ``encode_plus`` methods.
Args:
token_ids_0: list of ids (must not contain special tokens)
token_ids_1: Optional list of ids (must not contain special tokens), necessary when fetching sequence ids
for sequence pairs
already_has_special_tokens: (default False) Set to True if the token list is already formated with
special tokens for the model
token_ids_0 (:obj:`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""

if already_has_special_tokens:
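As a rough sketch of the mask described in the updated docstring, assuming the same [CLS] X [SEP] and [CLS] A [SEP] B [SEP] layout shown earlier (this is an illustration, not the library's actual implementation):

def get_special_tokens_mask(token_ids_0, token_ids_1=None):
    # 1 marks a special token ([CLS] or [SEP]); 0 marks a sequence token.
    if token_ids_1 is None:
        return [1] + [0] * len(token_ids_0) + [1]
    return [1] + [0] * len(token_ids_0) + [1] + [0] * len(token_ids_1) + [1]

assert get_special_tokens_mask([7, 8]) == [1, 0, 0, 1]
assert get_special_tokens_mask([7, 8], [9]) == [1, 0, 0, 1, 0, 1]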
