Releases: kdexd/virtex
Drop Python 3.6 support, update docs theme.
Major changes
- Python 3.6 support is dropped, the minimum requirement is Python 3.8. All major library versions are bumped to the latest releases (PyTorch, OpenCV, Albumentations, etc.).
- Model zoo URLs are changed to Dropbox. All pre-trained checkpoint weights are unchanged.
- There was a spike in training loss when resuming training with `pretrain_virtex.py`; it is fixed now.
- Documentation theme is changed from Alabaster to Read the Docs, looks fancier!
Fix beam search bug, add nucleus sampling support.
Bug Fix: Beam Search
The beam search implementation adapted from AllenNLP was better suited to recurrent models (LSTM/GRU) than to autoregressive models like transformers.
This version removes the "backpointer" trick from the AllenNLP implementation and improves captioning results for all VirTex models. See below: "Old" metrics are v1.1 (arXiv v2) and "New" metrics are v1.2 (arXiv v3).
This bug does not affect pre-training or other downstream task results. Thanks to Nicolas Carion (@alcinos) and Aishwarya Kamath (@ashkamath) for spotting this issue and helping me to fix it!
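For intuition, here is a minimal sketch of backpointer-free beam search, where each beam stores its full token sequence instead of reconstructing it from backpointers. All names here (`beam_search`, `step_fn`, the toy model) are illustrative assumptions, not VirTex's actual API:

```python
def beam_search(step_fn, start_token, eos_token, beam_size=5, max_steps=50):
    """Beam search that stores full sequences per beam (no backpointers).

    step_fn(seq) -> {next_token: log_prob} for the given partial sequence.
    Returns the highest-scoring completed (or max-length) sequence.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for seq, score in beams:
            for token, log_prob in step_fn(seq).items():
                candidates.append((seq + [token], score + log_prob))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos_token:
                finished.append((seq, score))
            elif len(beams) < beam_size:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams if needed
    return max(finished, key=lambda c: c[1])[0]


# Toy model: from token 0, token 1 is likely and token 2 ([EOS]) unlikely;
# after a 1, [EOS] follows with high probability.
def toy_step(seq):
    return {1: -0.1, 2: -2.0} if seq[-1] == 0 else {2: -0.1}
```

Keeping full sequences costs a little memory but avoids the index bookkeeping that caused the bug described above.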
Feature: Nucleus Sampling
This codebase now supports decoding through nucleus sampling, as introduced in The Curious Case of Neural Text Degeneration. Try running the captioning evaluation script with `--config-override MODEL.DECODER.NAME nucleus_sampling MODEL.DECODER.NUCLEUS_SIZE 0.9`! To keep behavior consistent with prior versions, the default decoding method remains beam search with 5 beams.
Note: Nucleus sampling gives worse results on COCO Captions specifically, but produces more interesting-sounding language with larger transformers trained on much more data than COCO Captions.
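For reference, nucleus (top-p) sampling keeps only the smallest set of highest-probability tokens whose cumulative probability reaches the nucleus size, then samples from that set. A minimal stdlib-only sketch (not the codebase's implementation; `nucleus_sample` is a hypothetical name):

```python
import random


def nucleus_sample(probs, nucleus_size=0.9, rng=random):
    """Sample a token id from the smallest set of highest-probability
    tokens whose cumulative probability reaches `nucleus_size`."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= nucleus_size:
            break
    # Renormalize inside the nucleus, then sample from it.
    weights = [probs[i] / total for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

With `probs = [0.5, 0.3, 0.1, 0.1]` and `nucleus_size = 0.9`, only the first three tokens can ever be sampled; the low-probability tail is cut off, which is what curbs the degenerate text that greedy/beam decoding and plain sampling each suffer from in different ways.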
New config arguments to support this:
```yaml
MODEL:
  DECODER:
    # Which algorithm to use for decoding. Supported values: {"beam_search",
    # "nucleus_sampling"}.
    NAME: "beam_search"

    # Number of beams to decode (1 = greedy decoding). Ignored when decoding
    # through nucleus sampling.
    BEAM_SIZE: 5

    # Size of nucleus for sampling predictions. Ignored when decoding through
    # beam search.
    NUCLEUS_SIZE: 0.9

    # Maximum length of decoded caption. Decoding may end earlier when [EOS]
    # token is sampled.
    MAX_DECODING_STEPS: 50  # Same as DATA.MAX_CAPTION_LENGTH
```
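For illustration, dotted-key overrides like `MODEL.DECODER.NAME nucleus_sampling` map onto a nested config in a straightforward way. This is a simplified stand-in sketch, not the codebase's config system, and `apply_overrides` is a hypothetical helper:

```python
def apply_overrides(config, overrides):
    """Apply `KEY VALUE KEY VALUE ...` style overrides to a nested dict.

    Keys are dotted paths into the config, mirroring the command-line
    `--config-override` syntax.
    """
    assert len(overrides) % 2 == 0, "overrides must be KEY VALUE pairs"
    for key, value in zip(overrides[::2], overrides[1::2]):
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node[part]  # walk down to the enclosing section
        node[leaf] = value
    return config
```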
Remove obsolete modules and rename config parameters.
This version is a small increment over v1.0 with only cosmetic changes and obsolete code removals. The final results of models trained from this codebase remain unchanged.
Removed feature extraction support:

- Removed `virtex.downstream.FeatureExtractor` and its usage in `scripts/clf_voc07.py`. By default, the script now only evaluates on global average pooled features (2048-d), as in the CVPR 2021 paper version.
- Removed `virtex.modules.visual_backbones.BlindVisualBackbone`. I introduced it a long time ago for debugging; it is not very useful anymore.
Two config-related changes:

- Renamed config parameters: `OPTIM.USE_LOOKAHEAD` -> `OPTIM.LOOKAHEAD.USE`, `OPTIM.LOOKAHEAD_ALPHA` -> `OPTIM.LOOKAHEAD.ALPHA`, and `OPTIM.LOOKAHEAD_STEPS` -> `OPTIM.LOOKAHEAD.STEPS`.
- Renamed `TransformerTextualHead` to `TransformerDecoderTextualHead` for clarity. Model names in config change accordingly: `"transformer_postnorm"` -> `"transdec_postnorm"` (same for prenorm).
These changes may be breaking if you wrote your own config and explicitly added these arguments.
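If you maintain your own configs as flat dotted keys, a small migration helper along these lines can perform the renames. The helper, the flat-dict representation, and the handling of the model-name value are illustrative assumptions, not part of the codebase:

```python
# Old flat key -> new nested key, per the renames listed above.
RENAMED_KEYS = {
    "OPTIM.USE_LOOKAHEAD": "OPTIM.LOOKAHEAD.USE",
    "OPTIM.LOOKAHEAD_ALPHA": "OPTIM.LOOKAHEAD.ALPHA",
    "OPTIM.LOOKAHEAD_STEPS": "OPTIM.LOOKAHEAD.STEPS",
}


def migrate_flat_config(flat_config):
    """Rename old flat config keys and textual head model names."""
    migrated = {}
    for key, value in flat_config.items():
        # Rewrite model name values, e.g.
        # "transformer_postnorm" -> "transdec_postnorm".
        if isinstance(value, str) and value.startswith("transformer_"):
            value = "transdec_" + value[len("transformer_"):]
        migrated[RENAMED_KEYS.get(key, key)] = value
    return migrated
```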
CVPR 2021 release
CVPR 2021 release of VirTex.
Code and pre-trained models reproduce the results reported in the paper: https://arxiv.org/abs/2006.06666v2