fcakyon/content-moderation-deep-learning

deep-learning-content-moderation

A curated list of resources on deep learning based content moderation: sensitive content detection, scene genre classification, nudity detection, violence detection, and substance detection from text, audio, video & image input modalities.

citation

If you find this source useful, please consider citing it in your work as:

@INPROCEEDINGS{10193621,
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  booktitle={2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)}, 
  title={State-of-the-Art in Nudity Classification: A Comparative Analysis}, 
  year={2023},
  pages={1-5},
  keywords={Analytical models;Convolution;Conferences;Transfer learning;Benchmark testing;Transformers;Safety;content moderation;nudity detection;safety;transformers},
  doi={10.1109/ICASSPW59220.2023.10193621}}
@article{akyon2022contentmoderation,
  title={Deep Architectures for Content Moderation and Movie Content Rating},
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  journal={arXiv},
  doi={https://doi.org/10.48550/arXiv.2212.04533},
  year={2022}
}

datasets

movie and content moderation datasets

| name | paper | year | url | input modality | task | labels |
| --- | --- | --- | --- | --- | --- | --- |
| LSPD | pdf | 2022 | page | image, video | image/video classification, instance segmentation | porn, normal, sexy, hentai, drawings, female/male genital, female breast, anus |
| MM-Trailer | pdf | 2021 | page | video | video classification | age rating |
| Movienet | scholar | 2021 | page | image, video, text | object detection, video classification | scene level actions and places, character bboxes |
| Movie script severity dataset | pdf | 2021 | github | text | text classification | frightening, mild, moderate, severe |
| LVU | pdf | 2021 | page | video | video classification | relationship, place, like ratio, view count, genre, writer, year per movie scene |
| Violence detection dataset | scholar | 2020 | github | video | video classification | violent, not-violent |
| Movie script dataset | pdf | 2019 | github | text | text classification | violent or not |
| Nudenet | github | 2019 | archive.org | image | image classification | nude or not |
| Adult content dataset | pdf | 2017 | contact | image | image classification | nude or not |
| Substance use dataset | pdf | 2017 | first author | image | image classification | drug related or not |
| NDPI2k dataset | pdf | 2016 | contact | video | video classification | porn or not |
| Violent Scenes Dataset | springer | 2014 | page | video | video classification | blood, fire, gun, gore, fight |
| VSD2014 | pdf | 2014 | download | video | video classification | blood, fire, gun, gore, fight |
| AIIA-PID4 | pdf | 2013 | - | image | image classification | bikini, porn, skin, non-skin |
| NDPI800 dataset | scholar | 2013 | page | video | video classification | porn or not |
| HMDB-51 | scholar | 2011 | page | video | video classification | smoke, drink |

techniques

sensitive content detection

movie content rating

| name | paper | year | model | features | datasets | tasks | context |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Movies2Scenes: Learning Scene Representations Using Movie Similarities | scholar | 2022 | ViT-like video encoder + MLP | ViT-like video encoder embeddings | Private, Movienet, LVU | movie scene representation learning, video classification (sex, violence, drug-use) | movie scene content rating |
| Detection and Classification of Sensitive Audio-Visual Content for Automated Film Censorship and Rating | pdf | 2022 | CNN + GRU + MLP | CNN embeddings from video frames | Violence detection dataset | violent/non-violent classification from videos | movie scene content rating |
| Automatic parental guide ratings for short movies | page | 2021 | separate model for each task: concat + LSTM, object detector, one-class CNN embeddings | video frame pixel values, image embeddings, text | Nudenet, private dataset | profanity, violence, nudity, drug classification | movie content rating |
| From None to Severe: Predicting Severity in Movie Scripts | scholar | 2021 | multi-task pairwise ranking-classification network | GloVe, Bert and TextCNN text embeddings | Movie script severity dataset | rating classification (frightening, mild, moderate, severe) | movie content rating |
| A Case Study of Deep Learning-Based Multi-Modal Methods for Labeling the Presence of Questionable Content in Movie Trailers | scholar | 2021 | multi-modal + multi-output concat + MLP | CNN+LSTM video features, Bert and DeepMoji text embeddings, MFCC audio features | MM-Trailer | rating classification (red, yellow, green) | movie trailer content rating |
| Automatic Parental Guide Scene Classification Menggunakan Metode Deep Convolutional Neural Network Dan Lstm | scholar | 2020 | 3 CNN models for 3 modalities, multi-label dataset | CNN video and audio embeddings, LSTM text (subtitle) embeddings | private dataset | gore, nudity, drug, profanity classification from video and subtitle | movie scene content rating |
| Multimodal data fusion for sensitive scene localization | scholar | 2019 | meta-learning with Naive Bayes, SVM | MFCC and prosodic features from audio, HOG and TRoF features from images | Pornography-2k dataset, VSD2014 | violent and pornographic scene localization from video | movie scene content rating |
| A Deep Learning approach for the Motion Picture Content Rating | scholar | 2019 | MLP + rule-based decision | InceptionV3 image embeddings | Violent Scenes Dataset, private dataset | violence (shooting, blood, fire, weapon) classification from video | movie scene content rating |
| Hybrid System for MPAA Ratings of Movie Clips Using Support Vector Machine | springer | 2019 | SVM | DCT features from image | private dataset | movie content rating classification from images | movie content rating |
| Inappropriate scene detection in a video stream | page | 2017 | SVM classifier + Lenet image classifier + rule-based decision | HoG and CNN features for image | private dataset | image classification: no/mild/high violence, safe/unsafe/pornography | movie frame content rating |

content moderation

| name | paper | year | model | features | datasets | tasks | context |
| --- | --- | --- | --- | --- | --- | --- | --- |
| State-of-the-Art in Nudity Classification: A Comparative Analysis | ieee | 2023 | CNN, Transformers | EfficientNet, ViT, ConvNeXT image embeddings | LSPD, Nudenet, NDPI2k | nudity classification from images | general content moderation |
| Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild | scholar | 2022 | novel threshold optimization tech. (TruSThresh) | prediction scores | UnSmile (Korean hatespeech dataset) | optimum threshold prediction | social media content moderation |
| On-Device Content Moderation | scholar | 2021 | mobilenet v3 + SSD object detector | mobilenet v3 image embeddings | private dataset | object detection + nudity classification from images | on-device content moderation |
| Gore Classification and Censoring in Images | scholar | 2021 | ensemble of CNN + MLP | mobilenet v2, densenet, vgg16 image embeddings | private dataset | gore classification from images | general content moderation |
| Automated Censoring of Cigarettes in Videos Using Deep Learning Techniques | scholar | 2020 | CNN + MLP | inception v3 image embeddings | private dataset | cigarette classification from video | general content moderation |
| A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes | scholar | 2019 | CNN + SVM | InceptionV3 image embeddings, AudioVGG audio embeddings | private dataset | inappropriate (nudity+gore) classification from video | general video content moderation |
| A baseline for NSFW video detection in e-learning environments | scholar | 2019 | concat + SVM, MLP | InceptionV3 image embeddings, AudioVGG audio embeddings | YouTube8M, NDPI, Cholec80 | nudity classification from video | e-learning content moderation |
| Bringing the kid back into youtube kids: Detecting inappropriate content on video streaming platforms | scholar | 2019 | CNN + LSTM (late fusion) | CNN based encoder for image, video and audio spectrograms | private dataset | video classification: original, fake explicit, fake violent | social media content moderation |
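
Several entries above combine modalities either by concatenating per-modality embeddings before a classifier ("concat + SVM, MLP") or by merging per-modality predictions ("late fusion"). The sketch below illustrates the difference between the two using only the Python standard library. It is not taken from any of the listed papers: the function names, the toy embedding values, and the weights are all hypothetical, and a single logistic layer stands in for the MLP or SVM head.

```python
import math


def concat_fusion(*embeddings):
    """Early fusion: join per-modality embedding vectors into one feature vector."""
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused


def linear_score(features, weights, bias=0.0):
    """Logistic score in (0, 1) from one linear layer (stand-in for an MLP/SVM head)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion(scores, weights=None):
    """Late fusion: weighted average of scores predicted separately per modality."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Toy, made-up values: a 3-d image embedding and a 2-d audio embedding.
image_emb = [0.2, -0.1, 0.7]
audio_emb = [0.5, 0.3]

# Early fusion: one classifier sees the concatenated 5-d vector.
fused = concat_fusion(image_emb, audio_emb)
score = linear_score(fused, weights=[1.0, 0.0, 2.0, -1.0, 0.5], bias=-0.5)

# Late fusion: each modality is scored on its own, then the scores are merged.
image_score = linear_score(image_emb, weights=[1.0, 0.0, 2.0])
audio_score = linear_score(audio_emb, weights=[-1.0, 0.5])
merged = late_fusion([image_score, audio_score], weights=[2.0, 1.0])
```

In practice the embeddings would come from pretrained encoders such as those named in the features column, and the classifier weights would be learned rather than hand-set.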

movie/scene genre classification

| name | paper | year | model | features | datasets | tasks |
| --- | --- | --- | --- | --- | --- | --- |
| Effectively leveraging Multi-modal Features for Movie Genre Classification | scholar | 2022 | embeddings + fusion + MLP | CLIP image embeddings, PANNs audio embeddings, CLIP text embeddings | MovieNet | movie genre classification |
| OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification | scholar | 2022 | embeddings + novel transformer | ResNet-18 image embeddings, ResNet-VLAD audio embeddings | TI-News | news scene segmentation/classification (studio, outdoor, interview) |
| Detection of Animated Scenes Among Movie Trailers | scholar | 2022 | CNN + GRU | EfficientNet image embeddings | Private dataset | genre classification from movie trailer scenes |
| A multi-label movie genre classification scheme based on the movie's subtitles | springer | 2022 | KNN | text frequency vectors | Private dataset | genre classification from movie subtitle text |
| A multimodal approach for multi-label movie genre classification | scholar | 2020 | CNN + LSTM | MFCCs/SSD/LBP from audio, LBP/3DCNN from video frames, Inception-v3 from poster, TFIDF from text | Private dataset | genre classification from movie trailers |
| Genre classification of movie trailers using 3d convolutional neural networks | ieee | 2020 | 3D CNN | images | Private dataset | genre classification from movie trailer scenes |
| A unified framework of deep networks for genre classification using movie trailer | scholar | 2020 | CNN + LSTM | Inception V4 image embeddings | EmoGDB | genre classification from movie trailer scenes |
| Towards story-based classification of movie scenes | scholar | 2020 | logistic regression | manually extracted categorical features | Flintstones Scene Dataset | scene classification (Obstacle, Midpoint, Climax of Act 1) |

multimodal architectures

synchronous multimodal architectures

| name | paper | year | model | features | datasets | tasks | modalities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M&M Mix: A Multimodal Multiview Transformer Ensemble | scholar | 2022 | transformer with 2 cls heads | ViT image embeddings from audio spect., frame image, optical flow | Epic-Kitchens | video/action classification | image + audio + optical flow |
| MultiMAE: Multi-modal Multi-task Masked Autoencoders | scholar | 2022 | transformer with 3 decoder + cls heads | ViT-like image enc. patch embeddings (optional modalities) | ImageNet: Pseudo labeled multi-task training dataset (depth, segm) | image cls., semantic segm., depth est. | image + depth map |
| Data2vec: A general framework for self-supervised learning in speech, vision and language | scholar | 2022 | single encoder | transformer based audio, text, image encoder embeddings | ImageNet, Librispeech | masked pretraining | image + audio + text |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | scholar | 2022 | 1 encoder per modality | transformer based audio, text, image encoder embeddings | AudioSet, HowTo100M | pretraining + video/audio classification | image + audio + text |
| Expanding Language-Image Pretrained Models for General Video Recognition | scholar | 2022 | 1 encoder per modality | transformer based video, text encoder embeddings | HMDB-51, UCF-101 | contrastive pretraining | video + text |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Robust Audio-Visual Instance Discrimination | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Learning transferable visual models from natural language supervision | scholar | 2021 | 1 encoder per modality | transformer based image, text encoder embeddings | JFT-300M | contrastive pretraining | image + text |
| Self-supervised multimodal versatile networks | scholar | 2020 | multiple encoders | CNN based image/audio embeddings, word2vec text embeddings | UCF101, Kinetics, AudioSet | contrastive pretraining + classification | image + audio + text |
| Uniter: Universal image-text representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Visual Genome, Conceptual Captions | qa/image-text retrieval | image + text |
| 12-in-1: Multi-task vision and language representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Flickr30k | qa/image-text retrieval | image + text |
| Two-stream convolutional networks for action recognition in videos | scholar | 2014 | 1 encoder per modality | CNN based image, optical flow encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + optical flow |

asynchronous multimodal architectures

| name | paper | year | model | features | datasets | tasks | modalities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OmniMAE: Single Model Masked Pretraining on Images and Videos | scholar | 2022 | transformer with 1 cls. head | ViT-like image/video enc. patch embeddings | ImageNet, SSv2 | video/action classification | image + video |
| OMNIVORE: A Single Model for Many Visual Modalities | scholar | 2022 | transformer with 3 cls. heads | ViT-like image/video enc. patch embeddings | ImageNet, Kinetics, SSv2, SUN RGB-D | image cls., action recog., depth est. | image + video + depth map |
| Polyvit: Co-training vision transformers on images, videos and audio | scholar | 2021 | transformer with 9 cls. heads | ViT-like image/video/audio enc. embeddings | ImageNet, CIFAR, Kinetics, Moments in Time, AudioSet, VGGSound | image cls., video cls., audio cls. | image + video + audio |

action recognition

with transformers

| name | paper | year | model | features | datasets | tasks |
| --- | --- | --- | --- | --- | --- | --- |
| Frozen CLIP Models are Efficient Video Learners | scholar | 2022 | transformer with 1 cls head | CLIP image embeddings | ImageNet, Kinetics, SSv2 | action recognition |
| Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training | scholar | 2022 | transformer with 1 cls head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |
| Bevt: Bert pretraining of video transformers | scholar | 2022 | encoder-decoder transformer | VideoSwin image/video enc. embeddings | Kinetics, SSv2 | action recognition |
| Video swin transformer | scholar | 2022 | Swin trans. with cls. head | Swin video enc. embeddings | Kinetics, SSv2 | action recognition |
| Is space-time attention all you need for video understanding? | scholar | 2021 | transformer with cls. head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |

with 3D CNNs

| name | paper | year | model | features | datasets | tasks |
| --- | --- | --- | --- | --- | --- | --- |
| X3d: Expanding architectures for efficient video recognition | scholar | 2020 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| Slowfast networks for video recognition | scholar | 2019 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| A closer look at spatiotemporal convolutions for action recognition (R2+1D) | scholar | 2018 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) | scholar | 2017 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |

contrastive representation learning

| name | paper | date |
| --- | --- | --- |
| Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text | scholar | 2021 |
| Supervised contrastive learning | scholar | 2020 |

review papers

| name | paper | date |
| --- | --- | --- |
| Machine Learning Models for Content Classification in Film Censorship and Rating | pdf | 2022 |
| A survey of artificial intelligence strategies for automatic detection of sexually explicit videos | scholar | 2022 |
| A survey on video content rating: taxonomy, challenges and open issues | pdf | 2021 |
| Multimodal Learning with Transformers: A Survey | scholar | 2022 |
| A Survey Paper on Movie Trailer Genre Detection | scholar | 2020 |

tools

| name | url | description |
| --- | --- | --- |
| safetext | github | multilingual swear word detection and filtering from strings |
| PySceneDetect | github | Python and OpenCV-based scene cut/transition detection program & library |
| LAION safety toolkit | github | NSFW detector trained on LAION dataset |
| pysrt | github | Python parser for SubRip (srt) files |
| ffsubsync | github | Automagically synchronize subtitles with video. |
| MoviePy | github | Video editing with Python |
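
To give a feel for how subtitle parsing and swear word filtering fit together in a moderation pipeline, here is a minimal standard-library sketch. It does not use or reproduce the APIs of pysrt or safetext, which handle far more edge cases. The helper names `parse_srt` and `flag_cues`, the sample subtitle text, and the one-word blocklist are all hypothetical.

```python
import re

# One SubRip cue: index line, "start --> end" timing line, then text up to a blank line.
SRT_BLOCK = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(text):
    """Parse SubRip text into a list of cue dicts (index, start, end, text)."""
    cues = []
    for match in SRT_BLOCK.finditer(text.replace("\r\n", "\n")):
        index, start, end, body = match.groups()
        cues.append(
            {"index": int(index), "start": start, "end": end, "text": body.strip()}
        )
    return cues


def flag_cues(cues, blocklist):
    """Return the cues that contain any blocklisted word (whole-word, case-insensitive)."""
    blocked = {word.lower() for word in blocklist}
    flagged = []
    for cue in cues:
        words = re.findall(r"[^\W\d_]+", cue["text"].lower())
        if blocked.intersection(words):
            flagged.append(cue)
    return flagged


# Made-up subtitle snippet for illustration.
sample = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
That was a darn
good scene.
"""

cues = parse_srt(sample)
flagged = flag_cues(cues, blocklist=["darn"])
for cue in flagged:
    print(cue["index"], cue["start"], cue["end"])
```

The timestamps of flagged cues could then be passed to a video tool to mute or cut the matching segments. That step is left out here.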