We have built a deep neural network that functions as part of a machine translation pipeline. The pipeline accepts English text (or an image containing English text) as input and returns the translated French text. To translate a corpus of English text to French, we built a custom architecture: we experimented with different combinations of layers and kept the one that gave the best results. The pipeline also supports image-to-text conversion followed by English-to-French translation.
Europarl is a standard dataset for statistical machine translation and, more recently, neural machine translation. It comprises the proceedings of the European Parliament, hence the name of the dataset (a contraction of "European Parliament"). The proceedings are transcriptions of speakers at the European Parliament, translated into 11 different languages.
Link of the dataset - Europarl Dataset
Below is a summary of the various preprocessing and modeling steps. The high-level steps include:
- Preprocessing: converting text to sequence of integers.
- Modeling: Creating a model which accepts a sequence of integers as input and returns a probability distribution over possible translations.
- Prediction: running the model on English text. The model translates the text into French, and the output is compared with the reference French text.
- Iteration: iterating on the models, experimenting with different architectures to find out the best model for our problem.
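The preprocessing step above (converting text to a sequence of integers) can be sketched as follows. The toy corpus, the `<PAD>` token, and the helper names here are illustrative assumptions, not the project's actual data or code.

```python
# Build a vocabulary from a toy corpus and turn each sentence into a
# fixed-length sequence of integers, with 0 reserved for padding.

def build_vocab(sentences):
    """Map each distinct word to a unique integer; 0 is reserved for <PAD>."""
    vocab = {"<PAD>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def to_sequences(sentences, vocab, max_len):
    """Convert sentences to integer sequences, padded/truncated to max_len."""
    seqs = []
    for sentence in sentences:
        ids = [vocab[w] for w in sentence.lower().split()][:max_len]
        ids += [0] * (max_len - len(ids))
        seqs.append(ids)
    return seqs

corpus = ["new jersey is sometimes quiet", "paris is sometimes quiet"]
vocab = build_vocab(corpus)
sequences = to_sequences(corpus, vocab, max_len=6)
print(sequences)  # [[1, 2, 3, 4, 5, 0], [6, 3, 4, 5, 0, 0]]
```

The same vocabulary is kept around for the prediction step, so that the integer outputs of the model can be mapped back to words.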
- Inputs — Input sequences are fed into the model with one word for every time step. Each word is encoded as a unique integer that maps to the English dataset vocabulary.
- Embedding Layers — Embeddings are used to convert each word to a vector. The size of the vector depends on the complexity of the vocabulary.
- Recurrent Layers (Encoder) — This is where the context from word vectors in previous time steps is applied to the current word vector.
- Dense Layers (Decoder) — These are typical fully connected layers used to decode the encoded input into the correct translation sequence.
- Outputs — The outputs are returned as a sequence of integers which can then be mapped to the French dataset vocabulary.
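The layer stack described above can be sketched in Keras as follows. The layer sizes, the choice of GRU as the recurrent layer, and the toy vocabulary sizes are illustrative assumptions, not the project's reported configuration.

```python
# Hedged sketch of the encoder-decoder stack: Embedding -> recurrent
# encoder -> time-distributed dense decoder with a softmax over the
# French vocabulary at every time step.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, TimeDistributed, Dense

english_vocab_size = 200   # assumed toy sizes
french_vocab_size = 350
seq_len = 10

model = Sequential([
    Embedding(english_vocab_size, 64),        # word index -> dense vector
    GRU(128, return_sequences=True),          # context across time steps
    TimeDistributed(Dense(french_vocab_size, activation="softmax")),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# One batch of two integer-encoded English sequences.
batch = np.random.randint(0, english_vocab_size, size=(2, seq_len))
probs = model.predict(batch, verbose=0)
print(probs.shape)  # one probability distribution per time step
```

Each output position is a probability distribution over the French vocabulary; taking the argmax at every time step and mapping the integers back through the French vocabulary yields the translated sentence.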
Validation accuracy: 85%
Training: 50 epochs
Text detection and extraction from images are done using OpenCV and Pytesseract. The model itself is built with Keras as the frontend and TensorFlow as the backend.