Seq2seq was initially developed by Google for machine translation. Earlier approaches to translation were naïve: each word was converted to the target language on its own, with no regard for grammar or sentence structure. Seq2seq instead uses deep learning for translation, taking into account both each word and its neighboring context.
Seq2seq is also used in other applications such as image captioning, text summarization, and conversational models.
How Seq2seq Works:
Seq2seq takes a sequence as input and generates an output sequence of words. It relies on a recurrent neural network (RNN) for this process. RNNs, however, suffer from the vanishing gradient problem, so the version proposed by Google uses LSTM (long short-term memory) units instead. At each time step, the LSTM takes two inputs, the current word and the state from the previous step, which lets it build up the context of the words seen so far.
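The following minimal sketch shows a single LSTM cell in plain Python to illustrate the idea of two inputs per time step. The class name, the tiny sizes, and the one-hot "word vectors" are assumptions chosen for illustration. Real models use a framework and learned embeddings. Besides the hidden state `h`, a standard LSTM also carries a cell state `c` from step to step. Because that cell state is updated additively, gradients can flow across many steps without shrinking as quickly, and this is largely why LSTMs cope better with vanishing gradients.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    # Element-wise sum of equally sized vectors
    return [sum(xs) for xs in zip(*vectors)]

class LSTMCell:
    """Hypothetical minimal LSTM cell: combines the current input x with the previous state."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g)
        self.params = {
            name: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for name in "ifog"
        }

    def gate(self, name, x, h_prev, act):
        W, U, b = self.params[name]
        return [act(z) for z in add(matvec(W, x), matvec(U, h_prev), b)]

    def step(self, x, h_prev, c_prev):
        i = self.gate("i", x, h_prev, sigmoid)    # how much new information to write
        f = self.gate("f", x, h_prev, sigmoid)    # how much of the old cell state to keep
        o = self.gate("o", x, h_prev, sigmoid)    # how much of the cell state to expose
        g = self.gate("g", x, h_prev, math.tanh)  # candidate values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]  # additive update
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

# Run the cell over a toy sequence of 3 made-up one-hot word vectors
cell = LSTMCell(input_size=4, hidden_size=3)
h, c = [0.0] * 3, [0.0] * 3
sequence = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
for x in sequence:
    h, c = cell.step(x, h, c)  # two inputs per step: the current word x and the previous state
print(h)
```

After the loop, `h` summarizes the whole toy sequence. In a seq2seq model, this final state is what the encoder passes on.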
The model has two components, an encoder and a decoder, so it is also called an Encoder-Decoder Network.
Encoder: It consists of deep neural network layers that convert the input words into corresponding hidden vectors. Each vector represents the current word and its context.
Decoder: It is similar to the encoder. It takes as input the hidden vector generated by the encoder, its own hidden states, and the current word. It uses these to produce the next hidden vector and, finally, to predict the next word.
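The encoder and decoder roles described above can be sketched end to end in plain Python. This is a minimal sketch under several assumptions. To keep it short, a plain tanh RNN cell stands in for an LSTM. The toy vocabularies (`SRC_VOCAB`, `TGT_VOCAB`) and all function names are hypothetical. The weights are random and untrained, so the "translation" it prints is arbitrary. Only the data flow is meant to match the description: the encoder produces one hidden vector per input word, and the decoder feeds its previous word and hidden state back in to predict the next word.

```python
import math
import random

rng = random.Random(42)
HIDDEN = 4
SRC_VOCAB = ["<pad>", "ich", "bin", "hier"]       # hypothetical source vocabulary
TGT_VOCAB = ["<sos>", "<eos>", "i", "am", "here"]  # hypothetical target vocabulary

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v

def rnn_step(W_x, W_h, x, h):
    # The new hidden state mixes the current word with the previous hidden state
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]

# Untrained random weights: only the flow of information is meaningful here
enc_Wx, enc_Wh = rand_matrix(HIDDEN, len(SRC_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
dec_Wx, dec_Wh = rand_matrix(HIDDEN, len(TGT_VOCAB)), rand_matrix(HIDDEN, HIDDEN)
out_W = rand_matrix(len(TGT_VOCAB), HIDDEN)

def encode(words):
    h = [0.0] * HIDDEN
    hidden_vectors = []
    for w in words:
        h = rnn_step(enc_Wx, enc_Wh, one_hot(SRC_VOCAB.index(w), len(SRC_VOCAB)), h)
        hidden_vectors.append(h)  # one hidden vector per input word, carrying its context
    return hidden_vectors

def decode(context, max_len=5):
    h, word, output = context, "<sos>", []
    for _ in range(max_len):
        # Previous word + decoder hidden state -> next hidden state
        h = rnn_step(dec_Wx, dec_Wh, one_hot(TGT_VOCAB.index(word), len(TGT_VOCAB)), h)
        scores = matvec(out_W, h)
        # Softmax is monotonic, so the argmax of the raw scores picks the same word
        word = TGT_VOCAB[max(range(len(scores)), key=scores.__getitem__)]
        if word == "<eos>":
            break
        output.append(word)
    return output

hidden_vectors = encode(["ich", "bin", "hier"])
print(decode(hidden_vectors[-1]))  # the decoder sees only the encoder's last vector
```

In this sketch, the decoder receives a single context vector, the encoder's last hidden state. That single-vector bottleneck is exactly what the attention mechanism is meant to relieve.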
Apart from these two components, several optimizations have led to further components of seq2seq:
- Attention: Without attention, the input to the decoder is a single vector that has to store all the information about the context. This becomes a problem with longer sequences. The attention mechanism allows the decoder to look at the input sequence selectively.
- Beam Search: The decoder normally selects the highest-probability word as its output at each step. This does not always yield the best result, which is a common weakness of greedy algorithms. Beam search instead keeps several possible translations at each step by building a tree of the top-k results.
- Bucketing: A seq2seq model can handle variable-length sequences because both input and output are padded with 0’s. However, if the maximum length is set to 100 and a sentence is only 3 words long, most of the padded sequence is wasted space. Bucketing addresses this: we make buckets of different sizes and pad each sentence only up to the size of the smallest bucket that fits it.
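The attention idea from the list above can be illustrated with dot-product attention. This is one common scoring choice, and the text does not say which scoring function is meant, so treat it as an assumption. The decoder state is compared with every encoder hidden vector. The scores are then normalized with a softmax, and the weighted sum of the encoder vectors becomes the context for that decoding step. The function names and vectors are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Dot-product attention: weight each encoder state by its similarity to the decoder state."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states))
               for j in range(len(decoder_state))]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up hidden vectors, one per input word
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

The second input word gets little weight because its vector points away from the current decoder state. In effect, the decoder "looks at" the first and third words for this step.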
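The beam search item from the list above can be sketched with a hypothetical toy "decoder" (`TOY_MODEL`) that returns fixed next-word probabilities for each prefix. This stands in for a real model's output distribution. The search keeps the `k` partial sequences with the highest cumulative log probability at each step. With `k=1` it reduces to greedy decoding, which lets the two be compared directly.

```python
import math

# Hypothetical toy "decoder": next-word probabilities given the words produced so far
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"x": 0.9, "y": 0.05, "z": 0.05},
}

def next_word_probs(prefix):
    return TOY_MODEL[tuple(prefix)]

def beam_search(k, steps):
    beams = [([], 0.0)]  # (partial sequence, cumulative log probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        # Keep only the k best partial sequences: the top-k branches of the tree
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

greedy = beam_search(k=1, steps=2)[0]  # k = 1 is exactly greedy decoding
beam = beam_search(k=2, steps=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))  # ['A', 'x'] 0.24
print(beam[0], round(math.exp(beam[1]), 2))      # ['B', 'x'] 0.36
```

Greedy decoding commits to "A" because it is the likelier first word, but the best full sequence starts with "B". A practical decoder would also stop beams at an end-of-sequence token and often normalize scores by length. This sketch omits both.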
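The bucketing item from the list above can be sketched as follows: each sequence is padded with 0's only up to the smallest bucket that can hold it, rather than up to the global maximum of 100. The bucket sizes, token ids, and function names are assumptions for illustration. For simplicity, the sketch also buckets single sequences. A seq2seq setup would typically bucket (input length, output length) pairs together.

```python
# Hypothetical bucket sizes; real systems choose them from the data's length distribution
BUCKETS = [5, 10, 20, 100]
PAD = 0

def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket that can hold a sequence of the given length."""
    for size in buckets:
        if length <= size:
            return size
    raise ValueError(f"sequence of length {length} exceeds the largest bucket")

def bucket_and_pad(sequences):
    grouped = {}
    for seq in sequences:
        size = pick_bucket(len(seq))
        # Pad only up to the bucket size, not up to the global maximum
        grouped.setdefault(size, []).append(seq + [PAD] * (size - len(seq)))
    return grouped

sentences = [[4, 8, 15], [16, 23, 42, 7, 9, 11, 2], [3, 1]]  # made-up token ids
for size, batch in sorted(bucket_and_pad(sentences).items()):
    print(size, batch)
# 5 [[4, 8, 15, 0, 0], [3, 1, 0, 0, 0]]
# 10 [[16, 23, 42, 7, 9, 11, 2, 0, 0, 0]]
```

The 3-word sentence now carries 2 padding tokens instead of 97. Batches are then drawn from one bucket at a time, so every sequence in a batch has the same length.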