NOTEBOOKS:
==================================
Word2Vec was great at learning word representations, but it lacked the ability to process or generate sequences (e.g., sentences).
What Seq2Seq Introduced:
==================================
Encoder: Processes the input sequence (e.g., a sentence) and converts it into a fixed-size vector (context vector).
Decoder: Takes the context vector and generates the output sequence (e.g., the translated sentence).
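To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch; the vocabulary size, embedding size, and the choice of a GRU are illustrative assumptions, not details from any particular paper.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token ids
        _, hidden = self.rnn(self.embed(src))
        return hidden                     # (1, batch, hidden_dim): the "context vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, hidden):
        # tgt: (batch, tgt_len) token ids; hidden: the context from the encoder
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden   # logits over the target vocabulary

# Toy usage: encode a source sentence, then decode conditioned on the context vector.
src = torch.randint(0, 1000, (2, 7))      # batch of 2 source sentences, 7 tokens each
tgt = torch.randint(0, 1000, (2, 5))      # batch of 2 target sentences, 5 tokens each
context = Encoder()(src)
logits, _ = Decoder()(tgt, context)
print(logits.shape)                       # torch.Size([2, 5, 1000])
```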
==================================
The evolution of sequence models: RNN → LSTM → Transformer
==================================
RNN (Recurrent Neural Network)
Introduced: 1980s
Strength: First model to handle sequences by using a “hidden state” to retain information from earlier steps (see the sketch below).
Limitation: Gradients vanish over long sequences, so the model struggles to remember information from far back in the input.
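As a sketch of what the “hidden state” means in practice, here is the basic recurrent update of a plain Elman-style RNN cell in NumPy; the dimensions and random weights are illustrative.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: mix the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

input_dim, hidden_dim, seq_len = 8, 16, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # h carries information forward step by step
print(h.shape)                                # (16,)
```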
LSTM (Long Short-Term Memory)
Introduced: Paper published in 1997
Strength: Improved version of the RNN designed to solve the vanishing gradient problem.
Key Feature: Gates (forget, input, and output) that control what information is kept in, added to, or read from the cell state (see the sketch below).
Limitation: Still processes tokens one at a time, so it is slow to train and hard to parallelize, and very long sequences remain difficult.
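A minimal NumPy sketch of the gating idea, where the forget, input, and output gates decide what the cell state keeps, writes, and exposes; the weight shapes and random values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, what to write, and what to expose."""
    z = np.concatenate([x_t, h_prev]) @ W + b       # compute all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)    # gate activations between 0 and 1
    g = np.tanh(g)                                  # candidate cell update
    c = f * c_prev + i * g                          # keep some old info, add some new info
    h = o * np.tanh(c)                              # hidden state exposed to the next layer/step
    return h, c

input_dim, hidden_dim = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim + hidden_dim, 4 * hidden_dim)) * 0.1
b = np.zeros(4 * hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)                             # (16,) (16,)
```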
Transformer
Introduced: 2017 (by Google researchers)
Strength: Revolutionized sequence processing with self-attention and parallelization.
Key Features: Self-attention, positional encodings, and parallel processing of whole sequences (no recurrence).
Results: Can handle massive datasets and generate state-of-the-art results in translation, summarization, and text generation (e.g., GPT-3, BERT).
==================================
Transformers overcame these challenges:
==================================
Positional Encodings:
Self-attention on its own has no notion of word order, so positional encodings are added to the token embeddings to tell the model where each word sits in the sequence (see the sketch below).
Example: ‘hot dog’ means something different from ‘dog that is hot’, so word order matters.
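A short NumPy sketch of the sinusoidal positional encoding used in the original Transformer; the sequence length and model dimension here are arbitrary choices.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: each position gets a unique pattern of sines and cosines."""
    positions = np.arange(seq_len)[:, None]               # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                            # even dimensions
    pe[:, 1::2] = np.cos(angles)                            # odd dimensions
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)   # (10, 16); these values are added to the token embeddings so order matters
```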
Attention Mechanism:
Attention allows the model to focus on specific parts of the input sentence when making predictions. It works like a heatmap showing how much focus the model gives to each word.
Example:
The animal didn’t cross the street because it was too tired.
The animal didn’t cross the street because it was too wide.
Self-attention lets the model work out that “it” refers to “the animal” in the first sentence and to “the street” in the second.
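A minimal scaled dot-product self-attention sketch in NumPy; the random vectors stand in for real learned embeddings, and the weight matrices are illustrative.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])                    # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax: the "heatmap" of focus
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8                   # e.g., tokens of a short sentence
X = rng.normal(size=(seq_len, d_model))   # stand-in token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(weights.shape)   # (6, 6): row i shows where token i "looks", e.g. "it" -> "animal"
```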
==================================
Transfer learning is a crucial technique in NLP, especially with the advent of large pre-trained models like BERT (Bidirectional Encoder Representations from Transformers), GPT, and others. These models are trained on massive datasets and can be fine-tuned on specific tasks such as binary classification (e.g., classifying reviews as positive or negative) by leveraging their learned knowledge.
Pre-training: The model first learns general language patterns from a massive unlabeled corpus (for example, BERT is pre-trained by predicting masked words).
Fine-tuning: The pre-trained model is then adapted to a specific task, such as binary sentiment classification of reviews, by continuing training on a smaller labeled dataset (see the sketch below).
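As a sketch of the fine-tuning step, here is one way to load a pre-trained BERT checkpoint with the Hugging Face transformers library and attach a binary classification head; the example reviews, learning rate, and single training step are placeholder choices.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint and attach a fresh 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.train()

texts = ["Great movie, loved it!", "Terrible plot and worse acting."]   # placeholder reviews
labels = torch.tensor([1, 0])                                           # 1 = positive, 0 = negative
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the pre-trained weights are updated on the labeled task data.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
print(outputs.logits.shape)   # torch.Size([2, 2]): scores for negative/positive
```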