GloVe and FastText: Building on Word2Vec’s Foundation#
Word2Vec inspired a great deal of follow-up work. Here we introduce two notable successors: GloVe and FastText.
GloVe: Looking at the Big Picture#
GloVe approaches word embeddings from a fundamentally different perspective than Word2Vec. While Word2Vec learns incrementally by scanning through text with small context windows, predicting words from their neighbors, GloVe takes a more global approach by analyzing the entire corpus’s word co-occurrence patterns at once.
Key Differences from Word2Vec#
Instead of training a neural network to predict context words, GloVe directly factorizes the corpus-wide word co-occurrence matrix. It explicitly models relationships between all word pairs at once, allowing it to capture global patterns that Word2Vec’s local window approach might miss. This matrix factorization formulation, similar to LSA but with improved weighting, makes GloVe’s training objective more interpretable and connects it naturally to classical statistical methods.
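Concretely, GloVe learns a word vector $w_i$ and a context vector $\tilde{w}_j$ for each vocabulary entry so that their dot product approximates the logarithm of the co-occurrence count $X_{ij}$, minimizing a weighted least-squares objective:

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
$$

Here $b_i$ and $\tilde{b}_j$ are bias terms, and $f$ is a weighting function that dampens the influence of very frequent pairs.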
The key insight behind GloVe is that the ratio of co-occurrence probabilities carries meaningful information. For a probe word k, the ratio P(k | ice) / P(k | steam) is large when k relates to ice but not steam (e.g., “solid”), small when k relates to steam but not ice (e.g., “gas”), and close to one when k relates to both (“water”) or to neither (“fashion”). The toy sketch below illustrates this:
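The counts below are invented purely for illustration; in practice these probabilities come from corpus-wide co-occurrence statistics.

```python
# Hypothetical co-occurrence counts of probe words near "ice" and "steam"
cooc = {
    "ice":   {"solid": 80, "gas": 2,  "water": 300, "fashion": 1},
    "steam": {"solid": 3,  "gas": 70, "water": 290, "fashion": 1},
}

def cooccur_prob(target, probe):
    # P(probe | target): fraction of target's co-occurrences that are `probe`
    counts = cooc[target]
    return counts[probe] / sum(counts.values())

for probe in ["solid", "gas", "water", "fashion"]:
    ratio = cooccur_prob("ice", probe) / cooccur_prob("steam", probe)
    print(f"P({probe} | ice) / P({probe} | steam) = {ratio:.2f}")
```

With that intuition in place, let’s examine real pre-trained GloVe vectors: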
import gensim.downloader as api
import numpy as np
from tabulate import tabulate
# Load pre-trained GloVe vectors
glove = api.load('glove-wiki-gigaword-100')
# Demonstrate word relationships
word_pairs = [
('ice', 'water', 'steam'),
('king', 'queen', 'man'),
('computer', 'keyboard', 'screen')
]
def analyze_relationships(model, word_pairs):
    results = []
    for w1, w2, w3 in word_pairs:
        # Calculate pairwise cosine similarities
        sim12 = model.similarity(w1, w2)
        sim23 = model.similarity(w2, w3)
        sim13 = model.similarity(w1, w3)
        results.append([f"{w1}-{w2}", sim12])
        results.append([f"{w2}-{w3}", sim23])
        results.append([f"{w1}-{w3}", sim13])
        results.append(['---', '---'])
    print(tabulate(results, headers=['Word Pair', 'Similarity'],
                   floatfmt=".3f"))
print("Analyzing word relationships in GloVe:")
analyze_relationships(glove, word_pairs)
These similarities demonstrate how GloVe captures semantic relationships. Notice how related word pairs (like ‘ice-water’ and ‘water-steam’) have higher similarities than less related pairs (like ‘ice-steam’).
Let’s also look at how GloVe handles analogies:
# Demonstrate analogies
analogies = [
('king', 'man', 'queen', 'woman'), # gender relationship
('paris', 'france', 'london', 'england'), # capital-country
('walk', 'walked', 'run', 'ran') # verb tense
]
def test_analogy(model, word1, word2, word3, expected):
    # Solve word1 : word2 :: word3 : ? via vector arithmetic
    result = model.most_similar(
        positive=[word3, word2],
        negative=[word1],
        topn=1
    )[0]
    print(f"{word1} : {word2} :: {word3} : {result[0]}")
    print(f"Expected: {expected}, Similarity: {result[1]:.3f}\n")

print("Testing analogies in GloVe:")
for w1, w2, w3, w4 in analogies:
    test_analogy(glove, w1, w2, w3, w4)
GloVe’s global co-occurrence statistics allow it to perform well on analogy tasks because it captures the overall structure of the language, not just local patterns. This is particularly evident in:
Semantic Relationships: GloVe can identify pairs of words that appear in similar contexts across the entire corpus
Syntactic Patterns: It captures grammatical relationships by learning from how words are used globally
Proportional Analogies: The famous “king - man + woman = queen” type relationships emerge naturally from the co-occurrence patterns (demonstrated by hand below)
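To see the arithmetic behind the third point directly, we can compose the vectors by hand and search for the nearest word, reusing the `glove` vectors loaded earlier:

```python
# Compose the analogy vector manually: king - man + woman
vec = glove['king'] - glove['man'] + glove['woman']

# Search for the words nearest to the composed vector.
# Note: unlike most_similar, similar_by_vector does not exclude the
# input words, so 'king' itself may appear at the top of the list.
print(glove.similar_by_vector(vec, topn=5))
```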
The success of GloVe in these tasks demonstrates the value of its matrix factorization approach and global statistics, complementing the incremental learning strategy of Word2Vec. Both approaches have their strengths, and understanding their differences helps in choosing the right tool for specific NLP tasks.
Note
The ability to capture global statistics makes GloVe particularly good at analogies, but it requires more memory during training than Word2Vec because it needs to store the entire co-occurrence matrix.
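To make the memory point concrete, here is a minimal sketch of how a word-word co-occurrence matrix can be accumulated with a symmetric context window. This is a simplified illustration, not GloVe’s actual implementation, which also weights each pair by the inverse of the distance between the two words:

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count symmetric-window word co-occurrences across a corpus."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look at neighbors within `window` positions on each side
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

corpus = [["ice", "is", "solid", "water"],
          ["steam", "is", "a", "gas"]]
for pair, count in sorted(cooccurrence_counts(corpus).items()):
    print(pair, count)
```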
FastText: Understanding Parts of Words#
FastText took a different approach to improving word embeddings. Its key insight was that words themselves have internal structure that carries meaning. Consider how you understand a word you’ve never seen before, like “unhelpfulness.” Even if you’ve never encountered this exact word, you can understand it by recognizing its parts: “un-” (meaning not), “help” (the root word), and “-fulness” (meaning the quality of).
FastText implements this insight through several key mechanisms:
Subword Generation: Break words into character n-grams
Example: “where” → “<wh”, “whe”, “her”, “ere”, “re>”
The < and > marks show word boundaries
Vector Creation:
Each subword gets its own vector
A word’s final vector is the sum of its subword vectors
This allows handling of new words, as sketched below!
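Here is a minimal sketch of the subword generation step for character trigrams. FastText itself extracts n-grams over a range of lengths (typically 3 to 6) and additionally keeps the whole word, e.g. “&lt;where&gt;”, as its own token:

```python
def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, as in FastText."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

print(char_ngrams("where"))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

A word’s final vector is then the sum of the vectors learned for these n-grams, which is why an unseen word like “jumped” can still be embedded: it shares n-grams such as “jum” and “ump” with “jumps” and “jumping”.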
Let’s see FastText in action:
from gensim.models import FastText
# Train a simple FastText model
sentences = [
["the", "quick", "brown", "fox", "jumps"],
["jumping", "is", "an", "action"],
["quick", "movement", "requires", "energy"]
]
model = FastText(sentences, vector_size=100, window=3, min_count=1)
# FastText can handle words absent from training: 'jumped' shares
# character n-grams (e.g., 'jum', 'ump') with 'jumps' and 'jumping',
# so its vector is composed from those subword vectors
print("Similar words to 'jumped' (not in training):")
print(model.wv.most_similar('jumped'))
When to Use Each Approach#
The choice between these models often depends on your specific needs:
GloVe is Better For:#
Capturing broad thematic relationships
Working with a fixed vocabulary
Tasks involving analogies and word relationships
FastText Excels When:#
You expect new or misspelled words
Working with morphologically rich languages
Handling rare words is important
Modern Impact#
The innovations from GloVe and FastText haven’t been forgotten in modern NLP. Today’s large language models like BERT and GPT incorporate insights from both approaches:
They use subword tokenization (like FastText)
Their attention mechanisms capture both local and global patterns
They can generate context-dependent representations
The evolution from Word2Vec to GloVe to FastText shows how different perspectives on word representation have contributed to our understanding of language. Each model made unique contributions while building on previous insights, laying the groundwork for today’s more sophisticated approaches.
Note
Think of these models as different ways of looking at the same problem:
Word2Vec is like learning from conversations
GloVe is like analyzing entire books at once
FastText is like understanding word roots and affixes