This is a convex function of u_i, so its gradient is zero at the minimum. Taking the derivative with respect to u_i and setting it to zero yields the eigenvalue equation A u_i = \lambda_i u_i, i.e., the optimal u_i is an eigenvector of A.
The reconstruction error is minimized when \lambda is the largest eigenvalue, so the best u is the eigenvector corresponding to the largest eigenvalue.
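To see why the largest eigenvalue wins, here is a short sketch assuming the objective is the Frobenius reconstruction error \|A - \lambda u u^\top\|_F^2 with a unit-norm u (a common formulation; the one above may be stated slightly differently):

\begin{align*}
\|A - \lambda u u^\top\|_F^2
&= \|A\|_F^2 - 2\lambda\, u^\top A u + \lambda^2 \\
&= \|A\|_F^2 - \lambda^2
\qquad \text{when } A u = \lambda u \text{ and } \|u\| = 1,
\end{align*}

which is smallest when \lambda^2 is largest; for the adjacency matrix of a connected graph, the eigenvalue of largest magnitude is the largest (positive) one.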
Why eigenvectors?
Intuition behind eigenvectors
```python
import igraph as ig
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

A = ig.Graph.Famous("Zachary").get_adjacency_sparse()  # Load the karate club network
eigvals, eigvecs = np.linalg.eig(A.toarray())  # Eigenvalues and eigenvectors
eigvals, eigvecs = np.real(eigvals), np.real(eigvecs)

fig, axes = plt.subplots(1, 4, figsize=(15, 3))
for i in range(3):
    u = eigvecs[:, i].reshape((-1, 1))
    lam = eigvals[i]
    basisMatrix = u @ u.T
    sns.heatmap(basisMatrix, ax=axes[i + 1], cmap="coolwarm", center=0)
    axes[i + 1].set_title(f"Lambda={lam:.2f}")
sns.heatmap(A.toarray(), ax=axes[0], cmap="coolwarm", center=0)
axes[0].set_title("Adjacency Matrix")
plt.show()
```
The d eigenvectors associated with the d largest eigenvalues give the optimal solution that minimizes the reconstruction error in the d-dimensional case.
Fed up with the math? Let’s try it out! 🧑💻
```python
import numpy as np
import igraph as ig

A = ig.Graph.Famous("Zachary").get_adjacency_sparse()  # Load the karate club network
```
Task 1: Compute the eigenvectors and eigenvalues of A.
Task 2: Compute the reconstruction \sum_{i=1}^{d} \lambda_i u_i u_i^\top from the first d eigenvectors for d = 1, 2, 3, 4, 5, and compare it with the original adjacency matrix A, where u_i is the eigenvector corresponding to the i-th largest eigenvalue \lambda_i.
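A minimal sketch of one way to carry out Task 2 (variable names and the error measure are illustrative choices):

```python
import numpy as np
import igraph as ig

A = ig.Graph.Famous("Zachary").get_adjacency_sparse().toarray()

# Sort eigenpairs by descending eigenvalue (A is symmetric, so eigh applies)
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(-eigvals)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for d in [1, 2, 3, 4, 5]:
    # Rank-d reconstruction: sum of lambda_i * u_i u_i^T over the top d eigenpairs
    A_hat = eigvecs[:, :d] @ np.diag(eigvals[:d]) @ eigvecs[:, :d].T
    err = np.linalg.norm(A - A_hat)  # Frobenius norm of the residual
    print(f"d={d}: reconstruction error {err:.3f}")
```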
Let’s try it out more!
The adjacency matrix is not the only matrix that represents a network.
What do the eigenvectors of the following matrices look like? Compute the eigenvectors, visualize them (e.g., heatmap) and see what they are likely to represent.
Laplacian matrix L = D - A, where D is the degree matrix.
Normalized Laplacian matrix L_n = I - D^{-1/2} A D^{-1/2}.
Work with Zachary’s karate club network.
```python
import numpy as np
import igraph as ig

A = ig.Graph.Famous("Zachary").get_adjacency_sparse()  # Load the karate club network
```
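If you want a starting point, here is a minimal sketch assuming dense matrices (plotting the leading eigenvectors as a heatmap is just one possible visualization):

```python
import numpy as np
import igraph as ig
import matplotlib.pyplot as plt
import seaborn as sns

A = ig.Graph.Famous("Zachary").get_adjacency_sparse().toarray()
deg = A.sum(axis=1)
D = np.diag(deg)

L = D - A                                            # Laplacian matrix
Dinv_sqrt = np.diag(1.0 / np.sqrt(deg))
Ln = np.eye(A.shape[0]) - Dinv_sqrt @ A @ Dinv_sqrt  # Normalized Laplacian

for name, M in [("Laplacian", L), ("Normalized Laplacian", Ln)]:
    eigvals, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    sns.heatmap(eigvecs[:, :5], cmap="coolwarm", center=0)
    plt.title(f"{name}: eigenvectors of the 5 smallest eigenvalues")
    plt.show()
```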
Idea 2: Graph Cut ✂️
Graph Cut Problem
Graph Cut Problem ✂️
Disconnect a graph into two components by cutting the minimum number of edges
What is the solution to this optimization problem?
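Before working out the answer, it can help to probe the problem numerically; here is a minimal sketch using igraph's built-in minimum cut (the unconstrained min-cut, with no balance requirement on the two components):

```python
import igraph as ig

g = ig.Graph.Famous("Zachary")
cut = g.mincut()  # Minimum edge cut that disconnects the graph
print("Number of edges cut:", cut.value)
print("Partition sizes:", [len(part) for part in cut.partition])
```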
Neural Embedding Methods
Neural networks for embedding
How can I apply neural networks to embedding?
Run random walks
Treat the walks as sentences
Apply neural networks to predict temporal correlations between words
DeepWalk & node2vec: Use word2vec to learn node embeddings from random walks
Note:
This is one way. Another popular approach uses convolutions, inspired by image processing.
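A minimal sketch of the walk-to-word2vec recipe above, assuming gensim is available (walk length, number of walks per node, and embedding dimension are illustrative choices):

```python
import random
import igraph as ig
from gensim.models import Word2Vec

g = ig.Graph.Famous("Zachary")

def random_walk(g, start, length=20):
    """Uniform random walk of a given length starting from `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(g.neighbors(walk[-1])))
    return [str(v) for v in walk]  # word2vec expects string "tokens"

# 10 walks per node, treated as sentences
walks = [random_walk(g, v) for v in range(g.vcount()) for _ in range(10)]

# sg=1 selects the skip-gram objective, as in DeepWalk/node2vec
model = Word2Vec(walks, vector_size=16, window=5, sg=1, min_count=1)
embedding_node0 = model.wv["0"]  # embedding vector of node 0
```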
CBOW Model
Skipgram
node2vec 📝
Learn multi-step transition probabilities of random walks
High probability ~ close in the embedding space
Note:
Strictly speaking, this is not an exact description of the model. node2vec is trained with a biased sampling procedure, so two frequently co-visited nodes are not always embedded close together. See the paper.
node2vec random walks
Biased Random Walk:
\begin{align*}
P(x_{t+1}|x_t, x_{t-1})
\propto\begin{cases}
\frac{1}{p} & \text{Return to } x_{t-1} \\
1 & \text{Move to a neighbor $x_{t+1}$ directly connected to } x_{t-1} \\
\frac{1}{q} & \text{Move to a neighbor $x_{t+1}$ \textit{not} directly connected to } x_{t-1}
\end{cases}
\end{align*}
Parameters:
p: Return parameter (lower = more backtracking)
q: Exploration parameter (lower = more exploration)
Together they control whether the walker moves away from the previous node or stays local.
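A minimal sketch of a single biased walk step implementing the rule above (unweighted graph; the normalization and alias sampling used in the actual node2vec implementation are omitted):

```python
import random
import igraph as ig

def node2vec_step(g, prev, curr, p=1.0, q=1.0):
    """Sample the next node given the previous and current nodes of the walk."""
    neighbors = g.neighbors(curr)
    weights = []
    for nxt in neighbors:
        if nxt == prev:                    # return to the previous node
            weights.append(1.0 / p)
        elif g.are_connected(nxt, prev):   # neighbor also adjacent to the previous node
            weights.append(1.0)
        else:                              # move further away from the previous node
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

g = ig.Graph.Famous("Zachary")
print(node2vec_step(g, prev=0, curr=1, p=0.5, q=2.0))
```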
Example: Les Misérables Network 📚
Complementary visualizations of Les Misérables coappearance network generated by node2vec with label colors reflecting homophily (top) and structural equivalence (bottom).
All the embedding methods we’ve seen so far use Euclidean space (flat geometry).
But what if our network has a hierarchical structure—like a tree or organization chart?
Can flat space efficiently capture exponentially growing hierarchies?
Take 30 seconds to think about it…
The Challenge with Euclidean Space
Problem: Real networks have
Hierarchical structures
Scale-free properties
Strong clustering + small-world
Issue: In flat (Euclidean) space, volume grows polynomially with radius
But: Tree-like hierarchies grow exponentially
This is a fundamental mismatch!
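To make the mismatch concrete (taking a b-ary tree as the hierarchy):

\begin{align*}
\mathrm{Vol}_{\text{Euclidean}}(r) \;\propto\; r^d
\qquad \text{vs.} \qquad
\#\{\text{nodes within } \ell \text{ levels of a } b\text{-ary tree}\} \;\approx\; b^{\ell}.
\end{align*}

Hyperbolic space, introduced next, has ball volume growing like e^{(d-1)r} (for curvature -1), which matches the tree.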
What is Hyperbolic Space?
Hyperbolic geometry = curved space with negative curvature
Key Property:
Volume grows exponentially with radius—just like trees!
This naturally captures:
Scale-free degree distributions
Strong clustering
Small-world property
Self-similarity
Poincaré disk: hyperbolic space visualized in a circle
The Popularity-Similarity Framework
How do real networks actually grow? 🤔
Question:
2012: Researchers analyzed Internet, metabolic, and social networks
Do new nodes just connect to popular nodes (preferential attachment)?
Or is there something more going on? Think of a case (e.g., a social network or a transportation network) where preferential attachment alone does not explain the growth.
Answer: It’s Both! ⚖️
Discovery:
Networks grow by balancing two factors:
Popularity: Connect to well-established nodes
Similarity: Connect to similar nodes
This optimization naturally emerges in hyperbolic space!
Popularity → radial position (birth time)
Similarity → angular distance
New connections → hyperbolically closest nodes
How does the model actually work? 🔧
Question:
If networks grow by balancing popularity and similarity, what are the precise rules?
How do we generate a network that captures this?
Let’s break down the mathematical rules…
The Core Model: Popularity × Similarity
Network Generation Process:
At each time t = 1, 2, 3, \dots, add a new node t
Each node t gets coordinates:
Random angular position on a circle, \theta_t
Radial position based on birth time, r_t = \ln t (older = more popular)
New node t connects to the m nodes that minimize:
s \cdot \theta_{st} \quad \text{for } s < t
Key property: In this space, the rule “minimize s \cdot \theta_{st}” is mathematically equivalent to connecting to the m hyperbolically closest nodes.
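A minimal sketch of this generation process, directly following the rules above (the node count, m, and the naive handling of the first few nodes are illustrative choices):

```python
import numpy as np

def popularity_similarity_network(n_nodes=200, m=2, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, size=n_nodes)  # random angular positions
    edges = []
    for t in range(1, n_nodes):                      # node index plays the role of birth time
        s = np.arange(t)                             # all existing (older) nodes
        d_theta = np.pi - np.abs(np.pi - np.abs(theta[t] - theta[s]))  # angular distance
        score = (s + 1) * d_theta                    # s * theta_st, with 1-based birth times
        targets = np.argsort(score)[:m]              # connect to the m smallest scores
        edges.extend((int(t), int(u)) for u in targets)
    return edges

edges = popularity_similarity_network()
print(len(edges), edges[:5])
```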
The Connection Rule: Balancing Two Forces ⚖️
Key insight:
The number of nodes to connect to depends on both popularity and similarity
If a node is very popular, it should connect to many nodes
If a node is surrounded by many similar nodes, it should connect to many nodes
The radius of the node is often referred to as the implicit degree.
Example: European airports have high degree in part because there are many airports in Europe. Their implicit degree might be lower than their observed degree suggests.
Preferential attachment is not a primitive mechanism
Probability \Pi(k) \propto k emerges naturally
It’s a consequence of geometric optimization!
Mathematical Models of Hyperbolic Space
Two Ways to Represent Hyperbolic Space
Question:
How can we mathematically represent this curved space in a computer?
We have two main models (they’re mathematically equivalent but computationally different): the Poincaré disk model and the hyperboloid (Lorentz) model.
Advantage of the hyperboloid model: more efficient optimization (gradients in the Euclidean ambient space)
The hyperboloid projects onto the Poincaré disk.
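The projection is a standard formula: writing a hyperboloid point as x = (x_0, x_1, \dots, x_d) with x_0^2 - \sum_{i \ge 1} x_i^2 = 1 and x_0 > 0, the corresponding Poincaré-disk point is

\begin{align*}
y = \frac{(x_1, \dots, x_d)}{1 + x_0},
\qquad \|y\| < 1 .
\end{align*}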
Hyperbolic Embeddings in Practice
What do hyperbolic embeddings look like?
Natural Hierarchy Emerges! 🌳
Center: Abstract terms (“Entity”, “Object”)
Moving outward: Increasingly specific
“Material” → “Wood” → “Hardwood”
Key insight: Hyperbolic space automatically organizes concepts hierarchically
No explicit supervision needed!
Why This Works:
The exponential volume growth in hyperbolic space naturally accommodates the exponentially growing number of specific concepts at lower hierarchy levels