By Urvi 🎉
Question: Can a network with clustering coefficient = 1 be small-world?
Expected Answer: No, that’s just a complete graph (shortest path length = 1, but trivial).
LLM’s Answer: Yes, a network with a clustering coefficient of 1 can be a small-world network. The Watts-Strogatz model demonstrates this precisely. A clustering coefficient of 1 indicates that every node is part of a fully connected “clique” with its immediate neighbors …
By Sarah 🎉
Question: What can we conclude about the small-worldness of a network whose average clustering coefficient and global clustering coefficient are both equal to 1?
Expected Answer: It is not a small-world network. Both coefficients being equal to 1 signifies a fully connected network, which therefore cannot also be small-world.
LLM’s Answer: The small-world property of a network, characterized by both high clustering and short average path lengths, is concluded when both the average clustering coefficient and the global clustering coefficient are equal to 1. This signifies that every node in the network is highly connected to its immediate neighbors, forming dense local communities (high clustering). Simultaneously, it indicates that, on average, any two nodes in the network are only a few steps away from each other (short average path length).
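The expected answers to the two questions above can be checked numerically. The following is a minimal sketch in plain Python, assuming a small complete graph on 6 nodes; the helper names (`complete_graph`, `local_clustering`, `average_path_length`) are illustrative and not part of the course material.

```python
from collections import deque
from itertools import combinations


def complete_graph(n):
    """Adjacency sets of the complete graph on n nodes."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def local_clustering(adj, i):
    """Fraction of pairs of neighbors of i that are themselves connected."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def average_path_length(adj):
    """Mean shortest-path length over all node pairs (BFS, assumes a connected graph)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


adj = complete_graph(6)
avg_clustering = sum(local_clustering(adj, i) for i in adj) / len(adj)
avg_path = average_path_length(adj)
print(avg_clustering, avg_path)
```

Both values come out as 1: every pair of neighbors is linked, and every pair of nodes is one step apart, which is the trivial case the expected answers refer to.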
By Amahury 🎉
Question: True or False (justify briefly): The simple cycle graph C_N with N ≥ 5 is small-world (σ > 1) under the Humphries–Gurney definition when using C as transitivity and an Erdős–Rényi baseline with the same N and M.
Expected Answer: False. C_N has no triangles, so C = 0. With an ER baseline (same N, M) we have C_rand > 0 and L/L_rand is finite, hence σ = (0/C_rand)/(L/L_rand) = 0 < 1.
LLM’s Answer: The statement “The simple cycle graph C_N with N ≥ 5 is small-world (σ > 1) under the Humphries–Gurney definition when using C as transitivity and an Erdős–Rényi baseline with the same N and M” is true.
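The key step in the expected answer, C = 0 for the cycle graph, can be verified with a short sketch. Transitivity is computed here as (closed triples) / (connected triples). The baseline values `c_rand` and `l_ratio` are arbitrary positive placeholders chosen for illustration, not measured Erdős–Rényi values.

```python
from itertools import combinations


def cycle_graph(n):
    """Adjacency sets of the simple cycle C_n."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def transitivity(adj):
    """Closed triples divided by connected triples (equals 3 * triangles / triples)."""
    closed, triples = 0, 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        # Each triangle is counted once per center node, i.e. three times in total.
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / triples if triples else 0.0


C = transitivity(cycle_graph(10))

# Hypothetical placeholders: any C_rand > 0 and finite L / L_rand give the same result.
c_rand, l_ratio = 0.2, 2.0
sigma = (C / c_rand) / l_ratio
print(C, sigma)
```

Since the numerator C / C_rand is zero, σ is zero regardless of the baseline, so the cycle graph cannot satisfy σ > 1.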
What we will learn in this module:
Keywords: community detection, assortativity, modularity, resolution limit, rugged landscape, random graph, label switching algorithm, Louvain algorithm, stochastic block model, the configuration model.
Image of cliques
Advantages:
Allows overlapping communities, based on strong local cohesion, and parameter \(k\) controls the number of communities.
Real-world groups are rarely perfect cliques. We relax the definition along three dimensions:
Combine dimensions to capture tightly-knit community structures.
Goal: Minimize the number of edges that must be cut to separate the graph into communities. \[ \min_{V_1, V_2} \text{Cut}(V_1, V_2), \quad \text{Cut}(V_1, V_2) = \sum_{i \in V_1} \sum_{j \in V_2} A_{ij}, \]
where \(V_1\) and \(V_2\) are disjoint sets of nodes that together cover the graph (i.e., \(V_1 \cap V_2 = \emptyset\) and \(V_1 \cup V_2 = V\)), and \(A_{ij}\) is the \((i, j)\) entry of the adjacency matrix.
This problem statement is incomplete 🫣. Find out what’s missing by playing the following game. Graph Cut Problem 🎮
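The double sum in the cut can be evaluated directly from the adjacency matrix. Below is a minimal sketch on an assumed toy graph (two triangles joined by a single bridge edge); the graph and the function name `cut_size` are illustrative choices.

```python
def cut_size(A, V1, V2):
    """Number of edges running between node sets V1 and V2."""
    return sum(A[i][j] for i in V1 for j in V2)


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and one bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

bridge_cut = cut_size(A, {0, 1, 2}, {3, 4, 5})
other_cut = cut_size(A, {0, 1}, {2, 3, 4, 5})
print(bridge_cut, other_cut)
```

Separating the two triangles removes only the bridge, while splitting a triangle costs more edges.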
To avoid trivial cuts, we need to balance community sizes.
Ratio Cut: Penalizes small communities by normalizing by size. \[ \text{RatioCut}(\{V_c\}) = \sum_c \frac{\text{Cut}(V_c, V \setminus V_c)}{|V_c|} \]
Normalized Cut: Normalizes by community volume (sum of degrees, i.e., \(\text{vol}(V_c) = \sum_{i \in V_c} k_i\)). \[ \text{N-Cut}(\{V_c\}) = \sum_c \frac{\text{Cut}(V_c, V \setminus V_c)}{\text{vol}(V_c)} \]
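To see why the normalization matters, the three objectives can be compared on a small example. This is a sketch under assumptions: the toy graph (two 4-cliques joined by two bridge edges, plus one pendant node) and the function names are illustrative only.

```python
from itertools import combinations


def build_adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


def cut(A, S, T):
    """Number of edges between node sets S and T."""
    return sum(A[i][j] for i in S for j in T)


def ratio_cut(A, parts):
    """Sum over communities of Cut(V_c, V \\ V_c) / |V_c|."""
    nodes = set(range(len(A)))
    return sum(cut(A, Vc, nodes - Vc) / len(Vc) for Vc in parts)


def normalized_cut(A, parts):
    """Sum over communities of Cut(V_c, V \\ V_c) / vol(V_c)."""
    nodes = set(range(len(A)))
    return sum(cut(A, Vc, nodes - Vc) / sum(sum(A[i]) for i in Vc) for Vc in parts)


# Two 4-cliques {0..3} and {4..7}, bridges (3, 4) and (2, 5), pendant node 8 on node 7.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(3, 4), (2, 5), (7, 8)]
A = build_adjacency(9, edges)

trivial = [{8}, set(range(8))]              # cuts off the single pendant node
balanced = [set(range(4)), set(range(4, 9))]  # separates the two cliques

for name, parts in [("trivial", trivial), ("balanced", balanced)]:
    print(name, cut(A, parts[0], parts[1]), ratio_cut(A, parts), normalized_cut(A, parts))
```

The raw cut prefers the trivial split (1 edge versus 2), whereas both Ratio Cut and Normalized Cut score the balanced split lower, and therefore better.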
We will learn how to solve these problems in Module 08!
Modularity is perhaps the most celebrated, yet most controversial, approach to community detection.
Modularity is:
Key idea: Modularity finds communities that are not just densely connected, but denser than random chance.
Imagine colored balls (nodes) and strings (edges).
\[ Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - P_{ij} \right] \delta(c_i, c_j) \]
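The formula can be evaluated term by term. The sketch below assumes the configuration-model null term \(P_{ij} = k_i k_j / 2m\), which is a common choice but is an assumption here since the slide does not define \(P_{ij}\); the toy graph (two triangles joined by one bridge) is also illustrative.

```python
def modularity(A, communities):
    """Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j).

    Assumes the configuration-model null term P_ij = k_i * k_j / (2m).
    """
    n = len(A)
    k = [sum(row) for row in A]  # node degrees
    two_m = sum(k)               # twice the number of edges
    Q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                Q += A[i][j] - k[i] * k[j] / two_m
    return Q / two_m


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and one bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_triangles = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
print(q_triangles, q_single)
```

Grouping each triangle into its own community gives Q = 5/14, while putting all nodes in one community gives Q = 0, because observed and expected edge counts then cancel exactly.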
Let’s play with it!
Modularity is powerful, but not perfect.
Resolution Limit: Fails to detect communities smaller than a certain scale, which depends on the size of the whole network. It might merge small, distinct communities.
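A standard illustration of the resolution limit is a ring of cliques. The sketch below assumes 30 cliques of 5 nodes each, with consecutive cliques joined by a single edge, and uses the per-community form of modularity with the configuration-model null term, \(Q = \sum_c [\ell_c / m - (d_c / 2m)^2]\), where \(\ell_c\) is the number of internal edges and \(d_c\) the total degree of community \(c\). The sizes and function names are illustrative assumptions.

```python
from itertools import combinations


def ring_of_cliques(n_cliques, clique_size):
    """Edge list of cliques arranged in a ring, consecutive cliques joined by one edge."""
    edges = []
    for c in range(n_cliques):
        first = c * clique_size
        edges.extend(combinations(range(first, first + clique_size), 2))
        next_first = ((c + 1) % n_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))
    return edges


def modularity(edges, membership):
    """Q = sum_c [l_c / m - (d_c / 2m)^2], assuming the configuration-model null term."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


n_cliques, clique_size = 30, 5
edges = ring_of_cliques(n_cliques, clique_size)
n_nodes = n_cliques * clique_size

one_per_clique = [node // clique_size for node in range(n_nodes)]
merged_pairs = [node // (2 * clique_size) for node in range(n_nodes)]

q_cliques = modularity(edges, one_per_clique)
q_pairs = modularity(edges, merged_pairs)
print(round(q_cliques, 3), round(q_pairs, 3))
```

Here the partition that merges adjacent cliques in pairs scores about 0.888, higher than the roughly 0.876 of the intuitive one-community-per-clique partition, so maximizing modularity would merge distinct cliques.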
Spurious Communities: Can “find” communities even in random networks where none exist. A high modularity score is not a guarantee of meaningful communities.
Degeneracy: Many different partitions can have similarly high modularity scores.
Instead of defining what a community is, SBM defines how a network is generated from communities.
\[ P(A_{ij} = 1 | c_i, c_j) = p_{c_i, c_j} \]
The SBM extends the notion of communities: a community is a group of nodes that connect to other nodes in a similar way.
This allows for broader definitions of communities.
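The generative rule above can be sketched as a small sampler: every node pair is linked independently with the probability assigned to its pair of blocks. The block sizes, probability matrices, and function names below are assumed values for illustration. The second matrix shows a structure that is not densely connected inside groups, yet still counts as communities under the SBM.

```python
import random


def sample_sbm(block_sizes, p, seed=0):
    """Sample an undirected SBM: pair (i, j) is linked with probability p[c_i][c_j]."""
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


def count_within_between(labels, edges):
    """Count edges inside blocks and edges between blocks."""
    within = sum(1 for i, j in edges if labels[i] == labels[j])
    return within, len(edges) - within


# Assortative blocks: dense inside, sparse between.
labels_a, edges_a = sample_sbm([20, 20], [[0.5, 0.05], [0.05, 0.5]], seed=0)
# Disassortative (bipartite-like) blocks: sparse inside, dense between.
labels_d, edges_d = sample_sbm([20, 20], [[0.05, 0.5], [0.5, 0.05]], seed=0)

print(count_within_between(labels_a, edges_a))
print(count_within_between(labels_d, edges_d))
```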
Given a network, we can infer the most plausible community structure that generated it (if it was generated by SBM) by maximizing the likelihood function, i.e.,
\[ \begin{aligned} &\text{argmax}_{c_1, \ldots, c_n, \theta} \sum_{i<j} \ell_{ij}(c_i, c_j, \theta), \\ &\ell_{ij} = A_{ij} \log p_{c_i, c_j} + (1 - A_{ij}) \log (1 - p_{c_i, c_j}), \end{aligned} \]
where \(c_i\) is the community of node \(i\), \(\theta\) denotes the parameters of the SBM, and \(p_{c_i, c_j}\) is the probability of an edge between two nodes in blocks \(c_i\) and \(c_j\).
SBM is a generative model: it can generate networks with community structure.
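The objective can be evaluated directly for a given assignment. The sketch below only computes the log-likelihood for fixed labels and fixed edge probabilities; it does not perform the maximization. The toy graph, the probability matrix, and the function name are assumptions for illustration, and all probabilities must lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def sbm_log_likelihood(A, labels, p):
    """Sum over pairs i < j of A_ij * log p + (1 - A_ij) * log(1 - p), with p = p[c_i][c_j]."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pij = p[labels[i]][labels[j]]
            total += A[i][j] * math.log(pij) + (1 - A[i][j]) * math.log(1 - pij)
    return total


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and one bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

p = [[0.9, 0.1], [0.1, 0.9]]  # assumed block probabilities
ll_triangles = sbm_log_likelihood(A, [0, 0, 0, 1, 1, 1], p)
ll_mixed = sbm_log_likelihood(A, [0, 1, 0, 1, 0, 1], p)
print(ll_triangles, ll_mixed)
```

The assignment that matches the two triangles has the higher log-likelihood, so it is the more plausible of the two under these parameters.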
SBM often produces homogeneous degree distributions when the number of communities is small, making it unsuitable for networks with heterogeneous degree distributions.
The degree-corrected SBM (dcSBM) addresses this limitation, and often yields more meaningful communities than the standard SBM.
Hierarchical SBM (hSBM): Models communities within communities, capturing nested structures. This greatly alleviates the resolution limit problem!
There are several ways to fit an SBM:
It depends on your network and your question!