Community detection (pattern matching)#
Community detection is an abstract unsupervised problem. It is abstract because there is no clear-cut definition or ground truth to compare against. The concept of a community in a network is subjective and highly context-dependent.
A classical approach to community detection is based on pattern matching. Namely, we first explicitly define a community by a specific connectivity pattern of its members. Then, we search for these communities in the network.
Perhaps, the strictest definition of a community is a clique: a group of nodes all connected to each other. Examples include triangles (3-node cliques) and fully-connected squares (4-node cliques). However, cliques are often too rigid for real-world networks. In social networks, for instance, large groups of friends rarely have every member connected to every other, yet we want to accept such “in-perfect” social circles as communities. This leads to the idea of relaxed versions of cliques, called pseudo-cliques.
Pseudo-cliques are defined by relaxing at least one of the following three dimensions of strictness:
Degree: Not all nodes need to connect to every other node.
Density: The overall connection density can be lower.
\(\rho\)-dense subgraphs, with a minimum edge density of \(\rho\) [3].
Distance: Nodes can be further apart.
\(n\)-clique, where all nodes are within n steps of each other [4].
Combination of the above: