import networkx as nx
from scipy import sparse
G = nx.karate_club_graph()
A = sparse.csr_matrix(nx.adjacency_matrix(G))
print("A.indices:", A.indices[:5])
print("A.indptr:", A.indptr[:5])
print("A.data:", A.data[:5])Appendix
1 Compressed Sparse Row (CSR) format
CSR format is implemented in scipy. This consists of three arrays called indptr, indices, and data. For example,
We will walk you through what these arrays mean, how they are generated, and how we can leverage them for efficient computations.
How to generate CSR format from an adjacency matrix
Let’s walk you through how to store an example adjacency matrix in Compressed Sparse Row (CSR) format. Our example adjacency matrix is as follows.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ||||||||||
| 1 | 1 | 1 | |||||||||
| 2 | 1 | 1 | 1 | ||||||||
| 3 | 1 | 1 | 1 | 1 | |||||||
| 4 | 1 | 1 | |||||||||
| 5 | 1 | ||||||||||
| 6 | 1 | 1 | 1 | ||||||||
| 7 | 1 | ||||||||||
| 8 | 1 | 1 | |||||||||
| 9 | 1 | 1 | |||||||||
| 10 | 1 | 1 | 1 | 1 | 1 |
We will first create adjacency list, which is a dictionary consisting of the row IDs and column IDs for the non-zero entries in the adjacency matrix.
\{\text{Row ID}: (\text{Column ID}, \text{Value})\}
Concretely, in Python,
adj_list = {
0:[(10,1)],
1:[(2,1), (10, 1)],
2:[(1,1), (3,1), (10, 1)],
3:[(2,1), (4,1), (5,1), (6,1)],
#...
}CSR format is a concatenation of the keys and values of the adjacency list, respectively. The CSR format has a concatenated array of the values, one for column IDs and one for the values, called indices and data, respectively.
import numpy as np
indices = np.array([vv[0] for k, v in adj_list.items() for vv in v])
indicesdata = np.array([vv[1] for k, v in adj_list.items() for vv in v])
dataAdditionally, the CSR format has another array called indptr, which stores the Row IDs of the non-zero entries in the adjacency matrix. This indptr array has a value such that indptr[i] is the first index of indices that corresponds to the i-th row of the adjacency matrix. This can be generated by
indptr = np.cumsum([0] + [len(adj_list[i]) for i in range(len(adj_list))])
indptrwhere we added 0 at the beginning of the array to represent the first non-zero entry in the first row. The first row ends at index len(adj_list[0])-1, and the second row starts at index len(adj_list[0]) and ends at index len(adj_list[0])+len(adj_list[1])-1, and so on.
Now we have three compressed vectors indptr, indices, and data, that together form the CSR format for the adjacency matrix.
How to use CSR format for efficient computations
The key advantage of the CSR representation is the memory efficiency. But you can leverage the CSR format for more efficient computations, if you know the semantics of indptr, indices, and data arrays.
For instance, one can compute the degree of a node by using
node = 1
degree = indptr[node+1] - indptr[node]
degreeLet us break down the above code. - indptr[node] is the first index of the indices array that corresponds to the node-th row of the adjacency matrix. - indptr[node+1] is the first index of the indices array that corresponds to the (node+1)-th row of the adjacency matrix. - Thus, indptr[node+1] - indptr[node] is the number of non-zero entries in the node-th row of the adjacency matrix, which is the degree of the node-th node.
Using indices, it is easy to identify the neighbors of a given node by using
neighbors = indices[indptr[node]:indptr[node+1]]
neighborswhere indices[indptr[node]:indptr[node+1]] is the corresponding column IDs of the non-zero entries in the node-th row of the adjacency matrix, which corresponds to the node IDs connected to the node-th node.
The edge weights to the neighbors can be obtained by using
edge_weights = data[indptr[node]:indptr[node+1]]
edge_weights