Description | Pros
and Cons | Applications | Details
| Usage Hints | References
| Acknowledgments
This clustering algorithm uses Brandes' algorithm to calculate the 'betweenness centrality' for vertices. Subsequently, the betweenness centrality of the edges within a network is calculated and the edge with the maximum betweenness centrality score is removed. This is repeated until a user given threshold value is higher than the betweenness centrality scores of all the edges or until the network is split into a certain number of subnetworks. The first definition
of betweenness centrality came from Anthonisse and Freeman: Given the betweenness centrality values for all nodes, the betweenness centrality scores of edges can be calculated. Edges with the maximum score are assumed to be important for the graph to stay interconnected. These high scoring edges are the 'weak ties' that interconnect clusters of nodes. Removing them leads to unconnected clusters of nodes. A breadth-first search algorithm for unweighted graphs (and Dijkstra's algorithm for weighted graphs) was implemented to calculate the number of shortest paths between any two nodes and to get the set of all predecessor nodes. Both pieces of information are required to calculate the betweenness centrality according to Brandes' formula. The breadth-first
search explores a
graph G=(V,E) across the breadth of the frontier between discovered
and undiscovered nodes. A node is examined more, the farther its distance
from the starting node The pseudo code for the relaxation procedure for a single edge is as follows:
define relax(Node u, Node v, int weight) The pseudo code for Dijkstra's Algorithm is as follows:
define dijkstra(Graph g, Node s)
define initializeSingleSource(Graph g, Node s)
Applicable to network data only. The algorithm
does not scale to more than 5000 nodes.
The algorithm has been successfully applied to identify clusters of co-author networks, interrelated genes, etc.
There are two versions of this clustering algorithm - for weighted and unweighted graphs. The former uses a breadth-first search algorithm and integer data structure to store 0 and 1 for the graph. The other one will work on both unweighted graphs and weighted graphs, but is inefficient for unweighted graphs as it uses Dijkstra's algorithm to find the shortest path and floating point data structures to store the graph. Although the results of both algorithms are identical there are considerable difference in running time for large scale networks.
There will be three files: BCclustering.java,
input.txt, output.txt. The two .txt files are input and
output sample files respectively and the .java file is the original
source file. 3. To compile it, issue the command javac BCclustering.java from
the same directory. It will make several class files upon successful compilation.
4. To run it, issue the command
java BCclustering input.txt output.txt. It will read input from the input
file and recreate the output file. 5. Enter zero to see the Centrality Scores of all the edges or enter
a threshold value when asked. 6. After each iteration, the current graph will be stored in the
file named out_tmp.txt just in case the user aborts the program or it crashes. When reinvoking this file it should be copied with some
other name and given as the input filename to the program. It can also
be used to see the current network status at any point of program execution. 7. Currently, the program is
tested by compiling with the JDK version-1.4.0.00. It stores the result
in the form of an adjacency matrix of the remaining part of the graph and also takes
the input in the same format. Visone can be used to visualize the input and output
networks and can be downloaded for free from www.visone.de.
Download visone, uncompress and run the program and then go to File->import->matrix
and select the file to visualize.
- Anthonisse, J.M.(1971) The rush in a directed graph. Technical Report BN9/71, Stichting Mahtematisch Centrum, Amsterdam.
- Freeman, L.C.(1977) A set
of measuring centrality based on betweenness.
*Sociometry 40*, 35-41. - Ulrik Brandes (2001) A
Faster Algorithm for Betweenness Centrality.
*Journal of Mathematical Sociology 25*(2):163-177.
The algorithm was implemented and documented by Vivek Agrawal (vivekagr@iitk.ac.in). |