Network Analysis and Visualization
The structure and dynamics of networks are studied in many scientific disciplines. For example, computer scientists and engineers investigate the error and attack tolerance of computer networks and power grids. Biologists strive to understand how the complex network of interactions between genes and proteins results in normal physiological behavior or diseases in organisms. Neuroscientists study the brain as a network of neurons. Social scientists are interested in the structure and evolution of webs that characterize social interactions, such as friendship and acquaintances, or more formally defined networks such as co-authorship or paper-citation.
Network data sets might represent properties of entities (or nodes) but most importantly they describe the relations or links (or edges) between nodes. Social network researchers analyze diverse relations that might exist among different entities:
They are particularly interested in:
Some researchers are interested in 'micro studies' that focus on the individual and his or her network. Others conduct 'macro studies' in an attempt to understand an entire network and its members.
Most commonly, networks are assumed to have nodes of a single type and edges of a single type. Examples are web pages and their hyperlink connections, authors and their co-authorship relations, and children and their friendship relations.
In this learning module, we will use the Network Analysis Tool to analyze the statistical and structural properties of diverse networks. Subsequently, Pajek - a program for analyzing large networks - will be applied to visualize the networks. Pajek is freely available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
As mentioned above, network data are common and are studied in diverse scientific disciplines. For demonstration purposes, let's select a data set that should feel familiar to most readers: the ties among 15 office workers. This office workers data set was compiled by Robert A. Hanneman, University of California, Riverside. See his course web site for additional information on the data set. The data set is represented by a 15 by 15 matrix with one row and one column per worker. Alternatively, one could have represented the data by a list of nodes (workers) and edges (strong ties between workers). Using a Perl converter named p_nw_conv.pl (run with 'perl p_conv.pl <inputfile_name> <outputfile_name>') we derive the Pajek input format that is also supported by the Network Analysis Tool.
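The step from an adjacency matrix to a Pajek file can be sketched in a few lines of Python. This is not the Perl converter mentioned above. It is a minimal illustration that assumes a space-delimited, symmetric matrix and writes only the basic Pajek layout: a *Vertices section followed by an *Edges section. The function names, the three-worker sample matrix, and the worker names are hypothetical.

```python
def parse_matrix(text):
    """Parse a space-delimited adjacency matrix, one row per line."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    matrix = [[int(cell) for cell in row] for row in rows]
    if any(len(row) != len(matrix) for row in matrix):
        raise ValueError("adjacency matrix must be square")
    return matrix


def matrix_to_pajek(matrix, labels=None):
    """Return the matrix as Pajek .net text with undirected edges."""
    n = len(matrix)
    if labels is None:
        # Assumption: generic labels when no worker names are supplied.
        labels = [f"worker{i}" for i in range(1, n + 1)]
    lines = [f"*Vertices {n}"]
    for i, label in enumerate(labels, start=1):
        lines.append(f'{i} "{label}"')  # Pajek numbers vertices from 1
    lines.append("*Edges")
    for i in range(n):
        # Only the upper triangle is read, so each undirected tie is written once.
        for j in range(i + 1, n):
            if matrix[i][j] != 0:
                lines.append(f"{i + 1} {j + 1} {matrix[i][j]}")
    return "\n".join(lines) + "\n"


# Hypothetical three-worker example (not the office workers data set).
sample = """
0 1 1
1 0 0
1 0 0
"""
pajek_text = matrix_to_pajek(parse_matrix(sample), ["Ann", "Bob", "Cy"])
print(pajek_text)
```

Reading only the upper triangle matches the 'undirected edges' setting. A directed data set would need every non-zero cell written to an *Arcs section instead.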
a) Network Visualization
b) Network Analysis
While the number of nodes and edges should be self-explanatory, we explain small-world and scale-free networks here.
Small-World Networks. The small-world phenomenon formalizes the anecdotal notion that "you are only ever six 'degrees of separation' away from anybody else on the planet." Often cited examples of small-world networks are the network of movie actors, the US power grid and the neural network of the worm C. elegans. To determine whether a network has small-world properties, its average path length L and clustering coefficient C are computed. The table below shows the values for the three mentioned networks and compares them to values obtained for random networks with the same number of vertices and average number of edges per vertex.
All three networks show the small-world phenomenon: L ≈ L_random but C >> C_random.
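The two quantities L and C can be computed as in the following Python sketch. It is a minimal illustration, not the algorithm used by the Network Analysis Tool. It assumes an undirected, unweighted network, averages L over connected pairs of nodes only, and assigns a clustering coefficient of 0 to nodes with fewer than two neighbors. The four-node example graph is hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency structure from a list of node pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_path_length(graph):
    """Mean shortest-path length L over all connected pairs of nodes."""
    total, pairs = 0, 0
    for source in graph:
        # Breadth-first search gives shortest paths in an unweighted network.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph):
    """Average over all nodes of the fraction of neighbor pairs that are linked."""
    coefficients = []
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)  # assumption: no neighbor pairs counts as 0
            continue
        nb = list(neighbors)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in graph[nb[i]]
        )
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# Hypothetical example: a triangle (1, 2, 3) with one extra node attached to 3.
graph = build_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
L = average_path_length(graph)
C = clustering_coefficient(graph)
print(f"L = {L:.3f}, C = {C:.3f}")
```

To check a network for small-world properties, the same two functions would be applied to a random network with the same number of nodes and edges, and the resulting values compared.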
Scale-Free Networks. Networks are called scale-free if their connectedness is unevenly distributed: most nodes have only a few links, while some nodes act as "very connected" hubs. Scale-free networks have been used to explain behaviors as diverse as those of the stock market, cancerous cells, or the dispersal of diseases, and they behave very differently from random networks of similar size. The connectedness of a random network decays steadily as random nodes fail: the network slowly breaks into smaller, separate subgraphs that are unable to communicate. Scale-free networks show almost no degradation under random failures, because the highly connected nodes, which are statistically unlikely to fail under random conditions, maintain the connectivity of the network. While the error tolerance of scale-free networks is therefore high, their attack tolerance is low: an attack, i.e. the targeted deletion of highly interconnected nodes, quickly leads to the breakdown of connectivity. Examples of both networks published in (Albert et al., 2000) are given below.
In the left network, the five nodes with the most links (in red) are connected to only 27% of all nodes (green). In the scale-free network on the right, the five most connected nodes (red) are connected to 60% of all nodes (green).
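The difference between error tolerance and attack tolerance can be explored with a small simulation. The Python sketch below is an illustration under stated assumptions, not a reproduction of the published experiments. It grows a hub-dominated network by preferential attachment (each new node links to existing nodes with probability proportional to their number of links), then removes 10% of the nodes either at random (failure) or in order of decreasing connectedness (attack), and reports the size of the largest connected subgraph that remains. The network size, the number of links per new node, and the random seed are arbitrary choices.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current number of links."""
    graph = {i: set() for i in range(m + 1)}
    for i in range(m + 1):  # start from a small fully connected core
        for j in range(i + 1, m + 1):
            graph[i].add(j)
            graph[j].add(i)
    # Each node appears in `stubs` once per link, so a uniform draw
    # from this list is a draw proportional to the number of links.
    stubs = [node for node, nbs in graph.items() for _ in nbs]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        graph[new] = set()
        for target in targets:
            graph[new].add(target)
            graph[target].add(new)
            stubs.extend([new, target])
    return graph


def largest_component_size(graph, removed):
    """Size of the largest connected subgraph once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


rng = random.Random(42)  # fixed seed so that runs are repeatable
graph = preferential_attachment_graph(1000, 2, rng)
k = 100  # remove 10% of the nodes

failed = set(rng.sample(sorted(graph), k))  # random failures
by_links = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
attacked = set(by_links[:k])  # targeted attack on the hubs

after_failure = largest_component_size(graph, failed)
after_attack = largest_component_size(graph, attacked)
print("largest component after random failure:", after_failure)
print("largest component after targeted attack:", after_attack)
```

Under these assumptions the largest connected subgraph should stay noticeably larger after random failures than after the targeted attack, which is the behavior described above for scale-free networks.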
To analyze a variety of network properties you can use the Network Analysis Tool. Simply register as a user, then leave a short description of the data set. For the office workers data set select 'undirected edges', 'space delimiter' and any number of properties you would like to have computed. All network properties are clickable and lead to the definitions or pseudo code of the applied algorithms (see figure below). You will receive the results via the email address you entered.
Next, please select a data set of your choice, visualize it via Pajek and analyze it with the Network Analysis Tool.
There are diverse resources on networks, their structure and their behavior. You may like to examine:
This documentation was compiled by Katy Börner. The Network Analysis Tool was implemented by Shashikant Penumarthy and Ketan Mane, Indiana University. Ketan also wrote the Pajek documentation and contributed the Perl code.