Go to Google Home
Go
A data-code-compute resource for research and education in information visualization
InfoVis Home Learning Modules Software Databases Compute Resources References

Learning Modules > Network Analysis and Visualization

Description | Usage Hints | Learning Task | Discussion | References | Acknowledgments


line
Description

The structure and dynamics of networks are studied in many scientific disciplines. For example, computer scientists and engineers investigate the error and attack tolerance of computer networks and power grids. Biologists strive to understand how the complex network of interactions between genes and proteins results in normal physiological behavior or diseases in organisms. Neuroscientists study the brain as a network of neurons. Social scientists are interested in the structure and evolution of webs that characterize social interactions, such as friendship and acquaintances, or more formally defined networks such as co-authorship or paper-citation.

Network data sets might represent properties of entities (or nodes) but most importantly they describe the relations or links (or edges) between nodes. Social network researchers analyze diverse relations that might exist among different entities:

  • communication relations (e.g., who talks to whom)
  • instrumental relations (e.g., who asks whom for expert advice)
  • boundary penetration relations (e.g., who's on whose board of directors)
  • sentiment relations (e.g., friendship cliques in high school)
  • power relations (e.g., who follows whom)
  • kinship relations (e.g., who's related to whom)
  • transaction relations (e.g., who gives gifts to whom)

They are particularly interested in:

  • durability (how long ties last)
  • reciprocity (whether ties look identical from either end)
  • intensity (whether ties are 'weak' or 'strong')
  • density (how many potential ties in a network actually exist)
  • reachability (how many ties it takes to go from one 'end'of a network to the opposite 'end')
  • centrality (whether a network has a 'center' point or points)
  • quality (the reliability or certainty of ties)

Some researchers are interested to do 'micro studies' that focus on the individual and his/her network. Others do 'macro studies' in an attempt to understand an entire network and its members.

Most commonly, networks are assumed to have nodes of one type and edges of one and the same type. Examples are web pages and their hyperlink connections, authors and their co-authorship relations, children and their friendship relations.

In this learning module, we will use the Network Analysis Tool to analyze the statistical and structural properties of diverse networks. Subsequently, Pajek - a program for analyzing large networks - will be applied to visualize the networks. Pajek is freely available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.


line
Usage Hints

Information on the Network Analysis Tool is available here. Information on Pajek is online here.


line
Learning Task

As mentioned above, network data is common and is studied in diverse scientific disciplines. For demonstration purposes, let's select a data set that most of you should be familiar with such as the ties among 15 office workers. This office workers data set was compiled by Robert A. Hanneman, University of California, Riverside. See his course web site for additional information on the data set. This data set is represented by a 15 (workers) by 15 matrix. Alternatively, one could have represented the data by a list of nodes (workers) and edges (strong ties between workers). Using a perl converter named p_nw_conv.pl (run with 'perl p_conv.pl <inputfile_name> <outputfile_name>') we derive the Pajek input format that is also supported by the Network Analysis Tool.

a) Network Visualization
Visualize this network using the Pajek program. Just follow the instructions here. If you are interested to color code the nodes as shown in the figure below then select 'Partition> Create Null Partition', and 'Non-GUI interface Menu: Draw > Draw-Partition' in the non-graphical user interface menu. Next click Redraw in the graphical interface. Then press 'shift' and left click the node to change it to the desired color.


lm-nw 

b) Network Analysis
Next, let's determine diverse properties of the network such as the

  • Number of nodes and edges,
  • Small world properties such as average path length and clustering coefficient (definitions are given in the Network Analysis Tool),
  • Gamma value of the degree distribution as an indicator if the network is scale free.

While the number of nodes and edges should be self explanatory, we explain small world and scale free networks here.

Small World Networks. The small-world phenomenon formalizes the anecdotal notion that "you are only ever six ‘degrees of separation' away from anybody else on the planet." Often cited examples of small-world networks are the network of movie actors, the US power grid and the nervous network of the worm C. elegans. In order to determine if a networks has small-world properties, the average path length L and clustering coefficient C are determined. The table below shows the values for the three mentioned networks and compares them to values obtained for random networks with the same number of vertices and average number of edges per vertex.

  L_actual L_random C_actual C_random
Film actors 3.65 2.99 0.79 0.00027
Power grid 18.7 12.4 0.080 0.005
C. elegans 2.65 2.25 0.28 0.05

All three networks show the small-world phenomenon L = L_random but C >> Crandom.

Scale-Free Networks. Networks are called scale-free if they have an uneven distribution of connectedness where some nodes act as "very connected" hubs. Scale-free networks behave very differently from random networks of similar size and have been used to explain behaviors as diverse as those of the stock market, cancerous cells, or the dispersal of diseases. The two network types behave very differently. The connectedness of a randomly distributed network decays steadily as random nodes fail - the network slowly breaks into smaller, separate subgraphs that are unable to communicate. Scale-free networks show almost no degradation as highly connected nodes, which are statistically unlikely to fail under random conditions, maintain the connectivity in the network. While error tolerance of scale free networks is low, the attack=deletion of highly interconnected nodes quickly leads to the breakdown of connectivity. Examples of both networks published in (Albert et al, 2000) are given below.

nw-random
random network
nw-scalefree
scale free network

In the left network, the five nodes with the most links (in red) are connected to only 27% of all nodes (green). In the scale-free network on the right, the five most connected nodes (red) are connected to 60% of all nodes (green).

To analyze a variety of network properties you can use the Network Analysis Tool. Simply register as a user, then leave a short description of the data set. For the office workers data set select 'undirected edges', 'space delimiter' and any number of properties you would like to have computed. All network properties are clickable and lead to the definitions or pseudo code of the applied algorithms (see figure below). You will receive the results via the email address you entered.

lm-nw-tool

Next, please select a data set of your choice, visualize it via Pajek and analyze it with the Network Analysis Tool.


line
Discussion

There are diverse resources on networks, their structure and their behavior. You may like to examine:


line
References
  • Albert, H. Jeong, and A.-L. Barabási (2000) Error and attack tolerance in complex networks, Nature 406 , 378.
  • Albert, R. and A.-L. Barabasi, Statistical mechanics of complex networks. Rev. Mod. Phys., 2002. 74: p. 47-97.
  • Dorogovstev, S.N. and J.F.F. Mendes, Evolution of Random Networks. Adv. Phys., 2002. 51: p. 1079-1187.
  • Newman, M., The structure and function of complex networks. SIAM Review, 2003. 45: p. 167-256.
  • Strogatz, S.H., Exploring complex networks. Nature, 2001. 410: p. 268-276.

line
Acknowledgments

This documentation was compiled by Katy Börner. The Network Analysis Tool was implementd by Shashikant Penumarthy and Ketan Mane, Indiana University. Ketan also wrote the Pajek documentation and contributed the perl code.

line
Information Visualization Cyberinfrastructure @ SLIS, Indiana University
Last Modified May 10, 2004