CommDy: Dynamic Community Identification

Introduction

This is an implementation of the algorithms for identifying dynamic communities from our papers in 2007 and 2009. In general, the process consists of two parts:

Download

The following source code is released under Illinois Open Source License.

Available data sets:

After extracting the source code, run make in the src directory. I would suggest adding the directories src and script to the execution path (usually $PATH). To generate pictures, you also need GraphViz.

Next, we will start with how to create an input sequence of interactions


Input Format

All input, intermediate, and output files used here are in plain text. The input file format is gtm (Group-Time-Members). Each line in a gtm file is a sequence of numbers separated by space. They are group number, time step number, and individuals of this group. So the group number coincide with the line number in the file (which does have much use, of course). For example,
1 1 1 2 3
2 1 4 5 6
3 2 1 2 3
4 2 4 5 6
is an interaction sequence where we observe groups 1 and 2 at time 1 and groups 3 and 4 at time 2. The members of groups 1 and 3 are {1, 2, 3} and groups 2 and 4 are {4, 5, 6}. Comment lines, beginning with #, are ignored. Another example is
sw.gtm for the Southern Women data set.
# group time members ...
1 1 14 15 17 18
2 2 1 2 3 4 5 6 7 9
3 3 1 2 3
4 4 2 3 4 5 7 9 10 13 14 15
5 5 10 11 12 13 14 15
6 6 1 3 8 9 10 11 12 13 14 16 17 18
7 7 1 2 3 4 5 6
8 8 1 2 3 4 6 7 8 14
9 9 11 12 13 14 15
10 10 1 2 4
11 11 13 14 15
12 12 1 2 3 4 6 7 8 9 10 11 12 13 15 16
13 13 1 3 4 5
14 14 13 14 15
Next, we will color the groups by exhaustive search.

Output Format

The final output is a color2 file. Usually, each color2 file contains exactly one line. On each line are numbers separated by one or more blank characters. The numbers represent the color of groups and individuals. The group color comes first and it is listed in the same order as the corresponding GTM file that is used to generate the color2 file. Following the color of groups is the color of individual 1 at time 1, 2, ..., T, then the color of individual 2 at time 1, 2, ..., T, and so on. A newline character should follow the color of the last individual at the last time step. For example, the following is a coloring for the above sw4.gtm.
1 2 3 2 4 4 2 2 4 3 4 2 5 4  2 2 3 3 3 3 3 3 3 3 3 3 3 3  2 2 2 2 2 2 2 2 2 2 2 2 2 2  ...  1 1 1 1 1 1 1 1 1 1 1 1 1 1
The interpretation is,
1 2 3 2 4 4 2 2 4 3 4 2 5 4    # groups 1, 2, ..., 18
2 2 3 3 3 3 3 3 3 3 3 3 3 3    # individual 1 at time 1, 2, ..., 14
2 2 2 2 2 2 2 2 2 2 2 2 2 2    # individual 2 at time 1, 2, ..., 14
...
1 1 1 1 1 1 1 1 1 1 1 1 1 1    # individual 18 at time 1, 2, ..., 14

Exhaustive Search

The command bb5 is used to run the exhaustive search.
$ bb5 sw.gtm -cost 111 | tee sw-c111.log
14 group(s) 14 timestep(s)
18 observed individual(s) from 18 allocated IDs
max color 9 exact off
max time size 1
cost 1 1 1
initial min 2147483647
no time limit
tail lowerbounding enabled: window size 8
compute tail lowerbounds
g7 t7 -- g14 t14 lb 15
time lowerbound: 15 15 15 15 15 15 15 0 0 0 0 0 0 0 0
start main loop
init group color: [ 1 ]
[0:00:00:00] min 37 [ 1 2 3 2 4 4 2 2 4 2 4 5 2 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 3 4 2 5 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 3 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 6 4 ]
Heap is empty. Done!
[0:00:00:05] min 36

The option -cost sets the switching cost, absence cost, and visiting cost. For example, -cost 123 sets the costs of switching=1, absence=2, and visiting=3.

The current version of exhaustive search, bb5 (Branch-and-bound version 5), uses heap to do search rearrangement and do tail lower-bounding by windows of size 8 by default. Blah... There are options which you can look up by running bb5 with no parameters.

I'd recommend saving the output as a file using the following naming convention because the script that follows will automatically figure out some parameters for you. First, remove .gtm from the input gtm filename then add -c111 (if you set all costs equal to 1), then append .log.

The following script will run the dynamic programming to figure out the optimal individual coloring.
$ convert_logfile.sh sw-c111.log
1 1 1
cost 36
next: color2dot sw.gtm sw-c111.color2 -Tpng

This generates two files: sw-c111.gcolor, sw-c111.color2. It finds the minimum cost in the log file, extract only group colorings with minimum cost, and save them in the gcolor file. The script also invoke ind_color2 with appropriate parameters. This finds optimal individual coloring (using the dynamic programming) of the first group coloring in the .gcolor file. The resulting coloring is then saved in the .color2 file. It also tells you which command to run next to generate a picture of the coloring. If you want, you can skip the rest of this section about speeding up.

Speeding up

One option -tblfile can be used to provide a file containing time lower-bounds, which can be pre-computed using interval_lowerbound. The long output of interval_lowerbound usually look like this.

$ interval_lowerbound sw.gtm
14 group(s) 14 timestep(s)
max color 9 exact off
max time size 1
cost 1 1 1
time limit 5s (1 sec)
compute interval lowerbounds
wsize 1 time 1:1 (group 1:1) lb 0
wsize 1 time 2:2 (group 2:2) lb 0
...
time lowerbound: 36 34 32 30 24 22 15 14 9 7 5 5 0 0
Then, put only the numbers on the last line in a new file. For example,
$ echo 36 34 32 30 24 22 15 14 9 7 5 5 0 0 > sw-c111.tlb
Then, run bb5 with -tlbfile <filename>.
$ bb5 sw.gtm -tlbfile sw-c111.tlb
14 group(s) 14 timestep(s)
18 observed individual(s) from 18 allocated IDs
max color 9 exact off
max time size 1
cost 1 1 1
initial min 2147483647
no time limit
tlbfile sw-c111.tlb: 36 34 32 30 24 22 15 14 9 7 5 5 0 0 0
start main loop
init group color: [ 1 ]
[0:00:00:00] min 37 [ 1 2 3 2 4 4 2 2 4 2 4 5 2 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 3 4 2 5 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 3 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 6 4 ]
Heap is empty. Done!
[0:00:00:00] min 36
Well, it is fast because the data set is pretty small.

Visualization

Assuming you have
GraphViz installed, we can generate the picture using the following command.
$ color2dot sw.gtm sw-c111.color2 -Tpng
sw-c111-t01-t14.png ... cost 36 time 2 sec

3D Visualization

Khairi Reda has developed 3D visualization tools for dynamic communities. See more at Zebra Vis and SocioScape.


Path-Cover Approximation

This section requires
Python and GNU Linear Programing Kit (GLPK). We used to use the commercial solver CPLEX for the results in the paper, but I know not everyone can get his/her hands on CPLEX. So I am releasing a version that uses GLPK instead. You can check if you have them by typing python and glpsol (separately) at the shell prompt to see if it complains something like ``command not found" or not. If you have Ubuntu, you might just want to install it by:
sudo apt-get install python glpk-utils
The iterated path-cover algorithm is meant to color the groups. So you will need the dynamic programming color_ind2 from the above section to color the individuals afterward. The main script is ipc-glpk.py which basically read a GTM file, writes a linear program (LP) for path cover problem and internally calls glpsol to solve the LP. It contracts the edges of each path so that each path becomes a super-vertex. Then, the process repeats on the new graph with these super-vertices. This repeats until no more path is contracted. Our 2009 paper describes this in a bit more details. Here is what it looks like to run the script on the Southern Women data set.
$ ipc-glpk.py sw4.gtm
iteration 1: 54
iteration 2: 2
iteration 3: no edges
clean up
(What's next?)
gamsout2color.py sw4_ipc.out >sw4_ipc.gcolor
color_ind2 sw4.gtm sw4_ipc-c111.color2
color3dot.py sw4_ipc-c111.color2
The script also generates commands which you might want to do next. gamsout2color.py converts the output from a list of edges to a coloring of groups. color_ind2 computes (via dynamic programming) an optimal coloring of individuals given a group coloring. color3dot.py generates a picture of the coloring.
Author: Chayant Tantipathananandh
Last modified: Wed Aug 11 15:56:10 CDT 2010