CommDy: Dynamic Community Identification

Introduction

This is an implementation of the algorithms for identifying dynamic communities from our papers in 2007 and 2009. In general, the process consists of two parts:

Download

The following source code is released under Illinois Open Source License.

After extracting the source code, run make in the src directory. I would suggest adding the directories src and script to the execution path (usually $PATH). To generate pictures, you also need GraphViz.

Next, we will start with how to create an input sequence of interactions


Input Format

All input, intermediate, and output files used here are in plain text. The input file format is gtm (Group-Time-Members). Each line in a gtm file is a sequence of numbers separated by space. They are group number, time step number, and individuals of this group. So the group number coincide with the line number in the file (which does have much use, of course). For example,
1 1 1 2 3
2 1 4 5 6
3 2 1 2 3
4 2 4 5 6
is an interaction sequence where we observe groups 1 and 2 at time 1 and groups 3 and 4 at time 2. The members of groups 1 and 3 are {1, 2, 3} and groups 2 and 4 are {4, 5, 6}. Comment lines, beginning with #, are ignored. Another example is
sw.gtm for the Southern Women data set.
# group time members ...
1 1 14 15 17 18
2 2 1 2 3 4 5 6 7 9
3 3 1 2 3
4 4 2 3 4 5 7 9 10 13 14 15
5 5 10 11 12 13 14 15
6 6 1 3 8 9 10 11 12 13 14 16 17 18
7 7 1 2 3 4 5 6
8 8 1 2 3 4 6 7 8 14
9 9 11 12 13 14 15
10 10 1 2 4
11 11 13 14 15
12 12 1 2 3 4 6 7 8 9 10 11 12 13 15 16
13 13 1 3 4 5
14 14 13 14 15
Next, we will color the groups by exhaustive search.

Exhaustive Search

The command bb5 is used to run the exhaustive search.
$ bb5 sw.gtm -cost 111 | tee sw-c111.log
14 group(s) 14 timestep(s)
18 observed individual(s) from 18 allocated IDs
max color 9 exact off
max time size 1
cost 1 1 1
initial min 2147483647
no time limit
tail lowerbounding enabled: window size 8
compute tail lowerbounds
g7 t7 -- g14 t14 lb 15
time lowerbound: 15 15 15 15 15 15 15 0 0 0 0 0 0 0 0
start main loop
init group color: [ 1 ]
[0:00:00:00] min 37 [ 1 2 3 2 4 4 2 2 4 2 4 5 2 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 3 4 2 5 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 3 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 6 4 ]
Heap is empty. Done!
[0:00:00:05] min 36

The option -cost sets the switching cost, absence cost, and visiting cost. For example, -cost 123 sets the costs of switching=1, absence=2, and visiting=3.

The current version of exhaustive search, bb5 (Branch-and-bound version 5), uses heap to do search rearrangement and do tail lower-bounding by windows of size 8 by default. Blah... There are options which you can look up by running bb5 with no parameters.

I'd recommend saving the output as a file using the following naming convention because the script that follows will automatically figure out some parameters for you. First, remove .gtm from the input gtm filename then add -c111 (if you set all costs equal to 1), then append .log.

The following script will run the dynamic programming to figure out the optimal individual coloring.
$ convert_logfile.sh sw-c111.log
1 1 1
cost 36
next: color2dot sw.gtm sw-c111.color2 -Tpng

This generates two files: sw-c111.gcolor, sw-c111.color2. It finds the minimum cost in the log file, extract only group colorings with minimum cost, and save them in the gcolor file. The script also invoke ind_color2 with appropriate parameters. This finds optimal individual coloring (using the dynamic programming) of the first group coloring in the .gcolor file. The resulting coloring is then saved in the .color2 file. It also tells you which command to run next to generate a picture of the coloring. If you want, you can skip the rest of this section about speeding up.

Speeding up

One option -tblfile can be used to provide a file containing time lower-bounds, which can be pre-computed using interval_lowerbound. The long output of interval_lowerbound usually look like this.

$ interval_lowerbound sw.gtm
14 group(s) 14 timestep(s)
max color 9 exact off
max time size 1
cost 1 1 1
time limit 5s (1 sec)
compute interval lowerbounds
wsize 1 time 1:1 (group 1:1) lb 0
wsize 1 time 2:2 (group 2:2) lb 0
...
time lowerbound: 36 34 32 30 24 22 15 14 9 7 5 5 0 0
Then, put only the numbers on the last line in a new file. For example,
$ echo 36 34 32 30 24 22 15 14 9 7 5 5 0 0 > sw-c111.tlb
Then, run bb5 with -tlbfile <filename>.
$ bb5 sw.gtm -tlbfile sw-c111.tlb
14 group(s) 14 timestep(s)
18 observed individual(s) from 18 allocated IDs
max color 9 exact off
max time size 1
cost 1 1 1
initial min 2147483647
no time limit
tlbfile sw-c111.tlb: 36 34 32 30 24 22 15 14 9 7 5 5 0 0 0
start main loop
init group color: [ 1 ]
[0:00:00:00] min 37 [ 1 2 3 2 4 4 2 2 4 2 4 5 2 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 3 4 2 5 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 3 4 ]
[0:00:00:00] min 36 [ 1 2 3 2 4 4 2 2 4 5 4 2 6 4 ]
Heap is empty. Done!
[0:00:00:00] min 36
Well, it is fast because the data set is pretty small.

Visualization

Assuming you have
GraphViz installed, we can generate the picture using the following command.
$ color2dot sw.gtm sw-c111.color2 -Tpng
sw-c111-t01-t14.png ... cost 36 time 2 sec

3D Visualization

Khairi Reda has developed 3D visualization tools for dynamic communities. See more at Zebra Vis and SocioScape.


Path-Cover approximation

(under construction)
Author:
Chayant Tantipathananandh
Last modified: Wed Sep 9 23:59:59 CDT 2009