Comments: Maps of the academic neighbourhood

Inasmuch as network structures show a lot more than flat bibliographies, this is interesting. But is there anything qualitatively different about the way the information is presented?

I'm afraid I'm not very up on network theory, but I would think that it provides methods of analysis that are not applicable to disconnected lists. Are there any bits of information we could add to these networks of references to make them more useful, like typing links by the number of references, or by how something is referenced?

Should such a network be more chronological, in terms of inheritance, or more concept oriented, for a functionalist perspective (the interrelatedness of ideas)? It seems like you have a horizon effect, in terms of how relevant a reference is, versus wanting to present a complete picture of the connections a particular group/author has.

Could such a network be analysed for 'bottlenecks', where a good deal of research is being referenced through an intermediary, rather than directly, when the original papers would be more useful? It seems like a dynamic kind of graph could serve as a research tool if served per request or per topic.

Another use for a graphical google/citeseer, I suppose.

Posted by Justin Corwin at April 19, 2004 08:41 AM

Good questions. A lot of network visualization is of course just neat eye-candy (like those wonderful Internet maps, http://www.nd.edu/~networks/visual/table.html). The strength of this kind of graph is likely that it gives a sense of the underlying structure at a glance, and by carefully selecting how different properties are represented, patterns in them can be revealed. But adding too many properties to the graph makes it hard to read; one could for example add more publication information to each link in the first graph, but there would be too much color in it and the visual system would get saturated. Finding which properties to use is the hard part; as a rule of thumb I think they have to use different visual modalities to reduce interference. Still, there is room for improvement. The first graph could perhaps use shape to distinguish different kinds of publications: ellipses for papers, squares for conference reports and triangles for dissertations.

Most network analysis methods look at numerical properties of the entire graph, like its diameter or clustering, but some have local components that could be visualized (e.g. clustering: in the coauthorship graph it is interesting to note that some people write papers with groups of people who also write papers together, while others work in a far more isolated fashion).
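
To make the local clustering idea concrete, here is a minimal sketch in plain Python. The coauthor names and the `local_clustering` helper are made up for illustration and are not taken from the actual coauthorship graph; the graph is assumed to be undirected and stored as a dictionary of neighbour sets.

```python
from itertools import combinations


def local_clustering(adj):
    """Return the local clustering coefficient of every node.

    adj maps each node to the set of its neighbours (undirected graph).
    The coefficient is the fraction of a node's neighbour pairs that
    are themselves connected. Nodes with fewer than two neighbours get 0.
    """
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        # Count the links that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Hypothetical coauthorship data: A, B and C all write together,
# while D and E form a more isolated chain hanging off A.
coauthors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

clustering = local_clustering(coauthors)
for author in sorted(clustering):
    print(author, round(clustering[author], 3))
```

In a drawing, a value like this could be mapped to node size or brightness, so that tightly knit groups stand out from people who work in a more isolated fashion.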

I think time is important, and so far I have not seen any good graph visualizations that handle both time and other properties. I have some ideas for that, which I'm working on an algorithm for - more news later :-)

Bottlenecks are indeed a good use of graph analysis. There is a concept called "betweenness centrality" that tells how many of the shortest paths between all pairs of nodes pass through a given node or edge. Nodes with high centrality are important bottlenecks, and in Chaomei Chen's visualizations of scientific fields (http://www.pages.drexel.edu/~cc345/) they are used to build a kind of spanning tree showing the main relationships. In my graphs there is a very clear bottleneck effect for Erik Fransén and Hans Liljenström, who are the bridge to the people around Mike Hasselmo and some people at the Karolinska Institute.
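
As a rough illustration of how such bottlenecks can be found, the sketch below computes node betweenness for an unweighted, undirected graph using the approach usually attributed to Brandes (a breadth-first search from every node, followed by back-propagation of path counts). The `betweenness_centrality` function and the toy graph are assumptions for the example, not the code or data behind the graphs discussed here.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized node betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical graph: two tight groups joined through one "bridge" node.
graph = {
    "a1": {"a2", "a3"},
    "a2": {"a1", "a3"},
    "a3": {"a1", "a2", "bridge"},
    "bridge": {"a3", "b1"},
    "b1": {"bridge", "b2", "b3"},
    "b2": {"b1", "b3"},
    "b3": {"b1", "b2"},
}

centrality = betweenness_centrality(graph)
bottleneck = max(centrality, key=centrality.get)
print(bottleneck, centrality[bottleneck])
```

In this toy graph every shortest path between the two groups has to pass through the bridge node, so it gets the highest score, with the two nodes it connects to close behind.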

I dearly wish I had access to the CiteSeer database for this. Some of the results discussed at the PNAS Colloquium on Mapping Knowledge Domains (http://www.pnas.org/content/vol101/suppl_1/) that use CiteSeer or other science databases are fascinating, but mainly statistical rather than graphical.

Posted by Anders at April 19, 2004 12:50 PM

Very interesting. Is the CiteSeer data expensive, or just closed?

I am interested in graph theory. Will you be publishing your research in this area?

Posted by Justin Corwin at April 21, 2004 09:39 AM

CiteSeer (http://citeseer.ist.psu.edu/cis) seems to be working erratically right now, perhaps due to a transfer from NEC to Penn State. But the really interesting information is in the database itself, which I guess one has to ask the owners for direct access to if one wants to do graph theory on it.

And yes, I hope to publish whatever results I come up with. No guarantees on originality or scope :-)

Posted by Anders at April 22, 2004 10:51 PM