Distance measures for graph theory : comparisons and analyzes of different methods

Duyck, Maxime

Files

Duyck_25371100_2017.pdf

Open access
Adobe PDF
2.4 MB

Download

Details

Supervisors: Saerens, Marco
Faculty: Ecole polytechnique de Louvain
Degree label: Master [120] : ingénieur civil en mathématiques appliquées
Abstract: The analysis of networks or graphs is a highly researched field in the areas of applied mathematics and computing. Recent development in IT resources considerably increased calculating power and thus the ability to tackle problems that can be represented as enormous networks. One of the problems with the process of clustering is how to quantify the nodal connections in a network to construct communities. This can be solved by exploring the distance between vertices. Indeed, it is precisely the main subject of this thesis, and we attempt to address this issue using the best possible approach. There are several methods to define nodal distance, some of which are presented here. Originally, the scientific research focused mainly on two basic techniques, the shortest-path and the resistance distances. The latter is also called the commute-time distance. However, they suffer from some significant flaws. The former depends only on the most direct routes and thus fails to integrate the “degree of connectivity” between two nodes. This means that if the shortest-path distance between two pairs of vertices is constant, then two nodes connected by one path should be considered as similar as any other pair joined by several routes. Another limitation emerges in computing the interval from a given node, the shortest-path method usually provides many ties, or equidistant vertices. In short, this approach fails to take the entire graphical structure into account. On the other hand, the commute-time distances converge to yield a useless value, depending only on the degrees of the two vertices when the size of the network increases: the random walker becomes “lost in space” because the Markov chain mixes too quickly. Recent papers have introduced various distance measures that try to avoid these shortcoming. This thesis presents some of these methods before attempting to identify the most ideal techniques. The two main contributions can be therefore summarized as : - First, an exhaustive experimental comparison between different similarity or dissimilarity measures. - Next, a experimental comparison between the best methods introduced previously and a brand new method, the randomized optimal transport.