Graph embedding with application to semi-supervised classification, visualization, reconstruction and neighborhood preservation tasks : an experimental comparison

Files

Mbungu_67261600_2019.pdf

Open access
Adobe PDF
3.3 MB

Details

Supervisors: Saerens, Marco
Faculty: Ecole polytechnique de Louvain
Degree label: Master [120] en sciences informatiques
Abstract: Graph embedding is an effective and efficient way to solve many real-world problems raised by graph data analysis, such as those in social network analysis (link prediction, community detection, node classification, ...). Different graph embedding algorithms take different views of node similarity (graph properties) and of how to preserve it in the embedded space. Since a good graph embedding method should ensure that the learned embeddings preserve the original network structure, the goal of this work was to evaluate the performance of various embedding methods belonging to two different families (kernel-based* and deep-learning-based**) on semi-supervised node classification, graph visualization, graph reconstruction and neighborhood preservation, in order to determine which method performs best for each of these tasks. Our experiments were based on fourteen well-known datasets. The results revealed that the best method is dataset-dependent and task-dependent. Nevertheless, DeepWalk** (the state of the art) obtained the best results for the classification task (around 83% classification accuracy for a labeling rate of 20%, i.e. only two nodes out of ten are labelled), but the difference was not statistically significant in comparison with the bag-of-paths methods* (multidimensional scaling of the free energy distance matrix, Gaussian kernel of the free energy distance matrix, covariance of node co-presence on hitting paths and correlation of node co-occurrence on hitting paths). All these techniques discriminate well between nodes of different classes and have almost the same, but low, reconstruction ability (mean average precision of 43% on average).
Finally, t-Distributed Stochastic Neighbor Embedding* (t-SNE) outperformed all the other methods, with significant differences, on the neighborhood preservation task, even though all these techniques (including t-SNE) showed low ability (area under the curve of 23% on average) with respect to the distance measures used.
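
The abstract reports graph reconstruction quality as a mean average precision (MAP) but does not spell out the computation. The sketch below shows one common way to compute such a score, and it is an assumption rather than the thesis's actual procedure: for each node, the other nodes are ranked by Euclidean distance in the embedded space, the node's true neighbours are treated as the relevant items, and the per-node average precisions are averaged. The function names `average_precision` and `reconstruction_map`, the choice of Euclidean distance, and the toy path graph are all hypothetical.

```python
import math


def average_precision(ranked_nodes, true_neighbours):
    """Average precision of one ranked list; None if nothing is relevant."""
    hits = 0
    precisions = []
    for rank, node in enumerate(ranked_nodes, start=1):
        if node in true_neighbours:
            hits += 1
            # Precision measured at the rank of each true neighbour.
            precisions.append(hits / rank)
    if not precisions:
        return None
    return sum(precisions) / len(precisions)


def reconstruction_map(embeddings, neighbours):
    """Mean of the per-node average precisions (assumed definition)."""
    ap_values = []
    for i, vec in enumerate(embeddings):
        # Rank every other node by distance to node i in the embedding.
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: math.dist(vec, embeddings[j]))
        ap = average_precision(others, neighbours[i])
        if ap is not None:  # nodes without neighbours are skipped
            ap_values.append(ap)
    return sum(ap_values) / len(ap_values)


# Toy path graph 0 - 1 - 2 - 3, given as one neighbour set per node.
path_neighbours = [{1}, {0, 2}, {1, 3}, {2}]

# A 1-D embedding that keeps the path order, and one that scrambles it.
good_embedding = [(0.0,), (1.0,), (2.0,), (3.0,)]
bad_embedding = [(0.0,), (3.0,), (1.0,), (2.0,)]

map_good = reconstruction_map(good_embedding, path_neighbours)
map_bad = reconstruction_map(bad_embedding, path_neighbours)
print(map_good, map_bad)
```

Under this definition, an embedding that places every node closest to its true neighbours scores 1, and the score drops as non-neighbours are ranked ahead of neighbours, which is how a figure such as 43% can be read.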
