
Comparison of some community detection methods for social network analysis

(2015)

Files

COLLETTE_Augustin_8946-13-00_2015_Annexe1.pdf
  • Open access
  • Adobe PDF
  • 235.19 KB

COLLETTE_Augustin_8946-13-00_2015_Annexe2.pdf
  • Open access
  • Adobe PDF
  • 157.19 KB

COLLETTE_Augustin_8946-13-00_2015_Annexe3.pdf
  • Open access
  • Adobe PDF
  • 157.21 KB

COLLETTE_Augustin_8946-13-00_2015_Annexe4.pdf
  • Open access
  • Adobe PDF
  • 157.18 KB

COLLETTE_Augustin_8946-13-00_2015_Annexe5.pdf
  • Open access
  • Adobe PDF
  • 157.49 KB

Details

Abstract
Social networks have been a major innovation in the business field over the last decade. This work focuses on the analysis of networks (a large share of which are social networks) and, more specifically, on finding communities in these networks. Its purpose is to compare several algorithms able to detect these communities (i.e. to cluster networks). We compare two hierarchical clustering algorithms, namely kernel-based Ward's hierarchical clustering and the Louvain method. Their quality is measured with several external quality indicators, which compare the partitions they produce with a reference partition specified by a human or by the context. Statistical hypothesis tests were then performed to check whether the differences were significant. We found that the Louvain method tends to give better results, but the difference is not significant. We also tested different kernels in Ward's clustering and found that the sigmoid corrected commute-time kernel is significantly superior to the other kernels. Finally, we addressed the problem of the number of clusters, which Ward's clustering algorithm does not determine and which has to be set manually. We tested the L method, a way to determine this number from the output of Ward's clustering, and compared it with the number of clusters selected by the Louvain method through modularity optimization. As a benchmark, we used the known "natural" number of classes.
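The Louvain method builds its hierarchy by greedily maximizing modularity: nodes are moved between neighbouring communities as long as this increases modularity, then each community is collapsed into a single node and the process repeats. The minimal sketch below only computes the objective being maximized, the Newman-Girvan modularity Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c and d_c the total degree of its nodes. It assumes an unweighted, undirected graph given as an edge list; the function name and input format are illustrative assumptions, not code from the thesis.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[communities[u]] += 1
        degree_sum[communities[v]] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1
    # Q = sum_c [ L_c / m - (d_c / 2m)^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14 (about 0.357), whereas putting every node in one community gives Q = 0.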
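Kernel-based Ward clustering can be run without explicit node coordinates, because the squared distance between two cluster centroids in the kernel's feature space only needs kernel entries: ||mu_A - mu_B||^2 = mean(K_AA) + mean(K_BB) - 2 mean(K_AB). The sketch below is a naive agglomerative loop, written for clarity rather than speed (it recomputes every pairwise cost at each step), that merges at each step the pair of clusters with the smallest Ward cost |A||B| / (|A| + |B|) * ||mu_A - mu_B||^2. It assumes K is a symmetric positive semidefinite kernel matrix; the function name `kernel_ward` and its return values are hypothetical and not taken from the thesis.

```python
import numpy as np


def kernel_ward(K, n_clusters):
    """Agglomerative Ward clustering in the feature space induced by kernel K.

    Returns one label per node and the Ward cost of each merge, in merge order.
    """
    K = np.asarray(K, dtype=float)
    clusters = [[i] for i in range(K.shape[0])]  # start from singletons
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = clusters[a], clusters[b]
                # squared distance between the two centroids in feature space
                dist2 = (K[np.ix_(A, A)].mean() + K[np.ix_(B, B)].mean()
                         - 2.0 * K[np.ix_(A, B)].mean())
                # increase in within-cluster inertia if A and B are merged
                cost = len(A) * len(B) / (len(A) + len(B)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        heights.append(cost)
    labels = np.empty(K.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, heights
```

The recorded merge costs are the dendrogram heights; running the loop down to a single cluster yields the full sequence from which a stopping rule can choose the number of clusters.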
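The abstract does not name the external quality indicators that were used. As one common example, the adjusted Rand index (ARI) compares a clustering with a reference partition by counting the pairs of nodes on which the two agree, corrected for chance: it equals 1 for identical partitions (up to relabelling) and is 0 on average for random labellings. Below is a minimal, dependency-free sketch, assuming both partitions are given as label lists over the same nodes (at least two of them); whether ARI was among the indicators used in the thesis is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same nodes."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))  # n_ij
    rows = Counter(labels_true)                           # a_i
    cols = Counter(labels_pred)                           # b_j
    index = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, and splitting the second reference class in two, as in `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])`, gives 4/7 (about 0.571).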
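The sigmoid corrected commute-time kernel is commonly defined in two steps, and the sketch below assumes that formulation. First, the corrected commute-time kernel K_CCT = H D^(-1/2) M (I - M)^(-1) M D^(-1/2) H, with M = D^(-1/2) (A - d d^T / vol(G)) D^(-1/2), degree vector d, D = Diag(d), vol(G) = sum of the degrees and centering matrix H = I - e e^T / n. Second, an element-wise sigmoid [K]_ij = 1 / (1 + exp(-alpha [K_CCT]_ij / sigma)), where sigma is the standard deviation of the entries of K_CCT. The exact definition and the value of alpha used in the thesis are not stated in the abstract, so this NumPy version, which assumes a connected undirected graph and uses alpha = 1.0 as an arbitrary placeholder, is illustrative only.

```python
import numpy as np


def sigmoid_cct_kernel(A, alpha=1.0):
    """Sigmoid corrected commute-time kernel of a connected, undirected graph.

    A: symmetric (n, n) adjacency matrix with non-negative weights.
    alpha: sigmoid steepness (placeholder value; it would normally be tuned).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                    # node degrees
    vol = d.sum()                        # volume of the graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # M = D^{-1/2} (A - d d^T / vol) D^{-1/2}
    M = D_inv_sqrt @ (A - np.outer(d, d) / vol) @ D_inv_sqrt
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    # K_CCT = H D^{-1/2} M (I - M)^{-1} M D^{-1/2} H; I - M is invertible for a connected graph
    K_cct = H @ D_inv_sqrt @ M @ np.linalg.solve(np.eye(n) - M, M) @ D_inv_sqrt @ H
    K_cct = (K_cct + K_cct.T) / 2        # remove tiny numerical asymmetries
    sigma = K_cct.std()                  # standard deviation of all entries
    return 1.0 / (1.0 + np.exp(-alpha * K_cct / sigma))
```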