Graph-based semi-supervised classification algorithms in light of the recently proposed adaptive edge weighting and the question whether it can be extended to out-of-sample prediction

Supervisors: Saerens, Marco
Faculty: Louvain School of Management
Degree label: Master [120] en Ingénieur de gestion
Abstract: This master thesis studies how graph-based semi-supervised classification algorithms can be extended to out-of-sample prediction. Two approaches are studied: graph freezing (made possible by adaptive edge weighting) and hybrid meta-algorithms (combinations of inductive supervised and transductive semi-supervised algorithms). Graph freezing classifies almost as well as relaunching the entire graph-based algorithm to include the new records. Of the three variants of hybrid meta-algorithms, two perform comparably to graph freezing, although performance varies with the data set. How to choose among these alternatives for out-of-sample extension remains an open question. A baseline comparison with supervised classification was also carried out. Supervised algorithms must ignore the large amount of unlabeled training data; in our experiments, they can use only the 1/6 of the available records that have labels. The negative results obtained call into question the usefulness of unlabeled data for graph-based algorithms, at least on data sets with heterogeneous variables such as those typical of business applications. Graph-based algorithms seem to perform well only on specific data sets, which are difficult to identify other than by trial and error.
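The two out-of-sample strategies named in the abstract can be illustrated with a small sketch. This is not the thesis's code. It assumes a fixed Gaussian kernel with a hand-picked bandwidth `sigma` in place of adaptive edge weighting, uses plain iterative label propagation with clamped labels as the transductive step, and substitutes a 1-nearest-neighbour classifier for whatever inductive learner a hybrid meta-algorithm would actually use. All function names are hypothetical. "Graph freezing" is shown here as scoring a new record by a kernel-weighted average of the fixed training-node scores, without recomputing the graph.

```python
import numpy as np


def gaussian_weights(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B,
    # turned into Gaussian edge weights (fixed bandwidth, an assumption).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def label_propagation(X, y, n_classes, sigma=1.0, n_iter=200):
    """Transductive step. y holds class indices for labeled rows, -1 otherwise."""
    W = gaussian_weights(X, X, sigma)
    np.fill_diagonal(W, 0.0)
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-normalized transitions
    idx = np.flatnonzero(y >= 0)
    Y0 = np.zeros((len(y), n_classes))
    Y0[idx, y[idx]] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F
        F[idx] = Y0[idx]  # clamp the known labels at every iteration
    return F


def frozen_graph_predict(X_train, F, X_new, sigma=1.0):
    # Graph freezing (sketch): new records attach to the fixed training graph
    # and inherit a weighted average of its scores; F itself is not updated.
    W_new = gaussian_weights(X_new, X_train, sigma)
    scores = W_new @ F / (W_new.sum(axis=1, keepdims=True) + 1e-12)
    return scores.argmax(axis=1)


def hybrid_predict(X_train, F, X_new):
    # Hybrid meta-algorithm (sketch): an inductive classifier, here 1-NN,
    # is trained on the labels produced by the transductive step.
    pseudo_labels = F.argmax(axis=1)
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return pseudo_labels[d2.argmin(axis=1)]


# Toy usage: two well-separated clusters with one labeled record each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
y = np.full(60, -1)
y[0], y[30] = 0, 1
F = label_propagation(X, y, n_classes=2)
X_new = np.array([[0.1, -0.1], [2.9, 3.2]])
print(frozen_graph_predict(X, F, X_new))
print(hybrid_predict(X, F, X_new))
```

On such cleanly separated clusters both strategies agree. The abstract's point is that on real data sets with heterogeneous variables the picture is far less clear-cut.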