Graph-based semi-supervised classification algorithms in light of the recently proposed adaptive edge weighting and the question whether it can be extended to out-of-sample prediction

(2017)

Files

DavidErnst-36540600-2017.pdf
  • Open access
  • Adobe PDF
  • 2.09 MB

DavidErnst-36540600-2017-Annexe1.pdf
  • Open access
  • Adobe PDF
  • 331.27 KB

Details

Supervisors
Faculty
Degree label
Abstract
This master's thesis studies how graph-based semi-supervised classification algorithms can be extended to out-of-sample prediction. Two approaches are studied: graph freezing (made possible by adaptive edge weighting) and hybrid meta-algorithms (combinations of inductive supervised and transductive semi-supervised algorithms). The classification performance of graph freezing is almost on par with rerunning the entire graph-based algorithm with the new records included. Of the three variants of hybrid meta-algorithms, two perform comparably to graph freezing, although performance can vary with the data set. How to choose among these alternatives for out-of-sample extension remains an open question. A baseline comparison with supervised classification was also carried out. Supervised algorithms must ignore the large amount of unlabeled training data and can, in our experiments, use only the labeled records, which make up 1/6 of the available records. The negative results obtained call into question the usefulness of unlabeled data when used with graph-based algorithms, at least for data sets with heterogeneous variables, as are typical in business applications. Graph-based algorithms seem to perform well only on specific data sets, which are difficult to identify other than by trial and error.
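
To make the idea of out-of-sample prediction on a frozen graph concrete, the following is a minimal sketch and not the thesis's implementation. It assumes plain label propagation with a fixed-bandwidth Gaussian kernel in place of adaptive edge weighting, and the names `gaussian_weights`, `propagate_labels`, `predict_frozen` and the parameter `sigma` are hypothetical. New records are scored against the stored training nodes without rerunning the propagation. The toy data labels 10 of 60 records, mirroring the 1/6 ratio mentioned in the abstract.

```python
import numpy as np

def gaussian_weights(A, B, sigma):
    """Dense Gaussian edge weights between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def propagate_labels(X, y, sigma=1.0, n_iter=200):
    """Transductive step. y holds class ids, with -1 marking unlabeled records."""
    classes = np.unique(y[y >= 0])
    W = gaussian_weights(X, X, sigma)
    np.fill_diagonal(W, 0.0)                  # no self-loops
    P = W / W.sum(axis=1, keepdims=True)      # row-normalised transition matrix
    labeled = y >= 0
    Y0 = (y[:, None] == classes[None, :]).astype(float)  # zero rows if unlabeled
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                             # spread scores along the edges
        F[labeled] = Y0[labeled]              # clamp the known labels
    return F, classes

def predict_frozen(X_new, X, F, classes, sigma=1.0):
    """Out-of-sample step: the graph and the scores F stay fixed (assumed scheme)."""
    W_new = gaussian_weights(X_new, X, sigma)  # edges from new records to old nodes
    scores = W_new @ F                         # weighted vote of the frozen scores
    return classes[scores.argmax(axis=1)]

# Toy data (illustrative only): two separated clusters, 10 of 60 records labeled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
y = np.full(60, -1)
y[:5] = 0
y[30:35] = 1

F, classes = propagate_labels(X, y)
X_new = np.array([[0.1, -0.2], [2.9, 3.1]])
preds = predict_frozen(X_new, X, F, classes)
```

The design choice illustrated here is that `predict_frozen` only needs one row of edge weights per new record, whereas rerunning `propagate_labels` on the enlarged data set rebuilds the whole graph.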