Or Goaz, M.Sc. Thesis Seminar
For the lecture password, please contact: firstname.lastname@example.org
Advisors: Prof. R. Friedman and Dr. O. Rottenstreich
Clustering is a basic machine learning task. In this task, a stream of input items is grouped into clusters such that items assigned to the same cluster are closer to each other than to items assigned to other clusters. Each cluster is centered around a centroid point, which is either given as a parameter or learned during the process, as in unsupervised online learning.
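As a minimal sketch of the two settings described above, the following Python code assigns an item to its nearest centroid when the centroids are given, and nudges the chosen centroid toward the item when they must be learned online. The function names, the Euclidean distance, and the running-mean update (a standard sequential k-means step) are illustrative assumptions, not the algorithm developed in this work.

```python
import math

def nearest_centroid(item, centroids):
    """Return the index of the centroid closest to item (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(item, centroids[i]))

def online_update(item, centroids, counts):
    """Assign item to its nearest centroid, then move that centroid toward it.

    counts[i] holds how many items centroid i has absorbed so far.
    This is a generic running-mean update, assumed here for illustration.
    """
    i = nearest_centroid(item, centroids)
    counts[i] += 1
    # Running mean: centroid += (item - centroid) / count
    centroids[i] = [c + (x - c) / counts[i] for c, x in zip(centroids[i], item)]
    return i
```

With known centroids only `nearest_centroid` is needed. In the online case, `online_update` is called once per arriving item, so no item has to be stored after it is processed.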
This work studies the ability to perform clustering in programmable switches. The motivation is that classifying network traffic is a basic need for improved network security and management, and performing this classification in the switches through which the traffic flows is potentially the most efficient approach. To that end, we develop Clustreams, a novel in-network clustering system designed to handle clustering in the data path. At the core of Clustreams is a clustering algorithm that relies heavily on TCAM (Ternary Content Addressable Memory) match-action capabilities. This algorithm is realized on the Nvidia Spectrum-3 switch and is limited to classification when the centroid points are known a priori. We also present an extension of Clustreams that supports unsupervised online learning.
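To illustrate what a TCAM match-action lookup provides, the sketch below emulates ternary matching in software: each entry has a value, a mask whose zero bits are wildcards, and an action, and the first matching entry wins. The table contents, the 8-bit key, and the cluster labels are hypothetical, and this is not the Clustreams algorithm itself, only the general primitive it is described as relying on.

```python
def ternary_match(key, table):
    """Return the action of the first (highest-priority) entry matching key.

    Each entry is (value, mask, action); bits where mask is 0 are wildcards.
    Returns None if no entry matches.
    """
    for value, mask, action in table:
        if (key & mask) == (value & mask):
            return action
    return None

# Hypothetical table over an 8-bit feature: the top two bits select a cluster,
# and an all-wildcard entry at the lowest priority acts as the default.
TABLE = [
    (0b00000000, 0b11000000, "cluster_a"),  # keys 0..63
    (0b01000000, 0b11000000, "cluster_b"),  # keys 64..127
    (0b00000000, 0b00000000, "cluster_c"),  # everything else
]
```

In hardware all entries are compared in parallel and the highest-priority hit is returned, which is why a lookup of this kind can run at line rate. The loop above only models the result, not the timing.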
The work includes accuracy measurements for the algorithms, as well as run-time performance measurements and an analysis of the clustering algorithm on a Spectrum-3 switch. The measurements show that Clustreams obtains very high accuracy with negligible run-time overhead.