Abstract |
Optimization is a branch of applied mathematics concerned with optimizing an objective function over decision variables subject to a set of constraints. Data clustering, one of the major tasks in data analysis, partitions a given data set into several subsets according to a similarity or dissimilarity measure. In this talk, we review some recent exciting developments in the interaction between optimization and data clustering. We first show how the well-known K-means clustering problem, the most popular clustering model, can be cast equivalently as a so-called 0-1 semidefinite program (0-1 SDP). We then discuss how to design effective algorithms that find an exact or approximate solution to K-means clustering based on the 0-1 SDP model and its polynomially solvable convex relaxation. Second, we consider so-called ensemble clustering, a popular approach to obtaining a better clustering model. We describe how our exploration of ensemble clustering has inspired us to study sparse solutions in non-convex quadratic optimization, one of the research frontiers in the field of optimization. If time allows, we shall discuss how to solve the binary matrix factorization (BMF) problem via new clustering models and techniques, and outline several research challenges in optimization and big data. |
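As a rough illustration of the K-means and 0-1 SDP equivalence mentioned above, the sketch below assumes one commonly used form of the reformulation, which may differ in detail from the one presented in the talk: minimize Tr(W W^T (I - Z)) subject to Z e = e, Tr(Z) = k, Z >= 0 entrywise, Z symmetric, and Z^2 = Z, where the rows of W are the data points. The function names (`kmeans_objective`, `assignment_to_Z`) and the random test data are hypothetical and chosen only for this example. The code builds Z from a given cluster assignment and checks numerically that the trace objective matches the usual sum of squared distances to the centroids.

```python
import numpy as np


def kmeans_objective(W, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for j in range(k):
        members = W[labels == j]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def assignment_to_Z(labels, k):
    """Build Z = X (X^T X)^{-1} X^T from the 0-1 assignment matrix X (n x k).

    Assumes every cluster is non-empty, so X^T X (a diagonal matrix of
    cluster sizes) is invertible.
    """
    n = len(labels)
    X = np.zeros((n, k))
    X[np.arange(n), labels] = 1.0
    sizes = X.sum(axis=0)          # diagonal of X^T X
    return (X / sizes) @ X.T       # X diag(1/sizes) X^T


# Hypothetical data: 12 points in R^3, assigned round-robin to 3 clusters.
rng = np.random.default_rng(0)
n, k = 12, 3
W = rng.normal(size=(n, 3))
labels = np.arange(n) % k

Z = assignment_to_Z(labels, k)

# Objective of the assumed 0-1 SDP form: Tr(W W^T (I - Z)).
sdp_value = np.trace(W @ W.T @ (np.eye(n) - Z))
kmeans_value = kmeans_objective(W, labels, k)

print(np.isclose(sdp_value, kmeans_value))
```

In this form the only non-convex constraint is Z^2 = Z; replacing it with a positive semidefiniteness condition on Z is one way to arrive at a convex relaxation of the kind the abstract refers to.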