Wednesday, April 5, 2017 (2:30 p.m. in Yost 306)
Title: Data mining: application of clustering method
Speaker: Yu Deng (Case Western Reserve University)
Advisor: Patti Williamson (Senior Instructor, Case Western Reserve University)
Abstract: This project focuses on three clustering methods: K-means clustering, hierarchical clustering, and a genetic programming approach. We discuss each approach individually and examine how it behaves in different situations. We then turn to a real-world application, using these approaches to group houses in Cleveland and San Francisco by size and asking price. We compare how the approaches perform and determine which one is most suitable in this case.
We obtain the following results. K-means clustering is efficient and widely used, but it is sensitive to the choice of initial centroids and to outliers, and it is restricted in the types of clusters it can recover. Hierarchical clustering does not require a particular value of k to be chosen in advance, and it is well suited to representing taxonomies. However, hierarchical clustering is not well suited to problems with large datasets. The genetic programming approach is the most suitable for groups that follow a linear relationship.
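The sensitivity of K-means to its initial centroids can be illustrated with a minimal sketch. It is not taken from the talk: the four points, the function names kmeans and sq_dist, and both starting configurations are made up for illustration, and the algorithm is a plain Lloyd-style iteration rather than any particular library's implementation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style k-means; returns (labels, centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # Sum of squared distances from each point to its assigned centroid.
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: corners of a wide rectangle (not the housing data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one centroid on each side of the rectangle.
labels_a, _, sse_a = kmeans(points, [(0, 0), (4, 0)])
# Start B: both centroids on the left edge.
labels_b, _, sse_b = kmeans(points, [(0, 0), (0, 1)])

print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs converge, but start B settles into a top/bottom split whose within-cluster error is sixteen times larger than the left/right split found from start A. This is why K-means is commonly rerun from several initializations, keeping the result with the lowest error.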