Case Problem: Know Thy Customer
Know Thy Customer (KTC) is a financial consulting company that provides personalized financial advice to its clients. As a basis for developing this tailored advising, KTC would like to segment its customers into several representative groups based on key characteristics.
Peyton Avery, the director of KTC's fledgling analytics division, plans to establish the set of representative customer profiles based on 600 customer records in the file KnowThyCustomer. Each customer record contains data on age, gender, annual income, marital status, number of children, whether the customer has a car loan, and whether the customer has a home mortgage. KTC's market research staff has determined that these seven characteristics should form the basis of the customer clustering.
Peyton has invited a summer intern, Danny Riles, into her office so they can discuss how to proceed. As they review the data on the computer screen, Peyton's brow furrows as she realizes that this task may not be trivial. The data contains both categorical variables (Female, Married, Car, Mortgage) and interval variables (Age, Income, and Children).
Managerial Report
Playing the role of Peyton, you must write a report documenting the construction of the representative customer profiles. Because Peyton would like to use this report as a training reference for interns such as Danny, your report should experiment with several approaches and explain the strengths and weaknesses of each. In particular, your report should include the following analyses:
1. Using k-means clustering on all seven variables, experiment with different values of k. Recommend a value of k and describe these k clusters according to their average characteristics. Why might k-means clustering not be a good method to use for these seven variables?
2. Using hierarchical clustering on all seven variables, experiment with using complete linkage and group average linkage as the clustering method. Recommend a set of customer profiles (clusters). Describe these clusters according to their average characteristics. Why might hierarchical clustering not be a good method to use for these seven variables?
3. Apply a two-step clustering method:
a. Apply hierarchical clustering on the binary variables Female, Married, Car, and Mortgage to recommend a set of clusters. Use matching coefficients as the similarity measure and group average linkage as the clustering method.
b. Based on the clusters from part (a), split the original 600 observations into m separate data sets, where m is the number of clusters recommended from part (a). For each of these m data sets, apply 2-means clustering using Age, Income, and Children as variables. This will generate a total of 2m clusters. Describe these 2m clusters according to their average characteristics.
What benefit does this two-step clustering approach have over the approaches in parts (1) and (2)? What weakness does it have?
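The two-step method in part (3) can be sketched in Python as follows. This is only an illustration under several assumptions, not a prescribed solution: the data are randomly generated stand-ins for the KnowThyCustomer file (which is not reproduced here), the helper name two_step and the choice m = 4 are hypothetical, and the interval variables are standardized within each group before 2-means so that Income does not dominate the distance (a design choice the case does not specify). For 0/1 variables, SciPy's Hamming distance equals one minus the matching coefficient, so it is used as the dissimilarity in step (a). The seed argument of kmeans2 assumes SciPy 1.7 or later.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

# Synthetic stand-in for the 600 customer records (all values are made up).
rng = np.random.default_rng(0)
n = 600
binary = rng.integers(0, 2, size=(n, 4))        # Female, Married, Car, Mortgage
age = rng.integers(18, 70, size=n)
income = rng.normal(30000, 10000, size=n).clip(5000)
children = rng.integers(0, 4, size=n)
numeric = np.column_stack([age, income, children]).astype(float)


def two_step(binary, numeric, m, seed=0):
    """Hypothetical helper: hierarchical step on binary data, then 2-means."""
    # Step (a): group average linkage; Hamming distance = 1 - matching coefficient.
    Z = linkage(binary, method="average", metric="hamming")
    step1 = fcluster(Z, t=m, criterion="maxclust")  # at most m clusters, labels from 1

    # Step (b): 2-means on Age, Income, Children within each step-(a) cluster.
    final = np.zeros(len(binary), dtype=int)
    for g in np.unique(step1):
        idx = np.flatnonzero(step1 == g)
        sub = np.zeros(len(idx), dtype=int)
        if len(idx) >= 2:
            x = numeric[idx]
            std = x.std(axis=0)
            std[std == 0] = 1.0                   # guard against constant columns
            z = (x - x.mean(axis=0)) / std        # assumed: z-scores within the group
            _, sub = kmeans2(z, 2, minit="++", seed=seed)
        final[idx] = (g - 1) * 2 + sub            # encode (step-a cluster, sub-cluster)
    return step1, final


step1, final = two_step(binary, numeric, m=4)

# Describe each final cluster by its size and average characteristics.
for c in np.unique(final):
    members = final == c
    print(c, members.sum(),
          binary[members].mean(axis=0).round(2),
          numeric[members].mean(axis=0).round(1))
```

Because fcluster with the maxclust criterion returns at most m clusters, the sketch may produce fewer than 2m final clusters when the dendrogram has ties. With the real data, the value of m would come from inspecting the dendrogram in part (a) rather than being fixed in advance.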