    IMB 621

    Kiran R, Doctoral Student, Indian Institute of Management Lucknow, Arunabha Mukhopadhyay, Associate Professor, Indian Institute of

    Management Lucknow, and U. Dinesh Kumar, Professor of DSIS, Indian Institute of Management Bangalore prepared this case for class

    discussion. This case is not intended to serve as an endorsement, source of primary data, or to show effective or inefficient handling of decision

    or business processes.

    Copyright © 2017 by the Indian Institute of Management Bangalore. No part of the publication may be reproduced or transmitted in any form or

    by any means – electronic, mechanical, photocopying, recording, or otherwise (including internet) – without the permission of Indian Institute of

    Management Bangalore.

    MACHINE LEARNING ALGORITHMS TO DRIVE CRM

    IN THE ONLINE E-COMMERCE SITE AT VMWARE

    KIRAN R, ARUNABHA MUKHOPADHYAY AND U DINESH KUMAR

    For the exclusive use of M. Abouzahra, 2019.

    This document is authorized for use only by Mohamed Abouzahra in 2019.

    On February 25, 2016, at the VMWare (VMW) office in Silicon Valley, on the sprawling 100+ acre green
    campus in Palo Alto, California, next to Stanford University, winter had just ended and the weather was
    as warm as it could possibly be in February. In his office cabin in building Hilltop E, Michael Butler,
    the global head of the store business of VMW, was in discussion with Parag Girish Chitalia, the global
    leader for advanced analytics and data sciences. Michael and Parag were discussing how to drive more
    revenue from the Workstation business in the VMW store. The VMW store was the online portal of VMW
    (store.vmware.com), where end-customers could purchase certain VMW products such as Fusion and
    Workstation online. The store was similar to any e-commerce site, with a home page, category pages,
    product detail pages, add-to-cart pages, a checkout page, and an order confirmation page. Fusion helped
    end-customers and businesses run Windows on top of Mac machines, whereas Workstation helped customers
    run Mac on top of Windows machines. Since many customers wanted both Windows and Mac operating systems
    on their computers, the VMW store received many visitors. Data on customers’ usage of the VMW store was
    collected to understand consumer behavior. With rich behavioral data from the VMW website, Michael
    Butler was keen to see how the data sciences and analytics team could be leveraged to drive further
    Workstation sales, as it was a key product in a competitive business environment.

    ABOUT VMWare

    VMware (VMW) is a software company headquartered in Palo Alto that reported revenues of USD 6.57
    billion in 2015, up 9% from 2014. VMW has been one of the most profitable software companies in
    history, with GAAP net income of approximately USD 1 billion in 2015. Cash flows were healthy as well,
    with free cash flow of USD 1.56 billion generated in 2015 (Exhibit 1). Founded in 1998 by Stanford
    Professors Diane Greene and Mendel Rosenblum, the company was headed by Pat Gelsinger in 2016 and had
    more than 18,000 employees worldwide. VMW has been the industry leader in the virtualization business,
    with more than 80% market share. Virtualization uses software to abstract hardware: for example, the
    same central processing unit (CPU) can be shared by multiple users through the VMW software.
    Virtualization brings great cost savings to the IT departments of companies, and VMW has led this space
    by a distance, with market share several times that of its nearest competitors. VMW garnered its
    revenues from three streams: software-defined data center (vSphere for computing virtualization, NSX
    for software-defined networking and security, VSAN for storage virtualization), end-user computing
    (Airwatch for mobile computing, Horizon for enterprise desktop, Fusion, and Workstation), and cloud
    (Private cloud vCloud Air).

    Michael Butler was in charge of the store business (Exhibit 13), which was powered by the Fusion and
    Workstation products. Parag had joined VMW in 2014 to set up the advanced analytics and data sciences
    team, called the Analytics Community of Excellence, under the Information Innovation Center/Enterprise
    Information Management organization. The team comprised data scientists and analysts hired from premier
    institutes in India, such as the Indian Institutes of Technology (IITs) and Indian Institutes of
    Management (IIMs), and from around the world, such as Georgia Tech and Stanford. Ravi Kondapalli was
    the lead data scientist in the data sciences innovations team powering Parag’s team. Ravi, an NIT
    Warangal graduate with master’s degrees from both Georgia Tech and IIM Bangalore, had more than 15
    years of experience in the industry.

    Driving Higher Workstation Revenues from the Store

    The primary objective of the meeting between Parag and Michael was to discuss how Parag’s newly

    formed data sciences group could assist in increasing store revenues with focus on key products starting

    with Workstation. Michael started the meeting by saying:

    Workstation forms the bulk of the purchases for our online store/e-commerce business, for
    which we have both individual consumers and businesses as our customers. Growing
    revenues this year will be a challenge as there is no new version of Workstation planned.
    In a software business, renewals via upgrade to the latest software version form a major
    portion of the revenue, and this year will be a challenge. I would like to understand how
    we can leverage data sciences and advanced analytics to target new Workstation
    customers, up-sell to existing customers, and cross-sell to customers that do not have
    Workstation.

    Parag shared some macro-level data on Workstation sales that Ravi, his lead data scientist in the data
    sciences innovations team, had compiled. Workstation revenues had doubled in the last 8 years
    (Exhibit 2) and formed a significant portion of the overall store bookings (Exhibit 3). Different
    versions of Workstation had been launched over the years. Workstation 6 was launched in 2007, and the
    latest versions of the product were Workstation 12 and Workstation 12 Player. A significant portion of
    VMW Workstation customers upgraded to higher versions of Workstation. In Exhibit 4, each cell x_ij in
    the table denotes the number of customers that upgraded from the Workstation version in row i to the
    Workstation version in column j. There was an opportunity in the sense that a large part of the
    customer base had not yet upgraded to the latest versions of Workstation. The store was visited by
    approximately 7 million visitors annually, of whom approximately 2 million viewed some page related to
    Workstation products. However, only around 1.6 million visitors out of the 7 million were identifiable
    with an e-mail id (Exhibit 5). The visitor data contained rich clickstream/digital data that was housed
    in a Hadoop big data environment, which the analytics team leveraged continually for its analysis.
    Apart from this visitor behavior, all previous purchases (if any) by the e-mail ids were stored in the
    Greenplum data warehouse. Greenplum is a massively parallel database owned by Pivotal that competes
    with Teradata, Oracle, and other data warehouses. Online–offline integration for the de-anonymized
    visitors was possible with “e-mail id” as the common inter-linking key.

    Parag made the following key points to lay the ground for a discussion on the analytics engagement.

    • Workstation was going to be an important driver of the overall store revenues.
    • There was untapped opportunity in the form of older Workstation customers who had not yet
      upgraded to the latest version of Workstation, presenting an opportunity for up-sell.
    • A large number of visitors to the online store had bought other store products, presenting an
      opportunity for cross-sell.
    • The data sciences and analytics team had access to rich information about customers and
      potential customers, including their digital footprint (online) and their purchase history
      (offline).

    Being a sales leader, Michael liked the key points. He got straight to the point:

    We definitely have a great Customer Relationship Management (CRM) opportunity here

    in the form of up-sell, cross-sell and targeting. These present multiple challenges that I

    and my management team will go into in detail. For example: While I can drive

    incremental sales with coupons, I would want to give the coupons only to those customers

    that are most likely to buy and not indiscriminately to all.

    Can your team provide me with the list of the email ids most likely to purchase our latest

    products Workstation 12 or Workstation 12 Player in the next 3 months so that my team

    can target these email ids?

    Parag immediately proposed a propensity model as a quick win. A propensity model rank-orders e-mail
    ids (customers) in decreasing order of their likelihood to purchase. His advanced analytics team,
    powered by the data sciences innovations team, had delivered great results in the past using such
    models. The propensity model could leverage the online and offline data for the e-mail ids and
    rank-order them using machine learning techniques.
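    The rank-ordering idea can be illustrated with a minimal Python sketch. The e-mail ids and scores
    below are made up for illustration, and the function name is hypothetical; the case does not
    describe the team's actual code.

```python
from typing import Dict, List, Tuple


def rank_by_propensity(scores: Dict[str, float], top_n: int) -> List[Tuple[str, float]]:
    """Return the top_n (email_id, score) pairs, highest propensity first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical model scores (probability-like values between 0 and 1).
example_scores = {
    "a@example.com": 0.12,
    "b@example.com": 0.81,
    "c@example.com": 0.47,
    "d@example.com": 0.05,
}

target_list = rank_by_propensity(example_scores, top_n=2)
print(target_list)
```

    The marketing team would then act on the head of this list rather than on every known e-mail id.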

    Michael said:

    That’s awesome Parag! I only believe things that cause an increase in my sales! If you

    can create such a list, I will be happy to execute via one of the digital marketing channels

    (email with coupon, re-targeting on other websites, social targeting) or by targeting on

    our website.

    I will believe your list only when the cash machine rings up Workstation sales and when I

    can measure the upside scientifically.

    Michael was a technology geek and would only believe things once they were scientifically proven. Parag

    said he would get back to Michael with a propensity scored list within a couple of weeks. The presence of

    Ravi in the team gave Parag the confidence to suggest two weeks.

    PROPENSITY MODEL DEVELOPMENT

    Through e-mail, Parag asked Ravi, the lead data scientist, to be ready with what it would take to
    build a propensity model and to brainstorm on the overall data sciences plan that was to be
    presented to Michael. They had a detailed telephone conversation the next day.

    Ravi set the baseline for the discussion:

    This is an example of a binary classification problem, where the visitor either buys or

    does not buy Workstation. The target variable will be if a visitor who visited the site buys

    Workstation in the next few months. The value of that target can be either 0 or 1, making

    it a classical binary classification problem.
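    As a rough illustration of the 0/1 target Ravi describes, the sketch below flags each known e-mail
    id according to whether it bought Workstation inside an outcome window. The record layout
    (e-mail id, product, purchase date), the sample data, and the function name are assumptions made
    for this example only.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# (email_id, product, purchase_date): a hypothetical record layout for illustration.
Purchase = Tuple[str, str, date]


def build_target(
    visitor_emails: Iterable[str],
    purchases: Iterable[Purchase],
    window_start: date,
    window_end: date,
    product: str = "Workstation",
) -> Dict[str, int]:
    """Map each e-mail id to 1 if it bought `product` inside the window, else 0."""
    buyers = {
        email
        for email, bought, when in purchases
        if bought == product and window_start <= when <= window_end
    }
    return {email: int(email in buyers) for email in visitor_emails}


visitors = ["a@example.com", "b@example.com", "c@example.com"]
purchases = [
    ("a@example.com", "Workstation", date(2016, 5, 10)),  # inside the window
    ("b@example.com", "Fusion", date(2016, 4, 2)),         # different product
    ("c@example.com", "Workstation", date(2016, 7, 1)),    # after the window
]

target = build_target(visitors, purchases, date(2016, 4, 1), date(2016, 6, 30))
print(target)
```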

    Ravi went through a deck that highlighted the following challenges.

    • What should be the entity on which we build the propensity model? As Exhibit 5 shows, only
      about 1.6 million out of about 7 million visitors had an e-mail id.
    • We should decide on the sampling strategy. Should we use random sampling, time-based
      sampling, or stratified random sampling?
    • What data sciences and machine learning techniques should we try out in this instance?
    • What cross-validation or training-validation technique should we use in order to have an
      estimate of how the model would perform in the real world?

    Ravi’s recommendations to Parag were as follows:

    • Given the need for a quick win, we should model at the e-mail id level for the first cut. Longer
      term, we have to think of analytical approaches to target those without an e-mail id.
    • There is no single right way to perform cross-validation. In this instance, we should do
      time-based cross-validation. In this method, we simulate the real world by aggregating data up
      to a period and then predicting for the next period.
      o For example: Say we need to predict who will buy during April–June 2016. In this instance:
        - For training, we could aggregate data up to September 2015 and predict the Workstation
          buyers during October–December 2015.
        - For validation, we could aggregate data up to December 2015 and compare the predictions
          against actual Workstation buyers during January–March 2016.
        - For scoring, we could aggregate the data up to March 2016.
    • We could try any two-class classifier such as Naïve Bayes, Logistic Regression, or Decision
      Tree, or ensemble methods such as Random Forest and Gradient Boosting. We could compare the
      lift curves of different models to see which one works best.
    • We could use the lift numbers on the validation set to obtain an estimate of real-world
      performance.
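    The training, validation, and scoring windows in Ravi’s example can be written down explicitly.
    The sketch below is only an illustration: the Window class and make_window helper are hypothetical
    names, and it assumes each feature cutoff falls on a month end and each outcome period lasts three
    calendar months.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Window:
    name: str
    feature_cutoff: date  # aggregate features from data up to and including this date
    outcome_start: date   # first day of the period whose buyers are predicted
    outcome_end: date     # last day of that period


def make_window(name: str, feature_cutoff: date, outcome_months: int = 3) -> Window:
    """Build a window whose outcome period starts the day after the feature cutoff.

    Assumes feature_cutoff is a month end, so the outcome period covers whole months.
    """
    start = feature_cutoff + timedelta(days=1)
    month_index = (start.month - 1) + (outcome_months - 1)
    end_year = start.year + month_index // 12
    end_month = month_index % 12 + 1
    end_day = calendar.monthrange(end_year, end_month)[1]
    return Window(name, feature_cutoff, start, date(end_year, end_month, end_day))


windows = [
    make_window("train", date(2015, 9, 30)),      # predict October-December 2015
    make_window("validate", date(2015, 12, 31)),  # compare with January-March 2016
    make_window("score", date(2016, 3, 31)),      # predict April-June 2016
]

for w in windows:
    # Features never look past the cutoff, so no outcome information leaks into them.
    assert w.feature_cutoff < w.outcome_start
    print(w.name, w.feature_cutoff, w.outcome_start, w.outcome_end)
```

    Keeping the feature cutoff strictly before the outcome period is what lets the validation lift
    stand in for real-world performance.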

    Ravi further explained the time-based cross-validation using the following conceptual diagram.

    Ravi said he could build the model in a couple of weeks.

    DATA DESCRIPTION

    In order to build a detailed propensity model, Ravi collected data from 2008 to 2016. A stratified sample

    of 100,000 de-anonymized customers was used (provided in a separate spreadsheet). He aggregated data

    at an e-mail id level to come up with a set of features across online and offline (Exhibit 6), which could

    be used for model building. Sample training data is shown in Exhibit 7 with the variable names in

    Exhibit 8.

    DATA ANALYSIS

    To understand which features were important, Ravi’s team examined odds ratios of the target variable

    against each of the features. Odds ratio is explained in Exhibit 9. The key findings are shown in

    Exhibit 10. Odds ratio greater than 1 indicates that the feature is favorable towards purchase and odds

    ratio less than 1 indicates the opposite. A higher odds ratio would indicate a higher degree of favorability.
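    A small worked example may help make the odds ratio concrete. The sketch below follows the
    definition in Exhibit 9 (odds ratio = (d/c)/(b/a)) for a binary feature and a binary target; the
    counts are invented for illustration and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def odds_ratio(rows: Iterable[Tuple[int, int]]) -> float:
    """Compute the odds ratio from (feature, target) pairs, each coded 0 or 1.

    Cell naming follows Exhibit 9:
      a = feature 0, target 0    b = feature 0, target 1
      c = feature 1, target 0    d = feature 1, target 1
    """
    a = b = c = d = 0
    for feature, target in rows:
        if feature == 0 and target == 0:
            a += 1
        elif feature == 0 and target == 1:
            b += 1
        elif feature == 1 and target == 0:
            c += 1
        elif feature == 1 and target == 1:
            d += 1
        else:
            raise ValueError("feature and target must be coded 0 or 1")
    if a == 0 or b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a, b, or c is zero")
    return (d / c) / (b / a)


# Invented counts: a=80, b=20, c=30, d=70.
sample_rows = [(0, 0)] * 80 + [(0, 1)] * 20 + [(1, 0)] * 30 + [(1, 1)] * 70
ratio = odds_ratio(sample_rows)
print(round(ratio, 2))  # (70/30) / (20/80)
```

    Here the odds of purchase are 70/30 when the feature is present and 20/80 when it is absent, so
    the ratio is well above 1 and the feature would count as favorable towards purchase.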

    OBJECTIVES

    The final objective was to leverage data sciences and analytics for targeting, up-sell, and
    cross-sell to customers in the online store, thereby increasing customer value. The immediate need
    was a propensity-to-buy model that would yield the set of top customers that Michael and his team
    should target.

    At this point, Ravi had the following questions in mind.

    • What feature selection techniques could he use?
    • If he were to use the standard techniques – logistic regression or decision tree and any one
      advanced technique (random forest or neural network or support vector machine or gradient
      boosting…), how would the lift curves appear?
    • Based on the lift curve, how should he communicate the potential opportunity from the model to
      Michael?
    • Could there be incremental lift or other approaches that he could adopt – for example,
      clustering before classification?
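    One way to tabulate a lift curve from a scored validation set is sketched below. This is an
    illustration under simple assumptions (equal-sized bins, ties broken by input order), with made-up
    scores and outcomes; it is not the team's actual procedure.

```python
from typing import List, Sequence, Tuple


def cumulative_lift(
    scores: Sequence[float], actuals: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """Return (fraction_targeted, cumulative_lift) after each bin, best scores first.

    Cumulative lift = purchase rate among the top-scored customers targeted so far,
    divided by the purchase rate of the whole population.
    """
    n = len(scores)
    if n != len(actuals) or n < n_bins:
        raise ValueError("need equal-length inputs with at least n_bins rows")
    total_buyers = sum(actuals)
    if total_buyers == 0:
        raise ValueError("lift is undefined when there are no buyers")
    overall_rate = total_buyers / n
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    for k in range(1, n_bins + 1):
        cutoff = round(k * n / n_bins)
        captured = sum(actuals[i] for i in order[:cutoff])
        curve.append((cutoff / n, (captured / cutoff) / overall_rate))
    return curve


# Made-up validation data: ten customers, three of whom bought.
val_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
val_actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_curve = cumulative_lift(val_scores, val_actuals, n_bins=5)
for fraction, lift in lift_curve:
    print(f"top {fraction:.0%}: lift {lift:.2f}")
```

    A statement such as "targeting the top 20% of the list captures buyers at about three times the
    average rate" is the kind of message that can be read directly off such a table.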

    Having built several propensity models at VMW, Ravi knew that sales teams liked Whitebox models.

    Whitebox models are models whose workings can be explained to the sales teams. For example:

    Customer X is more likely to upgrade if the support for the older version is coming to an end OR if a

    compelling newer version is being launched. Sales leaders are not comfortable with just getting a list that

    works. They also want to know why the list worked. The question on Ravi’s mind was also how best to

    explain the characteristics of a Workstation buyer to the business.

    At the same point, Parag had a further list of questions to discuss with Ravi once the model was fully

    built.

    • How should Parag and Ravi arrive at the number of e-mail ids that Michael should target?
      o Remember that the e-mails were to be sent with a coupon. Sending too many could impact the
        margins.
      o Should this list be different for different marketing channels?
    • How do we interpret the results for business decision making?
    • While lift is an analytics or internal validation measure, what marketing intervention should he
      suggest to Michael so that there can be a scientific measurement of the return on investment to
      the store business from the exercise?
      o Can we conduct some form of Control–Test experiment to quantify the upside? If yes, how
        should the experiment be set up?
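    One possible shape for such a Control–Test experiment is sketched below: a random share of the
    top-scored e-mail ids is held out as a control group that receives no coupon, and the purchase
    rates of the two groups are compared afterwards. The 20% control share, the sample counts, and the
    function names are assumptions for illustration; the z-score uses a standard pooled
    two-proportion test.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def split_test_control(
    email_ids: Sequence[str], control_share: float = 0.2, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Randomly split the target list into (test, control) groups."""
    shuffled = list(email_ids)
    random.Random(seed).shuffle(shuffled)
    n_control = int(round(len(shuffled) * control_share))
    return shuffled[n_control:], shuffled[:n_control]


def measure_uplift(
    test_buyers: int, test_size: int, control_buyers: int, control_size: int
) -> Dict[str, float]:
    """Compare purchase rates and estimate the buyers attributable to the campaign."""
    test_rate = test_buyers / test_size
    control_rate = control_buyers / control_size
    pooled = (test_buyers + control_buyers) / (test_size + control_size)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    z_score = (test_rate - control_rate) / std_err if std_err > 0 else 0.0
    return {
        "test_rate": test_rate,
        "control_rate": control_rate,
        "incremental_buyers": (test_rate - control_rate) * test_size,
        "z_score": z_score,
    }


# Hypothetical campaign: 10,000 top-scored ids, 20% held out as control.
ids = [f"user{i}@example.com" for i in range(10_000)]
test_group, control_group = split_test_control(ids)

# Invented outcome counts after the campaign period.
result = measure_uplift(
    test_buyers=400, test_size=len(test_group),
    control_buyers=60, control_size=len(control_group),
)
print(result)
```

    Because both groups come from the same scored list, the difference in purchase rates isolates the
    effect of the coupon itself rather than the effect of having picked likely buyers.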

    Parag was also thinking about how he should set up an executive deck to summarize the results and

    measurement plan to Michael. At the same time, he was wondering about the overall value proposition

    that he could drive for the VMW store using analytics and data sciences.

    Exhibit 1

    VMW Financials

    Source: VMW Publicly available annual report:
    http://d1lge852tjjqow.cloudfront.net/CIK-0001124610/67b316e9-d82e-4848-ade6-e046775865be.pdf

    VMW Q4 2015 Earnings Call: http://s2.q4cdn.com/112802898/files/doc_financials/2015/q4/Q4-15_earnings_w_tables_final.pdf

    Exhibit 2

    Workstation Revenues over the Years [1]

    Source: Bookings Data (masked)

    [1] All numbers are directional and for illustration purposes only. The data shared is masked and only
    illustrative of real data. This has been done to maintain confidentiality.

    Exhibit 3

    Workstation as a Proportion of Store Bookings* (masked)

    Source: Bookings Data (masked)

    Exhibit 4

    Cross-Sell Behavior of Workstation (masked)

    Source: Bookings Data (masked)

    Rows: Workstation version upgraded from (i). Columns: Workstation version upgraded to (j).

    From \ To               6       7       8       9       10      11      12      12 Player
    Workstation 6           97593   10842   6604    5213    4420    2602    2179    109
    Workstation 7                   97431   24005   19858   15939   9376    8050    293
    Workstation 8                           67588   24326   21903   12319   10408   311
    Workstation 9                                   65935   23648   15326   11683   373
    Workstation 10                                          68998   18294   16665   508
    Workstation 11                                                  45851   13535   485
    Workstation 12                                                          41650   623
    Workstation 12 Player                                                           5139
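    The up-sell opportunity in Exhibit 4 can be sized with a short calculation. The sketch below rests
    on an assumption the case does not state explicitly: that the diagonal cell of each row is the
    number of customers who bought that version, and that the Workstation 12 cell counts those among
    them who later bought Workstation 12. Under that reading, the remainder has not yet moved to
    Workstation 12. The variable names are illustrative.

```python
# Exhibit 4 (masked): for each starting version, (customers on that version,
# customers who went on to Workstation 12). Diagonal-as-base is an assumption.
exhibit_4 = {
    "Workstation 6": (97593, 2179),
    "Workstation 7": (97431, 8050),
    "Workstation 8": (67588, 10408),
    "Workstation 9": (65935, 11683),
    "Workstation 10": (68998, 16665),
    "Workstation 11": (45851, 13535),
}

upgrade_share = {}
not_yet_upgraded = {}
for version, (base, moved_to_12) in exhibit_4.items():
    upgrade_share[version] = moved_to_12 / base
    not_yet_upgraded[version] = base - moved_to_12

for version in exhibit_4:
    print(
        f"{version}: {upgrade_share[version]:.1%} moved to Workstation 12, "
        f"{not_yet_upgraded[version]} not yet"
    )
```

    Under this reading, the share that moved to Workstation 12 rises steadily with the recency of the
    starting version, and most customers of every older version remain candidates for an upgrade
    offer.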

    Exhibit 5

    De-anonymized/Anonymized Store Visitor Funnel (masked)

    Source: Bookings Data (masked)

    17 MM (# of Visitors to VMW Store)
    5 MM (# of Visitors to Workstation in Store)
    ~4 MM (store.vmware.com visitors with email id)
    ~1.7 MM (Unique emails of Personal Desktop Buyers)
    ~500K (Unique emails of Workstation buyers)

    Exhibit 6

    List of Feature Buckets

    Digital & Non-Digital Feature Engineering (Offline + Online)

    Digital Data

    Metrics for the Dimension    Inputs
    Store Products               Workstation, Fusion, vSphere, vCenter, vSOM, Horizon, vRealize
    Event Wise                   Activation, Download, Registration, Page Views, Cart Add/Remove/View,
                                 Checkout, Purchase, Form Success, Form Abandon, Buy Now etc.
    Referrer Type                Internal, Paid Search, Email, Social Network, Search Engines etc.
    Search Engine Wise           Google, Bing, Yahoo, MSN, YOL etc.
    OS/Browser Wise              OS like Android, iOS, Linux, Mobile iOS, OS X, Windows OS, Windows Mobile
                                 and Browser like Apple, Blackberry, Google, Dolphin, Microsoft, AOL etc.
    Marketing Channel            Paid/Organic Vehicle Data
    De-anonymization Features    DemandBase Data, IDM Data

    Non-Digital Data

    Revenue History
    Responses/Campaign Features
    Marketing Channel Share
    Other Products Bought

    Source: Bookings Data (masked)

    Exhibit 7

    Training Data

    Data with 100,000 rows can be downloaded from the following link:

    http://hrm.iimb.ernet.in/iimb/download/IMB_621.htm

    Source: Bookings Data (masked)

    Exhibit 8

    Sample Feature Names

    Variable                                          Meaning
    Train_period_workstation_purchase_flag            Outcome variable (whether the customer purchased Workstation (coded as 1) or not (coded as 0))
    fswk_booking_pct                                  Share of Fusion and Workstation bookings
    total_bookings_amount                             Total bookings from this customer
    personal_desktop_booking_pct                      Share of Personal Desktop Bookings
    tot_windows_visits                                Total no. of visits to vmware.com webpage from Windows OS
    days_since_first_personal_desktop_purchase_date   Length of Relationship with VMW w.r.t Personal Desktop products
    ftr_growth_personal_desktop_13_14                 Growth in 'Personal Desktop' product bookings from 2013 to 2014
    num_orders                                        Total no. of lifetime orders this customer placed with VMW
    num_order_lines                                   Total no. of lifetime order lines this customer placed with VMW
    ftr_growth_personal_desktop_14_15                 Growth in 'Personal Desktop' product bookings from 2014 to 2015
    idm_total_no_of_day_visits_to                     Total no. of visits to MyVMware Portal (required for customers to interact with VMWare support)
    ftr_growth_personal_desktop_12_13                 Growth in 'Personal Desktop' product bookings from 2012 to 2013
    tot_osx_visits                                    Total no. of visits to vmware.com webpage from OSX OS
    tot_apple_browser_visits                          Total no. of visits to vmware.com webpage from Apple Safari Browser
    idm_no_of_day_visits_to_home_page                 Total no. of visits to MyVMware Portal Home page
    tot_microsoft_browser_visits                      Total no. of visits to vmware.com webpage from Microsoft Internet Explorer Browser
    tot_store_page_views                              Total no. of views to VMW Store Page
    idm_no_of_day_visits_to_download_page             Total no. of visits to MyVMware Portal Download Page
    tot_page_views                                    Total vmware.com page views
    tot_first_touch_direct_views                      Total no. of page views by marketing channel
    idm_no_of_day_visits_to_info_page                 Total no. of visits to MyVMware Portal Info Page
    idm_no_of_day_visits_to_license_page              Total no. of visits to MyVMware Portal License Page
    tot_first_touch_natural_search_views              Total no. of page views by marketing channel
    gu_num_of_employees                               Total no. of employees in the customer company as per DNB data
    tot_google_browser_visits                         Total no. of visits to vmware.com webpage from Google Chrome Browser
    idm_no_of_day_visits_to_eval_page                 Total no. of visits to MyVMware Portal Eval Page
    tot_visits                                        Total vmware.com page visits
    purchase_events                                   Total vmware.com purchase events
    tot_mozilla_browser_visits                        Total no. of visits to vmware.com webpage from Mozilla Firefox Browser
    tot_last_touch_direct_views                       Total no. of page views by marketing channel
    tot_first_touch_internal_views                    Total no. of page views by marketing channel
    tot_page_views_l90d                               Total vmware.com page views in last 90 days
    ftr_growth_vsom_14_15                             Growth in 'vSOM' Bookings from 2014 to 2015
    tot_last_touch_natural_search_views               Total no. of page views by marketing channel
    num_any_campaign_responses                        No. of responses from this customer for all VMW campaigns
    tot_last_touch_internal_views                     Total no. of page views by marketing channel
    tot_visits_l90d                                   Total vmware.com visits in last 90 days
    ftr_growth_enterprise_desktop_13_14               Growth in 'Enterprise Desktop' product bookings from 2013 to 2014

    Source: Data Analysis

    Exhibit 9

    Odds Ratio Explanation

                     Target = 0    Target = 1
    Feature = 0          a             b
    Feature = 1          c             d

    Odds for feature = 1 is defined as d/c

    Odds for feature = 0 is defined as b/a

    Odds ratio = (d/c)/(b/a) = da/bc

    Source: Data Analysis

    Exhibit 10

    Sample Averages of Features versus Target Variable

    Source: Data Analysis

    Exhibit 11

    Purchasers of Workstation as of End of Each Quarter from 2013 (data masked)

    Quarter No. of Workstation Buyers

    13Q1 2784

    13Q2 2300

    13Q3 3020

    13Q4 4198

    14Q1 2480

    14Q2 2530

    14Q3 1878

    14Q4 3808

    15Q1 2988

    15Q2 2582

    15Q3 3370

    15Q4 4164

    16Q1 2726

    16Q2 2264

    16Q3 2340

    16Q4 1194

    Source: Bookings Data (masked)

    Exhibit 12

    Sample Purchase Paths on E-commerce

    Home Page → Product Detail Page → Cart → Checkout → Purchase

    Source: VMWare

    Exhibit 13

    About the VMW Store

    The store sells many products of which Fusion and Workstation are key to helping run Windows

    on Mac and Mac on Windows, respectively. It is an e-commerce site in the truest sense and is

    frequented for purchases both by consumers and businesses. The link to the store is provided

    here: http://store.vmware.com/store/vmware/en_US/home

    The store is a collection of pages. A sample purchase path for a user is indicated in Exhibit 12.

    This is by no means the only path and there could be several paths but is shown to indicate how

    the visitors purchase on the site.

    Source: VMWare
