Thursday, February 18, 2010

Segmentation and building the database

For the last week and a half we have been talking about putting the marketing database together. We take internal information, external information and modelled data and create a database that really works for marketing applications. Internal information is typically transaction oriented and includes what the customer has purchased and when. But transaction information tells only part of the story. We might not even have the customer's name if the product was shipped to someone other than the purchaser (B2B), or we might lack information about the household if just one person in the house buys our product (B2C). We need to enhance the database to get value from it, and we talked about buying data from an outside source to supplement our internal data. Typically, we send our raw data files to an outside source, which runs a merge/purge, cleanses our data and appends the new information to our records. We can then begin modelling our data to identify our best customers. We talked about RFM (Recency, Frequency and Monetary Value) as a way of segmenting customers, but the RFM approach is only a starting point. Marriott uses a CAP model, which includes capacity and propensity to purchase. The types of segmentation discussed in the book are a good starting point for data analysis, but more sophisticated companies eventually develop their own proprietary segmentation schemes.
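To make the RFM idea concrete, here is a minimal sketch in Python of one common way to score customers: rank everyone on recency, frequency and monetary value, then cut each ranking into five bands (5 = best). The function names, the five-band scheme and the sample transactions are all assumptions made up for illustration; they are not from the book or from any real customer file, and ties are broken arbitrarily.

```python
from datetime import date


def quintile_scores(values, higher_is_better=True):
    """Rank the values and assign each a score from 1 to 5 (5 = best)."""
    n = len(values)
    # Worst value first, so that the best value ends up with the top score.
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n
    return scores


def rfm_scores(transactions, as_of):
    """transactions: iterable of (customer_id, purchase_date, amount)."""
    summary = {}
    for cust, when, amount in transactions:
        s = summary.setdefault(cust, {"last": when, "count": 0, "total": 0.0})
        s["last"] = max(s["last"], when)   # most recent purchase
        s["count"] += 1                    # number of purchases
        s["total"] += amount               # total spend
    ids = sorted(summary)
    recency = [(as_of - summary[c]["last"]).days for c in ids]
    frequency = [summary[c]["count"] for c in ids]
    monetary = [summary[c]["total"] for c in ids]
    # Fewer days since the last purchase is better, so recency is reversed.
    r = quintile_scores(recency, higher_is_better=False)
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}


if __name__ == "__main__":
    # Hypothetical transactions, invented purely for this example.
    sample = [
        ("cust_1", date(2010, 2, 1), 250.0),
        ("cust_1", date(2009, 12, 15), 120.0),
        ("cust_2", date(2009, 8, 3), 40.0),
        ("cust_3", date(2010, 1, 20), 900.0),
    ]
    for cust, score in rfm_scores(sample, as_of=date(2010, 2, 18)).items():
        print(cust, score)
```

A customer scoring (5, 5, 5) bought recently, buys often and spends a lot. As noted above, this is only a starting point: it says nothing about capacity or propensity to purchase.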

Wednesday, February 3, 2010

Database Marketing Gets Serious!

The students successfully completed the first assignment with descriptive statistics, cross tabs and chi-square analyses. They learned that categorical data puts observations in groups (think cross tabs) and that metric or continuous data is better suited to other types of statistical analysis such as t-tests, correlation and regression. In marketing, we like variation, and the null hypothesis is usually what we don't want to happen: no difference in counts for the chi-square, no difference in means for a t-test and no linear relationship for correlation and regression. We look for a p-value under .05 as the significance level. We had fun looking at live data and used the President's approval rating as the dependent variable in a simple regression. There was a regression relationship (the p-value for the F statistic is < .05), and 83% of the variance in approval rating (adjusted R squared) is explained by ONE significant variable (p < .05), the unemployment rate. Interpreting the regression, the President's approval rating drops by about 6.6 percentage points for every one percentage point increase in the unemployment rate. The class is now working on creating a model to predict performance in our IM classes. They are looking to create a parsimonious model, meaning as few variables as possible with the most explanatory power, to test a particular hypothesis.
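For anyone who wants to see what the simple regression is doing under the hood, here is a minimal sketch in Python that fits a one-variable least-squares line and reports the slope, intercept, R squared and adjusted R squared. The function name and the numbers in the demo are assumptions made up for illustration; they are not the data we used in class, so the output will not match the 83% or the 6.6 figure. It also does not compute p-values, which need the F and t distributions from a statistics package.

```python
def simple_regression(x, y):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)      # total variation in y

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Variation in y that the fitted line leaves unexplained.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    # Adjusted R squared penalises extra predictors; here there is one (k = 1).
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
    }


if __name__ == "__main__":
    # Hypothetical monthly figures, invented purely for this example.
    unemployment_rate = [7.6, 8.1, 8.5, 8.9, 9.4, 9.5, 9.8, 10.1]
    approval_rating = [64.0, 62.0, 61.0, 60.0, 58.0, 55.0, 53.0, 51.0]
    fit = simple_regression(unemployment_rate, approval_rating)
    for name, value in fit.items():
        print(f"{name}: {value:.3f}")
```

The slope is read the same way as in class: it is the change in the dependent variable for a one-unit increase in the predictor. Adjusted R squared is the figure to watch when building a parsimonious model, because it only goes up when an added variable earns its place.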