Assignment Brief
The banks are having a bit of trouble with debt at the moment. They have lent lots of money to people who promised to pay it back, and then didn’t. In the future, they would like to avoid lending to the kind of person who won’t pay back the loan, and that is where you come in. We have got some data from a bank describing 2000 of its loan customers. The data also tells us whether or not each customer repaid the loan. The question is simple – Is there a difference between the people who repay the loans and those who don’t? Your assignment is to answer that question using data mining techniques and produce a system that would be able to tell the bank how likely it is that a new customer would pay back a loan.
You should use the Weka data mining package, which is installed in all the School's laboratories. Weka is also available for download from: http://www.cs.waikato.ac.nz/~ml/weka/.
Supporting material on how to use Weka is available on Moodle.
You can download the data for the assignment from Moodle in Week 15.
Assignment Aims
The primary aims of this individual data mining assignment are to give you the opportunity to:
• Explain, apply and evaluate principles of data mining techniques and algorithms;
• Perform experiments on real-world data and analyse the results;
• Demonstrate your ability to communicate by producing a technical report of your findings.
Deliverables
1) Report
The report should contain the following:
a) Introduction
Describe the task you were given, the data you received and the requirements of the
finished system. Define any terminology that you will use in the report (for example,
model, variable, task, etc.).
b) Data Summary
List the variables that you found in the file provided by the company (available on Moodle in Week 15). For each one, say whether it is nominal or numeric, continuous or discrete, and whether or not it is of use in building the solution. Explain your decisions.
c) Data Preparation
Describe what you did with the data prior to the modelling process. Show histograms of the data before and after any pre-processing that you carried out. If you corrected any mis-typed entries in the data, report what you changed.
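As a rough illustration of the before-and-after comparison, the sketch below bins a single numeric variable into equal-width bins and prints a text histogram for the raw and the corrected values. It is written in Python purely for illustration; the assignment itself should be carried out in Weka. The ages, the bin range and the idea that 220 is a mis-typed 22 are all hypothetical and are not taken from the bank's data.

```python
def histogram(values, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            continue  # values outside the plotted range are ignored
        # A value equal to hi is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def show(title, counts, lo, width):
    """Print one row of '#' characters per bin."""
    print(title)
    for i, c in enumerate(counts):
        print(f"{lo + i * width:>5.0f}-{lo + (i + 1) * width:<5.0f} {'#' * c}")


# Hypothetical customer ages; 220 is assumed to be a mis-typed 22.
ages_raw = [22, 25, 31, 220, 38, 41, 45, 52, 58, 63]
ages_clean = [22 if a == 220 else a for a in ages_raw]

before = histogram(ages_raw, n_bins=5, lo=0, hi=250)
after = histogram(ages_clean, n_bins=5, lo=0, hi=250)
show("Before correction", before, 0, 50)
show("After correction", after, 0, 50)
```

Using the same bins for both histograms makes the effect of the correction easy to see: the isolated bar at the far right disappears after the entry is fixed.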
d) Modelling
You must use two different techniques and build models with both: pick a suitable tree building algorithm and one other suitable algorithm of your choice. Justify your selection. Describe the different methods you used and the results that you got. Give a brief technical description of the techniques and the way the models are represented. Include one diagram showing the structure of each type of model that you build. Describe what parameters may be changed and what effect this has.
If you varied the parameters of a model, show how this impacted on the results. Describe how you split the data for training and testing purposes. Be methodical and record each result. This stage is a little like scientific research: you are carrying out experiments in your search for the best solution. Once you have a solution, show how you verified its robustness.
For the two different techniques, report on their comparative ability to predict a defaulted loan, and also on how easy it would be for the bank to understand the model and the reasons behind each prediction it makes.
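One possible way to split the data for training and testing is a stratified hold-out split, sketched below in Python for illustration only (Weka offers its own evaluation options, and this is not a required method). The function name, the 70/30 ratio, the seed and the class labels are assumptions made for the example. Fixing the random seed means the same split can be reproduced when you record and compare results.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx), keeping class proportions similar in both parts."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 70 repaid loans and 30 defaulted loans.
labels = ["repaid"] * 70 + ["defaulted"] * 30
train_idx, test_idx = stratified_split(labels)

n_default_test = sum(1 for i in test_idx if labels[i] == "defaulted")
print(len(train_idx), len(test_idx), n_default_test)
```

Stratifying matters when one class is much rarer than the other, because a purely random split could leave the test set with too few defaulted loans to evaluate against.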
e) Results and Errors
Analyse and describe the level of accuracy the model achieves and the errors your model makes. Show a confusion matrix for each model. Are there any areas of the data where it performs worse than in others? Show a lift curve or an ROC curve for the decision as to whether or not a loan will be repaid.
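To make the two required outputs concrete, the following Python sketch counts the four cells of a confusion matrix and computes the points of an ROC curve from predicted scores. It is illustrative only; Weka reports both for you. The labels, the scores, the 0.5 threshold and the choice of "defaulted" as the positive class are hypothetical, and the sketch assumes that both classes appear in the data.

```python
def confusion_matrix(actual, predicted, positive="defaulted"):
    """Count true/false positives and negatives for a two-class problem."""
    cells = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            cells["TP" if a == positive else "FP"] += 1
        else:
            cells["FN" if a == positive else "TN"] += 1
    return cells


def roc_points(actual, scores, positive="defaulted"):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(1 for a in actual if a == positive)
    n_neg = len(actual) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == positive)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical test set: true outcome and the model's score for "defaulted".
actual = ["defaulted", "defaulted", "repaid", "defaulted", "repaid", "repaid"]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Classify as "defaulted" when the score is at least 0.5.
predicted = ["defaulted" if s >= 0.5 else "repaid" for s in scores]
cm = confusion_matrix(actual, predicted)
points = roc_points(actual, scores)
print(cm)
print(points)
```

The confusion matrix describes one fixed threshold, whereas the ROC points show how the trade-off between caught defaults and false alarms changes as the threshold moves.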
f) Conclusion
Summarise the results of your experiments and what you have learnt.
Submission of Deliverables
Each individual will submit one hard paper copy of their report (25%) to the Coursework
You do not need to submit the models that you built, just the report. There is no word limit on the report: just write what you need to provide the required information clearly and concisely. You can assume that the client has a good technical understanding of data mining and statistics, so do not shy away from technical terms in your report. Where you use them, however, explain what they mean in plain language too.
Demonstration
You may be required to make a live demonstration of your work to the assessors of this
coursework, should it be deemed necessary.