Mathematics and Statistics

    Question 1 (24 marks)

    a. A consumer group asked a random sample of 20 drivers to keep a record of the number of discount petrol vouchers used by each driver over a six month period. The data showing the number of vouchers used follows.

    26 24 12 15 8 4 6 15 18 2 7 5 0 3 5 17 10 5 9 12

    For these data, calculate
    i. the mean number of vouchers used.
    ii. the standard deviation of the number of vouchers used.
    iii. the median number of vouchers used.
    iv. the range of the number of vouchers used.

    Use the statistics functions on your calculator for parts i. and ii. above. Do not use the formulae provided in the text.
    (4 marks)
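
    A calculator result for part a. can be cross-checked with a short script. The sketch below uses Python's standard `statistics` module on the voucher data listed above. It assumes the sample standard deviation (divisor n - 1) is the one required, since the 20 drivers are described as a sample; the variable names are illustrative only.

```python
import statistics

# Number of vouchers used by each of the 20 sampled drivers (from part a.).
vouchers = [26, 24, 12, 15, 8, 4, 6, 15, 18, 2,
            7, 5, 0, 3, 5, 17, 10, 5, 9, 12]

mean_vouchers = statistics.mean(vouchers)
# Assumption: sample standard deviation (n - 1 divisor), not the population version.
sd_vouchers = statistics.stdev(vouchers)
median_vouchers = statistics.median(vouchers)
range_vouchers = max(vouchers) - min(vouchers)

print(f"mean   = {mean_vouchers:.2f}")
print(f"sd     = {sd_vouchers:.2f}")
print(f"median = {median_vouchers}")
print(f"range  = {range_vouchers}")
```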

    b. The opening daily share price for the Commonwealth Bank for the period 3/3/14 to 2/7/14 inclusive averaged $78.84 with a standard deviation of $2.45. The opening daily share price in Origin Energy for the same period averaged $14.67 with a standard deviation of $0.35.

    i. Calculate the variability of the daily share price for each company by calculating the coefficient of variation.
    ii. Based on the coefficient of variation calculated in part i., which is the riskier company to invest in? Explain.
    iii. Why is the coefficient of variation a better measure of risk than the standard deviation in this case?
    (6 marks)
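
    The coefficient of variation expresses the standard deviation as a percentage of the mean, which allows the spread of two series with very different price levels to be compared. A minimal sketch, using the figures quoted in part b. and a hypothetical helper name:

```python
def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Means and standard deviations of the opening share prices quoted in part b.
cv_cba = coefficient_of_variation(mean=78.84, sd=2.45)
cv_origin = coefficient_of_variation(mean=14.67, sd=0.35)

print(f"Commonwealth Bank CV: {cv_cba:.2f}%")
print(f"Origin Energy CV:     {cv_origin:.2f}%")
```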

    c. The boxplot which follows shows the daily closing prices for Commonwealth
    Bank shares for the period 10/4/14 to 2/7/14 inclusive.

    [Figure: Boxplot showing closing prices for Commonwealth Bank shares, 10/4/14 to 2/7/14. Horizontal axis: price per share ($), scale from 76 to 83.]

    Use the boxplot above to answer the following questions.

    i. Would the mean closing price over this period be higher or lower than the median? Justify your answer. Calculations are not necessary.
    ii. Estimate the lowest closing price to the nearest 50 cents. Is this an outlier?
    Justify your answer.
    iii. Complete this sentence. Fifty percent of the closing prices are between ____ and ____. Express each amount to the nearest 50 cents.
    (7 marks)
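
    Whether a value counts as an outlier is commonly decided with the 1.5 × IQR rule: anything more than 1.5 interquartile ranges below the first quartile or above the third quartile is flagged. The sketch below assumes that convention (the text may define outliers differently). The quartiles used in the example are hypothetical placeholders, not values read from the boxplot.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences under the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """Return True if value lies outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

# Hypothetical quartiles, chosen only to illustrate the arithmetic.
example_q1, example_q3 = 10.0, 20.0
print(outlier_fences(example_q1, example_q3))
print(is_outlier(40.0, example_q1, example_q3))
print(is_outlier(12.0, example_q1, example_q3))
```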

    d. The weekly income for a sample of seventy randomly selected petrol stations in NSW was recorded in a given week. The data were summarised and presented in the table which follows.

    Weekly income ($’000s)                Frequency
    > 20 up to and including 40               6
    > 40 up to and including 60               9
    > 60 up to and including 80              25
    > 80 up to and including 100             20
    > 100 up to and including 120             8
    > 120 up to and including 140             2

    Use the statistics functions on your calculator to determine the approximate mean and standard deviation weekly income for petrol stations in NSW.

    Do not use the formulae provided in the text.

    (3 marks)
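
    For grouped data the individual incomes are unknown, so each class midpoint stands in for every observation in that class; this is why the mean and standard deviation are only approximate. A minimal sketch of that calculation, assuming the sample standard deviation (divisor n - 1) is wanted and using illustrative variable names:

```python
import math

# (lower limit, upper limit, frequency) in $'000s, taken from the table in part d.
classes = [(20, 40, 6), (40, 60, 9), (60, 80, 25),
           (80, 100, 20), (100, 120, 8), (120, 140, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [freq for _, _, freq in classes]
n = sum(frequencies)

# Weighted mean of the midpoints.
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Assumption: sample standard deviation, so the sum of squares is divided by n - 1.
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, frequencies))
grouped_sd = math.sqrt(sum_sq / (n - 1))

print(f"n = {n}")
print(f"approximate mean = {grouped_mean:.2f} ($'000s)")
print(f"approximate sd   = {grouped_sd:.2f} ($'000s)")
```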

    e. Identify whether the following proposed study is an example of descriptive statistics or inferential statistics and justify your choice.

    Estimating the true proportion of households in Canberra that have at least three dependents, from a random sample of 100 Canberra households.
    (2 marks)

    f. Data were collected on the incomes of all employees in a large company. What is the most likely shape of the distribution of these incomes? Justify your choice.
    (2 marks)

    Question 2 (62 marks)

    Download the data set ‘auction data.xls’ from the Assignment folder in the resources section of Interact. The data provided in auction data.xls show the Sydney auction results for the week ending 22 June 2014. The variables in this data set are: Beds, Type, Price and Result representing the number of bedrooms, the type of property (house or unit), the selling price (if sold) and the result of the auction respectively, as well as the auction date and the name of the selling agent.

    a. For each of the variables in the list below, identify the type of data recorded. State whether it is quantitative or qualitative and include the level of the data (nominal, ordinal, interval or ratio).

    i. Beds
    ii. Type
    iii. Price

    (6 marks)

    b. Most real data sets you encounter will contain errors. This one is no exception.
    Before going any further, read the document ‘working with real data sets.pdf’, which explains how to identify errors in a data set and how to deal with them.

    List any four different types of errors you have found in this data set and explain, for each one, why you decided it was an error or possible error. The four errors must be of different types; for example, do not list four entries which all had the price missing.

    You may want to leave completing this question till later. As you work with the data set you will encounter some of these errors so just make a note of them as you find them.

    Since we cannot contact the real estate agents to follow up on these errors, for the purpose of this assignment we will work with the data set as best we can.
    (4 marks)

    c. Sharon and Mark are property owners in the Sydney region who are planning to sell their two bedroom unit over the next few months. They are considering putting it up for auction so are interested in using these data to gain an insight into the current Sydney auction market.

    Using the complete data set, generate a three way pivot table report of ‘beds’ by
    ‘type’ by ‘result’. Use ‘type’ and ‘beds’ as row labels.

    Include the table as part of the submitted assignment.

    Use the data in the pivot table to answer the following vendors’ questions about the properties listed for auction in Sydney for the week ending 22 June 2014.

    i. How many properties were originally listed for auction for the week in question? How many of these were units?

    ii. How many houses were withdrawn from sale? How many units were withdrawn from sale?

    iii. How many 2 bedroom units were passed in? Express this as a percentage of all the units listed for auction that week excluding all units that were withdrawn from sale.

    iv. How many 2 bedroom units were sold at auction that week? How many 2 bedroom houses?

    v. Of all the properties listed for auction, how many 3 bedroom units were sold that week, including those sold prior to and those sold after the auction? Then express this as a percentage of all the 3 bedroom units listed for auction that week, excluding those that were withdrawn from sale.
    (13 marks)
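
    The pivot table in part c. is a count of properties for every combination of type, number of bedrooms and result. The assignment asks for this in Excel; the sketch below only illustrates the counting logic and the "percentage excluding withdrawn" arithmetic. The rows and the result labels are hypothetical, not taken from auction data.xls.

```python
from collections import Counter

# Hypothetical (Type, Beds, Result) rows; real values come from auction data.xls.
rows = [
    ("Unit", 2, "Sold"),
    ("Unit", 2, "Passed in"),
    ("Unit", 2, "Sold"),
    ("Unit", 3, "Withdrawn"),
    ("House", 3, "Sold"),
    ("House", 4, "Sold prior"),
]

# Three-way count: one cell per (type, beds, result) combination.
counts = Counter(rows)

results = sorted({result for _, _, result in rows})
row_keys = sorted({(prop_type, beds) for prop_type, beds, _ in rows})

print("Type   Beds  " + "  ".join(f"{r:>10}" for r in results))
for prop_type, beds in row_keys:
    cells = "  ".join(f"{counts[(prop_type, beds, r)]:>10}" for r in results)
    print(f"{prop_type:<6} {beds:<5} {cells}")

# 2 bedroom units passed in, as a percentage of units that were not withdrawn.
units_not_withdrawn = sum(
    c for (prop_type, _, result), c in counts.items()
    if prop_type == "Unit" and result != "Withdrawn"
)
passed_in_2bed_units = counts[("Unit", 2, "Passed in")]
pct_passed_in = 100 * passed_in_2bed_units / units_not_withdrawn
print(f"{pct_passed_in:.1f}% of non-withdrawn units were 2 bedroom units passed in")
```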

    d. Separate the data into two data sets, one consisting of the units data only and one containing the houses data only.

    i. Use Excel to generate separate tables of descriptive statistics for houses and units for the variable ‘Price’ and include them in your assignment submission. Round both means to the nearest thousand dollars and both standard deviations to the nearest ten dollars.

    ii. What was the price of the cheapest property sold that week? Look further afield to include information about what type of property it was, how many bedrooms it had, whether it sold at auction or before or after, and which real estate agent sold it.

    iii. The sample variance may have been expressed in an unusual way in one or both of these tables generated in part (i.). Explain this unusual notation and the numerical value it represents (in one or both tables).
    (8 marks)

    e. Using the data set for units only, use Excel to prepare a frequency distribution and histogram of the variable ‘Price’ for the unit data. Use $500 000 as the upper limit of the first class and a class width of $500 000.
    (5 marks)
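
    With a first upper limit of $500 000 and a class width of $500 000, the classes are 0 to 500 000, 500 000 to 1 000 000, and so on. The sketch below shows how prices fall into such classes, assuming each class includes its upper limit (the convention Excel's Histogram tool uses for bins). The prices are hypothetical, not taken from auction data.xls.

```python
# Hypothetical unit prices in dollars; real values come from auction data.xls.
prices = [420_000, 500_000, 650_000, 980_000, 1_000_000, 1_250_000, 2_100_000]

WIDTH = 500_000  # class width, and also the upper limit of the first class

def class_number(price, width=WIDTH):
    """Return the 1-based class whose range (lower, upper] contains price."""
    # Integer ceiling division keeps a price equal to an upper limit in that class.
    return max(1, (price + width - 1) // width)

n_classes = max(class_number(p) for p in prices)
frequencies = [0] * n_classes
for p in prices:
    frequencies[class_number(p) - 1] += 1

for k, freq in enumerate(frequencies, start=1):
    lower, upper = (k - 1) * WIDTH, k * WIDTH
    print(f"> {lower:>9,} up to and including {upper:>9,}: {freq}")
```

    An empty class in the middle of the table, as in this made-up example, is one of the things worth commenting on when judging whether the chosen classes suit the data.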

    f. After preparing this histogram, discuss whether the choice of classes suggested above is appropriate. Refer to important aspects such as the number of classes, the width of the classes and whether all data are included in the classes chosen.
    (3 marks)

    g. Generate a boxplot for the variable ‘Price’ for the units data only. Include the
    5-number summary generated by Excel and the boxplot with your assignment submission.
    (4 marks)
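
    A 5-number summary consists of the minimum, first quartile, median, third quartile and maximum. The sketch below computes one with Python's `statistics.quantiles`, using the "inclusive" method, which interpolates between the smallest and largest observations. Quartile conventions differ between tools, so the figures Excel reports may not match exactly; the prices are hypothetical, not taken from auction data.xls.

```python
import statistics

# Hypothetical unit prices in dollars; real values come from auction data.xls.
prices = [420_000, 500_000, 650_000, 980_000, 1_000_000, 1_250_000, 2_100_000]

# Assumption: "inclusive" quartiles; other conventions give slightly different values.
q1, med, q3 = statistics.quantiles(prices, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(prices),
    "Q1": q1,
    "median": med,
    "Q3": q3,
    "maximum": max(prices),
}

for name, value in five_number_summary.items():
    print(f"{name:<8} {value:>12,.0f}")
```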

    h. Answer the following questions regarding the data for the units only and indicate the particular output in (d), (e) and (g) which provided the answer.

    i. How many units had a sale price listed?

    ii. How many outliers are there in the distribution of the selling prices of units? What is the value of the largest outlier?

    iii. 25% of units sold for $x or less. What is x?

    iv. Comment on the shape of the distribution of the ‘Price’ variable for units only (skewed, symmetric, direction of skewness if relevant, unimodal, bimodal, etc.). Provide at least three items of supporting evidence from the output generated in parts (d), (e) and (g).
    (10 marks)

    i. If a media outlet were to quote the average selling price (of units listed for auction that week), would it be more appropriate to quote the mean or the median price? Why?
    (2 marks)

    j. Sharon and Mark would like to know the selling success rate of all the properties that were listed. For all the properties that were listed for auction, generate a two way pivot table of ‘Type’ by ‘Result’. (Hint: include ‘Result’ in both the column and in the body of the table.) From this pivot table, generate a single horizontal 100% component bar chart with the variable ‘Type’ plotted along the vertical axis and the different types of ‘Result’ making up the components of each bar. Include both the pivot table and the bar chart with your assignment submission.

    Would it be correct for Sharon and Mark to conclude from this chart that more units than houses were sold prior to the auction? Explain.
    (7 marks)
