Statistics: Analysis of Used Cars Database
The purpose of this coursework is to investigate how the depreciation of a car’s price relates to the factors that affect it. The main factors that I wish to investigate are the age and mileage of a car, as these are the easiest to compare with depreciation. To do this, I shall use random sampling. I shall give a number of hypotheses, each claiming that a factor has a measurable effect on depreciation. I shall attempt to test these using data given to me in Excel. I have worked in terms of percentage depreciation so that depreciation can be compared fairly across every car in my sample, whatever its original price. Here are the hypotheses and questions:
<< Hypothesis 1 >>
The greater the mileage of the car, the greater the percentage depreciation of the price – I believe this because as a car travels further, essential parts wear down and stop the car from working to its optimum standard. After a certain mileage, the car’s fuel costs may also begin to increase, as its decreased efficiency uses up more fuel per mile.
The following data values are needed to calculate how the depreciation of a car’s value changes when there is more or less mileage:
* Sale price (no miles attached)
This hypothesis concerns only the effect of mileage on the percentage depreciation of the car’s original price, so no other variables should be included in the data used to prove or refute it. When identifying a general trend for these data, I will discard any data that do not fit the trend, as these would obscure my results.
<< Hypothesis 2 >>
The older the car, the greater the rate of percentage depreciation – I believe this because as cars become older, they become less reliable. This is closely linked to mileage; however, cars also become obsolete over the years: they become less favourable in comparison to newer models with new technology installed.
The following data will help me decide whether this hypothesis is true, when there are more or fewer years “attached” to the car:
* Sale price (no years attached)
* Age of the vehicle
When identifying a general trend for these data, I will discard any data that do not fit the trend, as these would obscure my general trend and correlation when it comes to graph analysis. For example, some cars in this group may appreciate in value: some vintage cars do this as they gain “collectors’ item” status after a number of years.
<< Hypothesis 3 >>
The more previous owners the car has had, the greater the percentage depreciation of the car – I believe this because each owner will add to the factors that depreciate the car, e.g. more mileage. Each owner will also increase the number of years attached to the car and reduce its efficiency.
The following data will help me decide whether this hypothesis is true, when there are more or fewer owners attached to the car, and so whether this affects the percentage depreciation:
* Sale price (first hand)
* Number of owners
These data will help me to calculate whether the number of previous owners has a sizeable impact on the percentage depreciation of the car.
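The percentage depreciation that all three hypotheses rely on can be sketched as a short calculation. This is only an illustration in Python: the names `price_when_new` and `used_price` are assumptions, since the spreadsheet’s column headings are not given here, and the figures are made up rather than taken from the database.

```python
def percentage_depreciation(price_when_new, used_price):
    """Return the fall in value as a percentage of the price when new.

    A negative result means the car has appreciated (e.g. a vintage car).
    """
    if price_when_new <= 0:
        raise ValueError("price when new must be positive")
    return (price_when_new - used_price) / price_when_new * 100


# Made-up figures: a car that cost 10000 new and is now sold for 4000
# has lost 6000 out of 10000, i.e. 60% of its original value.
print(percentage_depreciation(10000, 4000))
```

Working in percentages rather than pounds means an expensive car and a cheap car can be compared on the same scale.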
Now that I have set out my three hypotheses, I shall attempt to prove or refute them by collecting data, finding a general trend, and then finding how strongly the data correlate with this trend.
These secondary data, gathered by unknown sources, will help me in my investigation to test my hypotheses. It is impossible to tell when they were collected, however, so they may be unreliable and yield false results. I shall now explain the different methods of sampling that I could use in my investigation.
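One way of finding a general trend and measuring how well the data correlate with it is a least-squares line of best fit together with Pearson’s product-moment correlation coefficient. The sketch below assumes that measure; the coursework does not say which one is used, and a rank-based measure such as Spearman’s would be an equally reasonable choice. The function names and the example figures are hypothetical.

```python
from math import sqrt


def _sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    sxx, _, sxy, mean_x, mean_y = _sums_of_squares(xs, ys)
    if sxx == 0:
        raise ValueError("all x values are the same")
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


def pearson_r(xs, ys):
    """Return Pearson's r: +1 is perfect positive, -1 perfect negative."""
    sxx, syy, sxy, _, _ = _sums_of_squares(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up sample: mileage against percentage depreciation.
mileages = [10000, 30000, 50000, 70000]
depreciation = [20.0, 35.0, 50.0, 60.0]
print(line_of_best_fit(mileages, depreciation))
print(pearson_r(mileages, depreciation))
```

A value of r close to +1 would support a hypothesis of the form “the greater the factor, the greater the percentage depreciation”, while a value near 0 would suggest little linear relationship.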
Sampling: The process of selecting a smaller set of data from a large batch of data in order to draw conclusions about the whole batch. This has its advantages: if I have a lot of data and want to find a general trend, I can condense the data down to a smaller number of data, or “sample”, which is intended to be representative of the original large batch. This makes the data easier to work with.
However, a disadvantage is that the sample may be skewed and not follow the original pattern, in which case it is not representative of the original data and will give inaccurate results.
There are many forms of sampling, though three basic ways will be shown here; I will then apply the most suitable one later on in the coursework to my data to test my hypotheses.
Random, Stratified and Systematic sampling are the three that I will define here:
Random: Every datum has an equal chance of being chosen; there is no system in choosing them.
Stratified: Each datum is put into a group, and data are then selected from each group in proportion to that group’s share of the whole original data set. I will not be choosing this, as there are too many variables in the spreadsheet to set up this kind of sampling successfully.
Systematic: Taking data in an ordered way; every third or fourth value is an example of this. However, if the data have already been ordered, this may not work and may yield inaccurate results.
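The three definitions above can be sketched in Python as follows. This is a minimal illustration, not the method used on the spreadsheet: the function names are hypothetical, and grouping the cars by make in the stratified example is purely an assumption made to show how groups would work.

```python
import random


def random_sample(data, size, seed=None):
    """Simple random sample: every datum has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(data, size)


def systematic_sample(data, step, start=0):
    """Systematic sample: take every `step`-th datum, beginning at `start`."""
    return data[start::step]


def stratified_sample(data, group_of, fraction, seed=None):
    """Stratified sample: take the same fraction at random from each group."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    groups = {}
    for datum in data:
        groups.setdefault(group_of(datum), []).append(datum)
    sample = []
    for members in groups.values():
        how_many = round(len(members) * fraction)
        sample.extend(rng.sample(members, how_many))
    return sample


# Made-up data: (make, car number) pairs, 6 of one make and 4 of another.
cars = [("Ford", n) for n in range(6)] + [("Vauxhall", n) for n in range(4)]

print(random_sample(cars, 5, seed=1))
print(systematic_sample(cars, 3))
print(stratified_sample(cars, lambda car: car[0], 0.5, seed=1))
```

With a fraction of 0.5, the stratified sample takes 3 of the 6 Fords and 2 of the 4 Vauxhalls, so each make keeps the same share of the sample as it had in the original data.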
Since there are no “groups” as such in my used cars database, I cannot use stratified sampling in my investigation, and since the data provided to me may already be ordered in some way, systematic sampling is unsuitable too. Thus, I have chosen random sampling. I have reordered the data into a random mix in Excel. I then deleted half of it, as each datum has an equal probability of falling into the first, as opposed to the second, half.
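The shuffle-and-halve procedure was carried out in Excel, but the same idea can be sketched in Python. This is an analogue under stated assumptions rather than a record of the spreadsheet steps: the function name is hypothetical, and the optional `seed` is added only so that a sample can be reproduced.

```python
import random


def shuffle_and_halve(rows, seed=None):
    """Put the rows in a random order, then keep only the first half.

    The original list is left untouched; a shuffled copy is made instead.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[: len(shuffled) // 2]


# Made-up row numbers standing in for the rows of the used cars database.
rows = list(range(1, 11))
print(shuffle_and_halve(rows, seed=42))
```

Because the order after shuffling is random, keeping the first half is equivalent to drawing a simple random sample of half the rows.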