No matter how hard you try, you will get some junk data. Despite your best efforts, you will still have a couple of punks who love to answer whatever they want in your survey.
How to Spot Outliers within your Data
One of the easiest ways to sanitize your data without corrupting it is to make sure you are using the proper metrics. Let's look at Mean versus Median. The Mean is the arithmetic average of a set of data. The Median is the numeric value separating the higher half of a sample, a population, or a probability distribution from the lower half. Consider the following data set: 1, 2, 3, 3, 4, 4, 5, 6, 8, 9, 10, 10, 32. Can you find the Mean and the Median? The Median is 5: there are six numbers before it and six numbers after it. The Mean, however, is roughly 7.5 (97 divided by 13), which is higher than nine of the thirteen values. Here the single large entry pulls the average toward the top of the range. If you exclude the one outlier entry of 32, the Mean drops to roughly 5.4 (65 divided by 12).
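The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the variable names responses and trimmed are made up for illustration.

```python
from statistics import mean, median

# The example data set from the text above.
responses = [1, 2, 3, 3, 4, 4, 5, 6, 8, 9, 10, 10, 32]

print(median(responses))            # 5
print(round(mean(responses), 2))    # 7.46

# Drop the single outlier entry (32) and recompute.
trimmed = [x for x in responses if x != 32]

print(round(mean(trimmed), 2))      # 5.42
print(median(trimmed))              # 4.5
```

Removing 32 moves the Mean by about two full points, while the Median only shifts from 5 to 4.5.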
Importance of Removing Outlier Data
Outlier data can often do tremendous damage to the arithmetic average of a data set while having very little impact (or none at all) on the Median, as the example above shows. Depending on your survey or data set, you may be able to simply remove the outlier data altogether. In some cases, however, you may not want to physically remove the data. Instead, you can alter your calculations (such as using the Median instead of the Mean) to get a fairer representation of the responses. Which approach you take depends on the nature of your survey and the data collected. Make sure to take the approach that serves your goals best.
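Here is a rough sketch of both approaches side by side. The article does not prescribe a rule for deciding what counts as an outlier, so this example assumes one common convention: flag anything more than 1.5 times the interquartile range outside the quartiles. The function name split_outliers and the parameter k are hypothetical, and quartile definitions vary between tools, so treat the cutoffs as illustrative.

```python
from statistics import mean, median, quantiles

def split_outliers(values, k=1.5):
    """Return (kept, flagged) using the assumed 1.5 * IQR fence convention."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged

responses = [1, 2, 3, 3, 4, 4, 5, 6, 8, 9, 10, 10, 32]
kept, flagged = split_outliers(responses)

print(flagged)                  # [32]
print(round(mean(kept), 2))     # 5.42  (approach 1: drop the flagged values)
print(median(responses))        # 5     (approach 2: keep everything, report the Median)
```

Either way the summary lands near 5 rather than 7.5. The second approach has the advantage of leaving the raw responses untouched.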
Removing your Outliers – Conclusion
Beware of outlier data, and don't be afraid to sanitize your results to ensure you are getting the most accurate read on your data.