Measures of Central Tendency and Variability
From WikiLectures
Central tendency
Definition: the tendency of quantitative data to cluster around some central value. The closeness with which the values surround the central value is commonly quantified using the standard deviation or variance.
Measures of central tendency
- Mean: The sum of all measurements divided by the number of observations.
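- Median: The middle value that separates the higher half from the lower half. Comparing the mean with the median gives a quick check of symmetry: in a symmetric distribution such as the normal distribution the two are approximately equal, whereas a marked difference indicates skewness. Similar values alone do not prove that the data are normally distributed.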
- Mode: The most frequent value. Example of use: to determine the most common blood group.
- Geometric mean: The nth root of the product of the n data values (defined for positive values).
- Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of the data values.
- Weighted mean: An arithmetic mean in which each data value contributes in proportion to an assigned weight.
- Truncated mean: The arithmetic mean calculated after leaving out part of the data set, for example by ignoring values beyond a certain limit or by discarding a fixed proportion of the highest and lowest values.
- Midrange: The arithmetic mean of the maximum and minimum values of a data set.
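The measures listed above can be illustrated with a short sketch in Python. It assumes Python 3.8 or later and uses the standard `statistics` module. The data set is invented for illustration, and `weighted_mean` and `truncated_mean` are hypothetical helper names, not library functions.

```python
import statistics

# Hypothetical readings, invented for illustration only.
data = [110, 118, 120, 120, 124, 130, 160]

mean = statistics.mean(data)            # 882 / 7 = 126
median = statistics.median(data)        # middle of the sorted values = 120
mode = statistics.mode(data)            # most frequent value = 120
geometric = statistics.geometric_mean(data)
harmonic = statistics.harmonic_mean(data)
midrange = (min(data) + max(data)) / 2  # (110 + 160) / 2 = 135.0


def weighted_mean(values, weights):
    """Arithmetic mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def truncated_mean(values, k):
    """Mean after discarding the k lowest and the k highest values."""
    if 2 * k >= len(values):
        raise ValueError("k removes every value")
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])


print(mean, median, mode, midrange)
print(truncated_mean(data, 1))
```

In this sample the mean (126) lies above the median (120) because the single high value of 160 pulls it upward, which is the kind of difference that points to a right-skewed distribution. The midrange is affected even more strongly, since it depends only on the two extreme values.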
Variability (dispersion)
Definition: dispersion is the variability or spread in a variable or a probability distribution. It is contrasted with location (central tendency), and together they are the most commonly used properties of distributions.[1]
Measures of variability
- Variance: A measure of how far a set of numbers is spread out. It is the average of the squared deviations of the values from the mean (expected value), and it equals the square of the standard deviation.
- Standard deviation (SD): The square root of the variance, expressed in the same units as the data. SD indicates how much a set of values is spread around the average.[2] It can be calculated for any quantitative data, but it is most informative for data that are approximately normally distributed; for skewed data or data with outliers, the median and IQR usually describe the distribution better.
- Interquartile range (IQR): Also known as the 'midspread' or 'middle fifty', it is a measure of statistical dispersion equal to the difference between the third and first quartiles[3]: IQR = Q3 − Q1. Unlike the (total) range, the IQR ignores the lowest 25% and the highest 25% of values, so it is little affected by outliers and describes the spread of the central half of the data.
- Range: The length of the smallest interval that contains all the data. It is calculated by subtracting the smallest observation (sample minimum) from the greatest (sample maximum) and provides an indication of statistical dispersion[4]. It has the same units as the data. Because it depends on just two observations, it is sensitive to outliers and tends to be a weak measure of dispersion; it is most useful for describing small data sets.
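A minimal sketch of these measures in Python, assuming Python 3.8 or later and the standard `statistics` module. The data set is invented for illustration. Quartiles can be computed by several conventions; the values here follow the default method of `statistics.quantiles`, so other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical readings, invented for illustration only.
data = [110, 118, 120, 120, 124, 130, 160]

# Population variance: mean of squared deviations from the mean (divides by n).
pop_var = statistics.pvariance(data)

# Sample variance divides by n - 1 instead of n.
sample_var = statistics.variance(data)

# Standard deviation is the square root of the variance.
sample_sd = statistics.stdev(data)

# Quartiles (default 'exclusive' method), then IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

print(pop_var, sample_var, sample_sd)
print(q1, q3, iqr, data_range)
```

The single high value of 160 accounts for most of the range, while the IQR is determined only by the central half of the data and is therefore much less sensitive to that value.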
Links
Sources
References
- [1] Statistical Dispersion [online]. Wikipedia. [cit. 2011-11-28]. <http://en.wikipedia.org/wiki/Statistical_dispersion>.
- [2] HARRIS, Michael – TAYLOR, Gordon. Medical and Health Science Statistics Made Easy [online]. 2nd edition. Jones and Bartlett Publishers, 2009. ISBN 9780763772659. Available from Google Books: <http://books.google.com/books?id=WqVbhD69WvMC&lpg=PA16&ots=3KRluMt2GJ&dq=%22indicates%20how%20much%20a%20set%20of%20values%20is%20spread%20around%20the%20average.%22&hl=cs&pg=PP1#v=onepage&q=%22indicates%20how%20much%20a%20set%20of%20values%20is%20spread%20around%20the%20average.%22&f=false>.
- [3] Interquartile Range [online]. Wikipedia. [cit. 2011-11-28]. <http://en.wikipedia.org/wiki/Interquartile_range>.
- [4] Range (statistics) [online]. Wikipedia. [cit. 2011-11-28]. <http://en.wikipedia.org/wiki/Range_(statistics)>.
Bibliography
- BENCKO, V., et al. Hygiene and epidemiology. Selected Chapters. 2nd edition. Prague: Charles University, 2008. ISBN 9788024607931.