Basics of Medical statistics: Difference between revisions
Revision as of 11:21, 26 May 2020
Statistics
Statistics is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data.
The two main fields in statistics are:
- Descriptive statistics: describes the data and summarizes the findings
- Inferential statistics: develops general conclusions from the data
Medical Statistics
Medical statistics is the application of mathematical statistics in the medical field. It is important for the following reasons: it is a basic requirement for medical research, it keeps medical knowledge up to date, and it supports data management and treatment.
Data
There are two main types of data. Both can be further divided into two subgroups:
1. Qualitative Data: Descriptive, for example anecdotal reports, interviews and subjective assessments. Can be categorised (ordinal or nominal).
2. Quantitative Data: Values can be discrete or continuous.
Qualitative Data
It is a description of the quality of something (for example level, appearance or taste), where observations fall into categories.
Nominal data: Nominal data is the subcategory of qualitative data whose categories have no natural order, e.g. blood group.
Ordinal data: Ordinal data is a specific subcategory of qualitative data. It deals with categories that can be organized in some logical sequence known as “rank order”, e.g. level of education (Elementary school, Secondary school, University).
Quantitative Data
It is information that can be measured and presented as numbers. Examples include height and weight.
There are two sub-types of Quantitative Data:
- Discrete: the data can only take certain separate values, i.e. whole numbers. For example, the number of compliments your department receives per week and the weekly number of cardiac arrests are both quantitative discrete data.
- Continuous: the values do not need to be whole numbers; they can be any value within a particular range. Everyday examples include SaO2, blood pressure and weight.
Summarizing Data
Summarizing data is important because it allows the information to be easily and quickly interpreted. It can be done graphically or in a tabular format, depending upon the type of presentation.
Tabular Summary
Tables are commonly used to summarize nominal data, but they can be applied to ordinal and quantitative data as well. The number of observations within a particular category is called the frequency. Consequently, a frequency table lists the frequencies of the different categories.
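A frequency table can be sketched in a few lines of Python. This is only an illustration: the blood-group list and the `frequency_table` helper are made up for the example and are not part of the source material.

```python
from collections import Counter

# Hypothetical nominal data: blood groups of ten patients (invented for illustration).
blood_groups = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B"]


def frequency_table(observations):
    """Return (category, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(observations)
    total = len(observations)
    return [(category, n, n / total) for category, n in counts.most_common()]


# Print one row per category: name, frequency, share of all observations.
for category, n, share in frequency_table(blood_groups):
    print(f"{category:<3} {n:>2} {share:.0%}")
```

The relative frequency column is optional; it simply divides each frequency by the total number of observations.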
Graphical Summary
There are several ways of summarizing information graphically (for example: line chart, bar chart, pie chart). The choice depends upon the type of data you are dealing with.
- Histogram: a graphical representation of the distribution of the data. The data are shown as rectangles, one per interval (bin), with no overlap between them. The horizontal axis (x axis) represents the intervals of the data and the vertical axis (y axis) represents the frequency, so the height of each rectangle expresses the frequency, or the density of cases per unit of the measured variable. The bins must be adjacent and are often (but are not required to be) of equal size. The words used to describe the patterns in a histogram are: "symmetric", "skewed left" or "skewed right", "unimodal", "bimodal" or "multimodal".
- Frequency distribution: an organized graphical presentation of the number of individuals for each value on the scale of measurement. It allows the researcher to look at the entire data set at once. It shows whether the observations are high or low and whether they are concentrated in one place or spread out. Thus, a frequency distribution presents a picture of how the individual observations are distributed on the measurement scale.
- Normal distribution: a bell-shaped frequency distribution curve. This curve, which is sometimes called a “Gaussian distribution”, is widely regarded as the most important in the discipline of statistics. It has a single peak with a symmetric distribution of values on either side, so the mean, median and mode are equal. The further a data point is from the mean, the less likely it is to occur. It is "normal" in the sense that it often provides an excellent model for the observed frequency distribution of many naturally occurring events.
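The binning behind a histogram can be sketched as follows. The blood pressure values, the bin limits and the `histogram_counts` helper are assumptions chosen for illustration; equal-width bins are used here although, as noted above, that is not required.

```python
# Hypothetical continuous data: systolic blood pressure in mmHg (invented for illustration).
systolic = [112, 118, 121, 124, 125, 128, 131, 133, 137, 142, 149, 158]


def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins adjacent, equal-width intervals [lower, upper).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


counts = histogram_counts(systolic, start=110, width=10, n_bins=5)

# Text rendering: one row per bin, one '#' per observation.
for i, n in enumerate(counts):
    lower = 110 + 10 * i
    print(f"{lower}-{lower + 10}: {'#' * n}")
```

With these made-up values the tallest bar is the 120-130 bin and the counts tail off towards higher pressures, a pattern that would be described as unimodal and skewed right.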
Central tendency
A measure of central tendency is a single value that identifies the central or typical value in a set of data for a particular parameter.
Several measures of central tendency (averages) are used, depending on what kind of data the numbers represent: mean (average), median and mode.
- Mean: Calculated as the sum of all measured values divided by the number of measurements. The mean cannot be used for qualitative data.
- Median: Divides an ordered sample into two equally sized parts (each with probability 0.50). The numbers are arranged in either ascending or descending order and the middle number is taken; when the number of observations is even, the mean of the two middle numbers is used.
- Mode: The most frequent value in the sample. It represents the most popular option and corresponds to the highest bar in a histogram.
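The three measures can be computed with Python's standard `statistics` module. The sample below (length of hospital stay in days) is invented for illustration and includes one long stay to show how an extreme value affects the mean more than the median.

```python
import statistics

# Hypothetical sample: length of hospital stay in days (invented for illustration).
stay_days = [2, 3, 3, 4, 5, 3, 7, 4, 14]

mean_stay = statistics.mean(stay_days)      # sum of values / number of values
median_stay = statistics.median(stay_days)  # middle value of the ordered sample
mode_stay = statistics.mode(stay_days)      # most frequent value

print("mean:", mean_stay)
print("median:", median_stay)
print("mode:", mode_stay)
```

Here the single 14-day stay pulls the mean above the median, which is one reason the median is often preferred for skewed data.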
Measures Of Variability
Measures of variability are summary measures that describe the dispersion (spread) of values within a distribution. They allow us to summarize the dispersion of a data set with a single value that shows how much the observations vary.
The three main measures of variability are the range, the interquartile range and the standard deviation.
- Range: The numerical distance between the largest (maximum) and smallest (minimum) values. It tells us the width of our data set.
- Interquartile range (IQR): a measure of variability based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts (25 % each). The values that divide the parts are called the first, second and third quartiles, and they are denoted by Q1, Q2 and Q3, respectively. The IQR is the distance between the third and first quartiles, Q3 - Q1, so it covers the middle 50 % of the data.
- Standard Deviation: a measure of spread expressed in the same units as the data; it is the square root of the variance. It can be read as roughly the typical distance of an observation from the mean. This value (when combined with other statistical methods) allows us to infer what percentage of our observations lie within a certain distance of the mean.
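A minimal sketch of the three measures, again using Python's standard `statistics` module (`statistics.quantiles` needs Python 3.8 or later). The sample is invented for illustration. Note that several conventions exist for computing quartiles, so other software may report slightly different Q1 and Q3 values for the same data; the default "exclusive" method of `statistics.quantiles` is assumed here.

```python
import statistics

# Hypothetical sample: length of hospital stay in days (invented for illustration).
values = [2, 3, 3, 4, 5, 3, 7, 4, 14]

# Range: maximum minus minimum.
data_range = max(values) - min(values)

# Quartiles Q1, Q2 (the median) and Q3, then IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Sample standard deviation (divides by n - 1 before taking the square root).
sd = statistics.stdev(values)

# Under the assumption that data are normally distributed, the share of
# observations expected within one standard deviation of the mean:
standard_normal = statistics.NormalDist()  # mean 0, standard deviation 1
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("standard deviation:", round(sd, 2))
print("share within 1 SD (normal model):", round(within_one_sd, 3))
```

The last figure, about 68 %, only applies when the normal model is a reasonable fit; for a skewed sample like the one above, the IQR is usually the more informative measure of spread.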
Links
P. Driscoll, F. Lecky, M. Crosby. An Introductory to Statistics. 1999