Fourfold and Contingency Tables

From WikiLectures

Revision as of 23:07, 29 November 2011 by Fuser

Chi-squared Test

Contingency tables are used to determine whether two distinct variables are associated. To test such an association statistically, we use the chi-squared (χ²) test.

The variables can be:

  • Qualitative
  • Discrete quantitative
  • Continuous quantitative, with values grouped into intervals.
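Grouping a continuous variable into intervals can be sketched in Python as follows. This is a minimal illustration, assuming half-open intervals [lower, upper) defined by a list of edges; the function `to_interval`, the edges and the ages are hypothetical and not part of the source.

```python
import bisect

def to_interval(value, edges):
    """Return the label of the half-open interval [edges[k-1], edges[k]) containing value.

    Values below edges[0] or at/above edges[-1] raise ValueError.
    """
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} lies outside [{edges[0]}, {edges[-1]})")
    k = bisect.bisect_right(edges, value)  # index of the first edge greater than value
    return f"{edges[k - 1]}-{edges[k]}"

# Hypothetical age data and interval edges, for illustration only
edges = [0, 20, 40, 60, 80]
ages = [15, 34.5, 40, 79, 61]
print([to_interval(a, edges) for a in ages])  # ['0-20', '20-40', '40-60', '60-80', '60-80']
```

A value lying exactly on an edge (such as 40) goes into the interval that starts at that edge, so each value falls into exactly one category.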

When there are two such variables, the data are arranged in a contingency table:

  • Variable #1 → rows
  • Variable #2 → columns

Each member of the sample is assigned to the cell of the contingency table that corresponds to their values of the two variables. When the table has only two rows or only two columns, the test is equivalent to a comparison of proportions. A table with exactly two rows and two columns (2 × 2) is called a fourfold table.
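Assigning individuals to cells is a cross-tabulation: counting how many individuals share each combination of values. A minimal Python sketch, assuming each individual is recorded as a (row value, column value) pair; the function `crosstab` and this record format are illustrative assumptions. In the usage line the records are generated from the counts of the example below, so the output reproduces that table, including its totals.

```python
from collections import Counter

def crosstab(records, row_levels, col_levels):
    """Count (row_value, col_value) pairs into a table with marginal totals.

    Returns one list per row level plus a final 'Total' row; each list holds
    the cell counts followed by the row total. Records whose values are not
    in row_levels/col_levels are not counted.
    """
    counts = Counter(records)
    table = []
    for r in row_levels:
        cells = [counts[(r, c)] for c in col_levels]
        table.append(cells + [sum(cells)])
    # Column totals, including the grand total in the last position
    col_totals = [sum(row[j] for row in table) for j in range(len(col_levels) + 1)]
    table.append(col_totals)
    return table

# Individual records (PAP status, HLA-DR4 status) matching the example counts
records = ([("+", "+")] * 46 + [("+", "-")] * 28
           + [("-", "+")] * 50 + [("-", "-")] * 184)
print(crosstab(records, ["+", "-"], ["+", "-"]))
# [[46, 28, 74], [50, 184, 234], [96, 212, 308]]
```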

Example

The medical hypothesis is that progressive polyarthritis (PAP) is associated with the HLA-DR4 antigen. The observed frequencies in a sample of 308 patients, classified by the presence of PAP and of HLA-DR4, are:

         HLA-DR4 +   HLA-DR4 -   Total
PAP +           46          28      74
PAP -           50         184     234
Total           96         212     308

Statistical testing is based on reformulating the medical hypothesis as two statistical hypotheses: the null hypothesis H0 and the alternative hypothesis H1. For our medical hypothesis, the statistical hypotheses are as follows:

  • H0: PAP is not associated with HLA-DR4 (the null hypothesis always states no association).
  • H1: PAP is associated with HLA-DR4 (the opposite of H0).

We test the null hypothesis at the 5% significance level using the observed frequencies in the table above. To do so, we first calculate the expected frequency for each cell, that is, the frequency we would expect if H0 were true. In general, the expected frequency in the cell of the i-th row and j-th column is the total of the i-th row multiplied by the total of the j-th column, divided by the total number of patients.
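The rule for expected frequencies can be sketched in Python, applied to the observed counts of the example. This is a minimal illustration; the function name `expected_frequencies` is hypothetical, and the input is assumed to be a list of rows of cell counts without the totals. Rounded to whole numbers, the results agree with the table of expected frequencies.

```python
def expected_frequencies(observed):
    """Expected cell counts under H0: E[i][j] = row_total[i] * col_total[j] / N.

    `observed` is a list of rows of cell counts (no margin totals).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[46, 28],   # PAP +: HLA-DR4 +, HLA-DR4 -
            [50, 184]]  # PAP -: HLA-DR4 +, HLA-DR4 -
expected = expected_frequencies(observed)
print([[round(e) for e in row] for row in expected])  # [[23, 51], [73, 161]]
```

The unrounded values (for example 74 × 96 / 308 ≈ 23.06) keep the same row and column totals as the observed table.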

Table of expected frequencies (rounded to whole numbers)

         HLA-DR4 +   HLA-DR4 -   Total
PAP +           23          51      74
PAP -           73         161     234
Total           96         212     308

Then the observed and expected frequencies are compared. If the two variables are not associated (H0 is true), the observed and expected frequencies should be close, with any discrepancy due to random variation. The differences between the observed and expected frequencies are summarized by the chi-squared (χ²) statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed and E the expected frequency in a cell, and the summation runs over all cells of the table. For the above example, the test statistic is χ² = 43.61 (calculated from the unrounded expected frequencies). To interpret this statistic, we need to know the number of degrees of freedom (df). For a contingency table, this is given in general by df = (number of rows - 1) × (number of columns - 1). In the above example there are 2 rows and 2 columns, so df = (2 - 1)(2 - 1) = 1.
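The statistic and its degrees of freedom can be computed with a short Python sketch, assuming the observed table is given as a list of rows of cell counts without totals; the function name `chi_squared` is illustrative. It uses the unrounded expected frequencies and applies no continuity correction, which reproduces the value 43.61 for the example.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    `observed` is a list of rows of cell counts (no margin totals).
    No continuity (Yates) correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency under H0
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

stat, df = chi_squared([[46, 28], [50, 184]])
print(f"chi2 = {stat:.2f}, df = {df}")  # chi2 = 43.61, df = 1
```

Some statistical packages apply Yates' continuity correction to 2 × 2 tables by default, which gives a smaller value than the uncorrected statistic shown here.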