Correlation is a measure of the interrelatedness of two variables. If we observe that one variable always increases when a second variable increases, the two variables are said to be strongly positively correlated.
On the other hand, if one variable always decreases when a second variable increases, the two are said to be strongly negatively correlated. If we increase one variable and a second variable neither increases nor decreases, there is no correlation between the variables.
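One common way to put a number on this idea is the Pearson correlation coefficient, which runs from -1 (strongly negative) through 0 (no linear correlation) to +1 (strongly positive). The sketch below is only an illustration: the function name `pearson_r` and the three small data sets are invented for this example and do not come from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # always increases when x increases
falling = [10, 8, 6, 4, 2]    # always decreases when x increases
unrelated = [3, 1, 4, 1, 3]   # no overall upward or downward trend

r_rising = pearson_r(x, rising)        # strongly positive
r_falling = pearson_r(x, falling)      # strongly negative
r_unrelated = pearson_r(x, unrelated)  # approximately zero

print(r_rising, r_falling, r_unrelated)
```

Note that the coefficient only describes how the two lists move together. It says nothing about why.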
The cohort study is one of the methods scientists use to discern if there is a correlation between variables. A cohort is a defined group of people who are systematically observed over a particular period of time. Data is collected at specified intervals, and outcomes such as the presence or absence of a particular disease are also recorded. It is important the cohort be large, carefully measured, and not prone to attrition.
One of the largest cohort studies ever undertaken is the Nurses' Health Study. It began in 1976 with a group of female registered nurses aged 30 to 55, but the study has expanded to a second and now a third phase which have enrolled a total of over a quarter of a million participants.
Why nurses? As a group, they are used to responding to technical questionnaires, and they have demonstrated a professional motivation to continue participating in the study. Thanks to reports from their next-of-kin, their deaths are also followed up, including reviews of autopsy findings and other records.
More than one hundred refereed papers have resulted from the data collected. Among the titles are:
- Cigarette smoking and risk of stroke in middle-aged women
- Dietary fat intake and risk of coronary heart disease in women
- A prospective study of moderate alcohol drinking and risk of diabetes in women
- A prospective study of postmenopausal estrogen therapy and coronary heart disease
From these four papers, it is easy to see some of the variables being compared and correlated in the Nurses' Health Study: cigarette smoking and stroke; dietary fat intake and coronary heart disease; moderate alcohol drinking and diabetes; postmenopausal estrogen therapy and coronary heart disease. From reading the News section of the Nurses' Health Study website, one might assume that these types of correlations have a cause-and-effect relationship.
This is not necessarily correct. Take another look at the fourth article in the bullet points, "A prospective study of postmenopausal estrogen therapy and coronary heart disease," which was published in the New England Journal of Medicine in 1985. This study and several like it identified a correlation between hormone replacement therapy and a decrease in the incidence of coronary heart disease in older women. Possible mechanisms were proposed, and it became a consensus opinion that, in the words of the paper's abstract, "postmenopausal use of estrogen reduces the risk of severe coronary heart disease."
This correlational wisdom lasted over a decade. Eventually scientists ran a randomized controlled clinical trial of hormone replacement therapy in older women, the Heart and Estrogen/progestin Replacement Study, or HERS. Published in 1998, HERS showed that women who already had heart disease increased their risk of a heart attack if they received estrogen therapy. This was followed in 2002 by the Women's Health Initiative (WHI), another randomized controlled clinical trial, which concluded that hormone replacement therapy increased the risk of heart attack and stroke for postmenopausal women.
Since then, much speculation has ensued. It is possible that the women in the Nurses' Health Study who took estrogen were beneficiaries of the adherer effect. That is, because they initiated and adhered to what they thought was a preventive regimen of hormone replacement therapy, these nurses may have been more likely to engage in other preventive behaviors that do tend to produce longer and healthier lives.
A second possibility is socio-economic. Taking estrogen requires spending extra money for prescriptions and for medical follow-up. It is possible that the nurses who took estrogen belonged to higher socio-economic groups than those who did not. The correlation between estrogen use and better heart health may have been seen because both variables were positively related to income level.
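A small simulation can show how a shared cause such as income might produce this kind of correlation even when the treatment does nothing at all. Everything in the sketch below is assumed for illustration: the probabilities (60% versus 20% estrogen use, 90% versus 70% healthy hearts) are invented, not taken from the Nurses' Health Study, and the names `simulate` and `pearson_r` are hypothetical helpers.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate(n=20000, seed=0):
    """Return (high_income, takes_estrogen, healthy_heart) tuples of 0/1 values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_income = int(rng.random() < 0.5)
        # Assumption: higher income makes estrogen use more likely.
        takes_estrogen = int(rng.random() < (0.6 if high_income else 0.2))
        # Assumption: heart health depends on income ONLY.
        # Estrogen has no effect whatsoever in this simulated world.
        healthy_heart = int(rng.random() < (0.9 if high_income else 0.7))
        rows.append((high_income, takes_estrogen, healthy_heart))
    return rows

rows = simulate()
high = [r for r in rows if r[0] == 1]
low = [r for r in rows if r[0] == 0]

# Correlation between estrogen use and heart health, ignoring income.
overall = pearson_r([r[1] for r in rows], [r[2] for r in rows])
# The same correlation computed separately within each income group.
within_high = pearson_r([r[1] for r in high], [r[2] for r in high])
within_low = pearson_r([r[1] for r in low], [r[2] for r in low])

print(f"overall: {overall:.3f}")
print(f"high income only: {within_high:.3f}")
print(f"low income only: {within_low:.3f}")
```

Under these assumptions the overall correlation comes out clearly positive, while the correlation inside each income group hovers near zero. The apparent benefit of estrogen is produced entirely by income.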
A third explanation comes from a more careful analysis of the data from the Women's Health Initiative. It suggests that some of the discrepancies result from a time component in the effect of hormone replacement therapy on coronary heart disease in women. It appears that there is a small, nonsignificant decrease in coronary heart disease when women initiate hormone replacement therapy within ten years of the onset of menopause. If hormone replacement therapy is initiated more than ten years after menopause begins, the risk of coronary heart disease rises in proportion to the time elapsed. These effects were probably present in both the cohort studies and the randomized trials, but because the women were not originally stratified and compared according to the time that had elapsed after onset of menopause, the results of the studies were at odds.
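The arithmetic behind this third explanation can be sketched with made-up numbers. Suppose, purely for illustration, that early initiators have a relative risk of 0.9 and late initiators a relative risk of 1.3, and that two studies differ only in their mix of early and late initiators. The function `pooled_relative_risk` is a hypothetical helper, and its weighted average is a simplification that assumes the same baseline risk in both groups and the same mix among treated and untreated women.

```python
def pooled_relative_risk(share_early, rr_early, rr_late):
    """Unstratified relative risk for a study population.

    Simplifying assumptions: baseline risk is identical in both strata,
    and treated and untreated women have the same early/late mix.
    """
    return share_early * rr_early + (1 - share_early) * rr_late

# Hypothetical stratum-level effects (not taken from any study).
RR_EARLY = 0.9   # therapy started within ten years of menopause
RR_LATE = 1.3    # therapy started more than ten years after menopause

# Hypothetical mixes: one study dominated by early initiators,
# the other dominated by late initiators.
mostly_early = pooled_relative_risk(0.8, RR_EARLY, RR_LATE)
mostly_late = pooled_relative_risk(0.3, RR_EARLY, RR_LATE)

print(f"mostly early initiators: {mostly_early:.2f}")
print(f"mostly late initiators:  {mostly_late:.2f}")
```

With these invented figures, the first study would report a pooled relative risk just below 1 (an apparent benefit) and the second a value above 1 (apparent harm), even though the underlying effect within each stratum is identical in both studies.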
The take-home lesson? In a correlation study there are always variables that aren't expected: in this case an adherence effect, a socio-economic effect, and an age stratification effect. Although the papers taken from a cohort study may be done carefully, and although the authors try to address every confounding variable they can think of, there is no way to be sure that a particular correlation equals causation. We can use a correlation study to create a likely hypothesis, but we must always test the hypothesis (preferably with many approaches in many carefully randomized controlled trials) before we can begin to accept its validity.