Muhammad Younus ( Department of Epidemiology, College of Human Medicine, Michigan State University, East Lansing, Michigan, USA, )
Azfar-e-Alam Siddiqi ( Department of Epidemiology, College of Human Medicine, Michigan State University, East Lansing, Michigan, USA, )
Bilquis Sana Khan ( Department of Community Health Sciences, The Aga Khan University, Karachi, Pakistan, )
Amy L. Steffey ( School of Public Health, University of Michigan Ann Arbor, Michigan, USA. )
March 2008, Volume 58, Issue 3
Special Communication
Background
In Pakistan and other developing countries where centralized, population-based comprehensive disease surveillance systems are not well developed, 2 researchers and healthcare providers have attempted to run small-scale, focused, disease-specific surveillance projects, some with good success. But the failure to follow one standard approach and the resulting variation, make this surveillance data even more vulnerable to bias.
Limited resources often restrict primary data collection for estimating magnitude of disease in populations. As a result, epidemiologists and clinicians, particularly in developing countries, are often forced to utilize institutional databases such as hospital records for estimating population disease burdens and testing research hypotheses. Like surveillance data, the institutional databases have their own limitations.
In this report, we will highlight the types of biases related to disease surveillance and institutional databases that when present, compromise the integrity of the data and affect the validity of research findings conducted using these data. It should be noted that these types of biases are not mutually exclusive, but somewhat interrelated.
Selection bias
A patient's visit to an outpatient or emergency department of a healthcare facility is a prerequisite for disease reporting to occur. A significant proportion of individuals, usually with subclinical to mild disease symptoms, are not diagnosed or do not even seek treatment. In countries like Pakistan, this is further compounded by over the counter distribution of prescription medicines, which promotes self-medication and provides the patient an alternative to seeking a qualified healthcare professional. For example, if a research project utilizing institution-based surveillance data or hospital records aims to estimate the disease burden related to foodborne illnesses such as salmonellosis or campylobacteriosis in a population, the results will underestimate the actual magnitude of the disease5, and may also be restricted to the more serious cases.
Additionally, health seeking behaviour varies among populations and is partly determined by the educational level and health knowledge of communities, which could significantly influence disease reporting rates among populations.
Referral bias
Physicians and healthcare facilities may attract patients with certain diseases, health-related conditions, or exposures from beyond their catchment areas, resulting in a higher proportion of patients with those diseases or conditions in a hospital database. Applying estimates generated from such data to the target population could result in the overestimation of a disease in the population.6 Similarly, if data from a laboratory or a tertiary care center is used to obtain the rate of positive results for certain infections, the results may be biased. In the developing world, especially in peri-urban areas, patients with positive test results are referred to major diagnostic centers for confirmatory diagnosis. For example, primary care physicians in remote locations typically refer patients with suspected multidrug-resistant tuberculosis (MDR-TB) to physicians in tertiary care hospitals for confirmation and further management. As a result, laboratory databases for these hospitals would show an increase in the positive predictive value of the test performed and also a higher proportion of MDR-TB among all TB patients. Estimates based on these hospitals records would overestimate the magnitude of the disease in the population.
Berkson's bias
The validity of epidemiologic research findings utilizing hospital data may be uncertain due to Berkson's bias. In 1964, Joseph Berkson explained that the relative frequency of disease in hospitalized individuals is inherently biased when compared to the population served by the hospital. This phenomenon is attributed to the way probabilities of hospitalization combine in patients with severe medical conditions or with more than one disease.6 Berkson's argument applies in particular to hospital-based case-control studies where the associations between risk factors and diseases are studied. If, for example, obese people who are also hypertensive have a higher probability of being hospitalized than people with hypertension alone, a spurious association between obesity and anti-hypertensive drugs can be found.
Misclassification of exposure or outcome
The International Classification of Diseases (ICD) is the standard diagnostic classification used to categorize diseases and other health-related events such as those contained in death certificates and hospital records. The ICD aids in the storage and retrieval of diagnostic information for clinical and epidemiological research purposes and allows for compilation and comparison of national morbidity and mortality statistics.7 The ICD is widely used as a disease classification system in the developed world and has helped standardize diagnoses. However, this data system is not foolproof; it does not eliminate misclassification8, which is compounded by differences in coding accuracy across institutions.9 Use of ICD data for research purposes in developing countries is hindered by variability of its use, in addition to the limitations mentioned above. Lack of widespread and consistent use of ICD in data abstraction may lead to misclassification of exposures or outcomes due to inconsistent case definitions. Therefore, limitations should be noted when making comparisons between healthcare settings or generalizing study results based on databases that do not use ICD codes. Adherence to ICD coding when abstracting data does however have the potential of reducing the variability in diagnoses and can improve the specificity of reporting and comparability between data sources.
Conclusions
There are no easy solutions to eliminating inherent biases related to surveillance or hospital-based data. The objective is to recognize the effects of reporting biases and minimize misinterpretation of results by readers. We recommend researchers discuss the limitations of databases and acknowledge possible sources of bias when reporting results based on surveillance data or medical record audits.
References
2. Akhtar S, White F. Animal disease surveillance: prospects for development in Pakistan. Rev Sci Tech 2003; 22:977-87.
3. Roush S, Birkhead G, Koo D, Cobb A, Fleming D. Mandatory reporting of diseases and conditions by health care professionals and laboratories. JAMA 1999; 282:164-70.
4. Chaffin M, Bard D. Impact of intervention surveillance bias on analyses of child welfare report outcomes. Child Maltreat 2006; 11:301-12.
5. Mead PS, Slutsker L, Dietz V, McCaig LF, Bresee J, Shapiro C, et al. Food-related illness and death in the United States. Emerg Infect Dis 1999; 5:607-25.
6. Schwartzbaum J, Ahlbom A, Feychting M. Berkson's bias reviewed. Eur J Epidemiol 2003; 18:1109-12.
7. World Health Organization (WHO). International Classification of Diseases (ICD). [online] [cited 2007 January 15]. Available from: RUL: http://www.who.int/classifications/icd/en/.
8. O'Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CM. Measuring diagnoses: ICD code accuracy. Health Serv Res 2005; 40:1620-39.
9. Golomb MR, Garg BP, Saha C, Williams LS. Accuracy and yield of ICD-9 codes for identifying children with ischemic stroke. Neurology 2006; 67:2053-5.
Journal of the Pakistan Medical Association has agreed to receive and publish manuscripts in accordance with the principles of the following committees:




