Evaluating medical evidence

What is medical evidence?

Explaining the evidence

There are several levels of clinical evidence, including randomized controlled trials, observational studies and case reports. Published studies also vary in the quality of design and methodology.

When considering the choices of pain relief in childbirth, the decision should not be based on fears, expectations or the opinions of friends. It should be made with an understanding of the risks and benefits, supported by current medical evidence.

A lot of the credit for giving labor epidural a bad reputation should go to the media. It is understandable that TV and newspapers report medical disasters rather than normal cases. This, however, distorts reality and leads to overestimation of the problems associated with any given medical procedure, including labor epidural. The same is true of anecdotes and the personal experiences of friends and relatives: complications tend to be remembered longer than uneventful, routine events. Medical professionals are not immune to this phenomenon, and doctors and midwives are also affected by their personal experiences with complications. That is why professional guidelines and standards of practice are developed and adjusted on a regular basis.

Evidence-based medicine has been the norm for the last several decades. Research is conducted around the world, and the results are published and eventually summarized. However, medical outcomes are not straightforward; they are complex and often difficult to measure. The quality, and therefore the validity, of a study depends on its design, and study design is an art and a science in itself. Moreover, even well-conducted studies are often misinterpreted: the study’s conclusion may say one thing while the media and the public understand it as another. This is not intentional; it happens because medical issues are often complicated and emotional.

The subject of scientific methodology is not very exciting. However, it is important to understand how medical evidence is obtained and what questions to ask in order to assess the credibility of a published trial or newspaper article.

As mentioned above, medical outcomes are seldom certain. Instead, they are presented in terms of probabilities. You may find, for example, that paracetamol (acetaminophen) does nothing for your headache, while someone else may swear by its efficacy. Both testimonials are, in fact, useless for evaluating this drug. The correct way to do this is to administer paracetamol to a large group of patients and record how many report that their headache improved, as well as how many experience side effects, such as irritation of the stomach. The numbers are then compared, and if the drug works in the majority of people while the incidence of side effects is low, it is reasonable to conclude that the risk-benefit profile of this medication is favorable.

The way medical interventions are tested also determines the credibility of the evidence, which is classified according to its reliability.

The lowest level of evidence is anecdotal: evidence from personal experiences and published case reports. This kind of evidence has value when studying rare events. However, it is not very useful for deriving reliable statistical data, such as the frequency of complications. For example, the incidence of maternal death related to childbirth in developed countries is very low, in the region of 5 to 10 per 100,000 births. While it is useful to report and analyze each such outcome, it is not correct to draw conclusions from a single case report. Most of us have run a red light at least once in our driving career. Even though nothing probably happened, we all understand that it would be silly to conclude that doing so is safe!

Despite the unreliability of single-case observations, patients tend to pay a lot of attention to single case reports, especially when they relate to serious adverse outcomes. The human mind is not designed to think in probabilities, and the fact that it is often irrational is part of what makes us human. Like all other animals, we have historically been used to dealing with immediate dangers: don’t go into that forest, the tiger will eat you!

Such fallacies of the human mind have been extensively studied and are especially demonstrable in areas that have a significant emotional component. Economists have been interested in this phenomenon for a long time and have developed a new branch of science, behavioral finance. It primarily studies human reasoning related to financial decisions; however, many of its observations are relevant to all areas of everyday life. A classic example of irrationality is that when a person buys shares of a company on the stock market, he loses much of his objectivity and his attitude to the company becomes selective. This person will largely ignore bad news and give more credibility to good news. Similarly, medical professionals are not immune to selective attitudes to past events, and their clinical decision making is often influenced by bad experiences.

To reduce subjectivity, medical decisions should be based on statistical data from properly conducted clinical research. Again, though, the way this data is obtained determines its credibility. There are two large groups of clinical studies: randomized controlled trials and observational studies.

A randomized, placebo-controlled, double-blind trial (RCT) is considered the gold standard of clinical research and the most reliable source of medical evidence. It is an ideal way to compare single interventions and is often used to evaluate the efficacy of drugs. The way such trials are organized is reflected in the name. Randomized and placebo-controlled: patients are randomly assigned to receive either the intervention or a placebo. Double-blind: neither the patients nor the researchers who evaluate the results know who received the real drug and who received the placebo.

The following example illustrates a typical RCT. Let’s say there is a new drug on the market for the prevention of sea sickness. In order to evaluate the claims made by the manufacturer, we go to the cruise ship pier and invite tourists to participate in the study. Those who are interested are randomly assigned to receive either the pill in question or a placebo, a pill that looks similar but does not contain any medication. This is often done by a simple toss of a coin: heads, this person gets the drug; tails, placebo. No individual knows whether he or she is getting the real drug. In research terminology, the subjects are blinded. After the trip each tourist is questioned by a research worker about their experience: did they experience nausea or vomiting, how long did it last, how bad was it on a scale from one to five, and so on, according to the planned design of the study. The researchers are also blinded; they do not know which subjects received the actual drug. The data is then analyzed using statistical methods appropriate to the specific study design, which will show whether there is a difference in the studied parameters between the groups.

If the incidence and severity of sea sickness in the treatment group (those who received the real drug) is meaningfully lower than in the controls (the group that received the placebo), it is reasonable to conclude that the drug is effective for the prevention of this unpleasant condition.

The randomized controlled trial in the example above also allows us to estimate the size of the drug’s effect: for instance, whether it decreases the chance of sea sickness by 20%, 30% or 50%. Special statistical methods are also used to show that the observed difference is unlikely to be due to pure chance.
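For readers who like to see the mechanics, the sea-sickness trial can be sketched as a small simulation. Everything numeric here is an assumption made up for illustration: the number of tourists, the "true" risks of 40% on placebo and 20% on the drug, and the function names. The permutation test is just one simple way to ask whether a difference could be due to chance; a real trial would use whatever method its protocol specifies.

```python
import random

def run_trial(n_subjects=1000, risk_placebo=0.40, risk_drug=0.20, seed=1):
    """Simulate the trial; returns {"drug": [sick, total], "placebo": [sick, total]}."""
    rng = random.Random(seed)
    counts = {"drug": [0, 0], "placebo": [0, 0]}
    for _ in range(n_subjects):
        arm = "drug" if rng.random() < 0.5 else "placebo"    # the coin toss
        risk = risk_drug if arm == "drug" else risk_placebo  # assumed true risks
        counts[arm][0] += int(rng.random() < risk)           # 1 if this tourist got sick
        counts[arm][1] += 1
    return counts

def relative_risk_reduction(counts):
    """Fraction by which the drug lowers the observed risk of sea sickness."""
    risk_drug = counts["drug"][0] / counts["drug"][1]
    risk_placebo = counts["placebo"][0] / counts["placebo"][1]
    return 1 - risk_drug / risk_placebo

def permutation_p_value(counts, n_shuffles=1000, seed=2):
    """How often does randomly relabelling the tourists give a gap this large?"""
    rng = random.Random(seed)
    sick_d, n_d = counts["drug"]
    sick_p, n_p = counts["placebo"]
    total_sick = sick_d + sick_p
    outcomes = [1] * total_sick + [0] * (n_d + n_p - total_sick)
    observed = abs(sick_d / n_d - sick_p / n_p)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(outcomes)
        d = sum(outcomes[:n_d])  # sick tourists landing in a pretend "drug" arm
        diff = abs(d / n_d - (total_sick - d) / n_p)
        extreme += int(diff >= observed)
    return (extreme + 1) / (n_shuffles + 1)

counts = run_trial()
print("relative risk reduction:", round(relative_risk_reduction(counts), 2))
print("permutation p-value:", permutation_p_value(counts))
```

A small p-value means that a difference this large would rarely arise if the drug did nothing. It does not by itself say how large or how important the effect is, which is why the risk reduction is reported alongside it.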

Unfortunately, not every intervention can be studied in a randomized trial. For example, when studying the effect of cesarean section on breastfeeding, it is neither ethical nor possible to assign a woman to surgery by the toss of a coin. Nor is it possible to blind the patient or the researcher to the intervention: can a woman not know that she had the operation? The same is true when studying the effects of labor epidural: denying a woman superior pain relief in order to include her in the control group is not ethical. In such instances the data is obtained from observational studies.

A good example of an observational study is research on the effect of labor epidural on back pain. To conduct it, we check the hospital records of all women who gave birth in the last year and divide them into two groups: those who had an epidural in labor and those who didn’t. Then we send a questionnaire to, or telephone, every patient and ask whether she has had back pain since giving birth, where the pain is located, whether she had back pain before labor, and so on. As in RCTs, the parameters studied are planned before the study commences. By comparing the percentage of women complaining of new-onset back pain after childbirth in each group, we can draw a conclusion as to whether epidural increases the risk of this complication. Instead of answering questions, the patients may be examined for back mobility or sent for MRI scans.

Observational studies are better than anecdotal evidence but have numerous limitations. Most importantly, they detect association between two factors, not causation. For example, age is a known risk factor for heart attack. It is also well known that older individuals have gray hair. An observational study would therefore detect a fairly strong association between gray hair and the risk of heart attack. Would it be reasonable to conclude that gray hair causes heart attacks and to recommend dyeing the hair in order to prevent them? This example is obviously ridiculous. However, there are many instances where wrong conclusions are drawn from associations.
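The gray hair example can be made concrete with a small simulation. All the percentages below are invented for illustration, and the function names are hypothetical. The one deliberate design choice is that the chance of a heart attack depends on age only, never on hair color.

```python
import random

def simulate_population(n=20000, seed=3):
    """Return a list of (old, gray, attack) tuples with made-up probabilities."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        old = rng.random() < 0.5
        gray = rng.random() < (0.8 if old else 0.1)      # age drives gray hair
        attack = rng.random() < (0.10 if old else 0.01)  # age drives heart attacks
        people.append((old, gray, attack))
    return people

def attack_rate(people, *, gray, old=None):
    """Heart attack rate among people with the given hair color (and age group)."""
    group = [p for p in people if p[1] == gray and (old is None or p[0] == old)]
    return sum(p[2] for p in group) / len(group)

people = simulate_population()

# Everyone pooled together: gray hair looks dangerous.
print("gray:", attack_rate(people, gray=True),
      "not gray:", attack_rate(people, gray=False))

# Older people only: hair color makes little difference.
print("old, gray:", attack_rate(people, gray=True, old=True),
      "old, not gray:", attack_rate(people, gray=False, old=True))
```

Comparing like with like (here, older people with older people) is the basic idea behind adjusting for a confounding factor. It only works for factors the researchers thought to record.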

Some studies have demonstrated that women who have a labor epidural are more likely to require an emergency cesarean section. Does this actually mean that epidural increases the risk of the operation? The answer may seem to be an obvious yes; however, the reality is more complicated. The question to ask in this situation is: on what basis were women selected for epidural? In many labor wards epidurals are requested by midwives, so women with prolonged and difficult labors are more likely to receive this form of pain relief. At the same time, women with prolonged and difficult labors are at higher risk of cesarean section. Therefore, in this situation it is not the epidural that leads to cesarean; rather, complicated labor often ends in cesarean, and the epidural is simply another consequence of the complicated labor.

Observational studies may be prospective (the analysis is performed in real time as the events unfold) or retrospective. They are also divided into case-control studies, cohort studies and so on. Every type of observational study has its benefits and drawbacks in a given clinical context. However, all of them are subject to interpretation and may lead to incorrect conclusions.

Returning to the example of labor epidural and cesarean section, some studies use historical controls. Some hospitals operated for years without being able to offer labor epidurals to their patients. Let’s say that five years ago an epidural service became available. If the rate of cesarean section increased after the change, it seems logical to conclude that epidural increases the risk. On the other hand, with the introduction of the epidural service many other aspects of obstetric practice in the hospital may have changed as well: closer monitoring of the fetal heart rate, for example. Closer monitoring may lead to more frequent detection of abnormalities and, as a consequence, more cesareans.

In an attempt to isolate the various factors that may influence the outcome, statisticians use complex mathematical analysis. While this is useful to some extent, such analysis may itself lead to false findings and misinterpretation of the data.

Large observational studies also carry the danger of detecting false, or spurious, associations, and the following example – though not related to labor epidurals – is a great illustration of this.

In the 1990s the statistical world was rocked by a furious debate that started with an article published in the scientific journal Statistical Science by three statisticians from Israel. The article was based on so-called Equidistant Letter Sequences, or ELS. A powerful computer was programmed to analyze the text of the Book of Genesis of the Torah, reading every second, third, fourth and so on letter: horizontally, vertically, diagonally, in every possible way. The authors claimed that when they searched for names in close proximity to birth or death dates (as published in the Encyclopedia of Great Men in Israel) they found many matches. For example, the date of the assassination of Yitzhak Rabin was in close proximity to letters spelling out his name.

The heated debate among statisticians became public when the American journalist Michael Drosnin published his books The Bible Code (1997) and The Bible Code II: The Countdown (2002), in which he used the ELS technique to uncover many fascinating prophecies and predictions. Among them were “proof” that Lee Harvey Oswald was destined to assassinate John F. Kennedy, and predictions of the Gulf War and of the collision of a comet with Jupiter. It was also implied that the message of the Bible was delivered to us by extraterrestrials and that it predicts disasters and an apocalypse between 1998 and 2006.

In reality, what the thorough analysis of the Bible revealed is that statistical noise may look meaningful. Just as a child hammering on a piano occasionally produces a short melody, arranging letters into enough combinations will occasionally produce something that seems sensible. When criticized, Michael Drosnin replied: “When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I’ll believe them.” (Newsweek, Jun 9, 1997)

Brendan McKay, a researcher at the Australian National University, obliged. An ELS search of Moby Dick revealed “predictions” of the assassinations of former Prime Minister of India Indira Gandhi, Lebanese President Rene Moawad, Soviet exile Leon Trotsky, John F. Kennedy and several other prominent figures.
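The idea behind an equidistant letter sequence search is simple enough to sketch in a few lines. This is a minimal illustration written for this article, not the software used by the original authors or by McKay; the function name find_els and the max_skip limit are assumptions. Running it on a string of random letters shows how readily a short word turns up by chance.

```python
import random
import string

def find_els(text, word, max_skip=50):
    """Find `word` spelled by letters a fixed number of positions apart.

    Returns (start, step) pairs; a negative step means the word reads backwards.
    Spaces and punctuation are ignored, as is letter case.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    word = word.lower()
    if len(word) < 2:
        raise ValueError("word must have at least two letters")
    hits = []
    for skip in range(1, max_skip + 1):
        for step in (skip, -skip):
            for start in range(len(letters)):
                if letters[start] != word[0]:
                    continue
                end = start + step * (len(word) - 1)
                if not 0 <= end < len(letters):
                    continue
                if all(letters[start + i * step] == ch for i, ch in enumerate(word)):
                    hits.append((start, step))
    return hits

# 5,000 random letters contain no message at all, yet "cat" is usually in there.
rng = random.Random(0)
noise = "".join(rng.choice(string.ascii_lowercase) for _ in range(5000))
print(len(find_els(noise, "cat")), "chance occurrences of 'cat' in random letters")
```

The longer the text and the more skip distances are allowed, the more such matches appear. That is why a match found after an open-ended search says very little on its own.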

The lesson is that it is easy to get carried away by massive amounts of data. Endless analysis of big databases may reveal interesting and unexpected connections. On the other hand, these findings may be due to pure chance. Even a placebo effect can look convincing in a large observational study. Definitive conclusions cannot be drawn from associations alone, and findings must be tested for their significance using specialized mathematical methods that are the realm of professional statisticians. Unfortunately, in spite of the known methodological dangers of large observational studies, false findings find their way into the media on a regular basis.
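A rough calculation shows why endless analysis is risky. Suppose each comparison has a 5% chance of looking "significant" purely by accident (a conventional threshold), and suppose the comparisons are independent of each other, which is a simplifying assumption. The function name below is hypothetical.

```python
def chance_of_false_alarm(n_comparisons, alpha=0.05):
    """Probability that at least one of n independent comparisons of pure noise
    looks 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_comparisons

for n in (1, 20, 100):
    print(n, "comparisons:", round(chance_of_false_alarm(n), 2))
```

With twenty unrelated comparisons the chance of at least one false alarm is already well over one half, so a database searched in enough different ways will almost always yield something that looks like a finding.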

Even well-conducted, methodologically correct studies have limited value until they are replicated by other researchers. If this doesn’t happen, the initial findings are likely to be false. Moreover, the findings of one study may be disproved by another. This is especially true of complex outcomes that are influenced by several factors. To determine what is really going on, published studies are subjected to meta-analysis: they are pooled together and their results weighed and summarized. During meta-analysis the quality of trials is assessed, and studies that do not comply with strict requirements are excluded. In spite of all these efforts, meta-analyses are not free from pitfalls, and in recent years a new category of study has appeared: the meta-analysis of meta-analyses.
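To give a feel for what "pooled together and weighed" can mean, here is a minimal sketch of one common approach, inverse-variance weighting, in which more precise studies count for more. The three study results are invented for illustration, and actual meta-analyses involve many further steps (quality assessment, checks for heterogeneity and so on) that are not shown.

```python
import math

def pool_fixed_effect(studies):
    """Pool (effect_estimate, standard_error) pairs by inverse-variance weighting.

    Returns the pooled estimate and its standard error.
    """
    weights = [1 / se ** 2 for _, se in studies]  # precise studies weigh more
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1 / total)

# Hypothetical studies: (difference in risk between groups, standard error)
studies = [(0.05, 0.04), (-0.02, 0.03), (0.01, 0.01)]
pooled, se = pool_fixed_effect(studies)
print("pooled effect:", round(pooled, 3), "standard error:", round(se, 3))
```

Note how the small, imprecise studies that point in opposite directions barely move the pooled result, which is dominated by the largest and most precise study.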

Obviously, analysis can go on forever, and strictly speaking there are two types of medical evidence: evidence that has been disproved and evidence that has not been disproved yet. Luckily, this view is of mostly academic interest, and in most areas of medicine some sort of consensus exists based on current evidence.

For example, the nineties were marked by a debate over whether labor epidural increases the risk of cesarean section. Studies came up with all sorts of results: confirming, disproving or inconclusive. Finally, after thorough analysis of all published trials, the conclusion was reached that there is no causal connection between the two.

That is why sensational newspaper articles have little value as a source of useful knowledge about medical procedures. Such articles often seize on the latest published studies with unexpected results and present the data in an emotionally appealing way. As a result the truth is distorted and the public misled. The subject of labor epidurals has provided numerous examples of such distortions over the years: epidurals have been blamed for increasing the risk of cesarean section, for unnecessary investigations and exposure of newborns to antibiotics, for interfering with breastfeeding, for causing chronic back pain and for many other problems. However, only a thorough review of the clinical and academic literature makes it possible to come to an objective conclusion about whether this is true.

Medical evidence is not static, and every published study contributes to the bank of existing medical knowledge and evidence. New associations are discovered, some are confirmed and some disproved. More understanding is gathered as the science accumulates more experience and research. All published data can be divided into the following:

Evidence of good value

  • meta-analysis of existing data
  • reviews of published trials
  • randomized controlled trials
  • large, well designed observational studies
  • expert consensus statements
  • guidelines by appropriate professional bodies.

Evidence of limited value

  • case reports
  • small observational studies
  • studies with new and unexpected findings
  • studies where authors have conflict of interests.

Evidence of doubtful value

  • articles in newspapers and magazines
  • TV programs
  • personal experiences and stories.

Evidence from every group can be useful. A newspaper or magazine article may be based on an interview with a reputable expert in the medical field, and small observational studies may sometimes point towards new discoveries. Similarly, personal experiences should not be dismissed as completely useless. However, when considering a medical procedure it is important to weigh the relative value of all sources and make an informed choice based on the best and most reliable information currently available.

1. http://www.skepdic.com/bibcode.html
2. http://cs.anu.edu.au/~bdm/dilugim/moby.html



Dr. Eugene Smetannikov is a practicing anesthesiologist with an interest in obstetric anesthesia. He is the author of the most comprehensive book on the subject, The Truth About Labor Epidural.