The levels of evidence, randomized controlled trials and their role in evidence based medicine

The following research on the levels of evidence, randomized controlled trials and evidence based medicine is by Patricia Burns, Research Associate, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System; Rod Rohrich, Professor of Surgery, Department of Plastic Surgery, University of Texas Southwestern Medical Center; and Kevin Chung, Professor of Surgery, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System. It was published in 2012 by Journal of Plastic and Reconstructive Surgery.


As the name suggests, evidence based medicine (EBM) is about finding evidence and using that evidence to make clinical decisions.

A cornerstone of evidence based medicine is the hierarchical system of classifying evidence.

This hierarchy is known as the levels of evidence.

Physicians are encouraged to find the highest level of evidence to answer clinical questions.

Several papers published in plastic surgery journals concerning evidence based medicine topics have touched on this subject.

Specifically, previous papers have discussed the lack of higher level evidence in Plastic and Reconstructive Surgery (PRS) and the need to improve the evidence published in the journal.

Before that can be accomplished, it is important to understand the history behind the levels and how they should be interpreted.

This paper will focus on the origin of levels of evidence, their relevance to the evidence based medicine movement and the implications for the field of plastic surgery as well as the everyday practice of plastic surgery.


History of Levels of Evidence

The levels of evidence were originally described in a report by the Canadian Task Force on the Periodic Health Examination in 1979.

The report’s purpose was to develop recommendations on the periodic health exam and base those recommendations on evidence in the medical literature.

The authors developed a system of rating evidence (Table 1) when determining the effectiveness of a particular intervention.

Table 1. Canadian Task Force on the Periodic Health Examination levels of evidence.

The evidence was taken into account when grading recommendations.

For example, a Grade A recommendation was given if there was good evidence to support a recommendation that a condition be included in the periodic health exam.

The levels of evidence were further described and expanded by Sackett in an article on levels of evidence for antithrombotic agents in 1989 (Table 2).

Table 2. Levels of evidence from Sackett.

Both systems place randomized controlled trials (RCT) at the highest level and case series or expert opinions at the lowest level. The hierarchies rank studies according to the probability of bias.

Randomized controlled trials are given the highest level because they are designed to be unbiased and have less risk of systematic errors.

For example, by randomly allocating subjects to two or more treatment groups, these types of studies also randomize confounding factors that may bias results.
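As a rough illustration of random allocation, the sketch below shuffles a list of subjects and deals them into treatment arms. The function name `allocate`, the subject numbers, and the fixed seed are assumptions made for this example and do not come from the paper. Shuffle-and-deal is only one simple scheme, and real trials typically add safeguards such as allocation concealment.

```python
import random

def allocate(subjects, arms=("treatment", "control"), seed=None):
    """Shuffle subjects, then deal them into arms in turn.

    Chance alone decides who lands in which arm, so known and unknown
    confounders tend to balance across arms on average.
    """
    rng = random.Random(seed)  # seed is fixed only to make the example repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in arms}
    for i, subject in enumerate(shuffled):
        groups[arms[i % len(arms)]].append(subject)
    return groups

# Hypothetical example: 20 subjects identified by the numbers 1 to 20.
groups = allocate(range(1, 21), seed=42)
for arm, members in groups.items():
    print(arm, sorted(members))
```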

A case series or expert opinion is often biased by the author’s experience or opinions and there is no control of confounding factors.

Modification of levels

Since the introduction of levels of evidence, several other organizations and journals have adopted variations of the classification system.

Diverse specialties are often asking different questions and it was recognized that the type and level of evidence needed to be modified accordingly.

Research questions are divided into the categories: treatment, prognosis, diagnosis, and economic/decision analysis.

For example, Table 3 shows the levels of evidence developed by the American Society of Plastic Surgeons (ASPS) for prognosis and Table 4 shows the levels developed by the Centre for Evidence Based Medicine (CEBM) for treatment.

Table 3. Levels of evidence for prognostic studies.

Table 4. Levels of evidence for therapeutic studies.

The two tables highlight the types of studies that are appropriate for the question (prognosis versus treatment) and how quality of data is taken into account when assigning a level.

For example, randomized controlled trials are not appropriate when looking at the prognosis of a disease.

The question in this instance is: “What will happen if we do nothing at all?”

Because a prognosis question does not involve comparing treatments, the highest evidence would come from a cohort study or a systematic review of cohort studies.

The levels of evidence also take into account the quality of the data.

For example, in the chart from Centre for Evidence Based Medicine, poorly designed randomized controlled trials have the same level of evidence as a cohort study.

A grading system that provides strength of recommendations based on evidence has also changed over time.

Table 5 shows the Grade Practice Recommendations developed by ASPS.

Table 5. Grade practice recommendations.

The grading system provides an important component in evidence-based medicine and assists in clinical decision making.

For example, a strong recommendation is given when there is level I evidence and consistent evidence from level II, III and IV studies available.

The grading system does not degrade lower level evidence when deciding recommendations if the results are consistent.

Interpretation of levels

Many journals assign a level to the papers they publish and authors often assign a level when submitting an abstract to conference proceedings.

This allows the reader to know the level of evidence of the research, but the designated level of evidence does not always guarantee the quality of the research.

It is important that readers not assume that level 1 evidence is always the best choice or appropriate for the research question.

This concept will be very important for all of us to understand as we evolve into the field of evidence based medicine in Plastic Surgery.

By design, our designated surgical specialty will always have important articles that may have a lower level of evidence due to the level of innovation and technique articles which are needed to move our surgical specialty forward.

Although RCTs are often assigned the highest level of evidence, not all randomized controlled trials are conducted properly and the results should be carefully scrutinized.

Sackett stressed the importance of estimating types of errors and the power of studies when interpreting results from randomized controlled trials.

For example, a poorly conducted randomized controlled trial may report a negative result due to low power when in fact a real difference exists between treatment groups.
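To make the effect of low power concrete, here is a minimal sketch that approximates the power of a two-sided test comparing two proportions, using a normal approximation. The function name `approx_power`, the complication rates (30% versus 15%), and the group sizes are hypothetical values chosen for illustration. They are not data from any study discussed here, and a formal power analysis may use a different method.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled normal approximation and ignores the negligible
    probability of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical true event rates of 30% and 15% in the two arms.
power_small = approx_power(0.30, 0.15, n_per_group=20)
power_large = approx_power(0.30, 0.15, n_per_group=150)
print(f"20 per group: {power_small:.2f}; 150 per group: {power_large:.2f}")
```

Under these assumed rates, the small trial would miss the real difference more often than it would detect it, so a "negative" result from such a trial says little about whether the treatments differ.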

Scales such as the Jadad scale have been developed to judge the quality of randomized controlled trials.

Although physicians may not have the time or inclination to use a scale to assess quality, there are some basic items that should be taken into account.

Items used for assessing RCTs include randomization; blinding; a description of the randomization and blinding processes; a description of the number of subjects who withdrew or dropped out of the study; the confidence intervals around study estimates; and a description of the power analysis.
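One way to apply the items listed above when reading a trial report is a simple yes/no checklist, sketched below. The class `RctReport`, its field names, and the example values are hypothetical. This is not the Jadad scale or any other validated instrument, only an unweighted list of the items named in the text.

```python
from dataclasses import dataclass, fields

@dataclass
class RctReport:
    """Yes/no answers for the basic reporting items named in the text."""
    randomized: bool
    blinded: bool
    randomization_described: bool
    blinding_described: bool
    withdrawals_described: bool
    confidence_intervals_reported: bool
    power_analysis_described: bool

def missing_items(report):
    """Return the names of checklist items the report does not satisfy."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]

# Hypothetical report: randomized and blinded, but the methods are not described.
example = RctReport(
    randomized=True,
    blinded=True,
    randomization_described=False,
    blinding_described=False,
    withdrawals_described=True,
    confidence_intervals_reported=True,
    power_analysis_described=False,
)
gaps = missing_items(example)
print(gaps)
```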

For example, Bhandari et al. published a paper assessing the quality of surgical randomized controlled trials.

The authors evaluated the quality of randomized controlled trials reported in the Journal of Bone and Joint Surgery (JBJS) from 1988–2000.

The authors identified 72 randomized controlled trials during this time period, and the mean quality score was 68%. Papers with a score above 75% were deemed high quality, and 60% of the papers had a score below 75%.

The main reasons for the low quality scores were the lack of appropriate randomization, blinding, and a description of patient exclusion criteria.

Another paper found that papers in the Journal of Bone and Joint Surgery with a level 1 rating had the same quality scores as papers with a level 2 rating.

Therefore, one should not assume that level 1 studies have higher quality than level 2.

The users’ guides published in the Canadian Journal of Surgery and the Journal of Bone and Joint Surgery are a resource for surgeons when appraising levels of evidence.

Similar papers that are not specific to surgery have been published in the Journal of the American Medical Association (JAMA).

Plastic surgery and evidence based medicine

The field of plastic surgery has been slow to adopt evidence-based medicine.

This was demonstrated in a paper examining the level of evidence of papers published in Plastic and Reconstructive Surgery (PRS).

The authors assigned levels of evidence to papers published in PRS over a 20-year period.

The majority of studies (93% in 1983) were level 4 or 5, which denotes case series and case reports.

Although the results are disappointing, there was some improvement over time.

By 2003 there were more level 1 studies (1.5%) and fewer level 4 and 5 studies (87%).

A recent analysis looked at the number of level 1 studies in 5 different plastic surgery journals from 1978–2009.

The authors defined level 1 studies as randomized controlled trials and meta-analyses and restricted their search to these studies.

The number of level 1 studies increased from 1 in 1978 to 32 by 2009.

From these results, we see that the field of plastic surgery is improving the level of evidence but still has a way to go, especially in improving the quality of studies published.

For example, approximately a third of the studies involved double blinding, but the majority did not randomize subjects, describe the randomization process, or perform a power analysis.

Power analysis is another area of concern in plastic surgery.

A review of the plastic surgery literature found that the majority of published studies have inadequate power to detect moderate to large differences between treatment groups.

No matter what the level of evidence for a study, if it is underpowered, the interpretation of results is questionable.

Although the goal is to improve the overall level of evidence in plastic surgery, this does not mean that all lower level evidence should be discarded.

Case series and case reports are important for hypothesis generation and can lead to more controlled studies.

Additionally, in the face of overwhelming evidence to support a treatment, such as the use of antibiotics for wound infections, there is no need for randomized controlled trials.

Clinical examples using levels of evidence

In order to understand how the levels of evidence work and aid the reader in interpreting levels, we provide some examples from the plastic surgery literature.

The examples also show the peril of medical decisions based on results from case reports. An association was hypothesized between lymphoma and silicone breast implants based on case reports.

The level of evidence for case reports, depending on the scale used, is 4 or 5.

These case reports were used to generate the hypothesis that a possible association existed.

Because of these results, several large retrospective cohort studies from the United States, Canada, Denmark, Sweden and Finland were conducted.

The level of evidence for a retrospective cohort is 2.

All of these studies had many years of follow-up for a large number of patients.

Some of the studies found an elevated risk and others no risk for lymphoma.

None of the studies reached statistical significance.

Therefore, higher level evidence from cohort studies does not provide evidence of any risk of lymphoma.

Finally, a systematic review was performed that combined the evidence from the retrospective cohorts.

The review found an overall standardized incidence ratio of 0.89 (95% CI 0.67–1.18).

Because the confidence interval includes 1, the results indicate no statistically significant increase in incidence.
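The reasoning about the confidence interval can be written as a short check, using the figures reported above. The function name `ci_excludes_null` is an assumption made for this sketch. For a ratio measure such as the standardized incidence ratio, the value 1 means that the observed incidence equals the expected incidence.

```python
def ci_excludes_null(lower, upper, null_value=1.0):
    """Return True if the interval lies entirely on one side of the null value."""
    return not (lower <= null_value <= upper)

# Figures reported in the systematic review: SIR 0.89 (95% CI 0.67 to 1.18).
sir, lower, upper = 0.89, 0.67, 1.18
significant = ci_excludes_null(lower, upper)
print(f"SIR {sir}: significantly different from 1? {significant}")
```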

The level of evidence for the systematic review is 1.

Based on the best available evidence, there is no association between lymphoma and silicone implants.

This example shows how low level evidence studies were used to generate a hypothesis, which then led to higher level evidence that disproved the hypothesis.

This example also demonstrates that randomized controlled trials are not feasible for rare events such as cancer, and it shows the importance of observational studies for a specific study question.

A case-control study is a better option and provides higher evidence for testing the prognosis of the long-term effect of silicone breast implants.

Another example is the injection of epinephrine in fingers.

Based on case reports prior to 1950, physicians were advised that epinephrine injection can result in finger ischemia.

This is an example in which level 4 or 5 evidence was accepted as fact and incorporated into medical textbooks and teaching.

However, not all physicians accepted this evidence, and some have been performing injections of epinephrine into the fingers with no adverse effects on the hand.

Obviously, it was time for higher level evidence to resolve this issue.

An in-depth review of the literature from 1880 to 2000 by Denkler identified 48 cases of digital infarction, of which 21 were injected with epinephrine.

Further analysis found that the addition of procaine to the epinephrine injection was the cause of the ischemia.

The procaine used in these injections included toxic acidic batches that were recalled in 1948.

In addition, several cohort studies found no complications from the use of epinephrine in the fingers and hand.

The results from these cohort studies increased the level of evidence.

Based on the best available evidence from these studies, the hypothesis that epinephrine injection will harm fingers was rejected.

This example highlights the biases inherent in case reports.

It also shows the risk when spurious evidence is handed down and integrated into medical teaching.

Obtaining the best evidence

We have established the need for randomized controlled trials to improve evidence in plastic surgery but have also acknowledged the difficulties, particularly with randomization and blinding.

Although RCTs may not be appropriate for many surgical questions, well designed and conducted cohort or case-control studies could boost the level of evidence.

Many of the current studies tend to be descriptive and lack a control group.

The way forward seems clear.

Plastic surgery researchers need to consider utilizing a cohort or case-control design whenever a randomized controlled trial is not possible.

If designed properly, the level of evidence for observational studies can approach or surpass that of an RCT.

In some instances, observational studies and randomized controlled trials have found similar results.

If enough cohort or case-control studies become available, this increases the prospect of systematic reviews of these studies that will increase overall evidence levels in plastic surgery.


The levels of evidence are an important component of evidence based medicine.

Understanding the levels and why they are assigned to publications and abstracts helps the reader to prioritize information.

This is not to say that all level 4 evidence should be ignored and all level 1 evidence accepted as fact.

The levels of evidence provide a guide and the reader needs to be cautious when interpreting these results.


Supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to Dr. Kevin C. Chung).
