Assessing evidence: strengths and weaknesses of grading tools

Purpose: To describe common problems with grading systems that can undermine their usefulness in justifying evidence-based changes to policy and practice.

Description: In order to summarise the state of research on particular topics, the quality and robustness of scientific results are often assessed using one of a number of so-called grading instruments or systems. Grading instruments are extensively used in evidence-based medicine to develop clinical guidelines, as well as in other research on health care. Applying a grading system to a body of evidence lends an air of authority to the findings and can help determine whether they are translated into changes in policy or practice.

However, eight common problems with grading systems should temper enthusiasm for their use. Before relying on conclusions drawn from applying a grading system, the limitations of the particular instrument and its potential for misuse must be assessed.

Limitation 1. Grading systems may lack information on their validity and reliability
Many grading instruments have not been validated, meaning that there are no guarantees that they are effective, avoid bias, and generally work in the way they claim to work.

Limitation 2. Grading systems may have poor concurrent validity
Concurrent validity is said to exist when the results from different instruments designed to measure the same underlying construct are highly correlated. There is little evidence for concurrent validity across different grading scales.
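Concurrent validity can be checked empirically: score the same body of studies with two instruments and correlate the results. A minimal sketch, using hypothetical scores and a hand-rolled Pearson correlation (the instrument names and values below are illustrative, not drawn from any real grading tool):

```python
# Concurrent validity check: correlate the scores two grading instruments
# assign to the same set of studies. All scores here are hypothetical.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Quality scores (0-10) assigned to the same ten studies by two instruments.
instrument_a = [8, 6, 7, 9, 4, 5, 8, 3, 6, 7]
instrument_b = [5, 7, 4, 6, 8, 3, 5, 7, 4, 6]

r = pearson_r(instrument_a, instrument_b)
print(f"Concurrent validity (Pearson r): {r:.2f}")
```

A correlation near 1 would suggest the two instruments measure the same underlying construct; the weak or negative correlations often found in practice are the problem this limitation describes.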

Limitation 3. Grading systems may not account adequately (or at all) for external validity
Grading instruments are often designed to focus solely on scientific robustness and do not include any metrics for evaluating the external validity of findings, which includes the generalisability of research results to real-life populations, the feasibility of applying findings in the real world and the sustainability of an intervention over time.

Limitation 4. Grading systems are not necessarily inherently logical
In most grading systems there is no explicit logic for how the different elements of quality combine to produce a higher overall quality rating.
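The point can be illustrated with a hypothetical additive checklist (the items and thresholds below are invented for illustration): because the elements are simply summed, a study with a critical flaw can still be rated "high quality" when other items compensate.

```python
# Illustration of Limitation 4: a purely additive checklist has no logic
# for how quality elements combine, so strengths elsewhere can mask a
# fatal flaw. Items and cut-offs are hypothetical.

# Checklist items scored 0-2.
study = {
    "randomisation": 0,   # critical flaw
    "blinding": 2,
    "follow_up": 2,
    "reporting": 2,
    "sample_size": 2,
}

total = sum(study.values())  # 8 out of a possible 10
grade = "high" if total >= 8 else "moderate" if total >= 5 else "low"
print(grade)  # rated "high" despite the failed critical element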

Limitation 5. Grading systems are susceptible to subjectivity and low inter-rater reliability
Many grading instruments have guidelines that are open to interpretation, meaning that the assessor’s background, skills and views can affect how the guidelines are operationalised.
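Inter-rater reliability can be quantified: a common statistic is Cohen's kappa, which measures agreement between two raters beyond what chance alone would produce. A minimal sketch, using hypothetical ratings from two assessors applying the same instrument:

```python
# Inter-rater reliability: Cohen's kappa for two assessors applying the
# same grading instrument. All ratings here are hypothetical.
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed - chance) agreement / (1 - chance)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Two assessors grade the same eight studies as high/moderate/low quality.
assessor_1 = ["high", "high", "moderate", "low", "moderate", "high", "low", "moderate"]
assessor_2 = ["high", "moderate", "moderate", "low", "high", "high", "low", "low"]

kappa = cohens_kappa(assessor_1, assessor_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa ranges from below 0 (worse than chance) to 1 (perfect agreement); instruments with guidelines open to interpretation tend to produce low kappa values across assessors.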

Limitation 6. Grading systems often have inadequate instructions and are overly complex
Many grading systems, including complex ones, provide no instructions, fail to define their terms, or offer only vague guidance. This leaves open the possibility that raters arrive at conclusions that fit their preconceptions.

Limitation 7. Some grading systems are biased toward randomised controlled trials
Some grading instruments automatically assign higher-level grades to evidence from randomised controlled trials and lower-level grades to evidence from other kinds of research, without taking into account any limitations in the randomised controlled trial or strengths in the other research designs.

Limitation 8. Grading systems may not adequately address different types of observational research
Some grading systems assume that all forms of observational research are equal in the strength of evidence they produce.

A final point is that grading instruments are not a uniform set of tools but vary widely (e.g., one grading tool is described at https://i2s.anu.edu.au/resources/assessing-evidence-quality-recommendations, but there are many others). Choosing an appropriate grading system for a particular context is therefore essential.

Reference: Irving, M., Eramudugolla, R., Cherbuin, N., and Anstey, K. (2017). A critical review of grading systems: Implications for public health policy. Evaluation and the Health Professions, 40(2): 244-262. (Online): https://doi.org/10.1177/0163278716645161


Posted: December 2013
Last modified: April 2019