Applied Evidence

7 questions to ask when evaluating a noninferiority trial


4. Is the sample size appropriate and justified?

The sample size in a noninferiority trial should provide high power to reject the null hypothesis that the difference (or relative risk) between groups is equal to or greater than the noninferiority margin, given a clinically meaningful assumption about the true difference (or absolute risk reduction) between groups. A true difference of 0 (or a relative risk of 1) is typically assumed for the sample size calculation. However, assuming that the new treatment is truly slightly better or slightly worse than the standard may be clinically appropriate in some cases. Such an assumption would call for a smaller or larger sample size, respectively, than that required under the usual assumption of no difference.
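The effect of the assumed true difference on sample size can be sketched with a standard normal-approximation formula for a risk difference. This is an illustration only, not a calculation from any trial discussed here: the function name `n_per_group`, the 10% standard-arm event risk, the 5-percentage-point margin, the one-sided alpha of 0.025, and the 90% power are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_per_group(p_std, true_diff, margin, alpha=0.025, power=0.90):
    """Approximate participants per group for a noninferiority test on a
    risk difference (new minus standard, where a higher event risk is worse).

    Null hypothesis: p_new - p_std >= margin.
    Uses the normal approximation with 1:1 allocation.
    All inputs are illustrative assumptions, not values from a real trial.
    """
    p_new = p_std + true_diff
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided type I error
    z_beta = NormalDist().inv_cdf(power)
    variance = p_new * (1 - p_new) + p_std * (1 - p_std)
    # The distance between the margin and the assumed true difference
    # drives the sample size: the smaller the gap, the larger the trial.
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (margin - true_diff) ** 2)


# Hypothetical scenarios: new treatment slightly better, equal, slightly worse.
for label, diff in [("better", -0.01), ("equal", 0.0), ("worse", 0.01)]:
    print(label, n_per_group(p_std=0.10, true_diff=diff, margin=0.05))
```

Under these assumed inputs, the required number per group is smallest when the new treatment is assumed to be slightly better and largest when it is assumed to be slightly worse, which matches the direction described above.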

When the justification for the sample size in a noninferiority trial is not provided, or the number of participants is based on an inappropriate approach (eg, using superiority trial calculations for a noninferiority trial), questions about the quality of the trial arise. The primary concern is whether the noninferiority margin was actually selected before the trial began, as it should have been. In addition, if the researchers used overly optimistic assumptions about the efficacy of the new treatment relative to the standard therapy, the trial may be underpowered, and a failure to rule out the margin could be misleading. (As with superiority trials that fail to reject the null hypothesis, post hoc power calculations should be avoided.) Instead, after the study has ended, the resulting CIs should be used to evaluate whether the study was large enough to adequately assess the relative effectiveness of the treatments.

The RE-LY trial calculated the sample size that was expected to provide 84% power to rule out the prespecified hazard ratio of 1.46, assuming a true event rate of 1.6% per year (presumably for both groups), a recruitment period of 2 years, and at least one year of follow-up. The sample size was subsequently increased from 15,000 to 18,000 to maintain power in case of a low event rate.4,5
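For a time-to-event outcome, power depends mainly on the number of events rather than the number of participants. The sketch below uses a common Schoenfeld-type approximation for a single two-arm, 1:1 comparison. It is not a reconstruction of the RE-LY calculation, which involved design details not described here. The function name `events_needed` and the one-sided alpha of 0.025 are assumptions; only the margin of 1.46 and the 84% power come from the text above.

```python
import math
from statistics import NormalDist


def events_needed(hr_margin, true_hr=1.0, alpha=0.025, power=0.84):
    """Approximate total events (both arms combined) needed to rule out
    a hazard ratio of `hr_margin` in a two-arm, 1:1 noninferiority trial.

    Schoenfeld-type approximation on the log hazard ratio scale.
    `alpha` is one-sided and is an assumption made for this example.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Gap between the margin and the assumed true hazard ratio, on the log scale.
    log_gap = math.log(hr_margin) - math.log(true_hr)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / log_gap ** 2)


# Assumed true hazard ratio of 1 (no difference) versus a slightly worse new drug.
print(events_needed(1.46))
print(events_needed(1.46, true_hr=1.1))
```

Because the required number of events is fixed by the margin and the power, a lower-than-expected event rate means more participants or longer follow-up are needed to accumulate those events, which is consistent with the decision to enlarge the trial.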

5. Is the noninferiority trial as similar as possible to the trial(s) comparing the standard treatment with placebo?

Characteristics of participants, setting, reference treatment, and outcomes used in a noninferiority trial should be as close as possible to those in the trial(s) comparing the standard treatment with placebo. This is known as the constancy assumption, and it is key to researchers’ ability to draw a conclusion about noninferiority.

The trials used to calculate the noninferiority margin and the RE-LY trial itself involved similar populations of patients with AF, and the outcome (stroke) was similar.

6. Is a per protocol analysis reported in the results?

In randomized controlled superiority trials, the participants should be analyzed in the groups to which they were originally allocated, regardless of whether they adhered to treatment during the entire follow-up period. Such intention-to-treat (ITT) analysis is important because it provides a more conservative estimate of treatment effect—taking into account that some people who are offered treatment will not accept it and others will discontinue treatment. An ITT analysis therefore tends to minimize treatment effects compared with a “per protocol” analysis, in which participants are analyzed according to the treatment they actually received and are often removed from the analysis if they discontinue or do not adhere to treatment.

In noninferiority trials, if patients in the intervention group cross over to the standard treatment group or those in the standard treatment group have poor adherence, an ITT analysis can increase the risk of wrongly claiming noninferiority.7 Therefore, a per protocol analysis should be included, and indeed may be preferable.

In RE-LY, ITT analyses were reported, and complete follow-up data were available for 99.9% of patients. However, the rates of treatment discontinuation at one year were about 15% for those on dabigatran and 10% for the warfarin group, and 21% and 17%, respectively, at 2 years.4,5 If the new treatment were truly less efficacious than the standard treatment, these moderate discontinuation rates could lead to more similar rates of stroke in the 2 groups than would be expected with higher continuation rates, biasing results towards the alternative of noninferiority. Although the original publication of trial results did not include a per protocol analysis, the RE-LY authors later reported that a per protocol analysis yielded similar results to the ITT analysis.
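The dilution effect described above can be illustrated with simple arithmetic. The sketch below is a deliberately crude model, not an analysis of RE-LY data: the function name `itt_relative_risk`, the hypothetical true relative risk of 1.5, and the assumption that everyone who discontinues has the same event risk regardless of study arm are all invented for illustration. Only the discontinuation proportions (21% and 17%) and the 1.6% annual event rate are taken from the text.

```python
def itt_relative_risk(risk_std, true_rr, disc_new, disc_std, rr_off_treatment=1.0):
    """Expected relative risk in an ITT analysis when some participants
    stop their assigned treatment.

    Simplifying assumption (hypothetical): participants who discontinue in
    either arm have event risk risk_std * rr_off_treatment, so they
    contribute identical risk to both arms.
    """
    risk_new_on = risk_std * true_rr        # risk while taking the new drug
    risk_off = risk_std * rr_off_treatment  # risk after discontinuing
    itt_new = (1 - disc_new) * risk_new_on + disc_new * risk_off
    itt_std = (1 - disc_std) * risk_std + disc_std * risk_off
    return itt_new / itt_std


# Hypothetical: the new drug is truly 50% worse while taken.
full_adherence = itt_relative_risk(0.016, 1.5, disc_new=0.0, disc_std=0.0)
with_dropout = itt_relative_risk(0.016, 1.5, disc_new=0.21, disc_std=0.17)
print(full_adherence, with_dropout)
```

In this model the relative risk seen in the ITT analysis sits closer to 1 than the true on-treatment relative risk, which is why a per protocol analysis is a useful check in a noninferiority trial.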

7. Are the overall design and execution of the trial high quality?

Because a poor-quality noninferiority trial can falsely appear to demonstrate noninferiority, looking at such studies critically is crucial. Appropriate randomization, concealed allocation, masking, and careful attention to participant flow must all be assessed.2,3
