The traditional clinical trial, designed to test whether a new treatment is better than a placebo or another active treatment, is known as a “superiority” trial—although rarely labeled as such. In contrast, the goal of a noninferiority trial is simply to demonstrate that a new treatment is not substantially less effective than the standard therapy.
Such trials are useful when a new therapy is thought to be safer, easier to administer, or less costly than the existing treatment, but not necessarily more effective. And, because it would be unethical to randomize patients with a serious condition for which there already is an effective treatment to placebo, a noninferiority trial is another means of determining if the new treatment is effective.
Noninferiority trials have unique design features and methodology and require a different analysis than traditional superiority trials. Yet many physicians know far less about them; many investigators appear to be less than proficient, as well. A review of 116 noninferiority trials and 46 equivalence trials found that only 20% fulfilled generally accepted quality criteria.1 To improve the quality of noninferiority trials, the CONSORT (Consolidated Standards of Reporting Trials) Group has published a checklist for trial design and reporting standards.2,3 Based on this checklist, we came up with 7 key questions to consider when evaluating a noninferiority trial. In the pages that follow, you’ll also find an at-a-glance guide (TABLE) and a methodology review using a hypothetical case (page E7).
1. Is a noninferiority trial appropriate?
The introduction to a noninferiority trial should provide the rationale for this design and the absence of a placebo control group. Look for a review of the evidence of the efficacy of the reference treatment that placebo-controlled trials have revealed, along with the effect size. The advantages of the new treatment over the standard treatment—eg, fewer adverse effects, easier administration, or lower cost—should be discussed, as well.
In the Randomized Evaluation of Long-term Anticoagulation Therapy (RE-LY)—a prominent noninferiority trial—investigators compared the standard anticoagulant (warfarin) for patients with atrial fibrillation (AF) at risk of stroke with a new agent, dabigatran.4 In the methods section of the abstract and the statistical analysis section of the main body, the authors clearly indicated that this was a noninferiority trial. They began by referring to the existing evidence of warfarin’s effectiveness, then detailed the qualities that make warfarin cumbersome to use, including the need for frequent laboratory monitoring. This was followed by evidence that many patients stop taking warfarin and that even for those who persist with treatment, adequate anticoagulation is difficult to maintain.
The authors went on to state that because dabigatran requires no long-term monitoring, it is easier to use. Therefore, if dabigatran could be shown to be no worse than warfarin in preventing strokes, it would be a reasonable alternative, leaving no doubt that this was an appropriate noninferiority trial.
2. Is the noninferiority margin based on clinical judgment and statistical reasoning?
The noninferiority margin should be based on clinical judgment as to how effective a new treatment must be in order to be declared not clinically inferior to the standard treatment. This can be based on several factors, including the severity of the outcome and the expected advantages of the new treatment. The margin should also take into account the size of the standard treatment’s effect vs placebo. In RELY, for example, the authors noted that the noninferiority margin was based on the desire to preserve at least 50% of the lower limit of the confidence interval (CI) of warfarin’s estimated effect; this was done using data from a previously published meta-analysis of 6 trials comparing warfarin with placebo for stroke prevention in patients with AF.4-6
3. Are the hypothesis and statistical analysis formulated correctly?
The clinical hypothesis in a noninferiority trial is that the new treatment is not worse than the standard treatment by a prespecified margin; therefore, the statistical null hypothesis to be tested is that the new treatment is worse than the reference treatment by more than that margin. Rejecting a true null hypothesis (for example, because the P value is <.05) is known as a type l error. In this setting, making a type I error would mean accepting a new treatment that is truly worse than the standard by at least the specified margin. Failure to reject a false null hypothesis is known as a type II error, which in this case would mean failing to identify a new treatment that is truly noninferior to the standard.7
In RE-LY, the authors stated that the upper limit of the one-sided 97.5% CI for the relative risk of a stroke with dabigatran vs warfarin had to fall below 1.46.4 (This is the same as testing the null hypothesis that the hazard ratio is ≥1.46.) Thus, the hypothesis was formulated correctly.