Elbow fractures constitute 7% of all adult fractures, and 30% of these fractures are distal humerus fractures.1,2 Of these, 96% involve disruption of the articular surface.3 Intra-articular distal humerus fracture patterns can be difficult to characterize on plain radiographs, and therefore computed tomography (CT) is often used. The surgeon’s understanding of the fracture pattern and the deforming forces affects choice of surgical approach. In particular, multiplanar fracture patterns, including coronal shear fractures of the capitellum or trochlea, are often difficult to recognize on plain radiographs. Identification of a multiplanar fracture pattern may require a change in approach or fixation. CT is useful for other intra-articular fractures, such as those of the proximal humerus,3-6 but involves increased radiation and cost.
We conducted a study to determine the effect of adding CT evaluation to plain radiographic evaluation on the classification of, and treatment plans for, intra-articular distal humerus fractures. We hypothesized that adding CT images to plain radiographs would change the classification and treatment of these fractures and would improve interobserver agreement on classification and treatment.
Materials and Methods
After obtaining University of Southern California Institutional Review Board approval, we retrospectively studied 30 consecutive cases of adult intra-articular distal humerus fractures treated by Dr. Itamura at a level I trauma center between 1995 and 2008. In each case, the injured elbow was imaged with plain radiography and CT. Multiple CT machines were used, all following the radiology department’s standard protocol. The images were evaluated by 9 independent observers from the same institution: 3 orthopedic surgeons (1 fellowship-trained shoulder/elbow subspecialist, 1 fellowship-trained upper extremity subspecialist, 1 fellowship-trained orthopedic trauma surgeon), 3 shoulder/elbow fellows, and 3 senior residents pursuing upper extremity fellowships upon graduation. No observer was involved in the care of any of the patients. All identifying details were removed from the patient information presented to the observers. For each set of images, the observer was asked to classify the fractures according to the Mehne and Matta classification system,7,8 which is the predominant system used at our institution.
Diagrams of this classification system were provided, but there was no formal observer training or calibration. Seven treatment options were presented: (1) open reduction and internal fixation (ORIF) using a posterior approach with olecranon osteotomy, (2) ORIF using a posterior approach, (3) ORIF using a lateral approach, (4) ORIF using a medial approach, (5) ORIF using an anterior/anterolateral approach, (6) total elbow arthroplasty, and (7) nonoperative management. The only clinical data provided were patient age and sex.
Images were evaluated in blinded fashion, and two rounds of evaluation were compared. In round 1, plain radiographs were evaluated; in round 2, the same radiographs were evaluated together with the corresponding 2-dimensional (2-D) CT images. A minimum of 1 month was required between viewing rounds.
Statistical Analysis
Statistical analysis was performed by the Statistical Consultation and Research Center at our institution. Cohen κ was calculated to estimate the reliability of the fracture classification and treatment plan made by different observers on the same occasion (interobserver reliability). Cramer V9 was calculated to estimate the reliability of the fracture classification and treatment plan made by the same observer on separate occasions (intraobserver reliability); Cramer V measures the association between the 2 ratings as a percentage of their total variation. The κ value and Cramer V value were also used to evaluate results based on the observers’ training levels. Both κ and Cramer V values are interpreted as follows: .00 to .20 indicates slight agreement; .21 to .40, fair agreement; .41 to .60, moderate agreement; .61 to .80, substantial agreement; and ≥.81, almost perfect agreement. Zero represents no agreement, and 1.00 represents perfect agreement.
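For concreteness, the two agreement statistics can be sketched in Python. This is an illustrative computation only, not the software used by the consultation center; the rating lists and category labels are hypothetical.

```python
from collections import Counter
from math import sqrt

def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on the same occasion."""
    n = len(rater1)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[k] / n) * (c2[k] / n) for k in set(rater1) | set(rater2))
    return (observed - expected) / (1 - expected)

def cramers_v(round1, round2):
    """Association between the same rater's choices on two viewing rounds."""
    n = len(round1)
    cats1, cats2 = sorted(set(round1)), sorted(set(round2))
    # Build the contingency table of round-1 vs round-2 ratings
    table = {(a, b): 0 for a in cats1 for b in cats2}
    for a, b in zip(round1, round2):
        table[(a, b)] += 1
    # Pearson chi-square over the table
    chi2 = 0.0
    for a in cats1:
        row = sum(table[(a, y)] for y in cats2)
        for b in cats2:
            col = sum(table[(x, b)] for x in cats1)
            exp = row * col / n
            if exp:
                chi2 += (table[(a, b)] - exp) ** 2 / exp
    k = min(len(cats1), len(cats2))
    # V = sqrt(chi2 / (N * (min(rows, cols) - 1))), in [0, 1]
    return sqrt(chi2 / (n * (k - 1))) if k > 1 else 1.0
```

Both statistics range from 0 (no agreement beyond chance) to 1.00 (perfect agreement), matching the interpretation bands above.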
Results
Overall intraobserver reliability between viewing rounds was fair for classification (.393) and moderate for treatment plan (.426). Residents had the highest Cramer V value for classification reliability, at .60 (moderate), and attending surgeons had the highest value for treatment plan, at .52 (moderate). All 3 groups (residents, fellows, attending surgeons) showed moderate intraobserver agreement for treatment plan (Table 1).
Interobserver reliability did not improve with the addition of CT in round 2. Reliability was fair in both viewing rounds for classification and for treatment. For classification, the overall κ value was .21 for the first round and .20 for the second round. For treatment plan, the overall κ value was .28 for the first round and .27 for the second round. Attending surgeons’ agreement on treatment plan decreased with the addition of CT (from .46, moderate, to .32, fair). Fellows had only slight agreement in both rounds for both classification and treatment (Table 2).