1 Introduction
Vertebral bone quality (VBQ) is a critical biomechanical determinant of success in spinal fusion surgery. In patients undergoing posterior lumbar interbody fusion (PLIF) for degenerative lumbar disease, VBQ directly affects both the initial stability and the long-term fixation strength of implants such as pedicle screws.
[1] Reduced bone quality due to osteopenia or osteoporosis significantly compromises the bone–screw interface and is a well-established risk factor for postoperative internal fixation failure (IFF). Such complications—including screw loosening or breakage, cage subsidence or migration, and rod fracture—not only lead to recurrent pain and neurological risks but also frequently necessitate revision surgery, thereby worsening patient outcomes and quality of life.
[2,3] Magnetic resonance imaging (MRI)-based VBQ scores were initially used to assess preoperative vertebral quality in spine surgery patients. By quantifying the signal intensity (SI) ratio between vertebral marrow (at L1–L4) and cerebrospinal fluid (CSF), the VBQ score indirectly reflects vertebral marrow fat content.
[4] Compared with dual-energy X-ray absorptiometry (DEXA), the traditional method for assessing bone mineral density (BMD), VBQ offers greater sensitivity in detecting marrow fat changes and avoids radiation exposure, making it a promising screening method for surgical candidates.
[5] Prior studies have shown that VBQ independently predicts cage subsidence after lumbar fusion and distal junctional kyphosis after cervical fusion, as well as the risk of reoperation.
[6–9] Despite these findings, few studies have specifically examined the role of VBQ in predicting hardware failure after PLIF.
Previous studies have reported varying VBQ cutoff values for predicting postoperative complications, with a threshold of around 3.0 showing good ability to screen for osteoporosis.
[10] However, no consensus has been reached on the VBQ threshold for predicting IFF after lumbar fusion surgery. Patient characteristics, differences in MRI devices and magnetic field strength, and the operator's strategy may introduce deviation, leading to varied results.
[11,12] As such, clarifying the factors that affect the VBQ threshold would facilitate generalization of the VBQ score and enable individualized interpretation of the cutoff value. Thus, this study aimed to evaluate the predictive efficacy of the VBQ score for IFF after PLIF and to explore the factors influencing the threshold.
2 Methods
2.1 Study population
All cases treated between January 2017 and December 2022 were retrospectively reviewed. Of the 343 patients screened for degenerative lumbar disease, 256 met the inclusion criteria and were followed postoperatively. Inclusion criteria were (1) age ≥ 18 years and PLIF surgery for degenerative lumbar disease; (2) routine lumbar spine MRI (including a T1-weighted sequence for VBQ calculation) completed within 1 year before surgery; and (3) postoperative follow-up ≥ 2 years (to ensure that failures requiring revision could be observed). Exclusion criteria were (1) previous lumbar spinal fusion surgery; (2) severe postoperative trauma leading to IFF; (3) cement augmentation for stress endplate; and (4) poor-quality or absent preoperative MRI images. The study was approved by the Ethics Committee of the First Medical Center of Chinese People's Liberation Army General Hospital and conducted in accordance with the Declaration of Helsinki. This study was based on routinely collected medical record data, which are secondary in nature and do not contain any personally identifiable information. Therefore, the requirement for informed consent was waived.
2.2 VBQ score
Two independent blinded observers performed the VBQ measurements. In cases of disagreement between the 2 initial readers (defined as an interobserver variability of > 10%), a third senior radiologist was consulted to adjudicate and provide a final measurement. Preoperative T1-weighted sagittal lumbar spine MRI images were analyzed. A circular region of interest was placed centrally within each of the L1–L4 vertebrae, avoiding endplates, cortical bone, and lesions. CSF SI was measured at the L3 level, and the score was calculated as VBQ = SI_L1–L4 / SI_CSF. If severe stenosis at L3 precluded region of interest placement, SI_CSF could instead be measured at the L2 or L4 level.
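The measurement rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names, the SI readings, and the definition of the 10% disagreement rule (absolute difference divided by the mean of the two readings) are assumptions. The use of the median across L1–L4 follows the description of the original score given in the Discussion; the Methods do not state which summary was applied.

```python
from statistics import median

def vbq_score(vertebral_si, csf_si):
    """Ratio of the median L1-L4 marrow signal intensity to the CSF signal intensity."""
    if len(vertebral_si) != 4:
        raise ValueError("expected one ROI value for each of L1-L4")
    if csf_si <= 0:
        raise ValueError("CSF signal intensity must be positive")
    return median(vertebral_si) / csf_si

def needs_adjudication(score_a, score_b, tolerance=0.10):
    """True when two readers differ by more than `tolerance` of their mean.

    The relative-difference definition is an assumption; the source only
    states 'interobserver variability of > 10%'.
    """
    mean_score = (score_a + score_b) / 2
    return abs(score_a - score_b) / mean_score > tolerance

# Hypothetical ROI readings in arbitrary scanner units, not data from the study.
reader_1 = vbq_score([300, 320, 310, 330], csf_si=100)
reader_2 = vbq_score([310, 335, 320, 340], csf_si=100)
print(reader_1, reader_2, needs_adjudication(reader_1, reader_2))
```

Because the score is a ratio, the arbitrary units of the scanner cancel out, which is the reason CSF is used as an internal reference.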
2.3 Data collection
Demographic data were collected, including age, gender, height, weight, BMI (kg/m²), and smoking and alcohol history. Comorbidities included hypertension, diabetes, coronary heart disease, and hyperlipidemia (based on clinical history or LDL > 130 mg/dL). Surgical parameters included date of surgery, number of fused segments, and whether IFF occurred after surgery. Long-term steroid use (≥ 3 months) was also recorded.
IFF requiring revision was used as the endpoint, defined as (1) postoperative screw loosening or breakage, cage displacement or subsidence, connecting rod breakage, etc.; and (2) confirmed by imaging (X-ray/CT) and requiring surgical intervention. Follow-up time was measured from the first fusion surgery to the revision surgery or the last follow-up (≥ 2 years).
2.4 Statistical analysis
SPSS 26.0 software was used for analysis. Continuous variables were expressed as mean ± standard deviation (x±s), and categorical variables were expressed as rate (%). The predictive power of the VBQ score was evaluated by receiver operating characteristic (ROC) curve analysis, and the optimal cutoff value was determined by the Youden index. Binary logistic regression was used to explore the factors influencing the VBQ threshold, and Pearson correlation analysis was used to test correlations. The statistical significance level was set at p < 0.05.
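The Youden index selects the cutoff that maximizes J = sensitivity + specificity − 1 along the ROC curve. The sketch below illustrates the idea in plain Python; it is not the SPSS procedure used in this study, and the function name, the convention that a score at or above the cutoff counts as test-positive, and the data are all assumptions made for illustration.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    A patient is treated as test-positive when score >= cutoff.
    `labels` uses 1 for the event (e.g. IFF) and 0 for no event.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        # Strict comparison keeps the lowest cutoff if several share the best J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Synthetic VBQ scores and outcomes (1 = IFF), invented for illustration only.
scores = [2.4, 2.7, 2.9, 3.0, 3.1, 3.3, 3.4, 3.6, 3.8, 4.1]
labels = [0,   0,   0,   1,   0,   0,   1,   1,   1,   1]
cutoff, sens, spec = youden_cutoff(scores, labels)
print(cutoff, sens, spec)
```

Each observed score is tried as a candidate cutoff, so the result is always a value that occurs in the data.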
3 Results
3.1 Patient characteristics
As shown in Figure 1, 343 patients were initially screened from electronic medical records, of whom 256 met the inclusion criteria. Table 1 summarizes the baseline characteristics of the patients. All patients were followed for at least 2 years after surgery; the mean age was 68.73 ± 6.32 years and 74% were female. There were 67 cases with IFF and 189 cases without IFF.
3.2 ROC curve identifying a predictive threshold
As shown in Table 2 and Figure 2, ROC analysis showed that the VBQ score predicted postoperative IFF with an area under the curve (AUC) of 0.859 (95% confidence interval [CI]: 0.81–0.91). The optimal cutoff value was 3.3, and the ability of the VBQ score to predict IFF following PLIF was significantly better than that of other clinical variables. The VBQ score had good screening ability in identifying high-risk groups (Table 3).
3.3 Multivariate analysis
As shown in Table 4, binary logistic regression was used to investigate the independent factors influencing the optimal VBQ cutoff value. Long-term use of steroid drugs (≥ 3 months) showed a nonnegligible effect on the threshold (OR = 22.85, p < 0.05), but the CI for this factor was extremely wide (95% CI: 1.57–331.96), suggesting that the estimate was unstable, probably owing to low variability and limited statistical reliability. This factor was therefore excluded. The other variables had no significant effect on the VBQ cutoff value (p > 0.05).
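The width of this interval can be put in perspective with a short calculation. Assuming the reported interval is a standard Wald interval, exp(ln OR ± 1.96 × SE), the standard error of ln(OR) can be recovered from its bounds. The sketch below does this in Python; the Wald assumption and the function names are ours, not stated in the source.

```python
import math

def wald_ci(odds_ratio, se_log_or, z=1.96):
    """95% Wald interval for an odds ratio: exp(ln(OR) +/- z * SE)."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of ln(OR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Interval reported for long-term steroid use.
lower, upper = 1.57, 331.96
se = se_from_ci(lower, upper)

# For a Wald interval the geometric midpoint of the bounds equals the OR,
# which gives a consistency check against the reported OR of 22.85.
centre = math.sqrt(lower * upper)
print(round(se, 2), round(centre, 2), round(upper / lower))
```

Under this assumption the geometric midpoint falls close to the reported OR, and the upper bound is roughly two hundred times the lower bound, which is the pattern expected when very few patients fall into one category of the predictor.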
As shown in Table 5, Pearson correlation analysis showed that gender and chronic disease were each weakly positively correlated with VBQ score (R = 0.230 and 0.229, respectively; both p < 0.001), and IFF was positively correlated with VBQ score (R = 0.549, p < 0.001). Higher VBQ scores were thus associated with female sex, chronic comorbidities (including hypertension, diabetes, and coronary heart disease), and IFF.
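When one of the two variables is binary (for example IFF coded 0/1), the Pearson coefficient is the point-biserial correlation. A minimal Python sketch of the computation follows; the function name and the data are invented for illustration and do not reproduce the study's values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Synthetic example: IFF coded 0/1 against hypothetical VBQ scores.
iff = [0, 0, 0, 0, 1, 1, 1, 1]
vbq = [2.6, 2.9, 3.1, 3.4, 3.0, 3.5, 3.7, 3.9]
r = pearson_r(iff, vbq)
print(round(r, 3))
```

A positive coefficient here means that the mean VBQ score is higher in the group coded 1 than in the group coded 0.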
4 Discussion
The results indicated that the VBQ score had good predictive value for IFF following PLIF, with high sensitivity (77.6%) and specificity (79.9%), indicating its effectiveness in discriminating between patients with and without IFF. In particular, the negative predictive value (NPV) reached 91.0%, indicating that a lower VBQ score (< 3.3) could serve as a reliable exclusion criterion when screening for low-risk populations. However, its positive predictive value (PPV) was relatively low (57.8%), suggesting that a high VBQ score should be considered in combination with the DEXA T-score and cannot be used alone as a basis for decision-making. Multivariable analysis showed no significant association between clinical variables and the VBQ cutoff value. Furthermore, female sex, chronic disease, and postoperative IFF were significantly correlated with higher VBQ scores, suggesting that the VBQ score is related to patients' basic health status and postoperative outcomes.
IFF is an important cause of revision surgery after spinal internal fixation in patients with osteoporosis.
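The gap between the two predictive values follows from Bayes' rule: for a fixed sensitivity and specificity, both depend on how common IFF is in the population tested. The Python sketch below illustrates this using the reported sensitivity and specificity. The prevalence values are assumptions chosen for illustration; 0.26 is included because it approximately reproduces the predictive values reported here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.776, 0.799  # reported for the 3.3 cutoff
for prevalence in (0.10, 0.26, 0.50):  # hypothetical IFF rates
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

When the event is uncommon, false positives from the large event-free group dilute the PPV while the NPV stays high, so the same cutoff would be expected to perform differently in cohorts with a different IFF rate.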
[13] Decreased BMD contributes to a reduction in vertebral body failure load, significantly increasing the risk of hardware-related complications such as screw loosening and cage subsidence, which ultimately impairs the overall rehabilitation process.
[14,15] For patients with degenerative lumbar spondylosis, assessing for osteoporosis before surgery aids in treatment planning and complication prevention, including determining the need for anti-osteoporotic drugs preoperatively and the use of bone cement augmentation intraoperatively.
[16] While DEXA is conventionally recommended as the standard preoperative assessment of BMD before instrumented fusion surgery,
[5] its screening rate among patients remains below 44%.
[17] Given the 2-dimensional nature of DEXA imaging, BMD measurements can be affected by factors such as scoliosis, bone hyperplasia, vertebral fractures, and vascular calcification.
[18] Quantitative computed tomography (QCT), despite its better accuracy in assessing BMD, is also not widely adopted for routine clinical use; its application is limited by high cost and radiation exposure.
[19] Recently, increasing attention has been directed toward the use of MRI to evaluate VBQ. As a routine component of preoperative spinal assessment, MRI offers several advantages: it is noninvasive, avoids additional patient harm, involves no ionizing radiation, and does not increase examination costs. These attributes make MRI a promising tool for bone quality screening. The MRI-based VBQ scoring system, first proposed by Ehresman et al.
[4] in 2019, enables quantitative evaluation of trabecular fat infiltration by analyzing vertebral medullary SI on T1-weighted sequences. The score is calculated as the ratio of the median SI of the L1–L4 vertebral marrow to the CSF SI at the level of L3. Because CSF SI is uniform and stable across individuals, it serves as a reliable internal reference, thereby reducing device-related variability and improving the comparability of measurements.
[20] Osteoporosis is characterized by the replacement of hematopoietic marrow with adipocytes,
[20] which appear hyperintense on T1-weighted images.
[4] Consequently, higher VBQ scores are associated with reduced bone quality.
[21–23] In the present study, the VBQ score demonstrated good predictive performance for IFF after PLIF, with sensitivity of 77.6% and specificity of 79.9%. Notably, the negative predictive value (NPV) reached 91.0%, suggesting that a low VBQ score is a reliable indicator for identifying patients at low risk of IFF. However, the positive predictive value (PPV) was more modest (57.8%), implying that high VBQ scores should be interpreted in conjunction with other clinical and imaging parameters rather than used in isolation. Previous studies have shown that the VBQ score correlates significantly with DEXA T-scores and acts as an independent risk factor for cage subsidence after spinal surgery.
[24] Although QCT is generally more accurate for predicting hardware-related complications, VBQ may serve as a useful adjunct to DEXA or QCT, particularly in settings where these modalities are unavailable.
[25] Numerous investigations have confirmed the consistency of VBQ results with both BMD and QCT validation.
[21,26] While VBQ cannot fully replace traditional BMD assessments, its unique strengths make it a valuable supplementary tool for surgical planning and the prevention of osteoporosis-related complications.
In this study, no significant correlation was observed between the VBQ cutoff value and patients' baseline characteristics, general health status, or medication use (p > 0.05). Nevertheless, the optimal threshold for predicting fixation failure remains uncertain. Some prior studies have suggested that a threshold around 3.0 provides good diagnostic accuracy for postoperative complications and osteoporosis,
[27,28] a finding echoed here. However, the generalizability of the cutoff value remains controversial.
Several factors may account for the lack of a universal cutoff value. Surgical technique and approach, MRI device, field strength, operator's strategy, and outcome definitions may all influence VBQ thresholds. Although studies indicate good reproducibility of VBQ between different scanners and field strengths (1.5T vs. 3.0T),
[11] some reports suggest slightly higher scores at 1.5T, highlighting persistent uncertainty regarding magnetic field effects.
[29] Furthermore, there is limited research on the comparability of VBQ scores between different vertebral segments, and insufficient evidence to confirm their consistency across spinal levels. These technical and anatomical variations contribute to the difficulty in defining a universal threshold.
Beyond technical heterogeneity, surgical factors and varying definitions of fixation failure (e.g., limited to screw loosening versus including rod fracture or proximal junctional kyphosis) also affect complication rates and may alter the optimal VBQ threshold. These findings emphasize that VBQ cutoff values may be influenced by nonosseous factors. Thus, the development of predictive models integrating clinical, surgical, and imaging parameters is warranted to improve individualized risk assessment.
Overall, our results suggested that VBQ has good diagnostic utility for predicting IFF after PLIF. A threshold of 3.3 appeared to provide useful screening performance. However, the sensitivity and specificity of VBQ vary markedly depending on the cutoff used, raising concerns about diagnostic bias. In clinical settings lacking DEXA or QCT resources, VBQ offers a simple, MRI-based alternative for preoperative risk stratification. Nevertheless, whether VBQ can independently serve as a definitive diagnostic tool for predicting IFF requires further investigation.
This study has several limitations. First, although the ROC area reached 0.859, our cohort consisted of patients with lumbar degenerative disease, and those with severe osteoporosis were often excluded from surgery due to high perioperative risks. This selection bias may have led to overestimation of VBQ's applicability. Second, some associations between variables and VBQ score were not statistically significant, potentially due to the limited sample size and insufficient statistical power. Larger cohorts may reveal weak but clinically meaningful correlations. Finally, most existing evidence, including our data, derives from retrospective, single-center studies with relatively small samples, limiting the robustness and generalizability of conclusions. Although many reports support a threshold around 3.0 for predicting postoperative complications and osteoporosis, validation through large-scale, prospective, multicenter studies is still required to establish a stable and widely applicable cutoff value.
In conclusion, the VBQ score is a good screening tool for predicting IFF following PLIF and can provide a key basis for preoperative intervention in high-risk patients. No factors significantly influencing its cutoff value were identified in this study.
© 2025 the Author(s). Published by Wolters Kluwer Health, Inc. on behalf of Higher Education Press.