1 Introduction
Clinical trials evaluating the efficacy of acupuncture therapy have used sham acupuncture as a control group. However, because the factors inducing the effects of acupuncture are diverse and complex, controversy remains as to whether a clinically inert placebo, one that lacks specific effects on the symptom or disease being investigated, is possible [1, 2]. The types of sham acupuncture can be largely divided into the use of non-penetrating techniques, such as sham devices or pressing/poking the skin [3, 4], and penetrating techniques, such as superficial needling [5, 6]. In addition, sham acupuncture can be divided into types that do or do not use the acupuncture points indicated for the disorder being studied. When acupuncture points not indicated for the disorder are used, these may be other acupuncture points or non-acupuncture points [5, 6]. The technical procedures for sham acupuncture, such as the technique and sites of stimulation, should be established and interpreted according to the research hypothesis. However, these varied procedures have often been misused and interpreted as a simple placebo control to evaluate the efficacy of acupuncture even though it is questionable if that is appropriate [5–9]. Furthermore, although evidence on the specificity of acupuncture points has been presented [10–12], sham acupuncture is sometimes performed on the same acupuncture points as the verum acupuncture points to evaluate the efficacy of acupuncture [5, 6]. Although this is occasionally performed to examine the relative effectiveness of different stimulation techniques [13], it has mostly been performed under the assumption that the choice of sham points is not important. Issues surrounding the physiologic effects of sham acupuncture have contributed to an underestimation of the treatment effectiveness of acupuncture [14], potential bias against acupuncture, and a misstatement of the role of the placebo [15]. This in turn has led to inconsistent conclusions about the effectiveness of acupuncture in clinical practice guidelines [16–18]. When the sham technique is applied at the same treatment points, the trial is no longer a placebo-controlled trial but is instead a trial comparing different treatment techniques in which randomization and blinding hold placebo and nonspecific effects equal. Such a trial does not test the efficacy of acupuncture [5].
In our previous network meta-analyses (NMAs), we found that acupuncture clinical trials targeting chronic nonspecific low back pain [8], cancer-related pain [9], and knee osteoarthritis [7] showed different results depending on whether the needling points in sham acupuncture were the same as those in verum acupuncture. For example, in NMA for chronic nonspecific low back pain, sham acupuncture needling at the same acupuncture points as those in verum acupuncture significantly improved pain intensity and function compared with needling at the non-indicated sham point. Based on these results, we hypothesized that similar results might be observed in sham acupuncture-controlled clinical trials for other conditions.
Migraine is a neurological disease with a high global prevalence and a large socioeconomic burden [19, 20]. Several medications are commonly used for the acute relief and prevention of migraine; however, there are risks of overuse of acute medications and adverse effects such as gastrointestinal symptoms or cardiovascular events [21, 22]. Acupuncture, a nonpharmacologic intervention, has been recommended due to its effectiveness and safety with a low risk of side effects [21, 23]. Various clinical trials have been conducted to date to evaluate the effects of acupuncture on migraine; however, the conclusions are still considered controversial especially when acupuncture is compared with sham acupuncture [23–25]. Therefore, the purpose of this review was to investigate whether the outcome of acupuncture for migraine is affected by sham intervention at the same acupuncture points as those in verum acupuncture or at non-indicated sham points.
2 Eligibility criteria
The protocol of this systematic review was registered in PROSPERO (CRD42024496777).
(1) Population: Trials involving adults diagnosed with migraine without restrictions on age, sex, race, or nationality were included.
(2) Intervention and comparator: Verum acupuncture, sham acupuncture, and waiting list (no treatment) were included. Verum acupuncture included only manual acupuncture that penetrates the skin without additional stimulation such as electrical stimulation. Sham acupuncture was classified into two groups according to the needling points: sham acupuncture therapy at verum acupuncture points (SATV) or sham acupuncture therapy at sham points (SATS). Studies that could not be classified as SATV or SATS due to ambiguous information on the points needled in sham acupuncture were excluded. The waiting list (no treatment) group was included to form a connected loop in the network map, and only the use of rescue medication was allowed.
(3) Outcome measure: The primary outcome was headache pain intensity measured by visual analog scale (VAS), numeric rating scale (NRS), or other validated outcome measures. If multiple pain scales were used, priorities were determined through discussion between the authors. Secondary outcomes included response rate (responder: at least 50% reduction in migraine frequency) and frequency of migraine attacks.
As a unit of analysis, the earliest results after completion of all treatment sessions were used. If there was no presented value after treatment, the value assessed at the time closest to the end of treatment was used. Additionally, if data from eligible studies were not presented in a format suitable for meta-analysis and if the data could not be obtained even after contacting the corresponding authors, the studies were excluded.
(4) Study design: Randomized controlled clinical trials (RCTs) were eligible.
(5) Others: There were no restrictions imposed on publication year or language. In addition, not only articles published in journals but also gray literature such as conference proceedings were included. Studies where it was impossible to extract relevant information because full texts were not available were excluded.
3 Information sources and search strategy
On December 25, 2023, we searched MEDLINE via PubMed, Embase via Elsevier, Cochrane Central Register of Controlled Trials (CENTRAL) via Cochrane Library, and Allied and Complementary Medicine Database (AMED) via Ovid to identify eligible studies. The reference lists of eligible studies and relevant review articles were hand-searched to identify additional studies. The search terms were determined through consensus by systematic review experts, and the detailed search terms and results for all databases are presented in Supplement section 1.
4 Study selection and data extraction
The bibliographic information of studies obtained through database searches and other sources was imported into Endnote 20 (Clarivate Analytics, Philadelphia, PA, USA). After removing duplicates, full texts were reviewed for studies identified as potentially eligible at the title and abstract review stage to confirm the final included studies.
The following data were extracted from the included studies using a standardized and pilot-tested data extraction form (Excel 2019, Microsoft, Redmond, WA, USA): basic study characteristics (publication year, country, and sample size), details of populations, interventions, and comparators, outcomes of interest, and results. If the relevant information was ambiguous or the results were not presented in a format suitable for meta-analysis, the corresponding author was contacted by email to request additional information. Two researchers (BL and CYK) conducted study selection and data extraction independently, and any disagreement was resolved by discussion between them and, if necessary, with the corresponding author.
5 Risk of bias assessment
The risk of bias for the included studies was assessed using the Cochrane risk of bias tool [26]. The following items were evaluated as low, unclear, or high risk of bias in individual studies: random sequence generation, allocation concealment, blinding of participants, acupuncturists, and outcome assessors, completeness of outcome data, selective reporting, and other bias. Specifically, the “other bias” item was evaluated based on the statistical and clinical similarity of the baseline data between the groups. One researcher (BL) evaluated the risk of bias, and another researcher (CYK) reviewed the results independently. Any disagreement was resolved through discussion.
6 Data analysis and synthesis
NMA based on the frequentist framework was carried out for the outcomes of interest using network packages in Stata/MP 16.1 (StataCorp LLC, College Station, TX, USA). NMA was performed only if the statistical consistency assumption was satisfied, which was tested through the node-splitting method (local approach) and the design-by-treatment interaction model (global approach). Pairwise meta-analysis of direct evidence was performed using Review Manager 5.4.1 (Cochrane, London, UK) to confirm consistency in statistical significance with NMA results. As headache pain intensity and frequency of migraine attacks, which are continuous variables, were evaluated using different questionnaires and units in individual studies, they were pooled using the standardized mean differences (SMD) and 95% confidence intervals (CIs). The response rate, a dichotomous variable, was pooled using the risk ratio (RR) and 95% CI. If the 95% CI included the null value (1 for dichotomous variables and 0 for continuous variables), the effect size between the two groups was judged not statistically significant. A random-effects model was selected in both NMA and pairwise meta-analysis considering unavoidable clinical heterogeneity between the included studies. The number of participants and direct trials included in the NMA are shown in a network map, and the results of NMA and pairwise meta-analysis are shown in the interval plot and league table.
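The effect measures and the significance rule described above can be illustrated with a minimal sketch. This is not the code used in the review, whose estimates came from Stata and Review Manager; the function names are hypothetical, and the Hedges small-sample correction and the large-sample standard error formulas are assumptions chosen for illustration.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (SMD, ci_low, ci_high) for one two-arm comparison.

    Assumption: Hedges' small-sample correction and a common
    large-sample approximation of the standard error.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    g = (1 - 3 / (4 * df - 1)) * d
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c)))
    return g, g - Z_95 * se, g + Z_95 * se


def risk_ratio(events_t, n_t, events_c, n_c):
    """Return (RR, ci_low, ci_high); the CI is built on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - Z_95 * se_log), math.exp(log_rr + Z_95 * se_log)


def is_significant(ci_low, ci_high, null_value):
    """Significant only if the 95% CI excludes the null value
    (0 for SMD, 1 for RR)."""
    return not (ci_low <= null_value <= ci_high)
```

For example, `is_significant(-1.65, 0.56, 0)` returns `False` for an SMD interval that spans 0, while `is_significant(0.59, 0.97, 1)` returns `True` for an RR interval that excludes 1.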
If sufficient studies (n ≥ 10) were included in the analysis, potential publication bias was assessed using a funnel plot and Egger’s test for asymmetry. In addition, to identify the best treatment, the surface under the cumulative ranking curve (SUCRA) statistic was examined. The certainty of evidence for effect estimates was assessed using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [27].
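To make the SUCRA statistic concrete, the following sketch computes it from a treatment's rank probabilities using the standard definition (the mean of the cumulative ranking probabilities over the first a − 1 ranks, where a is the number of treatments). The function name and the example probabilities are hypothetical; the review's SUCRA values were produced by Stata, not by this code.

```python
def sucra(rank_probabilities):
    """SUCRA for one treatment.

    rank_probabilities[j] is the probability that the treatment takes
    rank j + 1 (rank 1 = best); the list has one entry per treatment
    in the network and should sum to 1.
    """
    n_treatments = len(rank_probabilities)
    cumulative = 0.0
    area = 0.0
    # Sum the cumulative probabilities for ranks 1 .. a - 1.
    for probability in rank_probabilities[:-1]:
        cumulative += probability
        area += cumulative
    return area / (n_treatments - 1)
```

A treatment that is certain to rank first gets a SUCRA of 1 (100%), one that is certain to rank last gets 0, and one that is equally likely to take any rank in a four-node network gets 0.5.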
7 Study selection and characteristics
A total of 3542 studies were identified through database searching, and no studies were added from other sources. After using Endnote’s duplicate removal function, the titles and abstracts of the remaining 2406 studies were reviewed. The full texts of 71 potentially eligible studies were reviewed, and 53 studies were excluded for the following reasons: not RCTs (n = 16), not about only manual acupuncture (n = 6), not sham acupuncture or waiting list-controlled trials (n = 11), not reporting outcomes of interest appropriately (n = 17), only abstract available (n = 1), and using duplicated data (n = 2) (Supplement section 2). Finally, 18 studies involving 1936 participants were included in the analysis [28–45] (Fig.1).
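The selection counts above can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported in this paragraph; the variable names are hypothetical, and the number of duplicates removed is derived here rather than stated in the text.

```python
# Counts reported in the study selection paragraph.
records_identified = 3542
records_after_deduplication = 2406
full_texts_reviewed = 71

exclusion_reasons = {
    "not RCTs": 16,
    "not about only manual acupuncture": 6,
    "not sham acupuncture or waiting list-controlled": 11,
    "outcomes of interest not reported appropriately": 17,
    "only abstract available": 1,
    "duplicated data": 2,
}

# Derived quantities (not stated explicitly in the text).
duplicates_removed = records_identified - records_after_deduplication
full_texts_excluded = sum(exclusion_reasons.values())
studies_included = full_texts_reviewed - full_texts_excluded

print(duplicates_removed, full_texts_excluded, studies_included)
```

The derived totals agree with the 53 excluded and 18 included studies reported above.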
The countries where the studies were conducted were China [35, 36, 40, 41, 43, 45], Germany [30, 31, 37, 39], Iran [32–34], Australia [42], Canada [44], Sweden [38], Brazil [28], and Italy [29]. In addition, two studies targeted menstrual migraine patients [38, 44], and only female patients were included in three studies [29, 38, 44]. A total of four studies [35–37, 43] compared verum acupuncture, sham acupuncture, and waiting list, and the remaining studies compared verum acupuncture and sham acupuncture. The needling points of the sham acupuncture group were SATV in one study [38] and SATS in the remaining studies. The primary outcome (headache pain intensity) was evaluated in 13 studies [29, 31, 32, 34–38, 40–42, 44, 45], response rate was evaluated in eight studies [28, 30, 31, 37, 38, 42–44], and frequency of migraine attacks was evaluated in 12 studies [30, 33–40, 42, 44, 45] (Tab.1 and Supplement Section 3). In one study [42], the 6-point Likert scale and VAS were used to evaluate headache pain intensity, and after discussion among the researchers, the 6-point Likert scale, which showed a smaller difference in baseline scores between groups, was selected as the target of analysis.
Four-node network maps were constructed (Fig.2). In the global approach for statistical consistency testing, the P values for headache pain intensity, response rate, and frequency of migraine attacks were 0.4348, 0.9749, and 0.0154, respectively. In addition, only headache pain intensity and response rate satisfied the statistical consistency test according to the local approach (Supplement Section 4). Therefore, NMA was performed only for headache pain intensity and response rate. The contribution matrix of direct comparisons with NMA estimates is presented in Supplement Section 5.
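The global consistency screen reported above can be sketched as a simple filter. The P values are those given in the text; the 0.05 threshold is an assumption for illustration, since the text does not state the cutoff, and the local (node-splitting) results in Supplement Section 4 are not modeled here.

```python
# Global (design-by-treatment interaction) P values reported in the text.
GLOBAL_CONSISTENCY_P = {
    "headache pain intensity": 0.4348,
    "response rate": 0.9749,
    "frequency of migraine attacks": 0.0154,
}

ALPHA = 0.05  # assumption: conventional threshold, not stated in the text


def passes_global_consistency(p_value, alpha=ALPHA):
    """A P value at or above alpha gives no evidence of inconsistency."""
    return p_value >= alpha


outcomes_for_nma = [
    outcome
    for outcome, p_value in GLOBAL_CONSISTENCY_P.items()
    if passes_global_consistency(p_value)
]
print(outcomes_for_nma)
```

Under this assumed threshold, the filter retains headache pain intensity and response rate, matching the two outcomes for which NMA was performed.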
8 Results of risk of bias assessment
A total of 13 studies [28–32, 35, 37, 38, 40–43, 45] appropriately generated random sequences using methods such as statistical software, and eight studies [28, 31, 37, 40–43, 45] concealed allocation appropriately using opaque sealed envelopes or central telephone or fax procedures. However, three studies [36, 39, 44] and eight studies [29, 30, 32, 35, 36, 38, 39, 44] were evaluated as having an unclear risk of bias on random sequence generation and allocation concealment, respectively, because they did not report information. In addition, two studies [33, 34] in which patients who dropped out were withdrawn and replaced with new recruits were judged to have a high risk of bias in random sequence generation and allocation concealment. In the four studies with waiting list controls [35–37, 43], blinding of participants was not possible, and blinding of outcome assessors was also not possible because the outcome evaluation was based on patient-reported scales. However, the remaining studies comparing only acupuncture and sham acupuncture were judged to have appropriately blinded participants and outcome assessors. In addition, blinding of acupuncturists was not possible in any studies; however, it was judged that this would not affect the study results as acupuncturists were not involved in outcome assessment. A total of seven studies [29, 30, 35, 36, 39, 40, 44] that only performed per-protocol analysis and one study [39] that did not report pain intensity results were assessed as having a high risk of attrition and reporting bias, respectively. Furthermore, two studies [28, 32] with significant differences in baseline clinical data between groups and three studies [29, 30, 44] with insufficient relevant information were evaluated as having a high and unclear risk of other bias, respectively (Supplement Section 6).
9 Headache pain intensity
In NMA, compared with waiting list, verum acupuncture (SMD −1.43, 95% CI −1.98 to −0.88), SATS (SMD −1.00, 95% CI −1.55 to −0.45), and SATV (SMD −1.97, 95% CI −3.21 to −0.74) all significantly reduced headache pain intensity. Verum acupuncture reduced pain intensity significantly more than SATS (SATS vs. verum acupuncture: SMD 0.43, 95% CI 0.15 to 0.71); however, there was no significant difference between SATV and verum acupuncture (SMD −0.54, 95% CI −1.65 to 0.56). The effect estimate for the comparison between SATV and SATS was in favor of SATV; however, the difference was not statistically significant (SMD −0.97, 95% CI −2.12 to 0.17) (Fig.3). Pairwise meta-analysis and NMA were consistent in terms of statistical significance and direction of effect (Tab.2). The funnel plot was asymmetric, and the P value in Egger’s test was 0.032, suggesting publication bias (Supplement Section 7). In the SUCRA analysis, headache pain intensity was improved the most by SATV (92.8%), followed by verum acupuncture (72.2%), SATS (35%), and waiting list (0%) (Supplement Section 8). The certainty of evidence of the estimates was generally moderate to very low, and the reasons for downgrading were risk of bias, inconsistency, publication bias, and imprecision (Supplement Section 9).
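Because the network estimates come from a consistency model, the contrasts between active arms should equal the differences of their SMDs versus waiting list. The sketch below checks this with the point estimates reported above; the names are hypothetical, and rounding to two decimals mirrors the precision of the reported values.

```python
# Network SMDs versus waiting list, as reported in the text.
SMD_VS_WAITING_LIST = {"verum": -1.43, "SATS": -1.00, "SATV": -1.97}


def smd_contrast(treatment, reference):
    """SMD of `treatment` versus `reference`, derived from the
    estimates versus waiting list (negative favors `treatment`)."""
    difference = SMD_VS_WAITING_LIST[treatment] - SMD_VS_WAITING_LIST[reference]
    return round(difference, 2)


for treatment, reference in [("SATS", "verum"), ("SATV", "verum"), ("SATV", "SATS")]:
    print(treatment, "vs", reference, smd_contrast(treatment, reference))
```

The derived contrasts reproduce the reported 0.43, −0.54, and −0.97, which also shows that the positive SMD of 0.43 reflects less pain reduction with SATS than with verum acupuncture.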
9.1 Response rate
In NMA, high response rates were observed for verum acupuncture (RR 3.95, 95% CI 2.26 to 6.92), SATS (RR 2.98, 95% CI 1.70 to 5.24), and SATV (RR 6.84, 95% CI 1.15 to 40.71) compared with waiting list. Verum acupuncture achieved a significantly higher response rate than SATS (SATS vs. verum acupuncture: RR 0.75, 95% CI 0.59 to 0.97); however, there was no significant difference between SATV and verum acupuncture (RR 1.73, 95% CI 0.32 to 9.41). The effect estimate for the comparison between SATV and SATS was in favor of SATV; however, there was no statistically significant difference between the two groups (RR 2.30, 95% CI 0.41 to 12.72) (Fig.3). Pairwise meta-analysis was consistent with NMA in terms of statistical significance and direction of effect (Tab.3). As only eight studies (fewer than 10) were included in the analysis, publication bias testing was not possible. According to the SUCRA analysis, SATV ranked first (84.8%), followed by verum acupuncture (74.9%), SATS (39.6%), and waiting list (0.6%) (Supplement Section 8). The certainty of evidence was high or moderate, and the reason for the downgrade was risk of bias or imprecision (Supplement Section 9).
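The width of the SATV interval can be quantified by recovering the standard error of log RR from each reported 95% CI. This is an illustrative sketch with hypothetical names; it assumes the intervals are symmetric on the log scale with a normal quantile of 1.96, which the text does not state explicitly.

```python
import math

# (RR, ci_low, ci_high) versus waiting list, as reported in the text.
RR_VS_WAITING_LIST = {
    "verum": (3.95, 2.26, 6.92),
    "SATS": (2.98, 1.70, 5.24),
    "SATV": (6.84, 1.15, 40.71),
}


def log_se_from_ci(ci_low, ci_high, z=1.96):
    """Approximate standard error of log(RR) implied by a 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def rr_contrast(treatment, reference):
    """RR of `treatment` versus `reference`, as a ratio of the
    estimates versus waiting list."""
    ratio = RR_VS_WAITING_LIST[treatment][0] / RR_VS_WAITING_LIST[reference][0]
    return round(ratio, 2)


for name, (_, low, high) in RR_VS_WAITING_LIST.items():
    print(name, round(log_se_from_ci(low, high), 2))
```

The implied standard error for SATV is roughly three times that for verum acupuncture or SATS, consistent with SATV being informed by a single trial, and the ratios of the estimates versus waiting list reproduce the reported contrasts of 0.75, 1.73, and 2.30.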
10 Discussion
Due to the potential adverse effects of conventional medications [21–23], acupuncture therapy, a nonpharmacologic intervention, has been actively used to treat migraine, and its effects have been evaluated in clinical trials. Systematic reviews have been conducted to summarize the effects of acupuncture on migraine [23–25]. However, issues related to sham acupuncture have led to inconsistent conclusions about the effectiveness of acupuncture in clinical practice guidelines [16–18]. To the best of our knowledge, no study has been conducted specifically on the needling points of sham acupuncture for migraine.
An analysis of 18 acupuncture clinical trials involving 1936 migraine participants revealed that when SATS was used as the control, headache pain intensity and response rate were significantly improved in the acupuncture group. However, when SATV was used as the control, there was no significant difference between the verum and sham groups. These results are consistent with those of our previous studies in which NMA was performed on acupuncture clinical trials for patients with chronic nonspecific low back pain [8], cancer pain [9], and knee osteoarthritis [7]. When comparing SATS and SATV, there was no significant difference in headache pain intensity and response rate between the two groups; however, the results were in favor of SATV. In our previous study involving 4379 patients with chronic nonspecific low back pain [8], pain and function were significantly improved in SATV compared with SATS. In the current study, which included 1936 migraine patients, there was no statistically significant difference between SATV and SATS due to imprecision associated with a relatively small sample size for the comparison and wide confidence intervals, prompting us to downgrade the certainty of evidence for NMA estimates. However, the results showed trends similar to those observed in previous studies [7–9] and may show statistical significance when sham acupuncture-controlled trials are additionally conducted with increased statistical power. There were four studies [35–37, 43] in which blinding of participants and outcome assessors was not possible because they included waiting list controls and evaluated outcomes by patient self-assessment. However, because blinding between the acupuncture and sham acupuncture groups was maintained in these studies [35–37, 43], the effect of the risk of bias on the comparison between these groups was judged to be low. The network estimate for headache pain intensity showed potential publication bias, which also affected the certainty of evidence of the estimate.
Sham acupuncture has been conducted differently in various studies depending on the technique, such as the type of skin stimulation and the selection of points needled, and can be divided into sham acupuncture using devices with no skin penetration and superficial needling at acupuncture points or non-indicated/non-acupuncture points [5, 6]. Transparent reporting on sham acupuncture is important to understand the purpose and implications of the research. In particular, this may play a role in understanding the effects of acupuncture. The reporting quality of sham acupuncture in acupuncture clinical trials has generally been poor [46]. Therefore, sham acupuncture reporting guidelines were recently developed as a Consolidated Standards for Reporting of Trials (CONSORT) extension to be used along with STandards for Reporting Interventions in Clinical Trials of Acupuncture (STRICTA) [47, 48]. Based on the ACURATE guidelines [49], the rationale for selecting the chosen sham acupuncture method must be clearly reported according to the research hypothesis and design. The use of verum acupuncture points for the needling points of sham acupuncture does not represent an appropriate placebo in acupuncture research. Regardless of whether sham acupuncture involves shallow needling only or a non-penetrating device that presses and sometimes pokes the skin, when it is performed at the same acupoints as those in the investigated acupuncture needling method, it is a comparison of techniques and not an assessment of the efficacy of acupuncture. In addition, as questions continue to be raised about whether sham acupuncture is a physiologically inert placebo that lacks specific effects on the symptom or disease under investigation, it should not be misused as a placebo control until mechanistic studies on its activity are conducted. Although studies on the mechanism of sham needles have demonstrated the potential effects of tactile, interpersonal, psychological, and visual factors on the overall placebo effect, pre-clinical physiologic studies that conduct quantitative measurements of the effects of sham acupuncture have not been performed [50]. Such studies can help understand not only the physiologic activity and thus appropriateness of sham acupuncture but also the effectiveness of acupuncture in previous acupuncture clinical trials [51, 52]. Sham acupuncture techniques may have predictable physiologic effects [51, 53], which have been found to be clinically separate from placebo effects [54, 55]. Our findings, alongside those for knee osteoarthritis [7], low back pain [8], and cancer pain [9], suggest significant problems with the use of sham acupuncture as a placebo-controlled intervention.
A limitation of this study is that only English language databases were searched. Although we tried our best to use a comprehensive search strategy, and other sources such as reference lists and clinical trial registries were also searched, some studies might not be listed. Additionally, our study examined the outcomes according to needling points for various sham acupuncture procedures and could not examine the effects of other sham acupuncture techniques, such as the use of sham acupuncture devices. Sham acupuncture sometimes incorporates a sham acupuncture device (base unit), and these studies require the use of the base unit in the verum acupuncture group to blind the participants. However, in clinical practice, base units are not used during acupuncture treatment, and their use in trials hinders the practitioner’s ability to manipulate the needle [56]. Previous NMA studies demonstrated that the outcome of verum acupuncture in acupuncture trials with a sham acupuncture device was inferior to that of verum acupuncture in acupuncture trials without a sham device for hot flashes and knee osteoarthritis [57, 58]. Furthermore, in addition to acupuncture point specificity, acupuncture manipulation and the frequency and course of acupuncture also affect the therapeutic effects of acupuncture [1]. However, our review included only one clinical trial performing SATV [38]; thus, it was not possible to investigate the influence of variables other than the needling point of sham acupuncture, and the power and precision of the results might also have been affected. Finally, the risk of bias in the included studies might have influenced the NMA results. We attempted to perform a sensitivity analysis to determine the robustness of NMA, targeting only studies evaluated as having a low risk of bias in all domains. However, the one study that performed SATV was also evaluated as having a high risk of bias in some domains [38]; thus, sensitivity analysis was not possible.
Nevertheless, while questions about the physiologic inertness of sham acupuncture continue to be raised, this study provides a meaningful analysis, based on rigorous methodology, of the impact of the needling point of sham acupuncture on clinical results in acupuncture trials targeting migraine, and the results were consistent with our previous studies on chronic nonspecific low back pain [8], cancer pain [9], and knee osteoarthritis [7].
11 Conclusions
In acupuncture clinical trials for migraine, the outcome of acupuncture compared with sham acupuncture was different depending on the needling points of sham acupuncture. There was no significant difference between SATV and verum acupuncture. Sham acupuncture methods should be established according to the research hypothesis, and SATV should not be misused as a placebo control to evaluate the efficacy of acupuncture.