Research Review by Dr. Shawn Thistle©


Dec. 2006

Study Title:

A systematic review of the diagnostic accuracy of provocative tests of the neck for diagnosing cervical radiculopathy


Rubinstein SM et al.

Publication Information:

European Spine Journal 2007; 16(3): 307-319.


Cervical radiculopathy can be a substantial cause of pain, morbidity, and disability. It is a common condition that affects both men and women, mainly around middle age. Despite its prevalence, the gold standard for diagnosis of this condition is unclear.

Traditionally, clinical history and examination findings are confirmed with advanced imaging or electrodiagnostic testing. As with many other clinical conditions, all of these diagnostic methods have inherent limitations. In addition, the expense and limited availability of these tests restrict their application, emphasizing the need for simple clinical tests to identify this condition.

The purpose of this systematic review was to evaluate diagnostic accuracy for clinical tests commonly used in the evaluation of cervical radiculopathy. For this review, cervical radiculopathy refers to signs and symptoms related to dysfunction of a spinal nerve of the neck. These can include pain, myotomal weakness, and sensory or reflex neurological deficit.

The optimal reference standard defined for and utilized in this review when selecting studies included both: a) electrodiagnostic evidence of acute denervation in cervical paraspinal muscles and/or a specific myotome, and b) demonstrated abnormalities on advanced imaging (myelography, CT, MRI) that correlated with the site and corresponding signs and symptoms of the patient.

This study began with a hand search of relevant orthopedic texts to identify tests commonly used to evaluate cervical radiculopathy. A comprehensive literature search was then conducted, including all relevant databases, to identify studies which met the following criteria:
  • any provocative test of the neck for diagnosing cervical radiculopathy was identified
  • the diagnostic test was compared to any reference standard (such as EMG, plain film x-ray or advanced imaging)
  • sensitivity and specificity were reported and a 2x2 contingency table could be (re)constructed
  • the publication was a complete report
Case series, case reports, animal studies, and surgical and cadaveric studies were all excluded because diagnostic accuracy cannot be determined from these types of studies. Each potential study was assessed by two separate reviewers and rated for methodological quality using QUADAS, a previously tested set of 12 criteria. Any disagreement regarding study inclusion was resolved by a third reviewer.
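
To make the 2x2 contingency table criterion concrete, here is a minimal sketch of how sensitivity and specificity fall out of such a table. The class name, field names, and counts are my own illustrative assumptions; the counts are hypothetical and are not taken from any study in this review.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a clinical test against a reference standard."""
    true_pos: int   # test positive, reference standard positive
    false_pos: int  # test positive, reference standard negative
    false_neg: int  # test negative, reference standard positive
    true_neg: int   # test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of patients WITH the condition who test positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of patients WITHOUT the condition who test negative.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts for illustration only (not data from the review).
example = TwoByTwo(true_pos=15, false_pos=5, false_neg=35, true_neg=45)
print(f"sensitivity = {example.sensitivity():.2f}")  # 15 / (15 + 35)
print(f"specificity = {example.specificity():.2f}")  # 45 / (45 + 5)
```

With these made-up counts the test would have low sensitivity (0.30) and high specificity (0.90). This also shows why a study that does not report enough data to rebuild all four cells cannot be checked or pooled.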

Pertinent Results:

  • 6 studies met the inclusion criteria (all of which were found on MEDLINE) – 3 were published in the 1980s, and the other 3 after 2000
  • no single study used the optimal reference standard described above – 2 used EMG, 3 used advanced imaging, and 1 used operative findings
  • multiple studies evaluated the following tests: upper limb tension test (ULTT), shoulder abduction test, traction/neck distraction, and Spurling’s test; only one study evaluated Valsalva’s maneuver
  • no studies were found which examined the axial compression test or the shoulder depression test
  • the most striking finding was the variability in results across studies – this was most pronounced for the shoulder abduction test, which had reported sensitivities ranging from 0.17 to 0.78
  • Spurling’s test was shown to have low to moderate sensitivity and high specificity, as did individual studies for traction/neck distraction and Valsalva’s maneuver
  • the ULTT demonstrated high sensitivity and low specificity while the shoulder abduction test demonstrated low to moderate sensitivity and moderate to high specificity
  • in general, no test demonstrated high sensitivity and specificity, and the methodological quality of the studies (except for one) was “meager”

Conclusions & Practical Application:

This review was limited by three major shortcomings:
  1. only six studies were identified that met inclusion criteria (and only one of those included patients in a primary care setting)
  2. no study used the optimal reference standard (this standard was defined by the authors specifically for this review, but it seems reasonable)
  3. the studies included were not standardized in terms of test performance (this seemed most prevalent for Spurling’s test, which was performed in slightly different ways in each study)
Despite these drawbacks, I feel this study underlines a critical point. During our education, we were exposed to many clinical tests from various orthopedic textbooks without (in many cases) ever critically examining the literature to support their accuracy. This is one example where these familiar tests don’t seem to hold up to critical review.

That being said, determining accuracy for tests like these is difficult because there is no universally accepted gold standard for diagnosis, and what these tests are actually testing (in terms of tissue stress etc.) has not been elucidated. The problem is further clouded by the difficulty in distinguishing cervical radiculopathy (spinal nerve involvement) from brachial plexopathy or peripheral nerve entrapment.

So what is the take-home message from this systematic review? First, I feel that it emphasizes our immediate need to clarify our clinical testing abilities for this and other clinical conditions. We have inherently trusted orthopedic textbooks for too long.

Second (and on a more positive note), the authors propose the following as a practical application of the existing data: “When consistent with the history and other physical findings, a positive Spurling’s test, as well as positive findings for traction/neck distraction [i.e. symptom reduction], and the Valsalva’s maneuver might be suggestive of a cervical radiculopathy (i.e. given their high specificity), while a negative ULTT might be used to rule it out (i.e. given its high sensitivity).”
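
The logic behind this proposal (a positive result on a highly specific test helps rule a condition in, and a negative result on a highly sensitive test helps rule it out) can be illustrated with likelihood ratios. The sketch below is my own illustration, not part of the review: the function names, the sensitivity and specificity values, and the 30% pre-test probability are all hypothetical assumptions chosen only to show the arithmetic.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# All numbers below are hypothetical, NOT values reported in the review.
PRE_TEST = 0.30  # assumed probability of radiculopathy before testing

# A "specific" test (low sensitivity, high specificity) that comes back positive.
lr_pos_specific, _ = likelihood_ratios(sensitivity=0.40, specificity=0.90)
after_positive = post_test_probability(PRE_TEST, lr_pos_specific)

# A "sensitive" test (high sensitivity, low specificity) that comes back negative.
_, lr_neg_sensitive = likelihood_ratios(sensitivity=0.90, specificity=0.30)
after_negative = post_test_probability(PRE_TEST, lr_neg_sensitive)

print(f"after positive specific test:  {after_positive:.2f}")
print(f"after negative sensitive test: {after_negative:.2f}")
```

Under these assumed values, a positive result on the specific test raises the probability from 0.30 to roughly 0.63, while a negative result on the sensitive test lowers it to roughly 0.13. The actual shift for any of the tests reviewed would depend on their true accuracy, which the existing studies do not pin down well.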

I think this recommendation is reasonable, but it must be understood that the combination of these tests mentioned by the authors is an extrapolation based on the limited existing data.