How difficult it is to get the truth about questionable drugs
The reanalysis of Study 329 illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base.
Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative.
Jon Jureidini and colleagues have reanalysed SmithKline Beecham’s infamous Study 329 (published by Keller and colleagues in 2001), the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine with placebo in the treatment of adolescents with unipolar major depression.
The reanalysis under the restoring invisible and abandoned trials (RIAT) initiative was done to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.
Their analysis finds that neither paroxetine nor high dose imipramine showed efficacy for major depression in adolescents, and there was an increase in harms with both drugs.
Healthcare providers are encouraged to be frank about the limitations of screening—the harms of screening are certain, but the benefits in overall mortality are not
Why cancer screening has never been shown to “save lives”—and what we can do about it, BMJ 2016;352:h6080, 06 January 2016.
The claim that cancer screening saves lives is based on fewer deaths due to the target cancer. Vinay Prasad and colleagues argue that reductions in overall mortality should be the benchmark and call for higher standards of evidence for cancer screening.
Study 329 reanalysis illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base
2015 (2nd) Study Abstract
The RIAT re-analysis marks a new chapter in the story of Study 329, showing the remarkable power of open data. But it also shows how much our current systems are failing patients and the public. It should not have taken 14 years to get to this point. It shows that we need regulation, and perhaps legislation, to ensure that the results of all clinical trials are made publicly available and that individual patient data are available for legitimate independent third party scrutiny.
Objectives To reanalyse SmithKline Beecham’s Study 329 (published by Keller and colleagues in 2001), the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine with placebo in the treatment of adolescents with unipolar major depression. The reanalysis under the restoring invisible and abandoned trials (RIAT) initiative was done to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.
Setting 12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998.
Participants 275 adolescents with major depression of at least eight weeks in duration. Exclusion criteria included a range of comorbid psychiatric and medical disorders and suicidality.
Interventions Participants were randomised to eight weeks double blind treatment with paroxetine (20-40 mg), imipramine (200-300 mg), or placebo.
Main outcome measures The prespecified primary efficacy variables were change from baseline to the end of the eight week acute treatment phase in total Hamilton depression scale (HAM-D) score and the proportion of responders (HAM-D score ≤8 or ≥50% reduction in baseline HAM-D) at acute endpoint. Prespecified secondary outcomes were changes from baseline to endpoint in depression items in K-SADS-L, clinical global impression, autonomous functioning checklist, self-perception profile, and sickness impact scale; predictors of response; and number of patients who relapse during the maintenance phase. Adverse experiences were to be compared primarily by using descriptive statistics. No coding dictionary was prespecified.
Results The efficacy of paroxetine and imipramine was not statistically or clinically significantly different from placebo for any prespecified primary or secondary efficacy outcome. HAM-D scores decreased by 10.7 (least squares mean) (95% confidence interval 9.1 to 12.3), 9.0 (7.4 to 10.5), and 9.1 (7.5 to 10.7) points, respectively, for the paroxetine, imipramine and placebo groups (P=0.20). There were clinically significant increases in harms, including suicidal ideation and behaviour and other serious adverse events in the paroxetine group and cardiovascular problems in the imipramine group.
Conclusions Neither paroxetine nor high dose imipramine showed efficacy for major depression in adolescents, and there was an increase in harms with both drugs. Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base.
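The responder definition given under the main outcome measures (HAM-D score ≤8 or a reduction of at least 50% from baseline) can be written down as a small rule. The sketch below is only an illustration of that definition: the function names and the example scores are invented here and are not taken from the trial data or from the RIAT reanalysis.

```python
from typing import Iterable, Tuple


def is_responder(baseline: float, endpoint: float) -> bool:
    """Apply the responder rule quoted in the abstract.

    A participant counts as a responder if the endpoint HAM-D score is 8 or
    lower, or if the score fell by at least 50% from baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline HAM-D score must be positive")
    relative_reduction = (baseline - endpoint) / baseline
    return endpoint <= 8 or relative_reduction >= 0.5


def responder_proportion(scores: Iterable[Tuple[float, float]]) -> float:
    """Share of (baseline, endpoint) pairs that meet the responder rule."""
    pairs = list(scores)
    if not pairs:
        raise ValueError("at least one participant is required")
    return sum(is_responder(b, e) for b, e in pairs) / len(pairs)


# Hypothetical scores, invented for illustration only (not trial data).
example_scores = [(20, 10), (20, 11), (12, 8), (30, 16)]
print(responder_proportion(example_scores))
```

Writing the rule out makes it clear that either condition alone is enough, so a participant with a low baseline score can count as a responder after only a small change.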
How we expect researchers to make all their data available
The movement to make data from clinical trials widely accessible has achieved enormous success, and it is now time for medical journals to play their part. From 1 July The BMJ will extend its requirements for data sharing to apply to all submitted clinical trials, not just those that test drugs or devices.
The BMJ’s Elizabeth Loder explains what this means for authors, and how we expect researchers to make their data available.
Do I really need this test or procedure? What are the risks? Are there simpler safer options? What happens if I do nothing?
This post content is published by The BMJ, aiming to lead the debate on health, and to engage doctors, researchers and health professionals to improve outcomes for patients.
The idea that some medical procedures are unnecessary and can do more harm than good is as old as medicine itself. In Mesopotamia 38 centuries ago, Hammurabi proclaimed a law threatening overzealous surgeons with the loss of a hand or an eye. In 1915, at the height of a surgical vogue for prophylactic appendicectomy, Ernest Codman offended his Boston colleagues with a cartoon mocking their indifference to outcomes and asking, “I wonder if clinical truth is incompatible with medical science? Could my clinical professors make a living without humbug?” Looking at the rates of tonsillectomy in London boroughs in the 1930s, John Alison Glover discovered that they were entirely governed by the policy of school doctors and bore no relation to need or outcomes. John (Jack) Wennberg established the science of outcomes research when in 1973 he described patterns of gross variation in the use of medical and surgical procedures in the United States, which lacked any clinical rationale but was closely related to supply.
Diagnosis drives treatment, and in recent years the term overdiagnosis has been used to describe various situations where diagnoses lead to unnecessary treatment, wasting resources while increasing patient anxiety. Overdiagnosis can be said to occur when “individuals are diagnosed with conditions that will never cause symptoms or death” often as a “consequence of the enthusiasm of early diagnosis.” Overtreatment includes treatment of these overdiagnosed conditions. It also encompasses treatment that has minimal evidence of benefit or is excessive (in complexity, duration, or cost) relative to alternative accepted standards. A recent report by the Academy of Medical Royal Colleges argued that doctors have an ethical responsibility to reduce this wasted use of clinical resource because, in a healthcare system with finite resources, one doctor’s waste is another patient’s delay.
Choosing Wisely in the NHS
Even before the inception of the NHS, the British tradition has generally been one of late adoption and cautious use of new medicines, procedures, and technologies. Nevertheless, the UK shows similar patterns of variation in use of medical and surgical interventions to those in the US, though less extreme in absolute terms. The National Institute for Health and Care Excellence (NICE) was set up in 1999 in part to address these unwarranted variations in clinical practice and has identified over 800 clinical interventions for potential disinvestment. However, engaging clinicians with stopping familiar or ingrained practices requires a different approach to that for introducing new treatments.
An initiative recently developed in the US and Canada called Choosing Wisely aims to change doctors’ practice to align with best practice by getting them to stop using various interventions that are not supported by evidence, are not free from harm, or are not truly necessary, including those that duplicate tests or procedures already received. Choosing Wisely asks medical organisations (such as medical royal colleges in the UK) to identify tests or procedures commonly used in their specialty, the necessity of which should be questioned and discussed. These are compiled into lists, and the “top five” interventions for each specialty should not be used routinely or at all. So far, more than 60 US specialist societies have joined in the Choosing Wisely initiative. It has also been adopted by other countries, including Australia, Germany, Italy, Japan, Netherlands, and Switzerland—a clear sign that wasteful medical practices are a problem for all health systems.
The Academy of Medical Royal Colleges, which represents all medical royal colleges in the UK, is launching a Choosing Wisely programme in collaboration with other clinical, patient, and healthcare organisations. Participating organisations will work together to develop top five lists of tests or interventions with questionable value. The academy, royal colleges, and partners, including The BMJ, will then promote dissemination of this information and Choosing Wisely conversations between clinicians and patients. These new conversations will rebalance discussions about the risks and benefits of tests and interventions, such that doctors and patients will be supported to acknowledge that a minor potential benefit may not outweigh potential harm, the minimal evidence base, and substantial financial expense and therefore that, sometimes, doing nothing might be the favourable option.
Tackling the underlying causes of overtreatment
A culture of “more is better,” where the onus is on doctors to “do something” at each consultation has bred unbalanced decision making. This has resulted in patients sometimes being offered treatments that have only minor benefit and minimal evidence despite the potential for substantial harm and expense. This culture threatens the sustainability of high quality healthcare and stems from defensive medicine, patient pressures, biased reporting in medical journals, commercial conflicts of interest, and a lack of understanding of health statistics and risk.
The system has no incentive to restrict doctors’ activity; the NHS in England has a system of payment by results, which in reality is often a payment by activity and encourages providers to do more both in primary and secondary care. General practice is increasingly pressured to focus less on open dialogue with patients about treatment options and more on fulfilling the demands of the Quality and Outcomes Framework (QOF, a pay for performance instrument) and adhering to local commissioning decisions.
The quality measures in both primary and secondary care are based on guidelines produced by NICE, but doctors should not consider these as tramlines because decisions need to be made with reference to individual patient circumstances, the wishes of the patient, clinical expertise, and available resources. Some people would choose to take a hypothetical pill with no side effects daily, even for a few weeks’ gain in life expectancy, whereas others would prefer not to, even if they were told it would add 10 years to their lifespan. It is instructive to note that a large and comprehensive longitudinal study recently concluded that higher reported achievement incentivised under QOF has not reduced premature death in the population.
We suggest that guideline committees should increasingly turn their efforts towards the production of tools that help clinicians to understand and share decisions on the basis of best evidence. Rather than prespecifying the outcome of such dialogue, and trying to get medicine “just right,” they should try to ensure that decisions are based on the best match between what is known about the benefits and harms of each intervention and the goals and preferences of each patient.
More informed decision making can also alleviate fears, perhaps disproportionate ones, among patients who may not want treatment. A recent study revealed that when patients were told of the lack of prognostic benefit from angioplasty, only 46% elected to go ahead with the procedure, versus 69% of those who were not explicitly given this information. Responding to similar concerns about getting patients’ consent for elective coronary angioplasty in the UK, NHS England’s cardiology lead, Huon Gray, stated, “It is important that doctors are clear with their patients about this.”
It is easy to misunderstand health statistics, and doctors can find themselves needing to manage unrealistic expectations of patients who may find it difficult to obtain reliable information. Communicating relative risks as opposed to absolute risk or numbers needed to treat can often unintentionally mislead. As Gerd Gigerenzer, director of Harding Centre for Risk Literacy in Berlin, summarised in 2009, “It is an ethical imperative that every doctor and patient understand the difference between absolute and relative risks, to protect patients against unnecessary anxiety and manipulation.”
Doctors’ health illiteracy is well documented. Misunderstanding of statistics often leads to a belief that screening is more beneficial than it actually is and, in some cases, to no acknowledgment of its potential harms. In a study of 150 gynaecologists, one third did not understand the meaning of a 25% risk reduction from mammography. Many believed that if all women were screened, 25% fewer women (or 250 fewer out of every 1000) would die of breast cancer, when the best evidence based estimate is one fewer death per 2000 women (from Cochrane’s analysis of randomised studies including 500 000 women).
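The gap between a 25% relative risk reduction and one fewer death per 2000 women can be worked through in a few lines. The sketch below is illustrative only: the baseline of 4 deaths per 2000 unscreened women is an assumption, back-calculated here so that a 25% relative reduction matches the one per 2000 figure quoted above. It is not a number taken from the Cochrane analysis, and the function name is invented for this example.

```python
from fractions import Fraction


def risk_summary(control_risk: Fraction, screened_risk: Fraction) -> dict:
    """Express one comparison as relative and absolute risk measures."""
    if not 0 < control_risk <= 1 or not 0 <= screened_risk <= 1:
        raise ValueError("risks must be proportions between 0 and 1")
    absolute_reduction = control_risk - screened_risk
    if absolute_reduction <= 0:
        raise ValueError("this sketch only handles a reduction in risk")
    return {
        "relative_risk_reduction": absolute_reduction / control_risk,
        "absolute_risk_reduction": absolute_reduction,
        "number_needed_to_screen": 1 / absolute_reduction,
    }


# Assumed baseline (illustrative, back-calculated): 4 deaths per 2000 women
# without screening and 3 per 2000 with screening.
summary = risk_summary(Fraction(4, 2000), Fraction(3, 2000))
for name, value in summary.items():
    print(name, value)
```

Under that assumption the same result reads as a 25% relative reduction, an absolute reduction of 1 in 2000, or 2000 women screened for one death avoided, which is why the choice of framing matters so much in a consultation.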
Both medical and surgical overtreatment can place patients at high risk of adverse events. Shared decision making can help to reduce this overtreatment and may be particularly beneficial to disadvantaged groups, significantly improving health outcomes and reducing health inequalities.
One of the major concerns about the development of top five lists in the US is the potential for individual societies to choose low hanging fruit. For example, the American Academy of Orthopaedic Surgeons included the use of an over the counter supplement but no major procedures, despite evidence of wide variation in elective knee replacement and arthroscopy rates among Medicare beneficiaries. Currently, there is also no evidence that lists reduce use of low value medical practices. One crucial and relevant marker of success would be universal awareness of the Choosing Wisely programme among doctors and patients. However, despite much publicity in the medical literature, a random telephone survey of 600 US doctors recently conducted by the American Board of Internal Medicine found that only 21% had heard of Choosing Wisely. The level of public awareness of the campaign, which is a fundamental component to its progress, has not been assessed.
Reducing wasteful and harmful healthcare will require commitment from both doctors and patients, in addition to objective evidence of effectiveness. The NHS already has good systems for evidence appraisal and health technology assessment, but better and simpler tools are needed to facilitate informed discussion in clinical settings. Without such robust and easily shared decision aids, systematically updated without bias, patients may be swayed by potential exaggerated claims in the media when new drugs or procedures are introduced. Lastly, shared decision making does not guarantee lower resource use; greater involvement of patients in deciding their care will require a new set of consultation skills as well as a better range of decision aids.
Call to action and next steps
To ensure the development of a Choosing Wisely culture in clinical practice, the academy suggests:
Doctors should provide patients with resources that increase their understanding about potential harms of interventions and help them accept that doing nothing can often be the best approach
Patients should be encouraged to ask questions such as, “Do I really need this test or procedure? What are the risks? Are there simpler safer options? What happens if I do nothing?”
Medical schools should ensure that students develop a good understanding of risk alongside critical evaluation of the literature and transparent communication. Students should be taught about overuse of tests and interventions. Organisations responsible for postgraduate and continuing medical education should ensure that practising doctors receive the same education
Commissioners should consider a different payment incentive for doctors and hospitals
Support from the media and medical publications will be vital because the public education campaign is crucial to the programme’s success. The academy will ensure that the programme is thoughtfully implemented and rigorously evaluated by demonstrating a reduction in wasteful practices within a fixed time scale. It will begin by asking specialty organisations to compile top five lists. All lists will be accompanied by an implementation plan and will be evaluated and monitored to assess their effect on reducing low value healthcare.
The academy has set up a steering group to provide policy advice and direction for the project. The group comprises individual experts, patient groups, college representatives and key stakeholders. It is time for action to translate the evidence into clinical practice and truly wind back the harms of too much medicine.
Sources and more information
Choosing Wisely in the UK: the Academy of Medical Royal Colleges’ initiative to reduce the harms of too much medicine, BMJ 2015;350:h2308, 12 May 2015.
1 in 69 women will develop ovarian cancer in her lifetime
Objectives To develop a risk prediction model to preoperatively discriminate between benign, borderline, stage I invasive, stage II-IV invasive, and secondary metastatic ovarian tumours.
Design Observational diagnostic study using prospectively collected clinical and ultrasound data.
Setting 24 ultrasound centres in 10 countries.
Participants Women with an ovarian (including para-ovarian and tubal) mass and who underwent a standardised ultrasound examination before surgery. The model was developed on 3506 patients recruited between 1999 and 2007, temporally validated on 2403 patients recruited between 2009 and 2012, and then updated on all 5909 patients.
Main outcome measures Histological classification and surgical staging of the mass.
Results The Assessment of Different NEoplasias in the adneXa (ADNEX) model contains three clinical and six ultrasound predictors: age, serum CA-125 level, type of centre (oncology centres v other hospitals), maximum diameter of lesion, proportion of solid tissue, more than 10 cyst locules, number of papillary projections, acoustic shadows, and ascites. The area under the receiver operating characteristic curve (AUC) for the classic discrimination between benign and malignant tumours was 0.94 (0.93 to 0.95) on temporal validation. The AUC was 0.85 for benign versus borderline, 0.92 for benign versus stage I cancer, 0.99 for benign versus stage II-IV cancer, and 0.95 for benign versus secondary metastatic. AUCs between malignant subtypes varied between 0.71 and 0.95, with an AUC of 0.75 for borderline versus stage I cancer and 0.82 for stage II-IV versus secondary metastatic. Calibration curves showed that the estimated risks were accurate.
Conclusions The ADNEX model discriminates well between benign and malignant tumours and offers fair to excellent discrimination between four types of ovarian malignancy. The use of ADNEX has the potential to improve triage and management decisions and so reduce morbidity and mortality associated with adnexal pathology.
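The AUC values reported above each compare two tumour categories at a time. As a rough guide to what such a number means, an AUC can be read as the probability that a randomly chosen case from one category receives a higher predicted risk than a randomly chosen case from the other. The sketch below computes that quantity directly from two lists of scores. It is not the ADNEX model: the function name and the scores are hypothetical and serve only to show the calculation.

```python
from typing import Sequence


def pairwise_auc(scores_positive: Sequence[float],
                 scores_negative: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney formulation).
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    correct = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted risks, invented for illustration only.
malignant_scores = [0.9, 0.8, 0.4]
benign_scores = [0.1, 0.3, 0.5]
print(round(pairwise_auc(malignant_scores, benign_scores), 3))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which gives some context for the range of 0.71 to 0.99 in the results.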
Sources and more information:
Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study, The BMJ 2014;349:g5920, 15 October 2014.
New test to distinguish between ovarian tumours, NHS Choices, October 16 2014.
New test ‘helps identify best ovarian cancer treatment’, BBC News Health, October 16 2014.
Association of skirt size and postmenopausal breast cancer risk in older women: a cohort study within the UK Collaborative Trial of Ovarian Cancer Screening
Objectives Several studies suggest that overall and central obesity are associated with increased breast cancer (BC) risk in postmenopausal women. However, there are no studies investigating changes in central obesity and BC. We report on the association of BC risk with self-reported skirt size (SS; waist circumference proxy) changes between the 20s and postmenopausal age.
Design Prospective cohort study.
Setting UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) involving the nine trial centres in England.
Participants Postmenopausal women aged >50 with no known history of BC prior to or on the day of completion of the study-entry questionnaire.
Interventions At recruitment and at study entry, women were asked to complete a questionnaire. Women were followed up via ‘flagging’ at the NHS Information Centre in England and the Hospital Episode Statistics.
Main outcome measure Time to initial BC diagnosis.
Results Between 2 January 2005 and 1 July 2010, 92 834 UKCTOCS participants (median age 64.0) completed the study-entry questionnaire. During median follow-up of 3.19 years (25th–75th centile: 2.46–3.78), 1090 women developed BC. Model adjusted analysis for potential confounders showed body mass index (BMI) at recruitment to UKCTOCS (HR for a 5 unit change=1.076, 95% CI 1.012 to 1.136), current SS at study entry (HR=1.051; 95% CI 1.014 to 1.089) and change in SS per 10 years (CSS) (HR=1.330; 95% CI 1.121 to 1.579) were associated with increased BC risk but not SS at 25 (HR=1.006; 95% CI 0.958 to 1.056). CSS was the most predictive single adiposity measure and further analysis including both CSS and BMI in the model revealed CSS remained significant (HR=1.266; 95% CI 1.041 to 1.538) but not BMI (HR=1.037; 95% CI 0.970 to 1.109).
Conclusions CSS is associated with BC risk independent of BMI. A unit increase in UK SS (eg, 12–14) every 10 years between 25 and postmenopausal age is associated with a 33% increase in postmenopausal BC risk. Validation of these results could provide women with a simple and easy to understand message.
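The 33% figure is the hazard ratio of 1.330 per unit change in skirt size per 10 years, restated as a percentage. The sketch below shows that conversion, how the reported confidence interval relates to the estimate on the log scale, and what the ratio would be for a larger change. The last step assumes the usual log-linear form of a proportional hazards model, which is an assumption of this sketch and not a result reported in the abstract. The function names are invented for illustration.

```python
import math


def percent_increase(hazard_ratio: float) -> float:
    """Restate a hazard ratio as a percentage change in hazard."""
    return (hazard_ratio - 1.0) * 100.0


def hazard_ratio_for_change(hr_per_unit: float, units: float) -> float:
    """Scale a per-unit hazard ratio to a change of `units`.

    Assumes the log hazard is linear in the exposure (an assumption of this
    sketch), so hazard ratios multiply across units.
    """
    return hr_per_unit ** units


def log_scale_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate standard error of log(HR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


hr, lower, upper = 1.330, 1.121, 1.579  # values quoted in the results above
print(round(percent_increase(hr)))                 # percentage increase per unit
print(round(math.sqrt(lower * upper), 3))          # interval midpoint on the log scale
print(round(hazard_ratio_for_change(hr, 2), 2))    # two units per decade, if log-linear
print(round(log_scale_standard_error(lower, upper), 3))
```

The geometric mean of the interval limits falls very close to the point estimate, which is what would be expected if the interval was built symmetrically on the log scale.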
Strengths and limitations of this study
To the best of our knowledge, this is the first study investigating the association between central obesity using skirt size (SS) as a proxy and breast cancer risk. Between 25 and postmenopausal age, an increase in SS by one unit every decade increased the risk of postmenopausal breast cancer by 33% while a decrease in SS was associated with a lowering of risk.
Our prospective cohort-study includes 94 000 women with comprehensive follow-up through data linkage to multiple national databases.
There is a possibility of underestimation of self-reported SS. However, if current SS at study entry is uniformly underestimated then there is merely rescaling of CSS so that the strength of the association is unaffected. Furthermore, recall bias of the SS at 25 may be a limitation, but unless this inability in reporting is systematically related to future breast cancer, measurement error can only result in underestimating the strength of the true association between CSS and breast cancer risk.
Given that obesity is now emerging as a global epidemic, from a public health perspective these findings are significant as they provide women with a simple and easy to understand message.
Sources and More Information:
Association of skirt size and postmenopausal breast cancer risk in older women: a cohort study within the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), BMJ Open 2014;4:e005400 doi:10.1136/bmjopen-2014-005400, 24.09.2014.
Skirt size increase ups breast cancer risk, NHS Choices, Cancer, September 25 2014.
Accuracy of urinary human papillomavirus testing for presence of cervical HPV: systematic review and meta-analysis
A simple urine test which can detect the human papilloma virus (HPV) could offer women a much less invasive alternative to the cervical cancer screening or ‘smear’ test, experts have said.
New research published in The BMJ has revealed that the tests are accurate and efficient, and the doctors behind the study said that offering the test could help reverse a fall in the number of young women being screened for possible cancer.
Objective To determine the accuracy of testing for human papillomavirus (HPV) DNA in urine in detecting cervical HPV in sexually active women.
Design Systematic review and meta-analysis.
Data sources Searches of electronic databases from inception until December 2013, checks of reference lists, manual searches of recent issues of relevant journals, and contact with experts.
Eligibility criteria Test accuracy studies in sexually active women that compared detection of urine HPV DNA with detection of cervical HPV DNA.
Data extraction and synthesis Data relating to patient characteristics, study context, risk of bias, and test accuracy. 2×2 tables were constructed and synthesised by bivariate mixed effects meta-analysis.
Results 16 articles reporting on 14 studies (1443 women) were eligible for meta-analysis. Most used commercial polymerase chain reaction methods on first void urine samples. Urine detection of any HPV had a pooled sensitivity of 87% (95% confidence interval 78% to 92%) and specificity of 94% (95% confidence interval 82% to 98%). Urine detection of high risk HPV had a pooled sensitivity of 77% (68% to 84%) and specificity of 88% (58% to 97%). Urine detection of HPV 16 and 18 had a pooled sensitivity of 73% (56% to 86%) and specificity of 98% (91% to 100%). Metaregression revealed an increase in sensitivity when urine samples were collected as first void compared with random or midstream (P=0.004).
Limitations The major limitations of this review are the lack of a strictly uniform method for the detection of HPV in urine and the variation in accuracy between individual studies.
Conclusions Testing urine for HPV seems to have good accuracy for the detection of cervical HPV, and testing first void urine samples is more accurate than random or midstream sampling. When cervical HPV detection is considered difficult in particular subgroups, urine testing should be regarded as an acceptable alternative.
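Sensitivity and specificity describe the test itself. What a positive or negative urine result means for an individual woman also depends on how common cervical HPV is in the group being tested. The sketch below applies Bayes' rule to the pooled estimates for any HPV (sensitivity 87%, specificity 94%). The 30% prevalence is a hypothetical value chosen purely for illustration and does not come from the review, and the function name is invented here.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must be proportions strictly between 0 and 1")
    true_positive = sensitivity * prevalence
    false_negative = (1.0 - sensitivity) * prevalence
    true_negative = specificity * (1.0 - prevalence)
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_positive / (true_positive + false_positive)
    npv = true_negative / (true_negative + false_negative)
    return ppv, npv


# Pooled accuracy from the abstract; prevalence is a hypothetical assumption.
ppv, npv = predictive_values(sensitivity=0.87, specificity=0.94, prevalence=0.30)
print(round(ppv, 3), round(npv, 3))
```

Rerunning the function with a lower prevalence shows the positive predictive value falling even though the test's accuracy is unchanged, which is one reason accuracy figures alone do not settle whether a test suits a given screening population.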
Sources and More Information:
Accuracy of urinary human papillomavirus testing for presence of cervical HPV: systematic review and meta-analysis, BMJ 2014;349:g5264, 16 September 2014.
New urine test could replace invasive smear tests, The Independent Health News 9736609, 17 September 2014.
HPV urine test could screen for cervical cancer, NHS Choices, cancer, 17 September 2014.
Long-term use of pills for anxiety and sleep problems may be linked to Alzheimer’s, case-control study suggests
Objectives To investigate the relation between the risk of Alzheimer’s disease and exposure to benzodiazepines started at least five years before, considering both the dose-response relation and prodromes (anxiety, depression, insomnia) possibly linked with treatment.
Design Case-control study.
Setting The Quebec health insurance program database (RAMQ).
Participants 1796 people with a first diagnosis of Alzheimer’s disease and followed up for at least six years before were matched with 7184 controls on sex, age group, and duration of follow-up. Both groups were randomly sampled from older people (age >66) living in the community in 2000-09.
Main outcome measure The association between Alzheimer’s disease and benzodiazepine use started at least five years before diagnosis was assessed by using multivariable conditional logistic regression. Ever exposure to benzodiazepines was first considered and then categorised according to the cumulative dose expressed as prescribed daily doses (1-90, 91-180, >180) and the drug elimination half life.
Results Benzodiazepine ever use was associated with an increased risk of Alzheimer’s disease (adjusted odds ratio 1.51, 95% confidence interval 1.36 to 1.69; further adjustment on anxiety, depression, and insomnia did not markedly alter this result: 1.43, 1.28 to 1.60). No association was found for the lowest cumulative dose category. The strength of association increased with higher cumulative doses (up to more than 180 prescribed daily doses) and with the drug half life (1.43 (1.27 to 1.61) for short acting drugs and 1.70 (1.46 to 1.98) for long acting ones).
Conclusion Benzodiazepine use is associated with an increased risk of Alzheimer’s disease. The stronger association observed for long term exposures reinforces the suspicion of a possible direct association, even if benzodiazepine use might also be an early marker of a condition associated with an increased risk of dementia. Unwarranted long term use of these drugs should be considered as a public health concern.
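An odds ratio such as the 1.51 reported above compares the odds of past exposure among cases with the odds among controls. The sketch below computes a crude odds ratio and an approximate 95% confidence interval (Woolf's log method) from a two by two table. It is a simplification: the study used multivariable conditional logistic regression on matched sets, which this sketch does not reproduce. The counts are hypothetical, chosen only to mirror the 1:4 ratio of cases to controls, and the function name is invented here.

```python
import math
from typing import Tuple


def crude_odds_ratio(exposed_cases: int, unexposed_cases: int,
                     exposed_controls: int, unexposed_controls: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower limit, upper limit) using Woolf's method."""
    counts = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    standard_error = math.sqrt(sum(1.0 / count for count in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# Hypothetical counts (not study data): 100 cases and 400 controls.
estimate, lower, upper = crude_odds_ratio(60, 40, 200, 200)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

With these invented counts the interval spans 1, whereas the study's much larger sample gives a far narrower interval around a similar estimate. An odds ratio of about 1.5 is also the source of the "50%" in some of the headlines listed below, although an increase in odds is not the same as an increase in absolute risk.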
Sources and More Information:
Benzodiazepine use and risk of Alzheimer’s disease: case-control study, BMJ 2014;349:g5205, 09 September 2014.
Anxiety and sleeping pills ‘linked to dementia’, BBC News Health, health-29127726, 10 September 2014.
Sleeping pills taken by millions linked to Alzheimer’s, The Daily Telegraph Health, healthnews/11083674, 10 Sep 2014.
Prescription sleeping pills taken by more than one million Britons ‘can raise chance of developing Alzheimer’s by 50%’, Daily Mail, health/article-2750042, 10 September 2014.
Prescription sleeping pills linked to Alzheimer’s risk, NHS Choices, Neurology, September 10, 2014.