Thursday, 10 September 2015

Getting Outcomes Out Of The Closet!



There are few things more frustrating than completing a systematic review and finding that you have no useful evidence (trials) from which to draw any conclusions. But here's one of them: finding that there is some evidence, and then finding that you can't actually do anything with it.
So, you've carried out your rigorous search of the literature, you've eliminated what is not relevant, you have actually found trials that compare the same interventions, you've carried out data extraction, and you've assessed their quality and risk of bias. Phew! The next step, to get the most out of these primary studies, is to synthesise their results. This lets you look at the overall picture they present across different groups, compare their results, or look further into subgroups and contrast their findings.
BUT...
You find yourself in a position where you can't combine or compare them... each trial has used different outcomes or different outcome measures.

What do I mean?
Well, take, for example, a very simple randomised controlled trial where you think one filling, let's call it X, is better than my newly developed filling, which we'll call Y. Both are traditional restorative materials, designed to replace tooth structure following selective caries removal (we'll talk about that another time!). We work together to design our trial with strict inclusion criteria so that everything is standardised, and each eligible patient's tooth is allocated one of the two filling materials using a random allocation system. So far, so good. We have similar conditions, and neither material will be disadvantaged because random allocation removes, as far as possible, any remaining bias.

We run our trial*.  At the end we assess the performance of the different fillings but what should we use to assess it?  We could:
  • count how many fillings needed to be repaired
  • count how many needed to be replaced
  • use a scale to assess the aesthetics over time if it was an anterior tooth (or even a posterior one)
  • count how many of the teeth experienced sensitivity after the restoration was placed
  • use a scale to assess sensitivity post-treatment 
  • use a scale to assess what the participants thought of the appearance
  • use the Ryge criteria, a scoring system that rates restorations on features such as marginal ditching, staining etc.
and so on...

You can see the problem here. If our study uses the Ryge criteria, we can't compare its findings on the two materials' performance with those of a study that counted how many fillings needed to be replaced.
Even if we have used a scale to assess aesthetics, we might have chosen one, or made one up, only to find that all of the other studies that assess the aesthetics of materials on a scale have chosen a different one from ours, so our study cannot be compared with them.

In trials of interventions for dental caries, a vast number of outcomes and outcome measures are used. In a quick review of the area, we found 18 different outcomes reported for 50 trials of caries management techniques for primary teeth (a link to the protocol presented at COMET IV can be found here).

So, how do we decide which outcomes to measure or which outcome measures to use? Well, at the moment, it seems that there is no real guidance, and researchers pragmatically choose what they consider to be the most important outcome and the outcome measure they are most familiar with. Is this the best method? Perhaps it would be better to look across the literature at what has been used by other teams and make sure our study will be comparable. But even that isn't easy. There are so many outcomes and measures that sometimes none will dominate, or, if one does, it may not have been reliably tested, so its validity, reproducibility and repeatability are unknown.

 In my next blog I'm going to talk about a different approach to addressing this problem...

*Strictly speaking we should, of course, have decided on our primary outcome before we ran the trial, because the power calculation that tells us how many fillings we need to compare in order to see a difference is based on expected success/failure rates. But just give me a bit of leeway here.
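
For anyone curious about what that power calculation might look like, here is a minimal sketch in Python. It is not taken from any real trial. It assumes the primary outcome is simple success/failure of each filling, one filling per participant (so no clustering of teeth within patients), a two-sided test, and the usual normal approximation for comparing two proportions. The function name fillings_per_group and the success rates of 90% for X and 80% for Y are made up purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def fillings_per_group(p_x, p_y, alpha=0.05, power=0.80):
    """Approximate number of fillings needed per material to detect a
    difference between two success proportions (two-sided test,
    normal approximation, unpooled variance)."""
    if p_x == p_y:
        raise ValueError("the expected success rates must differ")
    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two groups.
    variance = p_x * (1 - p_x) + p_y * (1 - p_y)
    n = (z_alpha + z_beta) ** 2 * variance / (p_x - p_y) ** 2
    return ceil(n)


# Hypothetical figures: 90% of X fillings and 80% of Y fillings succeed.
print(fillings_per_group(0.90, 0.80))
```

With these invented figures the answer comes out at just under 200 fillings per material. The point is that the number depends entirely on the success rates expected for one particular outcome, which is why the primary outcome has to be chosen before the trial starts.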