Present and Future Measurement in Rehabilitation
What do professionals and others working in clinical and administrative capacities need to know about measurement to be able to enhance practice?
I think talking about this topic is a good idea. I’m becoming more and more aware that knowing how to use and implement measures is a problem in our field. I make assumptions about what it is that people know about measurement that I’m learning are really not well founded.
Like what sorts of assumptions?
I assume that, if you give people the information, with well-designed and well-developed instruments, they’ll know what to do with it. And they don’t. We’ve developed a lot of instruments over the past 10 years and people are using them, but they’re losing interest because they don’t know what to do with the information that comes from them. This is a failure on our part. What I find is that most clinicians understand that if you give a patient something to fill out, and it generates some kind of score or report, you can use it with that patient to see how they are doing over time. That’s it. That’s the level of understanding they have because that’s how they are trained. They are not trained to use the data about what is happening with their patients for prognosis, for example, or for quality improvement or reimbursement. Those three applications of measurement are much less frequently employed.
Do you see the opposite? Do you see people using measures for things for which they are not designed?
All the time. For example, when you design instruments to help clinicians track outcomes, whether for prognosis at the individual level, for quality improvement, or for reimbursement, they also want to be able to use those measures for care planning. And they’re not always designed for that.
I think I understand what you are saying, but what is an example of that?
Let’s take FIM. People like to use FIM to help them plan care. Well, it wasn’t designed for that. It is not specific enough for care planning. Similarly, our AM-PAC instrument is exquisitely designed for certain applications, but not for care planning. So you put it in the hands of clinicians who are untrained, and one of the common things that I hear back is that it is not really very helpful to our clinicians in care planning. It just doesn’t give them the level of detail they want. On the other hand, suppose you tell a clinician, “Mrs. Jones has had a knee replacement and is coming in for outpatient therapy,” and you ask how much improvement Mrs. Jones will experience by the end of her therapy and how many visits she will require, both of which are classic prognostic questions. They look at you like you are from outer space, and they say something like, “In my experience…” They rarely think to use aggregate data from outcome and utilization measures to answer that question empirically.
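To make the idea of answering a prognostic question from aggregate data concrete, here is a minimal sketch. It assumes a clinic keeps one record per completed episode of care with a change score and a visit count. The `Episode` record, its field names, and the `prognosis_from_aggregate` helper are hypothetical illustrations, not part of any instrument or system mentioned in the interview, and the numbers are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Episode:
    """One completed episode of care (hypothetical record layout)."""
    condition: str       # e.g. "knee_replacement"
    change_score: float  # discharge score minus admission score
    visits: int          # number of therapy visits used


def prognosis_from_aggregate(episodes, condition):
    """Summarize past episodes for similar patients.

    Returns the number of matching episodes plus the median change
    score and median visit count, as a rough empirical prognosis.
    """
    matched = [e for e in episodes if e.condition == condition]
    if not matched:
        raise ValueError(f"no past episodes recorded for {condition!r}")
    return {
        "n": len(matched),
        "median_change": median(e.change_score for e in matched),
        "median_visits": median(e.visits for e in matched),
    }


# Made-up history, for illustration only.
history = [
    Episode("knee_replacement", 10.0, 8),
    Episode("knee_replacement", 12.0, 10),
    Episode("knee_replacement", 14.0, 12),
    Episode("hip_fracture", 6.0, 15),
]

summary = prognosis_from_aggregate(history, "knee_replacement")
print(summary)
```

A real analysis would also adjust for case mix, such as age, baseline score, and comorbidities, which this sketch deliberately leaves out.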
So how do we go from where we are to where we need to be in terms of generating prognoses?
Well, I think we have to train people: I believe they aren’t getting the training in their professional development programs. So then you have to do that in continuing professional education.
What would it look like? What sort of principles do you have to teach to make that use apparent?
For 12 or 14 years now, we have been doing a twice-annual conference with CARF. It’s called Transforming Outcome Data into Management Information, and we get about 100 rehabilitation professionals a year. It’s a 2-day conference of didactic presentations as well as a lot of lab work where people bring data and we teach them how to convert data into information that they can use in clinical practice. I think it’s an example of what is needed by the practicing clinician: taking data and making it into something that’s useful for the management of their practices. Most professionals collect data because it’s required by CARF or some accrediting body. They put it into a three-ring notebook, and when the accreditors come, they pull it out and show them that they collect all this nice outcome data. The accreditors go away, the binder goes back on the shelf, and no one ever pays attention to it until reaccreditation comes along again. Atul Gawande argues that what we haven’t done in medicine, and I would extend that to rehabilitation, is train professionals in systems skills like how to collect and appreciate data, know how to use it, and know how to implement it at scale within our fields. We don’t do any of that; it’s all anecdotal. If you go see most clinicians and ask them specifically about your prognosis, it’s always, “From my experience…” They generalize from their individual practice instead of from aggregate data that are more representative. I’ve always assumed that people would know what to do with information if you put it in their hands. I’ve learned that’s a big mistake.
Tell me about the concept of the minimal clinically important difference (MCID).
Well, there are different definitions of what that is, and there are different ways in which people determine it. It is not a very exact science. If you look at the process for determining the MCID, it is really woefully inadequate. Most of the techniques for determining MCID are based on external anchors, which you would use for gauging how much change should occur before you believe it is meaningful. The problem is that those anchors are not well validated. They are usually global indicators of improvement, and if we really had confidence in global indicators of improvement, either from the clinician or the patient, why would you need the standardized instrument? I do find distribution-based methods very helpful, in contrast, because we all know there is a lot of measurement error in any of the measures that we employ. So, the question you should be asking, if you are using a measure clinically with an individual patient, is how much change needs to occur before I can really believe that real change has occurred. That is the part of MCID that I find quite useful.
So with the AM-PAC, for example, in most of the applications, if you don’t get an improvement of 4 points on an individual patient, it’s not believable as real change. I think that’s useful to know. And then if you get an improvement three times the MDC (minimal detectable change), that seems like a lot of change. Can I tell you precisely that that is the MCID? Probably not. But, if you read the papers out there, you’ll see that researchers frequently report that the MCID for a particular measure is smaller than the MDC. How can that be the case? Yet, I’ve seen that reported in multiple papers. So the MCID generated through these anchor-based approaches is smaller than the threshold you need to achieve for confidence that it is more than measurement error. So I put a lot more stock in the MDC. As long as you’ve exceeded the MDC, you know that the change is a real change, that is, not attributable to measurement error [1, 2].
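As an illustration of the distribution-based reasoning described here, the sketch below uses one common formulation: the standard error of measurement is SEM = SD × sqrt(1 − r), where r is a reliability coefficient, and MDC = z × sqrt(2) × SEM. The interview does not say which formula or confidence level was used for the AM-PAC, so this is an assumption, and the function names and input values are hypothetical, not published AM-PAC figures.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), with r a reliability coefficient in [0, 1]."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sd: float, reliability: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM.

    The sqrt(2) reflects that a change score combines error from two
    measurements; z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)


def exceeds_mdc(baseline: float, follow_up: float, mdc: float) -> bool:
    """True if the observed change is at least as large as the MDC."""
    return abs(follow_up - baseline) >= mdc


# Illustrative inputs only (assumed, not taken from any instrument manual).
mdc95 = minimal_detectable_change(sd=10.0, reliability=0.95)
print(round(mdc95, 2))
print(exceeds_mdc(baseline=50.0, follow_up=53.0, mdc=mdc95))
print(exceeds_mdc(baseline=50.0, follow_up=58.0, mdc=mdc95))
```

Under this formulation, a lower reliability coefficient widens the MDC, which is the sense in which knowing the reliability lets you account for measurement error when reading an individual patient's change score.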
Are clinician rated scales preferable over patient self-report?
We have patient-report scales for many things that are not used in practice because they aren’t trusted; people worry there might be reporting bias. Reporting error is a component of the measurement error. Yet, people still won’t believe it. However, clinically, if you ask them how they find out about this information, it’s almost always, “Well, I talk to the patient.” Look at the FIM, which is basically a clinician-reported scale. People will say it is supposed to be based on performance or clinical judgment. But people hold workshops to train rehabilitation professionals how to score the FIM to maximize reimbursement. So they do workshops on how to introduce measurement error into scales. Clinicians are reluctant to be held accountable by what the patient feels has been their improvement, particularly if their payment is going to be determined by that. The feeling is that you can’t trust the patient, that many times they are going to be inaccurate.
But if you figure out what the MDC is, you’ve taken into account the measurement error, you know, the unreliability of the assessment. Studies have shown that it is not systematic error. It is random error. We just published a paper in Stroke [3] that showed when you compare professional report and family-member report with patient report, they are different, but the difference is random; it is not systematically in one direction. Therefore, if you know what the reliability coefficient is, you can adjust for it.
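The distinction between systematic and random disagreement can be illustrated with a simple paired comparison. The sketch below is not the analysis used in the Stroke paper; it is a rough check, assuming paired patient and proxy scores on the same scale, that asks whether the mean proxy-minus-patient difference is distinguishable from zero using a normal-approximation 95% interval. The `paired_bias` function and the scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_bias(patient_scores, proxy_scores, z=1.96):
    """Check for systematic disagreement between paired ratings.

    Computes proxy-minus-patient differences, their mean, and a rough
    normal-approximation confidence interval for that mean. If the
    interval contains zero, the data give no evidence that one rater
    scores consistently higher or lower than the other.
    """
    if len(patient_scores) != len(proxy_scores):
        raise ValueError("score lists must be the same length")
    if len(patient_scores) < 2:
        raise ValueError("need at least two pairs")
    diffs = [q - p for p, q in zip(patient_scores, proxy_scores)]
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    low, high = mean_diff - z * se, mean_diff + z * se
    return {
        "mean_diff": mean_diff,
        "ci": (low, high),
        "systematic": not (low <= 0.0 <= high),
    }


# Made-up scores: the raters disagree, but in no consistent direction.
patient = [50, 55, 60, 45, 52, 58]
proxy = [52, 53, 61, 44, 54, 56]
random_case = paired_bias(patient, proxy)

# Made-up scores: the proxy rates every patient 5 points higher.
shifted_case = paired_bias(patient, [p + 5 for p in patient])

print(random_case["systematic"], shifted_case["systematic"])
```

With a sample this small, a t-based interval or a formal agreement analysis would be more appropriate; the normal approximation is used only to keep the sketch dependency-free.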
I’ve never argued that there is no error. My argument is that you take that into account. I’ve even done studies where I’ve compared the amount of error in patient-reported measures and performance-based measures. I’ve shown the amount of error is about the same. If you think about it, it makes perfect sense. It is difficult to train people to administer these performance-based measures consistently. There is going to be error, particularly if you are doing multiple sites and multiple clinicians who are doing the testing. We published a paper in 2008 on a clinical trial of hip fracture patients [4]. We did a head-to-head comparison and we showed, if you look at the sensitivity based on distribution-based measures, or anchor-based measures, the performance-based measures do about the same as the patient-based measures. Yet, performance-based measures are considered objective, and patient-based measures are considered subjective.
If getting healthcare providers to understand and use measures properly is such a barrier, how about having appropriate measures for them to use? Where are we with that?
It takes time, and review committees are very slow and very conservative. I was on a research planning group call recently where one of the concerns about using our AM-PAC measure was that reviewers wouldn’t accept it because it was too new. It was 2004 when the first AM-PAC article came out, so it’s almost 10 years old. We now have predictive validity along with convergent and construct validity. So there is plenty of data that suggests it is a very psychometrically adequate measure. But the concern was nonetheless raised about how the reviewers would view a ‘new instrument’. They were also fearful that reviewers would feel a computer adaptive test (CAT) measure is too innovative. It’s discouraging. I don’t know why we are so conservative when it comes to measurement. We’ll continue to use measures that were developed in the 1970s, even though we know better options are available, in part because people have become accustomed to using them. A measure takes on the aura of validity because it has been used a lot. I see it all the time, and it is very discouraging.
But it is getting a lot better. My friend Gunnar Grimby, who is an editor of the Journal of Rehabilitation Medicine, recently wrote an editorial with Alan Tennant, which was entitled “Time to end measurement malpractice” [5]. They basically make the argument that we should stop publishing work that uses ordinal measures that rank order what is being measured because the state of measurement has evolved sufficiently that using ordinal measures constitutes ‘measurement malpractice.’ Ten years ago you couldn’t have written that. That’s in the field of rehabilitation, and I think rehabilitation has been fairly progressive when it comes to outcome measurement. So using traditional ordinal measures is becoming increasingly recognized in the rehabilitation fields as measurement malpractice. I think it is a big step forward. I think we are at the point where we are really moving toward quantitative measures as the norm. I think that measurement science has improved a lot. There are better measures, and I think people are getting better at making selections.
1. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Physical Therapy 2006;86(5):735-43.
2. Jette AM, Tao W, Norweg A, Haley S. Interpreting rehabilitation outcome measurements. Journal of Rehabilitation Medicine 2007;39(8):585-90.
3. Jette AM, Ni P, Rasch EK, et al. Evaluation of Patient and Proxy Responses on the Activity Measure for Postacute Care. Stroke 2012;43(3):824-9.
4. Latham NK, Mehta V, Nguyen AM, et al. Performance-Based or Self-Report Measures of Physical Function: Which Should Be Used in Clinical Trials of Hip Fracture Patients? Archives of Physical Medicine and Rehabilitation 2008;89(11):2146-55.
5. Grimby G, Tennant A, Tesio L. The use of raw scores from ordinal scales: Time to end malpractice? Journal of Rehabilitation Medicine 2012;44:97-8.
From Brain Injury Professional, the official publication of the North American Brain Injury Society, Vol. 9, Issue 2. Copyright 2012. Reprinted with permission of NABIS and HDI Publishers. For more information or to subscribe, visit: www.nabis.org.
Brain Injury Professional is the largest professional circulation publication on the subject of brain injury and is the official publication of the North American Brain Injury Society. Brain Injury Professional is published jointly by NABIS and HDI Publishers. Members of NABIS receive a subscription to BIP as a benefit of NABIS membership.