A Critical Look at Evaluation’s Value Proposition: Has Evaluation Become Just a “Check the Box” Exercise?

In this episode of GEI's podcast Powered by Evidence, GEI Program Manager Dugan Fraser and Estelle Raimondo, Head of Evaluation Methods at the World Bank Group's Independent Evaluation Group, explore approaches for improving the value and reviving the promise of evaluation.

Join host Dugan Fraser as he chats with Estelle Raimondo, Head of Evaluation Methods at the World Bank Group's Independent Evaluation Group, about the current state of evaluation. In their thought-provoking keynote address to the 14th European Evaluation Society conference in Copenhagen in June 2022, Estelle Raimondo and her colleague Peter Dahler-Larsen proposed that the institutionalization of evaluation has turned it into a performative practice. Raimondo proposes several approaches for improving the value, and reviving the promise, of evaluation.

This episode is also available on Spotify.

 

TRANSCRIPT:


Dugan Fraser

[00:00:34]

Hello everybody. I'm Dugan Fraser. I'm the Program Manager for the Global Evaluation Initiative. Welcome to our podcast, Powered by Evidence.

I'm very, very happy to welcome Estelle Raimondo to the conversation today. She's the Methods Advisor at the Independent Evaluation Group, where she advises teams on methodology. She does research on methodological designs and innovations, and she also leads evaluations, with more than 10 years of experience in development evaluation. She's a faculty member of IPDET, the International Program for Development Evaluation Training, and she serves on the board of the European Evaluation Society. Her research has been published in several international peer-reviewed journals, and she has a PhD in evaluation research from George Washington University.

Welcome, Estelle. It's very nice to have you with us.

Estelle Raimondo

[00:01:26]

Hi, Dugan. Thanks for having me.

Dugan Fraser

[00:01:29]

Won't you start by telling us a little bit about the work that you do at the IEG?

Estelle Raimondo

[00:01:35]

Sure. So, for the longest time, I was wearing two hats. I was, you know, working with teams across all of our evaluations to help them set up their evaluation design, think creatively about their methodology, and also, you know, assure a little bit of the quality of the work that we do. And also, I was leading evaluations, mostly thematic evaluations, you know, those that are in the human capital sector, and country program evaluations.

More recently, I'm just wearing one hat. It's a big hat. It's the methods advisor. And so, I'm no longer leading evaluation studies at the moment.

Dugan Fraser

[00:02:21]

So, you're the person to talk to about evaluation methods. And obviously, this is a big topic and one that's really interesting. And I'd love to know what you think the big trends at the moment are in evaluation methods and what you think drives them.

Estelle Raimondo

[00:02:39]

So, I think we are at a very exciting time because, in evaluations, I think we are really going for eclecticism. Over the past 10 years, what I've seen is a lot of creativity around the evaluation designs and quite a few trends that are ongoing and somehow coming together in a nice way.

So, the first one, I would say, is that we have broadened the range of impact evaluation methodologies. We used to have a fairly small toolbox, essentially the randomized controlled trial, which has a lot of strengths but also some limitations. And over the past 10 years, we've seen a boom in trying to reclaim causal evaluation. So, we've seen a lot of analysis through process tracing, QCA, and more theory-based evaluation methodologies that lend themselves to evaluating things that were hard to measure. That's one big trend, I would say.

The second is trying to harness the power of big data. We've seen a lot of exciting experiments in data science and evaluation, some of which we are contributing to at IEG. On the text side, trying to make the most of all the textual data that we have; on the geospatial side, really using images as data. That's really exciting as well.

And then the third trend is to really bring the participatory side of evaluation much more prominently into everything we do, really embedding techniques and approaches drawn from sociology, ethnography, and more community-based traditions. And in some way, the best evaluations are those that have a meaningful design based on the questions rather than being methods-driven.

So, yeah, that's my panoramic view of the field. I'm sure there are many other trends that I haven't thought through.

Dugan Fraser

[00:04:47]

I always feel like the methods issue is a really interesting way to understand the world around us. What do you think the state of methods says about where we are at as a species and as a series of civilizations?

What do you think that eclectic and creative diversity says about the state of the human animal?

Estelle Raimondo

[00:05:18]

That's a very good question. I would say that we have finally come to terms with the fact that we can't ignore complexity, that we can't oversimplify and consider everything to be linear and tightly fitting into our neat logical frameworks. And we are also becoming much more humble about what we can understand about processes of change, which is at the core of evaluation.

So, I think the eclecticism is about really trying to make sense of that complexity around us and not being too closed to any school of thought or any tools that could be helpful in that process. So I've seen a bit less of a divide around, you know, quantitative and qualitative, or systems approaches versus more linear thinking, and really more willingness to experiment with whatever can help us make sense of the situation, of that complexity.

Dugan Fraser

[00:06:26]

So desperate times call for desperate measures.

Estelle Raimondo

[00:06:29]

I think so. And trying to finally, you know, get rid of the walls that we're...

Dugan Fraser

[00:06:36]

...that divide us.

Estelle Raimondo

[00:06:37]

Yes, because we're all in it together.

Dugan Fraser

[00:06:41]

And it's a very complicated reality that we're trying to make sense of, especially since COVID, but even before.

When was EES, the European Evaluation Society conference? Was it in June this year? Yeah, I think it was in June, in Copenhagen. And it was, for me, I think maybe the first big event that I attended since COVID. And so it had a real intensity, and there was a kind of wildness about that sense of being in big spaces with a whole lot of strangers that gave the whole conversation a real urgency. And I really enjoyed the conference. But one of the highlights for me was in fact your keynote at the closing ceremony, where you spoke about something that I found really interesting.

You and Peter Dahler-Larsen talked about this idea of a skeptical turn in evaluations and particularly in relation to evaluation systems. Would you mind telling us a little bit about what that was all about and how you kind of came to assume that position?

And maybe to start with, if I could ask you to explain what you understand an evaluation system to be, and then to talk about the skeptical turn and what that means.

Estelle Raimondo

[00:08:01]

Sure. Thanks, Dugan. Yeah, I agree that it was a really great moment, this evaluation conference, because at last we were coming back together and the energy was really nice. There was also a sense of urgency that evaluation needs to be meaningful and have a purpose and a mission in complicated times.

And so our keynote was really a little bit of a call for taking that moment quite seriously, including revisiting some of the ideologies that have underpinned the practice of evaluation and evaluation systems in the past few decades.

So let me start maybe with my own definition of evaluation, which has three components. I really think it's about meaningfully applying social science investigative approaches to answer questions about determining the value and worth of particular programs, activities, or policies, with a view to informing decision-making processes, whether those are embedded in organizations or whether they are more collective and driven by communities and societies. So these are the three important pieces of evaluation.

Evaluation systems, for Peter and me, arise when there is institutionalization of the practice of evaluation: when you start having rules, codification, policies, and governance systems around evaluation, with the goal of rendering the conduct of evaluation more systematic. So, evaluation systems are this set of rules, routines, and norms that underlie the practice of evaluation, usually embedded in organizations.

And the skeptical turn is an idea that Peter has been working on for a long time, and me as well, maybe for less long; still, my dissertation already had some of these ideas embedded. It's the idea that we need to apply as skeptical a look at the practice of evaluation as we do when we evaluate programs. And we can't assume that evaluation is always the right solution for any kind of problem.

The problem with this routinization of evaluation, this codification, this transformation of evaluation into standard operating procedures, is that we no longer even ask, 'Is evaluation the right thing to do to address that particular problem? Do we have a good sense that evaluability is high?' And we end up having an evaluation supply that is a little bit disconnected from the problems.

So let me stop here as kind of a first introduction to the skeptical turn.

Dugan Fraser

[00:11:20]

Yeah, look, I mean, I find it so interesting because, as you know, the Global Evaluation Initiative was established to promote, in fact, the very thing that you're explaining one needs to be skeptical about.

So, it's interesting for me because you're coming at it from the opposite direction, in that I'm saying we need more of this thing and you're saying, yeah, but not too much, which I think is totally valid and extremely interesting. And I think we're both agreed on the mindfulness and the care that need to characterize evaluation.

And I think that one of the big issues that you raise is around the institutionalization of evaluation. And one of the things I'm constantly reflecting on is how the way evaluation is structured in an institution, i.e., often as a freestanding independent entity, does encourage a disconnect from the decision-making around the intervention that's being evaluated.

Right. But before we get into that, I want to just ask you how your plenary presentation was received. What kind of feedback did you get? Did people disagree furiously and weep in the aisles or how did it go down?

Estelle Raimondo

[00:12:56]

It's a little bit hard to know, because those who didn't receive it particularly well probably didn't come and tell us. So we have a biased view. I would imagine that it didn't sit well with everyone. There are people who perhaps don't live in the same empirical reality of a lot of codification of evaluation, for whom it might not have resonated as well. But we got enough feedback that the message landed quite well and prominently.

There are a lot of people, mostly evaluators, who have felt straitjacketed by this movement of codification and routinization. A host of independent consultants who are commissioned to conduct evaluations are handed the five DAC criteria, and that's the rule; even if it doesn't fit the situation, that's what they have to do. Or evaluators who are evaluating the same policy yet again because there is a mandate that every five years we need to evaluate that particular program, even if the recommendations from the previous one haven't been incorporated. Or evaluators who really see vividly that their work is not being used.

So, I think our message landed with many realities. The main question that then needs to be asked is: given that we are in agreement on the diagnosis, how do we move in the right direction? Given the path dependence of our systems, systems that have been established for decades are really, really hard to change, if only because if we stop doing something, then we won't have the continuity of the data.

We see that at the World Bank really clearly. Even if there is awareness that the system doesn't work as well as we would like, changing it seems like a huge risk, because many stakeholders are wedded to one system even if it doesn't serve their needs very well. So, it's the 'then what' that is more important in some way.

Dugan Fraser

[00:15:24]

So, I'm forced to then ask the question: how would one do it differently? I mean, let's talk about transitioning from a not-great system to a better one in a moment.

But before we talk about the transition, what might an evaluation system look like that wasn't basically a ritualized performative act that is, as you say, sort of codified and routinized into meaninglessness? What would an evaluation system be that was, in fact, really adding value?

Estelle Raimondo

[00:16:03]

The question of use is, of course, central. And the risk in moving into the institutionalization is that this use piece is lost.

 

You know, decades ago in the 60s and 70s, when evaluation was mostly an ad hoc activity, except in some organizations, the question of use was central. There was no mandate to conduct evaluation, so unless you really had a clear use, you wouldn't commission one. At the same time, there were also limitations, very valid ones, to having just ad hoc evaluations commissioned here and there with no real systems underlying them.

But as we moved towards more institutionalization, the question of use became completely secondary. So, for instance, you can have evaluations that are justified just because it's the first evaluation on the topic, not because there is a particular problem that needs to be addressed or a decision point that needs to be informed.

An evaluation system that is more meaningful and fit for purpose is one where the question of use is really at the center. And we wouldn't just codify the supply of evaluation for the sake of it being systematic. That's one aspect.

The second is an evaluation system that is really open to feedback, and feedback not only from evaluators, because that's another trap, right: who guards the guardians, who evaluates the evaluators? It's often other evaluators.

And so, you know, in our keynote, we mentioned that having an opportunity space for others to give feedback is very important. So, a system that is willing to be scrutinized.

And the third point for me is a system that is open to change. We in evaluation keep evaluating programs and advocating for adaptive management, for more flexibility, and for evidence-based decisions, yet our evaluation systems, especially when they have lived in an organization for a while and are so routinized, are so slow to change. And there are very few opportunities to really rethink some of the tenets. So that would be a third really important feature of a well-functioning evaluation system: one that can move, including quite drastically.

Dugan Fraser

[00:18:48]

Wow. That's a description that not many evaluation systems I know of can meet. In summary, you're saying that they need to be centered around use; they need to be open to feedback from others, not just evaluators; and the system itself needs to be willing to change. I think that's really valid in institutions where the evaluation system is baked in and already present. But in many of the places where we work in the GEI, our job is to help invent evaluation systems and to construct them from scratch. And I think this idea that they'd be nimble and agile and flexible and adaptive is quite challenging in many of the contexts we work in. So that's really interesting.

It's also said, sorry, did you want to jump in?

Estelle Raimondo

[00:19:56]

I wanted to jump in because it's also an opportunity space to build those systems in the right way, learning from the systems that have been institutionalized and not necessarily following suit on some of the aspects.

I mean, I'm not saying that they are completely dysfunctional. There are things that work really well. But having learned from that experience, there are features that you can think of. And so this idea of routine feedback is something that does not necessarily need to be onerous. It's the idea that every so many years we are open to understanding the consequences of the systems, even beyond what they were set up to do, these constitutive effects that we don't necessarily know about when we design them.

So, you know, the issue of gaming ratings, the issue of evaluations that are just checked off: being open to those possibilities and revisiting the system after a few years is something that can be thought through at the outset. We could also try to do more piloting at the beginning than we tend to when building evaluation systems, where very quickly we want everything to be systematic.

Being willing to have a phase of piloting and understanding the needs might be something doable. I don't know.

Dugan Fraser

[00:21:34]

I want to come back to something you mentioned, which is this idea of putting use at the center of evaluation systems. And often when people talk to me about an evaluation, I find myself saying to them, what decision is it that you want the findings from this evaluation to be useful for? And when you ask people that question, they often can't tell you because they aren't thinking about a specific decision. And so, you know, I think it's really important that they think about a specific decision that they want evidence to support.

And I was also thinking about an earlier point you made about evaluability assessments; people don't really do those much anymore. They were a thing a few years back. Do you think there's space to be much more intentional about use? I mean, you'll be familiar with Michael Quinn Patton's distinctions between the various kinds of use. It's not always just the findings that are useful; often the process of doing an evaluation is very helpful. And there are a number of other types of use.

But do you think in designing evaluations and in institutionalizing evaluation systems, there's a role for a much more focused conversation about use? When we do these evaluations, what are we going to use them for? Is that something you think would be helpful?

Estelle Raimondo

[00:23:04]

Definitely. I think there are several opportunity points, even in systems that have been routinized, that we don't exploit as well as we could because we are very supply-driven. Oftentimes, it's clear that the use only comes after the fact, perhaps several years after an evaluation has been done. But there are still possibilities of thinking through the maturity level of a program, for instance. If a program is fairly nascent, still in trial-and-error mode, this is a space where evaluation can be particularly useful.

Perhaps there are other spaces in which evaluation might not be the right approach and a program might need something else, an audit at some point, for instance. So the idea that evaluation can be a high priority under certain circumstances and a low priority under others is quite refreshing, because oftentimes we are in systems where evaluation is done mostly for legitimacy purposes and not necessarily for use. So rethinking the value added of a particular evaluation for the type of program, or for where it is in its life cycle, could also be something done more routinely.

Last point, about evaluability assessments: you said that's not something we do anymore, which is really interesting. And it's true. We call for a renaissance of evaluability assessment, the real kind, not designing projects so that they are evaluable, but thinking about whether evaluation is the right thing at that moment, whether we have the right data to even have a meaningful evaluation, and whether there is an opening for use or a window of opportunity to change anything.

These are big questions of evaluability that, if we embedded them a little bit more often, would probably increase the value added of evaluations.

Dugan Fraser

[00:25:22]

When you think about alternatives to evaluation, you mentioned auditing or audits. And you know, we evaluators hate the thought that you could ever do an audit instead of an evaluation. That's a heretical suggestion on your part. I hope you're going to apologize. What else might you do instead of an evaluation?

Estelle Raimondo

[00:25:46]

Yeah. Sometimes audits are useful; I would stand by that. But absolutely. So, for instance, sometimes you really need actual research if you are, again, at a very experimental stage. Or if you have a very clear question that can be answered by a more research-oriented approach, that's possible.

You could also have more participatory processes that independent evaluations cannot really carry out. You can have a range of other options that don't require a full-on evaluation. It could be a first stage, or an embedded action-research type of exercise. It really depends, again, on the maturity of the program and what is expected of the exercise. And sometimes evaluations are not the right thing to do at that moment. We just have to accept, I think, that other approaches can have an equally valid proposition.

On the other hand, sometimes evaluations are very much needed and they are kept at arm's length. So again, it's this idea of fitness for purpose that could be mainstreamed a little bit more.

Dugan Fraser

[00:27:23]

Such an interesting conversation. And I really appreciate the willingness on your part to say some of the difficult things that we as evaluators need to hear.

One of the things you didn't mention, which has come up a lot recently, is the whole field of monitoring and of performance and progress reporting as interventions get implemented, where the people involved in the work are given an opportunity to give feedback.

As we sort of wrap up this conversation, would you have particular advice that you'd want to share with people like me who are working in developing countries promoting the idea of evaluation systems and helping them put evaluation systems in place?

Estelle Raimondo

[00:28:15]

Yes. Thanks for mentioning monitoring. Of course, it's also a set of activities that are important, again, when done well: making sure that we don't monitor 35 indicators if we know that we're only going to use five. Or three. Or one. Yes, exactly. So I don't know if it's words of wisdom really, because again, we have different experiences and you probably have a lot of really good ideas.

So let me just think about a few nuggets. I think making clear that evaluation can be a very meaningful activity, not just to avoid risks, not just to tick a box, not just for legitimacy purposes, but really for functional use, should be a starting point. But also recognizing that getting there will require some trial and error would be something important to share, because often we want ready-made solutions: we look at our neighbor, how they've done it, and just apply a recipe and all the norms and standards from, for instance, the ECG or UNEG or the OECD. There are valid elements to that, but always adapting to context is really important.

The second, and we've discussed this, is trying to avoid path dependence at all costs, meaning really embedding this idea of feedback and the possibility of change. Even if the system is not nimble and agile at the first go, it means really being thoughtful about building feedback processes into the evaluation system.

And the third thing is to try to find ways to avoid the conundrum of independent evaluation systems that become isolated. Independence shouldn't be isolation. I don't have a recipe for that, but recognizing that emphasizing accountability and the rules of distance will come at a cost if overdone is a start.

So, yeah, these are three ideas.

Dugan Fraser

[00:30:57]

I think that this question of how to avoid independence equaling isolation is so important. And I think it's something that we'll think about hard, along with everything else that we've discussed today.

It's been a really fascinating conversation. Thank you so much, and I look forward to staying in touch with you on these issues as we build these systems and work together in the IEG.

Estelle Raimondo

[00:31:26]

Thank you so much, Dugan, for enjoying the keynote and for asking me to discuss it a little bit further. It's been extremely helpful for me too. Take care.

Dugan Fraser

[00:31:38]

Bye-bye.

Estelle Raimondo

[00:31:39]

Bye.