What the AI Conversation in M&E Is Missing – And a Three-Lens Framework to Help

Apr 09, 2026

Any decision about using artificial intelligence in monitoring and evaluation involves evidence needs, workflow, and capability considerations. In this blog post, Douglas Glandon explains why we need to consider all these dimensions together, not separately, to make meaningful decisions on AI use.

Written by Douglas Glandon

Douglas Glandon is Acting Program Manager of the Global Evaluation Initiative.

 

The monitoring and evaluation (M&E) profession has responded to the rise of artificial intelligence (AI) with impressive energy. Organizations have issued policies, convened discussions, launched training, and developed practitioner guidance. Much of this work is valuable, but for practitioners, managers, and those responsible for strengthening M&E systems, the landscape can still feel fragmented.

The challenge is not a shortage of good ideas about how to use AI in M&E. It is the way those ideas are too often kept apart. Questions about evidence quality, workflow implications, capability requirements, and ethical considerations are typically treated as separate conversations. In practice, any meaningful decision about how to use AI in M&E involves all of these dimensions at once – and they are best addressed together.

At the Global Evaluation Initiative (GEI), we have been thinking carefully about this gap and how to address it. The result is a working paper, Navigating AI & Digitalization in M&E: A Three-Lens Framework, which we are sharing as part of Glocal Evaluation Week 2026 and circulating for public comment. This post explains the problem the paper addresses, what the framework offers, and why we think it matters.

 

The problem the framework addresses 

Three features of the current AI discourse in M&E limit its cumulative usefulness, despite the quality of individual contributions. 

First, AI is often treated as a single category. Yet computer vision applied to satellite imagery, predictive machine-learning models for program targeting, and large language models (LLMs) used for qualitative coding differ fundamentally – in the data they require, the skills they demand, the ways they can fail, and the evaluative questions they can address. Guidance framed around "the strengths and limitations of AI" without such distinctions is rarely actionable and often travels poorly beyond the most visible use cases. 

Second, much of the available content adopts a tool-first orientation – starting with an AI technique before looking at applications. This may serve those who already know they want to use a particular technology, but it is less helpful for the more common situations evaluators face: a specific evidence need, a constrained workflow, or an institutional question about readiness. Good evaluation practice starts from the question, not the method, yet the pace of AI adoption makes the tool-first pull particularly strong.

Third, existing contributions tend to address individual dimensions in isolation – applications, competencies, ethics, governance – each valuable but partial. In practice, these dimensions converge. A decision about using natural language processing (NLP) for qualitative analysis, for example, simultaneously raises questions about evidence credibility, workflow integration, and team capability. Without a way to consider these jointly, important trade-offs remain invisible, and practitioners are left to improvise the connections on their own. 



What the framework offers

The three-lens framework responds to this fragmentation by organizing AI guidance around the relationships that practitioners and institutions actually have with technology. 

The evidence needs lens starts from what we need to know. It asks whether AI-enabled approaches can generate evidence that is more valid, timely, or granular than conventional methods – or whether they make it possible to address questions that would otherwise be infeasible. 

The workflow lens focuses on what we need to do. It looks at where AI can address real constraints in evaluative work, distinguishing between automating routine tasks, augmenting human judgment, and enabling processes that were not previously possible. 

The capability lens asks what people and institutions need to know, do, and have in place to work responsibly in an AI-influenced environment, including skills, governance arrangements, and organizational safeguards. 

Each lens is mapped onto an existing GEI framework: the policy/program cycle, the task framework for evaluative activities, and the Evaluation Competency Framework. This allows AI considerations to be integrated into existing processes such as capacity assessments, system diagnostics, and training design, rather than requiring a parallel agenda. 

Two design choices are intentional. First, the framework does not include a lens organized around AI technologies. A technology taxonomy would reintroduce the tool-first orientation the framework is designed to move beyond. Second, ethics and responsible practice are treated as cross-cutting concerns that run through all three lenses – shaping how evidence is judged, how workflows are designed, and how capabilities are developed. 

The framework is designed to support judgment, not replace expertise. It works best when applied by teams that combine technical and evaluative knowledge, particularly at the decision stage – not only during implementation. 



Why integration matters

The framework's central proposition is that any decision about AI in M&E simultaneously involves evidence, workflow, and capability considerations, whether or not this is made explicit. The payoff comes from considering the lenses together. 

At the practitioner level, consider the use of LLMs for drafting evaluation reports. Through the workflow lens, this looks straightforward: it saves drafting time, and the tools are widely accessible. But the evidence lens raises concerns about fabricated or decontextualized content, while the capability lens highlights a critical requirement: the ability to judge when AI-generated text accurately reflects the evidence and when it subtly distorts it. An integrated view does not rule out LLM-assisted drafting, but it places clear conditions on its use, including robust human review protocols and transparent disclosure.

At the institutional level, consider satellite-based nighttime light imagery for evaluating rural electrification programs. The evidence and workflow lenses point to significant potential: frequent, granular data at scale, without repeated field visits. The capability lens, however, surfaces the skills and systems required to interpret such data responsibly, including geospatial expertise and ground‑truthing against local information. Without these, apparent precision can be misleading. The integrated view shifts the question from “should we use this technology?” to “what capabilities must be in place for the evidence to be credible?” 

Across these examples, the framework helps distinguish between low‑risk improvements ready for adoption, uses that require prior capability investment, and applications that appear attractive from one perspective but pose risks visible only when all three lenses are applied. 



Why this matters now 

As AI makes it easier to generate plausible analysis at an unprecedented scale, the value of evaluation rests less on producing outputs and more on the integrity of the processes behind them. Transparent methods, documented judgment, and accountable use of evidence become even more important. 

The capacities that have always defined good evaluation – critical thinking, methodological discipline, ethical commitment, contextual sensitivity – are precisely those needed to engage with AI responsibly. The challenge is not to become technologists, but to remain evaluators who can judge when AI strengthens their work and when it does not. The three‑lens framework is designed to support that judgment by helping practitioners and institutions ask better questions. 

 

An invitation 

This paper is a working draft, circulated for public comment. We share it as part of the call for submissions for Glocal Evaluation Week 2026, whose theme this year – Evaluation, Evidence, and Trust in the Age of AI – reflects the same concerns. 

We invite the M&E community to put the framework to work: apply it to a current decision, adapt it to your context, and test where it adds value and where it falls short. We are particularly interested in learning where the three lenses did not surface the considerations that mattered most.