AI did not replace us (yet)—but we need an upgrade

Patrizia Cocca, GEI Lead for Communications and Knowledge Management | GEI | May 14, 2026 | Blogs

As AI tools become more widely used in communications and evaluation work, questions of judgment, trust, and credibility have come into sharper focus. In this blog post, Patrizia Cocca reflects on what this shift has revealed and what it means for evaluation capacity development ahead of Glocal Evaluation Week 2026.

As preparations build for Glocal Evaluation Week 2026, which will explore how artificial intelligence (AI) is reshaping evaluation practice and evidence use, our team keeps returning to the same question: what remains distinctly human in this moment, and what do we need to upgrade to stay credible and useful?

For those of us working in communications, that question has felt very close to home. When we first began using AI in our day-to-day work, the reaction was not pure excitement. It was also discomfort and, if we are honest, some fear. Tasks that had long defined our professional identity—drafting articles, shaping key messages, even sketching website structures—were suddenly being completed in minutes. Often fluently. Sometimes with unsettling confidence.

What surprised us was not simply how capable these tools were, but what their speed and fluency exposed about where human value really lies. As content production became easier, judgment, context, and trust emerged as the work that mattered most. In that sense, AI is forcing us to upgrade: to sharpen the value we bring, to be more deliberate about how we use new tools, and to rethink how we make credible evidence visible and trusted in an information environment saturated with instant, AI‑generated answers.

Why this matters now for GEI and Glocal

In its second phase, the Global Evaluation Initiative (GEI) treats AI and digitalization as major shifts in how evidence is generated, analyzed, and used. The emphasis is not on chasing tools for their own sake. It is on building the capacity, norms, and safeguards that allow institutions to navigate this shift responsibly. GEI's approach to this is set out in its working paper Navigating AI and Digitalization in M&E: A Three-Lens Framework.

That framing sits at the heart of Glocal Evaluation Week 2026. Credible evaluation evidence now competes for attention with instant, fluent, and often unreliable AI-generated content. When a policy maker can type a question into a chatbot and receive a confident answer in seconds, what does that mean for how we package, distribute, and position evaluation findings? How do we make credible evidence accessible, and accessible evidence credible?

These are not abstract questions for us. They shape how we think about evaluation capacity development and evidence use, including how GEI designs its communication products and supports partners to engage policy makers in complex environments.

What AI changed in our work—and what it did not

When we started experimenting with AI tools in GEI's communications, the most visible change was speed. Drafting a first version of a blog post, a social media thread, or a short explainer became faster. The blank page was less of a barrier.

The more consequential change came after that first draft. Because AI could help us get started, time reappeared elsewhere. This did not change what mattered in the work, but it made it easier to focus on it. We could reinvest that time in the parts of the work that AI could not do for us: clarifying what really matters for our audiences, checking whether a message might be misinterpreted in a specific political context, aligning language with GEI's values. AI did not teach us that these skills exist; rather, it sharpened our understanding of where and how they matter most.

Much of what experienced communicators do is rarely written down: reading a room and sensing resistance before anyone speaks; recognizing when trust is forming, and when it is fragile. These are often grouped under “soft skills,” but in our work they are core professional competencies. In an evaluation context, effective communication requires cultural intelligence—understanding that a message that resonates in Brussels may fall flat in Dhaka or Dakar, and knowing why—as well as political awareness of formal structures and informal power dynamics, and ethical judgment about how to present evidence that may be uncomfortable, how much context to provide, and when to invite conversations others might prefer to postpone.

These capabilities are not unique to communications; they are fundamental to a wide range of professional roles. Senior evaluators, policy advisers, and managers draw on them every day. When a tool can generate fluent text on demand, the question shifts from "can we produce something?" to "should we say this, in this way, to these people, at this moment?" That is judgment. AI does not supply it.

What AI can and cannot do

AI systems already perform many tasks that look like judgment. They classify, rank, score, and detect patterns. They can analyze sentiment in large volumes of feedback and summarize meeting transcripts. These capabilities are advancing quickly, and we use them.

But they are not a substitute for the kind of contextual, relationship-aware decision-making that human professionals bring to their work. AI can suggest which messages performed well in past campaigns. It does not, on its own, have access to the relational knowledge that shapes trust—why a particular community distrusts a certain institution, or how a history of broken commitments might shape the reception of a new evaluation finding.

Recent public examples have shown what happens when organizations deploy AI-generated communications without sufficient human oversight: chatbots have provided incorrect guidance or presented themselves as authorities they were not, while automated responses have misrepresented policies. The result is not just error; it is erosion of trust in the institution behind the message. In evaluation and evidence use, where credibility is built slowly and lost quickly, that is a risk institutions cannot afford to ignore.

What this means for evaluation systems

These reflections point to something broader about evaluation capacity development. AI does not introduce a new lesson here, but it brings a familiar one into sharper focus: tools alone do not strengthen evaluation systems. People, judgment, and trust do.

In India, for example, GEI supports the Centre for Learning on Evaluation and Results for South Asia (CLEAR-SA), a GEI implementing partner, in building the next generation of evaluators. Recent GEI work with CLEAR-SA illustrates how, alongside methods and technical skills, the emphasis increasingly includes how to frame questions responsibly, interpret findings with care, and engage policy makers in ways that reflect institutional and political realities.

CLEAR-SA is one among many GEI partners integrating communication, judgment, and ethical awareness into evaluation capacity development—recognizing that technical competence alone is not enough for evidence to be credible or used. Communication is therefore woven through this work, from how evaluators explain their methods to how they surface uncertainty and adapt messages for different audiences.

Similar themes are emerging across the GEI network. Partners working on AI-enabled synthesis tools are not only experimenting with algorithms; they are also asking how to communicate the strengths and limits of those tools to commissioners and users of evaluations. Teams developing guidance on AI in evaluation are not only cataloguing risks; they are also thinking about how to discuss those risks in ways that support responsible innovation. The upgrade, in other words, is not primarily about learning to use new tools. It is about strengthening the capabilities and norms that allow evidence to be used credibly in real decision-making environments.

What Glocal Evaluation Week 2026 is trying to do

Glocal Evaluation Week is GEI's annual, community-driven space for shared learning. It brings together thousands of practitioners, policy makers, researchers, and commissioners of evaluations through self-organized events around the world.

In 2026, the focus on AI and evaluation—under the theme Evaluation, Evidence, and Trust in the Age of AI (1–5 June 2026)—provides an opportunity to surface what many practitioners are already navigating in their daily work, and to learn together through shared reflection and exchange across different contexts.

For GEI's communications work, this means asking questions such as:

Where can AI genuinely improve how evaluation findings are communicated—for example, by helping tailor messages to different audiences?
Where does AI introduce new risks, such as reinforcing biases in the evidence highlighted, or creating a false sense of certainty around contested issues?
And what capabilities do institutions need—not only technical skills, but also ethical frameworks and governance arrangements—to use AI in ways that build, rather than erode, trust?

Glocal provides a platform for these conversations to happen in public, across contexts, and with a diversity of voices. GEI's role is to convene, connect, and support, while recognizing that the most valuable insights come from the community itself.

Working with AI, without outsourcing judgment

If you work in communications, evaluation, or evidence use and are wondering what comes next, our experience points to a few starting points.

First, learn to work with AI tools deliberately, and be transparent about how you use them. Different tools serve different purposes, and using them responsibly requires clarity about where they add value and where human judgment must remain central. Disclose clearly when and how AI is used in your work.
Second, invest in the skills that help translate complexity into clarity. As AI makes it easier to generate plausible answers at speed, the value of professional judgment increasingly lies in explaining why particular evidence matters, what its limits are, and how it should (and should not) inform real decisions.
Third, build trust that cannot be automated. As AI enables the rapid production and dissemination of convincing information, institutions face greater risks of misrepresentation, overconfidence, and erosion of credibility. Trust depends on relationships—between evaluators and commissioners, institutions and communities, and evidence producers and users—that AI can support but not replace.
Finally, engage early. AI is already reshaping how information is produced and consumed. Choosing not to engage, or to defer decisions about its use, is itself a choice with consequences for whose evidence is seen, whose voices are heard, and whose judgments shape decisions.

That is why we hope people will participate in Glocal Evaluation Week 2026. At a time when credible evidence is competing with fast, fluent, and often unverified information, spaces like Glocal provide an opportunity for practitioners to reflect together, test ideas, and learn across contexts. Those conversations matter for how evaluation evidence is understood, trusted, and used.

Please join us! Comment on this blog or reach out to us on social media: LinkedIn and X/Twitter. What has been your experience with AI in evaluation and communications? Share your stories with us. You can sign up for our newsletter here. If you would like to contribute your knowledge to this blog, we would be happy to work with you—please contact us at gei@worldbank.org.

The author thanks Douglas Glandon for comments on earlier drafts and Graham Holliday for editing. AI tools were used to support initial drafting and copy-editing; ideas and editorial judgments are the author’s own.