Educating Evaluation Professionals: Addressing the Nexus of Data Science and Evaluation
Jos Vaessen is an adviser at the Independent Evaluation Group, World Bank Group and the Global Evaluation Initiative.
Editing Support provided by Maria Fyodorova, Communications Consultant for GEI.
The Global Evaluation Initiative (GEI) encompasses a global network of institutions and individual experts. As we aim to bring training in evaluation further into the 21st century and effectively meet the evolving demand for evaluation-related knowledge and skills, we are discussing a number of core issues that we consider important in meeting this challenge. In the first part of this blog series, we discussed two topics that are fundamental to supporting this demand: the definition of evaluation and its boundaries, and the core knowledge base and competencies for evaluators.
In this blogpost, I discuss how advances in data science are changing the practice of evaluation and the implications for evaluation training and professional development.
A Silent Revolution
About a decade ago, the United Nations Secretary-General's High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a “data revolution”: “Better data and statistics will help governments track progress and make sure their decisions are evidence based; they can also strengthen accountability. This is not just about governments. International agencies, CSOs and the private sector should be involved. A true data revolution would draw on existing and new sources of data to fully integrate statistics into decision making, promote open access to, and use of, data and ensure increased support for statistical systems.” [UN 2013, p. 24]
In fact, a data revolution has indeed been taking place since then, albeit a silent one: unprecedented growth in data production and availability, advances in computational capacity to manipulate and analyze data, and new developments in data science. “Data science” is an umbrella term for an emerging field at the crossroads of computer science, engineering, mathematics and statistics. It encompasses innovations in data management as well as analytics, ranging from data visualization to more advanced forms of analytics such as machine learning and deep learning algorithms. The increasing availability of new (big) data for evaluative analysis (e.g., remote sensing data, social media data, call data records), as well as the ability to analyze existing data more effectively (e.g., text from thousands of project-related documents treated as big data), has generated new opportunities, but also challenges, for the practice of evaluation [Bamberger and York 2020].
Potential for Data Science Applications in Evaluation
The use of data science techniques can potentially increase the efficiency, quality and breadth of evaluative analysis. For example, with the help of supervised and unsupervised machine learning models, evaluators can identify and classify information from large numbers of documents much more efficiently than before. Moreover, with an adequate taxonomy and a sufficiently large training data set, data can often be coded more consistently, and sometimes more accurately, than through manual coding alone. Data science applications can also help evaluators answer new questions, or answer existing questions in new ways. For example, open access satellite imagery data can be used to measure how particular spatial phenomena change over time.
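To make the idea of supervised document classification concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The document excerpts, labels, and the "nutrition" theme are all hypothetical, invented for illustration; a real application would use a much larger training set and an established library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set: short excerpts from project documents,
# labelled by whether they describe a nutrition-related activity.
TRAIN = [
    ("improve child nutrition through school feeding", "nutrition"),
    ("micronutrient supplements for mothers and infants", "nutrition"),
    ("rehabilitate rural roads and bridges", "other"),
    ("expand electricity access in rural areas", "other"),
]

def train_nb(examples):
    """Count word frequencies per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-posterior score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, n_docs in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_docs / sum(label_counts.values()))
        for w in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train_nb(TRAIN)
print(classify("school feeding and child nutrition", word_counts, label_counts))  # nutrition
```

With enough labelled examples, the same pattern lets an evaluation team flag thematically relevant passages across thousands of documents instead of reading each one.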
To give you a sense of what applying data science techniques looks like in practice, let me share some recent examples from the World Bank's Independent Evaluation Group (IEG). Increasingly, IEG has been using text analytics and machine learning to identify the nature and magnitude of the World Bank Group's interventions (globally) in a particular thematic area of work. This type of approach is particularly effective for large and complex evaluands defined around particular themes. Text analytics and machine learning can also be used in evaluative synthesis in a number of ways, for example by facilitating the identification and classification of relevant evaluative content from large repositories of documents. In a number of instances, to complement existing analyses of the relevance of World Bank interventions, IEG has used geospatial data layering to gauge whether interventions geographically target the areas of highest need (e.g., based on poverty levels). Another area of application is the use of machine learning and deep learning algorithms to classify imagery data into outcome categories. As a result, longitudinal data sets on outcome variables can be generated, which, in combination with spatial counterfactual designs and statistical modelling, can be used in effectiveness analyses. These are just a few examples of a range of applications that can complement (or in a few cases even transform) existing approaches to particular questions or analytical challenges in evaluation.
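As a stylized illustration of the geospatial layering idea, the sketch below checks what share of project sites fall in grid cells with high poverty rates. The 1-degree grid, the poverty rates, and the project coordinates are all invented for illustration; a real analysis would layer actual poverty rasters and project locations using proper geospatial tooling.

```python
# Hypothetical 1-degree poverty "raster": (lat_cell, lon_cell) -> poverty rate.
POVERTY_GRID = {
    (10, 30): 0.62, (10, 31): 0.55,
    (11, 30): 0.18, (11, 31): 0.09,
}

# Hypothetical project site coordinates (latitude, longitude).
PROJECTS = [(10.4, 30.7), (10.9, 31.2), (11.5, 30.3), (10.1, 30.2)]

def cell(lat, lon):
    """Snap a coordinate to its 1-degree grid cell."""
    return (int(lat), int(lon))

def share_in_high_poverty(projects, grid, threshold=0.4):
    """Share of project sites falling in cells at or above the poverty threshold."""
    hits = sum(1 for lat, lon in projects if grid[cell(lat, lon)] >= threshold)
    return hits / len(projects)

print(share_in_high_poverty(PROJECTS, POVERTY_GRID))  # 0.75
```

The resulting share is one simple indicator of whether a portfolio is geographically targeting areas of highest need.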
Challenges for Evaluation
While the data revolution is unlikely to fundamentally transform the field of evaluation in the next few years, in many ways it will change how evaluations are conducted and how data are used. Change will happen not only in evaluation but also in other knowledge and oversight functions. Given the ‘competition for space’ between evaluation and functions such as performance audit or policy research, evaluation cannot stay on the sidelines of the data revolution. The way forward lies in a meaningful and thoughtful integration of new data and new data analytics into evaluative practice. To tap into the potential benefits of using data science applications in evaluation, the following challenges need to be carefully considered.
A first set of challenges is around bias. For example, when working with textual data, one has to be aware of both the potential and the shortcomings of the underlying documents. What evaluative evidence can be found in the documents? What type of evidence is typically missing or misrepresented? Similarly, when using remote sensing data, it is important to understand to what extent these data can serve as proxies for the phenomenon of interest. For example, to understand the association between the quality of dwellings (which in part can be derived from remote sensing imagery) and the incidence of poverty, some type of “ground-truthing” is important. When analyzing social media content using natural language processing, one must reflect, among other things, on whose content is represented on the platform and to what extent this group reflects the reference group of interest. For instance, in areas with high internet penetration, users of a particular social media platform may represent the general population better than in areas with lower internet penetration. A related challenge concerns falling into the trap of pursuing a data-driven evaluation approach. Notwithstanding the wealth of (big) data available for evaluative analysis, good evaluation practice first and foremost requires a questions-driven evaluation design, in which the evaluation questions determine where and how data science applications come in alongside other approaches, which may be more essential than data science for answering a particular question.
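One simple way to begin “ground-truthing” a remote-sensing proxy is to compare it against survey measurements where both exist. The sketch below computes a Pearson correlation between a hypothetical imagery-derived indicator (share of informal-roof dwellings per district) and a hypothetical survey-based poverty rate; all figures are invented purely for illustration.

```python
# Hypothetical district-level data: remote-sensing proxy vs. survey measure.
proxy  = [0.10, 0.25, 0.40, 0.55, 0.70]  # share of informal-roof dwellings (imagery)
survey = [0.12, 0.20, 0.45, 0.50, 0.75]  # poverty rate (household survey)

def pearson(x, y):
    """Pearson correlation coefficient, as a basic check of proxy validity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson(proxy, survey), 2))  # 0.98
```

A high correlation on the ground-truthed subsample lends some confidence in extending the proxy to districts without survey coverage; a weak one is a warning that the proxy may mislead.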
A second set of challenges concerns the institutional context in which the work is conducted. Experience at IEG and other places shows that effective use of data science applications requires close collaboration between evaluator and data scientist. This necessitates a minimum level of data science literacy on the side of the evaluator, as well as a good understanding of the institutional and evaluation context on the side of the data scientist.
Finally, there are also ethical challenges to consider. When scraping the internet for relevant content, and more specifically when analyzing social media content (e.g., using topic modelling or sentiment analysis), it is important to understand that what is actually being captured are the voices of those who have access to the internet and are “vocal” on it. This leaves out many potentially relevant groups of citizens who are less vocal or who do not use social media. Apart from constituting a source of bias from an analytical point of view, this can also be problematic from an ethical standpoint. Another issue concerns the use of personal data (e.g., medical records, phone records, video surveillance data) without proper consent or anonymization. In short, there are important data privacy issues and broader data governance issues to be considered.
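As a small illustration of protecting personal data before analysis, the sketch below replaces a direct identifier with a salted one-way hash (pseudonymization), so records can still be linked across data sets without exposing the raw identifier. The salt, record fields, and phone number are hypothetical; note that pseudonymization alone is often not sufficient for full anonymity, since individuals may still be re-identifiable from the remaining attributes.

```python
import hashlib

# Hypothetical salt; in practice it should be kept secret and never
# published alongside the pseudonymized data.
SALT = b"evaluation-project-salt"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier (e.g., a phone number) with a stable
    one-way pseudonym, allowing record linkage without exposing identity."""
    return hashlib.sha256(SALT + identifier.encode()).hexdigest()[:12]

# Hypothetical call-data record.
records = [{"phone": "+1555000111", "calls": 42}]
anon = [{"id": pseudonymize(r["phone"]), "calls": r["calls"]} for r in records]
print(anon[0]["id"] != records[0]["phone"])  # True: raw identifier removed
```

This is only one ingredient of responsible data governance; consent, access controls, and retention policies matter just as much.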
Implications For Evaluation Training and Professional Development
There are several implications of the data revolution for evaluation training and professional development. First of all, university graduates who are currently entering the field of evaluation may already have some basic training in data science applications, including programming skills. Just as multivariate statistical analysis became an integral part of graduate education programs years ago, data science is gradually becoming an essential part of university curricula. Evaluation-specific graduate programs, in particular, may wish to consider integrating introductory and more advanced courses on data science applications into their curricula. Introductory and applied data science courses should also become available as part of the professional training offerings accessible to more experienced evaluators.
As data science finds its way into evaluative practice, we need more tested use cases that can also serve educational purposes. We also need to optimize opportunities for on-the-job learning. Using data science applications in evaluations requires a combination of technical and substantive expertise. This requires not only the right type of collaborative approach but also an institutional context that is conducive to experimentation. One helpful element would be collaborations between academic institutions and organizational evaluation functions around methodological innovation, and particularly the use of data science applications in evaluation.
It is becoming clear that the data revolution is creating new opportunities for evaluative analysis. Evaluators and other evaluation stakeholders should be prepared to take advantage of this potential while being cognizant of its limitations. One of the foundations of evaluative inquiry is the triangulation and synthesis of evidence from different sources to underpin a value judgment. New data and new data analytics will become a more important ingredient in this mix. To respond to the increasing demand for data science capacity in evaluation, further changes in training curricula, as well as appropriate professional development activities, are needed. In the end, for evaluators to successfully embrace data science, a bit of critical self-reflection may be helpful: something we evaluators should be good at.
Notes

Bamberger and York 2020: https://ieg.worldbankgroup.org/event/datascience-and-evaluation.
An example is land use classification categories constructed from satellite imagery data as proxies for outcomes of interventions.
An excellent example can be found here: https://ieg.worldbankgroup.org/evaluations/world-bank-support-reducing-child-undernutrition.
“Ground-truthing” refers to additional data collection “on the ground” to help validate and interpret remote sensing data.
For more discussion, see: https://merltech.org/advancing-data-governance-in-africa.
Please join us! Comment on this blog or reach out to us on social media: LinkedIn and Twitter. What has been your experience with the impact of the data revolution on your evaluation work? Have you seen an increased focus on data use in classes and professional training programs for evaluators? Share your stories with us.
You can sign up for our newsletter here.
If you would like to contribute your knowledge to this blog, we would be happy to work with you - please contact us at firstname.lastname@example.org