How do expertise-oriented an consumer-oriented evaluation approaches differ?

Chapter 5 addresses first approaches, expertise and consumer-oriented. Please read the questions below from pages 150 – 151.

The initial posting should be substantive and include terminology and show a grasp of the material as well as your own ideas around the issues.

1) How do expertise-oriented an consumer-oriented evaluation approaches differ? How are they alike? What are your thoughts about the value of these approaches?

2) How should one determine the criteria for evaluating a product? Should the focus be solely on outcomes? What should be the balance among the quality of inputs (staff, facilities, budget), process (the conduct of the program), and outputs or outcomes?

The e-book has been uploaded. If you have questions about the answers, please feel free to read the book.

Please write in times new roman font 12 double spaced.

Please have the answer for no more than one page for each question.

Program Evaluation

Alternative Approaches and Practical Guidelines

FOURTH EDITION

Jody L. Fitzpatrick University of Colorado Denver

James R. Sanders Western Michigan University

Blaine R. Worthen Utah State University

Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto

Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President and Editor in Chief: Jeffery W. Johnston Senior Acquisitions Editor: Meredith D. Fossel Editorial Assistant: Nancy Holstein Vice President, Director of Marketing: Margaret Waples Senior Marketing Manager: Christopher D. Barry Senior Managing Editor: Pamela D. Bennett Senior Project Manager: Linda Hillis Bayma Senior Operations Supervisor: Matthew Ottenweller Senior Art Director: Diane Lorenzo Cover Designer: Jeff Vanik Cover Image: istock Full-Service Project Management: Ashley Schneider, S4Carlisle Publishing Services Composition: S4Carlisle Publishing Services Printer/Binder: Courier/Westford Cover Printer: Lehigh-Phoenix Color/Hagerstown Text Font: Meridien

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on appropriate page within text.

Every effort has been made to provide accurate and current Internet information in this book. However, the Internet and information posted on it are constantly changing, so it is inevitable that some of the Internet addresses listed in this textbook will change.

Copyright © 2011, 2004, 1997 Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 501 Boylston Street, Suite 900, Boston, MA 02116, fax: (617) 671-2290, email: permissionsus@pearson.com.

Library of Congress Cataloging-in-Publication Data

Fitzpatrick, Jody L. Program evaluation: alternative approaches and practical guidelines / Jody L. Fitzpatrick, James R.

Sanders, Blaine R. Worthen. p. cm.

ISBN 978-0-205-57935-8 1. Educational evaluation—United States. 2. Evaluation research (Social action programs)—

United States. 3. Evaluation—Study and teaching—United States. I. Sanders, James R. II. Worthen, Blaine R. III. Worthen, Blaine R. Program evaluation. IV. Title.

LB2822.75.W67 2011 379.1’54—dc22

2010025390 10 9 8 7 6 5 4 3 2

ISBN 10: 0-205-57935-3 ISBN 13: 978-0-205-57935-8

mailto:permissionsus@pearson.com

Jody Fitzpatrick has been a faculty member in public administration at the Uni- versity of Colorado Denver since 1985. She teaches courses in research methods and evaluation, conducts evaluations in many schools and human service settings, and writes extensively about the successful practice of evaluation. She has served on the Board of the American Evaluation Association and on the editorial boards of the American Journal of Evaluation and New Directions for Evaluation. She has also served as Chair of the Teaching of Evaluation Topical Interest Group at the American Evaluation Association and has won a university-wide teaching award at her university. In one of her recent publications, Evaluation in Action: Interviews with Expert Evaluators, she uses interviews with expert evaluators on one evaluation to talk about the decisions that evaluators face as they plan and conduct evaluations and the factors that influence their choices. She is currently evaluating the chang- ing roles of counselors in middle schools and high schools and a program to help immigrant middle-school girls to achieve and stay in school. Her international work includes research on evaluation in Spain and Europe and, recently, she has spoken on evaluation issues to policymakers and evaluators in France, Spain, Denmark, Mexico, and Chile.

James Sanders is Professor Emeritus of Educational Studies and the Evaluation Center at Western Michigan University where he has taught, published, consulted, and conducted evaluations since 1975. A graduate of Bucknell University and the University of Colorado, he has served on the Board and as President of the American Evaluation Association (AEA) and has served as Chair of the Steering Committee that created the Evaluation Network, a predecessor to AEA. His publications in- clude books on school, student, and program evaluation. He has worked exten- sively with schools, foundations, and government and nonprofit agencies to develop their evaluation practices. As Chair of the Joint Committee on Standards for Educational Evaluation, he led the development of the second edition of The Program Evaluation Standards. He was also involved in developing the concepts of applied performance testing for student assessments, cluster evaluation for program evaluations by foundations and government agencies, and mainstreaming evaluation for organizational development. His international work in evaluation has been concentrated in Canada, Europe, and Latin America. He received distinguished ser- vice awards from Western Michigan University, where he helped to establish a PhD program in evaluation, and from the Michigan Association for Evaluation.

About the Authors

iii

Blaine Worthen is Psychology Professor Emeritus at Utah State University, where he founded and directed the Evaluation Methodology PhD program and the West- ern Institute for Research and Evaluation, conducting more than 350 evaluations for local and national clients in the United States and Canada. He received his PhD from The Ohio State University. He is a former editor of Evaluation Practice and founding editor of the American Journal of Evaluation. He served on the American Evaluation Association Board of Directors and received AEA’s Myrdal Award for Outstanding Evaluation Practitioner and AERA’s Best Evaluation Study Award. He has taught university evaluation courses (1969–1999), managed federally man- dated evaluations in 17 states (1973–1978), advised numerous government and private agencies, and given more than 150 keynote addresses and evaluation workshops in the United States, England, Australia, Israel, Greece, Ecuador, and other countries. He has written extensively in evaluation, measurement, and as- sessment and is the author of 135 articles and six books. His Phi Delta Kappan arti- cle, “Critical Issues That Will Determine the Future of Alternative Assessment,” was distributed to 500 distinguished invitees at the White House’s Goals 2000 Conference. He is recognized as a national and international leader in the field.

iv About the Authors

The twenty-first century is an exciting time for evaluation. The field is growing. People—schools, organizations, policymakers, the public at large—are interested in learning more about how programs work: how they succeed and how they fail. Given the tumult experienced in the first decade of this century, many peo- ple are interested in accountability from corporations, government, schools, and nonprofit organizations. The fourth edition of our best-selling textbook is designed to help readers consider how evaluation can achieve these purposes. As in previ- ous editions, our book is one of the few to introduce readers to both the different approaches to evaluation and practical methods for conducting it.

New to This Edition

The fourth edition includes many changes:

• A new chapter on the role of politics in evaluation and ethical considerations. • A new and reorganized Part Two that presents and discusses the most current

approaches and theories of evaluation. • An increased focus on mixed methods in design, data collection, and analysis. • Links to interviews with evaluators who conducted an evaluation that illus-

trates the concepts reviewed in that chapter, as they discuss the choices and challenges they faced.

• A discussion of how today’s focus on performance measurement, outcomes, impacts, and standards have influenced evaluation.

• New sections on organizational learning, evaluation capacity building, mainstreaming evaluation, and cultural competence––trends in evaluation and organizations.

Evaluation, today, is changing in a variety of ways. Policymakers, managers, citizens, and consumers want better tracking of activities and outcomes. More importantly, many want a better understanding of social problems and the programs and policies being undertaken to reduce these problems. Evaluation in many forms, including performance measurement and outcome or impact assessments, is ex- panding around the globe. People who work in organizations are also interested in evaluation as a way to enhance organizational learning. They want to know how well they’re doing, how to tackle the tough problems their organizations address, and how to improve their performance and better serve their clients and their

Preface

community. Many different methods are being developed and used: mixed meth- ods for design and data collection, increased involvement of new and different stakeholders in the evaluation process, expanded consideration of the potential uses and impacts of evaluation, and more effective and diverse ways to communicate findings. As evaluation expands around the world, the experiences of adapting eval- uation to different settings and different cultures are enriching the field.

In this new edition, we hope to convey to you the dynamism and creativity involved in conducting evaluation. Each of us has many years of experience in conducting evaluations in a variety of settings, including schools, public welfare agencies, mental health organizations, environmental programs, nonprofit organ- izations, and corporations. We also have years of experience teaching students how to use evaluation in their own organizations or communities. Our goal is, and always has been, to present information that readers can use either to conduct or to be a participant in evaluations that make a difference to their workplace, their clients, and their community. Let us tell you a bit more about how we hope to do that in this new edition.

Organization of This Text

The book is organized in four parts. Part One introduces the reader to key concepts in evaluation; its history and current trends; and ethical, political, and interper- sonal factors that permeate and transcend all phases of evaluation. Evaluation dif- fers from research in that it is occurring in the real world with the goal of being used by non-researchers to improve decisions, governance, and society. As a result, evaluators develop relationships with their users and stakeholders and work in a political environment in which evaluation results compete with other demands on decision makers. Evaluators must know how to work in such envi- ronments to get their results used. In addition, ethical challenges often present themselves. We find the ways in which evaluation differs from research to be both challenging and interesting. It is why we chose evaluation as our life’s work. In Part One, we introduce you to these differences and to the ways evaluators work in this public, political context.

In Part Two, we present several different approaches, often called models or theories, to evaluation. (Determining whether objectives or outcomes have been achieved isn’t the only way to approach evaluation!) Approaches influence how evaluators determine what to study and how they involve others in what they study. We have expanded our discussions of theory-based, decision-oriented, and participatory approaches. In doing so, we describe new ways in which evaluators use logic models and program theories to understand the workings of a program. Participatory and transformative approaches to empowering stakeholders and creating different ways of learning are described and contrasted. Evaluators must know methodology, but they also must know about different approaches to eval- uation to consciously and intelligently choose the approach or mix of approaches that is most appropriate for the program, clients, and stakeholders and context of their evaluation.

vi Preface

In Parts Three and Four, the core of the book, we describe how to plan and carry out an evaluation study. Part Three is concerned with the planning stage: learning about the program, conversing with stakeholders to learn purposes and consider future uses of the study, and identifying and finalizing evaluation questions to guide the study. Part Three teaches the reader how to develop an eval- uation plan and a management plan, including timelines and budgets for conduct- ing the study. In Part Four, we discuss the methodological choices and decisions evaluators make: selecting and developing designs; sampling, data collection, and analysis strategies; interpreting results; and communicating results to others. The chapters in each of these sections are sequential, representing the order in which decisions are made or actions are taken in the evaluation study. We make use of extensive graphics, lists, and examples to illustrate practice to the reader.

This Revision

Each chapter has been revised by considering the most current books, articles, and reports. Many new references and contemporary examples have been added. Thus, readers are introduced to current controversies about randomized control groups and appropriate designs for outcome evaluations, current discussions of political influences on evaluation policies and practices, research on participative approaches, discussions of cultural competency and capacity building in organiza- tions, and new models of evaluation use and views on interpreting and dissemi- nating results.

We are unabashedly eclectic in our approach to evaluation. We use many different approaches and methods––whatever is appropriate for the setting––and encourage you to do the same. We don’t advocate one approach, but instruct you in many. You will learn about different approaches or theories in Part Two and different methods of collecting data in Parts Three and Four.

To facilitate learning, we have continued with much the same pedagogical structure that we have used in past editions. Each chapter presents information on current and foundational issues in a practical, accessible manner. Tables and figures are used frequently to summarize or illustrate key points. Each chapter begins with Orienting Questions to introduce the reader to some of the issues that will be covered in the chapter and concludes with a list of the Major Concepts and Theories reviewed in the chapter, Discussion Questions, Application Exercises, and a list of Suggested Readings on the topics discussed.

Rather than using the case study method from previous editions, we thought it was time to introduce readers to some real evaluations. Fortunately, while Blaine Worthen was editor of American Journal of Evaluation, Jody Fitzpatrick wrote a column in which she interviewed evaluators about a single evaluation they had conducted. These interviews are now widely used in teaching about evaluation. We have incorporated them into this new edition by recommending the ones that illustrate the themes introduced in each chapter. Readers and instructors can choose either to purchase the book, Evaluation in Action (Fitzpatrick, Christie, & Mark, 2009), as a case companion to this text or to access many of the interviews

Preface vii

through their original publication in the American Journal of Evaluation. At the end of each chapter, we describe one to three relevant interviews, citing the chapter in the book and the original source in the journal.

We hope this book will inspire you to think in a new way about issues—in a questioning, exploring, evaluative way—and about programs, policy, and organi- zational change. For those readers who are already evaluators, this book will pro- vide you with new perspectives and tools for your practice. For those who are new to evaluation, this book will make you a more informed consumer of or participant in evaluation studies or, perhaps, guide you to undertake your own evaluation.

Acknowledgments

We would like to thank our colleagues in evaluation for continuing to make this such an exciting and dynamic field! Our work in each revision of our text has reminded us of the progress being made in evaluation and the wonderful insights of our colleagues about evaluation theory and practice. We would also like to thank Sophia Le, our research assistant, who has worked tirelessly, creatively, and diligently to bring this manuscript to fruition. We all are grateful to our families for the interest and pride they have shown in our work and the patience and love they have demonstrated as we have taken the time to devote to it.

viii Preface

Contents

PART ONE • Introduction to Evaluation 1

1 Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 3

Informal versus Formal Evaluation 5

A Brief Definition of Evaluation and Other Key Terms 6

Differences in Evaluation and Research 9

The Purposes of Evaluation 13

Roles and Activities of Professional Evaluators 16

Uses and Objects of Evaluation 18

Some Basic Types of Evaluation 20

Evaluation’s Importance—and Its Limitations 32

2 Origins and Current Trends in Modern Program Evaluation 38

The History and Influence of Evaluation in Society 38

1990–The Present: History and Current Trends 49

3 Political, Interpersonal, and Ethical Issues in Evaluation 64

Evaluation and Its Political Context 65

Maintaining Ethical Standards: Considerations, Issues, and Responsibilities for Evaluators 78

PART TWO • Alternative Approaches to Program Evaluation 109

4 Alternative Views of Evaluation 111

Diverse Conceptions of Program Evaluation 113

Origins of Alternative Views of Evaluation 114

Classifications of Evaluation Theories or Approaches 120

5 First Approaches: Expertise and Consumer-Oriented Approaches 126

The Expertise-Oriented Approach 127

The Consumer-Oriented Evaluation Approach 143

6 Program-Oriented Evaluation Approaches 153

The Objectives-Oriented Evaluation Approach 154

Logic Models and Theory-Based Evaluation Approaches 159

How Program-Oriented Evaluation Approaches Have Been Used 164

Strengths and Limitations of Program-Oriented Evaluation Approaches 166

Goal-Free Evaluation 168

7 Decision-Oriented Evaluation Approaches 172

Developers of Decision-Oriented Evaluation Approaches and Their Contributions 173

The Decision-Oriented Approaches 173

How the Decision-Oriented Evaluation Approaches Have Been Used 184

Strengths and Limitations of Decision-Oriented Evaluation Approaches 184

x Contents

8 Participant-Oriented Evaluation Approaches 189

Evolution of Participatory Approaches 190

Developers of Participant-Oriented Evaluation Approaches and Their Contributions 191

Participatory Evaluation Today: Two Streams and Many Approaches 199

Some Specific Contemporary Approaches 205

How Participant-Oriented Evaluation Approaches Have Been Used 220

Strengths and Limitations of Participant-Oriented Evaluation Approaches 223

9 Other Current Considerations: Cultural Competence and Capacity Building 231

The Role of Culture and Context in Evaluation Practice and Developing Cultural Competence 232

Evaluation’s Roles in Organizations: Evaluation Capacity Building and Mainstreaming Evaluation 235

10 A Comparative Analysis of Approaches 243

A Summary and Comparative Analysis of Evaluation Approaches 243

Cautions About the Alternative Evaluation Approaches 244

Contributions of the Alternative Evaluation Approaches 248

Comparative Analysis of Characteristics of Alternative Evaluation Approaches 249

Eclectic Uses of the Alternative Evaluation Approaches 251

PART THREE • Practical Guidelines for Planning Evaluations 257

11 Clarifying the Evaluation Request and Responsibilities 259

Understanding the Reasons for Initiating the Evaluation 260

Conditions Under Which Evaluation Studies Are Inappropriate 265

Contents xi

Determining When an Evaluation Is Appropriate: Evaluability Assessment 268

Using an Internal or External Evaluator 271

Hiring an Evaluator 277

How Different Evaluation Approaches Clarify the Evaluation Request and Responsibilities 281

12 Setting Boundaries and Analyzing the Evaluation Context 286

Identifying Stakeholders and Intended Audiences for an Evaluation 287

Describing What Is to Be Evaluated: Setting the Boundaries 290

Analyzing the Resources and Capabilities That Can Be Committed to the Evaluation 304

Analyzing the Political Context for the Evaluation 307

Variations Caused by the Evaluation Approach Used 309

Determining Whether to Proceed with the Evaluation 310

13 Identifying and Selecting the Evaluation Questions and Criteria 314

Identifying Useful Sources for Evaluation Questions: The Divergent Phase 315

Selecting the Questions, Criteria, and Issues to Be Addressed: The Convergent Phase 328

Specifying the Evaluation Criteria and Standards 332

Remaining Flexible during the Evaluation: Allowing New Questions, Criteria, and Standards to Emerge 336

14 Planning How to Conduct the Evaluation 340

Developing the Evaluation Plan 342

Specifying How the Evaluation Will Be Conducted: The Management Plan 358

xii Contents

Establishing Evaluation Agreements and Contracts 367

Planning and Conducting the Metaevaluation 368

PART FOUR • Practical Guidelines for Conducting and Using Evaluations 379

15 Collecting Evaluative Information: Design, Sampling, and Cost Choices 381

Using Mixed Methods 383

Designs for Collecting Descriptive and Causal Information 387

Sampling 407

Cost Analysis 411

16 Collecting Evaluative Information: Data Sources and Methods, Analysis, and Interpretation 418

Common Sources and Methods for Collecting Information 419

Planning and Organizing the Collection of Information 443

Analysis of Data and Interpretation of Findings 444

17 Reporting Evaluation Results: Maximizing Use and Understanding 453

Purposes of Evaluation Reporting and Reports 454

Different Ways of Reporting 455

Important Factors in Planning Evaluation Reporting 456

Key Components of a Written Report 469

Suggestions for Effective Oral Reporting 476

A Checklist for Good Evaluation Reports 479

How Evaluation Information Is Used 479

Contents xiii

18 The Future of Evaluation 490

The Future of Evaluation 490

Predictions Concerning the Profession of Evaluation 491

Predictions Concerning the Practice of Evaluation 493

A Vision for Evaluation 496

Conclusion 497

Appendix A The Program Evaluation Standards and Guiding Principles for Evaluators 499

References 505

Author Index 526

Subject Index 530

xiv Contents

Introduction to Evaluation

Part

This initial section of our text provides the background necessary for the begin- ning student to understand the chapters that follow. In it, we attempt to accom- plish three things: to explore the concept of evaluation and its various meanings, to review the history of program evaluation and its development as a discipline, and to introduce the reader to some of the factors that influence the practice of evaluation. We also acquaint the reader with some of the current controversies and trends in the field.

In Chapter 1, we discuss the basic purposes of evaluation and the varying roles evaluators play. We define evaluation specifically, and we introduce the reader to several different concepts and distinctions that are important to evalua- tion. In Chapter 2, we summarize the origins of today’s evaluation tenets and prac- tices and the historical evolution of evaluation as a growing force in improving our society’s public, nonprofit, and corporate programs. In Chapter 3, we discuss the political, ethical, and interpersonal factors that underlie any evaluation and em- phasize its distinction from research.

Our intent in Part One is to provide the reader with information essential to understanding not only the content of the sections that follow but also the wealth of material that exists in the literature on program evaluation. Although the con- tent in the remainder of this book is intended to apply primarily to the evaluation of programs, most of it also applies to the evaluation of policies, products, and processes used in those areas and, indeed, to any object of an evaluation. In Part Two we will introduce you to different approaches to evaluation to enlarge your understanding of the diversity of choices that evaluators and stakeholders make in undertaking evaluation.

This page intentionally left blank

Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions

Orienting Questions

1. What is evaluation? Why is it important?

2. What is the difference between formal and informal evaluation?

3. What are some purposes of evaluation? What roles can the evaluator play?

4. What are the major differences between formative and summative evaluations?

5. What questions might an evaluator address in a needs assessment, a process evaluation, and an outcome evaluation?

6. What are the advantages and disadvantages of an internal evaluator? An external evaluator?

The challenges confronting our society in the twenty-first century are enormous. Few of them are really new. In the United States and many other countries, the public and nonprofit sectors are grappling with complex issues: educating children for the new century; reducing functional illiteracy; strengthening families; train- ing people to enter or return to the workforce; training employees who currently work in an organization; combating disease and mental illness; fighting discrimi- nation; and reducing crime, drug abuse, and child and spouse abuse. More recently, pursuing and balancing environmental and economic goals and working to ensure peace and economic growth in developing countries have become prominent concerns. As this book is written, the United States and many countries around

4 Part I • Introduction to Evaluation

the world are facing challenging economic problems that touch every aspect of so- ciety. The policies and programs created to address these problems will require evaluation to determine which solutions to pursue and which programs and poli- cies are working and which are not. Each new decade seems to add to the list of challenges, as society and the problems it confronts become increasingly complex.

As society’s concern over these pervasive and perplexing problems has intensified, so have its efforts to resolve them. Collectively, local, regional, national, and international agencies have initiated many programs aimed at eliminating these problems or their underlying causes. In some cases, specific programs judged to have been ineffective have been “mothballed” or sunk outright, often to be replaced by a new program designed to attack the problem in a different—and, hopefully, more effective—manner.

In more recent years, scarce resources and budget deficits have posed still more challenges as administrators and program managers have had to struggle to keep their most promising programs afloat. Increasingly, policymakers and man- agers have been faced with tough choices, being forced to cancel some programs or program components to provide sufficient funds to start new programs, to con- tinue others, or simply to keep within current budgetary limits.

To make such choices intelligently, policy makers need good information about the relative effectiveness of programs. Which programs are working well? Which are failing? What are the programs’ relative costs and benefits? Similarly, each program manager needs to know how well different parts of programs are working. What can be done to improve those parts of the program that are not working as well as they should? Have all aspects of the program been thought through carefully at the planning stage, or is more planning needed? What is the theory or logic model for the program’s effectiveness? What adaptations would make the program more effective?

Answering such questions is the major task of program evaluation. The ma- jor task of this book is to introduce you to evaluation and the vital role it plays in virtually every sector of modern society. However, before we can hope to convince you that good evaluation is an essential part of good programs, we must help you understand at least the basic concepts in each of the following areas:

• How we—and others—define evaluation • How formal and informal evaluation differ • The basic purposes—and various uses—of formal evaluation • The distinction between basic types of evaluation • The distinction between internal and external evaluators • Evaluation’s importance and its limitations

Covering all of those areas thoroughly could fill a whole book, not just one chapter of an introductory text. In this chapter, we provide only brief coverage of each of these topics to orient you to concepts and distinctions necessary to under- stand the content of later chapters.

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 5

Informal versus Formal Evaluation

Evaluation is not a new concept. In fact, people have been evaluating, or examin- ing and judging things, since the beginning of human history. Neanderthals prac- ticed it when determining which types of saplings made the best spears, as did Persian patriarchs in selecting the most suitable suitors for their daughters, and English yeomen who abandoned their own crossbows in favor of the Welsh longbow. They had observed that the longbow could send an arrow through the stoutest armor and was capable of launching three arrows while the crossbow sent only one. Al- though no formal evaluation reports on bow comparisons have been unearthed in English archives, it is clear that the English evaluated the longbow’s value for their purposes, deciding that its use would strengthen them in their struggles with the French. So the English armies relinquished their crossbows, perfected and improved on the Welsh longbow, and proved invincible during most of the Hundred Years’ War.

By contrast, French archers experimented briefly with the longbow, then went back to the crossbow—and continued to lose battles. Such are the perils of poor evaluation! Unfortunately, the faulty judgment that led the French to persist in us- ing an inferior weapon represents an informal evaluation pattern that has been re- peated too often throughout history.

As human beings, we evaluate every day. Practitioners, managers, and policymakers make judgments about students, clients, personnel, programs, and policies. These judgments lead to choices and decisions. They are a natural part of life. A school principal observes a teacher working in the classroom and forms some judgments about that teacher’s effectiveness. A program officer of a founda- tion visits a substance abuse program and forms a judgment about the program’s quality and effectiveness. A policymaker hears a speech about a new method for de- livering health care to uninsured children and draws some conclusions about whether that method would work in his state. Such judgments are made every day in our work. These judgments, however, are based on informal, or unsystematic, evaluations.

Informal evaluations can result in faulty or wise judgments. But, they are characterized by an absence of breadth and depth because they lack systematic procedures and formally collected evidence. As humans, we are limited in making judgments both by the lack of opportunity to observe many different settings, clients, or students and by our own past experience, which both informs and bi- ases our judgments. Informal evaluation does not occur in a vacuum. Experience, instinct, generalization, and reasoning can all influence the outcome of informal evaluations, and any or all of these may be the basis for sound, or faulty, judg- ments. Did we see the teacher on a good day or a bad one? How did our past ex- perience with similar students, course content, and methods influence our judgment? When we conduct informal evaluations, we are less cognizant of these limitations. However, when formal evaluations are not possible, informal evalua- tion carried out by knowledgeable, experienced, and fair people can be very use- ful indeed. It would be unrealistic to think any individual, group, or organization could formally evaluate everything it does. Often informal evaluation is the only

6 Part I • Introduction to Evaluation

practical approach. (In choosing an entrée from a dinner menu, only the most compulsive individual would conduct exit interviews with restaurant patrons to gather data to guide that choice.)

Informal and formal evaluation, however, form a continuum. Schwandt (2001a) acknowledges the importance and value of everyday judgments and argues that evaluation is not simply about methods and rules. He sees the evaluator as helping practitioners to “cultivate critical intelligence.” Evaluation, he notes, forms a middle ground “between overreliance on and over-application of method, general principles, and rules to making sense of ordinary life on one hand, and advocating trust in personal inspiration and sheer intuition on the other” (p. 86). Mark, Henry, and Julnes (2000) echo this concept when they describe evaluation as a form of assisted sense-making. Evaluation, they observe, “has been developed to assist and extend natural human abilities to observe, understand, and make judgments about policies, programs, and other objects in evaluation” (p. 179).

Evaluation, then, is a basic form of human behavior. Sometimes it is thorough, structured, and formal. More often it is impressionistic and private. Our focus is on the more formal, structured, and public evaluation. We want to inform readers of various approaches and methods for developing criteria and collecting information about alternatives. For those readers who aspire to become professional evaluators, we will be introducing you to the approaches and methods used in these formal studies. For all readers, practitioners and evaluators, we hope to cultivate that critical intelligence, to make you cognizant of the factors influencing your more informal judgments and decisions.

A Brief Definition of Evaluation and Other Key Terms

In the previous section, the perceptive reader will have noticed that the term “evaluation” has been used rather broadly without definition beyond what was implicit in context. But the rest of this chapter could be rather confusing if we did not stop briefly to define the term more precisely. Intuitively, it may not seem dif- ficult to define evaluation. For example, one typical dictionary definition of eval- uation is “to determine or fix the value of: to examine and judge.” Seems quite straightforward, doesn’t it? Yet among professional evaluators, there is no uni- formly agreed-upon definition of precisely what the term “evaluation” means. In fact, in considering the role of language in evaluation, Michael Scriven, one of the founders of evaluation, for an essay on the use of language in evaluation recently noted there are nearly 60 different terms for evaluation that apply to one context or another. These include adjudge, appraise, analyze, assess, critique, examine, grade, inspect, judge, rate, rank, review, score, study, test, and so on (cited in Patton, 2000, p. 7). While all these terms may appear confusing, Scriven notes that the variety of uses of the term evaluation “reflects not only the immense im- portance of the process of evaluation in practical life, but the explosion of a new area of study” (cited in Patton, 2000, p. 7). This chapter will introduce the reader

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 7

to the array of variations in application, but, at this point, we will focus on one definition that encompasses many others.

Early in the development of the field, Scriven (1967) defined evaluation as judging the worth or merit of something. Many recent definitions encompass this original definition of the term (Mark, Henry, & Julnes, 2000; Schwandt, 2008; Scriven, 1991a; Stake, 2000a; Stufflebeam, 2001b). We concur that evaluation is de- termining the worth or merit of an evaluation object (whatever is evaluated). More broadly, we define evaluation as the identification, clarification, and application of defensible criteria to determine an evaluation object’s value (worth or merit) in rela- tion to those criteria. Note that this definition requires identifying and clarifying de- fensible criteria. Often, in practice, our judgments of evaluation objects differ because we have failed to identify and clarify the means that we, as individuals, use to judge an object. One educator may value a reading curriculum because of the love it instills for reading; another may disparage the program because it does not move the child along as rapidly as other curricula in helping the student to recognize and interpret letters, words, or meaning. These educators differ in the value they assign to the cur- ricula because their criteria differ. One important role of an evaluator is to help stake- holders articulate their criteria and to stimulate dialogue about them. Our definition, then, emphasizes using those criteria to judge the merit or worth of the product.

Evaluation uses inquiry and judgment methods, including: (1) determining the criteria and standards for judging quality and deciding whether those stan- dards should be relative or absolute, (2) collecting relevant information, and (3) applying the standards to determine value, quality, utility, effectiveness, or sig- nificance. It leads to recommendations intended to optimize the evaluation object in relation to its intended purpose(s) or to help stakeholders determine whether the evaluation object is worthy of adoption, continuation, or expansion.

Programs, Policies, and Products

In the United States, we often use the term “program evaluation.” In Europe and some other countries, however, evaluators often use the term “policy evaluation.” This book is concerned with the evaluation of programs, policies, and products. We are not, however, concerned with evaluating personnel or the performance of indi- vidual people or employees. That is a different area, one more concerned with man- agement and personnel.1 (See Joint Committee. [1988]) But, at this point, it would be useful to briefly discuss what we mean by programs, policies, and products. “Program” is a term that can be defined in many ways. In its simplest sense, a pro- gram is a “standing arrangement that provides for a . . . service” (Cronbach et al., 1980, p. 14). The Joint Committee on Standards for Educational Evaluation (1994) defined program simply as “activities that are provided on a continuing basis” (p. 3). In their

1The Joint Committee on Standards for Educational Evaluation has developed some standards for personnel evaluation that may be of interest to readers involved in evaluating the performance of teach- ers or other employees working in educational settings. These can be found at http://www.eval.org/ evaluationdocuments/perseval.html.

http://www.eval.org/evaluationdocuments/perseval.html

8 Part I • Introduction to Evaluation

new edition of the Standards (2010) the Joint Committee noted that a program is much more than a set of activities. They write:

Defined completely, a program is

• A set of planned systematic activities • Using managed resources • To achieve specified goals • Related to specific needs • Of specific, identified, participating human individuals or groups • In specific contexts • Resulting in documentable outputs, outcomes and impacts • Following assumed (explicit or implicit) systems of beliefs (diagnostic, causal, in-

tervention, and implementation theories about how the program works) • With specific, investigable costs and benefits. (Joint Committee, 2010, in press)

Note that their newer definition emphasizes programs achieving goals related to particular needs and the fact that programs are based on certain theories or as- sumptions. We will talk more about this later when we discuss program theory. We will simply summarize by saying that a program is an ongoing, planned intervention that seeks to achieve some particular outcome(s), in response to some perceived ed- ucational, social, or commercial problem. It typically includes a complex of people, organization, management, and resources to deliver the intervention or services.

In contrast, the word “policy” generally refers to a broader act of a public orga- nization or a branch of government. Organizations have policies—policies about re- cruiting and hiring employees, policies about compensation, policies concerning interactions with media and the clients or customers served by the organization. But, government bodies—legislatures, departments, executives, and others—also pass or develop policies. It might be a law or a regulation. Evaluators often conduct studies to judge the effectiveness of those policies just as they conduct studies to evaluate pro- grams. Sometimes, the line between a program and a policy is quite blurred. Like a program, a policy is designed to achieve some outcome or change, but, unlike a pro- gram, a policy does not provide a service or activity. Instead, it provides guidelines, regulations, or the like to achieve a change. Those who study public policy define policy even more broadly: “public policy is the sum of government activities, whether acting directly or through agents, as it has an influence on the life of citizens” (Peters, 1999, p. 4). Policy analysts study the effectiveness of public policies just as evaluators study the effectiveness of government programs. Sometimes, their work overlaps. What one person calls a policy, another might call a program. In practice, in the United States, policy analysts tend to be trained in political science and economics, and evaluators tend to be trained in psychology, sociology, education, and public administration. As the field of evaluation expands and clients want more information on government programs, evaluators study the effectiveness of programs and policies.

Finally, a “product” is a more concrete entity than either a policy or a pro- gram. It may be a textbook such as the one you are reading. It may be a piece of software. Scriven defines a product very broadly to refer to the output of some- thing. Thus, a product could be a student or a person who received training, the

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 9

work of a student, or a curricula which is “the product of a research and development effort” (1991a, p. 280).

Stakeholders

Another term used frequently in evaluation is “stakeholders.” Stakeholders are various individuals and groups who have a direct interest in and may be affected by the program being evaluated or the evaluation’s results. In the Encyclopedia of Evaluation, Greene (2005) identifies four types of stakeholders:

(a) People who have authority over the program including funders, policy makers, advisory boards;

(b) People who have direct responsibility for the program including program devel- opers, administrators, managers, and staff delivering the program;

(d) People who are damaged or disadvantaged by the program (those who lose fund- ing or are not served because of the program). (pp. 397–398)

Scriven (2007) has grouped stakeholders into groups based on how they are impacted by the program, and he includes more groups, often political groups, than does Greene. Thus, “upstream impactees” refer to taxpayers, political supporters, funders, and those who make policies that affect the program. “Midstream impactees,” also called primary stakeholders by Alkin (1991), are program managers and staff. “Down- stream impactees” are those who receive the services or products of the program.

All of these groups hold a stake in the future direction of that program even though they are sometimes unaware of their stake. Evaluators typically involve at least some stakeholders in the planning and conduct of the evaluation. Their par- ticipation can help the evaluator to better understand the program and the infor- mation needs of those who will use it.

Differences in Evaluation and Research

It is important to distinguish between evaluation and research, because these dif- ferences help us to understand the distinctive nature of evaluation. While some methods of evaluation emerged from social science research traditions, there are important distinctions between evaluation and research. One of those distinctions is purpose. Research and evaluation seek different ends. The primary purpose of research is to add to knowledge in a field, to contribute to the growth of theory. A good research study is intended to advance knowledge. While the results of an evaluation study may contribute to knowledge development (Mark, Henry, & Julnes, 2000), that is a secondary concern in evaluation. Evaluation’s primary pur- pose is to provide useful information to those who hold a stake in whatever is be- ing evaluated (stakeholders), often helping them to make a judgment or decision.

10 Part I • Introduction to Evaluation

Research seeks conclusions; evaluation leads to judgments. Valuing is the sine qua non of evaluation. A touchstone for discriminating between an evaluator and a researcher is to ask whether the inquiry being conducted would be regarded as a failure if it produced no data on the value of the thing being studied. A researcher answering strictly as a researcher will probably say no.

These differing purposes have implications for the approaches one takes. Research is the quest for laws and the development of theory—statements of re- lationships among two or more variables. Thus, the purpose of research is typically to explore and establish causal relationships. Evaluation, instead, seeks to exam- ine and describe a particular thing and, ultimately, to consider its value. Some- times, describing that thing involves examining causal relationships; often, it does not. Whether the evaluation focuses on a causal issue depends on the information needs of the stakeholders.

This highlights another difference in evaluation and research—who sets the agenda. In research, the hypotheses to be investigated are chosen by the researcher based on the researcher’s assessment of the appropriate next steps in developing theory in the discipline or field of knowledge. In evaluation, the questions to be answered are not those of the evaluator, but rather come from many sources, including those of significant stakeholders. An evaluator might suggest questions, but would never determine the focus of the study without consultation with stakeholders. Such actions, in fact, would be unethical in evaluation. Unlike re- search, good evaluation always involves the inclusion of stakeholders—often a wide variety of stakeholders—in the planning and conduct of the evaluation for many reasons: to ensure that the evaluation addresses the needs of stakeholders, to improve the validity of results, and to enhance use.

Another difference between evaluation and research concerns generalizabil- ity of results. Given evaluation’s purpose of making judgments about a particular thing, good evaluation is quite specific to the context in which the evaluation object rests. Stakeholders are making judgments about a particular evaluation object, a program or a policy, and are not as concerned with generalizing to other settings as researchers would be. In fact, the evaluator should be concerned with the par- ticulars of that setting, with noting them and attending to the factors that are rel- evant to program success or failure in that setting. (Note that the setting or context may be a large, national program with many sites, or a small program in one school.) In contrast, because the purpose of research is to add to general knowledge, the methods are often designed to maximize generalizability to many different settings.

As suggested previously, another difference between research and evaluation concerns the intended use of their results. Later in the book, we will discuss the many different types of use that may occur in evaluation, but, ultimately, evalua- tion is intended to have some relatively immediate impact. That impact may be on immediate decisions, on decisions in the not-too-distant future, or on perspectives that one or more stakeholder groups or stakeholders have about the object of the evaluation or evaluation itself. Whatever the impact, the evaluation is designed to be used. Good research may or may not be used right away. In fact, research that adds in important ways to some theory may not be immediately noticed, and

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 11

connections to a theory may not be made until some years after the research is conducted.2 Nevertheless, the research stands alone as good research if it meets the standards for research in that discipline or field. If one’s findings are to add to knowl- edge in a field, ideally, the results should transcend the particulars of time and setting.

Thus, research and evaluation differ in the standards used to judge their adequacy (Mathison, 2007). Two important criteria for judging the adequacy of research are internal validity, the study’s success at establishing causality, and external validity, the study’s generalizability to other settings and other times. These crite- ria, however, are not sufficient, or appropriate, for judging the quality of an eval- uation. As noted previously, generalizability, or external validity, is less important for an evaluation because the focus is on the specific characteristics of the program or policy being evaluated. Instead, evaluations are typically judged by their accuracy (the extent to which the information obtained is an accurate reflection—a one-to- one correspondence—with reality), utility (the extent to which the results serve the practical information needs of intended users), feasibility (the extent to which the evaluation is realistic, prudent, diplomatic, and frugal), and propriety (the extent to which the evaluation is done legally and ethically, protecting the rights of those involved). These standards and a new standard concerning evaluation accountabil- ity were developed by the Joint Committee on Standards for Evaluation to help both users of evaluation and evaluators themselves to understand what evalua- tions should do (Joint Committee, 2010). (See Chapter 3 for more on the Standards.)

Researchers and evaluators also differ in the knowledge and skills required to perform their work. Researchers are trained in depth in a single discipline—their field of inquiry. This approach is appropriate because a researcher’s work, in almost all cases, will remain within a single discipline or field. The methods he or she uses will remain relatively constant, as compared with the methods that evaluators use, because a researcher’s focus remains on similar problems that lend themselves to certain methods of study. Evaluators, by contrast, are evaluating many different types of programs or policies and are responding to the needs of clients and stakehold- ers with many different information needs. Therefore, evaluators’ methodological training must be broad and their focus may transcend several disciplines. Their edu- cation must help them to become sensitive to the wide range of phenomena to which they must attend if they are to properly assess the worth of a program or policy. Evaluators must be broadly familiar with a wide variety of methods and techniques so they can choose those most appropriate for the particular program and the needs of its stakeholders. In addition, evaluation has developed some of its own specific methods, such as using logic models to understand program theory and metaevalua- tion. Mathison writes that “evaluation as a practice shamelessly borrows from all disciplines and ways of thinking to get at both facts and values” (2007, p. 20). Her statement illustrates both the methodological breadth required of an evaluator and

2A notable example concerns Darwin’s work on evolution. Elements of his book, The Origin of the Species, were rejected by scientists some years ago and are only recently being reconsidered as new research sug- gests that some of these elements were correct. Thus, research conducted more than 100 years ago emerges as useful because new techniques and discoveries prompt scientists to reconsider the findings.

12 Part I • Introduction to Evaluation

the fact that evaluators’ methods must serve the purpose of valuing or establishing merit and worth, as well as establishing facts.

Finally, evaluators differ from researchers in that they must establish personal working relationships with clients. As a result, studies of the competencies required of evaluators often cite the need for training in interpersonal and communication skills (Fitzpatrick, 1994; King, Stevahn, Ghere, & Minnema, 2001; Stufflebeam & Wingate, 2005).

In summary, research and evaluation differ in their purposes and, as a result, in the roles of the evaluator and researcher in their work, their preparation, and the criteria used to judge the work. (See Table 1.1 for a summary of these differ- ences.) These distinctions lead to many differences in the manner in which research and evaluation are conducted.

Of course, evaluation and research sometimes overlap. An evaluation study may add to our knowledge of laws or theories in a discipline. Research can inform our judgments and decisions regarding a program or policy. Yet, fundamental distinctions remain. Our earlier discussion highlights these differences to help those who are new to evaluation to see the ways in which evaluators behave differently than researchers. Evaluations may add to knowledge in a field, contribute to theory development, establish causal relationships, and provide explanations for the relationship between phenomena, but that is not its primary purpose. Its primary purpose is to assist stake- holders in making value judgments and decisions about whatever is being evaluated.

Action Research

A different type of research altogether is action research. Action research, origi- nally conceptualized by Kurt Lewin (1946) and more recently developed by Emily Calhoun (1994, 2002), is research conducted collaboratively by professionals to

TABLE 1.1 Differences in Research and Evaluation

Factor Research Evaluation

Purpose Add to knowledge in a field, develop laws and theories

Make judgments, provide information for decision making

Who sets the agenda or focus?

Researchers Stakeholders and evaluator jointly

Generalizability of results

Important to add to theory Less important, focus is on particulars of program or policy and context

Intended use of results

Not important An important standard

Criteria to judge adequacy

Internal and external validity Accuracy, utility, feasibility, propriety, evaluation accountability

Preparation of those who work in area

Depth in subject matter, fewer methodological tools and approaches

Interdisciplinary, many methodological tools, interpersonal skills

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 13

improve their practice. Such professionals might be social workers, teachers, or accountants who are using research methods and means of thinking to develop their practice. As Elliott (2005) notes, action research always has a developmental aim. Calhoun, who writes of action research in the context of education, gives exam- ples of teachers working together to conceptualize their focus; to collect, analyze, and interpret data on the issue; and to make decisions about how to improve their practice as teachers and/or a program or curriculum they are implementing. The data collection processes may overlap with program evaluation activities, but there are key differences: Action research is conducted by professionals about their own work with a goal of improving their practice. Action research is also considered to be a strategy to change the culture of organizations to one in which professionals work collaboratively to learn, examine, and research their own practices. Thus, action research produces information akin to that in formative evaluations— information to be used for program improvement. The research is conducted by those delivering the program and, in addition to improving the element under study, has major goals concerning professional development and organizational change.

The Purposes of Evaluation

Consistent with our earlier definition of evaluation, we believe that the primary purpose of evaluation is to render judgments about the value of whatever is being evaluated. This view parallels that of Scriven (1967), who was one of the earliest to outline the purpose of formal evaluation. In his seminal paper, “The Methodol- ogy of Evaluation,” he argued that evaluation has a single goal or purpose: to determine the worth or merit of whatever is evaluated. In more recent writings, Scriven has continued his emphasis on the primary purpose of evaluation being to judge the merit or worth of an object (Scriven, 1996).

Yet, as evaluation has grown and evolved, other purposes have emerged. A discussion of these purposes sheds light on the practice of evaluation in today’s world. For the reader new to evaluation, these purposes illustrate the many facets of evaluation and its uses. Although we agree with Scriven’s historical emphasis on the purpose of evaluation, to judge the merit or worth of a program, policy, process, or product, we see these other purposes of evaluation at play as well.

Some years ago, Talmage (1982) argued that an important purpose of eval- uation was “to assist decision makers responsible for making policy” (p. 594). And, in fact, providing information that will improve the quality of decisions made by policymakers continues to be a major purpose of program evaluation. Indeed, the rationale given for collecting much evaluation data today—by schools, by state and local governments, by the federal government, and by nonprofit organizations— is to help policymakers in these organizations make decisions about whether to continue programs, to initiate new programs, or, in other major ways, to change the funding or structure of a program. In addition to decisions made by policymakers, evaluation is intended to inform the decisions of many others, including program managers (principals, department heads), program staff (teachers, counselors,

14 Part I • Introduction to Evaluation

health care providers, and others delivering the services offered by a program), and program consumers (clients, parents, citizens). A group of teachers may use evaluations of student performance to make decisions on program curricula or materials. Parents make decisions concerning where to send their children to school based on information on school performance. Students choose institutions of higher education based on evaluative information. The evaluative information or data provided may or may not be the most useful for making a particular deci- sion, but, nevertheless, evaluation clearly serves this purpose.

For many years, evaluation has been used for program improvement. As we will discuss later in this chapter, Michael Scriven long ago identified program im- provement as one of the roles of evaluation, though he saw that role being achieved through the initial purpose of judging merit and worth. Today, many see organizational and program improvement as a major, direct purpose of evaluation (Mark, Henry, & Julnes, 2000; Patton, 2008a; Preskill & Torres, 1998).

Program managers or those who deliver a program can make changes to im- prove the program based on the evaluation results. In fact, this is one of the most frequent uses of evaluation. There are many such examples: teachers using the re- sults of student assessments to revise their curricula or pedagogical methods, health care providers using evaluations of patients’ use of medication to revise their means of communicating with patients about dosage and use, and trainers us- ing feedback from trainees to change training to improve its application on the job. These are all ways that evaluation serves the purpose of program improvement.

Today, many evaluators see evaluation being used for program and organi- zational improvement in new ways. As we will describe in later chapters, Michael Patton often works today in what he calls “developmental evaluation,” working to assist organizations that do not have specific, measurable goals, but, instead, need evaluation to help them with ongoing progress, adaptation, and learning (Patton, 1994, 2005b). Hallie Preskill (Preskill, 2008; Preskill & Torres, 2000) and others (King, 2002; Baker & Bruner, 2006) have written about the role of evaluation in improving overall organizational performance by instilling new ways of thinking. In itself, the process of participating in an evaluation can begin to influence the ways that those who work in the organization approach problems. For example, an evaluation that involves employees in developing a logic model for the program to be evaluated or in examining data to draw some conclusions about program progress may prompt those employees to use such procedures or these ways of approaching a problem in the future and, thus, lead to organizational improvement.

The purpose of program or organizational improvement, of course, overlaps with others. When an evaluation is designed for program improvement, the eval- uator must consider the decisions that those managing and delivering the program will make in using the study’s results for program improvement. So the purpose of the evaluation is to provide both decision making and program improvement. We will not split hairs to distinguish between the two purposes, but will simply acknowledge that evaluation can serve both purposes. Our goal is to expand your view of the various purposes for evaluation and to help you consider the purpose in your own situation or organization.

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 15

Some recent discussions of the purposes of evaluation move beyond these more immediate purposes to evaluation’s ultimate impact on society. Some evalu- ators point out that one important purpose of evaluation is helping give voice to groups who are not often heard in policy making or planning programs. Thus, House and Howe (1999) argue that the goal of evaluation is to foster deliberative democracy. They encourage the evaluator to work to help less powerful stakehold- ers gain a voice and to stimulate dialogue among stakeholders in a democratic fash- ion. Others highlight the role of the evaluator in helping bring about greater social justice and equality. Greene, for example, notes that values inevitably influence the practice of evaluation and, therefore, evaluators can never remain neutral. Instead, they should recognize the diversity of values that emerge and arise in an evaluation and work to achieve desirable values of social justice and equity (Greene, 2006).

Carol Weiss (1998b) and Gary Henry (2000) have argued that the purpose of evaluation is to bring about social betterment. Mark, Henry, and Julnes (2000) de- fine achieving social betterment as “the alleviation of social problems, meeting of hu- man needs” (p. 190). And, in fact, evaluation’s purpose of social betterment is at least partly reflected in the Guiding Principles, or ethical code, adopted by the American Evaluation Association. One of those principles concerns the evaluator’s responsibil- ities for the general and public welfare. Specifically, Principle E5 states the following:

Evaluators have obligations that encompass the public interest and good. Because the public interest and good are rarely the same as the interests of any particular group (including those of the client or funder) evaluators will usually have to go beyond analysis of particular stakeholder interests and consider the welfare of society as a whole. (American Evaluation Association, 2004)

This principle has been the subject of more discussion among evaluators than other principles, and deservedly so. Nevertheless, it illustrates one important pur- pose of evaluation. Evaluations are concerned with programs and policies that are intended to improve society. Their results provide information on the choices that policymakers, program managers, and others make in regard to these programs. As a result, evaluators must be concerned with their purposes in achieving the so- cial betterment of society. Writing in 1997 about the coming twenty-first century, Chelimsky and Shadish emphasized the global perspective of evaluation in achiev- ing social betterment, extending evaluation’s context in the new century to world- wide challenges. These include new technologies, demographic imbalances across nations, environmental protection, sustainable development, terrorism, human rights, and other issues that extend beyond one program or even one country (Chelimsky & Shadish, 1997).

Finally, many evaluators continue to acknowledge the purpose of evaluation in extending knowledge (Donaldson, 2007; Mark, Henry, & Julnes, 2000). Although adding to knowledge is the primary purpose of research, evaluation studies can add to our knowledge of social science theories and laws. They provide an opportunity to test theories in real-world settings or to test existing theories or laws with new groups by examining whether those theories hold true in new

16 Part I • Introduction to Evaluation

settings with different groups. Programs or policies are often, though certainly not always, based on some theory or social science principles.3 Evaluations provide the opportunity to test those theories. Evaluations collect many kinds of information that can add to our knowledge: information describing client groups or problems, information on causes or consequences of problems, tests of theories concerning impact. For example, Debra Rog conducted an evaluation of a large intervention program to help homeless families in the early 1990s (Rog, 1994; Rog, Holupka, McCombs-Thornton, Brito, & Hambrick, 1997). At the time, not much was known about homeless families and some of the initial assumptions in planning were in- correct. Rog adapted her evaluation design to learn more about the circumstances of homeless families. Her results helped to better plan the program, but also added to our knowledge about homeless families, their health needs, and their circum- stances. In our discussion of the differences between research and evaluation, we emphasized that the primary purpose of research is to add to knowledge in a field and that this is not the primary purpose of evaluation. We continue to maintain that distinction. However, the results of some evaluations can add to our knowl- edge of social science theories and laws. This is not a primary purpose, but simply one purpose that an evaluation may serve.

In closing, we see that evaluation serves many different purposes. Its primary purpose is to determine merit or worth, but it serves many other valuable pur- poses as well. These include assisting in decision making; improving programs, or- ganizations, and society as a whole; enhancing democracy by giving voice to those with less power; and adding to our base of knowledge.

Roles and Activities of Professional Evaluators

Evaluators as practitioners play numerous roles and conduct multiple activities in performing evaluation. Just as discussions on the purposes of evaluation help us to better understand what we mean by determining merit and worth, a brief dis- cussion of the roles and activities pursued by evaluators will acquaint the reader with the full scope of activities that professionals in the field pursue.

A major role of the evaluator that many in the field emphasize and discuss is that of encouraging the use of evaluation results (Patton, 2008a; Shadish, 1994). While the means for encouraging use and the anticipated type of use may differ, considering use of results is a major role of the evaluator. In Chapter 17, we will discuss the different types of use that have been identified for evaluation and var- ious means for increasing that use. Henry (2000), however, has cautioned that fo- cusing primarily on use can lead to evaluations focused solely on program and organizational improvement and, ultimately, avoiding final decisions about merit and worth. His concern is appropriate; however, if the audience for the evaluation

3The term “evidence-based practice” emerges from the view that programs should be designed around social science research findings when basic research, applied research, or evaluation studies have found that a given program practice or action leads to the desired, intended outcomes.

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 17

is one that is making decisions about the program’s merit and worth, this problem may be avoided. (See discussion of formative and summative evaluation in this chapter.) Use is certainly central to evaluation, as demonstrated by the prominent role it plays in the professional standards and codes of evaluation. (See Chapter 3.)

Others’ discussions of the role of the evaluator illuminate the ways in which evaluators might interact with stakeholders and other users. Rallis and Rossman (2000) see the role of the evaluator as that of a critical friend. They view the pri- mary purpose of evaluation as learning and argue that, for learning to occur, the evaluator has to be a trusted person, “someone the emperor knows and can listen to. She is more friend than judge, although she is not afraid to offer judgments” (p. 83). Schwandt (2001a) describes the evaluator in the role of a teacher, helping practitioners develop critical judgment. Patton (2008a) envisions evaluators in many different roles including facilitator, collaborator, teacher, management con- sultant, organizational development (OD) specialist, and social-change agent. These roles reflect his approach to working with organizations to bring about develop- mental change. Preskill and Torres (1998) stress the role of the evaluator in bring- ing about organizational learning and instilling a learning environment. Mertens (1999), Chelimsky (1998), and Greene (1997) emphasize the important role of in- cluding stakeholders, who often have been ignored by evaluation. House and Howe (1999) argue that a critical role of the evaluator is stimulating dialogue among various groups. The evaluator does not merely report information, or pro- vide it to a limited or designated key stakeholder who may be most likely to use the information, but instead stimulates dialogue, often bringing in disenfranchised groups to encourage democratic decision making.

Evaluators also have a role in program planning. Bickman (2002), Chen (1990), and Donaldson (2007) emphasize the important role that evaluators play in helping articulate program theories or logic models. Wholey (1996) argues that a critical role for evaluators in performance measurement is helping policymakers and managers select the performance dimensions to be measured as well as the tools to use in measuring those dimensions.

Certainly, too, evaluators can play the role of the scientific expert. As Lipsey (2000) notes, practitioners want and often need evaluators with the “expertise to track things down, systematically observe and measure them, and compare, ana- lyze, and interpret with a good faith attempt at objectivity” (p. 222). Evaluation emerged from social science research. While we will describe the growth and emergence of new approaches and paradigms, and the role of evaluators in edu- cating users to our purposes, stakeholders typically contract with evaluators to provide technical or “scientific” expertise and/or an outside “objective” opinion. Evaluators can occasionally play an important role in making program stakehold- ers aware of research on other similar programs. Sometimes, the people manag- ing or operating programs or the people making legislative or policy decisions on programs are so busy fulfilling their primary responsibilities that they are not aware of other programs or agencies that are doing similar things and the research conducted on these activities. Evaluators, who typically explore existing research on similar programs to identify potential designs and measures, can play the role

18 Part I • Introduction to Evaluation

of scientific expert in making stakeholders aware of research. (See, for example, Fitzpatrick and Bledsoe [2007] for a discussion of Bledsoe’s role in informing stakeholders of existing research on other programs.)

Thus, the evaluator takes on many roles. In noting the tension between advocacy and neutrality, Weiss (1998b) writes that the role(s) evaluators play will depend heavily on the context of the evaluation. The evaluator may serve as a teacher or critical friend in an evaluation designed to improve the early stages of a new reading program. The evaluator may act as a facilitator or collaborator with a community group appointed to explore solutions to problems of unemployment in the region. In conducting an evaluation on the employability of new immigrant groups in a state, the evaluator may act to stimulate dialogue among immigrants, policymakers, and nonimmigrant groups competing for employment. Finally, the evaluator may serve as an outside expert in designing and conducting a study for Congress on the effectiveness of annual testing in improving student learning.

In carrying out these roles, evaluators undertake many activities. These include negotiating with stakeholder groups to define the purpose of evaluation, developing contracts, hiring and overseeing staff, managing budgets, identifying disenfranchised or underrepresented groups, working with advisory panels, collecting and analyzing and interpreting qualitative and quantitative information, commu- nicating frequently with various stakeholders to seek input into the evaluation and to report results, writing reports, considering effective ways to disseminate information, meeting with the press and other representatives to report on progress and results, and recruiting others to evaluate the evaluation (metaevalu- ation). These, and many other activities, constitute the work of evaluators. Today, in many organizations, that work might be conducted by people who are formally trained and educated as evaluators, attend professional conferences and read widely in the field, and identify their professional role as an evaluator, or by staff who have many other responsibilities—some managerial, some working directly with students or clients—but with some evaluation tasks thrown into the mix. Each of these will assume some of the roles described previously and will conduct many of the tasks listed.

Uses and Objects of Evaluation

At this point, it might be useful to describe some of the ways in which evaluation can be used. An exhaustive list would be prohibitive, filling the rest of this book and more. Here we provide only a few representative examples of uses made of evaluation in selected sectors of society.

Examples of Evaluation Use in Education 1. To empower teachers to have more say in how school budgets are allocated 2. To judge the quality of school curricula in specific content areas 3. To accredit schools that meet or exceed minimum accreditation standards

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 19

4. To determine the value of a middle school’s block scheduling 5. To satisfy an external funding agency’s demands for reports on effectiveness

of school programs it supports 6. To assist parents and students in selecting schools in a district with school

choice 7. To help teachers improve their reading program to encourage more volun-

tary reading

Examples of Evaluation Use in Other Public and Nonprofit Sectors 1. To decide whether to expand an urban transit program and where it should

be expanded 2. To establish the value of a job training program 3. To decide whether to modify a low-cost housing project’s rental policies 4. To improve a recruitment program for blood donors 5. To determine the impact of a prison’s early-release program on recidivism 6. To gauge community reaction to proposed fire-burning restrictions to im-

prove air quality 7. To determine the effect of an outreach program on the immunization of in-

fants and children

Examples of Evaluation Use in Business and Industry 1. To improve a commercial product 2. To judge the effectiveness of a corporate training program on teamwork 3. To determine the effect of a new flextime policy on productivity, recruitment,

and retention 4. To identify the contributions of specific programs to corporate profits 5. To determine the public’s perception of a corporation’s environmental image 6. To recommend ways to improve retention among younger employees 7. To study the quality of performance appraisal feedback

One additional comment about the use of evaluation in business and indus- try may be warranted. Evaluators unfamiliar with the private sector are sometimes unaware that personnel evaluation is not the only use made of evaluation in business and industry settings. Perhaps that is because the term “evaluation” has been absent from the descriptors for many corporate activities and programs that, when examined, are decidedly evaluative. Activities labeled as quality assurance, quality control, research and development, Total Quality Management (TQM), or Continuous Quality Improvement (CQI) turn out, on closer inspection, to possess many characteristics of program evaluation.

Uses of Evaluation Are Generally Applicable

As should be obvious by now, evaluation methods are clearly portable from one arena to another. The use of evaluation may remain constant, but the entity it is ap- plied to—that is, the object of the evaluation—may vary widely. Thus, evaluation

20 Part I • Introduction to Evaluation

may be used to improve a commercial product, a community training program, or a school district’s student assessment system. It could be used to build organizational capacity in the Xerox Corporation, the E. F. Lilly Foundation, the Minnesota Department of Education, or the Utah Division of Family Services. Evaluation can be used to empower parents in the San Juan County Migrant Education Program, workers in the U.S. Postal Service, employees of Barclays Bank of England, or residents in east Los Angeles. Evaluation can be used to provide information for decisions about programs in vocational education centers, community mental health clinics, university medical schools, or county cooperative extension offices. Such examples could be multiplied ad infinitum, but these should suffice to make our point.

In some instances, so many evaluations are conducted of the same type of object that it prompts suggestions for techniques found to be particularly helpful in evalu- ating something of that particular type. An example would be Kirkpatrick’s (1977; 1983; 2006) model for evaluating training efforts. In several areas, concern about how to evaluate broad categories of objects effectively has led to the development of various subareas within the field of evaluation, such as product evaluation, personnel evaluation, program evaluation, policy evaluation, and performance evaluation.

Some Basic Types of Evaluation

Formative and Summative Evaluation

Scriven (1967) first distinguished between the formative and summative roles of evaluation. Since then, the terms have become almost universally accepted in the field. In practice, distinctions between these two types of evaluation may blur somewhat, but the terms serve an important function in highlighting the types of decisions or choices that evaluation can serve. The terms, in fact, contrast two different types of actions that stakeholders might take as a result of evaluation.

An evaluation is considered to be formative if the primary purpose is to pro- vide information for program improvement. Often, such evaluations provide infor- mation to judge the merit or worth of one part of a program. Three examples follow:

1. Planning personnel in the central office of Perrymount School District have been asked by the school board to plan a new, and later, school day for the local high schools. This is based on research showing that adolescents’ biological clocks cause them to be more groggy in the early morning hours and on parental con- cerns about teenagers being released from school as early as 2:30 P.M. A forma- tive evaluation will collect information (surveys, interviews, focus groups) from parents, teachers and school staff, and students regarding their views on the cur- rent school schedule calendar and ways to change and improve it. The planning staff will visit other schools using different schedules to observe these schedules and to interview school staff on their perceived effects. The planning staff will then give the information to the Late Schedule Advisory Group, which will make final recommendations for changing the existing schedule.

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 21

2. Staff with supervisory responsibilities at the Akron County Human Resources Department have been trained in a new method for conducting performance appraisals. One of the purposes of the training is to improve the performance appraisal interview so that employees receiving the appraisal feel motivated to improve their performance. The trainers would like to know if the information they are providing on conducting interviews is being used by those supervisors who com- plete the program. They plan to use the results to revise this portion of the training program. A formative evaluation might include observing supervisors conducting actual, or mock, interviews, as well as interviewing or conducting focus groups with both supervisors who have been trained and employees who have been re- ceiving feedback. Feedback for the formative evaluation might also be collected from participants in the training through a reaction survey delivered either at the conclusion of the training or a few weeks after the training ends, when trainees have had a chance to practice the interview.

3. A mentoring program has been developed and implemented to help new teachers in the classroom. New teachers are assigned a mentor, a senior teacher who will provide them with individualized assistance on issues ranging from dis- cipline to time management. The focus of the program is on helping mentors learn more about the problems new teachers are encountering and helping them find solutions. Because the program is so individualized, the assistant principal responsible for overseeing the program is concerned with learning whether it is being implemented as planned. Are mentors developing a trusting relationship with the new teachers and learning about the problems they encounter? What are the typical problems encountered? The array of problems? For what types of prob- lems are mentors less likely to be able to provide effective assistance? Interviews, logs or diaries, and observations of meetings between new teachers and their men- tors will be used to collect data to address these issues. The assistant principal will use the results to consider how to better train and lead the mentors.

In contrast to formative evaluations, which focus on program improvement, summative evaluations are concerned with providing information to serve decisions or assist in making judgments about program adoption, continuation, or expansion. They assist with judgments about a program’s overall worth or merit in relation to important criteria. Scriven (1991a) has defined summative evaluation as “evaluation done for, or by, any observers or decision makers (by contrast with developers) who need valuative conclusions for any other reasons besides development” (p. 20). Robert Stake has memorably described the distinction between the two in this way: “When the cook tastes the soup, that’s formative evaluation; when the guest tastes it, that’s summative evaluation” (cited by Scriven, 1991a, p. 19). In the following examples we extend the earlier formative evaluations into summative evaluations.

1. After the new schedule is developed and implemented, a summative evalu- ation might be conducted to determine whether the schedule should be contin- ued and expanded to other high schools in the district. The school board might be

22 Part I • Introduction to Evaluation

the primary audience for this information because it is typically in a position to make the judgments concerning continuation and expansion or termination, but others—central office administrators, principals, parents, students, and the public at large—might be interested stakeholders as well. The study might collect infor- mation on attendance, grades, and participation in after-school activities. Other unintended side effects might be examined, such as the impact of the schedule on delinquency, opportunities for students to work after school, and other afternoon activities.

2. To determine whether the performance appraisal program should be contin- ued, the director of the Human Resource Department and his staff might ask for an evaluation of the impact of the new performance appraisal on job satisfaction and performance. Surveys of employees and existing records on performance might serve as key methods of data collection.

3. Now that the mentoring program for new teachers has been tinkered with for a couple of years using the results of the formative evaluation, the principal wants to know whether the program should be continued. The summative eval- uation will focus on turnover, satisfaction, and performance of new teachers.

Note that the audiences for formative and summative evaluation are very different. In formative evaluation, the audience is generally the people delivering the program or those close to it. In our examples, they were those responsible for developing the new schedule, delivering the training program, or managing the mentoring program. Because formative evaluations are designed to improve pro- grams, it is critical that the primary audience be people who are in a position to make changes in the program and its day-to-day operations. Summative evalua- tion audiences include potential consumers (students, teachers, employees, man- agers, or officials in agencies that could adopt the program), funding sources, and supervisors and other officials, as well as program personnel. The audiences for summative evaluations are often policymakers or administrators, but can, in fact, be any audience with the ability to make a “go–no go” decision. Teachers make such decisions with curricula. Consumers (clients, parents, and students) make decisions about whether to participate in a program based on summative infor- mation or their judgments about the overall merit or worth of a program.

A Balance between Formative and Summative. It should be apparent that both formative and summative evaluation are essential because decisions are needed during the developmental stages of a program to improve and strengthen it, and again, when it has stabilized, to judge its final worth or determine its future. Unfortunately, some organizations focus too much of their work on summative evaluations. This trend is noted in the emphases of many funders today on impact or outcome assessment from the beginning of a program or policy. An undue emphasis on summative evaluation can be unfortunate because the development process, without formative evaluation, is incomplete and inefficient. Consider the foolishness of developing a new aircraft design and submitting it to a summative

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 23

test flight without first testing it in the formative wind tunnel. Program test flights can be expensive, too, especially when we haven’t a clue about the probability of success.

Formative data collected during the early stages of a program can help identify problems in the program model or theory or in the early delivery of the program that can then be modified or corrected. People delivering the program may need more training or resources to effectively implement the model. The model may have to be adapted because the students or clients being served are not exactly as program developers anticipated. Perhaps they have different learning strategies or less knowledge, skills, or motivation than anticipated; therefore, the training program or class curriculum should be expanded or changed. In other cases, students or clients who participate in a program may have more, or different, skills or problems than program planners anticipated. The program, then, must be adapted to address those.4 So, a formative evalua- tion can be very useful at the beginning of a program to help it succeed in achieving its intended outcomes.

Conversely, some organizations may avoid summative evaluations. Evaluat- ing for improvement is critical, but, ultimately, many products and programs should be judged for their overall merit and worth. Henry (2000) has noted that evaluation’s emphasis on encouraging use of results can lead us to serving incre- mental, often formative, decisions and may steer us away from the primary pur- pose of evaluation—determining merit and worth.

Although formative evaluations more often occur in the early stages of a program’s development and summative evaluations more often occur in its later stages, it would be an error to think they are limited to those time frames. Well- established programs can benefit from formative evaluations. Some new pro- grams are so problematic that summative decisions are made to discontinue. However, the relative emphasis on formative and summative evaluation changes throughout the life of a program, as suggested in Figure 1.1, although this generalized concept obviously may not precisely fit the evolution of any particu- lar program.

An effort to distinguish between formative and summative evaluation on several dimensions appears in Table 1.2. As with most conceptual distinctions, formative and summative evaluation are often not as easy to distinguish in the real world as they seem in these pages. Scriven (1991a) has acknowledged that the two are often profoundly intertwined. For example, if a program continues beyond a summative evaluation study, the results of that study may be used for both sum- mative and, later, formative evaluation purposes. In practice, the line between formative and summative is often rather fuzzy.

4See the interview with Stewart Donaldson about his evaluation of a work-training program (Fitzpatrick & Donaldson, 2002) in which he discusses his evaluation of a program that had been suc- cessful in Michigan, but was not adapted to the circumstances of California sites, which differed in the reasons why people were struggling with returning to the workforce. The program was designed an- ticipating that clients would have problems that these clients did not have.

24 Part I • Introduction to Evaluation

FIGURE 1.1 Relationship between Formative and Summative Evaluation

Formative Evaluation

Summative Evaluation

Program Life

Re la

tiv e

Em ph

as is

TABLE 1.2 Differences between Formative and Summative Evaluation

Formative Evaluation Summative Evaluation

Use To improve the program To make decisions about the program’s future or adoption

Audience Program managers and staff Administrators, policymakers, and/or potential consumers or funding agencies

By Whom Often internal evaluators supported by external evaluators

Often external evaluators, supported by internal evaluators

Major Characteristics Provides feedback so program personnel can improve it

Provides information to enable decision makers to decide whether to continue it, or consumers to adopt it

Design Constraints What information is needed? When?

What standards or criteria will be used to make decisions?

Purpose of Data Collection

Diagnostic Judgmental

Frequency of Data Collection

Frequent Infrequent

Sample Size Often small Usually large

Questions Asked What is working? What needs to be improved? How can it be improved?

What results occur? With whom? Under what conditions? With what training? At what cost?

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 25

Beyond Formative and Summative. Our discussion of the purposes of evaluation reflects the changes and expansions that have occurred in the practice of evalua- tion over the decades. Michael Patton (1996) has described three purposes of eval- uation that do not fall within the formative or summative dimension. These include the following:

1. The contribution of evaluation to conceptual thinking, rather than immediate or instrumental decisions or judgments, about an object. As evaluation practice has expanded and research has been conducted on how evaluation is used, eval- uators have found that evaluation results are often not used immediately, but, rather, are used gradually—conceptually—to change stakeholders’ thinking about the clients or students they serve, about the logic models or theories for programs, or about the ways desired outcomes can be achieved.

2. Evaluation for broad, long-term organizational learning and continuous im- provement. Patton’s developmental evaluation falls within this category. Results from such evaluations are not used for direct program improvement (formative purposes), but to help organizations consider future directions, changes, and adap- tations that should be made because of new research findings or changes in the context of the program and its environment. (See Preskill [2008]; Preskill and Torres [2000].)

3. Evaluations in which the process of the evaluation may have more import than the use of the results. As we will discuss in Chapter 17, research on the use of evaluation has found that participation in the evaluation process itself, not just the results of the evaluation, can have important impacts. Such participation can change the way people plan programs in the future by providing them with skills in developing logic models for programs or by empowering them to participate in program planning and development in different ways. As we discussed, one pur- pose of evaluation is to improve democracy. Some evaluations empower the pub- lic or disenfranchised stakeholder groups to participate further in decision making by providing them with information or giving them a voice through the evalua- tion to make their needs or circumstances known to policymakers.

The distinction between formative and summative evaluations remains a pri- mary one when considering the types of decisions the evaluation will serve. How- ever, it is important to remember the other purposes of evaluation and, in so doing, to recognize and consider these purposes when planning an evaluation so that each evaluation may reach its full potential.

Needs Assessment, Process, and Outcome Evaluations

The distinctions between formative and summative evaluation are concerned pri- marily with the kinds of decisions or judgments to be made with the evaluation results. The distinction between the relative emphasis on formative or summative evaluation is an important one to make at the beginning of a study because it

26 Part I • Introduction to Evaluation

informs the evaluator about the context, intention, and potential use of the study and has implications for the most appropriate audiences for the study. However, the terms do not dictate the nature of the questions the study will address. Chen (1996) has proposed a typology to permit consideration of process and outcome along with the formative and summative dimension. We will discuss that typology here, adding needs assessment to the mix.

Some evaluators use the terms “needs assessment,” “process,” and “out- come” to refer to the types of questions the evaluation study will address or the fo- cus of the evaluation. These terms also help make the reader aware of the full array of issues that evaluators examine. Needs assessment questions are concerned with (a) establishing whether a problem or need exists and describing that problem, and (b) making recommendations for ways to reduce the problem; that is, the poten- tial effectiveness of various interventions. Process, or monitoring studies, typically describe how the program is delivered. Such studies may focus on whether the program is being delivered according to some delineated plan or model or may be more open-ended, simply describing the nature of delivery and the successes and problems encountered. Process studies can examine a variety of different issues, including characteristics of the clients or students served, qualifications of the de- liverers of the program, characteristics of the delivery environment (equipment, printed materials, physical plant, and other elements of the context of delivery), or the actual nature of the activities themselves. Outcome or impact studies are concerned with describing, exploring, or determining changes that occur in pro- gram recipients, secondary audiences (families of recipients, coworkers, etc.), or communities as a result of a program. These outcomes can range from immediate impacts or outputs (for example, achieving immediate learning objectives in a les- son or course) to longer-term objectives, final goals, and unintended outcomes.

Note that these terms do not have implications for how the information will be used. The terms formative and summative help us distinguish between the ways in which the results of the evaluation may be used for immediate decision making. Needs assessment, process, and outcome evaluations refer to the nature of the issues or questions that will be examined. In the past, people have occasionally misused the term formative to be synonymous with process evaluation, and summative to be synonymous with outcome evaluation. However, Scriven (1996) himself notes that “formative evaluations are not a species of process evaluation. Conversely, sum- mative evaluation may be largely or entirely process evaluation” (p. 152).

Table 1.3 illustrates the application of these evaluation terms building on a typology proposed by Chen (1996); we add needs assessment to Chen’s typology. As Table 1.3 illustrates, an evaluation can be characterized by the action the eval- uation will serve (formative or summative) as well as by the nature of the issues it will address.

To illustrate, a needs assessment study can be summative (Should we adopt this new program or not?) or formative (How should we modify this program to deliver it in our school or agency?). A process study often serves formative purposes, providing information to program providers or managers about how to change activities to improve the quality of the program delivery to make it more likely that

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 27

TABLE 1.3 A Typology of Evaluation Studies

Judgment

What to Revise/Change Formative

What to Begin, Continue, Expand Summative

Needs Assessment How should we adapt the model we are considering?

Should we begin a program? Is there sufficient need?

Process Is more training of staff needed to deliver the program appropriately?

Are sufficient numbers of the target audience participating in the program to merit continuation?

Outcome How can we revise our curricula to better achieve desired outcomes?

Is this program achieving its goals to a sufficient degree that its funding should be continued?

objectives will be achieved, but a process study may also serve summative purposes. A process study may reveal that the program is too complex or expensive to deliver or that program recipients (students, trainees, clients) do not enroll as expected. In such cases, a process study that began as a formative evaluation for program improvement may lead to a summative decision to discontinue the program. Accountability studies often make use of process data to make summative decisions.

An outcome study can, and often does, serve formative or summative purposes. Formative purposes may be best served by examining more immediate outcomes be- cause program deliverers have greater control over the actions leading to these out- comes. For example, teachers and trainers often make use of immediate measures of student learning to make changes in their curriculum or methods. They may decide to spend more time on certain areas or to expand on the types of exercises or prob- lems students practice to better achieve certain learning goals, or they may spend less time on areas in which students have already achieved competency. Policymakers making summative decisions, however, are often more concerned with the pro- gram’s success at achieving other, more global outcomes, such as graduation rates or employment placement, because their responsibility is with these outcomes. Their decisions regarding funding concern whether programs achieve these ultimate out- comes. The fact that a study examines program outcomes, or effects, however, tells us nothing about whether the study serves formative or summative purposes.

Internal and External Evaluations

The adjectives “internal” and “external” distinguish between evaluations conducted by program employees and those conducted by outsiders. An experimental year- round education program in the San Francisco public schools might be evaluated by a member of the school district staff (internal) or by a site-visit team appointed by the California State Board of Education (external). A large health care organization with facilities in six communities might have a member of each facility’s staff evaluate the

F oc

u s

of Q

u es

ti on

28 Part I • Introduction to Evaluation

TABLE 1.4 Advantages of Internal and External Evaluators

Internal External

More familiar with organization & program history

Can bring greater credibility, perceived objectivity

Knows decision-making style of organization

Typically brings more breadth and depth of technical expertise for a particular evaluation

Is present to remind others of results now and in future

Has knowledge of how other similar organizations and programs work

Can communicate technical results more frequently and clearly

effectiveness of their outreach program in improving immunization rates for infants and children (internal), or the organization may hire a consulting firm or university research group to look at all six programs (external).

Seems pretty simple, right? Often it is, but how internal is the evaluation of the year-round school program if it is conducted by an evaluation unit at the cen- tral office, which is quite removed from the charter school implementing the pro- gram? Is that an internal or external evaluation? Actually, the correct answer is both, for such an evaluation is clearly external from the perspective of those in the charter school, yet might be considered an internal evaluation from the perspec- tive of the state board of education or parents in the district.

There are obvious advantages and disadvantages connected with both internal and external evaluation roles. Table 1.4 summarizes some of these. Internal evalu- ators are likely to know more about the program, its history, its staff, its clients, and its struggles than any outsider. They also know more about the organization and its culture and styles of decision making. They are familiar with the kinds of informa- tion and arguments that are persuasive, and know who is likely to take action and who is likely to be persuasive to others. These very advantages, however, are also disadvantages. They may be so close to the program that they cannot see it clearly. (Note, though, that each evaluator, internal and external, will bring his or her own history and biases to the evaluation, but the internal evaluators’ closeness may pre- vent them from seeing solutions or changes that those newer to the situation might see more readily.) While successful internal evaluators may overcome the hurdle of perspective, it can be much more difficult for them to overcome the barrier of posi- tion. If internal evaluators are not provided with sufficient decision-making power, autonomy, and protection, their evaluation will be hindered.

The strengths of external evaluators lie in their distance from the program and, if the right evaluators are hired, their expertise. External evaluators are perceived as more credible by the public and, often, by policymakers. In fact, external evaluators typically do have greater administrative and financial independence. Nevertheless, the objectivity of the external evaluator can be overestimated. (Note the role of the ex- ternal Arthur Andersen firm in the 2002 Enron bankruptcy and scandal. The lure of obtaining or keeping a large contract can prompt external parties to bend the rules to

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 29

keep the contract.) However, for programs with high visibility or cost or those sur- rounded by much controversy, an external evaluator can provide a desirable degree of autonomy from the program. External evaluators, if the search and hiring process are conducted appropriately, can also bring the specialized skills needed for a particu- lar project. In all but very large organizations, internal evaluators must be jacks-of-all- trades to permit them to address the ongoing evaluation needs of the organization. When seeking an external evaluator, however, an organization can pinpoint and seek the types of skills and expertise needed for that time-limited project.

Organizing Internal Evaluation for Maximum Effect. In recent years, evaluations conducted by people employed by the organization have grown exponentially as funders’ demands for accountability have increased. This growth is at least partly due to professional evaluators’ emphasis on building internal organizational capac- ity to conduct evaluation. (Capacity building and mainstreaming evaluation were the conference themes for the American Evaluation Association in 2000 and 2001, respectively, with the 2001 conference focusing on one of our co-authors’ themes, mainstreaming evaluation. See Leviton, [2001] and Sanders, [2002] for their pub- lished Presidential Addresses on the subjects.) We will discuss capacity building fur- ther in Chapter 9, but in this section we will discuss ways in which to structure internal evaluation to improve evaluation and the performance of the organization.

First, a comment on internal evaluators. For many years, large school districts had, and many continue to have, internal evaluation units. The economic con- straints on education have reduced the number of districts with strong internal evaluation units, but such units remain in many districts. (See, for example, Christie’s interview with Eric Barela, an internal evaluator with the Los Angeles Unified School District, Christie and Barela [2008]). In many nonprofit organiza- tions, internal evaluation capacity has increased in recent years. This growth has been spurred by United Way of America (UWA), a major funding source for many nonprofit, human service organizations, which encouraged these organizations to implement its evaluation strategy for measuring outcomes (Hendricks, Plantz, & Pritchard, 2008). Today, approximately 19,000 local agencies funded by United Way conduct internal evaluations, supplemented with training by United Way, to measure agency outcomes. Similarly, Cooperative Extensions and other organiza- tions are active in conducting internal evaluations (Lambur, 2008). State and local governments have been thrust into a more active evaluation role through federal performance-based management systems. All these efforts have prompted public and nonprofit organizations to train existing staff to, at minimum, report data on program outcomes and, often, to conduct evaluations to document those outcomes.

Given the growth in internal evaluation, it is appropriate to consider how internal evaluations can be conducted for the maximum effect. Evaluators have been writing about ways to enhance internal evaluation for some years (Chelimsky, 1994; Love, 1983, 1991; Scriven, 1975; Sonnichsen, 1987, 1999; Stufflebeam, 2002a). Probably the two most important conditions identified for successful internal evaluations are (a) active support for evaluation from top administrators within the organization and (b) clearly defined roles for internal

30 Part I • Introduction to Evaluation

evaluators. The strength of internal evaluators is their ongoing contribution to decision making within the organization. Without the active support of leaders within the organization, internal evaluators cannot fulfill that role.

Where evaluators should be located in a large organization is an area of some disagreement. Internal evaluators must be situated where they can understand organizational problems, initiate or plan evaluations to address those problems, and be in a position to frequently communicate results to the stakeholders who can use them. Some argue that internal evaluators should, therefore, be placed centrally within the organization where they can work closely with top decision makers. In this way, the internal evaluators can serve in an advisory function to top managers and are able to communicate information from a variety of evaluation studies as needed. Many, if not most, internal evaluation units are centrally located in the organization and, hence, have the potential to serve in that capacity. With proximity to top managers, the director of an internal evaluation unit can continue to demonstrate the value of evaluation to the organization.

Others (Lambur, 2008), however, have argued that internal evaluators should be dispersed among program units where they can provide useful, forma- tive evaluation for program improvement directly to people who are delivering the organization’s programs. In such positions, internal evaluators can build a more trusting relationship with program deliverers and increase the chances that the results of their evaluations will be used. Lambur, in interviews with internal evaluators in cooperative extension offices, found disadvantages to being “closely aligned with administration” (2008, p. 49). Staff who are delivering programs, such as teachers, social workers, trainers, and others, see evaluation in the central office as being more concerned with accountability and responding to federal gov- ernment demands and less concerned with improving programs. Lambur found evaluators who worked in program units were able to become closer to the pro- grams, and, as a result, they believed, knew how to conduct more useful evalua- tions. They recognized the potential for being less objective, but worked to make their evaluations more rigorous. In such positions, internal evaluators can serve in Rallis and Rossman’s role of critical friend (2000).

Patton (2008b) has also interviewed internal evaluators and has found that they face many challenges. They can be excluded from major decisions and asked to spend time on public relations functions rather than true evaluation. In addition, they do, in fact, spend much time gathering data for accountability requirements from external funding sources; this takes away time from developing relationships with administrators and people who deliver the program. Internal evaluators are often, but not always, full-time evaluators. Like many professionals in organiza- tions, they can have other responsibilities that conflict with their evaluation role.

Patton (2008b) and Lambur (2008) argue that internal evaluators face com- peting demands in evaluating for accountability and for program improvement. Both argue that the emphasis for internal evaluators should be on program improvement. Lambur writes,

“Through my personal experience [as an internal evaluator], I learned it was far more effective to promote evaluation as a tool for improving programs than helping the

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 31

organization meet demands for accountability. If program staff view themselves as primary stakeholders for evaluation results, they are more apt to become engaged in the process of conducting high-quality evaluations. Results of such evaluations can be used first for program improvement, and then for accountability purposes.”

Those writing about the organization of internal evaluation acknowledge the difficulties an internal evaluator faces, but provide many useful suggestions. The solution, however, for an individual organization can depend on its mission and purpose. In some organizations, placing evaluation in a central location with top administrators can provide the distance from programs needed for credibility in important summative evaluations and can supply evaluators with avenues for affecting organizational learning and culture through educating key administrators about the role of evaluation. In other organizations, it can be important to place evaluators in program units where they can focus on the improvement of indi- vidual programs. In either case, internal evaluators require organizational support from top managers, mid-level managers, and supervisors. Internal evaluators can help create a true learning organization, where evaluation is looked to for valuable information to make decisions. To do so, though, requires careful planning and continuous communication and support from others in clarifying and supporting the role of evaluation in the organization.

Possible Role Combinations. Given the growth in internal evaluation capacity, considering how to combine internal and external evaluation is important. One way is to consider the purposes of evaluation. The dimensions of formative and summative evaluation can be combined with the dimensions of internal and external evaluation to form the two-by-two matrix shown in Figure 1.2. The most common roles in evaluation might be indicated by cells 1 and 4 in the matrix. For- mative evaluations are often conducted by internal evaluators, and there are clear merits in such an approach. Their knowledge of the program, its history, staff, and clients is of great value, and credibility is not nearly the problem it would be in a summative evaluation. Program personnel are often the primary audience, and the evaluator’s ongoing relationship with them can enhance the use of results in a good learning organization. Summative evaluations are probably best conducted

Internal

Formative

Summative

1 Internal

Formative

3 Internal

Summative

External

2 External

Formative

4 External

Summative FIGURE 1.2 Combination of Evaluation Roles

32 Part I • Introduction to Evaluation

by external evaluators. It is difficult, for example, to know how much credibility to attach to a Ford Motor Company evaluation that concludes that a particular Ford automobile is far better than its competitors in the same price range. The credibility accorded to an internal summative program evaluation (cell 3) in a school or nonprofit organization may be no better.

In some cases, though, funds are not available for external evaluators, or competent external evaluators cannot be identified. In many cases, summative evaluations are conducted internally and, in such cases, role combinations are possible to improve the credibility of the results. Patton (2008a) suggests using ex- ternal evaluators to review and comment on the quality of internal evaluations. In other cases, external evaluators can design critical elements of the evaluation, helping define the evaluation questions and developing evaluation designs and measures, perhaps working jointly with an internal evaluation team. Internal evaluators can then work to implement the evaluation and to develop effective means for communicating results to different stakeholder groups. Such role com- binations can save critical fiscal resources, improve internal capacity, and enhance the credibility of the results. (See, for example, Fitzpatrick’s interview with Debra Rog concerning her role as an external evaluator in a project for homeless fami- lies spanning several cities. She discusses the role of staff within each organization in helping conduct and plan the evaluation with her guidance [Fitzpatrick and Rog, 1999]). In any case, when a summative evaluation is conducted internally, man- agers within the organization need to attend to the position of the evaluators in the organization relative to the program being evaluated. They must work to ensure maximum independence and must not place evaluators in the untenable position of evaluating programs developed by their boss or colleagues.

Sonnichsen (1999) writes of the high impact that internal evaluation can have if the organization has established conditions that permit the internal evalu- ator to operate effectively. The factors that he cites as being associated with eval- uation offices that have a strong impact on the organization include operating as an independent entity, reporting to a top official, giving high rank to the head of the office, having the authority to self-initiate evaluations, making recommenda- tions and monitoring their implementation, and disseminating results widely throughout the organization. He envisions the promise of internal evaluation, writing, “The practice of internal evaluation can serve as the basis for organiza- tional learning, detecting and solving problems, acting as a self-correcting mecha- nism by stimulating debate and reflection among organizational actors, and seeking alternative solutions to persistent problems” (Sonnichsen, 1999, p. 78).

Evaluation’s Importance—and Its Limitations

Given its many uses, it may seem almost axiomatic to assert that evaluation is not only valuable but essential in any effective system or society. Citizens look to eval- uation for accountability. Policymakers and decision makers call on it and use it

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 33

to make important decisions. Program staff can use evaluation to plan and improve programs to better meet clients’ and societal needs and to make decisions about how to stay within their budget. Consumers, such as parents, students, and voluntary clients, can make choices about schools for themselves or their children or the hospital, clinic, or agency they will contact for services. Evaluators can per- form many roles for those delivering programs. These include helping them develop good programs, helping them deliver the programs to changing clients for changing contexts, and helping them find interventions that are most successful in achieving their goals. Evaluators can help organizations as a whole through stimulating a learning culture, thereby helping those in the organization to ques- tion and consider their goals and their methods, their clients and their needs, and showing them how to use evaluative inquiry methods to meet their needs. As some evaluators note, evaluation plays an important continuing role in democ- racy. It informs citizens and, thus, empowers them to influence their schools, their government, and their nonprofit organizations. It can influence the power of stakeholders who have been absent from important decisions by giving them voice and power through evaluation. Scriven (1991b) said it well:

The process of disciplined evaluation permeates all areas of thought and practice. . . . It is found in scholarly book reviews, in engineering’s quality control procedures, in the Socratic dialogues, in serious social and moral criticism, in mathematics, and in the opinions handed down by appellate courts. . . . It is the process whose duty is the systematic and objective determination of merit, worth, or value. Without such a process, there is no way to distinguish the worthwhile from the worthless. (p. 4)

Scriven also argues the importance of evaluation in pragmatic terms (“bad products and services cost lives and health, destroy the quality of life, and waste the resources of those who cannot afford waste”), ethical terms (“evaluation is a key tool in the service of justice”), social and business terms (“evaluation directs effort where it is most needed, and endorses the ‘new and better way’ when it is better than the traditional way—and the traditional way where it’s better than the new high-tech way”), intellectual terms (“it refines the tools of thought”), and personal terms (“it provides the only basis for justifiable self-esteem”) (p. 43). Perhaps for these reasons, evaluation has increasingly been used as an instrument to pursue goals of organi- zations and agencies at local, regional, national, and international levels.

But, evaluation’s importance is not limited to the methods used, the stake- holder supplied with information, or the judgment of merit or worth that is made. Evaluation gives us a process to improve our ways of thinking and, therefore, our ways of developing, implementing, and changing programs and policies. Schwandt has argued that evaluators need to cultivate in themselves and others an intelligent belief in evaluation. He writes that “possessing (and acting on) an intelligent belief in evaluation is a special obligation of evaluators—those who claim to be well pre- pared in the science and art of making distinctions of worth” (2008, p. 139). He reminds us that evaluation is not simply the methods, or tools, that we use, but a way of thinking. Citing some problematic trends we see in society today, the political

34 Part I • Introduction to Evaluation

manipulation of science and the tendency to see or argue for all-or-nothing solutions that must be used in all settings in the same way, Schwandt calls for evaluators to help citizens and stakeholders use better means of reasoning. This better means of reasoning would draw on the kinds of thinking good evaluators should do. The characteristics of such reasoning include a tolerance for ambiguity, a recognition of multiple perspectives and a desire to learn from those different perspectives, a desire to experiment or to become what Don Campbell called an “experimenting society.” Describing this society and evaluation’s role in it, Schwandt writes:

This is a society in which we ask serious and important questions about what kind of society we should have and what directions we should take. This is a social environment indelibly marked by uncertainty, ambiguity, and interpretability. Evaluation in such an environment is a kind of social conscience; it involves serious questioning of social direction; and it is a risky undertaking in which we endeavor to find out not simply whether what we are doing is a good thing but also what we do not know about what we are doing. So we experiment—we see what we can learn from different ways of knowing. In evaluation, we try to work from the top down (so to speak) using what policy makers say they are trying to do as a guide, as well as from the bottom up, doing evaluation that is heavily participant oriented or user involved. All this unfolds in an atmosphere of questioning, of multiple visions of what it is good to do, of multiple interpretations of whether we as a society are doing the right thing. (2008, p. 143)

As others in evaluation have done, Schwandt is reminding us of what eval- uation should be. As evaluators, we learn how to use research methods from many disciplines to provide information and reach judgments about programs and policies, but our methods and theories underlie an approach to reasoning. This approach is its greatest promise.

Limitations of Evaluation. In addition to its potential for impact, evaluation has many limitations. Although the purpose of this book is to help the reader learn how to conduct good evaluations, we would be remiss if we did not discuss these limitations. The methods of evaluation are not perfect ones. No single study, even those using multiple methods, can provide a wholly accurate picture of the truth because truth is composed of multiple perspectives. Formal evaluation is more suc- cessful than informal evaluation, in part, because it is more cautious and more sys- tematic. Formal evaluation is guided by explicit questions and criteria. It considers multiple perspectives. Its methods allow one to follow the chain of reasoning, the evaluative argument, and to more carefully consider the accuracy, or the validity, of the results. But evaluations are constrained by realities, including some charac- teristics of the program and its context, the competencies of the evaluation staff, the budget, the timeframe, and the limits of what measures can tell us.

A more important limitation to evaluation than the methodological and fis- cal ones, however, are the political ones. We live in a democracy. That means that elected, and appointed, officials must attend to many issues. Results of evaluations

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 35

are not their sole source of information by any means, nor should they be. Citi- zens’ input and expectations obviously play a role in decisions. Many stakeholder groups, experts, lawmakers, policymakers, and, yes, lobbyists, have information and experience that are important to consider. So, in the best of situations, evalu- ation is simply one piece of information, albeit an important piece, we hope, in the marble cake of sources used by decision makers in a democracy.

Finally, both evaluators and their clients may have been limited by a ten- dency to view evaluation as a series of discrete studies rather than a continuing system representing an approach to reasoning and personal and organizational growth. It can be difficult to question what you do and the activities that you be- lieve in, but evaluative inquiry must prompt us to do that, both in evaluating our evaluations (metaevaluation) and in evaluating programs. A few poorly planned, badly executed, or inappropriately ignored evaluations should not surprise us; such failings occur in every field of human endeavor. This book is intended to help evaluators, and the policymakers, managers, and all the other stakeholders who participate in and use evaluations, to improve their evaluative means of reason- ing and to improve the practice of evaluation.

Major Concepts and Theories

1. Evaluation is the identification, clarification, and application of defensible criteria to determine an evaluation object’s value, its merit or worth, in regard to those criteria. The specification and use of explicit criteria distinguish formal evaluation from the informal evaluations most of us make daily.

2. Evaluation differs from research in its purpose, the role of the evaluator and the researcher in determining the focus of the study, the criteria used to judge its quality, its involvement of stakeholders, and the competencies required of those who practice it.

3. The basic purpose of evaluation is to render judgments about the value of the object under evaluation. Other purposes include providing information for program and organizational improvement and to make decisions, working to better society and to improve and sustain democratic values, encouraging meaningful dialogue among many diverse stakeholders, as well as adding to our knowledge concerning the application of social science theory, and providing oversight and compliance for programs.

4. Evaluators play many roles including facilitator, planner, advocate, scientific expert, critical friend, collaborator, and aid to decision makers and other stakeholder groups.

5. Evaluations can serve formative or summative decisions as well as other purposes. Formative evaluations are designed for program improvement. The audience is, most typically, stakeholders close to the program. Summative evaluations serve decisions about program adoption, continuation, or expansion. Audiences for these evaluations must have the ability to make such “go-no go” decisions.

6. Evaluations can address needs assessment, process, or outcome questions. Any of these types of questions can serve formative or summative purposes.

7. Evaluators may be internal or external to the organization. Internal evaluators know the organizational environment and can facilitate communication and use of re- sults. External evaluators can provide more credibility in high-profile evaluations and bring a fresh perspective and different skills to the evaluation.

8. Evaluation goes beyond particular methods and tools to include a way of thinking. Evaluators have a role in educating stakeholders and the public about the concept of evaluation as a way of thinking and reasoning. This way of thinking includes acknowl- edging, valuing, using, and exploring different perspectives and ways of knowing, and creating and encouraging an experimenting society—one that actively questions, con- siders, and creates policies, programs, interventions, and ideas.

36 Part I • Introduction to Evaluation

Discussion Questions

1. Consider a program in your organization. If it were to be evaluated, what might be the purpose of the evaluation at this point in time? Consider the stage of the pro- gram and the information needs of different stakeholder groups. What might be the role of evaluators in conducting the evaluation?

2. What kind of evaluation do you think is most useful—formative or summative? What kind of evaluation would be most useful to you in your work? To your school board or elected officials?

3. Which do you prefer, an external or internal evaluator? Why?

4. Describe a situation in which an internal evaluator would be more appropriate than an external evaluator. What is the rationale for your choice? Now describe a situation in which an external evaluator would be more appropriate.

Application Exercises

1. List the types of evaluation studies that have been conducted in an institution or agency of your acquaintance, noting in each instance whether the evaluator was internal or external to that institution. Determine whether each study was form- ative or summative and whether it was focused on needs assessment, process, or outcome questions. Did the evaluation address the appropriate questions? If not, what other types of questions or purposes might it have addressed?

2. Think back to any formal evaluation study you have seen conducted (or if you have never seen one conducted, find a written evaluation report of one). Identify three things that make it different from informal evaluations. Then list ten informal eval- uations you have performed so far today. (Oh, yes you have!)

3. Discuss the potential and limitations of program evaluation. Identify some things evaluation can and cannot do for programs in your field.

4. Within your own organization (if you are a university student, you might choose your university), identify several evaluation objects that you believe would be ap- propriate for study. For each, identify (a) the stakeholder groups and purposes the evaluation study would serve, and (b) the types of questions the evaluation might address.

Chapter 1 • Evaluation’s Basic Purpose, Uses, and Conceptual Distinctions 37

Case Studies

In this edition, we begin a new practice to ac- quaint readers with real evaluations in order to give them a better understanding of the prac- tice of evaluation. At the end of many chapters, we will recommend one or more interviews that Jody Fitzpatrick, one of our authors, or Christina Christie conducted with a well- known evaluator concerning one evaluation he or she completed. Each article begins with a brief summary of the evaluation. Fitzpatrick or Christie then interviews the evaluator about the choices he or she made in determining the purposes of the evaluation, involving stake- holders, selecting designs and data collection methods, collecting the data, reporting the re- sults, and facilitating use. Interested readers may refer to the book that collects and analyzes these interviews:

Fitzpatrick, J. L., Christie, C. A., & Mark, M. M. (2008). Evaluation in action: Interviews with expert evalua- tors. Thousand Oaks, CA: Sage.

Or, the reader may read individual interviews published in the American Journal of Evaluation.

For this chapter we recommend two inter- views to orient the reader to two quite different

types of evaluation in Evaluation in Action Chapters 1 (James Riccio) and 7 (Gary Henry).

In Chapter 1, James Riccio describes the choices he made in an evaluation designed to judge the merit and worth of a welfare reform program for the state of California as welfare re- form initiatives first began. His major stake- holder is the California legislature, and the study illustrates a traditional, mixed-methods evaluation with significant instrumental use. The journal source is as follows: Fitzpatrick, J. L. & Riccio, J. (1997). A dialogue about an award- winning evaluation of GAIN: A welfare-to-work program. Evaluation Practice, 18, 241–252.

In Chapter 7, Gary Henry describes the de- velopment of a school “report card” for schools in Georgia during the early stages of the performance monitoring emphasis for K–12 education. The evaluation provides descriptive information to help parents, citizens, and policymakers in Geor- gia learn more about the performance of individ- ual schools. The journal source is as follows: Fitzpatrick, J. L., & Henry, G. (2000). The Georgia Council for School Performance and its perfor- mance monitoring system: A dialogue with Gary Henry. American Journal of Evaluation, 21, 105–117.

Suggested Readings

Greene, J. C. (2006). Evaluation, democracy, and social change. In I. F. Shaw, J. C. Greene, & M. M. Mark (Eds.), The Sage handbook of evaluation. London: Sage Publications.

Mark, M. M., Henry, G. T., & Julnes, G. (2000). Toward an integrative framework for evaluation practice. American Journal of Evaluation, 20, 177–198.

Patton, M. Q. (1996). A world larger than formative and summative. Evaluation Practice, 17(2), 131–144.

Rallis, S. F., & Rossman, G. B. (2000). Dialogue for learning: Evaluator as critical friend. In R. K. Hopson (Ed.), How and why language matters in

evaluation. New Directions for Evaluation, No. 86, 81–92. San Francisco: Jossey-Bass.

Schwandt, T. A. (2008). Educating for intelligent be- lief in evaluation. American Journal of Evalua- tion, 29(2), 139–150.

Sonnichsen, R. C. (1999). High impact internal evalu- ation. Thousand Oaks, CA: Sage.

Stake, R. E. (2000). A modest commitment to the promotion of democracy. In K. E. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and delib- eration. New Directions for Evaluation, No. 85, 97–106. San Francisco: Jossey-Bass.

Origins and Current Trends in Modern Program Evaluation

Orienting Questions

1. How did the early stages of evaluation influence practice today?

2. What major political events occurred in the late 1950s and early 1960s that greatly accelerated the growth of evaluation thought?

3. What significant events precipitated the emergence of modern program evaluation?

4. How did evaluation evolve as a profession in the 1970s and 1980s?

5. How has evaluation changed in the last two decades? What factors have influenced these changes?

Formal evaluation of educational, social, and private-sector programs is still maturing as a field, with its most rapid development occurring during the past four decades. Compared with professions such as law, education, and accounting or disciplines like sociology, political science, and psychology, evaluation is still quite new. In this chapter, we will review the history of evaluation and its progress toward becoming a full-fledged profession and transdiscipline. This history and the concluding discussion of the current state of evaluation will make the reader better aware of all the directions that evaluation can take.

The History and Influence of Evaluation in Society

Early Forms of Formal Evaluation

Some evaluator-humorists have mused that formal evaluation was probably at work in determining which evasion skills taught in Sabertooth Avoidance 101 had the

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 39

greatest survival value. Scriven (1991c) apparently was not speaking tongue-in-cheek when suggesting that formal evaluation of crafts may reach back to the evaluation of early stone-chippers’ products, and he was obviously serious in asserting that it can be traced back to samurai sword evaluation.

In the public sector, formal evaluation was evident as early as 2000 B.C., when Chinese officials conducted civil service examinations to measure the profi- ciency of applicants for government positions. And in education, Socrates used verbally mediated evaluations as part of the learning process. But centuries passed before formal evaluations began to compete with religious and political beliefs as the driving force behind social and educational decisions.

Some commentators see the ascendancy of natural science in the seven- teenth century as a necessary precursor to the premium that later came to be placed on direct observation. Occasional tabulations of mortality, health, and pop- ulations grew into a fledgling tradition of empirical social research that grew un- til “In 1797, Encyclopedia Britannica could speak of statistics—‘state-istics,’ as it were—as a ‘word lately introduced to express a view or survey of any kingdom, county, or parish’” (Cronbach et al., 1980, p. 24).

But quantitative surveys were not the only precursor to modern social research in the 1700s. Rossi and Freeman (1985) give an example of an early British sea captain who divided his crew into a “treatment group” that was forced to consume limes, and a “control group” that consumed the sailors’ normal diet. Not only did the experiment show that “consuming limes could avert scurvy,” but “British seamen eventually were forced to consume citrus fruits—this is the derivation of the label ‘limeys,’ which is still sometimes applied to the English” (pp. 20–21).

Program Evaluation: 1800–1940

During the 1800s, dissatisfaction with educational and social programs in Great Britain generated reform movements in which government-appointed royal com- missions heard testimony and used other less formal methods to evaluate the respective institutions. This led to still-existing systems of external inspectorates for schools in England and much of Europe. Today, however, those systems use many of the modern concepts of evaluation; for example, recognition of the role of val- ues and criteria in making judgments and the importance of context. Inspectorates visit schools to make judgments concerning quality and to provide feedback for improvement. Judgments may be made about the quality of the school as a whole or the quality of teachers, subjects, or themes. (See Standaert. [2000])

In the United States, educational evaluation during the 1800s took a slightly different bent, being influenced by Horace Mann’s comprehensive annual, empirical reports on Massachusetts’s education in the 1840s and the Boston School Committee’s 1845 and 1846 use of printed tests in several subjects—the first instance of wide-scale assessment of student achievement serving as the basis for school comparisons. These two developments in Massachusetts were the first attempts at objectively measuring student achievement to assess the quality of a large school system. They set a precedent

40 Part I • Introduction to Evaluation

seen today in the standards-based education movement’s use of test scores from stu- dents as the primary means for judging the effectiveness of schools.

Later, during the late 1800s, liberal reformer Joseph Rice conducted one of the first comparative studies in education designed to provide information on the qual- ity of instructional methods. His goal was to document his claims that school time was used inefficiently. To do so, he compared a large number of schools that varied in the amount of time spent on spelling drills and then examined the students’ spelling ability. He found negligible differences in students’ spelling performance among schools where students spent as much as 100 minutes a week on spelling in- struction in one school and as little as 10 minutes per week in another. He used these data to flog educators into seeing the need to scrutinize their practices empirically.

The late 1800s also saw the beginning of efforts to accredit U.S. universities and secondary schools, although that movement did not really become a potent force for evaluating educational institutions until several strong regional accrediting associa- tions were established in the 1930s. The early 1900s saw another example of accred- itation (broadly defined) in Flexner’s (1910) evaluation—backed by the American Medical Association and the Carnegie Foundation—of the 155 medical schools then operating in the United States and Canada. Although based only on one-day site visits to each school by himself and one colleague, Flexner argued that inferior training was immediately obvious: “A stroll through the laboratories disclosed the presence or absence of apparatus, museum specimens, library and students; and a whiff told the inside story regarding the manner in which anatomy was cultivated” (Flexner, 1960, p. 79). Flexner was not deterred by lawsuits or death threats from what the medical schools viewed as his “pitiless exposure” of their medical training practices. He deliv- ered his evaluation findings in scathing terms. For example, he called Chicago’s fifteen medical schools “the plague spot of the country in respect to medical education” (p. 84). Soon “schools collapsed to the right and left, usually without a murmur” (p. 87). No one was ever left to wonder whether Flexner’s reports were evaluative.

Other areas of public interest were also subjected to evaluation in the early 1900s; Cronbach and his colleagues (1980) cite surveys of slum conditions, management and efficiency studies in the schools, and investigations of local government corruption as examples. Rossi, Freeman, and Lipsey (1998) note that evaluation first emerged in the field of public health, which was concerned with infectious diseases in urban areas, and in education, where the focus was on literacy and occupational training.

Also in the early 1900s, the educational testing movement began to gain momentum as measurement technology made rapid advances under E. L. Thorndike and his students. By 1918, objective testing was flourishing, pervading the military and private industry as well as all levels of education. The 1920s saw the rapid emergence of norm-referenced tests developed for use in measuring individual performance levels. By the mid-1930s, more than half of the United States had some form of statewide testing, and standardized, norm-referenced testing, including achievement tests and personality and interest profiles, became a huge commercial enterprise.

During this period, educators regarded measurement and evaluation as nearly synonymous, with the latter usually thought of as summarizing student test

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 41

performance and assigning grades. Although the broader concept of evaluation, as we know it today, was still embryonic, useful measurement tools for the evaluator were proliferating rapidly, even though very few meaningful, formally published evaluations of school programs or curricula would appear for another 20 years. One notable exception was the ambitious, landmark Eight Year Study (Smith & Tyler, 1942) that set a new standard for educational evaluation with its sophisticated methodology and its linkage of outcome measures to desired learning outcomes. Tyler’s work, in this and subsequent studies (e.g., Tyler, 1950), also planted the seeds of standards-based testing as a viable alternative to norm-referenced testing. (We will return in Chapter 6 to the profound impact that Tyler and those who followed in his tradition have had on program evaluation, especially in education.)

Meanwhile, foundations for evaluation were being laid in fields beyond education, including human services and the private sector. In the early decades of the 1900s, Fredrick Taylor’s scientific management movement influenced many. His focus was on systemization and efficiency—discovering the most efficient way to perform a task and then training all staff to perform it that way. The emergence of “efficiency experts” in industry soon permeated the business community and, as Cronbach et al. (1980) noted, “business executives sitting on the governing boards of social services pressed for greater efficiency in those services” (p. 27). Some cities and social agencies began to develop internal research units, and social scientists began to trickle into government service, where they started conducting applied social research in specific areas of public health, housing needs, and work productivity. However, these ancestral, social research “precursors to evaluation” were small, isolated activities that exerted little overall impact on the daily lives of the citizenry or the decisions of the government agencies that served them.

Then came the Great Depression and the sudden proliferation of government services and agencies as President Roosevelt’s New Deal programs were implemented to salvage the U.S. economy. This was the first major growth in the federal govern- ment in the 1900s, and its impact was profound. Federal agencies were established to oversee new national programs in welfare, public works, labor management, urban development, health, education, and numerous other human service areas, and increasing numbers of social scientists went to work in these agencies. Applied social research opportunities abounded, and soon social science academics began to join with their agency-based colleagues to study a wide variety of variables relating to these programs. While some scientists called for explicit evaluation of these new social programs (e.g., Stephan, 1935), most pursued applied research at the intersec- tion of their agency’s needs and their personal interests. Thus, sociologists pursued questions that were of interest to the discipline of sociology and the agency, but the questions of interest often emerged from sociology. The same trend occurred with economists, political scientists, and other academics who came to conduct research on federal programs. Their projects were considered to be “field research” and pro- vided opportunities to address important questions within their discipline in the field. (See the interview with Michael Patton in the “Suggested Readings” at the end of this chapter for an example. In this interview, he discusses how his dissertation was ini- tially planned as field research in sociology but led Patton into the field of evaluation.)

42 Part I • Introduction to Evaluation

Program Evaluation: 1940–1964

Applied social research expanded during World War II as researchers investigated government programs intended to help military personnel in areas such as reducing their vulnerability to propaganda, increasing morale, and improving the training and job placement of soldiers. In the following decade, studies were directed at new pro- grams in job training, housing, family planning, and community development. As in the past, such studies often focused on particular facets of the program in which the researchers happened to be most interested. As these programs increased in scope and scale, however, social scientists began to focus their studies more directly on entire programs rather than on the parts of them they found personally intriguing.

With this broader focus came more frequent references to their work as “evaluation research” (social research methods applied to improve a particular program).1 If we are liberal in stretching the definition of evaluation to cover most types of data collection in health and human service programs, we can safely say evaluation flourished in those areas in the 1950s and early 1960s. Rossi et al. (1998) state that it was commonplace during that period to see social scientists “engaged in evaluations of delinquency-prevention programs, felon-rehabilitation projects, psy- chotherapeutic and psychopharmacological treatments, public housing programs, and community organization activities” (p. 23). Such work also spread to other coun- tries and continents. Many countries in Central America and Africa were the sites of evaluations examining health and nutrition, family planning, and rural community development. Most such studies drew on existing social research methods and did not extend the conceptual or methodological boundaries of evaluation beyond those already established for behavioral and social research. Such efforts would come later.

Developments in educational program evaluation between 1940 and 1965 were unfolding in a somewhat different pattern. The 1940s generally saw a period of consolidation of earlier evaluation developments. School personnel devoted their energies to improving standardized testing, quasi-experimental design, accreditation, and school surveys. The 1950s and early 1960s also saw consider- able efforts to enhance the Tylerian approach by teaching educators how to state objectives in explicit, measurable terms and by providing taxonomies of possible educational objectives in the cognitive domain (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956) and the affective domain (Krathwohl, Bloom, & Masia, 1964).

In 1957, the Soviets’ successful launch of Sputnik I sent tremors through the U.S. establishment that were quickly amplified into calls for more effective teaching of math and science to American students. The reaction was immediate. Passage of the National Defense Education Act (NDEA) of 1958 poured millions of dollars into massive, new curriculum development projects, especially in mathematics and science. Only a few projects were funded, but their size and perceived importance led policymakers to fund evaluations of most of them.

The resulting studies revealed the conceptual and methodological impover- ishment of evaluation in that era. Inadequate designs and irrelevant reports were

1We do not use this term in the remainder of the book because we think it blurs the useful distinction between research and evaluation that we outlined in the previous chapter.

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 43

only some of the problems. Most of the studies depended on imported behavioral and social science research concepts and techniques that were fine for research but not very suitable for evaluation of school programs.

Theoretical work related directly to evaluation (as opposed to research) did not exist, and it quickly became apparent that the best theoretical and method- ological thinking from social and behavioral research failed to provide guidance on how to carry out many aspects of evaluation. Therefore, educational scientists and practitioners were left to glean what they could from applied social, behavioral, and educational research. Their gleanings were so meager that Cronbach (1963) penned a seminal article criticizing past evaluations and calling for new directions. Although his recommendations had little immediate impact, they did catch the attention of other education scholars, helping to spark a greatly expanded concep- tion of evaluation that would emerge in the next decade.

The Emergence of Modern Program Evaluation: 1964–1972

Although the developments discussed so far were not sufficient in themselves to create a strong and enduring evaluation movement, each helped create a context that would give birth to such a movement. Conditions were right for accelerated conceptual and methodological development in evaluation, and the catalyst was found in the War on Poverty and the Great Society, the legislative centerpieces of the administration of U.S. President Lyndon Johnson. The underlying social agenda of his administration was an effort to equalize and enhance opportunities for all citizens in virtually every sector of society. Millions of dollars were poured into programs in education, health, housing, criminal justice, unemployment, urban renewal, and many other areas.

Unlike the private sector, where accountants, management consultants, and R & D departments had long existed to provide feedback on corporate programs’ productivity and profitability, these huge, new social investments had no similar mechanism in place to examine their progress. There were government employees with some relevant competence—social scientists and technical specialists in the var- ious federal departments, particularly in the General Accounting Office (GAO)2—but they were too few and not sufficiently well organized to deal even marginally with determining the effectiveness of these vast government innovations. To complicate matters, many inquiry methodologies and management techniques that worked on smaller programs proved inadequate or unwieldy with programs of the size and scope of these sweeping social reforms.

For a time it appeared that another concept developed and practiced success- fully in business and industry might be successfully adapted for evaluating these federal programs, the Planning, Programming, and Budgeting System (PPBS). PPBS was part of the systems approach used in the Ford Motor Company—and

2This was the original name of the GAO. In 2004, its name was changed to the Government Accountability Office.

44 Part I • Introduction to Evaluation

later brought to the U.S. Department of Defense (DOD) by Robert McNamara when he became Kennedy’s secretary of defense. The PPBS was a variant of the systems approaches that were being used by many large aerospace, communica- tions, and automotive industries. It was aimed at improving system efficiency, ef- fectiveness, and budget allocation decisions by defining organizational objectives and linking them to system outputs and budgets. Many thought the PPBS would be ideally suited for the federal agencies charged with administering the War on Poverty programs, but few of the bureaucrats heading those agencies were eager to embrace it., However, PPBS was a precursor to the evaluation systems the federal government has mandated in recent years with the Government Performance Results Act (GPRA) and the Program Assessment Rating Tool (PART).

PPBS, with its focus on monitoring, outputs, and outcomes, did not succeed. Instead, the beginning of modern evaluation in the United States, Canada, and Germany was inspired by a desire to improve programs through learning from ex- perimentation on social interventions. Ray Rist, in his research with the Working Group on Policy and Program Evaluation, which was created by the International Institute on Administrative Sciences (IIAS) to study differences in evaluation across countries, placed the United States, Canada, Germany, and Sweden among what they called “first wave” countries (Rist, 1999). These were countries that began modern evaluation in the 1960s and 1970s with the goal of improving social programs and interventions. Evaluations were often part of program planning, and evaluators were located close to the programs they were evaluating. As we will discuss later in the chapter, evaluation in the early part of the twenty-first century is more akin to the earlier PPBS systems than to its first-wave origins.

The stage for serious evaluation in the United States was set by several factors. Administrators and managers in the federal government were new to managing such large programs and felt they needed help to make them work. Managers and policy- makers in government and social scientists were interested in learning more about what was working. They wanted to use the energy and funds appropriated for eval- uation to begin to learn how to solve social problems. Congress was concerned with holding state and local recipients of program grants accountable for expending funds as prescribed. The first efforts to add an evaluative element to any of these programs were small, consisting of congressionally-mandated evaluations of a federal juvenile delinquency program in 1962 (Weiss, 1987) and a federal manpower development and training program enacted that same year (Wholey, 1986). It matters little which was first, however, because neither had any lasting impact on the development of evaluation. Three more years would pass before Robert F. Kennedy would trigger the event that would send a shock wave through the U.S. education system, awakening both policymakers and practitioners to the importance of systematic evaluation.

The Elementary and Secondary Education Act. The one event that is most responsible for the emergence of contemporary program evaluation is the passage of the Elementary and Secondary Education Act (ESEA) of 1965. This bill pro- posed a huge increase in federal funding for education, with tens of thousands of federal grants to local schools, state and regional agencies, and universities. The

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 45

largest single component of the bill was Title I (later renamed Chapter 1), destined to be the most costly federal education program in U.S. history. Wholey and White (1973) called Title I the “grand-daddy of them all” among the array of legislation that influenced evaluation at the time.

When Congress began its deliberations on the proposed ESEA, concerns began to be expressed, especially on the Senate floor, that no convincing evidence existed that any federal funding for education had ever resulted in any real edu- cational improvements. Indeed, there were some in Congress who believed fed- eral funds allocated to education prior to ESEA had sunk like stones into the morass of educational programs with scarcely an observable ripple to mark their passage. Robert F. Kennedy was the most persuasive voice insisting that the ESEA require each grant recipient to file an evaluation report showing what had resulted from the expenditure of the federal funds. This congressional evaluation mandate was ultimately approved for Title I (compensatory education) and Title III (inno- vative educational projects). The requirements, while dated today, “reflected the state-of-the-art in program evaluation at that time” (Stufflebeam, Madaus, & Kellaghan, 2000, p. 13). These requirements, which reflected an astonishing amount of micromanagement at the congressional level but also the serious con- gressional concerns regarding accountability, included using standardized tests to demonstrate student learning and linking outcomes to learning objectives.

Growth of Evaluation in Other Areas. Similar trends can be observed in other areas as the Great Society developed programs in job training, urban development, housing, and other anti-poverty programs. Federal government spending on anti- poverty and other social programs increased by 600% after inflation from 1950 to 1979 (Bell, 1983). As in education, people wanted to know more about how these programs were working. Managers and policymakers wanted to know how to im- prove the programs and which strategies worked best to achieve their ambitious goals. Congress wanted information on the types of programs to continue funding. Increasingly, evaluations were mandated. In 1969, federal spending on grants and contracts for evaluation was $17 million. By 1972, it had expanded to $100 million (Shadish, Cook, & Leviton, 1991). The federal government expanded greatly to oversee the new social programs but, just as in education, the managers, political scientists, economists, and sociologists working with them were new to managing and evaluating such programs. Clearly, new evaluation approaches, methods, and strategies were needed, as well as professionals with a somewhat different train- ing and orientation to apply them. (See interviews with Lois-Ellin Datta and Carol Weiss cited in the “Suggested Readings” at the end of this chapter to learn more about their early involvement in evaluation studies with the federal government at that time. They convey the excitement, the expectations, and the rapid learning curve required to begin this new endeavor of studying government programs to improve the programs themselves.)

Theoretical and methodological work related directly to evaluation did not exist. Evaluators were left to draw what they could from theories in cognate disciplines and to glean what they could from better-developed methodologies, such as experimental

46 Part I • Introduction to Evaluation

design, psychometrics, survey research, and ethnography. In response to the need for more specific writing on evaluation, important books and articles emerged. Suchman (1967) published a text reviewing different evaluation methods and Campbell (1969b) argued for more social experimentation to examine program effectiveness. Campbell and Stanley’s book (1966) on experimental and quasi-experimental designs was quite influential. Scriven (1967), Stake (1967), and Stufflebeam (1968) began to write articles about evaluation practice and theories. At the Urban Institute, Wholey and White (1973) recognized the political aspects of evaluation being conducted within or- ganizations. Carol Weiss’s influential text (1972) was published and books of eval- uation readings emerged (Caro, 1971; Worthen & Sanders, 1973). Articles about evaluation began to appear with increasing frequency in professional journals. To- gether, these publications resulted in a number of new evaluation models to respond to the needs of specific types of evaluation (e.g., ESEA Title III evaluations or evalua- tions of mental health programs).

Some milestone evaluation studies that have received significant attention occurred at this time. These included not only the evaluations of Title I, but eval- uations of Head Start and the television series Sesame Street. The evaluations of Sesame Street demonstrated some of the first uses of formative evaluation, as por- tions of the program were examined to provide feedback to program developers for improvement. The evaluations of Great Society programs and other programs in the late 1960s and early 1970s were inspired by the sense of social experimen- tation and the large goals of the Great Society programs. Donald Campbell, the in- fluential research methodologist who trained quite a few leaders in evaluation, wrote of the “experimenting society” in his article “Reforms as Experiments” urg- ing managers to use data collection and “experiments” to learn how to develop good programs (Campbell, 1969b). He argued that managers should advocate not for their program, but for a solution to the problem their program was designed to address. By advocating for solutions and the testing of them, managers could make policymakers, citizens, and other stakeholders more patient with the difficult process of developing programs to effectively reduce tough social problems such as crime, unemployment, and illiteracy. In an interview describing his post- graduate fellowship learning experiences with Don Campbell and Tom Cook, William Shadish discusses the excitement that fueled the beginning of modern evaluation at that time, noting, “There was this incredible enthusiasm and energy for social problem solving. [We wanted to know] How does social change occur and how does evaluation contribute to that?” (Shadish & Miller, 2003, p. 266).

Graduate Programs in Evaluation Emerge. The need for specialists to conduct useful evaluations was sudden and acute, and the market responded. Congress provided funding for universities to launch new graduate training programs in educational research and evaluation, including fellowship stipends for graduate study in those specializations. Several universities began graduate programs aimed at training educational or social science evaluators. In related fields, schools of public administration grew from political science to train administrators to man- age and oversee government programs, and policy analysis emerged as a growing

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 47

new area. Graduate education in the social sciences ballooned. The number of people completing doctoral degrees in economics, education, political science, psychology, and sociology grew from 2,845 to 9,463, an increase of 333%, from 1960 to 1970 (Shadish et al., 1991). Many of these graduates pursued careers eval- uating programs in the public and nonprofit sectors. The stage for modern pro- gram evaluation was set by the three factors we have described: a burgeoning economy in the United States after World War II, dramatic growth in the role of the federal government in education and other policy areas during the 1960s, and, finally, an increase in the number of social science graduates with interests in evaluation and policy analysis (Shadish et al., 1991).

Evaluation Becomes a Profession: 1973–1989

This period can be characterized as one of increasing development of a distinct field of evaluation through the growth in approaches, programs to train students to become evaluators, and professional associations. At the same time, the sites of evaluation began to diversify dramatically, with the federal government playing a less dominant role.

Several prominent writers in the field proposed new and differing models. Evaluation moved beyond simply measuring whether objectives were attained, as evaluators began to consider information needs of managers and unintended out- comes. Values and standards were emphasized, and the importance of making judgments about merit and worth became apparent. These new and controversial ideas spawned dialogue and debate that fed a developing evaluation vocabulary and literature. Scriven (1972), working to move evaluators beyond the rote applica- tion of objectives-based evaluation, proposed goal-free evaluation, urging evaluators to examine the processes and context of the program to find unintended outcomes. Stufflebeam (1971), responding to the need for evaluations that were more in- formative to decision makers, developed the CIPP model. Stake (1975b) proposed responsive evaluation, moving evaluators away from the dominance of the ex- perimental, social science paradigms. Guba and Lincoln (1981), building on Stake’s qualitative work, proposed naturalistic evaluation, leading to much debate over the relative merits of qualitative and quantitative methods. Collectively, these new conceptualizations of evaluation provided new ways of thinking about eval- uation that greatly broadened earlier views, making it clear that good program evaluation encompasses much more than simple application of the skills of the empirical scientists. (These models and others will be reviewed in Part Two.)

This burgeoning body of evaluation literature revealed sharp differences in the authors’ philosophical and methodological preferences. It also underscored a fact about which there was much agreement: Evaluation is a multidimensional techni- cal and political enterprise that requires both new conceptualizations and new insights into when and how existing methodologies from other fields might be used appropriately. Shadish and his colleagues (1991) said it well when, in recognizing the need for unique theories for evaluation, they noted that “as evaluation matured, its theory took on its own special character that resulted from the interplay among

48 Part I • Introduction to Evaluation

problems uncovered by practitioners, the solutions they tried, and traditions of the academic discipline of each evaluator, winnowed by 20 years of experience” (p. 31).

Publications that focused exclusively on evaluation grew dramatically in the 1970s and 1980s, including journals and series such as Evaluation and Program Planning, Evaluation Practice, Evaluation Review, Evaluation Quarterly, Educational Evaluation and Policy Analysis, Studies in Educational Evaluation, Canadian Journal of Program Evalua- tion, New Directions for Program Evaluation, Evaluation and the Health Professions, ITEA Journal of Tests and Evaluation, Performance Improvement Quarterly, and the Evaluation Studies Review Annual. Others that omit evaluation from the title but highlight it in their contents included Performance Improvement Quarterly, Policy Studies Review, and the Journal of Policy Analysis and Management. In the latter half of the 1970s and throughout the 1980s, the publication of evaluation books, including textbooks, reference books, and even compendia and encyclopedias of evaluation, increased markedly. In response to the demands and experience gained from practicing eval- uation in the field, a unique evaluation content developed and grew.

Simultaneously, professional associations and related organizations were formed. The American Educational Research Association’s Division H was an initial focus for professional activity in evaluation. During this same period, two professional associations were founded that focused exclusively on evaluation: the Evaluation Research Society (ERS) and Evaluation Network. In 1985, these organ- izations merged to form the American Evaluation Association. In 1975, the Joint Committee on Standards for Educational Evaluation, a coalition of 12 professional associations concerned with evaluation in education and psychology, was formed to develop standards that both evaluators and consumers could use to judge the quality of evaluations. In 1981, they published Standards for Evaluations of Educational Programs, Projects, and Materials. In 1982, the Evaluation Research Society devel- oped a set of standards, or ethical guidelines, for evaluators to use in practicing evaluation (Evaluation Research Society Standards Committee, 1982). (These Standards and the 1995 Guiding Principles, a code of ethics developed by the American Evaluation Association to update the earlier ERS standards, will be reviewed in Chapter 3.) These activities contributed greatly to the formalization of evaluation as a profession with standards for judging the results of evaluation, ethical codes for guiding practice, and professional associations for training, learning, and exchanging ideas.

While the professional structures for evaluation were being formed, the markets for evaluation were changing dramatically. The election of Ronald Reagan in 1980 brought about a sharp decline in federal evaluations as states were given block grants, and spending decisions and choices about evaluation requirements were delegated to the states. However, the decline in evaluation at the federal level resulted in a needed diversification of evaluation, not only in settings, but also in approaches (Shadish et al., 1991). Many state and local agencies began doing their own evaluations. Foundations and other nonprofit organizations began emphasizing evaluation. As the funders of evaluation diversified, the nature and methods of evaluation adapted and changed. Formative evaluations that examine programs to provide feedback for incremental change and improvement and to find the links

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 49

between program actions and outcomes became more prominent. Michael Patton’s utilization-focused evaluation, emphasizing the need to identify a likely user of the evaluation and to adapt questions and methods to that user’s needs, became a model for many evaluators concerned with use (Patton, 1975, 1986). Guba and Lincoln (1981) urged evaluators to make greater use of qualitative methods to de- velop “thick descriptions” of programs, providing more authentic portrayals of the nature of programs in action. David Fetterman also began writing about alterna- tive methods with his book on ethnographic methods for educational evaluation (Fetterman, 1984). Evaluators who had previously focused on policymakers (e.g., Congress, cabinet-level departments, legislators) as their primary audience began to consider multiple stakeholders and more qualitative methods as different sources funded evaluation and voiced different needs. Participatory methods for involving many different stakeholders, including those often removed from decision making, emerged and became prominent. Thus, the decline in federal funding, while dramatic and frightening for evaluation at the time, led to the de- velopment of a richer and fuller approach to determining merit and worth.

1990–The Present: History and Current Trends

Today, evaluations are conducted in many different settings using a variety of approaches and methods. Evaluation is well established as a profession and is, as LaVelle and Donaldson remark, “growing in leaps and bounds” in recent years (2010, p. 9). Many jobs are available. Although many evaluators continue to come to the profession from other disciplines, the number of university-based evaluation training programs in the United States grew from 38 in 1994 to 48 in 2008 (LaVelle and Donaldson, 2010). Al- most 6,000 people belong to the American Evaluation Association (AEA) and another 1,800 belong to the Canadian Evaluation Society (CES). In 2005, the CES and AEA sponsored a joint conference in Toronto that attracted 2,300 evaluators, including many members and attendees from other countries. Policymakers and managers in government and nonprofit settings know of, and often request or require, evaluations. For many, evaluation—funding it, managing it, or conducting it—is one of their responsibilities. So, evaluators, at least those in the United States and Canada, are no longer struggling with establishing their discipline. But in the years since 1990, evaluation has faced several important changes that influence its practice today.

Spread of Evaluation to Other Countries

Evaluation has grown rapidly in other countries in recent years. This internation- alization of evaluation has influenced the practice of evaluation as evaluators adapt to the context of their country and the expectations and needs of stakeholders. Today, there are more than 75 regional and national evaluation associations around the world (Preskill, 2008). Major associations include the European Evaluation Soci- ety, the Australasian Evaluation Society, the United Kingdom Evaluation Society,

50 Part I • Introduction to Evaluation

and the African Evaluation Association. The International Organization for Coop- eration in Evaluation (IOCE) was created in 2003 by its 24 members of national or regional evaluation associations, with a mission to “help legitimate and support evaluation associations, societies, and networks so that they can better contribute to good governance, effective decision making, and strengthening the role of civil society” (IOCE, 2003, para 3).

As noted earlier in this chapter, Ray Rist and his colleagues identified the United States, Canada, Germany, and Sweden as countries in the “first wave” of modern evaluation that began in the late 1960s and early 1970s during a period of social experimentation. Evaluation in these first-wave countries was linked to that social experimentation and to program improvement (Rist, 1990). Rist and his colleagues identified a “second wave” of European countries where evaluation started in a different context.3 In these second-wave countries, which included the United Kingdom, the Netherlands, Denmark, and France, evaluation began as an effort to control federal budgets and reduce government spending. The focus of evaluation was more on accountability and identifying unproductive programs than on social experimentation and program improvement. Given its purposes, evaluation in these second-wave countries was often housed centrally, near those who made decisions regarding budgets and priorities. Rist and his col- leagues found that the initial impetus for evaluation in a country often had a strong influence on the subsequent conduct and purposes of evaluation in that country. A more recent evaluation influence in Europe has been the European Union and the evaluation mandates of the European Commission. For many countries in Eastern Europe, responding to these evaluation mandates is their first venture into evaluation.

Evaluation in different cultures and other countries is an exciting venture, not only because evaluation can be beneficial in helping address policy questions and issues in those countries, but also because North American evaluators can learn new methods and organizational approaches from the efforts of those in other countries (Mertens, 1999). As any traveler knows, seeing and experiencing a cul- ture different from one’s own is an eye-opener to the peculiarities—both strengths and constraints—of one’s own culture. Practices or mores that had not been previously questioned are brought to our attention as we observe people or insti- tutions in other cultures behaving differently. Citizens differ in their expectations and beliefs regarding their government, its actions, and what they want and expect to know about their government.4 Ways in which programs are judged, feedback

3The research of Rist and his colleagues focused only on Europe, Canada, and the United States. 4For example, a French evaluator, when interviewed by Fitzpatrick, commented that the mistrust that Americans have of their government creates a fertile ground for evaluation because citizens want to know what the government is doing and what mistakes it is making. He felt French citizens lacked that suspi- cion of government actions and, hence, were less interested in evaluation. Patton, in the interview cited at the end of the chapter, comments on cultural differences between Japan and the United States that had implications for evaluation. In his work in Japan, he observed that blaming or calling attention to mistakes is avoided and, thus, evaluation findings would be handled differently than in the United States.

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 51

is given, or participation is sought differ across cultures and countries. These dif- ferences, of course, have implications for evaluators who must pay attention to the political and cultural context of the evaluation in order to plan and implement a study that will be trusted and used. We believe the twenty-first century will be a time for evaluators in the Western world to learn from the practices of their col- leagues in other countries and that these efforts will both strengthen our own work and spread the culture of evaluation—collecting data to judge programs and form decisions—around the world.

Nonevaluators Take on Internal Evaluation Responsibilities

Another change in evaluation in recent years concerns the number and types of people carrying out evaluation-related tasks. As evaluation expanded, many people— managers, supervisors, and other program professionals—began having responsi- bilities for evaluation as one part of their job. As noted in this history, evaluation has often been conducted by people without specific training in evaluation. Begin- ning in the 1960s when social science researchers began conducting evaluation studies to meet the demand, evaluation has often had to rely on those without specific education or training in evaluation to conduct studies. In earlier years, those people were often social scientists who had training in methodology and research, but were not familiar with evaluation theories and particular concerns about context and use. Social science researchers continue to conduct evaluations today. However, many learn about the discipline of evaluation and supplement their methodological expertise with further reading, training, and attendance at evalua- tion conferences, as the discipline of evaluation grows and becomes better known. New today are the increasing numbers of managers and program staff who lack the methodological training in evaluation and in social science research methods, but are often responsible for internal evaluations (Datta, 2006).

Evaluation in the nonprofit sector provides an excellent example of the extent to which in-house evaluators, typically program managers and staff with other program responsibilities, have also become responsible for major compo- nents of data collection and evaluation in their organizations. More than 900,000 nonprofit and religious organizations deliver the majority of social service pro- grams in the United States (Carman, Fredericks, and Introcaso, 2008). Most of these organizations receive funds from the 1,300 local United Way organizations, and United Way requires these organizations to conduct evaluations of their programs. The United Way approach to evaluation has admirable components, including sig- nificant training, but most of the evaluations, with United Way encouragement, are conducted by existing staff with occasional guidance from external evaluators (Hendricks, Plantz, and Pritchard, 2008). Hendricks et al. (2008), who are otherwise pleased with many elements of the United Way approach, are concerned that the overreliance on current employees who lack evaluation expertise may short- change the organizations when it comes to effective use of the results. Survey

52 Part I • Introduction to Evaluation

studies of evaluators provide further evidence of the increase in numbers of eval- uators who are both internal to the organization and have other responsibilities within the organization. Christie (2003) found that many of the evaluators she surveyed in California were internal and held other, generally management, responsibilities. Many had little or no training in evaluation and were unfamiliar with evaluation theories and approaches.

In education, school districts have been faced with serious budget constraints and many have coped with these fiscal constraints by cutting central office staff, including evaluation departments. Schools, faced with increasing evaluation demands in the current standards-based environment, have had to cope with these demands with fewer evaluation professionals. As a result, teachers and adminis- trators often face additional evaluation responsibilities. The expansion of evaluation has, therefore, had some unintended consequences that have implications for building organizational capacity and for improving education and training.

Many people involved in conducting in-house evaluations have primary professional identifications other than evaluation. They are often not interested in becoming full-time evaluators and, hence, university-based education is not the best option for providing training for these individuals. (See Datta [2006] for her discussion of the need to learn more about the evaluations produced by these practitioners to consider how their training needs can be addressed.) Expanded training opportunities and creative thinking by those in the evaluation field are needed to help these people develop their evaluation skills. Evaluation kits abound, but often focus on basic methodological issues such as designing a survey, and not on critical issues such as carefully defining purpose, involving stakeholders, and considering use.

Although the explosion of employees in organizations conducting evaluation has serious implications for training and for the accuracy, credibility, and use of eval- uation studies, the move to involve other employees of schools and organizations in evaluation also has great advantages. In 2000 and 2001, the conference themes of both presidents of the American Evaluation Association addressed, in different ways, the issue of working with other employees in organizations to improve eval- uation quality and use. Noting the increasing demand for evaluation and, yet, evaluators’ continued struggles in affecting programs and policies, Laura Leviton, the president of AEA in 2000, used her theme of “Evaluation Capacity Building” to discuss ways to build evaluators’ collective capacity to conduct better evaluations. Her suggestions included recognizing and using the strengths of program practi- tioners in program logic and implementation, in organizational behavior, and in the people skills needed to help those within organizations understand and use evaluation (Leviton, 2001). Rather than maintain a distance from managers and others in the program being evaluated as some evaluators have done, Leviton encouraged eval- uators to learn from these people with experience in the organization. James Sanders, our co-author and AEA president in 2001, chose as his theme, “Main- streaming Evaluation.” In his opening remarks, Sanders noted that when he and Blaine Worthen published the first edition of this book in 1973, they began with the observation that “evaluation is one of the most widely discussed but little used

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 53

processes in today’s systems” (2002, p. 253). He notes that the status of evaluation has improved but that it is still not second nature to organizations. Explaining his concept of mainstreaming evaluation, Sanders said, “Mainstreaming refers to the process of making evaluation an integral part of an organization’s everyday operations. Instead of being put aside in the margins of work, evaluation becomes a routine part of the organization’s work ethic if it is mainstreamed. It is part of the culture and job responsibilities at all levels of the organization” (2002, p. 254). Today, with much attention being paid to evaluation and accountability and with many managers and other employees playing a part in conducting evaluations, we have that opportu- nity. As noted earlier, the spread of evaluation responsibilities has its risks, but it also has potential benefits to evaluation and to the organization. We can cope with the risks by expanding training opportunities and by making use of partnerships between internal and external evaluators, as discussed in Chapter 1. Meanwhile, the fact that many employees of organizations, schools, and other agencies who do not identify themselves as evaluators are now involved in evaluation presents an opportunity for evaluation to become part of the culture of the organization. But, this will succeed only if we proceed carefully. Just as social scientists who came to evaluation in the 1960s often erred in viewing evaluation as simply the application of research methods in the field, today’s busy managers or professionals who are conducting evaluation while balancing other responsibilities in their organization may view evaluation as simply collecting some data and reporting it to others. Sanders’ concept of mainstreaming evaluation includes carefully crafting the pur- poses of evaluation for organizational learning and use.

A Focus on Measuring Outcomes and Impact

Another major trend that emerged in evaluation during the 1990s is the empha- sis on measuring outcomes and using evaluation for purposes of accountability. The United States began evaluation in what Ray Rist and his colleagues (1999) called “first wave” evaluation with a focus on innovative experimentation and collecting data to improve programs and test new interventions. However, in many ways, the United States has transformed to a “second wave” country with a focus on evaluation for accountability and, at least the claim is, for using results to make summative and budgetary decisions about program continuation and expansion. The outcomes focus began in the early 1990s and continues unabated today.

In education, the foundation for the current standards-based outcome focus began in 1983 with the publication of A Nation at Risk (National Commission on Excellence in Education, 1983). That report expressed serious concerns about the state of education in the United States and provided the impetus for change. The message, which continues today, was that education in the United States was broken and that the federal government needed to become more involved to fix it. The na- ture of that action was not determined for a few years, but gradually a federal role with a focus on accountability emerged. Historically, local school districts and, to a lesser extent, the states have been responsible for schools in the United States. Therefore, an increased federal role in an issue that had historically been based on

54 Part I • Introduction to Evaluation

local community needs was somewhat controversial. However, in 1989, the National Governors Association met with then-President George H.W. Bush at the President’s Educational Summit with Governors and endorsed national goals for education while still maintaining state and local control. Later, President Clinton, who had led the National Governors Association in meeting with President Bush at the 1989 summit, greatly increased both the role of the federal government in educa- tion and the emphasis on standards with six major pieces of legislation that he signed in 1994. Press releases indicted that “not since the 1960s has so much sig- nificant education legislation been enacted” and that the six acts “promise to alter the landscape of American education in important and lasting ways” (http://www .ed.gov/PressReleases/10-1994/legla.html). The legislation included the Improving America’s Schools Act (IASA), an amendment to the old 1965 Elementary and Sec- ondary School Act that had marked the beginning of modern evaluation, and the Goals 2000: Educate America Act. Among other things, these acts provided finan- cial support and incentives for states to develop high standards for academic achievement, to guide learning, and to monitor schools’ progress toward achieving these standards. By the end of 1994, 40 states had applied for planning funds to begin developing standards. The argument was that local authority would be main- tained by having states develop their own standards; the federal government’s role was to require standards and to provide fiscal incentives for doing so. In 2001, un- der President George W. Bush’s leadership, Congress passed legislation that has been the focus of educational reform ever since—the No Child Left Behind (NCLB) Act. This legislation greatly increased the federal role by establishing more requirements for student performance, testing, and teacher training, and by adding fiscal sanc- tions and corrective action when goals were not achieved.5 Of course, standards and methods of assessment vary greatly across the 50 states, but, in each state, standards and their means of assessment serve as the focus for educational reform and much of educational evaluation today. Lauren Resnick writes, “Test-driven accountability has become a reality [in education]” (2006, p. 33), adding that “enormous weight is placed on tests and accountability formulas (2006, p. 37).”6

These policies have greatly changed the role of evaluation in public schools in the United States. Standards and their assessment receive much public attention and, in most states, are a significant driver of educational policies, practices, and evaluation. Evaluation in K–12 education in the United States today focuses on several related issues: developing appropriate means for assessing students and their progress, iden- tifying successful schools and schools that are failing, and identifying practices that can help bring students’ performance up to the standards. As schools that do not meet the standards can be closed or faculty and administrators changed, the evaluation focus is both summative (Should a school continue or not? Be re-staffed or closed?) and form- ative (Which students in a given school are failing to meet a standard? What have

5President Obama has now proposed changing No Child Left Behind, but no specific legislation has yet been passed on the issue. 6Resnick’s special issue of Educational Measurement: Issues and Practice focuses on case studies of four states and how standards and measures of assessment have been put into practice and used.

http://www.ed.gov/PressReleases/10-1994/legla.html

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 55

been their experiences? What are the experiences of similar students who succeed? What types of interventions may be most appropriate to help those students who have not met the standard?). Such evaluation efforts can, of course, improve schools, but the focus on standards, and their assessment, also holds risks. It has changed the focus of evaluation in education to standards and accountability at a time when resources are scarce and many school evaluation efforts are able to focus on little else.

Reactions and Policy Statements. In recent years, the American Evaluation As- sociation (AEA) has taken its first policy positions on the issues of testing and education accountability. In 2000, AEA President James Sanders, our co-author, appointed a Task Force on High Stakes Testing in K–12 Education to review the re- search and to develop a statement of the organization’s position. The AEA Position Statement on High Stakes Testing in PreK–12 Education was passed by the AEA Board in 2002 and can be found on the AEA web site at www.eval.org/hst3.htm. The statement summarizes research on the risks and benefits of high stakes test- ing, concluding that “evidence of the impact of high stakes testing shows it to be an evaluative practice where the harm outweighs the benefits” (2002, p. 1). The Task Force wrote:

Although used for more than two decades, state mandated high stakes testing has not improved the quality of schools; nor diminished disparities in academic achieve- ment along gender, race, or class lines; nor moved the country forward in moral, social, or economic terms. The American Evaluation Association (AEA) is a staunch supporter of accountability, but not test driven accountability. AEA joins many other professional associations in opposing the inappropriate use of tests to make high stakes decisions. (2002, p. 1)

The Task Force presents other avenues for improved evaluation practice, including better validation of current tests for the purposes for which they are used, use of multiple measures, and consideration of a wide range of perspectives, in- cluding those of professional teachers to assess student performance. In 2006, the AEA Board approved a second policy statement on the issue of educational account- ability (see http://www.eval.org/edac.statement.asp.) This statement expresses con- cerns with three major issues:

Overreliance on standardized test scores that are not necessarily accurate measures of student learning, especially for very young and for historically underserved stu- dents, and that do not capture complex educational processes or achievements;

Definitions of success that require test score increases that are higher or faster than historical evidence suggests is possible; and

A one-size-fits-all approach that may be insensitive to local contextual variables or to local educational efforts (American Evaluation Association, http:// www.eval.org/edac.statement.asp, 2006, p. 1)

This AEA policy statement encourages use of multiple measures, measures of individual student progress over time, context-sensitive reporting, use of data to con- sider resource allocations for teachers and schools, accessible appeals processes, and public participation and access.

http://www.eval.org/hst3.htm

http://www.eval.org/edac.statement.asp

56 Part I • Introduction to Evaluation

Choice in Education. Another factor influencing evaluation in the educational environment today is school choice. Choice is represented in many different ways across the country. Some cities (Washington, DC. and Milwaukee, Wisconsin being prominent examples) have had voucher and choice systems for some time and much research has been conducted on these systems (Buckley & Schneider, 2006; Goldring & Shapira, 1993; Hoxby, 2000). In many school districts, parents now are able to send their child to another public school within the district or, in some cases, outside the district. Districts across the United States have many different choice plans, from traditional neighborhood schools to magnet schools, charter schools, and, in some areas, vouchers to private schools. The choice environment in K–12 educa- tion has, of course, influenced evaluation practice. The theory of choice is based on the market theory that competition improves performance; therefore, giving parents a choice of schools will inspire schools to become more competitive, which will improve school performance and student achievement (Chubb & Moe, 1990).

In some districts, evaluation plays a role in helping educational administra- tors and teachers in individual schools or groups of schools to consider how they want to market their school to recruit other students. New programs emerge; old ones are put aside. At minimum, schools struggle with predicting their enrollments and planning to staff their schools adequately. In addition, school administrators and teachers work to develop and implement new programs designed to improve learning or draw more, and sometimes, better students. Such choices, which are new to public school administrators, present challenging decision demands. What programs, curricula, or interventions will improve the schools’ scores on standards? What programs, curricula, or interventions will attract more students to the school? Traditional evaluation methods can, and are, used to help teachers and adminis- trators deal with such decisions and provide opportunities for evaluation to serve new uses. For example, Fitzpatrick has been involved in studies that examine how low-income parents who are perhaps most likely to lose out in choice environ- ments have learned about school choice and made choices for their children (Teske, Fitzpatrick, & Kaplan, 2006). These studies are designed to help school districts better inform parents about choices. In this environment, there is much that evaluators can do to help teachers and administrators adapt to change and improve learning. (See Rodosky and Munoz [2009] for an example of how one urban school district manages its evaluation responsibilities for accountability.)

Performance Monitoring in Other Governmental Sectors. Just as education was becoming concerned with standards, their assessment, and evaluation for account- ability in the late 1990s and early part of this century, other government entities and nonprofit organizations also began focusing on performance monitoring and evaluating outcomes.7 The early influences in the trend to measure outcomes in

7Although the history in these other arenas is a little different from that of education, the theory and approach behind the focus on outcomes in both education and other sectors are the same. Therefore, it is helpful for those in both arenas, education and agencies that deliver other services, to be aware of the similar pressures to measure outcomes and the forces that influence each.

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 57

government came from New Public Management, a movement in public admin- istration and management, and the related call to “reinvent government.” In 1992, David Osborne and Ted Gaebler authored the popular and influential book, Reinventing Government, which urged public policymakers and managers to build on the successes of the private sector that was then experimenting with re-engineering and Total Quality Management (TQM). Osborne and Gaebler ad- vocated an entrepreneurial, consumer-driven government in which managers viewed citizens as “consumers” and government managers became more entre- preneurial in developing and experimenting with programs, policies, and inter- ventions.8 Reinventing government was not without its critics. (See, for example, deLeon and Denhardt [2000] and their concerns with how the economic-based, market model of reinventing government and viewing citizens as consumers might neglect the broader public interest.) However, reinventing government and its principles was widely implemented in many state and local governments as well as at the federal level. During the Clinton administration, Vice-President Al Gore authored the National Performance Review, a government report to guide change, based on Osborne and Gaebler’s principles of reinvention (National Performance Review, 1993). The report and its recommendations were intended to encourage public managers to be entrepreneurial to deal with budget constraints and to become more efficient but, at the same time, to meet citizen needs.

An important part of reinventing government was, of course, accountability or collecting data to see what worked and what didn’t. Therefore, the Clinton adminis- tration also proposed the Government Performance Results Act (GPRA) to address concerns about accountability with these new initiatives (Radin, 2006). (See OBM Watch [2000] http://www.ombwatch.org/node/326 for more on GPRA.) GPRA was an example of performance monitoring measurement being advocated and imple- mented by several countries, including Canada and Australia in the late 1990s (Perrin, 1998; Winston, 1999). Joseph Wholey, a prominent leader in evaluation in the U.S. government in the 1970s, was involved in the development of GPRA and was a leader in performance measurement (Wholey, 1996). Passed in 1994 with implementation beginning in 1997, GPRA required all federal agencies to produce a strategic plan and to measure progress toward meeting the goals and objectives delineated in the plan with performance data. Thus, GPRA was the first major federal government mandate to measure program or policy outcomes. Government employees across the country became well acquainted with GPRA and its requirements as different levels of gov- ernment responded to the requirements to identify and measure outcomes.

8Note the similarity between the theories of reinventing government and the theories concerning school choice. Both emerge from concepts about the market and the “success” in the private sector and a be- lief that public institutions can become more successful by becoming more like the private sector or busi- nesses. Managers and school principals become “entrepreneurs” and clients, parents, and students become “consumers” or “customers” who are making choices and decisions about services. Given the economic failures of the private sector seen in the United States and around the world in 2008 and 2009, we have chosen to use quotation marks around the word success because economists and citizens are now not so certain about the successes of the private sector. Entrepreneurial behavior, without regula- tion, appears to have prompted the housing crisis and many problems with banks and security firms.

http://www.ombwatch.org/node/326

58 Part I • Introduction to Evaluation

9OMB’s advocacy of randomized control trials will be discussed in Chapter 15 on design. Randomized experiments are certainly one way of establishing causality, but, along with the American Evaluation Association, we believe there are many established approaches to determining causality and the one selected should be appropriate for the context of the program and the judgments and decisions to be drawn from the evaluation.

The Bush administration continued the emphasis on performance-based man- agement and measuring outcomes with its own measure to replace GPRA, the Pro- gram Assessment Rating Tool (PART) (OMB, 2004). PART is a 25-item questionnaire designed to obtain information on program performance. Scores are calculated for each program based on agencies’ responses, and one-half of the PART score is based on results or outcomes. Each year, the Office of Management and Budget (OMB) ob- tains PART scores from 20% of all government programs; programs are required to complete PART on a rotating basis, so that all programs are reviewed within five years. By 2008, 98% of federal programs had completed PART forms and been re- viewed. (See http://www.whitehouse.gov/omb/part/.) Just as scores on standards- based tests can influence the staffing and even the continuation of individual schools, PART scores are intended to be used to make budgetary decisions. As in education, instances of the dramatic use of PART scores to slash funding for programs are rela- tively rare, but the interest in outcomes and results has been clearly established (Gilmour & Davis, 2006).

Outcomes Measurement in the Nonprofit Arena. Schools and other public orga- nizations have not been the only ones to move to an outcomes orientation in recent years. Nonprofit organizations, as well, now focus their evaluation activities on assessing and reporting outcomes. As mentioned earlier, United Way influences much of the evaluation in the nonprofit sector. Foundations and other philan- thropic organizations that fund nonprofits also influence evaluations in this arena through their grant requirements. These funding agencies have encouraged non- profit organizations to measure their outcomes. United Way’s evaluation system is called the Outcomes Measurement System and, as the name suggests, the focus is on outcomes. Other elements of the system include developing logic models to link inputs, activities, outputs, and outcomes; encouraging quantitative and repeated measures of outcomes; and emphasizing use of results for program improvement. The activities are not labeled as “evaluation” by United Way but, instead, are con- sidered “a modest effort simply to track outcomes” (Hendricks et al., 2008, p. 16). However, the activities generally take the place of traditional evaluation efforts. The United Way model has influenced the nonprofit field broadly. There are, however, a couple of noteworthy differences between the United Way model and the out- comes focus in education and other public agencies: (a) in the United Way model accountability is considered secondary to the purpose of program improvement; (b) expectations for measuring outcomes are generally more realistic than require- ments for public-sector agencies. For example, the Office of Management and Budget, in discussing evidence for outcomes, strongly encourages use of Randomized Control Trials, or RCTs (OMB, 2004).9 United Way, recognizing that many nonprofit

http://www.whitehouse.gov/omb/part

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 59

human service organizations lack resources to conduct sophisticated evaluations of all outcomes, prefers to view the process as performance monitoring of outcomes without attempts to clearly establish causality. Nonprofit organizations, like many public sector organizations, had typically reported inputs and activities to funders. The move to assess and monitor program outcomes can be a step in the right direction in providing a more comprehensive assessment of a program.

Considering Organizational Learning and Evaluation’s Larger Potential Impacts

A related trend that has influenced evaluation in the early part of the twenty-first cen- tury is a discussion of the role of evaluation in organizational learning. People in many different, but related, fields—public management, adult learning, workplace learning, organizational management and change, educational administration, leadership, and evaluation—are all writing about organizational learning and looking for ways to build organizations’ capacity to learn and manage in difficult times. Senge’s 1990 book on the learning organization introduced many to the theories and research in this area and prompted managers, policymakers, and others to begin thinking more about how organizations learn and change. Since evaluators are concerned with getting stake- holders within organizations to use evaluation information, obviously the concept of organizational learning was important. Preskill and Torres’ book, Evaluative Inquiry for Learning in Organizations (1998), was one of the first to bring these concepts to the attention of evaluators through their proposal for evaluative inquiry. But other eval- uation theories and approaches and the experiences of evaluators in the field were also converging to prompt evaluators to think more broadly about the role of evaluation in organizations and the tasks evaluators should perform. As early as 1994, Reichardt, in an article reflecting on what we had learned from evaluation practice, suggested that evaluators should become more involved in the planning stages of programs, because the skills that evaluators brought to the table might be more useful in the beginning stages than after programs were completed. Evaluators’ increasing use of logic models to identify the focus of an evaluation and to put that focus in an appropriate context made program stakeholders more aware not only of logic models, but also of evalua- tive modes of thinking (Rogers & Williams, 2006). Patton (1996) coined the term “process use” to refer to changes that occur in stakeholders, often program deliverers and managers, who participate in an evaluation. These changes occur not because of specific information gained from the evaluation results, but, instead, because of what they learned from participating in the evaluation process. The evaluation process itself prompts them to think in new ways in the future. This learning may include some- thing as direct as using logic models to develop programs or being more comfortable and confident in using data to make decisions.

Thus, the concept of learning organizations, introduced from other disciplines, and evaluators’ reflections and observations on their role in organizations and their potential impact converged and prompted evaluators to move beyond the traditional focus on instrumental use of results to consider broader uses of evaluation and ways to achieve those uses more effectively.

60 Part I • Introduction to Evaluation

All the changes we have discussed here—standards-based movements, the focus on outcomes, and the government’s and United Way’s focus on employees collecting data using on going internal systems—were also designed to change the culture of organizations and to improve organizational learning and decision making. These changes have often been initiated by people outside evaluation, such as policymakers; public administrators; and people from management, budgeting, or finance. The evaluators involved in creating performance monitor- ing systems such as GPRA or United Way’s focus are often from different schools of evaluation than those who are advocating organizational learning through empowerment evaluation or evaluative inquiry. Nevertheless, the directions of all these changes are to modify and improve organizations’ ways of learning and making decisions. Some methods are likely to be more successful than others, but the overwhelming change in this period is for evaluators to begin thinking of evaluation in broader terms. In the past, evaluators and their clients have tended to see evaluations as discrete studies to be used for a particular problem or policy, rather than viewing evaluation as a continuing system for learning and one part of many systems that provide information and learning opportunities for organizations.

Individual, important evaluation studies will continue to take place. But evaluators have moved from a comparatively narrow focus on methodological issues in the early years to today’s broader consideration of the role of evaluation in organizations. Evaluators have recognized that they need to know more about organizational culture, learning, and change, drawing from other disciplines in addition to their knowledge of evaluation theories and practices. They need to identify ways to create an openness to evaluative information and to improving organizational performance, not just the performance of an individual program or policy. As evaluators think of organizational change and learning, they become involved in evaluation-related activities such as planning, performance monitor- ing, and even fiscal and budgetary decisions. They recognize the need for cooper- ation across departments or systems that address these related issues so that those gathering and providing information are not working at cross-purposes, but, in- stead, are collaborating and learning from each other about the information they collect and the methods they use to disseminate information and get it used. Preskill and Boyle (2008) write about the need for organizations to develop “an integrated knowledge-management system” (p. 455) that is aligned with other in- formation systems in the organization. Such systems are essential for many rea- sons, but reflect the need for planning across systems to maintain information for learning and decisions in the future.

The role of evaluation vis-à-vis schools, organizations, government agencies, and funding sources is changing and will continue to change due to the trends we have discussed here. Evaluation is expanding and becoming more important in the twenty-first century as the world faces critical economic and social challenges. Policymakers, managers, and the public now expect and demand evaluative information, though they may call it by different names. As more people become in- volved in evaluation within organizations, evaluators will play a critical role in

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 61

TABLE 2.1 Stages in the Development of Evaluation

Period Studies/References Characteristics

Pre-1800 Sailors eating limes Most judgments based on religious, political beliefs

1800–1940 Commissions

Mass. reports on schools

Thorndike and Tyler in ed.

Taylor and efficiency

Accreditation (Flexner)

Measurement and use of experts begins

Focus on public health, education

Formal testing begins in schools

Social scientists move to government

Studies explore social science issues

1940–1963 WW II research on military

National Defense Ed. Act (NDEA)

Cronbach (1963)

Social science research methods increase

Evaluations in schools increase to compete with the Soviet Union

Evaluation expands to many areas

Methods continue to rely on social science

1964–1973 ESEA of 1965

Head Start Evaluation

Great Society Programs

Campbell and Stanley (1966)

Stufflebeam and CIPP (1971)

Stake and Responsive Evaluation (1967)

First mandates for evaluation with Great Society programs

A period of social experimentation

Texts and articles in evaluation emerge

Theorists develop first models

Graduate programs in evaluation begin

1974–1989 Joint Committee Standards (1981)

Utilization-Focused Evaluation (Patton, 1978)

Naturalistic Evaluation (Guba and Lincoln, 1981)

Professional associations, standards, and ethical codes developed

Federal support for evaluation declines

Evaluation approaches and settings diversify

1990–present Empowerment Evaluation (Fetterman, 1994)

AEA Guiding Principles (1995)

United Way Outcomes Measurement System (1996)

Participatory models (Cousins and Whitmore, 1998)

Third Edition—Joint Committee Standards (2010)

Evaluation spreads around the globe

Participative and transformative approaches

Theory-based evaluation

Ethical issues

Technological advances

New people conducting evaluation

Outcomes and performance monitoring

Organizational learning

helping plan systems, build internal capacity, and use methods and approaches that will allow evaluation, or the collection of information to inform and make judgments, to achieve organizational learning.

Table 2.1 summarizes some of the historical trends we have discussed here.

62 Part I • Introduction to Evaluation

Discussion Questions

1. How did the early years of evaluation, before 1965, affect how we think about and practice evaluation today?

2. The Elementary and Secondary Education Act of 1965 (ESEA) and many Great So- ciety programs required that agencies receiving funding submit evaluation reports documenting program results. Discuss the effect of requiring evaluation reports, the impact this mandate had on modern program evaluation, and the problems with both evaluations and evaluators this mandate brought to the surface. What were some important characteristics of evaluation during this period?

3. Since the 1990s, many managers and professionals within organizations have assumed performance monitoring and evaluation responsibilities. What are the strengths and weaknesses of this change? Contrast the knowledge and skill these people bring to evaluation with those of the social scientists who performed many of the mandated evaluations in the 1960s and 1970s.

4. Which of the recent trends we described do you think will have the most impact on evaluation in the future? Why?

Major Concepts and Theories

1. Commissions to report on specific problems, objective tests, and accreditations were among the early forms of evaluation. During the Depression, social scientists began working for the federal government to advise it on ways to cure social ills and improve the economy.

2. The Russians’ launch of Sputnik I created unease in the United States about the ef- fectiveness of techniques used to teach math and science to American students. Congress passed the National Defense Education Act (NDEA) of 1958, which began much evalu- ation in the educational arena.

3. During the 1960s and 1970s, with the Great Society legislation of the Johnson ad- ministration, the federal government began mandating evaluation in many education and social settings. This era of social experimentation represented the first major phase in the growth of evaluation in the United States.

4. The growth of evaluation spurred the first efforts to train and educate profession- als specifically to conduct evaluations. Different evaluation theories, models, and con- cepts to characterize and guide evaluation work began to emerge.

5. The profession became more fully established with the creation of professional as- sociations such as the American Evaluation Association, standards for evaluation, and codes of conduct.

6. The field expanded its methods to include more qualitative approaches and dis- cussions of how evaluators can ensure that evaluation is used by many diverse groups.

7. Since 1990, several trends have influenced evaluation, including its spread to many different countries, more managers and professionals within the organization perform- ing evaluation tasks, a focus on measuring outcomes, and consideration of ways evalua- tion can influence organizational learning. The tasks for evaluation begin to merge with other areas, including performance monitoring and planning.

Chapter 2 • Origins and Current Trends in Modern Program Evaluation 63

Application Exercises

1. What do you see as the critical events and themes in the history of evaluation? How did they shape how people in your field view evaluation? How do people in your field approach an evaluation study?

2. Read one of the interviews cited in the “Suggested Readings” and discuss how this person’s experience in the early years of evaluation influenced the field today. How did this influence how you think about evaluation?

3. How has performance measurement or standards-based education influenced work in your school or organization? Are these evaluation measures useful for your organization? For consumers? Why or why not?

4. How does the culture of your organization support organizational learning? How does it support evaluation?

5. Does your organization measure outcomes? Was the focus on outcomes prompted by a mandate, or did your organization choose this focus? How has examining outcomes affected your organization? Its learning?

Suggested Readings

Madaus, G. F., & Stufflebeam, D. L. (2000). Program evaluation: A historical overview. In D. L. Stufflebeam, G. F. Madaus, & T. Kellaghan (Eds.), Evaluation models: Viewpoints on educa- tional and human services evaluation. Boston: Kluwer-Nijhoff.

Mark, M. (Ed.). (2002). American Journal of Evalua- tion, 22(3). This issue contains 23 articles by leaders in the evaluation field on the past, pres- ent, and future of evaluation. It is a follow-up to the 1994 issue of Evaluation Practice, 15(3), edited by M. Smith, in which different contrib- utors considered the past, present, and future of evaluation.

In 2003, the Oral History Project Team, con- sisting of Jean King, Melvin Mark, and Robin Miller, began conducting interviews with people who were in the field in the United States in the early years. These interviews were intended to “capture the pro- fessional evolution of those who have contributed to the way evaluation in the United States is understood and practiced today” (2006, p. 475). They make for interesting and exciting reading in conveying the na- ture of evaluation in its early years and its impact on the practice of evaluation today. The interviews are listed in the column to the right. We encourage you to read some of them to gain some insight.

Datta, L. E., & Miller, R. (2004). The oral history of evaluation Part II: The professional develop- ment of Lois-Ellin Datta. American Journal of Evaluation, 25, 243–253.

Patton, M. Q., King, J., & Greenseid, L. (2007). The oral history of evaluation Part V: An interview with Michael Quinn. American Journal of Eval- uation, 28, 102–114.

Sanders, J., & Miller, R. (2010). The oral history of eval- uation. An interview with James R. Sanders. American Journal of Evaluation, 31(1), 118–130.

Scriven, M., Miller, R., & Davidson, J. (2005). The oral history of evaluation Part III: The profes- sional evolution of Michael Scriven. American Journal of Evaluation, 26, 378–388.

Shadish, W., & Miller, R. (2003). The oral history of evaluation Part I: Reflections on the chance to work with great people: An interview with William Shadish. American Journal of Evalua- tion, 24(2), 261–272.

Stufflebeam, D. L., Miller, R., & Schroeter, D. (2008). The oral history of evaluation: The profes- sional development of Daniel L. Stufflebeam. American Journal of Evaluation, 29, 555–571.

Weiss, C. H., & Mark, M. M. (2006). The oral history of evaluation Part IV: The professional evolution of Carol Weiss. American Journal of Evaluation, 27, 475–483.

Political, Interpersonal, and Ethical Issues in Evaluation

Orienting Questions

1. Why is evaluation political? What are some of the actions an evaluator can take to work effectively in a political environment?

2. Why are communication skills important in an evaluation?

3. What are some of the key standards by which we judge a good evaluation?

4. What are some of the important ethical obligations of an evaluator?

5. What are some of the sources of bias that can affect an evaluation? How might such biases be minimized?

Before we begin introducing you to the different approaches to evaluation and the technical skills for actually conducting an evaluation, it is important to first discuss some fundamental issues that influence all of evaluation practice. Eval- uation is not just a methodological and technical activity. Important as method- ological skills are to good evaluation, those skills are often overshadowed by the political, interpersonal, and ethical issues that shape evaluators’ work. Many a good evaluation, unimpeachable in all technical details, has failed because of in- terpersonal insensitivity, poor communication, ethical breaches, or political naïveté. Clients have certain expectations about evaluation. Sometimes these expectations are accurate; sometimes they are not. Evaluators need to listen and observe carefully to learn those perspectives and to understand the political en- vironment in which the evaluation is taking place. Stakeholder groups have dif- ferent perspectives, different interests, and different concerns about the program

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 65

and about the evaluation. Evaluators must be skilled in human relations and communication to work with different groups, to facilitate their communication as appropriate, and to make choices about how the evaluation meets the needs of different groups, all within a political context where different groups are struggling for different resources.

Evaluators cannot afford to content themselves with polishing their tools for collecting, analyzing, and reporting data. They must consider how to deal with pressures to supply immediate data or with the misuse of results. They must consider ways to minimize fears or misunderstandings about evaluation, the means for involving different groups in the evaluation and, then, ways to bal- ance their interests and needs. Evaluators need to think about how evaluation reports will be received by different stakeholders; whether the results of the evaluation will be suppressed, misused, or ignored; and many other interper- sonal and political issues. Ignoring these issues is self-defeating, because human, ethical, and political factors pervade every aspect of an evaluation study. It is folly to ignore them, labeling them as mere nuisances that detract evaluators from important methodological tasks. Political, ethical, and human factors are present in every program evaluation, and moving ahead without considering them will lead to a poor evaluation regardless of the technical merits of the study. Recall our discussion of the differences in evaluation and research in Chapter 1. Evaluators are working to make an impact on real people, organiza- tions, and societies. To do so, they must not only collect good data, but they must also see that intended audiences are open to using or being influenced by the data. This can be a challenging task!

In this chapter, we deal with three important, interrelated topics: (1) the polit- ical context of evaluation; (2) communication between the evaluator and others involved in the study or the program; and (3) ethical considerations and potential sources of bias in evaluation.

Evaluation and Its Political Context

Was it mere naïveté that accounted for the initial failure of evaluation researchers to anticipate the complexities of social and political reality? These researchers [evaluators] were mentally prepared by the dominant Newtonian paradigm of social science for a bold exploration of the icy [unchanging] depths of interplane- tary space. Instead, they found themselves completely unprepared for the tropical nightmare of a Darwinian jungle: A steaming green Hell, where everything is alive and keenly aware of you, most things are venomous or poisonous or otherwise dangerous, and nothing waits passively to be acted upon by an external force. This complex world is viciously competitive and strategically unpredictable because [evaluation] information is power, and power confers competitive advantage. The Darwinian jungle manipulates and deceives the unwary wanderer into serving myriads of contrary and conflicting ends. The sweltering space suits just had to come off. (Sechrest & Figueredo, 1993, p. 648)

66 Part I • Introduction to Evaluation

This colorful portrayal of evaluators’ first forays into the complex and unpre- dictable environment in which programs are managed and evaluated underscores a critical point: Evaluators work in a political environment. Evaluation itself is a political act—and the professional evaluator who prefers to eschew “politics” and deal only with technical considerations has made a wrong career choice.

From the beginning of modern-day evaluation, evaluators have written about the political nature of the activity. Suchman (1967), Weiss (1973), and Cronbach and his colleagues (1980) all emphasized the political nature of eval- uation, underscoring the fact that evaluation of publicly supported enterprises is inextricably intertwined with public policy formulation and all the political forces involved in that process. However, as Sechrest and Figueredo’s descrip- tion at the beginning of the chapter so vividly indicates, researchers moving into the political arena to conduct evaluations during its time of growth in the United States in the 1970s were unaware of the implications that working in a political environment had for their methodological work. (See also Datta and Miller [2004], and Weiss and Mark [2006] for their descriptions of these early evaluations.)

Today, at least partly because the field has had time to mature and gain more experience in conducting evaluations and to consider the factors that influence their success, evaluators are much more aware that they work in a po- litical environment. Nevertheless, perhaps because the training of evaluators tends to emphasize methodology, evaluators continue to be surprised at the political context of their work and a little unsure of what to do in it (Chelimsky 2008; Leviton 2001). Another explanation for at least U.S. evaluators’ naïveté about the political world may rest with the disciplines they studied. A study of members of the American Evaluation Association found that the most common fields of study for U.S. evaluators were education and psychology (American Evaluation Association, 2008). Unlike European evaluators (Toulemonde, 2009), few evaluators in the United States were trained in the fields of political science or economics and, therefore, consideration of politics and the political context may be relatively new to them. Shadish, a leader in evaluation theory and methodology who was trained in psychology, remarks on his coming to un- derstand that politics played an important role in evaluation (Shadish & Miller, 2003). He tells of his surprise years ago that people did not choose to adopt a program that had been proven to be quite successful. The occasion prompted him to read and then write an article on policymaking in the American Psycholo- gist, the leading psychology journal (Shadish, 1984). He notes that in preparing the article he read “Politics and Markets” by Charles Lindblom, an esteemed political scientist, along with some other important works in economics and political science, and “all of a sudden I realized that the world didn’t work around what was effective. It worked on other matters entirely—on politics and economics” (Shadish & Miller, 2003, p. 270).

In this section, we will discuss the reasons why evaluation is political and the nature of that political environment. Then, we will provide a few suggestions for how evaluators can work effectively in a political world.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 67

How Is Evaluation Political?

The term “politics” has been applied so broadly to so many different phenomena that it has all but lost its meaning. It has come to stand for everything from power plays and machinations within a school or organization to political campaigns or relations among governmental agencies. The Merriam-Webster Dictionary reflects these different meanings as it defines politics variously as

• “the art or science concerned with government. . . .” • “the art or science concerned with guiding or influencing governmental

policy” • “competition between competing interest groups or individuals for power or

leadership” • “the total complex of relations between people living in society” (Merriam-

Webster, 2009)

So, how is the context of evaluation political? It is political in each of the ways cited in this definitions! Evaluation is most often concerned with governmental programs, whether they are programs funded or operated at the international, national, state, or local level.1 At the international level, the European Commis- sion, the governing body of the European Union, has mandated cross-country evaluations in Europe and, as a result, has introduced evaluation to many European countries, particularly those in Eastern Europe. In the United States, as we noted in the previous chapter, modern evaluation began through mandates from the federal government during the 1970s, but now is actively conducted at all levels. In addition, evaluation is quite active in state departments of education and in local school districts.

Evaluation is, of course, concerned with “guiding or influencing govern- ment policy,” the second definition, but perhaps of even more importance, eval- uators are working with individuals and groups of stakeholders who are also concerned with guiding or influencing governmental policy. These stakeholders want to influence government policy for many reasons, including helping their constituents and improving government and society. However, one reason for their interest in influencing government policy concerns the third definition: These stakeholders are competing with each other for resources, power, and leadership. Evaluations serve executive and legislative decision makers who make decisions about funding programs; about continuing, expanding, or cut- ting programs; and about policies that influence those programs. Evaluations

1In the United States, many evaluations take place in nonprofit organizations, which, by definition, are nongovernmental organizations. Nevertheless, we will consider these organizations governmental, or political, for the sake of this discussion because in the last few decades, as the U.S. government moved to privatization, many social services that had previously been delivered by government agencies were contracted out to nonprofit organizations. These government contracts are a large part of what prompts nonprofit organizations to conduct evaluations, and their interaction with government agencies places them in a similar political context.

68 Part I • Introduction to Evaluation

also serve program managers and other stakeholder groups who are competing with other groups for funding, for scarce resources, and for leadership in devel- oping and implementing interventions for solving societal problems. Policymak- ers and managers, and other stakeholders, are competing for resources, power, and leadership, and evaluation is a powerful tool for them to use in arguing for resources for their group or program. Thus, evaluation is part of the political system and operates within a political context.

Finally, of course, evaluations take place in organizations where complex relationships exist among many groups—in a school among parents, teachers, students, principals, and the central office; in social welfare departments among clients, social workers, managers, and policymakers. Evaluations are political be- cause even the most basic evaluation can upset or change these relationships. The evaluator may include different groups in the decision making about the evaluation, the data collection may prompt stakeholders to reveal beliefs or attitudes they had not considered or had not voiced, and the results often illus- trate the multiple ways in which the program is viewed and, of course, its suc- cesses and failures. Thus, evaluation work, in itself, is political.

Recall that the very purpose of evaluation is to make a judgment about the merit or worth of a program or policy. In this way, evaluation differs from research. Evaluation is not solely the collection of data using social science research methods. Instead, it involves making a judgment about the quality of the thing being stud- ied. As such, evaluation is highly political. Researchers do not make a judgment; they draw conclusions. Evaluators, however, make a judgment. That judgment may be about a part of a program, as often occurs in formative evaluation, or about the program or policy as a whole to assist in summative decisions. But moving from data to judgment also moves evaluators into the political realm. Further, evaluative judgments often include recommendations for change and such changes are political. These judgments and recommendations have implications for the competition between stakeholder groups and individuals for resources, leadership, and power.

Evaluation in a Political Environment: A Mixed Blessing? For many evaluators, an appealing aspect of evaluation is that it allows them to influence the real world of policy and practice. Researchers are more detached from that world. Research may influence policy or practice, but the researcher has no obligation to make that connection. The evaluator does. Evaluations are judged by their utility, and designing and implementing an evaluation that is likely to be used is one of an evaluator’s responsibilities. So, in order to achieve use, evaluators must attend to the political context of the program or policy they are studying.

Many evaluators tend to view politics as a bad thing, but we suggest there is a more enlightened view. Thoughtful evaluators of publicly funded programs view politics as the way laws and program regulations are made, the way indi- viduals and groups influence the government, and the very essence of what enables governments to respond to the needs of those individuals and groups. Indeed, without politics, government programs would be less responsive to public

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 69

needs, not more. As Carol Weiss has remarked, “Politics is the method we have as a nation to resolve differences and reach conclusions and decide policy issues. We don’t always like the way it turns out, but it’s an essential part of our system” (Weiss & Mark, 2006, p. 482).

Furthermore, evaluation serves a central purpose in our political system: accountability. Accountability means that government is accountable to the people, to the public that elects its leaders. Eleanor Chelimsky, the former director of the Program Evaluation and Methodology Division of the U.S. government’s General Accounting Organization, now the Government Accountability Office (GAO), which provides evaluative information to Congress, argues that “the ultimate client or user of our work is the public” (2008, p. 403). She views evaluation as central to democracy in making the work of government more transparent so that leaders may be held accountable. It is rare, she notes, for nondemocratic countries to have organizations within government that evaluate its work (Chelimsky, 2006).

Thus, evaluators do work in a political context, and evaluation exists, at least partly, to make government accountable. Of course, this system does not always work perfectly, but it is important for evaluators to recognize the potential for their role within this system. That potential is to provide information to the public and to stakeholder groups, be they policymakers, program managers, or groups lobbying for or against a particular cause. However, to do so well requires the eval- uator to have some understanding of that system and the complexities involved in evaluators’ interaction with it.

One reason why evaluators are sometimes reluctant to become involved in politics is their concern that the strength of evaluative findings lies in those findings being seen as independent or objective. In other words, policy actors and the public value evaluations because they think evaluators and their evaluations are not political, but instead are neutral and, as such, are providing information that is “untainted” by political views and beliefs. (Most evaluators recognize that, in fact, data and evalua- tions are inevitably influenced by values and that it is impossible to totally remove bias from data collection. We will discuss that issue further in later chapters on data collection. Suffice it to say here that evaluation is often valued by stakeholders because they perceive it to be objective.) Therefore, evaluators can be legitimately concerned with how their work within this political context may affect the perceived objectivity of their work. How can evaluators interact with those in the political environment to make sure their study addresses important issues and that the results get to the right people or groups without harming the perceived independence or objectivity of their work? There is no easy answer to this question, but we will describe several potential ways in which evaluators can work within the political system.

Interacting with the Political System. Vestman and Conners (2006) describe three different positions in which evaluators may interact with the political system:

1. The evaluator as value-neutral. In this position, the evaluator tries to protect or separate the evaluation from politics in order to maintain its perceived legitimacy and objectivity. Evaluators are rational methodologists who collect data and

70 Part I • Introduction to Evaluation

provide it to stakeholders. Judgments about quality are then made by the stake- holders. The evaluator works to remain separate and independent and, thus, maintain the objectivity of the evaluation.

2. The evaluator as value-sensitive. In this position, the evaluator works to maintain the technical aspects of the evaluation, the provision of information, as separate from politics. However, the evaluator recognizes that other elements of the evaluation— in particular providing judgments, considering ethical issues, and encouraging democratic values—require the evaluator to learn of and become involved in the political environment.

3. The evaluator as value-critical. Along a continuum of whether it is possible and whether it is desirable to separate evaluation from politics, the evaluator taking this position believes that values are inextricably part of politics and that it is critical for the evaluator to become involved in politics to actively articulate those values. Evaluation and politics are viewed from a larger perspective in this third position. Vestman and Conners note that the value-critical evaluator “views politics as some- thing integrated in our everyday life,” and so “there can be no separation between evaluation and politics and therefore no neutral value or operational position taken by the evaluator” (2006, p. 235). The evaluator, then, takes an active role in consid- ering what is in the public good, and serves “as a cooperative and structuring force in our understanding of society” (2006, p. 236). (See also Dahler-Larsen [2003])

Most evaluators today recognize that the first position is unrealistic. This position is one that is frequently taken by applied researchers who move into evaluation and are less familiar with the purposes of evaluation and its goals; in particular, the importance of use. Weiss (1998a, 1998b), Datta (1999), and Patton (1988, 2008a) have all noted that a principal reason that the evaluations of the 1970s were not used was the failure of evaluators to consider the political context. Most evaluators today recognize the need to balance the technical aspects of their study with a need to learn more about the political context to see that their evaluation is useful to at least some stakeholders in the political environment and to ensure that the evalua- tion is one that furthers democratic values of participation and equality. Our view is that the third position has elements of validity—politics and evaluation, at least in- formal evaluation, are part of everyday life, data collection is not a truly neutral ac- tivity, and evaluators should consider the public good. However, we do think it is important to attend to the view that formal evaluation and evaluators conducting those evaluations provide a different kind of information, one that addresses peo- ple’s concerns with accountability in today’s society. It is important that the results of evaluation studies be trusted; hence, the evaluator must pay attention to pre- serving the validity of the study and the perceived independence of the results. (You will read more on the ethical codes of evaluation later in this chapter. See also Chapter 8 on participatory and transformative approaches as means for achieving these goals.) However, the three positions developed by Vestman and Conner illus- trate the types of relationships that can exist between evaluation and politics and the important issues to consider in those relationships. They help us, then, to reflect

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 71

on the role one should play in specific evaluations. That role may differ from one evaluation to the next, depending on the context of the evaluation.

In this section, we have tried to make you aware that evaluation does occur in a political environment, why that is so, and how knowledge of the political context can be an advantage to an evaluator in bringing about use and, ultimately, helping improve government and society. We have also illustrated some of the roles or positions that evaluators can take within that environment and the risks and potential benefits of these different roles. In the next section, we will explore in more depth some of the actions that an evaluator can take to work effectively in a political environment.

Suggestions for Working Within the Political Environment

Chelimsky (2008) has spoken of “the clash of cultures” between evaluation and politics. As we have noted, that “clash” often occurs because our training focuses on methodology and those methodological skills are frequently gained in research courses with a positivist focus on assumptions of independence and neutrality. We think of ourselves primarily as researchers and are concerned that, in working in the political environment, we will lose our independence and neutrality. In addition, students in evaluation typically do not receive much training in working with stakeholders or in a political environment (Dewey, Montrosse, Schroter, Sullins, & Mattox, 2008).

The Joint Committee on Standards for Educational Evaluation, however, has long recognized the need for evaluators to attend to the political context. As we dis- cussed in Chapter 2, the Joint Committee on Standards for Educational Evaluation is a coalition, currently composed of 18 different professional organizations in edu- cation and psychology that have interests in evaluation. In 1981, the Joint Committee published standards for evaluators and consumers of evaluation to use to judge the quality of evaluations. In 1994, the Joint Committee wrote this standard:

Political Validity. The evaluation should be planned and conducted with antici- pation of the different positions of various interest groups, so that their coop- eration may be obtained, and so that possible attempts by any of these groups to curtail evaluation operations or to bias or misapply the results can be averted or counteracted. (p. 71)2

Note that the wording of this standard reflects two concerns: (a) learning about the political context, that is, the positions of various interest groups, so that the study may be conducted feasibly and effectively; and (b) avoiding possible bias of the evaluation during the study and misuse after the study is completed.

2The 2010 version of the Standards broadened this standard to “Contextual Validity: Evaluations should recognize, monitor, and balance the cultural and political interests and needs of individuals and groups” (Joint Committee, 2010). We approve of the new standard and its broader attention to many elements of context, but use the 1994 version here to illustrate particular elements of the political context.

72 Part I • Introduction to Evaluation

These two concerns suggest the merits of learning more about the political context and the risks of not doing so. A good evaluator learns about the political con- text, which includes the positions of various interest groups in regard to the pro- gram. More broadly, the evaluator takes the time to learn the identity of the various groups who are interested in the program, who have some power or con- trol over it, or who may be opposed to the program for whatever reason. The eval- uator learns the perspectives of each of these groups, in regard both to the program and to related issues. What is their history? What are their values? What has formed their interest in or opposition to the program? What interest might they have in the evaluation? How might they use the evaluation and its results in the future? This period of exploration helps acquaint evaluators with the political con- text of the program in a positive way and provides an important foundation for the future. They become aware of how the evaluation, and the questions it is addressing, might be used by various stakeholders. They can then consider how those stakeholders might be incorporated, for example, into the evaluation process or the dissemination phase.

The standard also conveys the risks the evaluator faces because evaluations occur in a political context. That is, individuals or groups may act to bias the evalua- tion. It should be no surprise that individuals or groups who are competing for resources or leadership or power should look to the evaluation as a possible threat or, conversely, as a tool they can use to achieve their goals. Of course, in this time when policymakers place a major premium on accountability and demonstrations that desired outcomes are being achieved, managers of programs, school principals, agency directors, and the like want the evaluation to look good, to show that their program is successful. Conversely, there are others, often less readily identifiable, who may want the evaluation to make the program look bad or to suggest serious prob- lems in implementation or in achieving outcomes. So, of course, the evaluator is sub- ject to political pressure. That pressure can take many forms: working to see that the evaluation addresses the outcomes or questions that a person or group desires; sug- gesting that certain people be interviewed and others avoided or data be collected in ways that they think will provide desired results; manipulating the interpretation or reporting of results for desired ends, be they positive or negative; and, finally, misus- ing the results, misquoting them, citing “evidence” without context, or purposely dis- torting findings. Evaluators are pressured by stakeholders in all these ways—and more. Therefore, it is imperative that the evaluator both know the political environ- ment and stakeholder groups and be willing to take courageous positions to maintain the validity or accuracy of the study and the dissemination of results in an accurate manner. We will address some of these issues in our discussion of ethical standards and codes, or expected ethical behavior in evaluation, later in this chapter. Here, we will discuss some of the steps that the evaluator can take to understand the political context and to avoid problems of bias or misuse because of the political milieu.

Eleanor Chelimsky makes several recommendations for reducing the “clash of cultures” and “improving the ‘fit’ between evaluative independence and the politi- cal requirements of a democratic society” (2008, p. 400). She notes that unwanted political influence can occur at any time during the evaluation: during the design

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 73

phase; as the study is being conducted; and as results are being interpreted, reported, and disseminated. Her recommendations address each of these phases. Following are her recommendations and, in the narrative following each, a description of how the evaluator might make use of her recommendations to improve the evaluation.

1. Expand the design phase. Take the time to learn the political context as described previously. Learn the history of the program and the values and interests of stake- holders who have supported or opposed it. Consider the reason(s) the evaluation has been commissioned, the questions it is intended to address, and how the evaluation and its focus may complement or subvert the interests of stakeholders.

2. Include public groups in evaluations, when relevant. As we will discuss in later chapters, evaluations today are typically at least somewhat participatory and, occasionally, involve many different stakeholder groups. The evaluator might include different groups in an advisory or planning task force created for the eval- uation, might collect data from the different groups through interviews or surveys, or might include each group in other ways. Gaining the input of different stake- holders can help the evaluation in many ways. It increases the validity and the credibility of the evaluation because the evaluation results present many different perspectives on the program. Seeking the participation of a number of groups or individuals may gain the support, or at least the understanding, of different groups for the evaluation. Finally, the involvement increases the likelihood that the public and related stakeholder groups are aware of the evaluation and its results. This both helps fulfill the accountability function and can help to achieve other types of use. How can an evaluation make a difference if many of those who are concerned with the program don’t know of the evaluation?

3. Lean heavily on negotiation. Chelimsky makes two important, but contrasting, points on this issue: (a) Talk, talk, talk with others. Many issues can be negotiated if we only continue to talk with the concerned groups or individuals and find room for compromise and change. (Think of Congress. Why does it take so long to pass a con- troversial bill? Because the legislators are attempting to represent the viewpoints or needs of their constituents, which often differ dramatically across districts.) (b) If the issue in dispute is something that cannot be compromised or threatens the propriety of the evaluation, such as revealing anonymous sources or altering data or results, the evaluator should show “an unwillingness to be intimidated, even when it’s clear the outcome may not be a happy one” (Chelimsky, 2008, p. 411).

4. Never stop thinking about credibility. The evaluator’s strength lies in the integrity of the study, the use of appropriate methods, honest and balanced interpretation of results, and judgments and recommendations based on those results. Evaluators and evaluation units within an organization or evaluation companies that contract for evaluation gain reputations. Since evaluators are working in a political envi- ronment, it is important for clients to believe that the evaluations they conduct or that are conducted by their organization or department are credible, even though the results may not always match some stakeholders’ or key clients’ wishes.

74 Part I • Introduction to Evaluation

5. Develop a dissemination strategy. Chelimsky (2008) strongly believes in evalua- tion as an important tool for a democratic society, as the means for making the gov- ernment accountable to the people. Therefore, the evaluator should communicate the results of the evaluation in ways that can be understood by each audience and develop appropriate methods for those results to be disseminated. In local evalua- tions, results are often disseminated most effectively in meetings—with parents at school PTA meetings, with program staff at staff meetings. Clients and the larger public may be reached through short pieces in newsletters or on web sites. (See Chapter 16 for recommendations on reporting results.)

1. Build in time during the planning stage to learn about the political context. What does your primary client hope to accomplish with the evaluation? Who funded the evaluation and what do they hope to learn? To gain from it? What other individuals or groups have potential interests in the evaluation? (This would certainly include those served by the program and those delivering the program, agencies that fund or set policies for the program, competing or potentially competing programs, and other programs or organizations that serve the same clients.) Take time at the beginning to interview individuals or representatives of these groups and learn their perspective on the program, their concerns or interests in the evaluation, and so forth. (See Fitzpatrick [1989] for a description of an evaluation in which she analyzes the polit- ical environment and identifies viewpoints of different stakeholder groups.)

2. During the planning stage, make sure your client knows that most evaluations find some successes and some failures. Many clients assume that their program achieves all of its goals and that the evaluation will demonstrate this. At the early stages, we always find an occasion to mention that few programs achieve all their goals and that it is quite likely that we will find that they are doing very well at some things and not so well at others. Since most of our evaluations have some formative com- ponents, we add that we should be able to provide information or suggestions on how to improve their program as well.

3. Think about the politics of your data collection. Are there groups, individuals, or data sources that some seem to want you to avoid? If so, why? Pilot test some data collection from this source to get a sense for their perspective or the information they might provide. If the pilot test suggests that their input is useful, use your ad- visory group and your own reasoning to argue to collect data from this source to add to the validity of the study. Think carefully about any method or component of data collection or design that seems to be especially encouraged or discouraged by particular groups. Why are they taking that perspective? Does the perspective tell you something about their values and the kind of information they need or find credible? Or, are there political reasons—hopes for success or failure—that are influencing their suggestions? Remember that you have been selected to conduct this evaluation at least partly because of your methodological expertise. Use that

Let us add a few other suggestions, building on our previous discussion and Chelimsky’s recommendations:

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 75

expertise to advocate and, ultimately, to select the most appropriate methodolog- ical strategies for the evaluation.

4. Include others, your advisory group and other stakeholders, in your interpretation of results. It is helpful to get other perspectives, but attend to those perspectives. Is an individual’s different interpretation of something a useful insight, one that reflects a genuine alternate perspective? Everyone’s perspective, including the evaluator’s, is influenced by one’s values and experiences. But consider to what extent the perspectives you hear may arise from political concerns and examine the real meaning of those concerns.

5. Seek input from many on the final report(s) and other products for disseminating results. The final written report should not be a surprise to key stakeholders. They should have seen and heard the results before receiving the final report. You can make sure they have by including them in a review and meeting with them as data are analyzed to get their reactions and to discuss your interpretations and recom- mendations. These meetings may be far more useful in achieving understanding and change than the written report. Be clear in how your conclusions and any recommendations emerge from your findings. An evaluation is not about finding fault with individual people; rather, it is concerned with identifying the value and the strengths and weaknesses of programs and policies. So consider your wording. Make suggestions for improvement or action where possible, but do so with care, making sure you can defend your conclusions, your judgments, and your recom- mendations. (See Chapter 18 for more on reporting findings.)

Fortunately, the Program Evaluation Standards and the Guiding Principles developed by the American Evaluation Association also provide evaluators with the means to work with many of these political issues. Many evaluators find it useful to share the Guiding Principles with their client and other stakeholders as they begin their work.

Establishing and Maintaining Good Communications

As this discussion on working in a political context indicates, good evaluation work involves much more than knowing how to collect and analyze data. Our recom- mendations for working in a political environment often concern communicating with stakeholders. But, interpersonal skills and communication are important enough to merit a separate section here. In this section we want to consider how to develop and maintain good relationships with clients and other stakeholders while conducting an evaluation. After citing some of the dangers to evaluation from working with others in a political environment—“the hidden agendas, coop- tation of the evaluator, subversion of the evaluation question, sabotage of the design or the measurement scheme, and misuse of results”—Laura Leviton in her Presidential Address to the American Evaluation Association focused her remarks on the problems evaluators themselves present: “Often, evaluations are blindsided and the product is less than it could be because of our own lack of skill in dealing

76 Part I • Introduction to Evaluation

with people and organizations” (2001, p. 3). Noting that research shows there are many different types of intelligences and that the strength of evaluators is often in analytical skills, she added:

I think that sometimes evaluators are absolutely dumbfounded at the negative effects of their words and actions on other people. Yet it should be clear that the abil- ity to communicate well and relate well to others is fundamental to negotiating a better and more useful evaluation question, employing better methods with less resistance, and conveying the results more effectively to clients. In other words, our lack of interpersonal intelligence and “people skills” often make the negative syndromes of evaluation worse than they might otherwise be. Equally bad, our lack of people skills prevents us from optimizing our other talents and skills to produce high quality and useful evaluations. (Leviton, 2001, p. 6)

Now, we think that many evaluators do have interpersonal skills but may simply not think to use them in effective ways in conducting evaluations because they are too focused on methodological issues and their role as social scientists. Just as we discussed the evaluators’ obligation to learn about the political context of the program they are evaluating, we want to emphasize that communicating effectively with those involved in the evaluation is critical to the success of the evaluation. As Leviton suggests, evaluators must think about their language, learn about the perspectives of others, and involve them—and learn from them—as the evaluation is conducted.

Here are a few of our suggestions for establishing and maintaining good com- munications during an evaluation:

1. In planning the evaluation—writing the proposal or preparing the contract—build in time for communication. Remember to include time for communication through meet- ings, meetings, and more meetings! Discuss evaluation plans and results orally with key groups first. Allow for dialogue. Listen to how the different individuals you are meeting with react to the evaluation plans and, later, to the results. Communication with others, of course, should not always be in a group setting. The evaluator should remember to take time to chat with those delivering or managing the program when on site. Learn what their concerns are about the evaluation and about the program itself. What are they worrying about? What pressures are they under? Use interim reports and memos to send information to individuals or groups whom you may not encounter frequently, but follow up by phone or in person to talk with them and get their thoughts. The evaluator needn’t be co-opted by seeking to communicate with these groups and to hear their ideas. But hearing their ideas, their perspectives, and their experiences with the program and the evaluation is the only way that the evaluator can break down barriers to evaluation and prepare stakeholders to receive the results and see them as credible and useful.

2. Prepare clients (those who sponsor the evaluation) and other stakeholders for evalua- tion. Develop an “evaluation spirit” by talking with all participants about the purpose and benefits of the evaluation. Resistance to evaluation comes naturally to most

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 77

people and not knowing what to expect can only increase such resistance. If stakeholders are new to evaluation, or have had previous bad experiences with it, learn more about their concerns and fears. Ask about their previous experiences with evaluation and what they think this one will do. Let them know your views about evaluation and what it can do. As appropriate, provide stakeholders with in- formation on other evaluations or evaluation approaches or on organizational change and learning to illustrate what evaluation and self-examination can accomplish for an organization. Current literature on continuous improvement and learning organizations can be helpful. (See Chapter 9 for a discussion of using evaluation to improve learning in organizations and how to use the research on learning to improve evaluation practice.)

3. Invite and nurture outside participation. In evaluating a school program, for example, remember that parents, school board members, and citizens in the community are all potential stakeholders. Their participation not only strengthens the evaluation but also signals that this is an important project. When evaluating programs in health and human services or in corporate settings, the external stakeholders may be different (e.g., citizen groups, families of those receiving treatment, county commissioners, service providers, corporate board members, consumer advocate groups). Learn who the stakeholders are for the program. Eval- uators can play an important role in seeking to empower previously disenfran- chised groups by bringing representatives of these groups into the discussion. (See our discussion of participatory and empowerment approaches in Chapter 8.)

4. Seek input from key individuals or groups on evaluation decisions. Evaluators should not make important decisions alone. While evaluators may be the most expert in methods of data collection, the client and stakeholders have expertise in the program being evaluated and their experiences with it. Their needs and views must be sought and considered. Foster a spirit of teamwork, negotiation, and compro- mise. Seek input and consult with others at important points, including determin- ing the purpose of the evaluation and developing the evaluation questions, selecting sources and methods for collecting data, developing measures or looking for existing measures, analyzing and interpreting the data, and, of course, consid- ering the implications of the findings. The evaluator should frequently seek input from others on when to disseminate results (not waiting until all is done) and how to do so. Others will know what is most likely to be heard or read, when individu- als or groups are interested in results, and which results would be of most interest to them. Watch out for political agendas, though. Don’t make assumptions about what people want to know. Instead, talk with them to find out.

5. Encourage constructive criticism of the evaluation. Invite stakeholders to challenge assumptions or weaknesses; encourage divergent perspectives. Model a spirit of fair- ness and openness when critical feedback is given. By encouraging stakeholders to provide constructive, critical feedback on their work and then responding in an accepting and open manner, evaluators can demonstrate the evaluation spirit they hope to see in stakeholders. (See Fitzpatrick and Donaldson [2002] for a discussion

78 Part I • Introduction to Evaluation

of Donaldson’s use of 360-degree feedback during an evaluation to provide a means for program people to comment on the evaluation, just as the evaluator is com- menting on the program.)

Following these recommendations can improve responsiveness to the evaluation— and, hence, its subsequent use—and can enhance the quality of the evaluation prod- uct itself.

Maintaining Ethical Standards: Considerations, Issues, and Responsibilities for Evaluators

Given that evaluation occurs in a real-world political context and that to carry out a good evaluation, evaluators must learn about the political context and develop good communications with stakeholders, it should not be surprising that ethical problems can often arise for the evaluator. In our discussion of the political context, we noted the political pressures that can emerge to change the purpose of the eval- uation or an evaluation question, to select data sources or designs that are more likely to produce desired results, and, of course, to interpret or report findings in more favorable or desired ways. In addition, as one works to improve communica- tion with clients and stakeholders, closer relationships develop and these relation- ships can present ethical problems. Therefore, evaluators must be sufficiently sensitive to the potential ethical problems that can occur in evaluation so that they recognize the problems when they occur and have some sense for what to do about them. One step in that direction is gaining knowledge about the profession’s expectations for ethical behavior in evaluation.

Let us begin this important section on ethical behavior in evaluation with a real-world example of ethical failures in the evaluation of a different type of prod- uct. In 2009, the world was facing what some were calling a “financial meltdown.” Home values and the stock market were plummeting. Thousands had to leave their homes because of foreclosures. Unemployment was increasing and predicted to reach 10% in the United States. Countries all over the world were affected by the crisis. Though the factors that contributed to the financial meltdown were many, analysts and elected officials were highly critical of the role of rating agencies in this crisis. Rating agencies such as Moody’s Investors Service, Fitch, and Standard & Poor’s analyze stocks and bonds and assign credit ratings based on their research. Dating back to the early years of the twentieth century, these agencies began con- ducting research to judge the quality of companies that issue bonds and, through their ratings, to provide information to investors who made decisions based on these ratings, that is, deciding that a company is safe or unsafe for investment. But changes occurred in recent years that affected the quality of these ratings. For more than 50 years, investors paid these companies for their ratings. As the economy worsened in the 1970s, companies issuing bonds began paying agencies for their own ratings. This established a huge, but relatively unnoticed, conflict of interest.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 79

Rating agencies were now rating the bonds of those who paid them. Of course, some bonds continued to receive low ratings, but many were highly rated. In 2007, 37,000 “structured finance products,” the complex financial products that are the source of much of the financial turmoil today, received the highest possible ratings (“A Brief History of Rating Agencies,” 2009). Today, many of those ratings have been downgraded, but too late for many investors and citizens who are paying for company bailouts. The first suggestion of problems with the rating systems arose in 2001 when Enron Corporation, which had earlier been highly rated by these agencies, defaulted. In 2007, the heads of these rating agencies were called to testify before the U.S. Congress and were heavily criticized for their failure to identify risky investments and warn investors.

Evaluators, like analysts at Moody’s, Standard & Poor’s, and Fitch, judge the quality of something and provide information for clients and other stakeholders to make decisions based on those judgments. Although our actions are unlikely to collectively threaten the financial stability of the world economy, many elements of our work are very similar. Clients and others—the public—look to evaluators to provide objective, independent judgments about the quality of programs, prod- ucts, or policies. We use analytic methods to judge the product and assume that the transparency and validity of those methods will substantiate our findings. But just as employees at the rating agencies must interact with real people at the com- panies they are judging to conduct their analyses, we, too, interact with clients and stakeholders to learn about the programs or policies we are judging. In many cases, our client is the CEO or a manager of the program we are evaluating. Our results may be used by our client or the manager of the program to seek further funding or to make decisions about funding a program, just as the results of bond raters are used by investors and company managers to make decisions about fund- ing a company. The potential for ethical conflicts—conflicts we may not see— is great. The conflicts lie not simply in methodological choices, but in the relationships that develop when research methods are used in the real world. In this section, we will describe some of the ethical problems that evaluators encounter, discuss ethical codes developed to guide practice, and provide some suggestions of our own to help evaluators consider how to behave ethically.

What Kinds of Ethical Problems Do Evaluators Encounter?

Studies of practicing evaluators reveal the types of ethical challenges that evalua- tors face. Morris and Cohn (1993) surveyed members of the American Evaluation Association and found that nearly two-thirds of the evaluators had encountered major ethical challenges in their evaluation work. Their analysis of the types of ethical violations that members encountered showed these types of problems:

A. Challenges in the contracting phase: • Stakeholder has already decided what the findings “should be” or plans to

use the findings in an ethically questionable fashion.

80 Part I • Introduction to Evaluation

• Stakeholder declares certain research questions off-limits in the evalua- tion, despite their substantive relevance.

• Legitimate stakeholders are omitted from the planning process. B. Ethical concerns regarding confidentiality or disclosure agreements:

• Disputes or uncertainties concerning ownership/distribution of the final report, raw data, etc.

• Although not pressured by stakeholders to violate confidentiality, the evaluator is concerned that reporting certain findings could represent such a violation.

• Evaluator is pressured by stakeholder to violate confidentiality. C. Challenges in presenting findings:

• Evaluator is pressured by stakeholders to alter presentation of findings. • Evaluator is reluctant to present full findings for unspecified reasons. • Evaluator has discovered behavior that is illegal, unethical, dangerous, etc. • Evaluator is unsure of his or her ability to be objective or fair in presenting

findings. D. Ethical concerns after the report is complete concerning misinterpretation

or misuse: • Findings are suppressed or ignored by the stakeholder. • Unspecified misuse by the stakeholder. • Findings are used to punish someone (the evaluator or someone else). • Findings are deliberately modified by the stakeholder prior to release. • Findings are misinterpreted by the stakeholder (Morris & Cohn, 1993,

pp. 630–632).

Morris and Cohn’s study remains one of the few to empirically examine the ethical challenges that evaluators face in their work. The most frequent category of problems occurred in preparing results: almost two-thirds of the evaluators reported being pressured by stakeholders to alter results. Morris and Cohn draw several interesting conclusions from their study. First, their content analysis of responses revealed that ethical problems “can, and do, arise in every stage of eval- uation” (1993, p. 639). Although respondents reported problems at every stage of the evaluation, the most frequently cited problems occurred at the final stages of the evaluation, in presenting findings. These ethical problems generally arise from pressures from stakeholders, typically the client, concerning the product of the evaluation. In other words, stakeholders are less likely to apply pressure as the study is being carried out than with the final product, the evaluation findings, and the report. In fact, clients presumably value the scientific and objective nature of the work they have hired the evaluator to conduct. But, their concerns emerge with the product itself when the results are surprising or disagreeable. When clients or other stakeholders argue with the evaluator over the interpretation of the results or the presentation of findings, the evaluator may be surprised, hav- ing conceptualized his or her role as an independent, objective evaluator. Thus, Morris and Cohn note, the stakeholders’ pressures, as seen by the evaluator, “undermine the mission of scientific inquiry, which is to seek the truth and

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 81

communicate it” and the evaluators “feel pressured to compromise their role as scientists” (1993, p. 639). These conflicts reveal the “clash of cultures” described by Eleanor Chelimsky and discussed at the beginning of this chapter. That is, stake- holders in this political context are competing for resources, power, and leader- ship. They see the evaluation findings as a tool that they can use to their benefit in that competition. The evaluation is valued because of its perceived objectivity. But, when the findings clash with their needs in the competition, the political context becomes more important to the stakeholder than the continued objectiv- ity or independence of the evaluator’s conclusions.

Faced with such ethical conflicts, the evaluator must take a stand to protect the credibility of this evaluation and future ones. The situation is not an easy one. It is relatively rare for stakeholders to ask evaluators to actually change data. And, to bring about use, good evaluators generally seek the input of clients and other stakeholders on the interpretation of results and presentation of findings in draft reports. So, when the client gives feedback, suggesting changes, evaluators may interpret these “suggestions” differently. (What is the nature of the suggestion? Does it result in a major difference in the interpretation? How strongly does the client ask for or even demand changes?) So the request for changes must be inter- preted by the evaluator. Of course, in some cases, the ethical challenge would be quite clear: The client demands that the evaluator change major conclusions con- cerning the quality of the program. In other cases, the client may be asking for what the client perceives as editing changes, but the evaluator sees as watering down the clarity or strength of the judgments made. How does the evaluator handle this more ambiguous ethical challenge? Dealing with the first situation, in which major conclusions on the quality of the program are demanded, the obvious, ethical chal- lenge requires courage and integrity on the part of the evaluator to maintain the validity of the findings. Dealing with the second ethical challenge certainly may, in the end, require courage and integrity, but may initially require careful thought and reflection concerning the intentions of the client’s editing suggestions and the ownership of the report, its wording, and its conclusions. Finally, both situations require the evaluator to recognize that an ethical challenge has occurred.

Although the Morris and Cohn study reveals much of interest concerning the types of ethical conflicts that evaluators actually encounter, they are also concerned that one-third of their sample reported they had not encountered any ethical con- flicts in their evaluation work. Their concern, rightly so, is that these evaluators are not just operating in safer environments, but, instead, are not recognizing ethical conflicts or challenges when they arise. As Morris and Cohn conclude, “The subjec- tive notions of ethicality held by many unchallenged group members [those not reporting an ethical challenge] differ in systematic ways from those held by members of the challenged group” (p. 635). Since their study was concerned with describing the ethical problems that evaluators encounter, they were unable to explore the rea- sons for these different notions of ethical behavior. However, they recommend, and we concur, that the differences illustrate the need for education and training for evaluators to discuss and explore the ethical challenges they may encounter—how to recognize and interpret them, and, ultimately, how to deal with them.

82 Part I • Introduction to Evaluation

One of the few other studies on ethical behavior among evaluators took a qual- itative approach by asking a smaller number of evaluators to discuss how they dealt with ethical issues in their work (Honea, 1992). Honea found that these evaluators seldom discussed ethics or values in their work lives. She identified four factors that seemed to inhibit such discussions. Specifically, her interviewees perceived that:

1. They were being ethical if they were following the model of “objective scien- tist,” and lapses in objectivity were viewed as less an ethical than a method- ological concern;

2. Participants in evaluation always behave ethically, so discussion of ethics is unnecessary;

3. Being a member of an evaluation team and engaging in team deliberations prevents unethical behavior from occurring;

4. Neither evaluators nor others involved in the evaluation have the time to confront or discuss ethical issues.

These studies suggest that more attention should be given to ethical issues in educating and training evaluators. In the next sections we discuss the professional codes that can be helpful to evaluators in raising their awareness of ethical obli- gations and in communicating professional obligations to stakeholders.

Ethical Standards in Evaluation

Since the mid-1970s, the field of evaluation has been active in developing different ethical codes or standards. (See Fitzpatrick [1999] for a discussion of the history of ethical codes in evaluation and a comparison to codes in other disciplines.) Currently, the two most prominent codes for evaluation in the United States are the Program Evaluation Standards developed by the Joint Committee on Standards for Educational Evaluation (1981, 1994, 2010) and the Guiding Principles for Evaluators developed by the American Evaluation Association in 1995 and revised in 2003.

These two codes differ in purpose. The Standards are designed to assist both evaluators and consumers in judging the quality of a particular evaluation. The Guiding Principles are to provide ethical guidance for evaluators in their everyday practice. The Standards focus on the product of the evaluation. The Guiding Princi- ples focus on the behavior of the evaluator. Both, however, inform us as to ethical and appropriate ways for evaluations to be conducted. And, as Sanders (1995) observes, there are no conflicts or inconsistencies between the two documents.

Other countries, too, have been involved in developing ethical codes. The Canadian Evaluation Society (1992) and the Australasian Evaluation Society (Amie [1995]) have each developed ethical codes for evaluators. Many European countries, including Switzerland, Germany, France, and England, have adopted ethical codes or standards. The Swiss and German codes draw on the Standards of the Joint Committee, as do the African Evaluation Guidelines (Rouge, 2004). Countries in Asia, South America, and Africa are developing codes either as indi- vidual countries or as groups (Stufflebeam, 2004a). This activity reflects the many

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 83

and different ethical challenges evaluators face in conducting their work. As Hendricks and Conner (1995) noted when the AEA Guiding Principles were first published, the context of evaluation and the ethical principles that are of primary concern differ across countries. Rouge (2004), for example, discusses the devel- opment of evaluation codes for the diverse countries in Africa, and how the codes, although beginning with the Joint Committee’s Standards as a guide, had to be adapted to the different context of politics and governments in Africa. Specifically, given the many authoritarian governments in Africa, the guidelines include pro- tection for evaluators and special considerations regarding political viability and the disclosure of findings. In these countries where evaluation cultures are new, ethical guidelines can be useful in helping to form those cultures.

What are the ethical obligations of evaluators? We will briefly review the ethical components of the Program Evaluation Standards and the Guiding Principles here. The more complete text of both documents is presented in Appendix A.

The Program Evaluation Standards. Before moving into a discussion of the Standards themselves, let us briefly describe how the Standards were developed. When appointed in 1975, the task of the Joint Committee on Standards for Edu- cational Evaluation was to develop standards for evaluators and other audiences to use to judge the overall quality of an evaluation. Today, 18 academic and profes- sional associations belong to the Joint Committee and oversee the revision and publication of the Standards.3 The Standards have been approved by the American National Standards Institute (ANSI) and not only have served as a model for edu- cational evaluations in the United States and Canada, but have also been adapted for use in other countries and in disciplines beyond education, such as housing and community development (Stufflebeam, 2004a). The first Standards published in 1981 were designed to address evaluation activities in public schools in the United States. The revision in 1994 expanded their purview to other educational settings, including higher education and training in medicine, law, government, corpora- tions, and other institutions.

The developers of the Standards and their revisions make use of an unusual “public standard-setting process” in which evaluators, educators, social scientists, and lay citizens review, field test, comment, and validate the standards (Joint Committee, 1994, p. xvii). Daniel Stufflebeam, who has led the development of the Standards, notes that a key step in the early stages in 1975 was the decision to include on the Joint Committee not only professional groups that represent evaluators and applied researchers, but also professional associations that represent school administrators, teachers, counselors, and others who are often clients for educational evaluation

3These include the American Evaluation Association (AEA) and the Canadian Evaluation Society (CES), as well as the American Educational Research Association (AERA), the Canadian Society for the Study of Education (CSSE), the American Psychological Association (APA), the National Council on Measurement in Education (NCME), and many associations concerned with school administration and education, including the National Education Association (NEA), the American Association of School Administrators (AASA), and the Council of Chief State School Officers (CCSSO).

84 Part I • Introduction to Evaluation

(Stufflebeam, 2004a). Inclusion of these groups on the Joint Committee led to some contentious discussions about what constituted a good evaluation. However, these dis- cussions helped produce standards that are useful guides for practicing evaluators in designing evaluations and helping clients and other stakeholders to know what to ex- pect from an evaluation. Standards also play a major role in metaevaluations or judg- ing the final product of an evaluation. (See Chapter 13 for more on metaevaluations.)

The Joint Committee defines an evaluation standard as “[a] principle mutu- ally agreed to by people engaged in the professional practice of evaluation, that, if met, will enhance the quality and fairness of an evaluation” (Joint Committee, 1994, p. 3). As such, the Standards are important for the reader to consider before we move into a discussion of how evaluations should be conducted, because the Standards communicate what the evaluator, in planning and carrying out an eval- uation, should consider. They serve as a guide for the evaluator and a means for the evaluator to discuss and reflect on issues critical to the evaluation with clients and other stakeholders.4

The Joint Committee developed 30 standards, which are presented in their entirety in Appendix A. Our attention will be devoted here to the five important attributes of an evaluation under which the 30 standards are organized. The iden- tification of these five attributes was, in itself, a quite significant step for the field of evaluation because it signified the major areas of importance in conducting an eval- uation. The four areas are (1) utility, (2) feasibility, (3) propriety, and (4) accuracy. The 2009 revision of the Standards added (5) evaluation accountability. Note that prior to the identification of these areas, it was generally assumed that evaluations should be judged based on their validity, or accuracy, because validity is the primary means for judging the quality of research (Stufflebeam, 2004a). The identification of the other areas reminded evaluators and their clients that evaluation also needed to attend to other issues, because it was being conducted in the field and for differ- ent purposes than research.

To articulate the meaning of the original four areas, let us draw from the Joint Committee’s publication of the Standards in 1994.5 Their introduction to each area addresses the following concepts:

Utility standards guide evaluations so that they will be informative, timely, and influential. They require evaluators to acquaint themselves with their audiences, define the audiences clearly, ascertain the audiences’ information needs, plan eval- uations to respond to these needs, and report the relevant information clearly and in a timely fashion. . . .

4The Joint Committee notes that not every standard is relevant to every evaluation. They recognize that the context for individual evaluations differs and, therefore, the nature of the evaluation differs. The evaluator and others should consider which of the standards are most relevant for guiding or judging an individual evaluation. 5In late 2009, the Joint Committee approved new standards to be published in 2010. We have obtained a prepublication list of the new standards, but the discussion and explanation of these standards are to be published in 2010. Therefore, we present the 2010 standards, but will rely on the previous version for a discussion of the original four categories and their meanings.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 85

Feasibility standards recognize that evaluations usually are conducted in a natural, as opposed to a laboratory, setting and consume valuable resources. There- fore evaluation designs must be operable in field settings, and evaluations must not consume more resources, materials, personnel, or time than necessary to address the evaluation questions . . .

Propriety standards reflect the fact that evaluations affect many people in a variety of ways. These standards are intended to facilitate protection of the rights of individuals affected by an evaluation. They promote sensitivity to and warn against unlawful, unscrupulous, unethical, and inept actions by those who conduct evaluations. . . .

Accuracy standards determine whether an evaluation has produced sound information. The evaluation of a program must be comprehensive; that is, the eval- uators should have considered as many of the program’s identifiable features as practical and should have gathered data on those particular features judged important for assessing the program’s worth or merit. Moreover, the information must be technically adequate, and the judgments rendered must be linked logically to the data. (Joint Committee, 1994, pp. 5–6)

The identification of these four areas of concern reminds us that evaluation is conducted in the field with the intention of providing sound information to others. The first area emphasizes the importance of use to evaluation and identi- fies some of the steps the evaluator can take to maximize the likelihood that the evaluation will be used. The identification of feasibility as an area of concern reflects the special considerations that must be made because evaluation takes place in real-world settings with real clients and stakeholders. Procedures must be practical and cost-effective. In addition, for the evaluation to be feasible, the eval- uator must consider the context in which the evaluation is conducted—the polit- ical and cultural interests. Accuracy standards reflect concerns with the scope of the study and the means by which data are collected. The means for addressing each of these three areas will be discussed further in subsequent chapters. Utility standards and use are the focus of Chapter 17, in which we discuss research and theories on the use of evaluation and recommend ways to increase use. Feasibil- ity is addressed in Chapter 14, in which we discuss planning and managing the study. Finally, accuracy is examined in Chapters 15 and 16 where we discuss methodological concerns.

Here we will focus on the propriety area because our primary concern in this chapter is with ethical conduct in evaluation. The specific standards listed under propriety in the new 2010 Standards are as follows:

• “P1 Responsive and Inclusive Orientation. Evaluations should be responsive to stakeholders and their communities.” This standard, as do many in the 2010 edition, emphasizes the evaluator’s obligation to be responsive to stakeholders and to con- sider the many different groups who may have interests in the evaluation.

• “P2 Formal Agreements. Evaluation agreements should be negotiated to make obligations explicit and take into account the needs, expectations, and cultural contexts of clients and other stakeholders.” External evaluations generally include

86 Part I • Introduction to Evaluation

a formal agreement, but internal evaluations often do not. The Joint Committee encourages evaluators to develop a formal agreement at the planning stage of each evaluation and to use it as a guide. The guidelines to this standard provide a use- ful list of the types of information that might be included in a formal agreement.

• “P3 Human Rights and Respect. Evaluations should be designed and conducted to protect human and legal rights and maintain the dignity of participants and other stakeholders.” The rights of human subjects are understood to include issues such as obtaining informed consent, maintaining rights to privacy, and assuring confidentiality for those from whom data are collected. (See later section on Institutional Review Boards or IRBs in this chapter.)

• “P4 Clarity and Fairness. Evaluations should be understandable and fair in ad- dressing stakeholder needs and purposes.” New to the 2010 edition of the Standards is an emphasis on clarity, recognizing that many different audiences and stake- holder groups have interests in the evaluation and must receive results in ways that are understandable and comprehensible to them.

• “P5 Transparency and Disclosure. Evaluations should provide complete descrip- tions of findings, limitations, and conclusions to all stakeholders, unless doing so would violate legal and propriety obligations.” Government in the early twenty-first century has emphasized transparency and the wording of this 2010 standard reflects that emphasis, although previous standards have also emphasized disclosing find- ings to all who are affected or interested within legal boundaries.

• “P6 Conflicts of Interest. Evaluations should openly and honestly identify and ad- dress real or perceived conflicts of interest that may compromise the evaluation.” Conflicts of interest cannot always be totally eliminated, but if evaluators consider potential conflicts of interest and make their values and biases explicit in as open and honest a way as possible, in the spirit of “let the buyer beware,” clients can at least be alert to biases that may unwittingly creep into the work of even the most honest evaluators.

• “P7 Fiscal Responsibility. Evaluations should account for all expended resources and comply with sound fiscal procedures and processes.” This standard has been included in all editions and reflects the important fiscal obligations of evaluations and emphasizes that the proper handling of these fiscal responsibilities, as well as respecting human rights, is part of the propriety of the evaluation.

Note that the Standards emphasize quite a few different issues and, thus, illus- trate how ethical concerns cross many dimensions of evaluation and should be considered throughout the study. Traditionally, ethical codes in the social sciences focus on the means for collecting data from others; that is, ensuring informed consent, confidentiality, or anonymity, as appropriate, and dealing with other important issues in protecting the rights of individuals when collecting data from them. These standards indicate that ensuring the rights of human subjects is cer- tainly one very important standard in evaluation. But, the propriety standards also

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 87

communicate other areas of ethical concern for the evaluator, such as being re- sponsive to many stakeholders; considering the cultural and political values that are important to the evaluation; being clear on agreements and obligations in the eval- uation, conflicts of interest, and reports of findings and conclusions; and managing fiscal resources appropriately. The standard on formal agreements attests to the fact that evaluations, unlike research, always include other parties and, therefore, mis- understandings can arise. Typically, an evaluation involves a partnership between the evaluator and the client. Putting agreements in writing and following them, or formally modifying them as changes are needed, provides the evaluator and the client with a means for clarifying these expectations. At the beginning of the process, the evaluator and client can begin by talking about their understandings and expectations and putting them in writing. This agreement then provides a doc- ument to use to monitor these understandings about the evaluation and, thus, can prevent the violation of other propriety standards. Clients, for example, may not be aware of propriety issues such as informed consent or the obligation to disseminate results to others. Formal agreements can work to clarify these concerns. The 2010 Standards emphasis on clarity and transparency further highlights the fact that eval- uation occurs in the public arena where democracy requires attention to many dif- ferent stakeholders.

Take a minute now to read the complete text of all of the Standards in Appendix A to become acquainted with the meaning and intent of each.

The Guiding Principles. The American Evaluation Association’s (AEA) Guiding Principles for Evaluators are elaborations of five basic, broad principles (numbered A–E here to reflect their enumeration in the original document):

A. Systematic Inquiry: Evaluators conduct systematic, data-based inquiries. B. Competence: Evaluators provide competent performance to stakeholders. C. Integrity/Honesty: Evaluators display honesty and integrity in their own behav-

ior and attempt to ensure the honesty and integrity of the entire evaluation process.

D. Respect for People: Evaluators respect the security, dignity, and self-worth of respondents, program participants, clients, and other stakeholders.

E. Responsibilities for General and Public Welfare: Evaluators articulate and take into account the diversity of general and public interests and values that may be related to the evaluation (American Evaluation Association, 2004, The Principles section). (See Appendix A for a more complete presentation of the Guiding Principles.)

Systematic inquiry emphasizes the distinction between formal program eval- uation and the evaluations conducted in everyday life. Program evaluators, this principle asserts, use specific, technical methods to complete their evaluations. Because no method is infallible, the principle encourages evaluators to share the strengths and weaknesses of the methods and approach with clients and others to permit an accurate interpretation of the work.

88 Part I • Introduction to Evaluation

The Competence principle makes evaluators aware of the need to practice within their area of expertise and to “continually seek to maintain and improve their competencies, in order to provide the highest level of performance” (American Eval- uation Association, 2004, Section B.4). An emphasis on maintaining professional knowledge is a principle common to many professions’ ethical codes, serving to remind their practitioners that their education is ongoing and that they have an obligation to the profession to produce work that maintains the standards and reputation of the field (Fitzpatrick, 1999). The 2004 revision of the Guiding Princi- ples specifically addressed the need for evaluators to be culturally competent in the context of the program they are evaluating. Principle B.2 states

To ensure recognition, accurate interpretation, and respect for diversity, evaluators should ensure that the members of the evaluation team collectively demonstrate cultural competence. Cultural competence would be reflected in evaluators seek- ing awareness of their own culturally based assumptions, their understanding of the world views of culturally different participants and stakeholders in the evalua- tion, and the use of appropriate evaluation strategies and skills in working with cul- turally different groups. Diversity may be in terms of race, ethnicity, gender, religion, socio-economics, or other factors pertinent to the evaluation context. (American Evaluation Association, 2004, Section B.2)

This new principle reflects the recent attention that AEA and professional evaluators have given to the issue of cultural competence, recognizing that evalua- tors are often responsible for evaluating programs that serve clients or involve other stakeholders who have different cultural experiences and norms than those of the evaluator. To accurately evaluate the program competently, the evaluator needs to consider the context of the program and those it serves. The 2010 revision of the Standards also reflects this concern with its emphasis on learning the cultural context. (See the interview with Katrina Bledsoe, in the “Suggested Readings” section at the end of this chapter for her description of an evaluation where the different cultural norms of clients, volunteers, program staff, and managers were critical to evaluating the program and making recommendations for improvement.)

The principle of Integrity/Honesty also mirrors many of the issues articulated in the Standards. It addresses ethical concerns regarding negotiations with clients and relevant stakeholders, conflicts of interest, sources of financial support, misrepre- sentation of findings, and consideration of methods. Let us highlight two issues here: Guiding Principle C.5 explicitly states, “Evaluators should not misrepresent their procedures, data, or findings. Within reasonable limits, they should attempt to pre- vent or correct misuse of their work by others” (American Evaluation Association, 2004, Section C.5). Further, Principle C.6 notes that, “If evaluators determine that certain procedures or activities seem likely to produce misleading evaluative infor- mation or conclusions, they have the responsibility to communicate their concerns and the reasons for them [to the client]” (American Evaluation Association, 2004, Section C.6). These two principles put evaluators in an assertive position to prevent some of the ethical challenges encountered by evaluators in the research by Morris and Cohn (1993) described earlier.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 89

As noted, the Standards and Guiding Principles each provide a means for evaluators to convey to clients their professional obligations. The client has hired an evaluator because of his or her autonomy and expertise. Part of that expertise involves the sense of professionalism that comes from knowing and following the ethical standards of the profession. While evaluators have an obligation to inform clients of these Standards and Guiding Principles, conforming to them can be in the clients’ self-interest as well, by increasing the credibility of the evaluation.

Respect for People corresponds to Standard P.3, “Human Rights and Respect.” This principle and the related standard concern expectations about obtaining informed consent from those from whom data are collected and advis- ing participants regarding the scope and limits of confidentiality. The core of this principle is drawn from the ethical codes of many social sciences concerned with collecting data from individuals—for example, the American Psychological Association, the American Anthropological Association, and the American Educa- tional Research Association. New sections of this principle in 2004 focused on the obligation of the evaluator to understand the context of the evaluation, including the political, social, and economic climate of the program and its stakeholders. This addition built on the evaluator’s obligation to have cultural competence. However, it also emphasized the understanding that context and its political, social, and economic components were part of showing respect for people, which was the focus of this principle. Principle D also indicated that the evaluator should ensure that those who provide data do so willingly and do not feel forced into participa- tion out of fear that they may lose the services the program delivers if they decline to participate in the evaluation. Respect for people also reminded evaluators of their obligation to be sensitive to ethnic, cultural, and other differences among participants and stakeholders at all stages of the evaluation, from planning the evaluation to reporting its results.

The Guiding Principles represented a change from the standards developed in 1982 by the Evaluation Research Society, an earlier professional association, by including a greater focus on nonmethodological issues (Fitzpatrick, 1999). This is nowhere more evident than in Guiding Principle E concerning Responsibilities for the General and Public Welfare. This principle emphasizes the obligations of evalua- tors to include “relevant perspectives and interests of the full range of stakeholders,” to consider “not only the immediate operations and outcomes of whatever is being evaluated but also its broad assumptions, implications, and potential side effects,” to “maintain a balance between client needs and other needs” and to “go beyond analy- sis of particular stakeholder interests and consider the welfare of society as a whole” (American Evaluation Association, 1995, pp. 25–26). The inclusion of this principle has sparked dialogue about evaluators’ obligations to the public. Certainly, no eval- uator has a handle on exactly what the public good is, but Principle E reminds us that our obligation is broader than our particular obligation to the client. Practicing eval- uators must also consider the needs of society. Our role might be to stimulate dia- logue about those needs or to involve stakeholders in considering the implications of program actions. This principle also might prompt the evaluator to call attention to the need to collect data on unintended side effects of a policy or program either on

90 Part I • Introduction to Evaluation

the direct clients served or on others who may be indirectly affected by the program. Whatever action is taken, Principle E reminds evaluators to attend to the implications of the program for the community and society as a whole.

In fact, Principle E addresses a concern raised by Smith (1983) prior to the emergence of the Guiding Principles and the 1994 Standards. He criticized the writ- ing then on evaluation ethics for focusing solely on methodological issues. Smith wrote:

Much of the work in evaluation ethics (i.e., the moral behavior of an individual as a professional evaluator) which has been done to date has focused on evaluation moral issues such as confidentiality of data, protection of human subjects, proper profes- sional behavior, and so on. Little has been done on program moral issues, such as: Is this mental hospital placing the community at risk by its early release of patients? Is this nursing home meeting residents’ physical needs but at the cost of their human rights of privacy, freedom of movement, and individual expression? Is this educa- tional program for talented students enhancing cognitive skills but reinforcing their emotional dependency on special recognition and privileges? (1983, p. 11)

Principle E addresses Smith’s concerns by stating that the evaluator does have an obligation to consider the moral or ethical issues that arise as a result of the pro- gram itself.

Readers are encouraged to visit the American Evaluation Association web site (http://www.eval.org/Publications/GuidingPrinciples.asp) to download brochures of the Guiding Principles that can be used to acquaint clients and stakeholders with evaluators’ professional obligations and to make use of the additional training materials and readings provided there.

Protections to Human Subjects and the Role of Institutional Review Boards

Both the Standards and the Guiding Principles emphasize that to behave ethically, evaluators must protect the rights of people from whom they collect data. Institu- tional Review Boards (IRBs) are committees of five or more peer researchers who review the data collection plans, or protocols, for proposed research and monitor ongoing research to ensure that the rights of human subjects are protected.6 IRBs are governed by the Office of Human Research Protections (OHRP), which is part of the U.S. Department of Health and Human Services. Since 1991, federal regulations

6The words ”human subjects” have historically been used in research to refer to the people who are providing data for the study. That is how we use the words “human subjects” here. However, the word “subjects” implies a helplessness or passivity that many find inappropriate in today’s research and eval- uation endeavors. As do others, we will typically use the word “participants” when referring to the people who provide data for evaluations through completing surveys, participating in focus groups or interviews, permitting observations, etc. We use the words “human subjects” to avoid confusion when citing or discussing work by others who use these words. Thus, for example, IRBs are often called Human Subjects Review Boards.

http://www.eval.org/Publications/GuidingPrinciples.asp

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 91

require that organizations that receive federal funds for research have IRBs to re- view all research conducted by the organization. The complete set of regulations is available at http://ohrp.osophs.dhhs.gov. The OHRP can suspend federally funded research if institutions are viewed as noncompliant with regulations. Although such suspensions are “extremely rare and highly controversial,” the threat prompted institutions to review and tighten many IRB procedures (Oakes, 2002, p. 450).

The guidelines for protecting human subjects emerged from the Belmont Report (1979) where the focus was on biomedical research, but the Belmont Report itself was prompted by congressional and public outrage at the infamous, 40-year Tuskegee Syphilis Study.7 Other serious violations in the ethics of social science research have also occurred (Humphreys, 1975; Kahn & Mastroianni, 2001; Milgram, 1974). Insti- tutional Review Boards and their regulations were prompted and guided by the recommendations of the Belmont Report concerning protection of human subjects.

For evaluation studies, the common concern of IRBs is determining whether participants have, in fact, given informed consent to participate in a study. Studies using “vulnerable populations,” typically children, pregnant women or women who may be pregnant, prisoners, and people with limited capacity are given special attention by IRBs and the regulations, at least partly because they may not be able to give full, informed consent. Many evaluation studies, however, may be exempt from IRB review according to the regulations. Specifically, research in educational settings that is intended to study traditional educational practices is exempted from IRB review, as well as data collected through “educational tests,” defined to include surveys, interviews, and observations of public behavior when individuals are not identifiable and data are confidential. However, the individual evaluator should not decide whether his or her study is exempt. Instead, an IRB board should determine if an exempt status is appropriate. This can often be done relatively easily through contact with the IRB or through an expedited review process. In fact, many evalu- ations are reviewed through an expedited review process that involves the research protocols for the study being reviewed by one IRB member.

In recent years, however, IRBs have drawn some criticism for their stringent review of social science research, with critics arguing that some IRB requirements have jeopardized legitimate research. We acknowledge, from personal experience,

7The Tuskegee Syphilis Study or the Tuskegee Experiment was begun in 1932 and continued until it was halted in 1972. The study initially recruited 399 poor African American men who were share- croppers and had syphilis with the purpose of describing the natural progression of the disease. In the 1940s, penicillin was found to be a cure for syphilis and became a common treatment. This informa- tion was withheld from the men in the study and they were left untreated as the study continued for 40 years. The study was halted in 1972 through the efforts of Peter Buxton, a Public Health Service venereal disease investigator. Although his efforts to stop the study began in the late 1960s, he was un- able to stop the study through official channels. He went to the press in the 1970s and congressional hearings were held. Many men died during the course of the study. Forty of their wives were infected and 19 of their children were born with congenital syphilis. In 1997, President Clinton made a formal apology on behalf of the U.S. government, which had funded and conducted the study through the Public Health Service. The study prompted the government to create a commission to write regulations for research, which resulted in the Belmont Report.

http://ohrp.osophs.dhhs.gov

92 Part I • Introduction to Evaluation

that individual IRBs do not always provide reasonable feedback and can overstep their boundaries because of lack of knowledge of research and informed consent. Qualitative data collection, where flexibility and adaptation in data collection may be required, can pose particular problems. IRBs may request standardized inter- view questions for review when the evaluator needs the flexibility to adapt ques- tions to the purposes of the evaluation and the previous statements of the person being interviewed. The National Science Foundation has taken a leadership role in trying to clarify guidelines for qualitative research. Their web site containing Frequently Asked Questions (FAQs) about data collection and ethical reviews is particularly informative for readers with these concerns. (See http://www.nsf .gov/bfa/dias/policy/hsfaqs.jsp#exempt.) Oakes’s “Evaluator’s Guide to the IRB” provides more details on the history of IRBs and their requirements (Oakes, 2002).

Our concern in this chapter is making sure that data are collected in a way that protects the rights of participants in an evaluation study. It is not only impor- tant for evaluators to learn the policies of the IRBs that oversee their work and the federal regulations that govern them, but also to consider voluntarily seeking IRB review. We agree with many who study the ethics surrounding human data collection that it is useful for researchers and evaluators to seek the opinions of others about their data collection. Often, the researcher or evaluator is too close to his or her own study to see something that might be a threat. IRBs can provide use- ful input from other researchers who are informed on ethical issues concerning data collection from humans.

Confidentiality and Informed Consent. Confidentiality and informed consent are issues that any evaluator collecting data should be aware of and consider when collecting data. Often, confidentiality and anonymity are confused. Anonymity means that no one knows the identity of the person who provided the data. Confidentiality means that the researcher, evaluator, or person devel- oping the data base may have a code that, in other documents, can be linked to a name, but that the identity of people providing the data will not be revealed to others. Obviously, interviews or observations are not anonymous. The person conducting the interview or observation is aware of the identity of the individ- ual. Similarly, when codes are used on surveys to track who has responded and to prompt those who have not responded to do so, someone is able to make a link among those codes, the responses to a survey, and an individual name. However, the data analysis will not make use of the individual identifiers and, hence, the data are confidential. Further, specific procedures for separating the names and codes from the data and for maintaining the security of the list of names and codes must be established. Any data collection activity should correctly inform in- dividuals as to whether the data they provide should be considered anonymous or confidential.

Informed consent is a central mechanism for protecting the rights of human subjects. As Oakes has written in his Evaluator’s Guide to the IRB, “Informed con- sent is one of the primary ethical requirements underpinning research with human subjects” (2002, p. 463). Informed consent emerged as a central ethical

http://www.nsf.gov/bfa/dias/policy/hsfaqs.jsp#exempt

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 93

principle after the Nuremberg trials of Nazi scientists who had conducted research on prisoners in concentration camps. The Nuremberg Code, developed after the trials, established the principle that researchers should not collect data from someone without first obtaining their consent and that such consent should be both fully voluntary and informed. “Informed” means that participants should be told of the purpose of the research, its potential risks and benefits to them, the confidentiality of the information and other relevant issues concerning how it will be handled and protected, and what participating in the study will mean for them, that is, the data that will be collected. The voluntary nature of their participation should also be made clear. For example, evaluations are typically conducted in the context of a program. Participants need to know that they can continue receiving the services of the program even if they choose not to partic- ipate in the research. If people receiving services believe they must participate in the research to continue in the program, their participation in the research is not truly voluntary. Informed consent is typically obtained through an informed con- sent form that describes the study, its purpose, the potential risks and benefits, the voluntary nature of participation, how data will be handled, and other rele- vant concerns. But it is important that the language of such forms be clear and understandable to the intended audience. A member of the evaluation team trained in the ethical concerns of informed consent should be present to answer questions a participant may have. (See Fitzpatrick [2005] on informed consent.) IRBs typically pay considerable attention to the issue of informed consent and may have sample consent forms for new evaluators to use as guides for consent forms in their own research.

Cultural Competence and Sensitivity. Finally, ethical data collection involves sensitivity to the cultural norms and beliefs of the individuals and groups from whom one is collecting data. Information that might be considered quite appropri- ate to collect in one group might be considered quite private or misunderstood by another. Such sensitivity is part of attaining cultural competence, as illustrated in the Guiding Principle B.2 and also in D.6. Consider, for example, an evaluation of a school program for children of recently arrived immigrants. Some of those immi- grants may be in the country illegally; however, in most cases, the immigration status of children’s parents would be irrelevant to the evaluation. The evaluator should avoid pressure to collect such data and, more importantly, should consider the wording of individual questions. If the questions appear to be gaining informa- tion that pertains to entry to the country, it may threaten the validity of responses to other items and may not show respect for the privacy of those participating in the evaluation. Evaluators should recognize that evaluation and confidentiality may be suspicious or alien concepts to those completing the survey. Providing information on themselves and their families may be a threatening or frightening experience. Needless to say, surveys should be translated into a language and words that parents can read and understand. Interviews should be conducted by individuals who are not only fluent in the language, but have good knowledge of the culture and norms of the particular immigrant group.

94 Part I • Introduction to Evaluation

In closing, let us emphasize that evaluators have an ethical obligation to con- sider the rights of those from whom they are collecting data and to make sure that those rights are protected. Seeking input from others—whether it be from an IRB committee or other informed researchers, members of an advisory group, clients, or representatives of the individuals from whom data are collected—should be a central part of that process. Individuals providing data should be informed of the purposes of the research and any risks that may be incurred by participating. Further, the evaluator should collect only the data necessary and essential for the evaluation. An evaluation does not give one license to collect irrelevant data or to unnecessarily intrude on the privacy of individuals.

Learning and Practicing Ethical Behavior. In this section, we have attempted to acquaint the reader with the standards and codes that have been developed to guide evaluators. But applying these standards or codes to individual evaluations is a much different issue. As the Joint Committee emphasizes, not all standards are equally important in every evaluation. Choices must be made. Similarly, while the Guiding Principles are intended to “proactively guide the behaviors of profes- sionals in everyday practice” (American Evaluation Association, 2004, Preface C), deciding how to apply them in a specific evaluation, particularly when conflicts emerge, requires careful consideration and tough choices. Morris and Cooksy have helped make us aware of the complexity of these choices through an ongo- ing column on ethical dilemmas in the American Journal of Evaluation. The column, begun by Morris in 1998 and assumed by Cooksy in 2004, presents an ethical problem and calls upon two different, experienced evaluators to describe how they would respond to the issue. Distinct differences emerge. See, for example, the disagreements between Cooksy and Knott on an ethical problem concerning sexual harassment by a manager that was reported during confidential interviews (Morris, Cooksy, & Knott, 2000). These differences help educate and sensitize evaluators to recognize and analyze ethical problems they encounter and consider the choices they make. We encourage readers to read some of these ethical dilem- mas and the responses to consider how they would respond and to refine their skills in ethical reasoning for evaluation practice.

In the next section, we will give more attention to an ethical problem or concern that occurs in every evaluation—bias and its many sources.

Reflecting on Sources of Bias and Conflicts of Interest

Several of the Program Evaluation Standards (U.1, U.4, P.6, A.8) and Guiding Principles (C.3 and C.4) are concerned with the importance of evaluations being honest and impartial, avoiding conflicts of interest and conducting evaluations with integrity. Yet, as research has shown, many evaluators do not believe they have encountered ethical problems and think that because they are following accepted social science methods and the model of the “objective scientist” they are, of course, behaving ethically (Honea, 1992; Morris & Cohn, 1993). In this section, we would like to dis- cuss potential biases and conflicts of interest that evaluators must consider carefully.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 95

First, we should acknowledge that the possibility of human beings rendering completely unbiased judgments is very slight. In fact, it is ironic that some evalu- ators actually could be more susceptible to bias, simply because they believe that by using social science methodologies to draw conclusions, they are objective and unbiased. But, Carol Weiss, one of the founders of evaluation notes, “You never start from scratch. We pick up the ideas that are congenial to our own perspective. Therefore, people pick up this thought or that interpretation of a research report that fits with what they know or what they want to do” (2006, p. 480).

Evaluators and those involved in evaluation should carefully reflect on their biases. By becoming aware of those biases, one can consider and perhaps counteract some of their influence on evaluation. The ethical evaluator recog- nizes that evaluation practice consists of making choices—choices about evalu- ation purposes and questions, about which stakeholders to involve and which designs and data collection strategies to use, about ways to analyze data and to interpret the results. Note, for example, that the raters at Moody’s or Fitch’s who were researching and assigning ratings to bonds were making choices, too— about what information was important and what was not, about what types of business and investment strategies were fiscally sound. Their ratings involved much more than simply adding together some numbers. Swayed by the tenor of the times, changing business practices, the interests of their rating company and those who paid them, their ratings were influenced in undesirable ways. Choices are, by nature, subjective. Evaluators increasingly realize that bias—inadvertent or conscious—can intrude subtly into nearly every choice they make, from selecting an evaluation approach to writing a report. To avoid the faulty findings of bond analysts, evaluators must think more carefully about the potential sources of bias and conflicts of interest that can occur in each evaluation they are conducting.

It is worth noting that when asked to describe ethical problems they have encountered, evaluators tend to describe problems presented by stakeholders (Morris & Cohn, 1993). As Morris and Cohn themselves note, it may be more difficult for evaluators to recognize or report ethical problems that were of their own doing. The only ethical problem they found that appeared to originate with the evaluator was the concern about their ability to be objective or fair in pre- senting findings. Recognition of this particular problem, though, is a major first step. It suggests that, even when encountering many ethical problems presented by client or stakeholder pressure, some evaluators remain conscious of how their own biases can interfere with the accurate presentation of results. (Of course, stakeholder pressure and concern with being objective and fair in presenting findings can overlap. When faced with strong pressure from a client, it may be difficult not to become biased against that client and overreact and become less fair and objective in the opposite direction. This might result in reporting or emphasizing problems either in retaliation or to show that you are objective, rather than maintaining a balanced view. It can be difficult to see things from the perspective of someone who has behaved inappropriately toward you, yet that is what the evaluator must do in order to consider all sides.)

96 Part I • Introduction to Evaluation

Guidance from Ethical Codes and Standards. The Joint Committee Standards and the AEA Guiding Principles can often serve as a good first step to raising awareness and considering potential problem areas. So, let us review a few of the Standards and Principles that are relevant to the issues of bias and con- flict of interest. Guiding Principle C concerns integrity and honesty. Principles C.2 and C.4 directly address expectations concerning values, interests, and relationships:

C.2 Before accepting an evaluation assignment, evaluators should disclose any roles or relationships they have that might pose a conflict of interest (or apparent conflict of interest) with their role as an evaluator. If they proceed with the evaluation, the conflict(s) should be clearly articulated in reports of the evaluation results. . . .

C.4 Evaluators should be explicit about their own, their clients’ and other stakeholders’ interests and values concerning the conduct and outcomes of an evaluation (American Evaluation Association, 2004, Section C Integrity/Honesty).

Principle C.7 addresses expectations concerning financial disclosures:

C.7 Evaluators should disclose all sources of financial support for an evaluation, and the source of the request for the evaluation (American Evaluation Association, 2004, Section C, Integrity/Honesty).

Several standards also reveal critical expectations for the evaluation itself in regard to the nature of reporting information, the credibility of the evaluator, and the identification of the values involved in interpreting the findings of the eval- uation and making final judgments. One such standard is the Propriety Standard P.6, on Conflicts of Interest described above. The Joint Committee defines conflict of interest in this way: “Conflict of interest exists in an evaluation when the per- sonal or financial interests of an evaluator might either influence an evaluation or be affected by the evaluation” (Joint Committee, 1994, p. 115). They note that such conflicts can be caused by “close friendships and personal working relationships” that are more common in internal evaluations and by external evaluators’ desire to gain future contracts (Joint Committee, 1994, p. 116). We will discuss interper- sonal and financial conflicts of interest later, but, first, we will focus on the stan- dards that provide direction to evaluators.

In describing elements necessary for an evaluation to attain the Accuracy Standard, the 1994 edition of the Standards specified Impartial Reporting as an important standard:

• A.11 Impartial Reporting. Reporting procedures should guard against distortion caused by personal feelings and biases of any party to the evaluation, so that evaluation reports fairly reflect the evaluation findings (Joint Committee, 1994, p. 181).

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 97

The 2010 revision of the Standards continues this emphasis:

• A.8 Communication and Reporting. Evaluation communications should have adequate scope and guard against misconceptions, biases, distortions, and errors (Joint Committee, 2010).

Interestingly, the other two standards that address bias fall under the Utility Standard, reflecting how important transparency and credibility are to the ulti- mate use of the evaluation study and its results:

• U.1 Evaluator Credibility. Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context.

• U.4 Explicit Values. Evaluations should clarify and specify the individual and cultural values underpinning purposes, processes, and judgments (Joint Committee, 2010).

Bias Introduced by Values, Views, and Culture. These principles and standards are helpful in reminding the evaluator and the client and other stakeholders of profes- sional expectations. Although the standards and principles are worded in terms of what is expected, they indirectly attest to the harm that unacknowledged values and relationships—between people or organizations—can do to the credibility and integrity of an evaluation study. Therefore, evaluators should seek to consider not only the values of stakeholders and clients, but also their personal values that in- fluence both the conduct and the outcomes of the evaluation. What are the evalu- ator’s views, values, and experiences concerning the program or others like it, its clients, the organization in which it resides, and its mission? Suppose you are called upon to evaluate the success of a school serving low-income immigrant students at achieving state academic standards. The school has failed to meet the acknowl- edged high standards for the past 3 years and is now under review. It could be closed in the following year and students could be moved to other schools. What are your views on educational standards? On high stakes testing? On the efforts of this school, its teachers, and its administrators? On the children it serves? How will your values and views affect how you conduct the evaluation? The stakeholders you include in the evaluation? The way in which you interpret the results? Your ultimate conclusions? Will you be able to report results impartially? Standards- based education and its policies to raise achievement are controversial issues in the United States. Almost everyone concerned with education has a point of view and experience with standards. It would be almost impossible to avoid having these views and experiences affect at least some of the ways in which you conduct the study and reach your conclusions. What steps would you take to attempt to reduce the impact of your views and experiences so that the evaluation is not biased and is seen as credible? Or, do you view it as fortuitous that you have been asked to conduct this evaluation, perhaps because past state or local studies on this issue have been conducted primarily by people from “the other side” (whatever that side may be), people unaware of ethical codes in evaluation who have allowed their

98 Part I • Introduction to Evaluation

views or experience to influence the work? Revealing your views might jeopardize your opportunity to conduct the study and, thus, present different points of view. What should you do?

Another example of the difficult issues one can confront when considering the bias one’s own values introduce might be helpful. You have been asked to conduct an evaluation of support groups for children who have had a parent die. You think you might be able to do an especially good job at evaluating such programs because you have personal experience with this issue. Your spouse died unex- pectedly when your children were relatively young, and they participated in grief groups for children and teenagers. You have also read quite a bit on the issue and know what helped your children. Will your views and personal experience with grief groups for children enhance your ability to conduct the evaluation or detract from it? Are you obligated to tell the client about your personal experience and views? How much of your personal experience or your children’s personal expe- rience are you obligated to reveal?

Cultural competence, or cultural incompetence, is another personal factor that influences the validity and ethicality of an evaluation. Kirkhart has discussed our own difficulty in seeing our “cultural boundedness”; yet, a good evaluation should describe “multiple cultural perspectives accurately, soundly, and appropriately” (1995, p. 3). As noted, the 2004 revision of the Guiding Principles spoke to the importance of cultural competence. We will discuss cultural competence more fully in Chapter 9, but it is essential to address the issue here as well. Cultural competence has emerged as a concern in evaluation because of the recognition of the role of one’s own values and experiences on the conduct of an evaluation. Hood notes that “[t]he evaluation community is replete with those who have limited understanding of the values that are grounded in the racial and cultural backgrounds of groups other than their own” (2000, p. 78). Many of the people served by public or nonprofit programs are people in need. They are likely to differ in many ways from the evaluator: obviously in income; possibly in race or ethnicity; perhaps in their goals and the values, beliefs, and expectations they have in regard to the program; and quite probably in how others treat and view them.

Strategies for Reducing Bias. What can an evaluator do to minimize the bias that personal views and experience bring to an evaluation? One strategy recom- mended by qualitative researchers (Lincoln and Guba, 1985, Miles & Huberman, 1994; Schwandt & Halpern, 1989) is to maintain an “audit trail,” which Schwandt defines as “a systematically maintained documentation system” (2001b, p. 9) to record all the details of the process of conducting the study. The audit trail would include the evaluator’s notes on evolving perceptions, day-to-day procedures, methodological decisions, day-to-day personal introspections, developing in- sights and hypotheses to help the evaluator explore how the evaluation design is emerging and the values and experiences that may influence the evaluator in that evolution. (See Cooksy [2000] for an excellent example of using such memos to aid in reflecting on an ethical problem encountered in data collection.) The eval- uator may choose to use the notes for self-reflection and consideration of how

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 99

values and experiences may be introducing bias. Alternatively, the evaluator may decide to share portions of the notes with an external party. This person, generally another evaluator, can review the audit trail to explore the appro- priateness of the evaluation decisions and the ways in which bias may have been introduced.

Another strategy for minimizing bias is through the process of metaevaluation, or the evaluation of an evaluation in which an outside person reviews an evaluation for its quality. This topic is addressed in detail in Chapter 16. Whatever methods are used, it is important for evaluators (and clients) to examine their personal values and beliefs and consider how these factors can influence their approach to each evalua- tion and to their eventual conclusions and judgments. Becoming aware represents the first step in preventing bias.

Interpersonal Relationships and Bias. It is apparent to even the casual observer that individuals’ feelings toward one another can color their judgments, not only about each other but about practically anything with which the other person is perceived to be associated. Hence, we have legal restrictions on testimony about one’s spouse and anti-nepotism policies that prohibit individuals from being in positions where they would need to make decisions about the salary, promotion, or job security of a family member. Similarly, evaluators should avoid evaluating programs that a close friend or family member is concerned with, whether as a policymaker, a manager, or a person delivering the program. The apparent conflict of interest would be too strong even if the evaluator were able to overcome the bias the interpersonal relationship introduced.

Internal evaluators, except in the largest organizations, are almost inevitably evaluating programs that are staffed by someone they know. Therefore, internal evaluators need to think carefully about how to define their role in such settings. Even if the purpose of the evaluation is formative or for organizational learning, the evaluator needs to be prepared to give negative feedback. To achieve change, that feedback may be given in a way that is clear but palatable. Nevertheless, evaluators should be alert to examining how their relationships with those who operate or manage the program can influence the choices and decisions made. Such relation- ships can affect many elements of the evaluation, from the questions the evaluation addresses to the ways in which results are interpreted and presented. As an evalua- tor, you are hired or assigned to provide an independent, impartial judgment, and concerns about personal relationships should not interfere with the evaluator’s responsibility.

As we discussed earlier in this chapter, however, evaluators have a respon- sibility to develop some type of relationship with the client and stakeholders con- cerned with the evaluation. They must be able to communicate with them effectively so they can understand their needs and provide information in a way that meets those needs. Evaluators who are entirely new to the setting of the evaluation should spend time observing the program, meeting with clients and stakeholders, and developing relationships. These relationships are intended to help the evaluation to succeed—to reduce mistrust, to improve understanding, and so

100 Part I • Introduction to Evaluation

forth—but these relationships also introduce bias. Evaluators are likely to feel more comfortable with people whose values and beliefs are like their own, who support the evaluation and who are open to its methods and interested in its re- sults. At the same time, evaluators learn that some people concerned with the evaluation are more difficult. They are suspicious, accusatory, demanding, inflexible, or behave in any number of ways that are frustrating to the evaluator. These relationships, good and bad, influence the evaluator’s behavior. Is the eval- uator prepared to give tough, negative results—ones the evaluator knows they won’t like—to people with whom he has established rapport? To those who are helpful to the continuation of the study? These are tough issues, but evaluators must be prepared to deal with them, to prepare audiences for difficult results, and to prepare themselves for delivering them.

We find it is useful to clarify and demonstrate one’s role at the beginning of the evaluation, during the planning phase. Evaluators do not need to be the tough guy but they do need to be willing to ask tough questions and provide difficult feedback. In a later chapter, we will discuss using logic models during the early stages of the evaluation, as a method to help the evaluator understand the program. This is often a useful time to ask probing or tough questions such as, “Now, why is it that you think that this activity will lead to X change? Some of the research I have read doesn’t support that,” or “Which of your objectives do you think you are probably not achieving?” You may choose other ques- tions, but our point is that at the beginning of the study—not waiting until the end—you should start to define your role as someone who is interested in them and their program, but is also curious, objective, and questioning. This persona or manner then becomes part of your interpersonal relationship with others in the program.

Financial Relationships and Bias. Unfortunately, financial considerations are a source of bias in evaluation just as they were in the rating of bonds by Moody’s discussed earlier in this chapter. We doubt there are many instances of evaluators being bribed to sway an evaluation one way or another, but financial pressures are rarely so obvious and direct. To illustrate how thorny this situation can be, let us describe an actual case.

An evaluator of our acquaintance—we’ll call her Diane—was employed by a U.S. government-supported research center whose mission was to develop and test exemplary programs and practices for schools. Assigned to direct the center’s evaluation unit, in due time Diane completed an evaluation of a center program designed to improve secondary school students’ mathematics performance and attitudes toward math (AMP). The AMP program was expensive. Congress had invested more than $1 million in its development and, although Diane found that students liked the program, there wasn’t a shred of evidence to suggest that it had any impact on their performance. Troubled by the implications of reporting such information to the funding agency through which Congress had initiated the program, Diane finally worded her draft report to convey that the evaluation was

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 101

to blame for AMP’s failure to produce evidence of success. The summary of her re- port read as follows (italics ours).8

Summary. The results of this study indicate that the Accelerated Mathematics Program (AMP) was somewhat effective in developing positive attitudes toward mathematics, in the sense that students tended to like the AMP materials. The study supplied no evidence, however, from which either long- or short-term student performance changes in mathematics ability can be inferred. The results do not necessarily indicate that AMP was not effective in promoting change in math performance, but that a variety of shortcomings and limitations of the evaluation design did not allow for the identification and measurement of these changes.

And how did the funding agency respond to this obvious effort to soften the bad news? Their reaction to the draft report came in a letter, which is reprinted here.

Dear Diane:

Thank you for the three draft copies of the AMP Impact study. I look forward to the final report.

I hope that our future efforts will be structured so that statements such as those in the “summary” will not have to be made. Instead, I hope that we will be able to say something positive in the final report about changes in important performances. I have heard so many good things about AMP that I am disheartened by the lack of evidence that it has short-term performance effectiveness and that I cannot therefore argue for its potential for long-term effectiveness.

The issue here is straightforward. The best argument for funding centers such as yours that I can make internally here in the Department and externally with the Congress is that our products lead to measurable changes for good in American schools. Re- gardless of the positive “feelings” I get about AMP, it appears we cannot justify all the effort in terms of performance criteria, as per your draft report. That is a drawback, but one which I think we can overcome in future efforts, hopefully in your final report.

Sincerely,

Lawrence T. Donaldson Chief Administrator

The message is blatantly clear. Diane better find something positive to prove AMP and its cohort programs are worth the investment, or funding could be with- drawn, the program would fold, and Diane herself would be looking for other employment. It would take a robust soul indeed not to feel some ethical strain in such a situation, especially when her salary comes directly from the threatened program!

8The names, organizations, and titles in this summary and the following letter have been changed to provide anonymity, but the essential content has not been altered and is reproduced here verbatim.

102 Part I • Introduction to Evaluation

Fortunately, though Diane equivocated at first, this story eventually had a happy ending. The final report told the true story, and Diane was able to assume the role of evaluator (with a clear conscience) on the development staff for another program at the same center.

Even when the evaluator is external to the agency whose programs or prod- ucts are being evaluated, financial dependence can be a potential source of bias. Consider, for example, the delicate balance that must be maintained by external evaluation consultants or firms who are inevitably dependent on repeat business. Scriven (1993) points out this potential source of bias succinctly: “. . . one key economic insight about evaluation contracting is this: No one ever got rich from one evaluation contract” (p. 84). The possibility of future evaluation contracts or consulting depends on how well the client likes the most recent evaluation com- pleted by the evaluator. No problem here if the client has a penchant for the truth, even if it might reflect negatively on the program. But what if the client goes rigid at the first hint of criticism? Developing formal agreements as indicated in the Standards can provide some assurance, but evaluators should always think carefully about financial relationships and recognize that their long-term reputa- tion as an independent, impartial evaluator is critical to their sustainability.

Organizational Relationships and Bias. Organizational relationships may be of greater concern to evaluators than immediate financial gain. The relationship between evaluators and the programs they evaluate can determine not only their present financial welfare but their future employment as well. Further, an orga- nization may exert great (or total) control over the evaluator’s other perquisites: such things as office space; access to resources, facilities, and record keeping systems; even the convenience of available parking space. The way the organiza- tion exercises this control to make the evaluator’s life easier or more difficult can certainly cause problems with bias.

To make this point, we present in Table 3.1 eight possible organizational rela- tionships between evaluators and the program being evaluated. Generally, the greatest potential of bias exists in the first row of Table 3.1, and the least potential of bias exists in the last row. Thus, the potential for organizational pressure is greater when the evaluator is employed by the organization whose program is being eval- uated than when the evaluator is employed by an outside agency. In addition, bias is more likely when the internal evaluator reports to the director of the program being evaluated than when the evaluator reports to someone outside that program. Sonnichsen (1999), the director of the internal evaluation unit at the FBI, argues that internal evaluators must be placed independently, separated from programs, to be effective. Lovell (1995), in commenting on internal evaluation, notes that, in the long term, the organization expects internal evaluation to pay off, that is, to provide recommendations for improved organizational operations. Bias that produces overly positive reports on programs leads to evaluation not fulfilling its promise.

Mathison (1999) has served as an internal and external evaluator and has written often on the issue. She believes that internal and external evaluators face the same ethical challenges but are part of different communities and that these

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 103

communities influence their responses to ethical challenges. Internal evaluators, she asserts, operate within fewer communities than the external evaluator and their primary community is the organization in which they work. Consider, simply, the amount of time the typical internal evaluator spends in that organi- zation over weeks and years. External evaluators, in contrast, have many com- munities, which can include the organizations they evaluate, the organization that employs them, colleagues in evaluation and their professional association, funding agencies, and others. These communities, Mathison argues, influence evaluators’ ethical choices in many complex ways. For example, the internal evaluator’s closeness to the organization and relationships in it may enable her to behave more ethically when it comes to creating an ongoing evaluative culture in the organization or sustaining a dialogue about a controversial issue uncovered by an evaluation—a dialogue that may be required to bring about change. In con- trast, the external evaluator’s diversity of communities and greater distance from the community of the organization being evaluated makes it easier for her to raise questions concerning unethical issues in the program or in the organization. Mathison’s concept of the communities of reference for internal and external

TABLE 3.1 Organizational Relationships of Evaluator to Client

Evaluator Employed To Do One Evaluation or Successive Evaluations Evaluator Reports

1. Within organization which has responsibility for the program being evaluated

1. Successive evaluations 1. Directly to director of program evaluated

2. Within organization which has responsibility for the program being evaluated

2. One evaluation 2. Directly to director of program being evaluated

3. Within organization which has responsibility for the program being evaluated

3. Successive evaluations 3. To someone outside the program being evaluated but within the same organization

4. Within organization which has responsibility for the program being evaluated

4. One evaluation 4. To someone outside the program being evaluated but within the same organization

5. By outside agency 5. Successive evaluations 5. As consultant or contractor to director of program being evaluated

6. By outside agency 6. One evaluation 6. As consultant or contractor to director of program being evaluated

7. By outside agency 7. Successive evaluations 7. Directly to outside funding agency which supports the program

8. By outside agency 8. One evaluation 8. Directly to outside funding agency which supports the program

104 Part I • Introduction to Evaluation

evaluators is useful in considering the types of problems each can deal with more effectively, and it helps us to recognize the complex influence of personal, interpersonal, financial, and organizational factors on the ethical behavior of evaluators.

A final important consideration when considering the influence of organiza- tional and financial relationships on bias is whether the evaluation is primarily form- ative or summative. In considering the pros and cons of an evaluator’s financial and administrative dependence or independence from the client, such dependence may be not only tolerable in a formative evaluation, but even desirable. The internal eval- uator’s relationship with the organization may prompt him to be more responsive to particular information needs of the program and the organization because of his greater understanding and loyalty to the organization. Or, as Mathison notes, the in- ternal evaluator’s close relationship with the organization can prompt him to sustain dialogue on an issue long after the external evaluator has gone and to improve the nature of that dialogue because he knows the values and beliefs of those in the or- ganization. However, an internal evaluator may not be so effective for a summative evaluation, particularly if the evaluation concerns large, costly, or high-profile pro- grams. In this case, the internal evaluator’s relationships with the organization and its employees, especially if the internal evaluator is affiliated with the unit operating the program, are quite likely to introduce bias. An external, independent evaluator is generally to be preferred in summative evaluations of this type. As we have noted in the prior section, though, independence is defined by a variety of factors.

Ethics Beyond a Code of Ethics

The evaluation standards and guidelines described earlier are, in our judgment, singularly useful in improving the practice of evaluation. We urge anyone aspir- ing to do high-quality evaluation to become intimately familiar with those stan- dards and guidelines and to apply them diligently. At the same time, mere adherence to ethical standards, however sound, does not ensure ethical behavior. As Peter Dahler-Larsen has written in regard to the broader issue of codifying evaluation practices, these codes serve “at best, as aids to a competent judgment in evaluation, not substitutions for it” (2006, p. 154). Mabry (1999) reminds us that codes of ethics don’t remove the subjectivity that is inherent in evaluation and in every human endeavor. She argues that standards and guide- lines for ethical conduct cannot anticipate the wide range of particularities that are present in any evaluation. Thus, evaluators’ personal standards and judg- ment inevitably play a role in how they apply these codes of conduct to the eval- uations they carry out.

Perhaps Sieber still states it best:

A code of ethics specifically for program evaluators . . . would be a minimum stan- dard; it would only state what the profession expects of every evaluator in the way of honesty, competence, and decency in relation to those ethical problems that are clearly defined at present.

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 105

In contrast, being ethical is a broad, evolving personal process. . . . Ethical problems in program evaluation are problems having to do with unanticipated conflicts of obligation and interest and with unintended harmful side effects of evaluation. To be ethical is to evolve an ability to anticipate and circumvent such problems. It is an acquired ability. . . . As one undertakes new and different kinds of evaluation and as society changes, one’s ability to be ethical must grow to meet new challenges. Thus, being ethical in program evaluation is a process of growth in understanding, perception, and creative problem-solving ability that respects the interests of individuals and of society. (1980, p. 53)

Major Concepts and Theories

1. Good evaluation practice involves much more than methodological skills. Evalua- tors must have the skills to work well in a sometimes highly political environment, must be able to communicate well with clients and other stakeholder groups, and must know the ethical problems evaluators can encounter and the ethical expectations for good evaluation practice.

2. Evaluations are an inherently political activity because they are concerned with guid- ing or influencing public policies, because their results can have powerful implications for individuals and stakeholder groups competing for power, and because they concern human beings, organizations, and judgments about programs.

3. Evaluators need to have skills to work in a political environment both to increase the likelihood that the evaluation will be used and to prevent political actions that may bias the results. These skills include learning about the political environment and the po- sitions of the stakeholders in it, considering and including other stakeholders and the public in the evaluation as appropriate, and working to maintain the credibility of the evaluation.

4. Evaluators should foster good communication with stakeholders by listening to their concerns and learning about their experiences with evaluation, educating stake- holders on the different purposes evaluation can serve, meeting frequently with the client and other appropriate stakeholders, and involving them in the decisions made concern- ing the evaluation.

5. The Program Evaluation Standards and the American Evaluation Association’s Guiding Principles provide guidance for the conduct of good and ethical evaluations. Evaluators should be knowledgeable about the standards and principles for their coun- try and use them to inform clients and other stakeholders of the expectations for them as a professional.

6. Protecting the rights of those who provide data for the study is essential to good, ethical evaluation. Such rights include having a free choice to participate without threat of losing services, understanding the nature of the evaluation and the data collection and its potential risks and benefits, being informed about confidentiality and its limitations, and being treated with respect and dignity. Evaluators should seek the input or approval of Institutional Review Boards or others informed on the ethics of data collection to ensure that appropriate precautions are taken.

106 Part I • Introduction to Evaluation

7. Evaluations can be biased by the personal views and experiences of the evalua- tor; by his or her views and relationships with program staff, administrators, and clients; or by financial or organizational pressures. The evaluator should be conscious of these sources of bias and seek to avoid relationships that would unduly threaten the perceived neutrality of the evaluation findings. The evaluator should work to gain cultural competence in the setting of the evaluation and consider the cultural views of others.

8. Ethical practice requires evaluators not only to become familiar with the Standards and the Guiding Principles and to acquaint clients with these professional expectations, but also to carefully consider decisions throughout the evaluation in terms of potential ethical concerns. Professional codes can be one source for resolution of ethical problems, but con- tinued personal growth, reading, reflection, and discussion with others are essential.

Discussion Questions

1. What are the good elements of evaluation studies taking place in a political envi- ronment? The bad elements? How does politics enter into evaluations that you know about?

2. Which of the three positions described by Vestman and Conner for evaluators to take in a political environment do you feel are most appropriate? Why?

3. Why is there a need for explicit ethical standards in evaluation? What benefits accrue to the evaluator and client by adhering to these standards?

4. What types of ethical violations do you think would occur most commonly in or- ganizations with which you are familiar? How might these violations be prevented?

Application Exercises

For exercises 1 to 3, consider an evaluation in which you were a participant or the evaluator.

1. How did politics enter into this evaluation? Did the politics introduce bias or prob- lems? How did the evaluator attend to the political context?

2. How did the evaluator or the evaluation team communicate with you and other key stakeholders? On what issues did they seek your input? Do you think the relation- ships the evaluator established with you and other stakeholders in the evaluation led to bias, or did it improve the evaluation?

3. Consider this evaluation in reference to the Program Evaluation Standards and the AEA Guiding Principles. What were the ethical strengths and weaknesses of the evaluation?

4. Now consider a program that you are familiar with—perhaps one in your organiza- tion. If you had to evaluate that program, what biases would you bring? Do you think you would be an appropriate person to evaluate it? Who (person or organization) might be the best alternative? Why?

Chapter 3 • Political, Interpersonal, and Ethical Issues in Evaluation 107

Case Studies

To close this chapter on politics, interpersonal relationships, and ethics, we resume the practice begun in Chapter 1 of recommending interviews that describe an evaluation that illustrates the issues discussed in the chapter. The interviews we recommend for this chapter are in Evaluation in Action, Chapters 4 (Len Bickman) and 12 (Katrina Bledsoe).

In Chapter 4, Len Bickman, a past president of the American Evaluation Association, describes some of the difficult political circumstances he en- countered in a nationally recognized evaluation of mental health systems of care. The journal source is Fitzpatrick, J. L., & Bickman, L. (2002). Evaluation of the Ft. Bragg and Stark County systems of care for children and adolescents:

A dialogue with Len Bickman. American Journal of Evaluation, 23, 67–80.

In Chapter 12, Katrina Bledsoe de- scribes her evaluation of a program for par- ents and preschool children to encourage reading and preliteracy skills. She demon- strates her skills at developing strong inter- personal relationships with people in the program, working at achieving cultural com- petence and understanding different cultural views, and facing ethical challenges from the client on the final report. The journal source is Fitzpatrick, J. L., & Bledsoe, K. (2007). Eval- uation of the Fun with Books Program: A dialogue with Katrina Bledsoe. American Jour- nal of Evaluation, 28, 522–535.

How do expertise-oriented an consumer-oriented evaluation approaches differ?

U.S Elite Tutors

Writing Services

/

Contact Info