Tag Archives: Evaluations

Another piece of the evaluation puzzle: Why do experiments make people unhappy?

The more time you spend around grants, grant writing, nonprofits, public agencies, and funders, the more apparent it becomes that the “evaluation” section of most proposals is only barely separate in genre from mythology and folktales. Most grant RFPs nonetheless request evaluations that are, if not outright bogus, then at least improbable: they’re not going to happen in the real world. We’ve written quite a bit on this subject, for two reasons: the first is my own intellectual curiosity, and the second is to reassure clients who worry that funders want a real-deal, full-on, intellectually and epistemologically rigorous evaluation (hint: they don’t).

That’s the wind-up to “Why Do Experiments Make People Uneasy?”, Alex Tabarrok’s post on a paper about how “Meyer et al. show in a series of 16 tests that unease with experiments is replicable and general.” Tabarrok calls the paper “important and sad,” and I agree, but the paper also reveals an important (and previously implicit) point about the evaluation sections of proposals from nonprofit and public agencies: funders don’t care about real evaluations because a real evaluation will probably make the applicant, the funder, and the general public uneasy. Beyond the unease, most people don’t even understand how a real evaluation works in a human-services organization, how to collect data, what a randomized controlled trial is, and so on.

There’s an analogous situation in medicine. I have a lot of friends who are doctors, and I’d love to tell some specific stories,* but I’ll just say this: while everyone is nominally in favor of “evidence-based medicine” as an abstract idea, most of those who superficially favor it don’t really understand what it means, how to do it, or how to make major changes based on evidence. It’s often an empty buzzword, like “best practices” or “patient-centered care.”

Evaluations in many nonprofit and public agencies work the same way: everyone putatively believes in them, but almost no one understands them or wants real evaluations conducted. Beyond that epistemic problem, even when an evaluation works in a given circumstance (it usually doesn’t), its findings don’t necessarily transfer to other settings. If you’re curious about why, Experimental Conversations: Perspectives on Randomized Trials in Development Economics is a good place to start, and it’s the book least likely to be read out of all the books I’ve ever recommended here. Normal people like reading 50 Shades of Grey and The Name of the Rose, not Experimental Conversations.

In the meantime, some funders have gotten word about RCTs. For example, the Second Chance Act RFPs from the Department of Justice’s (DOJ) Bureau of Justice Assistance (BJA) award bonus points for RCTs. I’ll be astounded if more than a handful of applicants even attempt a real RCT. For one thing, there’s not enough money available to conduct a rigorous RCT, which typically requires paying members of the control group so they can be tracked over the long term. Whoever put the RCT in this RFP probably wasn’t thinking about that real-world issue.

It’s easy to imagine a world in which donors and funders demand real, true, and rigorous evaluations. But they don’t. Donors mostly want to feel warm fuzzies and the status that comes from being fawned over, and I approve of those things too, by the way, since they make the world go round. Government funders mostly want to make Congress feel good while cultivating an aura of sanctity and kindness. The number of funders who will make nonprofit funding contingent on true evaluations is small, and the number willing to pay for true evaluations is smaller still. And that’s why we get the system we get. The mistake some nonprofits make is thinking that the evaluation sections of proposals are for real. They’re not. They’re almost pure proposal world.

* The stories are juicy and also not flattering to some of the residency and department heads involved.

Yours is not the only organization that isn’t worried about long-term grant evaluations

Ten years ago, in “Studying Programs is Hard to Do: Why It’s Difficult to Write a Compelling Evaluation,” we explained why real program evaluations are hard and why the overwhelming majority of grant programs don’t actually demand them; instead, funders want cargo cult evaluations. Sometimes real, true evaluations or follow-up data for programs like YouthBuild are actively punished:

As long as we’re talking about data, I can also surmise that the Dept. of Labor is implicitly encouraging applicants to massage data. For example, existing applicants have to report on the reports they’ve previously submitted to the DOL, and they get points for hitting various kinds of targets. In the “Placement in Education or Employment” target, “Applicants with placement rates of 89.51% or higher will receive 8 points for this subsection,” and for “Retention in Education or Employment,” “Applicants with retention rates of 89.51% or higher will receive 8 points for this subsection.” Attaining these rates with a very difficult-to-reach population is, well, highly improbable.

That means a lot of previously funded applicants have also been. . . rather optimistic with their self-reported data.

To be blunt, no one working with the hard-to-serve YouthBuild population is going to get 90% of their graduates in training or employment. That’s just not possible. But DOL wants it to be possible, which means applicants need to find a way to make it seem possible / true.

So. That brings us to a much more serious topic, in the form of “The Engineer vs. the Border Patrol: One man’s quest to outlaw Customs and Border Protection’s internal, possibly unconstitutional immigration checkpoints,” which is a compelling, beautiful, and totally outrageous read. It is almost impossible to read that story and not come away fuming at the predations of the Border Patrol. Leaving that aspect aside, however, this stood out to me:

Regarding Operation Stonegarden, the DHS IG issued a report in late 2017 that was blunt in its assessment: “FEMA and CBP have not collected reliable program data or developed measures to demonstrate program performance resulting from the use of more than $531.5 million awarded under Stonegarden since FY 2008.”

Even in parts of government where outcomes really matter, it’s possible to hand out half a billion dollars with no idea whether it accomplished anything, and, basically, no one cares. If FEMA and CBP can award all that money without even attempting to measure whether it’s being spent semi-effectively, what does that communicate to average grant-funded organizations that get a couple of hundred thousand dollars per year?

We’re not telling you to lie in evaluation sections of your proposal. But we are reminding you, as we often do, about the difference between the real world and the proposal world. What you do with that information is up to you.

Studying Programs is Hard to Do: Why It’s Difficult to Write a Compelling Evaluation

Evaluation sections in proposals are both easy and hard to write, depending on your perspective, because of their estranged relationship with the real world. The problem boils down to this: it is fiendishly difficult and expensive to run evaluations that genuinely demonstrate a program’s efficacy. Yet RFPs act as though the 5 to 20% of the budget that most grants reserve for evaluation should be sufficient to run a genuine evaluation process. Novice grant writers are often stumped by this conundrum, especially those who understand statistics and the difficulty of teasing apart correlation and causation but also realize they need to tell a compelling story to have a chance at being funded.

We’ve discussed the issue before. In Reading Difficult RFPs and Links for 3-23-08, we said:

* In a Giving Carnival post, we discussed why people give and firmly answered, “I don’t know.” Now the New York Times expends thousands of words in an entire issue devoted to giving and basically answers “we don’t know either.” An article on measuring outcomes is also worth reading, although the writer appeared not to have read our post on the inherent problems in evaluations.

That last link is to an entire post on one aspect of the problem. Now, The Chronicle of Higher Education reports that the Department of Education has cancelled a study to track whether Upward Bound works.* A quote:

But the evaluation, which required grantees to recruit twice as many students to their program as normal and assign half of them to a control group, was unpopular from the start […] Critics, led by the Council for Opportunity in Education, a lobbying group for the federal TRIO programs for disadvantaged students, said it was unethical, even immoral, of the department to require programs to actively recruit students into programs and then deny them services.

“They are treating kids as widgets,” Arnold L. Mitchem, the council’s president, told The Chronicle last summer. “These are low-income, working-class children that have value, they’re not just numbers.”

He likened the study to the infamous Tuskegee syphilis experiments, in which the government withheld treatment from 399 black men in the late stages of syphilis so that scientists could study the ravages of the disease.

But Larry Oxendine, the former director of the TRIO programs who started the study, says he was simply trying to get the program focused on students it was created to serve. He conceived of the evaluation after a longitudinal study by Mathematica Policy Research Inc., a nonpartisan social-policy-research firm, found that most students who participated in Upward Bound were no more likely to attend college than students who did not. The only students who seemed to truly benefit from the program were those who had low expectations of attending college before they enrolled.

Notice, by the way, Mitchem’s ludicrous comparison of evaluating a program with the Tuskegee experiment. One would divide a group into those who receive afterschool services that may or may not be effective and a control group that couldn’t receive services at current funding levels anyway. The other cruelly denied basic medical care on the basis of race. The two examples are so different in magnitude and scope that the comparison makes him appear disingenuous.

Still, the point is that our friends at the Department of Education don’t have the guts or the suction to make sure a program they’ve spent billions of dollars on actually works. Yet RFPs constantly ask how programs will be evaluated to ensure their effectiveness. The gold standard for doing this is exactly what the Department of Education wanted to do: take a large group, randomly split it in two, give one half services and the other nothing, track both, and see whether there’s a significant divergence between them. But doing so is incredibly expensive and difficult. These two factors lead to a distinction between what Isaac calls the “proposal world” and the “real world.”
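To make the gold-standard design a little more concrete, here is a minimal sketch in Python of the two parts that are cheap: randomly splitting a cohort and checking whether the two halves diverge. It assumes a simple yes/no outcome (say, graduating from high school) and uses a two-proportion z-test, which is one common choice and not necessarily what the Department of Education had in mind. The function names and all the numbers are hypothetical. The expensive part, tracking both groups for years, is exactly what no code can do for you.

```python
import math
import random


def randomize(participants, seed=0):
    """Shuffle a (hypothetical) cohort and split it into services and control halves."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test on yes/no outcomes.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


if __name__ == "__main__":
    # Hypothetical cohort of 400 recruited students, identified by number.
    services, control = randomize(range(400), seed=1)
    print(len(services), len(control))
    # Invented follow-up results: graduates in each group of 200.
    diff, z, p = two_proportion_z(130, len(services), 110, len(control))
    print(f"difference={diff:.2f} z={z:.2f} p={p:.3f}")
```

With those invented results (130 of 200 graduating with services versus 110 of 200 without), the ten-point gap lands just under the conventional 5% significance threshold. Cut the cohort to 100 per group and the same ten-point gap is no longer distinguishable from noise, which is one reason real RCTs need so many participants.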

In the proposal world, the grant writer states that data will be carefully tracked and maintained, participants will be followed long after the project ends, and continuous improvements will be made to ensure midcourse corrections when necessary. You don’t necessarily need to say you’re going to have a control group, but you should be able to state the difference between process and outcome objectives, as Isaac has written about. You should also say that you’re going to compare the group that receives services with the general population. If you’re going to provide the ever-popular afterschool program, for example, you should say that one of your outcome measures will compare the graduation rate of those who receive services with that of those who don’t. This is a deceptive measure, however, because those who are cognizant enough to sign up for services probably also have other things going their way. This is sometimes known as the “opt-in problem”: those who are likely to present for services are likely to be those who need them the least. This, however, is the sort of problem you shouldn’t mention in your evaluation section, because doing so will make you look bad, and the reviewers of applications aren’t likely to understand the issue anyway.
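The opt-in problem is easy to see in a toy simulation. The sketch below is a hypothetical model, not data from any real program: each student has a hidden “advantage” score between 0 and 1 that raises both the odds of signing up and the odds of graduating, while the program itself has zero effect. Comparing sign-ups with everyone else still shows a sizable graduation gap; assigning services by coin flip to the same kind of population shows essentially none. The graduation formula (0.4 + 0.4 * advantage) and every other parameter are arbitrary assumptions chosen for illustration.

```python
import random


def _graduation_chance(advantage, treated, program_effect):
    """Hypothetical model: better-resourced students graduate more often."""
    p = 0.4 + 0.4 * advantage
    return p + program_effect if treated else p


def simulate_opt_in(n=20000, program_effect=0.0, seed=42):
    """Compare students who chose to sign up with those who didn't."""
    rng = random.Random(seed)
    signed_up, others = [], []
    for _ in range(n):
        advantage = rng.random()            # 0 = fewest resources, 1 = most
        joins = rng.random() < advantage    # better-off students opt in more often
        graduated = rng.random() < _graduation_chance(advantage, joins, program_effect)
        (signed_up if joins else others).append(graduated)
    return sum(signed_up) / len(signed_up), sum(others) / len(others)


def simulate_randomized(n=20000, program_effect=0.0, seed=42):
    """Same population, but a coin flip (not the student) decides who gets services."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        advantage = rng.random()
        gets_services = rng.random() < 0.5  # independent of advantage
        graduated = rng.random() < _graduation_chance(advantage, gets_services, program_effect)
        (treated if gets_services else control).append(graduated)
    return sum(treated) / len(treated), sum(control) / len(control)


if __name__ == "__main__":
    opt, rest = simulate_opt_in()
    print(f"opt-in comparison:     {opt:.3f} vs {rest:.3f}")
    t, c = simulate_randomized()
    print(f"randomized comparison: {t:.3f} vs {c:.3f}")
```

Under these assumptions, the opt-in comparison shows graduation rates of roughly 67% versus 53%, a gap of about 13 percentage points that comes entirely from who chose to show up, not from the services. The randomized comparison, by contrast, only picks up an effect when one is actually built into the model.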

In the real world of grants implementation, evaluations, if they are done at all, usually bear little resemblance to the evaluation section of the proposal, which leads to vague outcome analysis. Since agencies want to get funded again, it is rare for an evaluation of a grant-funded human services program to say, more or less, “the money was wasted.” Rather, most real-world evaluations say something like, “the program was a success, but we could sure use more money to maintain or expand it.” Hence the reluctance of someone like Mr. Mitchem to see a rigorous evaluation of Upward Bound: better to keep funding the program on the assumption that it probably doesn’t hurt kids and might actually help a few.

The funny thing about this evaluation hoopla is that even as one part of the government realizes the futility of its efforts to conduct a real evaluation, another ramps up. The National Endowment for the Arts (NEA) is offering at least $250,000 for its Improving the Assessment of Student Learning in the Arts program. As subscribers already know, the program offers “[g]rants to collect and analyze information on current practices and trends in the assessment of K-12 student learning in the arts and to identify models that might be most effective in various learning environments.” Good luck: you’re going to run into the inherent problems of evaluations and the inherent problems of people like Mr. Mitchem. Between them, I doubt any effective evaluations will actually occur, which is the same thing that (doesn’t) happen in most grant programs.

* Upward Bound is one of several so-called “TRIO Programs” that seek to help low-income, minority and/or first generation students complete post-secondary education. It’s been around for about 30 years, and (shameless plug here) yes, Seliger + Associates has written a number of funded TRIO grants with stunningly complex evaluation sections.

Why Do People Give to Nonprofits and Charities? And Other Unanswerable Questions

This month’s Giving Carnival, discussed here previously, asks why people give and what motivates giving. I have no idea and suspect no one else does, either, but that’s no reason not to speculate. I assume that some combination of altruism, kindness, self-interest, pride, and noblesse oblige motivates giving. Slate talks about the “immeasurable value of philanthropy” here:

But the core of [Lewis Hyde’s] insights are about the connections between donors and recipients and about how successful gifts continue to give, in, yes, a circle, from the direct recipients to others to whom they pass a gift along (in one form or another) and back to the donors. While a gift can have market value, its worth is often—and more importantly—psychological and social. Even when its impact isn’t immediate, it’s likely to be what Hyde calls “a companion to transformation.”

I’m not sure that has anything to do with anything. Later, however, the article ties into issues raised by posts about evaluations and their limitations:

As philanthropic organizations become more attentive to businesslike standards—how effective are nonprofits? What is a particular donation likely to accomplish?—they increasingly use the language of finance to describe their goals.

I can buy the dominant narrative in the press about philanthropy becoming more businesslike, but I doubt the tendency will last over the long run, because it is incredibly difficult to measure output when there are no dollars to add up toward a bottom line. Still, this Slate article is worth reading, even if it strays outside the question discussed here, because it raises the right issues, although I’m not sure the reporter fully grasps the problem of measurement. This ties back into what motivates giving, because part of what motivates giving is probably effectiveness: is what I’m giving actually doing someone some good? Some recent books, like The Bottom Billion: Why the Poorest Countries are Failing and What Can Be Done About It and The White Man’s Burden: Why the West’s Efforts to Aid the Rest Have Done So Much Ill and So Little Good, argue that much Western aid has been wasted because it’s ineffective, which might lessen the desire to give. Consequently, I do think “effectiveness,” or the belief that giving helps someone besides the giver (who gets a warm and fuzzy glow), at least partly motivates giving.

I say “lessen” and “effectiveness” above, but I also think the desire to help others is intrinsic, like the desire for self-improvement. But just as there is an element of randomness in who gets funded, there seems to be a stronger one in the question of why people give.

More on Charities

A previous post linked to a Wall Street Journal post on charities; now the paper has released a full article (may not be accessible to non-subscribers) on how donors evaluate the usefulness of a program, arguing that donors are becoming more engaged in measurement. One thing is missing: statistics showing that this is actually part of a trend rather than just a collection of anecdotes. The article mostly describes how donors try to evaluate effectiveness, and it uses hedge words:

Wealthy people and foundations sometimes hire philanthropy consultants to help them gauge a charity’s effectiveness. But other donors who seek that kind of analysis usually have had to rely on guesswork or do it themselves, which makes it tough to figure out whether one approach to solving a problem is better than another.

“Sometimes” they hire consultants; other times they essentially use the hope and pray method. That’s not terribly different from how things have always been done. More interesting, however, is a point relevant to evaluations that we’ll comment on more later:

The problem is, it can be difficult — and expensive — to measure whether charitable programs are actually working, and most nonprofits aren’t willing to devote scarce resources to collecting such information.

Most federal programs have in effect chosen a tradeoff: they provide more money and almost no real auditing. This is because real auditing is expensive and generally not worthwhile unless a blogger or journalist takes a picture of an organization’s Executive Director in a shiny new Ferrari. To really figure out what an organization is doing with $500,000 or $1,000,000 would cost so much in compliance that it would come to represent an appreciable portion of the grant: thus, the hope and pray method becomes the de facto standard (more on that below).

The writers are also either pressed for space or don’t fully grok nonprofit evaluations, because they write:

Philanthropy advisers suggest first asking nonprofits about their goals and strategies, and which indicators they use to monitor their own impact. Givers should see how the charity measures its results both in the short term — monthly or quarterly — and over a period of years.

Measuring results isn’t a bad idea if it can be done, but such measurements often don’t occur precisely because they’re hard. Even when they do occur, you’re asking the organization to set its own goals, which makes it easy to set them at very, ahem, modest levels. Set them higher, and the measurement problems kick in.

If you want to decide whether an after school program for middle-schoolers is effective, you’ll have to get a cohort together, randomly divide them into those who receive services and those who don’t, and then follow them through much of their lives. In other words, you have to conduct a longitudinal study, which is expensive and difficult. That way, you’ll know whether the group that received services was more likely to graduate from high school, attend college, get jobs, and the like. But even if you divide the group in two, you can still have poisoned data, because if you rely on those who present for services, you’re often getting the cream of the high-risk/low-resource crop. There are also numerous other confounding factors, like geography and culture.

The research can be far more costly than the project itself, and as little as donors like not knowing whether their money is effective, they’re going to like it even less if you spend 50 to 80% of the project budget on evaluating it. This is why the situation donors say they want to change is likely to persist regardless of what is reported.

EDIT: We wrote another, longer post on evaluations here.