Tag Archives: Evaluation

Another piece of the evaluation puzzle: Why do experiments make people unhappy?

The more time you spend around grants, grant writing, nonprofits, public agencies, and funders, the more apparent it becomes that the “evaluation” section of most proposals is only barely separate in genre from mythology and folktales, yet most grant RFPs include requests for evaluations that are, if not outright bogus, then at least improbable—they’re not going to happen in the real world. We’ve written quite a bit on this subject, for two reasons: one is my own intellectual curiosity, but the second is for clients who worry that funders want a real-deal, full-on, intellectually and epistemologically rigorous evaluation (hint: they don’t).

That’s the wind-up to “Why Do Experiments Make People Uneasy?”, Alex Tabarrok’s post on a paper about how “Meyer et al. show in a series of 16 tests that unease with experiments is replicable and general.” Tabarrok calls the paper “important and sad,” and I agree, but the paper also reveals an important (and previously implicit) point about evaluation proposal sections for nonprofit and public agencies: funders don’t care about real evaluations because a real evaluation will probably make the applicant, the funder, and the general public uneasy. Not only do real evaluations make people uneasy, but most people don’t even understand how a real evaluation works in a human-services organization, how to collect data, what a randomized controlled trial is, and so on.

There’s an analogous situation in medicine; I’ve spent a lot of time around doctors who are friends, and I’d love to tell some specific stories,* but I’ll say that while everyone is nominally in favor of “evidence-based medicine” as an abstract idea, most of those who superficially favor it don’t really understand what it means, how to do it, or how to make major changes based on evidence. It’s often an empty buzzword, like “best practices” or “patient-centered care.”

In many nonprofit and public agencies, evaluations and effectiveness are the same: everyone putatively believes in them, but almost no one understands them or wants real evaluations conducted. Plus, beyond that epistemic problem, even if evaluations are effective in a given circumstance (they’re usually not), they don’t necessarily transfer. If you’re curious about why, Experimental Conversations: Perspectives on Randomized Trials in Development Economics is a good place to start—and this is the book least likely to be read, out of all the books I’ve ever recommended here. Normal people like reading 50 Shades of Grey and The Name of the Rose, not Experimental Conversations.

In the meantime, some funders have gotten word about RCTs. For example, the Department of Justice’s (DOJ) Bureau of Justice Assistance’s (BJA) Second Chance Act RFPs have bonus points in them for RCTs. I’ll be astounded if more than a handful of applicants even attempt a real RCT—for one thing, there’s not enough money available to conduct a rigorous RCT, which typically requires paying the control group to follow up for long-term tracking. Whoever put the RCT in this RFP probably wasn’t thinking about that real-world issue.

It’s easy to imagine a world in which donors and funders demand real, true, and rigorous evaluations. But they don’t. Donors mostly want to feel warm fuzzies and the status that comes from being fawned over (and I approve of those things too, by the way, as they make the world go round). Government funders mostly want to make Congress feel good, while cultivating an aura of sanctity and kindness. The number of funders who will make nonprofit funding contingent on true evaluations is small, and the number willing to pay for true evaluations is smaller still. And that’s why we get the system we get. The mistake some nonprofits make is thinking that the evaluation sections of proposals are for real. They’re not. They’re almost pure proposal world.

* The stories are juicy and also not flattering to some of the residency and department heads involved.

Yours is not the only organization that isn’t worried about long-term grant evaluations

Ten years ago, in “Studying Programs is Hard to Do: Why It’s Difficult to Write a Compelling Evaluation,” we explained why real program evaluations are hard and why the overwhelming majority of grant-funded programs don’t demand them; instead, they want cargo cult evaluations. Sometimes, real, true evaluations or follow-up data for programs like YouthBuild are actively punished:

As long as we’re talking about data, I can also surmise that the Dept. of Labor is implicitly encouraging applicants to massage data. For example, existing applicants have to report on the reports they’ve previously submitted to the DOL, and they get points for hitting various kinds of targets. In the “Placement in Education or Employment” target, “Applicants with placement rates of 89.51% or higher will receive 8 points for this subsection,” and for “Retention in Education or Employment,” “Applicants with retention rates of 89.51% or higher will receive 8 points for this subsection.” Attaining these rates with a very difficult-to-reach population is, well, highly improbable.

That means a lot of previously funded applicants have also been. . . rather optimistic with their self-reported data.

To be blunt, no one working with the hard-to-serve YouthBuild population is going to get 90% of their graduates in training or employment. That’s just not possible. But DOL wants it to be possible, which means applicants need to find a way to make it seem possible / true.

So. That brings us to a much more serious topic, in the form of “The Engineer vs. the Border Patrol: One man’s quest to outlaw Customs and Border Protection’s internal, possibly unconstitutional immigration checkpoints,” which is a compelling, beautiful, and totally outrageous read. It is almost impossible to read that story and not come away fuming at the predations of the Border Patrol. Leaving that aspect aside, however, this stood out to me:

Regarding Operation Stonegarden, the DHS IG issued a report in late 2017 that was blunt in its assessment: “FEMA and CBP have not collected reliable program data or developed measures to demonstrate program performance resulting from the use of more than $531.5 million awarded under Stonegarden since FY 2008.”

Even in parts of government where outcomes really matter, it’s possible to have half a billion dollars disappear, and, basically, no one cares. If FEMA can lose all that money and not even attempt to measure whether the money is being spent semi-effectively, what does that communicate to average grant-funded organizations that get a couple of hundred thousand dollars per year?

We’re not telling you to lie in evaluation sections of your proposal. But we are reminding you, as we often do, about the difference between the real world and the proposal world. What you do with that information is up to you.

Data-Based Client Tracking Services and Outcomes is a Real Challenge for Many Nonprofits

Jake recently wrote a post on the huge challenges faced by primary care provider organizations in meeting EMR Meaningful Use regulations. This got me thinking about other data collection challenges facing nonprofits. Apart from computers and the Internet,* one of the few aspects of grant writing that has changed since I started writing proposals when dinosaurs walked the earth is an ever-increasing RFP/funder emphasis on data tracking to demonstrate services delivered and improved “outcomes.”

The scare quotes around “outcomes” express how we feel about many of them. While we’re adept at creating plausible data collection strategies in proposals, regardless of what our clients are actually doing in the real world, we know that demonstrating service delivery levels and outcomes is a major issue for certain types of human services providers. These include many faith-based organizations (FBOs)** and ethnic-specific providers, some of which have been operating since the days of Hull House. We’ve worked for several nonprofits that have been providing services for well over 100 years.

It’s not unusual for smaller FBOs and organizations serving immigrant/refugee populations to provide services in what seems, from the outside, to be a chaotic manner. But the service delivery practices are actually well-suited to their mission. A range of services might be provided to a particular individual, like help with an immigration problem, but the agency will end up helping the person’s extended family members with all manner of issues. In many ethnic communities, the concept of “family” is malleable. A nominal “uncle” or “cousin” is actually not related but hails from the same village or clan in their country of origin.

Such services are usually provided on the fly and the harried case worker, who is typically a co-religionist or from the same ethnicity, hops from client problem to problem without time or interest in database entry. Like pulling a thread on a sweater, helping one person in a 30-member extended family can result in dozens of “cases” that may not be separated and documented. The family often does not want the problem documented because of cultural/religious taboos and (often justified) fear of government officials. Thus, much service delivery is provided on the down-low.

Everyone knows that New York City has dramatically changed from the bad old Death Wish days of the 1970s to a glittering metropolis of 70-story apartment buildings for the one-percenters and a well-scrubbed, tourist-focused Times Square. What isn’t generally known is that an amazing 37% of NYC’s population is foreign-born. This percentage is increasing. NYC has more foreign-born residents than the entire City of Chicago has residents! Rapidly growing NYC immigrant groups include Orthodox Jews from the former Soviet Union, Dominicans, Asians, Central Americans, and so on. We work for many nonprofits that serve these immigrant populations; this client type usually only serves their brethren. These nonprofits have great difficulty documenting the often extraordinary services they provide, and one of the main reasons they hire us is our ability to weave their stories into the complicated responses required by RFPs, including service and outcome metrics. Like the proverbial centipede, these nonprofits walk perfectly, as long as no one asks them how they do it.

The data capture challenge is compounded because few prospective social workers enter grad school with the idea of becoming bean counters. Like the best doctors and teachers/professors, social workers start off with the idealistic notion that they will spend most of their time helping people, not doing data entry and accounting for every minute of their day. When not extruding proposals or writing novels, Jake is a college English professor. He can attest that much of his best teaching doesn’t show up in metrics.

Many of us have had a “hero teacher” at one point, and a conversation or a book recommendation from that teacher might have changed our lives without ever being reflected in grades or academic honors. Similarly, a case worker who gets a taqueria to hire the “nephew” of one of her clients as a busboy to keep him out of juvenile hall might set the young man on a positive life path, even though “job placement” is not part of her official duties and will not appear in the agency’s reports.

* Which have also made the world worse, at least in some respects.

** This does not refer to industrial-sized FBOs like Catholic Charities or the Salvation Army, which operate with bureaucratic precision.

Take Time to Develop a Proposal Timeline

Many RFPs require that you include a timeline that will describe when your project will actually unfold; remember that the “when” section is part of the 5Ws and H. Even if the RFP writers forget to require a timeline, you should include one anyway, under either the “Project Description” or the “Evaluation” section, because the timeline will clarify both your own thinking and the reviewer’s understanding of how you plan to sequence activities and achieve milestones.

Think of your project timeline as something like the timelines cops are always trying to establish in police procedurals. A shocking crime is committed—perhaps a socialite is killed. A rogue cop on the outs with the department is trying to solve the case. The night of the murder, the husband was at a charity ball, while the ex-husband was at the gym, while the husband’s jealous lover was at a taqueria. Could the husband have slipped away between the main course and the souffle? Did the ex-husband have time between 9:45 and 10:45 to slip out of the racquetball game, run over to the condo, and do the deed? In asking these questions, the cop is always trying to figure out if the crime is plausible. He—and he is almost always a “he”—is checking the believability of the tales he’s constructing. When you write a timeline for a proposal, you’re trying to do the same, only for the future. You’re trying to convince yourself, and the reviewer, that you’re believable in doing the job (except in this case the job is human services, not murder, for most nonprofit and public agencies).

Doing a timeline right requires a number of elements, including:

  • Startup Period: You probably can’t start delivering services on the day you execute the contract with the funder. Chances are good that you’ll need staff, training, space, and maybe more. Some RFPs will dictate how long your startup period should last, either from the notice of grant award or from the execution date of your contract. Usually they’ll demand somewhere around 90 days, which is fairly reasonable if it’s from the date you’ve executed your contract. Even if the funder doesn’t include a minimum or maximum startup period, you should. Unless otherwise directed by the RFP or client, we usually include a 90-day startup period.
  • Staff Recruitment/Assignment and Training: Make sure to provide for staff recruitment/assignment and preservice training in the startup period, as well as periodic or annual refresher training. Funders love professional development as much as mystery writers love plot twists, so serve it up in your timeline.
  • Outreach Start: Many if not most projects will involve some effort to get the word out to the target population. You’ll probably need to start outreach prior to the start of service delivery. Outreach is usually an ongoing activity; I might eventually write a post about everything that outreach should entail.
  • Project Oversight/Participant Committee: Most projects should have some form of participant, staff, and community oversight committee mentioned in their proposal. The formation and meeting facilitation of such committees should be reflected in the timeline.
  • Referral and Intake: Once you’ve made the target population and other providers aware of your project, you need some system for deciding who gets services and who doesn’t. Put referral and intake in between outreach and service delivery.
  • Services Start: Whatever services you’re providing should have a start date, often three months after the project begins. In many projects, service delivery is ongoing. In others, the referral/intake process is done on a “batch” basis, repeating annually or periodically, rather than ongoing. This is how many job training programs work.
  • Evaluation: Your project should have some form of annual evaluation. The timeline should include some time for developing the evaluation criteria, conducting the evaluation and preparing/disseminating the evaluation reports.

Those are the basic elements for a human services timeline, like the one that might go with Isaac’s hypothetical Project NUTRIA. If you’re doing a capital campaign, you’d have a different set of milestones relating to construction, like permits, architecture, engineering, the commencement of construction, burying the body of Ralph “Ralphie” Cifaretto in the foundation for Tony Soprano, and so on, but the same basic idea would remain: you’d enumerate significant steps in your project, without going into too much minutiae. Most of our timelines are 10 – 15 rows, which is enough to give the general idea while avoiding specifics the client might not want to meet.

You also have to decide how to lay your timeline out. We used to make elaborate Visio drawings, and if we did the same thing today we’d use Omnigraffle Pro. But with the rise of online submissions, it’s too dangerous to use anything but tables in Word; now we usually make tables with three columns: the “date” column, with the number of project months it will take something to happen; a “milestone” column that will say something like “evaluation begins”; and a “description” column that will say something like, “The evaluation, to be conducted by an expert evaluator selected through an open bidding process, will examine both process and outcome measures, as described in section 4.b.” If required by the RFP, we will also include a “responsibility” column or similar. For most projects, it’s absolutely not necessary, and is likely to be time-wasting and counterproductive, to use such professional scheduling software as Microsoft Project or Primavera. Such software will drive you nuts and, if embedded in a Word document, will probably bork the upload process.
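
For readers who like to draft the rows before opening Word, here is a minimal sketch of that three-column layout in Python. The rows, month numbers, and column widths are hypothetical placeholders I made up for illustration, not taken from any RFP or from one of our proposals, and `render_timeline` is just an illustrative helper name.

```python
import textwrap

HEADERS = ("Date", "Milestone", "Description")

# Hypothetical rows: months and wording are placeholders for illustration only.
TIMELINE = [
    ("Months 1-3", "Startup period",
     "Recruit or assign staff, secure space, and hold preservice training."),
    ("Month 2", "Outreach begins",
     "Publicize the project to the target population and other providers."),
    ("Month 3", "Referral and intake begin",
     "Screen referrals and enroll eligible participants."),
    ("Month 4", "Services start",
     "Begin ongoing service delivery."),
    ("Month 12", "Evaluation begins",
     "Examine process and outcome measures and prepare the annual report."),
]


def render_timeline(rows, widths=(12, 26, 40)):
    """Return the timeline as a plain-text table with wrapped cells."""
    lines = []
    for row in [HEADERS] + list(rows):
        # Wrap each cell to its column width, then pad the cells to equal height.
        cells = [textwrap.wrap(text, width) or [""]
                 for text, width in zip(row, widths)]
        height = max(len(cell) for cell in cells)
        for i in range(height):
            parts = [
                (cell[i] if i < len(cell) else "").ljust(width)
                for cell, width in zip(cells, widths)
            ]
            lines.append(" | ".join(parts).rstrip())
        # Separator line between rows.
        lines.append("-" * (sum(widths) + 3 * (len(widths) - 1)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_timeline(TIMELINE))
```

The output is only a drafting aid: the rows still need to be pasted into a Word table and checked against the rest of the narrative.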

Timelines don’t have to be extraordinarily complex, but they do have to match what you’ve written in other sections of the proposal. Internally inconsistent proposals will often be rejected because they fail to make sense, which is one danger of splitting a proposal among multiple writers (see more about this in “Stay the Course: Don’t Change Horses (or Concepts) in the Middle of the Stream (or Proposal Writing)”).

If you have no idea what should go into your timeline, it probably means your narrative lacks cohesion. Sometimes you’ll find that writing the timeline reminds you of something that should go elsewhere in the narrative, which is another use for them: back checking your own work, just as the cops in police procedurals use timelines to make sure their own logic is sound. Your job might be slightly easier and less likely to leave a crazed serial killer on the loose, but it’s still important to do it well if you’re going to get the money.

Studying Programs is Hard to Do: Why It’s Difficult to Write a Compelling Evaluation

Evaluation sections in proposals are both easy and hard to write, depending on your perspective, because of their estranged relationship with the real world. The problem boils down to this: it is fiendishly difficult and expensive to run evaluations that will genuinely demonstrate a program’s efficacy. Yet RFPs act as though the 5 – 20% of most grant budgets usually reserved for evaluations should be sufficient to run a genuine evaluation process. Novice grant writers who understand statistics and the difficulties of teasing apart correlation and causation but also realize they need to tell a compelling story in order to have a chance at being funded are often stumped by this conundrum.

We’ve discussed the issue before. In Reading Difficult RFPs and Links for 3-23-08, we said:

* In a Giving Carnival post, we discussed why people give and firmly answered, “I don’t know.” Now the New York Times expends thousands of words in an entire issue devoted to giving and basically answers “we don’t know either.” An article on measuring outcomes is also worth reading, although the writer appeared not to have read our post on the inherent problems in evaluations.

That last link is to an entire post on one aspect of the problem. Now, The Chronicle of Higher Education reports (see a free link here) that the Department of Education has cancelled a study to track whether Upward Bound works.* A quote:

But the evaluation, which required grantees to recruit twice as many students to their program as normal and assign half of them to a control group, was unpopular from the start […] Critics, led by the Council for Opportunity in Education, a lobbying group for the federal TRIO programs for disadvantaged students, said it was unethical, even immoral, of the department to require programs to actively recruit students into programs and then deny them services.

“They are treating kids as widgets,” Arnold L. Mitchem, the council’s president, told The Chronicle last summer. “These are low-income, working-class children that have value, they’re not just numbers.”

He likened the study to the infamous Tuskegee syphilis experiments, in which the government withheld treatment from 399 black men in the late stages of syphilis so that scientists could study the ravages of the disease.

But Larry Oxendine, the former director of the TRIO programs who started the study, says he was simply trying to get the program focused on students it was created to serve. He conceived of the evaluation after a longitudinal study by Mathematica Policy Research Inc., a nonpartisan social-policy-research firm, found that most students who participated in Upward Bound were no more likely to attend college than students who did not. The only students who seemed to truly benefit from the program were those who had low expectations of attending college before they enrolled.

Notice, by the way, Mitchem’s ludicrous comparison of evaluating a program with the Tuskegee experiment: one would divide a group into those who receive afterschool services that may or may not be effective and a control group that wouldn’t be able to receive services with equivalent funding levels anyway. The other cruelly denied basic medical care on the basis of race. The two examples are so different in magnitude and scope as to make him appear disingenuous.

Still, the point is that our friends at the Department of Education don’t have the guts or suction to make sure the program it’s spent billions of dollars on actually works. Yet RFPs constantly ask for information on how programs will be evaluated to ensure their effectiveness. The gold standard for doing this is to do exactly what the Department of Education wants: take a large group, randomly split it in two, give one services and one nothing, track both, and see if there’s a significant divergence between them. But doing so is incredibly expensive and difficult. These two factors lead to a distinction between what Isaac calls the “proposal world” and the “real world.”
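
To make the mechanics of that gold standard concrete, here is a minimal sketch in Python, assuming each applicant has an ID and a yes/no outcome (say, whether the student enrolled in college). The function names and the permutation test are my own illustrative choices; nothing here describes how the Department of Education or Mathematica actually ran their studies.

```python
import random


def assign_groups(applicant_ids, seed=0):
    """Randomly split applicants into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(applicant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def rate(outcomes, group):
    """Share of a group with a positive outcome (outcomes maps ID -> 0 or 1)."""
    return sum(outcomes[i] for i in group) / len(group)


def permutation_p_value(outcomes, treatment, control, trials=2000, seed=1):
    """Share of random re-splits whose gap is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(rate(outcomes, treatment) - rate(outcomes, control))
    pooled = list(treatment) + list(control)
    n = len(treatment)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = abs(rate(outcomes, pooled[:n]) - rate(outcomes, pooled[n:]))
        if gap >= observed:
            hits += 1
    return hits / trials
```

A small p-value suggests the gap between the two groups is unlikely to be luck of the draw. The code is the cheap part; recruiting twice as many participants and tracking the control group for years is where the money goes.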

In the proposal world, the grant writer states that data will be carefully tracked and maintained, participants followed long after the project ends, and continuous improvements made to ensure midcourse corrections in programs when necessary. You don’t necessarily need to say you’re going to have a control group, but you should be able to state the difference between process and outcome objectives, as Isaac writes about here. You should also say that you’re going to compare the group that receives services with the general population. If you’re going to provide the ever-popular afterschool program, you should say, for example, that you’ll compare the graduation rate of those who receive services with those who don’t, as one of your outcome measures. This is a deceptive measure, however, because those who are cognizant enough to sign up for services probably also have other things going their way, which is sometimes known as the “opt-in problem”: those who are likely to present for services are likely to be those who need them the least. This, however, is the sort of problem you shouldn’t mention in your evaluation section because doing so will make you look bad, and the reviewers of applications aren’t likely to understand this issue anyway.
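
A toy simulation shows how the opt-in problem can manufacture a "result." This is a sketch under loudly made-up assumptions: a hypothetical "motivation" trait drives both who signs up and who graduates, and the program itself does nothing at all. The parameters are invented for illustration and describe no real program.

```python
import random


def simulate_opt_in(n=10000, seed=0):
    """Return graduation rates for (signed up, stayed out) when the program has no effect."""
    rng = random.Random(seed)
    signed_up = []
    stayed_out = []
    for _ in range(n):
        # Hypothetical latent trait between 0 and 1; not something an agency observes.
        motivation = rng.random()
        # Graduation depends only on motivation, never on participation.
        graduates = rng.random() < 0.4 + 0.4 * motivation
        # More motivated students are more likely to sign up.
        opts_in = rng.random() < motivation
        (signed_up if opts_in else stayed_out).append(graduates)
    return sum(signed_up) / len(signed_up), sum(stayed_out) / len(stayed_out)


if __name__ == "__main__":
    served, unserved = simulate_opt_in()
    print(f"signed up: {served:.1%}  stayed out: {unserved:.1%}")
```

Under these invented parameters, the students who signed up graduate at a rate about 13 percentage points higher in expectation, even though the program contributed nothing. A naive comparison would report that gap as an outcome.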

In the real world of grants implementation, evaluations, if they are done at all, usually bear little resemblance to the evaluation section of the proposal, leading to vague outcome analysis. Since agencies want to get funded again, it is rare that an evaluation study of grant-funded human services programs will say, more or less, “the money was wasted.” Rather, most real-world evaluations will say something like, “the program was a success, but we could sure use more money to maintain or expand it.” Hence, the reluctance of someone like Mr. Mitchem to see a rigorous evaluation of Upward Bound: better to keep funding the program with the assumption it probably doesn’t hurt kids and might actually help a few.

The funny thing about this evaluation hoopla is that even as one section of the government realizes the futility of its efforts to provide a real evaluation, another ramps up. The National Endowment for the Arts (NEA) is offering at least $250,000 for its Improving the Assessment of Student Learning in the Arts (warning: .pdf link) program. As subscribers learn, the program offers “[g]rants to collect and analyze information on current practices and trends in the assessment of K-12 student learning in the arts and to identify models that might be most effective in various learning environments.” Good luck: you’re going to run into the inherent problems of evaluations and the inherent problems of people like Mr. Mitchem. Between them, I doubt any effective evaluations will actually occur—which is the same thing that (doesn’t) happen in most grant programs.

* Upward Bound is one of several so-called “TRIO Programs” that seek to help low-income, minority and/or first generation students complete post-secondary education. It’s been around for about 30 years, and (shameless plug here) yes, Seliger + Associates has written a number of funded TRIO grants with stunningly complex evaluation sections.