How Good was the 2010 Census? A View from the Post-Enumeration Survey

Written by: Director Robert Groves

Last week, the Census Bureau released the results of our post-enumeration survey, called “Census Coverage Measurement (CCM).” The results showed that the 2010 Census had a net overcount of 0.01 percent, meaning about 36,000 people were overcounted in the census on a base of over 300 million. This sample-based result, however, was not statistically different from zero after taking into account the sampling variability of the post-enumeration survey.

On this one evaluation, the net undercount of the total population, I can honestly say this was an outstanding census. When this fact is added to the positive evaluations I’ve previously described (for example, here and here), the American public can be proud of the 2010 Census their participation made possible.

It is also useful to look at other censuses and ask the question: how is this country doing in measuring its population using the tools we have?

The graph below compares the net coverage estimates of each U.S. census since 1980 to the corresponding census of three other nations — Canada, Australia and the United Kingdom.

On the right, you see the very small number (an overcount of 0.01 percent) that we just announced from the 2010 U.S. Census. (We don’t yet have equivalent figures from other countries.) It is easy to look at this chart and feel proud as a resident of the United States about the historical quality of censuses that we do in this country. We do world class censuses in the United States. We will see how things compare for the 2010/2011 round, but we like this chart.

However, the net undercount is only one evaluation tool. We recognize that, as in other countries, there are still some groups that we have a harder time counting: for example, renters, young children, young adult males, Blacks, Hispanics, and American Indians on reservations. Correspondingly, we tend to overcount owners of homes, older persons, females, and White non-Hispanics. These historic patterns of systematic coverage tendencies still appear, but some of them are better than others.

We designed the post-enumeration survey to measure components of coverage.  We’re just beginning to mine the data to see what operations have positive and negative influences on the quality of the census.

A few initial observations:

  • In the areas where we mailed a bilingual English/Spanish form, it appears to have reduced the undercount of Hispanics relative to other populations.
  • Some things done in the census to count people only once (such as instructions on the form about whom to count and not count, or address listing operations intended to avoid address duplication) didn’t have the impact we had hoped, as evidenced by the higher number of duplicates in the 2010 Census.
  • When people self-respond, especially early on (in this case by mail), it leads to more accurate results.
  • Households that yield interviews in the face-to-face follow-up stage, especially those with proxy responses from neighbors, yield less accurate data.

A critical question these results raise is “Can we redesign our efforts to follow up with nonrespondents to make them more efficient?” We’re seeking to conduct a more cost-efficient census while producing the same high-quality results. For example, we’ve committed to use the Internet as a response option and are exploring the use of administrative records to supplement our nonresponse follow-up operation. We’re also looking at ways to target our address list development activities so that we don’t have to canvass the entire country. These three options would allow us to do a chunk of the census at a lower cost so we can focus some of those saved resources on areas and hard-to-count populations that need more focused attention.

There is much more work we have to do to extract the full value of the post-enumeration survey for future censuses.  As with all of our evaluations, we’ll publish them on our website, both the good news and the bad news.

Posted in 2010 Census | 2 Comments

How do we Conduct a Post-Enumeration Survey?

Written by: Director Robert Groves

Post-enumeration surveys are complicated beasts, but their basic features can be communicated simply.

The 2010 post-enumeration survey measures the separate components of census coverage as well as the net coverage error. In other words, we are able to know not just the net undercount or net overcount, but also what feeds that – for example, how many people we counted correctly, how many we may have missed, and how many we may have duplicated or counted in error.

To calculate those estimates, we start by drawing two samples: a post-enumeration sample (P sample) and an enumeration sample (E sample). The first is a sample of housing units and people selected independently of the census. We interview the household members of this sample and then match them to the census on a case-by-case basis to determine whether they were counted or missed in the census.

The second sample, the E sample, is a sample of households counted in the census, in the same geographic areas as the P sample. This sample is designed to help us estimate how many enumerations were correct or erroneous in the census.

We use a statistical method called “dual system estimation” to estimate the coverage of the census, based on capture-recapture methodology. As an example, to estimate the number of fish in a pond using this approach, you would capture a set of fish, tag them for later identification, and place them back in the pond. After the fish have had time to disperse sufficiently, you would capture a second set of fish, the “recapture.” Then you would count the number of recaptured fish, and also how many of them are tagged. All this information would allow you to estimate the total number of fish in the pond.
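The fish example can be written out in a few lines of Python. This is only a minimal sketch of the classic two-sample capture-recapture (Lincoln-Petersen) estimate, not the method actually implemented for the census; the function name and the numbers are hypothetical and chosen for illustration.

```python
def estimate_pond_size(tagged_first, caught_second, tagged_in_second):
    """Two-sample capture-recapture (Lincoln-Petersen) estimate.

    tagged_first: fish caught, tagged, and released in the first capture
    caught_second: fish caught in the second capture (the recapture)
    tagged_in_second: fish in the second capture that carry a tag
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged fish in the recapture")
    # The share of tagged fish in the recapture estimates the share of
    # the whole pond that was tagged in the first capture.
    tagged_share = tagged_in_second / caught_second
    return tagged_first / tagged_share


# Hypothetical numbers: 100 fish tagged, 80 recaptured, 20 of those tagged.
pond_estimate = estimate_pond_size(100, 80, 20)
print(pond_estimate)
```

The estimate is only as good as the assumption that tagged and untagged fish are equally likely to be caught the second time.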

Dual system estimation assumes that the two captures are independent of each other. For this reason, from August through December of 2009, CCM staff independently listed all addresses in census blocks within the P sample without referring to the address list for the 2010 Census, and without assistance from staff working on census operations. Later, interviewing and other operations were also scheduled in the field to minimize interaction between the census and CCM staffs.

From mid-August to mid-October of 2010, we attempted to conduct an interview of all P-sample households in each sample block cluster. Field interviewers collected information about the current residents of the sample housing unit, generally including people who had moved in or out of the unit.

You could think of the census as the first capture or enumeration, the post-enumeration survey (PES) as the second capture, and those who are enumerated in both the census and the survey as the tagged fish. In the accompanying table, you can see the total number of census records, N1+ (shaded in blue). Before using this number, we remove the erroneous enumerations. Next, we take the number of people in the PES who are matched to a census record, N11, and the total number of people in the survey, N+1. (These numbers are shaded in red.)

Their ratio (N11 / N+1) is an estimate of the coverage in the census. We estimate components of coverage error in similar ways.

Relying on the assumption of independence between census and PES operations, we inflate the number of correct census enumerations, N1+, by the inverse of this coverage ratio to estimate the total of the population, N. Although the implementation is much more complex, we can produce an estimate of the population on April 1, 2010, by applying this statistical approach.
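To make the arithmetic concrete, here is a rough Python sketch of the calculation described above for a single group of people. It is a simplification under stated assumptions: the function name and all counts are hypothetical, and the real implementation is much more complex than this.

```python
def dual_system_estimate(census_records, erroneous_enumerations,
                         pes_matched, pes_total):
    """Return (coverage_ratio, population_estimate) for one group.

    census_records: total census records, N1+ before cleaning
    erroneous_enumerations: records removed before estimation
    pes_matched: PES people matched to a census record, N11
    pes_total: all people in the PES, N+1
    """
    if pes_matched <= 0 or pes_total <= 0:
        raise ValueError("PES counts must be positive")
    correct_enumerations = census_records - erroneous_enumerations
    coverage_ratio = pes_matched / pes_total  # N11 / N+1
    # Inflate the correct enumerations by the inverse of the coverage ratio.
    population_estimate = correct_enumerations * pes_total / pes_matched
    return coverage_ratio, population_estimate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
coverage, population = dual_system_estimate(
    census_records=1000, erroneous_enumerations=50,
    pes_matched=190, pes_total=200)
print(f"coverage ratio: {coverage:.2f}, population estimate: {population:.0f}")
```

In this made-up case the census holds 1,000 records and the estimated population is also about 1,000, even though 50 records were erroneous: errors in both directions can cancel in the net figure.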

On May 22, 2012, we will release the key findings from the post-enumeration survey, our statistical evaluation of the coverage of the 2010 Census. Since we’ve been conducting post-enumeration surveys for some decades, we can compare the coverage estimates from the 2010 Census to those of past censuses.

Stay tuned!

Posted in Measuring America | 2 Comments

A Future Without Key Social and Economic Statistics for the Country

Written by: Director Robert Groves

Our country faces important federal funding challenges linked to the current recession and its aftermath. On the Census Bureau’s part, we have been striving to cut administrative costs, reengineer our survey processes, and find innovative ways to squeeze every cent of taxpayer money we get. This is, I believe, an important duty we have as public servants, and I am proud of the hard work of my Census Bureau colleagues on this score. It is also my duty to inform the country of the impact of budgets on the scope and quality of the nonpartisan statistical information the Census Bureau provides.

This blog post provides information about the implications of the recent budget passed by the House of Representatives.

The Appropriations Bill eliminates the Economic Census, which measures the health of our economy. It terminates the American Community Survey, which produces the social and demographic information that monitors the impact of economic trends on communities throughout the country. It halts crucial development of ways to save money on the next decennial census. In the last three years the Census Bureau has reacted to budget and technological challenges by mounting aggressive operational efficiency programs to make these key statistical cornerstones of the country more cost efficient. Eliminating them halts all the progress to build 21st century statistical tools through those innovations. This bill thus devastates the nation’s statistical information about the status of the economy and the larger society.

The Economic Census
The 2012 Economic Census provides comprehensive information on the health of over 25 million businesses and 1,100 industries. It provides detailed industry and geographic source data for generating quarterly GDP estimates. The Economic Census is also the benchmark for measures of productivity, producer prices, and many of the nation’s principal economic indicators. At this moment, we are poised to request the key data from individual firms. We have already printed 7.5 million forms, and are preparing the October mailing and internet data collection infrastructure. Cancelling the 2012 Economic Census now wastes $226 million already expended on preparatory activities.

The American Community Survey
The ACS is our country’s only source of small area estimates on social and demographic characteristics. Manufacturers and service sector firms use ACS to identify the income, education, and occupational skills of local labor markets they serve. Retail businesses use ACS to understand the characteristics of the neighborhoods in which they locate their stores. Homebuilders and realtors use ACS to understand the housing characteristics and the markets in their communities. Local communities use ACS to choose locations for new schools, hospitals, and fire stations. There is no substitute from the private sector for ACS small area estimates. Even if the funding problems were solved in the proposed budget, the House bill also bans enforcement of the mandatory nature of participation in the ACS; this alone would require at least $64 million more in funding to achieve the same precision of ACS estimates.

Building a More Efficient Population Census
In the last three years the Census Bureau has launched a transformation in survey and census designs. Both the ongoing economic and demographic surveys and the economic and demographic censuses will use the same technological infrastructure, to produce a leaner, more efficient 21st Century Census Bureau. The reduction in the 2020 Census request will not permit the Census Bureau to undertake the research and testing needed to build shared use of technical infrastructure and more efficient ways of conducting the next decennial census. The bill also eliminates the anonymized public use microdata sample (PUMS) file, robbing the country of private-sector research discoveries from the 2010 Census. The country will lose the chance to mount a 2020 Census at a lower cost per household than that of the 2010 Census.

Modern societies need current, detailed social and economic statistics; the US is losing them.

Posted in About the Agency, Measuring America, Uncategorized | 100 Comments

How Good was the 2010 Census?, Part 3

Written by: Director Robert Groves

As prior blog posts have noted, there are three ways to evaluate the quality of a census: 1) process indicators that describe the operations, 2) comparisons with other estimates of population size, and 3) a sample survey matched to the census, often called a “post-enumeration survey.” The first two sets of evaluations suggest that the 2010 Census was a good one. On Tuesday, May 22, we’ll report the results of the statistical estimates of coverage of the 2010 Census based on our post-enumeration survey, labeled “Census Coverage Measurement” or CCM.

For the first time, we will also break the estimates into components of coverage.  These include correct and erroneous enumerations in the census, tallies of the number of people for whom all characteristics were imputed (inserted statistically), and estimates of how many people were missed in the census.

We will provide the estimates for the nation; major demographic groups, such as by race, by age and sex categories, and for owners and renters; by important census operations (mailout/mailback, update/leave, and interviewing in person); and for the 50 states, the District of Columbia, and large counties and places.

These results will help us as we plan and conduct research to improve the 2020 Census. On the other hand, results from the CCM have limitations. The estimates are subject to sampling error and are vulnerable to violations of the underlying assumptions.

Net Coverage Error and Components of Coverage

The net coverage error of a census is defined as the difference between the true population size and the census count.  Although the net coverage error provides valuable information about the census count, it is a single number that summarizes the results of various census operations and actual events. For example, the estimated net coverage error for the 2000 Census was very small (we estimated a net overcount of 0.49%, with a standard error of 0.20%). However, evaluations showed that a large number of erroneous enumerations offset a large number of omissions.
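As a rough illustration of how an estimate and its standard error are read together, the Python sketch below applies a simple normal approximation to the 2000 Census figures quoted above. The normal approximation and the conventional 1.96 critical value are assumptions of this sketch, not a description of the procedures actually used for the published results, and the function names are hypothetical.

```python
def z_score(estimate, standard_error):
    """Number of standard errors the estimate lies away from zero."""
    return estimate / standard_error


def confidence_interval(estimate, standard_error, critical_value=1.96):
    """Approximate 95 percent interval under a normal approximation."""
    half_width = critical_value * standard_error
    return estimate - half_width, estimate + half_width


# 2000 Census figures from the text, in percent of the population.
overcount_2000 = 0.49
standard_error_2000 = 0.20

z = z_score(overcount_2000, standard_error_2000)
low, high = confidence_interval(overcount_2000, standard_error_2000)
# The estimate differs from zero if the interval does not contain zero.
differs_from_zero = not (low <= 0.0 <= high)
print(f"z = {z:.2f}, interval = ({low:.3f}, {high:.3f}), "
      f"differs from zero: {differs_from_zero}")
```

Under these assumptions the interval excludes zero, so even a net error that looks very small can be statistically distinguishable from zero when its standard error is small enough.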

To address this, in addition to measuring net coverage error, the CCM program estimated separately the components of census coverage. First, we divided the number of census records into estimates of its three components: correct enumerations, erroneous enumerations, and misses filled in by whole-person imputations (people for whom all characteristics were imputed). “Erroneous enumerations” are census records that should not have been included, such as duplicates, fictitious people, people who died before Census Day (April 1, 2010), or people who were born after Census Day. Second, we split the estimated population into those people deemed to be correct enumerations and those who were missed in the census.

Estimating the number of omissions (those missed in the census) is complicated by practical and conceptual issues. As the missing component of the true population size, omissions cannot be collected and analyzed directly, but can be estimated by deduction through dual system estimation. As with correct and erroneous enumerations, defining what should qualify as an omission is difficult. For example, for some housing units captured in the census, the records of the entire household are whole-person imputations, that is, treated as missing data and filled in statistically from another housing unit. Should such records be considered as omissions from the census? Valid arguments on each side of the issue render it a difficult one, with no solution that satisfies every intended use of the data. We will present estimates in as transparent a way as possible to facilitate such arguments.
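The bookkeeping behind these components can be sketched in Python. This is a minimal illustration with hypothetical names and made-up numbers, not CCM results. It makes one of the debatable choices discussed above: omissions are computed as the estimated population minus correct enumerations, so people represented only by whole-person imputations count as omitted.

```python
from dataclasses import dataclass


@dataclass
class CoverageComponents:
    correct_enumerations: int
    erroneous_enumerations: int
    whole_person_imputations: int
    population_estimate: int  # e.g. from dual system estimation

    @property
    def census_count(self):
        # Census records split into their three components.
        return (self.correct_enumerations
                + self.erroneous_enumerations
                + self.whole_person_imputations)

    @property
    def omissions(self):
        # Estimated by deduction: population minus correct enumerations.
        return self.population_estimate - self.correct_enumerations

    @property
    def net_coverage_error(self):
        # True population size minus the census count, as defined above.
        return self.population_estimate - self.census_count


# Made-up numbers for illustration only.
example = CoverageComponents(
    correct_enumerations=940,
    erroneous_enumerations=40,
    whole_person_imputations=20,
    population_estimate=1000)
print(example.census_count, example.omissions, example.net_coverage_error)
```

In this made-up case the net coverage error is zero even though 60 people were omitted, because 40 erroneous enumerations and 20 imputations offset them. That is exactly why the components are worth reporting alongside the net figure.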

The coverage estimates from the post-enumeration survey should be used to build a better 2020 Census. By examining what operations or procedures suffered the most erroneous enumerations or omissions, we can construct targeting features in 2020. By knowing how different successive coverage improvement operations affect the estimated coverage, we can make better cost-quality tradeoff decisions for 2020.

Posted in 2010 Census, Quality Assurance, Uncategorized | 1 Comment

Looking Forward, Looking Back

Written by: Director Robert Groves

I have had the honor of directing the Census Bureau since July of 2009.  As was reported today, I was offered the provost position at Georgetown University and have accepted it, with a start date of late August, 2012.

Many things have happened since July 2009. The wonderful 2010 Census team defied all the naysayers to complete a successful census on time and $1.9 billion under budget. We reorganized the Bureau, re-establishing a research directorate and then, in partnership with NSF, launched an 8-node research network at universities across the country. We put in place a risk management group to help us oversee large investments. The demographic programs and field directorates worked with the sponsors of surveys conducted by the Census Bureau to identify key improvements we could make. They’re reducing the number of Bureau regional offices from 12 to 6; they’re improving the supervisory structure of field interviewers; they’re mounting a more efficient matrix organization for survey management teams. The economic directorate is doing a top-to-bottom priority-setting effort, enhancing generalized processing systems. All the directorates have cooperated in launching a generalized system for the collection, processing, editing, and estimation of survey, internet, and administrative data. This will achieve a new level of efficiency, linking together the semi-autonomous directorates with shared tools.

On the administrative front, the HR group has launched a corporate hiring process, which moves new hires across the Bureau, building their cross-directorate skills; just recently, they announced a current-staff rotation plan, which has similar goals.  We’re in the middle of a large retirement incentive program, with hundreds of folks taking advantage of the offer.

The new IT leadership has significantly reduced our IT operations costs by consolidation and integration.  They’ve launched an internal social media tool that will improve internal communication/collaboration; they’ve built a private cloud to increase computing efficiency; they’re addressing the mix of research and production computing needs; they’ve built a Center for Applied Technology, which is a safe environment to try out high-risk, high-payoff ideas.

We all read in the papers each day some commentary on how Federal employees are unmotivated, unproductive, and wasteful.  I’ve met many who defy that stereotype. They do, however, need leaders who listen to their ideas, leaders who will support them when they trip attempting stretch goals, leaders who believe that government agencies can be as efficient as any other organization.

The current staff at the Census Bureau knows five things: 1) costs of traditional data collection methods are increasing because of changes in US society, 2) the demand for more statistical information is growing, 3) there are new technologies that can help our business, 4) the Federal government can further reduce burden on the public by using data from existing records, and 5) we will not have more money to do our work.  Hence, everything the Census Bureau staff is doing focuses on creating more efficient processes to free up resources to invest in new and better statistics.

This is hard work.  It takes complete commitment to ongoing innovation. It’s not flashy.  Indeed, public service is rarely sexy.  It is, however, noble.  I’ve learned that in a deep way since July 2009 from the behavior of my colleagues at the Census Bureau.

Posted in About the Agency | 3 Comments