Written by: Director Robert Groves
Post-enumeration surveys are complicated beasts, but their basic features can be communicated simply.
The 2010 post-enumeration survey measures the separate components of census coverage as well as the net coverage error. In other words, we can learn not just the net undercount or net overcount, but also what feeds it: for example, how many people we counted correctly, how many we may have missed, and how many we may have duplicated or counted in error.
To calculate those estimates, we start by drawing two samples: a post-enumeration sample (P sample) and an enumeration sample (E sample). The first is a sample of housing units and people selected independently of the census. We interview the household members of this sample and then match them to the census on a case-by-case basis to determine whether they were counted or missed in the census.
The second sample, the E sample, is a sample of households counted in the census, in the same geographic areas as the P sample. This sample is designed to help us estimate how many enumerations were correct or erroneous in the census.
We estimate the coverage of the census using a statistical method called “dual system estimation,” which is based on capture-recapture methodology. As an example, to estimate the number of fish in a pond using this approach, you would capture a set of fish, tag them for later identification, and place them back in the pond. After the fish have had time to disperse sufficiently, you would capture a second set of fish, the “recapture.” Then you would count the number of recaptured fish, and also how many of them are tagged. Together, the size of the first capture, the size of the recapture, and the number of tagged fish in the recapture allow you to estimate the total number of fish in the pond.
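The fish example can be sketched in a few lines of code. This is only an illustration of the simplest capture-recapture estimator (often called the Lincoln-Petersen estimator), not the procedure used for the census; the function name and the numbers are made up for the example.

```python
def estimate_pond_size(first_capture: int, second_capture: int,
                       tagged_in_second: int) -> float:
    """Simple capture-recapture estimate of the total number of fish.

    first_capture:    fish caught, tagged, and released (hypothetical count)
    second_capture:   fish caught in the recapture
    tagged_in_second: recaptured fish that carry a tag
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged fish in the recapture")
    if tagged_in_second > min(first_capture, second_capture):
        raise ValueError("tagged recaptures cannot exceed either capture")
    # The share of tagged fish in the recapture estimates the share of the
    # whole pond that was tagged, so: total ~ first * second / tagged.
    return first_capture * second_capture / tagged_in_second


# Illustrative numbers only: tag 100 fish, recapture 80, find 20 tagged.
print(estimate_pond_size(100, 80, 20))  # 400.0
```

In this made-up case, a quarter of the recaptured fish are tagged, which suggests that the 100 tagged fish are about a quarter of the pond.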
Because dual system estimation assumes that the census and the survey operate independently, from August through December of 2009, Census Coverage Measurement (CCM) staff independently listed all addresses in the census blocks in the P sample without referring to the address list for the 2010 Census, and without assistance from staff working on census operations. Later, interviewing and other field operations were also scheduled to minimize interaction between the census and CCM staffs.
From mid-August to mid-October of 2010, we attempted to conduct an interview of all P-sample households in each sample block cluster. Field interviewers collected information about the current residents of the sample housing unit, generally including people who had moved in or out of the unit.
In terms of the fish example, you could think of the census as the first capture or enumeration, the post-enumeration survey (PES) as the second capture, and the people enumerated in both the census and the survey as the tagged fish found in the recapture. In the accompanying table, you can see the total number of census records, N1+ (shaded in blue). Before using this number, we remove the erroneous enumerations. Next, we take the number of people in the PES who are matched to a census record, N11, and the total number of people in the survey, N+1. (These numbers are shaded in red.)
Their ratio (N11 / N+1) is an estimate of the coverage in the census. We estimate components of coverage error in similar ways.
Relying on the assumption of independence between census and PES operations, we inflate the number of correct census enumerations, N1+, by the inverse of this coverage ratio to estimate the total population, N. Although the actual implementation is much more complex, this statistical approach lets us produce an estimate of the population on April 1, 2010.
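As a rough sketch of the arithmetic described above, the calculation might look like the following. It is a deliberate simplification and not the production method: the function and parameter names are hypothetical, the counts are invented, and it treats the whole population as a single group.

```python
def dual_system_estimate(census_records: int, erroneous_enumerations: int,
                         pes_total: int, pes_matched: int) -> float:
    """Simplified dual system estimate of the total population, N.

    census_records:         all census records (N1+ before cleaning)
    erroneous_enumerations: census records judged to be in error
    pes_total:              people in the post-enumeration survey (N+1)
    pes_matched:            survey people matched to a census record (N11)
    """
    if not 0 < pes_matched <= pes_total:
        raise ValueError("matched count must be positive and at most pes_total")
    # Step 1: remove erroneous enumerations to get correct enumerations.
    correct_enumerations = census_records - erroneous_enumerations
    # Step 2: the coverage ratio is N11 / N+1; inflating by its inverse
    # means multiplying by N+1 / N11.
    return correct_enumerations * pes_total / pes_matched


# Invented counts for illustration only.
estimate = dual_system_estimate(census_records=1020,
                                erroneous_enumerations=60,
                                pes_total=250,
                                pes_matched=240)
print(estimate)         # 1000.0
print(1020 - estimate)  # 20.0, a net overcount in this made-up case
```

With these invented numbers, the coverage ratio is 240 / 250 = 0.96, so the 960 correct enumerations are inflated to an estimated population of 1,000. The census count of 1,020 then implies a net overcount of 20, even though some people were missed and others were counted in error.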
On May 22, 2012, we will release the key findings from the post-enumeration survey, our statistical evaluation of the coverage of the 2010 Census. Because we have been conducting post-enumeration surveys for several decades, we can compare the coverage estimates from the 2010 Census to those of past censuses.