Written by: Director Robert Groves
In an earlier post, “Quality in a Census Part 5,” I described how a large sample survey called a “post-enumeration survey” is used to evaluate a census.
For 2010, such a survey will be used to estimate the number of persons missed by the census as well as the number erroneously enumerated (e.g., duplicates and temporary visitors from other countries).
A post-enumeration survey draws two samples: one from the full population, drawn without any knowledge of whether the sampled cases were covered in the census, and another from the census address universe. After the samples are drawn, a survey is conducted that essentially replicates the census. Each address, and each person enumerated at that address, is carefully matched to the census file to determine which cases were captured by both the census and the survey and which by only one of them. From this matching operation, estimates of misses and erroneous enumerations are made.
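To make the matching step concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each person can be identified by a hypothetical `(address_id, name)` key and that an exact comparison of keys is enough. The real matching operation is far more involved (computer matching followed by expert review), so treat this as a toy model and not as the Census Bureau's procedure.

```python
def classify_matches(census_records, survey_records):
    """Split records into those found in both sources or in only one.

    Records are hypothetical (address_id, name) tuples; exact equality
    stands in for the much more careful matching done in practice.
    """
    census = set(census_records)
    survey = set(survey_records)
    return {
        "both": census & survey,         # captured by census and survey
        "census_only": census - survey,  # in the census file only
        "survey_only": survey - census,  # in the survey only
    }


# Made-up example data, not real census records.
census_file = [("A1", "Ana"), ("A1", "Ben"), ("A2", "Cy"), ("A3", "Dee")]
survey_file = [("A1", "Ana"), ("A1", "Ben"), ("A2", "Cy"), ("A4", "Eli")]

result = classify_matches(census_file, survey_file)
counts = {category: len(records) for category, records in result.items()}
print(counts)
```

In this toy setup, the survey-only cases are candidates for census misses, and the census-only cases are candidates for follow-up, since some of them may turn out to be erroneous enumerations.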
Like demographic analysis, a post-enumeration survey that achieves its ideal form offers completely accurate estimates (within sampling error) of differential undercount, the tendency for some populations to be covered by the census less well than others. However, the ideal post-enumeration survey is never achieved. I noted in the earlier post that, while a sample survey response completely independent of the census is desirable, it cannot be attained. I also noted weaknesses in the sample survey’s ability to reconstruct household composition as of April 1.
We’re starting to receive some initial findings from the post-enumeration survey, but the final results won’t be in until 2012. I thought you’d like to see what we’re finding in early analysis.
We’re comparing each result to the corresponding result from the 2000 Census to see whether things look better or worse than they did then. First, we have an unweighted estimate of the percentage of housing units in the independent sample that were found to match a unit on the address list used in the census. We want this percentage to be as high as possible, and the 2010 percentage is higher than that of 2000 (96.5% versus 91.4%). Second, we’ve finished the first round of person-level matching, using computer-match algorithms (to be followed by expert review). The match rate for 2010 is about 9 percentage points higher than that of the comparable matches in 2000.
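For readers who want the arithmetic spelled out, the sketch below shows what an unweighted match rate is and how a difference in percentage points differs from a relative change. The function name and the four-unit sample are made up for illustration; only the 96.5% and 91.4% figures come from the results above.

```python
def unweighted_match_rate(matched_flags):
    """Percentage of sample units flagged as matched, each unit counted equally."""
    flags = list(matched_flags)
    if not flags:
        raise ValueError("sample is empty")
    return 100.0 * sum(1 for flag in flags if flag) / len(flags)


# Made-up sample: three of four housing units matched the address list.
sample = [True, True, True, False]
rate = unweighted_match_rate(sample)

# Housing unit match rates quoted in the post.
rate_2010, rate_2000 = 96.5, 91.4

# Difference in percentage points: simple subtraction of the two rates.
diff_points = round(rate_2010 - rate_2000, 1)

# Relative change: the same gap expressed as a percent of the 2000 rate.
relative_change = round(100.0 * (rate_2010 - rate_2000) / rate_2000, 1)

print(rate, diff_points, relative_change)
```

"Unweighted" here means every sampled unit counts once, with no adjustment for its probability of selection, so these figures describe the sample itself and are not population estimates.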
These are findings at the national level. Those who evaluate censuses know that variation in these rates across geography and subgroups is also an important evaluative criterion. Other things being equal, we’d like to see small variation across, for example, census tracts spread throughout the country. The variation across census tracts in both the housing unit match rate and the initial computer-based person match rate is lower for 2010 than for 2000. This is good news.
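One way to picture "variation across census tracts" is to compute a match rate for each tract and then summarize how spread out those rates are. The sketch below uses the population standard deviation as the summary. That choice is an assumption made for illustration, since this post does not say which measure of variation is used, and the tract IDs and data are invented.

```python
from collections import defaultdict
from statistics import pstdev


def tract_match_rates(units):
    """Map each tract to its match rate in percent.

    units: iterable of hypothetical (tract_id, matched) pairs.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for tract_id, matched in units:
        totals[tract_id] += 1
        if matched:
            matches[tract_id] += 1
    return {tract: 100.0 * matches[tract] / totals[tract] for tract in totals}


# Made-up data: four sampled units in each of three tracts.
units = [
    ("T1", True), ("T1", True), ("T1", False), ("T1", True),
    ("T2", True), ("T2", True), ("T2", True), ("T2", True),
    ("T3", True), ("T3", False), ("T3", True), ("T3", False),
]

rates = tract_match_rates(units)
# Assumed measure of variation: standard deviation of the tract-level rates.
spread = pstdev(list(rates.values()))

print(rates, round(spread, 2))
```

A smaller spread means coverage is more even from tract to tract, which is the sense in which lower variation is good news.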
Much, much more remains to be done in the post-enumeration survey. We are even going out one more time to sample units that pose unresolved puzzles in matching them to our census data. I’ll keep you posted.