Written by: Director Robert Groves
Now that we’re finished with the data collection for the 2010 Census, it’s tempting to think everything is complete. It is not.
We have a hard deadline of December 31, 2010 to deliver the state population counts to the country. By that same deadline, we’re also charged with calculating how many seats in the House of Representatives each state should have. This announcement is perhaps the most emblematic of the importance of the census in the United States.
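The post does not spell out the seat arithmetic. As an illustration only, the sketch below assumes the method of equal proportions, in which every state starts with one seat and each remaining seat goes to the state with the highest priority value, population divided by the square root of n(n+1), where n is the number of seats the state already holds. The function name `apportion` and the three-state populations are hypothetical, not census data.

```python
import heapq
from math import sqrt


def apportion(populations, total_seats):
    """Assign seats by the method of equal proportions (illustrative sketch).

    populations: dict mapping a state name to its population (hypothetical data).
    total_seats: total number of seats to distribute.
    """
    if total_seats < len(populations):
        raise ValueError("need at least one seat per state")

    # Every state is guaranteed one seat before any priority values are used.
    seats = {state: 1 for state in populations}

    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        # Priority for this state's next seat, given that it now holds n seats.
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))

    return seats


# Hypothetical populations, chosen only to keep the arithmetic easy to follow.
example = apportion({"A": 1000, "B": 500, "C": 250}, total_seats=7)
print(example)
```

With these made-up numbers, state A ends up with four seats, B with two, and C with one.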
To get to that point, there is much left to do. Two important data files need to be reconciled: 1) the master address file, now updated with the new disposition for each address (occupied, vacant, deleted) and 2) the data file from census questionnaires from various sources (mailout, enumerator-administered, telephone receipt, “be counted” forms, results from other followup operations). We also must update the geographic area boundaries used for tabulating data for the 2010 Census. This allows us to associate addresses (and the data on the occupants) with the right areas.
For some housing units, we have multiple completed questionnaires. Some correspond to additional people added to a previously received form; some are duplicates (for example, a late mail return and a nonresponse-followup enumerator-administered form for the same unit). We have a well-tested method to resolve these duplicates and assemble the correct composition of the household. We make sure that each housing unit found has a resolution.
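The actual resolution method is not described here. Purely as a sketch of the idea, the code below assumes each questionnaire carries a housing unit identifier and a list of people, and treats two person records as the same person when the normalized name and birth date match. The field names (`unit_id`, `people`, `name`, `birth_date`) and the matching rule are assumptions made for illustration, not the Census Bureau's procedure.

```python
from collections import defaultdict


def resolve_households(forms):
    """Combine multiple questionnaires per housing unit into one roster.

    forms: list of dicts with a hypothetical layout:
        {"unit_id": str, "people": [{"name": str, "birth_date": str}, ...]}
    Returns a dict mapping unit_id to a de-duplicated list of people.
    """
    rosters = defaultdict(dict)
    for form in forms:
        roster = rosters[form["unit_id"]]
        for person in form["people"]:
            # Assumed matching rule: same normalized name and same birth date.
            key = (person["name"].strip().lower(), person["birth_date"])
            # Keep the first record seen; later duplicates are ignored,
            # while genuinely new people are added to the roster.
            roster.setdefault(key, person)
    return {unit: list(people.values()) for unit, people in rosters.items()}


# Hypothetical example: a mail return plus a later enumerator form.
forms = [
    {"unit_id": "U1", "people": [{"name": "Ann Lee", "birth_date": "1970-02-03"},
                                 {"name": "Bob Lee", "birth_date": "1968-11-30"}]},
    {"unit_id": "U1", "people": [{"name": " ann lee ", "birth_date": "1970-02-03"},
                                 {"name": "Cara Lee", "birth_date": "2001-06-15"}]},
]
households = resolve_households(forms)
print(len(households["U1"]))
```

A production rule would need to cope with misspellings and missing birth dates, which exact matching like this does not.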
We also must conduct automated and computer-assisted manual coding of write-in responses to the race and Hispanic origin questions.
From this, we create the first real 2010 Census data file, with all the cases that will eventually produce counts. However, for some cases, there are items that were not completed; for a small number of cases, we don’t have any reports of how many people lived in the unit on April 1, despite having a report that it was occupied then. We apply sophisticated editing rules and estimation techniques to fill in those data, using properties of observations from others in the same household when possible or from other like units.
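The specific editing rules and estimation techniques are not given in this post. To make the idea of borrowing from "other like units" concrete, the sketch below uses a simple sequential hot deck: a unit with a missing person count takes the count from the most recently processed unit in the same block. The choice of technique, the record layout (`unit_id`, `block`, `count`), and the use of the block as the definition of "like" are all assumptions for illustration.

```python
def impute_counts(units):
    """Fill missing person counts with a sequential hot deck (illustrative).

    units: list of dicts with hypothetical keys "unit_id", "block", "count",
           where "count" is None when no person count was reported.
    Returns new dicts with an added "imputed" flag; inputs are not modified.
    """
    last_reported = {}  # block -> most recent reported count (the donor value)
    result = []
    for unit in units:
        filled = dict(unit)
        filled["imputed"] = False
        if filled["count"] is None:
            donor = last_reported.get(filled["block"])
            if donor is not None:
                filled["count"] = donor
                filled["imputed"] = True
            # With no donor available yet, the count is left missing.
        else:
            last_reported[filled["block"]] = filled["count"]
        result.append(filled)
    return result


# Hypothetical records: the second unit has no reported count.
units = [
    {"unit_id": 1, "block": "B1", "count": 3},
    {"unit_id": 2, "block": "B1", "count": None},
    {"unit_id": 3, "block": "B2", "count": None},
]
imputed = impute_counts(units)
print(imputed[1])
```

Flagging imputed values, as the `imputed` field does here, keeps filled-in data distinguishable from reported data in later review.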
This then creates the “census edited file.” This file is subjected to careful review by expert demographers, who compare the census results to many other sources of population estimates. This step is conducted state by state. The demographers check that the edits were programmed correctly and that the data make sense demographically. For example, no one should have a birth date in the 1800s, or be born in the 13th month, or on April 31st. This edited file also gives us our first chance to look at data by geographical areas. We will look at places and towns and see their population, housing, and group quarters counts.
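The three birth date examples above translate directly into simple consistency checks. The sketch below is one way to express them; the function name `birth_date_problems` and the wording of the messages are hypothetical, and the real review covers far more than these three rules.

```python
from datetime import date


def birth_date_problems(year, month, day):
    """Return a list of problems found in a reported birth date (illustrative)."""
    problems = []
    if year < 1900:
        # Covers birth dates in the 1800s (or earlier).
        problems.append("birth year before 1900")
    if not 1 <= month <= 12:
        # Catches a "13th month"; the day cannot be checked without a valid month.
        problems.append("month out of range")
        return problems
    try:
        # date() raises ValueError for impossible dates such as April 31st.
        date(year, month, day)
    except ValueError:
        problems.append("not a valid calendar date")
    return problems


print(birth_date_problems(1885, 5, 1))
print(birth_date_problems(1980, 13, 1))
print(birth_date_problems(1980, 4, 31))
print(birth_date_problems(1980, 4, 30))
```

The first three calls each report one problem, and the last returns an empty list.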
With this file, we have our first data product – state-level counts that are used for the reapportionment release.
It’s not the flashy side of the decennial census. It’s much more technical, guided by statistical quality standards, and it rests on careful work with large and complex response files and geographic reference files. But it is crucial to the quality of the census, and we must get it done well.