Written by: Director Robert Groves
Let me tell you a wonderful story, a statistical detective story of sorts.
During the summer, you may have seen statistics released from the 2010 Census Summary File 1 on same-sex couple unmarried partner households.
We noticed that the reported counts of same-sex couples from the 2010 Census were much higher than comparable estimates from the American Community Survey in earlier years. Our demographic analysts had some immediate ideas, explained nicely in this video:
So we suspected that the format of the nonresponse followup form was the culprit. If that were the case, one should see some obvious mismatches between the name of the person written on the form and the recorded sex of that person. Bingo! A qualitative inspection of some of the records showed suspicious combinations (e.g., “Harold” recorded as a “female”). Past research led us to believe that the name entered was likely to be more accurate than the recorded sex.
How could the unintentional mistakes be fixed? We have an analysis of the full Census that lists the percentage male and female for every first name. Some names are common for both males and females (e.g., “Leslie,” “Dana,” “Alex”). Other names are overwhelmingly one sex or the other (e.g., “Mary,” “Thomas,” “Alicia”). Our analysts identified the names that were 95% or higher male and those that were 95% or higher female. Then we reanalyzed the entire 2010 Census. Whenever a name from either list appeared with the unlikely sex recorded for it, we noted that as a likely error.
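For readers who like to see the logic spelled out, here is a minimal sketch of that name-sex check in Python. It is an illustration only, not the Bureau's production code: the name shares in `MALE_SHARE`, the function names, and the three sample couples are all made up for the example. Only the 95% cutoff and the idea of trusting the name over the recorded sex come from the description above.

```python
# Cutoff from the text: a name counts as "male" or "female" only if
# at least 95% of people with that name are of that sex.
THRESHOLD = 0.95

# HYPOTHETICAL shares of people with each first name who are male.
# The real figures come from the full Census and are not shown here.
MALE_SHARE = {
    "HAROLD": 0.99,
    "THOMAS": 0.99,
    "MARY": 0.01,
    "ALICIA": 0.01,
    "LESLIE": 0.30,
    "DANA": 0.35,
}


def expected_sex(first_name):
    """Return 'M' or 'F' if the name is at least 95% one sex, else None."""
    share = MALE_SHARE.get(first_name.strip().upper())
    if share is None:
        return None  # name not in the table: no opinion
    if share >= THRESHOLD:
        return "M"
    if 1.0 - share >= THRESHOLD:
        return "F"
    return None  # name is common for both sexes: no opinion


def repair_sex(first_name, recorded_sex):
    """Return (sex, flagged). A record is flagged, and the sex implied by
    the name is used, only when a strongly sexed name disagrees with the
    recorded sex."""
    expected = expected_sex(first_name)
    if expected is not None and expected != recorded_sex:
        return expected, True
    return recorded_sex, False


def count_same_sex(couples, repair=False):
    """Count couples whose two partners have the same sex, optionally
    after applying the name-based repair to each partner."""
    count = 0
    for (name_a, sex_a), (name_b, sex_b) in couples:
        if repair:
            sex_a, _ = repair_sex(name_a, sex_a)
            sex_b, _ = repair_sex(name_b, sex_b)
        if sex_a == sex_b:
            count += 1
    return count


# Made-up records: ((first name, recorded sex), (first name, recorded sex))
couples = [
    (("Mary", "F"), ("Harold", "F")),    # "Harold" marked female: likely error
    (("Thomas", "M"), ("Leslie", "M")),  # "Leslie" is ambiguous: left alone
    (("Alicia", "F"), ("Dana", "M")),
]

print(count_same_sex(couples))               # count as recorded
print(count_same_sex(couples, repair=True))  # count after the repair
```

Note that the sketch never touches a record whose name is ambiguous or missing from the table, so it can only correct the cases where the name gives strong evidence against the recorded sex.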
When we counted those apparent mistakes and reclassified each one as a consistent name-sex pair, we found that the same-sex couple counts from the Census agreed with other estimates. The best comparison is to the sample-based estimates of the American Community Survey, which moved to the improved question format in 2008. The chart below shows why we are confident that the “preferred estimates” are likely much better than the original counts.
The chart above shows a large decrease in the number of same-sex couples when we changed the format of the American Community Survey in the 2007-2008 time period. We have evidence that the lower estimates are more accurate.
Similarly, we are confident that the “preferred estimates” in the rightmost bar of the chart are more accurate than the “original counts” from the 2010 Census. The logic of our analysis and repair procedure on the 2010 coding is compelling, and the closer agreement with the just-released 2010 American Community Survey results strengthens our confidence.
This is the technical expertise of the Census Bureau at its finest – examining statistics for anomalies, detecting the cause of a found anomaly, and fixing mistakes from data collection when possible to give the country the best statistics possible.