Estimating the Size of a Small Population

Written by: Director Robert Groves

Let me tell you a wonderful story, a statistical detective story of sorts.

During the summer, you may have seen statistics released from the 2010 Census Summary File 1 on same-sex couple unmarried partner households.

We noticed that reported counts of same-sex couples from the 2010 Census were much higher than similar estimates from the American Community Survey in earlier years. Our demographic analysts had some immediate ideas, explained nicely in this video:

So we suspected that the format of the nonresponse followup form was the culprit. If that were the case, one should see some obvious mismatches between the name of the person written on the form and the recorded sex of that person. Bingo! A qualitative inspection of some of the records showed suspicious combinations (e.g., “Harold” recorded as a “female”). Past research led us to believe that the name entered was likely to be more accurate than the recorded sex.

How could the unintentional mistakes be fixed? We have an analysis of the full Census that lists the percentage male and female for all first names. Some names are common for both males and females (e.g., “Leslie,” “Dana,” “Alex”). Other names are very dominantly one sex or the other (e.g., “Mary,” “Thomas,” “Alicia”). Our analysts identified the names that were 95% or higher male and those 95% or higher female. Then we completely reanalyzed the entire 2010 Census. When we found a name from one of the two lists paired with the very unlikely sex for that name, we noted the record as a likely error.
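The name-based check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Bureau's production procedure: the `MALE_SHARE` lookup, its values, and the function names `dominant_sex` and `is_likely_error` are all hypothetical, and the shares shown are invented rather than taken from Census data. Only the 95% cutoff comes from the text.

```python
# Hypothetical share of people with each first name who are male.
# These values are invented for illustration; they are NOT Census figures.
MALE_SHARE = {
    "HAROLD": 0.99,
    "THOMAS": 0.99,
    "MARY": 0.01,
    "ALICIA": 0.01,
    "LESLIE": 0.30,
    "DANA": 0.40,
}

THRESHOLD = 0.95  # the 95% cutoff described in the text


def dominant_sex(name, male_share=MALE_SHARE, threshold=THRESHOLD):
    """Return 'M' or 'F' if the name is at least `threshold` one sex, else None."""
    share = male_share.get(name.strip().upper())
    if share is None:
        return None  # name not in the lookup: no basis for a judgment
    if share >= threshold:
        return "M"
    if 1.0 - share >= threshold:
        return "F"
    return None  # name common to both sexes, such as "Leslie"


def is_likely_error(name, recorded_sex):
    """Flag a record whose recorded sex contradicts a strongly sexed name."""
    expected = dominant_sex(name)
    return expected is not None and expected != recorded_sex.strip().upper()
```

Under these assumptions, `is_likely_error("Harold", "F")` flags the record, while a name common to both sexes, such as “Leslie,” is never flagged whichever sex was recorded.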

When we counted those apparent mistakes and reclassified them as consistent name-sex pairs, we found that the same-sex couple counts from the Census agreed with other estimates. The best comparison is to the sample-based estimates of the American Community Survey, which moved to the improved question format in 2008. The chart below shows why we are confident that the “preferred estimates” are likely much better than the original counts.

[Chart: Same-Sex Couple Households]

The chart above shows a large decrease in the number of same-sex couples when we changed the format of the American Community Survey in the 2007-2008 time period. We have evidence that the lower estimates are more accurate.

Similarly, we are confident that the “Preferred estimates” at the rightmost bar of the chart are more accurate than the “original counts” from the 2010 Census. The logic of our analysis and repair procedure on the 2010 coding is compelling, and the closer agreement with the just-released 2010 American Community Survey results strengthens our confidence.

This is the technical expertise of the Census Bureau at its finest – examining statistics for anomalies, detecting the cause of a found anomaly, and fixing mistakes from data collection when possible to give the country the best statistics possible.

10 Responses to Estimating the Size of a Small Population

  1. beachcomberT says:

    “Technical expertise…at its finest?” Ha! I don’t think so. Why wasn’t the problematic “nonresponse follow-up form” field-tested before it was put into general use? Were these errors made by Census surveyors visiting households, or by optical scanners transferring data into computers? The Census statement released yesterday is worded very obliquely. No one seems to want to take responsibility for very botched-up statistics. As a member of the GLBT community, I am suspicious when about 200,000 same-sex “married” households suddenly are purged from your data because of technical errors. I hope you do a better job in future surveys, and, by the way, let’s get rid of the loaded terms of “husband” and “wife.” Use “spouse” instead.

  2. Andrew Z says:

    Could you please publish the data for first name vs. sex? I have a similar application, but the best data I found is 21 years old (“Frequently Occurring First Names and Surnames From the 1990 Census”). Since 1990, the popularity of names has changed, and some people may have chosen to switch usage between nicknames and full first names (e.g., Patricia vs. Pat).

  3. David V says:

    The table is so indelicate! The count of unmarried partners is reduced by a few percentage points, while the count of spouses was slashed to about a third of the original count! To a community that is only recently emerging from fear of self-identification, this must appear like a threatening move, and the bubbly presentation is not a good way to mask it.
    To finish this “wonderful story” of politically motivated careful revisionism, the Census Bureau should at least present the side statistic that proves the same % of misidentification occurred on the base numbers of heterosexual unmarried v married couples… that data point is missing here, and would be much more informative than either the video or the table currently provided.

  4. Veritas says:

    There was no field test of followup for the 2010 census as there was no money for followup tests. Also, you seem to have missed the point of the whole research–it is minute mistakes by opposite-sex couples which caused the data flaw, not some conspiracy as you are eager to believe. When the perfect human being is made, then the perfectly answered survey will happen–let’s see you stand on a doorstep with a two-foot piece of folded paper balanced on your wrist while trying to transcribe information. And you still must remember that the improvement of the data still leaves us with 130,000 same-sex spouses, double the number that there can possibly be according to all records compiled by state administrators.

  5. Jose says:

    The table is bad because many couples are not married.

  6. Frank P says:

    David V has basically re-iterated what I wanted to say. Need more info!

  7. Mike says:

    I would have to agree that this wonderful story is not all that wonderful.

  8. stoneroy15 says:

    This is good information. A wonderful story indeed!

  9. stoneroy15 says:

    Nice style and absolutely different. I always prefer to read good information, and your article is an example of this. This is very useful.

