A while back I came across a blog post by Hilary Parker that examined the decline in popularity of the name “Hillary”. The popularity of this name doesn’t concern me, but the post still caught my attention for two reasons:
- The top 1000 most popular baby names for every year since 1880 are available on a site run by the Social Security Administration – an interesting data set.
- Part of Hilary’s work tied the popularity of names to events in history – this sounds fun.
I thought I’d scrape the data from the Social Security Administration’s website myself and look specifically for male names which show a sudden burst in popularity. Then I can play a game where I google the names and try to guess an event or person from history which influenced this popularity boom.
For each year in [1880, 2012] I extracted the top 1000 most popular names and the percentage of babies born with those names. This ended up being about 3500 unique names. An intuitive approach to find names with large swings in popularity would be to treat each name as a vector whose components are the percentage of babies born with that name each year, normalize, and then take finite differences to determine which names and years show the greatest change. The problem here is that when a name no longer appears in the top 1000 names for a year, there is no information available. Obviously, some percentage of babies are still given this name, but we don’t have this information. If a name slips into the top 1000 just for a few years, the resulting vector is very sparse; normalizing makes these sparse entries relatively large. This artificially inflates the finite differences. Still, plotting a large chunk of the normalized data set gives some indication of what the data looks like.
Notice that some spikes in the data are already clear, and these could indeed be found using finite differences. I am particularly partial to “Jesse”, so I used it to illustrate a name whose finite differences are small compared with those of “Darren”, which became popular in the 1960s. You may also notice how the data gets “messy” as we get nearer to the present. This is due to the normalization and the fact that many names from more recent years were not in the top 1000 names before 1980.
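To make the normalization problem concrete, here is a minimal sketch of the normalize-then-difference idea. It is not the code I used for the plots: the two series are made-up numbers, a year outside the top 1000 is stored as 0.0, and "normalize" is assumed to mean scaling each name's vector to unit Euclidean length.

```python
import math


def normalize(series):
    """Scale a list of yearly percentages to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in series))
    # An all-zero series cannot be scaled, so return a copy unchanged.
    return [x / norm for x in series] if norm > 0 else list(series)


def max_abs_difference(series):
    """Largest absolute year-over-year change (first finite difference)."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))


# Hypothetical data: percent of babies per year.
# 0.0 marks a year in which the name was outside the top 1000 (assumption).
steady = [1.00, 1.02, 1.01, 1.03, 1.02, 1.00]  # common name, barely changes
sparse = [0.00, 0.00, 0.03, 0.04, 0.00, 0.00]  # rare name, two years of data

for label, series in [("steady", steady), ("sparse", sparse)]:
    print(label, max_abs_difference(normalize(series)))
```

The rare name's raw percentages are tiny, yet after normalization its largest finite difference is far bigger than that of the common name, which is exactly the artificial inflation described above.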
Since normalizing is a global operation, and this “incomplete” data set will cause problems, we will instead look at the data locally by measuring each year’s popularity relative to the previous year. For example, if 0.2% of babies were named Charles in 2000 and 0.4% were named Charles in 2001, the 2001 figure is 200% of the previous year’s (a ratio of 2.0). This is sometimes referred to as relative risk; drawing a baby uniformly at random, the “risk” of being named Charles in 2001 is 200% of the “risk” in 2000. The names with the highest relative risk are shown below, along with my guess, via a quick web search, at the reason for the sudden spike in popularity. The numbers following each name are the maximum ratio to the previous year and the year in which it occurred. For example, the percentage of babies named “Devante” in 1992 was 12 times (1200% of) the percentage from 1991.
- Devante, 12.0, 1992 – popular R&B artist: DeVante Swing
- Dawson, 8.9, 1998 – popular television series character: Dawson’s Creek
- Hobart, 8.3, 1898 – election year – vice president: Garret Hobart
- Woodrow, 8.1, 1912 – election year – president: Woodrow Wilson
- Bret, 7.7, 1959 – popular television series character: Maverick
- Dewey, 6.5, 1898 – Spanish-American war hero: George Dewey
- Tristan, 6.4, 1995 – ???
- Jermaine, 6.3, 1973 – member of the popular Jackson 5: Jermaine Jackson
- Elvis, 6.1, 1957 – needs no explanation: Elvis Presley
- Quentin, 6.1, 1919 – (update) someone pointed out this was Teddy Roosevelt’s son – KIA over France in 1918: Quentin Roosevelt
- Darren, 5.9, 1965 – popular television series character: Bewitched
- Grover, 5.8, 1884 – election year – president: Grover Cleveland
- Bryan, 5.7, 1896 – election year – presidential candidate: William Jennings Bryan
- Tyrese, 5.6, 1999 – Grammy winning R&B artist: Tyrese Gibson
- Bailey, 5.4, 1997 – ???
- Maximus, 4.8, 2001 – The name’s inclusion made me laugh: Gladiator
Most surprising to me was that I had never heard of the number one name, Devante. The inclusion of Maximus was also a surprise, though it made me smile. I don’t have a good guess for Tristan or Bailey.
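For anyone who wants to reproduce the ranking, the year-over-year ratio can be sketched roughly as follows. This is an illustration rather than my original script: the function name `max_ratio` and the sample numbers are hypothetical, and skipping any year whose previous year is missing from the top 1000 is an assumption about how to handle the gaps.

```python
def max_ratio(pct_by_year):
    """Return (ratio, year) for the largest ratio to the previous year.

    pct_by_year maps year -> percent of babies given the name, with years
    outside the top 1000 simply absent. Returns None if no two consecutive
    years are available.
    """
    best = None
    for year in sorted(pct_by_year):
        prev = pct_by_year.get(year - 1)
        if prev is None or prev <= 0:
            continue  # no usable value for the previous year (assumption: skip)
        ratio = pct_by_year[year] / prev
        if best is None or ratio > best[0]:
            best = (ratio, year)
    return best


# Hypothetical numbers matching the Charles example in the text.
charles = {1999: 0.21, 2000: 0.2, 2001: 0.4}
print(max_ratio(charles))
```

Sorting names by the ratio this returns, largest first, gives a list in the same shape as the one above. Because the ratio only compares a year with the one before it, a name that drops out of the top 1000 for a while does not distort the other years the way normalization does.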
Finally, below is a chart of a few select names from the list above showing relative risk for all years between 1880 and 2012. As you can see, the bursts in popularity are clearly visible.
Notice the smaller spike for Bret – Bret Maverick returned to television very briefly in 1981.