Ken Lee just received his Ph.D. in economics from GMU; I was his thesis advisor; his thesis is here. I am impressed enough with Ken’s thesis that I’ll take the next few posts to describe some of his main findings. The first finding I’ll describe: The main way that US states vary is in their health.
Ken collected 81 features of states, 56 cultural rankings and 25 demographic variables (listed below), and did a factor analysis on them. A factor analysis finds a few linear combinations of features that can explain the most variance in the whole set of features; the idea is that the variation of all the features could result from variation in just a few behind-the-scenes factors, plus error.
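To make the "percent of variance" idea concrete, here is a minimal sketch in Python. It is not Ken's procedure: it uses made-up data (50 hypothetical "states", 8 features, 2 hidden factors) and a plain eigen-decomposition of the correlation matrix, which is a common first step in factor analysis but may differ from the extraction and rotation method used in the thesis. The function name `explained_variance` and all the sizes are assumptions chosen only for illustration.

```python
import numpy as np

def explained_variance(features):
    """Return variance shares and loadings from the feature correlation matrix."""
    # Standardize each column so every feature counts equally.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    corr = (z.T @ z) / len(z)
    # eigh returns eigenvalues in ascending order; flip to largest first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Share of total variance attributed to each component.
    shares = eigvals / eigvals.sum()
    # Loadings: how strongly each feature moves with each component.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return shares, loadings

# Synthetic data (an assumption, not the thesis data):
# two hidden factors drive eight observed features, plus noise.
rng = np.random.default_rng(0)
n_states, n_features, n_factors = 50, 8, 2
hidden = rng.normal(size=(n_states, n_factors))
weights = rng.normal(size=(n_factors, n_features))
noise = 0.5 * rng.normal(size=(n_states, n_features))
data = hidden @ weights + noise

shares, loadings = explained_variance(data)
print(np.round(shares[:3], 2))  # variance shares of the three largest components
```

The features with the largest absolute loadings on a component play the role of the "top five features" lists for each factor: they are what one reads to decide what a factor is "about".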
The biggest factor, explaining 27% of the variance between US states, was health – some states are just healthier than others, and this fact can explain many other things about those states. Here are the three biggest factors:
Factor 1 (27% of variance): Top five features: “low cancer deaths, low cardiovascular deaths, low smoking rates, low levels of unnecessary medical care, low obesity rates,” Also: “high well-being index, high exercise rates, healthiest, low mortality rates for blacks and whites, higher in education (IQ Rank, Percentage of Graduates, and Smartest), higher in health (Healthiest, Exercise Frequency, and Percentage with No Insurance), and lower in crime rates (Crime Rate and Violent Crime Rate) rankings.” Map:
Factor 2 (15% of variance): Top five features: “low occupational death rates, high in women’s rights, high in primary care physicians per capita, high in amount of fruit eaten per capita, low in percentage on poverty.” Also: “low in teen births, high on $ spent on K-12 education, high $ for teacher salaries, smartest … a higher percentage of people in the 25-44 age group, higher income, high college graduation rate, and higher urbanization.” Map:
Factor 3 (14% of variance): Top five features: “low rates of infections (HIV, STD), high in IQ, low overall crime rates, high in graduates, low in those having no health insurance.” Also: “low in violent crime, healthiest, low in percentage urban … regular church attendance, a high regard for religion, worse overall state economic health, high manufacturing employment, and high farming output.” Map:
To me, factor 1 seems mainly about health, factor 2 seems about left (~forager) idealism — fruit, women’s rights, safety rules, helping the poor, and spending lots on docs and teachers — and factor 3 seems about right (~farmer) idealism — rural, religious, low crime, sexual restraint, make real stuff, finish what you start.
The fact that health is the biggest factor says that health is very important, even beyond its direct benefits. And the fact that health and a tendency to spend on docs are largely independent says that medicine isn’t very important for health, and there should be enough variation among states to study just how important it is.
Here are those 81 state features:
IQ Rank, Smartest, Obesity Rate, Exercise Rate, Church Attendance, Importance of Religion in Daily Life, Percentage Going Hungry, Freedom Index, Tax Burden, Moocher Index, Coincident Index, Pro-Business Index, Gini Index, Farming as a percentage of State GDP, Farming Productivity, Happiness Index, Well-Being Index, Generosity Index, Manufacturing Employment, Manufacturing Output as a percent of State GDP, Teacher Pay Levels, Education $ Spent per Pupil, Percentage 9th Graders Graduating High School, Womens’ Status ranking, Crime Rate – overall, Violent Crime Rate, Speeding – traffic deaths due to speeding, Traffic Deaths – overall, Gasoline Usage per capita, UFO Sightings, Starbucks per capita, Wal-Mart stores per capita, Pollution levels, Cancer deaths per capita, Coronary heart disease per capita, Cardiovascular deaths per capita, Percentage of children under 18 in poverty, Fruit portions eaten per day, Outcome Disparity within state, Percentage reporting Poor Health, Infectious disease rate, Percentage with No Health Insurance, Unnecessary hospital visits per capita, Primary Care Physicians per capita, Public Health $ per capita, Mortality rate, Autism per capita, Teen Birth rate, White Mortality rate, Black Mortality rate, Occupational Death rate, Years of Potential Life Lost (YPLL), Healthiest, Binge Drinking rate, Smoking percentage, Under-employed percentage, Latitude, Longitude, Urban percentage, Census Region, Census Division, Population Density, Square Miles, Unemployment rate, Poverty Percentage, Income per capita, Female percentage, White percentage, Black percentage, Percentage 0-17 years, Percentage 18-24 years, Percentage 25-44 years, Percentage 45-65 years, Percentage 65+ years, High School Graduation rate, College Graduation rate, Alcohol Use per capita, Smoking Rate per capita, Births per capita, Men Registered to Vote, Women Registered to Vote.
If I were going to name the given factors, they would be (1) HA = health awareness, (2) EFF = expensive family formation, and (3) C = conscientiousness.
HA is the least interesting to me from a modeling perspective because it appears to be significantly "cultural" in a kind of geographically arbitrary way. However, from the perspective of "changing your mind and behaviors to get a better outcome" it seems like health awareness is the place to focus.
EFF makes sense in terms of being liberal/educated/urbanized, and geographically it appears to be happening in areas where the cities are crammed together or pushed against a border, a Great Lake, or an ocean. If "housing costs" were taken into account I'd expect it to show up as a factor because as housing costs go up, family formation is more expensive, new humans are harder to make, more attention is paid to investing in the relatively less numerous kids, and you need a paying job in order to afford to stay there during retirement.
C looks like a difference between "ice people" and "sun people" to me. The issues that contribute to the "conscientiousness" label are church, crime, and school completion. I wouldn't be surprised if the geographic distributions have a lot to do with seasonal affective disorder and snow (neither of which is available to contribute to the factor, but I'd predict that they would be part of it if they were available; another good feature would be per capita hours of air conditioning).
Louisiana and Mississippi are cheap places with sun people. California is expensive with sun people. Montana is cheap with ice people. The only combination that doesn't exist is top quartile in both expense and coldness, but Wisconsin and Michigan are examples that are close to that combination.
Interpreting EFF and C as opposite ends of the same "farmer-forager" axis seems sloppy to me. States like Montana and California fit the single-axis model with their opposite extremes, but the states that are high or low in both EFF and C (like Michigan or Mississippi) give the lie to the single-axis model.
When you put up something like those maps, how about doing it so that the key isn't unreadably small?