Saturday, April 8, 2017

New Uninsured Estimates Improve Precision of Trump-Hate Group Model (but there's multicollinearity)

The Census Bureau just released its annual Small Area Health Insurance Estimates (SAHIE) for every state and county in the US for 2015. With all of the discussion of repealing and replacing the Affordable Care Act, I thought I would take a look at how state-level uninsured rates correlate with the % of the vote that Trump received in 2016.  The graph below shows a statistically significant positive relationship, with 23% of the variability in Trump's % of the vote accounted for by the uninsured rate.  If 100% of the variability were accounted for, all of the states would fall on a perfect straight line sloping upward, like the red dots on the graph.  The red dots represent the predicted values from the regression equation:

Trump % = % Uninsured * 1.34% + 36.1%

This equation states that in a state with 0% uninsured (which doesn't exist), Trump would be predicted to receive 36.1% of the vote, with every 1 percentage point increase in the uninsured rate adding 1.34 percentage points to his share.  This univariate model predicted the % of the vote in some states better than in others.  As the next step, I looked at whether this relationship would hold up when I added % uninsured to the model I created for hate group rates and Trump's % of the vote.
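The univariate fit described above can be sketched in a few lines of Python. This is only an illustration: NumPy is an assumption here (the post does not say which software produced the original estimates), and the two arrays are copied from the raw data listed at the end of this post, in alphabetical state order. If the figures match the ones used in the original analysis, the output should land close to the equation above.

```python
import numpy as np

# % uninsured (2015 SAHIE) and Trump % (2016), copied from the raw data
# at the end of this post, in alphabetical order (Alabama ... Wyoming).
uninsured = np.array([
    11.9, 16.3, 12.8, 11.1, 9.7, 9.2, 6.9, 6.9, 16.3, 15.8,
    4.8, 12.8, 8.2, 11.3, 5.9, 10.5, 7.1, 13.8, 10.3, 7.4,
    3.2, 7.2, 5.2, 14.8, 11.5, 14.2, 9.4, 14.1, 7.8, 10.0,
    13.1, 8.2, 13.0, 8.7, 7.7, 16.1, 8.4, 7.6, 6.7, 13.0,
    11.8, 12.0, 19.2, 11.6, 4.7, 10.4, 7.6, 7.3, 6.6, 13.4,
])
trump = np.array([
    62.9, 52.9, 49.5, 60.4, 32.7, 44.4, 41.2, 41.9, 49.1, 51.3,
    30.0, 59.2, 39.4, 57.2, 51.8, 57.2, 62.5, 58.1, 45.2, 35.3,
    33.5, 47.6, 45.4, 58.3, 57.1, 56.5, 60.3, 45.5, 47.2, 41.8,
    40.0, 37.5, 50.5, 64.1, 52.1, 65.3, 41.1, 48.8, 39.8, 54.9,
    61.5, 61.1, 52.6, 45.9, 32.6, 45.0, 38.2, 68.7, 47.9, 70.1,
])

# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(uninsured, trump, deg=1)

# R^2 is the share of the variability in Trump % explained by the fitted line.
predicted = intercept + slope * uninsured
ss_res = np.sum((trump - predicted) ** 2)
ss_tot = np.sum((trump - trump.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Trump % = % Uninsured * {slope:.2f} + {intercept:.1f}")
print(f"R^2 = {r_squared:.2f}")
```

The `predicted` array plays the role of the red dots on the graph: one fitted value per state, all lying on the regression line.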


When I added % uninsured, I looked at how it would perform in a model with hate group rate, % in poverty, and % with a bachelor's degree or higher.  The full model gives the following estimates:

Trump % = % Uninsured * 0.6% - % Poverty * 1.4% + Hate Groups * 1.4% - % Bachelors * 1.6% + 105.6%

The intercept for this model suggests that in a state with values of zero for each of the predictors, Trump would receive 105.6% of the vote, which is impossible.  The slope for % poverty also contradicts the univariate analysis for this variable.  This suggests a problem with multicollinearity, which makes the regression coefficients unstable and hard to interpret.  The change in the % of the variance explained suggests that the predictors are statistically significant.  I tried transforming the values of the predictors and omitting predictors, with little success.  To find regression coefficients that are more realistic, I need to use a regression method that is less susceptible to multicollinearity, called ridge regression.
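One common way to check for the multicollinearity suspected above is the variance inflation factor (VIF) for each predictor. The sketch below is an illustration rather than the original analysis: it assumes NumPy, uses the figures listed in the raw data at the end of this post, and computes each VIF as a diagonal entry of the inverse of the predictor correlation matrix. A frequently quoted rule of thumb treats VIFs above roughly 5 to 10 as a warning sign. The ordinary least squares coefficients are printed alongside for comparison, and they may not match the equation above exactly if the posted figures differ from the working data.

```python
import numpy as np

# Columns: state, hate groups per million, % poverty, % uninsured,
# % bachelor's or higher, Trump %. Copied from the raw data in this post.
DATA = """
AL 5.55 18.5 11.9 15.4 62.9
AK 0 10.4 16.3 29.7 52.9
AZ 2.6 17.4 12.8 27.7 49.5
AR 5.35 18.7 11.1 21.8 60.4
CA 2.01 15.4 9.7 32.3 32.7
CO 2.89 11.5 9.2 39.2 44.4
CT 1.4 10.6 6.9 38.3 41.2
DE 4.2 12.6 6.9 30.9 41.9
FL 3.06 15.8 16.3 28.4 49.1
GA 3.1 17.2 15.8 29.9 51.3
HI 0 10.7 4.8 31.4 30
ID 7.13 14.7 12.8 26 59.2
IL 2.5 13.6 8.2 32.9 39.4
IN 3.92 14.4 11.3 24.9 57.2
IA 1.28 12.1 5.9 26.8 51.8
KS 2.41 12.9 10.5 31.7 57.2
KY 5.18 18.3 7.1 23.3 62.5
LA 2.99 19.5 13.8 23.2 58.1
ME 2.25 13.2 10.3 30.1 45.2
MD 2.99 9.9 7.4 38.8 35.3
MA 1.76 11.5 3.2 41.5 33.5
MI 2.82 15.7 7.2 27.8 47.6
MN 1.81 10.2 5.2 34.7 45.4
MS 6.02 22.1 14.8 20.8 58.3
MO 3.94 14.8 11.5 27.8 57.1
MT 9.59 14.4 14.2 30.6 56.5
NE 2.62 12.2 9.4 30.2 60.3
NV 1.36 14.9 14.1 23.6 45.5
NH 4.5 8.4 7.8 35.7 47.2
NJ 1.68 10.8 10 37.6 41.8
NM 0.96 19.8 13.1 26.5 40
NY 2.38 15.5 8.2 35 37.5
NC 3.06 16.4 13 29.4 50.5
ND 1.32 10.7 8.7 29.1 64.1
OH 3.01 14.8 7.7 26.8 52.1
OK 1.53 16 16.1 24.6 65.3
OR 2.69 15.2 8.4 32.2 41.1
PA 3.13 13.1 7.6 29.7 48.8
RI 0.95 14.1 6.7 32.7 39.8
SC 2.42 16.8 13 26.8 54.9
SD 8.09 13.5 11.8 27.5 61.5
TN 5.71 16.7 12 25.7 61.1
TX 1.97 15.9 19.2 28.4 52.6
UT 0.98 11.2 11.6 31.8 45.9
VT 1.6 10.4 4.7 36.9 32.6
VA 4.64 11.2 10.4 37 45
WA 2.88 12.2 7.6 34.2 38.2
WV 2.18 18 7.3 19.6 68.7
WI 1.56 12.1 6.6 28.4 47.9
WY 3.42 10.6 13.4 26.2 70.1
"""

rows = [line.split() for line in DATA.strip().splitlines()]
values = np.array([r[1:] for r in rows], dtype=float)  # drop the state code
X, y = values[:, :4], values[:, 4]
names = ["hate groups per million", "% poverty", "% uninsured", "% bachelors"]

# Ordinary least squares with an explicit intercept column.
design = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# VIF_j is the j-th diagonal entry of the inverse predictor correlation matrix.
corr = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(corr))

print(f"intercept: {coef[0]:.1f}")
for name, b, v in zip(names, coef[1:], vif):
    print(f"{name:>24}: slope {b:+.2f}, VIF {v:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others; the larger the value, the more that predictor's slope estimate is inflated in variance by its overlap with the rest.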


Ridge regression is a biased regression method with a penalty term, lambda, that shrinks the coefficients and so reduces the variability of the estimates.  Through an iterative process, a value of 40 was chosen for lambda.  This method provided the following estimates for the regression slopes:

Trump % = % Uninsured * 0.4% + % Poverty * 0.02% + Hate Groups * 0.9% - % Bachelors * 0.6% + 60.8%

These estimates are closer to what I found univariately, and the model accounts for 68% of the variability in Trump's % of the vote.  The above chart shows the relationship between % uninsured and Trump's % of the vote for the regular regression model.  The predicted values are closer to the actual values for this model.  The ridge coefficient estimates suggest that the concentration of hate groups has the strongest predictive effect on Trump's % of the vote, followed by the % of the population with a bachelor's degree, the % uninsured, and the % in poverty.
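To show the mechanics of the ridge penalty, here is a minimal sketch using the closed-form ridge solution on synthetic data. Everything in it is an assumption made for illustration: the helper `ridge_fit` is hypothetical, the two predictors are simulated to be nearly identical, and the lambda values are not on the same scale as the 40 chosen above, since that depends on how the software standardizes the predictors.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge slopes on standardized predictors and a centered response."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    # beta = (X'X + lam * I)^-1 X'y ; lam = 0 gives ordinary least squares.
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Synthetic data, NOT the election data: x2 is nearly a copy of x1,
# so the two predictors are strongly collinear.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

for lam in (0.0, 1.0, 10.0, 40.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  slopes={np.round(beta, 2)}  norm={np.linalg.norm(beta):.2f}")
```

With lambda at 0 the two slopes are free to take large offsetting values, because the data cannot tell the predictors apart. As lambda grows, the overall size of the coefficient vector shrinks and the estimates settle down, at the cost of some bias.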

**Update**

Below is the raw data used in this analysis.

| State | Hate groups 2016 | Population 2016 | Hate groups per million (2016) | % in poverty | % uninsured | % bachelor's degree or higher | Trump % |
|---|---|---|---|---|---|---|---|
| Alabama | 27 | 4863300 | 5.55 | 18.5 | 11.9 | 15.4 | 62.9 |
| Alaska | 0 | 741894 | 0 | 10.4 | 16.3 | 29.7 | 52.9 |
| Arizona | 18 | 6931071 | 2.6 | 17.4 | 12.8 | 27.7 | 49.5 |
| Arkansas | 16 | 2988248 | 5.35 | 18.7 | 11.1 | 21.8 | 60.4 |
| California | 79 | 39250017 | 2.01 | 15.4 | 9.7 | 32.3 | 32.7 |
| Colorado | 16 | 5540545 | 2.89 | 11.5 | 9.2 | 39.2 | 44.4 |
| Connecticut | 5 | 3576452 | 1.4 | 10.6 | 6.9 | 38.3 | 41.2 |
| Delaware | 4 | 952065 | 4.2 | 12.6 | 6.9 | 30.9 | 41.9 |
| Florida | 63 | 20612439 | 3.06 | 15.8 | 16.3 | 28.4 | 49.1 |
| Georgia | 32 | 10310371 | 3.1 | 17.2 | 15.8 | 29.9 | 51.3 |
| Hawaii | 0 | 1428557 | 0 | 10.7 | 4.8 | 31.4 | 30 |
| Idaho | 12 | 1683140 | 7.13 | 14.7 | 12.8 | 26 | 59.2 |
| Illinois | 32 | 12801539 | 2.5 | 13.6 | 8.2 | 32.9 | 39.4 |
| Indiana | 26 | 6633053 | 3.92 | 14.4 | 11.3 | 24.9 | 57.2 |
| Iowa | 4 | 3134693 | 1.28 | 12.1 | 5.9 | 26.8 | 51.8 |
| Kansas | 7 | 2907289 | 2.41 | 12.9 | 10.5 | 31.7 | 57.2 |
| Kentucky | 23 | 4436974 | 5.18 | 18.3 | 7.1 | 23.3 | 62.5 |
| Louisiana | 14 | 4681666 | 2.99 | 19.5 | 13.8 | 23.2 | 58.1 |
| Maine | 3 | 1331479 | 2.25 | 13.2 | 10.3 | 30.1 | 45.2 |
| Maryland | 18 | 6016447 | 2.99 | 9.9 | 7.4 | 38.8 | 35.3 |
| Massachusetts | 12 | 6811779 | 1.76 | 11.5 | 3.2 | 41.5 | 33.5 |
| Michigan | 28 | 9928300 | 2.82 | 15.7 | 7.2 | 27.8 | 47.6 |
| Minnesota | 10 | 5519952 | 1.81 | 10.2 | 5.2 | 34.7 | 45.4 |
| Mississippi | 18 | 2988726 | 6.02 | 22.1 | 14.8 | 20.8 | 58.3 |
| Missouri | 24 | 6093000 | 3.94 | 14.8 | 11.5 | 27.8 | 57.1 |
| Montana | 10 | 1042520 | 9.59 | 14.4 | 14.2 | 30.6 | 56.5 |
| Nebraska | 5 | 1907116 | 2.62 | 12.2 | 9.4 | 30.2 | 60.3 |
| Nevada | 4 | 2940058 | 1.36 | 14.9 | 14.1 | 23.6 | 45.5 |
| New Hampshire | 6 | 1334795 | 4.5 | 8.4 | 7.8 | 35.7 | 47.2 |
| New Jersey | 15 | 8944469 | 1.68 | 10.8 | 10 | 37.6 | 41.8 |
| New Mexico | 2 | 2081015 | 0.96 | 19.8 | 13.1 | 26.5 | 40 |
| New York | 47 | 19745289 | 2.38 | 15.5 | 8.2 | 35 | 37.5 |
| North Carolina | 31 | 10146788 | 3.06 | 16.4 | 13 | 29.4 | 50.5 |
| North Dakota | 1 | 757952 | 1.32 | 10.7 | 8.7 | 29.1 | 64.1 |
| Ohio | 35 | 11614373 | 3.01 | 14.8 | 7.7 | 26.8 | 52.1 |
| Oklahoma | 6 | 3923561 | 1.53 | 16 | 16.1 | 24.6 | 65.3 |
| Oregon | 11 | 4093465 | 2.69 | 15.2 | 8.4 | 32.2 | 41.1 |
| Pennsylvania | 40 | 12784227 | 3.13 | 13.1 | 7.6 | 29.7 | 48.8 |
| Rhode Island | 1 | 1056426 | 0.95 | 14.1 | 6.7 | 32.7 | 39.8 |
| South Carolina | 12 | 4961119 | 2.42 | 16.8 | 13 | 26.8 | 54.9 |
| South Dakota | 7 | 865454 | 8.09 | 13.5 | 11.8 | 27.5 | 61.5 |
| Tennessee | 38 | 6651194 | 5.71 | 16.7 | 12 | 25.7 | 61.1 |
| Texas | 55 | 27862596 | 1.97 | 15.9 | 19.2 | 28.4 | 52.6 |
| Utah | 3 | 3051217 | 0.98 | 11.2 | 11.6 | 31.8 | 45.9 |
| Vermont | 1 | 624594 | 1.6 | 10.4 | 4.7 | 36.9 | 32.6 |
| Virginia | 39 | 8411808 | 4.64 | 11.2 | 10.4 | 37 | 45 |
| Washington | 21 | 7288000 | 2.88 | 12.2 | 7.6 | 34.2 | 38.2 |
| West Virginia | 4 | 1831102 | 2.18 | 18 | 7.3 | 19.6 | 68.7 |
| Wisconsin | 9 | 5778708 | 1.56 | 12.1 | 6.6 | 28.4 | 47.9 |
| Wyoming | 2 | 585501 | 3.42 | 10.6 | 13.4 | 26.2 | 70.1 |
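The hate groups per million figures in the data above appear to be the 2016 group count divided by the 2016 population, scaled to one million residents. The short sketch below checks that arithmetic for two states; the helper name `per_million` is made up for illustration.

```python
def per_million(groups: int, population: int) -> float:
    """Hate groups per one million residents, rounded to two decimals."""
    return round(groups / population * 1_000_000, 2)

print(per_million(27, 4863300))   # Alabama: 5.55
print(per_million(55, 27862596))  # Texas: 1.97
```

Using a rate rather than the raw count keeps large states such as California or Texas from dominating the hate group predictor simply because of their size.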