Tuesday, July 10, 2018

Managing Predictions with Bayes Theorem


I have been reading Nate Silver's book The Signal and the Noise, about how hard it is to make predictions even when we have complete data. I tend to be a slow reader, so I've been reading it for a while. Finally I got to the chapter on his years as a professional poker player. He talked about how players like him get overconfident if they start out winning. Poker is a game of skill and luck. The great players are both lucky and good. He argues that this overconfidence is a thing that can be managed according to Bayes' theorem.



In layman's terms, Bayes' theorem says that people bring their preconceived notions to a situation and then update those notions based on what happens in that situation. The image above shows the mathematical formula for the theorem. If we know the probability of event B occurring given that event A has occurred, and we know the overall (or marginal) probabilities of events A and B occurring separately from each other, we can calculate the probability of event A occurring given that B has occurred. An example of this theorem is shown below.
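Written out in standard notation, with A and B standing for the two events, the theorem reads:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

Here P(A) and P(B) are the overall (marginal) probabilities of each event, and P(B | A) is the probability of B given that A has occurred.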
table1.PNG
The table above (which I used in my first post on data-driven journalism) shows the 2016 primary wins for Clinton, Trump, and Sanders. The probability of Trump winning a state given that Clinton has won the other party's contest in that state is 25 out of 29, or 86%. We can use Bayes' theorem to find the probability that Clinton wins the state given that Trump has won.
  

First we need the overall probability of Trump winning a state, which is 37 out of 51 contests (DC is included), or 73%. Next we need the overall probability of Clinton winning a state, which is 29 out of 51 contests, or 57%. We can now plug those numbers into Bayes' theorem as in the above image. The percentages were converted to decimals for computation's sake.

We update our knowledge of the probability of Trump winning given that Clinton won in the other party with the overall probabilities of Trump and Clinton winning to find the probability of Clinton winning given that Trump has won. That number is 67%, which is considerably lower than the original 86%. This is because Trump was more likely to win on his party's side than Clinton was in her party.
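The arithmetic can be checked with a few lines of Python. This is only a sketch using the counts quoted in this post; the function name `bayes` and the variable names are my own choices for illustration. Note that with unrounded fractions the answer is 25/37, or about 68%, while plugging in the rounded decimals (0.86, 0.57, 0.73) gives the 67% quoted here.

```python
# Counts quoted in the post: 2016 primary contests (51, including DC)
contests = 51
trump_wins = 37      # contests Trump won on the Republican side
clinton_wins = 29    # contests Clinton won on the Democratic side
both_win = 25        # contests where both Trump and Clinton won


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_trump = trump_wins / contests                 # P(Trump), about 0.73
p_clinton = clinton_wins / contests             # P(Clinton), about 0.57
p_trump_given_clinton = both_win / clinton_wins # P(Trump | Clinton), about 0.86

# Exact fractions: works out to 25/37
p_clinton_given_trump = bayes(p_trump_given_clinton, p_clinton, p_trump)

# Same calculation with the rounded decimals used in the post
p_rounded_inputs = bayes(0.86, 0.57, 0.73)

print(f"P(Clinton | Trump), exact fractions: {p_clinton_given_trump:.3f}")
print(f"P(Clinton | Trump), rounded inputs:  {p_rounded_inputs:.3f}")
```

The small gap between the two results comes entirely from rounding the inputs before multiplying.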

We can calculate these probabilities easily after the contests are over, but making these predictions beforehand is a different matter. Nate Silver was wrong about how many primaries Bernie Sanders would win and who would win the general election, after correctly predicting the winner in 2012. How will he update his prediction model for the 2018 and 2020 elections? Time will tell.

**Related Posts**

Don’t test me: Using Fisher’s exact test to unearth stories about statistical relationships (Repost)


Friday, June 29, 2018

Events in Annapolis Coincide with Posting on Local Papers


I was planning a post on a published study that showed how cities and towns with local newspapers have greater government efficiency. Events in Annapolis, MD yesterday seem to have added significance to the post. A gunman who was angry at the Capital Gazette for reporting on his harassment of a woman went into their office and killed 5 of their staff.

The study I am citing was inspired by an episode of John Oliver's show Last Week Tonight from three years ago (which can be seen above) about the decline of newspapers.  The researchers found a correlation between the lack of a local print journalism outlet and a 5 to 11% increase in municipal borrowing.  This underscores the valuable service that these papers provide.  The whole study can be read here.


Newspapers have been in decline for decades as the internet and other media have crowded them out. Yesterday's incident brings an added dimension to the difficulties that they face. Newspapers get complaints about the stories that they run all the time, with the occasional threat. This is the worst attack on a western media outlet since the anthrax attacks in 2001 and Charlie Hebdo in Paris in 2015. Hopefully these attacks will have no effect on the content that these outlets provide. The element of fear in reporting is a hard thing to regulate, however.

Independent bloggers like me try to fill the void by providing our own take on the news with our own findings thrown in. But I am one person. I do not have the resources that the newspapers and TV/radio journalists have, or once had, going back to when Johannes Gutenberg created the first printing press and Ben Franklin had his print shop. We all keep on keeping on.

**Related Posts**

Amazon, The Washington Post, and New Media



Saturday, June 16, 2018

NHS Membership, Not School Year, Predicts Prestige in College Admission at McCort


Two years ago I followed up my post on McCort graduating class year and National Honor Society (NHS) membership with a look at the types of colleges that graduates were admitted to. This year I found a greater percentage of the 2017 graduating class with NHS membership. I thought I would take a look at how college admission turned out for the class of '17.

I looked at the NHS membership of the other two Catholic high schools in the Altoona-Johnstown Diocese, Bishop Guilfoyle and Bishop Carroll. Bishop Guilfoyle did not publish its graduating class on its website, but it does say "75% achieve GPA exceeding the academic criteria set by the National Honor Society." The criterion there is 94%. Bishop Carroll did post its graduating class, and 8 out of 57 graduates were NHS members, or 14%, which is close to the percentage for McCort's class of 1987. I couldn't find their criteria for NHS membership.

I looked at McCort's class of '17 graduate profile and compared it to the class of '16 and my class of '88 as to which types of colleges they were admitted to. I used the same classification I used 2 years ago, based on US News and World Report's rating system and described below. The results can be seen in the table below.

I categorized schools according to the US News score. An elite school had a score of 61-100 (U. of Penn, Johns Hopkins), a second tier school had a score of 35-60 (Pitt, Penn State), and a third tier four-year school had a score of 34 or below or was unranked (Indiana (PA), or IUP as we in PA call it, and Pitt-Johnstown, or UPJ). Community colleges, junior colleges, and advanced technical schools were 4th tier. Those who went into the military or were employed were placed in the 5th tier, and those who were undecided or deferred for a year were placed in tier 6. This classification is totally mine and you are welcome to disagree with it. The listing of all schools, their 2016 US News score (if available), the classification, the school considered, and the number of students going to that school from each class are presented at the bottom of this post.


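College rank * NHS * Year Crosstabulation

| Year | College rank | NHS: n (%) | NHS: y (%) | Total |
|---|---|---|---|---|
| 1988 | 1 | 6 (46.2%) | 7 (53.8%) | 13 |
| 1988 | 2 | 30 (61.2%) | 19 (38.8%) | 49 |
| 1988 | 3 | 54 (91.5%) | 5 (8.5%) | 59 |
| 1988 | 4 | 9 (81.8%) | 2 (18.2%) | 11 |
| 1988 | 5 | 11 (91.7%) | 1 (8.3%) | 12 |
| 1988 | 6 | 4 (100.0%) | 0 | 4 |
| 1988 | Total | 114 (77.0%) | 34 (23.0%) | 148 |
| 2016 | 1 | 2 (40.0%) | 3 (60.0%) | 5 |
| 2016 | 2 | 26 (53.1%) | 23 (46.9%) | 49 |
| 2016 | 3 | 24 (82.8%) | 5 (17.2%) | 29 |
| 2016 | 4 | 6 (100.0%) | 0 | 6 |
| 2016 | 5 | 1 (100.0%) | 0 | 1 |
| 2016 | 6 | 7 (100.0%) | 0 | 7 |
| 2016 | Total | 66 (68.0%) | 31 (32.0%) | 97 |
| 2017 | 1 | 1 (14.3%) | 6 (85.7%) | 7 |
| 2017 | 2 | 20 (43.5%) | 26 (56.5%) | 46 |
| 2017 | 3 | 12 (60.0%) | 8 (40.0%) | 20 |
| 2017 | 4 | 6 (100.0%) | 0 | 6 |
| 2017 | 5 | 5 (100.0%) | 0 | 5 |
| 2017 | 6 | 3 (100.0%) | 0 | 3 |
| 2017 | Total | 47 (54.0%) | 40 (46.0%) | 87 |
| Total | 1 | 9 (36.0%) | 16 (64.0%) | 25 |
| Total | 2 | 76 (52.8%) | 68 (47.2%) | 144 |
| Total | 3 | 90 (83.3%) | 18 (16.7%) | 108 |
| Total | 4 | 21 (91.3%) | 2 (8.7%) | 23 |
| Total | 5 | 17 (94.4%) | 1 (5.6%) | 18 |
| Total | 6 | 14 (100.0%) | 0 | 14 |
| Total | Total | 227 (68.4%) | 105 (31.6%) | 332 |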


A statistical analysis of these numbers shows that a higher percentage of NHS members were admitted to top and second tier schools. When graduating year and NHS membership were entered into an ordinal logistic regression model, NHS membership, but not graduating year, predicted the type of college that students were admitted to. A higher percentage of NHS membership in a class predicts a higher percentage in an upper tier college. In the last two years McCort had a student admitted to the University of Pennsylvania and Cornell, respectively.
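The first claim can be checked directly from the "Total" rows of the crosstab. The sketch below is only a descriptive comparison of shares, not the ordinal logistic regression itself, and the names `totals` and `upper_tier_share` are my own for illustration.

```python
# (non-members, members) by college tier, taken from the "Total" rows of the crosstab
totals = {
    1: (9, 16),
    2: (76, 68),
    3: (90, 18),
    4: (21, 2),
    5: (17, 1),
    6: (14, 0),
}


def upper_tier_share(counts, member, upper=(1, 2)):
    """Share of NHS members (or non-members) whose college falls in the upper tiers."""
    idx = 1 if member else 0
    in_upper = sum(counts[tier][idx] for tier in upper)
    overall = sum(row[idx] for row in counts.values())
    return in_upper / overall


nhs_share = upper_tier_share(totals, member=True)    # 84 of 105 members
non_share = upper_tier_share(totals, member=False)   # 85 of 227 non-members

print(f"NHS members in tier 1 or 2:  {nhs_share:.1%}")
print(f"Non-members in tier 1 or 2:  {non_share:.1%}")
```

Across the three classes combined, 84 of 105 NHS members (80%) landed in a tier 1 or tier 2 school, compared with 85 of 227 non-members (about 37%).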

McCort has admitted students from China. I counted 10 in the 2017 graduating class based on their names. Five of these were NHS members, which is close to the overall class rate of 46%.

Things like the SAT are meant to quantify a student's raw ability regardless of where they went to school. College admissions offices consider a variety of factors in their decision-making process. If they believed that a student's NHS membership was not warranted, the upper tier schools would not admit that student.


**Related Posts**

Testing Fairness, Outliers, and Racism