Transcription of: Machine Learning Fundamentals: Sensitivity and Specificity




When the well runs dry, you might be thirsty, but this is still StatQuest... you can watch it, StatQuest!

Hello, I'm Josh Starmer, and welcome to StatQuest. Today we're going to continue our series on machine learning fundamentals, and we're going to talk about sensitivity and specificity. They're going to be clearly explained!

This StatQuest follows up on the one that describes the confusion matrix, so if you're not already down with that, check out the quest. The first half of this video will explain how to calculate and interpret sensitivity and specificity when you have a confusion matrix with two rows and two columns, and the second half will show you how to calculate and interpret sensitivity and specificity when you have three or more rows and columns.

Even if you're already down with the confusion matrix, let's remember that rows correspond to what was predicted and columns correspond to the known truth. When there are only two categories to choose from (in this case, the two choices were "has heart disease" or "does not have heart disease"), the top left-hand corner contains the true positives: patients that had heart disease and were also predicted to have heart disease. The bottom right-hand corner contains the true negatives: patients that did not have heart disease and were predicted not to have heart disease. The bottom left-hand corner contains the false negatives: patients that have heart disease, but the prediction said they didn't. Lastly, the top right-hand corner contains the false positives: patients that do not have heart disease, but the prediction says that they do.

Once we've filled out the confusion matrix, we can calculate two useful metrics: sensitivity and specificity. In this case, sensitivity tells us what percentage of patients with heart disease were correctly identified. Sensitivity is the true positives divided by the sum of the true positives and the false negatives. Specificity tells us what percentage of
patients without heart disease were correctly identified. Specificity is the true negatives divided by the sum of the true negatives and the false positives.

In the StatQuest on the confusion matrix, we applied logistic regression to a testing dataset and ended up with this confusion matrix. Let's start by calculating sensitivity for this logistic regression. Here's the formula for sensitivity: for true positives we plug in 139, and for false negatives we plug in 32. When we do the math, we get 0.81. Sensitivity tells us that 81% of the people with heart disease were correctly identified by the logistic regression model.

Now let's calculate the specificity. Here's the formula for specificity: for true negatives we plug in 112, and for false positives we plug in 20. When we do the math, we get 0.85. Specificity tells us that 85% of the people without heart disease were correctly identified by the logistic regression model.

Now let's calculate sensitivity and specificity for the random forest model that we used in the confusion matrix StatQuest. Here's the confusion matrix. Here's the formula for sensitivity, and when we plug in the numbers we get 0.83. Here's the formula for specificity, and when we plug in the numbers we get 0.83 again.

Now we can compare the sensitivity and specificity values that we calculated for the logistic regression to the values we calculated for the random forest. Sensitivity tells us that the random forest is slightly better at correctly identifying positives, which in this case are patients with heart disease. Specificity tells us that logistic regression is slightly better at correctly identifying negatives, which in this case are patients without heart disease. We would choose the logistic regression model if correctly identifying patients without heart disease was more important than correctly identifying patients with heart disease. Alternatively, we would choose the random forest model if
correctly identifying patients with heart disease was more important than correctly identifying patients without heart disease. BAM!

In the confusion matrix StatQuest, we calculated this confusion matrix when we tried to predict someone's favorite movie. Now let's talk about how to calculate sensitivity and specificity when we have a confusion matrix with three rows and three columns. The big difference when calculating sensitivity and specificity for larger confusion matrices is that there are no single values that work for the entire matrix. Instead, we calculate a different sensitivity and specificity for each category. So for this confusion matrix, we'll need to calculate sensitivity and specificity for the movie Troll 2, for the movie Gore Police, and for the movie Cool as Ice.

Let's start by calculating sensitivity for Troll 2. For Troll 2, there were 12 true positives: people that were correctly predicted to love Troll 2 more than Gore Police and Cool as Ice. So for true positives we'll plug in 12. And there were 112 + 83 = 195 false negatives: people that loved Troll 2 but were predicted to love Gore Police or Cool as Ice. So for false negatives we'll plug in 195, and when we do the math we get 0.06. Sensitivity for Troll 2 tells us that only 6% of the people that loved the movie Troll 2 more than Gore Police or Cool as Ice were correctly identified.

Now let's calculate the specificity for Troll 2. There were 23 + 77 + 92 + 17 = 209 true negatives: people that were correctly predicted to like Gore Police or Cool as Ice more than Troll 2. So for true negatives we'll plug in 209. And there were 102 + 93 = 195 false positives: people that loved Gore Police or Cool as Ice the most but were predicted to love Troll 2. So for false positives we'll plug in 195, and when we do the math we get 0.52. Specificity for Troll 2 tells us that 52% of the people who loved Gore Police or Cool as Ice more than Troll 2 were correctly identified.

Calculating sensitivity and
specificity for Gore Police is very similar. Let's start by calculating sensitivity. There are 23 true positives: people that were correctly predicted to love Gore Police the most. And there are 102 + 92 = 194 false negatives: people who loved Gore Police the most but were predicted to love Troll 2 or Cool as Ice more. When we do the math, we get 0.11. Sensitivity for Gore Police tells us that only 11% of the people that loved Gore Police were correctly identified.

Now let's calculate specificity. There were 12 + 93 + 83 + 17 = 205 true negatives: people correctly identified as loving Troll 2 or Cool as Ice more than Gore Police. And there were 112 + 77 = 189 false positives: people predicted to love Gore Police even though they didn't. When we do the math, we get 0.52. Specificity for Gore Police tells us that 52% of the people that loved Troll 2 or Cool as Ice more than Gore Police were correctly identified.

Lastly, calculating sensitivity and specificity for Cool as Ice follows the same steps: we identify the true positives, the false positives, the true negatives, and the false negatives, and then plug in the numbers, first for sensitivity, then for specificity. DOUBLE BAM!

If we had a confusion matrix with four rows and four columns, then we would have to calculate sensitivity and specificity for four different categories. Little bam.

In summary, sensitivity equals the true positives divided by the sum of the true positives and the false negatives, and specificity equals the true negatives divided by the sum of the true negatives and the false positives. We can use sensitivity and specificity to help us decide which machine learning method would be best for our data. If correctly identifying positives is the most important thing to do with the data, we should choose a method with higher sensitivity. If correctly identifying negatives is more important, then we should put more emphasis on specificity.

Hooray! We've made it to the end of another exciting StatQuest. If you like this
StatQuest and want to see more, please subscribe. And if you want to support StatQuest, well, consider buying one or two of my original songs. All right, until next time... Quest on!
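As a quick check of the arithmetic in this transcript, here is a minimal Python sketch. This code is not from the video; the function names and the matrix layout are my own, assembled from the numbers quoted above (rows are predictions, columns are the known truth). It computes sensitivity and specificity for the 2x2 logistic regression matrix, and then one-vs-rest for each category of the 3x3 favorite-movie matrix:

```python
def sensitivity(tp, fn):
    # Sensitivity = True Positives / (True Positives + False Negatives)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Specificity = True Negatives / (True Negatives + False Positives)
    return tn / (tn + fp)

# 2x2 case: the logistic regression numbers quoted in the transcript.
print(round(sensitivity(139, 32), 2))  # 0.81
print(round(specificity(112, 20), 2))  # 0.85

# 3x3 case: rows = predicted, columns = known truth,
# both in the order Troll 2, Gore Police, Cool as Ice.
matrix = [
    [12, 102, 93],   # predicted Troll 2
    [112, 23, 77],   # predicted Gore Police
    [83, 92, 17],    # predicted Cool as Ice
]
labels = ["Troll 2", "Gore Police", "Cool as Ice"]
n = len(matrix)
total = sum(sum(row) for row in matrix)

for i, label in enumerate(labels):
    tp = matrix[i][i]                               # diagonal entry
    fn = sum(matrix[r][i] for r in range(n)) - tp   # rest of column i
    fp = sum(matrix[i][c] for c in range(n)) - tp   # rest of row i
    tn = total - tp - fn - fp                       # everything left over
    print(label, round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2))
```

For each category, the true positives sit on the diagonal, the false negatives are the rest of that category's column, the false positives are the rest of its row, and everything left over counts as a true negative; the loop reproduces the 0.06/0.52 values for Troll 2 and 0.11/0.52 for Gore Police worked out above.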