The reviewer should have told you why Spearman's $\rho$ is not appropriate. Here is one version of that: let the data be $(Z_i, I_i)$, where $Z$ is the measured variable and $I$ is the gender indicator, say 0 (man) and 1 (woman). Spearman's $\rho$ is then calculated from the ranks of $Z$ and $I$ respectively. Since the indicator $I$ takes only two values, there will be a lot of ties, so the usual no-ties shortcut formula, $\rho = 1 - 6\sum_i d_i^2 / (N(N^2-1))$ with $d_i$ the rank difference for observation $i$ and $N$ the number of observations, is not appropriate. If you replace rank with mean rank, you get only two distinct rank values for $I$, one for men and one for women. Then $\rho$ becomes basically a rescaled version of the difference in mean ranks of $Z$ between the two groups. It would be simpler (and more interpretable) to simply compare the means!

Another approach is the following. Let $X_1, \dots, X_n$ be the observations of the continuous variable among men, and $Y_1, \dots, Y_m$ the same among women. If the distributions of $X$ and $Y$ are the same, then $P(X > Y)$ will be 0.5 (assume the distributions are absolutely continuous, so there are no ties). In general, define $\theta = P(X > Y)$, where $X$ is a random draw among men and $Y$ among women. Can we estimate $\theta$ from our sample? Form all pairs $(X_i, Y_j)$ (assume no ties) and count for how many we have "man is larger" ($X_i > Y_j$), call it $M$, and for how many "woman is larger" ($X_i < Y_j$), call it $W$. Then $M/(M+W)$ is a sample estimate of $\theta$. That is one reasonable measure of correlation! (If there are only a few ties, just ignore them.) But I am not sure what that is called, if it has a name.

For the specified problem, measuring the Area Under the Curve (AUC) of a Receiver Operating Characteristic (ROC) curve might help. I am not an expert in this, so I will try to keep it simple. Please comment on any error or wrong interpretation so I can change it.

Choose a value of $x$ as the threshold between negatives and positives (here, male and female), compare the resulting predictions with the real labels, and count the true positives and false positives. For example, if you choose 7, then everything above $x = 7$ is predicted female (1) and everything below $x = 7$ male (0). Repeating this procedure for thresholds from min($x$) to max($x$) gives the true positive rate and the false positive rate at each threshold; you can then plot them against each other, like in the figure below, and calculate the Area Under the Curve.

[Figure: ROC curve, true positive rate against false positive rate]

The idea is that if there is no correlation between the variables, the ratio of true positives to false positives stays roughly the same for every threshold $x$, so the true positive rate tracks the false positive rate and the AUC is close to 0.5. If there is good correlation (and the same holds for anti-correlation), that ratio varies strongly as $x$ varies, and the AUC moves away from 0.5, toward 1 or toward 0.
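Both ideas, the pairwise estimate of $\theta$ from the counts $M$ and $W$ and the threshold sweep that produces the ROC curve, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample values and the function names `theta_hat`, `roc_points` and `auc` are made up for the example, women are treated as the positive class (label 1), and ties are simply ignored.

```python
# Illustrative sketch only: the data and the function names are hypothetical.

def theta_hat(men, women):
    """Estimate theta = P(X > Y) as M / (M + W) over all (man, woman) pairs."""
    m_larger = sum(1 for x in men for y in women if x > y)  # M: "man is larger"
    w_larger = sum(1 for x in men for y in women if x < y)  # W: "woman is larger"
    return m_larger / (m_larger + w_larger)                 # tied pairs are ignored


def roc_points(values, labels):
    """Sweep a threshold over the data; predict 1 (female) when value > threshold.

    Returns (false positive rate, true positive rate) pairs, from (0, 0) to (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Highest threshold first (nothing predicted positive), then lower and lower.
    thresholds = sorted(set(values), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for v, lab in zip(values, labels) if v > t and lab == 1)
        fp = sum(1 for v, lab in zip(values, labels) if v > t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up measurements with no ties.
men = [5.1, 6.3, 4.8, 7.0, 5.9]
women = [6.8, 7.4, 8.1, 6.5, 7.9]

theta = theta_hat(men, women)

values = men + women
labels = [0] * len(men) + [1] * len(women)
area = auc(roc_points(values, labels))

print("theta_hat =", theta)
print("AUC       =", area)
```

For this toy data the pairwise estimate is about 0.08 and the AUC about 0.92. With no ties and women as the positive class, the AUC equals the fraction of pairs in which the woman's value is larger, that is $1 - M/(M+W)$, so the two answers measure the same thing. This quantity is also the Mann-Whitney $U$ statistic divided by $nm$.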