I generate some data (artificial, so that the truth is known):
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
probs <- -2 + x1 + x2
probs <- 1 / (1 + exp(-probs))
y <- sapply(probs, function(p) rbinom(1, 1, p))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

I fit a classification model (logistic, which there is no need to reinvent today, though it could be any other):
summary(glm(y ~ x1 + x2, data = dat, family = binomial))
#Call:
#glm(formula = y ~ x1 + x2, family = binomial, data = dat)
#
#Deviance Residuals:
#    Min       1Q   Median       3Q      Max
#-2.2547  -0.5967  -0.3632  -0.1753   3.3528
#
#Coefficients:
#            Estimate Std. Error z value Pr(>|z|)
#(Intercept) -2.05753    0.03812  -53.97   <2e-16 ***
#x1           1.01918    0.03386   30.10   <2e-16 ***
#x2           1.00629    0.03405   29.55   <2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#(Dispersion parameter for binomial family taken to be 1)
#
#    Null deviance: 9485.2  on 9999  degrees of freedom
#Residual deviance: 7373.4  on 9997  degrees of freedom
#AIC: 7379.4
#
#Number of Fisher Scoring iterations: 5

Correct: the estimates recover the true coefficients (-2, 1, 1) used to simulate the data.
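Since the data were simulated with intercept -2 and both slopes equal to 1, the fit can be checked programmatically against those true values; a minimal self-contained sketch (the seed and the vectorized `rbinom` call are assumptions not in the original, which used `sapply`):

```r
set.seed(1234)                        # hypothetical seed, for reproducibility
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
probs <- 1 / (1 + exp(-(-2 + x1 + x2)))   # same inverse-logit as above
y  <- rbinom(n, 1, probs)             # vectorized equivalent of the sapply call
fit <- glm(y ~ x1 + x2, family = binomial)

# The estimates should sit within a few standard errors of c(-2, 1, 1):
round(coef(fit), 2)
```

With n = 10000 the standard errors are around 0.034-0.038, so the estimates land very close to the true values on essentially every simulated run.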
...