Class 23 - Statistical Inference

15/10/2023
Statistical Inference
Lucía Babino
Universidad Torcuato Di Tella
1 / 29
Reading for this class
ISLR (https://www.statlearning.com/), ch. 3.2, secs. 3.2.1 and 3.2.2
(excluding the subsections “Two: Deciding on Important Variables” and
“Four: Predictions”)
2 / 29
Review
3 / 29
Multiple Linear Regression Model
\[ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon \]
where
\(Y\): response variable
\(X_1, \dots, X_p\): covariates (explanatory or predictor variables)
\(\epsilon\): error term
4 / 29
Multiple Linear Regression in the example
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon \]
where
\(Y\): sales
\(X_1\): TV
\(X_2\): radio
\(X_3\): newspaper
\(\epsilon\): error term
5 / 29
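Multiple Linear Regression Model with assumptions
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad 1 \le i \le n \]
Assumptions:
1. \(\epsilon_1, \dots, \epsilon_n\) are independent
2. \(E(\epsilon_i) = 0\) for all \(i\)
3. \(V(\epsilon_i) = \sigma^2\) for all \(i\)
4. \(\epsilon_i\) is normally distributed for all \(i\)
Or, equivalently,
\[ \epsilon_1, \dots, \epsilon_n \sim N(0, \sigma^2) \ \text{i.i.d.} \]
In the example: \(Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i\), \(1 \le i \le 200\),
where
\(Y_i\) = sales in the i-th market
\(x_{i1}\) = TV investment in the i-th market
\(x_{i2}\) = radio investment in the i-th market
\(x_{i3}\) = newspaper investment in the i-th market
6 / 29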
Interpretation of the coefficients in the example
    ajuste_mult <- lm(sales ~ TV + radio + newspaper, data = datos)
    summary(ajuste_mult)
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
    # ---
    # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #
    # Residual standard error: 1.686 on 196 degrees of freedom
    # Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956
    # F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
\(\hat\beta_1 = 0.0458\) → when we increase TV investment by $1,000 and hold
the radio and newspaper investments fixed, estimated expected sales increase by
approximately 46 units (0.0458 thousand units, since sales are measured in thousands).
7 / 29
Today's class
8 / 29
Interpretation of the coefficients in the example
Change of notation:
\(x_{i1} = t_i\): TV investment in the i-th market
\(x_{i2} = r_i\): radio investment in the i-th market
\(x_{i3} = d_i\): newspaper investment in the i-th market
\[ Y_i = \beta_0 + \beta_1 t_i + \beta_2 r_i + \beta_3 d_i + \epsilon_i \;\Rightarrow\; E(Y_i) = \beta_0 + \beta_1 t_i + \beta_2 r_i + \beta_3 d_i \]
Writing \(E(Y)_{(t,r,d)}\) for the mean sales at budgets \((t, r, d)\):
\[ E(Y)_{(t,r,d)} = \beta_0 + \beta_1 t + \beta_2 r + \beta_3 d \]
\[ E(Y)_{(t+1,r,d)} = \beta_0 + \beta_1 (t+1) + \beta_2 r + \beta_3 d \]
\[ \Rightarrow E(Y)_{(t+1,r,d)} - E(Y)_{(t,r,d)} = \beta_1 \]
⇒ \(\beta_1\) represents the increase in mean sales when TV investment
increases by $1,000 and the radio and newspaper investments are held fixed
(likewise for \(\beta_2\) and \(\beta_3\)).
9 / 29
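The subtraction on the previous slide can be checked numerically. The sketch below is only an illustration: it plugs in the estimates reported in the summary output instead of refitting the model (the `datos` data frame is not available here), and the helper `mean_sales` and the budget values 100, 30 and 20 are hypothetical choices made for this example.

```r
# Estimates copied from the summary output shown in the lecture
# (budgets in thousands of dollars, sales in thousands of units).
beta_hat <- c(intercept = 2.938889, TV = 0.045765,
              radio = 0.188530, newspaper = -0.001037)

# Hypothetical helper: estimated mean sales at budgets (tv, radio, news).
mean_sales <- function(tv, radio, news) {
  unname(beta_hat["intercept"] + beta_hat["TV"] * tv +
           beta_hat["radio"] * radio + beta_hat["newspaper"] * news)
}

# Raise the TV budget by one unit ($1,000), holding radio and newspaper fixed.
diff_tv <- mean_sales(101, 30, 20) - mean_sales(100, 30, 20)

# The difference equals the TV coefficient, whatever the starting budgets are.
isTRUE(all.equal(diff_tv, unname(beta_hat["TV"])))
```

Changing the starting budgets leaves `diff_tv` unchanged, which is what "holding the other covariates fixed" means in a model without interaction terms.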
Prediction
How would we predict sales (or estimate mean sales) if we invest
$50 thousand in TV, $100 thousand in radio and $10 thousand in
newspaper?
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
\[ 2.939 + 0.046 \cdot 50 + 0.189 \cdot 100 - 0.001 \cdot 10 = 24.129 \]
Predicted sales: 24,129 units
10 / 29
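The hand calculation above uses coefficients rounded to three decimals. As a rough sketch, the same prediction can be reproduced in R from the reported estimates; the vector names `beta_hat` and `x_new` are introduced here only for illustration.

```r
# Estimates copied from the summary output (intercept, TV, radio, newspaper).
beta_hat <- c(2.938889, 0.045765, 0.188530, -0.001037)

# New budgets in thousands of dollars; the leading 1 multiplies the intercept.
x_new <- c(1, 50, 100, 10)

# Prediction with the coefficients as printed by summary().
pred_full <- sum(beta_hat * x_new)

# Prediction with coefficients rounded to 3 decimals, as on the slide.
pred_rounded <- sum(round(beta_hat, 3) * x_new)

c(full = pred_full, rounded = pred_rounded)

# With the fitted object at hand, one would normally call instead:
# predict(ajuste_mult, newdata = data.frame(TV = 50, radio = 100, newspaper = 10))
```

Rounding the coefficients before multiplying moves the prediction from about 24.07 to 24.129 thousand units, so it is safer to round only the final result.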
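Estimation of the coefficients (case p = 2)
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]
Idea: find the plane that comes closest to the points.
We compute \((\hat\beta_0, \hat\beta_1, \hat\beta_2)\) that minimizes
\[ L(b_0, b_1, b_2) = \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2}) \right]^2 \]
which measures the distance from the points to the plane \(y = b_0 + b_1 x_1 + b_2 x_2\)
(as a sum of squared vertical distances).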
11 / 29
Estimation of the coefficients by least squares
In the general case (any p), we compute \((\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p)\) that minimizes
\[ L(b_0, b_1, \dots, b_p) = \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip}) \right]^2 \]
It can be shown that here, too, the minimum is found by differentiating and
setting the derivatives equal to zero.
That is, by solving a system of \((p+1)\) equations in \((p+1)\)
unknowns → matrix algebra (we will not cover it)
12 / 29
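Although the lecture skips the matrix algebra, a short numerical sketch may help fix ideas. Setting the derivatives of L to zero gives the linear system (X'X) b = X'y, where X is the design matrix with a leading column of ones. The example below uses simulated data (the coefficients 1, 2, -3 and the noise level are arbitrary assumptions) and checks that solving this system gives the same estimates as `lm()`.

```r
set.seed(1)

# Simulated data with p = 2 covariates (all numbers here are arbitrary).
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

# Design matrix: a column of ones for the intercept plus the covariates.
X <- cbind(1, x1, x2)

# System of (p + 1) equations in (p + 1) unknowns: (X'X) b = X'y.
beta_ne <- drop(solve(crossprod(X), crossprod(X, y)))

# Same estimates from lm().
beta_lm <- unname(coef(lm(y ~ x1 + x2)))

# Largest absolute difference between the two solutions (numerically zero).
max(abs(beta_ne - beta_lm))

# At the minimum, the derivatives of L vanish: X'(y - X b) = 0.
max(abs(crossprod(X, y - X %*% beta_ne)))
```

Internally `lm()` relies on a QR decomposition rather than on solving this system directly, which is numerically more stable; the direct solve is shown only because it mirrors the "differentiate and set to zero" argument.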
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
How do we interpret each estimated coefficient?
13 / 29
Multiple regression vs. simple regressions
    summary(ajuste_mult)
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86

    summary(ajusteTV)
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept) 7.032594   0.457843   15.36   <2e-16 ***
    # TV          0.047537   0.002691   17.67   <2e-16 ***

    summary(ajusteRadio)
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  9.31164    0.56290  16.542   <2e-16 ***
    # radio        0.20250    0.02041   9.921   <2e-16 ***

    summary(ajusteNews)
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept) 12.35141    0.62142   19.88  < 2e-16 ***
    # newspaper    0.05469    0.01658    3.30  0.00115 **
14 / 29
Multiple regression vs. simple regressions
The TV and radio coefficients are similar and significant
in both regressions.
The newspaper coefficient is
positive and significant in the simple linear regression,
negative and not significant in the multiple linear regression.
Why?
15 / 29
Correlation between covariates
    cor(datos)
    #                   TV      radio  newspaper     sales
    # TV        1.00000000 0.05480866 0.05664787 0.7822244
    # radio     0.05480866 1.00000000 0.35410375 0.5762226
    # newspaper 0.05664787 0.35410375 1.00000000 0.2282990
    # sales     0.78222442 0.57622257 0.22829903 1.0000000
    plot(datos)
\(\widehat{\mathrm{corr}}(\text{radio}, \text{newspaper}) = 0.35\)
16 / 29
Correlation between covariates
Commands for the first plot:
    plot(datos$newspaper, datos$radio, xlab = "newspaper", ylab = "radio")
    abline(lm(radio ~ newspaper, data = datos))
17 / 29
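One way to see how a covariate can look important in a simple regression and irrelevant in the multiple one is a small simulation. The sketch below is not the Advertising data: it generates hypothetical budgets in which newspaper is correlated with radio but has no effect on sales by construction; all numbers are arbitrary assumptions.

```r
set.seed(123)
n <- 2000

# Hypothetical budgets: newspaper spending tends to be high where radio is high.
radio     <- rnorm(n, mean = 25, sd = 10)
newspaper <- 10 + 0.5 * radio + rnorm(n, sd = 8)

# Sales depend on radio only; newspaper has no effect by construction.
sales <- 5 + 0.2 * radio + rnorm(n, sd = 1)

# Simple regression: newspaper "inherits" part of the radio effect.
coef_simple <- coef(lm(sales ~ newspaper))[["newspaper"]]

# Multiple regression: once radio is held fixed, newspaper adds nothing.
coef_multiple <- coef(lm(sales ~ radio + newspaper))[["newspaper"]]

c(simple = coef_simple, multiple = coef_multiple)
```

In the simple regression the newspaper slope is clearly positive, while in the multiple regression it is close to zero, which mirrors the pattern seen in the lecture's output.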
Takeaway
Association ≠ Causation
18 / 29
Important questions
1. Is there any association between the advertising budget and
sales?
What have we answered so far?
With which tool?
19 / 29
Important questions
1. Is there any association between the advertising budget and
sales?
Model:
\[ Y = \beta_0 + \beta_1 TV + \beta_2 R + \beta_3 D + \epsilon \]
Hypotheses:
Possible solution: run 3 tests
\[ H_0^{(1)}: \beta_1 = 0 \quad \text{vs.} \quad H_1^{(1)}: \beta_1 \neq 0 \]
\[ H_0^{(2)}: \beta_2 = 0 \quad \text{vs.} \quad H_1^{(2)}: \beta_2 \neq 0 \]
\[ H_0^{(3)}: \beta_3 = 0 \quad \text{vs.} \quad H_1^{(3)}: \beta_3 \neq 0 \]
20 / 29
Important questions
Hypotheses:
\[ H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0 \ \text{or} \ \beta_3 \neq 0 \]
Possible solution: run 3 tests
\[ H_0^{(1)}: \beta_1 = 0 \quad \text{vs.} \quad H_1^{(1)}: \beta_1 \neq 0 \]
\[ H_0^{(2)}: \beta_2 = 0 \quad \text{vs.} \quad H_1^{(2)}: \beta_2 \neq 0 \]
\[ H_0^{(3)}: \beta_3 = 0 \quad \text{vs.} \quad H_1^{(3)}: \beta_3 \neq 0 \]
We reject \(H_0\) when we reject \(H_0^{(1)}\) or \(H_0^{(2)}\) or \(H_0^{(3)}\).
Drawback: suppose each “partial” test has level
\(\alpha = 0.05\); what level does the “global” test have?
At most \(3\alpha = 0.15\). In general, if we have p covariates, what
is the global level?
This is the multiple testing problem.
21 / 29
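To get a feeling for how the global level grows with the number of covariates, the sketch below tabulates two quantities: the upper bound p·α (capped at 1), which holds in general, and the exact level 1 - (1 - α)^p that would hold if the p partial tests were independent. Independence is an assumption made here only for illustration; the t tests on the coefficients of a regression are generally not independent.

```r
alpha <- 0.05
p     <- c(1, 3, 10, 50)   # number of partial tests (covariates)

# General upper bound on the global level (union bound), capped at 1.
upper_bound <- pmin(1, p * alpha)

# Global level if the partial tests were independent (illustrative assumption):
# probability that at least one of them rejects under H0.
level_if_independent <- 1 - (1 - alpha)^p

data.frame(p, upper_bound, level_if_independent)
```

For p = 3 the two values are 0.15 and about 0.143; for larger p the probability of at least one false rejection approaches 1, which is the motivation for using a single global test.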
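F test
Hypotheses:
\[ H_0: \beta_1 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0 \ \text{or} \ \cdots \ \text{or} \ \beta_p \neq 0 \]
Statistic:
\[ F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)} \]
where
\[ TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \quad \to \ \text{Total Sum of Squares} \]
\[ RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \quad \to \ \text{Residual Sum of Squares} \]
Where are the \(\hat\beta_j\)?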
22 / 29
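The estimated coefficients enter the statistic through the fitted values, since each fitted value is computed from the \(\hat\beta_j\). As a sketch on simulated data (the coefficients and sample size below are arbitrary assumptions), the F statistic can be assembled by hand from TSS and RSS and compared with the value stored by `summary()`.

```r
set.seed(42)

# Simulated data with p = 3 covariates; x3 has no effect by construction.
n  <- 200
p  <- 3
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 2 + 0.5 * x1 + 1.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# Total and residual sums of squares.
tss <- sum((y - mean(y))^2)
rss <- sum((y - fitted(fit))^2)   # fitted(fit) is built from the estimated coefficients

# F statistic assembled from its definition.
f_manual <- ((tss - rss) / p) / (rss / (n - p - 1))

# Value computed by summary.lm().
f_summary <- unname(summary(fit)$fstatistic["value"])

c(manual = f_manual, summary = f_summary)
```

The two numbers agree up to floating-point error, which confirms that the "F-statistic" line of the summary output is this ratio.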
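F test
Hypotheses:
\[ H_0: \beta_1 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0 \ \text{or} \ \cdots \ \text{or} \ \beta_p \neq 0 \]
Statistic:
\[ F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)} \]
Why is it useful as a test statistic?
For a given TSS, the larger F is, the smaller RSS is, that is, the better the fit ⇒
we reject \(H_0\) when F is large. (\(F \ge 0\))
Distribution of the statistic under \(H_0\):
Under \(H_0\), \(F \sim F_{p,\, n-p-1}\) → “F distribution with \(p\) and \(n - p - 1\)
degrees of freedom”
Rejection region:
\[ R = \{ F > F_{p,\, n-p-1,\, \alpha} \} \]
p-value \(= P(F_{p,\, n-p-1} \ge F_{obs})\), where \(F_{p,\, n-p-1}\) denotes a random variable with the \(F_{p,\, n-p-1}\) distribution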
23 / 29
F test in the example
    summary(ajuste_mult)
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
    # ---
    # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #
    # Residual standard error: 1.686 on 196 degrees of freedom
    # Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956
    # F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
p-value \(\approx 0\) ⇒ we reject
\[ H_0: \beta_1 = \beta_2 = \beta_3 = 0 \]
in favor of
\[ H_1: \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0 \ \text{or} \ \beta_3 \neq 0 \]
24 / 29
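The last line of the summary can be approximately reconstructed from the other reported quantities. Since R-squared equals 1 - RSS/TSS, dividing the numerator and denominator of F by TSS gives F = (R²/p) / ((1 - R²)/(n - p - 1)). The sketch below applies this to the reported values (n = 200, p = 3, R² = 0.8972) and also computes the critical value and p-value with `qf()` and `pf()`; the level 0.05 is an assumed choice.

```r
# Quantities read off the summary output.
r2 <- 0.8972
n  <- 200
p  <- 3

# F statistic rewritten in terms of R-squared.
f_from_r2 <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Critical value of the rejection region at level 0.05 (assumed level).
f_crit <- qf(0.95, df1 = p, df2 = n - p - 1)

# p-value for the reported statistic: upper tail of the F(3, 196) distribution.
p_value <- pf(570.3, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

c(f_from_r2 = f_from_r2, f_crit = f_crit, p_value = p_value)
```

The value obtained from R² is close to the reported 570.3 (the small gap comes from R² being rounded to four decimals), and it is far above the critical value, in line with the p-value reported as `< 2.2e-16`.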
Important questions
3. Which media are associated with sales?
Which part of the summary helps us answer this question?
    summary(ajuste_mult)
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
    # ---
    # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #
    # Residual standard error: 1.686 on 196 degrees of freedom
    # Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956
    # F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
25 / 29
Important questions
    summary(ajuste_mult)
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
    # ---
    # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #
    # Residual standard error: 1.686 on 196 degrees of freedom
    # Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956
    # F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
Which media are associated with sales?
Which hypotheses do the p-values in the coefficients table correspond to?
How are they computed?
26 / 29
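t test for the coefficients. Example: test for \(\beta_1\) (TV)
Hypotheses:
\[ H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \]
Statistic:
\[ T = \frac{\hat\beta_1}{\widehat{SE}(\hat\beta_1)} \sim t_{n-p-1} \ \text{under} \ H_0 \]
Rejection region:
\[ R = \{ |T| > t_{n-p-1,\, \alpha/2} \} \]
p-value \(= P(|T_{n-p-1}| \ge |T_{obs}|)\) with \(T_{n-p-1} \sim t_{n-p-1}\)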
27 / 29
t test in summary
    summary(ajuste_mult)
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
    # TV           0.045765   0.001395  32.809   <2e-16 ***
    # radio        0.188530   0.008611  21.893   <2e-16 ***
    # newspaper   -0.001037   0.005871  -0.177     0.86
\[ T = \frac{\hat\beta_1}{\widehat{SE}(\hat\beta_1)} = \frac{0.045765}{0.001395} = 32.80645 \]
p-value \(= P(|T_{196}| \ge 32.80) = 2\,P(T_{196} \ge 32.80)\) =
    2 * (1 - pt(32.80, df = 196))
\(\approx 0\)
28 / 29
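The same calculation can be repeated for every row of the coefficients table. The sketch below recomputes the t values and two-sided p-values from the reported estimates and standard errors; the vector names are introduced here for illustration and 0.05 is an assumed level.

```r
# Estimates and standard errors copied from the summary output.
est <- c(TV = 0.045765, radio = 0.188530, newspaper = -0.001037)
se  <- c(TV = 0.001395, radio = 0.008611, newspaper = 0.005871)
df  <- 196   # n - p - 1 = 200 - 3 - 1

# Observed statistics T = estimate / standard error.
t_obs <- est / se

# Two-sided p-values: P(|T_196| >= |t_obs|).
p_val <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

# Critical value of the rejection region at level 0.05 (assumed level).
t_crit <- qt(1 - 0.05 / 2, df = df)

round(t_obs, 3)
signif(p_val, 2)
abs(t_obs) > t_crit   # TRUE for TV and radio, FALSE for newspaper
```

The recomputed t value for TV (about 32.806) differs slightly from the 32.809 in the table because the printed estimate and standard error are rounded. Using `lower.tail = FALSE` instead of `1 - pt(...)` avoids the subtraction that makes very small p-values come out as exactly 0.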
Practice exercises you can do now
Practice set 5, part 2: Exercise 1, up to item f)
29 / 29