GLM and Exponential Family
GLM
Ordinary linear regression assumes that y changes by a constant amount whenever x changes by a constant amount. This assumption fails in the following situations:
1. The response has a restricted range, say life or death, coded as 0 or 1.
2. We need to estimate a probability.
3. The data are not normally distributed, say the number of new Ebola victims each day in Mali.
A generalized linear model solves this problem by specifying $g(\mu)=x\beta$, which means that a function of the mean, rather than the mean itself, varies linearly with $x$.
I hope you are not bored yet. Let's look at the form of a generalized linear model, which will help you understand the logistic, Poisson, and probit models. A GLM is made up of the following:
1. $\eta=x\beta$, which is called the systematic component.
2. $g(\mu)=\eta$, which is called the link function.
3. Last but not least, the distribution of the response, which comes from the natural exponential family.
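To make the first two components concrete, here is a minimal sketch in Python with NumPy. The design matrix `X`, the coefficients `beta`, and the choice of three links are made up for illustration; nothing here is estimated from data.

```python
import numpy as np

# Hypothetical toy data: 4 observations, an intercept column plus one covariate.
X = np.array([[1.0, -2.0],
              [1.0, -0.5],
              [1.0, 0.5],
              [1.0, 2.0]])
beta = np.array([0.3, 1.2])  # illustrative coefficients, not fitted values

# Systematic component: the linear predictor eta = X beta.
eta = X @ beta

# Link function: g(mu) = eta, so the mean is mu = g^{-1}(eta).
inverse_links = {
    "identity": lambda e: e,                      # mu = eta, any real value
    "log": lambda e: np.exp(e),                   # mu = exp(eta), always positive
    "logit": lambda e: 1.0 / (1.0 + np.exp(-e)),  # mu strictly between 0 and 1
}

means = {name: inv(eta) for name, inv in inverse_links.items()}
for name, mu in means.items():
    print(name, np.round(mu, 3))
```

The same linear predictor gives a mean on the whole real line, on the positive half-line, or inside (0, 1), depending only on the link. That is how a GLM handles responses with a restricted range.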
Natural Exponential Family
Any distribution whose pdf can be written in the following way belongs to the natural exponential family.
$f(y_i;\theta,\phi)=\exp(\frac{y_i\theta-b(\theta)}{a(\phi)}+c(y_i,\phi))$
$\theta$: the natural parameter, a function of $\mu$
$a(\phi)$: a function of the dispersion parameter $\phi$
The normal distribution is a member of this family: $\frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{(y_i-\mu)^2}{2\sigma^2})=\exp(\frac{y_i\theta-\theta^2/2}{\sigma^2}-\frac{y_i^2}{2\sigma^2}-\frac{1}{2}\log(2\pi\sigma^2))$
Here we can see:
$\theta:\mu$
$b(\theta):\theta^2/2$
$a(\phi):\sigma^2$
$c(y_i,\phi):-\frac{y_i^2}{2\sigma^2}-\frac{1}{2}\log(2\pi\sigma^2)$
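If you want to convince yourself of the normal decomposition, a quick numeric check helps. The sketch below assumes the form $\exp(\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi))$ with $\theta=\mu$, $b(\theta)=\theta^2/2$, $a(\phi)=\sigma^2$ and $c(y,\phi)=-\frac{y^2}{2\sigma^2}-\frac{1}{2}\log(2\pi\sigma^2)$. The function names and test values are my own.

```python
import math

def normal_pdf(y, mu, sigma):
    """Textbook normal density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_expfam(y, mu, sigma):
    """The same density assembled from theta, b, a and c."""
    theta = mu
    b = theta ** 2 / 2
    a = sigma ** 2
    c = -y ** 2 / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return math.exp((y * theta - b) / a + c)

# Arbitrary illustrative values.
y, mu, sigma = 1.3, 0.4, 2.0
direct = normal_pdf(y, mu, sigma)
via_family = normal_expfam(y, mu, sigma)
print(direct, via_family)
```

The two printed numbers should agree up to floating point rounding.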
The binomial distribution also belongs to the natural exponential family:
$\binom{n}{y_i}p^{y_i}(1-p)^{n-y_i}=\exp[y_i\log(\frac{p}{1-p})+n\log(1-p)+\log\binom{n}{y_i}]$
$\theta=\log(\frac{p}{1-p})$
$\binom{n}{y_i}p^{y_i}(1-p)^{n-y_i}=\exp[y_i\theta-n\log(1+e^\theta)+\log\binom{n}{y_i}]$
Here:
$\theta:\log(\frac{p}{1-p})$, which is $\log(\frac{\mu}{1-\mu})$ in the Bernoulli case $n=1$, where $\mu=p$
$b(\theta):n\log(1+e^\theta)$
$a(\phi):1$
$c(y_i,\phi):\log\binom{n}{y_i}$
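The binomial case can be checked the same way. This is a small sketch, assuming $\theta=\log(\frac{p}{1-p})$, $b(\theta)=n\log(1+e^\theta)$, $a(\phi)=1$ and $c(y,\phi)=\log\binom{n}{y}$. The helper names and the values of `n` and `p` are illustrative. It needs Python 3.8 or later for `math.comb`.

```python
import math

def binom_pmf(y, n, p):
    """Textbook binomial probability mass function."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def binom_expfam(y, n, p):
    """The same pmf assembled from theta, b and c, with a(phi) = 1."""
    theta = math.log(p / (1 - p))            # natural parameter: the log-odds
    b = n * math.log(1 + math.exp(theta))    # equals -n * log(1 - p)
    c = math.log(math.comb(n, y))
    return math.exp(y * theta - b + c)

# Arbitrary illustrative values.
n, p = 10, 0.3
pairs = [(binom_pmf(y, n, p), binom_expfam(y, n, p)) for y in range(n + 1)]
for direct, via_family in pairs:
    print(direct, via_family)
```

Each row should show the same number twice, and the first column sums to 1 over $y=0,\dots,n$.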
Exponential distribution
$\lambda\exp(-\lambda y_i)=\exp[-\lambda y_i+\log(\lambda)]$
$\mu=\frac{1}{\lambda}$
$\lambda\exp(-\lambda y_i)=\exp[-\frac{1}{\mu}y_i-\log(\mu)]$
$\theta=-\frac{1}{\mu}$
$\lambda\exp(-\lambda y_i)=\exp[\theta y_i+\log(-\theta)]$
Here:
$\theta:-\frac{1}{\mu}$
$b(\theta):-\log(-\theta)$
$a(\phi):1$
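Here is a short numeric sanity check for the exponential distribution, assuming $\theta=-\frac{1}{\mu}$ and $b(\theta)=-\log(-\theta)$. It also checks a standard property of this family that the post does not derive: the derivative $b'(\theta)$ gives back the mean $\mu$. The derivative is approximated with a finite difference, and all names and values are my own.

```python
import math

def exponential_pdf(y, lam):
    """Textbook exponential density with rate lam."""
    return lam * math.exp(-lam * y)

def b(theta):
    """Cumulant function of the exponential distribution (theta < 0)."""
    return -math.log(-theta)

def exponential_expfam(y, mu):
    """The same density written as exp(y * theta - b(theta))."""
    theta = -1.0 / mu
    return math.exp(y * theta - b(theta))

# Arbitrary illustrative values.
y, mu = 0.7, 2.5
direct = exponential_pdf(y, 1.0 / mu)
via_family = exponential_expfam(y, mu)

# Central finite difference for b'(theta); it should be close to mu.
theta = -1.0 / mu
h = 1e-6
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)
print(direct, via_family, b_prime)
```

The first two numbers should match, and the third should be close to the chosen mean.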
Canonical link
If the link function $g(\mu)$ equals the natural parameter $\theta$ of the natural exponential family form, that is, $\eta=\theta$, the link is called the canonical link.
| Distribution | Canonical Link |
|---|---|
| Normal | $\theta=\mu$ |
| Poisson | $\theta=\log(\mu)$ |
| Exponential | $\theta=-\frac{1}{\mu}$ |
| Bernoulli | $\theta=\log(\frac{\mu}{1-\mu})$ |
You can get the canonical link of a distribution in the natural exponential family by reorganizing its density as in the examples above.
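As a sketch of that recipe for a distribution not worked through in this post, here is the Poisson case, assuming the usual pmf with mean $\mu$:
$\frac{\mu^{y_i}e^{-\mu}}{y_i!}=\exp[y_i\log(\mu)-\mu-\log(y_i!)]$
$\theta=\log(\mu)$
$\frac{\mu^{y_i}e^{-\mu}}{y_i!}=\exp[y_i\theta-e^\theta-\log(y_i!)]$
Here:
$\theta:\log(\mu)$
$b(\theta):e^\theta$
$a(\phi):1$
$c(y_i,\phi):-\log(y_i!)$
The natural parameter is $\log(\mu)$, which matches the Poisson row of the table.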
P.S. You might have heard of the identity link, $\eta=\mu$. The canonical link of the normal distribution is the identity link.
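The canonical links in the table can also be written as plain functions. This is a minimal sketch using only Python's standard library. The dictionary layout and the test value are my own choices, and each link is paired with its inverse so that a round trip can be checked.

```python
import math

# Each entry maps a distribution to (canonical link, inverse link).
canonical_links = {
    "Normal": (lambda mu: mu, lambda theta: theta),
    "Poisson": (lambda mu: math.log(mu), lambda theta: math.exp(theta)),
    "Exponential": (lambda mu: -1.0 / mu, lambda theta: -1.0 / theta),
    "Bernoulli": (lambda mu: math.log(mu / (1 - mu)),
                  lambda theta: 1.0 / (1.0 + math.exp(-theta))),
}

# A mean that is valid for every row: positive and strictly between 0 and 1.
mu = 0.3
round_trip = {}
for name, (link, inverse) in canonical_links.items():
    theta = link(mu)                    # natural parameter for this mean
    round_trip[name] = inverse(theta)   # should recover mu
    print(name, theta, round_trip[name])
```

For the normal row the link returns the mean unchanged, which is the identity link mentioned above.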
Why do we use the canonical link?
How do we estimate the parameters?
How do we do inference on them?
What is the relation between GLM and OLM?
See you next time!