Logit regression

If one assumes that the probability $P(i \to j)$ that actor $i = 1..n$ chooses alternative $j = 1..k$ is proportional (within the set of alternative choices) to the exponential of a linear combination of $p = 1..p$ data values $X_{ijp}$ related to $i$ and $j$, one arrives at the logit model, or more formally:

Assume:

$P(i \to j) \propto w_{ij}$

$w_{ij} := \exp(v_{ij})$

$v_{ij} := \sum\limits_{p} \beta_p X_{ijp}$

Thus $L(i \to j) := \log(P(i \to j))$ equals $v_{ij}$ up to an additive term that does not depend on $j$.

Consequently, $w_{ij} > 0$ and $P(i \to j) := { w_{ij} \over \sum\limits_{j'}w_{ij'}}$, since $\sum\limits_{j}P_{ij}$ must be $1$ (with $P_{ij}$ as shorthand for $P(i \to j)$).
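The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `choice_probabilities` and the example values for `beta` and `X_i` are assumptions made for this sketch, and subtracting the largest $v_{ij}$ before exponentiating is a common numerical safeguard that is not part of the model definition.

```python
import math

def choice_probabilities(beta, X_i):
    """Return P(i -> j) for one actor i (hypothetical helper).

    beta : list of p parameters beta_p
    X_i  : list over alternatives j of lists of p data values X_ijp
    """
    # v_ij = sum_p beta_p * X_ijp
    v = [sum(b * x for b, x in zip(beta, row)) for row in X_i]
    # Subtracting max(v) leaves w_ij / sum_j' w_ij' unchanged
    # but avoids overflow in exp for large v_ij.
    v_max = max(v)
    w = [math.exp(v_j - v_max) for v_j in v]
    total = sum(w)
    return [w_j / total for w_j in w]

# Illustrative values only: p = 2 parameters, k = 3 alternatives.
beta = [0.5, -1.0]
X_i = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
probs = choice_probabilities(beta, X_i)
print(probs, sum(probs))
```

The second and third alternatives have the same $v_{ij}$ in this example, so they receive the same probability.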

Note that:

- $v_{ij}$ is a linear combination of $X_{ijp}$ with weights $\beta_p$ as logit model parameters.
- the odds ratio ${P(i \to j) \over P(i \to j')}$ of choice $j$ against alternative $j'$ is equal to ${w_{ij} \over w_{ij'}} = \exp( v_{ij} - v_{ij'} ) = \exp \sum\limits_{p} \beta_p \left( X_{ijp}- X_{ij'p} \right)$.
- this formulation does not require a separate beta index (aka parameter space dimension) per alternative choice $j$ for each exogenous variable.
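The odds expression in the notes above can be checked numerically. The sketch below uses made-up values for `beta` and the data rows. It also shows that adding a third alternative leaves the odds between the first two unchanged, because the normalizing sum cancels in the ratio (a property commonly referred to as independence of irrelevant alternatives).

```python
import math

def probabilities(beta, X_i):
    # w_ij = exp(sum_p beta_p * X_ijp), normalized over the alternatives j
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X_i]
    total = sum(w)
    return [w_j / total for w_j in w]

# Illustrative values only.
beta = [0.8, -0.3]
X_two = [[1.0, 2.0], [0.0, 5.0]]        # alternatives j and j'
X_three = X_two + [[3.0, 1.0]]          # same two plus an extra alternative

P2 = probabilities(beta, X_two)
P3 = probabilities(beta, X_three)

odds_two = P2[0] / P2[1]
odds_three = P3[0] / P3[1]
# exp(sum_p beta_p * (X_ijp - X_ij'p))
odds_formula = math.exp(
    sum(b * (a - c) for b, a, c in zip(beta, X_two[0], X_two[1]))
)
print(odds_two, odds_three, odds_formula)
```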

observed data

Observed choices $Y_{ij}$ are assumed to be drawn from a repeated Bernoulli experiment (for each $i$, a multinomial draw of $N_i$ trials over the $k$ alternatives) with probabilities $P(i \to j)$.

Thus $P(Y) = \prod\limits_{i} \left[ N_i! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right]$ with $N_i := \sum\limits_{j} Y_{ij}$.

Thus $L(Y) := \log(P(Y))$

$= \log \prod\limits_{i} \left[ N_i! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right]$

$= C + \sum\limits_{ij} (Y_{ij} \times \log(P_{ij}))$

$= C + \sum\limits_{i} \left[{\sum\limits_{j}Y_{ij} \times L(i \to j)}\right]$

$= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times \left(v_{ij} - \log \sum\limits_{j'}w_{ij'}\right)}\right]$

$= C +\sum\limits_{i} \left[{ \left( \sum\limits_{j}Y_{ij} \times v_{ij} \right) - N_i \times \log \sum\limits_{j}w_{ij}}\right]$

with $C = \sum\limits_{i} C_i$ and $C_i := \log (N_i!) - \sum\limits_{j} \log (Y_{ij}!)$, which is independent of $P_{ij}$ and $\beta_p$. Note that $N_i = 1 \implies C_i = 0$.
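The final expression for $L(Y)$ translates fairly directly into code. The following is a sketch under stated assumptions: the function name `log_likelihood`, the nested-list data layout and the `include_constant` flag are hypothetical choices for this illustration. It uses `math.lgamma(n + 1)` for $\log(n!)$ and a shifted log-sum-exp to avoid overflow.

```python
import math

def log_likelihood(beta, X, Y, include_constant=True):
    """L(Y) = C + sum_i [ sum_j Y_ij * v_ij - N_i * log sum_j w_ij ].

    X[i][j] is the list of p data values X_ijp, Y[i][j] the observed count.
    """
    total = 0.0
    for X_i, Y_i in zip(X, Y):
        v = [sum(b * x for b, x in zip(beta, row)) for row in X_i]
        N_i = sum(Y_i)
        # log sum_j exp(v_ij), computed relative to max(v) for stability
        v_max = max(v)
        log_sum_w = v_max + math.log(sum(math.exp(v_j - v_max) for v_j in v))
        total += sum(y * v_j for y, v_j in zip(Y_i, v)) - N_i * log_sum_w
        if include_constant:
            # C_i = log(N_i!) - sum_j log(Y_ij!); lgamma(n + 1) == log(n!)
            total += math.lgamma(N_i + 1) - sum(math.lgamma(y + 1) for y in Y_i)
    return total

# One actor, two alternatives with equal probability (beta = 0),
# each alternative observed once: P(Y) = 2!/(1! 1!) * 0.5 * 0.5 = 0.5.
X = [[[1.0], [0.0]]]
Y = [[1, 1]]
L = log_likelihood([0.0], X, Y)
print(L, math.log(0.5))
```

Since $C$ does not depend on $\beta_p$, it can be skipped (`include_constant=False`) when the function is only used for maximization.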

specification

The presented form $v_{ij} := β_p \times X_{ij}^p$ (using Einstein notation from here on) is more generic than known implementations of logistic regression (such as in SPSS and R). There, a set of $q = 1..q$ data values $X_i^q$ is given for each $i$ ($X_i^0$ is set to $1$ to represent the intercept for each $j$) and $(k−1) \times (q+1)$ parameters are to be estimated, thus $v_{ij} := β_{jq} \times X_i^q$ for $j = 2..k$. This requires a different beta for each alternative choice and data value, causing an unnecessarily large parameter space.

The latter specification can be reduced to the more generic form by:

- assigning a unique $p$ to each $jq$ combination, represented by $A_{jq}^p$.
- defining $X_{ij}^p := A_{jq}^p \times X_i^q$ for $j = 2..k$, thus creating redundant and zero data values.
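The two steps above can be sketched as follows. This is an illustration under assumptions: the helper name `expand_to_generic` is hypothetical, $A_{jq}^p$ is taken to be an indicator that maps each $(j, q)$ pair to the index `p = (j - 2) * (q + 1) + q`, and the first alternative is treated as the reference with $v_{i1} = 0$.

```python
def expand_to_generic(X_i, k):
    """Build X_ij^p for one actor from its values X_i^q (hypothetical helper).

    X_i : list of q+1 values, X_i[0] == 1 for the intercept
    k   : number of alternatives; list index 0 is the reference alternative
    Returns k rows of length (k - 1) * (q + 1).
    """
    n_q = len(X_i)
    n_p = (k - 1) * n_q
    rows = []
    for j in range(k):
        row = [0.0] * n_p               # mostly zeros: the redundant values
        if j > 0:
            offset = (j - 1) * n_q      # block of p indices reserved for this j
            row[offset:offset + n_q] = X_i
        rows.append(row)
    return rows

# Illustrative values: k = 3 alternatives, intercept plus one data value.
X_i = [1.0, 2.5]
beta_jq = [[0.2, -1.0], [0.7, 0.4]]     # rows for j = 2 and j = 3
X_ij = expand_to_generic(X_i, k=3)

# Flatten beta_jq into beta_p using the same (j, q) -> p assignment.
beta_p = [b for row in beta_jq for b in row]
v_generic = [sum(b * x for b, x in zip(beta_p, row)) for row in X_ij]
v_specific = [0.0] + [sum(b * x for b, x in zip(row, X_i)) for row in beta_jq]
print(X_ij)
print(v_generic, v_specific)
```

Both specifications give the same $v_{ij}$, but the generic data array stores $k \times (k-1) \times (q+1)$ values per actor, most of them zero.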

However, a generic model cannot be reduced to a specification with different $β$’s for each alternative choice unless the latter parameter space can be restricted to contain no more dimensions than the generic form.

With large $n$ and $k$, the number of data values $X_{ijp}$ can be huge. To mitigate the data size, the following tricks can be applied:

- limit the set of combinations of $i$ and $j$ to the most probable or near $j$’s for each $i$ and/or cluster the other $j$’s.
- use only a sample from the set of possible $i$’s.
- support specific forms of data:

| # | form | reduction | description |
|---|------|-----------|-------------|
| 0 | $β_p X_{ij}^p$ | | general form of $p$ factors specific for each $i$ and $j$ |
| 1 | $β_p A_{jq}^p X_i^q$ | $X_{ij}^p := A_{jq}^p X_i^q$ | $q$ factors that vary with $i$ but not with $j$ |
| 2 | $β_p X_i^p X_j^p$ | $X_{ij}^p := X_j^p X_i^p$ | $p$ specific factors in simple multiplicative form |
| 3 | $β_{jq} X_i^q$ | | $q$ factors that vary with $i$ but not with $j$, with a separate $β$ for each $j$ |
| 4 | $β_p X_j^p$ | $X_{ij}^p := X_j^p$ | state constants $D_j$ |
| 5 | $β_j$ | | state dependent intercept |
| 6 | $β_p (J_i^p == j)$ | | usage of a recorded preference |
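As an illustration of how a specific form can reduce the data size, the sketch below evaluates form 2 without materializing $X_{ij}^p$. The function name `v_multiplicative` and the example values are assumptions made for this sketch. Only $n \times p$ actor values and $k \times p$ alternative values are stored, instead of $n \times k \times p$ combined values.

```python
def v_multiplicative(beta, X_actor, X_alt):
    """v_ij = sum_p beta_p * X_i^p * X_j^p, computed on the fly.

    X_actor : n rows of p values X_i^p
    X_alt   : k rows of p values X_j^p
    Returns an n x k nested list of v_ij.
    """
    return [
        [sum(b * x_i * x_j for b, x_i, x_j in zip(beta, row_i, row_j))
         for row_j in X_alt]
        for row_i in X_actor
    ]

# Illustrative values: n = 2 actors, k = 3 alternatives, p = 2 factors.
beta = [0.5, -0.2]
X_actor = [[1.0, 2.0], [0.0, 3.0]]
X_alt = [[1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]
v = v_multiplicative(beta, X_actor, X_alt)
print(v)
```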

regression

The $β_p$’s are found by maximizing the log-likelihood $L(Y | β)$, which is equivalent to finding the maximum of $\sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times v_{ij} - N_i \times \log \sum\limits_{j}w_{ij}}\right]$

First order conditions, for each $p$: $0 = { \partial L \over \partial\beta_p } = \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times { \partial v_{ij} \over \partial \beta_p } - N_i \times { \partial log \sum\limits_{j}w_{ij} \over \partial \beta_p }} \right]$

Thus, for each $p$: $\sum\limits_{ij} Y_{ij} \times X_{ijp} = \sum\limits_{ij} N_i \times P_{ij} \times X_{ijp}$ as ${ \partial v_{ij} \over \partial \beta_p } = X^p_{ij}$ and

${\partial \log \sum\limits_{j}w_{ij} \over \partial \beta_p } = {\sum\limits_{j} {\partial w_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = {\sum\limits_{j} {w_{ij} \times \partial v_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = {\sum\limits_{j} {w_{ij} \times X_{ijp} } \over \sum\limits_{j}w_{ij} } = \sum\limits_{j} P_{ij} \times X_{ijp}$
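The first order conditions say that the gradient $\sum\limits_{ij} (Y_{ij} - N_i \times P_{ij}) \times X_{ijp}$ vanishes at the optimum. The sketch below uses this gradient in a plain fixed-step gradient ascent. That solver is an assumption chosen for brevity, not necessarily the method used by any particular implementation, and the step size and iteration count are tuned to this toy data set only. With one actor, one parameter and counts $3$ and $1$, the condition $3 = 4 \times P_{i1}$ gives $\beta = \log 3$.

```python
import math

def probabilities(beta, X_i):
    v = [sum(b * x for b, x in zip(beta, row)) for row in X_i]
    v_max = max(v)                       # shift for numerical stability
    w = [math.exp(v_j - v_max) for v_j in v]
    total = sum(w)
    return [w_j / total for w_j in w]

def gradient(beta, X, Y):
    """dL/dbeta_p = sum_ij (Y_ij - N_i * P_ij) * X_ijp."""
    grad = [0.0] * len(beta)
    for X_i, Y_i in zip(X, Y):
        N_i = sum(Y_i)
        P_i = probabilities(beta, X_i)
        for row, y, prob in zip(X_i, Y_i, P_i):
            for p, x in enumerate(row):
                grad[p] += (y - N_i * prob) * x
    return grad

def fit(X, Y, n_params, step=0.5, iterations=200):
    # Fixed-step gradient ascent (illustrative; no convergence check).
    beta = [0.0] * n_params
    for _ in range(iterations):
        grad = gradient(beta, X, Y)
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Toy data: one actor, two alternatives, X_i1 = 1 and X_i2 = 0.
X = [[[1.0], [0.0]]]
Y = [[3, 1]]
beta_hat = fit(X, Y, n_params=1)
print(beta_hat, math.log(3))
```

For larger problems a second-order method or a line search would usually be preferable to a fixed step.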

example

logit regression of rehousing (wiki page: logit_regression_of_rehousing).