Estimate a cross table from a series of row and column totals

Consider the following problem:

Known:

  • $G^r_g$: number of objects of g-type g in zone r, aka row totals.
  • $H^r_h$: number of objects of h-type h in zone r, aka column totals.

Requested:

  • $Q^r_{gh}$: an estimated number of objects with g-type g and h-type h in zone r.
  • the estimated number of objects per h-type as a function (preferably a linear combination) of the number of objects per g-type, for estimating an h-type distribution given a g-type distribution of new objects.

We assume that:

  • For each zone r, G and H both count all objects, thus: $\forall r: \sum\limits_g G^r_g = \sum\limits_h H^r_h$
  • $P_{hi}$, the probability that object i has h-type h, is equal for all objects i in the same zone r and with the same g-type g, and thus depends only on $P(h|rg)$.
  • $Q^r_{gh}$ follows a categorical distribution repeated $G^r_g$ times (i.e. a multinomial distribution) per row g and zone r; thus $E[Q^r_{gh}] = P(h|rg) \cdot G^r_g$
  • $Q^r_{gh} := f_g \cdot f_h \cdot P_{gh}$ such that $\sum\limits_{h} Q^r_{gh} = G^r_g$ and $\sum\limits_{g} Q^r_{gh} = H^r_h$, to be determined by iterative proportional fitting
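
The last assumption can be illustrated with a minimal iterative proportional fitting sketch in Python with NumPy, for a single zone r. The function name `ipf`, its parameters, the stopping rule, and the example numbers are assumptions made for illustration only; the seed matrix P is assumed to be strictly positive so that no division by zero occurs.

```python
import numpy as np

def ipf(P, row_totals, col_totals, max_iter=1000, tol=1e-9):
    """Scale a seed matrix P (g-types x h-types) to Q = f_g * f_h * P
    so that the row sums match row_totals and the column sums match col_totals."""
    P = np.asarray(P, dtype=float)
    G = np.asarray(row_totals, dtype=float)
    H = np.asarray(col_totals, dtype=float)
    if not np.isclose(G.sum(), H.sum()):
        raise ValueError("row and column totals must have the same sum")

    f_g = np.ones(P.shape[0])
    f_h = np.ones(P.shape[1])
    for _ in range(max_iter):
        # f_g = G_g / sum_h (f_h * P_gh)
        f_g = G / (P @ f_h)
        # f_h = H_h / sum_g (f_g * P_gh)
        f_h = H / (f_g @ P)
        Q = f_g[:, None] * P * f_h[None, :]
        # Column sums are exact right after the f_h update, so only rows are checked.
        if np.max(np.abs(Q.sum(axis=1) - G)) < tol:
            break
    return Q

# Hypothetical example: 2 g-types, 2 h-types, 30 objects in the zone.
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
Q = ipf(P, row_totals=[10, 20], col_totals=[12, 18])
print(Q.sum(axis=1))  # close to the row totals [10, 20]
print(Q.sum(axis=0))  # close to the column totals [12, 18]
```

The sketch alternates the two factor updates until the row totals are reproduced within the tolerance; it would be run once per zone with that zone's totals.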

Thus:

  • $\sum\limits_{h} P(h|rg) = 1$, $f_g = G^r_g / \sum\limits_{h} f_h \cdot P_{gh}$, and $f_h = H^r_h / \sum\limits_{g} f_g \cdot P_{gh}$.
  • $P_{gh}$ is to be determined by regression of $H^r_h$ on $G^r_g$; written in matrix notation: $H = G \times P + \epsilon$, with $\epsilon$ an $r \times h$ matrix of independent random variables with zero expectation.
  • it follows that: $P := (G^T \times G)^{-1} \times (G^T \times H)$
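
The regression step can be sketched in Python with NumPy as follows. The function name `estimate_P` and the example data are assumptions for illustration; `np.linalg.lstsq` is used instead of forming the inverse explicitly, which gives the same least-squares solution when G has full column rank.

```python
import numpy as np

def estimate_P(G, H):
    """Least-squares estimate of P in H = G x P + eps.

    G: (zones x g-types) matrix of row totals per zone.
    H: (zones x h-types) matrix of column totals per zone.
    Returns P with shape (g-types x h-types).
    """
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    # Minimises ||G P - H||; equals (G^T G)^{-1} (G^T H) for full-column-rank G.
    P, *_ = np.linalg.lstsq(G, H, rcond=None)
    return P

# Hypothetical noise-free example: 4 zones, 2 g-types, 2 h-types.
P_true = np.array([[0.6, 0.4],
                   [0.1, 0.9]])
G = np.array([[10, 5],
              [3, 8],
              [6, 6],
              [9, 1]], dtype=float)
H = G @ P_true
P_hat = estimate_P(G, H)
print(P_hat)
```

Because the totals per zone agree (the row sums of H equal the row sums of G), the rows of the estimate sum to 1. The individual entries are not constrained to lie between 0 and 1, so noisy data can yield negative values.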

ToDo:

  • Consider Alternative: $Q^r_{gh}$ is determined by discrete allocation with the given constraints and suitability $P_{gh}$
  • Consider Alternative: $P := (G^T \times H) \times (H^T \times H)^{-1}$, which follows from regression of $G^r_g$ on $H^r_h$.
  • Consider Alternative: $P := (G^T \times H)$
  • Consider Effect of possible heteroscedasticity of $\epsilon$ with H; i.e. assume $\operatorname{var}(\epsilon) \sim H$, as each element of H is assumed to be the sum over g of the results of $G^r_g$ trials of categorical distributions with conditional probabilities $P(h|g)$.
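
One possible reading of the heteroscedasticity item is a weighted least-squares fit per h-type, with weights inversely proportional to the observed column totals. The sketch below is an assumption about how that could look, not a worked-out method: the function name `estimate_P_weighted`, the weights 1/H, and the guard against counts below 1 are all hypothetical choices.

```python
import numpy as np

def estimate_P_weighted(G, H):
    """Weighted least squares per h-type, assuming var(eps) is proportional to H.

    G: (zones x g-types), H: (zones x h-types).
    Returns P with shape (g-types x h-types).
    """
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    P = np.empty((G.shape[1], H.shape[1]))
    for h in range(H.shape[1]):
        # Assumed weights 1 / var(eps); counts below 1 are clipped to avoid
        # division by zero (a hypothetical guard, not part of the source).
        w = 1.0 / np.maximum(H[:, h], 1.0)
        s = np.sqrt(w)
        # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
        sol, *_ = np.linalg.lstsq(G * s[:, None], H[:, h] * s, rcond=None)
        P[:, h] = sol
    return P

# Hypothetical noise-free example: the weighted fit recovers the same P.
P_true = np.array([[0.6, 0.4],
                   [0.1, 0.9]])
G = np.array([[10, 5],
              [3, 8],
              [6, 6],
              [9, 1]], dtype=float)
H = G @ P_true
P_w = estimate_P_weighted(G, H)
print(P_w)
```

With noise-free data the weights make no difference; they only change the estimate when the residuals are non-zero, by giving zones with small counts relatively more influence.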