Estimation of life-length in England from the 16th to 18th century

Marianne A. Jonker
Free University Amsterdam, Department F.E.W.
De Boelelaan 1081
Amsterdam, the Netherlands
[email protected]
To estimate life-length in England from the sixteenth to the eighteenth century, historical demographers use data from parish registers. These registers contain the dates of
baptisms (births), marriages and burials (deaths). In many parishes the registers are incomplete: whole years or individual events are missing, or some of the individuals listed
cannot be identified. Mobility was high in those days; about 40% of the people migrated at least once in their life. The data concerning one person may therefore be scattered
over several records from different parishes. Since so many records are useless or lost, it is
impossible to re-assemble life-histories from birth to death for all individuals. Consequently,
for approximately 40% of the people the time of death is missing. In these cases we observe
the time of birth and possibly a sequence of 'life-events' (marriage, births and deaths of
children, death of spouse and remarriage). The main reason for a missing death is thought
to be migration to another parish. The time of migration, though, is never observed.
Since the time of emigration (censoring) is unknown, estimators cannot be found by
standard techniques. Leaving the migrated people out of the data causes a bias: persons
who lived long were more likely to migrate than persons who died young. Using
the age at the last recorded life-event as an independent censoring time causes a bias
too, because during the residence in the parish between this last life-event and the time
of migration the person is also at risk of dying. So by using the Kaplan-Meier estimator with
the censoring time set equal to this age we underestimate the number of persons at risk at
any time-point and therefore overestimate the risk of death. Since about 40% of the people
observed emigrated, the bias would be large.
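The bias from censoring at the last recorded life-event can be illustrated with a small simulation. The sketch below is not taken from the cited work: the exponential distributions, their means, the event rate and all function names are assumptions chosen only for illustration (the means 40 and 60 make about 40% of the simulated people emigrate before dying). It compares a Kaplan-Meier estimate that uses the true, in practice unobservable, emigration age with one that censors at the last life-event.

```python
import numpy as np


def kaplan_meier(times, observed, grid):
    """Kaplan-Meier survival estimate S(t) at each t in grid (assumes no tied times)."""
    order = np.argsort(times)
    times, observed = times[order], observed[order]
    n = len(times)
    at_risk = n - np.arange(n)  # number still at risk at each sorted time
    factors = np.where(observed, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.concatenate(([1.0], np.cumprod(factors)))
    # searchsorted(..., side="right") counts the times <= t for each grid point
    return surv[np.searchsorted(times, grid, side="right")]


def simulate(n=20000, seed=0, mean_life=40.0, mean_migration=60.0, event_rate=0.1):
    """Hypothetical parish data: all distributions and rates are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    life = rng.exponential(mean_life, n)            # age at death
    migration = rng.exponential(mean_migration, n)  # age at emigration (never observed)
    died_in_parish = life <= migration
    exit_age = np.minimum(life, migration)

    # Life-events follow a homogeneous Poisson process while the person is in
    # the parish, so the gap from the exit age back to the last event is
    # exponential; if no event occurred, only the birth (age 0) is recorded.
    gap = rng.exponential(1.0 / event_rate, n)
    last_event = np.maximum(exit_age - gap, 0.0)

    grid = np.array([20.0, 40.0, 60.0])
    truth = np.exp(-grid / mean_life)
    # Oracle: censor at the true emigration age.
    oracle = kaplan_meier(exit_age, died_in_parish, grid)
    # Naive: censor emigrants at the age of their last recorded life-event.
    naive_times = np.where(died_in_parish, life, last_event)
    naive = kaplan_meier(naive_times, died_in_parish, grid)
    return grid, truth, oracle, naive


if __name__ == "__main__":
    for t, s_true, s_oracle, s_naive in zip(*simulate()):
        print(f"age {t:4.0f}: true {s_true:.3f}  oracle KM {s_oracle:.3f}  naive KM {s_naive:.3f}")
```

Because every emigrant leaves the naive risk set no later than the true one, the naive survival estimate can never exceed the oracle estimate, which is the overestimation of the risk of death described above.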
Up to now historical demographers have used ad-hoc methods to estimate the life-length
distribution function. Blum (1987) and Ruggles (1992) propose two different methods for
estimating age at time of migration and then use standard techniques to find upper and
lower estimates for life-length.
Gill (1997) and Jonker and Van der Vaart (1999) describe a model to estimate life-length.
They assume that life can be described by three independent processes: life-length, emigration (censoring) and the times of the life-events. The distribution functions for life-length
and emigration are taken to be completely unknown and it is assumed that the life-events
follow a Poisson process with intensity rate λ. Gill (1997) assumes λ to be known, in contrast
to Jonker and Van der Vaart (1999), who assume it to be unknown. Under these assumptions
both prove that the non-parametric maximum likelihood estimator for life-length is
√n-consistent.
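To make this model concrete, the likelihood contribution of a single individual can be sketched as follows. This is a derivation from the three stated independence assumptions only and is not quoted from the cited papers. The notation is introduced here for illustration: F and G are the distribution functions of life-length and emigration age, f is a density of F (assumed to exist for readability), λ is the intensity of the Poisson process of life-events, and t_1 < ... < t_k are the ages at the recorded life-events.

```latex
% Death observed at age t (the person died before emigrating),
% with k life-events recorded before t:
L_{\text{death}} = \lambda^{k} e^{-\lambda t}\, f(t)\, \bigl(1 - G(t^{-})\bigr)

% Death not observed, last recorded life-event at age s = t_k
% (s = 0 if only the birth is recorded):
L_{\text{no death}} = \lambda^{k} \int_{[s,\infty)} e^{-\lambda c}\, \bigl(1 - F(c)\bigr)\, dG(c)
```

The second expression integrates over the unobserved emigration age c. The factor e^{-λc} accounts for the absence of further life-events between s and c, which is the information that ordinary censoring at s ignores.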
However, some of the assumptions are unsatisfactory and the models of Gill
and Jonker and Van der Vaart are not used in practice. The main assumption that fails is
the independence between the processes of migration and the life-events. If a woman marries
a man from another parish, the wedding usually takes place in the parish of the woman. She
often migrates directly after the wedding ceremony to the parish of her husband. Besides
this, the Poisson process for the life-events is too simple, since the life-events
are inhomogeneous in time and the times between births are dependent.
In my presentation I will describe a new model for estimating life-length from the 16th
to the 18th century. The distribution of life-length is still taken to be completely unknown and
life-length is assumed to be independent of migration, marriage and having children. Time of
migration is also assumed to be independent of having children but may depend on time of
marriage; with positive probability the times of migration and marriage coincide. For the distribution function of the time of marriage we take an appropriate parametric distribution, and the times
between the births of children are modeled as dependent on each other.
This model reflects reality much better, but mathematically it is more complicated.
It can be proved that the maximum likelihood estimators for the unknown distribution
functions (life-length and migration) and a finite-dimensional parameter are
consistent. These estimators have no explicit form; they are defined as the values that maximize
the log-likelihood. A numerical algorithm is used to find the estimates.
REFERENCES
Blum, S. (1987). Estimation de la mortalité locale des adultes à partir des fiches de familles.
Population 42, 39-56. (English language version: An estimate of local adult mortality of
family cards. Population 44, English selection no. 1 (1989), 39-56.)
Gill, R.D. (1997). Nonparametric estimation under censoring and passive registration. Statistica Neerlandica 51, 35-54.
Jonker, M.A. and Vaart, A.W. van der (1999). A semiparametric model for censored and
passively registered data. Preprint.
Jonker, M.A. (1999). Estimation of life-length based on censored and passively registered
data. Preprint.
Ruggles, S. (1992). Migration, marriage, and mortality: correcting sources of bias in English
family reconstitutions. Population Studies 46, 507-522.
Wrigley, E.A., Davies, R.S., Oeppen, J.E. and Schofield, R.S. (1997). English population
history from family reconstitution 1580-1837. Cambridge University Press, England.
RÉSUMÉ (translated from the French)
To estimate life-length in England from the sixteenth to the eighteenth century, historical
demographers use data from parish registers. These registers contain the dates of baptisms, marriages and burials. In many registers
the data are incomplete, owing to high mobility, which caused the
data concerning one person to be scattered over the registers of several parishes.
Since many records are lost or unusable, it is impossible to reconstruct
life-histories for all individuals. Consequently, the time of death is missing for about 40%
of the people. In these cases we observe the time of birth and, possibly, the times of a
sequence of "life-events": marriage, births and deaths of children, death of spouse,
remarriage. The main reason for a missing time of death is migration to
another parish. The time of this event, however, is never observed. In my
presentation I will describe a model for estimating life-lengths.