Numerous pieces of military,
medical, and socio-economic information covering the lifetime of the recruits
in the UA data set have been collected
from military, census, and medical and pension records. The collected
information has been divided into three main sets of variables, corresponding
to the three main data sets: Specific
information on the cleaning process applied to each variable can be found under
the variable names on these lists. (However, see below on Surgeon’s
Certificates Variables.)
Data cleaning was the final step
in the processing of the UA data. Cleaning took place at the Center for
Population Economics at the University
of Chicago. In general
terms, cleaning involved the standardization of values, including the
correction of spelling errors and standardization of variant or archaic spellings
as well as the standardization of punctuation, use of abbreviations, and word
order. Also important was the exclusion of values that did not fit the form or
logically possible content of a particular variable, such as when a number had
been input for a variable requiring an alphabetic value. Whenever possible,
errors of this sort have been corrected by consulting the original records, a
process that continues where needed today. Some variables (such as residence
and occupation) also have been subjected to a coding process, undertaken for
the purposes of clarifying and simplifying the range of data values. Any coding
of a variable is described in detail in the codebooks, which contain the most detailed information about the
variables in the data sets.
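
The kinds of checks described above can be illustrated with a minimal Python sketch. It is not the procedure actually used at the Center for Population Economics. The variant table, the function names, and the sample values are all hypothetical and chosen only for illustration. The sketch standardizes punctuation, spacing, and known spelling variants, and flags a value for review against the original records when a number appears where an alphabetic value is required.

```python
import re

# Hypothetical lookup of variant or archaic spellings -> standard form.
# A real table would be built per variable from the observed values.
SPELLING_VARIANTS = {
    "farmar": "farmer",
    "labourer": "laborer",
    "black smith": "blacksmith",
}


def standardize(value):
    """Normalize punctuation, spacing, and case, then map known variants."""
    text = re.sub(r"[.,;:]+", "", value)              # drop stray punctuation
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse spaces, lowercase
    return SPELLING_VARIANTS.get(text, text)


def clean_alphabetic(value):
    """Return (cleaned_value, needs_review) for an alphabetic variable.

    Empty values and values containing digits do not fit the expected form,
    so they are excluded (None) and flagged for a check against the
    original record.
    """
    text = standardize(value)
    if not text or any(ch.isdigit() for ch in text):
        return None, True
    return text, False


if __name__ == "__main__":
    for raw in [" Farmar. ", "Black  Smith", "1862"]:
        print(repr(raw), "->", clean_alphabetic(raw))
```

Flagging a value rather than silently dropping it mirrors the practice, described above, of correcting such errors by returning to the original records where possible.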
Note on Surgeon’s Certificates Variables
Data collection from the
surgeon’s certificates presented special
challenges. Variables in this data set are listed according to the “disease
screens” through which the data was input. The disease screens principally
consist of distinct input screens pertaining to various disease conditions
for which a recruit may have been examined. Most of the information in this
data set is coded according to “answer classes” and “modifiers” devised by Dr.
Louis Nguyen. (For more information, see the description under The Surgeon’s Certificates Variables.)