Introduction to the Union Army Study

The Union Army Data Set consists of approximately 39,340 white males mustered into the Union Army during the Civil War, for whom military, socio-economic, and medical information from several sources throughout their lifetimes has been collected.

The Union Army Data Set is composed of three principal data sets that are based on three different sources:

  • The "Military, Pension, and Medical Records" data set
    The largest data set is the "Military, Pension, and Medical Records" data set, which is derived from military-related documents housed in the National Archives in Washington, D.C. These include both war-time records and applications made by veterans for pension support.
  • The "Surgeons' Certificates" data set
    Associated with these pension applications are detailed physical examinations, completed by physicians, that certify the veterans' health and disability status. Information from these examinations is collected in the second major data set, known as the "Surgeons' Certificates" data set.
  • The "Census Records" data set
    The "Census Records" data set contains all information on the veterans that is available in the U.S. Federal Censuses of 1850, 1860, 1900, and 1910, though not all veterans could be linked successfully to the Census documents.

All individuals in the Union Army Data Set can be linked across data sets by means of a unique identification number.
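As a rough illustration of what linking by identification number looks like in practice, the sketch below joins three small in-memory tables on a shared key. The field name `recruit_id`, the other field names, and all values are hypothetical and are not taken from the actual files. The sketch also assumes that a veteran may have several examination records, and it reflects the fact that some veterans have no linked census record.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not drawn from the actual Union Army Data Set files.
military = [
    {"recruit_id": 1, "enlist_year": 1861},
    {"recruit_id": 2, "enlist_year": 1862},
    {"recruit_id": 3, "enlist_year": 1864},
]
surgeons = [
    {"recruit_id": 1, "exam_year": 1890},
    {"recruit_id": 1, "exam_year": 1895},
    {"recruit_id": 3, "exam_year": 1892},
]
census = [
    {"recruit_id": 1, "census_year": 1900},
    {"recruit_id": 2, "census_year": 1860},
]


def index_by_id(rows, key="recruit_id"):
    """Group rows by identification number (one id may have many rows)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def link_recruits(military, surgeons, census, key="recruit_id"):
    """Attach each recruit's examination and census rows to the military row.

    Recruits with no match in a data set get an empty list for it.
    """
    exams = index_by_id(surgeons, key)
    censuses = index_by_id(census, key)
    linked = {}
    for row in military:
        rid = row[key]
        linked[rid] = {
            "military": row,
            "exams": exams.get(rid, []),
            "census": censuses.get(rid, []),
        }
    return linked


linked = link_recruits(military, surgeons, census)
```

Here recruit 1 ends up with two examination rows and one census row, while recruit 3 has an empty census list, mirroring the case of a veteran who could not be linked to the Census documents.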

The following pages cover in more detail the various aspects of the Union Army Data Set:

  • The Background and Significance pages describe the motivation for compiling the Union Army Data Set and briefly illustrate the wide range of issues the data set has been and can be used to address, from factors affecting the aging process to economic determinants of labor force participation.
  • The Sample Design pages describe how the sample of approximately 39,340 observations included in the Union Army Data Set was drawn. These pages also include a statistical analysis that assesses how representative the sample of recruits is of the Union Army and of the entire white male military-age population at the time. In addition, a list of the companies covered in the sample is provided.
  • The Data Sources pages contain an in-depth description of the three data sets that together constitute the Union Army Data Set, namely the "Military, Pension, and Medical Records" data set, the "Surgeons' Certificates" data set, and the "Census Records" data set. These pages explain how the data were collected and recorded at the time of the Civil War, and how they were preserved over a period of more than one hundred years.
  • The Variables pages list the information (variables) available for each recruit (observation) in the Union Army Data Set. These pages describe how the information on the recruits found in the original records was standardized and coded into machine-readable, numeric data.
  • The Download Data icon provides a link to the Center's interactive platform, which allows researchers to access the Union Army Data Set (under construction).

The data in the Union Army Data Set constitute a portion of the historical data collected by the project Early Indicators of Later Work Levels, Disease, and Death (abbreviated EI), sponsored by the National Institutes of Health and the National Science Foundation (Grant Numbers NIH P01 AG10120 and NSF SBR 9114981). The data were collected under the direction of the Department of Economics at Brigham Young University (BYU) and processed by the Center for Population Economics (CPE) at the University of Chicago. The goal of the project is to construct data sets suitable for longitudinal studies of factors affecting the aging process.