Union Army Study - Census Records




Introduction to the Census Records

The Early Indicators project attempted to find Census records whenever there was enough information from other sources indicating where to initiate a search. Collection began by extracting informative data from the Military, Pension, and Medical Records data set. The 1900 Census was searched first because it gave the year of immigration, if applicable. If an immigrant veteran was found in 1900, the immigration information could be used to determine whether or not it was possible to search for the veteran in 1850 or 1860. After the 1850 and 1860 Census collections were completed for an entire company, the data were sent to the Family History Library in Salt Lake City for completion of the 1910 Census collection because the BYU library did not have complete soundex films for 1910.

The search for veterans in the 1900 and 1910 Censuses was usually more successful than the search in 1850 and 1860, for two reasons. First, because Census households were indexed by head of household, veterans' names were much easier to find in 1900 or 1910, when they were likely the head of household, than in 1850 or 1860, when they were usually children or adolescents. Second, data collectors searched the 1900 and 1910 Censuses only for veterans who received Civil War pensions and who were assumed, based on information in the pension record (PEN), to be alive in these later Census years. Without the information in the PEN, too little was known about a recruit to make a positive identification in the later Census years; often there was not even enough information to begin searching, since the 40 to 50 years that had passed were probably his years of greatest mobility and change. Pension records often contained the veteran's wife's name, marriage date, children's names and birth dates, and residence information. This kind of information aided significantly in accurately identifying individuals in the later Censuses. For the 1850 and 1860 searches, however, researchers looked for everyone with sufficient information from the Military, Pension, and Medical Records data set indicating where to begin a search, even if the only information was the place of enlistment.

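The paragraphs below give further information on the following topics:

  1. Searching the 1850 and 1860 Censuses
  2. Searching the 1900 and 1910 Censuses
  3. The Walker Collection Data
  4. Census Information
  5. The Quality Code System
  6. Linkage Rates for Census Records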

Searching the 1850 and 1860 Censuses

For the collection of the 1850 and 1860 Censuses, a printed index listing only heads of households (alphabetically by surname) was used for each state. In some of the more populous states the indexes were divided by region within the state. Researchers searched for every recruit with sufficient information from the Military, Pension, and Medical Records data set. In general, the county of birth and surrounding counties were searched in 1850, and the county of enlistment and surrounding counties were searched in 1860. Many recruits were searched for in both places in both years.

Locating recruits in the 1850 and 1860 Censuses was challenging not only because of the difficulties listed below (Searching the 1900 and 1910 Censuses), but also because so few recruits were heads of households. Therefore, it was necessary to know additional information about the recruit's family. Knowledge of the father's name was particularly important, since the Census records were indexed by the name of the head of household. Unfortunately, for the majority of recruits, the father's name was unknown. As a result, it was necessary to search all households in a county or town with the same surname. For example, a researcher searching for a recruit in 1850 named Edwin Church in Philadelphia County, Pennsylvania, father's name unknown, would have found thirteen listings with the surname Church. All thirteen would have had to be checked for an Edwin of the approximate age. If the name being searched for was more common, there would have been even more listings to consider. With some names there were too many possibilities to search in a reasonable amount of time. Surnames such as Smith, Jones, Anderson, Baker, Cook, or Miller had pages of listings in the index, and when the father's given name was not known, these men could not be searched for because of time constraints.

If the recruit was living in the county of his birth or if there was residence information in the pension records, then it was significantly more likely that he would have been located in the Census. Even when this information was present, however, it was sometimes difficult to make successful linkages because there were often numerous variant spellings in the records. Researchers were careful to check the index for other possible spellings of a surname and all known variations found in the Military, Pension, and Medical Records data set.
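
The surname-by-surname search described in this section can be sketched in code. The example below is only an illustration of the procedure, not the project's actual tooling: the record layout, the function names, the age tolerance of two years, and the cutoff of 50 listings are all assumptions introduced here, since the documentation speaks only of an "approximate age" and a "reasonable amount of time."

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    given_name: str
    age: int


@dataclass
class Household:
    # Only the head of household appears in the printed index.
    head_surname: str
    head_given_name: str
    county: str
    members: List[Person] = field(default_factory=list)


def candidate_households(index: List[Household], surname: str, county: str,
                         max_listings: int = 50) -> Optional[List[Household]]:
    """Return index listings for a surname in one county.

    Returns None when there are more listings than max_listings
    (a hypothetical cutoff standing in for the time constraint).
    """
    hits = [h for h in index
            if h.head_surname.lower() == surname.lower() and h.county == county]
    if len(hits) > max_listings:
        return None
    return hits


def find_recruit(households: List[Household], given_name: str,
                 expected_age: int, tolerance: int = 2) -> List[Household]:
    """Keep households that contain a member with the recruit's given name
    and an age within `tolerance` years of the expected age (assumed value)."""
    return [h for h in households
            if any(p.given_name.lower() == given_name.lower()
                   and abs(p.age - expected_age) <= tolerance
                   for p in h.members)]
```

When the father's name is unknown, every household returned by `candidate_households` has to be inspected, which is why common surnames quickly exceed any practical cutoff.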


Searching the 1900 and 1910 Censuses

The Census soundex indexing system was used to find pensioners in the 1900 and 1910 Censuses. Under the soundex system, surnames were converted to a code consisting of the first letter of the surname and a three-digit suffix derived from the consonants that followed. If there were not enough consonants in a name to produce 3 digits, 0's were added to make 3 digits. If there were more than 3 consonants, only the first 3 were used. All vowels, as well as W, H, and Y, were dropped in the soundexing system, and doubled letters counted as single letters. (For example, the surname Bassett would have been converted to a soundex code of B-230.) Within a specific soundex surname code, individuals were listed alphabetically by given name.
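
The coding rules can be made concrete with a short sketch. The function below follows the standard American Soundex scheme; the consonant-to-digit groups, and the rule that H and W do not separate two consonants of the same group, come from that standard rather than from the description above, and the function name is hypothetical. It is not necessarily identical to the procedure used by the original Census indexers.

```python
# Consonant groups of the standard American Soundex scheme.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(surname: str) -> str:
    """Return a code of the form 'X-000' for a surname."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        raise ValueError("surname contains no letters")
    first = letters[0]
    digits = []
    # Code of the most recent coded letter; used to collapse repeats.
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif c not in "HW":
            # A vowel (or Y) separates repeated codes; H and W do not.
            prev = ""
    # Keep the first three digits and pad with zeros if there are fewer.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


print(soundex("Bassett") == soundex("Basset"))  # True: doubled letters count once
print(soundex("Smith"), soundex("Smyth"))       # S-530 S-530
```

Because variant spellings such as Smith and Smyth share one code, a researcher could find a veteran despite many spelling differences, although not when the first letter itself was wrong.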

The soundex indexing system was extremely valuable for finding individuals in the Census records. Unfortunately, however, even with this tool there were problems with the Census records that made it difficult to identify individuals. Some difficulties included the following:

  1. Spelling Variations. Different spellings of veterans' names on pension and Census records were common, making identification difficult, particularly if the first letter was incorrect.
  2. Multiple given (first) names. Within a specific code individuals were searched for by given name. The researcher had to know the given name used in the Census.
  3. Incorrect coding due to illegible records. Some Census records were extremely difficult to read because of poor handwriting, poor filming, or damage to the original record. Occasionally a person found in the soundex could not be found on the microfilm because of these problems. Also, sometimes the Census film was too dark or too light to read, and only partial information could be obtained.
  4. Lack of location/residence information. State of residence had to be known in order to locate the recruit in the records.
  5. Missing head of household. If a recruit was living with a relative of the same surname and was not the head of household, a researcher had to know the name of the head of household since only the head of household was soundexed by his or her given name within the soundex code.

Although the methods for searching the 1900 and 1910 Censuses were similar, only 21 states in the 1910 Census had been soundexed. This made it considerably harder to find veterans in the unsoundexed states. In some places they could not be searched for at all because of time constraints. In the unsoundexed states researchers used city directories, maps, street number indexes, and other library resources. If information from the pension records indicated that a veteran had lived in a town for several years before and after the Census year, that town was searched completely if it could be done in a reasonable amount of time. Using alternate methods of searching added to the time required to find a person, and so each case was analyzed to determine whether or not the time should be spent.


The Walker Collection Data

The Walker data were collected during the summers of 1980 and 1981 by Kent and Mini (Marion) Walker. From the muster rolls in Ohio and New York, the Walkers collected file number, record number, first name, middle initial, last name, age, height, place of enlistment, length of enlistment, occupation, birthplace, skin, hair, eyes, comments regarding wounds, discharge, death date and place, cause of death, promotions, and battles. They then searched the 1850 and 1860 Censuses for those recruits, using the age, place of enlistment, birthplace, and occupation as verifiers. During the collection for the Early Indicators project, these data were compared against information collected from the pension (PEN) and military (MSR) records to determine correct matches. If a recruit was not considered a verifiable match, he was deleted from the database and searched for again using current collection methods. Those recruits considered to be verifiable finds in the Census were not searched for again, but no attempt was made to assign quality codes (see below, The Quality Code System). The Walker data set was then merged into the current Census Records data set. A binary variable in the Early Indicators Census Records data set, walker, indicates whether or not the observation originally came from the Walker collection.


Census Information

The U.S. Census has changed many times since its inception. These changes are reflected in the variables that were collected in each Census year. Below is a list of the variables, grouped by topic, and the Census years that contain each variable. The Census documents were incomplete in some cases, so not all of the information below was available for every recruit. A complete layout of variables is given in Section III, which also contains the number of non-missing values for each variable. Detailed variable descriptions are given in Appendix A.

  1. Identification of Individuals
    1. Identification Number (1850, 1860, 1900, 1910)
    2. Name (1850, 1860, 1900, 1910)
    3. Relationship to Household Head (1900, 1910)

  2. Demographic and Socio-Economic Variables
    1. Age
      • At Time of Census (1850, 1860)
      • At Last Birthday (1900, 1910)
    2. Year, Month, and Place of Birth
      • Birth Year (1900)
      • Birth Month (1900)
      • Birthplace (1850, 1860, 1900, 1910)
    3. Children
      • Number of Living Children (1900, 1910)
      • Number of Children (1900, 1910)
    4. Color of Skin (1850, 1860, 1900, 1910)
    5. Disability
      • Deaf, Dumb, Blind, or Insane (1850, 1860)
      • Deaf and Dumb (1910)
      • Blind in Both Eyes (1910)
    6. Education
      • Attended School Within the Last Year (1850, 1860)
      • Number of Months in School Since 09/01/1899 (1900)
      • School Attended Since 09/01/1909 (1910)
    7. Employment Status
      • Number of Months Unemployed Within Year (1900)
      • Unemployed on 4/15/1910 (1910)
      • Employment Status (Worker or Employer) (1910)
      • Number of Weeks Unemployed in 1909 (1910)
    8. Gender (1850, 1860, 1900, 1910)
    9. Immigration / Naturalization
      • Number of Years in U.S. (1900)
      • Year of Immigration to the U.S. (1900, 1910)
      • Naturalization Status (1900, 1910)
    10. Language
      • Speaks English (1900)
      • English or Other Language (1910)
    11. Literacy
      • Household Member over 20 is Illiterate (1850, 1860)
      • Reads (1900, 1910)
      • Writes (1900, 1910)
    12. Marital Status
      • Married Within the Last Year (1850, 1860)
      • Marital Status (1900, 1910)
      • Number of Years Married (1900, 1910)
    13. Occupation
      • Occupation, Trade, or Other Work (1850, 1860, 1900, 1910)
      • Nature of Industry or Business (1910)
      • Occupation Code (1850, 1860, 1900, 1910)
    14. Parents' Birthplace
      • Father's Birthplace (1900, 1910)
      • Mother's Birthplace (1900, 1910)
    15. Property / Home Ownership
      • Owns or Rents Home (1900, 1910)
      • Owns Property in Question Free or Mortgaged (1900, 1910)
      • Farm or House (1900, 1910)
    16. Veteran
      • Veteran of Union or Confederate Army (1910)
      • Veteran of Union or Confederate Navy (1910)
    17. Wealth
      • Real Estate Owned (1850, 1860)
      • Personal Property (1860)

  3. Quality Codes and Remarks
    1. Quality of Link Code (1850, 1860, 1900, 1910)
    2. Remarks about Individuals
    3. Inputter Remarks

  4. Enumeration Date and Place
    1. Date (1850, 1860, 1900, 1910)
    2. House Number on Street (1900, 1910)
    3. Institution (1900, 1910)
    4. Street Address (1900, 1910)
    5. Post Office District (1860)
    6. Enumeration District (1900, 1910)
    7. Supervisor's District (1900, 1910)
    8. Political Ward (1900, 1910)
    9. Town
      • Name of Town (1850, 1860)
      • Name of Township (1900, 1910)
      • Name of Subdivision (1900)
      • Name of Incorporated City, Town, or Village (1910)
    10. County (1850, 1860, 1900, 1910)
    11. State (1850, 1860, 1900, 1910)

  5. Census Record Information

    There are also variables in the current data submission that reference the original Census record used for each recruit. The variables are given below:

    1. Family Number (1850, 1860, 1900, 1910)
    2. Dwelling Number (1850, 1860, 1900, 1910)
    3. Farm Schedule Number (1900, 1910)
    4. Library Call Number for Film (1850, 1860, 1900, 1910)
    5. Page Number of Census Manuscript (1850, 1860, 1900, 1910)
    6. Sheet Number of Enumeration (1900, 1910)

The Quality Code System

A quality code of 1, 2, 3, or 4 was assigned each time a veteran was successfully linked to one of the Census years, except for veterans previously found in the Walker collection, for whom codes were not assigned (see above, The Walker Collection Data). The quality code indicates the reliability of the linkage or, in other words, the extent to which information from PEN and MSR verified the information in the Census. A quality code of 1 indicates the strongest match and 4 the weakest. Although effort was made to make the quality codes specific and objective, some subjectivity was involved in each assignment, particularly in codes 3 and 4. In all cases an individual who was considered found in the Census had to have a name and an approximate age that matched those in the recruit's PEN and MSR. However, the name may have been any one of several different names or spellings that appeared in the PEN and MSR.

Quality Code 1:
In addition to agreement of name and age, in order to earn a quality code of 1 a person found in the Census had to have two or more of the following identifying pieces of information:
  1. Place of birth (state or country)
  2. Father's name
  3. Mother's name
  4. Names of siblings
  5. Names of children
  6. Name of spouse
  7. Specific address

In 1850 and 1860, a quality code of 1 was justified by the father's and mother's name or siblings' names, as well as birthplace. In the 1850 Census, 1 was rarely assigned because the parents' names were seldom known. However, by 1860 more of the men in the sample were married and these were often a code 1 because the name of the spouse was known. In 1900 and 1910, names of a wife and children, or a specific address with house number and street justified a code of 1. The 1900 Census asked for the number of years married. When this number corresponded to the actual marriage date found in the pension records, it was used as an additional piece of identifying information.

Quality Code 2:
A quality code of 2 was given when specific names of family members were not known but there was corroborating information indicating a strong link with the recruit. In addition to agreement of recruit's name and age, at least two of the following criteria had to exist to justify a quality code of 2:
  1. Living in the expected place to be found. This could be a birthplace, enlistment place, or marriage place at a date close to the Census year.
  2. Skilled occupation. A match occurred when a skilled occupation was found that matched the occupation found in the PEN and MSR. This criterion was used frequently in geographic areas where most of the men on the Census were farmers. It was not as reliable in urban areas.
  3. Surname was unique. If in the county or township where the recruit was expected to be found there were no other families of the same surname, this criterion could be met. This could be determined from the index.
  4. A very uncommon name. This could be either a surname or a given name.
  5. Name and middle initial matched Pension information. If a person was found where expected and his age and birthplace matched the information found in the PEN and MSR, a name with middle initial could have justified a quality code of 2.
  6. Living next door or in close proximity to other recruits in the same company. Since companies were often formed by volunteers from the same town, this criterion could have justified a quality code of 2.

Quality Code 3:
A quality code of 3 was given when a person was living where expected and had matching information for name, age, and birthplace. A code of 3 was considered to be a good link and very often the person on the Census was thought by the researcher to be a "sure find," but lacking names of other family members, a higher quality code could not be given. Most veterans found in 1850 and 1860 are code 3 because they were usually of minor age with no parent, sibling, or spouse information.

Quality Code 4:
A quality code of 4 indicates that a possible link exists because the name and age matched the information from the PEN and MSR, but there was not enough information for the researcher to justify a higher quality code. This occurred frequently for recruits from large cities. The name, age, and birthplace may have matched the PEN and MSR information, but multiple possibilities caused the link to be uncertain. Immigrants without parental information were especially difficult to link in 1850 and 1860 when the date of entry to the United States was unknown. This meant that the only location available to the researcher was the place of enlistment, which may not have been the permanent place of residence.
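
The four definitions above can be read as a rough decision rule, sketched below. This is only an approximation under stated assumptions: the field names and the function are hypothetical, the counts are assumed to have been tallied by a researcher from the two lists above, and the sketch cannot capture the subjective judgment that went into actual assignments, particularly for codes 3 and 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkEvidence:
    # Hypothetical summary of what a researcher verified for one candidate.
    name_and_age_match: bool          # required for any link
    code1_identifiers: int = 0        # matched items from the Quality Code 1 list
    code2_criteria: int = 0           # satisfied items from the Quality Code 2 list
    birthplace_matches: bool = False
    living_where_expected: bool = False


def quality_code(e: LinkEvidence) -> Optional[int]:
    """Return 1 (strongest) to 4 (weakest), or None if there is no link."""
    if not e.name_and_age_match:
        return None  # not considered found in the Census
    if e.code1_identifiers >= 2:
        return 1
    if e.code2_criteria >= 2:
        return 2
    if e.living_where_expected and e.birthplace_matches:
        return 3
    return 4
```

For instance, a candidate whose name and age match and whose spouse and children are named in the pension record would receive a 1 under this rule, while a candidate in a large city with only name and age in agreement would receive a 4.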

Summary

Quality codes have been used with the Census data in an attempt to indicate the accuracy of linkage. The codes were designed to be as concise and objective as possible. However, there are many subtleties of Census research that cannot be codified. The codes should, nonetheless, prove to be valuable guides to data users.


Linkage Rates for Census Records

Some sample selection bias may arise in the use of the data in this ICPSR submission due to linkage failures: the failure to find a given individual from the main sample in the Census records. As noted earlier, XX% of veterans are linked to at least one of the four Census years. The direction and magnitude of the selection bias will depend on how closely the variables in the linked data are correlated with the factors that determine linkage to the Census manuscripts. Factors that are known to influence linkage to the Census data include date of death, migration from one state to another or within a state, movement into or out of different households, and socio-economic status.

Users of the Census data should also take note of the differences in variables across the different Census years. Some variables, such as birthplace (recbpl) or occupation (recocc), can be traced across all four Census years, while others, such as birth month (recbmo) or blindness (recbnd), occur only in a particular Census year (in this case 1900 and 1910, respectively). Furthermore, it is possible that the quality of data differs across locations, years, and census takers.
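
A simple way to keep track of these differences is to record, for each variable, the Census years in which it was collected. The sketch below does this for the four variables named in the preceding paragraph, with years taken from the variable list under Census Information; the dictionary and function names are hypothetical, and a real analysis would extend the mapping from the full variable layout.

```python
CENSUS_YEARS = (1850, 1860, 1900, 1910)

# Years in which each variable was collected (illustrative subset only).
AVAILABILITY = {
    "recbpl": {1850, 1860, 1900, 1910},  # birthplace
    "recocc": {1850, 1860, 1900, 1910},  # occupation
    "recbmo": {1900},                    # birth month
    "recbnd": {1910},                    # blind in both eyes
}


def traceable_across(years, availability=AVAILABILITY):
    """Return the variables collected in every one of the given years."""
    wanted = set(years)
    return sorted(v for v, collected in availability.items()
                  if wanted <= collected)


print(traceable_across(CENSUS_YEARS))  # ['recbpl', 'recocc']
print(traceable_across([1900]))        # ['recbmo', 'recbpl', 'recocc']
```

Checking availability before pooling Census years helps avoid treating a variable that was never collected in a given year as missing data for the recruit.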