
Microdata vs. Aggregated Data |
One of the best ways to understand microdata is to compare it with aggregated data.
See an example microdata (raw data) file on the website of the Data and Program Library Service at the University of Wisconsin-Madison.
See a microdata file that has been read into SPSS.
|
Codebooks and Microdata |
Raw data in microdata files are often stored in ASCII format and compressed. The numbers in a microdata set may or may not be delimited by spaces, commas, line breaks, or tabs, so knowing which number corresponds to which variable is essential. A codebook or data dictionary is therefore a must for downloading and using a microdata file. Without a codebook or data dictionary to specify the exact location (columns and rows) of each variable and its values, a data file would be just a collection of meaningless numbers. These are some of the resources that discuss or demonstrate the importance of codebooks for acquiring and using microdata or other machine-readable statistical data sets:
An example demonstrating how a codebook gives meaning to a data set, in a PowerPoint presentation entitled "Statistical Literacy and the Role of Data Services: The Social Sciences", by Elissa Cochran, Ann Fiegen, Chris Kollen, and Cathy Larson.
A FAQ answer provided by Statistical Services of the University of Texas at Austin.
An article that describes the information contained in the online codebooks that UCLA's Advanced Technology Service has made available to help users use the Online Census Data Files. |
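To make the role of a codebook concrete, here is a minimal Python sketch of reading undelimited, fixed-width records. Everything in it is an assumption made for illustration: the variable names, the column positions, the value labels, and the two sample records are invented and do not come from any real PUMS, PUMF, or SAR codebook.

```python
# Hypothetical codebook: variable name -> (start column, end column).
# Columns are 1-based and inclusive, as printed codebooks usually give them.
CODEBOOK = {
    "serial": (1, 5),
    "age": (6, 7),
    "sex": (8, 8),
    "income": (9, 14),
}

# Hypothetical value labels for one coded variable.
SEX_LABELS = {"1": "male", "2": "female"}


def parse_record(line, codebook):
    """Slice one fixed-width record into named fields using codebook columns."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in codebook.items()
    }


# Two made-up records with no delimiters between the values.
raw_lines = [
    "00001342032500",
    "00002271004100",
]

for line in raw_lines:
    record = parse_record(line, CODEBOOK)
    # Convert numeric fields and decode the coded field.
    record["age"] = int(record["age"])
    record["income"] = int(record["income"])
    record["sex"] = SEX_LABELS.get(record["sex"], "unknown")
    print(record)
```

Without the column positions in `CODEBOOK`, the string `00001342032500` cannot be split reliably. With them, it becomes a serial number, an age, a coded sex value, and an income figure. A real file would take its column positions and value labels from the codebook distributed with that file.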
PUMS, IPUMS, PUMF, and SAR |
It was the advent of computers that made it possible to process, store, and distribute anonymized electronic data. The USA led the field in the release of microdata. The first microdata files were released from the 1960 U.S. Census, although retrospective microdata files have since been extracted for earlier years. The U.S. microdata was first called the Public Use Sample (PUS) and was renamed the Public Use Microdata Sample (PUMS) in 1980. Canada first released public use microdata files (PUMFs) from the 1971 Census and has continued this policy for every quinquennial census since then. Australia first produced microdata files for its 1981 Census. In the U.K., the practice of releasing samples of anonymized records was accepted by the Census Offices in 1989 and was heavily influenced by the U.S. and Canadian experiences.
Public Use Microdata Samples (PUMS) in the USA
Three different files
were released from the 1990 Census: 5 percent and 1 percent samples of
housing units, and a 3 percent sample of the elderly. The 5 percent and 1
percent samples have the same content and are structured so that
the relationship between individuals in the same household is retained.
The difference between these two files is in the geographic coverage of
the public use microdata area (PUMA):
The Social History
Research Laboratory at the University of Minnesota has also proposed to
adapt this system and internationalize IPUMS by incorporating census microdata
samples from the highest-quality censuses with the longest time spans in
other countries around the world. IPUMS-International (IPUMSi) proposes
to integrate individual level census samples for a large number of countries
into a single databank. The plan is, first, to standardize census microdata
for selected countries from the 2000 round of censuses to the earliest
available date (usually the 1960s or 1970s), and then, to distribute the
integrated databank via the WWW, CD-ROM or other means suitable for the
delivery of massive datasets.
Public Use Microdata Files (PUMF) in Canada
PUMFs (formerly known as Public Use Sample Tapes, or PUSTs) contain samples of anonymized responses to the long-form (2B) census questionnaire in the respective censuses. Three files are available: an Individual file, a Household and Housing file, and a Family file. Microdata files provide access to unaggregated data. However, to ensure the anonymity of respondents, geographic identifiers are in most cases restricted to provinces/territories and large metropolitan areas. The sample size for the original set of microdata files from the 1971 Census was 1 per cent. This increased to 2 per cent in the 1980s for the Individual file and to 3 per cent for all three files for the 1991 Census, but fell to 2.8 per cent for the PUMFs from the 1996 Census.
Samples of Anonymised Records (SARs) in the U.K.
In the U.K., samples of microdata were produced for the first time following the 1991 Census. Two SARs have been extracted from the 1991 Census for the U.K.:
Similar files are planned for the 2001 Census of Britain with the sample size increasing to 3 per cent. Requests have been made for SARs to be released from the British censuses prior to the 1991 Census.
|
Is Microdata What You Are Looking For? |
If you are looking for
statistical data sets, and you feel your data and research have to meet
one or more of the following criteria, then microdata or PUMS (and its
international counterparts) might fit your data needs.
|