COMPUSTAT Information

The COMPUSTAT database from Standard & Poor's is one of the most extensive databases of business financial data available. It provides annual and quarterly income statement and balance sheet data on over 10,000 firms, with data going back to the 1950s.

All of the files are stored on Unix in the /usr/finance/compustat directory.

This page documents a variety of things, from installing COMPUSTAT onto Unix from tape to running jobs and accessing the data. The file cutoff date for the 1999 files is 7/20/1999.

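The annual and quarterly files are each split into 3 categories, listed here by the names used in the file tables further down this page:

PST
FCOTC
Research (MRGED PST&FCOTC)

If you want all firms that meet certain criteria without survival bias, you need to access all three files for a given time period.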

The annual files are also each split into three 20-year time periods. For the 1998 files, the backdata covers 1959-1978 and the wayback data covers 1950-1969; the current files hold the most recent years.

Notice that the backdata and wayback data overlap. This is because COMPUSTAT wants to send 20 years' worth of data on every annual tape.

The quarterly files are each split into four 12-year time periods, for a total of 48 quarters of data on each tape. For the 1999 files, the backdata covers 1977-1988, the wayback data covers 1966-1977, and the way wayback data covers 1962-1973; the current files hold the most recent quarters.

Again, notice that in the quarterly data the way wayback data and the wayback data overlap. I know the year overlaps don't look right for the current versus backdata. You need to check the dates on the quarterly files as you extract data from them.
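To pull every firm for one period without survival bias, the PST, FCOTC, and merged Research files for that period have to be read together. The Python sketch below shows one way to do that for the annual files. It assumes the .ascii file names from the annual file table on this page and the /usr/finance/compustat directory; the function names and the period labels ("current", "backdata", "wayback") are made up for the example, and the record layout itself is not parsed.

```python
from pathlib import Path

# File name stems and period suffixes as they appear in the annual file table.
CATEGORY_PREFIXES = ("pst_ann", "fcotc_ann", "mrged_ann_res")
PERIOD_SUFFIXES = {"current": "", "backdata": "_back", "wayback": "_wback"}


def annual_files(period, root="/usr/finance/compustat"):
    """Return the three category files (PST, FCOTC, Research) for one period."""
    suffix = PERIOD_SUFFIXES[period]
    return [Path(root) / f"{prefix}{suffix}.ascii" for prefix in CATEGORY_PREFIXES]


def iter_records(paths):
    """Yield (file name, record) pairs from every file, one record per line."""
    for path in paths:
        with open(path, "r", encoding="ascii", errors="replace") as handle:
            for line in handle:
                yield path.name, line.rstrip("\n")


# List the files that together cover the backdata period.
for path in annual_files("backdata"):
    print(path)
```

Keeping the file name next to each record makes it possible to tell later which category a firm came from.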


Fortran Access to COMPUSTAT Data

I am assuming that the COMPUSTAT data has already been loaded onto the hard drive.


Installation

The COMPUSTAT tapes come zipped up via ftp. The format is really the same as that of the tapes that were once shipped in ASCII format on 2.5 GB 8mm tapes. These files have no end-of-block or end-of-record delimiters, so you have to insert them after the files are unzipped. The easiest way is to copy the unzipped files to ASCII files using the Unix dd command. Below are examples of the dd command for copying the annual, quarterly, PDE, and BIF files.

Annual Files

For the PST Annual, FCOTC Annual, Annual Research, and Annual Backdata files, use the following dd statement. The block size and record size are both 8332.

dd if=name_of_flat_input_file of=name_of_output_file ibs=8332 cbs=8332 conv=unblock

Be sure to name the output file something easily identifiable. We put these in /usr/finance/compustat/ and give them names such as:
fcotc_way_backdata_1950-1969.dat
mrged_annual_research_backdata_1959-1978.dat
pst_annual_current.dat
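What conv=unblock does to these files can be sketched in a few lines of Python, which may help when checking a converted file or when dd is not at hand. This illustrates the conversion as POSIX describes it (read cbs bytes, drop trailing spaces, append a newline) and is not a replacement for the dd commands on this page. The function name unblock and the 8-byte demo records are made up for the example.

```python
import io


def unblock(src, dst, record_len):
    """Copy fixed-length records from src to dst, one record per line.

    Trailing spaces are removed from each record and a newline is
    appended, which is what dd does with conv=unblock and cbs=record_len.
    Returns the number of records written.
    """
    count = 0
    while True:
        record = src.read(record_len)
        if not record:
            break
        dst.write(record.rstrip(b" ") + b"\n")
        count += 1
    return count


# Demo on two made-up 8-byte records; the annual files would use 8332.
flat = io.BytesIO(b"AAAA    BBBBBB  ")
lines = io.BytesIO()
print(unblock(flat, lines, 8), lines.getvalue())
```

For a real file, src and dst would be files opened in binary mode ("rb" and "wb"). Because trailing spaces are dropped, lines in the output can be shorter than the record size.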

Quarterly Files

For the PST Quarterly, FCOTC Quarterly, Quarterly Research, and Quarterly Backdata files, use the following dd statement. The block size is 27552 and the record size is 9184.

dd if=name_of_flat_input_file of=name_of_output_file ibs=27552 cbs=9184 conv=unblock

(I think the quarterly block size is 27552, which is 3 records of 9184. They changed it for the 1997 files.)

We also put these in the /usr/finance/compustat directory and give them names such as:
pst_qtrly_current.dat

Price, Dividends, and Earnings (PDE) Files

US PDE file
dd if=name_of_flat_input_file of=name_of_output_file ibs=19632 cbs=3272 conv=unblock
We call it ......

Canadian PDE file
dd if=name_of_flat_input_file of=name_of_output_file ibs=20928 cbs=3488 conv=unblock
We call it ......

Business Information Files (BIF)

Reference File of SIC Codes
dd if=name_of_flat_input_file of=name_of_output_file ibs=7200 cbs=80 conv=unblock
we call it.......

SIC File
dd if=name_of_flat_input_file of=name_of_output_file ibs=4800 cbs=240 conv=unblock
We call it bif_sic.dat

Industry Segment File
dd if=name_of_flat_input_file of=name_of_output_file ibs=7740 cbs=774 conv=unblock
We call it bif_industry_segment.dat

Geographic Segment File
dd if=name_of_flat_input_file of=name_of_output_file ibs=8040 cbs=804 conv=unblock
We call it bif_geographic_segment.dat

Fortran access to the COMPUSTAT data is pretty straightforward once it is loaded onto disk. Note, however, that there is a BUG IN THE SUN FORTRAN F90 VERSION 1.2 COMPILER that makes things interesting if you don't know about it! The insidious little rascal will not allow you to read a record that is longer than 267 characters, and Sun has done very little to document its existence or its fix. You will know that you have the bug if you get a lib-1217 runtime error when you try to access any file other than the SIC file or the Reference File of SIC Codes. There are two fixes.

1.  Don't use f90 to compile. Use f77 instead. This works fine, except that you are probably using f90 to access the CRSP files, and those bozos at University of Chicago will only support f90 version 1.1 or later (and begrudgingly at that). They won't provide the source code for the runtime libraries, which would let you use whichever compiler you want. This leaves you in the situation of having to use one compiler for COMPUSTAT programs and another compiler for CRSP programs. Kind of a pain in the pattootie.

2.  This is lots better. Add a recl=nnnn specifier to the open statement, where nnnn is the correct record length for the file. Even though the compiler is supposed to ignore this specifier in a sequential-access open, it needs it.
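The 267-character limit lines up with the record sizes used in the dd commands on this page: only the two SIC files have records short enough to escape the error. The Python sketch below works that out and prints the recl value each file type would need. It assumes the cbs values from this page are the record lengths to pass to recl; the dictionary keys and function name are labels invented for the example.

```python
# Record lengths taken from the cbs values in the dd commands on this page.
RECORD_LENGTHS = {
    "annual": 8332,
    "quarterly": 9184,
    "us_pde": 3272,
    "canadian_pde": 3488,
    "reference_sic": 80,
    "sic": 240,
    "industry_segment": 774,
    "geographic_segment": 804,
}

# Longest record the f90 1.2 compiler reads without a recl specifier.
F90_LIMIT = 267


def needs_recl(record_length, limit=F90_LIMIT):
    """Return True when a record is too long for the compiler's default."""
    return record_length > limit


for name, length in RECORD_LENGTHS.items():
    note = f"add recl={length}" if needs_recl(length) else "fine without recl"
    print(f"{name:20s} {length:5d}  {note}")
```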

Compustat Tapes, Files, and DCB Information

Annual Industrial Files (Balance Sheet, Income Statement)

Note: I will rewrite these to binary unformatted files. Then the file suffix will be .bin instead of .ascii.

Tape Name | Filename on Unix | DCB information
PST ANNUAL Current | pst_ann.ascii | ibs=8332 cbs=8332 conv=unblock
FCOTC ANNUAL Current | fcotc_ann.ascii | ibs=8332 cbs=8332 conv=unblock
MRGED (PST&FCOTC) ANNUAL RESEARCH Current | mrged_ann_res.ascii | ibs=8332 cbs=8332 conv=unblock
CDN CDN$ ANNUAL Current | cdn_ann.ascii | ibs=8332 cbs=8332 conv=unblock
PST ANNUAL Backdata 1959-1978 | pst_ann_back.ascii | ibs=8332 cbs=8332 conv=unblock
FCOTC ANNUAL Backdata 1959-1978 | fcotc_ann_back.ascii | ibs=8332 cbs=8332 conv=unblock
MRGED (PST&FCOTC) ANNUAL RESEARCH Backdata 1959-1978 | mrged_ann_res_back.ascii | ibs=8332 cbs=8332 conv=unblock
PST ANNUAL Wayback 1950-1969 | pst_ann_wback.ascii | ibs=8332 cbs=8332 conv=unblock
FCOTC ANNUAL Wayback 1950-1969 | fcotc_ann_wback.ascii | ibs=8332 cbs=8332 conv=unblock
MRGED (PST&FCOTC) ANNUAL RESEARCH Wayback 1950-1969 | mrged_ann_res_wback.ascii | ibs=8332 cbs=8332 conv=unblock

 

Quarterly Industrial Files (Balance Sheet, Income Statement)

Note: I will rewrite these to binary unformatted files. Then the file suffix will be .bin instead of .ascii.

Tape Name | Filename on Unix | DCB information
PST QTRLY CURRENT | pst_qtr.ascii | ibs=27552 cbs=9184 conv=unblock
FCOTC QTRLY CURRENT | fcotc_qtr.ascii | ibs=27552 cbs=9184 conv=unblock
MRGED (PST&FCOTC) QTRLY RESEARCH CURRENT | mrged_qtr_res.ascii | ibs=27552 cbs=9184 conv=unblock
CDN CDN$ QTRLY Current | cdn_qtr.ascii | ibs=27552 cbs=9184 conv=unblock
PST QTRLY Backdata 1977-1988 | pst_qtr_back.ascii | ibs=27552 cbs=9184 conv=unblock
FCOTC QTRLY Backdata 1977-1988 | fcotc_qtr_back.ascii | ibs=27552 cbs=9184 conv=unblock
MRGED (PST&FCOTC) QTRLY RESEARCH Backdata 1977-1988 | mrged_qtr_res_back.ascii | ibs=27552 cbs=9184 conv=unblock
PST QTRLY Wayback 1966-1977 | pst_qtr_wback.ascii | ibs=27552 cbs=9184 conv=unblock
FCOTC QTRLY Wayback 1966-1977 | fcotc_qtr_wback.ascii | ibs=27552 cbs=9184 conv=unblock
MRGED (PST&FCOTC) QTRLY RESEARCH Wayback 1966-1977 | mrged_qtr_res_wback.ascii | ibs=27552 cbs=9184 conv=unblock
PST QTRLY Way Wayback 1962-1973 | pst_qtr_wwback.ascii | ibs=27552 cbs=9184 conv=unblock
FCOTC QTRLY Way Wayback 1962-1973 | fcotc_qtr_wwback.ascii | ibs=27552 cbs=9184 conv=unblock
MRGED (PST&FCOTC) QTRLY RESEARCH Way Wayback 1962-1973 | mrged_qtr_res_wwback.ascii | ibs=27552 cbs=9184 conv=unblock

 

Business Information Files

Note: I will rewrite these to binary unformatted files. Then the file suffix will be .bin instead of .ascii.

Tape Name | Filename on Unix | DCB information
Reference file of SIC Codes | (missing) reference_sic.ascii | ibs=7200 cbs=80 conv=unblock
SIC File | (missing) sic.ascii | ibs=4800 cbs=240 conv=unblock
S&P Index Fundamentals -- Annual (Compustat needs to send me documentation on this one) | sandp_fundamentals.ascii | ?what is this file?
BIF Industry Segment Current | industry_segment.ascii | ibs=7740 cbs=774 conv=unblock
BIF Geographic Segment Current | geographic_segment.ascii | ibs=8040 cbs=804 conv=unblock
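With this many files, it can be easier to generate the dd commands from the DCB information than to type each one. The Python sketch below builds the command for a row of the tables and checks that the block size is a whole number of records, which holds for every row on this page. That check is a consistency test against these tables, not something dd itself requires. The function name and the .flat input names are made up for the example, since the names of the unzipped input files are not given here.

```python
import shlex


def dd_command(flat_input, output, ibs, cbs):
    """Build the dd argument list for one file from its DCB information."""
    if ibs % cbs != 0:
        raise ValueError(f"ibs={ibs} is not a whole number of {cbs}-byte records")
    return [
        "dd",
        f"if={flat_input}",
        f"of={output}",
        f"ibs={ibs}",
        f"cbs={cbs}",
        "conv=unblock",
    ]


# A few rows from the tables: (output file name, ibs, cbs).
ROWS = [
    ("pst_ann.ascii", 8332, 8332),
    ("pst_qtr.ascii", 27552, 9184),
    ("industry_segment.ascii", 7740, 774),
]

for output, ibs, cbs in ROWS:
    flat_input = output.replace(".ascii", ".flat")  # hypothetical input name
    command = dd_command(flat_input, output, ibs, cbs)
    print(shlex.join(command), f"# {ibs // cbs} record(s) per block")
```

The argument list can be passed to subprocess.run as-is once the real input file names are filled in.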


Missing Observation Codes

When data is missing from the Compustat database, Compustat assigns a missing observation code. You have to be careful when reading the data, because the missing observations are just coded as special numbers, and you might process them as if they were actual accounting data rather than missing observation codes. The codes that Compustat uses, and what they mean, are as follows:

Code | Meaning
-0.001 | Data not available.
-0.007 | Not meaningful.
-0.004 | Combined figure. This item is combined into another data item.
-0.008 | Insignificant figure. The company has reported this item as insignificant.
-0.002 | Semi-annual figure. If the data is only available on a semi-annual basis, then this code appears in the first and third quarters. Actual data appear in the second and fourth quarters.
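One way to keep these codes out of calculations is to screen every value as it is read. The Python sketch below maps the five codes listed above to their meanings and replaces them with NaN. It assumes the values have already been parsed into floats; the function names and the tolerance are choices made for the example, and the sample numbers are invented.

```python
import math

# Missing observation codes and their meanings, as listed above.
MISSING_CODES = {
    -0.001: "data not available",
    -0.002: "semi-annual figure",
    -0.004: "combined figure",
    -0.007: "not meaningful",
    -0.008: "insignificant figure",
}


def missing_reason(value, tol=1e-6):
    """Return the meaning of a missing observation code, or None for real data."""
    for code, reason in MISSING_CODES.items():
        if math.isclose(value, code, rel_tol=0.0, abs_tol=tol):
            return reason
    return None


def mask_missing(values):
    """Replace missing observation codes with NaN, leaving real data untouched."""
    return [math.nan if missing_reason(v) is not None else v for v in values]


# Invented sample: two real figures, two codes, and a genuine zero.
sample = [125.4, -0.001, -0.004, 0.0, -3.2]
print(mask_missing(sample))
```

Comparing with a small tolerance rather than == avoids surprises from floating-point rounding when the values come out of a formatted read.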