2001 CRSP Files on the Unix system

Overview

The 2001 CRSP Access97 product as it is implemented on UNIX is MUCH MUCH MUCH BETTER than what we used to have!  It will take you a few minutes to change each program to run on Unix, but I believe that the time is well spent for the following reasons:

Advantages

1.  The new version allows random access of companies.  This means that you don't need to read the entire file to find a single company, or even several companies.  Your jobs will run LOTS FASTER (usually), and your programming will be EASIER.  

2.  The new version is on Unix.  Disk space is cheaper on Unix, and using Unix instead of MVS will save the department on the order of $20,000 a year!

3.  Directory structures on your Unix accounts are more like PC directory structures.  That means you can store all data files and program files for each project in its own directory, rather than having to name each program specially so you can figure out what project it is for.  You can also use 32 character names to name your program--which should allow more descriptive names and make it easier for you to do your research.

4. "They" are going to throw us off MVS in a couple of months anyway, so if we don't migrate to Unix we will be SOL.  But since this is better--why complain?

 

Disadvantages (Things that are worse on Unix)

1.  The text editors still stink!  I'll teach you to use pico, but it sure would be nice to have a text editor that looks like xedit.  I'm working again on finding one for us that works like xedit on cms with a telnet connection.  There are some available, but I don't have "superuser" privileges, and I'm just plain stupid about Unix, so I have to go through the systems programmers for everything.

Things that are different, but neither better nor worse.

1.  No jcl.  However, you have to compile and execute your fortran or sas program in separate steps.  I've written a fortran batch file that makes your commands really easy, so you don't need to learn anything special about unix fortran.   You will have to use different commands to open input and output files, but they are actually easier than MVS jcl.

2.  CRSP calling routine names have changed.  You won't use bical or biget to get data.  Rather, you will use udopen and usgetxxx routines (described under the implementation section).  This isn't good, but it isn't bad either, because the new routines allow random access of the data.  Below I have some sample programs and access routines.

3.  If you use write(*,*) you will get output to your screen, rather than to an output listing like on MVS.  If you want to keep the output that write(*,*) used to put in your MVS listing, you need to write it to an output file instead.  I'll include examples in the sample programs.

4.  Unix commands and filenames are CASE-SENSITIVE!  Most of the commands are lower case.  Of course you can name files anything you like.

5.  The fortran compiler that is used on Unix is Fortran 90.  It is backward compatible with Fortran 77, which is what we were used to running on MVS, so your old programs should work fine, once you replace the jcl, and stick in open statements for files.  However Fortran 90 recognizes lowercase letters in filenames, and these translate to lowercase letters when you create or read a file.  Just make sure you know how you name things.  The file OUTDATA.DAT is not the same file as outdata.dat.  
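
Here is a minimal sketch of points 3 and 5 together.  It does not use CRSP at all, and the program name, the unit number 10 and the file name outdata.dat are made up for the illustration.

```fortran
      program scrfil
c  Sketch only: unit 10 and the name outdata.dat are made up.
      open (unit=10,file='outdata.dat')
c  This line goes to your screen.
      write(*,*) 'this goes to the screen'
c  This line goes to the file outdata.dat (lowercase on disk).
      write(10,*) 'this goes to outdata.dat'
      close (unit=10)
      stop
      end
```

After it runs, the directory you ran it from should contain outdata.dat, not OUTDATA.DAT.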

 

How to set up your account to access CRSP

1.  Make sure you have a Unix account.  If you receive email on Eudora or Internet Mail, or Outlook Express, then you have a Unix account.  If you still use CMS for your mail, then you might not have a Unix account.

2.  Log in to your account.  I do this using EWAN telnet.  You need to connect to unix.cas.utk.edu.  Set your terminal to a vt100 in EWAN.  Of course you need to know your password.  This will be the same as your email password (except those of you still on CMS mail).

3.  Once you are logged in, you should see a $ as your prompt.  If you see something else, like a % or a >, then you are using a "shell" that is different from the one I use.  I suggest that you change your shell to the korn shell.  Of course I have no earthly idea how you do that.  You will need to call 4-9800 and ask.  (I'll try and figure it out for you and put it in here!)

4.  Enter the command

setup crsp

This will give you access to the fortran compiler, sas, all of the paths to the CRSP files, and the batch files I have written to make your life easier.

5.  Now you can get a program from cms, or write your own, and run a crsp job.   See the next section for how to do it.

 

Example of a fortran program to access CRSP

Notice these new items in the example program that follows:

include 'us.txt'
        This replaces the old INCLUDE(ALLINCLS) statement.  If you want to also access the data in the NMS files you need an additional line which says include 'nms.txt'.

open (unit=10,file='output.dat',status='new',form='formatted')
       This replaces the DD statement from JCL for MVS.   What this says is that when you use the command write(10,998)--or whatever format line number you want--or write(10,*), you will write to a file called output.dat.   If it doesn't exist, then the program will create it in the directory from which you have run the program.  If it already exists, the job will bomb because you have specified status='new'.  Here presumably you will write out formatted output.   If you want to use unformatted writes--i.e. if you like to write out with the statement write(10)--then you would specify form='unformatted'.  Remember, if you use unformatted output, you won't be able to read it using pico.  I suggest formatted output unless either: 1) the dataset is too big otherwise, and you need to save the space, or 2) you are worried about the size of the numbers, and you want them written out in the machine's binary form so you can maintain precision.

Syntax for other opens:

open (unit=10,file='output.dat')
This is the simple example--and you will use it most of the time.  It gives sequential access and formatted output.  If the file doesn't exist, it creates it, and if the file does exist, it overwrites it.

open (unit=10,file='output.dat',form='unformatted')
This allows you to write out as an unformatted file.  You will use the write(10) command form for this file type.
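
To see the difference between status='new' and the plain open, here is a small sketch that does not use CRSP.  The file name newfile.dat and the variable ios are made up for the illustration.  The iostat= specifier is standard Fortran; it hands the program an error code instead of letting the job bomb.

```fortran
      program opnnew
c  Sketch only: newfile.dat and the variable ios are made-up names.
      integer ios
c  Make sure the file exists, then close it.
      open (unit=10,file='newfile.dat')
      write(10,*) 'first version'
      close (unit=10)
c  status='new' fails because newfile.dat is already there.
c  With iostat the program gets an error code instead of bombing.
      open (unit=10,file='newfile.dat',status='new',iostat=ios)
      if (ios .ne. 0) then
         write(*,*) 'status=new refused, file exists'
      endif
c  The plain open succeeds and writes over the old contents.
      open (unit=10,file='newfile.dat')
      write(10,*) 'second version'
      close (unit=10)
      stop
      end
```

If you leave iostat=ios off the second open, the program stops there with a run time error, which is the "bomb" described above.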

call udopen(nyseamex) 
     This, and the next statement, replace the old call bical and call biget statements. It opens the daily stock files, index files and calendar files for the nyseamex dataset.  If you want the indices for all nasdaq stocks, use the statement call udcal(nasdaq).  If you want the combined nyseamex and nasdaq indices, then use call udcal(nyseamex+nasdaq).  The variable nyseamex is set to 1, the variable nasdaq is 2, so nyseamex+nasdaq is 3, and therefore you could use the statement call udcal(1), or call udcal(2), or call udcal(3) the same way.  If you want the monthly file, use call umcal(nyseamex) etc.

call usgetper(1,68923,every,*900)
     There are several usgetxxx subroutines.  This one gets data for the firm with permno 68923.  It retrieves all of the data available (except nms data).  This call takes longer because it loads all of the price, return, volume etc. data.  If you only want company information, and not time series information (things like distributions, names, exchanges etc., but not returns, prices, volumes), then use call usgetper(1,68923,infos,*900) where the keyword infos replaces the keyword every.  This call is several hundred times faster.  If you want company information and returns, but not prices and volumes, then use infos+returns instead of infos.  Keywords you are likely to use are:

CRSP ACCESS Subroutine Program Keywords

Keyword   Value  Variable(s) Loaded
NONE          0  /HEADER/ block; PERMNO, CUSIP, NUMNAM, BEGDAT, etc.
INFOS         1  /INFO/ block; NAMES, DISTS, SHARES, DELIST, NASDIN
PRICES        2  PRC array in /DDATA/ block
RETURNS       4  RET array in /DDATA/ block
BIDS          8  BIDLO array in /DDATA/ block
ASKS         16  ASKHI array in /DDATA/ block
VOLUMES      32  VOL array in /DDATA/ block
SDEVS        64  SXRET array in /ADATA/ block
BETAS       128  BXRET array in /ADATA/ block
YEARLY      256  YRVAL and PRTNUM yearly arrays in /ADATA/ block
RETXS       512  RETX array in /ADATA/ block (monthly only)
PRICE2S    1024  PRC2 array in /ADATA/ block (monthly only)
ALL           7  short for INFOS+PRICES+RETURNS
EVERYXS     511  short for ALL+SXRET+BXRET+BIDS+ASKS+VOLUMES+YEARLY
EVERY      1855  short for ALL+BIDS+ASKS+VOLUMES+RETXS+PRICE2S+YEARLY

If you want some subset of these, just add them up.  For example the statement call usgetper(1,68923,infos+returns,*900) will give you company information plus returns.

Note the first argument is 1.  It doesn't mean a thing.  It's there to confuse you, so just put a 1 there every time and don't worry about it.  The *900 means that the subroutine will branch to the statement labeled 900 once you get to the end of the CRSP file.  See the Programmers Guide pages 17 and 18 for more information about this.

Important note.  If you use the keyword inext instead of the permno, then the statement will make sequential calls to the dataset, and cycle through all 21,154 companies.  That is how you used to get to individual companies on MVS and it is how you can go through the whole database fishing for stuff.
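
The adding-up works because each keyword is a different power of 2.  Here is a small sketch that checks the arithmetic.  In a real CRSP program these names and values come from include 'us.txt'; here they are typed in by hand from the keyword table so the example stands on its own, and the program name keysum is made up.

```fortran
      program keysum
c  Sketch only.  In a real CRSP program these names and values come
c  from include 'us.txt'; here they are typed in from the keyword
c  table so the example stands on its own.
      integer infos,prices,returns,bids,asks,volumes
      integer yearly,retxs,price2s,all,every
      parameter (infos=1,prices=2,returns=4,bids=8,asks=16)
      parameter (volumes=32,yearly=256,retxs=512,price2s=1024)
      parameter (all=infos+prices+returns)
      parameter (every=all+bids+asks+volumes+retxs+price2s+yearly)
      write(*,*) 'infos+returns = ',infos+returns
      write(*,*) 'all           = ',all
      write(*,*) 'every         = ',every
      stop
      end
```

It prints 5, 7 and 1855, which match the ALL and EVERY rows of the table.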

The other forms of usgetxxx are:
call usgetcus(1,cusip,all,*900)
call usgethis(1,historical cusip,all,*900)
call usgetpco(1,permanent company number,all,*900)
call usgetsic(1,sic code,all,*900)

usgetcus gets companies by their current cusip number
usgethis gets companies by their historical cusip number
usgetpco gets companies by their permanent company number (permco)
usgetsic gets companies by their sic code.  See the Programmers Guide page 19 for more information about these forms.

Here is a sample program that cycles through the entire CRSP database and writes out some company information to a file called output.dat (remember the 6 spaces before the code in each line!)

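      program sample1
c  us.txt replaces the old INCLUDE(ALLINCLS)
      include 'us.txt'
c  open the daily nyseamex files and the output file
      call udopen(nyseamex)
      open (unit=10,file='output.dat')
c  get the next company; branch to 900 at the end of the CRSP file
  100 call usgetper(1,inext,infos,*900)
      write(10,990)cusip,permno,compnm(numnam),hexcd,hsiccd,
     *    caldt(begdat),caldt(enddat)
  990 format(1x,a8,1x,i5,1x,a32,1x,i3,1x,i5,1x,i6,'-',i6)
      goto 100
  900 call udclose
      stop
      end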

Here is an example of a program to read a bunch of permnos from an external file and match them up for processing data:

      program sample2
      include 'us.txt'
      call udopen(nyseamex)
      open (unit=9,file='permnos.dat')
      open (unit=10,file='output.dat')
c  read the next permno; branch to 900 when permnos.dat runs out
  100 read(9,190,end=900)iperm
  190 format(i5)
      call usgetper(1,iperm,every,*900)

c  insert your processing statements here
c
c  done with processing statements

c  write out what you want for that company, for example:
      write(10,191) permno,compnm(numnam)
  191 format(1x,i5,1x,a32)
      goto 100
c  write whatever you want to see on the screen when you are done
  900 write(*,*) 'done'
      call udclose
      stop
      end

Notice that you didn't have to read in all the permnos and cycle through them, and the entire database to get the ones you want!
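
A loop like the one in sample2 has to stop somewhere when the list of permnos runs out.  Here is a minimal sketch of one way to do that, using the standard end= specifier on the read.  It does not call CRSP; the file name list.dat, the variables iperm and n, and two of the three numbers in the list are made up for the illustration.

```fortran
      program rdlist
c  Sketch only: list.dat, iperm and n are made-up names, and the
c  numbers in the list are just samples.
      integer iperm,n
c  Write a small list file so the example stands on its own.
      open (unit=9,file='list.dat')
      write(9,190) 11111
      write(9,190) 22222
      write(9,190) 68923
      close (unit=9)
c  Read it back one permno at a time.  end=900 sends the program
c  to label 900 when the file runs out, instead of bombing.
      n = 0
      open (unit=9,file='list.dat')
  100 read(9,190,end=900) iperm
  190 format(i5)
      n = n + 1
c  (this is where the call to usgetper would go)
      goto 100
  900 write(*,*) 'permnos read: ',n
      close (unit=9)
      stop
      end
```

It reports 3 permnos read.  Without end=900 the fourth read would hit the end of the file and stop the job with a run time error.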

To see more examples, see APPENDIX A starting on page 91 of the Programmers Guide.

 

How to run this job on Unix

Suppose you have some fortran source code in a file called getreturns.f and you want to run a CRSP job.  When you compile the program, it creates an executable file called a.out.  The listing looks like a.out* where the * indicates that it is an executable file.  To compile and run the program the commands you would enter are:

setup crsp
compile getreturns.f
a.out

(Note:  you only need to enter the command setup crsp when you first log on to Unix.  I guess you could set up your account to do this automatically for you whenever you log in)

When the job is done, your screen will display

stop

and you can look at your output.  If you have either compile errors or runtime errors, the screen will display them.

If you have lots of compile errors and want a listing of your source code and its errors, you can use a batch file I created for you.  However, this only works for files that end in .f  (seems stupid, but I can't get the program to work if it doesn't have a .f).  So if you have named your program something other than xxxx.f, you will need to copy it or rename it to xxxx.f  (xxxx is your program name.  Don't really call it 'xxxx'!)  To get a listing enter

listing xxxx.f

The listing and the errors will appear in a file in your directory called listing.out.   Open this in your editor to see what your compile errors are.

I haven't yet figured out how to identify the line numbers of run time errors and how to log them like you could on MVS.  However there is a debug facility that is easy to use.  If you get run time errors--like floating point overflows-- you can do the following:

dbx a.out

Press the space bar until you get to the dbx prompt

(dbx)

Then enter the following (without typing the (dbx) prompt itself)

(dbx) catch FPE
(dbx) run

This will run the program until you get to a floating point error, and give you the line number and the source code.  To find more, just enter run again.  To list more of the source code--say from lines 1 to 10--enter

(dbx) list 1 10

As of today I don't know how to save the errors generated by a program to a file.

How to get your files from CMS onto your Unix account