The following information on data and file format standards for
the Mamala Bay Database Study Project was faxed to all PI's on
10/26/94. If you have not yet read it, please take some time to read
it on-line, or download this file (fstandrd.txt, ASCII text file)
from File Area 9: Utilities and Misc. Area.


Text of data and file format standards faxed to all PI's on 10/25/94
=======================================================================

We have received several data files from various investigators in
this program.  As we privately expected, few of these contain enough
information to allow a database user to figure out what they
contain.  We enclose requirements for information to be included in
submitted data files so that these files will be useful for current
and future investigators on Mamala Bay.

We are not doing this just to make your life difficult.  There are
several reasons why we think you should be concerned about making
your data available and accessible to all:

    If you are like me, you will eventually forget exactly what you
     did, and you will be glad your files are well labeled.

    Data in poorly labeled and identified files are subject to
     misuse by others.

    Part of the reason why this project came into being is that
     people could not agree on questions of fact.  If all of the
     data are available with sufficient detail that anybody can see
     and interpret the data, fewer questions will arise about what
     you did and what you found.

    You owe it to your co-investigators on the Mamala Bay project,
     and to the taxpayers footing the bill, to provide a very clear
     set of data and results.

If you have questions about these requirements, or would like to see
additional data included in the files, please let me know.

For those who have already sent in files, please resubmit these
files with the appropriate header information at your earliest
convenience.  After you have received this memo, we will no longer
accept data without the information described in the attachment.

STANDARD FILE FORMATS FOR MAMALA BAY DATA BASE

We have received a number of data files in various formats and with
different degrees of documentation.  In most cases the
documentation included in the files is either missing or
insufficient for users of the data to determine what the data are
or where they came from.

Therefore, we would like to impose standards for data format and
requirements for informative header information on all data files
uploaded or sent to us. Originally, in an attempt to create a
standard that would accommodate all users, we planned to base the
standards on the data formats being used by the various
investigators. However, we have received only a few data files so
far.  Therefore we will base the standards on these files, and if
necessary amend the standards when we have a better idea of the
range of data types and files we will receive.

The objective of this exercise is to provide users of the database
with usable data, both during and after the completion of the
Mamala Bay program.  To that end, all files must be usable with
minimum requirements for specialized software, and all must be
annotated so that a user can determine what the data are, where
they came from, how they were collected, and who collected them.

All data files residing on the Mamala Bay BBS should have the
following three common components, which should allow a user to
easily identify and use the file:

1.   The data file must be in one of the file formats in the table
     below. These formats are now the "standard" file formats for
     the Mamala Bay Database Project (MB-2) and nearly all off-the-
     shelf software packages include an option to save or export
     files into one of these formats. If you have trouble or are
     unable to save data in one of these formats, please contact
     us.

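 Data File Formats

 +-------------+-------------+-----------+--------------------------------+
 | Type of     | File        | File      | Description                    |
 | File        | Format      | Extension |                                |
 +-------------+-------------+-----------+--------------------------------+
 | Spreadsheet | Lotus 1-2-3 | WK1       | "Save As" Wk1 file option is   |
 |             |             |           | available in nearly all        |
 |             |             |           | spreadsheet applications. Use  |
 |             |             |           | standard Wk1 format without    |
 |             |             |           | Impress or Allways page layout |
 |             |             |           | settings.                      |
 +-------------+-------------+-----------+--------------------------------+
 | Database    | dBase IV    | DBF       | "Save As" or "Export" to dbf   |
 |             |             |           | Format option is available in  |
 |             |             |           | nearly all database and        |
 |             |             |           | spreadsheet applications.      |
 +-------------+-------------+-----------+--------------------------------+
 | Formatted & | WordPerfect | WP5       | "Save As" WP 5.0/5.1 or ASCII  |
 | non-        | 5.0/5.1     |           | text format option is          |
 | formatted   | or ASCII    |           | available in nearly all word   |
 | Text        | text        |           | processing applications.       |
 +-------------+-------------+-----------+--------------------------------+
 | ASCII       | ASCII       | ASC       | All ASCII text data either     |
 | text        |             | or        | must be delimited by commas    |
 | (data)      |             | TXT       | and quotes or have fixed field |
 |             |             |           | length (with field definition  |
 |             |             |           | file attached).                |
 +-------------+-------------+-----------+--------------------------------+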


2.   All data files must include header information which explains
     and summarizes the data.  Lotus and ASCII files should contain
     this information internally as headers.  For database files,
     include with the data file an additional ASCII text file
     containing the header information.  This provides you, the
     investigator, the choice of information used to describe the
     dataset on the BBS.  It will also reduce or eliminate future
     questions about the data.  Thus, although this will cost you
     some effort now, it will probably save hassles later.

     If you send in multiple files containing similar data, you
     will need to let us know whether the new files contain updates
     (i.e. older versions can be deleted) or additional data to
     that which we already have.  If there are multiple files
     containing similar information (e.g. from successive sampling
     dates), you may be tempted to put the header information in
     only one file.  However, since it is easy to put the
     information in additional files and this sort of information
     does not take up much space, it would be better to include it
     in all files.

     At minimum, the header information should include the
     following (where applicable).

         The file name with extension

         Mamala Bay Project number and name, e.g., MB-10,
          Environmental Impacts of Receptors and Resources

         Person(s) and organization collecting the data or sample

         Person(s) entering, converting, or translating the data
          into the current data file

         Contact person's name, affiliation, and phone number in
          case there are questions about the data file (normally
          the PI of this study)

         A detailed file description.  This will be copied into
          the BBS and used as the "detailed file description" for
          users to view while on-line using the [I]nfo option.
          Thus, this description should include enough information
          to allow a user to decide whether to download the data.
          It should be concise, in sentences or phrases, and fully
          descriptive of the content of the file.

         Structure of the data file.  This should indicate what
          the rows and columns of data contain.  Data structures
          include:

          -    Flat file (each row containing all information
               about a particular datum, such as date, time,
               station, depth, taxon, abundance)

          -    Table, in which rows and columns represent two
               dimensions of a matrix (e.g., rows are taxa and
               columns are stations)

          -    Multi-table, in which a third (or even fourth)
               dimension of a data matrix is represented by
               multiple tables having the same structure (e.g.,
               rows are taxa, columns are stations, and each table
               represents a different sampling date)

         Time period of data or sample collection or measurement.
          This should include the date or range of dates or times
          depending on content.  This should enable a reader to
          distinguish among multiple files containing similar
          information.

         Locations of data collection, including station
          identification and latitude and longitude.  Station
          identifiers may be used for the data tables, and latitude
          and longitude may be given in a key in the header.

         A list of variables included in the file, in the same
          order as they appear in the data set.  This should
          include the variable name as it appears in the table and
          a description of what was measured.

         For each variable measured, the methods of data
          collection, measurement, and analysis.  This should be as
          complete as possible, but also concise; ordinarily,
          citations either to the open literature or to documents
          in the Mamala Bay collection will suffice if methods have
          not changed.  If necessary to save space, this material
          can be put only in the first file of a series; however,
          if that is done the file descriptions of all files should
          indicate the location of the full descriptions.

         Any other pertinent information

3.   Generally follow the data format layout described below.

If the format is Flat File, be sure that each row of the data table
contains all of the information needed to describe the measurements
contained in that row.  This format is most suitable for dBase or
ASCII files.  The table or multi-table format is more suitable for
spreadsheet files.

All data fields must be adequately labeled.  It must be clear to a
new reader what has been measured, where, how, when, and how many
times.  Data field descriptions should correspond exactly to the
descriptions in the header.
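
As an illustration of the header requirements in item 2 above, the
following Python sketch assembles a header block and refuses to
produce one if an item is missing.  The field labels and the
"Label: value" layout are our own assumptions for the example; the
standard itself does not prescribe a particular layout.  Since the
items are required only where applicable, an item that does not
apply can be entered as "n/a" rather than left out.

```python
# Sketch only: the labels and the "Label: value" layout are assumptions
# made for illustration, not a layout prescribed by the standard.
REQUIRED_FIELDS = [
    "File name",
    "Project",
    "Collected by",
    "Entered by",
    "Contact",
    "Description",
    "Structure",
    "Time period",
    "Locations",
    "Variables",
    "Methods",
]


def build_header(info):
    """Return header text, one 'Label: value' line per required field.

    Raises ValueError naming any required field that is missing or blank.
    """
    missing = [f for f in REQUIRED_FIELDS if not str(info.get(f, "")).strip()]
    if missing:
        raise ValueError("header is missing: " + ", ".join(missing))
    lines = ["%s: %s" % (f, info[f]) for f in REQUIRED_FIELDS]
    # "Any other pertinent information" is optional and goes last.
    if info.get("Other"):
        lines.append("Other: %s" % info["Other"])
    return "\n".join(lines) + "\n"
```

The returned text can be placed at the top of an ASCII data file, or
saved as the separate ASCII text file that accompanies a database
file.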

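The difference between the Table and Flat File structures, and the
comma-and-quote delimited ASCII format, can be sketched as follows.
This is a minimal Python example under assumptions of our own: the
function names, the field names (DATE, STATION, TAXON, ABUNDANCE),
and the sample values are all hypothetical.  It turns a table whose
rows are taxa and whose columns are stations into flat-file rows,
then writes them with text fields in quotes and numbers unquoted.

```python
import csv
import io


def table_to_flat(table, stations, date):
    """Turn a taxa-by-stations table into flat-file rows.

    `table` maps each taxon name to a list of abundances, one per
    station, in the same order as `stations`.  Each output row carries
    everything needed to describe one measurement.
    """
    rows = []
    for taxon, counts in table.items():
        if len(counts) != len(stations):
            raise ValueError("wrong number of columns for " + taxon)
        for station, abundance in zip(stations, counts):
            rows.append([date, station, taxon, abundance])
    return rows


def write_delimited(rows, field_names, out):
    """Write rows as comma-delimited ASCII with text fields in quotes."""
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\r\n")
    writer.writerow(field_names)   # labels should match the header text
    writer.writerows(rows)


# Hypothetical sample data: two taxa at two stations on one date.
stations = ["STA1", "STA2"]
table = {"Taxon A": [12, 0], "Taxon B": [3, 7]}
buf = io.StringIO()
write_delimited(table_to_flat(table, stations, "10/01/94"),
                ["DATE", "STATION", "TAXON", "ABUNDANCE"], buf)
print(buf.getvalue())
```

Each flat-file row repeats the date and station, which costs space
but lets any single row be understood on its own.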

All electronic file transfer via the BBS should be done with
compressed (e.g., PKZIP) files to reduce the on-line time and to
conserve hard disk space. Shareware versions of file
compression/de-compression utilities PKZIP and PKUNZIP are
available for download in File Area 9: Utilities and Misc. Area.
PKZIP can compress multiple files into one compressed file, which
makes it very convenient to include the descriptive ASCII text file
with the data file(s).
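
For those who prepare files with a script rather than with PKZIP
itself, the bundling step can be sketched in Python.  This uses
Python's standard zipfile module as a stand-in for PKZIP, which is
an assumption on our part; the archive and file names in the usage
note are hypothetical.  The result is an ordinary deflate-compressed
ZIP archive, which should be readable by unzip utilities that
support that method.

```python
import os
import zipfile


def bundle(archive_name, file_names):
    """Pack the named files into one deflate-compressed ZIP archive."""
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in file_names:
            # Store each file under its bare name, without directory parts.
            archive.write(name, arcname=os.path.basename(name))
    return archive_name
```

A call such as bundle("mb10data.zip", ["mb10data.dbf",
"mb10data.txt"]) would put a database file and its descriptive text
file into a single archive for upload.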

