# @(#)PORTING.NOTES 2.1.8.1

Table of Contents
==================
 1. General Program Structure
 2. Naming Conventions and Variable Usage
 3. Porting Procedures
 4. Compilation Options
 5. Customizing QGEN
 6. Further Enhancements
 7. Known Porting Problems
 8. Reporting Problems

1. General Program Structure

The code provided with the TPC-H and TPC-R benchmarks includes a database
population generator (DBGEN) and a query template translator (QGEN). It
is written in ANSI C, and is meant to be easily portable to a broad variety
of platforms. The program is composed of five source files and some
support and header files. The main modules are:

   build.c:    each table in the database schema is represented by a
               routine mk_XXXX, which populates a structure
               representing one row in table XXXX.
               See Also: dsstypes.h, bm_utils.c, rnd.*
   print.c:    each table in the database schema is represented by a
               routine pr_XXXX, which prints the contents of a
               structure representing one row in table XXXX.
               See Also: dsstypes.h, dss.h
   driver.c:   this module contains the main control functions for
               DBGEN, including command line parsing, distribution
               management, database scaling and the calls to mk_XXXX
               and pr_XXXX for each table generated.
   qgen.c:     this module contains the main control functions for
               QGEN, including query template parsing.
   varsub.c:   each query template includes one or more parameter
               substitution points; this module handles the
               parameter generation for the TPC-H/TPC-R benchmark.

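The division of labor between build.c, print.c and driver.c can be
sketched as follows. This is an illustration only: demo_row_t, mk_demo,
pr_demo and gen_demo are hypothetical names, the row layout is invented,
and the '|' field delimiter is an assumption. The real structures are in
dsstypes.h and the real routines take different arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row structure; the real ones are defined in dsstypes.h. */
typedef struct {
    long key;
    char name[26];
} demo_row_t;

/* mk_XXXX style: populate the structure for one row of the table. */
int mk_demo(long index, demo_row_t *row)
{
    assert(row != NULL);
    memset(row, 0, sizeof *row);
    row->key = index;
    /* snprintf is C99; it is used here only to keep the sketch bounded. */
    snprintf(row->name, sizeof row->name, "Demo#%09ld", index);
    return 0;
}

/* pr_XXXX style: print one row as a delimited record (delimiter assumed). */
int pr_demo(FILE *out, const demo_row_t *row)
{
    return (fprintf(out, "%ld|%s|\n", row->key, row->name) < 0) ? -1 : 0;
}

/* driver style: call mk_XXXX and then pr_XXXX for rows 1..count. */
int gen_demo(FILE *out, long count)
{
    long i;
    demo_row_t row;

    for (i = 1; i <= count; i++) {
        if (mk_demo(i, &row) != 0 || pr_demo(out, &row) != 0)
            return -1;
    }
    return 0;
}
```

Calling gen_demo(stdout, 3) would write three records. Keeping row
construction apart from row printing is what lets a different output
routine be substituted without touching the generation logic.
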
The support utilities provide a generalized set of functions for data
generation and include:

   bm_utils.c:   data type generators, string management and
                 portability routines.

   rnd.*:        a general purpose random number generator used
                 throughout the code.

   dss.h:
   shared.h:     a set of '#defines' for limits, formats and fixed
                 values
   dsstypes.h:   structure definitions for each table definition

2. Naming Conventions and Variable Usage

Since DBGEN will be maintained by a large number of people, it is
particularly important to observe the coding, variable naming and usage
conventions detailed here.

#define
--------
All #define directives are found in header files (*.h). In general,
the header files segregate variables and macros as follows:
   rnd.h      -- anything exclusively referenced by rnd.c
   dss.h      -- general defines for the benchmark, including *all*
                 extern declarations (see below).
   shared.h   -- defines related to the tuple definitions in
                 dsstypes.h. Isolated to ease the automatic processing
                 needed by many direct load routines (see below).
   dsstypes.h -- structure definitions and typedef directives that
                 detail the contents of each table's tuples.
   config.h   -- any porting and configuration related defines should
                 go here, to localize the changes necessary to move the
                 suite from one machine to another.
   tpcd.h     -- defines related to QGEN, rather than DBGEN

extern
------
DBGEN and QGEN make extensive use of extern declarations. This could
probably stand to be changed at some point, but has made the rapid
turnaround of prototypes easier. To ensure that each declaration is
matched by exactly one definition per executable, all such variables
are declared as EXTERN, a macro dependent on DECLARER. In any module
that defines DECLARER, all variables declared EXTERN will be defined
as globals. DECLARER should be defined only in modules containing a
main() routine.

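A minimal sketch of the EXTERN/DECLARER pattern described above,
collapsed into one file for illustration. The exact form of the macro in
dss.h may differ; demo_scale, demo_verbose and the two helper functions
are hypothetical names.

```c
#include <assert.h>

/* Only the module containing main() would define DECLARER, and it would
 * do so before including the shared header. */
#define DECLARER

/* --- the part below would live in the shared header --- */
#ifdef DECLARER
#define EXTERN            /* expands to nothing: each line is a definition */
#else
#define EXTERN extern     /* every other module sees only a declaration */
#endif

EXTERN long demo_scale;   /* hypothetical global */
EXTERN int  demo_verbose; /* hypothetical global */
/* --- end of shared header part --- */

void demo_set_scale(long s)
{
    assert(s > 0);
    demo_scale = s;
}

long demo_get_scale(void)
{
    return demo_scale;
}
```

Because exactly one module per executable defines DECLARER, the linker
sees one definition of each global no matter how many modules include
the header.
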
Naming Conventions
------------------
defines
   o All defines use upper case
   o All defines use a table prefix, if appropriate:
        O_*   relates to orders table
        L_*   relates to lineitem table
        P_*   relates to part table
        PS_*  relates to partsupplier table
        C_*   relates to customer table
        S_*   relates to supplier table
        N_*   relates to nation table
        R_*   relates to region table
        T_*   relates to time table
   o All defines have a usage suffix, if appropriate:
        *_TAG   environment variable name
        *_DFLT  environment variable default
        *_MAX   upper bound
        *_MIN   lower bound
        *_LEN   average length
        *_SD    random number seed (see rnd.*)
        *_FMT   printf format string
        *_SCL   divisor (for scaled arithmetic)
        *_SIZE  tuple length

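As an illustration of how these conventions read in practice, the sketch
below pairs a *_TAG with its *_DFLT and a *_MIN with its *_MAX. Every
name and value in it is hypothetical and chosen only to show the naming
pattern; none of them is taken from the DBGEN headers.

```c
#include <assert.h>
#include <stdlib.h>

/* All names and values below are hypothetical. */
#define DEMO_TAG    "DSS_DEMO"  /* *_TAG:  environment variable name    */
#define DEMO_DFLT   "demo.dss"  /* *_DFLT: value used when it is unset  */
#define O_DEMO_MIN  1L          /* O_*: orders table, *_MIN: lower bound */
#define O_DEMO_MAX  50L         /* O_*: orders table, *_MAX: upper bound */

/* Resolve a setting from the environment, falling back to the default. */
const char *demo_path(void)
{
    const char *v = getenv(DEMO_TAG);
    return (v != NULL && v[0] != '\0') ? v : DEMO_DFLT;
}

/* Clamp a value into the [O_DEMO_MIN, O_DEMO_MAX] range. */
long o_demo_clamp(long v)
{
    assert(O_DEMO_MIN <= O_DEMO_MAX);
    if (v < O_DEMO_MIN)
        return O_DEMO_MIN;
    if (v > O_DEMO_MAX)
        return O_DEMO_MAX;
    return v;
}
```
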
3. Porting Procedures

The code provided should be easily portable to any machine providing an
ANSI C compiler.
   -- Copy makefile.suite to makefile
   -- Edit the makefile to match the name of your C compiler
      and to include appropriate compilation options in the CFLAGS
      definition
   -- Run make.

Special care should be taken in modifying any of the monetary
calculations in DBGEN. These have proven to be particularly sensitive to
portability problems. If you decide to create the routines for inline
data load (see below), be sure to compare the resulting data to that
generated by a flat file data generation to verify that all numeric
conversions were performed correctly.

If the compile generates errors, refer to "Compilation Options", below.
The problem you are encountering may already have been addressed in the
code.

If the compile is successful, but QGEN is not generating the appropriate
query syntax for your environment, refer to "Customizing QGEN", below.

For other problems, refer to "Reporting Problems" at the end of this
document.

4. Compilation Options

config.h and makefile.suite contain a number of compile time options intended
to make the process of porting the code provided with TPC-H/TPC-R as easy as
possible on a broad range of platforms. Most ports should consist of reviewing
the possible settings described in config.h and modifying the makefile
to employ them appropriately.

5. Customizing QGEN

QGEN relies on a number of vendor-specific conventions to generate
appropriate query syntax. These are controlled by #defines in tpcd.h,
and enabled by a #define in config.h. If you find that the syntax
generated by QGEN is not sufficient for your environment you will need
to modify these two files. It is strongly recommended that you not change
the general organization of the files.

Currently defined options are:

   VTAG           -- marks a variable substitution point [:]
   QDIR_TAG       -- environment variable which points to query templates
                     [DSS_QUERY]
   GEN_QUERY_PLAN -- syntax to generate a query plan ["Set Explain On;"]
   START_TRAN     -- syntax to begin a transaction ["Begin Work;"]
   END_TRAN       -- syntax to end a transaction ["Commit Work;"]
   SET_OUTPUT     -- syntax to redirect query output ["Output to"]
   SET_ROWCOUNT   -- syntax to set the number of rows returned
                     ["{return %d rows}"]
   SET_DBASE      -- syntax to connect to a database

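The sketch below shows how a vendor block built from these options might
look and how the strings behave as printf-style templates. The bracketed
values are taken from the list above; DEMO_VENDOR, demo_preamble, the
value given for SET_DBASE and the order in which the statements are
emitted are all assumptions made for illustration, not a description of
what tpcd.h or QGEN actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* config.h would enable one vendor block; DEMO_VENDOR is hypothetical. */
#define DEMO_VENDOR

#ifdef DEMO_VENDOR
#define GEN_QUERY_PLAN "Set Explain On;"
#define START_TRAN     "Begin Work;"
#define END_TRAN       "Commit Work;"
#define SET_OUTPUT     "Output to"
#define SET_ROWCOUNT   "{return %d rows}"
#define SET_DBASE      "database %s;"   /* assumed form, no default listed */
#endif

/* Build a statement preamble from the vendor strings.
 * Returns the number of characters written, or -1 if buf is too small. */
int demo_preamble(char *buf, size_t len, const char *db, int rows)
{
    int n;

    assert(buf != NULL && db != NULL);
    /* Adjacent string literals are concatenated into one format string. */
    n = snprintf(buf, len,
                 SET_DBASE "\n" START_TRAN "\n" SET_ROWCOUNT "\n",
                 db, rows);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Supporting another dialect would then mean adding a second block with
the same macro names and different strings, leaving the calling code
unchanged.
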
6. Further Enhancements

load_stub.c provides entry points for two likely enhancements.

The ld_XXXX routines make it possible to load the
database directly from DBGEN without first writing the database
population out to the filesystem. This may prove particularly useful
when loading larger database populations. Be particularly careful about
monetary amounts. To assure portability, all monetary calculations are
done using long integers (which hold money amounts as a number of
pennies). These will need to be scaled to dollars and cents (by dividing
by 100), before the values are presented to the DBMS.

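One way to perform that scaling without introducing floating point
rounding is to split the penny count with integer division and
remainder. The routine below is a sketch under that assumption;
demo_fmt_money is a hypothetical helper, not part of load_stub.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a penny count as dollars and cents, e.g. 123456 -> "1234.56".
 * Returns 0 on success, -1 if buf is too small. */
int demo_fmt_money(char *buf, size_t len, long cents)
{
    const char *sign = (cents < 0) ? "-" : "";
    /* Take the magnitude in unsigned arithmetic so LONG_MIN is safe. */
    unsigned long mag = (cents < 0) ? 0UL - (unsigned long)cents
                                    : (unsigned long)cents;
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%s%lu.%02lu", sign, mag / 100, mag % 100);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Handling the sign separately matters for amounts between -99 and -1
pennies, where the dollar part is zero and would otherwise lose the
minus sign.
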
The hd_XXXX routines allow header information to be written before the
creation of the flat files. This should allow systems that require
formatting information in database load files to use DBGEN with only
a small amount of custom code.

qgen.c defines the translation table for query templates in the
routine qsub().

varsub.c defines the parameter substitutions in the routine varsub().

If you are porting DBGEN to a machine that supports a native word
size larger than 32 bits, you may wish to modify the default values for
BITS_PER_LONG and MAX_LONG. These values are used in the generation of
the sparse primary keys in the order and lineitem tables. The code has
been structured to run on any machine supporting a 32 bit long, but
may be slightly more efficient on machines that are able to make use of
a larger native type.

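When choosing replacement values, the platform's own limits can be
inspected with <limits.h>. The sketch below is only a way to find
candidate values; the DEMO_ names are hypothetical, and this is not how
DBGEN itself defines BITS_PER_LONG or MAX_LONG.

```c
#include <assert.h>
#include <limits.h>

/* Storage width of long in bits (includes padding bits, if any). */
#define DEMO_BITS_PER_LONG ((int)(CHAR_BIT * sizeof(long)))

/* Largest value a long can hold on this platform. */
#define DEMO_MAX_LONG LONG_MAX

/* Report whether long is wider than the 32 bits the code assumes. */
int demo_long_is_wide(void)
{
    assert(DEMO_BITS_PER_LONG >= 32); /* ISO C guarantees at least 32 */
    return DEMO_BITS_PER_LONG > 32;
}
```
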
7. Known Porting Problems

The current codeline will not compile under SunOS 4.1. Solaris 2.4 and later
are supported, and anyone wishing to use DBGEN on a Sun platform is
encouraged to use one of these OS releases.

8. Reporting Problems

The code provided with TPC-H/TPC-R has been written to be easily portable,
and has been tested on a wide variety of platforms. If you have any
trouble porting the code to your platform, please help us to correct
the problem in a later release by sending the following information
to the TPC D subcommittee:

   Computer Make and Model
   Compiler Type and Revision Number
   Brief Description of the problem
   Suggested modification to correct the problem