# @(#)HISTORY 2.1.8.3

Changes as of 10/11/99 -- versions: TPCH 1.2.0a, TPCR 1.1.0a
    -- Corrected a problem in segmented updates that caused an extra file to be generated
    -- Porting changes for DigUnix

Changes as of 08/28/99 -- versions: TPCH 1.2.0, TPCR 1.1.0
    -- reduced parameter substitution range for Q18
    -- added new option (-b) to specify the location of the dists file
    -- added DBGEN option (-q) to suppress all output

Changes as of 08/16/99 -- versions: TPCH 1.1.0a, TPCR 1.0.1e
    -- prevent "reuse" of original data in update files
    -- correction to lint target in makefile.suite
    -- removal of vestigial l_partkey predicate from 21.sql
    -- reorder lineitem/order join in q5
    -- removal of table aliases from 2.sql
    -- randomize seeding of qgen RNG to close bug 52
    -- correct possible round-off error in segmented update files
    -- corrected soft copy answer set for Q22
    -- corrected precision of answer set for Q19

Changes as of 07/08/99 -- versions: TPCH 1.1.0, TPCR 1.0.1
    -- WORKLOAD must be set to either TPCH or TPCR in the makefile
    -- unneeded reference to part table removed from q21 template

Changes as of 06/04/99 -- version 1.0.1d
    -- Restarted version numbering to match specification revisions for TPC-H and TPC-R
    -- Corrected answer set for Q13
    -- Corrected parameter substitutions for Q16, Q17, Q19, Q20, Q21, Q22
    -- Corrected RNG initialization in qgen.c
    -- added adhoc.c and adhoc.h to code base to support randomized data sets; currently disabled
    -- replaced calls to UnifInt() in row_stop with a call to NthElement()
    -- Corrected a problem that caused small negative money values to print as positive values
    -- Simplification of PR_xxx macros
    -- QGEN builds correct parameter logs again

******************
* NOTE NOTE NOTE *
******************
Below this line, the file refers to TPC-D, which was retired in favor of TPC-H and TPC-R.
Since the new specifications are numbered from 1.0.0, the program version was reset.
******************
* NOTE NOTE NOTE *
******************

Changes as of 01/05/99 -- version 2.0.1
    -- added 1999 to the copyright notice
    -- corrected C++ compilation problem
    -- sub-select phrasing corrected in Q4, Q21, Q22
    -- added support for segmenting update files (contributed by Larry Kemp, HP)

Changes as of 12/08/98 -- version 2.0.0
    -- removed permute.h from clean target in makefile

Changes as of 11/17/98 -- version 2.0.0 Alpha 8
    -- corrected o_custkey overrun bug
    -- removed upper bound on -C command option
    -- added static permute.h to distribution to match the specification

Changes as of 10/23/98 -- version 2.0.0 Alpha 7
    -- removed references to DSS_SEED and SEED_TAG
    -- minor query template cleanup
    -- V2 answer sets added
    -- correction to hd_sparse for SF > 300
    -- added static declaration to row types in gen_tbl to fix update problem
    -- permuted params to Q22

Changes as of 5/19/98 -- version 2.0.0 Alpha6b
    -- removed trailing apostrophe from dists.dss nouns for Tandem loader
    -- corrected mk_sparse() problem with alpha6
    -- added 64b support for NCR/Metaware
    -- corrected revision problem with 2.0.0.6

Changes as of 5/7/98 -- version 2.0.0 Alpha6
    -- corrected generation of parent/child tables in parallel
    -- renamed ORDER table to ORDERS table
    -- revision of DBGEN synced with revision of 2.0 specification
    -- portability changes to process termination provided by John Matzka
    -- portability changes for Watcom C provided by Andrew Eisenberg
    -- indentation of specifications/templates now matches
    -- queries now include a consistent header format

Changes as of 4/28/98 -- version 2.0.0 Alpha5
    -- NO RELEASE OF ALPHA 5; skipped to sync spec/DBGEN revision levels

Changes as of 4/6/98 -- version 2.0.0 Alpha4
    -- corrected parallel table generation
    -- minor corrections to query templates
    -- portability changes for HP

Changes as of 3/24/98 -- version 2.0.0 Alpha3
    -- include substitution parameters for Q22
    -- correct substitution parameters for Q16 under AIX
    -- include permute.h until the Unix/NT makefile is fixed
    -- correct orderkey generation

Changes as of 3/20/98 -- version 2.0.0 Alpha2
    -- correct runtime malloc error from bad INIT_HUGE macro
    -- improve pseudo text distribution in comments
    -- fix problem with parallelism of data gen
    -- re-enable generation of parent/child tables
    -- remove recombination code for parallel flat files

Changes as of 3/11/98 -- version 2.0.0 Alpha1
    -- removed the TIME table
    -- removed the need for seed files
    -- made 1GB the validation database size
    -- add pseudo text support in comments
    -- correct character selection in a_rnd()
    -- correct population of P_NAME
    -- removed unclaimed variants
    -- added new queries 18-22, replaced Q13

Changes as of 2/6/98 -- version 1.3.1
    -- Revised 64 bit support to clean up bcd2_bin() and mk_sparse()
    -- Add 64b support for NT

Changes as of 12/31/97 -- version 1.3.0
    -- support for seed generation > 1TB (data gen still to be tested)
    -- rework of 64b support
    -- added bcd support for subtraction, comparison, modulo
    -- added 1998 to the copyright notice
    -- clarified comments in dists.dss
    -- corrected substitution problem in Q11
    -- standardized fopen() error messages with OPEN_CHECK()
    -- introduced PATH_SEP in config.h to allow changes in path separators

Changes as of 12/15/96 -- version 1.2.0
    -- corrected typos in queries 8a, 8c, 8d, 11a, 12F, 14F and 17a
    -- added variant 15c
    -- defined MAX_SCALE and MIN_SCALE; issued error messages for SF > 1000
       since implementation is incomplete
    -- seed file generation can now be resumed with dbgen -R ...
    -- corrected minor compile bug under Solaris 2.5.1
    -- documented compile problems under SunOS

Changes as of 8/1/96 -- version 1.1.0D
    -- included new variants for queries 8 and 15
    -- re-introduced answer sets in the source tree

Changes as of 5/1/96 -- version 1.1.0C
    -- unified version numbering of DBGEN and QGEN
    -- updated BUGS list
    -- removed FAQ from soft appendix; web site will keep the current version of the FAQ
    -- added 1996 to the copyright notice
    -- corrected bug in PR_DATE macro; NO CHANGE TO DATA SET
    -- properly initialize param values for cleaner logging
    -- adjusted output format of Q11 param to allow scaling to 1TB
    -- corrected typos in variant 14c
    -- corrected data type for YEAR in variant 8c
    -- corrected typos in variant 10a
    -- added variant 8d

Changes as of 1/23/96 -- qgen version 1.1.0B
    -- include support for ANSI semantics
    -- improved patch for seed sensitivity

Changes as of 1/23/96
    -- updated BUGS list
    -- dbgen version 1.1.0A
        -- patch to limit BCD2 fields to 12 characters for columnar output
    -- qgen version 1.1.0A
        -- patch to fix the "unknown flag" problem
        -- patch to fix the seed sensitivity problem

Changes as of 12/19/95
    -- updated BUGS list
    -- dbgen version 1.1.0
        -- raised default value of MAX_CHILDREN to 1000
        -- corrected naming of detail tables in incremental load
        -- corrected range delete output
        -- forced delete files to truncate existing files
        -- removed fixed-size tables from seed generation
        -- corrected overflow problem with large-scale seed generation
        -- allow date generation as MM-DD-YY based on config.h #define
        -- correct truncation problem with columnar output in PR_VSTR()
        -- added support for Windows NT
        -- added PLATFORM macro to makefile, removed platform defines from config.h
        -- removed MAX_CHILDREN define from config.h (set to 1000 in dss.h)
    -- qgen version 1.1.0
        -- correct SET_OUTPUT macro to TDAT
        -- use %ld in output for q17 (portability)
        -- add support for SQLSERVER database dialect
        -- add support for SYBASE database dialect
        -- adjust parameter ranges for Q1, Q3, Q6
        -- add -T/-t option to usage summary
        -- added support for Windows NT

Changes as of 09/01/95
    -- qgen version 1.0.1
        -- formalized version numbering
        -- -p now generates correct query permutations
        -- added separate version number for qgen
        -- corrected Q3 substitution problem
        -- updated permissible range for Q10
        -- corrected rowcount_dflt and the MAX row indicator (-1)
        -- expanded param logging to include all possible parameters
        -- allowed qgen's -d option to be used at all scale factors
        -- made parameter substitution permutation-independent
        -- added qgen support for END_TRAN (-E) and DFLT_NUM (-N)
        -- correct handling of :n directive
        -- added more complete explanation of QGEN to README
        -- renamed random to rndm, for portability
    -- dbgen version 1.0.1
        -- formalized version numbering
        -- inclusion of SF=1 seed file
        -- correct typo in usage() update example
        -- patch to driver.c to allow correct updates
        -- documentation change to README to clarify seed/stage/update interaction
        -- corrected minor glitch in "open failed" error msg in print.c
        -- added missing line continuation to makefile.suite
        -- seed files are now based on scale factor and number of generators
        -- seed files now hold seeds for one "step" of a given build
        -- cleanup of parallel load routines
        -- inclusion of faster seed generation routines from Susanne Englert
        -- removed the -E(xisting) option
        -- ensure proper scaling of O_CUSTKEY
        -- corrected default update percentage
        -- proper handling of child tables with '-O f'
        -- removed seed files from the distribution
        -- modified rpb_routine() to limit contribution of partkey in retailprice
        -- added '-S(tep)' option to allow multi-stage loads
        -- roll-in of 32-bit speed_seed routines from Dick Shelton
        -- miscellaneous typo corrections in the documentation
        -- cleanup of usage output

Changes as of 05/08/95 -- version 1.0
    -- add Teradata defines to tpcd.h for QGEN
    -- add :c to query templates for database CONNECT syntax
    -- add examples of DBGEN and QGEN usage to README
    -- add -T option to qgen to allow time table usage
    -- query template names only require a .sql suffix; the rest is arbitrary

Changes as of 03/13/95 -- version 9.1
    -- surround DBNAME with #ifndef in config.h
    -- remove -DDBNAME from makefile.suite
    -- sync varchar handling with 9.1 draft

Changes as of 02/21/95 -- version 9.0a
    -- fixed bug in qgen that incorrectly included rnd.h
    -- included revised DDL with changes for char/varchar and l_quantity
    -- updated DBGEN help message to include new single table options for
       order/lineitem and part/partsupp
    -- included handling for multi-set seed files TPCDSEED.xxx
    -- generated seeds up through 400GB; headed to 1TB!
    -- ANSI lint cleanup; more needed
    -- UF2 now defaults to key lists; use "-O r" to generate key ranges.
       Also note that this routine does NOT use the BCD2_* routines. As a
       result, it WILL fail if the keys being deleted exceed 32 bits. Since
       this would require ~660 update iterations, this seems an acceptable
       oversight.

Changes as of 01/19/95 -- version 9.0
    -- allowed command line seeding of RNG for QGEN
    -- order and number of params in QGEN now match presentation in spec
    -- fixed bug in time table format of O_ORDERDATE
    -- changed l_QUANTITY to FLOAT in dss.ddl
    -- reworked QGEN options to be more useful
    -- allowed creation of sparse keys beyond 32 bits (for 1TB)
    -- removed unused '#ifdef' and associated code
    -- allowed independent generation of master/detail tables (e.g., order/lineitem)

Changes as of 12/06/94 -- version 8.6
    -- fixed renaming of flat files for child tables
    -- various documentation fixes
    -- added naming convention section to Porting.Notes
    -- added -DIBM flag to config.h
    -- synced up QGEN with draft 8.1

Changes as of 10/25/94 -- version 8.5a
    -- corrected bug in columnar output of pr_supp
    -- added pr_drange to generate a list of order keys to be deleted instead
       of generating SQL
    -- added '-O d' to generate range delete as SQL
    -- updated default values for QGEN to sync with spec 8.1
    -- corrected MK_SPARSE to reflect groups of 8
    -- corrected a bug in o_orderstatus
    -- regenerated seed files for SF in [1,10]
    -- ANSI cleanup (primarily function declarations)

Changes as of 10/11/94 -- version 8.5
    -- removed deletes/inserts to tables other than order/lineitem
    -- increased cardinality for part.type and part.container
    -- '-r' argument is now an integer; percentage in basis points
    -- initial roll-in of new update scheme
    -- added BBB comments to supplier table

Changes as of 9/27/94 -- version 8.4
    -- all money calculations now use integer math. This should bring
       everyone's data sets into exact agreement.

Changes as of 9/21/94 -- version 8.3b
    -- fixed handling of MAX_STREAM
    -- added floor function to RPRICE bridge
    -- misc lint cleanup (type fixes, new prototypes, etc.)
    -- MONEY format becomes lf for DOS
    -- further cleanup of PR_VSTR and its length argument
    -- change to parameter generation for Q6 to allow for float discount

Changes as of 9/15/94 -- version 8.3a
    -- isolated MONEY format for Unisys (Lf) using DOS
    -- make sure all arguments to MAKE_MONEY are doubles
    -- rolled in NEW_PTEXT to allow Berni to experiment

Changes as of 9/12/94 -- version 8.3
    -- added -T n and -T r to usage to match getopt() and README
    -- changed PR_MONEY to remove leading blanks
    -- included revised DDL from Berni
    -- included some MVS portability fixes regarding malloc.h
    -- cleaned up error messages in qgen and made #define ofp usage universal
    -- additional DOS portability changes
    -- added {c,a}len to provide specific length for columnar output of varchar
    -- added PR_VSTR to handle varchar printing under MVS
    -- fixed bit masking in a_rnd and cleaned up prototype match with V_STR
    -- PR_MONEY now uses %Lf
    -- added revised pseudo text under NEW_PTEXT ifdef for experiments

Changes as of 9/09/94 -- version 8.2
    -- l_discount and l_tax are now fractional (per teleconference)
    -- money calculations moved to scaled integer math to clean up answer sets
    -- changed PR_FLT() to PR_MONEY to clarify usage
    -- portability changes for SYBASE: dbname --> db_name, STATUS --> DBGEN_STATUS
    -- added nations2 to dists.dss to handle qgen needs for now
    -- reintroduced #ifndef DOS
    -- reintroduced U2200 define to control kill_load()
    -- broke out nation and region separately in -T option
    -- updated dss.ddl based on mail from Berni

Changes as of 8/31/94 -- version 8.1
    -- scaling for clerks changed to 1000 (was 100)
    -- added qgen parameter for scale
    -- changed qgen parameter from s)tream to p)ermutation
    -- synced qgen parameter values with 8.0 spec
    -- corrected duplications in dists.dss

Changes as of 8/24/94 -- version 8.0
    -- added sparse keys to lineitem/order
    -- added varchar generation for comments/addresses
    -- added variable lineitems/orders
    -- removed ifdef for normalized code_tables
    -- included code for parameter generation and template->EQT routines
    -- updated README and Porting.Notes to reflect QGEN
    -- included DDL and RI examples from Berni

Changes as of 6/15/94 -- version 7.0b (numbers now match spec revision)
    -- rework of code tables to properly map nation/region; when compiled
       with -DCODE_TABLES, distributions are taken from code.dss and two
       additional fields are generated for customers and suppliers,
       [cs]_ncode and [cs]_rcode, immediately following [cs]_region
    -- replaced ifdefs around DEAD_DATA with their opposites; DEAD_DATA is
       now the default
    -- reviewed the code to confirm that it conforms to the 7.0 specification
    -- adjusted scale factors/rowcounts for 1 GB == sf1
    -- brought help message in line with current code
    -- fixed orders per customer at 10
    -- made suppkey scalable in lineitem/partsupp

Changes as of 4/25/94 -- version 1.5
    -- added customers with no orders; compile with -DDEAD_DATA to activate
       the change.
    -- added the code table for nation and region; compile with
       -DCODE_TABLES to activate the change.

Changes as of 3/17/94 -- version 1.41
    -- completed implementation of JULIAN_DAY after talks with Berni
    -- misc cleanup in usage/README files
    -- removed all tabs and capped line length at 75
    -- added -n option to allow naming of inline-loaded database

Changes as of 3/16/94 -- version 1.4
    -- prototyped julian day/month for query re-write work. Compile with
       -DJULIAN_DAY to enable
    -- removed gen_times() from driver.c
    -- added VMS ifdef to config.h to clean up fork/signal issues
    -- added ICL ifdef to config.h to clean up getopt() issues
    -- changed header file references from machine.h to config.h

Changes as of 3/2/94 -- version 1.31
    -- corrected format of C_NAME to match S_NAME and O_CLERK
    -- re-allowed fractional scale factors < 1 (updates not contiguous)
    -- added DSS_CONFIG environment variable
    -- reworked read_dist() to look for DSS_DIST in DSS_CONFIG
    -- updated the README file

Changes as of 2/16/94 -- version 1.3
    -- added command line options for parallel load and data set expansion
    -- changed dists.dss delimiter to | for portability
    -- limited scale factors to integer values
    -- added command line option for seed file generation
    -- added all seed files to distribution for SFs 1 - 10
    -- moved machine.h to config.h and added MAX_CHILDREN define
    -- added 'f' flag to options to allow renaming of output files
    -- added generation of SQL delete statements to match updates
       (Note: updates are still single-threaded; -C is cleared by -U)
    -- corrected field sizing in dsstypes.h typedefs to match v 6.4
    -- update percentage default set to 1%

Changes as of 12/3/93 -- version 1.2
    -- added command line option to adjust update percentage
    -- fixed update generation for proper primary key ordering
    -- renamed UUSR/PRC to RUSSIA/CHINA in dists.dss
    -- cleaned up phone number generation to be consistent regardless of
       order of evaluation
    -- adjusted size of lineitem comment to bring data in line with
       100 MB == SF=1

Changes as of 10/15/93
    -- added command line option for update data creation
    -- miscellaneous porting and cleanup changes
    -- reworked table generation to allow reuse for updates
    -- added comment field to tdefs structure
    -- added load_state and store_state to sync data gen and update gen

Changes as of 7/26/93
    -- combined loader and header stubs in load_stubs.c
    -- separated Revision History (this file) from README
    -- simplified makefile
    -- removed redundancies from colors distribution
    -- added getopt() for portability
    -- created Porting.Notes
    -- adjusted scaling rules
    -- added help option to the command line

Changes as of 2/26/93
    -- combined all typedefs in one header: dsstypes.h
    -- combined flat file generation in print.ec
    -- combined typedef population in build.ec
    -- added -P to control rowcnt scaling (P for percentage)
    -- added -D option for Direct data generation and added appropriate
       hooks in tdefs[] structure
    -- added -F option for flat file generation
    -- reused -T option (use -P 0.1 to build a test-size database); it now
       accepts suboptions c,o,p,s for single-table builds
    -- dropped -M option (scaling is now by rowcount)
    -- added -O option for optional controls. Currently defined:
           -O t -- generate optional time table and join fields in order/lineitem
           -O h -- generate headers for flat file output
           -O m -- generate fixed column-length output
    -- removed dynamic memory allocation, redundant calls to UnifInt, etc.,
       to improve performance

Changes as of 1/12/92
    -- julian() changed to handle orders->orderdate correctly
    -- rflag distributions corrected in dists.dss
    -- sea, gold removed from color distribution to clean up substring problems
    -- part->number and supplier-> adjusted for 1-based indexing
    -- time->day changed to be day of month, not day of year
    -- t.week changed to be week in year, not day of week

Changes as of 11/18/92
    -- checked line length and tabs for transmission
    -- another chapter in the portability wars: added #include "machine.h"
       to dss.h (which is included by everyone else). Any machine-specific
       porting changes should go here.
    -- fixed fixed-field formats to prevent double printing
    -- expanded PR_FLT formats to %010.2

Changes as of 10/21/92
    -- added fixed format and column header handling; users of headers will
       have to define the header functions to be called in int (*tdefs.header)()

Changes as of 10/09/92:
    -- added ANSI prototypes and recompiled with gcc -ansi. Users may need to
       change the CC definition in the makefile and the contents of CFLAGS to
       reflect their particular ANSI compiler.
    -- replaced all int references with long
    -- replaced all float references with double
    -- found and fixed odate/julian problem TS mentioned in 10/09 phone call

Changes as of 9/09/92:
    -- Park/Miller random number generator included
    -- clerk scaling changed to 100 * scale
    -- parts.name always built from 5 selections from colors set
    -- test scaling changed to ~60MB (TEST_SCALING == 10)
    -- logarithmic scaling removed
    -- mfgcost removed and retail/supplier cost bounds adjusted
    -- agg_str memory leak fixed
    -- independent RNG streams on a per-column basis

This is the revised data generator for DSS. The rewrite tried to accomplish
four things:
    (1) identify and isolate all the implicit assumptions about limits,
        bounds, ranges, distributions, etc.;
    (2) standardize the way any given table is generated/printed to ease
        understanding and maintenance;
    (3) bring the generator in line with the current work of the committee
        and the excellent spec that Indira put together;
    (4) provide an easy way to adjust distributions and string contents, and
        facilitate experimentation to get a better idea of the impact of
        data population changes.

The files included are:

    driver.c     ------- main and the calling routines for the generators
    dist.c       ------- should really be named dss_util.c; misc routines
    customer.c   ------- generation and print routines for the customer table
    orders.c     ------- generation and print routines for the order table
    parts.c      ------- generation and print routines for parts/partsupp
    suppliers.c  ------- generation and print routines for the suppliers table
    time.c       ------- generation and print routines for the time table
    customer.h   ------- associated header files; contain structure definitions
    dss.h                (dss.h holds the large number of assumptions and
    orders.h              values that have been used as IFDEFs)
    parts.h
    suppliers.h
    time.h
    dists.dss    ------- string selections and weights; used to build distributions

Running make will create an executable (using the compiler flags in CFLAGS,
the ld flags in LDFLAGS and the libraries in LIBS [-O, -s, and -lm by
default]) which will create flat files suitable for dbload. t