BSTIndex


    CS 2604 Project 3 Spring 2002


    BST Index

    For this project you will implement a simple database program that will support search and modify operations on a file containing simple records of the following form:

    unique key value K, an unsigned short integer, followed by
    string field length SLen, an unsigned short integer, followed by
    checksum, an integer, followed by
    string field S, consisting of SLen arbitrary characters and possible padding.

    The string field will contain at most 24 characters; if the actual string is shorter than that, the record will contain padding characters to fill the record out to a total of 32 bytes. The actual padding character is unspecified and should be of no concern to your implementation.

    The database file will consist of a sequence of these 32-byte records, stored in binary format. There is no stated limit on the number of records. It is guaranteed that no two records will contain the same key value.

    So far this is similar to the previous project. The primary difference is that the system will include a record index to support finding a specific record given its key value. The index will store (key, offset) pairs where the offset is the byte number at which the corresponding record begins in the database file. When performing a search, the system will pass the index the desired key value and the index will return the file offset for the matching record, if it exists.

    On program startup, the system will read the database file, block by block via the buffer pool, and build the index structure. There are many structures that could be used for the index. You will use a simple binary search tree (BST) as described in the course notes. This will not be a self-balancing tree, such as a splay or AVL tree, so there is the possibility that the index will provide suboptimal performance. That deficiency may be addressed in a later project.

    As before, the project assumes that the file of records may be too large to store all the records in primary memory at once. Therefore, you will also implement a general-purpose buffer pool that will mediate the disk operations and ideally reduce the number of individual disk accesses that must be performed. Note that if you implemented the buffer pool properly in the previous project, you can simply reuse it here.

    Program Invocation:

    Your program must take the names of the input and output files from the command line; failure to do this will irritate the person for whom you will demo your project. The program will be invoked as:

    BinaryBP

    If either of the specified input files does not exist, the program should print an appropriate error message and either exit or prompt the user for a correction.

    Data Structures:

    The primary data structure element of this project is a binary search tree (BST). Your implementation is under the following specific requirements:

    The BST must be encapsulated as a C++ template.
    The underlying structure must be linked, and you must use a C++ template for the tree nodes.
    The behavior of the BST must conform to the description given in class. Note that in this project, we do not allow entries with duplicate key values.
    For testing, your BST should have the ability to display itself to a specified output stream, as described in the notes. If you expect to receive help, be sure that your display function conforms to the formatting described in the course notes, but you may reverse the sides if you prefer.
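    The requirements above can be sketched roughly as follows. This is only an illustration, not the implementation from the course notes: the names BSTNode, BST, and indexSampleKeys are hypothetical, the display format (inorder, indented two spaces per level) is an assumption, and the keys and offsets are the first seven entries from the sample log later in this document.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Node template for a linked BST; each node holds one (key, value) entry.
template <typename Key, typename Value>
struct BSTNode {
    Key key;
    Value value;
    BSTNode* left;
    BSTNode* right;
    BSTNode(const Key& k, const Value& v)
        : key(k), value(v), left(nullptr), right(nullptr) {}
};

// Unbalanced BST template that rejects duplicate keys.
template <typename Key, typename Value>
class BST {
public:
    BST() : root(nullptr) {}
    ~BST() { clear(root); }
    BST(const BST&) = delete;
    BST& operator=(const BST&) = delete;

    // Returns false and leaves the tree unchanged if the key is present.
    bool insert(const Key& k, const Value& v) {
        BSTNode<Key, Value>** link = &root;
        while (*link != nullptr) {
            if (k == (*link)->key) return false;
            link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
        }
        *link = new BSTNode<Key, Value>(k, v);
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const Value* find(const Key& k) const {
        const BSTNode<Key, Value>* cur = root;
        while (cur != nullptr) {
            if (k == cur->key) return &cur->value;
            cur = (k < cur->key) ? cur->left : cur->right;
        }
        return nullptr;
    }

    // Inorder display, one "key:value" per line (format is an assumption).
    void display(std::ostream& out) const { display(root, 0, out); }

private:
    BSTNode<Key, Value>* root;

    static void display(const BSTNode<Key, Value>* n, int depth, std::ostream& out) {
        if (n == nullptr) return;
        display(n->left, depth + 1, out);
        out << std::string(2 * depth, ' ') << n->key << ':' << n->value << '\n';
        display(n->right, depth + 1, out);
    }

    static void clear(BSTNode<Key, Value>* n) {
        if (n == nullptr) return;
        clear(n->left);
        clear(n->right);
        delete n;
    }
};

// Hypothetical helper: index the first seven sample keys in file order,
// so that record i begins at byte offset 32 * i.
void indexSampleKeys(BST<unsigned short, long>& index) {
    const unsigned short keys[] = {905, 529, 303, 176, 175, 246, 467};
    for (long i = 0; i < 7; ++i) index.insert(keys[i], 32 * i);
}
```

    With those seven entries, a search for key 467 follows the path root 905, left to 529, left to 303, right to 467, and yields offset 192, which matches the trace in the sample log.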


    The system will also use a buffer pool to mediate transactions with the disk file, just as in the last project. All the requirements that were given there for the buffer pool still apply.
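    As a reminder of the general idea, here is a minimal buffer pool sketch. It makes several assumptions that this document does not state: least-recently-used replacement, a file length that is a multiple of the block size, and requests that never cross a block boundary. The names Buffer and BufferPool are hypothetical, and an in-memory byte vector stands in for the dB file so the sketch is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// One buffer: a copy of a file block plus bookkeeping.
struct Buffer {
    long blockStart;          // file offset at which the block begins
    bool dirty;               // true if modified since it was loaded
    std::vector<char> bytes;  // block contents
};

class BufferPool {
public:
    // 'disk' stands in for the dB file (assumption for this sketch only).
    BufferPool(std::vector<char>& disk, long blockSize, std::size_t slots)
        : disk_(disk), blockSize_(blockSize), slots_(slots),
          hits_(0), misses_(0), writebacks_(0) {}

    // Copy n bytes starting at file offset pos into dest.
    void read(long pos, std::size_t n, char* dest) {
        Buffer& b = fetch(pos);
        std::copy_n(b.bytes.begin() + (pos - b.blockStart), n, dest);
    }

    // Overwrite n bytes at file offset pos, in memory only.
    void write(long pos, std::size_t n, const char* src) {
        Buffer& b = fetch(pos);
        std::copy_n(src, n, b.bytes.begin() + (pos - b.blockStart));
        b.dirty = true;
    }

    // Write every modified block back to the file.
    void flush() {
        for (Buffer& b : pool_) writeBack(b);
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }
    int writebacks() const { return writebacks_; }

private:
    // Return the buffer holding pos, loading its block if necessary.
    Buffer& fetch(long pos) {
        const long start = pos - pos % blockSize_;
        for (auto it = pool_.begin(); it != pool_.end(); ++it) {
            if (it->blockStart == start) {
                ++hits_;
                pool_.splice(pool_.begin(), pool_, it);  // most recent first
                return pool_.front();
            }
        }
        ++misses_;
        if (pool_.size() == slots_) {  // evict the least recently used block
            writeBack(pool_.back());
            pool_.pop_back();
        }
        Buffer b;
        b.blockStart = start;
        b.dirty = false;
        b.bytes.assign(disk_.begin() + start, disk_.begin() + start + blockSize_);
        pool_.push_front(b);
        return pool_.front();
    }

    void writeBack(Buffer& b) {
        if (!b.dirty) return;
        std::copy(b.bytes.begin(), b.bytes.end(), disk_.begin() + b.blockStart);
        b.dirty = false;
        ++writebacks_;
    }

    std::vector<char>& disk_;
    long blockSize_;
    std::size_t slots_;
    std::list<Buffer> pool_;
    int hits_, misses_, writebacks_;
};
```

    With 64-byte blocks, the record at offset 160 lives in the block that begins at offset 128, which is why the sample log later in this document shows that block being loaded for key 246.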

    Your design must make appropriate use of classes. The specification may imply the existence of additional classes besides those involved in the implementation of the BST. Aside from nodes and buffer objects used only within an encapsulating class, data members of classes must be private.

    If an error occurs during the parsing of the input file, there's an error in your code. However, your program should still attempt to recover, by flushing the current input line and proceeding to the next input line.

    Other System Elements:

    There must be a controller that receives data and commands from the command file and triggers the appropriate responses in the other system elements. The controller may be purely procedural. The controller may take responsibility for verifying the existence of the input files, and opening and closing the various file streams. The controller should log the initial system configuration (described below). The controller should create the data manager and buffer pool objects, and trigger the creation of the BST index.

    There must be a data manager class, separate from the controller, which is responsible for managing the execution of the show and update commands described below. The data manager will deal with record objects, not with raw data. The data manager should log results from processing show and update commands.

    The data records should be encapsulated as objects when being processed by the data manager. There should also be a file manager object that handles reading the command file, and stripping out comments.

    It may be useful to have a translator class that handles the data conversions that must take place when the data manager and buffer pool communicate, but this is not required. It may also be useful for each buffer to be an object.

    The Binary Database File:

    The database (dB) file used in this project will be in binary format. The dB file will consist of a sequence of records, as described above. Each record will consist of the four sections described above, in that order. There will not be any header data at the beginning of the file, or any extra data at the end. A hex dump of a sample binary dB file is included later in this document. Here is an annotated hex display of the first record in that file:

    Pos    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    ------------------------------------------------------------------
    Value  82 13 0B 00 29 1C 00 00 64 78 66 7A 74 72 6C 64
           6F 6D 6A 20 20 20 20 20 20 20 20 20 20 20 20 20

    Key: 4994 (82 13)
    Length: 11 (0B 00)
    Check sum: 7209 (29 1C 00 00)
    First character: d (64)
    Last character: j (6A)
    Padding: the remaining bytes (20)

    A complete binary data file is available on the course website.

    Logically the dB file can be viewed as a sequence of records, and each record can be located by moving the file pointer to the correct offset within the file.

    Note that you are absolutely forbidden to simply read and store the entire file in memory. Your implementation must make use of a buffer pool, as described above.
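    The annotated record can be decoded along the following lines. This is a sketch under assumptions read off the hex display rather than stated outright in the text: 2-byte unsigned shorts, a 4-byte integer checksum, and least-significant-byte-first storage. The names Record and decodeRecord are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

const std::size_t RECORD_SIZE = 32;  // total bytes per record
const std::size_t MAX_STRING = 24;   // longest permitted string field

// Interpreted form of one record.
struct Record {
    std::uint16_t key;
    std::uint16_t length;
    std::int32_t checksum;
    std::string text;
};

// Decode RECORD_SIZE raw bytes, assuming little-endian field storage.
Record decodeRecord(const unsigned char* raw) {
    Record r;
    r.key = static_cast<std::uint16_t>(raw[0] | (raw[1] << 8));
    r.length = static_cast<std::uint16_t>(raw[2] | (raw[3] << 8));
    const std::uint32_t c = static_cast<std::uint32_t>(raw[4])
                          | (static_cast<std::uint32_t>(raw[5]) << 8)
                          | (static_cast<std::uint32_t>(raw[6]) << 16)
                          | (static_cast<std::uint32_t>(raw[7]) << 24);
    r.checksum = static_cast<std::int32_t>(c);
    // Guard against a corrupt length field before copying the string.
    const std::size_t n = r.length < MAX_STRING ? r.length : MAX_STRING;
    r.text.assign(reinterpret_cast<const char*>(raw + 8), n);
    return r;
}
```

    Applied to the 32 bytes shown in the annotated display, this yields key 4994, length 11, checksum 7209, and the string dxfztrldomj; the padding bytes are never examined.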


    Command File:

    The execution of the program will be driven by a script file, as in the first project. As before, lines beginning with a semicolon character ( ';' ) are comments and should be ignored.

    The command file will start with a header that may include comments and will definitely include a line specifying the buffer pool "geometry":

    buffer

    After that header, each non-comment line of the command file will specify one of the commands described below. Each line consists of a sequence of tokens which will be separated by single tab characters. A newline character will immediately follow the final token on each line. The command file is guaranteed to conform to this specification, so you don't need to worry about error-checking when reading it.
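    One possible way to split a command-file line into its tab-separated tokens while skipping comment lines is sketched below. The function name splitCommand is hypothetical, and the sketch assumes the newline has already been removed (for example by std::getline).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the tokens of one command-file line.
// Comment lines (starting with ';') and empty lines yield no tokens.
std::vector<std::string> splitCommand(const std::string& line) {
    std::vector<std::string> tokens;
    if (line.empty() || line[0] == ';') return tokens;
    std::istringstream in(line);
    std::string tok;
    while (std::getline(in, tok, '\t')) tokens.push_back(tok);
    return tokens;
}
```

    The controller or file manager could then dispatch on the first token and hand the remaining tokens to the data manager.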

    The following commands must be supported:

    show
        Log the values of all of the data fields of the indicated record. These should be interpreted, not written in raw binary form. If the key value corresponds to a non-existent record, log an error message.

        This command will result in transactions with the BST index to obtain the file offset, and with the buffer pool, which must determine whether the record is in memory or not, load the appropriate file block if necessary, and then return the record data for display.

    update
        Replace the data for the indicated record with the given data. Log a message confirming the update. If the given key corresponds to a non-existent record, add that record to the database. Note that this will require adding an entry to the index, and writing a record to the dB file (through the buffer pool, of course). Added records should be written to the end of the dB file.

        When performing the update, you will usually have to pad the record out to 32 bytes; you must use the asterisk character '*' for padding.

        This command will also result in transactions with the BST index and the buffer pool. If the targeted record is not in memory, the buffer pool must load the appropriate file block. Once the targeted record is in memory, the buffer pool must over-write its data (in memory) with the supplied data (in binary format).

    debug buffers
        Log the current contents of the buffer pool. The display should be neatly formatted. For each file block stored in the buffer pool, log the file offset at which the block begins and the bytes stored for that block, formatted as pairs of hex digits. (Code that can be adapted for this purpose will be posted along with this specification.)

    debug index
        Log the current contents of the BST index. The display should be neatly formatted, and reflect a (possibly modified) inorder traversal, as shown in the course notes. For each tree node you should display the key value and file offset stored there.

    exit
        Terminate program execution. The buffer pool should perform any necessary writebacks before the dB file is closed. Summary statistics, described below, should be logged. All dynamic memory should be properly deallocated.
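    For the update command, building the 32-byte binary image of a record with asterisk padding might look like the sketch below. The function name encodeRecord is hypothetical, and the field widths and least-significant-byte-first order are assumptions taken from the hex displays in this document. The sketch also assumes that the values in a line such as "update 467 3 42 abc" are the key, string length, checksum, and string, in record order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Build the 32-byte image of a record, padded with '*'.
// Strings longer than 24 characters are truncated.
std::array<unsigned char, 32> encodeRecord(std::uint16_t key,
                                           std::int32_t checksum,
                                           const std::string& text) {
    std::array<unsigned char, 32> raw;
    raw.fill('*');  // padding character required for updates
    const std::size_t len = text.size() < 24 ? text.size() : 24;
    const std::uint32_t c = static_cast<std::uint32_t>(checksum);
    raw[0] = static_cast<unsigned char>(key & 0xFF);
    raw[1] = static_cast<unsigned char>(key >> 8);
    raw[2] = static_cast<unsigned char>(len);  // len <= 24, high byte is 0
    raw[3] = 0;
    raw[4] = static_cast<unsigned char>(c & 0xFF);
    raw[5] = static_cast<unsigned char>((c >> 8) & 0xFF);
    raw[6] = static_cast<unsigned char>((c >> 16) & 0xFF);
    raw[7] = static_cast<unsigned char>((c >> 24) & 0xFF);
    for (std::size_t i = 0; i < len; ++i) {
        raw[8 + i] = static_cast<unsigned char>(text[i]);
    }
    return raw;
}
```

    The resulting array can be handed to the buffer pool, which overwrites the 32 bytes at the offset returned by the index, or at the end of the file when the key is new.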

    A sample command script is included later in this document.
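    For the debug buffers display, bytes can be rendered as pairs of hex digits roughly as follows. This is not the code posted with the specification, only a minimal sketch; the function name hexPairs is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Format n bytes as space-separated, two-digit, uppercase hex pairs.
std::string hexPairs(const unsigned char* bytes, std::size_t n) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) out << ' ';
        // Widen to unsigned so the byte prints as a number, not a character.
        out << std::setw(2) << static_cast<unsigned>(bytes[i]);
    }
    return out.str();
}
```

    Calling this once per 16 bytes of a block gives rows in the same style as the hex displays in this document.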


    Log File Description:

    Since this assignment will be graded by TAs, rather than the Curator, the format of the output is left up to you. Of course, your output should be clear, concise, well labeled, and correct. The first two lines should contain your name, section specification (e.g., CS 2604 11:15 MWF), and project title.

    The next section of the log file should contain some initialization information:

    the names of the dB, command, and log files
    the buffer pool configuration including the number of slots and the size of each buffer
    the number of records stored in the dB file (same as the number of nodes in the BST)

    The remainder of the log file output should come directly from your processing of the command file. You are required to echo each command that you process to the log file so that it's easy to determine which command each section of your output corresponds to. Each command should be numbered, starting with 1, and the output from each command should be well formatted, and delimited from the output resulting from processing other commands.

    A complete sample log is included later in this document.

    Submitting Your Program:

    You will submit a gzipped tar file containing your project to the Curator System (read the Student Guide), and it will be archived until you demo it for one of the GTAs. Instructions for submitting are contained in the Student Guide. You will find a list of the required contents for the zipped file on the course website. Follow the instructions there carefully; it is very common for students to suffer a loss of points (often major) because they failed to include the specified items.

    Be very careful to include all the necessary source code files. It is amazingly common for students to omit required header or cpp files, or to submit the wrong version of their program. In such a case, it is obviously impossible to perform a test of the submitted program unless the student is allowed to supply the missing files. When that happens, to be fair to other students, we must assess the late penalty that would apply at the time of the demo.

    To avoid such problems, once you've prepared your gzipped tar file for upload, copy it to a new location, unarchive it, build an executable and test that executable. If you do that you can at least be sure you're not submitting an old, incomplete version.

    You will be allowed up to five submissions for this assignment, in case you need to correct mistakes. Test your program thoroughly before submitting it. If you discover an error you may fix it and make another submission. Your last submission will be graded, so fixing an error after the due date will result in a late penalty.

    The submission client can be found at: http://eags.cs.vt.edu:8080/curator/

    Programming Standards:

    The GTAs will be carefully evaluating your source code on this assignment for programming style, so you should observe good practice. See the Programming Standards page on the course website for specific requirements that should be observed in this course.

    As always, you should practice good object-centered design and implementation.

    Evaluation:

    You will schedule a demo with your assigned GTA. At the demo, the TA will supply your submitted project, and you will perform a build and run your program on the supplied test data. The GTA will evaluate the correctness of your results. In addition, the GTA will evaluate your project for good internal documentation and software engineering practice.


    Pledge:

    Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the pledge statement provided with the earlier project specifications in the header comment for your main source code file.

    Sample command script:

    ; Script file for P3
    ;
    ; Buffer pool configuration:
    buffer 64 5
    ;
    ;
    debug index
    ;
    show 467
    show 246
    show 1402
    show 1577
    show 539
    debug buffers
    ;
    show 999
    debug buffers
    ;
    update 467 3 42 abc
    debug buffers
    ;
    ; Quit:
    exit


    Sample log output:

    Programmer: Bill McQuain
    CS 2604 Buffer Pool and Binary I/O

    Database file: Data.bin
    Command file: Script.txt
    Log file: Log.txt

    Number of buffers: 5
    Buffer size in bytes: 64

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1
    Command: debug index

    175:128
    176:96
    246:160
    303:64
    387:224
    467:192
    484:256
    529:32
    539:352
    609:320
    619:384
    670:288
    742:448
    842:416
    879:480
    905:0
    994:608
    1007:576
    1059:640
    1136:544
    1144:704
    1175:672
    1262:736
    1270:512
    1361:832
    1402:800
    1439:864
    1531:768
    1577:928
    1593:896
    1676:960
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2
    Command: show 467

    going to root 905:0
    going left to 529:32
    going left to 303:64
    going right to 467:192
    Record found.

    Record should be at offset 192
    Buffer pool adding new block in buffer #0
    Desired record found in buffer #0
    467 13 10168 vinyguyudmvzb
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3
    Command: show 246

    going to root 905:0
    going left to 529:32
    going left to 303:64
    going left to 176:96
    going right to 246:160
    Record found.

    Record should be at offset 160
    Buffer pool adding new block in buffer #1
    Desired record found in buffer #1
    246 7 3160 fksxdpy
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4


    Command: show 1402

    going to root 905:0
    going right to 1270:512
    going right to 1531:768
    going left to 1402:800
    Record found.

    Record should be at offset 800
    Buffer pool adding new block in buffer #2
    Desired record found in buffer #2
    1402 5 1605 vlnra
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 5
    Command: show 1577

    going to root 905:0
    going right to 1270:512
    going right to 1531:768
    going right to 1593:896
    going left to 1577:928
    Record found.

    Record should be at offset 928
    Buffer pool adding new block in buffer #3
    Desired record found in buffer #3
    1577 9 4824 prusempaj
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 6
    Command: show 539

    going to root 905:0
    going left to 529:32
    going right to 670:288
    going left to 609:320
    going left to 539:352
    Record found.

    Record should be at offset 352
    Buffer pool adding new block in buffer #4
    Desired record found in buffer #4
    539 14 11648 pqhbsiujwdawzq
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 7
    Command: debug buffers
    Buffer Offset Bytes
    -------------------------------------------------------------------------------------
    0      192    D3 01 0D 00 B8 27 00 00 76 69 6E 79 67 75 79 75
                  64 6D 76 7A 62 20 20 20 20 20 20 20 20 20 20 20
                  83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    1      128    AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                  6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                  F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    2      768    FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                  7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    3      896    39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                  63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                  29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                  6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    4      320    61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                  64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                  1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                  77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20

    -------------------------------------------------------------------------------------
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 8
    Command: show 999

    going to root 905:0
    going right to 1270:512
    going left to 1136:544


    going left to 1007:576
    going left to 994:608
    going right to empty subtree.

    No record with key value 999
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 9
    Command: debug buffers
    Buffer Offset Bytes
    -------------------------------------------------------------------------------------
    0      192    D3 01 0D 00 B8 27 00 00 76 69 6E 79 67 75 79 75
                  64 6D 76 7A 62 20 20 20 20 20 20 20 20 20 20 20
                  83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    1      128    AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                  6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                  F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    2      768    FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                  7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    3      896    39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                  63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                  29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                  6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    4      320    61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                  64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                  1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                  77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20

    -------------------------------------------------------------------------------------
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 10
    Command: update 467 3 42 abc
    Putting: 192 3 42 abc
    Writing data in buffer 0
    Record updated.
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11
    Command: debug buffers

    Buffer Offset Bytes
    -------------------------------------------------------------------------------------
    0      192    C0 00 03 00 2A 00 00 00 61 62 63 20 20 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                  83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    1      128    AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                  6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                  F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    2      768    FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                  7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    3      896    39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                  63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                  29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                  6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

    4      320    61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                  64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                  1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                  77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20

    -------------------------------------------------------------------------------------


    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 12
    Command: exit
    exit command
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Buffer pool cleaning up:
    Buffer pool writing block in buffer #0 to disk

    Hits: 1
    Misses: 5
    Writebacks: 1

    Hex dump of sample binary data file:

    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    ------------------------------------------------------------------
    89 03 16 00 E4 6C 00 00 63 7A 63 7A 62 6D 64 64
    6E 78 72 6F 6D 74 76 74 70 75 67 70 64 6B 20 20
    11 02 10 00 73 39 00 00 6B 74 71 64 74 67 66 75
    6A 63 75 74 70 66 66 6D 20 20 20 20 20 20 20 20
    2F 01 0D 00 DB 24 00 00 6C 64 72 64 66 74 6F 64
    65 62 6C 65 65 20 20 20 20 20 20 20 20 20 20 20
    B0 00 0F 00 9D 32 00 00 6C 63 61 70 6F 76 6D 64
    6E 71 6C 77 67 6D 63 20 20 20 20 20 20 20 20 20
    AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
    6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
    F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    . . .
    7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    51 05 12 00 47 49 00 00 6B 68 72 70 75 76 74 6C
    76 66 66 62 6B 6A 69 73 6B 7A 20 20 20 20 20 20
    9F 05 06 00 F0 08 00 00 65 70 77 68 70 69 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
    63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
    29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
    6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    8C 06 0E 00 4B 2E 00 00 7A 68 71 77 6A 6D 75 74
    70 74 61 6A 78 7A 20 20 20 20 20 20 20 20 20 20
    ------------------------------------------------------------------
    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F