Part 4 - Our First Tests (and Bugs)

Part 3 - An In-Memory, Append-Only, Single-Table Database

Part 5 - Persistence to Disk

We’ve got the ability to insert rows into our database and to print out all rows. Let’s take a moment to test what we’ve got so far.

I’m going to use rspec to write my tests because I’m familiar with it, and the syntax is fairly readable.

I’ll define a short helper to send a list of commands to our database program then make assertions about the output:

  1. describe 'database' do
  2. def run_script(commands)
  3. raw_output = nil
  4. IO.popen("./db", "r+") do |pipe|
  5. commands.each do |command|
  6. pipe.puts command
  7. end
  8. pipe.close_write
  9. # Read entire output
  10. raw_output = pipe.gets(nil)
  11. end
  12. raw_output.split("\n")
  13. end
  14. it 'inserts and retreives a row' do
  15. result = run_script([
  16. "insert 1 user1 person1@example.com",
  17. "select",
  18. ".exit",
  19. ])
  20. expect(result).to match_array([
  21. "db > Executed.",
  22. "db > (1, user1, person1@example.com)",
  23. "Executed.",
  24. "db > ",
  25. ])
  26. end
  27. end

This simple test makes sure we get back what we put in. And indeed it passes:

  1. bundle exec rspec
  2. .
  3. Finished in 0.00871 seconds (files took 0.09506 seconds to load)
  4. 1 example, 0 failures

Now it’s feasible to test inserting a large number of rows into the database:

  1. it 'prints error message when table is full' do
  2. script = (1..1401).map do |i|
  3. "insert #{i} user#{i} person#{i}@example.com"
  4. end
  5. script << ".exit"
  6. result = run_script(script)
  7. expect(result[-2]).to eq('db > Error: Table full.')
  8. end

Running tests again…

  1. bundle exec rspec
  2. ..
  3. Finished in 0.01553 seconds (files took 0.08156 seconds to load)
  4. 2 examples, 0 failures

Sweet, it works! Our db can hold 1400 rows right now because we set the maximum number of pages to 100, and 14 rows can fit in a page.

Reading through the code we have so far, I realized we might not handle storing text fields correctly. Easy to test with this example:

  1. it 'allows inserting strings that are the maximum length' do
  2. long_username = "a"*32
  3. long_email = "a"*255
  4. script = [
  5. "insert 1 #{long_username} #{long_email}",
  6. "select",
  7. ".exit",
  8. ]
  9. result = run_script(script)
  10. expect(result).to match_array([
  11. "db > Executed.",
  12. "db > (1, #{long_username}, #{long_email})",
  13. "Executed.",
  14. "db > ",
  15. ])
  16. end

And the test fails!

  1. Failures:
  2. 1) database allows inserting strings that are the maximum length
  3. Failure/Error: raw_output.split("\n")
  4. ArgumentError:
  5. invalid byte sequence in UTF-8
  6. # ./spec/main_spec.rb:14:in `split'
  7. # ./spec/main_spec.rb:14:in `run_script'
  8. # ./spec/main_spec.rb:48:in `block (2 levels) in <top (required)>'

If we try it ourselves, we’ll see that there’s some weird characters when we try to print out the row. (I’m abbreviating the long strings):

  1. db > insert 1 aaaaa... aaaaa...
  2. Executed.
  3. db > select
  4. (1, aaaaa...aaa\�, aaaaa...aaa\�)
  5. Executed.
  6. db >

What’s going on? If you take a look at our definition of a Row, we allocate exactly 32 bytes for username and exactly 255 bytes for email. But C strings are supposed to end with a null character, which we didn’t allocate space for. The solution is to allocate one additional byte:

  1. const uint32_t COLUMN_EMAIL_SIZE = 255;
  2. struct Row_t {
  3. uint32_t id;
  4. - char username[COLUMN_USERNAME_SIZE];
  5. - char email[COLUMN_EMAIL_SIZE];
  6. + char username[COLUMN_USERNAME_SIZE + 1];
  7. + char email[COLUMN_EMAIL_SIZE + 1];
  8. };
  9. typedef struct Row_t Row;

And indeed that fixes it:

  1. bundle exec rspec
  2. ...
  3. Finished in 0.0188 seconds (files took 0.08516 seconds to load)
  4. 3 examples, 0 failures

We should not allow inserting usernames or emails that are longer than column size. The spec for that looks like this:

  1. it 'prints error message if strings are too long' do
  2. long_username = "a"*33
  3. long_email = "a"*256
  4. script = [
  5. "insert 1 #{long_username} #{long_email}",
  6. "select",
  7. ".exit",
  8. ]
  9. result = run_script(script)
  10. expect(result).to match_array([
  11. "db > String is too long.",
  12. "db > Executed.",
  13. "db > ",
  14. ])
  15. end

In order to do this we need to upgrade our parser. As a reminder, we’re currently using scanf():

  1. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
  2. statement->type = STATEMENT_INSERT;
  3. int args_assigned = sscanf(
  4. input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
  5. statement->row_to_insert.username, statement->row_to_insert.email);
  6. if (args_assigned < 3) {
  7. return PREPARE_SYNTAX_ERROR;
  8. }
  9. return PREPARE_SUCCESS;
  10. }

But scanf has some disadvantages. If the string it’s reading is larger than the buffer it’s reading into, it will cause a buffer overflow and start writing into unexpected places. We want to check the length of each string before we copy it into a Row structure. And to do that, we need to divide the input by spaces.

I’m going to use strtok() to do that. I think it’s easiest to understand if you see it in action:

  1. +PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
  2. + statement->type = STATEMENT_INSERT;
  3. +
  4. + char* keyword = strtok(input_buffer->buffer, " ");
  5. + char* id_string = strtok(NULL, " ");
  6. + char* username = strtok(NULL, " ");
  7. + char* email = strtok(NULL, " ");
  8. +
  9. + if (id_string == NULL || username == NULL || email == NULL) {
  10. + return PREPARE_SYNTAX_ERROR;
  11. + }
  12. +
  13. + int id = atoi(id_string);
  14. + if (strlen(username) > COLUMN_USERNAME_SIZE) {
  15. + return PREPARE_STRING_TOO_LONG;
  16. + }
  17. + if (strlen(email) > COLUMN_EMAIL_SIZE) {
  18. + return PREPARE_STRING_TOO_LONG;
  19. + }
  20. +
  21. + statement->row_to_insert.id = id;
  22. + strcpy(statement->row_to_insert.username, username);
  23. + strcpy(statement->row_to_insert.email, email);
  24. +
  25. + return PREPARE_SUCCESS;
  26. +}
  27. +
  28. PrepareResult prepare_statement(InputBuffer* input_buffer,
  29. Statement* statement) {
  30. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
  31. + return prepare_insert(input_buffer, statement);
  32. - statement->type = STATEMENT_INSERT;
  33. - int args_assigned = sscanf(
  34. - input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
  35. - statement->row_to_insert.username, statement->row_to_insert.email);
  36. - if (args_assigned < 3) {
  37. - return PREPARE_SYNTAX_ERROR;
  38. - }
  39. - return PREPARE_SUCCESS;
  40. }

Calling strtok successively on the the input buffer breaks it into substrings by inserting a null character whenever it reaches a delimiter (space, in our case). It returns a pointer to the start of the substring.

We can call strlen() on each text value to see if it’s too long.

We can handle the error like we do any other error code:

  1. enum PrepareResult_t {
  2. PREPARE_SUCCESS,
  3. + PREPARE_STRING_TOO_LONG,
  4. PREPARE_SYNTAX_ERROR,
  5. PREPARE_UNRECOGNIZED_STATEMENT
  6. };
  1. switch (prepare_statement(input_buffer, &statement)) {
  2. case (PREPARE_SUCCESS):
  3. break;
  4. + case (PREPARE_STRING_TOO_LONG):
  5. + printf("String is too long.\n");
  6. + continue;
  7. case (PREPARE_SYNTAX_ERROR):
  8. printf("Syntax error. Could not parse statement.\n");
  9. continue;

Which makes our test pass

  1. bundle exec rspec
  2. ....
  3. Finished in 0.02284 seconds (files took 0.116 seconds to load)
  4. 4 examples, 0 failures

While we’re here, we might as well handle one more error case:

  1. it 'prints an error message if id is negative' do
  2. script = [
  3. "insert -1 cstack foo@bar.com",
  4. "select",
  5. ".exit",
  6. ]
  7. result = run_script(script)
  8. expect(result).to match_array([
  9. "db > ID must be positive.",
  10. "db > Executed.",
  11. "db > ",
  12. ])
  13. end
  1. enum PrepareResult_t {
  2. PREPARE_SUCCESS,
  3. + PREPARE_NEGATIVE_ID,
  4. PREPARE_STRING_TOO_LONG,
  5. PREPARE_SYNTAX_ERROR,
  6. PREPARE_UNRECOGNIZED_STATEMENT
  7. @@ -148,9 +147,6 @@ PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
  8. }
  9. int id = atoi(id_string);
  10. + if (id < 0) {
  11. + return PREPARE_NEGATIVE_ID;
  12. + }
  13. if (strlen(username) > COLUMN_USERNAME_SIZE) {
  14. return PREPARE_STRING_TOO_LONG;
  15. }
  16. @@ -230,9 +226,6 @@ int main(int argc, char* argv[]) {
  17. switch (prepare_statement(input_buffer, &statement)) {
  18. case (PREPARE_SUCCESS):
  19. break;
  20. + case (PREPARE_NEGATIVE_ID):
  21. + printf("ID must be positive.\n");
  22. + continue;
  23. case (PREPARE_STRING_TOO_LONG):
  24. printf("String is too long.\n");
  25. continue;

Alright, that’s enough testing for now. Next is a very important feature: persistence! We’re going to save our database to a file and read it back out again.

It’s gonna be great.

Here’s the complete diff for this part:

  1. enum PrepareResult_t {
  2. PREPARE_SUCCESS,
  3. + PREPARE_NEGATIVE_ID,
  4. + PREPARE_STRING_TOO_LONG,
  5. PREPARE_SYNTAX_ERROR,
  6. PREPARE_UNRECOGNIZED_STATEMENT
  7. };
  8. @@ -33,8 +35,8 @@ const uint32_t COLUMN_USERNAME_SIZE = 32;
  9. const uint32_t COLUMN_EMAIL_SIZE = 255;
  10. struct Row_t {
  11. uint32_t id;
  12. - char username[COLUMN_USERNAME_SIZE];
  13. - char email[COLUMN_EMAIL_SIZE];
  14. + char username[COLUMN_USERNAME_SIZE + 1];
  15. + char email[COLUMN_EMAIL_SIZE + 1];
  16. };
  17. typedef struct Row_t Row;
  18. @@ -133,17 +135,40 @@ MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
  19. }
  20. }
  21. +PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
  22. + statement->type = STATEMENT_INSERT;
  23. +
  24. + char* keyword = strtok(input_buffer->buffer, " ");
  25. + char* id_string = strtok(NULL, " ");
  26. + char* username = strtok(NULL, " ");
  27. + char* email = strtok(NULL, " ");
  28. +
  29. + if (id_string == NULL || username == NULL || email == NULL) {
  30. + return PREPARE_SYNTAX_ERROR;
  31. + }
  32. +
  33. + int id = atoi(id_string);
  34. + if (id < 0) {
  35. + return PREPARE_NEGATIVE_ID;
  36. + }
  37. + if (strlen(username) > COLUMN_USERNAME_SIZE) {
  38. + return PREPARE_STRING_TOO_LONG;
  39. + }
  40. + if (strlen(email) > COLUMN_EMAIL_SIZE) {
  41. + return PREPARE_STRING_TOO_LONG;
  42. + }
  43. +
  44. + statement->row_to_insert.id = id;
  45. + strcpy(statement->row_to_insert.username, username);
  46. + strcpy(statement->row_to_insert.email, email);
  47. +
  48. + return PREPARE_SUCCESS;
  49. +}
  50. +
  51. PrepareResult prepare_statement(InputBuffer* input_buffer,
  52. Statement* statement) {
  53. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
  54. - statement->type = STATEMENT_INSERT;
  55. - int args_assigned = sscanf(
  56. - input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
  57. - statement->row_to_insert.username, statement->row_to_insert.email);
  58. - if (args_assigned < 3) {
  59. - return PREPARE_SYNTAX_ERROR;
  60. - }
  61. - return PREPARE_SUCCESS;
  62. + return prepare_insert(input_buffer, statement);
  63. }
  64. if (strcmp(input_buffer->buffer, "select") == 0) {
  65. statement->type = STATEMENT_SELECT;
  66. @@ -205,6 +230,12 @@ int main(int argc, char* argv[]) {
  67. switch (prepare_statement(input_buffer, &statement)) {
  68. case (PREPARE_SUCCESS):
  69. break;
  70. + case (PREPARE_NEGATIVE_ID):
  71. + printf("ID must be positive.\n");
  72. + continue;
  73. + case (PREPARE_STRING_TOO_LONG):
  74. + printf("String is too long.\n");
  75. + continue;
  76. case (PREPARE_SYNTAX_ERROR):
  77. printf("Syntax error. Could not parse statement.\n");
  78. continue;

And we added tests:

  1. +describe 'database' do
  2. + def run_script(commands)
  3. + raw_output = nil
  4. + IO.popen("./db", "r+") do |pipe|
  5. + commands.each do |command|
  6. + pipe.puts command
  7. + end
  8. +
  9. + pipe.close_write
  10. +
  11. + # Read entire output
  12. + raw_output = pipe.gets(nil)
  13. + end
  14. + raw_output.split("\n")
  15. + end
  16. +
  17. + it 'inserts and retreives a row' do
  18. + result = run_script([
  19. + "insert 1 user1 person1@example.com",
  20. + "select",
  21. + ".exit",
  22. + ])
  23. + expect(result).to match_array([
  24. + "db > Executed.",
  25. + "db > (1, user1, person1@example.com)",
  26. + "Executed.",
  27. + "db > ",
  28. + ])
  29. + end
  30. +
  31. + it 'prints error message when table is full' do
  32. + script = (1..1401).map do |i|
  33. + "insert #{i} user#{i} person#{i}@example.com"
  34. + end
  35. + script << ".exit"
  36. + result = run_script(script)
  37. + expect(result[-2]).to eq('db > Error: Table full.')
  38. + end
  39. +
  40. + it 'allows inserting strings that are the maximum length' do
  41. + long_username = "a"*32
  42. + long_email = "a"*255
  43. + script = [
  44. + "insert 1 #{long_username} #{long_email}",
  45. + "select",
  46. + ".exit",
  47. + ]
  48. + result = run_script(script)
  49. + expect(result).to match_array([
  50. + "db > Executed.",
  51. + "db > (1, #{long_username}, #{long_email})",
  52. + "Executed.",
  53. + "db > ",
  54. + ])
  55. + end
  56. +
  57. + it 'prints error message if strings are too long' do
  58. + long_username = "a"*33
  59. + long_email = "a"*256
  60. + script = [
  61. + "insert 1 #{long_username} #{long_email}",
  62. + "select",
  63. + ".exit",
  64. + ]
  65. + result = run_script(script)
  66. + expect(result).to match_array([
  67. + "db > String is too long.",
  68. + "db > Executed.",
  69. + "db > ",
  70. + ])
  71. + end
  72. +
  73. + it 'prints an error message if id is negative' do
  74. + script = [
  75. + "insert -1 cstack foo@bar.com",
  76. + "select",
  77. + ".exit",
  78. + ]
  79. + result = run_script(script)
  80. + expect(result).to match_array([
  81. + "db > ID must be positive.",
  82. + "db > Executed.",
  83. + "db > ",
  84. + ])
  85. + end
  86. +end

Part 3 - An In-Memory, Append-Only, Single-Table Database

Part 5 - Persistence to Disk

原文: https://cstack.github.io/db_tutorial/parts/part4.html