Part 2 - World's Simplest SQL Compiler and Virtual Machine

Part 1 - Introduction and Setting up the REPL

Part 3 - An In-Memory, Append-Only, Single-Table Database

We’re making a clone of sqlite. The “front-end” of sqlite is a SQL compiler that parses a string and outputs an internal representation called bytecode.

This bytecode is passed to the virtual machine, which executes it.

Figure: SQLite Architecture (https://www.sqlite.org/arch.html)

Breaking things into two steps like this has a couple of advantages:

  • Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors)
  • Allows compiling common queries once and caching the bytecode for improved performance

With this in mind, let’s refactor our main function and support two new keywords in the process:

     int main(int argc, char* argv[]) {
       InputBuffer* input_buffer = new_input_buffer();
       while (true) {
         print_prompt();
         read_input(input_buffer);
    -    if (strcmp(input_buffer->buffer, ".exit") == 0) {
    -      exit(EXIT_SUCCESS);
    -    } else {
    -      printf("Unrecognized command '%s'.\n", input_buffer->buffer);
    +    if (input_buffer->buffer[0] == '.') {
    +      switch (do_meta_command(input_buffer)) {
    +        case (META_COMMAND_SUCCESS):
    +          continue;
    +        case (META_COMMAND_UNRECOGNIZED_COMMAND):
    +          printf("Unrecognized command '%s'\n", input_buffer->buffer);
    +          continue;
    +      }
         }
    +
    +    Statement statement;
    +    switch (prepare_statement(input_buffer, &statement)) {
    +      case (PREPARE_SUCCESS):
    +        break;
    +      case (PREPARE_UNRECOGNIZED_STATEMENT):
    +        printf("Unrecognized keyword at start of '%s'.\n",
    +               input_buffer->buffer);
    +        continue;
    +    }
    +
    +    execute_statement(&statement);
    +    printf("Executed.\n");
       }
     }

Non-SQL statements like .exit are called “meta-commands”. They all start with a dot, so we check for them and handle them in a separate function.

Next, we add a step that converts the line of input into our internal representation of a statement. This is our hacky version of the sqlite front-end.

Lastly, we pass the prepared statement to execute_statement. This function will eventually become our virtual machine.

Notice that two of our new functions return enums indicating success or failure:

    enum MetaCommandResult_t {
      META_COMMAND_SUCCESS,
      META_COMMAND_UNRECOGNIZED_COMMAND
    };
    typedef enum MetaCommandResult_t MetaCommandResult;

    enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
    typedef enum PrepareResult_t PrepareResult;

“Unrecognized statement”? That seems a bit like an exception. But exceptions are bad (and C doesn’t even support them), so I’m using enum result codes wherever practical. The C compiler will complain if my switch statement doesn’t handle a member of the enum, so we can feel a little more confident we handle every result of a function. Expect more result codes to be added in the future.
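
To make the point about switch statements concrete, here is a minimal sketch. The helper `prepare_result_message` is a hypothetical name used only for illustration and is not part of the tutorial's code. The switch deliberately has no `default:` label. With GCC or Clang, the `-Wswitch` warning (included in `-Wall`) can then flag any enum member that is added later but not handled.

```c
#include <stdio.h>

enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
typedef enum PrepareResult_t PrepareResult;

/* Hypothetical helper (not in the tutorial): maps a result code to text.
 * There is no `default:` label, so a compiler run with -Wswitch can warn
 * when a new enum member is added and not handled here. */
const char* prepare_result_message(PrepareResult result) {
  switch (result) {
    case PREPARE_SUCCESS:
      return "ok";
    case PREPARE_UNRECOGNIZED_STATEMENT:
      return "unrecognized statement";
  }
  /* Only reached if `result` holds a value outside the enum. */
  return "unknown result";
}
```

Calling `prepare_result_message(PREPARE_UNRECOGNIZED_STATEMENT)` returns the string "unrecognized statement". Adding a `default:` branch would silence the warning, which is why the sketch leaves it out.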

do_meta_command is just a wrapper for existing functionality that leaves room for more commands:

    MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
      if (strcmp(input_buffer->buffer, ".exit") == 0) {
        exit(EXIT_SUCCESS);
      } else {
        return META_COMMAND_UNRECOGNIZED_COMMAND;
      }
    }
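
As a rough illustration of how this wrapper leaves room for more commands, the sketch below adds a second one. The `.help` command is hypothetical and does not appear in the tutorial. The `InputBuffer` here is a pared-down stand-in containing only the `buffer` field, assumed for the sake of a self-contained example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pared-down stand-in for the InputBuffer from Part 1 (assumption:
 * only the `buffer` field matters for this example). */
struct InputBuffer_t {
  char* buffer;
};
typedef struct InputBuffer_t InputBuffer;

enum MetaCommandResult_t {
  META_COMMAND_SUCCESS,
  META_COMMAND_UNRECOGNIZED_COMMAND
};
typedef enum MetaCommandResult_t MetaCommandResult;

MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
  if (strcmp(input_buffer->buffer, ".exit") == 0) {
    exit(EXIT_SUCCESS);
  } else if (strcmp(input_buffer->buffer, ".help") == 0) {
    /* Hypothetical command, added only for illustration. */
    printf("Meta-commands: .exit, .help\n");
    return META_COMMAND_SUCCESS;
  } else {
    return META_COMMAND_UNRECOGNIZED_COMMAND;
  }
}
```

Each new command is one more branch that returns `META_COMMAND_SUCCESS`, so the switch in `main` does not need to change.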

Our “prepared statement” right now just contains an enum with two possible values. It will contain more data as we allow parameters in statements:

    enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
    typedef enum StatementType_t StatementType;

    struct Statement_t {
      StatementType type;
    };
    typedef struct Statement_t Statement;

prepare_statement (our “SQL Compiler”) does not understand SQL right now. In fact, it only understands two words:

    PrepareResult prepare_statement(InputBuffer* input_buffer,
                                    Statement* statement) {
      if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
        statement->type = STATEMENT_INSERT;
        return PREPARE_SUCCESS;
      }
      if (strcmp(input_buffer->buffer, "select") == 0) {
        statement->type = STATEMENT_SELECT;
        return PREPARE_SUCCESS;
      }

      return PREPARE_UNRECOGNIZED_STATEMENT;
    }

Note that we use strncmp for “insert” since the “insert” keyword will be followed by data (e.g. insert 1 cstack foo@bar.com).
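
The difference between the two comparisons can be seen in isolation. In the sketch below, `matches_insert` and `matches_select` are hypothetical helper names that wrap the same calls `prepare_statement` makes. Because strncmp compares only the first six characters, it accepts any line that begins with "insert", while strcmp requires the whole line to equal "select".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers wrapping the comparisons from prepare_statement. */

/* True when the first 6 characters of `line` are "insert". */
bool matches_insert(const char* line) {
  return strncmp(line, "insert", 6) == 0;
}

/* True only when `line` is exactly "select". */
bool matches_select(const char* line) {
  return strcmp(line, "select") == 0;
}
```

With these helpers, `matches_insert("insert 1 cstack foo@bar.com")` is true and `matches_select("select *")` is false. One side effect of the prefix check is that `matches_insert("inserts")` is also true, so a stricter parser would need to look at what follows the keyword.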

Lastly, execute_statement contains a few stubs:

    void execute_statement(Statement* statement) {
      switch (statement->type) {
        case (STATEMENT_INSERT):
          printf("This is where we would do an insert.\n");
          break;
        case (STATEMENT_SELECT):
          printf("This is where we would do a select.\n");
          break;
      }
    }

Note that it doesn’t return any error codes because there’s nothing that could go wrong yet.
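
If execution could fail, the same result-code pattern would apply here too. The following is only a sketch of what that might look like. `ExecuteResult`, `EXECUTE_SUCCESS`, `EXECUTE_FAILURE`, and `execute_statement_checked` are hypothetical names chosen for illustration, and the tutorial's `execute_statement` still returns void at this point.

```c
#include <stdio.h>

enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
typedef enum StatementType_t StatementType;

struct Statement_t {
  StatementType type;
};
typedef struct Statement_t Statement;

/* Hypothetical result codes; not part of the tutorial's code. */
enum ExecuteResult_t { EXECUTE_SUCCESS, EXECUTE_FAILURE };
typedef enum ExecuteResult_t ExecuteResult;

/* Same stubs as execute_statement, but reporting a result code. */
ExecuteResult execute_statement_checked(Statement* statement) {
  switch (statement->type) {
    case STATEMENT_INSERT:
      printf("This is where we would do an insert.\n");
      return EXECUTE_SUCCESS;
    case STATEMENT_SELECT:
      printf("This is where we would do a select.\n");
      return EXECUTE_SUCCESS;
  }
  /* Reached only if `type` holds a value outside the enum. */
  return EXECUTE_FAILURE;
}
```

The caller in `main` could then switch on the returned value, just as it already does for `prepare_statement`.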

With these refactors, we now recognize two new keywords!

    ~ ./db
    db > insert foo bar
    This is where we would do an insert.
    Executed.
    db > delete foo
    Unrecognized keyword at start of 'delete foo'.
    db > select
    This is where we would do a select.
    Executed.
    db > .tables
    Unrecognized command '.tables'
    db > .exit
    ~

The skeleton of our database is taking shape… wouldn’t it be nice if it stored data? In the next part, we’ll implement insert and select, creating the world’s worst data store. In the meantime, here’s the entire diff from this part:

    @@ -10,6 +10,23 @@ struct InputBuffer_t {
     };
     typedef struct InputBuffer_t InputBuffer;
    +enum MetaCommandResult_t {
    +  META_COMMAND_SUCCESS,
    +  META_COMMAND_UNRECOGNIZED_COMMAND
    +};
    +typedef enum MetaCommandResult_t MetaCommandResult;
    +
    +enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
    +typedef enum PrepareResult_t PrepareResult;
    +
    +enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
    +typedef enum StatementType_t StatementType;
    +
    +struct Statement_t {
    +  StatementType type;
    +};
    +typedef struct Statement_t Statement;
    +
     InputBuffer* new_input_buffer() {
       InputBuffer* input_buffer = malloc(sizeof(InputBuffer));
       input_buffer->buffer = NULL;
    @@ -35,16 +52,66 @@ void read_input(InputBuffer* input_buffer) {
       input_buffer->buffer[bytes_read - 1] = 0;
     }
    +MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
    +  if (strcmp(input_buffer->buffer, ".exit") == 0) {
    +    exit(EXIT_SUCCESS);
    +  } else {
    +    return META_COMMAND_UNRECOGNIZED_COMMAND;
    +  }
    +}
    +
    +PrepareResult prepare_statement(InputBuffer* input_buffer,
    +                                Statement* statement) {
    +  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    +    statement->type = STATEMENT_INSERT;
    +    return PREPARE_SUCCESS;
    +  }
    +  if (strcmp(input_buffer->buffer, "select") == 0) {
    +    statement->type = STATEMENT_SELECT;
    +    return PREPARE_SUCCESS;
    +  }
    +
    +  return PREPARE_UNRECOGNIZED_STATEMENT;
    +}
    +
    +void execute_statement(Statement* statement) {
    +  switch (statement->type) {
    +    case (STATEMENT_INSERT):
    +      printf("This is where we would do an insert.\n");
    +      break;
    +    case (STATEMENT_SELECT):
    +      printf("This is where we would do a select.\n");
    +      break;
    +  }
    +}
    +
     int main(int argc, char* argv[]) {
       InputBuffer* input_buffer = new_input_buffer();
       while (true) {
         print_prompt();
         read_input(input_buffer);
    -    if (strcmp(input_buffer->buffer, ".exit") == 0) {
    -      exit(EXIT_SUCCESS);
    -    } else {
    -      printf("Unrecognized command '%s'.\n", input_buffer->buffer);
    +    if (input_buffer->buffer[0] == '.') {
    +      switch (do_meta_command(input_buffer)) {
    +        case (META_COMMAND_SUCCESS):
    +          continue;
    +        case (META_COMMAND_UNRECOGNIZED_COMMAND):
    +          printf("Unrecognized command '%s'\n", input_buffer->buffer);
    +          continue;
    +      }
         }
    +
    +    Statement statement;
    +    switch (prepare_statement(input_buffer, &statement)) {
    +      case (PREPARE_SUCCESS):
    +        break;
    +      case (PREPARE_UNRECOGNIZED_STATEMENT):
    +        printf("Unrecognized keyword at start of '%s'.\n",
    +               input_buffer->buffer);
    +        continue;
    +    }
    +
    +    execute_statement(&statement);
    +    printf("Executed.\n");
       }
     }

Original: https://cstack.github.io/db_tutorial/parts/part2.html