Extended SQL Parser Design

This documentation explains the technical decision made in building a SQLparser in Go. It is used to parsed the extended SELECT syntax of SQL thatintegrates TensorFlow Estimators.

Lexer and Parser Generator

In 2001, when I was in graduate school, I defined an extended SQL syntax forquerying distributed relational databases, as part of the course project ofDistributed RDBMS by Prof. Li-zhu Zhou. I wrote the parser using bison (amodern version of yacc) and flex (a modern version oflex). yacc and lex generate C code;bison and flex generate C++ code. However, this time, I’d use Go.

I surveyed goyacc, astandard Go tool. The usage is very similar to that of yacc and bison.However, the Go toolchain doesn’t provide a tool like lex/flex.

Google revealed golex, which is out ofmaintenance.

The Mediumpostrecommends Ragel, which is a C++program and could generate Go lexer; however, it lacks documents.

Handwritten Lexer and Parser

Some documents, including thisonerecommends handwriting lexers. However, it doesn’t explain how to write theparser.

GoAcademy always provides high-quality tech blog posts. Thisone is from theauthor of InfluxDB. However, Istopped at where it explains wrapping a SQL statement as a string by anio.Reader, because it is obvious that we should keep the string as a string sothat that token strings could refer to the same memory storage of the SQLstatement.

Following a link in the above GoAcademy post, I found Rob Pike’s excellent talkon how to write a lexer in Go in 2011. Many works after that change Rob’simplementation somehow but always lead to longer and less comprehensiblecodebases.

The Choice

Therefore, I wrote the lexer and parser both following Rob Pike’s idea. Afterfew days work, I realized that:

  • I should borrow the idea from Rob to represent SQL statements as strings, butnot io.Reader as other work do,
  • but no need to use channels and goroutines at all, and
  • it is technically intractable to write a SQL lexer/parser manually.So, I switched to write a lexer manually, and to generate the parser usinggoyacc. During my work, I referred to thisexample andthe official yaccmanual for detailsabout operator association and precedence.