From AST to IR

Now that we have the infrastructure in place we can begin ingest our AST from Syntax.hs and construct a LLVM module from it. We will create a new Emit.hs module and spread the logic across two functions. The first codegenTop will emit toplevel constructions in modules ( functions and external definitions ) and will return a LLVM monad. The last instruction on the stack we’ll bind into the ret instruction to ensure and emit as the return value of the function. We’ll also sequentially assign each of the named arguments from the function to a stack allocated value with a reference in our symbol table.

  1. codegenTop :: S.Expr -> LLVM ()
  2. codegenTop (S.Function name args body) = do
  3. define double name fnargs bls
  4. where
  5. fnargs = toSig args
  6. bls = createBlocks $ execCodegen $ do
  7. entry <- addBlock entryBlockName
  8. setBlock entry
  9. forM args $ \a -> do
  10. var <- alloca double
  11. store var (local (AST.Name a))
  12. assign a var
  13. cgen body >>= ret
  14. codegenTop (S.Extern name args) = do
  15. external double name fnargs
  16. where fnargs = toSig args
  17. codegenTop exp = do
  18. define double "main" [] blks
  19. where
  20. blks = createBlocks $ execCodegen $ do
  21. entry <- addBlock entryBlockName
  22. setBlock entry
  23. cgen exp >>= ret
  24. toSig :: [String] -> [(AST.Type, AST.Name)]
  25. toSig = map (\x -> (double, AST.Name x))

The second is the expression level code generation (cgen) which will recursively walk the AST pushing instructions on the stack and changing the current block as needed. The simplest AST node is constant integers and floating point values which simply return constant values in LLVM IR.

  1. cgen :: S.Expr -> Codegen AST.Operand
  2. cgen (S.Float n) = return $ cons $ C.Float (F.Double n)

We need to reference local variables so we’ll invoke our getvar function in conjunction with a load use values. The conscious reader will intuit that this might result in an excessive amount of extraneous instructions pushing temporary values on the stack, something that we’ll address later with a simple optimization pass.

  1. cgen (S.Var x) = getvar x >>= load

For Call we’ll first evaluate each argument and then invoke the function with the values. Since our language only has double type values, this is trivial and we don’t need to worry too much.

  1. cgen (S.Call fn args) = do
  2. largs <- mapM cgen args
  3. call (externf (AST.Name fn)) largs

Finally for our operators we’ll construct a predefined association map of symbol strings to implementations of functions with the corresponding logic for the operation.

  1. binops = Map.fromList [
  2. ("+", fadd)
  3. , ("-", fsub)
  4. , ("*", fmul)
  5. , ("/", fdiv)
  6. , ("<", lt)
  7. ]

For the comparison operator we’ll invoke the uitofp which will convert a unsigned integer quantity to a floating point value. LLVM requires the unsigned single bit types as the values for comparison and test operations but we prefer to work entirely with doubles where possible.

  1. lt :: AST.Operand -> AST.Operand -> Codegen AST.Operand
  2. lt a b = do
  3. test <- fcmp FP.ULT a b
  4. uitofp double test

Just like the call instruction above we simply generate the code for operands and invoke the function we just looked up for the symbol.

  1. cgen (S.BinaryOp op a b) = do
  2. case Map.lookup op binops of
  3. Just f -> do
  4. ca <- cgen a
  5. cb <- cgen b
  6. f ca cb
  7. Nothing -> error "No such operator"

Putting everything together we find that we nice little minimal language that supports both function abstraction and basic arithmetic. The final step is to hook into LLVM bindings to generate a string representation of the LLVM IR which will print out the string on each action in the REPL. We’ll discuss these functions in more depth in the next chapter.

  1. codegen :: AST.Module -> [S.Expr] -> IO AST.Module
  2. codegen mod fns = withContext $ \context ->
  3. liftError $ withModuleFromAST context newast $ \m -> do
  4. llstr <- moduleLLVMAssembly m
  5. putStrLn llstr
  6. return newast
  7. where
  8. modn = mapM codegenTop fns
  9. newast = runLLVM mod modn

Running Main.hs we can observe our code generator in action.

  1. ready> def foo(a b) a*a + 2*a*b + b*b;
  2. ; ModuleID = 'my cool jit'
  3. define double @foo(double %a, double %b) {
  4. entry:
  5. %0 = fmul double %a, %a
  6. %1 = fmul double %a, 2.000000e+00
  7. %2 = fmul double %1, %b
  8. %3 = fadd double %0, %2
  9. %4 = fmul double %b, %b
  10. %5 = fadd double %4, %3
  11. ret double %5
  12. }
  13. ready> def bar(a) foo(a, 4.0) + bar(31337);
  14. define double @bar(double %a) {
  15. entry:
  16. %0 = alloca double
  17. store double %a, double* %0
  18. %1 = load double* %0
  19. %2 = call double @foo(double %1, double 4.000000e+00)
  20. %3 = call double @bar(double 3.133700e+04)
  21. %4 = fadd double %2, %3
  22. ret double %4
  23. }