Chapter 3 ( Code Generation ) - Instructions - 《Implementing a JIT Compiled Language with Haskell and LLVM》

Instructions

Now that we have the basic infrastructure in place we’ll wrap the raw llvm-hs AST nodes inside a collection of helper functions to push instructions onto the stack held within our monad.

Values in LLVM IR are either numbered sequentially (%0, %1, …) or given explicit names (%a, %foo, …). For example, the arguments to the following function are named values, while the result of the add instruction is numbered rather than named.

define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}

In llvm-hs both kinds of name are represented by a single sum type with the constructors UnName and Name. For most of our purposes we will simply use numbered expressions and map the numbers to identifiers within our symbol table. Every instruction we add increments an internal counter; to accomplish this we add a fresh name supply.

fresh :: Codegen Word
fresh = do
  i <- gets count
  modify $ \s -> s { count = 1 + i }
  return $ i + 1
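
To see what the supply produces, here is a minimal sketch that runs fresh outside the real code generator. It assumes the mtl State monad, and CodegenState here is a cut-down stand-in that holds only the counter; threeNames is a hypothetical name used for illustration.

```haskell
import Control.Monad.State (State, evalState, gets, modify)

-- Stand-in for the chapter's code generator state; only the counter is modelled.
newtype CodegenState = CodegenState { count :: Word }

-- Assumption: a plain State synonym instead of the chapter's Codegen newtype.
type Codegen = State CodegenState

-- Same definition as in the text above.
fresh :: Codegen Word
fresh = do
  i <- gets count
  modify $ \s -> s { count = 1 + i }
  return $ i + 1

-- Draw three numbers in a row, starting from a counter of zero.
threeNames :: [Word]
threeNames = evalState (sequence [fresh, fresh, fresh]) (CodegenState 0)
-- threeNames == [1, 2, 3]
```

Because fresh returns the incremented value, a counter that starts at zero hands out 1 first, which matches the %1 in the listing above.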

Throughout our code we will, however, also refer to named values within the module. These have a special data type, Name, with an associated IsString instance so that, with the OverloadedStrings extension enabled, string literals can be used as names without explicit conversion. For these we’ll create a second name supply: a map which guarantees that our block names are unique.

type Names = Map.Map String Int
uniqueName :: String -> Names -> (String, Names)
uniqueName nm ns =
  case Map.lookup nm ns of
    Nothing -> (nm,  Map.insert nm 1 ns)
    Just ix -> (nm ++ show ix, Map.insert nm (ix+1) ns)
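
As an illustration of how the map is threaded from one call to the next, the following sketch requests a handful of labels in sequence. The helper uniqueNames and the label strings are hypothetical and exist only for this example.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map

type Names = Map.Map String Int

-- Same definition as in the text above.
uniqueName :: String -> Names -> (String, Names)
uniqueName nm ns =
  case Map.lookup nm ns of
    Nothing -> (nm, Map.insert nm 1 ns)
    Just ix -> (nm ++ show ix, Map.insert nm (ix + 1) ns)

-- Hypothetical helper: thread the name supply through a list of requests.
uniqueNames :: [String] -> [String]
uniqueNames = snd . mapAccumL step Map.empty
  where
    step ns nm = let (nm', ns') = uniqueName nm ns in (ns', nm')

example :: [String]
example = uniqueNames ["entry", "if.then", "entry", "entry"]
-- example == ["entry", "if.then", "entry1", "entry2"]
```

One caveat follows directly from the definition: a request that already ends in a digit can clash with a generated name. For instance, uniqueNames ["entry", "entry", "entry1"] yields "entry1" twice, so the uniqueness guarantee relies on callers not requesting such names.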

Since we can now work with named LLVM values, we need a couple of functions for constructing references to them.

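-- Reference to a local (%name) value, typed as double.
local :: Name -> Operand
local = LocalReference double
-- Reference to a global (@name) value such as a function.
externf :: Name -> Operand
externf = ConstantOperand . C.GlobalReference double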

Our function externf emits a reference to a named global: either a top-level function defined in our module (@add) or an externally declared function (@putchar). For instance:

declare i32 @putchar(i32)
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}
define void @main() {
  %1 = call i32 @add(i32 0, i32 97)
  call i32 @putchar(i32 %1)
  ret void
}

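Since we’d like to refer to values by name, we’ll implement a simple symbol table as an association list, letting us bind variable names to operands and look them up later when they are used. Because new bindings are added at the front of the list and lookup returns the first match, a later assignment shadows an earlier one with the same name.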

assign :: String -> Operand -> Codegen ()
assign var x = do
  lcls <- gets symtab
  modify $ \s -> s { symtab = (var, x) : lcls }
getvar :: String -> Codegen Operand
getvar var = do
  syms <- gets symtab
  case lookup var syms of
    Just x  -> return x
    Nothing -> error $ "Local variable not in scope: " ++ show var
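
The lookup behaviour of the association list can be checked in isolation. This is a sketch under stated assumptions: Operand is replaced by a plain String, the state holds only the symbol table, and lookupVar is a hypothetical variant of getvar that returns a Maybe rather than calling error.

```haskell
import Control.Monad.State (State, evalState, gets, modify)

-- Stand-in for llvm-hs's Operand; a plain string keeps the sketch self-contained.
type Operand = String

-- Cut-down state holding only the symbol table.
newtype CodegenState = CodegenState { symtab :: [(String, Operand)] }

type Codegen = State CodegenState

-- Bind a variable by consing the pair onto the front of the list.
assign :: String -> Operand -> Codegen ()
assign var x = modify $ \s -> s { symtab = (var, x) : symtab s }

-- Hypothetical variant of getvar: reports a missing variable as Nothing.
lookupVar :: String -> Codegen (Maybe Operand)
lookupVar var = gets (lookup var . symtab)

shadowing :: (Maybe Operand, Maybe Operand)
shadowing = evalState action (CodegenState [])
  where
    action = do
      assign "x" "%1"
      assign "x" "%2"      -- shadows the earlier binding of x
      a <- lookupVar "x"
      b <- lookupVar "y"   -- never assigned
      return (a, b)
-- shadowing == (Just "%2", Nothing)
```

Returning Maybe is a design choice for the sketch only; the chapter's getvar aborts with an error message when the variable is not in scope.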

Now that we have a way of naming instructions, we’ll create an internal function that takes an llvm-hs AST node and pushes it onto the current basic block’s stack, returning a reference to the instruction’s left-hand side. What we emit comes in two flavors: ordinary instructions and terminators. Every basic block ends with exactly one terminator, and in the code we generate the last basic block of every function must end in a ret.

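-- Push an instruction onto the current block and return a reference to its result.
instr :: Instruction -> Codegen Operand
instr ins = do
  n <- fresh                -- next unnamed value number
  let ref = UnName n
  blk <- current
  let i = stack blk
  -- New instructions are consed onto the front of the block's list.
  modifyBlock (blk { stack = (ref := ins) : i })
  return $ local ref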
terminator :: Named Terminator -> Codegen (Named Terminator)
terminator trm = do
  blk <- current
  modifyBlock (blk { term = Just trm })
  return trm
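
Since instr conses each instruction onto the front of the list, a block's stack ends up newest-first. The sketch below models this with simplified stand-ins: instructions are plain strings, there is a single block, and Named is a local imitation of the llvm-hs type. It assumes that whatever finalises a block reverses the list to recover program order; that step is not shown in this section.

```haskell
import Control.Monad.State (State, execState, gets, modify)

-- Simplified stand-in for llvm-hs's Named; names are just numbers here.
data Named a = Word := a | Do a deriving (Eq, Show)

data BlockState = BlockState
  { stack :: [Named String]        -- instructions, newest first
  , term  :: Maybe (Named String)  -- terminator, if one has been set
  } deriving (Eq, Show)

data CodegenState = CodegenState { count :: Word, block :: BlockState }

type Codegen = State CodegenState

fresh :: Codegen Word
fresh = do
  i <- gets count
  modify $ \s -> s { count = i + 1 }
  return (i + 1)

-- Push an instruction onto the front of the list and return its number.
instr :: String -> Codegen Word
instr ins = do
  n <- fresh
  modify $ \s -> let b = block s in s { block = b { stack = (n := ins) : stack b } }
  return n

-- Record the block's terminator, replacing any previous one.
terminator :: Named String -> Codegen ()
terminator trm = modify $ \s -> s { block = (block s) { term = Just trm } }

-- Emit two instructions and a ret, then reverse to recover program order.
finished :: ([Named String], Maybe (Named String))
finished = (reverse (stack b), term b)
  where
    b = block (execState body (CodegenState 0 (BlockState [] Nothing)))
    body = do
      x <- instr "fadd double %a, %b"
      y <- instr ("fmul double %" ++ show x ++ ", %c")
      terminator (Do ("ret double %" ++ show y))
```

Here finished evaluates to the fadd followed by the fmul, with Just (Do "ret double %2") as the terminator. Note that each instruction refers to earlier results through the number that instr returned.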

Using the instr function, we now wrap the AST nodes for the basic arithmetic operations on floating-point values.

fadd :: Operand -> Operand -> Codegen Operand
fadd a b = instr $ FAdd NoFastMathFlags a b []
fsub :: Operand -> Operand -> Codegen Operand
fsub a b = instr $ FSub NoFastMathFlags a b []
fmul :: Operand -> Operand -> Codegen Operand
fmul a b = instr $ FMul NoFastMathFlags a b []
fdiv :: Operand -> Operand -> Codegen Operand
fdiv a b = instr $ FDiv NoFastMathFlags a b []

On top of the arithmetic functions we’ll add the basic control flow operations, which let us branch between basic blocks (unconditionally with br, conditionally with cbr) and return a value from a function with ret.

br :: Name -> Codegen (Named Terminator)
br val = terminator $ Do $ Br val []
cbr :: Operand -> Name -> Name -> Codegen (Named Terminator)
cbr cond tr fl = terminator $ Do $ CondBr cond tr fl []
ret :: Operand -> Codegen (Named Terminator)
ret val = terminator $ Do $ Ret (Just val) []

Finally we’ll add several “effect” instructions which invoke memory and evaluation side effects. The call instruction takes a function reference and a list of arguments and invokes the function at the current position; the toArgs helper it uses is not defined in this section and is assumed to pair each argument with an empty list of parameter attributes. The alloca instruction creates a pointer to a stack-allocated, uninitialized value of the given type, and store and load write to and read from such a pointer.

call :: Operand -> [Operand] -> Codegen Operand
call fn args = instr $ Call Nothing CC.C [] (Right fn) (toArgs args) [] []
alloca :: Type -> Codegen Operand
alloca ty = instr $ Alloca ty Nothing 0 []
store :: Operand -> Operand -> Codegen Operand
store ptr val = instr $ Store False ptr val Nothing 0 []
load :: Operand -> Codegen Operand
load ptr = instr $ Load False ptr Nothing 0 []