Chapter 19 Interfacing C with OCaml

19.1 Overview and compilation information

19.1.1 Declaring primitives

definition ::= external value-name : typexpr = external-declaration
external-declaration ::= string-literal [ string-literal [ string-literal ] ]

User primitives are declared in an implementation file or struct…end module expression using the external keyword:

  external name : type = C-function-name

This defines the value name name as a function with type type that executes by calling the given C function. For instance, here is how the input primitive is declared in the standard library module Pervasives:

  external input : in_channel -> bytes -> int -> int -> int
                 = "input"

Primitives with several arguments are always curried. The C function does not necessarily have the same name as the ML function.

External functions thus defined can be specified in interface files or sig…end signatures either as regular values

  val name : type

thus hiding their implementation as C functions, or explicitly as “manifest” external functions

  external name : type = C-function-name

The latter is slightly more efficient, as it allows clients of the module to call the C function directly instead of going through the corresponding OCaml function. On the other hand, it should not be used in library modules if they have side-effects at toplevel, as this direct call interferes with the linker’s algorithm for removing unused modules from libraries at link-time.

The arity (number of arguments) of a primitive is automatically determined from its OCaml type in the external declaration, by counting the number of function arrows in the type. For instance, input above has arity 4, and the input C function is called with four arguments. Similarly,

  external input2 : in_channel * bytes * int * int -> int = "input2"

has arity 1, and the input2 C function receives one argument (which is a quadruple of OCaml values).

Type abbreviations are not expanded when determining the arity of a primitive. For instance,

  type int_endo = int -> int
  external f : int_endo -> int_endo = "f"
  external g : (int -> int) -> (int -> int) = "f"

f has arity 1, but g has arity 2. This allows a primitive to return a functional value (as in the f example above): just remember to name the functional return type in a type abbreviation.

The language accepts external declarations with one or two flag strings in addition to the C function’s name. These flags are reserved for the implementation of the standard library.

19.1.2 Implementing primitives

User primitives with arity n ≤ 5 are implemented by C functions that take n arguments of type value, and return a result of type value. The type value is the type of the representations for OCaml values. It encodes objects of several base types (integers, floating-point numbers, strings, …) as well as OCaml data structures. The type value and the associated conversion functions and macros are described in detail below. For instance, here is the declaration for the C function implementing the input primitive:

  CAMLprim value input(value channel, value buffer, value offset, value length)
  {
    ...
  }

When the primitive function is applied in an OCaml program, the C function is called with the values of the expressions to which the primitive is applied as arguments. The value returned by the function is passed back to the OCaml program as the result of the function application.

User primitives with arity greater than 5 should be implemented by two C functions. The first function, to be used in conjunction with the bytecode compiler ocamlc, receives two arguments: a pointer to an array of OCaml values (the values for the arguments), and an integer which is the number of arguments provided. The other function, to be used in conjunction with the native-code compiler ocamlopt, takes its arguments directly. For instance, here are the two C functions for the 7-argument primitive Nat.add_nat:

  CAMLprim value add_nat_native(value nat1, value ofs1, value len1,
                                value nat2, value ofs2, value len2,
                                value carry_in)
  {
    ...
  }
  CAMLprim value add_nat_bytecode(value * argv, int argn)
  {
    return add_nat_native(argv[0], argv[1], argv[2], argv[3],
                          argv[4], argv[5], argv[6]);
  }

The names of the two C functions must be given in the primitive declaration, as follows:

  external name : type =
           bytecode-C-function-name native-code-C-function-name

For instance, in the case of add_nat, the declaration is:

  external add_nat: nat -> int -> int -> nat -> int -> int -> int -> int
                  = "add_nat_bytecode" "add_nat_native"

Implementing a user primitive is actually two separate tasks: on the one hand, decoding the arguments to extract C values from the given OCaml values, and encoding the return value as an OCaml value; on the other hand, actually computing the result from the arguments. Except for very simple primitives, it is often preferable to have two distinct C functions to implement these two tasks. The first function actually implements the primitive, taking native C values as arguments and returning a native C value. The second function, often called the “stub code”, is a simple wrapper around the first function that converts its arguments from OCaml values to C values, calls the first function, and converts the returned C value to an OCaml value. For instance, here is the stub code for the input primitive:

  CAMLprim value input(value channel, value buffer, value offset, value length)
  {
    return Val_long(getblock((struct channel *) channel,
                             &Byte(buffer, Long_val(offset)),
                             Long_val(length)));
  }

(Here, Val_long, Long_val and so on are conversion macros for the type value, which will be described later. The CAMLprim macro expands to the required compiler directives to ensure that the function is exported and accessible from OCaml.) The hard work is performed by the function getblock, which is declared as:

  long getblock(struct channel * channel, char * p, long n)
  {
    ...
  }

To write C code that operates on OCaml values, the following include files are provided:

Include file      Provides
caml/mlvalues.h   definition of the value type, and conversion macros
caml/alloc.h      allocation functions (to create structured OCaml objects)
caml/memory.h     miscellaneous memory-related functions and macros (for GC interface, in-place modification of structures, etc.)
caml/fail.h       functions for raising exceptions (see section 19.4.5)
caml/callback.h   callback from C to OCaml (see section 19.7)
caml/custom.h     operations on custom blocks (see section 19.9)
caml/intext.h     operations for writing user-defined serialization and deserialization functions for custom blocks (see section 19.9)
caml/threads.h    operations for interfacing in the presence of multiple threads (see section 19.12)

These files reside in the caml/ subdirectory of the OCaml standard library directory, which is returned by the command ocamlc -where (usually /usr/local/lib/ocaml or /usr/lib/ocaml).

By default, header files in the caml/ subdirectory give access only to the public interface of the OCaml runtime. It is possible to define the macro CAML_INTERNALS to get access to a lower-level interface, but this lower-level interface is more likely to change and break programs that use it.

Note: It is recommended to define the macro CAML_NAME_SPACE before including these header files. If you do not define it, the header files will also define short names (without the caml_ prefix) for most functions, which usually produces clashes with names defined by other C libraries that you might use. Including the header files without CAML_NAME_SPACE is only supported for backward compatibility.

19.1.3 Statically linking C code with OCaml code

The OCaml runtime system comprises three main parts: the bytecode interpreter, the memory manager, and a set of C functions that implement the primitive operations. Some bytecode instructions are provided to call these C functions, designated by their offset in a table of functions (the table of primitives).

In the default mode, the OCaml linker produces bytecode for the standard runtime system, with a standard set of primitives. References to primitives that are not in this standard set result in the “unavailable C primitive” error. (Unless dynamic loading of C libraries is supported – see section 19.1.4 below.)

In the “custom runtime” mode, the OCaml linker scans the object files and determines the set of required primitives. Then, it builds a suitable runtime system, by calling the native code linker with:

  • the table of the required primitives;
  • a library that provides the bytecode interpreter, the memory manager, and the standard primitives;
  • libraries and object code files (.o files) mentioned on the command line for the OCaml linker, that provide implementations for the user’s primitives.

This builds a runtime system with the required primitives. The OCaml linker generates bytecode for this custom runtime system. The bytecode is appended to the end of the custom runtime system, so that it will be automatically executed when the output file (custom runtime + bytecode) is launched.

To link in “custom runtime” mode, execute the ocamlc command with:

  • the -custom option;
  • the names of the desired OCaml object files (.cmo and .cma files);
  • the names of the C object files and libraries (.o and .a files) that implement the required primitives. Under Unix and Windows, a library named libname.a (respectively, .lib) residing in one of the standard library directories can also be specified as -cclib -lname.

If you are using the native-code compiler ocamlopt, the -custom flag is not needed, as the final linking phase of ocamlopt always builds a standalone executable. To build a mixed OCaml/C executable, execute the ocamlopt command with:

  • the names of the desired OCaml native object files (.cmx and .cmxa files);
  • the names of the C object files and libraries (.o, .a, .so or .dll files) that implement the required primitives.

Starting with Objective Caml 3.00, it is possible to record the -custom option as well as the names of C libraries in an OCaml library file .cma or .cmxa. For instance, consider an OCaml library mylib.cma, built from the OCaml object files a.cmo and b.cmo, which reference C code in libmylib.a. If the library is built as follows:

  ocamlc -a -o mylib.cma -custom a.cmo b.cmo -cclib -lmylib

users of the library can simply link with mylib.cma:

  ocamlc -o myprog mylib.cma ...

and the system will automatically add the -custom and -cclib -lmylib options, achieving the same effect as

  ocamlc -o myprog -custom a.cmo b.cmo ... -cclib -lmylib

The alternative is of course to build the library without extra options:

  ocamlc -a -o mylib.cma a.cmo b.cmo

and then ask users to provide the -custom and -cclib -lmylib options themselves at link-time:

  ocamlc -o myprog -custom mylib.cma ... -cclib -lmylib

The former alternative is more convenient for the final users of the library, however.

19.1.4 Dynamically linking C code with OCaml code

Starting with Objective Caml 3.03, an alternative to static linking of C code using the -custom option is provided. In this mode, the OCaml linker generates a pure bytecode executable (no embedded custom runtime system) that simply records the names of dynamically-loaded libraries containing the C code. The standard OCaml runtime system ocamlrun then loads these libraries dynamically, and resolves references to the required primitives, before executing the bytecode.

This facility is currently supported and known to work well under Linux, MacOS X, and Windows. It is supported, but not fully tested yet, under FreeBSD, Tru64, Solaris and Irix. It is not supported yet under other Unixes.

To dynamically link C code with OCaml code, the C code must first be compiled into a shared library (under Unix) or DLL (under Windows). This involves (1) compiling the C files with appropriate C compiler flags for producing position-independent code (when required by the operating system), and (2) building a shared library from the resulting object files. The resulting shared library or DLL file must be installed in a place where ocamlrun can find it later at program start-up time (see section 11.3). Finally (step 3), execute the ocamlc command with

  • the names of the desired OCaml object files (.cmo and .cma files);
  • the names of the C shared libraries (.so or .dll files) that implement the required primitives. Under Unix and Windows, a library named dllname.so (respectively, .dll) residing in one of the standard library directories can also be specified as -dllib -lname.

Do not set the -custom flag, otherwise you’re back to static linking as described in section 19.1.3. The ocamlmklib tool (see section 19.14) automates steps 2 and 3.

As in the case of static linking, it is possible (and recommended) to record the names of C libraries in an OCaml .cma library archive. Consider again an OCaml library mylib.cma, built from the OCaml object files a.cmo and b.cmo, which reference C code in dllmylib.so. If the library is built as follows:

  ocamlc -a -o mylib.cma a.cmo b.cmo -dllib -lmylib

users of the library can simply link with mylib.cma:

  ocamlc -o myprog mylib.cma ...

and the system will automatically add the -dllib -lmylib option, achieving the same effect as

  ocamlc -o myprog a.cmo b.cmo ... -dllib -lmylib

Using this mechanism, users of the library mylib.cma do not need to know that it references C code, nor whether this C code must be statically linked (using -custom) or dynamically linked.

19.1.5 Choosing between static linking and dynamic linking

Having described two different ways of linking C code with OCaml code, we now review the pros and cons of each, to help developers of mixed OCaml/C libraries decide.

The main advantage of dynamic linking is that it preserves the platform-independence of bytecode executables. That is, the bytecode executable contains no machine code, and can therefore be compiled on platform A and executed on other platforms B, C, …, as long as the required shared libraries are available on all these platforms. In contrast, executables generated by ocamlc -custom run only on the platform on which they were created, because they embed a custom-tailored runtime system specific to that platform. In addition, dynamic linking results in smaller executables.

Another advantage of dynamic linking is that the final users of the library do not need to have a C compiler, C linker, and C runtime libraries installed on their machines. This is no big deal under Unix and Cygwin, but many Windows users are reluctant to install Microsoft Visual C just to be able to do ocamlc -custom.

There are two drawbacks to dynamic linking. The first is that the resulting executable is not stand-alone: it requires the shared libraries, as well as ocamlrun, to be installed on the machine executing the code. If you wish to distribute a stand-alone executable, it is better to link it statically, using ocamlc -custom -ccopt -static or ocamlopt -ccopt -static. Dynamic linking also raises the “DLL hell” problem: some care must be taken to ensure that the right versions of the shared libraries are found at start-up time.

The second drawback of dynamic linking is that it complicates the construction of the library. The C compiler and linker flags to compile to position-independent code and build a shared library vary wildly between different Unix systems. Also, dynamic linking is not supported on all Unix systems, requiring a fall-back case to static linking in the Makefile for the library. The ocamlmklib command (see section 19.14) tries to hide some of these system dependencies.

In conclusion: dynamic linking is highly recommended under the native Windows port, because there are no portability problems and it is much more convenient for the end users. Under Unix, dynamic linking should be considered for mature, frequently used libraries because it enhances platform-independence of bytecode executables. For new or rarely-used libraries, static linking is much simpler to set up in a portable way.

19.1.6 Building standalone custom runtime systems

It is sometimes inconvenient to build a custom runtime system each time OCaml code is linked with C libraries, like ocamlc -custom does. For one thing, the building of the runtime system is slow on some systems (that have bad linkers or slow remote file systems); for another thing, the platform-independence of bytecode files is lost, forcing one ocamlc -custom link per platform of interest.

An alternative to ocamlc -custom is to build separately a custom runtime system integrating the desired C libraries, then generate “pure” bytecode executables (not containing their own runtime system) that can run on this custom runtime. This is achieved by the -make-runtime and -use-runtime flags to ocamlc. For example, to build a custom runtime system integrating the C parts of the “Unix” and “Threads” libraries, do:

  ocamlc -make-runtime -o /home/me/ocamlunixrun unix.cma threads.cma

To generate a bytecode executable that runs on this runtime system, do:

  ocamlc -use-runtime /home/me/ocamlunixrun -o myprog \
          unix.cma threads.cma your .cmo and .cma files

The bytecode executable myprog can then be launched as usual: myprog args or /home/me/ocamlunixrun myprog args.

Notice that the bytecode libraries unix.cma and threads.cma must be given twice: when building the runtime system (so that ocamlc knows which C primitives are required) and also when building the bytecode executable (so that the bytecode from unix.cma and threads.cma is actually linked in).

19.2 The value type

All OCaml objects are represented by the C type value, defined in the include file caml/mlvalues.h, along with macros to manipulate values of that type. An object of type value is either:

  • an unboxed integer;
  • a pointer to a block inside the heap (such as the blocks allocated through one of the caml_alloc_* functions below);
  • a pointer to an object outside the heap (e.g., a pointer to a block allocated by malloc, or to a C variable).

19.2.1 Integer values

Integer values encode 63-bit signed integers (31-bit on 32-bit architectures). They are unboxed (unallocated).
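
As an illustration of what “unboxed” means here, the following self-contained C sketch models the encoding. It assumes the usual tagging scheme, in which an integer n is stored in a machine word as 2n + 1 so that its low bit distinguishes it from a word-aligned pointer; this detail is not spelled out in the text above and is stated here as an assumption. The names model_value, model_val_long, model_long_val and model_is_long are hypothetical stand-ins; real stub code should use value, Val_long, Long_val and Is_long from caml/mlvalues.h.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: hypothetical stand-in for the "value" type of
   <caml/mlvalues.h>, one machine word wide. */
typedef intptr_t model_value;

/* Encode n as 2n + 1. The shift is done on the unsigned type so that
   negative n does not trigger undefined behaviour. */
model_value model_val_long(intptr_t n)
{
    /* n must fit in one bit less than a machine word. */
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return (model_value)(((uintptr_t)n << 1) | 1u);
}

/* Decode by shifting the tag bit out. This relies on right shift of a
   negative signed integer being arithmetic, which is implementation-defined
   in C but holds on mainstream compilers. */
intptr_t model_long_val(model_value v)
{
    return v >> 1;
}

/* An encoded integer always has its low bit set. */
int model_is_long(model_value v)
{
    return (v & 1) != 0;
}
```

Under these assumptions, model_val_long(42) yields the word 85 and model_long_val(85) yields 42 again. One bit of the word is spent on the tag, which is why 63 bits (or 31) remain for the integer itself.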

19.2.2 Blocks

Blocks in the heap are garbage-collected, and therefore have strict structure constraints. Each block includes a header containing the size of the block (in words), and the tag of the block. The tag governs how the contents of the blocks are structured. A tag lower than No_scan_tag indicates a structured block, containing well-formed values, which is recursively traversed by the garbage collector. A tag greater than or equal to No_scan_tag indicates a raw block, whose contents are not scanned by the garbage collector. For the benefit of ad-hoc polymorphic primitives such as equality and structured input-output, structured and raw blocks are further classified according to their tags as follows:

Tag                  Contents of the block
0 to No_scan_tag−1   A structured block (an array of OCaml objects). Each field is a value.
Closure_tag          A closure representing a functional value. The first word is a pointer to a piece of code, the remaining words are values containing the environment.
String_tag           A character string or a byte sequence.
Double_tag           A double-precision floating-point number.
Double_array_tag     An array or record of double-precision floating-point numbers.
Abstract_tag         A block representing an abstract datatype.
Custom_tag           A block representing an abstract datatype with user-defined finalization, comparison, hashing, serialization and deserialization functions attached.

19.2.3 Pointers outside the heap

Any word-aligned pointer to an address outside the heap can be safely cast to and from the type value. This includes pointers returned by malloc, and pointers to C variables (of size at least one word) obtained with the & operator.

Caution: if a pointer returned by malloc is cast to the type value and returned to OCaml, explicit deallocation of the pointer using free is potentially dangerous, because the pointer may still be accessible from the OCaml world. Worse, the memory space deallocated by free can later be reallocated as part of the OCaml heap; the pointer, formerly pointing outside the OCaml heap, now points inside the OCaml heap, and this can crash the garbage collector. To avoid these problems, it is preferable to wrap the pointer in an OCaml block with tag Abstract_tag or Custom_tag.

19.3 Representation of OCaml data types

This section describes how OCaml data types are encoded in the value type.

19.3.1 Atomic types

OCaml type   Encoding
int          Unboxed integer values.
char         Unboxed integer values (ASCII code).
float        Blocks with tag Double_tag.
bytes        Blocks with tag String_tag.
string       Blocks with tag String_tag.
int32        Blocks with tag Custom_tag.
int64        Blocks with tag Custom_tag.
nativeint    Blocks with tag Custom_tag.

19.3.2 Tuples and records

Tuples are represented by pointers to blocks, with tag 0.

Records are also represented by zero-tagged blocks. The ordering of labels in the record type declaration determines the layout of the record fields: the value associated to the label declared first is stored in field 0 of the block, the value associated to the second label goes in field 1, and so on.

As an optimization, records whose fields all have static type float are represented as arrays of floating-point numbers, with tag Double_array_tag. (See the section below on arrays.)

As another optimization, unboxable record types are represented specially; unboxable record types are the immutable record types that have only one field. An unboxable type will be represented in one of two ways: boxed or unboxed. Boxed record types are represented as described above (by a block with tag 0 or Double_array_tag). An unboxed record type is represented directly by the value of its field (i.e. there is no block to represent the record itself).

The representation is chosen according to the following, in decreasing order of priority:

  • An attribute ([@@boxed] or [@@unboxed]) on the type declaration.
  • A compiler option (-unboxed-types or -no-unboxed-types).
  • The default representation. In the present version of OCaml, the default is the boxed representation.

19.3.3 Arrays

Arrays of integers and pointers are represented like tuples, that is, as pointers to blocks tagged 0. They are accessed with the Field macro for reading and the caml_modify function for writing.

Arrays of floating-point numbers (type float array) have a special, unboxed, more efficient representation. These arrays are represented by pointers to blocks with tag Double_array_tag. They should be accessed with the Double_field and Store_double_field macros.
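
A minimal sketch of reading an integer array (the first case above) from C follows. To stay compilable without the OCaml headers it defines its own stand-ins (demo_value, DEMO_IS_BLOCK, DEMO_WOSIZE, DEMO_FIELD, DEMO_LONG_VAL), all hypothetical. They mimic value, Is_block, Wosize_val, Field and Long_val from caml/mlvalues.h under the assumption that the block size is stored above the low 10 bits of the header word that precedes field 0, and that integers are stored as 2n + 1. In a real stub, include the header and use the real macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the definitions of <caml/mlvalues.h>.
   The header layout assumed by DEMO_WOSIZE is an assumption of this model. */
typedef intptr_t demo_value;
#define DEMO_IS_BLOCK(v)  (((v) & 1) == 0)
#define DEMO_LONG_VAL(v)  ((v) >> 1)
#define DEMO_FIELD(v, i)  (((demo_value *)(v))[i])
#define DEMO_WOSIZE(v)    ((size_t)((uintptr_t)DEMO_FIELD(v, -1) >> 10))

/* Sum the elements of an OCaml "int array": a block tagged 0 whose
   fields are unboxed integers. */
intptr_t demo_sum_int_array(demo_value arr)
{
    assert(DEMO_IS_BLOCK(arr));          /* an array is never an immediate integer */
    size_t n = DEMO_WOSIZE(arr);         /* number of fields, header excluded */
    intptr_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += DEMO_LONG_VAL(DEMO_FIELD(arr, i));
    return sum;
}
```

The loop only reads fields, so no write barrier is involved; storing into such an array would go through caml_modify as stated above. A float array must not be traversed this way, since its elements are raw doubles rather than values.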

19.3.4 Concrete data types

Constructed terms are represented either by unboxed integers (for constant constructors) or by blocks whose tag encodes the constructor (for non-constant constructors). The constant constructors and the non-constant constructors for a given concrete type are numbered separately, starting from 0, in the order in which they appear in the concrete type declaration. A constant constructor is represented by the unboxed integer equal to its constructor number. A non-constant constructor declared with n arguments is represented by a block of size n, tagged with the constructor number; the n fields contain its arguments. Example:

Constructed term   Representation
()                 Val_int(0)
false              Val_int(0)
true               Val_int(1)
[]                 Val_int(0)
h::t               Block with size = 2 and tag = 0; first field contains h, second field t.

As a convenience, caml/mlvalues.h defines the macros Val_unit, Val_false and Val_true to refer to (), false and true.

The following example illustrates the assignment of integers and block tags to constructors:

  type t =
    | A             (* First constant constructor -> integer "Val_int(0)" *)
    | B of string   (* First non-constant constructor -> block with tag 0 *)
    | C             (* Second constant constructor -> integer "Val_int(1)" *)
    | D of bool     (* Second non-constant constructor -> block with tag 1 *)
    | E of t * t    (* Third non-constant constructor -> block with tag 2 *)
As an optimization, unboxable concrete data types are represented specially; a concrete data type is unboxable if it has exactly one constructor and this constructor has exactly one argument. Unboxable concrete data types are represented in the same ways as unboxable record types: see the description in section 19.3.2.

19.3.5 Objects

Objects are represented as blocks with tag Object_tag. The first field of the block refers to the object’s class and associated method suite, in a format that cannot easily be exploited from C. The second field contains a unique object ID, used for comparisons. The remaining fields of the object contain the values of the instance variables of the object. It is unsafe to access instance variables directly, as the type system provides no guarantee about the instance variables contained by an object.

One may extract a public method from an object using the C function caml_get_public_method (declared in <caml/mlvalues.h>). Since public method tags are hashed in the same way as variant tags, and methods are functions taking self as first argument, if you want to do the method call foo#bar from the C side, you should call:

  caml_callback(caml_get_public_method(foo, caml_hash_variant("bar")), foo);

19.3.6 Polymorphic variants

Like constructed terms, polymorphic variant values are represented either as integers (for polymorphic variants without argument), or as blocks (for polymorphic variants with an argument). Unlike constructed terms, variant constructors are not numbered starting from 0, but identified by a hash value (an OCaml integer), as computed by the C function caml_hash_variant (declared in <caml/mlvalues.h>): the hash value for a variant constructor named, say, VConstr is caml_hash_variant("VConstr").

The variant value `VConstr is represented by caml_hash_variant("VConstr"). The variant value `VConstr(v) is represented by a block of size 2 and tag 0, with field number 0 containing caml_hash_variant("VConstr") and field number 1 containing v.

Unlike constructed values, polymorphic variant values taking several arguments are not flattened. That is, `VConstr(v, w) is represented by a block of size 2, whose field number 1 contains the representation of the pair (v, w), rather than a block of size 3 containing v and w in fields 1 and 2.

19.4 Operations on values

19.4.1 Kind tests

  • Is_long(v) is true if value v is an immediate integer, false otherwise.
  • Is_block(v) is true if value v is a pointer to a block, and false if it is an immediate integer.

19.4.2 Operations on integers

  • Val_long(l) returns the value encoding the long int l.
  • Long_val(v) returns the long int encoded in value v.
  • Val_int(i) returns the value encoding the int i.
  • Int_val(v) returns the int encoded in value v.
  • Val_bool(x) returns the OCaml boolean representing the truth value of the C integer x.
  • Bool_val(v) returns 0 if v is the OCaml boolean false, 1 if v is true.
  • Val_true, Val_false represent the OCaml booleans true and false.

19.4.3 Accessing blocks

  • Wosize_val(v) returns the size of the block v, in words, excluding the header.
  • Tag_val(v) returns the tag of the block v.
  • Field(v, n) returns the value contained in the nth field of the structured block v. Fields are numbered from 0 to Wosize_val(v)−1.
  • Store_field(b, n, v) stores the value v in the field number n of value b, which must be a structured block.
  • Code_val(v) returns the code part of the closure v.
  • caml_string_length(v) returns the length (number of bytes) of the string or byte sequence v.
  • Byte(v, n) returns the nth byte of the string or byte sequence v, with type char. Bytes are numbered from 0 to caml_string_length(v)−1.
  • Byte_u(v, n) returns the nth byte of the string or byte sequence v, with type unsigned char. Bytes are numbered from 0 to caml_string_length(v)−1.
  • String_val(v) returns a pointer to the first byte of the string v, with type char * or, when OCaml is configured with -force-safe-string, with type const char *. This pointer is a valid C string: there is a null byte after the last byte in the string. However, OCaml strings can contain embedded null bytes, which will confuse the usual C functions over strings.
  • Bytes_val(v) returns a pointer to the first byte of the byte sequence v, with type unsigned char *.
  • Double_val(v) returns the floating-point number contained in value v, with type double.
  • Double_field(v, n) returns the nth element of the array of floating-point numbers v (a block tagged Double_array_tag).
  • Store_double_field(v, n, d) stores the double precision floating-point number d in the nth element of the array of floating-point numbers v.
  • Data_custom_val(v) returns a pointer to the data part of the custom block v. This pointer has type void * and must be cast to the type of the data contained in the custom block.
  • Int32_val(v) returns the 32-bit integer contained in the int32 v.
  • Int64_val(v) returns the 64-bit integer contained in the int64 v.
  • Nativeint_val(v) returns the long integer contained in the nativeint v.
  • caml_field_unboxed(v) returns the value of the field of a value v of any unboxed type (record or concrete data type).
  • caml_field_boxed(v) returns the value of the field of a value v of any boxed type (record or concrete data type).
  • caml_field_unboxable(v) calls either caml_field_unboxed or caml_field_boxed according to the default representation of unboxable types in the current version of OCaml.

The expressions Field(v, n), Byte(v, n) and Byte_u(v, n) are valid l-values. Hence, they can be assigned to, resulting in an in-place modification of value v. Assigning directly to Field(v, n) must be done with care to avoid confusing the garbage collector (see below).
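
The String_val entry in the list above warns that OCaml strings may contain embedded null bytes. One way to guard against this before handing such a pointer to a C function that expects a C string is sketched below in plain C. The helper name has_embedded_nul is hypothetical; in a stub, its two arguments would typically be String_val(v) and caml_string_length(v).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the len-byte buffer s contains a null byte among its
   len bytes, 0 otherwise. The terminating null byte that follows an
   OCaml string is outside these len bytes and is not counted. */
int has_embedded_nul(const char *s, size_t len)
{
    assert(s != NULL);
    return memchr(s, '\0', len) != NULL;
}
```

For the 5-byte string "ab\0cd" the helper returns 1, whereas strlen on the same pointer would report 2 and silently drop the tail. A stub could, for example, call caml_invalid_argument (section 19.4.5) when the check fails rather than pass a truncated string on.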

19.4.4 Allocating blocks

Simple interface

  • Atom(t) returns an “atom” (zero-sized block) with tag t. Zero-sized blocks are preallocated outside of the heap. It is incorrect to try and allocate a zero-sized block using the functions below. For instance, Atom(0) represents the empty array.
  • caml_alloc(n, t) returns a fresh block of size n with tag t. If t is less than No_scan_tag, then the fields of the block are initialized with a valid value in order to satisfy the GC constraints.
  • caml_alloc_tuple(n) returns a fresh block of size n words, with tag 0.
  • caml_alloc_string(n) returns a byte sequence (or string) value of length n bytes. The sequence initially contains uninitialized bytes.
  • caml_alloc_initialized_string(n, p) returns a byte sequence (or string) value of length n bytes. The value is initialized from the n bytes starting at address p.
  • caml_copy_string(s) returns a string or byte sequence value containing a copy of the null-terminated C string s (a char *).
  • caml_copy_double(d) returns a floating-point value initialized with the double d.
  • caml_copy_int32(i), caml_copy_int64(i) and caml_copy_nativeint(i) return a value of OCaml type int32, int64 and nativeint, respectively, initialized with the integer i.
  • caml_alloc_array(f, a) allocates an array of values, calling function f over each element of the input array a to transform it into a value. The array a is an array of pointers terminated by the null pointer. The function f receives each pointer as argument, and returns a value. The zero-tagged block returned by caml_alloc_array(f, a) is filled with the values returned by the successive calls to f. (This function must not be used to build an array of floating-point numbers.)
  • caml_copy_string_array(p) allocates an array of strings or byte sequences, copied from the pointer to a string array p (a char **). p must be NULL-terminated.
  • caml_alloc_float_array(n) allocates an array of floating point numbers of size n. The array initially contains uninitialized values.
  • caml_alloc_unboxed(v) returns the value (of any unboxed type) whose field is the value v.
  • caml_alloc_boxed(v) allocates and returns a value (of any boxed type) whose field is the value v.
  • caml_alloc_unboxable(v) calls either caml_alloc_unboxed or caml_alloc_boxed according to the default representation of unboxable types in the current version of OCaml.

Low-level interface

The following functions are slightly more efficient than caml_alloc, but also much more difficult to use.

From the standpoint of the allocation functions, blocks are divided according to their size as zero-sized blocks, small blocks (with size less than or equal to Max_young_wosize), and large blocks (with size greater than Max_young_wosize). The constant Max_young_wosize is declared in the include file mlvalues.h. It is guaranteed to be at least 64 (words), so that any block with constant size less than or equal to 64 can be assumed to be small. For blocks whose size is computed at run-time, the size must be compared against Max_young_wosize to determine the correct allocation procedure.

  • caml_alloc_small(n, t) returns a fresh small block of size n ≤ Max_young_wosize words, with tag t. If this block is a structured block (i.e. if t < No_scan_tag), then the fields of the block (initially containing garbage) must be initialized with legal values (using direct assignment to the fields of the block) before the next allocation.
  • caml_alloc_shr(n, t) returns a fresh block of size n, with tag t. The size of the block can be greater than Max_young_wosize. (It can also be smaller, but in this case it is more efficient to call caml_alloc_small instead of caml_alloc_shr.) If this block is a structured block (i.e. if t < No_scan_tag), then the fields of the block (initially containing garbage) must be initialized with legal values (using the caml_initialize function described below) before the next allocation.
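
The size test that selects between the two low-level functions can be sketched as a small decision function. This is only a stand-alone model and not runtime code: the names block_class and classify_block are hypothetical, and the threshold is passed as a parameter so that the sketch compiles without the OCaml headers. Real stub code would compare against Max_young_wosize itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: a model of the three size classes described above. */
typedef enum {
    BLOCK_ZERO_SIZED,  /* size 0 */
    BLOCK_SMALL,       /* 1 .. threshold: eligible for caml_alloc_small */
    BLOCK_LARGE        /* above the threshold: requires caml_alloc_shr */
} block_class;

/* Classify a block of wosize words against a threshold that stands in
   for Max_young_wosize. */
block_class classify_block(size_t wosize, size_t max_young_wosize)
{
    /* The text guarantees that the real constant is at least 64 words. */
    assert(max_young_wosize >= 64);
    if (wosize == 0) return BLOCK_ZERO_SIZED;
    if (wosize <= max_young_wosize) return BLOCK_SMALL;
    return BLOCK_LARGE;
}
```

With an illustrative threshold of 256 words, a two-word cons cell is classified as small and a 257-word block as large. A block whose size is a compile-time constant of at most 64 words never needs this test.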

19.4.5 Raising exceptions

Two functions are provided to raise two standard exceptions:

  • caml_failwith(s), where s is a null-terminated C string (with type char *), raises exception Failure with argument s.
  • caml_invalid_argument(s), where s is a null-terminated C string (with type char *), raises exception Invalid_argument with argument s.

Raising arbitrary exceptions from C is more delicate: the exception identifier is dynamically allocated by the OCaml program, and therefore must be communicated to the C function using the registration facility described below in section 19.7.3. Once the exception identifier is recovered in C, the following functions actually raise the exception:

  • caml_raise_constant(id) raises the exception id with no argument;
  • caml_raise_with_arg(id, v) raises the exception id with the OCaml value v as argument;
  • caml_raise_with_args(id, n, v) raises the exception id with the OCaml values v[0], …, v[n-1] as arguments;
  • caml_raise_with_string(id, s), where s is a null-terminated C string, raises the exception id with a copy of the C string s as argument.

19.5 Living in harmony with the garbage collector

Unused blocks in the heap are automatically reclaimed by the garbage collector. This requires some cooperation from C code that manipulates heap-allocated blocks.

19.5.1 Simple interface

All the macros described in this section are declared in the memory.h header file.

Rule 1 A function that has parameters or local variables of type value must begin with a call to one of the CAMLparam macros and return with CAMLreturn, CAMLreturn0, or CAMLreturnT. In particular, CAMLlocal and CAMLxparam can only be called after CAMLparam.

There are six CAMLparam macros: CAMLparam0 to CAMLparam5, which take zero to five arguments respectively. If your function has no more than 5 parameters of type value, use the corresponding macros with these parameters as arguments. If your function has more than 5 parameters of type value, use CAMLparam5 with five of these parameters, and use one or more calls to the CAMLxparam macros for the remaining parameters (CAMLxparam1 to CAMLxparam5).

The macros CAMLreturn, CAMLreturn0, and CAMLreturnT are used to replace the C keyword return. Every occurrence of return x must be replaced by CAMLreturn (x) if x has type value, or CAMLreturnT (t, x) (where t is the type of x); every occurrence of return without argument must be replaced by CAMLreturn0. If your C function is a procedure (i.e. if it returns void), you must insert CAMLreturn0 at the end (to replace C’s implicit return).

Note: some C compilers give bogus warnings about unused variables caml__dummy_xxx at each use of CAMLparam and CAMLlocal. You should ignore them.

Example:

    void foo (value v1, value v2, value v3)
    {
      CAMLparam3 (v1, v2, v3);
      ...
      CAMLreturn0;
    }

Note: if your function is a primitive with more than 5 arguments for use with the byte-code runtime, its arguments are not values and must not be declared with CAMLparam (they have types value * and int).

Rule 2 Local variables of type value must be declared with one of the CAMLlocal macros. Arrays of values are declared with CAMLlocalN. These macros must be used at the beginning of the function, not in a nested block.

The macros CAMLlocal1 to CAMLlocal5 declare and initialize one to five local variables of type value. The variable names are given as arguments to the macros. CAMLlocalN(x, n) declares and initializes a local variable of type value [n]. You can use several calls to these macros if you have more than 5 local variables.

Example:

    value bar (value v1, value v2, value v3)
    {
      CAMLparam3 (v1, v2, v3);
      CAMLlocal1 (result);
      result = caml_alloc (3, 0);
      ...
      CAMLreturn (result);
    }

Rule 3 Assignments to the fields of structured blocks must be done with the Store_field macro (for normal blocks) or Store_double_field macro (for arrays and records of floating-point numbers). Other assignments must not use Store_field nor Store_double_field.

Store_field (b, n, v) stores the value v in the field number n of value b, which must be a block (i.e. Is_block(b) must be true).

Example:

    value bar (value v1, value v2, value v3)
    {
      CAMLparam3 (v1, v2, v3);
      CAMLlocal1 (result);
      result = caml_alloc (3, 0);
      Store_field (result, 0, v1);
      Store_field (result, 1, v2);
      Store_field (result, 2, v3);
      CAMLreturn (result);
    }

Warning: The first argument of Store_field and Store_double_field must be a variable declared by CAMLlocal or a parameter declared by CAMLparam, to ensure that a garbage collection triggered by the evaluation of the other arguments will not invalidate the first argument after it is computed.

Use with CAMLlocalN: Arrays of values declared using CAMLlocalN must not be written to using Store_field. Use the normal C array syntax instead.

Rule 4 Global variables containing values must be registered with the garbage collector using the caml_register_global_root function.

Registration of a global variable v is achieved by calling caml_register_global_root(&v) just before or just after a valid value is stored in v for the first time. You must not call any of the OCaml runtime functions or macros between registering and storing the value.

A registered global variable v can be un-registered by calling caml_remove_global_root(&v).

If the contents of the global variable v are seldom modified after registration, better performance can be achieved by calling caml_register_generational_global_root(&v) to register v (after its initialization with a valid value, but before any allocation or call to the GC functions), and caml_remove_generational_global_root(&v) to un-register it. In this case, you must not modify the value of v directly, but you must use caml_modify_generational_global_root(&v,x) to set it to x. The garbage collector takes advantage of the guarantee that v is not modified between calls to caml_modify_generational_global_root to scan it less often. This improves performance if the modifications of v happen less often than minor collections.

Note: The CAML macros use identifiers (local variables, type identifiers, structure tags) that start with caml__. Do not use any identifier starting with caml__ in your programs.

19.5.2 Low-level interface

We now give the GC rules corresponding to the low-level allocation functions caml_alloc_small and caml_alloc_shr. You can ignore those rules if you stick to the simplified allocation function caml_alloc.

Rule 5 After a structured block (a block with tag less than No_scan_tag) is allocated with the low-level functions, all fields of this block must be filled with well-formed values before the next allocation operation. If the block has been allocated with caml_alloc_small, filling is performed by direct assignment to the fields of the block:

    Field(v, n) = vn;

If the block has been allocated with caml_alloc_shr, filling is performed through the caml_initialize function:

    caml_initialize(&Field(v, n), vn);

The next allocation can trigger a garbage collection. The garbage collector assumes that all structured blocks contain well-formed values. Newly created blocks contain random data, which generally does not represent well-formed values.

If you really need to allocate before the fields can receive their final value, first initialize with a constant value (e.g. Val_unit), then allocate, then modify the fields with the correct value (see rule 6).

Rule 6 Direct assignment to a field of a block, as in

    Field(v, n) = w;

is safe only if v is a block newly allocated by caml_alloc_small; that is, if no allocation took place between the allocation of v and the assignment to the field. In all other cases, never assign directly. If the block has just been allocated by caml_alloc_shr, use caml_initialize to assign a value to a field for the first time:

    caml_initialize(&Field(v, n), w);

Otherwise, you are updating a field that previously contained a well-formed value; then, call the caml_modify function:

    caml_modify(&Field(v, n), w);

To illustrate the rules above, here is a C function that builds and returns a list containing the two integers given as parameters. First, we write it using the simplified allocation functions:

    value alloc_list_int(int i1, int i2)
    {
      CAMLparam0 ();
      CAMLlocal2 (result, r);

      r = caml_alloc(2, 0);                   /* Allocate a cons cell */
      Store_field(r, 0, Val_int(i2));         /* car = the integer i2 */
      Store_field(r, 1, Val_int(0));          /* cdr = the empty list [] */
      result = caml_alloc(2, 0);              /* Allocate the other cons cell */
      Store_field(result, 0, Val_int(i1));    /* car = the integer i1 */
      Store_field(result, 1, r);              /* cdr = the first cons cell */
      CAMLreturn (result);
    }

Here, the registering of result is not strictly needed, because no allocation takes place after it gets its value, but it’s easier and safer to simply register all the local variables that have type value.

Here is the same function written using the low-level allocation functions. We notice that the cons cells are small blocks and can be allocated with caml_alloc_small, and filled by direct assignments on their fields.

    value alloc_list_int(int i1, int i2)
    {
      CAMLparam0 ();
      CAMLlocal2 (result, r);

      r = caml_alloc_small(2, 0);             /* Allocate a cons cell */
      Field(r, 0) = Val_int(i2);              /* car = the integer i2 */
      Field(r, 1) = Val_int(0);               /* cdr = the empty list [] */
      result = caml_alloc_small(2, 0);        /* Allocate the other cons cell */
      Field(result, 0) = Val_int(i1);         /* car = the integer i1 */
      Field(result, 1) = r;                   /* cdr = the first cons cell */
      CAMLreturn (result);
    }

In the two examples above, the list is built bottom-up. Here is an alternate way, that proceeds top-down. It is less efficient, but illustrates the use of caml_modify.

    value alloc_list_int(int i1, int i2)
    {
      CAMLparam0 ();
      CAMLlocal2 (tail, r);

      r = caml_alloc_small(2, 0);             /* Allocate a cons cell */
      Field(r, 0) = Val_int(i1);              /* car = the integer i1 */
      Field(r, 1) = Val_int(0);               /* A dummy value */
      tail = caml_alloc_small(2, 0);          /* Allocate the other cons cell */
      Field(tail, 0) = Val_int(i2);           /* car = the integer i2 */
      Field(tail, 1) = Val_int(0);            /* cdr = the empty list [] */
      caml_modify(&Field(r, 1), tail);        /* cdr of the result = tail */
      CAMLreturn (r);
    }

It would be incorrect to perform Field(r, 1) = tail directly, because the allocation of tail has taken place since r was allocated.

19.6 A complete example

This section outlines how the functions from the Unix curses library can be made available to OCaml programs. First of all, here is the interface curses.ml that declares the curses primitives and data types:

    (* File curses.ml -- declaration of primitives and data types *)
    type window                   (* The type "window" remains abstract *)
    external initscr: unit -> window = "caml_curses_initscr"
    external endwin: unit -> unit = "caml_curses_endwin"
    external refresh: unit -> unit = "caml_curses_refresh"
    external wrefresh : window -> unit = "caml_curses_wrefresh"
    external newwin: int -> int -> int -> int -> window = "caml_curses_newwin"
    external addch: char -> unit = "caml_curses_addch"
    external mvwaddch: window -> int -> int -> char -> unit = "caml_curses_mvwaddch"
    external addstr: string -> unit = "caml_curses_addstr"
    external mvwaddstr: window -> int -> int -> string -> unit
             = "caml_curses_mvwaddstr"
    (* lots more omitted *)

To compile this interface:

    ocamlc -c curses.ml

To implement these functions, we just have to provide the stub code; the core functions are already implemented in the curses library. The stub code file, curses_stubs.c, looks like this:

    /* File curses_stubs.c -- stub code for curses */
    #include <curses.h>
    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/alloc.h>
    #include <caml/custom.h>

    /* Encapsulation of opaque window handles (of type WINDOW *)
       as OCaml custom blocks. */

    static struct custom_operations curses_window_ops = {
      "fr.inria.caml.curses_windows",
      custom_finalize_default,
      custom_compare_default,
      custom_hash_default,
      custom_serialize_default,
      custom_deserialize_default,
      custom_compare_ext_default,
      custom_fixed_length_default
    };

    /* Accessing the WINDOW * part of an OCaml custom block */
    #define Window_val(v) (*((WINDOW **) Data_custom_val(v)))

    /* Allocating an OCaml custom block to hold the given WINDOW * */
    static value alloc_window(WINDOW * w)
    {
      value v = caml_alloc_custom(&curses_window_ops, sizeof(WINDOW *), 0, 1);
      Window_val(v) = w;
      return v;
    }

    value caml_curses_initscr(value unit)
    {
      CAMLparam1 (unit);
      CAMLreturn (alloc_window(initscr()));
    }

    value caml_curses_endwin(value unit)
    {
      CAMLparam1 (unit);
      endwin();
      CAMLreturn (Val_unit);
    }

    value caml_curses_refresh(value unit)
    {
      CAMLparam1 (unit);
      refresh();
      CAMLreturn (Val_unit);
    }

    value caml_curses_wrefresh(value win)
    {
      CAMLparam1 (win);
      wrefresh(Window_val(win));
      CAMLreturn (Val_unit);
    }

    value caml_curses_newwin(value nlines, value ncols, value x0, value y0)
    {
      CAMLparam4 (nlines, ncols, x0, y0);
      CAMLreturn (alloc_window(newwin(Int_val(nlines), Int_val(ncols),
                                      Int_val(x0), Int_val(y0))));
    }

    value caml_curses_addch(value c)
    {
      CAMLparam1 (c);
      addch(Int_val(c));            /* Characters are encoded like integers */
      CAMLreturn (Val_unit);
    }

    value caml_curses_mvwaddch(value win, value x, value y, value c)
    {
      CAMLparam4 (win, x, y, c);
      mvwaddch(Window_val(win), Int_val(x), Int_val(y), Int_val(c));
      CAMLreturn (Val_unit);
    }

    value caml_curses_addstr(value s)
    {
      CAMLparam1 (s);
      addstr(String_val(s));
      CAMLreturn (Val_unit);
    }

    value caml_curses_mvwaddstr(value win, value x, value y, value s)
    {
      CAMLparam4 (win, x, y, s);
      mvwaddstr(Window_val(win), Int_val(x), Int_val(y), String_val(s));
      CAMLreturn (Val_unit);
    }

    /* This goes on for pages. */

The file curses_stubs.c can be compiled with:

    cc -c -I`ocamlc -where` curses_stubs.c

or, even simpler,

    ocamlc -c curses_stubs.c

(When passed a .c file, the ocamlc command simply calls the C compiler on that file, with the right -I option.)

Now, here is a sample OCaml program prog.ml that uses the curses module:

    (* File prog.ml -- main program using curses *)
    open Curses;;
    let main_window = initscr () in
    let small_window = newwin 10 5 20 10 in
      mvwaddstr main_window 10 2 "Hello";
      mvwaddstr small_window 4 3 "world";
      refresh();
      Unix.sleep 5;
      endwin()

To compile and link this program, run:

    ocamlc -custom -o prog unix.cma curses.cmo prog.ml curses_stubs.o -cclib -lcurses

(On some machines, you may need to put -cclib -lcurses -cclib -ltermcap or -cclib -ltermcap instead of -cclib -lcurses.)

19.7 Advanced topic: callbacks from C to OCaml

So far, we have described how to call C functions from OCaml. In this section, we show how C functions can call OCaml functions, either as callbacks (OCaml calls C which calls OCaml), or with the main program written in C.

19.7.1 Applying OCaml closures from C

C functions can apply OCaml function values (closures) to OCaml values. The following functions are provided to perform the applications:

  • caml_callback(f, a) applies the functional value f to the value a and returns the value returned by f.
  • caml_callback2(f, a, b) applies the functional value f (which is assumed to be a curried OCaml function with two arguments) to a and b.
  • caml_callback3(f, a, b, c) applies the functional value f (a curried OCaml function with three arguments) to a, b and c.
  • caml_callbackN(f, n, args) applies the functional value f to the n arguments contained in the array of values args.

If the function f does not return, but raises an exception that escapes the scope of the application, then this exception is propagated to the next enclosing OCaml code, skipping over the C code. That is, if an OCaml function f calls a C function g that calls back an OCaml function h that raises a stray exception, then the execution of g is interrupted and the exception is propagated back into f.
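
The practical consequence of an exception skipping over the C code is that anything g planned to do after the callback, such as releasing a resource, never happens. The following stand-alone analogy shows the control flow only. It uses the C library's setjmp and longjmp in place of the OCaml exception mechanism, makes no claim about how the runtime is implemented, and every name in it is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* Stand-alone analogy: setjmp/longjmp stand in for the OCaml exception
   mechanism. Every name below is hypothetical. */
static jmp_buf handler_of_f;     /* the handler that "f" installed */
static int handler_installed = 0;
int g_cleanup_ran = 0;           /* becomes 1 only if g runs to its end */

/* Plays the OCaml function h: it raises. */
void model_h(void)
{
    assert(handler_installed);   /* jumping without a handler is undefined */
    longjmp(handler_of_f, 1);
}

/* Plays the C function g: calls back h, then would release resources. */
void model_g(void)
{
    model_h();                   /* control does not come back here */
    g_cleanup_ran = 1;           /* skipped, like any cleanup placed here */
}

/* Plays the OCaml function f. Returns 1 if the exception reached it,
   0 if g returned normally. */
int model_f(void)
{
    int caught;
    handler_installed = 1;
    if (setjmp(handler_of_f) == 0) {
        model_g();
        caught = 0;
    } else {
        caught = 1;
    }
    handler_installed = 0;
    return caught;
}
```

Calling model_f() returns 1 and leaves g_cleanup_ran at 0: the jump lands in f and the tail of g is never executed. A C function that must release resources therefore cannot rely on code placed after a plain callback.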

If the C code wishes to catch exceptions escaping the OCaml function, it can use the functions caml_callback_exn, caml_callback2_exn, caml_callback3_exn, caml_callbackN_exn. These functions take the same arguments as their non-_exn counterparts, but catch escaping exceptions and return them to the C code. The return value v of the caml_callback*_exn functions must be tested with the macro Is_exception_result(v). If the macro returns “false”, no exception occurred, and v is the value returned by the OCaml function. If Is_exception_result(v) returns “true”, an exception escaped, and its value (the exception descriptor) can be recovered using Extract_exception(v).

Warning: If the OCaml function returned with an exception, Extract_exception should be applied to the exception result prior to calling a function that may trigger garbage collection. Otherwise, if v is reachable during garbage collection, the runtime can crash since v does not contain a valid value.

Example:

    value call_caml_f_ex(value closure, value arg)
    {
      CAMLparam2(closure, arg);
      CAMLlocal2(res, tmp);
      res = caml_callback_exn(closure, arg);
      if(Is_exception_result(res)) {
        res = Extract_exception(res);
        tmp = caml_alloc(3, 0); /* Safe to allocate: res contains valid value. */
        ...
      }
      CAMLreturn (res);
    }

19.7.2 Obtaining or registering OCaml closures for use in C functions

There are two ways to obtain OCaml function values (closures) to be passed to the callback functions described above. One way is to pass the OCaml function as an argument to a primitive function. For example, if the OCaml code contains the declaration

    external apply : ('a -> 'b) -> 'a -> 'b = "caml_apply"

the corresponding C stub can be written as follows:

    CAMLprim value caml_apply(value vf, value vx)
    {
      CAMLparam2(vf, vx);
      CAMLlocal1(vy);
      vy = caml_callback(vf, vx);
      CAMLreturn(vy);
    }

Another possibility is to use the registration mechanism provided by OCaml. This registration mechanism enables OCaml code to register OCaml functions under some global name, and C code to retrieve the corresponding closure by this global name.

On the OCaml side, registration is performed by evaluating Callback.register n v. Here, n is the global name (an arbitrary string) and v the OCaml value. For instance:

    let f x = print_string "f is applied to "; print_int x; print_newline()
    let _ = Callback.register "test function" f

On the C side, a pointer to the value registered under name n is obtained by calling caml_named_value(n). The returned pointer must then be dereferenced to recover the actual OCaml value. If no value is registered under the name n, the null pointer is returned. For example, here is a C wrapper that calls the OCaml function f above:

    void call_caml_f(int arg)
    {
      caml_callback(*caml_named_value("test function"), Val_int(arg));
    }

The pointer returned by caml_named_value is constant and can safely be cached in a C variable to avoid repeated name lookups. On the other hand, the value pointed to can change during garbage collection and must always be recomputed at the point of use. Here is a more efficient variant of call_caml_f above that calls caml_named_value only once:

    void call_caml_f(int arg)
    {
      static value * closure_f = NULL;
      if (closure_f == NULL) {
        /* First time around, look up by name */
        closure_f = caml_named_value("test function");
      }
      caml_callback(*closure_f, Val_int(arg));
    }
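
The distinction between caching the pointer and caching the value it points to can be seen in a small stand-alone model of a moving collector. It does not use the OCaml runtime, every name in it is hypothetical, and an integer stands in for the closure.

```c
#include <assert.h>

/* Stand-alone model of a moving collector; every name is hypothetical.
   named_cell plays the cell whose address caml_named_value returns, and
   the two "spaces" are the places the registered object can live. */
#define POISON (-1)
static int space_a = POISON, space_b = POISON;
static int *named_cell = &space_a;

/* Store a (non-negative) payload as the registered object. */
void model_register(int payload)
{
    assert(payload >= 0);        /* keep POISON distinguishable */
    *named_cell = payload;
}

/* The address of the cell never changes, so callers may cache it. */
int **model_named_value(void)
{
    return &named_cell;
}

/* A collection moves the object to the other space, updates the cell,
   and leaves garbage at the old address. */
void model_collect(void)
{
    int *to = (named_cell == &space_a) ? &space_b : &space_a;
    *to = *named_cell;
    *named_cell = POISON;
    named_cell = to;
}
```

If a caller saves both the cell address and a copy of its contents, then runs model_collect(), the saved cell address still leads to the payload, while the saved copy of the contents now points at POISON. This is why call_caml_f stores the result of caml_named_value and dereferences it afresh on each call.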

19.7.3 Registering OCaml exceptions for use in C functions

The registration mechanism described above can also be used to communicate exception identifiers from OCaml to C. The OCaml code registers the exception by evaluating Callback.register_exception n exn, where n is an arbitrary name and exn is an exception value of the exception to register. For example:

    exception Error of string
    let _ = Callback.register_exception "test exception" (Error "any string")

The C code can then recover the exception identifier using caml_named_value and pass it as first argument to the functions caml_raise_constant, caml_raise_with_arg, and caml_raise_with_string (described in section 19.4.5) to actually raise the exception. For example, here is a C function that raises the Error exception with the given argument:

    void raise_error(char * msg)
    {
      caml_raise_with_string(*caml_named_value("test exception"), msg);
    }

19.7.4 Main program in C

In normal operation, a mixed OCaml/C program starts by executing the OCaml initialization code, which then may proceed to call C functions. We say that the main program is the OCaml code. In some applications, it is desirable that the C code plays the role of the main program, calling OCaml functions when needed. This can be achieved as follows:

  • The C part of the program must provide a main function, which will override the default main function provided by the OCaml runtime system. Execution will start in the user-defined main function just like for a regular C program.
  • At some point, the C code must call caml_main(argv) to initialize the OCaml code. The argv argument is a C array of strings (type char **), terminated with a NULL pointer, which represents the command-line arguments, as passed as second argument to main. The OCaml array Sys.argv will be initialized from this parameter. For the bytecode compiler, argv[0] and argv[1] are also consulted to find the file containing the bytecode.
  • The call to caml_main initializes the OCaml runtime system, loads the bytecode (in the case of the bytecode compiler), and executes the initialization code of the OCaml program. Typically, this initialization code registers callback functions using Callback.register. Once the OCaml initialization code is complete, control returns to the C code that called caml_main.
  • The C code can then invoke OCaml functions using the callback mechanism (see section 19.7.1).
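
When the C host cannot simply forward the argv it received from main (for instance because it wants Sys.argv to contain a different set of arguments), it has to build the NULL-terminated vector itself. The helper below is a hypothetical sketch in plain C and does not call the OCaml runtime. It deliberately leaves the array allocated, on the cautious assumption that the runtime may keep referring to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: build an argument vector in the shape caml_main
   expects. exe_path becomes argv[0], the n_extra strings in extra follow,
   and a NULL pointer terminates the array. Returns NULL if allocation
   fails. The strings themselves are not copied. */
char **make_caml_argv(char *exe_path, char **extra, size_t n_extra)
{
    assert(exe_path != NULL);
    assert(extra != NULL || n_extra == 0);

    char **argv = malloc((n_extra + 2) * sizeof *argv);
    if (argv == NULL) return NULL;

    argv[0] = exe_path;               /* consulted by the bytecode runtime */
    for (size_t i = 0; i < n_extra; i++)
        argv[i + 1] = extra[i];
    argv[n_extra + 1] = NULL;         /* mandatory terminator */
    return argv;
}
```

The resulting array would then be passed as caml_main(argv) in place of the argv received by main.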

19.7.5 Embedding the OCaml code in the C code

The bytecode compiler in custom runtime mode (ocamlc -custom) normally appends the bytecode to the executable file containing the custom runtime. This has two consequences. First, the final linking step must be performed by ocamlc. Second, the OCaml runtime library must be able to find the name of the executable file from the command-line arguments. When using caml_main(argv) as in section 19.7.4, this means that argv[0] or argv[1] must contain the executable file name.

An alternative is to embed the bytecode in the C code. The -output-obj option to ocamlc is provided for this purpose. It causes the ocamlc compiler to output a C object file (.o file, .obj under Windows) containing the bytecode for the OCaml part of the program, as well as a caml_startup function. The C object file produced by ocamlc -output-obj can then be linked with C code using the standard C compiler, or stored in a C library.

The caml_startup function must be called from the main C program in order to initialize the OCaml runtime and execute the OCaml initialization code. Just like caml_main, it takes one argv parameter containing the command-line parameters. Unlike caml_main, this argv parameter is used only to initialize Sys.argv, but not for finding the name of the executable file.

The caml_startup function calls the uncaught exception handler (or enters the debugger, if running under ocamldebug) if an exception escapes from a top-level module initialiser. Such exceptions may be caught in the C code by instead using the caml_startup_exn function and testing the result using Is_exception_result (followed by Extract_exception if appropriate).

The -output-obj option can also be used to obtain the C source file. More interestingly, the same option can also produce directly a shared library (.so file, .dll under Windows) that contains the OCaml code, the OCaml runtime system and any other static C code given to ocamlc (.o, .a, respectively, .obj, .lib). This use of -output-obj is very similar to a normal linking step, but instead of producing a main program that automatically runs the OCaml code, it produces a shared library that can run the OCaml code on demand. The three possible behaviors of -output-obj are selected according to the extension of the resulting file (given with -o).

The native-code compiler ocamlopt also supports the -output-obj option, causing it to output a C object file or a shared library containing the native code for all OCaml modules on the command-line, as well as the OCaml startup code. Initialization is performed by calling caml_startup (or caml_startup_exn) as in the case of the bytecode compiler.

For the final linking phase, in addition to the object file produced by -output-obj, you will have to provide the OCaml runtime library (libcamlrun.a for bytecode, libasmrun.a for native-code), as well as all C libraries that are required by the OCaml libraries used. For instance, assume the OCaml part of your program uses the Unix library. With ocamlc, you should do:

    ocamlc -output-obj -o camlcode.o unix.cma other .cmo and .cma files
    cc -o myprog C objects and libraries \
       camlcode.o -L`ocamlc -where` -lunix -lcamlrun

With ocamlopt, you should do:

    ocamlopt -output-obj -o camlcode.o unix.cmxa other .cmx and .cmxa files
    cc -o myprog C objects and libraries \
       camlcode.o -L`ocamlc -where` -lunix -lasmrun

Warning: On some ports, special options are required on the final linking phase that links together the object file produced by the -output-obj option and the remainder of the program. Those options are shown in the configuration file Makefile.config generated during compilation of OCaml, as the variable OC_LDFLAGS.

  • Windows with the MSVC compiler: the object files produced by OCaml have been compiled with the /MD flag, and therefore all other object files linked with them should also be compiled with /MD.
  • other systems: you may have to add one or more of -lcurses, -lm, -ldl, depending on your OS and C compiler.

Stack backtraces.

When OCaml bytecode produced by ocamlc -g is embedded in a C program, no debugging information is included, and therefore it is impossible to print stack backtraces on uncaught exceptions. This is not the case when native code produced by ocamlopt -g is embedded in a C program: stack backtrace information is available, but the backtrace mechanism needs to be turned on programmatically. This can be achieved from the OCaml side by calling Printexc.record_backtrace true in the initialization of one of the OCaml modules. This can also be achieved from the C side by calling caml_record_backtrace(Val_int(1)); in the OCaml-C glue code.

Unloading the runtime.

In case the shared library produced with -output-obj is to be loaded and unloaded repeatedly by a single process, care must be taken to unload the OCaml runtime explicitly, in order to avoid various system resource leaks.

Since OCaml 4.05, the caml_shutdown function can be used to shut the runtime down gracefully. This amounts to the following:

  • Running the functions that were registered with Pervasives.at_exit.
  • Triggering finalization of allocated custom blocks (see section 19.9). For example, Pervasives.in_channel and Pervasives.out_channel are represented by custom blocks that enclose file descriptors, which are to be released.
  • Unloading the dependent shared libraries that were loaded by the runtime, including dynlink plugins.
  • Freeing the memory blocks that were allocated by the runtime with malloc. Inside C primitives, it is advised to use caml_stat_* functions from memory.h for managing static (that is, non-moving) blocks of heap memory, as all the blocks allocated with these functions are automatically freed by caml_shutdown. For ensuring compatibility with legacy C stubs that have used caml_stat_* incorrectly, this behaviour is only enabled if the runtime is started with a specialized caml_startup_pooled function.

As a shared library may have several clients simultaneously, caml_startup (and caml_startup_pooled) may, for convenience, be called multiple times, provided that each such call is paired with a corresponding call to caml_shutdown (in a nested fashion). The runtime will be unloaded once there are no outstanding calls to caml_startup.

Once a runtime is unloaded, it cannot be started up again without reloading the shared library and reinitializing its static data. Therefore, at the moment, the facility is only useful for building reloadable shared libraries.

19.8 Advanced example with callbacks

This section illustrates the callback facilities described in section 19.7. We are going to package some OCaml functions in such a way that they can be linked with C code and called from C just like any C functions. The OCaml functions are defined in the following mod.ml OCaml source:

    (* File mod.ml -- some "useful" OCaml functions *)

    let rec fib n = if n < 2 then 1 else fib(n-1) + fib(n-2)

    let format_result n = Printf.sprintf "Result is: %d\n" n

    (* Export those two functions to C *)

    let _ = Callback.register "fib" fib
    let _ = Callback.register "format_result" format_result

Here is the C stub code for calling these functions from C:

    /* File modwrap.c -- wrappers around the OCaml functions */

    #include <stdio.h>
    #include <string.h>
    #include <caml/mlvalues.h>
    #include <caml/callback.h>

    int fib(int n)
    {
      static value * fib_closure = NULL;
      if (fib_closure == NULL) fib_closure = caml_named_value("fib");
      return Int_val(caml_callback(*fib_closure, Val_int(n)));
    }

    char * format_result(int n)
    {
      static value * format_result_closure = NULL;
      if (format_result_closure == NULL)
        format_result_closure = caml_named_value("format_result");
      return strdup(String_val(caml_callback(*format_result_closure, Val_int(n))));
      /* We copy the C string returned by String_val to the C heap
         so that it remains valid after garbage collection. */
    }

We now compile the OCaml code to a C object file and put it in a C library along with the stub code in modwrap.c and the OCaml runtime system:

  ocamlc -custom -output-obj -o modcaml.o mod.ml
  ocamlc -c modwrap.c
  cp `ocamlc -where`/libcamlrun.a mod.a && chmod +w mod.a
  ar r mod.a modcaml.o modwrap.o

(One can also use ocamlopt -output-obj instead of ocamlc -custom -output-obj. In this case, replace libcamlrun.a (the bytecode runtime library) by libasmrun.a (the native-code runtime library).)

Now, we can use the two functions fib and format_result in any C program, just like regular C functions. Just remember to call caml_startup (or caml_startup_exn) once before.

  /* File main.c -- a sample client for the OCaml functions */

  #include <stdio.h>
  #include <caml/callback.h>

  extern int fib(int n);
  extern char * format_result(int n);

  int main(int argc, char ** argv)
  {
    int result;

    /* Initialize OCaml code */
    caml_startup(argv);
    /* Do some computation */
    result = fib(10);
    printf("fib(10) = %s\n", format_result(result));
    return 0;
  }

To build the whole program, just invoke the C compiler as follows:

  cc -o prog -I `ocamlc -where` main.c mod.a -lcurses

(On some machines, you may need to put -ltermcap or -lcurses -ltermcap instead of -lcurses.)

19.9 Advanced topic: custom blocks

Blocks with tag Custom_tag contain both arbitrary user data and a pointer to a C struct, with type struct custom_operations, that associates user-provided finalization, comparison, hashing, serialization and deserialization functions to this block.

19.9.1 The struct custom_operations

The struct custom_operations is defined in <caml/custom.h> and contains the following fields:

  • char *identifier
    A zero-terminated character string serving as an identifier for serialization and deserialization operations.

  • void (*finalize)(value v)
    The finalize field contains a pointer to a C function that is called when the block becomes unreachable and is about to be reclaimed. The block is passed as first argument to the function. The finalize field can also be custom_finalize_default to indicate that no finalization function is associated with the block.

  • int (*compare)(value v1, value v2)
    The compare field contains a pointer to a C function that is called whenever two custom blocks are compared using OCaml’s generic comparison operators (=, <>, <=, >=, <, > and compare). The C function should return 0 if the data contained in the two blocks are structurally equal, a negative integer if the data from the first block is less than the data from the second block, and a positive integer if the data from the first block is greater than the data from the second block. The compare field can be set to custom_compare_default; this default comparison function simply raises Failure.

  • int (*compare_ext)(value v1, value v2)
    (Since 3.12.1) The compare_ext field contains a pointer to a C function that is called whenever one custom block and one unboxed integer are compared using OCaml’s generic comparison operators (=, <>, <=, >=, <, > and compare). As in the case of the compare field, the C function should return 0 if the two arguments are structurally equal, a negative integer if the first argument compares less than the second argument, and a positive integer if the first argument compares greater than the second argument. The compare_ext field can be set to custom_compare_ext_default; this default comparison function simply raises Failure.

  • intnat (*hash)(value v)
    The hash field contains a pointer to a C function that is called whenever OCaml’s generic hash operator (see module Hashtbl) is applied to a custom block. The C function can return an arbitrary integer representing the hash value of the data contained in the given custom block. The hash value must be compatible with the compare function, in the sense that two structurally equal data (that is, two custom blocks for which compare returns 0) must have the same hash value. The hash field can be set to custom_hash_default, in which case the custom block is ignored during hash computation.

  • void (*serialize)(value v, uintnat * bsize_32, uintnat * bsize_64)
    The serialize field contains a pointer to a C function that is called whenever the custom block needs to be serialized (marshaled) using the OCaml functions output_value or Marshal.to_…. For a custom block, those functions first write the identifier of the block (as given by the identifier field) to the output stream, then call the user-provided serialize function. That function is responsible for writing the data contained in the custom block, using the caml_serialize_… functions defined in <caml/intext.h> and listed below. The user-provided serialize function must then store in its bsize_32 and bsize_64 parameters the sizes in bytes of the data part of the custom block on a 32-bit architecture and on a 64-bit architecture, respectively. The serialize field can be set to custom_serialize_default, in which case the Failure exception is raised when attempting to serialize the custom block.

  • uintnat (*deserialize)(void * dst)
    The deserialize field contains a pointer to a C function that is called whenever a custom block with identifier identifier needs to be deserialized (un-marshaled) using the OCaml functions input_value or Marshal.from_…. This user-provided function is responsible for reading back the data written by the serialize operation, using the caml_deserialize_… functions defined in <caml/intext.h> and listed below. It must then rebuild the data part of the custom block and store it at the pointer given as the dst argument. Finally, it returns the size in bytes of the data part of the custom block. This size must be identical to the bsize_32 result of the serialize operation if the architecture is 32 bits, or bsize_64 if the architecture is 64 bits. The deserialize field can be set to custom_deserialize_default to indicate that deserialization is not supported. In this case, do not register the struct custom_operations with the deserializer using caml_register_custom_operations (see below).

  • const struct custom_fixed_length* fixed_length
    Normally, space in the serialized output is reserved to write the bsize_32 and bsize_64 fields returned by serialize. However, for very short custom blocks, this space can be larger than the data itself! As a space optimisation, if serialize always returns the same values for bsize_32 and bsize_64, then these values may be specified in the fixed_length structure, and do not consume space in the serialized output.

Note: the finalize, compare, hash, serialize and deserialize functions attached to custom block descriptors must never trigger a garbage collection. Within these functions, do not call any of the OCaml allocation functions, and do not perform a callback into OCaml code. Do not use CAMLparam to register the parameters to these functions, and do not use CAMLreturn to return the result.
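The contract between compare and hash can be illustrated on a plain C payload. The sketch below is self-contained and deliberately avoids the OCaml headers: struct rational, rational_compare and rational_hash are hypothetical names chosen for the example, and the functions take pointers to the payload directly, whereas real compare and hash callbacks receive value arguments and would first obtain the payload with Data_custom_val. It assumes the payload is kept in lowest terms with a positive denominator, so that structural equality coincides with field-wise equality.

```c
#include <stdint.h>

/* Hypothetical payload: a rational number in lowest terms, with den > 0. */
struct rational { int32_t num; int32_t den; };

/* Three-way comparison: 0 if equal, negative if a < b, positive if a > b. */
int rational_compare(const struct rational *a, const struct rational *b)
{
    /* Cross-multiply in 64 bits so that the products cannot overflow. */
    int64_t lhs = (int64_t)a->num * b->den;
    int64_t rhs = (int64_t)b->num * a->den;
    return (lhs > rhs) - (lhs < rhs);
}

/* Hash computed only from the fields that rational_compare inspects, so
   two payloads that compare equal always receive the same hash value. */
intptr_t rational_hash(const struct rational *r)
{
    /* Unsigned arithmetic wraps around instead of overflowing. */
    uint32_t h = (uint32_t)r->num * 31u + (uint32_t)r->den;
    return (intptr_t)h;
}
```

Neither function allocates nor calls back into OCaml, which is consistent with the restriction stated in the note above.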

19.9.2 Allocating custom blocks

Custom blocks must be allocated via caml_alloc_custom or caml_alloc_custom_mem:

caml_alloc_custom(ops, size, used, max)

returns a fresh custom block, with room for size bytes of user data, and whose associated operations are given by ops (a pointer to a struct custom_operations, usually statically allocated as a C global variable).

The two parameters used and max are used to control the speed of garbage collection when the finalized object contains pointers to out-of-heap resources. Generally speaking, the OCaml incremental major collector adjusts its speed relative to the allocation rate of the program. The faster the program allocates, the harder the GC works in order to reclaim unreachable blocks quickly and avoid having large amounts of “floating garbage” (unreferenced objects that the GC has not yet collected).

Normally, the allocation rate is measured by counting the in-heap size of allocated blocks. However, it often happens that finalized objects contain pointers to out-of-heap memory blocks and other resources (such as file descriptors, X Windows bitmaps, etc.). For those blocks, the in-heap size of blocks is not a good measure of the quantity of resources allocated by the program.

The two arguments used and max give the GC an idea of how much out-of-heap resources are consumed by the finalized block being allocated: you give the amount of resources allocated to this object as parameter used, and the maximum amount that you want to see in floating garbage as parameter max. The units are arbitrary: the GC cares only about the ratio used / max.

For instance, if you are allocating a finalized block holding an X Windows bitmap of w by h pixels, and you’d rather not have more than 1 megapixel of unreclaimed bitmaps, specify used = w * h and max = 1000000.

Another way to describe the effect of the used and max parameters is in terms of full GC cycles. If you allocate many custom blocks with used / max = 1 / N, the GC will then do one full cycle (examining every object in the heap and calling finalization functions on those that are unreachable) every N allocations. For instance, if used = 1 and max = 1000, the GC will do one full cycle at least every 1000 allocations of custom blocks.
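The relation between the used / max ratio and full GC cycles is simple arithmetic, sketched below in plain C. The function allocations_per_full_cycle is a hypothetical helper written for this illustration only; it is not provided by the runtime and merely restates the rule of thumb given above, under the assumption that every allocated block uses the same used and max.

```c
/* Approximate number of custom-block allocations between two full major
   GC cycles for a given used/max pair. Returns 0 when used is 0, meaning
   that such blocks do not speed up the collector at all. */
unsigned long allocations_per_full_cycle(unsigned long used, unsigned long max)
{
    if (used == 0) return 0;
    return max / used;
}
```

For the bitmap example, a 500 by 400 pixel bitmap with max = 1000000 gives allocations_per_full_cycle(500UL * 400UL, 1000000UL), that is, a full cycle about every 5 such allocations.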

If your finalized blocks contain no pointers to out-of-heap resources, or if the previous discussion made little sense to you, just take used = 0 and max = 1. But if you later find that the finalization functions are not called “often enough”, consider increasing the used / max ratio.

caml_alloc_custom_mem(ops, size, used)

Use this function when your custom block holds only out-of-heap memory (memory allocated with malloc or caml_stat_alloc) and no other resources. used should be the number of bytes of out-of-heap memory that are held by your custom block. This function works like caml_alloc_custom except that the max parameter is under the control of the user (via the custom_major_ratio, custom_minor_ratio, and custom_minor_max_size parameters) and proportional to the heap sizes.

19.9.3 Accessing custom blocks

The data part of a custom block v can be accessed via the pointer Data_custom_val(v). This pointer has type void * and should be cast to the actual type of the data stored in the custom block.

The contents of custom blocks are not scanned by the garbage collector, and must therefore not contain any pointer inside the OCaml heap. In other terms, never store an OCaml value in a custom block, and do not use Field, Store_field nor caml_modify to access the data part of a custom block. Conversely, any C data structure (not containing heap pointers) can be stored in a custom block.

19.9.4 Writing custom serialization and deserialization functions

The following functions, defined in <caml/intext.h>, are provided to write and read back the contents of custom blocks in a portable way. Those functions handle endianness conversions when, for example, data is written on a little-endian machine and read back on a big-endian machine.

  Function                    Action
  caml_serialize_int_1        Write a 1-byte integer
  caml_serialize_int_2        Write a 2-byte integer
  caml_serialize_int_4        Write a 4-byte integer
  caml_serialize_int_8        Write an 8-byte integer
  caml_serialize_float_4      Write a 4-byte float
  caml_serialize_float_8      Write an 8-byte float
  caml_serialize_block_1      Write an array of 1-byte quantities
  caml_serialize_block_2      Write an array of 2-byte quantities
  caml_serialize_block_4      Write an array of 4-byte quantities
  caml_serialize_block_8      Write an array of 8-byte quantities
  caml_deserialize_uint_1     Read an unsigned 1-byte integer
  caml_deserialize_sint_1     Read a signed 1-byte integer
  caml_deserialize_uint_2     Read an unsigned 2-byte integer
  caml_deserialize_sint_2     Read a signed 2-byte integer
  caml_deserialize_uint_4     Read an unsigned 4-byte integer
  caml_deserialize_sint_4     Read a signed 4-byte integer
  caml_deserialize_uint_8     Read an unsigned 8-byte integer
  caml_deserialize_sint_8     Read a signed 8-byte integer
  caml_deserialize_float_4    Read a 4-byte float
  caml_deserialize_float_8    Read an 8-byte float
  caml_deserialize_block_1    Read an array of 1-byte quantities
  caml_deserialize_block_2    Read an array of 2-byte quantities
  caml_deserialize_block_4    Read an array of 4-byte quantities
  caml_deserialize_block_8    Read an array of 8-byte quantities
  caml_deserialize_error      Signal an error during deserialization; input_value or Marshal.from_… raise a Failure exception after cleaning up their internal data structures
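To see why going through such functions makes the serialized data portable, consider the following self-contained model of a 4-byte write and read. It does not use the OCaml runtime: struct byte_stream, stream_write_int_4 and stream_read_sint_4 are hypothetical names for this sketch, and the most-significant-byte-first order is an assumption made for the illustration. The point is only that a byte order fixed by the format, rather than by the host, reads back identically on any machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory buffer standing in for the marshaling stream. */
struct byte_stream { uint8_t data[64]; size_t pos; };

/* Write a 32-bit integer most significant byte first, whatever the host
   byte order. No bounds checking: the caller must leave 4 free bytes. */
void stream_write_int_4(struct byte_stream *s, int32_t i)
{
    uint32_t u = (uint32_t)i;
    s->data[s->pos++] = (uint8_t)(u >> 24);
    s->data[s->pos++] = (uint8_t)(u >> 16);
    s->data[s->pos++] = (uint8_t)(u >> 8);
    s->data[s->pos++] = (uint8_t)u;
}

/* Read the integer back by reassembling the bytes in the same fixed order. */
int32_t stream_read_sint_4(struct byte_stream *s)
{
    uint32_t u = ((uint32_t)s->data[s->pos] << 24)
               | ((uint32_t)s->data[s->pos + 1] << 16)
               | ((uint32_t)s->data[s->pos + 2] << 8)
               |  (uint32_t)s->data[s->pos + 3];
    s->pos += 4;
    return (int32_t)u;
}
```

A serialize function built on the real caml_serialize_… functions follows the same pattern: it writes each field with an explicit width, and the matching deserialize function reads the fields back in the same order and with the same widths.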

Serialization functions are attached to the custom blocks to which they apply. Obviously, deserialization functions cannot be attached this way, since the custom block does not exist yet when deserialization begins! Thus, the struct custom_operations that contain deserialization functions must be registered with the deserializer in advance, using the caml_register_custom_operations function declared in <caml/custom.h>. Deserialization proceeds by reading the identifier off the input stream, allocating a custom block of the size specified in the input stream, searching the registered struct custom_operations blocks for one with the same identifier, and calling its deserialize function to fill the data part of the custom block.

19.9.5 Choosing identifiers

Identifiers in struct custom_operations must be chosen carefully, since they must identify uniquely the data structure for serialization and deserialization operations. In particular, consider including a version number in the identifier; this way, the format of the data can be changed later, yet backward-compatible deserialization functions can be provided.

Identifiers starting with _ (an underscore character) are reserved for the OCaml runtime system; do not use them for your custom data. We recommend using a URL (http://mymachine.mydomain.com/mylibrary/version-number) or a Java-style package name (com.mydomain.mymachine.mylibrary.version-number) as identifiers, to minimize the risk of identifier collision.

19.9.6 Finalized blocks

Custom blocks generalize the finalized blocks that were present in OCaml prior to version 3.00. For backward compatibility, the format of custom blocks is compatible with that of finalized blocks, and the caml_alloc_final function is still available to allocate a custom block with a given finalization function, but default comparison, hashing and serialization functions.

caml_alloc_final(n, f, used, max) returns a fresh custom block of size n+1 words, with finalization function f. The first word is reserved for storing the custom operations; the other n words are available for your data. The two parameters used and max are used to control the speed of garbage collection, as described for caml_alloc_custom.

19.10 Advanced topic: Bigarrays and the OCaml-C interface

This section explains how C stub code that interfaces C or Fortran code with OCaml code can use Bigarrays.

19.10.1 Include file

The include file <caml/bigarray.h> must be included in the C stub file. It declares the functions, constants and macros discussed below.

19.10.2 Accessing an OCaml bigarray from C or Fortran

If v is an OCaml value representing a Bigarray, the expression Caml_ba_data_val(v) returns a pointer to the data part of the array. This pointer is of type void * and can be cast to the appropriate C type for the array (e.g. double [], char [][10], etc).

Various characteristics of the OCaml Bigarray can be consulted from C as follows:

  C expression                                     Returns
  Caml_ba_array_val(v)->num_dims                   number of dimensions
  Caml_ba_array_val(v)->dim[i]                     i-th dimension
  Caml_ba_array_val(v)->flags & CAML_BA_KIND_MASK  kind of array elements

The kind of array elements is one of the following constants:

  Constant              Element kind
  CAML_BA_FLOAT32       32-bit single-precision floats
  CAML_BA_FLOAT64       64-bit double-precision floats
  CAML_BA_SINT8         8-bit signed integers
  CAML_BA_UINT8         8-bit unsigned integers
  CAML_BA_SINT16        16-bit signed integers
  CAML_BA_UINT16        16-bit unsigned integers
  CAML_BA_INT32         32-bit signed integers
  CAML_BA_INT64         64-bit signed integers
  CAML_BA_CAML_INT      31- or 63-bit signed integers
  CAML_BA_NATIVE_INT    32- or 64-bit (platform-native) integers

The following example shows the passing of a two-dimensional Bigarray to a C function and a Fortran function.

  extern void my_c_function(double * data, int dimx, int dimy);
  extern void my_fortran_function_(double * data, int * dimx, int * dimy);

  value caml_stub(value bigarray)
  {
    int dimx = Caml_ba_array_val(bigarray)->dim[0];
    int dimy = Caml_ba_array_val(bigarray)->dim[1];
    /* C passes scalar parameters by value */
    my_c_function(Caml_ba_data_val(bigarray), dimx, dimy);
    /* Fortran passes all parameters by reference */
    my_fortran_function_(Caml_ba_data_val(bigarray), &dimx, &dimy);
    return Val_unit;
  }

19.10.3 Wrapping a C or Fortran array as an OCaml Bigarray

A pointer p to an already-allocated C or Fortran array can be wrapped and returned to OCaml as a Bigarray using the caml_ba_alloc or caml_ba_alloc_dims functions.

  • caml_ba_alloc(kind|layout, numdims, p, dims)
    Return an OCaml Bigarray wrapping the data pointed to by p. kind is the kind of array elements (one of the CAML_BA_kind constants above). layout is CAML_BA_C_LAYOUT for an array with C layout and CAML_BA_FORTRAN_LAYOUT for an array with Fortran layout. numdims is the number of dimensions in the array. dims is an array of numdims long integers, giving the sizes of the array in each dimension.

  • caml_ba_alloc_dims(kind|layout, numdims, p, (long) dim_1, (long) dim_2, …, (long) dim_numdims)
    Same as caml_ba_alloc, but the sizes of the array in each dimension are listed as extra arguments in the function call, rather than being passed as an array.

The following example illustrates how statically-allocated C and Fortran arrays can be made available to OCaml.

  extern long my_c_array[100][200];
  extern float my_fortran_array_[300][400];

  value caml_get_c_array(value unit)
  {
    long dims[2];
    dims[0] = 100; dims[1] = 200;
    return caml_ba_alloc(CAML_BA_NATIVE_INT | CAML_BA_C_LAYOUT,
                         2, my_c_array, dims);
  }

  value caml_get_fortran_array(value unit)
  {
    return caml_ba_alloc_dims(CAML_BA_FLOAT32 | CAML_BA_FORTRAN_LAYOUT,
                              2, my_fortran_array_, 300L, 400L);
  }
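The layout flag determines how OCaml-side indices map to positions in the wrapped memory. The sketch below spells this out for two dimensions in plain C, relying on the usual conventions (C layout is row-major with indices starting at 0, Fortran layout is column-major with indices starting at 1), which the text above does not restate. The helper names c_layout_offset and fortran_layout_offset are hypothetical and not part of <caml/bigarray.h>.

```c
#include <stddef.h>

/* Offset, in elements, of a.{i,j} in a dimx-by-dimy array with C layout:
   row-major storage, indices starting at 0. */
size_t c_layout_offset(size_t i, size_t j, size_t dimx, size_t dimy)
{
    (void)dimx;                  /* the first dimension is not needed */
    return i * dimy + j;
}

/* Offset, in elements, of a.{i,j} in a dimx-by-dimy array with Fortran
   layout: column-major storage, indices starting at 1. */
size_t fortran_layout_offset(size_t i, size_t j, size_t dimx, size_t dimy)
{
    (void)dimy;                  /* the second dimension is not needed */
    return (j - 1) * dimx + (i - 1);
}
```

For the 100 by 200 C-layout array above, element {1,2} lies at offset 202; for a 300 by 400 Fortran-layout array, element {2,3} lies at offset 601.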

19.11 Advanced topic: cheaper C call

This section describes how to make calling C functions cheaper.

Note: this only applies to the native compiler. So whenever you use any of these methods, you have to provide an alternative byte-code stub that ignores all the special annotations.

19.11.1 Passing unboxed values

We said earlier that all OCaml objects are represented by the C type value, and one has to use macros such as Int_val to decode data from the value type. It is however possible to tell the OCaml native-code compiler to do this for us and pass arguments unboxed to the C function. Similarly it is possible to tell OCaml to expect the result unboxed and box it for us.

The motivation is that, by letting ocamlopt deal with boxing, it can often decide to suppress it entirely.

For instance let’s consider this example:

  external foo : float -> float -> float = "foo"

  let f a b =
    let len = Array.length a in
    assert (Array.length b = len);
    let res = Array.make len 0. in
    for i = 0 to len - 1 do
      res.(i) <- foo a.(i) b.(i)
    done

Float arrays are unboxed in OCaml; however, the C function foo expects its arguments as boxed floats and returns a boxed float. Hence the OCaml compiler has no choice but to box a.(i) and b.(i) and unbox the result of foo. This results in the allocation of 3 * len temporary float values.

Now if we annotate the arguments and result with [@unboxed], the native-code compiler will be able to avoid all these allocations:

  external foo
    :  (float [@unboxed])
    -> (float [@unboxed])
    -> (float [@unboxed])
    = "foo_byte" "foo"

In this case the C functions must look like:

  CAMLprim double foo(double a, double b)
  {
    ...
  }

  CAMLprim value foo_byte(value a, value b)
  {
    return caml_copy_double(foo(Double_val(a), Double_val(b)));
  }

For convenience, when all arguments and the result are annotated with [@unboxed], it is possible to put the attribute only once on the declaration itself. So we can also write instead:

  external foo : float -> float -> float = "foo_byte" "foo" [@@unboxed]

The following table summarizes which OCaml types can be unboxed, and which C types should be used in correspondence:

  OCaml type    C type
  float         double
  int32         int32_t
  int64         int64_t
  nativeint     intnat

Similarly, it is possible to pass untagged OCaml integers between OCaml and C. This is done by annotating the arguments and/or result with [@untagged]:

  external f : string -> (int [@untagged]) = "f_byte" "f"

The corresponding C type must be intnat.

Note: do not use the C int type in correspondence with (int [@untagged]), because the two types often differ in size.
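To make concrete what [@untagged] saves, the following self-contained sketch models the tagging that the byte-code stub still has to undo with Int_val or Long_val: an OCaml integer n is stored as 2n + 1 in a machine word. The names model_intnat, model_value, model_val_long and model_long_val are local stand-ins introduced for this example; the real types and macros come from <caml/mlvalues.h>. The sketch assumes pointer-sized words and an arithmetic right shift on signed integers, as on common platforms.

```c
#include <stdint.h>

/* Local stand-ins for the runtime's intnat and value types. */
typedef intptr_t model_intnat;
typedef intptr_t model_value;

/* Tag: shift left by one bit and set the low bit. */
model_value model_val_long(model_intnat n)
{
    /* The shift is done on the unsigned type to avoid signed overflow. */
    return (model_value)(((uintptr_t)n << 1) + 1);
}

/* Untag: arithmetic shift right by one bit drops the tag bit. */
model_intnat model_long_val(model_value v)
{
    return v >> 1;
}
```

With [@untagged], the native-code stub receives and returns the plain model_intnat-like quantity directly. On many 64-bit platforms sizeof(int) is 4 while the word size is 8, which is the size mismatch the note above warns about.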

19.11.2 Direct C call

In order to be able to run the garbage collector in the middle of a C function, the OCaml native-code compiler generates some bookkeeping code around C calls. Technically it wraps every C call with the C function caml_c_call which is part of the OCaml runtime.

For small functions that are called repeatedly, this indirection can have a big impact on performance. However this is not needed if we know that the C function doesn’t allocate and doesn’t raise exceptions. We can inform the OCaml native-code compiler of this fact by annotating the external declaration with the attribute [@@noalloc]:

  external bar : int -> int -> int = "foo" [@@noalloc]

In this case calling bar from OCaml is as cheap as calling any other OCaml function, except for the fact that the OCaml compiler can’t inline C functions…

19.11.3 Example: calling C library functions without indirection

Using these attributes, it is possible to call C library functions with no indirection. For instance many math functions are defined this way in the OCaml standard library:

  external sqrt : float -> float = "caml_sqrt_float" "sqrt"
    [@@unboxed] [@@noalloc]
  (** Square root. *)

  external exp : float -> float = "caml_exp_float" "exp" [@@unboxed] [@@noalloc]
  (** Exponential. *)

  external log : float -> float = "caml_log_float" "log" [@@unboxed] [@@noalloc]
  (** Natural logarithm. *)

19.12 Advanced topic: multithreading

Using multiple threads (shared-memory concurrency) in a mixed OCaml/C application requires special precautions, which are described in this section.

19.12.1 Registering threads created from C

Callbacks from C to OCaml are possible only if the calling thread is known to the OCaml run-time system. Threads created from OCaml (through the Thread.create function of the system threads library) are automatically known to the run-time system. If the application creates additional threads from C and wishes to call back into OCaml code from these threads, it must first register them with the run-time system. The following functions are declared in the include file <caml/threads.h>.

  • caml_c_thread_register() registers the calling thread with the OCaml run-time system. Returns 1 on success, 0 on error. Registering an already-registered thread does nothing and returns 0.
  • caml_c_thread_unregister() must be called before the thread terminates, to unregister it from the OCaml run-time system. Returns 1 on success, 0 on error. If the calling thread was not previously registered, does nothing and returns 0.

19.12.2 Parallel execution of long-running C code

The OCaml run-time system is not reentrant: at any time, at most one thread can be executing OCaml code or C code that uses the OCaml run-time system. Technically, this is enforced by a “master lock” that any thread must hold while executing such code.

When OCaml calls the C code implementing a primitive, the master lock is held, therefore the C code has full access to the facilities of the run-time system. However, no other thread can execute OCaml code concurrently with the C code of the primitive.

If a C primitive runs for a long time or performs potentially blocking input-output operations, it can explicitly release the master lock, enabling other OCaml threads to run concurrently with its operations. The C code must re-acquire the master lock before returning to OCaml. This is achieved with the following functions, declared in the include file <caml/threads.h>.

  • caml_release_runtime_system()
    The calling thread releases the master lock and other OCaml resources, enabling other threads to run OCaml code in parallel with the execution of the calling thread.
  • caml_acquire_runtime_system()
    The calling thread re-acquires the master lock and other OCaml resources. It may block until no other thread uses the OCaml run-time system.

After caml_release_runtime_system() was called and until caml_acquire_runtime_system() is called, the C code must not access any OCaml data, nor call any function of the run-time system, nor call back into OCaml code. Consequently, arguments provided by OCaml to the C primitive must be copied into C data structures before calling caml_release_runtime_system(), and results to be returned to OCaml must be encoded as OCaml values after caml_acquire_runtime_system() returns.

Example: the following C primitive invokes gethostbyname to find the IP address of a host name. The gethostbyname function can block for a long time, so we choose to release the OCaml run-time system while it is running.

  CAMLprim value stub_gethostbyname(value vname)
  {
    CAMLparam1 (vname);
    CAMLlocal1 (vres);
    struct hostent * h;
    char * name;

    /* Copy the string argument to a C string, allocated outside the
       OCaml heap. */
    name = caml_stat_strdup(String_val(vname));
    /* Release the OCaml run-time system */
    caml_release_runtime_system();
    /* Resolve the name */
    h = gethostbyname(name);
    /* Free the copy of the string, which we might as well do before
       acquiring the runtime system to benefit from parallelism. */
    caml_stat_free(name);
    /* Re-acquire the OCaml run-time system */
    caml_acquire_runtime_system();
    /* Encode the relevant fields of h as the OCaml value vres */
    ... /* Omitted */
    /* Return to OCaml */
    CAMLreturn (vres);
  }

Callbacks from C to OCaml must be performed while holding the master lock to the OCaml run-time system. This is naturally the case if the callback is performed by a C primitive that did not release the run-time system. If the C primitive released the run-time system previously, or the callback is performed from other C code that was not invoked from OCaml (e.g. an event loop in a GUI application), the run-time system must be acquired before the callback and released after:

  caml_acquire_runtime_system();
  /* Resolve OCaml function vfun to be invoked */
  /* Build OCaml argument varg to the callback */
  vres = caml_callback(vfun, varg);
  /* Copy relevant parts of result vres to C data structures */
  caml_release_runtime_system();

Note: the acquire and release functions described above were introduced in OCaml 3.12. Older code uses the following historical names, declared in <caml/signals.h>:

  • caml_enter_blocking_section as an alias for caml_release_runtime_system
  • caml_leave_blocking_section as an alias for caml_acquire_runtime_system

Intuition: a “blocking section” is a piece of C code that does not use the OCaml run-time system, typically a blocking input/output operation.

19.13 Advanced topic: interfacing with Windows Unicode APIs

This section contains some general guidelines for writing C stubs that use Windows Unicode APIs.

Note: This is an experimental feature of OCaml: the set of APIs below, as well as their exact semantics, are not final and subject to change in future releases.

The OCaml system under Windows can be configured at build time in one of two modes:

  • legacy mode: All path names, environment variables, command line arguments, etc. on the OCaml side are assumed to be encoded using the current 8-bit code page of the system.
  • Unicode mode: All path names, environment variables, command line arguments, etc. on the OCaml side are assumed to be encoded using UTF-8.

In what follows, we say that a string has the OCaml encoding if it is encoded in UTF-8 when in Unicode mode, in the current code page in legacy mode, or is an arbitrary string under Unix. A string has the platform encoding if it is encoded in UTF-16 under Windows or is an arbitrary string under Unix.

From the point of view of the writer of C stubs, the challenges of interacting with Windows Unicode APIs are twofold:

  • The Windows API uses the UTF-16 encoding to support Unicode. The runtime system performs the necessary conversions so that the OCaml programmer only needs to deal with the OCaml encoding. C stubs that call Windows Unicode APIs need to use specific runtime functions to perform the necessary conversions in a compatible way.
  • When writing stubs that need to be compiled under both Windows and Unix, the stubs need to be written in a way that allows the necessary conversions under Windows but that also works under Unix, where typically nothing particular needs to be done to support Unicode.

The native C character type under Windows is WCHAR, two bytes wide, while under Unix it is char, one byte wide. A type char_os is defined in <caml/misc.h> that stands for the concrete C character type of each platform. Strings in the platform encoding are of type char_os *.

The following functions are exposed to help write compatible C stubs. To use them, you need to include both <caml/misc.h> and <caml/osdeps.h>.

  • char_os * caml_stat_strdup_to_os(const char *) copies the argument while translating from the OCaml encoding to the platform encoding. This function is typically used to convert the char * underlying an OCaml string before passing it to an operating system API that takes a Unicode argument. Under Unix, it is equivalent to caml_stat_strdup.
    Note: For maximum backwards compatibility in Unicode mode, if the argument is not a valid UTF-8 string, this function will fall back to assuming that it is encoded in the current code page.

  • char * caml_stat_strdup_of_os(const char_os *) copies the argument while translating from the platform encoding to the OCaml encoding. It is the inverse of caml_stat_strdup_to_os. This function is typically used to convert a string obtained from the operating system before passing it on to OCaml code. Under Unix, it is equivalent to caml_stat_strdup.

  • value caml_copy_string_of_os(char_os *) allocates an OCaml string with contents equal to the argument string converted to the OCaml encoding. This function is essentially equivalent to caml_stat_strdup_of_os followed by caml_copy_string, except that it avoids the allocation of the intermediate string returned by caml_stat_strdup_of_os. Under Unix, it is equivalent to caml_copy_string.

Note: The strings returned by caml_stat_strdup_to_os and caml_stat_strdup_of_os are allocated using caml_stat_alloc, so they need to be deallocated using caml_stat_free when they are no longer needed.
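For readers unfamiliar with the two encodings, the following self-contained sketch shows the kind of translation that a conversion from UTF-8 to UTF-16 involves. It is not the runtime's code: utf8_to_utf16 is a hypothetical helper written for this illustration, it writes into a caller-supplied buffer instead of allocating with caml_stat_alloc, and it is simplified in that it neither rejects overlong encodings nor implements the code-page fallback described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
   Writes at most cap units to out, including the terminating 0.
   Returns the number of units written, excluding the terminator, or
   (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    if (cap == 0) return (size_t)-1;
    while (*p != 0) {
        uint32_t cp;
        int extra;               /* number of continuation bytes expected */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;
        p++;
        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp >= 0x10000) {
            /* Code points above the BMP become a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    out[n] = 0;
    return n;
}
```

In real stubs, prefer the runtime functions listed above: they select the right behaviour for the configured mode and are no-ops under Unix.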
Example

We want to bind the function getenv in a way that worksboth under Unix and Windows. Under Unix this function has the prototype:

  1. char *getenv(const char *);

While the Unicode version under Windows has the prototype:

  1. WCHAR *_wgetenv(const WCHAR *);

In terms of char_os, both functions take an argument of type char_os * andreturn a result of the same type. We begin by choosing the right implementationof the function to bind:

  1. #ifdef _WIN32
  2. #define getenv_os _wgetenv
  3. #else
  4. #define getenv_os getenv
  5. #endif

The rest of the binding is the same for both platforms:

    /* The following define is necessary because the API is experimental */
    #define CAML_INTERNALS

    #include <caml/mlvalues.h>
    #include <caml/memory.h>   /* CAMLparam1, CAMLlocal1, CAMLreturn, caml_stat_free */
    #include <caml/misc.h>
    #include <caml/alloc.h>
    #include <caml/fail.h>
    #include <caml/osdeps.h>
    #include <stdlib.h>

    CAMLprim value stub_getenv(value var_name)
    {
      CAMLparam1(var_name);
      CAMLlocal1(var_value);
      char_os *var_name_os, *var_value_os;

      /* Copy the variable name, converted to the platform encoding. */
      var_name_os = caml_stat_strdup_to_os(String_val(var_name));
      var_value_os = getenv_os(var_name_os);
      /* The copy was allocated with caml_stat_alloc: free it before any raise. */
      caml_stat_free(var_name_os);

      if (var_value_os == NULL)
        caml_raise_not_found();

      /* Convert the result back to the OCaml encoding as an OCaml string. */
      var_value = caml_copy_string_of_os(var_value_os);

      CAMLreturn(var_value);
    }
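
The platform-selection step of this example can be tried on its own, without the OCaml runtime. The following stand-alone sketch is one way to do so. The names char_os_demo and getenv_or_default are hypothetical and introduced only for illustration. The typedef merely mimics the idea behind char_os (a wide character type under Windows, plain char elsewhere) and is not the runtime's own definition.

```c
#include <stdlib.h>

#ifdef _WIN32
#include <wchar.h>
typedef wchar_t char_os_demo;   /* stand-in for char_os (assumption) */
#define getenv_os _wgetenv
#else
typedef char char_os_demo;      /* stand-in for char_os (assumption) */
#define getenv_os getenv
#endif

/* Hypothetical helper: look up `name` through the platform's getenv
   variant and return `fallback` when the variable is not set. The NULL
   test corresponds to the caml_raise_not_found() branch of the stub. */
const char_os_demo *getenv_or_default(const char_os_demo *name,
                                      const char_os_demo *fallback)
{
  const char_os_demo *value = getenv_os(name);
  return value != NULL ? value : fallback;
}
```

Under Unix, getenv_or_default("HOME", "(unset)") returns the value of HOME, or the fallback string if HOME is not defined. Under Windows the same call would be written with wide string literals such as L"HOME".
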

19.14 Building mixed C/OCaml libraries: ocamlmklib

The ocamlmklib command facilitates the construction of libraries containing both OCaml code and C code, and usable both in static linking and dynamic linking modes. This command is available under Windows since Objective Caml 3.11 and under other operating systems since Objective Caml 3.03.

The ocamlmklib command takes three kinds of arguments:

  • OCaml source files and object files (.cmo, .cmx, .ml) comprising the OCaml part of the library;
  • C object files (.o, .a, or respectively .obj, .lib on Windows) comprising the C part of the library;
  • Support libraries for the C part (-llib).

It generates the following outputs:

  • An OCaml bytecode library .cma incorporating the .cmo and .ml OCaml files given as arguments, and automatically referencing the C library generated with the C object files.
  • An OCaml native-code library .cmxa incorporating the .cmx and .ml OCaml files given as arguments, and automatically referencing the C library generated with the C object files.
  • If dynamic linking is supported on the target platform, a .so (respectively, .dll) shared library built from the C object files given as arguments, and automatically referencing the support libraries.
  • A C static library .a (respectively, .lib) built from the C object files.

In addition, the following options are recognized:

  • -cclib, -ccopt, -I, -linkall
    These options are passed as is to ocamlc or ocamlopt. See the documentation of these commands.
  • -rpath, -R, -Wl,-rpath, -Wl,-R
    These options are passed as is to the C compiler. Refer to the documentation of the C compiler.
  • -custom
    Force the construction of a statically linked library only, even if dynamic linking is supported.
  • -failsafe
    Fall back to building a statically linked library if a problem occurs while building the shared library (e.g. some of the support libraries are not available as shared libraries).
  • -Ldir
    Add dir to the search path for support libraries (-llib).
  • -ocamlc cmd
    Use cmd instead of ocamlc to call the bytecode compiler.
  • -ocamlopt cmd
    Use cmd instead of ocamlopt to call the native-code compiler.
  • -o output
    Set the name of the generated OCaml library. ocamlmklib will generate output.cma and/or output.cmxa. If not specified, defaults to a.
  • -oc outputc
    Set the name of the generated C library. ocamlmklib will generate dlloutputc.so (if shared libraries are supported) and liboutputc.a. If not specified, defaults to the output name given with -o.

On native Windows, the following environment variable is also consulted:

  • OCAML_FLEXLINK
    Alternative executable to use instead of the configured value. Primarily used for bootstrapping.
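
The naming rules for the -o and -oc options can be summarized in a few lines of code. The sketch below is only an illustration of those rules, not ocamlmklib's implementation. The structure mklib_outputs and the function mklib_output_names are hypothetical names, and the Unix suffixes (.so, .a) and the dll prefix of the shared library (as in dllzip.so in the example of this section) are assumed.

```c
#include <stdio.h>

/* Hypothetical record of the file names derived from -o and -oc. */
struct mklib_outputs {
  char cma[128];      /* OCaml bytecode library */
  char cmxa[128];     /* OCaml native-code library */
  char shared[128];   /* shared C library (Unix naming assumed) */
  char archive[128];  /* static C library (Unix naming assumed) */
};

/* o:  argument of -o,  or NULL if the option was not given.
   oc: argument of -oc, or NULL if the option was not given. */
void mklib_output_names(const char *o, const char *oc,
                        struct mklib_outputs *out)
{
  if (o == NULL) o = "a";   /* -o defaults to "a" */
  if (oc == NULL) oc = o;   /* -oc defaults to the name given with -o */

  snprintf(out->cma, sizeof out->cma, "%s.cma", o);
  snprintf(out->cmxa, sizeof out->cmxa, "%s.cmxa", o);
  snprintf(out->shared, sizeof out->shared, "dll%s.so", oc);
  snprintf(out->archive, sizeof out->archive, "lib%s.a", oc);
}
```

For instance, calling mklib_output_names("zip", NULL, &names) fills names with zip.cma, zip.cmxa, dllzip.so and libzip.a.
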
Example

Consider an OCaml interface to the standard libz C library for reading and writing compressed files. Assume this library resides in /usr/local/zlib. This interface is composed of an OCaml part zip.cmo/zip.cmx and a C part zipstubs.o containing the stub code around the libz entry points. The following command builds the OCaml libraries zip.cma and zip.cmxa, as well as the companion C libraries dllzip.so and libzip.a:

    ocamlmklib -o zip zip.cmo zip.cmx zipstubs.o -lz -L/usr/local/zlib

If shared libraries are supported, this performs the following commands:

    ocamlc -a -o zip.cma zip.cmo -dllib -lzip \
            -cclib -lzip -cclib -lz -ccopt -L/usr/local/zlib
    ocamlopt -a -o zip.cmxa zip.cmx -cclib -lzip \
            -cclib -lzip -cclib -lz -ccopt -L/usr/local/zlib
    gcc -shared -o dllzip.so zipstubs.o -lz -L/usr/local/zlib
    ar rc libzip.a zipstubs.o

Note: This example is on a Unix system. The exact command lines may be different on other systems.

If shared libraries are not supported, the following commands are performed instead:

    ocamlc -a -custom -o zip.cma zip.cmo -cclib -lzip \
            -cclib -lz -ccopt -L/usr/local/zlib
    ocamlopt -a -o zip.cmxa zip.cmx -cclib -lzip \
            -cclib -lz -ccopt -L/usr/local/zlib
    ar rc libzip.a zipstubs.o

Instead of building the bytecode library, the native-code library and the C libraries simultaneously, ocamlmklib can be called three times to build each separately. Thus,

    ocamlmklib -o zip zip.cmo -lz -L/usr/local/zlib

builds the bytecode library zip.cma, and

    ocamlmklib -o zip zip.cmx -lz -L/usr/local/zlib

builds the native-code library zip.cmxa, and

    ocamlmklib -o zip zipstubs.o -lz -L/usr/local/zlib

builds the C libraries dllzip.so and libzip.a. Notice that the support libraries (-lz) and the corresponding options (-L/usr/local/zlib) must be given on all three invocations of ocamlmklib, because they are needed at different times depending on whether shared libraries are supported.