Saturday, December 12, 2009

Basic cgo

The Go distribution ships with a couple of examples that show how to use cgo, the command-line utility for interfacing with C libraries.  Beyond that there is practically no official documentation, perhaps because the Go team wants to encourage the development of native Go libraries.  However, not everyone is keen on reinventing the wheel, which makes cgo rather important.

In this article I will try to describe how one would go about linking to a C library.  My example will basically be the cgo equivalent of an ncurses "Hello, world" program.  It covers several basics of cgo usage: interacting with C libraries, calling and passing parameters to C functions, and writing the Makefile.

First off, the very basics:

package gocurses

/*
#include <ncurses.h>
*/
import "C"

"C" is not actually a real package.  It is merely a signal to cgo that the lines commented immediately prior to the import statement are to be compiled as C code and that their contents are to be accessible to the succeeding Go code.  Here we are including the ncurses header.  The functions, types, and variables exposed in that header are accessed as if they existed in the "C" package.  Thus, to create the ncurses window, we call:

C.initscr ();

Here's the entire program, which I will subsequently explain:

package gocurses

/*
#define NCURSES_ENABLE_STDBOOL_H 0

#include <ncurses.h>
#include <stdlib.h>

// Pass str as data rather than as the format, so a '%' in it is printed literally.
int Printw (const char* str) { return printw ("%s", str); }
*/
import "C"

import (
    "fmt";
    "unsafe"
)

func Hello (s string) {
    C.initscr ();

    p := C.CString (fmt.Sprintf ("Hello, %s", s));
    C.Printw (p);
    C.free (unsafe.Pointer (p));

    C.refresh ();
    C.getch ();
    C.endwin ()
}

Starting from the beginning, the macro is required because of how ncurses handles the bool type.  Do not worry too much about it for the purpose of this example.

Further down we see a wrapper around the ncurses 'printw' function.  This is necessary because cgo cannot deal with C's '...' parameter.  Fortunately we do not even need it, as it exists in this case merely for string formatting.

Note that I separate the import for "C" from the rest of the imports.  This is necessary, as putting it with the rest will (at the time of writing this article) prevent cgo from detecting where our C code is.

Further along we create our 'Hello' function and call 'initscr' to create a new ncurses window, as described earlier.  Now we deal with passing strings to C.  'C.CString' converts the Go string built by 'fmt.Sprintf' into a newly allocated C-style string.  Strings are the only type that needs a dedicated conversion function prefixed by 'C'.  All other C types are referred to by their own names within the "C" package and are used as ordinary conversions.  Thus: for C ints, 'C.int'; for C floats, 'C.float'; etc.

We see here why we do not need to worry about the '...' parameter in 'printw'.  'fmt.Sprintf' can perform the formatting we need.  There are, perhaps, other uses of the '...' parameter that will require different workarounds, but for the purpose of this example we need not worry about them.

After calling Printw, we need to free the memory used by our new C string.  It is, after all, simply an array of chars created on the heap and we do not want a memory leak.  This is why we included 'stdlib.h': for the 'free' function.  'free' receives a pointer to void, a type that does not exist in Go.  Converting our pointer to 'unsafe.Pointer' yields a compatible type, though.

The rest of the program is fairly basic.  Here's our main.go:

package main

import "gocurses"

func main () {
    gocurses.Hello ("world")
}

Now, we still need to compile everything.  We'll use a Makefile, as there are a fair number of steps to compiling a cgo file.


include $(GOROOT)/src/Make.$(GOARCH)

PKGDIR=$(GOROOT)/pkg/$(GOOS)_$(GOARCH)

TARG=gocurses
CGOFILES=gocurses.go
CGO_LDFLAGS=-lncurses

include $(GOROOT)/src/Make.pkg

CLEANFILES+=main $(PKGDIR)/$(TARG).a

main: install main.go
	$(GC) main.go
	$(LD) -o $@ main.$O

The first line includes a file that simply defines some variables: CC, GC, LD, and O.  These define the compilers, the linker, and the file extension used for your platform.  On a 32-bit x86 platform, the values would be 8c, 8g, 8l, and .8, respectively.  On an amd64 platform, they would be 6c, 6g, 6l, and .6.

Next is simply a shortcut to Go's package directory.  It makes for less typing in Makefiles for larger projects.

The next few lines are cgo-specific:

TARG defines the name of the package we are compiling.  Since we declared it as gocurses in our source, that is the name here.

CGOFILES is a list of all cgo .go files necessary for this package.  Since everything is in a single file, we need only the one.

CGO_LDFLAGS defines any linker flags we need.  ncurses is not part of C's standard library, so we must tell cgo to explicitly link to it.

Next we include the file that Go provides to automate the cgo compilation process.  Take a look inside, if you wish, but I will not be going very deeply into what it does.  I will, however, explain a bit:

The cgo command line program creates four new files.  In our case they are gocurses.cgo1.go, gocurses.cgo2.go, gocurses.cgo3.c, and gocurses.cgo4.c.  The first is a translation of the original gocurses.go into normal Go code.  The second defines all C types and variables that we used.  The third and fourth contain C code that wraps our calls.

When errors occur in the compilation of the Go code in a cgo file, it will often point to lines in *.cgo1.go.  Do not edit the contents of that file.  Use it as a reference to find where the error is in your .go file.

The actual steps the Makefile performs to compile these four files can be seen when one calls 'make main', though I will not go over them.

Back to the Makefile: CLEANFILES is just a list of files that Make.pkg uses to automatically remove all build files.  We add our executable to the list and also the package that it will install.

Finally we compile our main.go.  The 'install' prerequisite there is what tells 'make' to actually start compiling our cgo files.

Please note that the final two lines must be indented using tab characters, not spaces.  When editing Makefiles you must ensure that your editor uses tabs and does not convert them to spaces.

I use vim and have set it to convert tabs to spaces, since I also code in Python.  However, when I need a tab I can create one by pressing ctrl-v and hitting the tab key.

Well, that's it.  In the command line type 'make main' and the program should compile.  Run the executable 'main' and "Hello, world" should print.  Press any key and the program will exit.  Hopefully this little tutorial helps someone.

Sunday, December 6, 2009

Reflection Package in Go

While Go does not have the level of introspection that allows a language like Python to define new types during runtime, it at least has the ability to examine and manipulate arbitrary types. One can, for example, pass an arbitrary object into a function that will identify the object's type and, in the case of a struct, its fields.

Indeed, if one looks into the 'fmt' package at the Printf family of functions, we see that Go's implementation of arbitrary parameters (the ... parameter) requires the use of 'reflect' to extract those parameters. Most interesting, however, is the 'xml' package, which contains a function called 'Unmarshal'. Unmarshal parses XML data and attempts to implant the data into a given struct, matching XML elements with struct fields.
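To make the matching of elements to fields concrete, here is a small sketch.  It assumes the 'encoding/xml' package of a current Go release, whose 'Unmarshal' takes a byte slice and may differ from the 2009 'xml' package; the 'Note' struct and 'ParseNote' function are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Note is a hypothetical struct used only for this illustration.
// The tags name the XML elements that fill each field.
type Note struct {
	To   string `xml:"to"`
	Body string `xml:"body"`
}

// ParseNote fills a Note from XML text by matching elements to fields.
func ParseNote(data string) (Note, error) {
	var n Note
	err := xml.Unmarshal([]byte(data), &n)
	return n, err
}

func main() {
	n, err := ParseNote("<note><to>Gopher</to><body>Hi</body></note>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", n.To, n.Body)
}
```

Unmarshal receives only a pointer to an arbitrary struct, so it has to discover the fields through reflection, which is what the rest of this article explores.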

Unfortunately, reflection has a major issue: a lack of documentation. All we have to work with is the package specification and the aforementioned examples of the 'fmt' and 'xml' packages to learn from. Furthermore, these examples are rather complex. For this reason, I will make an attempt at producing a series of simple examples.

First off, let us have Go identify a parameter of arbitrary type.

Go allows one to take a parameter of any type and store it or pass it into a function via the empty interface: 'interface{}'. Since the empty interface defines no methods, all types implement it -- even basic types like integers or booleans. Thus we have the skeleton of our 'Identify' function:

func Identify (val interface{}) {

}

The first thing that 'reflect' must do for us is turn 'val' into something that the package can work with. The function that does this is NewValue, which takes as a parameter any value and returns a reflection value. Note that the type 'Value' is an interface. This will be more important later.

func Identify (val interface{}) {
    v := reflect.NewValue (val)
}


From here, the most basic way to identify the type is to simply ask for it via reflect.Value's 'Type' method. This returns a type object, which in turn implements a method called 'Name' that returns a string.

func Identify (val interface{}) {
    v := reflect.NewValue (val);

    fmt.Println (v.Type ().Name ())
}


Thus, when we call 'Identify (5)' the output is 'int'. Call 'Identify (Stuff{3})' and the output is 'Stuff'.
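Readers on a current Go toolchain will find that 'reflect.NewValue' no longer exists; its successor is 'reflect.ValueOf'.  The sketch below is an adaptation under that assumption, not the code from this article.  It returns the name instead of printing it, and the one-field 'Stuff' struct is a stand-in.

```go
package main

import (
	"fmt"
	"reflect"
)

// Stuff is a one-field stand-in for the struct mentioned in the text.
type Stuff struct {
	Num int
}

// Identify returns the name of val's dynamic type.
// It assumes val is not nil; Type panics on the zero reflect.Value.
func Identify(val interface{}) string {
	v := reflect.ValueOf(val) // successor to the older reflect.NewValue
	return v.Type().Name()
}

func main() {
	fmt.Println(Identify(5))
	fmt.Println(Identify(Stuff{3}))
}
```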

Not a very useful function except, perhaps, for debugging. Let's try something more complex, like diving into a struct instance to extract all field names and values and print them. Basically, it will be a generic Print function.

type Stuff struct {
    Num int;
    Str string;
    Boo bool
}

func PrintAny (val interface{}) {
}


We start the same way as with 'Identify', by creating a reflection value. However, to keep things generic and to avoid massive indentation, we'll use PrintAny simply as a feeder for the function that will do the actual work.

func printAny (val reflect.Value) {

}

func PrintAny (val interface{}) {
    v := reflect.NewValue (val);
    printAny (v)
}


To start we'll just have printAny deal with basic types. For the sake of simplicity, I'll just deal with the three basic types we'll need for our struct: int, string, and bool. First we'll use an if-statement to determine the type. Since the second value returned by a type assertion tells us whether the cast was successful, we can use that to determine whether we've found the correct type. The cast variable is of a type that implements a function called 'Get', which returns the value of the variable. The basic interface Value does not implement Get because it does not know what type to return.

func printAny (val reflect.Value) {
    if v, ok := val.(*reflect.IntValue); ok {
        fmt.Printf ("%d\n", v.Get ())
    } else if v, ok := val.(*reflect.StringValue); ok {
        fmt.Println (v.Get ())
    } else if v, ok := val.(*reflect.BoolValue); ok {
        fmt.Printf ("%t\n", v.Get ())
    } else {
        fmt.Println ("Unknown Type")
    }
}

This is the cumbersome way of doing this. I show it simply to demonstrate type assertions with reflect. Now let's simplify it using the ability of the switch-statement to deal with types. This way, we do not have to worry about 'ok'. The switch-statement automatically looks over the cases to decide which is valid, with 'default' handling the final 'else' case.


func printAny (val reflect.Value) {
    switch v := val.(type) {
        case *reflect.IntValue:
            fmt.Printf ("%d\n", v.Get ())
        case *reflect.StringValue:
            fmt.Println (v.Get ())
        case *reflect.BoolValue:
            fmt.Printf ("%t\n", v.Get ())
        default:
            fmt.Println ("Unknown Type")
    }

}

Thus:

func main () {
    var num int = 4;
    PrintAny (num)
}

. . . will output '4'. However, what if we call 'PrintAny (&num)'? It'll run into the default case. Let's get around that.
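The same behaviour can be reproduced on a current Go toolchain, where 'reflect.Value' is a struct rather than an interface, so the switch is made on the value's 'Kind' instead of on its type.  This is a sketch under that assumption; 'formatAny' and 'FormatAny' are hypothetical counterparts that return the text instead of printing it.

```go
package main

import (
	"fmt"
	"reflect"
)

// formatAny handles only the three basic kinds discussed in the text.
func formatAny(val reflect.Value) string {
	switch val.Kind() {
	case reflect.Int:
		return fmt.Sprintf("%d", val.Int())
	case reflect.String:
		return val.String()
	case reflect.Bool:
		return fmt.Sprintf("%t", val.Bool())
	default:
		return "Unknown Type"
	}
}

// FormatAny wraps an arbitrary value in a reflect.Value and formats it.
func FormatAny(val interface{}) string {
	return formatAny(reflect.ValueOf(val))
}

func main() {
	num := 4
	fmt.Println(FormatAny(num))  // an int is handled
	fmt.Println(FormatAny(&num)) // a pointer falls into the default case
}
```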

The reflect package handles pointers via the 'PtrValue' type. The value held by the memory pointed to is accessed by PtrValue's 'Elem' method, which returns a generic reflect 'Value'. To get the actual value, we must run it through the switch-statement again, but instead of nesting one, we'll simply recurse by calling printAny again with the result of 'Elem' as the parameter.

func printAny (val reflect.Value) {
    switch v := val.(type) {
        case *reflect.PtrValue:
            if v.IsNil () {
                fmt.Println ("nil")
            } else {
                printAny (v.Elem ())
            }
        case *reflect.IntValue:
            fmt.Printf ("%d\n", v.Get ())
        case *reflect.StringValue:
            fmt.Println (v.Get ())
        case *reflect.BoolValue:
            fmt.Printf ("%t\n", v.Get ())
        default:
            fmt.Println ("Unknown Type")
    }
}


This is why I separated the bulk of the code from the original PrintAny function.

Note that I check to see if the pointer is nil. If I don't, when the function recurses it'll land in the default case.
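A sketch of the pointer case for a current Go toolchain follows, assuming the 'Kind'-based API of today's 'reflect' package instead of 'PtrValue'.  The names 'formatAny' and 'FormatAny' are hypothetical, and the function returns as soon as it sees a nil pointer so that it never recurses on an invalid value.

```go
package main

import (
	"fmt"
	"reflect"
)

// formatAny follows pointers until it reaches a basic kind.
func formatAny(val reflect.Value) string {
	switch val.Kind() {
	case reflect.Ptr:
		if val.IsNil() {
			return "nil" // stop here; Elem of a nil pointer is invalid
		}
		return formatAny(val.Elem())
	case reflect.Int:
		return fmt.Sprintf("%d", val.Int())
	case reflect.String:
		return val.String()
	case reflect.Bool:
		return fmt.Sprintf("%t", val.Bool())
	default:
		return "Unknown Type"
	}
}

// FormatAny wraps an arbitrary value in a reflect.Value and formats it.
func FormatAny(val interface{}) string {
	return formatAny(reflect.ValueOf(val))
}

func main() {
	num := 4
	var missing *int
	fmt.Println(FormatAny(&num))
	fmt.Println(FormatAny(missing))
}
```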

Next is dealing with structs. All structs, regardless of the actual type, are defined by 'reflect.StructValue'. It implements all methods necessary to iterate through its fields and extract their data. The fields are accessed individually by calling the 'Field' method with an index as the parameter. 'Field' returns a basic 'Value', so we can recurse again.

func printAny (val reflect.Value) {
    switch v := val.(type) {
        case *reflect.PtrValue:
            if v.IsNil () {
                fmt.Println ("nil")
            } else {
                printAny (v.Elem ())
            }
        case *reflect.StructValue:
            for i := 0; i < v.NumField (); i++ {
                printAny (v.Field (i))
            }
        case *reflect.IntValue:
            fmt.Printf ("%d\n", v.Get ())
        case *reflect.StringValue:
            fmt.Println (v.Get ())
        case *reflect.BoolValue:
            fmt.Printf ("%t\n", v.Get ())
        default:
            fmt.Println ("Unknown Type")
    }
}


Wait, though: in the introduction to this example I said that I wanted the field names, too. Fortunately 'reflect' allows us to find those via the 'StructType' type. 'StructType' and other '*Type' types in 'reflect' provide information on the type itself. In the case of structs, this includes information on the fields. This is accessed by casting the value returned by the 'Type' method.

The methods for accessing the fields of a 'StructType' do not return a 'Value', but a 'StructField', a struct that contains data about the field, including what we are looking for: the field's name. As such, we iterate through it much as we would the 'StructValue''s fields.

func printAny (val reflect.Value) {
    switch v := val.(type) {
        case *reflect.PtrValue:
            if v.IsNil () {
                fmt.Println ("nil")
            } else {
                printAny (v.Elem ())
            }
        case *reflect.StructValue:
            typ := v.Type ().(*reflect.StructType);

            for i := 0; i < typ.NumField (); i++ {
                name := typ.Field (i).Name;
                field := v.FieldByName (name);

                fmt.Printf ("%s: ", name);
                printAny (field)
            }
        case *reflect.IntValue:
            fmt.Printf ("%d\n", v.Get ())
        case *reflect.StringValue:
            fmt.Println (v.Get ())
        case *reflect.BoolValue:
            fmt.Printf ("%t\n", v.Get ())
        default:
            fmt.Println ("Unknown Type")
    }

}

And that's it! Try it out with our 'Stuff' struct:


func main () {
    a := Stuff{4, "Hi!", true};
    PrintAny (&a)
}
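For anyone following along with a current Go toolchain, here is an adapted sketch of the complete function.  It assumes today's 'reflect' API, in which 'Type' already provides 'NumField' and 'Field' for structs, and it collects the output in a string so that it can be checked.  The names 'writeAny' and 'FormatAny' are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type Stuff struct {
	Num int
	Str string
	Boo bool
}

// writeAny appends a textual form of val to sb, recursing into
// pointers and struct fields.
func writeAny(sb *strings.Builder, val reflect.Value) {
	switch val.Kind() {
	case reflect.Ptr:
		if val.IsNil() {
			sb.WriteString("nil\n")
			return
		}
		writeAny(sb, val.Elem())
	case reflect.Struct:
		typ := val.Type()
		for i := 0; i < typ.NumField(); i++ {
			fmt.Fprintf(sb, "%s: ", typ.Field(i).Name)
			writeAny(sb, val.Field(i))
		}
	case reflect.Int:
		fmt.Fprintf(sb, "%d\n", val.Int())
	case reflect.String:
		sb.WriteString(val.String() + "\n")
	case reflect.Bool:
		fmt.Fprintf(sb, "%t\n", val.Bool())
	default:
		sb.WriteString("Unknown Type\n")
	}
}

// FormatAny returns the text that PrintAny would print.
func FormatAny(val interface{}) string {
	var sb strings.Builder
	writeAny(&sb, reflect.ValueOf(val))
	return sb.String()
}

func main() {
	a := Stuff{4, "Hi!", true}
	fmt.Print(FormatAny(&a))
}
```

Each field is printed on its own line as its name, a colon, and its value.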


The obvious application for a similar function would be serialization, though I would not use it for complex structures. It would indeed be best only for small structures tailor made to save only that which needs to be saved. Everything else can be derived when reading from the saved data.

Another application is the ... parameter, as I mentioned at the beginning of this article. The ... parameter is treated as a struct containing the values passed into it. For example:

fmt.Printf (format, 1, 2.3, true, "me");

The types that Printf deals with internally are actually a string and a struct with an int field, a float field, a bool field, and a string field. Of course, it also handles more types than my little example does, but it is not difficult to implement them.
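The struct representation described here reflects Go as it stood at the time of writing.  In current Go releases a '...interface{}' parameter arrives as an ordinary slice, so the arguments can be walked with a plain loop.  The sketch below assumes such a toolchain; 'KindsOf' is a hypothetical helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// KindsOf reports the reflect kind of each argument it receives.
func KindsOf(args ...interface{}) []string {
	kinds := make([]string, 0, len(args))
	for _, arg := range args {
		kinds = append(kinds, reflect.ValueOf(arg).Kind().String())
	}
	return kinds
}

func main() {
	fmt.Println(KindsOf(1, 2.3, true, "me"))
}
```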
