Mapping between C and Python Data

The C and Python languages are quite different. The purpose of this document is to explain how different C types are represented in the PyKdump framework and how to emulate C operators such as dereference.

Integers and Pointers

There are many integer types in C and only one integer type in Python. See Numeric Types — int, float, complex for details about Python.

While emulating C-code in Python, we should take into account that Python integers never overflow. As a result, after doing operations on integers, we might need to mask bits and/or do other conversions manually, taking into account the size of integers in C-code. See Conversion of Integers for some examples.
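As a minimal sketch of the masking described above, the following emulates C fixed-width integers with Python integers. The helper names to_unsigned and to_signed are illustrative assumptions and are not part of PyKdump.

```python
def to_unsigned(value, bits=32):
    """Truncate a Python int to an unsigned C integer of the given width."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=32):
    """Reinterpret the low `bits` bits as a two's-complement signed integer."""
    value = to_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


# In C, a 32-bit unsigned 0xFFFFFFFF + 1 wraps around; in Python it keeps growing,
# so we mask the result back to 32 bits ourselves.
wrapped = to_unsigned(0xFFFFFFFF + 1)

# The same bit pattern read as a signed 32-bit int, like (int)0xFFFFFFFF in C.
as_int = to_signed(0xFFFFFFFF)
```

The width defaults to 32 bits here only for illustration; in practice it should match the size of the C type being emulated.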

A pointer is essentially an integer. But to be able to represent pointers to structures or other typed data, PyKdump subclasses int and attaches some additional data to it.

class EnumInfo(dict)

We use this class to represent information about enumerations: a mapping between symbolic names and integer values.

As this class inherits from dict, you can find the integer value using its name as a key. For reverse lookup you can use:

getname(ival)
Parameters:

ival -- integer value for this enum

Returns:

a string with enum name

class tEnum(int)

When data is represented by enums, we have an integer value and symbolic name. We can use instances of this class as normal integers but if needed, we can retrieve their name as well:

__repr__()
Returns:

symbolic name

Attributes:

einfo
Returns:

EnumInfo instance
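To make the relationship between the two classes concrete, here is a self-contained sketch that mimics the documented interface. It is not PyKdump's implementation: the constructor signature of tEnum and the behavior for values without a name are assumptions made for illustration.

```python
class EnumInfo(dict):
    """Maps symbolic enum names to integer values."""

    def getname(self, ival):
        # Reverse lookup: return the first name whose value equals ival.
        for name, value in self.items():
            if value == ival:
                return name
        return None  # assumption: unknown values have no name


class tEnum(int):
    """An int that remembers which enumeration it came from."""

    def __new__(cls, ival, einfo):
        obj = super().__new__(cls, ival)
        obj.einfo = einfo
        return obj

    def __repr__(self):
        name = self.einfo.getname(int(self))
        # Fall back to the plain number when no symbolic name matches.
        return name if name is not None else str(int(self))


states = EnumInfo(TCP_ESTABLISHED=1, TCP_SYN_SENT=2)
st = tEnum(2, states)
```

With this sketch, st behaves like the integer 2 in arithmetic and comparisons, while repr(st) gives the symbolic name.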

class tPtr(addr, ti)

Create a typed pointer with address addr and TypeInfo ti. Normally you do not create these objects yourself but rather rely on the framework API to return objects of this type when needed.

Index access for pointers is implemented as in C, so that you can do something like:

tptr[i]
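For reference, C defines p[i] as the object located at address p + i * sizeof(*p). The sketch below shows only that address arithmetic; elem_address is a hypothetical helper and not a PyKdump function.

```python
def elem_address(base, index, elem_size):
    """Address of element `index` for a pointer at `base` to items of `elem_size` bytes."""
    return base + index * elem_size


# For "long *p" at 0xffff880000001000 on a 64-bit kernel, sizeof(long) is 8,
# so p[3] lives 24 bytes past the base address.
addr = elem_address(0xffff880000001000, 3, 8)
```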

There are two attributes implemented as properties:

ptype

returns typeinfo of this object

Deref

returns a dereference for this pointer (with appropriate type)

As Python does not have the -> operator, both struct pointers and structs themselves are represented similarly, with an object that has address, type information, and access methods to its fields. Normally you do not use this class constructor yourself but rather rely on framework subroutines to create these objects when calling functions. We will describe different methods of this class in the next section.

Working with Structs and Unions

In the following discussion we will mainly talk about struct, but in most cases the same methods are applicable to union objects as well. PyKdump uses the same class to represent both of them.

As mentioned in the previous section, there is no difference in representation between a struct and a pointer to a struct. In C they both have addresses, and only access to their fields is different. For example:

struct A {
   int ifield;
   ...
};

struct A a;
struct A *ap;

a.ifield;   // field value
&a;         // struct address

ap;         // a pointer with struct address
ap->ifield; // field value

In Python, we will get a StructResult object in both cases.

Dereference Chains

Assuming that in C we have the following:

result = a->f1.f2.f3->f4;  // result variable should be of appropriate type

we will use the following in Python:

result = a.f1.f2.f3.f4

PyKdump analyzes intermediate field types and interprets them as structs or pointers to structs as needed, so that we ultimately reach the f4 value. Please note that this works for simple pointers only, not for a pointer to a pointer like struct B **dp;. The type of the result object follows from the C definition of f4.

Useful Methods and Fields of StructResult

class StructResult

This is an object representing a struct or union. It is created by the framework as needed, as a result of calling subroutines to read structs or as a result of dereference.

Note

In most cases, we obtain instances of subclasses of this class, one subclass per C struct. This is an optimization: it lets us analyze the symbolic info obtained from GDB only once and cache it as class methods of the subclass.

__len__()
Returns:

an integer with struct size

__str__()
Returns:

a string suitable for printing, e.g.:

<struct nfs_client 0xffff88042e947000>

castTo(sname)

Analog of type-casting in C

Parameters:

sname -- a string with struct name

Returns:

an object of a new type

Example:

skbhead = sd.input_pkt_queue.castTo("struct sk_buff")

Dump(indent=0)

Dump object contents for debugging purposes, with indentation if needed.

Eval(estr)

This method is useful if we have a StructResult object and want to do a complex dereference. For example, our object is S, it has a field a which is another struct, and we want to do something like:

S.a.b.c

Parameters:

estr -- a string describing a dereference chain, possibly with multiple dereferences, such as "a.b.c" for example above

Returns:

result of dereference

This is mainly useful for performance reasons. When we do:

S.a.b.c

this does dereferencing sequentially. But if we do:

S.Eval("a.b.c")

this creates an optimized dereferencer for the "a.b.c" chain, caches it, and next time reuses it.
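The build-once, reuse-later idea can be illustrated with a small standalone sketch. It does not reflect how PyKdump builds its dereferencers (those work from type information and memory offsets). Here eval_chain and get_dereferencer are hypothetical names, and plain Python objects stand in for structs.

```python
from functools import lru_cache
from operator import attrgetter
from types import SimpleNamespace


@lru_cache(maxsize=None)
def get_dereferencer(chain):
    """Build a callable for a dotted chain once; later calls reuse it."""
    return attrgetter(chain)


def eval_chain(obj, chain):
    """Hypothetical stand-in for StructResult.Eval()."""
    return get_dereferencer(chain)(obj)


# Plain objects standing in for nested structs S -> a -> b -> c
S = SimpleNamespace(a=SimpleNamespace(b=SimpleNamespace(c=42)))
value = eval_chain(S, "a.b.c")
```

The cache is keyed on the chain string, so the cost of parsing "a.b.c" is paid once, no matter how many objects it is later applied to.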

fieldOffset(fname)
Parameters:

fname -- a string with field name

Returns:

an integer with offset of this field

hasField(fname)
Parameters:

fname -- a string with field name

Returns:

whether a field with this name exists in this struct

Example:

if t.hasField("rlim"):
    ...

isNamed(sname)
Parameters:

sname -- a string with struct name

Returns:

whether this instance represents struct with such name

Example:

o.isNamed("struct sock")

shortStr()

When we want to display the struct name and address in our programs, we usually rely on str(). This method is useful when we want to save space (e.g. to fit output into an 80-character line): unlike __str__, it omits the struct/union keyword, e.g.:

<nfs_client 0xffff88042e947000>

Strings

In C, there is no special string type, so strings are typically represented as an array of characters:

char *var;
char s[10];

The problem is that we cannot be 100% sure whether char s[10] is really used for a string or is just an array of 10 signed 8-bit values. So while it is reasonable to assume that this is a string, we should have a way to interpret it as simple bytes instead.

To deal with this ambiguity, variables that "look" like strings are converted not to text but rather to special objects.

class SmartString(str)

This class is a subclass of the generic Python str (a Unicode string), so its instances can be used as normal strings; you can print them, search them, etc.

At the same time, depending on how these objects are created, they have some additional methods. First of all, if the C definition was just a pointer, we cannot know the length of this string. C-strings are NULL-terminated - but how many bytes do we need to read? We read 256 bytes, search for NULL, and then convert the found number of bytes to ASCII (non-ASCII bytes are represented with backslash escapes).
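The conversion step can be sketched in plain Python as follows. This is an illustration of the approach rather than PyKdump's code: decode_c_string is a hypothetical helper, and keeping all 256 bytes when no terminator is found is an assumption.

```python
def decode_c_string(raw, maxlen=256):
    """Cut at the first NUL within maxlen bytes and decode as ASCII with escapes."""
    chunk = raw[:maxlen]
    end = chunk.find(b"\0")
    if end != -1:
        chunk = chunk[:end]
    # Non-ASCII bytes become backslash escapes such as \xe9
    return chunk.decode("ascii", errors="backslashreplace")


# Bytes as they might be read from memory: text, terminator, then leftovers
text = decode_c_string(b"eth0\x00garbage after the terminator")

# A non-ASCII byte before the terminator is escaped instead of raising an error
escaped = decode_c_string(b"caf\xe9\x00")
```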

At the same time, you can access raw data using a special attribute of these objects.

Finally, if char *s is a member of a struct/union, we might be interested not only in the pointer value, but the address of this pointer as well. So if this is a member of struct A a, we might like to know &a.s.

__long__()
Returns:

an integer with address of this object

Attributes:

ByteArray

byte array, without any conversion to Unicode

addr

(unsigned long) a.s

ptr

(unsigned long) &a.s

Accessing Global/Static variables by Name

Many kernel tables and variables are defined either as globals or static, and usually we can access them using their name.

readSymbol(symbol)

This subroutine gets symbolic information based on the C definition for this variable name and returns the needed Python object automatically, e.g.:

int ivar; -> integer in Python
int iarr[10]; -> a list of 10 integers in Python
struct tcp *tcps; -> a StructResult for this type and address

Parameters:

symbol -- a string with variable name, the same as C identifier

Returns:

an object with proper type

Accessing Information about Types

When you get an object by using readSymbol() or from some other subroutine, you might need to check the object type. For example, some global variables have different definitions for different kernel versions, and you want your program to deal with all kernels.

You can use the generic Python isinstance() to do basic checks:

if isinstance(obj, StructResult):
    ...

if isinstance(obj, tPtr):
    return obj.Deref

if not isinstance(strarr, list):
    ...

But what if you need to get more details? For many objects, we can retrieve more details about them using the attached Typeinfo instance:

class Typeinfo

There are a number of attributes providing information. If they are unavailable for this type of object, their value is None. For example, dims is None for scalar variables, otherwise it provides information about array dimensions.

dims
  • None for scalars, otherwise a list

  • [4] for char c[4];

  • [2,3] for int *array[2][3];

size

Size of this object, e.g. if this is a struct, then it is the struct size

ptrlev

If this object is a pointer like char *ptr, then it is 1. For char **ptr it is 2, and so on.

stype