Mapping between C and Python Data¶
The C and Python languages are quite different. The purpose of this document is to explain how different C types are represented in the PyKdump framework and how to emulate C operators such as dereference.
Integers and Pointers¶
There are many integer types in C and only one integer type in Python. See Numeric Types — int, float, complex for details about Python.
While emulating C-code in Python, we should take into account that Python integers never overflow. As a result, after doing operations on integers, we might need to mask bits and/or do other conversions manually, taking into account the size of integers in C-code. See Conversion of Integers for some examples.
A pointer is essentially an integer. But to be able to represent
pointers to structures or other typed data, PyKdump subclasses
int
adding to it some additional data.
- class EnumInfo(dict)¶
We use this to represent info about enumerations - mapping between symbolic names and integer values.
As this inherits dictionaries, you can find the integer value using its name as a key. For reverse lookup you can use:
- getname(ival)¶
- Parameters:
ival -- integer value for this enum
- Returns:
a string with enum name
- class tEnum(int)¶
When data is represented by enums, we have an integer value and symbolic name. We can use instances of this class as normal integers but if needed, we can retrieve their name as well:
- __repr__()¶
- Returns:
symbolic name
Attributes:
- class tPtr(addr, ti)¶
Create a typed pointer with address addr and TypeInfo ti. Normally you do not create these objects yourself but rather rely on the framework API to return objects of this type when needed.
Index access for pointers is implemented as in C, so that you can do something like:
tptr[i]
There are two attributes implemented as properties:
- ptype¶
returns typeinfo of this object
- Deref¶
returns a dereference for this pointer (with appropriate type)
As Python does not have the ->
operator, both struct pointers and
structs themselves are represented similarly, with an object that has
address, type information, and access methods to its fields. Normally
you do not use this class constructor yourself but rather rely on
framework subroutines to create these objects when calling
functions. We will describe different methods of this class in the
next section.
Working with Structs and Unions¶
In the following discussion we will mainly talk about struct, but in most cases the same methods are applicable to union objects as well. PyKdump uses the same class to represent both of them.
As mentioned in the previous section, there is no difference in representation between a struct and a pointer to a struct. In C they both have addresses, and only access to their fields is different. For example:
struct A {
int ifield;
...
};
struct A a;
struct A *ap;
a.ifield; // field value
&a; // struct address
ap; // a pointer with struct address
ap->ifield; // field value
In Python, we will get a StructResult
object in both
cases.
Dereference Chains¶
Assuming that in C we have the following:
result = a->f1.f2.f3->f4; // result variable should be of appropriate type
we will use the following in Python:
result = a.f1.f2.f3.f4
PyKdump analyzes intermediate field types and interprets them as
structs or pointers to structs as needed, so that we ultimately reach the
f4 value. Please note that this works for simple pointers only, not for a pointer to a pointer like struct B **dp;
. The type of result
object will be as needed, according to its C definition.
Useful Methods and Fields of StructResult
¶
- class StructResult¶
This is an object representing a struct or union. It is created by the framework as needed, as a result of calling subroutines to read structs or as a result of dereference.
Note
In most cases, we obtain instances of subclasses of this class, one per C-struct. This is an optimization as this lets us analyze symbolic info obtained from GDB once only and cache it as subclass class methods.
- __len__()¶
- Returns:
an integer with struct size
- __str__()¶
- Returns:
q string suitable for printing, e.g.:
<struct nfs_client 0xffff88042e947000>
- castTo(sname)¶
Analog of type-casting in C
- Parameters:
sname -- a string with struct name
- Returns:
an object of a new type
Example:
skbhead = sd.input_pkt_queue.castTo("struct sk_buff")
- Dump(indent=0)¶
Dump object contents for debugging purposes, with indentation if needed.
- Eval(estr)¶
This method is useful if we have a
StructResult
object and want to do a complex dereference. For example, our object isS
, it has a fielda
which is another struct, and we want to do something like:S.a.b.c
- Parameters:
estr -- a string describing a dereference chain, possibly with multiple dereferences, such as "a.b.c" for example above
- Returns:
result of dereference
This is mainly useful for performance reasons. When we do:
S.a.b.c
this does dereferencing sequentially. But if we do:
S.Eval("a.b.c")
this creates an optimized dereferencer for the "a.b.c" chain, caches it, and next time reuses it.
- fieldOffset(fname)¶
- Parameters:
name -- a string with field name
- Returns:
an integer with offset of this field
- hasField(fname)¶
- Parameters:
fname -- a string with field name
- Returns:
whether a field with this name exists in this struct
Example:
if t.hasField("rlim"): ...
- isNamed(sname)¶
- Parameters:
sname -- a string with struct name
- Returns:
whether this instance represents struct with such name
Example:
o.isNamed("struct sock")
- shortStr()¶
When we want to display struct name and address in our programs, we usually rely on the str() subroutine. This method is useful when we want to save space (e.g. to fit output into an 80-char string). So we do not display struct/union like __str__ does, e.g.:
<nfs_client 0xffff88042e947000>
Strings¶
In C, there is no special string type, so strings are typically represented as an array of characters:
char *var;
char s[10];
The problem is that we cannot be 100% sure that char s[10]
is
really used for a string or is just an array of 10 signed 8-byte
values. So while it is reasonable to assume that this is a string, we
should have a way to interpret it as simple bytes instead.
To deal with this ambiguity, variables that "look" like strings are converted not to text but rather to special objects.
- class SmartString(str)¶
This class is a subclass of the generic Python
str
- Unicode strings, so instances of it can be used as normal strings; you can print them, search them, etc.At the same time, depending on how these objects are created, they have some addional methods. First of all, if the C definition was just a pointer, we cannot know the length of this string. C-strings are NULL-terminated - but how many bytes do we need to read? We read 256 bytes, search for NULL, and then convert the found number of bytes to ASCII (non-ASCII bytes are represented with backslash escapes).
At the same time, you can access raw data using a special attrubute of these objects.
Finally, if
char *s
is a member of a struct/union, we might be interested not only in the pointer value, but the address of this pointer as well. So if this is a member ofstruct A a
, we might like to know&a.s
.- __long__()¶
- Returns:
an integer with address of this object
Attributes:
- ByteArray¶
byte array, without any conversion to Unicode
- addr¶
(unsigned long) a.s
- ptr¶
(unsigned long) &a.s
Accessing Global/Static variables by Name¶
Many kernel tables and variables are defined either as globals or static, and usually we can access them using their name.
- readSymbol(symbol)¶
This subroutine gets symbolic information based on the C definition for this variable name and returns the needed Python object automatically, e.g.:
int ivar; -> integer in Python int iarr[10]; -> a list of 10 integers in Python struct tcp *tcps; -> a StructResult for this type and address
- Parameters:
symbol -- a string with variable name, the same as C identifier
- Returns:
an object with proper type
Accessing Information about Types¶
When you get an object by using readSymbol()
or from some other
subroutine, you might need to check the object type. For example, some global
variables have different definitions for different kernel versions, and
you want your program to deal with all kernels.
You can use the generic Python isinstance()
to do basic checks:
if (isinstance(obj, StructResult)):
...
if (isinstance(obj, tPtr)):
return obj.Deref
if (not isinstance(strarr, list)):
...
But what if you need to get more details? For many objects, we can
retrieve more details about them using the attached Typeinfo
instance:
- class Typeinfo¶
There are a number of attributes providing information. If they are unavailable for this type of object, their value is None. For example, dims is None for scalar variables, otherwise it provides information about array dimensions.