EMN, INRIA, DIKU
ARRAY_SIZE (array.html, array.cocci).
This semantic patch first checks that the header file kernel.h
defining ARRAY_SIZE is included, then converts various code
patterns to calls to ARRAY_SIZE and finally removes local
macros.
The semantic patch could be extended to cases where kernel.h
is not already included, but it is not easy to add a new include file in
the "right" place.
The rules for introducing calls to ARRAY_SIZE contain
parentheses around the matched expression. The isomorphism
paren causes code where these parentheses are absent to also
be considered. The isomorphism is applied after the parse tree is
constructed for the rule, so dropping the parentheses does not introduce
the possibility of associativity problems.
One of the rules in this semantic patch checks for a case where the size of
the array is divided by the size of the type of its elements. The required
type information is often stored in header files. To improve performance,
Coccinelle normally only considers the header files in the current
directory and those in the include path whose name corresponds to the name
of the given C file. The option -all_includes causes all of
the header files mentioned in the C file to be included. When treating a
directory, a header file is processed each time it is included. If there
are transformations that affect the header file, these will appear more
than once in the output.
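As a minimal illustration of the code patterns involved, the sketch below contrasts open-coded element counts with the macro form. The ARRAY_SIZE definition is a simplified user-space stand-in (the kernel version adds a compile-time check that its argument is really an array), and the names table, count_open_coded, count_by_type and count_with_macro are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro; an assumption for user space. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int table[] = { 2, 3, 5, 7, 11 };

/* Before: size of the array divided by the size of its first element. */
size_t count_open_coded(void)
{
    return sizeof(table) / sizeof(table[0]);
}

/* Before (variant): division by the size of the element type. Recognizing
 * this form requires knowing the type of table, hence the discussion of
 * header files above. */
size_t count_by_type(void)
{
    return sizeof(table) / sizeof(int);
}

/* After: the same computation expressed with the macro. */
size_t count_with_macro(void)
{
    return ARRAY_SIZE(table);
}
```

All three functions compute the same value; only the last one states the intent directly.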
roundup and DIV_ROUND_UP (round.html, round.cocci).
This semantic patch is similar to the one for introducing
ARRAY_SIZE.
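The following sketch shows the kind of open-coded arithmetic that these macros replace. The macro bodies are user-space copies of the usual kernel definitions, reproduced here as an assumption, and the function names are hypothetical; the exact patterns matched by round.cocci may differ.

```c
/* User-space copies of the kernel macros, assumed for this sketch. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y)      ((((x) + ((y) - 1)) / (y)) * (y))

/* Before: number of pages needed to hold the given number of bytes. */
unsigned long pages_open_coded(unsigned long bytes, unsigned long page)
{
    return (bytes + page - 1) / page;
}

/* After: the same division expressed with DIV_ROUND_UP. */
unsigned long pages_with_macro(unsigned long bytes, unsigned long page)
{
    return DIV_ROUND_UP(bytes, page);
}

/* Before: round bytes up to the next multiple of align. */
unsigned long padded_open_coded(unsigned long bytes, unsigned long align)
{
    return ((bytes + align - 1) / align) * align;
}

/* After: the same rounding expressed with roundup. */
unsigned long padded_with_macro(unsigned long bytes, unsigned long align)
{
    return roundup(bytes, align);
}
```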
BUG_ON (bugon.html, bugon.cocci). When debugging is enabled,
BUG_ON expands to a conditional that invokes BUG
if its argument is true. The advantage of using BUG_ON rather
than such a conditional is that when debugging is not desired,
BUG_ON can be defined to expand to nothing at all. This
semantic patch converts a conditional that has only BUG in its
then branch to a call to BUG_ON. This transformation,
however, is only safe when the tested expression has no side effects.
Since BUG_ON can discard its argument, the rule checks that
this expression does not contain any function calls or assignments. A more
complicated rule could be written that would lift such a term out of the
conditional test and rewrite the conditional to test the result. In the
case of a test expression that does not involve an assignment, this would
require introducing a new variable to store the result. Such a variable
could be declared as a SmPL metavariable using fresh
identifier, in which case the user would be prompted for the
identifier name.
One "function call" that does not have a side effect and is often used in
conjunction with BUG is unlikely. The first rule
of the semantic patch treats cases involving such calls. It disables the
isomorphism unlikely, which can cause some redundant matches,
as an unlikely expression is an expression itself.
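The side-effect restriction can be seen in a small user-space sketch. BUG is modeled with abort, and the two macros BUG_ON_ENABLED and BUG_ON_DISABLED are hypothetical names standing for the debugging and non-debugging expansions of BUG_ON described above; next_id is an invented function whose only purpose is to have a visible side effect.

```c
#include <stdlib.h>

/* User-space stand-ins; the two BUG_ON variants model the expansions with
 * debugging enabled and with debugging disabled. */
#define BUG()                 abort()
#define BUG_ON_ENABLED(cond)  do { if (cond) BUG(); } while (0)
#define BUG_ON_DISABLED(cond) do { } while (0)

static int calls;

/* A test expression with a side effect: it counts its own invocations. */
static int next_id(void)
{
    return ++calls;
}

/* Original conditional: the call always happens. */
int calls_with_conditional(void)
{
    calls = 0;
    if (next_id() < 0)
        BUG();
    return calls;
}

/* Converted code with debugging enabled: behavior is preserved. */
int calls_with_bug_on_enabled(void)
{
    calls = 0;
    BUG_ON_ENABLED(next_id() < 0);
    return calls;
}

/* Converted code with debugging disabled: the call silently disappears. */
int calls_with_bug_on_disabled(void)
{
    calls = 0;
    BUG_ON_DISABLED(next_id() < 0);
    return calls;
}
```

The last function is the reason the rule rejects test expressions containing function calls or assignments.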
!x&y (notand.html,
notand.cocci). An expression of this form is
almost always meaningless, because it combines a boolean operator with a
bit operator. In particular, if the rightmost bit of y is 0,
the result will always be 0. This semantic patch focuses on the case where
y is a constant. Another possibility is to consider the case
where y is an arbitrary expression (notand_exp.html, notand_exp.cocci). This rule contains a
disjunction that causes it not to perform the transformation when
y is itself negated, as an expression of the form
!x&!y can make sense.
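A short sketch of why the unparenthesized form is suspicious, assuming a hypothetical flag FLAG_READY whose rightmost bit is 0. Because ! binds more tightly than &, the first function masks a value that is already 0 or 1.

```c
#define FLAG_READY 0x4u   /* hypothetical flag; its rightmost bit is 0 */

/* Parsed as (!status) & FLAG_READY: !status is 0 or 1, so masking it with
 * a constant whose rightmost bit is 0 always yields 0. */
unsigned suspicious(unsigned status)
{
    return !status & FLAG_READY;
}

/* Probably intended: true when the flag is not set in status. */
unsigned intended(unsigned status)
{
    return !(status & FLAG_READY);
}
```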
!E && E1 or E || E1, where
E1 contains E->fld (andand.html, andand.cocci). In either case, E->fld
represents a NULL pointer dereference. In each rule of the semantic patch,
the header makes it explicit that the body of the rule represents an
expression, which solves a SmPL parsing problem.
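The two problematic shapes, and plausible corrections, can be sketched as follows. The structure buf and both function names are hypothetical; the buggy forms appear only in comments so that the sketch remains safe to run.

```c
#include <stddef.h>

struct buf {
    size_t len;
};

/* Buggy shape:   if (!b && b->len == 0)
 * The right operand is evaluated only when b is NULL, so b->len is always
 * a NULL pointer dereference. The test below uses || instead. */
int is_empty(const struct buf *b)
{
    return !b || b->len == 0;
}

/* Buggy shape:   if (b || b->len > 0)
 * Here too the right operand is evaluated only when b is NULL. The test
 * below uses && instead. */
int has_data(const struct buf *b)
{
    return b && b->len > 0;
}
```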
sizeof to the result of sizeof
(sizeof.html,
sizeof.cocci).
The rule considers separately the cases where the argument is a type or an
expression. It would not be sufficient to replace the argument of the
inner sizeof by "...", because that would only
match an expression.
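A small sketch of the two cases, with a type and with an expression as the argument of the inner sizeof. The structure record and the variable sample are hypothetical. In both buggy forms the outer sizeof measures a value of type size_t rather than the structure.

```c
#include <stddef.h>

struct record {
    char payload[64];
};

static struct record sample;

/* Inner argument is a type: the result is the size of a size_t. */
size_t nested_sizeof_type(void)
{
    return sizeof(sizeof(struct record));
}

/* Inner argument is an expression: same problem. */
size_t nested_sizeof_expr(void)
{
    return sizeof(sizeof(sample));
}

/* Probably intended. */
size_t plain_sizeof(void)
{
    return sizeof(struct record);
}
```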
u8, however, are sometimes not included directly in the C
file, but in some file that the C file includes. Since spatch has no
option to include files recursively, it would be necessary to write extra
rules to find bugs in the use of these types.
DEFINE_MUTEX. DEFINE_MUTEX is declared among the
metavariables as a declarer name, meaning that the
DEFINE_MUTEX pattern should be parsed as a variable
declaration rather than as a function call.
simple_strtol and strict_strtol convert a string
to a signed integer. Sometimes, however, the result is stored in a
location having an unsigned type. This semantic patch translates such
calls to simple_strtoul and strict_strtoul,
respectively. It considers a type to be unsigned if it is anything other
than int, long, or s32, thus getting
around the problem of typedefs in header files encountered in the case of
(find_unsigned.html, find_unsigned.cocci). False positives are,
however, possible, if some other signed type is used.
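The practical difference between the signed and unsigned conversions can be illustrated in user space with the standard C functions strtol and strtoul. These are analogs chosen for this sketch, not the kernel functions named above, and the helper max_survives is hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Questionable: signed conversion whose result lands in an unsigned
 * location. */
unsigned long parse_with_strtol(const char *s)
{
    unsigned long result = (unsigned long)strtol(s, NULL, 10);
    return result;
}

/* Matching conversion for an unsigned destination. */
unsigned long parse_with_strtoul(const char *s)
{
    return strtoul(s, NULL, 10);
}

/* Returns 1 if the decimal text of ULONG_MAX survives the given parser. */
int max_survives(unsigned long (*parse)(const char *))
{
    char text[64];

    snprintf(text, sizeof(text), "%lu", ULONG_MAX);
    return parse(text) == ULONG_MAX;
}
```

Small values convert identically either way; the signed conversion only goes wrong for values above LONG_MAX, where strtol saturates.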
continue at the bottom of a for loop (continue.html, continue.cocci). This semantic patch removes a
continue in a conditional at the bottom of a for
loop. Unfortunately, due to the control-flow based nature of Coccinelle,
this semantic patch sometimes considers a continue to be at the bottom of a
for loop when actually it is not.
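A minimal before/after sketch of the intended transformation. The function names and the sample array are hypothetical.

```c
const int sample_values[4] = { 1, -2, -3, 4 };

/* Before: the continue is the last thing executed in the loop body, so it
 * has no effect. */
int count_negatives_before(const int *v, int n)
{
    int negatives = 0;

    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            negatives++;
            continue;
        }
    }
    return negatives;
}

/* After: the redundant continue is removed. */
int count_negatives_after(const int *v, int n)
{
    int negatives = 0;

    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            negatives++;
        }
    }
    return negatives;
}
```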
x is
preceded by another NULL test x that causes a return if
x is NULL. The next three rules consider whether the NULL
test is useful because of some other control-flow path. For NULL tests
that are useless, the rule fix then rewrites some cases where
the NULL test is straightforward to eliminate. The Python rule at the end
of the semantic patch prints out some information about cases that were not
possible to fix.
!, with the
intuition that the NULL value reflects the failure of the function call,
and in the latter case, it uses a comparison to NULL. Other strategies
could be implemented. Both rules in the semantic patch disable the
isomorphisms is_zero and isnt_zero, which convert comparisons to 0 to
simpler expressions involving just the compared value or its negation. For
this transformation, we only want cases where there is an explicit
comparison to 0.
extern, in which
case the initialization may be more useful than it appears locally. This
rule records the position of any such initialization in the position
metavariable p. The second rule finds a declaration of a
variable that is at a position that is different from any of the recorded
ones, and then checks that the only subsequent references to the variable
simply initialize it to a constant. In this case, the declaration and all
of the initializations are deleted.
Other variations could be considered. This semantic patch focuses on constants, because they have no side effects. Variables, however, would have the same property. Another variation would be when a variable is assigned the result of calling a function, but the value is never used. In this case, the function call cannot be deleted, but it is at least possible that the variable is holding some sort of error code that should be checked or freshly allocated memory that should be stored or freed.
memset 0 after a call to
alloc_bootmem (bootmem.html, bootmem.cocci). The Linux function
alloc_bootmem and variants all initialize the allocated data
to 0 and panic if the allocation is not possible. This semantic patch
removes any subsequent NULL test or memset when there is no
previous reference to allocated data. The constraint that there be no
previous reference to the allocated data is particularly important in the
memset case, as the memset could be useful if it
serves to reset the allocated memory to 0. In the NULL test case, the
constraint could be relaxed to check only that there is no intervening
reassignment, but it does not seem likely that this would find more bugs,
since normally a NULL test would come before any accesses.
p1 and p2, respectively. The first two disjuncts
seem redundant, since one matches an assignment as a statement and the
other matches an assignment of the same form, but as an expression. The
first case actually only serves to indicate to the SmPL parser that this
rule is a disjunction of statements, as is needed to parse the final
disjunct containing "...", and not a disjunction containing
only expressions.
The next two rules consider the possibility that the NULL test is there because of some other path that does not go through the dereference. In the first case, the path starts with an assignment of the location of interest, while in the second case there is no assignment and the path starts at the beginning of the function definition.
The last rule prints out the result in emacs org mode format, providing links to the dereference and the NULL test.
return, with or
without an argument, 2) The then branch is a sequence of statements ending
in a return, with or without an argument, 3) The then branch
is a sequence of statements ending in a goto, which ultimately
leads to a return.
mutex_lock is followed on every control-flow path either by
a call to mutex_unlock or a conditional ending in a return,
which is assumed to represent error-handling code. If such a conditional
does not contain a call to mutex_unlock, then this is assumed
to be a bug, and one is added by the semantic patch.
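The bug pattern and its repair can be sketched in user space. The type fake_mutex and the functions fake_lock and fake_unlock are hypothetical stand-ins for the kernel mutex API that only record whether the lock is held; device_state and the remaining functions are likewise invented for the illustration.

```c
/* Hypothetical stand-ins for mutex_lock()/mutex_unlock(). */
struct fake_mutex {
    int held;
};

static void fake_lock(struct fake_mutex *m)   { m->held = 1; }
static void fake_unlock(struct fake_mutex *m) { m->held = 0; }

struct device_state {
    struct fake_mutex lock;
    int ready;
    int value;
};

/* Buggy: the error-handling conditional returns with the lock held. */
int read_value_buggy(struct device_state *d, int *out)
{
    fake_lock(&d->lock);
    if (!d->ready)
        return -1;
    *out = d->value;
    fake_unlock(&d->lock);
    return 0;
}

/* Fixed: an unlock is added before the return in the conditional. */
int read_value_fixed(struct device_state *d, int *out)
{
    fake_lock(&d->lock);
    if (!d->ready) {
        fake_unlock(&d->lock);
        return -1;
    }
    *out = d->value;
    fake_unlock(&d->lock);
    return 0;
}

/* Runs the error path and reports whether the lock is still held. */
int lock_held_after_error(int use_fixed)
{
    struct device_state d = { { 0 }, 0, 7 };
    int out = 0;

    if (use_fixed)
        read_value_fixed(&d, &out);
    else
        read_value_buggy(&d, &out);
    return d.lock.held;
}
```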
kmalloc or a related function is not followed by a call to
kfree (kmalloc.html, kmalloc.cocci). This semantic patch makes the
constraints that the variable storing the result of calling
kmalloc is a local variable and that the only reference to
this variable between the allocation point and the return is an
initialization of its fields. The former constraint ensures that at the
point of the allocation, there is no pointer accessible from outside the
function to the allocated data. The latter constraint ensures that no such
pointer is created, e.g., by passing the allocated data to another function
which stores it in a global data structure.
Often, Linux error handling code performs a goto to a label at the end of
the function that performs a sequence of deallocations. The matching of a
goto in the first rule of this semantic patch only serves to record its
position to be used in creating the subsequent output, to make it easier to
see what control-flow path was used in detecting the bug. The semantic
patch contains two rules for printing out the results. The first rule is
used only if the position, p3, of a goto is available. This
rule ends in cocci.include_match(False), indicating the
bindings used in this rule should be discarded. The second rule is then
used for the other possible bindings, for the cases where there is no goto.
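The following user-space sketch shows the shape of code that satisfies both constraints: the allocated value is held in a local variable, and only its fields are initialized before an error return. The functions fake_alloc and fake_free are hypothetical stand-ins for kmalloc and kfree, backed by a single static slot and a counter so that the leak is observable without actually leaking memory.

```c
#include <stddef.h>

struct msg {
    int id;
    int len;
};

/* Hypothetical stand-ins for kmalloc()/kfree(): one static slot plus a
 * counter of outstanding allocations. */
static struct msg slot;
static int outstanding;

static struct msg *fake_alloc(void)
{
    outstanding++;
    return &slot;
}

static void fake_free(struct msg *m)
{
    if (m)
        outstanding--;
}

/* Buggy: m is local and only its fields are initialized, yet the error
 * return does not free it. */
int send_msg_buggy(int id, int len)
{
    struct msg *m = fake_alloc();

    m->id = id;
    m->len = len;
    if (len < 0)
        return -1;
    fake_free(m);
    return 0;
}

/* Fixed, using a goto to a label that performs the deallocation. */
int send_msg_fixed(int id, int len)
{
    int ret = 0;
    struct msg *m = fake_alloc();

    m->id = id;
    m->len = len;
    if (len < 0) {
        ret = -1;
        goto out;
    }
out:
    fake_free(m);
    return ret;
}

/* Runs the error path and reports how many allocations remain. */
int leaked_after_error(int use_fixed)
{
    outstanding = 0;
    if (use_fixed)
        send_msg_fixed(1, -1);
    else
        send_msg_buggy(1, -1);
    return outstanding;
}
```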
pci_dev_put (add_pci_dev.html, add_pci_dev.cocci).
pci_get_device and some related functions have the property
that they take an argument that is the starting point of a search, and
after completing the search they decrement the reference count of this
argument. This behavior is particularly convenient for iterating over a
sequence of objects, as when the iteration completes all objects have been
freed without any explicit reference count manipulation. When there is an
early return from within such a loop, however, it may be necessary to
decrement the reference count explicitly. This semantic patch detects and
fixes such returns for while and for_each_pci_dev
loops. In each case, the iteration variable is declared as local
idexpression, to ensure that it is a local variable, and thus not
visible from outside the function. for_each_pci_dev is
declared as an iterator name in the second rule, so that it
will be parsed correctly.
of_node_put and
scsi_device_put (missing_put.html, missing_put.cocci). This rule is similar to
the previous one, but it focuses on loop types that imply different reference
count functions. Each of the various loop types is declared using
iterator name in only the first rule in which it appears. It
is then considered to be an iterator name throughout the rest
of the semantic patch.
local_irq_restore (local.html, local.cocci).
A call to local_irq_save should normally be followed by a call
to local_irq_restore along all paths. This rule uses
when any to allow any number of conditionals to appear between
the call to local_irq_restore and the matched conditional
(otherwise matching would stop at the first conditional, due to the
shortest path constraint on "..."), and when
strict to ensure that all paths are checked, even those that are
considered to be error paths (normally constraints on error paths are
relaxed, as the code cannot be expected to, e.g., free something if its
allocation has failed).
ALLOC and FREE that have to be
manually replaced by the name of an allocation function and its
corresponding deallocation function before using the semantic match. This
semantic match targets more specialized resource allocation protocols than
kmalloc/kfree. It puts fewer constraints on what
it reports as bugs, and thus may return more false positives, in particular
because it allows the allocated value to be passed to another function,
which may save it in some way, implying there is no need for a
deallocation. However, the more specialized allocation functions are used
less often, and thus we have not found the number of false positives to be
burdensome in practice. If desired, one could replace kmalloc
and kfree in the kmalloc rule by the names of other allocation
and deallocation functions and obtain potentially fewer false positives,
but potentially more false negatives as well.
Concretely, the semantic match focuses on the allocation site (storing the result in a local variable) and each subsequent return that does not return either 0, some expression involving the allocated value, or some other non-NULL pointer. Between these points, the semantic match checks that there is no call to the deallocation function, either directly or under a conditional, that the return is not in a control-flow path that checks that the result of the allocation function is NULL, and that the allocated value has not been stored somewhere or overwritten. If these conditions are satisfied, the Python script prints the position of the call to the allocation function and of the return.
The constraints on a return that is considered of interest and on the path
between the call to the allocation function and that return are purely heuristic and may
give good or bad results depending on how the allocation function is
expected to be used. For example, the constraint that a return of 0 is not
of interest derives from the assumption that 0 represents success and in a
success case, all allocated memory has been saved or freed in some way,
even if it is not apparent to the semantic patch. This assumption may be
reasonable in the case of an allocation function that is based on
kmalloc, but we have found that it is less likely to be valid
in the case of an "allocation" function whose effect is to increment a
reference count. Often a reference count should be both incremented and
decremented within a single function (similar to a lock), and thus should
in particular be decremented even if the containing function succeeds.
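The reference-count case can be sketched as follows. The structure node and the functions node_get and node_put are hypothetical stand-ins for a get/put pair; the point is that a success return of 0 still needs the matching decrement, which the heuristic described above would not flag.

```c
/* Hypothetical reference-counted object and its get/put pair. */
struct node {
    int refs;
    int valid;
};

static struct node *node_get(struct node *n)
{
    n->refs++;
    return n;
}

static void node_put(struct node *n)
{
    n->refs--;
}

/* Returns 0 on success but never drops the reference it took. A rule that
 * ignores returns of 0 would not report this. */
int check_node_leaky(struct node *n)
{
    struct node *ref = node_get(n);

    if (!ref->valid) {
        node_put(ref);
        return -1;
    }
    return 0;
}

/* Balanced: the reference is dropped on every path, including success. */
int check_node_balanced(struct node *n)
{
    struct node *ref = node_get(n);
    int ret = ref->valid ? 0 : -1;

    node_put(ref);
    return ret;
}

/* Runs the success path and reports the remaining reference count. */
int refs_after_success(int balanced)
{
    struct node n = { 0, 1 };

    if (balanced)
        check_node_balanced(&n);
    else
        check_node_leaky(&n);
    return n.refs;
}
```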
Some pairs of allocation and deallocation functions that we have considered
are ioremap/iounmap,
auth_domain_find/auth_domain_put,
of_find_node_by_name/of_node_put, and
pci_get_slot/pci_dev_put. Semantic patches for
these, which differ slightly from alloc_free.cocci in that they try
to correct the bug when possible, are available in (iounmap_check.html, iounmap_check.cocci), (auth.html, auth.cocci), (ofname1.html, ofname1.cocci), and (get_slot.html, get_slot.cocci), respectively. By trying to
correct the bug, however, these semantic patches may be less successful
than the corresponding instantiations of alloc_free.cocci, because they can
fail when multiple control-flow paths reach a return, as this is considered
to provide inconsistent information as to how the return should be updated.
SPIN_LOCK_UNLOCKED (sl.html, sl.cocci). SPIN_LOCK_UNLOCKED is
deprecated, as described in the Linux file
Documentation/spinlocks.txt. The semantic patch handles the
case of a variable declaration or a dynamic initialization.
DEFINE_SPINLOCK is declared as a declarer name so
that the SmPL parser knows that it is a valid replacement for a variable
declaration; if DEFINE_SPINLOCK were considered to be the name
of a function, it would not satisfy this constraint, resulting in a parse
error.
The case where SPIN_LOCK_UNLOCKED is used in a structure
initialization cannot be handled in general, because it would be necessary
to accumulate an arbitrary number of structure fields as the argument of
__SPIN_LOCK_UNLOCKED. Coccinelle could be used to find such
uses of SPIN_LOCK_UNLOCKED, which then could be updated by
hand.
kmalloc/memset conversion (kzalloc.html, kzalloc.cocci). This semantic patch translates a
call to kmalloc followed by a call to memset into a
single call to kzalloc that performs both operations. Many
cases are considered to ensure that the call to memset is
intended to initialize to zero the complete allocated data, rather than
e.g., to reinitialize it after performing some other operations.
The semantic patch also converts calls to kzalloc where the
first argument is a product of the size of a structure and some other value
to a call to kcalloc. The latter function performs an
overflow check before performing the multiplication.
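A user-space analog of both conversions, using the standard C functions malloc, memset and calloc in place of the kernel allocators. All function names are hypothetical, and the explicit comparison against SIZE_MAX is written out only to make visible the kind of overflow check that the text attributes to kcalloc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear the complete allocated block. */
int *make_counters_two_steps(size_t n)
{
    int *c = malloc(n * sizeof(*c));

    if (!c)
        return NULL;
    memset(c, 0, n * sizeof(*c));
    return c;
}

/* After: a single zeroing allocation that takes the element count and the
 * element size separately, refusing a product that would overflow. */
int *make_counters_zeroed(size_t n)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;
    return calloc(n, sizeof(int));
}

/* Returns 1 if a zeroed allocation of n counters is entirely 0. */
int zeroed_alloc_is_clear(size_t n)
{
    int *c = make_counters_zeroed(n);
    int clear = 1;

    if (!c)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (c[i] != 0)
            clear = 0;
    free(c);
    return clear;
}
```

In the two-step version, the multiplication n * sizeof(*c) can wrap around for very large n, which is the situation the separate count and size arguments are meant to guard against.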