RESEARCH ARTICLE

Pusher: an augmented fuzzer based on the connection between input and comparison operand

  • Bin ZHANG ,
  • Jiaxi YE ,
  • Ruilin LI ,
  • Chao FENG ,
  • Yunfei SU ,
  • Chaojing TANG
  • College of Electronic Science, National University of Defense Technology, Changsha 410072, China

Received date: 21 Feb 2020

Accepted date: 06 Jan 2021

Published date: 15 Aug 2022

Copyright

2022 Higher Education Press

Abstract

Coverage-based fuzzing is a widespread vulnerability detection technique, and it has exposed many bugs in real-world programs. However, it focuses on eliminating tests of repeated paths while still employing random mutation to generate inputs, which is blind when it comes to penetrating complex comparisons in the program. As a result, the testing coverage is limited. Although several solutions have been proposed, this problem is only partially solved. This paper argues that random mutation is mainly limited by two challenges: the sizable search space and the lack of useful feedback to direct the search. We then present an augmented fuzzing technique that addresses these two challenges. First, we point out an implicit, black-box relationship between input contents and comparison operands, which we dub a connection. Second, we present a novel method to collect the comparison operands during execution, which is leveraged to infer the connections. Based on the connections, the fuzzer learns which input byte affects which comparison instruction and thereby establishes a smaller search space. Third, the connection provides useful feedback to direct the search; we resort to a modern meta-heuristic algorithm to satisfy this searching requirement. We developed a prototype, Pusher, and evaluated its performance on several benchmarks and four real-world programs. The experimental results demonstrate that Pusher works better than several other state-of-the-art fuzzers on bug detection and achieves higher testing coverage. Moreover, detailed statistics about the execution overhead in Pusher indicate that the overhead introduced by our approach is within an acceptable scope.

Cite this article

Bin ZHANG , Jiaxi YE , Ruilin LI , Chao FENG , Yunfei SU , Chaojing TANG . Pusher: an augmented fuzzer based on the connection between input and comparison operand[J]. Frontiers of Computer Science, 2022 , 16(4) : 164206 . DOI: 10.1007/s11704-021-0075-8

1 Introduction

Fuzzing has been an extensively used technology for vulnerability discovery since it was proposed [1, 2], and it has proved more scalable on large real-world software than symbolic execution and taint analysis. Its fundamental idea is to run randomly or strategically generated inputs against the target program and monitor abnormal behaviors, such as crashes.
Coverage-based fuzzing is a remarkable innovation in the fuzzing community [3]. It constructs a testing loop in which new inputs are continuously generated by mutating seeds from a seed pool. It then evaluates the coverage of each input through lightweight instrumentation, and only the inputs hitting new coverage are added to the seed pool. With this testing loop, the engine is skilled at eliminating inputs on duplicated paths and can discover bugs faster than traditional blind fuzzing. The state-of-the-art coverage-based fuzzers, such as AFL (proposed by Michał Zalewski), LibFuzzer in LLVM, and Honggfuzz (from Google), have found many bugs in real-world programs [4].
However, since fuzzers have no prior knowledge of the target program, they have to produce testing inputs by random mutation. As a result, these fuzzers are weak at penetrating deeper code areas that are guarded by complex constraints, which reduces the bug discovery efficiency of coverage-based fuzzers.
To address this challenge, symbolic execution is introduced to bypass such complex constraints. It solves the path constraints via SMT solvers (such as Z3 [5]) to generate the corresponding inputs that exercise the new paths [6–8]. The core idea behind such hybrid testing methods is promising yet currently limited, mainly because symbolic execution still has unsolved challenges of its own, such as state explosion [9, 10].
Taint analysis [11–13] is also leveraged to handle the limitation of random mutation. These techniques recognize the input contents that can affect the program execution, and apply various strategies to these input contents to achieve better path exploration. However, deploying taint analysis still requires more conditions than applying fuzzing. For example, taint analysis needs to identify and understand the instructions, and DataFlowSanitizer requires extra descriptions for external libraries [14]. In a nutshell, their practicability on large real-world programs is also limited.
To avoid the disadvantages introduced by symbolic execution and taint analysis, some lightweight techniques have been presented to address the blindness of random mutation, such as program-state aware techniques [15, 16] and statistical analysis based techniques [17, 18]. For example, a Markov-chain based power schedule is proposed [17] to gravitate the testing energy toward inputs that execute low-frequency paths. A targeted strategy is put forward [18] to test rare branches more. A program-state based approach is leveraged to apply adaptive mutation [15]. However, these lightweight techniques only partially relieve the limitation of random mutation, and it warrants further research.
Among the related technologies, improving fuzzing performance via lightweight search-based input generation is promising [19]. The core idea is to formulate the testing process as an optimization problem and leverage searching algorithms to tackle it [20]. For example, EvoSuite [21] adopts a hybrid searching approach to steer whole test suites toward satisfying more comparisons.
Challenges Two significant challenges need to be addressed to make such search-based fuzzing techniques more efficient. The first is that the search for the target inputs is usually typified by a sizable search space, which may introduce serious overhead and make the testing ineffective. The second is that most searching algorithms depend heavily on feedback to direct the search.
This paper presents a lightweight search-based approach to augment fuzzing by addressing the two challenges mentioned above. First, we observe that the comparison operands are assigned concrete values when the program executes on an input, i.e., there is an implicit, black-box relationship between the input contents and the comparison operands; we dub this relationship a connection, and our approach is principally based on these connections. Second, we design a novel method to obtain the connections during execution. A connection inference is then presented to recognize which input bytes can affect which comparison operands, which finally locates a smaller search space and addresses the first challenge. Third, these connections offer feedback indicating the difference between the current input and the expected input (one satisfying the comparison), so an off-the-shelf algorithm can be used to search for such inputs.
During implementation, the proposed techniques are integrated with the original coverage-based fuzzing, acting as a search-based module that helps the engine achieve higher coverage. Many modern searching algorithms can be used in our approach, but this is not our emphasis in this paper; therefore, we simply select the vanilla genetic algorithm [22] as our searching engine.
We developed a prototype, Pusher, based on AFL, and evaluated its performance on LAVA-M [23], CGC binaries, and four real-world programs. On the LAVA-M database, Pusher beats the other nine state-of-the-art fuzzers on bug detection. On the CGC binaries, Pusher finds more bugs and achieves higher coverage than other fuzzers. On the four real-world programs, Pusher performs best according to the testing coverage. Besides, an in-depth analysis of the execution overhead demonstrates that our approach introduces only a small overhead and is scalable for testing real-world programs.
The rest of this paper is organized as follows. In Section 2, a motivation example and the framework of our approach are described. In Section 3, the techniques are depicted in detail. Section 4 reports the experimental evaluation results. Section 5 discusses the related work and Section 6 concludes the whole paper.

2 Overview

2.1 Motivation example

We use a motivating example to describe how our approach regulates the input contents and solves the comparisons.
Program Figure 1 is an example containing two representative complex path constraints. It reads contents from the input FILE *fp, and assigns values to the magic and result variables. Then, the program performs two checks on these variables:
Fig.1 An example containing two representative path constraints at lines 8 and 19


● C1: A magic number check at lines 3–9. The value of magic is taken directly from the 0th, 1st, 2nd, and 3rd bytes of the input.
● C2: A simple format check at lines 11–20. The value of result is returned by the check_format function; assume it is calculated from the 8th, 12th, 16th, and 20th bytes of the input fp.
An error will be raised for an input that satisfies either of the two checks.
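Since the figure itself cannot be reproduced here, the sketch below illustrates the shape of such a program. It is a minimal reconstruction from the description above: the byte offsets and the 0xdeadbeef constant follow the text, whereas the body of check_format and the C2 target value (0x42) are hypothetical placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for check_format(): derives a value from the
 * 8th, 12th, 16th, and 20th input bytes, as assumed in check C2. */
static unsigned int check_format(const unsigned char *buf) {
    return buf[8] ^ buf[12] ^ buf[16] ^ buf[20];
}

int main(int argc, char **argv) {
    unsigned char buf[64] = {0};
    unsigned int magic, result;

    if (argc < 2)
        return 1;
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL || fread(buf, 1, sizeof(buf), fp) < 21)
        return 1;

    memcpy(&magic, buf, 4);          /* magic comes from bytes 0-3              */
    result = check_format(buf);      /* result depends on bytes 8, 12, 16, 20   */

    if (magic == 0xdeadbeef)         /* C1: magic number check                  */
        abort();                     /* error raised                            */
    if (result == 0x42)              /* C2: format check (target value assumed) */
        abort();                     /* error raised                            */
    return 0;
}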
Coverage-based fuzzing These two checks highlight the challenge of the random mutation used in coverage-based fuzzers. Because random mutation is unaware of the program, it has to randomly select input contents to mutate. As a result, it is hard to satisfy the C1 or C2 check. For example, to handle the C1 check, the random mutation would have to select the 0th, 1st, 2nd, and 3rd bytes for mutation, and the obtained input contents would need to become 0xdeadbeef by chance.
Pusher Our approach generates the inputs that can satisfy C1 and C2 checks via the following steps.
● The program executes on an input s, and the magic and result variables are assigned concrete values. We write V_m(s) for magic and V_c(s) for result. The combination (s, V_m(s), V_c(s)) is therefore a connection instance between the input contents and the comparison operands.
● The engine produces a new input s0 by mutating the 0th byte of s. The program executes on s0 and obtains a new connection instance (s0, V_m(s0), V_c(s0)). By comparing the two instances, the engine infers whether the 0th byte affects any comparison operand; in this case it finds that V_m(s0) differs from V_m(s), i.e., the 0th byte affects the magic variable.
● After traversing all the bytes of input s, the engine learns that the magic variable is affected by the 0th, 1st, 2nd, and 3rd bytes, and the result variable is based on the 8th, 12th, 16th, and 20th bytes (if the input is longer than 20 bytes). These recognized bytes locate a reduced space for the searching, addressing the challenge of searching in a sizable space.
● Finally, it adjusts the recognized bytes so that V_m equals 0xdeadbeef and V_c satisfies the C2 check, producing the corresponding inputs. In this process, the distance between V_m and 0xdeadbeef (or between V_c and its target value) acts as feedback to direct the searching.

2.2 The framework of our approach

This paper designs a new fuzzer, which integrates the original coverage-based testing with our connection-based searching. Figure 2 depicts the framework of our approach.
Fig.2 The workflow of our approach


We instrument the target source code by adding comparison-operand collection code and also generate static information for the search-based testing component at compile time. Coverage-based fuzzing is then used as one component of our fuzzer Pusher to quickly create a seed pool for the searching component.
In the connection-based searching, the engine runs the inputs from the seed pool to collect the comparison operands, which compose the connection instances between the input contents and comparison operands. Then, an inference component is leveraged to locate the bytes that can affect the comparison operands. Afterward, a heuristic searching engine is introduced to solve the unsatisfied comparisons and generate the target inputs by mutating the recognized bytes. This process is iterated until every input in the seed pool has been processed once.
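To make the workflow concrete, the control-flow sketch below outlines how the two components could interleave; all type and function names here are illustrative placeholders rather than the actual implementation.

#include <stddef.h>

typedef struct { unsigned char *data; size_t len; int searched; } seed_t;

/* Assumed components (Fig. 2): an AFL-style coverage loop and the two
 * stages of the connection-based searching from Sections 3.2 and 3.3. */
extern size_t coverage_fuzz_round(seed_t *pool, size_t n);
extern void   infer_connections_for(seed_t *s);
extern void   ga_search_for(seed_t *s, seed_t *pool);

void pusher_loop(seed_t *pool, size_t n) {
    for (;;) {
        /* 1. Coverage-based fuzzing keeps filling the seed pool. */
        n = coverage_fuzz_round(pool, n);

        /* 2. Connection-based searching processes each seed exactly once. */
        for (size_t i = 0; i < n; i++) {
            if (pool[i].searched)
                continue;
            infer_connections_for(&pool[i]);   /* collect operands, infer CI(s) */
            ga_search_for(&pool[i], pool);     /* solve unsatisfied comparisons */
            pool[i].searched = 1;
        }
    }
}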

3 Technology

In this section, the applied techniques are described in detail. In Section 3.1, a novel instrumentation method is presented to collect the connection instances between the input contents and the comparison operands. Then, Section 3.2 describes the process of how to make the connection inference to identify the input bytes that affect the comparison operands. In Section 3.3, we take the genetic algorithm [22] to search for the target inputs.

3.1 Connection instance collection

The cmp and test instructions in the binary embody the comparisons in the source code. Without loss of generality, we take the cmp instruction as representative of comparisons in this paper.
To discriminate the cmp instructions, we assign each of them a unique label as its identity, calculated from three related basic blocks: the block where the target cmp instruction resides and its two successor blocks according to the static control flow. Previous work has introduced a method to allocate a unique ID to each block, e.g., id_A is assigned to block A. Therefore, the cmp instruction labels can be determined based on these IDs.
Formally, the label calculation is shown in Eq. (1), where μ denotes the label of a cmp instruction, id_c is the ID of the basic block containing the cmp instruction, and id_t and id_f are the IDs of its two successor basic blocks:
μ = id_c ⊕ (id_t ≫ 1) ⊕ (id_f ≫ 2).    (1)
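A one-line helper makes the calculation concrete; note that the XOR/shift combination below is our reading of Eq. (1) and is an assumption rather than the exact implementation.

#include <stdint.h>

/* Compute the label of a cmp instruction from the IDs of its basic block
 * (id_c) and its two successor blocks (id_t, id_f). The operator choice
 * mirrors Eq. (1) as reconstructed above and is an assumption. */
static uint32_t cmp_label(uint32_t id_c, uint32_t id_t, uint32_t id_f) {
    return id_c ^ (id_t >> 1) ^ (id_f >> 2);
}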
Moreover, the cmp instructions are reconstructed to ensure a uniform shape. In brief, each cmp instruction takes one of the three forms shown in Table 1: imm means an immediate value, reg refers to a value from a register, and mem is a value from memory. The first form, with two immediate values, is ruled out because it rarely occurs. The second form is the expected shape, with a dynamic operand and a fixed operand; the fixed operand is chosen as the target value (required in the searching). In the third form, both operands are dynamic, so it is reconstructed into the second form, creating a dynamic operand and a fixed operand. Afterwards, all the cmp instructions hold the same form, i.e., "cmp op, imm".
Tab.1 The reconstruction of the cmp instructions
Form Example Reconstructed Target value
cmp imm, imm cmp 5, 7 N/A N/A
cmp reg/mem, imm cmp eax, 7 N/A 7
cmp reg/mem, reg/mem cmp eax, ebx cmp eax-ebx, 0 0
During execution, every input encounters some cmp instructions and thus yields their concrete comparison operands. The input and these concrete comparison operands jointly make up a connection instance. Formally, given an input s, COPMap(s) collects all the concrete comparison operands, as depicted in Eq. (2). Every combination (s, COPMap(s)) is a connection instance.
COPMap(s) = {(μ, Op_μ) | μ ∈ Labelset(s)},    (2)
where Labelset(s) denotes the set of labels of the cmp instructions that appear in the execution trace; μ is the label of a cmp instruction, and Op_μ is its dynamic operand value, i.e., the op in "cmp op, imm".
We design a novel method to collect COPMap (run via instrumentation), as Algorithm 1 shows. First, a memory block is initialized (line 2); during execution, the input encounters some cmp instructions, whose labels are collected (lines 3 and 4). Then, for each cmp instruction with label μ, its operand value Op_μ is obtained (line 6) and stored into the memory unit indexed by μ (line 7). At last, the memory block preserves all the specific operand values of the encountered cmp instructions, i.e., COPMap(s).
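A minimal sketch of the runtime side of Algorithm 1 is given below, assuming a fixed-size memory block indexed by the cmp label (the 65,536-entry size follows the AFL trace map mentioned in Section 4); the function names and map layout are illustrative, not the actual implementation.

#include <stdint.h>
#include <string.h>

#define COP_MAP_SIZE 65536               /* one slot per cmp label */

/* Memory block holding the dynamic operand value of every executed cmp,
 * i.e., COPMap(s) after the run (Algorithm 1, lines 2 and 7). */
static uint64_t cop_map[COP_MAP_SIZE];

/* Called before each execution: initialize the memory block (line 2). */
void cop_map_reset(void) {
    memset(cop_map, 0, sizeof(cop_map));
}

/* Instrumentation callback placed before every reconstructed cmp:
 * mu is the instruction label, op is its dynamic operand value; the
 * operand is stored into the memory unit indexed by mu (lines 6-7). */
void cop_map_record(uint32_t mu, uint64_t op) {
    cop_map[mu % COP_MAP_SIZE] = op;
}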

3.2 Connection inference

Since not all input contents can affect the comparison operands (in the example in Fig. 1, only the 0th, 1st, 2nd, and 3rd bytes are used to compute the magic variable), mutating only such related bytes is a promising way to generate inputs that can overcome complex path constraints.
In our study, the connection between the input contents and the comparison operands means that, once the input contents change, the corresponding comparison operands change as a result. Therefore, we present a sequential analysis method to perform the connection inference, pointing out which input content affects which comparison operand.
As a common choice, the sequential analysis is done byte-wise. Given an input s and its COPMap(s), a new input s0 is produced by mutating the 0th byte, and COPMap(s0) is obtained after execution. Whether the mutated byte belongs to a connection is decided by comparing COPMap(s) and COPMap(s0): if they hold different values for the same comparison operand, the 0th byte is involved in that comparison. After traversing all the bytes, the connection inference for this input is complete.
Formally, we take CI(s) to represent the connection inference results for the input s. It is a key-value storage shown in Eq. (3).
CI(s) = {(i, labelset_i) | 0 ≤ i < len(s)},    (3)
where len(s) is the input length; i is the key, denoting the index of each byte; and labelset_i is the value, a set containing the labels of all the cmp instructions connected to the i-th byte. The number of labels in labelset_i can be 0, 1, or more.
Figure 3 shows part of the inference process. The left part is the memory block of COPMap(s), and the right is COPMap(s0) (s0 is produced by mutating the 0th byte of s). Their values at the 5th and 14th memory units differ (marked in gray), which means the 0th byte can affect the operands of the cmp instructions with labels 5 and 14. This yields the connection shown in Eq. (4).
Fig.3 An example to show the process of the connection inference


CI(s) = {(0, {5, 14})}.    (4)
The total connection inference is made by traversing all the bytes of the input. Intuitively, an accurate way is to assign every value from 0x00 to 0xFF to each byte to produce new inputs. However, this inevitably results in too much overhead. As a trade-off between efficiency and accuracy, each byte is only assigned a few random values (e.g., 10 values).
Algorithm 2 demonstrates the process of connection inference. It reads an input s and finally outputs the result CI(s). Given an input, the engine traverses all its bytes (lines 2–11), and every byte is checked N times (lines 5–10); each time, a new input s′ is produced by mutating one byte of s, and its COPMap(s′) is compared to COPMap(s) to decide whether this byte belongs to a connection (lines 6–8); CI(s) is updated after checking each byte (line 9).
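A sketch of Algorithm 2 under these assumptions follows; run_and_collect is an assumed helper that executes the target and fills a COPMap array, and the dense conn matrix is used purely for illustration.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COP_MAP_SIZE 65536
#define N_TRIES 10                          /* random values tried per byte */

/* Assumed helper: run the target on `input` and fill `map` with the
 * collected comparison operands, i.e., COPMap(input). */
extern void run_and_collect(const uint8_t *input, size_t len, uint64_t *map);

/* Connection inference (Algorithm 2): mutate every byte N_TRIES times and
 * mark the labels whose operands changed. conn[i][mu] != 0 afterwards means
 * byte i is connected to the cmp instruction with label mu, i.e., CI(s). */
void infer_connections(const uint8_t *s, size_t len,
                       uint8_t (*conn)[COP_MAP_SIZE]) {
    uint64_t *base   = calloc(COP_MAP_SIZE, sizeof(uint64_t));
    uint64_t *probe  = calloc(COP_MAP_SIZE, sizeof(uint64_t));
    uint8_t  *mutant = malloc(len);

    run_and_collect(s, len, base);                   /* COPMap(s)       */
    for (size_t i = 0; i < len; i++) {               /* lines 2-11      */
        for (int t = 0; t < N_TRIES; t++) {          /* lines 5-10      */
            memcpy(mutant, s, len);
            mutant[i] = (uint8_t)rand();             /* mutate byte i   */
            run_and_collect(mutant, len, probe);     /* COPMap(s')      */
            for (uint32_t mu = 0; mu < COP_MAP_SIZE; mu++)
                if (probe[mu] != base[mu])           /* operand changed */
                    conn[i][mu] = 1;                 /* update CI(s)    */
        }
    }
    free(mutant);
    free(probe);
    free(base);
}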

3.3 Connection based searching

Based on the connection inference results, many modern meta-heuristic algorithms could be used to search for the target inputs and penetrate the complex comparisons. However, our primary goal in this paper is not to optimize the meta-heuristic algorithms themselves, but rather to address the challenges in random mutation. Therefore, we simply adopt the genetic algorithm (GA for short) to verify the feasibility of our techniques. In a nutshell, this section illustrates how we handle the collected information and resort to the GA engine to produce the target inputs.
First, for an input s, its connection inference result CI(s) is obtained. We transform CI(s) into another key-value storage, in which the cmp instruction label acts as the key, and the set of input bytes that can affect the corresponding cmp instruction operand acts as the value. This storage is named TCI(s) and is formally shown in Eq. (5).
TCI(s) = {(μ, Posset_μ) | μ ∈ Labelset(P)},    (5)
where P is the target program, Labelset(P) is the set of all the cmp instruction labels in the program, μ is a cmp instruction label, and Posset_μ is the set of all the input byte positions connected to the cmp instruction with label μ.
For example, the CI(s) in Eq. (4) is transformed into Eq. (6). Specifically, the cmp instruction with label 5 is connected to the 0th input byte, and the cmp instruction with label 14 is connected to the 0th input byte as well.
TCI(s) = {(5, {0}), (14, {0})}.    (6)
Furthermore, there is a target value for each cmp instruction (see Table 1). When compiling, all comparison instructions are instrumented at the LLVM IR level to output the corresponding target values. For clarity, we use V_t^μ to denote the target value of the cmp instruction with label μ. For example, V_t^μ for the comparison at line 8 in Fig. 1 is 0xdeadbeef.
Based on TCI(s), Algorithm 3 is designed to apply a GA engine to satisfy our searching requirements. It reads an input s, and outputs several target inputs inputset.
For a target program P, given an input s, its TCI(s) is obtained by execution and conversion (lines 2, 3); for each cmp instruction contained in TCI(s), an individual GA-based process is run to search for target inputs (lines 4–10); the set Posset_μ of connected input bytes and the target value V_t^μ of the cmp instruction with label μ are determined (lines 5, 6); then, following the GA model, Posset_μ is treated as the GAGenes and V_t^μ as the GATarget (lines 7, 8), and both are delivered to the GA engine as parameters; finally, the GA engine does its best to produce the target inputs (line 9).
More specifically, in the GA engine, the input bytes in Posset_μ are decomposed into bits, each of which acts as a gene in the GA model. In the fitness function, we take the Euclidean distance between the two comparison operands, op and imm, as feedback to decide the survival probability of each input in the evolution [22].
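The sketch below shows one way such a fitness function could be wired up; run_and_get_operand is an assumed helper, and the scoring simply negates the absolute distance between op and the target value (for scalar operands the Euclidean distance reduces to this), so that a distance of zero means the comparison is satisfied.

#include <stdint.h>
#include <stddef.h>

/* Assumed helper: run the candidate input and return the dynamic operand
 * observed at the cmp instruction with label mu. */
extern uint64_t run_and_get_operand(const uint8_t *input, size_t len, uint32_t mu);

/* Fitness of one GA individual for the cmp with label mu and target value
 * `target` (the imm side in Table 1). The bytes listed in posset are the
 * genes; they are written into a copy of the seed before execution.
 * This is an illustrative objective, not the exact GAlib one we ship. */
double fitness(uint8_t *candidate, size_t len,
               const size_t *posset, size_t n_pos, const uint8_t *genes,
               uint32_t mu, uint64_t target) {
    for (size_t k = 0; k < n_pos; k++)      /* apply genes to the connected bytes */
        candidate[posset[k]] = genes[k];
    uint64_t op = run_and_get_operand(candidate, len, mu);
    double dist = (op > target) ? (double)(op - target)
                                : (double)(target - op);
    return -dist;                           /* distance 0 => comparison satisfied */
}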

4 Implementation and evaluation

Based on the proposed methods, we developed a prototype, Pusher, built on top of AFL, LLVM instrumentation [24], and GAlib, a genetic algorithm library by Matthew Wall. AFL performs the state-of-the-art coverage-based fuzzing, LLVM instrumentation is used to record the comparison operands, and GAlib provides the genetic algorithm based searching. Besides, we add some extra code to conduct the connection inference. For the size of Labelset(s) of an input s, we adopt the size of the AFL trace map (i.e., 65,536), which covers the requirement of most real-world programs.

4.1 Evaluation setup

We surveyed the evaluation experiments of other fuzzers in their published papers; the results are shown in Table 2. Their experiments suggest LAVA-M [23], CGC, and some real-world programs as a representative choice. Therefore, we also perform the evaluation experiments on these three benchmark databases.
Tab.2 The evaluation experiments in other fuzzers
Fuzzer Time Benchmark Compared Fuzzer
Steelix 2017 RWP AFL-dyninst
CGC AFL-dyninst
LAVA-M FUZZER, SES, VUzzer, AFL-lafintel
VUzzer 2017 RWP AFL
CGC AFLPIN
LAVA-M FUZZER, SES
Angora 2018 RWP AFL
CGC
LAVA-M FUZZER, SES, AFL, VUzzer, Steelix
T-fuzz 2018 RWP AFL
CGC AFL, Driller
LAVA-M FUZZER, SES, VUzzer, Steelix
Profuzzer 2019 RWP AFL, AFLFast, Driller, Vuzzer, QSYM
CGC
LAVA-M AFL, AFLFast, Driller, VUzzer, Angora, QSYM

Note: RWP is short for real-world program

Experimental infrastructure All experiments were conducted on a server equipped with 2 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (72 threads in total), running Linux Ubuntu 18.04.3 LTS AMD64. Besides, the fuzzers were launched in single mode, to avoid the performance fluctuation caused by the different parallel modes in various fuzzers [16, 25].
Research questions The following experiments were designed to answer three research questions:
RQ1: How effective is Pusher at bug detection?
RQ2: How much testing coverage does Pusher achieve?
RQ3: How much execution overhead does Pusher introduce?

4.2 Bug detection on LAVA (RQ1)

In 2016, to address the shortage of ground-truth corpora for vulnerability discovery evaluation, a technique was developed in [23] to produce such corpora, LAVA-1 and LAVA-M. LAVA-1 contains 69 instances of the file program, each injected with a unique bug, whereas LAVA-M contains four programs, each injected with more than one bug. For bug discovery evaluation, LAVA-M is the more appropriate benchmark in the long run.
Both the authors and users of LAVA-M have adopted five hours of testing for bug detection. Following this standard, we also run Pusher on LAVA-M for five hours and compare its results with other fuzzers.
In particular, each bug in LAVA-M has a unique ID, which is realized in the lava_get function as Fig. 4 shows. Once either of the two comparisons at line 2 is satisfied, the input triggers the crash and the program outputs a bug_num as the ID of the bug. This mechanism can be leveraged to distinguish crashes, so the number of unique bugs can be counted easily.
Fig.4 Each bug owns a unique ID
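The injected trigger has roughly the following shape; this is our paraphrase of the pattern in Fig. 4 (the "lava" magic compared in both byte orders), not the exact listing.

#include <stdio.h>

extern unsigned int lava_get(unsigned int bug_num);   /* provided by LAVA */

/* Paraphrase of a LAVA-injected trigger: lava_get() returns four
 * attacker-controlled input bytes; if they match the magic value in either
 * byte order, the bug fires and its ID (bug_num) is printed. */
void lava_trigger(unsigned int bug_num) {
    unsigned int v = lava_get(bug_num);
    if (v == 0x6c617661 || v == 0x6176616c) {          /* "lava" in either endianness */
        printf("Successfully triggered bug %u\n", bug_num);
        *(volatile char *)0 = 0;                       /* force the crash */
    }
}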


After five hours of testing, the number of bugs detected by Pusher is shown in the last column of Table 3. The second column shows the number of inserted bugs according to the provided documentation (this number is actually incomplete; some inserted bugs are not listed [14]). The other columns summarize the evaluation results of other fuzzers.
Tab.3 The number of detected bugs on LAVA-M database
Program Listed bugs FUZZER SES AFL AFL-lafintel Angora Pusher
base64 44 7 9 9 28 44+4 44+4
md5sum 57 2 0 0 0 57 57+4
uniq 28 7 0 0 24 28+1 28+1
who 2136 0 18 1 2 1443+98 1847+205
The results indicate that Pusher found 44 listed bugs and 4 unlisted bugs in base64, 57+4 bugs in md5sum, 28+1 bugs in uniq, and 1847+205 bugs in who. The unlisted bug IDs were acquired by running the crashes and collecting them from the outputs. Overall, Pusher is the best among the 10 compared fuzzers on bug discovery.
We investigated why some bugs cannot be detected in the who program. The reason is that the inserted code is too concentrated. For example, all the code from line 743 to line 1353 (610 lines) in the readutmp.c file is inserted by the LAVA techniques; more importantly, the bugs among this code are connected to the same four input bytes, and such nested conditional branches make it difficult to find all the bugs. However, this scenario is not common in real-world programs, so missing some bugs in the who program is acceptable.
Furthermore, Pusher is built on top of AFL by introducing a search-based method to handle the complex path constraints. The results show that Pusher outperforms AFL on bug discovery, which is evidence that our proposed approach is effective in solving more path constraints.

4.3 Bug detection on CGC (RQ1)

In 2016, the Defense Advanced Research Projects Agency (DARPA) held the Cyber Grand Challenge (CGC) and finally released a benchmark database for vulnerability discovery evaluation, which has been widely used, as Table 2 shows.
However, the original CGC binaries can only run under the DARPA Experimental Cyber Research Evaluation Environment (DECREE), a customized system. Due to its instrumentation dependency, Pusher cannot yet run on DECREE. Thanks to the work of Trail of Bits, a version migrated to Linux has been created, so we evaluate Pusher on these migrated binaries.
Though these binaries are small, the vulnerabilities in them are quite diverse. We tested these binaries and made the following observations:
● I. Because of migration problems [15] and infinite reading loops, some binaries cannot be tested by fuzzers.
● II. In some binaries, the vulnerabilities cannot be detected even though the testing coverage is high, because covering the related code does not guarantee finding these bugs; other detection techniques [26] are required.
● III. Some binaries contain special paths guarded by complex condition predicates, such as magic number comparisons. Once these comparisons are penetrated, the testing coverage rises considerably, and the possibility of finding bugs also increases.
Pusher is designed to handle such comparisons, i.e., it targets the binaries in observation III. Therefore, we tested some selected binaries with Pusher and collected the branch coverage and the number of crashes. The results are shown in Table 4 (we failed to build the CGC binaries with Angora).
Tab.4 The experimental results on CGC binaries (each cell lists the branch coverage followed by the number of crashes)
Program Size AFL AFLFast AFL-lafintel FairFuzz Pusher
CGC_File_System 88 KB 7.2% 0 7.2% 0 59.5% 0 7.2% 0 75.1% 10
CGC_Hangman_Game 5.2 MB 12% 0 12% 0 12% 0 12% 0 94% 1
CGC_Image_Parser 145 KB 4.6% 0 4.6% 0 4.6% 0 4.6% 0 60.6% 12
CGC_Video_Format_Parser_and_Viewer 82 KB 12% 0 12% 0 53.1% 0 12% 0 53.1% 0
CNMP 56 KB 27.5% 0 27.5% 0 42.5% 26 47.5% 3 80% 30
FASTLANE 52 KB 25.9% 0 17.5% 0 17.5% 0 27.7% 0 51.2% 19
Gridder 164 KB 84.2% 10 70.3% 0 82.9% 23 84.2% 28 84.2% 56
Griswold 95 KB 3.8% 0 3.8% 0 45% 13 3.8% 0 55.3% 27
Barcoder 172 KB 59% 0 59% 0 57.2% 0 59% 0 68% 25
The results indicate that on all 9 binaries, Pusher achieves higher branch coverage than the other fuzzers. For example, it achieves 94% branch coverage on CGC_Hangman_Game, 7.8× that of the other tools. This is because a simple check, strcmp(inbuf, "HANGEMHIGH!"), at the beginning of the main function prevents the other tools from testing further. By leveraging the GA searching method, Pusher can quickly pass through this check and uncover more code areas. Besides, Pusher finds more bugs on these binaries (a total of 180 crashes in 7 binaries).
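The gate described above is essentially a single string comparison; a self-contained sketch of that shape (only the strcmp target is taken from the text, the rest is illustrative) is:

#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the rest of the binary. */
static void play_hangman(void) {
    puts("game starts here");
}

int main(void) {
    char inbuf[32] = {0};
    if (fgets(inbuf, sizeof(inbuf), stdin) == NULL)
        return 1;
    inbuf[strcspn(inbuf, "\n")] = '\0';

    if (strcmp(inbuf, "HANGEMHIGH!") != 0)
        return 0;          /* random mutation almost never gets past this line */
    play_hangman();        /* roughly 94% of the code sits behind this check   */
    return 0;
}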
Take Barcoder as an example: when it tries to create a barcode from the input bitmap file, it first invokes the function validate_bmp_headers to check whether the input file has a correct header. Only an input file that passes this check is processed further. The check function is shown in Fig. 5. To pass this function, the input file needs to satisfy many validation checks, such as magic number, file size, image size, etc. Fuzzers like AFL, AFL-lafintel, and FairFuzz have no knowledge of the input–branch relationship, so they cannot bypass these complex checks. In contrast, by extracting the target values at compile time and leveraging a heuristic search method to solve them, Pusher can quickly generate a valid bitmap file that satisfies all the checks in this function.
Fig.5 The validate_bmp_headers function implementation in the Barcoder program


4.4 Bug detection on real-world programs (RQ1)

We selected several real-world programs to evaluate our system's ability to discover unknown bugs. The input types of this dataset cover ELF binaries, images, and packet captures. To demonstrate the efficiency of our system, we also compared our results with AFL, AFL-lafintel, FairFuzz, and Angora. All the experiments lasted 72 hours to ensure soundness. The initial seeds were identical across fuzzers and taken from fuzz-data [27].
The testing results are shown in Table 5. The table shows that Pusher outperforms all the other fuzzers on these four real-world programs. Especially in jhead and tcpdump, Pusher successfully uncovers bugs while the other fuzzers like AFL, AFL-lafintel, and FairFuzz cannot. We illustrate the reason for this performance improvement by examining the testing coverage in Section 4.5.
Tab.5 The number of detected bugs on real-world programs
Program AFL AFL-lafintel FairFuzz Angora Pusher
jhead 0 0 0 16 114
nm 77 69 27 3 83
objdump 5 22 29 34 51
tcpdump 0 0 0 0 3
total 82 91 56 53 251
In conclusion, Pusher works well on the bug detection, especially the bugs guarded by the complex condition constraints.

4.5 Testing coverage on real-world programs (RQ2)

Since it is hard to predict where a bug may appear in software, maximizing the testing coverage is usually chosen as a proxy for increasing the probability of finding bugs. Therefore, we make a deep investigation of the testing coverage (i.e., branch coverage, line coverage, and function coverage) to illustrate the bug detection improvement in Table 5. The results are shown in Table 6.
Tab.6 The final testing coverage after 72 hours of testing
Program AFL AFLFast AFL-lafintel Fairfuzz Angora Pusher
BC LC FC BC LC FC BC LC FC BC LC FC BC LC FC BC LC FC
jhead 213 313 18 213 313 18 213 313 18 213 313 18 479 687 26 568 714 26
0% 0% 0% 0% 0% 0% 0% 0% 0% +125% +119% +44% +167% +128% +44%
nm 4397 7097 391 4342 6993 390 4046 6856 405 4468 7230 392 4002 6679 404 4710 7851 433
−1% −1% 0% −8% −3% +4% +2% +2% 0% −9% −6% +3% +7% +11% +11%
objdump 2602 4375 273 2378 3949 258 2694 4990 309 2595 4426 278 3337 6329 358 3737 7034 372
−9% −10% −5% +4% +14% +13% 0% +1% +2% +28% +45% +31% +44% +61% +36%
tcpdump 15474 23452 706 13501 20645 682 12791 20475 698 16264 24530 753 2482 4834 269 18754 27861 846
−13% −12% −3% −17% −13% −1% +5% +5% +7% −84% −81% −62% +21% +19% +20%

Note: BC is the branch coverage number. LC is the line coverage number. FC is the function coverage number

Table 6 shows the final testing coverage results, and Fig. 6 shows how the coverage increases over time. From an overall perspective, Pusher achieves the highest testing coverage among the six compared fuzzers on the four programs. Besides, although Pusher is developed on top of AFL, it achieves an 11% to 128% increase in line coverage over the original AFL.
Fig.6 The increasing process of branch coverage number in the 72 hours. (a) jhead; (b) nm; (c) objdump; (d) tcpdump


After an adequate test (72 hours), Pusher behaves better than AFLFast and AFL-lafintel. This is because both AFLFast and AFL-lafintel still rely on random mutation to produce new inputs, which is powerless to penetrate complex paths; although AFLFast introduces a power-based schedule and AFL-lafintel exploits program transformation to help the testing, their improvements become weak after long testing. In contrast, Pusher is aware of the connection between the inputs and the comparisons, so it successfully achieves higher testing coverage.
FairFuzz and Angora perform better than AFL on these four programs except tcpdump, because they provide techniques to augment the mutation capacity: FairFuzz leverages statistical results of executions to direct mutation, and Angora adopts DataFlowSanitizer to do so. In comparison, Pusher acts better than these two fuzzers, because the statistical results in FairFuzz are not always accurate in individual cases, and the external library descriptions required by Angora can also bring inaccuracy. Especially in tcpdump, since input functions such as fread are implemented in the external library libpcap, which we do not instrument, we had to patch tcpdump so that taint data could be introduced.
In contrast, Pusher is easily deployed to test many programs, and it can conduct the input searching based on an accurate connection between the inputs and the comparisons. Therefore, it can achieve a higher testing coverage.
Case study An interesting case is the testing of the jhead program. There, AFL, AFLFast, AFL-lafintel, and FairFuzz reach their top coverage at the very beginning, and no new coverage is discovered in the following 72 hours. On the contrary, Pusher and Angora continually exercise new code during the testing.
More specifically, this difference is due to a magic number check. Figure 7 shows the related code snippet: the program reads some data and compares it to the "Exif" string; if the check is satisfied, the process_EXIF() function is invoked, otherwise the program quits quickly.
Fig.7 The code snippet in the jhead program
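In place of the figure, a schematic version of the gate is shown below; the buffer handling is simplified and only the "Exif" comparison and the process_EXIF call reflect the description above.

#include <stdio.h>
#include <string.h>

/* Illustrative stub; in jhead this parses the EXIF section. */
static void process_EXIF(unsigned char *data, unsigned int length) {
    printf("parsing %u bytes of EXIF data\n", length);
}

/* Schematic gate: only inputs whose section payload starts with "Exif"
 * reach process_EXIF(); otherwise the program gives up quickly. */
void handle_section(unsigned char *Data, unsigned int itemlen) {
    if (itemlen >= 6 && memcmp(Data + 2, "Exif", 4) == 0)
        process_EXIF(Data, itemlen);   /* new code is reachable only here */
}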


If this comparison cannot be solved during the testing, the program ends directly, and the fuzzer cannot discover new code. Because AFL, AFLFast, and AFL-lafintel have no idea which input bytes are connected to this comparison, they have to search in an enormous space and ultimately fail. The strategy in FairFuzz is ineffective here due to inadequate statistical information, so it also fails to penetrate this comparison.
Contrariwise, both Pusher and Angora know which input bytes are connected to this comparison and can direct the mutation to produce the expected inputs quickly. Even so, Pusher still achieves higher coverage than Angora.
In conclusion, Pusher improves the testing coverage.

4.6 Execution overhead evaluation (RQ3)

Execution overhead is an essential metric for vulnerability detection tools, and it seriously influences their applicability to large software; e.g., symbolic execution is limited partly because of its heavy overhead. Therefore, we analyze the execution overhead of Pusher in this section.
In fuzzers, the execution overhead mainly comes from two aspects: instrumentation and analysis. For example, the execution overhead in AFL consists of the coverage collection and the testing schedule, as Table 7 shows. Built on AFL, Pusher additionally introduces overhead from the comparison operand collection, the connection inference, and the GA based searching, which is also shown in Table 7.
Tab.7 The overhead in AFL and Pusher
Overhead AFL Pusher
Instrumentation coverage collection coverage collection, comparison operands collection
Analysis testing schedule testing schedule, connection inference, GA based searching
We utilize the total number of executions after thorough testing as an indicator of the execution overhead of the fuzzers, i.e., within a fixed time, a smaller execution number denotes a more significant overhead. Because Pusher is based on AFL, we only compare Pusher with AFL and its variants (Angora is not taken into comparison) to avoid the effect of other unrelated factors.
The execution numbers are shown in Fig. 8. Taking the execution number of AFL as the baseline, Pusher performs fewer executions: 39% of AFL's executions on jhead, 96% on nm, 80% on objdump, and 63% on tcpdump. Nevertheless, the execution number of Pusher remains on the same order of magnitude as AFL's. Moreover, the execution numbers of the AFL-variant fuzzers, which are regarded as lightweight techniques, are also on the same order of magnitude as AFL's. Therefore, the overhead in Pusher is acceptable, and it belongs to the lightweight techniques.
Fig.8 The final execution numbers. (a) jhead; (b) nm; (c) objdump; (d) tcpdump


We also compile in-depth statistics on the execution numbers of the coverage-based testing and the search-based testing (consisting of the connection inference and the GA based searching), in an attempt to understand the execution distribution between these two stages.
For clarity, the execution numbers are transformed into normalized rates, as Fig. 9 shows. The results indicate that the execution distribution between the two testing stages is reasonable.
Fig.9 The execution distribution in Pusher between coverage-based testing and search-based testing


For example, on the jhead program, the search-based testing occupies only 3% of the total executions: because the number of inputs in the seed pool is small, the search-based component quickly finishes its work and waits for new inputs generated by the coverage-based testing. Since tcpdump contains many switch-case statements in its source code whose outcomes are affected by the input file (as shown in Fig. 10), Pusher spends more time overcoming such cases via search-based testing, which accounts for 46% of the total executions. Nevertheless, the introduced overhead stays within an acceptable scope, and the testing coverage is improved as a result.
Fig.10 The code coverage of tcpdump for vanilla AFL (left part) and Pusher (right part)


Moreover, we take a further detailed statistic about the execution distribution between the connection inference and the GA based searching. The results in Table 8 indicate that the connection inference accounts for the primary execution overhead in the search-based testing. For example, the number of connection-inference executions on the tcpdump binary is 81659k, whereas the number of GA based searching executions is just 1016k, which is comparatively negligible.
Tab.8 The execution distribution in Pusher between connection inference and GA searching
Program Connection inference (# executions) GA based searching (# executions)
jhead 4748k 41k
nm 8352k 1720k
objdump 8842k 969k
tcpdump 81659k 1016k
As far as we know, other connection inference techniques, such as taint analysis and symbolic execution, carry heavier burdens than our approach when applied to specific program structures such as symbolic loops and implicit data flows that commonly exist in real-world software, whereas our approach remains lightweight and more adaptive to complex program structures. In other words, our method is insensitive to the scale of the target program (i.e., lines of code); its overhead and efficiency are only affected by the quantity and complexity of the branch instructions within the target program.
In addition, the execution distribution is imbalanced between the connection inference and the GA based searching, which explains why our research emphasis is on the connection inference and why we simply select one off-the-shelf modern search algorithm as a representative to handle the comparisons.
In conclusion, the execution overhead introduced by our approach in Pusher is within an acceptable scope.

5 Related Work

5.1 Input aware fuzzing

VUzzer [12] proposed an application-aware evolutionary fuzzing strategy that does not require any prior knowledge of the application or input format. Instead, it leverages control- and data-flow features based on static and dynamic analysis to infer the critical bytes of comparison instructions, maximizing coverage and exploring deeper paths. However, even though the critical bytes are located, its mutation strategies are still random. To relieve the overhead introduced by VUzzer's taint analysis, Steelix [15] presented a program-state based fuzzing method that leverages lightweight static analysis and binary instrumentation to provide coverage information and comparison progress information, helping the fuzzer locate the magic bytes. However, its mutation methods also remain random. Angora [14] leverages a search-based mutation method (i.e., gradient descent) similar to our GA method, which can improve the quality of the generated test cases. However, as demonstrated in [28], Angora uses the expensive taint tracking step only sparsely to overcome hard-to-solve conditions, which brings more overhead. What is worse, one has to compromise between precision and usability: to successfully build a target program with Angora's compiler, one has to ignore taint propagation in many libraries [14]. Similar to our method, REDQUEEN [28] implements a lightweight "taint analysis" to locate the bytes related to magic bytes or checksums. For magic bytes, REDQUEEN employs several specific mutation strategies; for checksums, it uses a "patch-solve" method, which introduces heavyweight symbolic execution. In contrast, our method uses a lightweight search method to solve the constraints, which we consider more scalable.

5.2 Hybrid fuzz testing

Hybrid fuzz testing uses concolic execution to discover frontier nodes that represent unique paths in the program. For example, Driller is an up-to-date hybrid testing tool that leverages fuzz testing and concolic execution in a complementary manner to find deeper bugs [6], and is more practical than previous hybrid tools. DeepFuzz [29] defined a novel method to assign probabilities to program paths so as to maximize both the execution path depth and the degree of freedom in input parameters for exploitation. SAFL leverages symbolic analysis to generate high-quality seed inputs for the fuzzer and prioritizes fuzzing toward seeds that have exercised rarely executed branches, helping the fuzzing process exercise rare and deep paths with higher probability [30]. DigFuzz designs a novel Monte Carlo based probabilistic path prioritization model to quantify each path's difficulty and prioritize paths for concolic execution, which improves both the number of detected bugs and the code coverage [7]. Intriguer proposed field-level constraint solving, which optimizes symbolic execution with field-level knowledge. It performs instruction-level taint analysis and records execution traces without data transfer instructions like mov; it then reduces the execution traces for tainted instructions that accessed a wide range of input bytes, and infers input fields to build field transition trees [31].

5.3 Machine learning based fuzzing

Gong et al. trained a deep learning model on samples generated by AFL, labeled by whether they changed the program state [32]. The trained model can then predict whether a sample generated in a new round of AFL will change the program state, so AFL can skip samples that are predicted not to generate new states, improving the efficiency of the fuzzer. NeuFuzz [33] learns hidden vulnerability patterns from known vulnerable programs through an LSTM to discover execution paths containing vulnerabilities. Then, NeuFuzz preferentially executes the seed files that can cover paths containing vulnerabilities, and assigns more mutation energy to these seed files based on the prediction results. NEUZZ proposed a novel program smoothing technique using surrogate neural network models that can incrementally learn smooth approximations of a complex real-world program's branching behaviors to improve the efficiency of fuzzing [34].

5.4 Checksum bypass

TaintScope is the first method that is aware of checksums such as CRC and can bypass them automatically [35]. It uses a heuristic method to locate the checksum point and then replaces the checksum part of the input with the calculated checksum value to bypass the check. However, this method may fail on the example in Fig. 1: TaintScope bypasses a checksum by adjusting the checksum part of the input, whereas magic and FORM in Fig. 1 are both constant values. TaintScope can bypass path constraints of the form f(x) = c, where c is a constant value, with the help of constraint solvers, but that makes it a kind of hybrid fuzz testing method. CAFA presents an approach based on taint analysis to identify the checksum point automatically; the application is then statically patched in an anti-logical manner at the checksum point to bypass the checksum verification [36].

6 Conclusion

This paper presents a new prototype that addresses two challenges in the random mutation used by coverage-based fuzzing: the sizable search space and the lack of useful feedback. In our design, a novel instrumentation method is utilized to collect the comparison operands, which are then used to infer the connection between the input contents and the comparison operands. In this way, the engine knows which input byte is pivotal to penetrate which comparison. Furthermore, adopting the distance between the two operands as feedback, we employed the genetic algorithm as a representative to search for the expected inputs and penetrate the complex constraints. We developed a new tool and evaluated it on LAVA-M, CGC binaries, and four real-world programs. The experimental results demonstrate that our approach improves bug detection and testing coverage, while the introduced execution overhead remains within an acceptable scope, ensuring its practicability on large programs.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61702540) and Hunan Provincial Natural Science Foundation of China (2018jj3615).
1
Miller B P , Fredriksen L , So B . An empirical study of the reliability of UNIX utilities. Communications of the ACM, 1990, 33( 12): 32– 44

2
Liang H , Pei X , Jia X , Shen W , Zhang J . Fuzzing: state of the art. IEEE Transactions on Reliability, 2018, 67( 3): 1199– 1218

3
Serebryany K. Continuous fuzzing with libFuzzer and AddressSanitizer. In: Proceedings of IEEE Cybersecurity Development. 2016, 157

4
Gan S, Zhang C, Qin X, Tu X, Li K, Pei Z, Chen Z. CollAFL: path sensitive fuzzing. In: Proceedings of IEEE Symposium on Security and Privacy. 2018, 679−696

5
De Moura L, Bjørner N. Z3: an efficient SMT solver. In: Proceedings of Tools and Algorithms for the Construction and Analysis of Systems. 2008, 337–340

6
Stephens N, Grosen J, Salls C, Dutcher A, Wang R, Corbetta J, Shoshitaishvili Y, Kruegel C, Vigna G. Driller: augmenting fuzzing through selective symbolic execution. In: Proceedings of Network and Distributed System Security Symposium. 2016

7
Zhao L, Duan Y, Yin H, Xuan J. Send hardest problems my way: probabilistic path prioritization for hybrid fuzzing. In: Proceedings 2019 Network and Distributed System Security Symposium. 2019

8
Pak B S. Hybrid fuzz testing: discovering software bugs via fuzzing and symbolic execution. PhD thesis, Carnegie Mellon University Pittsburgh, PA, 2012

9
Baldoni R, Coppa E, D'Elia D C, Demetrescu C, Finocchi I. A survey of symbolic execution techniques. ACM Computing Surveys, 2018, 51(3): 1–39

10
Peng H, Shoshitaishvili Y, Payer M. T-Fuzz: fuzzing by program transformation. In: Proceedings of IEEE Symposium on Security and Privacy. 2018, 697– 710

11
Newsome J, Song D X. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In: Proceedings of the Network and Distributed System Security Symposium. 2005

12
Rawat S, Jain V, Kumar A, Cojocar L, Giuffrida C, Bos H. VUzzer: application-aware evolutionary fuzzing. In: Proceedings of Network and Distributed System Security Symposium. 2017

13
Haller I, Slowinska A, Neugschwandtner M, Bos H. Dowsing for overflows: a guided fuzzer to find buffer boundary violations. In: Proceedings of the USENIX Security Symposium. 2013

14
Chen P, Chen H. Angora: efficient fuzzing by principled search. In: Proceedings of IEEE Symposium on Security and Privacy. 2018, 711– 725

15
Li Y, Chen B, Chandramohan M, Lin S W, Liu Y, Tiu A. Steelix: program-state based binary fuzzing. In: Proceedings of the Joint Meeting on Foundations of Software Engineering. 2017, 627– 637

16
Ye J , Zhang B , Li R , Feng C , Tang C . Program state sensitive parallel fuzzing for real world software. IEEE Access, 2019, 7 : 42557– 42564

17
Böhme M, Pham V T, Roychoudhury A. Coverage-based greybox fuzzing as Markov chain. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security. 2016, 1032–1043

18
Lemieux C, Sen K. FairFuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage. In: Proceedings of ACM/IEEE International Conference on Automated Software Engineering. 2018, 475– 485

19
Dave M, Agrawal R. Search based techniques and mutation analysis in automatic test case generation: a survey. In: Proceedings of IEEE International Advance Computing Conference. 2015, 795– 799

20
Harman M, Jia Y, Zhang Y. Achievements, open problems and challenges for search based software testing. In: Proceedings of IEEE International Conference on Software Testing, Verification and Validation. 2015, 1– 12

21
Fraser G, Arcuri A. EvoSuite: automatic test suite generation for object-oriented software. In: Proceedings of ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. 2011, 416– 419

22
Rowe J E. Genetic algorithm theory. In: Proceedings of Conference Companion on Genetic and Evolutionary Computation. 2007, 3585

23
Dolan-Gavitt B, Hulin P, Kirda E, Leek T, Mambretti A, Robertson W, Ulrich F, Whelan R. LAVA: large-scale automated vulnerability addition. In: Proceedings of IEEE Symposium on Security and Privacy. 2016, 110–121

24
Lattner C, Adve V. LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of IEEE International Symposium on Code Generation and Optimization. 2004, 75– 86

25
Liang J, Jiang Y, Chen Y, Wang M, Zhou C, Sun J. PAFL: extend fuzzing optimizations of single mode to industrial parallel mode. In: Proceedings of ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2018, 809– 814

26
Serebryany K, Bruening D, Potapenko A, Vyukov D. AddressSanitizer: a fast address sanity checker. In: Proceedings of USENIX Annual Technical Conference. 2012, 309– 318

27
Mozilla Security. fuzzdata: fuzzing resources for feeding various fuzzers with input. Mozilla Security, December 2017

28
Aschermann C, Schumilo S, Blazytko T, Gawlik R, Holz T. REDQUEEN: fuzzing with input-to-state correspondence. In: Proceedings of Annual Network and Distributed System Security Symposium. 2019

29
Böttinger K, Eckert C. Deepfuzz: triggering vulnerabilities deeply hidden in binaries. In: Proceedings of International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. 2016, 25– 34

30
Wang M, Liang J, Chen Y, Jiang Y, Jiao X, Liu H, Zhao X, Sun J. SAFL: increasing and accelerating testing coverage with symbolic execution and guided fuzzing. In: Proceedings of International Conference on Software Engineering: Companion Proceedings. 2018

31
Cho M, Kim S, Kwon T. Intriguer: field-level constraint solving for hybrid fuzzing. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security. 2019, 515– 530

32
Gong W, Zhang G, Zhou X. Learn to accelerate identifying new test cases in fuzzing. In: Proceeding of Security, Privacy, and Anonymity in Computation, Communication, and Storage. 2017, 298– 307

33
Wang Y, Wu Z, Wei Q, Wang Q. NeuFuzz: efficient fuzzing with deep neural network. IEEE Access, 2019, 7: 36340–36352

34
She D, Pei K, Epstein D, Yang J, Ray B, Jana S. NEUZZ: efficient fuzzing with neural program smoothing. In: Proceedings of IEEE Symposium on Security and Privacy. 2019, 803– 817

35
Wang T, Wei T, Gu G, Zou W. TaintScope: a checksum-aware directed fuzzing tool for automatic software vulnerability detection. In: Proceedings of IEEE Symposium on Security and Privacy. 2010, 497–512

36
Liu X, Wei Q, Wang Q, Zhao Z, Yin Z. CAFA: a checksum-aware fuzzing assistant tool for coverage improvement. Security and Communication Networks, 2018, 2018: 1–13
