Binary Synthesis from RISC-V Executables

Shoki HAMANA Nagisa ISHIURA
School of Science and Technology, Kwansei Gakuin University
2-1 Gakuen, Sanda, Hyogo, 669–1337, Japan

Abstract—This paper presents an implementation of a binary synthesizer which converts a given executable binary code of RISC-V into hardware functionally equivalent to a RISC-V core executing the code. A CPU core and an instruction memory are replaced by the synthesized hardware, which reduces execution time and hardware size for small scale programs. A given binary code is disassembled and parsed to build a control dataflow graph (CDFG), then traditional high-level synthesis techniques are applied to generate RT level Verilog HDL. For a small example program consisting of 34 through 160 instructions, synthesized hardware on Xilinx FPGA Artix-7 took about 74.5% less cycles than on RISC-V Rocket core, with smaller number of LUTs.

I. INTRODUCTION

RISC-V [1] is an open-source instruction set architecture. Publicly available RTL source codes and tool chains including compilers, simulators, debuggers, are making it easy to develop hardware and software for embedded systems.

Though the most popular usage of RISC-V cores may be as host processors for high-performance custom hardware engines, they may also be used as main processors for low cost IoT devices. Since RISC-V is originally not intended for high performance processors, instruction extensions or hardware support become options, when a RISC-V core needs more efficiency in computation speed or in power consumption. If the bottleneck is reduced to some core operations, customized instructions will be effective. Otherwise, migration of the loads from software to hardware should be considered.

High-level synthesis [2] or binary synthesis [3, 4, 5] is one of the most prevailing techniques for this purpose, where a program code written in high-level languages or an executable binary code are automatically compiled into a register transfer level hardware model.

Although the performance of the hardware generated by binary synthesis is generally lower than those by high-level synthesis, due to lack of high-level information in the input programs, binary synthesis can handle wider range of software codes. The input programs may contain inline assembly or interrupt handlers in hand written assembly.

This paper presents binary synthesizer that translates a RISC-V binary code into logic synthesizable Verilog HDL description. A whole binary code is synthesized into a hardware module functionally equivalent to the RISC-V core executing the code. It aims at replacing the CPU core and the instruction memory by the hardware and reducing the execution cycles and hardware cost for small scale programs.

II. BINARY SYNTHESIS

Binary synthesis differs from high-level synthesis in the front-end part; a CDFG (control dataflow graph) is constructed from a binary code instead of a program written in a high-level language. Almost all of the back-end techniques are common with high-level synthesis.

The merit of binary synthesis is that it has less restrictions on input behavior specifications. For example, C programs with pointers or complicated control structures can be synthesized into hardware. Binary synthesis can handle programs in handwritten assembly programs or those containing inline assembly codes intended to communicate with hardware. It can even deal with programs containing interrupt handlers [6].

Mittal et al. [3] developed a binary synthesizer to translate DSP binaries into FPGA hardware. It accepted programs in C/C++, Matlab, and Simulink as well as hand-written assembly.

The binary synthesizer developed by Stitt et al. [4] synthesized selected sections of binary codes of MIPS, ARM, and MicroBlaze into coprocessors, or hardware accelerators. ACAP [5] converted MIPS binary codes into hardware. It allows either generation of coprocessors from code sections or compilation of a whole binary code into a hardware module that replaces the CPU and the instruction memory.

However, to the best of our knowledge, there is no binary synthesizer that takes RISC-V binaries.

III. BINARY SYNTHESIS FROM RISC-V EXECUTABLES

The input to our binary synthesizer is a binary executable code for RISC-V. As for the instruction set, we assume RV32IM, base integer instruction set with standard extension for integer multiplication and division. Then, as shown in Fig. 1, a hardware module which is compatible with the CPU executing the binary program is synthesized. The elimination of the instruction memory might be useful in protecting the software from analysis.

Fig. 2 shows the flow of synthesis in our method. Input programs may be written in C language or in assembler language. Inline assembly is also allowed. They are compiled (by gcc) or assembled (by gas) and then linked to a executable binary code (by ld). The front-end of our binary synthesizer scans the disassembled code to generate a CDFG (control dataflow graph), a popular data structure for high-level synthesis. The back-end of the synthesizer takes the CDFG to perform a standard sequence of high-level synthesis tasks to generate a hardware model in Verilog HDL.

Fig. 3 (a) shows how DFG is generated from a RISC-V in-
Table I summarizes the result of an experiment where small C programs were synthesized targeting an FPGA. "#insn" indicates the instruction counts in the programs. "#cycle", "#LUT", "delay" show the execution cycle counts, the LUT counts, and the critical path delays, respectively. The target FPGA was Xilinx Artix-7 (XC7Z020-1CLG400I). "A,M" indicates the numbers of ALUs and multipliers allocated for synthesis. The RISC-V core used in the experiment is generated by the Rocket Chip Generator\(^1\) with the DefaultRV32Config mode. The hardware generated by ACAP-R took 74.5% less cycles on average. The LUT counts of the hardware are supposed to grow in proportional to the instruction counts, but were smaller than the core when the programs were of less than 160 instructions.

### Table I

<table>
<thead>
<tr>
<th>Program</th>
<th>RISC-V #cycle</th>
<th>LUT delay (ns)</th>
<th>ACAP-R #cycle</th>
<th>LUT delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>fibonacci</td>
<td>1,483</td>
<td>14.7</td>
<td>3.0</td>
<td>1,533</td>
</tr>
<tr>
<td>arith-test</td>
<td>642</td>
<td>17.2</td>
<td>2.1</td>
<td>2,665</td>
</tr>
<tr>
<td>FSM</td>
<td>777</td>
<td>11.6</td>
<td>2.2</td>
<td>2,362</td>
</tr>
</tbody>
</table>


IV. PRELIMINARY EXPERIMENT

A binary synthesizer “ACAP-R” has been implemented based on the method described in the previous section. The front-end for RISC-V has been newly developed and the back-end of ACAP is used as it is.

Table I summarizes the result of an experiment where small C programs were synthesized targeting an FPGA. "#insn" indicates the instruction counts in the programs. "#cycle", "#LUT", "delay" show the execution cycle counts, the LUT counts, and the critical path delays, respectively. The target FPGA was Xilinx Artix-7 (XC7Z020-1CLG400I). “A,M” indicates the numbers of ALUs and multipliers allocated for synthesis. The RISC-V core used in the experiment is generated by the Rocket Chip Generator\(^1\) with the DefaultRV32Config mode. The hardware generated by ACAP-R took 74.5% less cycles on average. The LUT counts of the hardware are supposed to grow in proportional to the instruction counts, but were smaller than the core when the programs were of less than 160 instructions.

V. CONCLUSION

A binary synthesizer ACAP-R which generates hardware from a RISC-V binary code has been presented.

The original version of ACAP has another synthesis mode that compiles specified code segments into coprocessors which is tightly coupled with the CPU core. Our future work includes extending ACAP-R to operate in this mode.

Acknowledgments—Authors would like to express their appreciation to Dr. H. Kanbara (ASTEM/R1), Prof. H. Tomiyama (Ritsumeikan Univ.), and Mr. T. Nakatani (formerly with Ritsumeikan Univ.) for their valuable comments. We would also like to thank to the members of Ishiura Lab. of Kwansei Gakuin Univ. for their cooperation.

REFERENCES