If you are reading this, you are probably familiar with RISC vs. CISC architectures. In a nutshell, RISC, or “reduced instruction set computer”, provides a smaller and simpler set of instructions, whereas CISC, or “complex instruction set computer”, provides a larger and more complex set of instructions.
In the old days, when compilers were unavailable or not as good as today’s, CISC architecture had a clear benefit: the same program could be written with fewer instructions, which made it easier to write by hand. These days, that benefit has largely disappeared. In fact, modern CISC processors internally work much like RISC, breaking each complex instruction down into multiple micro-operations. RISC instruction sets, for their part, have grown in size and complexity over the years. So the two are no longer as different as they used to be.
That makes me wonder: does RISC still require more instructions than CISC to run the same program? If so, by how much?
It’s time to test with my simple “sort” program written in C, C++, and Rust. You can find the source code here. I have two systems where I can run perf. The first is an arm64 machine, which is a RISC system. The other is an x64 machine, which is a CISC system. So I am going to compile the three versions of the “sort” program on each system and run them with the same input file to compare the number of instructions executed.
# commands to run on both architectures
clang++ -O3 -std=c++11 -Wall -o sort_c sort_c.cc
clang++ -O3 -std=c++11 -Wall -o sort_cc sort_cc.cc
rustc -C opt-level=3 sort_rust.rs -o sort_rust
perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_c input.tsv output1.tsv \
&& perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_cc input.tsv output2.tsv \
&& perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_rust input.tsv output3.tsv
Here is the result from x64, i.e., the CISC architecture:
Performance counter stats for './sort_c input.tsv output1.tsv':
161413800 instructions
33040437 branches
1321273 branch-misses # 4.00% of all branches
5523003 cache-references
1681255 cache-misses # 30.441 % of all cache refs
0.046643009 seconds time elapsed
0.042402000 seconds user
0.004240000 seconds sys
Performance counter stats for './sort_cc input.tsv output2.tsv':
227483933 instructions
50806676 branches
1159631 branch-misses # 2.28% of all branches
5731183 cache-references
1061021 cache-misses # 18.513 % of all cache refs
0.041496133 seconds time elapsed
0.041501000 seconds user
0.000000000 seconds sys
Performance counter stats for './sort_rust input.tsv output3.tsv':
191565250 instructions
34253343 branches
484295 branch-misses # 1.41% of all branches
4320041 cache-references
724238 cache-misses # 16.765 % of all cache refs
0.022297123 seconds time elapsed
0.017839000 seconds user
0.004459000 seconds sys
And below is the result from arm64, i.e., the RISC architecture:
Performance counter stats for './sort_c input.tsv output1.tsv':
248707180 instructions
28395504 branches
2001363 branch-misses # 7.05% of all branches
78965252 cache-references
2736389 cache-misses # 3.465 % of all cache refs
0.514452325 seconds time elapsed
0.367949000 seconds user
0.135981000 seconds sys
Performance counter stats for './sort_cc input.tsv output2.tsv':
284428138 instructions
37117681 branches
2173518 branch-misses # 5.86% of all branches
109696054 cache-references
2133882 cache-misses # 1.945 % of all cache refs
0.504255702 seconds time elapsed
0.385078000 seconds user
0.115126000 seconds sys
Performance counter stats for './sort_rust input.tsv output3.tsv':
220099607 instructions
26378144 branches
1175539 branch-misses # 4.46% of all branches
80712455 cache-references
1654143 cache-misses # 2.049 % of all cache refs
0.357494232 seconds time elapsed
0.275701000 seconds user
0.078771000 seconds sys
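If you want to pull the counters out of perf output like the above without copying them by hand, a small parser is enough. The following is only a sketch: the function name `parse_counters` and the regular expression are my own assumptions about the text layout shown here, not part of perf, and the sample text is the x64 `sort_c` run copied from above.

```python
import re

# Sample text copied from the x64 sort_c run shown above.
SAMPLE = """\
Performance counter stats for './sort_c input.tsv output1.tsv':
161413800 instructions
33040437 branches
1321273 branch-misses # 4.00% of all branches
5523003 cache-references
1681255 cache-misses # 30.441 % of all cache refs
0.046643009 seconds time elapsed
"""

# A counter line is an integer count followed by an event name.
# Commas are tolerated in case perf prints thousands separators.
# Timing lines such as "0.046643009 seconds ..." do not match,
# because the digits are followed by "." rather than whitespace.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+([A-Za-z][\w-]*)")


def parse_counters(text):
    """Return {event_name: count} for every counter line in the text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
    return counters


counters = parse_counters(SAMPLE)
print(counters["instructions"])
```

The same function can be applied to each of the six result blocks to collect the `instructions` values used in the comparison.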
So, for the C implementation, arm64 runs roughly 54% more instructions than the x64 system. For the C++ version, arm64 runs 25% more, and for the Rust version, arm64 runs 15% more instructions than x64.
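As a quick check of the arithmetic, the percentages can be recomputed directly from the `instructions` counts reported above. This is a minimal sketch; the helper name `extra_percent` and the dictionary layout are just illustrative choices.

```python
# Instruction counts copied from the perf stat results above.
X64 = {"C": 161_413_800, "C++": 227_483_933, "Rust": 191_565_250}
ARM64 = {"C": 248_707_180, "C++": 284_428_138, "Rust": 220_099_607}


def extra_percent(arm64_count, x64_count):
    """Percentage of additional instructions counted on arm64 relative to x64."""
    return (arm64_count / x64_count - 1.0) * 100.0


for name in X64:
    pct = extra_percent(ARM64[name], X64[name])
    print(f"{name}: arm64 runs {pct:.1f}% more instructions than x64")
```

Rounded to whole numbers, this gives about 54% for C, 25% for C++, and 15% for Rust.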
These numbers are quite interesting. My naive hunch is that the C language has been around for so long that compilers have become very good at translating it into x64-specific instructions that take full advantage of the CISC architecture. Newer languages such as C++ and Rust, on the other hand, may tend to compile to more generic instructions rather than platform-specific ones.
In any case, my conclusion is that RISC these days still requires more instructions than CISC, although that gap is probably narrowing.