If you are reading this, you are probably familiar with RISC vs CISC architectures. In a nutshell, RISC (“reduced instruction set computer”) provides a smaller, simpler set of instructions, whereas CISC (“complex instruction set computer”) provides a larger, more complex set.

In the old days, when compilers were unavailable or far less capable than today’s, CISC architectures had a clear advantage: the same program could be written with fewer instructions, which made it easier to write. These days, however, that advantage has largely disappeared. In fact, modern CISC processors work much like RISC internally, decoding each complex instruction into several simpler micro-operations (micro-ops). RISC architectures, on the other hand, have also come a long way: their instruction sets have grown larger and considerably more complex. As a result, the two are no longer as different as they used to be.
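A tiny example makes the “fewer instructions” point concrete. The C function below is hypothetical (it is not part of the sort program). The assembly in its comments is what clang typically emits at -O2 on Linux, but the exact output varies with compiler version and flags. x64 can add directly to a value in memory with a single instruction. arm64 is a load/store architecture, so it has to load the value into a register, add, and store it back.

```c
/* Hypothetical example: add x to the integer that p points to.
 *
 * On x64 (CISC), clang -O2 typically compiles the body to a single
 * read-modify-write instruction that operates on memory directly:
 *     addl %esi, (%rdi)
 *
 * On arm64 (RISC), arithmetic only works on registers, so the same
 * body typically becomes three instructions:
 *     ldr w8, [x0]       load *p into a register
 *     add w8, w8, w1     add x
 *     str w8, [x0]       store the result back to *p
 */
void add_to(int *p, int x) {
    *p += x;
}
```

For instance, after `int v = 40; add_to(&v, 2);`, `v` holds 42 on both machines. Only the number of instructions needed to get there differs.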

That makes me wonder: does RISC still require more instructions than CISC to run the same program? If so, by how much?

It’s time to test this with my simple “sort” program, written in C, C++, and Rust. You can find the source code here. I have two systems where I can run perf. The first is an arm64 machine, which is a RISC system; the other is an x64 machine, which is a CISC system. So I am going to compile the three versions of the “sort” program and run each one on the same input file to compare the number of instructions executed.

# commands to run on both architectures
clang++ -O3 -std=c++11 -Wall -o sort_c sort_c.cc
clang++ -O3 -std=c++11 -Wall -o sort_cc sort_cc.cc
rustc -C opt-level=3 sort_rust.rs -o sort_rust

perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_c input.tsv output1.tsv \
 && perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_cc input.tsv output2.tsv \
 && perf stat -e instructions,branches,branch-misses,cache-references,cache-misses ./sort_rust input.tsv output3.tsv

Here are the results from x64, i.e., the CISC architecture:

Performance counter stats for './sort_c input.tsv output1.tsv':

        161413800      instructions
        33040437      branches
        1321273      branch-misses             #    4.00% of all branches
        5523003      cache-references
        1681255      cache-misses              #   30.441 % of all cache refs

    0.046643009 seconds time elapsed

    0.042402000 seconds user
    0.004240000 seconds sys



Performance counter stats for './sort_cc input.tsv output2.tsv':

        227483933      instructions
        50806676      branches
        1159631      branch-misses             #    2.28% of all branches
        5731183      cache-references
        1061021      cache-misses              #   18.513 % of all cache refs

    0.041496133 seconds time elapsed

    0.041501000 seconds user
    0.000000000 seconds sys



Performance counter stats for './sort_rust input.tsv output3.tsv':

        191565250      instructions
        34253343      branches
        484295      branch-misses             #    1.41% of all branches
        4320041      cache-references
        724238      cache-misses              #   16.765 % of all cache refs

    0.022297123 seconds time elapsed

    0.017839000 seconds user
    0.004459000 seconds sys

And below are the results from arm64, i.e., the RISC architecture:

Performance counter stats for './sort_c input.tsv output1.tsv':

        248707180      instructions
        28395504      branches
        2001363      branch-misses             #    7.05% of all branches
        78965252      cache-references
        2736389      cache-misses              #    3.465 % of all cache refs

    0.514452325 seconds time elapsed

    0.367949000 seconds user
    0.135981000 seconds sys



Performance counter stats for './sort_cc input.tsv output2.tsv':

        284428138      instructions
        37117681      branches
        2173518      branch-misses             #    5.86% of all branches
        109696054      cache-references
        2133882      cache-misses              #    1.945 % of all cache refs

    0.504255702 seconds time elapsed

    0.385078000 seconds user
    0.115126000 seconds sys



Performance counter stats for './sort_rust input.tsv output3.tsv':

        220099607      instructions
        26378144      branches
        1175539      branch-misses             #    4.46% of all branches
        80712455      cache-references
        1654143      cache-misses              #    2.049 % of all cache refs

    0.357494232 seconds time elapsed

    0.275701000 seconds user
    0.078771000 seconds sys

So, for the C implementation, arm64 executes roughly 54% more instructions than x64. For the C++ version it executes 25% more, and for the Rust version 15% more.
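The percentages above are simply (arm64 - x64) / x64 × 100 applied to the instruction counts. The small C sketch below redoes that arithmetic with the numbers copied from the perf output. The names `struct run`, `pct_more`, and `print_report` are made up for illustration.

```c
#include <stdio.h>

/* Instruction counts copied from the perf output above. */
struct run {
    const char *name;
    double x64;    /* instructions on x64 (CISC)   */
    double arm64;  /* instructions on arm64 (RISC) */
};

static const struct run runs[] = {
    {"C",    161413800.0, 248707180.0},
    {"C++",  227483933.0, 284428138.0},
    {"Rust", 191565250.0, 220099607.0},
};

/* How many percent more instructions arm64 executes than x64. */
double pct_more(double arm64, double x64) {
    return (arm64 - x64) / x64 * 100.0;
}

/* Prints one line per implementation with the relative overhead. */
void print_report(void) {
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++) {
        printf("%s: arm64 runs %.1f%% more instructions\n",
               runs[i].name, pct_more(runs[i].arm64, runs[i].x64));
    }
}
```

Calling `print_report()` prints 54.1%, 25.0%, and 14.9% for C, C++, and Rust, respectively.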

These numbers are quite interesting. My naive hunch is that, because C has been around for so long, compilers may be especially good at translating it into x64-specific instructions that take full advantage of the CISC architecture. Newer languages such as C++ and Rust, on the other hand, may tend to produce more generic instructions rather than platform-specific ones.

In any case, my conclusion is that, at least for this benchmark, RISC these days still requires more instructions than CISC, although the gap is probably narrowing.