Speed up C++: use Python..?
Disclaimer: This post is not really about how to speed up C++, but rather one of my frustrations with C++.
OK, maybe the title of this post is too bold, but this is not a joke. There are real-world cases where C++ lags behind Python by a very large margin.

Above is a benchmark plot, and we can observe that C++ is 5x slower than the others. Here, cpp_py is a C++ program that calls the Python interpreter and delegates the task to it. OK, maybe Python is just heavily optimized for this task, but that still does not justify C++ being 5x slower.
What about the number of instructions? That’s even worse. C++ requires almost 20x more instructions to run this task! Did someone mention efficient C++?

So what is this task that I tested anyway? Load the entire content of a given file into memory. That’s all. This is one of the most basic but essential functions that any practical language must support.
In Python, we can do this with
with open(file, 'rb') as f:
    buf = f.read()
Simple, short, intuitive, easy, sweet, and fast. In Rust, we do
let mut ifs = File::open(file)?;
let mut buf = Vec::new();
ifs.read_to_end(&mut buf)?;
Simple, intuitive, easy, and fast. In C++, we can do this with…
std::ifstream ifs{file};
std::vector<char> buf{std::istreambuf_iterator<char>{ifs}, {}};
Complex, long, unintuitive, difficult, bitter, can-never-remember, slow, and inefficient.
Isn’t C++ all about high performance? If C++ is slow, why would anyone bother with this complex and difficult language? I know this is specific to the iostream library, but hey, it is part of standard C++. If its standard IO library is poorly designed or implemented, it needs to be fixed.
I am not the only one complaining about iostream. This has been a well-known issue in C++ for a decade. In his talk “Writing Quick Code in C++, Quickly”, Andrei Alexandrescu jokes about how efficiency and iostream don’t go together.
What can be done then? Personally, I think a new standard IO library needs to be written for C++ from scratch. iostream is hopeless.
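Until then, a common workaround is to skip istreambuf_iterator, which pulls one character at a time and gives the vector no chance to pre-allocate, and instead ask for the file size once and do a single bulk read. Below is a minimal sketch, assuming C++17 and a regular file whose size does not change while it is read. The helper name read_file is my own for illustration and is not part of the benchmark source, and I have not measured this version against the numbers above.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the benchmark source): load a whole file
// into memory with one size query and one bulk read.
std::vector<char> read_file(const std::filesystem::path& path) {
    // Open in binary mode so no newline translation happens on any platform.
    std::ifstream ifs{path, std::ios::binary};
    if (!ifs) {
        throw std::runtime_error("cannot open " + path.string());
    }

    // file_size throws std::filesystem::filesystem_error on failure.
    const auto size = std::filesystem::file_size(path);

    // Allocate the full buffer up front, then fill it in a single call.
    std::vector<char> buf(static_cast<std::size_t>(size));
    ifs.read(buf.data(), static_cast<std::streamsize>(buf.size()));

    // gcount reports how many bytes the last read actually extracted.
    if (static_cast<std::size_t>(ifs.gcount()) != buf.size()) {
        throw std::runtime_error("short read on " + path.string());
    }
    return buf;
}
```

Calling it looks like auto buf = read_file("data.bin");. It is still more code than the Python and Rust versions, which is rather the point of this post.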
You can find the source code here if you want to run the benchmark yourself.