Currently, I am experimenting with a Newton-Raphson convergent fractal plotter I wrote in C++. I came up with the initial program about 2 weeks ago, and at that time, it was very sluggish. It took about 24 seconds to calculate one full HD (1920×1080) bitmap image.
I was looking to generate a 1000-frame video as the roots move in a circle of unit radius. With my initial program, rendering of frames would’ve taken 1000×24 = 24000 seconds. 6 hours and 40 minutes. I knew I could do better.
I optimized the program in all possible ways I could think of. Some of what I did:
- Implemented a simple complex number class myself (no pun intended).
- Program writes pixel to file as soon as it is calculated instead of saving data to a vector<vector<vector<char> > > and writing all pixels after frame has been calculated.
- If this were a chemical reaction, then calculating would be the rate determining step. Previously, the program calculated each term of the derivative of polynomial and added those together, followed by the same for the functional value. Now program uses the fact that if . This gives a much needed speed boost.
- Many more small optimizations in code structure.
After rewriting major chunks of the program, I achieved more than a 10x speed boost. Now, the program could compute and write each frame in 2.2 seconds, and used much less memory (770 KB compared to 10 MB). You can find the program here: https://github.com/KeshavGuptaIndia/updated_newton_fractals
I knew I could do better. Individual frames could easily be calculated in parallel. Since I had never experimented with multi-threading in my programs, I decided this is a good opportunity to give multithreading a try. The simplest way to do this was using the STL multithread implementation – std::thread. Here are some of the results I obtained with my i5-4430 (Quad Core @ 3GHz):
Single Thread CPU Usage – ~33%
CPU Usage with 2 threads – ~60%
CPU is maxed out with anything more than 4 threads.
Since I do not have a CPU Cooler other than the stock cooler, I run into thermal throttling issues pretty soon. So in practice, the max number of threads I use is 2.
Finally, with 4 threads, I get about 0.61 seconds per image, about 40 times faster than my first program!
I have now uploaded the 1000 frame video to my YouTube channel: https://youtu.be/JhOEInkzCAY
Stay tuned! 🙂