Minesweeper Bot

Minesweeper is a game played on a grid, behind which hide a certain number of “mines”. The objective is to click on all squares except the mines. If a square does not contain a mine, clicking on it reveals the number of mines in the eight adjacent squares, but clicking on a mine results in an immediate loss.

Detailed rules and strategies can be found right here: minesweeper.info

My bot uses three basic techniques to play Minesweeper:

(code can be found here: GitHub)

  1. If the number on a square equals the number of unrevealed squares around it, all of them must be mines.
  2. If a square already has as many flagged mines around it as its number, the remaining unrevealed squares around it must be safe.
  3. If the above two rules fail to produce any “safe squares”, the bot resorts to checking every possible combination of mines in the remaining independent boundary segments (the so-called Tank Algorithm).

Failing all of that, it clicks on a random square.
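To make rules 1 and 2 concrete, here is a minimal sketch of how they can be checked on the bot's view of the board. The cell encoding and function names are my own illustration, not the actual code from the repository:

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding of the bot's view of the board:
// -2 = unrevealed, -1 = flagged as a mine, 0..8 = revealed number.
const int HIDDEN = -2, FLAGGED = -1;

// Count the neighbours of (r, c) that are in a given state.
int countAround(const std::vector<std::vector<int>>& board, int r, int c, int state) {
    int n = 0;
    for (int dr = -1; dr <= 1; ++dr)
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;
            int nr = r + dr, nc = c + dc;
            if (nr < 0 || nc < 0 || nr >= (int)board.size() || nc >= (int)board[0].size()) continue;
            if (board[nr][nc] == state) ++n;
        }
    return n;
}

// Rule 1: the number equals the count of unrevealed-or-flagged neighbours,
// so every unrevealed neighbour must be a mine.
bool allHiddenAreMines(const std::vector<std::vector<int>>& board, int r, int c) {
    return board[r][c] > 0 &&
           board[r][c] == countAround(board, r, c, HIDDEN) + countAround(board, r, c, FLAGGED);
}

// Rule 2: the number is already satisfied by flagged mines,
// so every remaining unrevealed neighbour must be safe.
bool allHiddenAreSafe(const std::vector<std::vector<int>>& board, int r, int c) {
    return board[r][c] >= 0 && board[r][c] == countAround(board, r, c, FLAGGED);
}
```

Scanning the whole board with these two checks, flagging and revealing as you go, is enough to close out most easy positions; the Tank Algorithm only takes over when neither check fires anywhere.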

The first two rules are usually enough to win most games at the beginner and intermediate levels, whereas the third rule is what helps at the expert level.

Finally, here’s a video of my bot playing at the expert level:

GPU Computation

First of all, sorry for the long delay in posting.

As I mentioned in Multi-threading for improved render times!, I had an i5-4430 with its stock cooler. I have since upgraded my rig to a more powerful processor, an i7-7700K (with liquid cooling), and also installed a GTX 1080.

The calculation for a single pixel is a very simple task, and the outcome for one pixel cannot affect another pixel’s calculation in any way. This makes the problem ideal for a system with a very high number of cores, such as a GPU with its hundreds of processing cores. So I wanted to shift the computation to my GPU and see how the CPU fares against it.
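The per-pixel computation that makes this so parallelizable is the classic escape-time iteration. Here is a sketch of it in plain C++ (the function name is mine, not from the repository); in the CUDA version, each pixel's call to something like this simply becomes its own GPU thread:

```cpp
#include <cassert>
#include <complex>

// Escape-time iteration for one Mandelbrot pixel. The result depends only
// on this pixel's own coordinate c, so every pixel can be computed
// independently -- one GPU thread per pixel.
int escapeTime(std::complex<double> c, int maxIter) {
    std::complex<double> z = 0;
    for (int i = 0; i < maxIter; ++i) {
        if (std::norm(z) > 4.0) return i;  // |z| > 2 means the orbit escapes
        z = z * z + c;
    }
    return maxIter;  // never escaped: treat as inside the set
}
```

The iteration count is then mapped to a colour. Because no pixel reads another pixel's result, there is no synchronization at all between threads.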


I want to keep this short, so here is the code: GitHub – mandelbrot_gpu (uses CUDA 8.0). Note that I enabled maximum optimization during compilation using -O3.


I used two zoom sequences of 1000 frames each. One of them spans a large area with mostly empty space, so it is easier to calculate. The other requires heavy computation, since it stays mostly near the circumference of the set.

Platform   Seq 1 average per frame   Seq 2 average per frame
CPU        486 ms                    2158 ms
GPU        15 ms                     43 ms


This is the finished video (with a lot of compression artifacts):


The GPU blew the CPU out of the water. So why do we still have CPUs? Why isn’t all computation done on GPUs? Because most tasks cannot be parallelized to this extent. Most tasks require an elaborate instruction set and sequential execution of code, which is best done by a few high-powered cores, as in a CPU. Tasks that can be parallelized, though (such as many ML workloads), are faster and more efficient when run on GPUs.

CHIP-8 Emulator – inspired by MITx Computation Structures (6.004)!

I am currently enrolled in MITx’s Computation Structures (6.004) on edX. I have learnt about registers, stacks, procedures, compilers, opcodes, etc.

To put my new knowledge to work, I decided to program an emulator. I read about many systems and decided that CHIP-8 would be perfect as my first emulation project. It has just 36 instructions, which cover all the essential features – math, branching, graphics, sound and input – and the instructions themselves are easy to implement.

CHIP-8 is not actually a computer system. It is a really low-level programming language – it deals directly with registers, stack pointers, sprite data and RAM – that ran on 1970s computers such as the COSMAC VIP and Telmac 1800. When handheld programmable calculators became powerful enough in the 1990s, CHIP-8 offered an easy way to quickly program games for them.

So far, I have completed the core work – the emulator can load programs, execute them and take input. There is no sound yet; I am working on it. The emulator uses SDL 2.0 for I/O.
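The heart of any CHIP-8 emulator is a fetch–decode–execute loop over 2-byte big-endian opcodes. Here is a minimal sketch of that loop with three real CHIP-8 instructions (6XNN, 7XNN and 1NNN) wired up; the struct layout is my own simplification, not the actual code of my emulator:

```cpp
#include <cassert>
#include <cstdint>

// Minimal CHIP-8 core: 4 KB of RAM, 16 registers, a program counter.
struct Chip8 {
    uint8_t  memory[4096] = {};
    uint8_t  V[16] = {};   // registers V0..VF
    uint16_t pc = 0x200;   // CHIP-8 programs are loaded at address 0x200

    // Fetch one 2-byte big-endian opcode and execute it.
    void step() {
        uint16_t op = (memory[pc] << 8) | memory[pc + 1];
        pc += 2;
        switch (op & 0xF000) {
            case 0x6000:                         // 6XNN: VX = NN
                V[(op >> 8) & 0xF] = op & 0xFF;
                break;
            case 0x7000:                         // 7XNN: VX += NN (no carry flag)
                V[(op >> 8) & 0xF] += op & 0xFF;
                break;
            case 0x1000:                         // 1NNN: jump to address NNN
                pc = op & 0x0FFF;
                break;
            // ... the remaining ~33 instructions decode the same way
        }
    }
};
```

Every one of the 36 instructions slots into this switch; graphics, sound and input just hook the relevant opcodes up to SDL.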


Minor bugs remain in the code.

Screenshot of CONNECT4

Stay tuned for updates! 😀

Thanks to Thomas P. Greene; I used his page as a reference.

Multi-threading for improved render times!

Currently, I am experimenting with a Newton–Raphson convergent fractal plotter I wrote in C++. I came up with the initial program about two weeks ago, and at that time it was very sluggish: it took about 24 seconds to calculate one full-HD (1920×1080) bitmap image.

I was looking to generate a 1000-frame video as the roots move around a circle of unit radius. With my initial program, rendering the frames would’ve taken 1000×24 = 24000 seconds – 6 hours and 40 minutes. I knew I could do better.

I optimized the program in all possible ways I could think of. Some of what I did:

  • Implemented a simple complex number class myself (no pun intended).
  • The program writes each pixel to the file as soon as it is calculated, instead of saving the data to a vector<vector<vector<char> > > and writing all pixels after the frame has been calculated.
  • If this were a chemical reaction, then calculating \frac{f'(z)}{f(z)} would be the rate-determining step. Previously, the program calculated each term of the derivative of the polynomial and added those together, followed by the same for the functional value. Now the program uses the fact that \frac{f'(z)}{f(z)} = \frac{\alpha}{z-a} + \frac{\beta}{z-b}... if f(z) = (z-a)^\alpha (z-b)^\beta.... This gives a much-needed speed boost.
  • Many more small optimizations in code structure.
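The logarithmic-derivative trick above folds Newton's update z - \frac{f(z)}{f'(z)} into a single pass over the roots. Here is a sketch of that step for simple roots (\alpha = \beta = 1); the function name is mine, not from the repository:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// One Newton-Raphson step for f(z) = prod_k (z - roots[k]), using the
// identity f'(z)/f(z) = sum_k 1/(z - roots[k]) so that neither f nor f'
// is ever evaluated directly.
cplx newtonStep(cplx z, const std::vector<cplx>& roots) {
    cplx s = 0;
    for (const cplx& r : roots) s += 1.0 / (z - r);  // logarithmic derivative
    return z - 1.0 / s;                              // z - f(z)/f'(z)
}
```

For each pixel, this step is iterated until z lands near one of the roots; the pixel is then coloured by which root it converged to (and how fast).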

After rewriting major chunks of the program, I achieved a more than 10x speed boost. The program could now compute and write each frame in 2.2 seconds, while using much less memory (770 KB compared to 10 MB). You can find the program here: https://github.com/KeshavGuptaIndia/updated_newton_fractals

I knew I could do better. Individual frames could easily be calculated in parallel. Since I had never experimented with multi-threading in my programs, I decided this was a good opportunity to give it a try. The simplest way to do this was the STL thread implementation – std::thread. Here are some of the results I obtained with my i5-4430 (Quad Core @ 3GHz):
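Since frames are fully independent, the std::thread version can just interleave them across workers. Here is a minimal sketch of that scheme; renderAll and the per-frame callback are illustrative names, not the actual code from the repository:

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Render frames 0..total-1 on `nthreads` threads: thread t handles frames
// t, t + nthreads, t + 2*nthreads, ... so the work stays evenly interleaved.
// renderOne stands in for the actual per-frame fractal computation; frames
// share no state, so no locking is needed.
template <typename F>
void renderAll(int total, int nthreads, F renderOne) {
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([=] {
            for (int frame = t; frame < total; frame += nthreads)
                renderOne(frame);
        });
    for (auto& th : pool) th.join();
}
```

Each thread writes only its own frames' output files, which is why the speedup scales almost linearly until the cores (or the cooler) run out.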



Single Thread CPU Usage – ~33%

CPU Usage with 2 threads – ~60%

CPU is maxed out with anything more than 4 threads.


Since I do not have a CPU cooler other than the stock one, I run into thermal throttling pretty quickly. So in practice, the maximum number of threads I use is 2.

Finally, with 4 threads, I get about 0.61 seconds per frame – about 40 times faster than my first program!

I have now uploaded the 1000 frame video to my YouTube channel: https://youtu.be/JhOEInkzCAY

Stay tuned! 🙂