Wisdom of The Crowd: The Ensemble
This post describes a new type of chess engine that combines the power of several other engines--hence the "wisdom of the crowd."
Leave your comments below
The idea of combining minds to defeat strong players in chess is not new. For example, Kasparov versus the World is a correspondence game that took place in 1999 between Garry Kasparov and online players who voted on their favorite move for the black pieces. At each turn, the move played for black was the candidate that received the most votes.
From Stacked Ensemble to Chess Ensemble
I've tried my hand at creating a new chess engine that follows this same principle. In fact, I got the idea not from the Kasparov vs. The World game, but from my background in machine learning. Through my work in the private sector, I've been called to develop predictive models, sometimes for targets that are difficult to predict. In those difficult cases, I use what's called stacked ensembles: Several predictive models are developed, and their predictions are combined into a final prediction. Authors have shown mathematically (e.g., van der Laan, Polley, & Hubbard, 2007) that an ensemble of models is at least as good as any of the single models making up the ensemble, provided the number of predictions (here, the number of chess moves) is large enough.
To attempt to create a new chess engine that could beat Stockfish (currently one of the best chess engines in the world), I borrowed this principle of ensembles and applied it to chess moves. In other words, I conceptualized the choice of chess moves as a prediction problem, where a series of engines are trying to guess (read: predict) the continuation of the game.
So the idea is simple: At each move, I ask a series of chess engines to choose a move, then I use a majority vote (or any other decision-making algorithm) to make a final decision on which move to make.
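As a rough illustration of that loop, here is a minimal Python sketch. Everything in it is an assumption made for the example: engines are modeled as plain functions that take a position string and return a move string, the names `collect_votes`, `plurality`, and `ensemble_move` are hypothetical, and the stub engines return hard-coded moves rather than talking to real engine binaries.

```python
from typing import Callable, Dict

# Assumption: an "engine" is any function mapping a position to a move string.
Engine = Callable[[str], str]
# A decision rule turns {engine name: move} into one final move.
DecisionRule = Callable[[Dict[str, str]], str]


def collect_votes(position: str, engines: Dict[str, Engine]) -> Dict[str, str]:
    """Ask every engine for its preferred move in the given position."""
    return {name: engine(position) for name, engine in engines.items()}


def plurality(votes: Dict[str, str]) -> str:
    """Return the most frequently proposed move.

    Ties go to whichever tied move was proposed first; this is a
    placeholder rule, not necessarily the one used by the Ensemble.
    """
    moves = list(votes.values())
    return max(moves, key=moves.count)


def ensemble_move(position: str,
                  engines: Dict[str, Engine],
                  decide: DecisionRule = plurality) -> str:
    """Collect one vote per engine, then let the decision rule pick."""
    return decide(collect_votes(position, engines))


# Stub engines with hard-coded answers, for illustration only.
stub_engines = {
    "engine_a": lambda position: "e7e5",
    "engine_b": lambda position: "c7c5",
    "engine_c": lambda position: "e7e5",
}
print(ensemble_move("startpos", stub_engines))  # e7e5
```

Keeping the decision rule as a parameter makes it easy to swap the majority vote for any other decision-making algorithm without touching the code that queries the engines.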
The Ensemble combines the moves of anonymous engines
Match Against Stockfish 10
To test out the performance of the Ensemble, I pitted it against Stockfish 10 (the latest stable release at the time of writing), which is one of the strongest engines in the world. The engines played 10 games in a row, with a 10-ply depth limit for all engines (more on this limit below). Stockfish played white in every game, and the Ensemble played black.
At each move, I asked for the input of 8 different engines: Komodo, DiscoCheck, Andscacs, DeepToga, Texel, Critter, Gull, and RodentII. The final move was decided by a simple majority vote. In case of a tie, the first move in alphabetical order was chosen.
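The voting rule just described (most votes wins, ties go to the first move in alphabetical order) can be sketched in a few lines of Python. The function name `majority_vote` is hypothetical, and the example assumes moves are compared as plain strings in coordinate notation such as "e7e5"; the post does not say which notation the alphabetical ordering was applied to.

```python
from collections import Counter
from typing import Iterable


def majority_vote(moves: Iterable[str]) -> str:
    """Return the move with the most votes; break ties alphabetically."""
    counts = Counter(moves)
    if not counts:
        raise ValueError("no votes were cast")
    top = max(counts.values())
    tied = [move for move, count in counts.items() if count == top]
    # min() on strings gives the first move in alphabetical order.
    return min(tied)


# Eight made-up votes, one per engine: "e7e5" and "c7c5" tie at three each.
votes = ["e7e5", "c7c5", "e7e5", "c7c5", "g8f6", "e7e5", "d7d5", "c7c5"]
print(majority_vote(votes))  # c7c5
```

With an even number of engines, ties like the one in this example are possible, which is why the tie-break rule matters.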
The 10-game result is shown below. Amazingly, the Ensemble scored 8.5/10 against Stockfish (8W, 1D, 1L)! The figure also shows how the individual members of the Ensemble performed against Stockfish, with much variation from one engine to the next. Nonetheless, when put together, the engines defeat Stockfish hands down--that is the wisdom of the crowd.
Variations to The Methodology
I tried some variations to the methodology and composition of the Ensemble. In case of a tie, I also tried weighting the different votes by each engine's rating rather than using the alphabetical order of candidate moves; overall, the performance of the Ensemble was rather unchanged. I also toyed with the set of engines that composed the Ensemble, and again, I observed good performance against Stockfish.
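One way to read the rating-based tie-break is sketched below: the vote count still decides, and only when several moves share the top count does the sum of the ratings of the engines backing each tied move pick the winner. This is an interpretation, not a description of the actual code; the function name `vote_with_rating_tiebreak` is hypothetical and the ratings are made-up placeholder numbers, not real engine ratings.

```python
from collections import Counter
from typing import Dict


def vote_with_rating_tiebreak(votes: Dict[str, str],
                              ratings: Dict[str, float]) -> str:
    """Majority vote; ties go to the move backed by the highest total rating.

    votes:   {engine name: proposed move}
    ratings: {engine name: rating} (placeholder values in this sketch)
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    # Sorted so that a tie in total rating falls back to alphabetical order.
    tied = sorted(move for move, count in counts.items() if count == top)
    if len(tied) == 1:
        return tied[0]

    def backing(move: str) -> float:
        return sum(ratings[name] for name, m in votes.items() if m == move)

    return max(tied, key=backing)


# Hypothetical engines and ratings, for illustration only.
votes = {"engine_a": "c7c5", "engine_b": "e7e5"}
ratings = {"engine_a": 2800, "engine_b": 3000}
print(vote_with_rating_tiebreak(votes, ratings))  # e7e5
```

In this example the alphabetical rule would have picked "c7c5", while the rating rule follows the stronger engine.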
Perhaps more interestingly, I tried increasing the max-ply depth limit to 20 plies for all engines, and even though the games took much longer to run, the Ensemble won every single game it played against Stockfish in those conditions (I ran a total of three games if I recall correctly).
Relationship With AlphaZero (And Other Neural Networks)
You might have heard of AlphaZero, the new kid on the block when it comes to chess engines. It uses neural networks to implement a type of machine learning algorithm called reinforcement learning, where the engine tries stuff by itself, and is rewarded for good decisions (i.e., wins). AlphaZero, like the other chess engines so far, is alone in its quest for victory: It chooses its own moves at all times. So AlphaZero is original in its approach to choosing moves--it uses neural networks and reinforcement learning--but it is very traditional in the sense that it is still a one-man show. Conversely, the Ensemble is a combination of several engines that decide together what to play. In fact, though I have yet to try this, one of the engines composing the Ensemble could be AlphaZero or one of its little neural-network cousins!
What's Next?
There is still much work to be done with the Ensemble, beyond evaluating its performance at greater depths. Here are some ideas; other suggestions are welcome:
Implement Chess960--maybe there will be surprises! (update Jan. 1 2020: first version now working!)
Weight the engines differently at each move (for example, by learning the best weights for each engine using machine learning)
Implement a "tag-teaming" system according to in-game performance (much like in the WWE), which would allow engines in and out of the Ensemble as the game progresses
Parallelize the Ensemble to increase in-game performance in online gameplay (update Dec. 27 2019: this is now done!)
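On the parallelization item above: the post does not say how it was done, but since each engine can think independently, one plausible approach is to query all engines at the same time and wait for the slowest one. The sketch below uses Python's standard `concurrent.futures` module; the function name `collect_votes_parallel` and the stub engines (which simply sleep to simulate thinking time) are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def collect_votes_parallel(position: str,
                           engines: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Query every engine concurrently and return {engine name: move}.

    Threads are enough if each engine is an external process: the Python
    side mostly waits for a reply (an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, position)
                   for name, engine in engines.items()}
        # result() blocks until that engine has answered.
        return {name: future.result() for name, future in futures.items()}


def make_stub_engine(move: str, think_seconds: float) -> Callable[[str], str]:
    """Build a fake engine that 'thinks' for a while, then returns a fixed move."""
    def engine(position: str) -> str:
        time.sleep(think_seconds)
        return move
    return engine


stub_engines = {
    "engine_a": make_stub_engine("e7e5", 0.05),
    "engine_b": make_stub_engine("c7c5", 0.05),
    "engine_c": make_stub_engine("e7e5", 0.05),
}
print(collect_votes_parallel("startpos", stub_engines))
```

Run this way, the time per move is roughly that of the slowest engine rather than the sum over all engines, which matters for timed online games.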
View The Ensemble In Action!
I've successfully implemented an automatic version of this chess engine on lichess, connecting to lichess via their API for Bots using Python. Fittingly, the Bot is called @TheEnsemble. If you'd like to try it out, simply send a challenge to the Bot, or contact me directly!