# Tic-tac-toe: MENACE vs. Human

The goal is to develop a MENACE machine that excels at tic-tac-toe. The machine's hashtable of states is built and trained by playing "n" games against a human playing either randomly or optimally.

Based on the results of these games, the model is improved.

**MENACE**:

We create the states for the MENACE machine and seed every playable spot in each state with the same number of initial beads.

Since MENACE is the first player, on every even move we look the current state up in the HashTable and pick an empty spot using weighted random selection.

The model is trained by adding beads to (rewarding) and removing beads from (punishing) the states and moves that led to the game's result.
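The reward/punish rule can be sketched as follows. The bead deltas (+3 for a win, +1 for a draw, -1 for a loss, the classic MENACE choice) and the one-bead floor are assumptions here, as is the function name:

```python
def reinforce(hashmap, history, outcome):
    """Adjust bead counts for every (state, move) MENACE made this game.

    history: list of (state_key, move_index) pairs for MENACE's moves.
    outcome: "win", "draw", or "loss" from MENACE's point of view.
    """
    # Classic MENACE deltas (an assumption here): +3 win, +1 draw, -1 loss.
    delta = {"win": 3, "draw": 1, "loss": -1}[outcome]
    for state_key, move in history:
        beads = hashmap[state_key]
        # Keep at least one bead on a played spot so it stays reachable;
        # original MENACE allowed boxes to empty out, this floor is a choice.
        beads[move] = max(1, beads[move] + delta)
```

After a game, `reinforce` is called once with the full move history, so every state MENACE passed through is updated at the same time.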

**Human**: The human can play optimally or randomly.

As per the algorithm, if the random mode is chosen, the human selects an empty spot uniformly at random.

If the optimal mode is chosen, the human follows this sequence of steps:

a. Play for win

b. Block opponent win

c. Fork

d. Block Fork

e. Play center, if empty

f. Play the corner opposite the opponent

g. Play a corner, if empty

h. Play a side, if empty

**Algorithm**:

We create states using the three digits {0, 1, 2} to form a nine-digit number: "0" marks an empty spot, "1" a MENACE move, and "2" a human move. Each nine-digit number, once generated, is stored in a HashMap.
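This encoding can be sketched as a pair of helpers (the function names are illustrative; the board is assumed to be read row by row from the top-left):

```python
def serialize(board):
    """Encode a 3x3 board as a nine-digit string.

    board: flat list of 9 ints, 0 = empty, 1 = MENACE, 2 = human.
    """
    return "".join(str(cell) for cell in board)


def deserialize(key):
    """Inverse of serialize: nine-digit string back to a flat board."""
    return [int(ch) for ch in key]
```

For example, MENACE in the top-left corner and the human in the bottom-right serializes to `"100000002"`, the key used in the HashMap example below.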

The structure of HashMap is as follows:

```
{
  "100000002": [0, 9, 9, 9, 9, 9, 9, 9, 0]
}
```

The key in the HashMap is the serialized state of the tic-tac-toe game, and the value is an array with one entry per spot, holding the number of beads (alpha) on each empty spot and zero on each occupied spot.
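A minimal sketch of building such a value, assuming one entry per board cell and a tunable starting bead count `alpha` (the name is illustrative):

```python
def initial_beads(key, alpha=9):
    """Build the HashMap value for a serialized state: `alpha` beads on
    every empty spot ("0" digit), zero beads on occupied spots."""
    return [alpha if ch == "0" else 0 for ch in key]
```

Whether alpha is uniform or decreases with the turn number (as in some MENACE variants) is a tuning choice; a constant is used here for simplicity.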

The states are validated before being stored in the HashMap, based on the following conditions:

a. Since MENACE plays first, the move counts must be consistent: the number of 1s in the state must equal the number of 2s, or exceed it by exactly one.

b. We only consider states with fewer than 8 total moves, as those are the crucial ones. After the 7th move the chances of winning are very low, while the chances of drawing are much higher.

c. The state must not already be a won (terminal) state.

d. The state's mirror images and rotations must not already be present in the HashMap.

After applying all of these validation/filtering conditions, the total number of states obtained is 304 (with MENACE playing first).

Later, during a game, we look up the serialized state in the HashMap; if it is not present, we look for the mirror images and rotations of that state instead.

Once a mirror image is found in the HashMap, we rotate or reflect the tic-tac-toe board so that its serialized state matches the one stored in the HashMap.
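One way to enumerate the eight rotations/reflections of a serialized state is shown below. Each symmetry is paired with an index map so that a move chosen in the canonical state can be translated back onto the real board. This is a sketch; the document does not specify the exact transform set or representation used:

```python
def symmetries(key):
    """All eight rotations/reflections of a serialized state.

    Returns (transformed_key, mapping) pairs, where mapping[i] is the
    index on the ORIGINAL board corresponding to spot i of the
    transformed key.
    """
    # Index maps for a 90-degree clockwise rotation and a horizontal
    # mirror of a 3x3 grid flattened row by row: new[i] = old[map[i]].
    rot = [6, 3, 0, 7, 4, 1, 8, 5, 2]
    mir = [2, 1, 0, 5, 4, 3, 8, 7, 6]

    def apply(k, m):
        return "".join(k[i] for i in m)

    results = []
    k, mapping = key, list(range(9))
    for _ in range(4):
        results.append((k, mapping))
        results.append((apply(k, mir), [mapping[i] for i in mir]))
        k, mapping = apply(k, rot), [mapping[i] for i in rot]
    return results
```

Note that the center spot (index 4) is a fixed point of every transform, which is the invariant about the central position noted later in this document.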

To pick a spot to play, we perform the same lookup: first the state itself, then its mirror images. If the turn number is less than 8, the state is guaranteed to be present in the HashMap; on later turns a uniformly random choice takes priority instead.

Once the state is found in the HashMap, we take its value array and pick a random index weighted by the bead counts; a weighted random index algorithm is used for this. If the state is not found in the HashMap (turn number greater than 7), we pick a random empty index, play that position on the board, and move on.
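The weighted random index pick can be sketched like this (the function name is illustrative; Python's `random.choices` with `weights=` would also work, but the explicit loop shows the mechanism):

```python
import random


def weighted_random_index(beads, rng=random):
    """Pick an index with probability proportional to its bead count.

    Occupied spots carry zero beads, so they can never be chosen.
    Assumes sum(beads) > 0.
    """
    total = sum(beads)
    threshold = rng.randrange(total)  # uniform integer in [0, total)
    running = 0
    for i, count in enumerate(beads):
        running += count
        if threshold < running:
            return i
```

This mirrors drawing a single bead from a MENACE matchbox: a spot with 9 beads is nine times as likely to be drawn as a spot with 1.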

On the human's turn, the algorithm has two modes to choose from: the human can play a defined optimal strategy, or play randomly. The mode is selected by a weighted random draw, with weights derived from the configured probability. For example, if the probability of the human playing optimally is 0.9, the optimal mode gets weight 9 and the random mode gets weight 1, for a total weight of 10; other probabilities are scaled the same way.
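The weighted draw described above is equivalent to comparing a single uniform random number against the probability. A minimal sketch (function name is illustrative):

```python
import random


def choose_strategy(p_optimal, rng=random):
    """Return "optimal" with probability p_optimal, else "random".

    p_optimal = 0.9 corresponds to the 9-vs-1 weighting in the text.
    """
    return "optimal" if rng.random() < p_optimal else "random"
```

Using the probability directly avoids having to renormalize integer weights when `p_optimal` changes.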

If the optimal mode is selected, the human follows these defined steps:

1. Check if the human can win with this move. If yes, play that move.

2. Check if MENACE is going to win on its next move. If yes, block that move.

3. Check if the human can play a fork against MENACE. If yes, play the fork move.

4. Check if MENACE can play a fork. If yes, block the fork.

5. Check if the center of the board is empty. If yes, play the center.

6. Check if the corner opposite one of MENACE's moves is empty. If yes, play that opposite corner.

7. Check if any other corner is empty. If yes, play a corner.

8. Check if any side is empty. If yes, play that side.

If the random mode is selected, find all the empty positions on the board, select one uniformly at random, and play that move.
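The eight optimal-play steps can be sketched as below. All names are illustrative; the fork-blocking step here simply occupies the fork square, which is a simplification (a full implementation sometimes needs to create a counter-threat instead):

```python
# The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winning_move(board, player):
    """Return a spot completing a line for `player`, else None."""
    for line in WIN_LINES:
        vals = [board[i] for i in line]
        if vals.count(player) == 2 and vals.count(0) == 1:
            return line[vals.index(0)]
    return None


def fork_moves(board, player):
    """Spots giving `player` two simultaneous winning threats."""
    forks = []
    for spot in [i for i, v in enumerate(board) if v == 0]:
        board[spot] = player  # try the move
        threats = sum(
            1 for line in WIN_LINES
            if [board[i] for i in line].count(player) == 2
            and [board[i] for i in line].count(0) == 1)
        board[spot] = 0       # undo it
        if threats >= 2:
            forks.append(spot)
    return forks


def optimal_move(board, human=2, menace=1):
    """Follow steps 1-8; opposite-corner pairs are (0,8) and (2,6)."""
    move = winning_move(board, human)             # 1. win
    if move is None:
        move = winning_move(board, menace)        # 2. block win
    if move is None:
        forks = fork_moves(board, human)          # 3. play fork
        move = forks[0] if forks else None
    if move is None:
        forks = fork_moves(board, menace)         # 4. block fork (simplified)
        move = forks[0] if forks else None
    if move is None and board[4] == 0:
        move = 4                                  # 5. center
    if move is None:                              # 6. opposite corner
        for a, b in ((0, 8), (8, 0), (2, 6), (6, 2)):
            if board[a] == menace and board[b] == 0:
                move = b
                break
    if move is None:                              # 7. any corner
        move = next((i for i in (0, 2, 6, 8) if board[i] == 0), None)
    if move is None:                              # 8. any side
        move = next((i for i in (1, 3, 5, 7) if board[i] == 0), None)
    return move
```

Each step only fires when all earlier steps found nothing, matching the priority ordering of the list above.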

**Invariants**:

The following are the invariants in the tic-tac-toe game:

1. The game maintains a class invariant: since the two players alternate, the counts of Xs and Os in every reachable state are related.

E.g., with X playing first, the number of O's never exceeds the number of X's, and the number of X's exceeds the number of O's by at most one.

2. There are exactly 8 winning conditions:

a. Three in a row — 3 lines

b. Three in a column — 3 lines

c. Three on a diagonal — 2 lines

3. The two players move alternately.

4. The maximum number of moves is 9. (MENACE, playing first, makes at most 5 of them.)

5. Under mirror images and rotations of the board:

The center spot is never affected, since it maps to itself under every rotation and reflection.

a. Middle-row (side) spots can never map to a corner.

b. Corner spots are the most affected by rotation.

6. MENACE always plays the first move.

7. MENACE's states are modified according to the path chosen during the game.
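The count and win-line invariants above can be checked mechanically. A sketch (the function name is illustrative):

```python
def check_invariants(board):
    """Assert the count and win-line invariants on a board state.

    board: flat list of 9 cells, 1 = MENACE (plays first), 2 = human.
    """
    wins = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # 3 rows
            (0, 3, 6), (1, 4, 7), (2, 5, 8),   # 3 columns
            (0, 4, 8), (2, 4, 6)]              # 2 diagonals -> 8 lines
    ones, twos = board.count(1), board.count(2)
    # Alternation: MENACE is never behind, and never more than one ahead.
    assert twos <= ones <= twos + 1
    assert ones + twos <= 9                    # at most 9 moves in total
    # In legal play, at most one player can hold a completed line.
    winners = {board[a] for a, b, c in wins
               if board[a] == board[b] == board[c] != 0}
    assert len(winners) <= 1
    return True
```

Running this after every move during training is a cheap way to catch bookkeeping bugs in the state updates.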

**Conclusions**:

Based on the graphs, when MENACE plays against a flawlessly playing human, MENACE loses at first, but after around 100 games it learns to block moves and force a "draw" or a "win".

Against the human's random strategy, MENACE shows a positive trend after a certain number of games: it learns to block moves and tries to win by taking spots that create a double threat. Against the human's optimal strategy, wins become rarer, but eventually (after ~95 games) MENACE learns to block such double-threat states.