GPT-2 is a text-generating AI system with an impressive ability to produce human-like text from minimal prompts. The model generates synthetic text samples that continue an arbitrary text input. It is chameleon-like: it adapts to the style and content of the conditioning text. It has shown success in plenty of applications:
- Text generation. GPT-2 is the almighty king of text generation.
- Chatbots. Even the unmodified model understands that the text is a dialog, like "Human: ... Bot: ...", and answers after "Bot:".
- Machine translation. The GPT-2 model can learn translations in the format "english sentence = french sentence".
- Summarization. The model understands the "TL;DR:" tag at the end of the text as a signal to write a summary.
- Question answering. GPT-2 answers questions reasonably well out of the box, but for accurate results it should be fine-tuned on a QA dataset such as SQuAD.
- Generating poetry. GPT-2 models work well for poetry. The quality of the results is limited by sometimes only having access to smaller models and difficulty in running larger models at all.
- Music generation. "Music modeling" is just like language modeling: let the model learn music in an unsupervised way, then have it sample outputs. OpenAI's MuseNet is an impressive composer of this kind.
- Image generation. Just as a large transformer model trained in language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples.
- GPT-2 models can also be used to spread fake news, fake reviews, and so on. For now only people's ethics prevent it from happening, and who knows for how long.
Although more complex NLP models have since been released, such as T5 from Google and GPT-3 from OpenAI, GPT-2 remains one of the most capable models that can still run on average hardware.
Some ML researchers tried to evaluate GPT-2 in a very unusual application.
To confirm that GPT-2 is a general pattern-recognition program, ML researcher Shawn Presser (@theshawwn) trained GPT-2 to play chess using solely PGN files. Here you can find the progress. The model has shown an ability to recognize known chess patterns.
We liked the idea of evaluating GPT-2 not only on natural language generation, but also on other applications such as chess. We thought that training the model on the current board state is more promising than training on the PGN move sequence. In our view, when the current board state is used, the game history is not really needed to predict the next move.
With PGN files, by contrast, the whole history is important. Inspired by Shawn's "Cryochess – GPT-2 1.5B chess engine" trained on PGN notation, and by "Programming a Chess Player" by Professor Blank, we wrote the following code to train GPT-2 to play chess from the current board state in FEN.
Here are some useful resources if you are new to Chess:
You should have an ML-ready system with a good GPU, CUDA 10.1, TensorFlow 2, and PyTorch installed.
Install the python-chess and aitextgen modules:
    !pip install python-chess
    !pip install aitextgen
    !pip install tqdm
1.2. PGN files download
    import os

    if not os.path.exists("pgn"):
        os.mkdir("pgn")
Download PGN game files to the pgn folder. Some resources with PGN files are:
You can also use SCID to convert SCID databases (*.sg4) to PGN format.
We've used PGN archives with 100,000 games inside for this training. Please note that importing numerous games (like a million) requires a lot of RAM to start training.
2. Training data generation
Our model uses the training file with the current board state and next move on each line, in the following format:
[Result] FEN-position-and-side-only - next_move
[1-0] r1bq3k/ppp2rpp/5b2/3n4/3P4/P4p2/BP1B1PPP/R2QR1K1 w - a2d5
[1-0] 2bQ4/p4kb1/6n1/q1p1p3/1rn1P3/N3BP2/1PP5/2KR2R1 w - a3c4
[0-1] 1r3rk1/p4nbp/1qppb1p1/4p3/PP2P3/4NN1P/2QB1PP1/2R1R1K1 b - f8c8
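To make the line layout concrete, here is a minimal sketch that splits one training line back into its fields. It uses only the standard library. The `TrainingLine` tuple and the `parse_training_line` helper are illustrative names and are not part of the original notebook.

```python
from typing import NamedTuple


class TrainingLine(NamedTuple):
    result: str     # "1-0" or "0-1"
    placement: str  # FEN piece-placement field
    side: str       # side to move: "w" or "b"
    move: str       # next move in UCI notation


def parse_training_line(line: str) -> TrainingLine:
    """Split '[Result] placement side - move' into its four fields."""
    # The move is whatever follows the last " - " separator.
    head, move = line.strip().rsplit(" - ", 1)
    tag, placement, side = head.split(" ")
    return TrainingLine(tag.strip("[]"), placement, side, move)


example = "[1-0] r1bq3k/ppp2rpp/5b2/3n4/3P4/P4p2/BP1B1PPP/R2QR1K1 w - a2d5"
parsed = parse_training_line(example)
print(parsed.result, parsed.side, parsed.move)  # prints: 1-0 w a2d5
```

Note that only the first two FEN fields are kept, so castling rights, en passant squares, and move counters are not visible to the model.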
Here we used only games that ended in a win, skipping draws, and kept only the moves made by the winning side. The function that generates the training text is:
    import glob
    import os

    import chess.pgn
    from tqdm.auto import tqdm

    MAX_IMPORT = 100000


    def importPgn(filename, s, max_import):
        counter = 0
        total = 0
        # count the games first so the progress bar has a total
        with open(filename) as f:
            for line in f:
                if "[Result" in line:
                    total += 1
        if total > max_import:
            total = max_import
        pbar = tqdm(total=total, desc="read " + filename, unit=" games", mininterval=1)
        with open(filename) as pgn:
            while counter < max_import:
                game = chess.pgn.read_game(pgn)
                if not game:
                    break
                board = game.board()
                moves = game.mainline_moves()
                count = sum(1 for _ in moves)
                # skip unfinished games
                if count <= 5:
                    continue
                result = game.headers["Result"]
                # import only resultative games
                if result != "1-0" and result != "0-1":
                    continue
                for move in moves:
                    # record a position only when the eventual winner is to move
                    if board.turn == chess.WHITE and result == "1-0":
                        line = (
                            "[1-0] "
                            + " ".join(board.fen().split(" ", 2)[:2])
                            + " - "
                            + move.uci()
                        ).strip()
                        s.add(line)
                    elif board.turn == chess.BLACK and result == "0-1":
                        line = (
                            "[0-1] "
                            + " ".join(board.fen().split(" ", 2)[:2])
                            + " - "
                            + move.uci()
                        ).strip()
                        s.add(line)
                    board.push(move)
                counter += 1
                pbar.update(1)
        pbar.close()
        return counter


    def convert():
        games = 0
        moves = 0
        max_import = MAX_IMPORT
        s = set()
        # load previous state
        if os.path.exists("fen.txt"):
            with open("fen.txt") as f:
                for line in tqdm(f, desc="read fen.txt", unit=" moves", mininterval=1):
                    # strip the newline, otherwise it is doubled when the file is rewritten
                    line = line.strip()
                    if line:
                        s.add(line)
                        max_import -= 1
                        if max_import <= 0:
                            break
        for file in glob.glob("pgn/*.pgn"):
            count = importPgn(file, s, max_import)
            games += count
            max_import -= count
            if max_import <= 0:
                break
        with open("fen.txt", "w") as f:
            for line in tqdm(s, desc="write fen.txt", unit=" moves", mininterval=1):
                f.write(line + "\n")
                moves += 1
        print("imported " + str(games) + " games, " + str(moves) + " moves")


    convert()
It takes about 15 minutes to import 100K games.
Since the model only needs to hold chess moves in its memory, we train a small GPT-2 model from scratch, as described in the aitextgen docs. The small model was selected because it can be trained on average hardware in a shorter time than larger models. A larger model probably has its own benefits, but it is overly complex for this demonstration.
You may run the training function multiple times to continue training until the loss is acceptable, as model checkpoints are saved periodically. I stopped at a loss value near 0.8 to save time, but even at that level the model can predict moves.
Tune batch_size and num_workers to fit your GPU and avoid out-of-memory (OOM) errors.
    import os

    from aitextgen import aitextgen
    from aitextgen.TokenDataset import TokenDataset
    from aitextgen.tokenizers import train_tokenizer
    from aitextgen.utils import build_gpt2_config

    file_name = "fen.txt"
    model_dir = "trained_model"
    config_file = os.path.join(model_dir, "config.json")
    pytorch_model_file = os.path.join(model_dir, "pytorch_model.bin")
    vocab_file = os.path.join(model_dir, "aitextgen-vocab.json")
    merges_file = os.path.join(model_dir, "aitextgen-merges.txt")
    dataset_cache_file = os.path.join(model_dir, "dataset_cache.tar.gz")
    max_length = 100
    vocab_size = 10000


    def train():
        if not os.path.exists(model_dir):
            os.mkdir(model_dir)

        # train tokenizer if necessary
        if not os.path.exists(vocab_file):
            print("training tokenizer, please wait...")
            train_tokenizer(file_name, save_path=model_dir, vocab_size=vocab_size)

        if os.path.exists(dataset_cache_file):
            # use cache
            data = TokenDataset(
                dataset_cache_file,
                vocab_file=vocab_file,
                merges_file=merges_file,
                block_size=max_length,
                from_cache=True,
            )
        else:
            # or create token cache if necessary
            data = TokenDataset(
                file_name,
                vocab_file=vocab_file,
                merges_file=merges_file,
                block_size=max_length,
                line_by_line=True,
                save_cache=True,
                cache_destination=dataset_cache_file,
            )

        if not os.path.exists(pytorch_model_file):
            # no checkpoint yet: start from a fresh, small configuration
            config = build_gpt2_config(
                vocab_size=vocab_size,
                max_length=max_length,
                dropout=0.0,
                n_embd=512,
                n_head=16,
                n_layer=16,
            )
            ai = aitextgen(
                config=config, vocab_file=vocab_file, merges_file=merges_file, to_gpu=True
            )
        else:
            # continue from the saved checkpoint
            ai = aitextgen(
                model=pytorch_model_file,
                config=config_file,
                vocab_file=vocab_file,
                merges_file=merges_file,
                to_gpu=True,
            )

        ai.train(
            data,
            num_steps=150000,
            generate_every=1000,
            save_every=1000,
            learning_rate=1e-4,
            batch_size=16,
            num_workers=4,
        )


    train()
A training run takes about 8 hours. To get a well-trained model you'll need a few days.
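For a rough sense of how small this model is, the sketch below estimates the parameter count of the configuration above. It uses the common approximation of 12 * n_layer * n_embd^2 weights for the transformer blocks plus the token and position embeddings, and ignores biases and layer-norm weights. It also assumes that max_length is used as the number of positions. `estimate_gpt2_params` is a hypothetical helper, not an aitextgen function.

```python
def estimate_gpt2_params(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Approximate GPT-2 parameter count (biases and layer norms ignored)."""
    # Each block has ~4*d^2 attention weights and ~8*d^2 feed-forward weights.
    blocks = 12 * n_layer * n_embd * n_embd
    token_embeddings = vocab_size * n_embd
    position_embeddings = n_positions * n_embd
    return blocks + token_embeddings + position_embeddings


# Values taken from the training configuration in this article.
estimate = estimate_gpt2_params(vocab_size=10000, n_positions=100, n_embd=512, n_layer=16)
print(f"approx. {estimate / 1e6:.1f}M parameters")  # prints: approx. 55.5M parameters
```

Under these assumptions the model has on the order of 55 million parameters, which is consistent with it being trainable on a single consumer GPU.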
4.1. Random player
This is the simplest possible player. The function takes a list of valid moves and randomly makes a choice. It plays chess badly.
    import random


    def random_player(board):
        move = random.choice(list(board.legal_moves))
        return move.uci(), False, False
4.2. GPT-2 player
This player uses GPT-2 to predict the next move. The prompt for the model is constructed from the expected result (we want to win, so it is "1-0" for white and "0-1" for black), the current board state, and the side to move. The model then appends the generated next move to the prompt.
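The prompt construction can be sketched with plain string handling, assuming a full FEN string as input. `build_prompt` is an illustrative helper name that does not appear in the notebook, which builds the same string inline from `board.fen()`.

```python
def build_prompt(fen: str) -> str:
    """Build the model prompt '[Result] placement side' from a full FEN string."""
    # Keep only the first two FEN fields: piece placement and side to move.
    placement, side = fen.split(" ")[:2]
    # Ask for a win for whichever side is about to move.
    result_tag = "[1-0]" if side == "w" else "[0-1]"
    return f"{result_tag} {placement} {side}"


start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(build_prompt(start_fen))
# prints: [1-0] rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w
```

The prompt deliberately matches the first part of a training line, so the model only has to complete the " - move" suffix.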
A few notes about this player:
- It is trained on a small amount of data by ML standards, so it cannot play like a chess master.
- You can see from the results that the model predicts moves for unknown board states that were not presented to it during training.
- The model sometimes generates an invalid move. In that case a random valid move is played instead.
    import os
    import random

    import chess
    from aitextgen import aitextgen
    from tqdm.auto import tqdm

    model_dir = "trained_model"
    config_file = os.path.join(model_dir, "config.json")
    pytorch_model_file = os.path.join(model_dir, "pytorch_model.bin")
    vocab_file = os.path.join(model_dir, "aitextgen-vocab.json")
    merges_file = os.path.join(model_dir, "aitextgen-merges.txt")
    max_length = 100

    ai = aitextgen(
        model=pytorch_model_file,
        config=config_file,
        vocab_file=vocab_file,
        merges_file=merges_file,
        from_cache=True,
        to_gpu=True,
        # to_fp16=True
    )

    # a set to find known states
    db = set()
    with open("fen.txt") as f:
        for line in tqdm(f, desc="read fen.txt", unit=" moves"):
            if line:
                # keep "[Result] placement side", the same shape as the prompt
                db.add(" ".join(line.split(" ", 3)[:3]))


    def gpt2_player(board):
        if board.turn == chess.WHITE:
            prompt = "[1-0] " + " ".join(board.fen().split(" ", 2)[:2])
        else:
            prompt = "[0-1] " + " ".join(board.fen().split(" ", 2)[:2])
        isKnown = prompt in db
        prediction = ai.generate_one(
            prompt=prompt,
            max_length=max_length,
            temperature=0.9,
            top_k=0,
        )
        isPredicted = False
        try:
            # the generated text is "<prompt> - <move>": take the part after the separator
            uci = prediction.split(" - ")[1].strip()
            move = chess.Move.from_uci(uci)
            isPredicted = True
        except Exception as e:
            # print(str(e))
            move = None
        if not move or move not in board.legal_moves:
            # give up and do random move
            move = random.choice(list(board.legal_moves))
            isPredicted = False
        return move.uci(), isPredicted, isKnown
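The "use the prediction if it is legal, otherwise fall back to a random move" logic can be tried without a model or a chess library. The sketch below works on plain strings and assumes the generated text has the form "prompt - move". The names `extract_move` and `choose_move` are hypothetical and only illustrate the idea.

```python
import random
from typing import Iterable, Optional, Tuple


def extract_move(prediction: str) -> Optional[str]:
    """Return the first token after the ' - ' separator, or None if there is none."""
    parts = prediction.split(" - ", 1)
    if len(parts) < 2:
        return None
    tokens = parts[1].split()
    return tokens[0] if tokens else None


def choose_move(
    prediction: str,
    legal_moves: Iterable[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, is_predicted); fall back to a random legal move if needed."""
    legal = sorted(legal_moves)
    candidate = extract_move(prediction)
    if candidate in legal:
        return candidate, True
    chooser = rng if rng is not None else random.Random()
    return chooser.choice(legal), False


legal = ["d2d4", "e2e4", "g1f3"]
print(choose_move("[1-0] some-position w - e2e4", legal))  # prints: ('e2e4', True)
print(choose_move("[1-0] some-position w - e9e4", legal)[1])  # prints: False
```

Checking membership in the list of legal moves covers both malformed output and well-formed but illegal moves with a single test.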
4.3. Playing a Game
This function takes two players and plays a game between them.
    import time

    import chess
    from IPython.display import display, HTML, clear_output


    def who(player):
        return "White" if player == chess.WHITE else "Black"


    def display_board(board, use_svg):
        if use_svg:
            return board._repr_svg_()
        else:
            return "<pre>" + str(board) + "</pre>"


    def play_game(player1, player2, visual="svg", pause=0.1):
        """
        player1, player2: functions that take a board and return (uci, isPredicted, isKnown)
        visual: "simple" | "svg" | None
        """
        use_svg = (visual == "svg")
        board = chess.Board()
        known1 = 0
        predicted1 = 0
        total1 = 0
        known2 = 0
        predicted2 = 0
        total2 = 0
        if visual is not None:
            # wrap in HTML so the board is rendered instead of printed as raw markup
            display(HTML(display_board(board, use_svg)))
        try:
            while not board.is_game_over(claim_draw=True):
                if board.turn == chess.WHITE:
                    uci, isPredicted, isKnown = player1(board)
                    total1 += 1
                    if isKnown:
                        known1 += 1
                    if isPredicted:
                        predicted1 += 1
                else:
                    uci, isPredicted, isKnown = player2(board)
                    total2 += 1
                    if isKnown:
                        known2 += 1
                    if isPredicted:
                        predicted2 += 1
                name = who(board.turn)
                board.push_uci(uci)
                board_stop = display_board(board, use_svg)
                html = (
                    "<b>Move %s %s, Play '%s':</b><br/>%s<br/>"
                    "Known/Predicted/Total moves: %s/%s/%s %s%% - %s/%s/%s %s%%"
                ) % (
                    len(board.move_stack), name, uci, board_stop,
                    known1, predicted1, total1, round(predicted1 / (total1 or 1) * 100),
                    known2, predicted2, total2, round(predicted2 / (total2 or 1) * 100),
                )
                if visual is not None:
                    if visual == "svg":
                        clear_output(wait=True)
                    display(HTML(html))
                    if visual == "svg":
                        time.sleep(pause)
        except KeyboardInterrupt:
            msg = "Game interrupted!"
            return (None, msg, board)
        result = "1/2-1/2"
        if board.is_checkmate():
            msg = "checkmate: " + who(not board.turn) + " wins!"
            result = "1-0" if who(not board.turn) == "White" else "0-1"
        elif board.is_stalemate():
            msg = "draw: stalemate"
        elif board.is_fivefold_repetition():
            msg = "draw: 5-fold repetition"
        elif board.is_insufficient_material():
            msg = "draw: insufficient material"
        elif board.can_claim_draw():
            msg = "draw: claim"
        if visual is not None:
            print(msg)
        return (result, msg, board)
Let's pit gpt2_player against random_player:
    play_game(gpt2_player, random_player)
    pass

    Move 61 White, Play 'd2d7':
    Known/Predicted/Total moves: 2/29/31 94% - 0/0/30 0%
    checkmate: White wins!
Interestingly, the game often ends in a stalemate. Most probably this is because the model does not analyze the moves that follow and simply picks the one that looks best at the moment.
Now let's play 100 games (gpt2_player plays white):
    from tqdm.auto import tqdm

    plays = 100
    white_wins = 0
    black_wins = 0
    pbar1 = None
    pbar2 = None
    for i in tqdm(range(plays), desc="Plays"):
        if not pbar1:
            pbar1 = tqdm(total=plays, desc="White wins")
        if not pbar2:
            pbar2 = tqdm(total=plays, desc="Black wins")
        result, _, _ = play_game(gpt2_player, random_player, visual=None)
        if result is None:
            break
        elif result == "1-0":
            white_wins += 1
            pbar1.update(1)
        elif result == "0-1":
            black_wins += 1
            pbar2.update(1)
    pbar1.close()
    pbar2.close()
    print("Final score: %s-%s" % (white_wins, black_wins))

    Final score: 52-0
Most games end either in a draw or in a win for gpt2_player. About half of the games ended with a checkmate by the white player controlled by GPT-2, and the overall score is decisively in its favor. An interesting observation is that the board state is almost always new to the model, yet the model produces valid moves far more often than it fails. So we can conclude that the model learned some basic patterns from the training data that let it predict the next move.
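The loop above counts only wins, so draws are whatever is left over. If you collect the result strings returned by play_game instead, a small tally makes all three outcomes explicit. This is a minimal sketch: `summarize_results` is a hypothetical helper, and the sample list is made-up input for illustration, not output from the experiment.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_results(results: Iterable[str]) -> Dict[str, float]:
    """Map each result string ('1-0', '0-1', '1/2-1/2') to its share in percent."""
    counts = Counter(results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}


# Made-up results, only to show the shape of the output.
sample_results = ["1-0", "1/2-1/2", "1-0", "1/2-1/2"]
summary = summarize_results(sample_results)
for outcome, share in sorted(summary.items()):
    print(f"{outcome}: {share:.0f}%")
```

To use it with the real loop, append each non-None `result` to a list and pass that list to the helper once the games have finished.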
4.4. A human player
This function handles human input to play:
    def human_player(board):
        uci = get_move("%s's move [q to quit]> " % who(board.turn))
        legal_uci_moves = [move.uci() for move in board.legal_moves]
        while uci not in legal_uci_moves:
            print("Legal moves: " + (",".join(sorted(legal_uci_moves))))
            uci = get_move("%s's move [q to quit]> " % who(board.turn))
        return uci, True, False


    def get_move(prompt):
        uci = input(prompt)
        if uci and uci == "q":
            raise KeyboardInterrupt()
        try:
            chess.Move.from_uci(uci)
        except ValueError:
            # not a well-formed UCI string: ask again
            uci = None
        return uci
Try your hand at playing chess against gpt2_player. Note that you must enter your moves in UCI notation, such as "a2a4", which means moving the piece on a2 to a4.
    play_game(human_player, gpt2_player)
    pass

    Move 10 Black, Play 'b7b6':
    Known/Predicted/Total moves: 0/5/5 100% - 2/5/5 100%
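For readers new to UCI move notation, the sketch below splits a move string into its source square, target square, and optional promotion piece using a regular expression. It only checks the shape of the string, not whether the move is legal on a given board. `split_uci` is an illustrative name and is not part of python-chess.

```python
import re
from typing import Optional, Tuple

# Two squares (file a-h, rank 1-8) and an optional promotion piece.
UCI_MOVE = re.compile(r"([a-h][1-8])([a-h][1-8])([qrbn])?")


def split_uci(uci: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (from_square, to_square, promotion) or None if the string is malformed."""
    match = UCI_MOVE.fullmatch(uci)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(split_uci("a2a4"))   # prints: ('a2', 'a4', None)
print(split_uci("e7e8q"))  # prints: ('e7', 'e8', 'q')
print(split_uci("Nf3"))    # prints: None
```

The last call shows that standard algebraic notation such as "Nf3" is not accepted, which is why the human player has to type moves like "g1f3".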
We applied GPT-2, a natural text generation model, to the unusual task of generating chess moves. Although it is still far from master level, it showed an ability to learn the basics of chess. Feeding it more training data and increasing the model size should in theory bring the model to a higher level, but our goal was to confirm the ability of GPT-2 to learn and generate abstract patterns.
Interestingly, when the model plays against the random player, which moves "stupidly", the model does not play very well either, but against a human it moves more "thoughtfully". We think this is because the better the opponent plays, the more closely the board state resembles states from the training data, and the more confident the model is in generating the next move.
The results above suggest that, besides natural text, the GPT-2 model can generate other kinds of textual patterns. It is not just repeating training data: the model finds similarities that let it deal with unseen input, as if it builds an algorithm internally. Last but not least, it can be trained on an average personal computer.
The subject is open to further experiments not covered in this article:
- Continue training the model until a lower loss is reached.
- Use a larger model.
- Use a larger PGN dataset with a billion or more games.
- Predict the next two, three, or more moves.
- Add board state and move analysis to make it more like a chess program.
The notebook is available on Google Colab. Feel free to run your own experiments.