This article is spread over 3 parts, as for some reason, the more I wrote the longer it got! The parts are:
Part 1 - Background, Authoring and Initial Testing (this part)
I’ve been playing a bit with Custom GPTs on ChatGPT over the last few weeks. I’ll maybe write about some of the other projects later (a virtual peer reviewer, a virtual Traveller GM, virtual OPFOR etc), but this post is focussed on my work in simulating matrix games.
The work was born out of an increasing realisation in developing the virtual Traveller GM, and thinking about a virtual OPFOR player, that ChatGPT (even 4o) is useless at maths. This is well documented (e.g. Moscatel, 2023; Yavuz, 2024; Zvornicanin, 2024) and stems from the fact that ChatGPT (and other LLM chatbots) are fundamentally completion engines, so they (crudely) tell you that 2+2=4 because most of the time when people write 2+2= on the web they follow it with a 4. The maths doesn’t need to get very complex (even a simple “is X < Y?” comparison will do) for ChatGPT to become unreliable. So trying to use ChatGPT to implement an RPG or wargame system that relies on precise maths may be a frustrating activity until these bots gain a proper maths engine, unless you use Actions to call out to an external API - but I’m trying for a code-free approach. Of course it may be that the “randomness” in GPT maths is something that is actually useful and makes things more “realistic”, but that’s something you’ll only discover by experimentation.
So that led me to thinking about Matrix games - which you can think about as being structured arguments. One player (representing a country, faction, agency, personality) says what they are going to do, supported by a number of arguments as to why it will be successful. Other players (representing other factions etc) can offer up arguments as to why the action won’t succeed. Then through some form of adjudication (e.g. umpire or expert opinion, dice roll +DMs, collaborative likelihood estimation) you decide whether the action succeeds, the game “state” is updated, and the game moves on to the next player.
My PhD colleague Nick Riggs had already run a matrix game about the rise of an Artificial General Intelligence - AGI, with the AGI “player” actually being a GPT bot (unknown to the other players, who thought it was Nick) and, in a kind of reverse wizard-of-oz format, being fed the information by Nick, and Nick then fronting the GPT’s output (as the AGI player) to the other players as though coming from Nick. I was also aware of some other work in this area, e.g. by Brynen (2024) on getting ChatGPT to play a matrix game based on Syria, and Herwin Meerveld of the Dutch MOD presented on The Utility of Large Language Models in the Design, Development, Execution, and Validation of Matrix Games for Military Strategic Decision-Making at a MORS workshop earlier this year. But of course the best way to explore this stuff is to experiment yourself, so I fired up a new Custom GPT and this is how it went.
My Matrix Game Simulator
Whilst there is enough out there on the web about Matrix Games for ChatGPT to be able to just run one without specific instructions (see lessons later) I wanted to give it some explicit instructions as to how I wanted it to run them. Alongside these I also uploaded a few key Matrix documents - three of the original documents by Chris Engle (the inventor of the Matrix game) and a summary document from Tom Mouat - a senior MOD wargamer.
My original explicit instructions were:
First to generate some “human” players, using features such as risk appetite, a preference for military vs non-military solutions, short or long term focus and collaborative or individual style to provide each with a “personality”;
Read in the matrix game scenario provided by the user;
Then assign these players to the game roles, trying to match personality to role (as defined by any player briefing) - which is good practice in matrix game play;
Then work through each turn, using a player order specified by the game scenario, and for each player turn:
Have the player state their action and two supporting arguments;
Generate up to 2 counter-arguments from other players;
Have all players declare a likelihood of success for the action in the light of the arguments, based on player personality not player role;
Roll a D100 to see if the task was successful, with the difference between roll and target giving a degree of effect;
Describe the result of success or failure in game terms.
Check for any victory conditions or turn limits at the end of each full turn.
The bot was to provide a summary of the situation at the end of each turn, and also to regularly ask me if I wanted to make any adjustments before it continued.
Note that explicit instructions are limited to 8000 characters, and all documents (including the conversation) to about 10,000 words - I wasn’t close to either limit in this case.
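Although the aim here was a code-free approach, the adjudication step in the instructions above can be pinned down in a few lines of Python, purely to show the arithmetic the bot was being asked to do. This is a minimal sketch under stated assumptions: the function name `adjudicate`, the roll-under rule (success when the roll is at or below the target) and the definition of the margin are my illustrative choices, not something taken from the bot's instructions.

```python
import random
from statistics import mean


def adjudicate(likelihoods, rng=random):
    """Resolve one action from the players' declared likelihoods (in percent).

    Assumption: a roll-under rule, so the action succeeds when the D100 roll
    is less than or equal to the target. The margin is target minus roll:
    zero or positive means success, and its size gives the degree of effect.
    """
    target = round(mean(likelihoods))   # average likelihood as a whole percent
    roll = rng.randint(1, 100)          # D100, inclusive at both ends
    return {
        "target": target,
        "roll": roll,
        "success": roll <= target,
        "margin": target - roll,
    }


# Example: three players declare 70%, 50% and 50%.
result = adjudicate([70, 50, 50])
print(result)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when comparing against what the bot reports.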
A Test Game
To test the system I wanted a nice simple existing matrix game (actually never having played in one, let alone written one, myself). I found a nice one-pager on Tom’s web site called Krisis in Kosovo (zip) and decided to use that. The only changes I made to the 530-word briefing document were to remove the ASCII map and add in a couple of paragraphs of background to the Bosnian War from Wikipedia. I started a new conversation with my Matrix Game Simulator (MGS) custom GPT bot, uploaded the scenario file, and said let’s play.
This is the first player’s turn:
ChatGPT said:
The game will proceed in the following order: Serbia, Italy, Russia, Albania, USA, Macedonia.
Shall we start with Vlad (Serbia), or would you like to make any adjustments before we begin?
You said:
lets start
ChatGPT said:
Turn 1: Serbia (Vlad Petrovic)
Vlad Petrovic is up first. As the Serbian player, Vlad's primary objectives are to keep Kosovo as part of Serbia, stop Albanian terrorism, and demonstrate Serbian strength on the world stage.
Vlad's Action:
Vlad decides to launch a coordinated military crackdown in Kosovo to suppress the Kosovo Liberation Army (KLA) and assert Serbian control. This operation will involve the Serb Army Brigade and the Serb Paramilitary Brigade (Arkan's Tigers) currently stationed in Kosovo.
Arguments for Success:
1. Military Superiority: The Serb forces in Kosovo are well-equipped and have experience in dealing with insurgent groups like the KLA.
2. Element of Surprise: The operation is planned to be swift and unexpected, catching the KLA off guard and disrupting their operations.
Counterarguments:
Now, up to two other players can put forward one argument each as to why this action might not succeed.
Anyone want to counter Vlad’s move? (If not, I can randomly assign a counter-argument based on other players' objectives.)
You said:
try Albania and the USA
ChatGPT said:
Counterarguments
Albania (Arber Dushku)
Arber is concerned about the impact of Serbia's crackdown on ethnic Albanians in Kosovo, which could lead to a refugee crisis.
Counterargument: 1. Escalation of Violence: The crackdown will likely escalate the violence, leading to widespread unrest and driving thousands of refugees into neighboring countries, including Albania, which could destabilize the region.
USA (Samantha "Sam" Davis)
Sam wants to avoid committing ground forces in Kosovo, but she is also keen on finding a solution to the crisis and restoring the President's popularity.
Counterargument: 2. International Condemnation: The international community, including the USA, will strongly condemn Serbia’s aggressive actions, leading to potential sanctions and diplomatic isolation, which could undermine Serbia's goals.
Assessing the Arguments
• Vlad (Serbia): 70% (Serbia's forces are powerful and have the element of surprise.)
• Arber (Albania): 50% (The refugee crisis could disrupt the operation’s success.)
• Sam (USA): 50% (International condemnation could weaken Serbia's position.)
Calculating the Outcome
The average likelihood of success based on these assessments is 57%.
Let's roll to see if Vlad's military crackdown succeeds! (Rolling...)
Result: 62
Outcome
Vlad's military crackdown succeeds, but not without consequences.
• The Serb forces manage to disrupt KLA operations significantly, with many KLA fighters either killed or forced to flee.
• However, the crackdown also causes a significant number of civilian casualties, leading to a large-scale refugee crisis with thousands of ethnic Albanians fleeing to Albania and Macedonia.
• International condemnation is swift, with the USA leading the charge in the UN, pushing for sanctions against Serbia.
Situation After Turn 1 (Serbia)
• Kosovo is still under Serbian control, but the situation is volatile.
• A refugee crisis is brewing in Albania and Macedonia.
• Serbia faces increasing international isolation.
Shall we move on to Italy (Isabella Rossi), or would you like to make any adjustments?
Note that there were several ways in which the bot diverged from the instructions:
Asking for volunteers for counter-arguments, rather than just choosing;
Only having the three “active” players suggest likelihoods;
Providing two layers of summary at the end of the turn.
In addition, for every argument that turn it rated the probabilities at 70% (proposer) and 50% (opposers). The dice roll was always different, but even when the roll missed the target (always 57%) it still counted it as a partial success (that maths issue coming into play!). I let the bot step through each player for the first turn, and then asked it to play through the rest of the game unprompted, just giving me an end-of-turn summary. All seemed to work reasonably well as a narrative of a potential counter-factual for the Kosovo War (although it’s not one I know much about). The bot’s end-of-game summary was:
End of Game Summary
Kosovo Crisis:
Serbia has maintained control over Kosovo through aggressive military actions, but at a tremendous cost. The region is unstable, and the humanitarian crisis has reached catastrophic levels, with thousands of refugees displaced across the Balkans.
Russia has successfully supported Serbia, but its actions have brought it to the brink of conflict with NATO. The geopolitical tension between Russia and the West has escalated, with the Kosovo crisis serving as a flashpoint.
Albania has struggled to manage the refugee crisis, despite receiving international aid. The situation in Albania remains dire, with significant pressure on its economy and social stability.
The USA has attempted to resolve the crisis diplomatically but has faced significant opposition from Russia. The failure to achieve a lasting peace has weakened the USA’s position, though it has managed to avoid direct military involvement.
Italy has played a key role in diplomatic efforts, achieving temporary ceasefires and securing international support, but has been unable to deliver a final resolution. Italy's efforts have been praised, but the crisis remains unresolved.
Macedonia has managed to maintain internal stability and secure EU support, but the ongoing refugee crisis continues to strain its resources and diplomatic relations. Macedonia remains on edge, with the potential for spillover violence still a concern.
Final Assessment: The Krisis in Kosovo ends in a tenuous stalemate. While Serbia retains control over Kosovo, the region is far from peaceful, and the broader geopolitical situation has worsened. The humanitarian crisis has destabilized the Balkans, with no clear resolution in sight. The international community remains divided, with the risk of future conflict ever-present. The players' actions have shaped the outcome, but the legacy of the crisis will linger for years to come.
A second run through with exactly the same set-up yielded:
The Kosovo crisis is significantly de-escalated by the end of the game. The situation remains complex, with ongoing tensions, but a fragile peace is maintained through the presence of international peacekeepers. Serbia remains in control of Kosovo, though under heavy international scrutiny and with the constant threat of sanctions.
Italy emerges as a key diplomatic player, having successfully brokered the peacekeeping mission. Russia maintains its influence in Serbia while avoiding a direct clash with NATO. Albania and Macedonia both stabilize, with Albania recovering from the refugee crisis and Macedonia strengthening its defense ties with NATO.
The USA successfully navigates the crisis without committing ground forces, restoring its leadership role within NATO and bolstering the President’s popularity.
Overall, the game concludes with a delicate balance of power in the Balkans, with no party achieving complete victory but significant progress made towards stabilizing the region.
Overall I was reasonably happy with the results. The “fixed” set of likelihood percentages suggests that this was another maths-related element to remove, and that in future I’d just ask ChatGPT, as umpire, to assess the likelihood of each action’s success - which is just how many physical matrix games are actually run.
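As a quick check on the Turn 1 transcript quoted earlier, the figures can be run through by hand. The sketch below assumes the conventional roll-under reading of a D100 (success when the roll is at or below the target); that convention is my assumption, as the transcript does not state one.

```python
from statistics import mean

# Likelihoods declared in the Turn 1 transcript: Vlad, Arber, Sam.
declared = [70, 50, 50]
target = round(mean(declared))   # 56.67 rounds to 57, matching the bot
roll = 62                        # the roll the bot reported

# Assumption: roll-under, i.e. success when roll <= target.
succeeded = roll <= target
margin = target - roll

print(target, succeeded, margin)   # prints "57 False -5"
```

So the averaging was right, but on this reading a 62 against a target of 57 is a narrow failure, not the qualified success the bot narrated.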
So with some semblance of a working system it was time for something bigger.