Autonomous Learning in Tic-Tac-Toe Using Q-Learning: A Reinforcement Learning Approach
Abstract
This study investigates the application of Q-learning, a model-free reinforcement learning algorithm, to train an autonomous agent to master the game of Tic-Tac-Toe. The agent, playing against a random opponent, learns optimal move-selection strategies through trial and error over 5,000 training episodes. By leveraging an epsilon-greedy exploration strategy with a decay mechanism and a carefully structured reward system, the agent demonstrates rapid and stable convergence towards a near-optimal policy. Performance metrics show that the agent's win rate progressively increases from the baseline of random play to approximately 90%, while its average reward shifts from negative values (indicating frequent losses) to consistently high positive scores. The findings validate the efficacy of Q-learning in deterministic, discrete state-space environments and underscore its value as a foundational algorithm for understanding autonomous learning. This work provides a comprehensive blueprint for its implementation and serves as a basis for scaling to more complex game-playing domains.
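The training loop summarized above can be sketched as tabular Q-learning against a random opponent. The 5,000-episode budget comes from the abstract; the specific hyperparameters (alpha, gamma, epsilon schedule) and the reward values (+1 win, -1 loss, 0 draw) are illustrative assumptions, not the paper's exact configuration.

```python
import random
from collections import defaultdict

# Eight winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark completes a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    """Indices of empty cells (legal moves)."""
    return [i for i, cell in enumerate(board) if cell == ' ']

def train(episodes=5000, alpha=0.5, gamma=0.9,
          eps=1.0, eps_min=0.05, eps_decay=0.999, seed=0):
    """Tabular Q-learning for the 'X' agent vs. a uniformly random 'O'."""
    rng = random.Random(seed)
    Q = defaultdict(float)               # Q[(state_string, action)] -> value
    for _ in range(episodes):
        board = [' '] * 9
        while True:
            # Agent ('X') moves with epsilon-greedy action selection.
            state = ''.join(board)
            legal = moves(board)
            if rng.random() < eps:
                a = rng.choice(legal)                       # explore
            else:
                a = max(legal, key=lambda m: Q[(state, m)]) # exploit
            board[a] = 'X'
            if winner(board) == 'X':                        # terminal: win
                Q[(state, a)] += alpha * (1.0 - Q[(state, a)])
                break
            if not moves(board):                            # terminal: draw
                Q[(state, a)] += alpha * (0.0 - Q[(state, a)])
                break
            # Random opponent ('O') replies.
            board[rng.choice(moves(board))] = 'O'
            if winner(board) == 'O':                        # terminal: loss
                Q[(state, a)] += alpha * (-1.0 - Q[(state, a)])
                break
            if not moves(board):                            # terminal: draw
                Q[(state, a)] += alpha * (0.0 - Q[(state, a)])
                break
            # Non-terminal: bootstrap toward the best next-state value.
            nxt = ''.join(board)
            best = max(Q[(nxt, m)] for m in moves(board))
            Q[(state, a)] += alpha * (gamma * best - Q[(state, a)])
        eps = max(eps_min, eps * eps_decay)  # decay exploration each episode
    return Q

Q = train()
```

Playing greedily from the learned table (always `max(legal, key=lambda m: Q[(state, m)])` with no exploration) is one way to measure the win rate the abstract reports; since the opponent moves randomly, each state-action pair is updated once per agent turn, which is sufficient for Tic-Tac-Toe's small discrete state space.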