The cryptocurrency market is known for its volatility, decentralized structure, and complex dynamics, making it a challenging environment for developing consistent trading strategies. In their 2024 paper, “Trading Strategy of the Cryptocurrency Market Based on Deep Q-Learning Agents,” Chester S. J. Huang and Yu-Sheng Su propose a novel reinforcement learning approach that trains multiple agents to learn optimal trading strategies across various cryptocurrencies using Deep Q-Learning (DQL) techniques (Huang & Su, 2024).
Traditional approaches to algorithmic trading in the cryptocurrency space often rely on single-asset strategies, fixed indicators, or supervised learning models trained on historical data. These models, while useful, tend to be static and inflexible when faced with sudden market shifts. Reinforcement learning (RL) offers a more dynamic alternative by allowing agents to learn through interaction, updating their strategies based on observed rewards.
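For readers unfamiliar with the mechanics, the update rule behind Q-learning — which DQL generalizes with a neural network — fits in a few lines. The sketch below is illustrative only; the state discretization, learning rate, and discount factor are our assumptions, not values from the paper.

```python
# Minimal tabular Q-learning update -- the idea that Deep Q-Learning
# generalizes with a neural network. All values here are illustrative.
import numpy as np

n_states, n_actions = 100, 3      # e.g. discretized market states; long/short/wait
alpha, gamma = 0.1, 0.99          # assumed learning rate and discount factor

Q = np.zeros((n_states, n_actions))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One temporal-difference update after observing reward r."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# e.g. after a profitable "go long" day in state 42 that led to state 43:
q_update(s=42, a=0, r=0.012, s_next=43)
```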
However, existing RL-based approaches frequently focus on trading one cryptocurrency at a time. Huang and Su challenge this limitation by designing a system where multiple agents are trained simultaneously on a shared dataset, each developing independent decision-making capabilities. The system aims to discover trading strategies that perform reliably across diverse market conditions—uptrends, downtrends, and horizontal trends.
The core of the proposed framework is a multi-agent DQL system in which each agent chooses one of three actions daily: go long, go short, or remain inactive (“wait and see”). Each agent is trained using the same historical data but develops its own Q-values and policies through repeated exposure to different market phases.
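To make that concrete, a single agent of this kind might look like the sketch below. The feature count, network width, and epsilon-greedy exploration schedule are assumptions for illustration — the paper does not publish its exact architecture here. The second copy of the network corresponds to the target network the authors mention next: it is held fixed between periodic syncs to stabilize the learning targets.

```python
# Sketch of one DQL trading agent with the paper's three-action space.
# Network sizes, input features, and exploration are assumptions.
import torch
import torch.nn as nn

N_FEATURES = 16          # assumed per-day market features (price, volume, ...)
ACTIONS = ["long", "short", "wait"]

class QNetwork(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # one Q-value per action
        )

    def forward(self, x):
        return self.net(x)

policy_net = QNetwork(N_FEATURES, len(ACTIONS))
target_net = QNetwork(N_FEATURES, len(ACTIONS))   # frozen copy for stable targets
target_net.load_state_dict(policy_net.state_dict())

def act(state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy daily decision: explore early, exploit later."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(len(ACTIONS), (1,)).item()
    with torch.no_grad():
        return policy_net(state).argmax().item()

today = torch.randn(N_FEATURES)        # stand-in for real market features
print(ACTIONS[act(today, epsilon=0.1)])
```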
To stabilize training, the authors incorporate techniques from advanced DQN architectures, including target networks and adversarial networks. Agents are grouped into three categories, and a majority-voting system is applied across the agents’ outputs to form consensus decisions, adding robustness and reducing overfitting to any single agent’s biases.
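A majority vote over per-agent actions is straightforward to implement. In the sketch below, the tie-breaking rule (defaulting to “wait”) is our assumption; the paper’s summary does not specify how ties are resolved.

```python
# Minimal majority-vote consensus over per-agent decisions.
# Falling back to "wait" on a tie is an assumption, not from the paper.
from collections import Counter

def consensus(decisions: list[str]) -> str:
    """Return the action most agents chose; 'wait' on a tie."""
    counts = Counter(decisions)
    (top, n_top), *rest = counts.most_common()
    if rest and rest[0][1] == n_top:   # tie between leading actions
        return "wait"
    return top

print(consensus(["long", "long", "wait", "short", "long"]))  # -> "long"
```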
The study evaluates performance across six cryptocurrencies—BTC, ETH, VET, ADA, TRX, and XRP—representing various market trends. Agents are trained and tested on Binance data from 2018 to 2022, with a clear separation between training, validation, and test periods. Metrics used for evaluation include annualized return, maximum drawdown (MDD), and coverage ratio (COV), which reflects how often the agents chose to trade.
The performance of each agent group is benchmarked against a basic buy-and-hold strategy.
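These metrics are standard and easy to reproduce. The sketch below uses common definitions — geometric annualized return, peak-to-trough drawdown, and the share of non-“wait” days for COV; the paper’s exact conventions (day count, fee handling) may differ. Note that buy-and-hold is simply an equity curve proportional to the asset price itself, which is why it makes a clean baseline.

```python
# Common definitions of the three evaluation metrics; the paper's exact
# formulas (e.g. compounding or day-count conventions) may differ.
import numpy as np

def annualized_return(equity: np.ndarray, days_per_year: int = 365) -> float:
    """Geometric annualized return from a daily equity curve (crypto trades 24/7)."""
    total = equity[-1] / equity[0]
    years = (len(equity) - 1) / days_per_year
    return total ** (1 / years) - 1

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline of the equity curve."""
    peaks = np.maximum.accumulate(equity)
    return ((peaks - equity) / peaks).max()

def coverage(actions: list[str]) -> float:
    """Fraction of days the agent actually traded (did not 'wait')."""
    return sum(a != "wait" for a in actions) / len(actions)

# Buy-and-hold benchmark: equity is just the (toy) daily close series.
prices = np.array([100.0, 104.0, 97.0, 110.0, 120.0])
print(annualized_return(prices), max_drawdown(prices))
```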
Empirical findings indicate that the proposed multi-agent DQL framework outperformed the buy-and-hold approach in most market scenarios, particularly when go-long agents were used in uptrending markets.
The system also demonstrated a significant reduction in maximum drawdowns and a more controlled trading frequency, which helped limit transaction costs and volatility exposure.
This study demonstrates the viability of multi-agent deep reinforcement learning for developing adaptive, asset-agnostic trading strategies in the cryptocurrency market. By training agents to act based on market behavior rather than static rules, the proposed method shows improved resilience and profitability across varying market trends.
The framework presents a step forward in algorithmic trading research, particularly in highly volatile and fragmented markets like digital assets. Future work may focus on improving reward functions, exploring alternative reinforcement learning models (e.g., PPO, A3C), and applying similar frameworks in real-time environments with live data.
Axon Trade provides advanced trading infrastructure for institutional and professional traders, offering high-performance FIX API connectivity, real-time market data, and smart order execution solutions. With a focus on low-latency trading and risk-aware decision-making, Axon Trade enables seamless access to multiple digital asset exchanges through a unified API.
Explore Axon Trade’s solutions, or contact us for more info.