An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions
Bringing AI out of the lab is a formidable challenge: the real world is full of uncertainty, non-stationarity, and unknown hidden states. An AI system trained on simulated data that mimics reality will therefore achieve only limited success when released into the wild. The real world exhibits vast numbers of latent variables and confounders that vary over time and can profoundly affect an AI system's performance, and its uncertainties may be too complex and ill-defined to be modeled with the required accuracy. It thus becomes necessary to incorporate uncertainty into the modeling effort and, furthermore, to measure and approximate changes in real-world environments.
The game of curling is a good testbed for studying the interaction between artificial intelligence systems and the real world. Curling has been described as a combination of bowling and chess: it is a turn-based game in which two teams play alternately on an ice sheet, requiring a high level of strategic thinking and precise execution. The game is challenging because winning demands both accurate throwing (a robot control problem) and strategic planning. Moreover, the environmental characteristics change continually, and every throw affects the outcome of the match. We therefore had to address the transfer from simulation to a real environment with high uncertainty, and to propose an adaptation method for a dynamically changing environment in which relearning is not an option. This is a central issue in many real-world problems, where sufficient time to adapt to the changing environment is simply not available. In curling, the timing rules leave no time for relearning during a match; in addition, very little to no information is available on the nature of the environmental change, and every throw (episode) has a large impact on the outcome of the match. This case study thus contains the core challenges of deploying robotic systems in real-world environments: strong temporal variability, uncertainties, and continuousness.
Curly, the AI curling robot system, consists of the curling AI (a strategy-planning model, an adaptation model for the dynamically changing environment of real ice, and a curling simulator) and two curling robots (skip-Curly and thrower-Curly) (Figure 1). We succeeded not only in strategic planning but also in real-time adaptation within real curling games.
While deep reinforcement learning (DRL) methods have been successfully applied to games with discrete action spaces, many real-world reinforcement learning (RL) applications require an agent to select optimal actions from a continuous action space. We contribute an adaptation framework and a testbed application for this transfer challenge. Our framework extends standard DRL with temporal features that learn to compensate for the uncertainties and non-stationarities that are an unavoidable part of curling (Figure 2). Specifically, we show that this adaptation is crucial and compares very favorably to simply transferring a model trained on physics-based simulations of curling. We further report a curling robot that achieves human-level performance in the game of curling using this adaptive DRL framework. Our curling robot, Curly, won three of four official matches against expert human teams [top-ranked women's curling teams and the Korea national wheelchair curling team (reserve team)]. In summary, our paper addresses the deep question of how to transfer a DRL model from simulated data to the uncertain real world, demonstrating that the gap between simulation and reality can be significantly narrowed.
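The idea of correcting a simulation-trained policy with temporal features from recent throws can be illustrated with a deliberately minimal sketch. This is not the authors' implementation: the `AdaptiveThrower` class, the constant-drift environment, and the use of a simple mean deviation over a sliding window as the "temporal feature" are all hypothetical simplifications standing in for the learned adaptation model.

```python
from collections import deque

class AdaptiveThrower:
    """Toy sketch of simulation-to-real adaptation (illustrative only):
    a policy trained in an idealized simulator is corrected online using
    a temporal feature computed from recent throws."""

    def __init__(self, window=5):
        self.deviations = deque(maxlen=window)  # recent aim-vs-landing errors
        self.drift_estimate = 0.0               # learned compensation term

    def base_action(self, target):
        # Stand-in for the simulator-trained policy: in the idealized
        # simulation, aiming at the target lands exactly on the target.
        return target

    def act(self, target):
        # Subtract the estimated environmental drift (e.g. extra curl as
        # the ice surface changes over the match).
        return self.base_action(target) - self.drift_estimate

    def observe(self, aimed_at, landed):
        # Temporal feature: mean deviation over the last few throws.
        self.deviations.append(landed - aimed_at)
        self.drift_estimate = sum(self.deviations) / len(self.deviations)


def simulate_match(drift=0.4, target=10.0, throws=6):
    """Environment with an unmodeled constant drift unknown to the policy."""
    thrower = AdaptiveThrower()
    errors = []
    for _ in range(throws):
        aim = thrower.act(target)
        landed = aim + drift          # real ice deviates from the simulator
        thrower.observe(aim, landed)
        errors.append(abs(landed - target))
    return errors
```

Under this toy model the first throw misses by the full drift, and subsequent throws land on target once the deviation history has been absorbed into the compensation term; the real framework plays the same role with richer, learned temporal features and without the luxury of relearning time.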
The use of explainable AI techniques to better understand the impact of critical shots, allowing the curling AI and its creators to learn from their mistakes, will be an interesting direction for future studies. Cases of extreme change, such as sweeping in curling, also deserve further investigation. Last, we note that the insights our framework offers on alleviating strong temporal variability, uncertainties, and continuousness are readily transferable to other real-world applications of comparable complexity in robotics and beyond (e.g., autonomous drones).
This work was carried out by Professor Seong-Whan Lee and his team at Korea University, Korea.