Independent Learning in Markov Potential Games: New Insights and Constraints
Open access
Autor(in)
Datum
2023Typ
- Master Thesis
ETH Bibliographie
yes
Altmetrics
Abstract
Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems. Markov Potential Games (MPGs) represent a subclass of mixed cooperative and competitive Markov games for which finding Nash equilibria is tractable. The recently introduced class of constrained Markov Potential Games (CMPGs) generalizes MPGs to model the case where the agents’ reward maximization is subject to global constraints. Existing methods for learning Nash equilibria (NE) in MPGs can be categorized into centralized and independent learning algorithms. Notably, for learning ε-approximate NE, the best-known sample complexity achieved by centralized algorithms is significantly lower than for independent learning (O(ε⁻³) vs. O(ε⁻⁵)). Moreover, no converging independent learning algorithm is known for CMPGs. Nevertheless, whether these gaps are inherent is unknown, i.e., no provable separation between centralized and independent learning has been shown. Continuing on this quest, our contributions are twofold: (a) We propose a new playerwise policy gradient (PG) algorithm that requires coordination among players, however, improves on iteration and sample complexity regarding the dependence on the number of players m. The proposed method also improves over the m-dependence in the complexity of previously known centralized algorithms. (b) In the constrained case, we make progress on closing the gap between centralized and independent learning by providing an independent policy gradient algorithm for learning approximate constrained Nash equilibria in CMPGs. Inspired by contemporary optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Under some technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We perform simulations to illustrate our results in two real-world applications of NE-learning in CMPGs. Mehr anzeigen
Persistenter Link
https://doi.org/10.3929/ethz-b-000646915Publikationsstatus
publishedVerlag
ETH ZurichOrganisationseinheit
09729 - He, Niao / He, Niao
ETH Bibliographie
yes
Altmetrics