Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization
Feidiao YANG, Wei CHEN, Jialin ZHANG, Xiaoming SUN
Combinatorial optimization in the face of uncertainty is a challenge in both operations research and machine learning. In this paper, we consider an important special class of such problems: adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. Existing algorithms either focus on the regret guarantee or assume access to an efficient offline oracle, so solving this problem efficiently remains a challenge when the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), it guarantees a sublinear (1 + ε)-scaled regret of order O() for any small ε > 0, where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
online learning / online combinatorial optimization / semi-bandit / follow-the-perturbed-leader
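As a rough illustration of the approach the abstract describes (not the paper's exact algorithm), the sketch below picks k of n components each round: cumulative loss estimates are perturbed with nonnegative exponential noise and handed to an offline oracle. Here a trivial exact top-k selection stands in for the approximation oracle, and the naive loss accumulation and the scale parameter eta are simplifying assumptions made for this sketch.

```python
import random

def semi_bandit_fpl(n, k, losses, T, eta=1.0, seed=0):
    """Play T rounds: each round choose k of n components via a perturbed
    oracle, observing only the losses of the chosen components."""
    rng = random.Random(seed)
    est = [0.0] * n                       # cumulative estimated losses
    total = 0.0
    for _ in range(T):
        # nonnegative exponential perturbations, subtracted so that
        # rarely chosen components occasionally look attractive
        pert = [est[i] - rng.expovariate(eta) for i in range(n)]
        # oracle call: an exact top-k sort here; the paper's setting
        # would plug in an approximation algorithm (e.g., an FPTAS)
        chosen = sorted(range(n), key=lambda i: pert[i])[:k]
        for i in chosen:                  # semi-bandit feedback
            est[i] += losses[i]           # naive accumulation (simplified)
            total += losses[i]
    return total
```

In the true semi-bandit setting, the naive accumulation above would be replaced by unbiased loss estimates (e.g., importance weighting or geometric resampling), since only the chosen components are observed each round.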
[1] György A, Linder T, Lugosi G, Ottucsák G. The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 2007, 8(10): 2369–2403
[2] Audibert J Y, Bubeck S, Lugosi G. Regret in online combinatorial optimization. Mathematics of Operations Research, 2013, 39(1): 31–45
[3] Neu G, Bartók G. An efficient algorithm for learning with semi-bandit feedback. In: Proceedings of International Conference on Algorithmic Learning Theory. 2013, 234–248
[4] Neu G, Bartók G. Importance weighting without importance weights: an efficient algorithm for combinatorial semi-bandits. Journal of Machine Learning Research, 2016, 17(154): 1–21
[5] Kalai A, Vempala S. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 2005, 71(3): 297–307
[6] Wang W, Zhou Z H. Crowdsourcing label quality: a theoretical analysis. Science China Information Sciences, 2015, 58(11): 1–12
[7] Thompson W. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 1933, 25(3/4): 285–294
[8] Robbins H. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 1952, 58(5): 527–535
[9] Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002, 47(2–3): 235–256
[10] Auer P, Cesa-Bianchi N, Freund Y, Schapire R. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002, 32(1): 48–77
[11] Gai Y, Krishnamachari B, Jain R. Combinatorial network optimization with unknown variables: multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 2012, 20(5): 1466–1478
[12] Chen W, Wang Y, Yuan Y, Wang Q. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 2016, 17(50): 1–33
[13] Kveton B, Wen Z, Ashkan A, Szepesvari C. Tight regret bounds for stochastic combinatorial semi-bandits. In: Proceedings of International Conference on Artificial Intelligence and Statistics. 2015, 535–543
[14] Sankararaman K A, Slivkins A. Combinatorial semi-bandits with knapsacks. In: Proceedings of International Conference on Artificial Intelligence and Statistics. 2018, 1760–1770
[15] Kveton B, Wen Z, Ashkan A, Eydgahi H, Eriksson B. Matroid bandits: fast combinatorial optimization with learning. In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence. 2014, 420–429
[16] Takimoto E, Warmuth M. Path kernels and multiplicative updates. Journal of Machine Learning Research, 2003, 4(5): 773–818
[17] Awerbuch B, Kleinberg R. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 2004, 45–53
[18] McMahan H B, Blum A. Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of International Conference on Computational Learning Theory. 2004, 109–123
[19] Dani V, Kakade S, Hayes T. The price of bandit information for online optimization. In: Proceedings of the 20th International Conference on Neural Information Processing Systems. 2007, 345–352
[20] Abernethy J, Hazan E, Rakhlin A. Competing in the dark: an efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory. 2008, 263–273
[21] Cesa-Bianchi N, Lugosi G. Combinatorial bandits. Journal of Computer and System Sciences, 2012, 78(5): 1404–1422
[22] Bubeck S, Cesa-Bianchi N, Kakade S. Towards minimax policies for online linear optimization with bandit feedback. In: Proceedings of Annual Conference on Learning Theory. 2012
[23] Combes R, Talebi M, Proutiere A, Lelarge M. Combinatorial bandits revisited. In: Proceedings of the 29th Annual Conference on Neural Information Processing Systems. 2015, 2116–2124
[24] Blum A. On-line algorithms in machine learning. In: Fiat A, Woeginger G L, eds. Online Algorithms. Berlin: Springer, 1998, 306–325
[25] Foster D, Vohra R. Regret in the on-line decision problem. Games and Economic Behavior, 1999, 29: 7–35
[26] Cesa-Bianchi N, Lugosi G. Prediction, Learning, and Games. Cambridge: Cambridge University Press, 2006
[27] Hannan J. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 1957, 3: 97–139
[28] Vazirani V. Approximation Algorithms. Berlin: Springer, 2001
[29]
[30] Poland J. FPL analysis for adaptive bandits. In: Proceedings of International Symposium on Stochastic Algorithms. 2005, 58–69
[31] Kakade S, Kalai A, Ligett K. Playing games with approximation algorithms. SIAM Journal on Computing, 2009, 39(3): 1088–1106
[32] Garber D. Efficient online linear optimization with approximation algorithms. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 627–635
[33] Hazan E, Hu W, Li Y, Li Z. Online improper learning with an approximation oracle. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 5652–5660