A general overview of reinforcement learning in discrete-time controlled queues

Gustavo Portillo-Ramírez http://orcid.org/0000-0002-2457-2033
Hugo Cruz-Suárez http://orcid.org/0000-0002-0732-4943
Fernando Velasco-Luna http://orcid.org/0000-0002-8616-8378

Abstract

A general overview of Markov decision processes is presented through a controlled queueing model. The model is studied under an average-cost optimality criterion, for which the existence of optimal stationary policies is shown. The optimal cost and the optimal policy are then determined via dynamic programming and Q-learning. Finally, numerical experiments are provided that validate the results obtained with dynamic programming and Q-learning, and the solutions produced by these two techniques are compared.
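
Under the average-cost criterion mentioned above, the standard object is the average-cost optimality equation. In generic notation (not taken from the article), for a finite state space one seeks a constant rho* (the optimal average cost) and a relative value function h satisfying

$$\rho^* + h(x) = \min_{a \in A(x)} \Big\{ c(x,a) + \sum_{y} P(y \mid x, a)\, h(y) \Big\},$$

and any stationary policy attaining the minimum on the right-hand side is average-cost optimal.

As a concrete illustration of the Q-learning side, the sketch below applies a relative (average-cost) Q-learning update to a hypothetical discrete-time controlled queue with Bernoulli arrivals, two service modes, and holding plus service costs. The model, parameter values, and variable names are assumptions made for this example, not the article's exact formulation; the Q-value at a fixed reference state-action pair plays the role of the running average-cost estimate.

```python
import numpy as np

# Hypothetical discrete-time controlled queue (illustrative assumptions only).
N = 20              # buffer size: states are queue lengths 0, 1, ..., N
p = 0.4             # probability of an arrival in a slot
q = [0.3, 0.7]      # service-completion probability under action 0 (slow) / 1 (fast)
hold = 1.0          # holding cost per waiting customer per slot
c = [0.0, 2.0]      # extra cost per slot of choosing the fast service mode

rng = np.random.default_rng(0)

def step(s, a):
    """Simulate one slot: return the cost incurred and the next queue length."""
    cost = hold * s + c[a]
    departure = 1 if (s > 0 and rng.random() < q[a]) else 0
    arrival = 1 if rng.random() < p else 0
    return cost, min(N, max(0, s - departure + arrival))

# Relative (average-cost) Q-learning: the Q-value at a fixed reference
# state-action pair serves as the running estimate of the optimal average cost.
Q = np.zeros((N + 1, 2))
ref = (0, 0)
s, eps = 0, 0.1
for n in range(1, 500001):
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmin())
    cost, s_next = step(s, a)
    alpha = 1.0 / (1.0 + n / 10000)           # slowly decreasing step size
    target = cost + Q[s_next].min() - Q[ref]  # relative Bellman target
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

policy = Q.argmin(axis=1)   # greedy stationary policy extracted from Q
print("estimated optimal average cost:", round(float(Q[ref]), 3))
print("estimated policy (action per queue length):", policy)
```

The dynamic-programming counterpart would run relative value iteration on the same transition probabilities; comparing its average cost and policy with the output of the learned Q-table mirrors the comparison described in the abstract.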

Article Details

How to cite
PORTILLO-RAMÍREZ, Gustavo; CRUZ-SUÁREZ, Hugo; VELASCO-LUNA, Fernando. Un panorama general del aprendizaje por refuerzo en líneas de espera controladas a tiempo discreto. CIENCIA ergo-sum, [S.l.], v. 32, Aug. 2024. ISSN 2395-8782. Available at: <https://cienciaergosum.uaemex.mx/article/view/22941>. Accessed: 15 Mar. 2026. doi: https://doi.org/10.30878/ces.v32n0a31.
Section
Espacio del divulgador
