An overview of reinforcement learning in discrete-time controlled queueing systems
Abstract
This article presents an overview of Markov decision processes through a controlled queueing model. The model is studied under the average-cost optimality criterion, for which the existence of optimal stationary policies is established. The optimal cost and the optimal policy are then computed via dynamic programming and Q-learning. Finally, numerical experiments are reported that validate the results obtained by both techniques, together with a comparison of the solutions they produce.
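The two solution techniques compared in the article can be sketched on a toy controlled queue. The snippet below is a minimal illustration, not the model or code used in the article: all parameters (buffer capacity, arrival probability, service options, costs) are invented for the example. It computes the optimal average cost by relative value iteration (dynamic programming) and then estimates it model-free with relative Q-learning, in the spirit of the average-cost scheme of Abounadi et al. (2001) cited in the references.

```python
import random

# Illustrative discrete-time controlled queue (all parameters are assumptions,
# not the model studied in the article).
CAP = 10                   # buffer capacity
P_ARRIVAL = 0.4            # per-slot arrival probability
SERVICE_P = [0.5, 0.8]     # service-completion probability for each action
SERVICE_COST = [1.0, 3.0]  # per-slot cost of using each action
HOLD_COST = 1.0            # holding cost per queued customer

def one_step_cost(s, a):
    return HOLD_COST * s + SERVICE_COST[a]

def transitions(s, a):
    """Yield (probability, next_state) pairs for one time slot."""
    for depart in (0, 1):
        pd = (SERVICE_P[a] if depart else 1 - SERVICE_P[a]) if s > 0 else (1 - depart)
        if pd == 0:
            continue
        for arrive in (0, 1):
            pa = P_ARRIVAL if arrive else 1 - P_ARRIVAL
            yield pd * pa, min(max(s - depart + arrive, 0), CAP)

def relative_value_iteration(iters=1000):
    """Dynamic programming: relative value iteration for the average cost."""
    h = [0.0] * (CAP + 1)
    g = 0.0
    for _ in range(iters):
        Th = [min(one_step_cost(s, a) + sum(p * h[s2] for p, s2 in transitions(s, a))
                  for a in range(len(SERVICE_P)))
              for s in range(CAP + 1)]
        g = Th[0]               # normalize at reference state 0
        h = [v - g for v in Th]
    return g, h

def relative_q_learning(steps=300_000, alpha=0.01, eps=0.1, seed=0):
    """Model-free counterpart: Q-learning with a reference pair subtracted
    so the iterates stay bounded under the average-cost criterion."""
    rng = random.Random(seed)
    Q = [[0.0] * len(SERVICE_P) for _ in range(CAP + 1)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(len(SERVICE_P)) if rng.random() < eps else min(
            range(len(SERVICE_P)), key=lambda i: Q[s][i])
        # Sample one slot instead of using the transition model.
        depart = 1 if (s > 0 and rng.random() < SERVICE_P[a]) else 0
        arrive = 1 if rng.random() < P_ARRIVAL else 0
        s2 = min(max(s - depart + arrive, 0), CAP)
        Q[s][a] += alpha * (one_step_cost(s, a) - Q[0][0] + min(Q[s2]) - Q[s][a])
        s = s2
    return Q

g, _ = relative_value_iteration()
Q = relative_q_learning()
print("DP optimal average cost :", round(g, 3))
print("Q-learning estimate     :", round(Q[0][0], 3))
```

With this normalization, the reference entry `Q[0][0]` converges toward the optimal average cost, so the two printed values can be compared directly, mirroring the validation performed in the article.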
Article Details
How to cite
PORTILLO-RAMÍREZ, Gustavo; CRUZ-SUÁREZ, Hugo; VELASCO-LUNA, Fernando.
Un panorama general del aprendizaje por refuerzo en líneas de espera controladas a tiempo discreto.
CIENCIA ergo-sum, [S.l.], v. 32, Aug. 2024.
ISSN 2395-8782.
Available at: <https://cienciaergosum.uaemex.mx/article/view/22941>. Accessed: 15 Mar. 2026.
doi: https://doi.org/10.30878/ces.v32n0a31.
Section
Espacio del divulgador

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Abdulla, M. S., & Bhatnagar, S. (2007). Reinforcement learning based algorithms for average cost Markov decision
processes. Discrete Event Dynamic Systems, 17(1), 23-52.
Abounadi, J., Bertsekas, D., & Borkar, V. S. (2001). Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization, 40(3), 681-698. https://doi.org/10.1137/S036301299936197
Afolalu, S. A., Ikumapayi, O. M., Abdulkareem, A., Emetere, M. E., & Adejumo, O. (2021). A short review on
queuing theory as a deterministic tool in sustainable telecommunication systems. Materials Today: Proceedings,
44, 2884-2888. https://doi.org/10.1016/j.matpr.2021.01.092
Alfa, A. S. (2016). Applied discrete-time queues (2nd ed.). Springer New York.
Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., & Marcus, S. I. (1993). Discrete-time
controlled Markov processes with average cost criterion: A survey. SIAM Journal on Control and Optimization,
31(2), 282-344. https://doi.org/10.1137/0331018
Bellman, R. (1957). Dynamic programming. Princeton University Press.
Bertsekas, D. (1995). Dynamic programming and optimal control: Volume I. Athena Scientific.
Boucherie, R. J., & Van Dijk, N. M. (Eds.) (2017). Markov Decision Processes in Practice. Springer.
Cao, X. R. (2021). Foundations of average-cost nonhomogeneous controlled Markov chains. Springer International
Publishing.
Feinberg, E. A., & Shwartz, A. (Eds.). (2012). Handbook of Markov decision processes: methods and applications.
Springer Science & Business Media New York.
Fomundam, S., & Herrmann, J. W. (2007). ISR Technical Report 2007-24. A survey of queuing theory applications
in healthcare. Institute for Systems Research, 1-22.
Gosavi, A. (2015). Simulation-based optimization: parametric optimization techniques and reinforcement learning.
Springer.
Gosavi, A. (2008). On step sizes, stochastic shortest paths, and survival probabilities in reinforcement learning. In 2008 Winter Simulation Conference (pp. 525-531). IEEE. https://doi.org/10.1109/WSC.2008.4736109
Hamilton, M. A., Jaradat, R., Jones, P., Wall, E. S., Dayarathna, V. L., Ray, D., & Hsu, G. S. E. (2018). Immersive
virtual training environment for teaching single-and multi-queuing theory: Industrial engineering queuing
theory concepts. In 2018 ASEE Annual Conference & Exposition. https://doi.org/10.18260/1-2--30597
Hernández-Hernández, D., & Minjárez-Sosa, J. A. (Eds.). (2012). Optimization, control, and applications of stochastic
systems: in honor of Onésimo Hernández-Lerma. Birkhäuser.
Hillier, F. S. (2005). Introduction to operations research. McGraw-Hill Science Engineering.
Lari Dashtbayaz, M., ZolfagharArani, M. H., & Akrami, K. (2019). Competing earnings announcements: Study of queuing theory of investor behavior in the analysis of earnings announcements. Journal of Financial Accounting Knowledge, 5(4), 103-126. https://doi.org/10.30479/JFAK.2019.1571
Lovas, A., & Rásonyi, M. (2021). Markov chains in random environment with applications in queuing theory
and machine learning. Stochastic Processes and their Applications, 137, 294-326. https://doi.org/10.1016/j.spa.2021.04.002
Portillo-Ramírez, G. (2024). Un ejemplo de aplicación del aprendizaje por refuerzo en líneas de espera controladas.
https://docs.google.com/document/d/1Y8c_IVcC9ZDAe-WQCakOz6qrtkFrCMuN/edit?pli=1&tab=t.0
Puterman, M. L. (2005). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
R Project (2022). The R Project for Statistical Computing. https://www.R-project.org/
Ross, S. M. (1970). Applied probability models with optimization applications. Dover Publications, New York.
Rykov, V., & Kitaev, M. Y. (1995). Controlled queueing systems. Journal of Applied Mathematics and Stochastic
Analysis, 8(4), 433–435. https://doi.org/10.1155/S1048953395000414
Sennott, L. I. (2009). Stochastic dynamic programming and the control of queueing systems. John Wiley & Sons.
Sharma, O. P., & Gupta, U. C. (1982). Transient behaviour of an M/M/1/N queue. Stochastic Processes and their Applications, 13(3), 327-331. https://doi.org/10.1016/0304-4149(82)90019-9
http://orcid.org/0000-0002-2457-2033