An overview of reinforcement learning in discrete-time controlled queueing systems
Abstract
This article presents an overview of Markov decision processes through a model of a controlled queueing system. The model is studied under an average-cost optimality criterion, for which the existence of optimal stationary policies is shown. The optimal cost and the optimal policy are then computed via dynamic programming and via Q-learning. Finally, numerical experiments are provided that validate the results obtained by dynamic programming and Q-learning, and the solutions produced by the two techniques are compared.
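To make the two techniques named above concrete, the sketch below illustrates them on a hypothetical discrete-time controlled queue: a single server with capacity K, at most one arrival per slot, and a choice between a slow and a fast service rate. This is a minimal illustration, not the authors' implementation (their algorithms are available in Portillo-Ramírez, 2024); every model parameter here (K, ARRIVAL_P, SERVICE_P, SERVICE_COST, HOLD_COST) is an invented placeholder, and the learning routine is RVI Q-learning in the spirit of Abounadi et al. (2001), one standard way to adapt Q-learning to the average-cost criterion.

```python
# Minimal sketch (assumed model, not the article's): average-cost control of a
# discrete-time single-server queue with capacity K. State x = queue length;
# action a picks a service probability; cost = holding cost + service cost.
import numpy as np

rng = np.random.default_rng(0)

K = 10                      # queue capacity (illustrative)
ARRIVAL_P = 0.4             # P(one arrival per slot)
SERVICE_P = [0.3, 0.7]      # P(service completion) for actions 0 (slow), 1 (fast)
SERVICE_COST = [0.0, 2.0]   # per-slot cost of each action
HOLD_COST = 1.0             # per-customer holding cost per slot
S, A = K + 1, len(SERVICE_P)

def build_model():
    """Transition kernel P[a, x, y] and one-stage cost c[x, a]."""
    P, c = np.zeros((A, S, S)), np.zeros((S, A))
    for a in range(A):
        for x in range(S):
            c[x, a] = HOLD_COST * x + SERVICE_COST[a]
            dep = SERVICE_P[a] if x > 0 else 0.0
            for arr, p_arr in ((1, ARRIVAL_P), (0, 1 - ARRIVAL_P)):
                for d, p_dep in ((1, dep), (0, 1 - dep)):
                    y = min(max(x + arr - d, 0), K)  # arrivals beyond K are lost
                    P[a, x, y] += p_arr * p_dep
    return P, c

def relative_value_iteration(P, c, tol=1e-10, ref=0):
    """Dynamic programming: relative value iteration for the average-cost MDP."""
    h = np.zeros(S)
    while True:
        Qh = c.T + P @ h                 # shape (A, S): c(x, a) + E[h(next state)]
        Th = Qh.min(axis=0)
        rho = Th[ref]                    # gain estimate at the reference state
        h_new = Th - rho                 # subtract the gain to keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            return rho, Qh.argmin(axis=0)
        h = h_new

def rvi_q_learning(P, c, steps=500_000, eps=0.1, ref=(0, 0)):
    """Tabular RVI Q-learning: Q[ref] plays the role of the average-cost estimate."""
    Q, x = np.zeros((S, A)), 0
    for n in range(1, steps + 1):
        a = rng.integers(A) if rng.random() < eps else Q[x].argmin()
        y = rng.choice(S, p=P[a, x])     # simulate one transition of the queue
        alpha = 10.0 / (100.0 + n)       # slowly decreasing step size
        target = c[x, a] + Q[y].min() - Q[ref]
        Q[x, a] += alpha * (target - Q[x, a])
        x = y
    return Q[ref], Q.argmin(axis=1)

P, c = build_model()
rho_dp, pi_dp = relative_value_iteration(P, c)
rho_ql, pi_ql = rvi_q_learning(P, c)
print("dynamic programming: average cost %.4f, policy %s" % (rho_dp, pi_dp))
print("RVI Q-learning:      average cost %.4f, policy %s" % (rho_ql, pi_ql))
```

Under parameters like these, the two routines should return approximately the same optimal average cost and the same (typically threshold-type) policy, which is the kind of agreement the article's numerical experiments examine.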
Article Details
How to cite
PORTILLO-RAMÍREZ, Gustavo; CRUZ-SUÁREZ, Hugo; VELASCO-LUNA, Fernando.
Un panorama general del aprendizaje por refuerzo en líneas de espera controladas a tiempo discreto.
CIENCIA ergo-sum, [S.l.], v. 32, Aug. 2024.
ISSN 2395-8782.
Available at: <https://cienciaergosum.uaemex.mx/article/view/22941>. Accessed: 25 June 2025.
doi: https://doi.org/10.30878/ces.v32n0a31.
Section
Espacio del divulgador

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Abdulla, M. S., & Bhatnagar, S. (2007). Reinforcement learning based algorithms for average cost Markov decision processes. Discrete Event Dynamic Systems, 17(1), 23-52.
Abounadi, J., Bertsekas, D., & Borkar, V. S. (2001). Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization, 40(3), 681-698. doi:10.1137/S0363012999361974
Afolalu, S. A., Ikumapayi, O. M., Abdulkareem, A., Emetere, M. E., & Adejumo, O. (2021). A short review on queuing theory as a deterministic tool in sustainable telecommunication systems. Materials Today: Proceedings, 44, 2884-2888. doi:10.1016/j.matpr.2021.01.092
Alfa, A. S. (2016). Applied discrete-time queues (2nd ed.). Springer New York.
Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., & Marcus, S. I. (1993). Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM Journal on Control and Optimization, 31(2), 282-344. doi:10.1137/0331018
Bellman, R. (1957). Dynamic programming. Princeton University Press.
Bertsekas, D. (1995). Dynamic programming and optimal control: Volume I. Athena Scientific.
Boucherie, R. J., & Van Dijk, N. M. (Eds.) (2017). Markov decision processes in practice (Vol. 248). Cham, Switzerland: Springer.
Cao, X. R. (2021). Foundations of average-cost nonhomogeneous controlled Markov chains. Springer International Publishing.
Feinberg, E. A., & Shwartz, A. (Eds.). (2012). Handbook of Markov decision processes: methods and applications (Vol. 40). Springer Science & Business Media.
Fomundam, S., & Herrmann, J. W. (2007). A survey of queuing theory applications in healthcare.
Gosavi, A. (2008). On step sizes, stochastic shortest paths, and survival probabilities in reinforcement learning. In 2008 Winter Simulation Conference (pp. 525-531). IEEE. doi:10.1109/WSC.2008.4736109
Gosavi, A. (2015). Simulation-based optimization: Parametric optimization techniques and reinforcement learning. Springer.
Hamilton, M. A., Jaradat, R., Jones, P., Wall, E. S., Dayarathna, V. L., Ray, D., & Hsu, G. S. E. (2018, June). Immersive virtual training environment for teaching single- and multi-queuing theory: Industrial engineering queuing theory concepts. In 2018 ASEE Annual Conference & Exposition. doi:10.18260/1-2--30597
Hernández-Hernández, D., & Minjárez-Sosa, J. A. (Eds.). (2012). Optimization, control, and applications of stochastic systems: in honor of Onésimo Hernández-Lerma. Springer Science & Business Media.
Hillier, F. S. (2005). Introduction to operations research. McGraw-Hill, New York, NY, USA.
Kitaev, M. Y., & Rykov, V. V. (1995). Controlled queueing systems. CRC Press.
Lari Dashtbayaz, M., ZolfagharArani, M. H., & Akrami, K. (2019). Competing earnings announcements: Study of queuing theory of investor behavior in the analysis of earnings announcements. Financial Accounting Knowledge, 5(4), 103-126. doi:10.30479/JFAK.2019.1571
Lovas, A., & Rásonyi, M. (2021). Markov chains in random environment with applications in queuing theory and machine learning. Stochastic Processes and their Applications, 137, 294-326. doi:10.1016/j.spa.2021.04.002
Portillo-Ramírez, G. (2024). Algoritmos. Retrieved from https://docs.google.com/document/d/1Y8c_IVcC9ZDAe-WQCakOz6qrtkFrCMuN/edit?usp=sharing&ouid=109235866241196651564&rtpof=true&sd=true
Puterman, M. L. (2005). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/
Ross, S. M. (1970). Applied probability models with optimization applications. Dover Publications, New York.
Sennott, L. I. (2009). Stochastic dynamic programming and the control of queueing systems. John Wiley & Sons.
Sharma, O. P., & Gupta, U. C. (1982). Transient behaviour of an M/M/1/N queue. Stochastic Processes and their Applications, 13(3), 327-331. doi:10.1016/0304-4149(82)90019-9