Journal of Systems Engineering and Electronics, 2023, Vol. 34, Issue (3): 723-743. doi: 10.23919/JSEE.2023.000073
Chenggang SHAN1,2, Chuge WU1, Yuanqing XIA1, Zehua GUO1, Danyang LIU1, Jinhui ZHANG1,*
Received: 2022-09-23
Online: 2023-06-15
Published: 2023-06-30
Contact: Jinhui ZHANG
E-mail: uzz_scg@163.com; wucg@bit.edu.cn; xia_yuanqing@bit.edu.cn; guo@bit.edu.cn; liudanyang093@163.com; zhangjinh@bit.edu.cn
Chenggang SHAN, Chuge WU, Yuanqing XIA, Zehua GUO, Danyang LIU, Jinhui ZHANG. Adaptive resource allocation for workflow containerization on Kubernetes[J]. Journal of Systems Engineering and Electronics, 2023, 34(3): 723-743.
| Notation | Meaning |
| --- | --- |
| | A K8s node (VM), |
| | The |
| | The amount of CPU allocated by the adaptive resource allocation algorithm |
| | The amount of memory allocated by the adaptive resource allocation algorithm |
| | The accumulated amount of CPU requested by multiple task requests |
| | The accumulated amount of memory requested by multiple task requests |
| | The maximum remaining CPU on any single K8s cluster node |
| | The maximum remaining memory on any single K8s cluster node |
| | The total residual CPU across K8s cluster nodes |
| | The total residual memory across K8s cluster nodes |
| | The current task request with respect to |
| | A record of task-state data from the Redis database |
| | An interface for acquiring the pod list in the Informer component |
| | An interface for acquiring the node list in the Informer component |
| | A data dictionary storing the remaining resources (CPU and memory) of each node |
| | The accumulated CPU requirements of all pods on a node |
| | The accumulated memory requirements of all pods on a node |
| | The accumulated allocatable CPU across K8s cluster nodes |
| | The accumulated allocatable memory across K8s cluster nodes |
| | The residual CPU in the K8s cluster |
| | The residual memory in the K8s cluster |
| | A data structure from Podlist that contains many key fields describing a container's features |
| | The amount of CPU allocated to a task request based on (9) |
| | The amount of memory allocated to a task request based on (9) |
| | A proportional value derived from experience, |
| | A constant value derived from experience, |
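
The bookkeeping quantities listed in the notation table can be illustrated with a minimal sketch. It assumes that a node's remaining resources equal its allocatable amount minus the accumulated requests of the pods placed on it; this is an assumption for illustration, not the paper's stated formula. All names (`Pod`, `Node`, `residual_map`, `cluster_summary`), units, and numbers are hypothetical and do not come from the paper or from the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    cpu_request: int  # CPU requested by the pod, in millicores (assumed unit)
    mem_request: int  # memory requested by the pod, in MiB (assumed unit)


@dataclass
class Node:
    name: str
    cpu_allocatable: int
    mem_allocatable: int
    pods: list = field(default_factory=list)


def residual_map(nodes):
    """Remaining (CPU, memory) per node: allocatable minus accumulated pod requests."""
    remaining = {}
    for node in nodes:
        cpu_used = sum(p.cpu_request for p in node.pods)
        mem_used = sum(p.mem_request for p in node.pods)
        remaining[node.name] = (
            node.cpu_allocatable - cpu_used,
            node.mem_allocatable - mem_used,
        )
    return remaining


def cluster_summary(remaining):
    """Largest per-node remainder and cluster-wide total residual, for CPU and memory."""
    cpus = [cpu for cpu, _ in remaining.values()]
    mems = [mem for _, mem in remaining.values()]
    return {
        "max_cpu": max(cpus),
        "max_mem": max(mems),
        "total_cpu": sum(cpus),
        "total_mem": sum(mems),
    }


# Hypothetical two-node cluster.
nodes = [
    Node("node-1", 4000, 8192, [Pod(1000, 2048), Pod(500, 1024)]),
    Node("node-2", 8000, 16384, [Pod(2000, 4096)]),
]
remaining = residual_map(nodes)
summary = cluster_summary(remaining)
print(remaining)
print(summary)
```

The dictionary returned by `residual_map` plays the role of the per-node data dictionary in the table, while `cluster_summary` derives the maximum remaining and total residual amounts from it.
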
| Workflow type | Metric | Constant arrival (adaptive) | Constant arrival (baseline) | Linear arrival (adaptive) | Linear arrival (baseline) | Pyramid arrival (adaptive) | Pyramid arrival (baseline) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Number of workflow requests | 30 | 30 | 30 | 30 | 34 | 34 |
| | Interval between two request bursts/s | 300 | 300 | 300 | 300 | 300 | 300 |
| Montage | Total duration of all workflows/min (standard deviation) | 33.18 ( | 36.79 ( | 26.95 ( | 36.45 ( | 49.31 ( | 54.69 ( |
| | Average workflow duration/min (standard deviation) | 5.74 ( | 7.80 ( | 5.41 ( | 11.33 ( | 7.22 ( | 11.73 ( |
| | CPU resource usage (standard deviation) | 0.28 ( | 0.27 ( | 0.35 ( | 0.31 ( | 0.26 ( | 0.20 ( |
| | Memory resource usage (standard deviation) | 0.28 ( | 0.27 ( | 0.35 ( | 0.31 ( | 0.26 ( | 0.20 ( |
| Epigenomics | Total duration of all workflows/min (standard deviation) | 30.55 ( | 39.06 ( | 34.3 ( | 43.66 ( | 51.42 ( | 62.12 ( |
| | Average workflow duration/min (standard deviation) | 4.24 ( | 9.35 ( | 9.81 ( | 16.53 ( | 9.65 ( | 19.41 ( |
| | CPU resource usage (standard deviation) | 0.34 ( | 0.27 ( | 0.32 ( | 0.25 ( | 0.21 ( | 0.20 ( |
| | Memory resource usage (standard deviation) | 0.34 ( | 0.27 ( | 0.32 ( | 0.25 ( | 0.21 ( | 0.20 ( |
| CyberShake | Total duration of all workflows/min (standard deviation) | 38.30 ( | 50.29 ( | 34.06 ( | 49.46 ( | 46.76 ( | 66.41 ( |
| | Average workflow duration/min (standard deviation) | 9.19 ( | 17.29 ( | 9.41 ( | 20.61 ( | 4.94 ( | 19.47 ( |
| | CPU resource usage (standard deviation) | 0.26 ( | 0.24 ( | 0.27 ( | 0.24 ( | 0.22 ( | 0.19 ( |
| | Memory resource usage (standard deviation) | 0.26 ( | 0.24 ( | 0.27 ( | 0.23 ( | 0.22 ( | 0.19 ( |
| LIGO | Total duration of all workflows/min (standard deviation) | 30.82 ( | 52.17 ( | 44.02 ( | 53.87 ( | 45.26 ( | 63.56 ( |
| | Average workflow duration/min (standard deviation) | 4.26 ( | 21.15 ( | 16.21 ( | 28.05 ( | 4.20 ( | 14.07 ( |
| | CPU resource usage (standard deviation) | 0.40 ( | 0.24 ( | 0.28 ( | 0.23 ( | 0.31 ( | 0.23 ( |
| | Memory resource usage (standard deviation) | 0.40 ( | 0.24 ( | 0.28 ( | 0.23 ( | 0.31 ( | 0.23 ( |
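
One simple way to read the results table is the relative reduction in duration of the adaptive method against the baseline, computed as (baseline - adaptive) / baseline. The sketch below applies this to the total-duration figures in the constant-arrival columns. The helper name `relative_reduction` is hypothetical, and the derived percentages are an illustration computed from the tabulated means, not figures quoted from the paper.

```python
def relative_reduction(adaptive, baseline):
    """Fraction by which the adaptive value is lower than the baseline value."""
    return (baseline - adaptive) / baseline


# Total duration of all workflows/min under constant arrival,
# copied from the table as (adaptive, baseline) pairs.
total_duration = {
    "Montage": (33.18, 36.79),
    "Epigenomics": (30.55, 39.06),
    "CyberShake": (38.30, 50.29),
    "LIGO": (30.82, 52.17),
}

reductions = {
    name: relative_reduction(adaptive, baseline)
    for name, (adaptive, baseline) in total_duration.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

The same helper can be applied to the average-duration rows or to the linear and pyramid arrival columns by swapping in the corresponding pairs.
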