Cost-Efficient Resilient Data Engineering Workloads Using Preemptible Resources
DOI:
https://doi.org/10.47941/ijce.3038
Keywords:
Preemptible Computing, Cost Optimization, Resilient Architecture, Data Engineering, Cloud Resources
Abstract
This article examines how organizations can reduce cloud computing costs by running resilient data engineering workloads on preemptible resources. By using the discounted but ephemeral compute offerings of major cloud providers, enterprises can achieve significant cost reductions while maintaining operational reliability. The discussion covers the fundamental characteristics of preemptible computing resources, architectural patterns for resilient data processing, case studies of successful ETL workload optimizations, and applications to machine learning training. Key findings demonstrate that properly designed resilient architectures can withstand interruptions while preserving processing integrity, so organizations can capture substantial cost advantages through partitioning, checkpointing, and stateless processing patterns. Beyond direct economic benefits, these architectural approaches also contribute to stronger security postures, improved disaster recovery capabilities, and more efficient resource utilization across enterprise computing environments. Together they give technical leaders a framework for balancing cost optimization with operational resilience in increasingly complex cloud ecosystems.
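The partitioning and checkpointing patterns named in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the names `Preempted`, `load_checkpoint`, `save_checkpoint`, and `run_job`, and the JSON checkpoint file on local disk, are all assumptions made for illustration. A real deployment would more likely keep the checkpoint in durable object storage and react to the provider's own preemption notice.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Hypothetical stand-in for a cloud provider's preemption notice."""


def load_checkpoint(path):
    """Return the set of partition ids already completed (empty if no file)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_checkpoint(path, done):
    """Write the completed set via a temp file and an atomic rename,
    so an interruption mid-write cannot leave a torn checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_job(partitions, process_partition, checkpoint_path):
    """Process each partition, skipping those recorded in the checkpoint.

    `process_partition` is assumed to be stateless and idempotent: a partition
    that was in flight when the instance was reclaimed is simply run again.
    """
    done = load_checkpoint(checkpoint_path)
    for pid in partitions:
        if pid in done:
            continue  # finished before an earlier interruption
        process_partition(pid)
        done.add(pid)
        save_checkpoint(checkpoint_path, done)
    return done
```

If a run is interrupted partway through, calling `run_job` again with the same checkpoint path resumes at the first unfinished partition, so only the partition that was in flight is repeated.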
License
Copyright (c) 2025 Vishal Mukeshbhai Shah

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.