Cost-Efficient Resilient Data Engineering Workloads Using Preemptible Resources

Authors

  • Vishal Mukeshbhai Shah, International Institute of Information Technology

DOI:

https://doi.org/10.47941/ijce.3038

Keywords:

Preemptible Computing, Cost Optimization, Resilient Architecture, Data Engineering, Cloud Resources

Abstract

This article examines how organizations can reduce cloud computing costs by running resilient data engineering workloads on preemptible resources. By using the discounted but ephemeral compute offerings of major cloud providers, enterprises can achieve significant cost reductions while maintaining operational reliability. The discussion covers the fundamental characteristics of preemptible computing resources, architectural patterns for resilient data processing, case studies of successful ETL workload optimizations, and applications to machine learning training. Key findings demonstrate that properly designed resilient architectures can withstand interruptions while preserving processing integrity, so that organizations can capture substantial cost advantages through partitioning, checkpointing, and stateless processing patterns. The article further explores how these architectural approaches deliver direct economic benefits and also contribute to enhanced security postures, improved disaster recovery capabilities, and more efficient resource utilization across enterprise computing environments. Together, these results provide a framework for technical leaders seeking to balance cost optimization with operational resilience in increasingly complex cloud ecosystems.
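
The partitioning, checkpointing, and stateless processing patterns named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the names `run`, `load_done`, `save_done`, `Preempted`, and the `fail_after` parameter are hypothetical, the checkpoint is a local JSON file standing in for durable storage, and preemption is simulated by raising an exception.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Hypothetical stand-in for the provider reclaiming the instance mid-run."""


def load_done(path):
    """Return the set of partition ids already recorded as complete."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_done(path, done):
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run(partitions, checkpoint_path, process, fail_after=None):
    """Process every pending partition, checkpointing after each one.

    Each partition is handled independently of the others (stateless),
    so rerunning after an interruption only redoes unfinished work.
    fail_after is an illustrative knob that simulates a preemption.
    """
    done = load_done(checkpoint_path)
    results = {}
    handled = 0
    for pid, rows in partitions.items():
        if pid in done:
            continue  # completed before an earlier interruption
        if fail_after is not None and handled == fail_after:
            raise Preempted(pid)
        # Assumption: a real job would write this output to durable storage.
        results[pid] = process(rows)
        done.add(pid)
        save_done(checkpoint_path, done)
        handled += 1
    return results


def demo():
    """Run once with a simulated preemption, then resume from the checkpoint."""
    partitions = {"p0": [1, 2], "p1": [3, 4], "p2": [5, 6]}
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "checkpoint.json")
        try:
            run(partitions, ckpt, sum, fail_after=2)
        except Preempted:
            pass  # the replacement instance picks up from the checkpoint
        resumed = run(partitions, ckpt, sum)
        return sorted(load_done(ckpt)), resumed
```

In this sketch the second call to `run` skips the two partitions completed before the simulated interruption and processes only the remaining one. Under these assumptions, that is how checkpointing bounds the work lost to a preemption to a single partition.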

Published

2025-07-28

How to Cite

Shah, V. M. (2025). Cost-Efficient Resilient Data Engineering Workloads Using Preemptible Resources. International Journal of Computing and Engineering, 7(18), 1–11. https://doi.org/10.47941/ijce.3038

Issue

Vol. 7, No. 18 (2025)

Section

Articles