Welcome! nomad.green is a bit of a play on words: Nomad’s logo is, of course, green, and Nomad’s simplicity and lightweight footprint leave it uniquely positioned to be a “green” orchestrator. This space is dedicated to exploring HashiCorp Nomad as a sustainable orchestrator and scheduler.
Initially, it stands as a companion to the HashiConf EU session Sustainable Nomad, and includes references, citations and commentary that would not fit into a twenty-minute session.
References for HashiConf talk
The definitions provided for technical, operational, and environmental stability are borrowed from Bill Johnson’s excellent work on sustainability as it pertains to Site Reliability Engineering. In particular, his presentation at USENIX SRECon 2020 inspired us to address multiple aspects of sustainability instead of taking a purely operational approach.
Hidden operational costs
The study on on-call work and fatigue referenced in the talk is cited below. It is essential reading for anyone in an on-call rotation, and particularly for management in organizations that rely on on-call staff.
Ziebertz, C. M., van Hooff, M. L., Beckers, D. G., Hooftman, W. E., Kompier, M. A., & Geurts, S. A. (2015). The Relationship of On-Call Work with Fatigue, Work-Home Interference, and Perceived Performance Difficulties. BioMed Research International, 2015, 1–10. https://doi.org/10.1155/2015/643413
Toil and Complexity
The definition of toil we use is from Chapter 5 of Google’s Site Reliability Engineering book. It is called out in this post on the Google Cloud blog, in which Eric Harvieux discusses the practical challenges of identifying toil, along with his colleague Laura Beegle’s work investigating complexity within Google. From this, we get our concept of experienced complexity, and the particularly insightful observation that “the observable outcome of well-managed system complexity is a better user experience.”
Complexity and Culture
“What are your top challenges to using and deploying containers?” The data cited in the talk (41% complexity, 41% cultural changes with the development team) is from the CNCF Survey Report from 2020.
Nomad allows practitioners and organizations to decouple modern scheduling and orchestration from containerization. HashiCorp’s Nomad documentation and Learn guides cover these topics in depth. General links provided; more specific information to follow.
Where Your Workloads Are
The figures cited here are also from the CNCF Survey Report, 2020.
Memory oversubscription in action is shown in Nomad’s Topology Visualization.
For the sake of brevity, the talk on sustainability said little about autoscaling, in part because the topic has already received many excellent treatments from Nomad contributors, including James Rasell’s talk at HashiConf EU the day prior.
Other relevant work
Some work on carbon-aware scheduling has been done on the Kubernetes side, but it is in its infancy.
- Bill Johnson (again!) discusses the topic of carbon-aware Kubernetes and sketches out deployment strategies.
- Research: A Low-Carbon Kubernetes Scheduler
This talk is something of a (serendipitous) spiritual successor to Seth Vargo’s talk on the Ecological Impact of Compute from a few years ago. Thanks to Nic Jackson for pointing this out!
Ecosystem and quality of life
A major component of operational sustainability is ecosystem and community. These are critical areas for Nomad in the coming months, and you can be a part of these efforts.
The clearest immediate future directions in this area are cost-aware scheduling that uses compute cost as a carbon proxy; spot instance markets, which take advantage of otherwise wasted compute resources; and architecting green-field infrastructure around low-effort cost savings, e.g., running non-critical batch jobs on follow-the-sun schedules. This section is a work in progress.
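As a rough illustration of the follow-the-sun idea, the sketch below picks a region for a non-critical batch job by preferring regions where it is currently daytime and, among those, the one with the lowest hourly cost (standing in as a carbon proxy). Everything here is hypothetical: the `Region` type, the fixed UTC offsets, the 08:00 to 18:00 daytime window, and the prices are assumptions made for the example, and none of it is a Nomad API or feature.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical region record; not part of Nomad."""
    name: str
    utc_offset_hours: int   # assumed fixed offset; ignores daylight saving
    cost_per_hour: float    # assumed price, used here as a carbon proxy


def local_hour(utc_hour: int, region: Region) -> int:
    """Convert an hour of the day in UTC to the region's local hour."""
    return (utc_hour + region.utc_offset_hours) % 24


def is_daytime(utc_hour: int, region: Region, start: int = 8, end: int = 18) -> bool:
    """True when the region's local hour falls inside the assumed daytime window."""
    return start <= local_hour(utc_hour, region) < end


def pick_region(regions: Sequence[Region], utc_hour: int) -> Region:
    """Prefer regions currently in daytime, then take the cheapest.

    If no region is in daytime, fall back to the cheapest region overall.
    """
    daytime = [r for r in regions if is_daytime(utc_hour, r)]
    candidates = daytime or list(regions)
    return min(candidates, key=lambda r: r.cost_per_hour)


if __name__ == "__main__":
    regions = [
        Region("eu", 1, 0.10),
        Region("us", -5, 0.08),
        Region("ap", 9, 0.12),
    ]
    for hour in (2, 10, 15):
        print(hour, pick_region(regions, hour).name)
```

A real scheduler would want actual carbon-intensity or pricing data rather than a static table; the point is only that "cheapest region where the sun is up" is a small, low-effort placement rule.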
- The Nomad Project - HashiCorp
- Discuss - Nomad Forums - HashiCorp
- Nomad Gitter, the indomitable
- And more!
Thanks to HashiCorp and the Nomad Engineering team for making this all possible, and continuing to push the boundaries of what an inspired team can accomplish. I would particularly like to thank Melissa Gurney Greene, Nic Jackson, Jacquie Grindrod, Taylor Dolezal, and Jono Solsulska on the Dev Rel team for soundboarding this talk; James Rasell and Luiz Aoqui of the Nomad Ecosystem team for their work on Nomad’s Autoscaler and Dynamic Application Sizing; and Michael Schurter for helping us all believe.
There’s no Nomad without the community. We 💚 you.
I am a human named Gale. If you need to, you can get in touch with me in a variety of ways: