Welcome! nomad.green is a bit of a play on words: Nomad’s logo is, of course, green, and Nomad’s simplicity and lightweight footprint leave it uniquely suited to being a “green” orchestrator. This space is dedicated to exploring HashiCorp Nomad as a sustainable orchestrator and scheduler.

Initially, it stands as a companion to the HashiConf EU session Sustainable Nomad, and includes references, citations and commentary that would not fit into a twenty-minute session.

References for HashiConf talk

Sustainability

The definitions provided for technical, operational, and environmental sustainability are borrowed from Bill Johnson’s excellent work on sustainability as it pertains to Site Reliability Engineering. In particular, his presentation at USENIX SRECon 2020 inspired us to address multiple aspects of sustainability instead of taking a purely operational approach.

Hidden operational costs

The study about on-call work and fatigue referenced in the talk can be found here. This is essential reading for anyone in an on-call rotation, and particularly for management in organizations that rely on on-call staff.

Citation:

Ziebertz, C. M., van Hooff, M. L., Beckers, D. G., Hooftman, W. E., Kompier, M. A., & Geurts, S. A. (2015). The Relationship of On-Call Work with Fatigue, Work-Home Interference, and Perceived Performance Difficulties. BioMed Research International, 2015, 1–10. https://doi.org/10.1155/2015/643413

Toil and Complexity

The definition of toil we use is from Chapter 5 of Google’s Site Reliability Engineering book. It is called out in this post on the Google Cloud blog, in which Eric Harvieux discusses the practical challenges around identifying toil, and his colleague Laura Beegle’s work around investigating complexity within Google. From this, we get our concept of experienced complexity, and the particularly insightful observation that “the observable outcome of well-managed system complexity is a better user experience.”

Complexity and Culture

“What are your top challenges to using and deploying containers?” The data cited in the talk (41% complexity, 41% cultural changes with the development team) is from the CNCF Survey Report from 2020.

Migrating Sustainably

Nomad allows practitioners and organizations to decouple modern scheduling and orchestration from containerization. HashiCorp’s Nomad documentation and Learn guides cover these topics in depth. General links provided; more specific information to follow.

Where Your Workloads Are

The figures cited here are also from the CNCF Survey Report, 2020.

Maximizing Efficiency

Visualizing Efficiency

Memory oversubscription in action is shown in Nomad’s Topology Visualization.

Autoscaling

For the sake of brevity, the talk on sustainability said little about autoscaling, in part because the topic has already been given many excellent treatments by Nomad contributors, including James Rasell’s talk at HashiConf EU the day prior.

Notes

Other relevant work

Some work on carbon-aware scheduling has been done on the Kubernetes side, but it is in its infancy.

Prior Art

This talk is something of a (serendipitous) spiritual successor to Seth Vargo’s talk on the Ecological Impact of Compute from a few years ago. Thanks to Nic Jackson for pointing this out!

Ecosystem and quality of life

A major component of operational sustainability is ecosystem and community. These are critical areas for Nomad in the coming months, and you can be a part of these efforts.

Future Directions

The clearest immediate future directions in this area are cost-aware scheduling that uses compute cost as a proxy for carbon; spot instance markets, which take advantage of otherwise wasted compute resources; and architecting green-field infrastructure to take advantage of low-effort cost savings, e.g., running non-critical batch jobs on follow-the-sun schedules. This section is a work in progress.
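As a rough illustration of the cost-as-carbon-proxy idea, here is a minimal Python sketch. It is not Nomad’s scheduler and does not use any Nomad API; the `Placement` type, the `pick_placement` function, the region names, and the prices are all hypothetical and exist only to show the shape of the decision: prefer the cheapest capacity, and keep critical work off interruptible spot capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A hypothetical candidate location for a job (not a Nomad type)."""
    region: str
    hourly_cost: float  # price per node-hour, treated here as a rough carbon proxy
    is_spot: bool       # True if this is otherwise-idle, interruptible spot capacity


def pick_placement(candidates: list[Placement], critical: bool) -> Placement:
    """Return the cheapest eligible candidate.

    Critical jobs are kept off spot capacity because it can be reclaimed;
    non-critical batch jobs may use it.
    """
    eligible = [c for c in candidates if not (critical and c.is_spot)]
    if not eligible:
        raise ValueError("no eligible placement for this job")
    return min(eligible, key=lambda c: c.hourly_cost)


# Made-up example data; real prices and regions would come from a provider.
candidates = [
    Placement("region-a", hourly_cost=0.40, is_spot=False),
    Placement("region-b", hourly_cost=0.12, is_spot=True),
    Placement("region-c", hourly_cost=0.31, is_spot=False),
]

batch_choice = pick_placement(candidates, critical=False)
critical_choice = pick_placement(candidates, critical=True)
```

With this made-up data, the non-critical batch job lands on the spot capacity in `region-b`, while the critical job falls back to the cheapest on-demand option. Price is only a loose stand-in for carbon, so a real policy would likely combine it with other signals.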

Links

Gratitude

Nomad

Thanks to HashiCorp and the Nomad Engineering team for making this all possible, and continuing to push the boundaries of what an inspired team can accomplish. I would particularly like to thank Melissa Gurney Greene, Nic Jackson, Jacquie Grindrod, Taylor Dolezal, and Jono Solsulska on the Dev Rel team for soundboarding this talk; James Rasell and Luiz Aoqui of the Nomad Ecosystem team for their work on Nomad’s Autoscaler and Dynamic Application Sizing; and Michael Schurter for helping us all believe.

Community

There’s no Nomad without the community. We 💚 you.

Curator

I am a human named Gale. If you need to, you can get in touch with me in a variety of ways: