Building Green Software by Anne Currie, Sarah Hsu, and Sara Bergman, published by O'Reilly, is available here under a CC BY-NC-ND Creative Commons license, i.e., you can read it and quote it for noncommercial purposes as long as you attribute the source (O'Reilly's book) and don't use it to produce derivative works.
You can buy the book from good bookstores, including Amazon in all regions (currently on offer in the UK), or read it on the O'Reilly site if you are an O'Reilly subscriber.
Co-Benefits: What Is The Real Battle?
“The greatest victory is that which requires no battle.” ― Sun Tzu, The Art of War.
We would be lying if we said pursuing green software was an easy quest. We would need to hang our heads in shame if we claimed convincing others about the inevitability of sustainability in the tech industry is a simple task. We would have to have a serious conversation with ourselves if we didn’t acknowledge that sustainability has now taken a backseat for many organizations during the current economic downturn.
Our original intention for this chapter was to delve deeper into the multidisciplinary aspects of eco-friendly computing and explore the co-benefits of sustainability. By now, you likely understand the interdisciplinary nature of green software, where all areas of concern in software engineering are interconnected, and achieving a delicate balance among them is no easy task.
However, we cannot ignore the unprecedented challenges that most climate advocates have faced over the past several months. For the first time in history, we are witnessing climate records being broken left, right, and center. Yet, investments in addressing climate change are still struggling to become a top priority for many organizations, including those in the tech sector. (Legislation worldwide is slowly catching up to get everyone in the right gear, but the progress is just not happening fast enough).
Therefore, we are shifting the attention of this chapter slightly. While we will still cover the positive side effects of implementing carbon efficiency in software, we will also provide you with a practical tool—a mental model—to help you demonstrate to your friends, colleagues, managers, and perhaps even your CTO that adopting eco-friendly practices in software operations and development not only benefits the environment (duh!) but also enhances software system performance, reliability, and resilience while reducing costs.
So, hold on tight—this chapter is going to be an interesting one!
We are kicking things off with the topic that will quite literally get you the most bang for your buck: cost saving as an advantage of going green.
We assessed the pros and cons of using cost as a proxy measurement for carbon emissions in Chapter 10. In this section, we will examine how and why embracing sustainable practices can enable you to build a budget-optimized product and vice versa.
A cost-effective workload is one that meets all business requirements with the lowest possible expenditure. So, how does a cost-effective workload contribute to sustainability efforts? Let’s find out!
<Sidebar>There are several well-defined patterns for cost optimization when constructing a budget-friendly product that also delivers business value, for example, the guidelines provided by public cloud providers like AWS, Microsoft, and GCP. Instead of listing them all here, we will be cherry-picking the ones that also have environmental gains.</Sidebar>
Get a good fit.
A great way to be cost-effective is to select the resource that best fits your requirements and then use it as it was intended to be used. For example, suppose you have chosen the cloud provider GCP as your host. In that case, it's worth the effort to explore GCP’s cloud-native solutions rather than treating the cloud as infrastructure-as-a-service (IaaS), where you build, deploy, and manage everything yourself. That isn’t how GCP is intended to be used: it wouldn’t reduce your ops expenditure by much (if anything), and it wouldn’t take advantage of the huge investments Google has put into its services.
<Sidebar>The term “cloud-native” refers to solutions specifically designed to operate within a public cloud environment right from their very start. These solutions typically leverage technologies like containers, microservices, and, most importantly, services. </Sidebar>
Let’s look at an example of what we mean. Imagine you've been tasked with creating a distributed tracing solution that supports applications in the GCP ecosystem. In this scenario, the best-fitted choice for the data ingestion backend would usually be Google’s Cloud Trace service, which was designed for that purpose. A more labor-intensive option would be to manually configure and deploy Grafana Tempo on top of Google Cloud Storage.
By opting for the official service, you're saving on operational and hosting expenses and reducing carbon emissions because that service has been optimized for efficiency in that environment. You are using GCP as it was intended and benefiting from the costly efficiency work Google has already put into it, as well as the work it will do in the future.
<Sidebar>Distributed tracing is the response of DevOps and SREs to the complexity that arises from migration to microservices environments. It enables us to trace a request as it moves through an application. For example, it helps us track a request from the front-end to multiple microservices in the back-end and finally to the database.</Sidebar>
Another important best practice for getting a great operational fit for your systems is to use dynamic resource allocation to prevent expensive overprovisioning. For instance, implementing autoscaling wherever possible to accommodate the fluctuating demands of a workload. This approach not only reduces costs but also contributes to carbon efficiency (as we discussed in Chapter 4). Removing idle or underutilized resources is even easier and has the same effect of reducing cost and increasing carbon efficiency.
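To make dynamic resource allocation concrete, here is a minimal Python sketch of the proportional scaling rule that autoscalers such as the Kubernetes Horizontal Pod Autoscaler apply. The function name, target utilization, and replica limits are our own illustrative choices, not any provider's API:

```python
import math

def desired_replicas(current_replicas: int, current_cpu: float,
                     target_cpu: float = 0.6, min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Scale so average CPU utilization approaches the target.

    Replicas grow or shrink with the ratio of observed to target
    utilization, clamped to sensible bounds. When demand drops, the
    same rule releases machines instead of leaving them idle.
    """
    wanted = math.ceil(current_replicas * (current_cpu / target_cpu))
    return max(min_replicas, min(max_replicas, wanted))
```

For instance, four replicas running at 90% CPU against a 60% target would scale up to six, while the same four replicas at 15% CPU would scale down to one, returning the spare hardware (and its electricity) to the pool.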
So, how should we tie together the above? FinOps, from the Linux Foundation, was created to help organizations regain control of spiraling cloud computing expenses. It represents a collaborative effort to reduce cloud costs that bridges multiple disciplines, including technology, finance, and business.
FinOps and GreenOps are like sisters from another mother. Both are about optimizing software systems to reduce machine requirements and expensive electricity use. According to Pini Reznik of GreenOps consultancy re:cinq, folks in the cloud can cut bills by up to 50% by using FinOps and GreenOps best practice tuning and optimization.
Going green really does save you bucks in the bank.
The second point to ponder is reliability and resilience.
Before we dive into the specifics of why and how operating a reliable and resilient software system contributes to environmental efforts, let's begin by distinguishing between the two terms. They are often used interchangeably, but it's vital to understand their individual relevance to sustainability in this context and distinguish them from the related concept of availability.
Availability is a percentage. A service that has 99% availability is one that responds 99% of the time, even if that response time is slow or otherwise poor.
Reliability is related to a system's ability to consistently and correctly perform its intended functions over time. It measures the system’s capacity to withstand failures and errors. For example, if your SLOs require requests to be processed within 3ms, and this happens 99% of the time, then it’s 99% reliable.
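To make the distinction concrete, here is a toy Python calculation over a hypothetical request log, using the 3ms latency SLO from the example above. Availability counts any successful response; reliability additionally requires the response to meet the SLO:

```python
# Hypothetical request log: (succeeded, latency_ms) pairs.
requests = [(True, 2.1), (True, 2.8), (True, 3.5), (False, 0.0), (True, 1.9)]

# Availability: the fraction of requests that got any successful
# response, however slow.
availability = sum(ok for ok, _ in requests) / len(requests)

# Reliability against a 3 ms latency SLO: the request must both
# succeed and come back within the latency target.
reliability = sum(1 for ok, ms in requests if ok and ms <= 3.0) / len(requests)
```

In this log, four of five requests succeeded (80% available), but only three of five succeeded within 3ms (60% reliable), which shows why the two numbers can diverge.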
In contrast, resilience refers to a system's capability for swift and graceful recovery from failures and disruptions. It demonstrates a system’s ability to resume functionality in the face of unforeseen circumstances. Essentially, as detailed in Microsoft's well-architected framework, “a reliable workload is both resilient and available.”
Resilience is closely intertwined with operational efficiency and is a huge step forward for both system reliability and sustainability.
Until recently, our primary strategy for providing system reliability was redundancy. We used to maintain duplicate physical copies of a system in a variety of different locations with the aim of guaranteeing there was always ready-and-waiting capacity to fail over to in the event of a problem.
The trouble is that maintaining copies is wasteful in terms of hardware and electricity. Those backup systems spend the vast majority of their lives sitting idle, because that is their point. Redundancy: the clue is in the name.
Such systems exist solely in the hope they will never have to step in and save the day. Unfortunately, if they are ever called upon to do so, they too often fail. This is sometimes due to a lack of testing, but also because when one region has an outage, it is not wildly unusual for the others to hit the same issue. It turns out that, as well as being carbon inefficient, redundancy is not the best solution for reliability.
Building resilience is the more efficient, modern approach to reliability and is more dynamic and effective, as well as requiring less in the way of reserved hardware resources.
Resilience involves the automated and programmatic operational techniques and lightning-fast responses commonly associated with DevOps and Site Reliability Engineering (SRE). These include monitoring for errors, autoscaling, and automatic restarts. The goal is to recover from issues in real time rather than fail over to waiting servers, and a resilient system may leverage flexible cloud resources even if it is on-prem by default.
Resilience uses automation to reduce time-to-recovery and enhance reliability whilst getting rid of the need for all those static backups. At the same time, the improved monitoring techniques make the system more aware of its energy and hardware usage, and, as management guru Peter Drucker observed, what you measure tends to improve.
Example
Let's consider an e-commerce store that sells underwear. We can confidently state that the online shop is reliable because, even during the Black Friday rush, the store can withstand the huge growth in traffic. Customers are still able to browse selections, add items to their carts, and place orders using credit cards without compromising any Service Level Objectives (SLOs), such as transaction latency.
<Sidebar>Latency, one of the four golden signals of monitoring that SREs around the world regard as one of the most vital metrics to observe for any product, is about how long it takes to process a request. In real-world terms, the time needed for the online store to respond to a button click.</Sidebar>
Furthermore, we can also say that the store is resilient because it successfully and dynamically shifted itself to a different region when the location where it was initially deployed experienced an outage.
What we have just described is a difficult feat in the realm of software engineering: the ability to provide a reliable and resilient service. Behind every such product lies a constant stream of firefighting incidents. We are sure that nearly all SREs, or anyone who has held a pager, can relate to the terrible experience of dealing with an outage of any scale, but we learn from them and improve. In fact, that’s why our industry started moving from redundancy to the more sustainable and effective resilience.
If SREs can leverage potential catastrophic outages as a means to persuade their teams and higher-ups of the importance of prioritizing resilience from day one, thereby getting a more robust system, why can't green software engineers do the same to get greener systems?
As stated by green software expert Bill Johnson, “Reliability over time is a great way to define sustainability,” and we too see these areas of concern as connected. During the energy transition, a lack of electricity at certain times is going to be a whole new failure mode for systems. On top of that, climate-related weather is going to increase the failure rate of all physical infrastructure. Resilience makes systems greener and cheaper and, as importantly, more reliable in the face of a changing climate.
The human race has always been fascinated by pushing boundaries. We began with physical challenges; for instance, there are many forms of competitive running, such as sprints, marathons, and obstacle racing. When we exhausted the ideas for physical challenges for the human body, we turned to machinery. There are numerous competitive showdowns between various mechanical devices. There are radio-controlled cars, boat racing, and, of course, the fast-and-furious Formula 1. Software engineers are no exception. Ask any developer. Isn't a performant system everyone's wildest dream?
A performant system, in the context of software engineering, is one that can handle the demands placed on it quickly.
Before the widespread usage of cloud computing and the emergence of on-demand offerings, many engineering organizations dealing with on-premise setups had to attempt (and sometimes fail) to predict their peak requirements, and they overprovisioned resources intentionally to make sure they could meet their business needs.
Unfortunately, as we have pointed out before, overprovisioning is a wasteful practice.
It increases embodied carbon due to extra wiring and hardware, and it hinders innovation (because setting up a server isn’t as quick as Amazon’s next-day shipping). It lowers operational efficiency, leading to additional costs and wasteful outcomes across the board, including carbon emissions induced by electricity.
Therefore, striving for an intelligently performant system that doesn’t rely on overprovisioning is no longer just a badge of honor that every practitioner wants to collect; it's also a vital goal. Why? Because such a system is also a green one.
There are many well-established design principles that address performance, but one of the most effective is looking for the best fit between what you want to do and the resources you use to achieve it.
One example is optimizing your compute layer by choosing the appropriate option for your intended use case (in other words, selecting the correct type of machine for your workload - like a memory-optimized one for 4D gaming).
Choosing the best hardware fit not only provides a more performant experience but can also contribute to energy and carbon savings because specialist hardware is much more energy efficient.
<Sidebar>There is always a calculation to be made in terms of the embodied carbon bill from choosing specialist over generalist hardware, but for well-managed, heavily-used hardware, electricity use is usually more of a contributor to climate change than the kit’s embodied carbon. In good data centers, energy efficiency is usually more important.</Sidebar>
Optimizing your storage layer is another well-established approach for performance improvement. Data management has emerged as a field that requires careful consideration. Most enterprises must now meet multiple regulatory requirements for their data storage across many countries and even continents.
For instance, consider a savings account application with data subject to different retention rules. You may need frequent access to a customer's identification details, such as name and account number, while bank statements are required less often and must be retained for different lengths of time in different regions. Deliberately selecting the appropriate storage type for each of those requirements is a cost-saving, performance, and sustainability must.
As an example, let’s consider how you might handle this in the world of AWS. What you could do is consult their object storage performance sheet (a.k.a Performance Chart across the S3 Storage Types). You may choose the glacier-level classes for data that you need to keep for regulatory requirements but don't require constant access, such as 10-year-old emails. You could opt for the standard class for objects that require frequent manipulation, such as the current month's credit card balance.
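As a sketch of what that might look like in practice, the following Python builds an S3 lifecycle configuration in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts. The rule ID, prefix, and day counts are hypothetical, and we only construct the structure here rather than calling AWS:

```python
# A hypothetical lifecycle policy: rarely-read regulatory data moves
# to a Glacier-class tier after a year and is deleted once its
# (illustrative) ten-year retention period expires.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-statements",
            "Filter": {"Prefix": "statements/"},
            "Status": "Enabled",
            # Cold, compliance-only data goes to cheap, low-energy storage...
            "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
            # ...and is expired when it no longer needs to exist at all.
            "Expiration": {"Days": 3650},
        }
    ]
}
# Frequently-manipulated objects (e.g. the current month's balances)
# simply stay in the default STANDARD class, so no rule is needed.
```

With boto3 installed and credentials configured, this dictionary could be passed to `put_bucket_lifecycle_configuration` to apply the policy to a bucket.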
As well as better performance, selecting the right storage type will lead to, you guessed it, reductions in embodied carbon and energy usage, as AWS can now be smarter about their resource usage.
It's hopefully becoming obvious where we are heading with this section: most of the best practices in performance have knock-on green benefits through efficiency, a topic we discussed in detail in Chapters 3 and 4. To avoid repeating ourselves, please revisit those chapters if you need a quick refresher.
Efficiency is a key driver of both performance and carbon reduction in software, and the result is that greener is usually faster.
Many would argue that security is the most crucial concern of any software business. An unsecured product is a betrayal of users. It can lead to significant damage to an organization’s reputation, eroding valuable trust not only with customers and vendors but also with regulators.
Information security has always been a complex discipline, and regrettably, it has evolved as drastically as climate change. Attackers worldwide now possess more computer processing power than ever before. They also have access to much larger attack surfaces, thanks to the ever-increasing “unknown unknowns” arising from intricate distributed systems (sometimes stretching over multiple hosting providers). It would therefore be foolish of us to attempt a comprehensive guide on secure software design. Fortunately, there are many well-written materials available for your reference. In this section, we will focus on the characteristics of a secure system that have positive effects on the environment.
<Sidebar>Unknown unknowns (i.e., subjects of which you are either unaware or that you do not fully understand) originate from the 4th quadrant of the Rumsfeld Matrix, a framework not inherently related to software engineering and operations but one that has been broadly applied in the field, and across various domains, to reason about certainty and uncertainty when making decisions. For example, in chaos engineering, you are encouraged to design experiments along the Rumsfeld Matrix to figure out which things to break first!</Sidebar>
A secure system is one that can withstand malicious attacks. Those may range in origin from you absentmindedly clicking on a link that you shouldn't have, resulting in the installation of malware, to a sophisticated distributed denial-of-service (DDoS) attack that disrupts your regular traffic. Either of these scenarios (and many more) can have disastrous outcomes, including revenue loss and reputation damage. One of the less appreciated outcomes is wasteful carbon emissions.
Security is green
For example, drawing inspiration from CDN company Cloudflare’s real-world analogy, we can think of a DDoS attack like surprise congestion obstructing your morning commute to work. Your workplace symbolizes your servers; you represent a regular user trying to access those servers, and the congestion signifies the DDoS attack’s malicious increase in network traffic that impedes or even blocks genuine requests.
The increase in network load from a DDoS attack can lead to a surge in energy consumption due to overworking resources such as CPU and memory. Such attacks also waste embodied carbon due to the need for additional hardware to handle malicious traffic while maintaining normal service. In short, DDoS attacks waste energy and cause greenhouse gas emissions to no benefit to anyone (apart from the attacker).
Preventing or closing down attacks like DDoS ones rather than overprovisioning in order to survive them is, therefore, greener and more secure. There is plenty of material out there on the techniques to do this, including reducing your attack surface and rate limiting.
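As one illustration of rate limiting, here is a minimal token bucket sketch in Python. The capacity and refill rate are illustrative; a production system would apply something like this per client, at the network edge, so hostile traffic is rejected cheaply instead of being absorbed by overprovisioned servers:

```python
class TokenBucket:
    """Each request spends one token; tokens refill at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = 0.0  # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject rather than scale up
```

A bucket with capacity 2 and a refill rate of one token per second lets two requests through in a burst, rejects a third arriving at the same instant, then admits another request a second later once a token has refilled.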
<Sidebar>Security expert Ed Harrison pointed out in Chapter 4 that one of the primary ways of reducing your attack surface is also a green technique: close down zombie systems and servers that are no longer needed.</Sidebar>
Secure systems are greener than insecure ones, and a secure system is inherently more credible and dependable. That trustworthiness holds significant value, enhancing marketability and leading to increased sales and adoption. There is, thus, a positive feedback loop between sustainability and security.
Data is everywhere, quite literally.
The watches we wear, the cars we drive, and even the refrigerators where we store our favorite beer are all examples of Internet of Things (IoT) devices. According to the "State of IoT—Spring 2023" report, the world saw an 18% increase in IoT connections in 2022, rising to 14.3 billion active endpoints. They all gather data, and these statistics cover only IoT devices in 2022 - we haven't even touched on the data that will be gathered to train LLMs.
<Sidebar>IoT devices are any machines equipped with sensors that have the ability to collect and analyze data while being connected to other devices through a network (usually the internet).</Sidebar>
Such an enormous volume of data can be thought of as an unorganized pile of Lego pieces and, if thoughtlessly managed, has the potential to be just as hazardous (surely, everyone has accidentally stepped on a loose Lego piece and experienced the agony).
This data is going to be transmitted, processed, and stored, all of which has a hefty carbon bill attached. Therefore, at every stage (source, transformation, pipeline, destination, usage, and behaviors), thorough consideration, planning, and management are imperative. Again, plenty of material exists out there on best practices for IoT data management. The bad news is that being green means you are going to have to read it.
Control LLMs
The rise of LLMs and AI (LLMs once again taking the spotlight) has resulted in a wealth of new, well-written material on data management, covering a wide range of topics from data analytics to data engineering and data observability (Yay, O’Reilly!).
The good news is that each of these emerging fields offers practical best practice guidelines that align well with green software principles, so follow them. For instance, data sanitization typically leads to more accurate and repeatable end results while also reducing the demands on storage and computation. Therefore, clean and correct data promotes sustainability by encouraging efficiency and automation.
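Here is a tiny Python sketch of that kind of sanitization step, with hypothetical field names: it normalizes, validates, and de-duplicates readings before they reach storage, so every downstream stage stores and processes less:

```python
def sanitize(readings):
    """Return cleaned readings: normalized, validated, de-duplicated."""
    seen = set()
    clean = []
    for r in readings:
        device = str(r.get("device", "")).strip().lower()
        value = r.get("value")
        if not device or not isinstance(value, (int, float)):
            continue  # drop malformed rows instead of storing them
        key = (device, r.get("ts"))
        if key in seen:
            continue  # drop duplicates: less storage and compute downstream
        seen.add(key)
        clean.append({"device": device, "ts": r.get("ts"), "value": value})
    return clean
```

Feeding this four raw readings where one is a duplicate and two are malformed yields a single clean record, and that reduction compounds at every later stage of the pipeline.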
Think about data models
We strongly recommend investing effort in creating a data model that suits your specific use case and aligns with the needs of both upstream and downstream processes (ideally, your entire organization).
Almost all enterprise engineers have faced the endless need to transform data from one format to another to make it compatible with their systems. These transformations are famous for being resource-intensive, even with automation in place. It is, therefore, greener, cheaper, less bug-prone, and more performant to design models that can scale, adapt to new requirements, and need less in the way of transformation.
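A hedged Python sketch of the "transform once at the edge" idea: each source format is converted into a single canonical model at ingestion time, so every downstream consumer shares it instead of running its own repeated conversion. The field names and legacy format here are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    """The one canonical model that all downstream consumers share."""
    account_id: str
    amount_cents: int   # integer cents avoid floating-point currency bugs
    currency: str

def from_legacy_csv(row: dict) -> Transaction:
    # One adapter per source system, run once at ingestion time, so the
    # conversion cost is paid a single time rather than per consumer.
    return Transaction(
        account_id=row["acct"],
        amount_cents=round(float(row["amount"]) * 100),
        currency=row.get("ccy", "USD"),
    )
```

Adding a new source means writing one more adapter; adding a new consumer costs no transformation work at all, which is where the compute (and carbon) saving accumulates.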
There are many more parallels between sustainability and other data management techniques that we could point out, but it's vital to recognize that the primary challenge in this field is the sheer volume of data, all of which has a potential carbon cost. Following best practices when dealing with data is crucial for reasons of cost, performance, security, and legality, as well as environmental impact.
“The wise warrior avoids the battle.” ― Sun Tzu, The Art of War.
We are singing the same hymn again, namely, another quote from Sun Tzu!
What we are trying to say here is that sustainability does not need to compete with the other priorities in software development and operations. If you are a glass-half-full kind of person, the utopia for green software advocates really is a flowery garden where everything coexists in harmony.
We argue that green software practices should be integrated into everything we build, manage, and operate. In fact, they should not merely be integrated but should also serve as the foundation.
Our stance should not come as a surprise to you (unless Chapter 11 is the first chapter you're reading). The very bad news is our beautiful planet is currently facing significant challenges: we are experiencing out-of-the-ordinary droughts, unexpected storms, and record-breaking temperatures every day. The fortunate news for us is that in our industry, a shift to sustainability is aligned with our other priorities. Being green is about being efficient, which means faster and cheaper software, which, as we have seen, can be used to increase security and resilience and thus reduce risk.
We have now dedicated substantial time to drawing parallels between sustainability and other tech priorities. For example, if you are engaging in conversations with your data team or Chief Data Officer (CDO), you should absolutely seize the opportunity to highlight the strong connection between green software and data engineering.
The strategy of highlighting the alignment between green software and other priorities offers an alternative approach to raising awareness of this crucial topic. While doom and gloom tactics may have been productive in the past, they are no longer the most practical option, especially when budgets are tight. We need to show that being green is an opportunity to build better software, not merely a cost.
We firmly believe that presenting green tech in the following ascending order (depending on your level of maturity, as explained in Chapter 12) will give you a massive leg up!
1. Carbon efficiency can be achieved by applying existing best practice design principles, such as those associated with performance and cost-cutting.
2. Green software is not an ivory tower; it’s part of many other well-defined pillars of consideration in computing, and hence it does not require an entirely new stream of workflow.
3. Software will benefit from carbon efficiency being integrated into it from the outset. Therefore, sustainability should be considered a fundamental aspect of all software products and decision-making processes.
Much like everything in software engineering, careful thought is required. Striking the right balance, making the correct compromises, and determining precisely what is required from both a functional and non-functional perspective will enable you to create a product that is budget-friendly, reliable, performant, secure, as well as being environmentally friendly.
You're all set! We hope this roller coaster ride of a chapter hasn't left you weary.
Our goal was to shed light on the once-not-so-obvious similarities between the design principles of sustainability and other critical aspects of software engineering. We've covered five selected areas with a couple of guidelines, along with real-world analogies. We have no doubt that you are now capable of conducting similar exercises for any other area of concern you see fit (you can do it!)!
While we fully acknowledge that integrating sustainability into any software system isn't smooth sailing, adding the challenge of convincing others that it's a priority, especially during an economic downturn, makes it like navigating a small boat through a storm at sea.
However, we have complete faith that, with the journey we've taken you on in this chapter, you are now well-prepared to engage in conversations with everyone around you about integrating sustainability from the get-go.
Implementing sustainability early not only has numerous positive side effects but also might prove to be less challenging than you initially thought. (Thanks to all the best practice pioneers who have paved the way for green software.)