Building Green Software by Anne Currie, Sarah Hsu, and Sara Bergman, published by O'Reilly, is available here under a CC BY-NC-ND Creative Commons license, i.e. you can read it and quote it for non-commercial purposes as long as you attribute the source (O'Reilly's book) and don't use it to produce derivative works.
You can buy the book from good bookstores, including Amazon in all regions (currently on offer in the UK), or read it on the O'Reilly site if you are an O'Reilly subscriber.
Chapter 8 - Greener Machine Learning, AI and LLMs
“A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” - Alan Turing 1950
In late March of 2023 Bill Gates wrote in his blog GatesNotes that “The Age of AI has begun” and said that artificial intelligence will be as revolutionary as mobile phones or the Internet. That is quite the statement, especially for the two authors who are barely old enough to remember a time without mobile phones or the Internet. There is no denying that we live in the age of Artificial Intelligence (AI) and Machine Learning (ML). Never before have they had as profound an impact on our lives as they do right now. AI makes splashy headlines in all areas of life, from art, to medicine, to warfare, to school papers, to climate.
AI is not new. Alan Turing first suggested the idea of a “thinking machine” in his paper “Computing Machinery and Intelligence” in 1950. In this paper he defines the now world-famous idea of “The Imitation Game”, which we now call the Turing Test. The test is designed to judge whether or not a computer has human, or human-like, cognitive abilities. This idea was, and still is over 70 years later, captivating and thought-provoking. Not only to industry folks like you and us, but also to Hollywood, as evidenced by several on-screen adaptations of the idea like Westworld, Ex Machina, and The Imitation Game.
Since the first AI models in the 1950s and 1960s, AI continued to develop at the same pace as Moore’s law up until 2012; after that, model size and innovation exploded. In the early months of 2023 the world saw a new paradigm shift, with large language models (LLMs) becoming available to the general public through models and services like ChatGPT, Microsoft 365 Copilot, and Google’s PaLM 2. LLMs are a type of AI algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate, and predict new content. These models dramatically increased the amount of data used for training and inference compared to previous language models.
In this chapter we will focus on how to build greener ML and AI systems, not on whether or not AI will change our lives, or take over the world. Although we feel like this is the place to say that we consider climate change to be a larger threat to humanity than the rise of AI.
To understand why AI deserves its own chapter, we’ll walk you through the rapid growth of AI usage and AI models (spoiler: it is way, way faster than Moore’s law). The bulk of this chapter is dedicated to mitigation approaches across the ML lifecycle. The life cycle can be defined in many ways, but for this chapter we will stick with a simplified model of “Project Planning” -> “Data Collection” -> “Design and Training of ML Models” -> “Deployment and Maintenance”. Green AI could be a whole book in and of itself; this chapter is more of a snackable, bite-sized introduction.
We have already mentioned that AI models originally grew approximately at the speed of Moore’s law. Hardware has always been an enabler for AI: at first Moore’s law was enough, but in later years the increased use of parallelization, GPUs, and other specialized hardware like AI accelerators has (no pun intended) accelerated the field further.
How does model size impact the sustainability of ML? Let’s look at an example. In 2019 Emma Strubell, Ananya Ganesh, and Andrew McCallum wrote a paper called “Energy and Policy Considerations for Deep Learning in NLP”, which by now is heavily cited. In this paper they characterize the dollar cost and carbon emissions of deep learning in natural language processing. Specifically, they analyzed four popular, what they called “off-the-shelf”, models: Transformer, ELMo, BERT, and GPT-2. One of the most cited results from that paper is that training Transformer (with 213 million parameters) with neural architecture search a single time emitted as much carbon as five American cars do over their entire lifetimes, which is quite significant. The smaller models in the paper have a smaller carbon cost associated with them, yet many chose to only cite the data from training the Transformer model. This drew some criticism of the paper, with critics saying that a model of Transformer’s size would be very rare because of how expensive it was to train in dollar cost. How wrong they were. Since then model size has exploded and large models are no longer a rarity. The large models we have seen in 2022, 2023, and 2024 have several hundred billion parameters, which is about a thousand times larger than Transformer.
So we know that model size is growing, causing increased carbon cost for training. Training is, however, only one part of the puzzle. It is the area where we as of now have the most research available, which is likely more a reflection of the state of research than of the state of software companies’ operations. The use of AI features is growing rapidly too. The State of AI in 2022 report by McKinsey shows that AI adoption more than doubled between 2017 and 2022, even though they did see a plateau in the first part of the 2020s. With the true breakthrough of large language models (LLMs) at the start of 2023, it will be interesting to see what the adoption rate looks like in their 2023 report. The same report from McKinsey in 2022 also found that the number of AI capabilities used by organizations doubled between 2018 and 2022, with NLP taking the lead. The State of AI in the Enterprise report from Deloitte in 2022 found that 94% of their respondents say AI is critical to success and 79% of respondents say they've fully deployed three or more types of AI, compared to just 62% in 2021.
To summarize, we can see that the size of AI models is growing as well as the usage of said models. This makes AI and sustainability interesting to talk about, beyond asking ChatGPT if it believes in climate change or not.
Project planning is the first phase of most software products. This is where you really have the chance to design for green. Now is a great time to ask difficult questions! Questions like “What will be the climate impact of this new system?” or “How do we plan to measure the impact?”. Changing the design when your product is still a paper product is much cheaper than when it is already written, deployed, and in use.
If you want to include some carbon-aware features, like demand shaping, this is a great time to start these conversations. Demand shaping means changing the behavior of your product depending on the carbon intensity of your grid, like we talked about in Chapter 5. If you need a refresher, you can think of how the video quality of a video conferencing call changes depending on your internet bandwidth. The same thinking can be applied to the carbon intensity of the grid. Maybe you serve less computationally intense recommendations when the user’s grid is dirty? Project planning is a good time to have these conversations.
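Below is a minimal Python sketch of what demand shaping could look like in practice. Every name, the threshold, and the carbon-intensity lookup are illustrative assumptions, not from the book; a real implementation would call a grid-data provider like the ones discussed in Chapter 5.

```python
# Minimal demand-shaping sketch; all names and the threshold are illustrative.

DIRTY_GRID_THRESHOLD = 400  # gCO2eq/kWh, an arbitrary example cutoff


def get_grid_carbon_intensity(region: str) -> float:
    """Placeholder for a call to a grid-data provider (see Chapter 5)."""
    return 250.0  # pretend the grid is fairly clean right now


def cheap_recommendations() -> list[str]:
    """Low-cost fallback, e.g. a cached most-popular list with no model inference."""
    return ["item-1", "item-2", "item-3"]


def full_recommendations(user_id: str) -> list[str]:
    """Stand-in for an expensive, model-backed recommendation call."""
    return [f"personalised-item-for-{user_id}"]


def recommend(user_id: str, region: str) -> list[str]:
    # Serve the cheaper path when the user's grid is dirty.
    if get_grid_carbon_intensity(region) > DIRTY_GRID_THRESHOLD:
        return cheap_recommendations()
    return full_recommendations(user_id)
```

The point is not the exact threshold but that the decision is made at request time, so the product automatically degrades gracefully when the grid is at its dirtiest.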
Another thing to consider in the early phases are your service level agreements (SLAs) and service level objectives (SLOs). According to Adrian Cockcroft, ex-VP of Sustainable Architecture at AWS, “The biggest win is often changing requirements or SLAs. Reduce retention time for log files. Relax overspecified goals.” Critically considering what service level targets your service or customer actually needs, and not delivering more than needed, can be a big sustainability win.
Data collection is the second phase of the ML life cycle. Data collection means gathering raw data from potentially various sources. These sources could be things like your own internal sales system, the Internet, a small part of the Internet like a specific forum website, or survey data.
This phase is often seen as a less glamorous part of the ML lifecycle. We use all sorts of metaphors to further emphasize how not-fun this part is: “crap in, crap out”, “data cleaning”, “panning for gold”, etc. All these comparisons bring your imagination to dirty manual labor. It is a little ironic that data collection has such a bad rep when we know that data quality issues have a cascading effect further along in the life cycle, such as reducing accuracy.
As we build larger ML models, they need larger datasets to prevent overfitting and for the data in our model to actually be representative of the full, real-world data. Large datasets can also potentially be reused for other projects later, hence they become even more attractive. Since datasets are growing, green data collection is becoming increasingly important in order to keep carbon cost down. Even so, there is surprisingly little research on how much of the carbon footprint data collection accounts for. But fear not, there are some tools you can use to minimize the footprint of your data collection pipeline.
Firstly, think critically about how much data you actually need and whether there are already open-source datasets that could suit your scenario. Using already-gathered data means that you do not have to spend additional carbon emissions building your own data pipeline. Luckily for you, there are already lots of data sets available to use, some open-source and free, others available at a cost. Just to mention two examples, HuggingFace has over 75k data sets available and Kaggle has over 280k data sets available. Both of these resources are publicly available and cover a wide range of scenarios, from images of cats to avocado prices to neonatal mortality in SAARC countries, just to name a few.
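To make that concrete, pulling in an existing public dataset is usually only a few lines of code. The sketch below uses the HuggingFace `datasets` library; the dataset name is just an example standing in for whatever fits your scenario.

```python
# Reusing an existing public dataset instead of building a new collection pipeline.
# Requires: pip install datasets
from datasets import load_dataset

# "imdb" is only an example; browse the hub for a dataset that fits your scenario.
dataset = load_dataset("imdb", split="train")

print(dataset[0]["text"][:100])  # peek at the first record
print(f"{dataset.num_rows} rows reused, no new data collection needed")
```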
When data collection does not need to happen on demand, consider demand shifting as one way to make use of the times and places where green energy is available to us. In Chapter 5 you learned more about how this can be achieved.
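As a rough illustration, here is a hypothetical sketch that delays a non-urgent data-collection batch job until the grid is greener. The intensity lookup and the threshold are placeholders, just as in the earlier demand-shaping sketch.

```python
import time

GREEN_THRESHOLD = 200  # gCO2eq/kWh, illustrative
CHECK_INTERVAL_SECONDS = 30 * 60


def get_grid_carbon_intensity(region: str) -> float:
    """Placeholder for a grid-data provider call (see Chapter 5)."""
    return 180.0


def run_collection_job() -> None:
    print("Scraping and storing the raw data...")


def collect_when_green(region: str) -> None:
    # Wait until the local grid drops below the threshold before running the
    # batch job. Only suitable for jobs that are not time critical.
    while get_grid_carbon_intensity(region) > GREEN_THRESHOLD:
        time.sleep(CHECK_INTERVAL_SECONDS)
    run_collection_job()
```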
<sidebar>Large datasets often carry ethical implications as well. There can be a lack of informed consent in how these datasets are built and used, or it might not even be possible to later withdraw your consent to being part of the dataset. This has sparked some debate in the art community with the rise of high-quality AI-generated images, but as a software practitioner your code could be used in AI models too, so this is not only a concern for artists but for all of us.
Another example of ethical implications can be seen with the rise of reinforcement learning from human feedback (RLHF). Reinforcement learning first used only “raw data” for training, meaning data that has not been labeled but rather only scraped from the internet in massive quantities. This often worked quite well on a technical level, although not always, but that is a story for another time. This approach did, however, have some issues with content, which soon became too large to ignore. As you can imagine, the internet is full of examples of content that might not be suitable for a professional context or might be downright appalling. To combat this, RLHF was created, which uses a combination of raw data and human-labeled data. This technique is used by newer LLMs. This labeling is the cause for ethical concern. TIME reported in January 2023 how Kenyan workers were paid less than $2 per hour and had to label violent, sexist, and racist data in order to purge the model of undesired content. AI and ethics is another topic which is a book in and of itself, so we’ll leave the rest as extracurricular activities for the extra interested. </sidebar>
Up next in the ML lifecycle: the design and training of ML models. This is perhaps the most unique part of the ML lifecycle, where this type of software differs the most from other types of software. It is also an area where we have quite a lot of data and mitigation approaches available to make the phase greener.
Training large models requires significant storage and compute cycles; by shrinking the model size it is possible to speed up training time as well as increase the resource efficiency of training. This can in turn save not only time but money and carbon. Shrinking model sizes is an ongoing research area, with several initiatives exploring topics like pruning, compression, distillation, and quantization, among other techniques. As you learned in Chapter 4 about Operational Efficiency, being more resource efficient isn’t a silver bullet on its own, but rather it unlocks the potential to be greener as you can achieve more with the same hardware and energy. Edge computing, where data is processed by devices or servers at the “edge” of the network, i.e., closer to the end user, and the Internet of Things (IoT) mean we are seeing more and more devices with limited capabilities; smaller models will be the way to go for these kinds of devices. Another perk of edge computing is reducing energy consumption by doing the processing and storage closer to the data source. A sustainability win-win.
One of the shrinking techniques is quantization, a technique which maps continuous infinite values to a smaller set of discrete finite values. In the world of ML, this means representing the ML model with low-precision data types, like 8-bit integers, instead of the usual 32-bit floating point. This is a green technique for several reasons: it saves storage space and is thus more resource efficient, just like the other shrinking techniques mentioned above. Additionally, it allows some operations to perform much faster with integer arithmetic, which saves energy and resources. When quantization is performed during the training phase, it is called quantization-aware training. Meta has experimented with training quantization for its LLaMA model in the paper “LLM-QAT: Data-Free Quantization Aware Training for Large Language Models”. The company experimented with LLaMA models of sizes 7B, 13B, and 30B and showed that accurate 4-bit quantization is possible using this technique. This was one of the first instances of quantization-aware training being successfully used for LLMs, which opens the door to more resource-efficient training for LLMs.
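To make the idea concrete, the sketch below shows the basic mapping from 32-bit floats onto 8-bit integers that quantization relies on. It only illustrates the principle of trading precision for size; it is not the quantization-aware training procedure from the LLM-QAT paper, and the function names are our own.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range [-127, 127] with a single scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale


weights = np.random.randn(4).astype(np.float32)
q, scale = quantize_int8(weights)
print(weights)               # original 32-bit values
print(dequantize(q, scale))  # 8-bit approximation: 4x smaller, slightly less precise
```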
Another example of model shrinking techniques is pruning. In the paper “PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration”, the authors use pruning during the training phase to decrease the model size. For an image classification scenario, they showed significant reductions in training time and resource use in terms of FLOPs, memory use, and inter-accelerator communication.
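PyTorch ships utilities that make the basic idea easy to try. The snippet below applies simple magnitude-based pruning to a toy layer; it is only a sketch of the general concept, not the PruneTrain method itself.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(128, 64)

# Zero out the 30% of weights with the smallest magnitude (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weights so the sparsity becomes permanent.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Share of zeroed weights: {sparsity:.0%}")
```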
<sidebar>Using smaller models can also be a way to democratize ML research. If training models “worthy” of research publications requires massive computational power, the potential contributors are limited to institutions with deep pockets. This can easily cause fairness issues, with only a small crowd being able to contribute to cutting-edge research. With smaller models, anyone with a laptop can contribute to the field. </sidebar>
While limiting the size of ML models is one great option, there are more available to you when it comes to making your training greener. For example, ML training has the great benefit of very rarely being urgent. Yup, you guessed it, this means it is a great candidate for demand shifting; read more in Chapter 5.
Another option, perhaps most widely used for image recognition or natural language processing, is to leverage pre-trained models. This can be done in two ways: either by using the model as-is or by using transfer learning. Using an existing model as-is will make your training phase very green, as you can practically skip the training altogether. And the greenest software is the software that does not exist (it is also the most boring software). Just like in the data collection phase, the community has come together and there already exist lots of models which are available to you, either publicly or for purchase. To reuse our examples from before, HuggingFace has over 350k models available and Kaggle has over 2k models publicly available.
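Using a published model as-is can be just a few lines. Below is a sketch with the HuggingFace `transformers` pipeline API; the task and the default model it pulls down are only examples.

```python
# Reusing a published model as-is: no training phase at all.
# Requires: pip install transformers
from transformers import pipeline

# The pipeline downloads a pre-trained sentiment model; we never train anything.
classifier = pipeline("sentiment-analysis")

print(classifier("Green software is worth the effort."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```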
If you cannot find a perfect model to reuse, but you find one that is almost right, you can use transfer learning. The general idea of transfer learning is that you have a model trained on a large data set, with data generic enough to be considered a pretty good model of the world, which you reuse for a new purpose. Let’s say you have a model which can detect cats in pictures; you can then use transfer learning to adapt the model to instead recognize monkeys. With transfer learning you take advantage of the model already created and avoid re-training it from scratch, instead using techniques like fine-tuning or feature extraction to adapt the model to your scenario. This saves carbon compared to a full retraining (and possibly data collection!) of a brand new model. This technique is also a great one to talk to your CFO about, as it is much more economically feasible to adapt an existing model to a new problem than to create a new one from scratch.
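Here is a hedged sketch of that idea using a pre-trained torchvision model: the existing layers are frozen and only a small new classification head is trained, so most of the original training investment is reused. The model choice and the two-class task are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the existing layers so their training investment is reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for our task (say, two classes:
# "monkey" / "not monkey") and train only these few parameters.
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ...then run a normal (much shorter and cheaper) training loop on the new data.
```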
Typically ML models are trained centrally in a data center, which is very convenient when you have centralized data, but different approaches are available, and some of them can be greener if applied with a little thought. One example is training on the edge, which was mentioned earlier in this chapter. Another example is federated learning (FL), which has been in production use by, for example, Google since 2017. FL is a technique where training is distributed across end-user devices, keeping all data local to the device but collaborating on the final model. The paper “Can Federated Learning Save the Planet?” looks deeper into the environmental cost of federated learning compared to centralized training in terms of carbon. Their findings show that FL, despite being slower to converge, can be a greener technology than centralized training in data centers, especially for smaller datasets or less complex models.
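The heart of federated learning is surprisingly small. Below is a heavily simplified sketch of federated averaging (FedAvg), the weight-averaging step used to combine client updates; the communication layer, client selection, and local training loops are all omitted, and the names are our own.

```python
import numpy as np


def federated_average(client_weights: list[np.ndarray],
                      client_sizes: list[int]) -> np.ndarray:
    """Weighted average of locally trained weights. The raw data never leaves
    the clients; only these weight vectors are shared with the server."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))


# Three devices trained locally on 100, 40, and 10 samples respectively.
updates = [np.array([0.9, 1.1]), np.array([1.0, 0.8]), np.array([1.4, 1.0])]
global_weights = federated_average(updates, [100, 40, 10])
print(global_weights)  # the new global model, redistributed to the clients
```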
The advancement of AI is closely linked to the advancement of hardware. Okay, that is arguably true for all of the software industry, but it is even more so for AI. Chapter 6 has a deeper dive into the world of hardware and sustainability, so head over there for more content. In this chapter we’ll settle for one example: specialized AI chips. AI chips can be used both for training and inference, and they typically include graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) which are specialized for AI tasks. Because these chips are specialized for the tasks, unlike general-purpose chips like central processing units (CPUs), they are massively more efficient. Using a specialized AI chip for training can be 10x-1000x more efficient compared to using generalized hardware. This in turn yields a massive cost win, which, for example, image recognition scenarios have taken advantage of historically. Of course, we have to consider the embodied cost of any hardware produced as well, but more on that trade-off in Chapter 6.
For production companies, deployment and maintenance may very well be where the most carbon is spent. We can’t say for sure, as the area is not well researched, at least not with publicly available results. However, logic dictates that with a full training only happening once and inference happening many, many, many times, this phase is where many of us should be spending our time and attention. This might not be true for all ML scenarios, for example in the research sphere where models are primarily built for the purpose of writing a paper. But in the enterprise software sphere, where the authors of this book hang out, inference is well worth your attention.
One way to make deployment of your ML models greener is to decrease the size of the model in use. In the training section of this chapter we saw that quantization, compression, and pruning can be used in the training phase to shrink the model size. These techniques can also be used post-training to decrease the size of the model used for inference. Decreasing the size of the final model means two things. Firstly, smaller devices can run these models, which might be great for IoT or client-side scenarios. Secondly, it makes deployment of these models cheaper and greener, as smaller models in production mean that you can be more resource efficient.
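As one example of post-training shrinking, PyTorch's dynamic quantization converts an already trained model's linear layers to 8-bit weights in a single call. The sketch below uses a toy model standing in for your real one.

```python
import torch
import torch.nn as nn

# A toy stand-in for your already trained model.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Post-training dynamic quantization: weights of Linear layers stored as int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement at inference time.
x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```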
When talking about maintenance of ML models, we must mention MLOps (Machine Learning Operations). MLOps is a field which lives at the intersection of Machine Learning, DevOps, and Data Engineering. It aims to operationalize the process of taking machine learning models to production, then maintaining and monitoring them. When it comes to making the maintenance phase of the ML lifecycle greener, we can conclude that MLOps and DevOps have a lot of things in common, even if they are not identical. As such, we can reuse much of what we learned in Chapter 4 - Operational Efficiency when it comes to reducing the carbon footprint of our operations. The general lessons learned in Chapter 3 - Code Efficiency will also hold true for writing code to serve ML production workloads, so head back there for a reminder if you need it.
We hope that you now have some insights into why AI and ML are interesting to talk about from a sustainability perspective and that we have provided you with some tools for each part of the life cycle. These tools involve using smaller datasets to make data collection greener and using transfer learning, model re-use or smaller models to save carbon in the training phase. Likely your next question is “Where in the life cycle do I spend the most carbon?”.
The short answer is that it depends on your project! If you are working with very large models with brand new data, it makes sense to start looking at your data collection. However, if you are trying to achieve super high accuracy or have a research scenario where you will not really use your model after its creation, then look into training cost. If you are running ML operations in your production workloads, chances are high that deployment and maintenance is where you are spending the most cost.