Book Summary: Algorithms to Live By by Brian Christian and Tom Griffiths

In Algorithms to Live By, Brian Christian and Tom Griffiths bring computer science to life through relatable examples involving secretaries, wardrobes, sporting tournaments, and more. This summary covers the main messages from the book, explaining how general principles from computer science might help us in our daily lives.

However, the book also goes into specific computer science problems, such as the optimal stopping problem, sorting, etc. To keep this summary (somewhat) concise, I’ve split some of these out into separate posts, linked under Key Takeaways below. I’ve also omitted a lot (e.g. the whole chapter on “Networking”), which was interesting but not integral to the book. I strongly encourage you to read the book if you’re interested in the topics covered here.

Buy Algorithms to Live By at: Amazon | Kobo (affiliate links)

Table Of Contents

Key Takeaways from Algorithms to Live By
Detailed Summary of Algorithms to Live By
Other Interesting Points
My Thoughts

Key Takeaways from Algorithms to Live By

We can apply algorithms to some of the problems we face in real life.
However:
- Even the best algorithm can have a pretty high failure rate. Having the best process does not guarantee the best outcome.
- Real-life problems are hard, and many are intractable (i.e. no efficient solution exists).
Yet there is still hope:
- Humans do surprisingly well at difficult computer science problems.
- Sometimes less is more, and simple models can outperform complex ones that overfit the data.
- Relaxing an intractable problem can make it tractable.
- Randomness can also help produce “good enough” solutions much quicker.
Computer science manages trade-offs. For example:
- Sorting pre-empts the costs of a future search, so there is a trade-off between sorting and searching.
- Caching manages the trade-off between time and space.
- Queues bring up average throughput at the expense of delay or latency. When a queue or a to-do list gets too long, it’s better to close it off and drop things entirely.
- Relaxing a problem may trade off certainty for time/computational resources.
In real life, we can try to be computationally kind to others. Constraining someone’s choices can make a decision easier for them.

Specific computer science problems and issues:

Optimal stopping problems. Look at the first 37% of options, then immediately leap at the next best one.
Explore-exploit trade-offs. An unexplored option has a high value, because it gives us information for future decisions. But this exploration premium declines over time.
Sorting. Sorting gets much harder as the number of items increases. There are massive diseconomies to scale.
Caching. Removing the least recently used item from your cache is optimal.
Scheduling. Precedence constraints can make problems intractable, while the ability to switch tasks can make it tractable. However, context-switching has a cost. Too much context-switching can lead to thrashing, where nothing gets done.
Bayes’s Rule. We can estimate probabilities by combining pre-existing beliefs (or priors) with new/observed evidence. The prediction rules to apply differ depending on whether something follows a normal, power-law or Erlang distribution.
Networking. Sometimes we need to back off or drop things to prevent a network or resource from getting overloaded.
Game theory. People acting rationally can lead to worse outcomes for all. We can change those outcomes by changing the rules of the game. The price of anarchy tells us whether a centralised approach would be better than a decentralised one.

Detailed Summary of Algorithms to Live By

We can apply algorithms to some of the problems we face in real life

People often think that algorithms are just for computers and data. A common misconception is that computers solve problems through “brute force”, exhaustively considering every option. However, modern computers don’t usually do this as it takes too much processing power and time. Instead, modern computers trade off time with accuracy and use approximations, just like humans do in the real world.

Ultimately, an “algorithm” is just a sequence of steps used to solve a problem. They are much broader, and older, than computers. A cooking recipe, for example, is also an algorithm.

But even the best algorithm can have a pretty high failure rate

We can take some comfort in the fact that we applied an optimal algorithm even if it doesn’t work out.

For example, in the optimal stopping problem, the rule that gives you the best chance of success only has a 37% success rate. It’s not quite as bleak as that though, because:

“Success” here is defined quite narrowly to mean finding the single best candidate, when in real life you may be happy getting one who is adequate, if not the best.
You may also have the chance to recall candidates in practice, which boosts your chances of success. (On the other hand, real life candidates may also reject you, lowering your chances of success.)

Real-life problems are hard, and many are intractable

A problem is considered tractable if an efficient algorithm to solve it exists. An algorithm is considered efficient if it runs in polynomial time (O(n^c)) or less. (See Sorting – How to Measure Algorithms.)

Real life is often too messy for an algorithm to provide us with a single, clear answer. We make decisions under uncertainty and under time constraints. We frequently confront some of the hardest problems that computer scientists face. For example, seating 107 wedding guests at 11 tables can be done in 11^107 ways, which is more than 200 billion googols.

If trying to perfectly manage your calendar feels overwhelming, maybe that’s because it actually is.
Christian and Griffiths, Algorithms to Live By

Many of these problems are intractable. In fact, most scheduling problems are intractable. A recent survey estimated that 84% of scheduling problems had been proven intractable. The actual number could be as high as 91% as we still don’t know whether 7% of problems are tractable or not.

Humans do surprisingly well at difficult computer science problems

It turns out that humans do pretty well when compared to the “optimal” solution. For example:

Optimal stopping. People tend to stop a bit earlier than the theory suggests, but still pretty close to optimal. In a study by Rapoport and Searle in the 1990s, participants found the best possible candidate around 31% of the time, which was close to the optimal 37%. However, that 37% figure doesn’t take into account the time cost. Once you factor that in, humans may be stopping optimally after all.
Explore-exploit trade-off. Some lab experiments suggest that people tend to over-explore, even when they should be in a pure exploit stage. However, this behaviour may be more rational than it first appears. The theoretical solution assumes fixed probabilities that don’t change over time. In real life, that’s rarely the case (which makes the problem so hard that most consider it intractable). So, if probabilities end up changing over time, it may make sense to keep exploring.
Bayes’s Rule. People’s predictions in the real world turn out to be reasonably close to those predicted by Bayes’s Rule, even when we have just a few observations (or even a single one).

Sometimes less is more

Overfitting occurs when an overly complex model fits training data “too well”. Training data contains errors and noise and a model that is too sensitive to this will identify patterns where there are none. Consequently, a complex ten-factor model might perform worse than a simple two-factor model if only a few factors are relevant, and the remaining factors have either a minor effect or serve as red herrings.

To detect and counteract overfitting you might:

Cross-validate. Test the model’s performance with a different metric (e.g. cross-checking student performance on a standardised test with oral exams), or hold back some data and see how the model performs with that data.
Reduce complexity. You can do this by penalising complexity or by stopping the model early and not allowing it to get too complex.

Brainstorm with Sharpies

Entrepreneurs Jason Fried and David Heinemeier Hansson use a thicker pen to brainstorm, the further ahead they need to think.

They start with Sharpies at the very beginning, when they just need to worry about the big picture. At this stage, pens are too fine and lead you to worry about things you shouldn’t worry about yet.

Relaxing an intractable problem can make it tractable

Some problems may become tractable once you relax it. Although you won’t get the optimal solution, getting an answer that’s half as good in a quadrillionth of the time can be a worthwhile trade-off. These sub-optimal solutions also give you a ballpark for where the optimal solution might lie, so you can decide whether it’s worth the extra effort trying to find it.

Christian and Griffiths explain three ways to relax a problem: dropping constraints, converting a discrete optimisation problem to a continuous one, and downgrading a constraint to a penalty.

Randomness can also help produce “good enough” solutions much quicker

Randomised algorithms (which use randomly generated numbers to solve problem) can give pretty good, but not optimal, answers to some problems much faster than deterministic algorithms (which operate in the same way each time) can.

For example, a quick way to test whether two polynomial expressions, such as 2x^3 + 13x^2 + 22x + 8 and (2x + 1) * (x +2) ** (x + 4) are identical is simply to plug in random values of x. If the equations are not identical, it would be a remarkable coincidence for your outputs to be the same!

Miller-Rabin primality test

Checking whether a very large number is a prime number gets tricky. To check if a 12-digit number is prime, you have to divide it by the 78,498 primes less than its square root (1 million). (Checking whether a number is prime is not merely an academic curiosity — prime numbers play a crucial role in modern cryptography.)

Gary Miller found a much faster way to test for primality with equations involving n (the number being tested) and x (a random value). The issue? It threw up false positives around 25% of the time, suggesting a number was prime even when it wasn’t.

Michael Rabin realised that by plugging in lots of random numbers for x, you could get a high level of certainty that a number actually is prime. Trying this 15 times gets your chance of a false positive down to 1 in a billion. Since Miller-Rabin’s works, people have discovered an efficient algorithm that tested primality in a deterministic way. However, the Miller-Rabin test is much faster and is still used in practice today.

Other benefits of randomness include:

Random sampling can also help us make sense of something that is too complex to understand directly. In contrast, aggregate or summary statistics are too broad and can obscure underlying heterogeneity [Simpson’s Paradox is a good example], while cherry-picked anecdotes are unrepresentative.
Randomness can get you out of a local maximum. Some solution landscapes only have one global maximum, but may have multiple local maxima. If we’re only willing to accept improvements on our current position, we can end up on a “hill” that is a local maximum. Injecting randomness and being willing to accept a temporarily worse outcome can get us out of this.

Computer science manages trade-offs

Explore-exploit trade-off

Life is a balance between novelty and tradition; trying new things vs sticking with the tried and true. This is the explore/exploit trade-off. Exploration involves gathering information, while exploitation is using information you have to get a known good result. Explore-exploit problems are everywhere.

The classic example of this trade-off is the “multi-armed bandit problem” (a multi-armed bandit is a slot machine). Some arms pay off more frequently than others, but you don’t know which ones pay off well until you try them. Initially you just play them at random (exploration) but eventually you will want to switch to playing the ones that have paid off most consistently (exploitation). When you want to make that switch depends on how long you’ll be in the casino.

Woman deciding which slot machine to pull — The decision between exploration and exploitation depends on how much longer you’ll be in the casino

We should explore early on…

The value of exploration can only go down over time.

If you’re a baby, putting every object in the house into your mouth is like studiously pulling all the handles at the casino.
Christian and Griffiths, Algorithms to Live By

Few decisions are truly “one-off” because you can typically learn something to help a future decision. If you treat every decision like your last and focus only on the immediate payoff, you will exploit when you should be exploring.

… and exploit later in life

Although “exploitation” has negative connotations, many of life’s best moments involve such “exploitation” — spending time with family, settling into your favourite chair with a good book, listening to a band’s greatest hits. If you constantly explore, you’ll never be able to benefit from the wonderful discoveries you make.

When you explore, you give up known pleasure for the possibility of something better. Optimal strategies for managing this trade-off inflate the appeal of unexplored options, which means we should expect to be let down on most occasions. As we get older and move into the “exploitation” phase of life, our quality of life should increase. Studies bear this out — older people are generally happier with their social networks and report higher levels of emotional well-being. So getting old isn’t that bad.

Real-world use cases

Some situations you may not have thought of as involving explore-exploit trade-offs include:

Hollywood movies. Making a sequel is a form of exploitation. Sequels are safe and come with a guaranteed fan base. Recently, studios have produced more and more sequels — this may be a sign that the film industry believes its own end is near.
A-B testing. Explore-exploit algorithms adjust what is shown to the user as information is gathered.
Clinical trials. Clinical trials usually involve two groups: a treatment group and a control group. In traditional trials, the control group is given a placebo throughout the trial, even if early results show that the drug being tested could save lives. However, some doctors and statisticians argue that we should use “adaptive trials” instead. In these trials, different treatments are like the different arms of a multi-armed bandit. Adaptive trials could save more patients and still produce reliable results.

Sorting vs Searching

Sorting pre-empts the costs of a future search, so there is a trade-off between sorting and searching.

To work out if something is worth sorting, consider the costs of sorting vs the expected costs of searching.

How likely will you need to search something? Sorting something you will never search is a waste, so it may be more efficient to leave some things unsorted.
How costly is a search? As search functions improve, it reduces the need to sort. You may not need to sort your email inbox if it’s easy enough to search.
Seconds are not necessarily substitutable Even if something takes as long to sort as it does to search, sorting may still be better if time costs are not equivalent. For example, it makes sense for Google to sort its database because that sorting is done ahead of time by computers. That way, it can provide a search result quickly when users need it.

[This section made me think of Personal Knowledge Management (PKM) systems, like my Obsidian vault. Though such vaults are relatively easy to search, you don’t always know what you’re searching for (e.g. you may have used different terminology previously). Tags and folders are a form of Bucket Sort, which is a relatively fast algorithm (see further below). And browsing through a “bucket” when you need to find something may help you draw connections between notes that you hadn’t thought of.]

Time vs Space/Cost (Caching)

Caching manages the trade-off between time and space/cost. Faster, easier-to-access storage (e.g. RAM, your wardrobe) is small and expensive while less convenient storage (e.g. hard drives, your garage) is larger and cheaper.

Caching algorithms try to anticipate what you will need and keep it in an easy-to-access cache. The best algorithm to manage your cache is Least Recently Used (LRU), which replaces the item that hasn’t been used for the longest time. It’s like throwing out clothes you haven’t worn in ages, or at least relegating them to the basement.

Woman with big pile of clothes next to the bed — The pile of clothes by your bed may actually be an efficient cache

Multiple cache levels can improve efficiency. For example, your closet may be one cache level, your basement the next, and then your off-site storage locker a third level. You may be able to optimise further by adding a cache level even more proximate than your closet, such as a pile of clothes next to the bed or a valet stand.

[Christian and Griffiths suggest that having a large stack of papers on your desk is an optimal system because it functions as a cache using the LRU algorithm. While they may not be entirely serious, this idea appears flawed. Such a cache would only be optimal if you were limited to a linear search, where you have to examine each item one by one from the start. However, in reality, you’re not restricted to a linear search, and a rough Bucket Sort can be highly efficient, speeding up future searches. So an unsorted pile is unlikely to be optimal.]

Responsiveness vs Throughput (Context Switching)

Responsiveness is how quickly we can respond to new requests, while throughput describes how much we get done.

Operating systems balance responsiveness and throughput:

When you have multiple computer programs running, the operating system gives each program a slice of time (in milliseconds) in which to run. The more programs that run, the smaller each program’s slice.
Programs can be so responsive that they appear to be running simultaneously, though the processor is rapidly switching between them.
Context Switching has a cost, as it consumes a portion of the processor’s time. If a slice gets too small, all its time is spent context switching instead of doing actual work (this is known as “thrashing”). Modern operating systems therefore set a minimum length for their slices to ensure that responsiveness doesn’t destroy throughput. If more programs are added, they have to wait longer to get their turn (i.e. responsiveness decreases) but at least they still do something.

Humans also balance responsiveness and throughput. Checking your emails constantly makes you more responsive, but all those context switches mean you get less work done. To improve throughput, you may want to practice interrupt coalescing. This could mean checking your e-mails only once or twice a day, or designating “office hours” for people to talk to you.

Throughput vs Latency (Queues)

A buffer is basically a queue. Queues smooth out demand, thereby bringing up average throughput. But this comes at the expense of delay or latency (aka “bufferbloat”). (Incidentally, when looking for fast Internet, we should care more about latency than bandwidth. A plane that can carry more people may has higher “bandwidth”, but that doesn’t mean it goes any faster.)

When a queue or a to-do list gets too long, it may be better to close it off and drop things entirely. Dropped packets on a network is a feedback mechanism that prompts senders to take some pressure off the recipient. We can do this too by saying “no” when we have too much on.

The most prevalent critique of modern communications is that we are ‘always connected’. But the problem isn’t that we’re always connected; we’re not. The problem is that we’re always buffered. … The feeling that one needs to look at everything on the Internet, or read all possible books, or see all possible shows, is bufferbloat.
Christian and Griffiths, Algorithms to Live By

Time vs Certainty

As mentioned above, relaxing a problem trades off certainty for time/computational resources.

Sorting algorithms also trade-off between time and certainty:

Bubble Sort, Insertion Sort and Comparison Counting Sort are very slow, but they are more robust to error and noise.
Merge Sort is much faster, but a single error can move an item a long way from its “correct” placement.
Bucket Sort is very fast, but it only sorts between buckets, not within them. However, sometimes that’s good enough.

Computational kindness

It’s easier for someone to verify something than to search for the answer, so sometimes we should be computationally kind to others by constraining their choices. For example, people are more likely to agree to an invite if you give them several time and date options, instead of a general “what time works for you”? This is because verifying if any of your proposed times work is more straightforward than searching for a place for you in their schedule.

Similarly, when asking people where they want to have dinner, saying “I don’t mind” or “I’m flexible” is a cop-out. It forces others to think of a place, and makes your friend’s job harder because you don’t reveal your preferences.

Other Interesting Points

Chess originated in India. It was later “Europeanised”: shahs became kings, viziers queens and elephants bishops.
Triple or nothing paradox. Assume a game where you have a 50% chance of tripling your money and 50% chance of losing your entire stake. The math says you should always keep playing as your expected earnings will always be positive. But if you follow this strategy, you will eventually lose everything.
Lewis Carroll, the author of Alice in Wonderland, was the pen name of Charles Dodgson, an Oxford mathematician.
There is a lot of noise and randomness in soccer, because it has such low scores. Tom Murphy, a physicist, calculated that a 3-2 win only gave a 5 in 8 (63%) chance that the better team won. Even a 6-1 “blowout” has a 7% chance of being a statistical fluke. [I’ve never been a fan of soccer because of its low scores. A game that can run for 2 hours and end up with a score of 0-0 is clearly a waste of time. Glad to have another reason to support my dislike of it.]
Donald Knuth, a legendary programmer, does not have an email address, so that he can enjoy long hours of studying and concentration uninterrupted. He reviews his postal mail every 3 months and all his faxes every 6 months.
The human brain burns about a one-fifth of our daily caloric intake.

My Thoughts

I loved Algorithms to Live By. I feel the authors had someone just like me in mind when writing it — I don’t have a tech background and am keen to learn more about computer science, but felt somewhat intimidated by the complexity of the subject. The book is not exactly an easy read, because the underlying subject matter can be challenging at times. But Christian and Griffiths do an incredible job in making this technical topic accessible and engaging.

The one criticism I have is that the structure was hard to follow at times because Christian and Griffiths crammed a lot in there. You can’t accused Algorithms to Live By of being a “book that could’ve been a blog post”. Although its key messages were quite straightforward and not that novel, Christian and Griffiths for the most part make their points convincingly by reference to specific examples. They also add tons of content about algorithms, computers and the Internet that is not strictly necessary, but which I nonetheless appreciated. There was very little filler and I get the impression the authors struggled more with deciding what to cut than deciding how to fill their pages.

Buy Algorithms to Live By at: Amazon | Kobo <– These are affiliate links, which means I may earn a small commission if you make a purchase. I’d be grateful if you considered supporting the site in this way! 🙂

Have you read Algorithms to Live By? Did you like it as much as I did? Share your thoughts in the comments below!

If you enjoyed this summary of Algorithms to Live By, you may also be interested in:

5 thoughts on “Book Summary: Algorithms to Live By by Brian Christian and Tom Griffiths”

Phil says:

April 18, 2023 at 8:02 am

Wow this sounds great. I’ll add it to my list. I like this sort of book and often get epiphanies on my own issues and concerns from them.

I’ve got The Alignment Problem (also by Christian) on my bedside table still, but haven’t got around to reading it yet.

1. To Summarise says:
  
  April 18, 2023 at 1:28 pm
  
  Yeah, me too – you learn some new ways of seeing old problems, plus you learn a bit of computer science in the process. Win-win.
  
  Let me know how you go with The Alignment Problem. I just got out The Most Human Human (also by him) based on how much I enjoyed this book, but haven’t started it yet. Seems like a good topic to be learning about with all the recent interest in AI
  
Zak says:

April 25, 2023 at 9:32 pm

This was the first book that really got me into reading, so I’m glad you also had a pleasant experience with it.

That being said, I definitely agree with your point that there is a LOT of content. I’d like to think that the algorithms themselves are less important than the concepts behind them, namely the tradeoffs they explore. I couldn’t for the life of me remember the mathematical specifics of the multi-armed bandit problem, but I nonetheless try to keep in the mind the explore-exploit dichotomy for real-life decisions. In that regard, the practical applications that the book offers are immensely helpful

To Summarise says:

April 26, 2023 at 1:36 am

That was quite a meaty book you chose to start off with!

Completely agree that the concepts behind the algorithms are more important than the algorithms themselves. I also found it really interesting how things like sporting tournaments actually deploy different sorting algorithms – I’d just never thought of them that way before but it makes a lot of sense.

Zak says:

April 26, 2023 at 10:01 pm

Haha, it definitely was – it took me ages to finish. I only kept going because I found the content thought-provoking and relatively actionable.