Computing Time Complexity: A Big-O or Big No?
Okay, let’s start from ground zero: what is a Big O?
Well, let’s brainstorm. A big O can be a look of shock :O!
It can be the hug in XOXO.
Or a bagel. Or a googly eye. (Or everything. Everywhere. ALL. AT. ONCE.)
Now, since everything must be perfectly balanced, let’s be the devil on Big O’s shoulder: what are some Big Nos?
Hmm, fingernails on chalkboards, gaslighting, saying you are five minutes away when you haven’t even left your house yet. Oh, also arson! Definitely a big no.
Oops, I’ve lost the plot (another big no). Back to our first question: what is a Big O?
Well, none of these answers are technically wrong. But I’m talking about a more mathematical Big O…
Big-O notation, or if you want to be fancy schmancy, Landau’s symbol (after the German mathematician Edmund Landau), is used to tell us how fast a function is growing (or declining).
It’s commonly used in mathematics for things like approximations and error terms and in computer science, like in algorithm analysis. In this particular column, we’ll be talking about Big O in the context of computer science, since the math can get very hard, very fast.
So, how do we actually define Big O notation? Well, let me give you the formal definition:
Let f(n) and g(n) be two real-valued functions. We write that f(n) = O(g(n)) (for n approaching infinity, to be more precise) if there are constants C and N such that |f(n)| ≤ C|g(n)| for all n > N. So, if f(n) = O(g(n)), we say “f(n) is O(g(n)).”
As the talented Sabrina Carpenter aptly sings, “I’ll be honest, looking at you got me thinking nonsense.” Because at first look, that definition is literal nonsense. So let’s break it down!
In a nutshell, this definition is saying that the function f(n) doesn’t grow faster than g(n). In an effort to put it even more simply, let’s pretend that Soju the Corgi and I are racing.
For the first couple of seconds, we might be running at the same pace (I might even be a little bit ahead!). But, of course, as we keep running, Soju will overtake me, and I will be left farther and farther behind her because I absolutely do not run faster. In other words, the function of her distance (with respect to time) grows faster than mine since, once again, she is faster than me and will cover more distance as time goes on. So, for all n > 1 second, me(n) ≤ Soju(n), which means me(n) = O(Soju(n)).
Okay, this is a pretty wishy-washy example (I also just wanted to include pictures of Soju), but hopefully Big-O makes a little more sense now. It just describes the approximate growth of a function by giving it this sort of upper bound, like how I can hypothetically only run as fast, but no faster, than Soju, past a certain point. She’s an upper bound for my pace.
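If you’d rather see the definition in action than squint at absolute-value bars, here’s a tiny Python sketch. (The functions f and g and the constants C and N here are made up by me purely for illustration; nothing about them is standard.)

```python
# Claim: f(n) = 3n + 10 is O(g(n)) where g(n) = n.
# Witnesses: C = 4 and N = 10, i.e., |f(n)| <= C * |g(n)| for all n > N.
def f(n):
    return 3 * n + 10

def g(n):
    return n

C, N = 4, 10

# Spot-check the inequality for a big range of n past the threshold N.
assert all(abs(f(n)) <= C * abs(g(n)) for n in range(N + 1, 10_000))
```

Notice that the inequality can fail for small n (f(1) = 13 > 4 = C·g(1)), and that’s fine: the definition only cares about what happens past the threshold N, just like Soju only needs to beat me past a certain point.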
But why O? Why not X? Or even U, N, I (since we make such a great team ;))? Well, the O stands for order (though I heard that U N I were a close second…don’t check my sources). This is because the order of a function is another word for its growth rate.
The German mathematician Paul Bachmann actually first introduced the symbol O in 1894 in his book Analytische Zahlentheorie (Volume 2). Then, good ol’ Edmund Landau, another German number theorist, adopted big-O and further introduced the little-o notation in 1909, which is why both are now called Landau symbols.
Computer scientist Donald Knuth, also known as the “father of algorithm analysis” (and creator of TeX!), popularized Big-O notation in CS in the 1970s, and also introduced related notation.
What does it mean for an algorithm to be efficient? How can we measure the complexity of an algorithm? Well, there’s a bunch of different factors to consider. We could think about time (or CPU) usage or memory usage (i.e., space complexity) or disk usage or network usage and so on. These are all important, but we’re mainly concerned with time complexity (i.e., the CPU usage).
We can describe the complexity of an algorithm as the computational power we need to run it. We measure this by counting the number of elementary operations the algorithm requires, which usually grows as the size of the input grows. For example, sorting 8 ducks by height will take less time than sorting 137 ducks. So, we usually express complexity in terms of a function of n (like we saw before), where n is the size of the input.
So what counts as an elementary operation? Your day-to-day operations, like + and *; variable assignments, like x = 0; tests and comparisons; reading or writing primitive values; and other basic statements.
But to be completely honest, the exact number of operations isn’t super important for complexity. We want the BIG picture of how long the algorithm will take, depending on how big the input is.
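To make “counting elementary operations” concrete, here’s a toy Python sketch of my own (not a standard tool) that tallies the operations in summing a list:

```python
# Summing a list of size n: one assignment to start (total = 0),
# then one addition and one assignment per element,
# for about 2n + 1 elementary operations in total.
def sum_with_count(values):
    ops = 0
    total = 0
    ops += 1            # the initial assignment total = 0
    for v in values:
        total += v
        ops += 2        # one addition, one assignment
    return total, ops

total, ops = sum_with_count(list(range(8)))
# For n = 8 that's 2*8 + 1 = 17 operations -- but the big picture
# is simply that the count grows linearly with n: O(n).
```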
For some algorithms, the complexity is different even for inputs of the same size. To give you an example, suppose we’re running an algorithm to alphabetize all the countries in the world. If only a few of the names are out of order, we might only have to do a few comparisons/elementary operations. On the other hand, if a lot of the names are out of order or the list is randomized, more comparisons are required, which can lead to a larger complexity.
When we’re calculating complexity, we want to know the worst case of the algorithm (pretty pessimistic of us :(). This is what Big-O notation is all about: describing the maximum number of operations we might need to perform depending on the input.
Hold up, why do we even care? Why does complexity and efficiency matter? To answer your question, let me quote the “father of the computer”, Charles Babbage: “Whenever any result is sought by [a machine’s] aid, the question will then arise: by what course of calculation can these results be arrived at by the machine in the shortest time?”
Unfortunately, it’s not good enough to just be able to solve a problem with thousands of lines of code or to brute-force it. We want to solve it well, in a way that saves space and time (which makes us…superheroes?). If multiplying huge numbers or sorting thousands of objects took decades, what would be the point of algorithms? When you’re working in the real world, efficiency must always be prioritized.
The History of Algorithm Complexity
This idea of algorithm analysis and optimization actually goes as far back as the Middle Ages, to a 14th century Egyptian astronomer, Ibn al-Majdi. He wrote about how to reduce the number of elementary steps in calculations, and specifically compared the methods of translation and semi-translation. We’re not going to go into too much detail, but translation was used to calculate the product of two numbers and semi-translation was used to compute the square of a number. To find the square of a number like 274, according to al-Majdi, you need to perform 9 elementary multiplications if you used translation, but if you were to use semi-translation instead, you would need only 6 multiplications. It may not seem like much, but these savings add up!
Another early example of complexity theory is Gabriel Lamé’s analysis of the Euclidean Algorithm. In short, the Euclidean Algorithm is an efficient way to compute the greatest common divisor (GCD) of two integers. For example, the GCD of 12 and 16 is 4, since 4 is the largest number that divides both 12 and 16.
Lamé’s 1844 proof showed that the number of steps in the algorithm never exceeds 5 times the number of digits of the smaller number. So, if I were finding the GCD of 437 and 8096 (which is 23, by the way), the algorithm would need at most 5*3 = 15 steps (i.e., divisions). In other words, the Euclidean algorithm has a runtime of O(h), where h is the number of digits of the smaller number.
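We can even check Lamé’s bound ourselves. Here’s a short Python sketch (my own illustration) of the Euclidean algorithm with a step counter:

```python
# Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
# until b hits 0; the surviving a is the GCD. We count the
# division steps to compare against Lamé's bound.
def gcd_with_steps(a, b):
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return a, steps

g, steps = gcd_with_steps(8096, 437)
digits = len(str(437))          # 3 digits in the smaller number
assert g == 23                  # the GCD from our example
assert steps <= 5 * digits      # Lamé: at most 5 * 3 = 15 steps
```

(In fact, this particular pair finishes in far fewer than 15 steps; Lamé’s result is a worst-case guarantee, which is exactly the big-O spirit.)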
Okay, enough about other people. I want to do our own examples!
Let’s start with a simple example. Suppose f(n) = n³ + 4n² + 7, what would be the growth rate of f(n)?
Well, the rule of thumb is that if a function is the sum of a bunch of terms, the big-O of it is the growth rate of whichever term has the largest growth rate. So, since n³ grows much faster than n² or 7 (which doesn’t change) as n increases, the growth rate of f(n) would be O(n³).
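If you want to watch the lower-order terms fade away, here’s a quick Python sketch (again, just my own illustration):

```python
# As n grows, the 4n^2 and 7 in f(n) = n^3 + 4n^2 + 7 stop mattering:
# the ratio f(n) / n^3 creeps closer and closer to 1.
def f(n):
    return n**3 + 4 * n**2 + 7

for n in [10, 100, 1000]:
    print(n, f(n) / n**3)
# n = 10   -> ratio 1.407
# n = 100  -> ratio 1.040007
# n = 1000 -> ratio 1.004000007
```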
It seems kind of strange to discard the other terms, huh? If we put this idea into number terms, it’s like saying that 6 + 1000 = 1000. Or that 87,386 + 1,000,000 = 1,000,000. Or that 3,902,317,872 and 6,586,120,887 are basically the same number.
Ew, what? This looks very, very wrong, right? Well, if we zoom out a bit, the difference between 3.9 and 6.6 isn’t too bad, right? So what makes these numbers different?
Here, I’m talking about figures, or powers of 10. Just like 3.9 and 6.6 are on the order of 1 (between 1 and 10¹), 3,902,317,872 and 6,586,120,887 are on the same order of 1,000,000,000 (between 10⁹ = 1,000,000,000 and 10¹⁰ = 10,000,000,000). This kind of “careless” computation is also known as street-fighting math! Who cares about precision! I’m here for a good time, not a long time!
So, if someone asks you to do #quickmaths and multiply, say, 326*835, you can just say, “Well, it’s on the order of 100,000,” and you would be right! (It’s 272,210, which sits between 10⁵ and 10⁶.)
In the same way we ignore constants in these computations, we do the same with big-O notation. Why? Well, as n gets larger and larger, constants (factors in the product that don’t depend on n) and lower-order terms don’t really matter, since they don’t change the overall order of the complexity/time it takes to run.
For example, let’s say we’re analyzing some algorithm and we find that the time it takes to run for a problem of size n can be described by the function A(n) = 7n⁵ − 2n³ + 5. Based on our last example, we might be tempted to say this is O(7n⁵). But remember, constants don’t matter! That means we can throw away that 7 in the front and instead write that A(n) = O(n⁵). To read this out loud, we can say that A(n) grows on the order of n⁵.
Now, how can we figure out how fast algorithms can run? Faster than me probably, somebody catch them!
But seriously, this is an important question to answer. We use algorithms all the time, the most common example being for searching and sorting. Knowing which algorithms are more efficient in different scenarios is important to saving time and computation power, especially when dealing with large datasets.
Okay, phew, this is a lot of information. Let’s take a quick break and play a game. I’m thinking of a number between 1 and 100. Try to guess what it is! I’ll tell you whether my number is higher or lower.
If you’ve played this game before, you probably know that the smartest first guess is 50. Why? Because you’re splitting the guessing range in half, getting rid of half of the wrong answers based on whatever I say.
Then I say, “Higher!”
Then you say, “75,” since you want to get rid of another half of the possible answers and 75 is halfway between 50 and 100, i.e., (50 + 100)/2 = 75.
I say, “Lower.”
Well, (50 + 75)/2 = 62 (approximately), so you say, “62.”
I say, “Higher.”
You say, “68,” since (62 + 75)/2 is basically 68.
I say, “You got it! Unfortunately, there is no prize. I just wasted 2 minutes of your life.”
You get angry, then realize that this whole article is a life lesson to only play games that have agreed-upon prizes in an ironclad contract. You never make the same mistake again.
Back to the point!
In this game, you were searching for some value in a list of numbers from 1 to 100. Another (very inefficient) way you could’ve done this is by going through the list, one by one. Is it 1? Is it 2? Is it 3? And so on. We both would’ve hated each other by the end of it, but eventually you would’ve gotten to the right answer! But, given the choice, you would probably choose the first way of playing.
Why is that? Well, it’s just more efficient in this scenario! And actually, these searching methods have names. The first way, where we cut the possible answers in half each time, is called binary search. The second way, where we just go down the list looking, is called linear search.
We call binary search a logarithmic algorithm since its time complexity is O(log_2(n)). This is because log_2(n) counts how many times we have to split the list (of size n) in half. (This is usually just written as O(log(n)) in computer science, but, for clarity, I wrote it as base 2.)
Linear search is, you guessed it, a linear algorithm since its time complexity is O(n). This makes sense since we are looking at every element in a list of size n, meaning we perform n operations.
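The guessing game translates directly into code. Here’s a Python sketch of my own of both searches, counting guesses as we go:

```python
# Binary search over the guessing game's range 1..100: guess the
# midpoint, then discard the half that can't contain the target.
def binary_search(target, lo=1, hi=100):
    guesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        guesses += 1
        if mid == target:
            return guesses
        elif mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

# Linear search: plod through 1, 2, 3, ... until we hit the target,
# so finding k takes exactly k guesses.
def linear_search(target, n=100):
    for guess in range(1, n + 1):
        if guess == target:
            return guess    # guesses so far == the target itself

# For the game's answer, 68: binary search guesses 50, 75, 62, 68 --
# just 4 guesses (at most 7 are ever needed, since 2^7 > 100) --
# while linear search needs all 68.
```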
Let’s do a slightly harder problem now. At camp last summer, the counselors asked all of us campers to sort ourselves by age…without speaking.
Did we succeed? Uh…by cheating, yes. We each wrote down our birthday in the Notes app on our phone and just sort of moved randomly around gesturing wildly at our phone notes. Work smarter, not harder?
Could we have done this better? Yes. There were 91 of us, so this was pretty time consuming.
So, what are some other ideas to sort ourselves?
Well, the most popular algorithms for sorting are selection sort, insertion sort, merge sort, quick sort, and the list goes on as the algorithms get more complicated. We won’t go too much into the details since we could talk about algorithms for a whole college course, but let me briefly describe the first one: selection sort.
To run the selection sort algorithm, we look down a list, find the smallest value (or in this case, youngest person), and switch them with whoever’s in the front. We won’t change this ever again. Then, we look for the next smallest value, and switch them into the second position, and so on.
Let’s say I wanted to sort the list: 3, 7, 1, 6.
Step 1: We find that the smallest value is 1. So, we switch it with the current first element, 3, and the list becomes: 1, 7, 3, 6.
Step 2: We find that the next smallest value is 3. We switch it with the current second element, 7, and the list becomes: 1, 3, 7, 6.
Step 3: The next smallest value is 6, switch it with the current third element, 7, and the list becomes: 1, 3, 6, 7.
Voila! The list is sorted. Simple, huh?
Now, what’s the time complexity for selection sort? Well, at each step for each place in the list, we are basically doing a linear search for the smallest value in the list. So, for each of the n elements in the list, we are running an algorithm with time complexity O(n). Thus, the overall time complexity for selection sort is n*O(n) = O(n²). Whoop, whoop!
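Here’s how selection sort might look in Python (a sketch of my own, following the steps above), with the nested loops that make the O(n²) comparison count visible:

```python
def selection_sort(items):
    items = list(items)                 # work on a copy
    n = len(items)
    for i in range(n):
        # Linear search for the smallest remaining value...
        smallest = i
        for j in range(i + 1, n):
            if items[j] < items[smallest]:
                smallest = j
        # ...then swap it into position i and never touch it again.
        items[i], items[smallest] = items[smallest], items[i]
    return items

print(selection_sort([3, 7, 1, 6]))     # [1, 3, 6, 7]
```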
If this is a bit confusing, don’t worry at all. The overall point of these examples is to show that different algorithms have different complexities, and knowing them is important when you’re choosing an algorithm for a certain task.
Here we visualize the growth rate of different time complexities.
The fastest possible run time is O(1), which we call constant time complexity, but it’s pretty uncommon in the real world.
Here’s a table that shows the various time complexities of common search and sort algorithms:
Remember that we use the worst-case time complexity for big-O notation, so that column is the one that we would write down.
The Space-Time Tradeoff
There ends up being a trade-off between space and time complexity. The more time-efficient an algorithm is, the less space-efficient it usually is, and vice versa.
We can see in the last column the space complexity for each algorithm. If you look at merge sort, we can see that the algorithm is very fast, but it takes up a lot of space. On the other hand, bubble sort is pretty slow, but it requires the minimum space, which is pretty neat.
Now that we’ve gotten a feel for a few algorithms and how to analyze them using big-O, let’s take it one step further and talk about some of the hardest problems of contemporary computer science…should be easy enough!
Intro to NP-Completeness
Just like we saw that an algorithm has linear time complexity if its complexity is O(n), we say that it has polynomial time complexity if its complexity is O(nᵏ), for some positive constant k.
If a problem can be solved with an algorithm with polynomial time complexity, we call the problem “tractable” or “easy.” Otherwise, we call them “intractable” or “hard” (never let it be said that computer scientists are not creative).
Among the problems that have not yet been solved with polynomial time algorithms, the most famous are the NP-complete problems, where NP stands for Nondeterministic Polynomial-time. They have the property that though no polynomial time algorithm is known that solves them, once a solution is known, it can be checked in polynomial time.
Also, they have this cool shared property that if just one of them can be solved with such an algorithm, then they can all be! Hurrah!
Oops, we celebrated too early. This question of whether or not NP-complete problems are intractable is actually one of the hardest open questions of computer science. But, we have some hope: though no efficient algorithm has been found for these kinds of problems, it hasn’t been proven yet that such an algorithm doesn’t exist.
A simple example of an NP-complete problem is finding a solution to an equation of the form ax² + by = c, known as a quadratic Diophantine equation. There is no known efficient algorithm for finding a solution to equations like this, but it’s pretty simple to check a proposed solution by plugging it in.
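Here’s a tiny Python sketch showing how easy the checking half is (the equation and the solution below are ones I made up for illustration):

```python
# Checking a proposed solution (x, y) to a*x^2 + b*y = c is just
# arithmetic -- fast and easy -- even though *finding* x and y
# in the first place is the hard part.
def is_solution(a, b, c, x, y):
    return a * x**2 + b * y == c

# e.g., for 2x^2 + 3y = 17, the pair (x, y) = (1, 5) works:
# 2*1 + 3*5 = 17.
assert is_solution(2, 3, 17, 1, 5)
```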
Some other popular examples of NP-complete problems are the Traveling Salesman problem and the Number Partitioning problem, which you’re welcome to look up if you’re interested!
Okay, that’s cool I guess. But why does NP-completeness even matter?
Well, these kinds of problems actually come up pretty often! Imagine working on a problem day after day, for years, and never managing to come up with the best solution…and then you find out that it’s basically impossible to do that. It’d be pretty frustrating, huh?
Knowing what problems are NP-complete saves a lot of time if you come across them in the real-world since you can redirect your efforts to finding a good solution instead of looking for the best possible one.
Now, I have one last challenge for you: summarize this article in one word. (Feel free to email me your answers!)
My answer? Efficiency. In computer science, a solution isn’t good enough if it takes months or years to arrive at. You need to be able to find it efficiently, taking into account both time and space. At the same time, knowing whether it’s even possible for a problem to be solved efficiently is just as important (otherwise you might waste years of your life on it).
We can see these ideas of algorithmic complexity all around us, from cryptography to quantum computing to intro CS classes at every university — it’s pretty neat!
Now, since we started this article with a big O, let’s end it with one. This is for you <3: XOXO.
Until next time! If you found this interesting, make sure to follow me to stay updated on future articles.
In the meantime, check out other articles in my column here! If you have any questions or comments, email me at firstname.lastname@example.org.
To be the first one to hear about all my new articles, recent events, and latest projects, subscribe to my newsletter: Letter? I Hardly Know Her!
As a reminder: this column, Gems in STEM, is a place to learn about various STEM topics that I find exciting, and that I hope will excite you too! It will always be written to be fairly accessible, so you don’t have to worry about not having background knowledge. However, it does occasionally get more advanced towards the end. Thanks for reading!