**Scraping Glassdoor Job Data**

The Big-O is a way of figuring out how efficient an algorithm is. Algorithms are just instructions on how to search for something, within some data. When you’re working on a big project with a huge database, making your code run as efficient as possible can save huge costs in the long run. An algorithm explains every step of getting a glass of milk, assuming there are glasses in the cupboard, and milk in the fridge. Python takes all the hard stuff and puts it in a built in function for you to use.

Such scary words. It oozes math all over them…

This is probably what you’re thinking if you’re new to computer science and programming. But really, they aren’t that hard to grasp, once you understand the basics of them.

the Big-O is a way of figuring out how efficient an algorithm is. When you’re working on a big project with a huge database, making your code run as efficient as possible can save huge costs in the long run. You may ask — What is an algorithm?

You can think of an algorithm as a recipe. A set of instructions on how to complete a task. Imagine getting a glass of milk. The algorithm for doing such a task might look something like this

- Open cupboard
- Get a glass from the cupboard
- Close cupboard
- Open fridge
- Take the milk carton out of the fridge
- Open milk carton
- Pour milk into the glass
- Close milk carton
- Put milk carton back into the fridge
- Close fridge

This is an algorithm. It explains every step of getting a glass of milk, assuming there are glasses in the cupboard, and milk in the fridge.

You might have heard about different search algorithms and such. They are also just instructions on how to search for something, within some data.

A very popular search algorithm is the binary search. It sounds very computery, but it’s actually an algorithm you’ve used many times throughout your life. The binary search basically splits the data in half, checks which side of the split the data you’re searching for is, and repeating the cycle until you found what you’re looking for.

Imagine you’re looking for Todd Jefferson’s phone number in the phone book. The phone book is going to be in alphabetical order, ordered by the last name. You wouldn’t look for Jefferson at page 1, where the last names begin with an A. You could turn to the next page, and repeat until you got to J. But you probably wouldn’t. Because it’s very ineffective, and this is something you just know unconsciously.

What you probably would be doing, is to go to somewhere in the middle. Maybe you end up at a page which shows last names beginning with M. Since you know J is before M, anything after M is useless information in this context. Now, you might try to split the phone book again. Maybe you’ll end up at F. And again, everything before F is useless information, since J is after F. We’ll loosely repeat this process until we find Tod Jefferson's phone number.

I say loosely, because your subconsciousness is incredible efficient in solving these types of math problems, and you might find it more efficient to turn the page to Tod's phone number, if you know that would be more efficient than splitting the data again.

A computer doesn’t have subconsciousness. Nor does it have conscious. The computer is a loyal servant. It does exactly what you tell it to. No more, no less.

There are many pre-made algorithms, created by very smart people. These include ways of sorting data, moving data, manipulating strings and more. Very high-level languages (Python, Ruby, JavaScript etc) often come with predefined functions, or “wrappers” of these efficient algorithms, so that you don’t have to write them yourself.

**Example in Python**

```
list_of_numbers = [2, 1, 3, 5, 4]
list_of_numbers.sort()
print(list_of_numbers)
output: [1, 2, 3, 4, 5]
```

The sort function in python is actually a sorting algorithm called Timsort which you can read more about here: https://en.wikipedia.org/wiki/Timsort

The point in this example is to show you that this simple, short little function you append to your data, is actually an advanced sorting algorithm behind the scenes. Python takes all the hard stuff and puts it in a built in function for you to use.

**Example in JavaScript**

JavaScript comes with a prebuilt sorting function, which you append to a set of data. Consider this example:

```
let listOfNumbers = [2, 1, 3, 5, 4];
listOfNumbers.sort();
console.log(listOfNumbers);
output: [1, 2, 3, 4, 5]
```

JavaScript has the same built in sorting function as Python has. Which algorithm does JavaScript use? It actually depends on where you run your JavaScript code. Each browser is free to implement their own sorting algorithm. For instance, Google’s V8 JavaScript engine (Chrome, Node.js etc) also uses Timsort, while Mozilla Firefox uses merge sort.

In a lower level programming language like C, you wouldn’t have any of these high level functions. Everything has to be coded manually. If you want to sort data in C, you need to figure out how you want to sort the data. In other words — which algorithm you’d use to sort the data, and implement it programmatically.

To recap algorithms, it’s not a scary word. It’s just a recipe — a set of instructions on how to do a task. We are using algorithms consciously and subconsciously many times every day.

The reason I included some info about algorithms, is because the Big O and algorithms go hand in hand. As noted earlier, Big O Notation is all about the efficiency of the algorithm.

You can put it this way:

- How long does it take the computer to do a certain task
- How much memory will the computer use while doing a certain task

The Big O always describes the worst case scenario of the efficiency of an algorithm. Let’s have a look at some common orders of growth.

**O(1)**

O(1) means constant. We use the number 1, to signalize a constant value. The number 1 will never change. 1 will always be 1.

It will always take the same amount of time to execute an O(1) operation.

Example, O(1):

```
myArray = [1, 2, 3]
myArray[1]
output: 2
```

In this example, we’re getting the value of the array at index 1. Since computers know the index of the arrays, searching up an index would be constant. It doesn’t matter how big the array is. It will take the same amount of time to look up an index whether the array consists of 3 elements, or 5000 elements.

**O(n)**

O(n) means linear time. You can think of the n as *number of data.* The time it takes to execute a O(n) operation depends on how much data we have. Let’s look at another example:

```
# PYTHON
my_list = [1, 2, 3, 4, 5]
for number in my_list:
print(number)
# JS
const myArray = [1, 2, 3, 4, 5];
for(let i = 0; i < myArray.length; i++) {
console.log(myArray[i]);
}
```

Both these snippets of code will produce the same outcome. It will spit out the content of the array. When we iterate over data like this. the size of the data will determine how long it will take to iterate over it.

An operation we do a lot of times, is to search through data. Let’s look at an example where we do a linear search through an array. How would this algorithm look like?

- Loop over the array
- Check value
- Is the value what I’m looking for? If it is, return number, exit program.
- If not — go to next value and repeat stage 2.

In code:

```
# PYTHON
arr = [1, 2, 3, 4]
def search(arr, number):
for i in range(len(arr)):
if arr[i] == number:
return print("Number is in array")
return print("Number is not in array")
search(arr, 4)
output: Number is in array
# JS
const arr = [1, 2, 3, 4, 5];
const search = (arr, item) => {
for(let i = 0; i < arr.length; i++) {
if (arr[i] === item){
return "Number is in array";
}
}
return "Number is not in array";
}
search(arr, 6)
output: Number is not in array
```

In the best case, this algorithm is O(1). if the value we’re looking for is the first value in the array. But it might not be. And the array could be huge, with hundreds of values. In the worst case scenario, we would have to iterate through the whole array to find the value we’re looking for. This makes it O(n).

If you run two loops right after each other in the same scope, we’ll get a complexity O(2n), because we get n two times. In Big O Notation, we remove the constant. Which means, O(2n), O(3n) and on is still written as O(n)

**O(n²)**

What is this scary looking thing? It sure starting to look mathy now! That’s what I thought when I first saw this.

This complexity is close to O(n), but it’s a pretty bad operation. Which means, the computer will need to do a lot of calculations in an operation with big O of O(n²). Is this always bad? Should you never use algorithms with this order of growth? The short answer is — yes, you should not.

You should figure out a different way of solving whatever you’re working on. The long answer is — it depends. There are times where these algorithms are the only ones that will work. And that’s ok. Just be aware of it, and try to come up with better ways of solving things.

Let’s look at an example, and figure out why O(n²) is so slow:

```
# PYTHON
num_list = [1, 2, 3]
letter_list = ['a', 'b', 'c']
for number in num_list:
print(number)
for letter in letter_list:
print(letter)
output: 1, a, b, c, 2, a, b, c, 3, a, b, c
# JS
const numArr = [1, 2, 3];
const letterArr = ['a', 'b', 'c'];
for(let i = 0; i < numArr.length; i++){
console.log(numArr[i]);
for (let j = 0; j < letterArr.length; j++){
console.log(letterArr[j]);
};
};
output: 1, a, b, c, 2, a, b, c, 3, a, b, c
```

Here is a program that consists of two nested loops. At the outer loop first iteration, the whole inner loop runs. Looking at the example above, when the outer loop goes to 1, the inner loop has to do a complete iteration before the outer loop moves over to 2. This operation runs in quadratic time. If the array has 10 items, we have to print 100 times. If it has 1000 items, we have to print 1000000 times.

O(n²) might work for smaller programs and operations, but you can see how inefficient it would be when working on huge datasets.

**O(log n)**

This is probably the scariest of all the complexities. The O(log n) follows the logarithmic scale. It will perform better the more data it runs. On large datasets, this is the second best complexity, only beaten by O(1).

For example, when we perform a binary search, we will divide the data in two, and only work with the part where the data we search for probably is. We will continue to do this until we find, or don’t find our value

Let’s take a look at this chart. All the graphs should start to make sense, and they clearly visualize how efficient the different time complexities are. We are lucky to have such higher languages, that hides all this complicated stuff from us so that we can concentrate on building cool apps. However it’s still important to understand how it works under the hood.

Until next time

Stephan Bakkelund Valois

L O A D I N G

. . . comments & more!

. . . comments & more!