I read the whitepaper so you don’t have to.
The Onion Routing Protocol (TOR) was designed by the US Navy in the mid 1990s at the U.S Naval Research Labatory.
The pre-alpha version of Tor was released to the public in September of 2002  and the Tor project, the company that maintains Tor, was started in 2006.
Here is a quote from the paper titled “Tor: The Second-Generation Onion Router” on what Tor is in a nutshell:
Tor is a circuit based low-latency anonymous communication service 
The core principle of Tor is “onion routing” which is a technique for anonymous communication over a public network. In onion routing messages are encapsulated in several layers of encryption, analogous to encapsulation in the OSI 7 layer model . It is called onion routing because onions have layers and this networking protocol also has layers.
The resulting ‘onion’ (fully encapsulated message) is then transmitted through a series of nodes in a network (called onion routers) with each node peeling away a layer of the ‘onion’ and therefore uncovering the data’s next destination. When the final layer is decrypted you get the plaintext of the original message.
The original author remains anonymous because each node in the network is only aware of the preceding and following nodes in the path (except the first node that does know who the sender is, but doesn’t know the final destination).
This has led to ‘attacks’ on which the NSA runs servers in order to attempt to be the first and last nodes in the network. If the NSA server is the first node, it knows where the message is from. If the NSA server is the last node, it knows the final destination and what the message says. 
Onion Routing is a distributed overlay network designed to anonymise TCP-based applications like web browsing, secure shell and instant messaging.
Clients choose a path through the network and build a circuit where each onion router in the paths knows the predecessor and the successor, but no other nodes in the circuit.
Packets flows down the network in fixed-size cells which are unwrapped by a symmetric key at each node and relayed downstream.
The notion of a symmetric key comes from Symmetric-Key Encryption Algorithms.
In Symmetric Key Encryption the same key is used for both Alice (sender) and Bob (receiver) as opposed to public key encryption where these keys differ. This makes it faster and easier to use than Public Key encryption, but this also causes two problems:
If a third party figures out the key, either by attacking the transfer of the key or some other method they will be able to decrypt all communications encrypted with this key.
Symmetric Key encryption is also much less computationally expensive than public key encryption, which is useful in a network like this.
Public Key Cryptography is a set of cryptographic protocols based on algorithms that require two separate keys:
These two keys are mathematically linked. In public key cryptography the use of the public key is to encrypt plaintext and the use of the private key is to decrypt encrypted text. This means that anyone can encrypt plaintext for a specific person but only that person can decrypt it.
The two keys are very large prime numbers. Assume q and p are two separate and equally large prime numbers and then n = pq. n is used to encrypt the message and p and q are used to decrypt the message.
This works because factorisation is hard to compute. Given a very (very) large number, find the factors of that number which are prime numbers.
Often times the numbers are in hexadecimal.
Another useful tool of cryptography to know is digital signatures.
Digital Signatures use public-key cryptography. A digital signature is a digital version of a physical signature. Often times you can say you have cryptographically / digitally signed a document. Only the person with the private key can produce valid digital signatures which allow them to sign a document alot more securely than a physical written signature.
Alice wants to digitally sign a message, m. In order to do that Alice must have:
Alice then uses a signing function to produce a digital signature:
The exact internals of the signing function isn’t necessary to know. The function takes a message, a private key and it will produce a digital signature.
Anyone can use the public key to verify a digital signature:
I say that a signing function isn’t necessary to know however if you want to learn a few go ahead. The reason I haven’t added them to this document is because someone from 2–3 years time might read this and signing functions could be completely different.
Tor requires alot of users to create anonmity and thus if Tor was hard to use users wouldn’t adopt it so easily making it less anonymous. Because of this, the usabillity isn’t just a design choice of Tor but a security requirement. If Tor isn’t usable or designed nicely, it won’t be used and thus it won’t be secured.
Tor has therefore had to make some design decisions that may not improve security but improve usabillity.
Tor is not a completely decentralised peer-to-peer system like many people believe it to be. If it was completely peer to peer it wouldn’t be very usable. Tor requires a set of directory servers that manage and keep the state of the network at any given time.
Tor is not secure against end to end attacks. An end to end attack is where an entity has control of both the first and last node in a path, as stated earlier. This is a problem that cyber security experts have yet to solve, so Tor does not have a solution to this problem.
Tor does not provide protocol-normalisation like Privoxy or the Anonymizer, meaning that If senders want anonymity from responders while using complex and variable protocols like HTTP, Tor must be layered with a ﬁltering proxy such as Privoxy to hide differences between clients, and expunge protocol features that leak identity. 
Tor does not hide the identity of the sender.
In 2013 during the Final Exams period at Harvard a student tried to delay the exam by sending in a fake bomb threat. The student used Tor and Guerillar Mail (a service which allows people to make disposable email addresses) to send the bomb threat to school officials. 
The student was caught, even though he took precautions to make sure he wasn’t caught.
Gurillar mail sends an originating IP address header along with the email that’s sent so the receiver knows where the original email came from. With Tor, the student expected the IP address to be scrambled but the authorities knew it came from a Tor exit node (Tor keeps a list of all nodes in the directory service) so the authoriies simply looked for people who were accessing Tor at the time the email was sent.
So given the network above we are going to simulate a very basic onion routing protocol. The laptop on the left is the sender and the server rack on the right is the receiver.
So we start off by encrypting a message with 3 layers of encryption (Tor typically uses 3 nodes, so therefore 3 layers of encryption are needed)
The first layer is stripped off on the first machine. Something important to note here is that every machine only knows what the predecessor is or what the sucessor is, it does not know what the final goal is unless it is the last node in the path.
As it travels down the path, more and more layers are stripped away. The second (middle) node does not know where the message originated or where the final destination is.
The final node knows what the message is and where it’s going, but it doesn’t know who sent it.
The computer then reads the content of the message which might be “connect me to Facebook” and it then connects to Facebook and receives a message back as a response. Because the node doesn’t know where the original message came from, it sends it back through the network to the predecessor and says “pass this back through the network”.
But before doing this, it adds it’s own level of encryption back.
Eventually the original node receives a fully encrypted packet containing the response from Facebook. Because the original node has both private and public keys it can fully decrypt the message.
One of the key properties here is that once a node decrypts a layer, it cannot tell how many more layers there are to decrypt. It could be as small as 1 or 2 or as large as 200 layers of encryption.
The first node knows who sent the message but it doesn’t know what the message says because it is encrypted. The last node knows what the message says but it can’t tell where it came from.
Each machine, when it wants to create a circuit, chooses the exit node first, followed by the other nodes in the circuit. All paths in the circuit obey these rules:
Something important to note here is that Tor uses the Diffie-Hellman algorithm to set up session keys between the user and onion routers.
Diffie-Hellman is a way of generating a shared secret between two people in such a way that the secret can’t be seen by observing the communication. You’re not sharing information during the key exchange, you’re creating a key together.
Now for some basic knowledge on Diffie-Hellman, taken from here
I come up with two prime numbers, x and y and tell you what they are
You then pick a secret number (a) but don’t tell anyone. Instead you compute and send that result back to me (we’ll call it A)
I do the same thing, but we’ll call my secret number b and the computed number B. So I compute and send you the result.
Now you take the number I sent you and do the exact same operation with it. So that’s .
I do the same operation with the result you sent me,
The cool part here is that the answer I get at step 5 is the same answer you get at step 4. It comes down to a cool property of modulo components:
In a traditional network you have a definite input and a definite output. You know where your data is going. In a Tor Hidden Service you input some data but no outside force can see where it’s going. It is possible to communicate with a server without the user or the server knowing who eachother are.
When a server is set up on Tor to act as a hidden service, the server sends a message to some selected Onion Routers asking if they want to be an introduction point to the server. It is entirely up to the server as to who gets chosen as an introduction point and it doesn’t matter who is chosen.
The introduction points know that they are going to be introducing people to the server.
The server will then create something called a hidden service descriptor which has a public key and the IP address of each introduction point. It will then send this hidden service descriptor to a distributed hash table which means that every onion router will hold some part of the information of the hidden service descriptor.
If you try to look up a hidden service the introduction point responsible for it will give you the full hidden service descriptor.
The key for this hash table is the onion address and the onion address is derived from the server.
The idea is that the onion address isn’t publicised over the whole Tor network but instead you find it another way like from a friend telling you or on the internet (addresses ending in .onion).
So almost every single onion router will have minimal knowledge about the hidden service unless they explicitly want to find it.
If you want to access an onion address you would first request the descriptor off of the hash table and the descriptor has, let’s say 4 or 5 IP addresses of introductory nodes. You pick one at random and you’re going to ask the introduction point to introduce you to the server and instead of making a connection directly to the server you make a rendezvous point at random in the network from a given set of Onion Routers. You then make a circuit to that rendezvous point and you send a message to the rendezvous point asking if it can introduce you to the server using the introduction point you just used and then it says “I want you to send a message, X, to the server”.
The rendezvous point makes a circuit to the introduction point and sends it the word X and its IP address. The introduction point sends the message to the server and the server can choose to accept it or do nothing.
If the server accepts the message it will then create a circuit to the rendezvous point. The server sends the rendezvous point to the server and the rendezvous point looks at both messages and if they’re the same then it will then act as another hop on the circuit and connect them directly.
Another important thing to note is that every single Tor packet (called a cell) is exactly 512kb and they are all encrypted. Tor does this so people cannot guess that larger cells are images / media.
In short, a hidden service works like this, taken from here:
 — http://www.onion-router.net/Publications/JSAC-1998.pdf
 — http://archives.seul.org/or/dev/Sep-2002/msg00019.html
 — http://fermatslibrary.com/s/tor-the-second-generation-onion-router
 — http://ieeexplore.ieee.org/abstract/document/1094702/
 — https://cyber-peace.org/wp-content/uploads/2013/06/Attacking-Tor_-how-the-NSA-targets-users-online-anonymity-_-World-news-_-theguardian.pdf
 — http://www.thecrimson.com/article/2013/12/17/student-charged-bomb-threat/
 — https://www.youtube.com/watch?v=QRYzre4bf7I