DNS 101

Jon Christensen and Chris Hickman of Kelsus teach a primer on the Domain Name System (DNS) and explain what happens when someone types a URL (i.e., a web address) into a browser. They demystify the flow to help you understand how DNS works and the mechanics involved.

Some of the highlights of the show include:
- What is DNS? A fundamental Internet protocol, past and present, that maps friendly names to Internet Protocol (IP) addresses so clients can identify and reach servers on a network
- DNS and IP addresses offer scalability through levels of hierarchy (.com, .net, .org)
- DNS servers act as the authoritative source for the domain names they own and manage; a server can host multiple zones, at different levels, to segment the namespace
- Benefits of DNS: diverse, capable, flexible, and handles Internet traffic
- Time to Live (TTL): a DNS record setting that specifies how long (in seconds) a resolver may cache the record
- DNS lookup tools include dig and nslookup; browsers and server software such as NGINX also perform DNS lookups

Links and Resources
IPv4
AWS Route 53
GoDaddy
NGiNX
DNS Tools
Kelsus
Secret Stache Media

Transcript

Rich: In Episode 54 of Mobycast, Jon and Chris teach a primer on DNS and explain exactly what happens when you type a URL into the browser. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.

Jon: Welcome, Chris. It’s another episode of Mobycast.

Chris: Hey, Jon. Good to be back.

Jon: Good to talk to you. So, what have you been up to this week?

Chris: I’m recovering from the jet lag from the epic trip down to Brazil last week. Thirty-two hours of travel each way.

Jon: Ouch. The humble brag, though. “Oh, poor me, I was in Brazil.”

Chris: It’s so ironic. Four days in Brazil and 32 hours of travel on either side of that. It’s almost a wash. It’s like a […]. I paid for it.

Jon: You did. Me, too. I’m trying to recover from being in Brazil.
It’s funny, because that trip motivates me: I see the team and get excited about our future, but it also exhausts me. So I come back motivated with no energy to do anything. But here we are. Next week, the motivation is going to really kick in.

One of the things we did while we were down there was a lot of lessons with the team. We decided this year to go over some fundamentals. The better you are at the fundamentals of any sport, including software programming, the better you are. So we decided to go back to some fundamentals and brush up on things that many people might never have really dabbled in, and that others haven’t touched since their college courses.

One of the things we talked about a lot was networking. Down there, we talked about it for a whole day. We won’t do that to you, our listener. We’ll keep it short and cover one little piece of the overall networking talk with you today. This might be the beginning of a series on networking, where we get into the fundamentals of networking and then some specifics around networking within AWS, because that’s just super fun stuff. That’s probably the trajectory for the next few episodes, unless something major happens that we have to do a special report on, like we did with some of the open source shenanigans that happened recently.

Today, we’ll talk about DNS. What happens when you type www.kelsus.com (go ahead and type it in now) into your browser?

Chris: We’ll wait.

Jon: Yes, so what happens? That’s what we’re going to cover today, and maybe Chris can kick us off with the answer to that question.

Chris: Yeah. As you mentioned, we had a bunch of technical training for the team while we were at […] retreat in Brazil, and one of those sessions was all about networking. As you mentioned, there’s lots of stuff to cover there, but DNS is definitely one of the fundamental pieces.
So, think about what happens when you type in a URL. How does your browser know where to go? What are the mechanics that are really happening? It’s probably a lot more involved than most people realize. It’s pretty interesting, so today we can walk through that flow to help demystify it a little and give everyone a better understanding of how DNS works. To start off at a high level, DNS is an acronym that stands for Domain Name System. Really, DNS was created for us, for people.

Jon: Yeah. Computers don’t need it. They’re fine without it.

Chris: Not only do they not need it, they don’t want it. It’s not very efficient. In the beginnings of the internet, when networking started happening, you had physical network interface cards. Back in the day, every network interface card ended up having an IP address. This is IPv4, a dotted IP address with four components, and everyone has probably seen these: 192.32.33.255, or rather .232, since .255 is actually special and we won’t use that one. That’s the address. When all this first started, that’s what people were doing: they were connecting to one of these addresses. Once there were more than a handful of machines on a network, people realized pretty quickly that this does not work. They couldn’t remember all of these dotted numbers, or write them in a notebook somewhere and manually keep track of “Oh, that’s Jon’s server,” or “That’s Larry’s server,” or “That’s Susan’s server.” Keeping track of these addresses by giving them friendly names was the start of it. It was just a simple text file, the hosts file: a very small directory, in a text file, that kept a map between a friendly name and the actual address. This allowed folks to type in that friendly name. They could type in ‘jon’ and that resolved to Jon’s IP address, and likewise for Susan’s.
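To make the hosts-file idea concrete, here is a minimal Python sketch of the lookup it enables: a plain-text table that maps friendly names to addresses. The entries, the addresses (taken from the 192.0.2.0/24 documentation range), and the first-match-wins rule are illustrative assumptions. Real systems consult /etc/hosts through OS-level resolver configuration, not code like this.

```python
from typing import Dict, Optional

# Illustrative hosts-file contents; the names and addresses are made up.
HOSTS_TEXT = """
# address      name(s)
127.0.0.1      localhost
192.0.2.10     jon jons-server
192.0.2.20     susan
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each name on a line to that line's address; '#' starts a comment."""
    table: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption: the first mapping seen for a name wins.
            table.setdefault(name.lower(), address)
    return table


def lookup(table: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup; None means the name is not in the file."""
    return table.get(name.lower())


hosts = parse_hosts(HOSTS_TEXT)
print(lookup(hosts, "jon"))    # 192.0.2.10
print(lookup(hosts, "larry"))  # None (not listed, so a resolver would move on to DNS)
```

Each lookup is just a local table access, which makes it instant. The catch is that every machine needs its own up-to-date copy of the file, and that is the scaling problem that led to DNS.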
That was the beginning of it.

Jon: Right, and that hosts file lived, on Unix systems at least, at /etc/hosts.

Chris: And the funny, interesting thing is that it continues to exist today. Not only does it exist today but, as we’ll find out, it’s still very much part of the lookup process: it’s still the first link in the chain of events that happens when something goes to resolve a hostname to an actual address. That’s basically what DNS is: the phone book of the internet. You look something up by name and it gives you its address. This is how computers figure out, given a friendly name, what machine to connect to and where to send their packets.

Jon: Exactly. The hosts file doesn’t really scale. You can’t put all of the IP addresses on the internet in your hosts file and keep them up to date with the names they go with. People had to come up with another way. One of the things I love about DNS is that the way they came up with was a protocol that anybody could participate in. It wasn’t a company that came along and said, “We’re going to handle routing of names on the internet. Just send us your money and we’ll send you the IP addresses when you give us a name.” Because of how it was designed, that’s where we are now: we still have a protocol, anybody can still stand up a DNS server, and anybody can still play. I think that’s the next part of this journey that we have to talk about. What’s a DNS server, and what does it do?

Chris: Yes. Just to touch on that, again, it’s interesting. DNS is one of those fundamental, early protocols that comprise the backbone of the internet. It has been around since the early days. Just like mail (SMTP) and the other core protocols, these things have been around a while. They have their own limitations, and they have issues with trust and security that we might talk a little bit about as we discuss this.
DNS is a client-server architecture. You have DNS servers that manage the records they’re responsible for, and then you have clients, or resolvers, that query those servers to get a record. The client is basically just making a query: “Hey, this is the name I’m looking for. Do you have its address?” The server either responds, “Yes, I do have it,” or it refers the client to some other server. It’s walking down a chain of hierarchy, with a parent-child relationship as you move through the dots of the URL. You can think of DNS as a hierarchy where at the very top you have the root, then you have the top-level domains like .com, .net, or .org, and underneath those you have the next-level domains, things like yahoo.com or kelsus.com. There’s a hierarchy there, but at the end of the day you have the servers that manage the files and answer the requests, and you have the clients, the resolvers themselves, that issue these queries to make lookups.

Jon: A fun thing that I like to think about is how both DNS and IP addresses are hierarchical, and how they both enable some scalability. Only a few servers out there need to know about and be responsible for all the top-level domains; beyond that, other servers deal with the next level, and beyond that more and more servers can play in this “tell me the IP address for this name” game. That’s pretty cool. The scalability is built into the protocol. We won’t talk about how IP routing works, but it’s the same basic concept: the IP address itself is basically a map that tells the computer how to get from here to there. That’s why you don’t have to hand somebody an IP address plus a map of how to get there; it’s part of the IP address. It’s pretty cool that they both work in a way that makes them scalable and usable just at the protocol level.

Chris: Absolutely. It’s almost Darwinian.
If they didn’t work, we wouldn’t be here talking about these things. By virtue of the fact that they do work and they are scalable, they lived on, and that’s the reason they’ve had so much longevity, which is pretty amazing if you think about it. When DNS was first created, did they really think there were going to be billions of devices on this network and millions and millions of hostnames? Maybe, but it’s probably still a lot bigger than they thought.

Jon: Yeah, I agree. Let’s talk a little bit more about DNS servers, what they do, and what they have inside of them.

Chris: Right. DNS servers, again, exist to manage the names that they own, the names they’re the authoritative source for, and they can host multiple zones. You can think of a zone as a different level of the URL, so you could have a zone file for api.kelsus.com, maybe, and then another one for another domain that Kelsus has, and whatnot. It’s a way of segmenting your namespace. These DNS servers register with other name servers to say which domains, or which of these zones, they are the authoritative source for. That’s how requests end up getting routed to them: resolvers learn that this is where they need to go to look up that name.

Jon: There are a few servers out there, living inside of AWS, that are saying to the world right now, “Hey, I’ve got kelsus.com. I’ve got that one. Want to know where that goes? You ask me. I’ll tell you.”

Chris: Yeah. For every domain out there, there’s a registration that says, “Yes, I am the authoritative server for that particular domain.” Whether you’re using AWS Route 53, or something like GoDaddy DNS, or whatever DNS, or you’re rolling your own server, they all work the same.

Rich: Hey there. This is Rich. Please pardon this quick interruption.
We recently passed an internal milestone of 30,000 listens, and I want to take a moment to thank you for the support. I was also hoping to encourage you to head on over to iTunes to leave us a review. Positive feedback and constructive criticism are both incredibly important to us. Give us an idea of how we’re doing, and we promise to keep publishing new episodes every week. Okay, let’s dive back in.

Chris: So, for each one of these zones, you’re going to have a list of records, and there are various types of DNS records. You have regular IP addresses, if you will: given a name, what’s the IP address that I can connect to? That’s one type of record, called an A record. You can also have aliases, pseudonyms for records. Sometimes you have a less-than-friendly DNS name for something and you want to alias it to something nicer or easier to remember. Those are CNAME records, which are basically just aliases for something else.

Jon: And the nice thing about those, just to interject, is that they stay what they are. Say you have a CNAME and you decide you want api.kelsus.com to point to 12345.aws.amazon.com, lots of gobbledygook text. When that CNAME is typed into a browser, the browser keeps the CNAME. It doesn’t switch over to the target name; you never see it. That’s a nice feature of CNAMEs.

Chris: Yeah, and other types of records are for routing mail. You have mail exchanger records, MX records. These tell clients that are sending mail where mail for a domain should be delivered. There are records in DNS for dealing with that. DNS is very diverse, very capable, with lots of flexibility. It really underpins all the traffic on the internet. Again: given a name, what’s the IP address that this needs to go to? That’s what it’s doing.
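The three record types mentioned here (A, CNAME, and MX) can be modeled as a small in-memory table. This Python sketch is a toy and not how a DNS server actually stores zones. The hostnames and addresses are hypothetical, and the hop limit on CNAME chains is an arbitrary safeguard. The example shows a CNAME being followed to the name that actually holds A records, an A lookup that returns more than one address, and MX answers ordered by preference (lower value preferred).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyRecords:
    """Hypothetical records; real zones are served by DNS servers such as Route 53."""
    a: Dict[str, List[str]]               # name -> IPv4 addresses (A records)
    cname: Dict[str, str]                 # alias -> canonical name (CNAME records)
    mx: Dict[str, List[Tuple[int, str]]]  # domain -> (preference, mail host) (MX records)


def resolve_a(records: ToyRecords, name: str, max_hops: int = 8) -> List[str]:
    """Follow CNAMEs until a name with A records is found."""
    for _ in range(max_hops):
        if name in records.a:
            return records.a[name]
        if name in records.cname:
            # Only the lookup follows the alias; the client keeps using the original name.
            name = records.cname[name]
            continue
        return []  # no A or CNAME record for this name in the toy data
    raise RuntimeError("CNAME chain too long (or looping)")


def mail_hosts(records: ToyRecords, domain: str) -> List[str]:
    """Mail exchangers for a domain, most preferred (lowest value) first."""
    return [host for _, host in sorted(records.mx.get(domain, []))]


records = ToyRecords(
    a={"lb-12345.example.net": ["192.0.2.1", "192.0.2.2"],
       "mail.example.com": ["192.0.2.25"]},
    cname={"api.example.com": "lb-12345.example.net"},
    mx={"example.com": [(20, "backup-mail.example.com"), (10, "mail.example.com")]},
)
print(resolve_a(records, "api.example.com"))  # ['192.0.2.1', '192.0.2.2']
print(mail_hosts(records, "example.com"))     # ['mail.example.com', 'backup-mail.example.com']
```

The MX lookup also shows why the security concern raised later in the episode matters: whoever controls the MX answer decides where a domain’s mail is delivered.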
Another thing that’s pretty important, for me: because DNS is used constantly (any time you make any kind of network request, you’re probably doing a DNS lookup to ask, “Given this hostname, what is the IP address?”), there’s lots of caching involved to increase performance. Caching works really well here because, for the most part, these addresses don’t change very frequently. Even if you have something or you’re using some […], they’re just not changing that often. So caching comes into play heavily. You can set TTLs, time-to-live values, on these records, which tell intermediaries how long they’re allowed to cache them. So TTLs end up becoming pretty important. If you are setting up DNS and adding hostnames, consider this: if a record is going to be relatively stable and you don’t expect to change it very often, you can set a higher TTL. But if it’s something you’re still actively working on (say you’re building out a service and may move it from one environment to another), realize that when you update your DNS server, the updates are not going to propagate immediately. Propagation depends on whatever your TTL was. If you have a high TTL, let’s say one day, you’re probably going to be unpleasantly surprised to find out that it takes a full day before everyone is pointing to the new server, because of that caching.

Jon: Right, and with TTLs, the thing I like to think about is that TTLs and caching are why DNS propagation differs for different people in different locations. You change where the domain name points, you update it maybe on GoDaddy (if you’re unfortunate enough to be using GoDaddy), and then you check, “Oh yeah. I’m seeing it. Yup.
There’s the change,” and your friend on the other side of the world is like, “I’m still not seeing it.” That could just be because they’re hitting a DNS server that cached that address more recently than the one you’re using. You always have to assume that some DNS server out there cached the address you’re about to change just moments ago, so you’re going to have to wait the entire TTL before everybody is up to date.

Chris: Exactly, yes. If you ever work with DNS, making entries and changing them, this is definitely something you become intimately familiar with, because you get bitten by it. You start running into “Well, it works for me, but not for you,” or “I see the new page,” or “I don’t see the new page,” that type of thing. It’s definitely something to take into account. Again, if you plan on changing something, make sure you set it to a low TTL until it’s stable, and then you can increase it to a higher TTL. Strike the right balance between the flexibility to change and performance, that is, minimizing your lookups.

Jon: Right. This is already turning out to be more complicated to describe than I expected. “Oh, it’s DNS. It’s an easy, quick protocol,” and we still have a lot of things to talk about here. I think the next thing we’re going to talk about is, did you want to go more into resolvers, Chris, or did you want to talk about the flow? Sort of the story of A Day in the Life of a Request from a Browser.

Chris: At the end of the day, the clients, or resolvers if you will, are basically just saying, “Hey, I need to talk to some name server.” There’s a way of configuring your client, your resolver, so it knows which name servers to go talk to, and then it just issues a query: “This is the name that I’m looking for. Give me a record back.” If the server has it, it’s done. If it doesn’t, it tells the resolver which child server it should go talk to next.
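The “works for me, but not for you” effect comes from each resolver caching independently. This Python sketch is a simplified caching resolver, not any real resolver’s implementation. The `upstream` callable stands in for the authoritative server, time is passed in explicitly in seconds, and the hostname and addresses are made up. Two resolvers ask the same question at different moments. After the record changes, they give different answers until the first one’s TTL runs out.

```python
from typing import Callable, Dict, Tuple

Answer = Tuple[str, int]  # (address, TTL in seconds)


class CachingResolver:
    """Reuses an answer until its TTL expires, then asks upstream again."""

    def __init__(self, upstream: Callable[[str], Answer]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (address, expires_at)

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: no upstream query
        address, ttl = self.upstream(name)  # missing or expired: query again
        self.cache[name] = (address, now + ttl)
        return address


# Hypothetical authoritative data with a one-day TTL (86,400 seconds).
authoritative: Dict[str, Answer] = {"www.example.com": ("192.0.2.10", 86400)}
resolver_a = CachingResolver(lambda name: authoritative[name])
resolver_b = CachingResolver(lambda name: authoritative[name])

print(resolver_a.resolve("www.example.com", now=0))      # 192.0.2.10 (fetched and cached)
authoritative["www.example.com"] = ("192.0.2.99", 86400)  # the record is changed after t=0
print(resolver_a.resolve("www.example.com", now=60))     # 192.0.2.10 (stale, still within TTL)
print(resolver_b.resolve("www.example.com", now=60))     # 192.0.2.99 (nothing cached, sees the change)
print(resolver_a.resolve("www.example.com", now=86400))  # 192.0.2.99 (TTL expired, re-fetched)
```

Lowering the TTL ahead of a planned change shrinks the window during which a cache like `resolver_a` can keep handing out the old address.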
So, it’s pretty simple from a client standpoint. There are many different types of clients and resolvers. There are standalone ones, including some really useful command-line tools like dig and nslookup. Your browsers obviously have resolvers too; some of those are based on OS-level code, and some have their own custom resolvers. The same goes for server-side software like Nginx. Just about anything that makes network connections has to do DNS lookups.

Jon: I think the thing that almost goes without saying, but I’m just going to make it really explicit, is this: when you type www.google.com into your browser and you’ve never done that before, there’s no cached IP address ready to use. The browser is going to go look up that name using what’s almost a distributed microservice that gives you back the address, and it has to finish that whole request before it even starts to load the Google page. Everything you ever do is two requests, not just one. I think that’s easy to forget, because you just think, “Oh, I’m going to Google,” and you imagine the network traffic going from my computer, to Google, to […], and coming back. No, no, no. It’s going to some DNS servers, coming back, then going to Google, and then coming back. I just have to say that really explicitly, in case it wasn’t clear from what we were talking about already.

Chris: Yeah, and to be even more specific, it’s not just two requests. That DNS […] is actually potentially many requests.

Jon: How long do they take? Sometimes 60 milliseconds? 80 milliseconds?

Chris: Yeah. It depends on your network, and it depends on the name servers and how much traffic they’re getting. It can be milliseconds; it could sometimes be seconds. Just like anything else, it’s making network requests over the internet, and there is very real latency there. Again, I think the real takeaway here is that there’s a lot going on for DNS to work.
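The referral chain behind “many requests” (root, then the TLD servers, then the domain’s authoritative server) can be modeled as a loop. This Python sketch is a toy. The server names, zones, and address are hypothetical, the “servers” are dictionary entries rather than network endpoints, and there is no caching. A real recursive resolver starts from the root server addresses and follows the NS records returned in each referral, which is the sequence that dig +trace prints.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hierarchy: each server either answers or delegates a more specific zone.
SERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "root":         {"delegations": {"com.": "com-tld"}, "answers": {}},
    "com-tld":      {"delegations": {"example.com.": "example-auth"}, "answers": {}},
    "example-auth": {"delegations": {}, "answers": {"www.example.com.": "192.0.2.80"}},
}


def referral(server: Dict[str, Dict[str, str]], name: str) -> Optional[str]:
    """Return the server for the most specific delegated zone containing `name`."""
    zones = [z for z in server["delegations"] if name == z or name.endswith("." + z)]
    if not zones:
        return None
    return server["delegations"][max(zones, key=len)]


def resolve_iteratively(name: str, start: str = "root",
                        max_steps: int = 10) -> Tuple[Optional[str], List[str]]:
    """Return (address or None, servers asked in order)."""
    if not name.endswith("."):
        name += "."  # a fully qualified name ends at the root
    server_id, asked = start, []
    for _ in range(max_steps):
        asked.append(server_id)
        server = SERVERS[server_id]
        if name in server["answers"]:
            return server["answers"][name], asked  # authoritative answer
        next_id = referral(server, name)
        if next_id is None:
            return None, asked  # neither an answer nor a referral
        server_id = next_id  # follow the referral one level down
    raise RuntimeError("too many referrals")


address, asked = resolve_iteratively("www.example.com")
print(address)  # 192.0.2.80
print(asked)    # ['root', 'com-tld', 'example-auth']
```

In real life each entry in `asked` is a separate network round trip. That is why a single uncached lookup can add noticeable latency, and why caching matters so much.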
When you type an address like google.com into the bar at the top of your browser, there are quite a few steps happening under the covers before that page appears.

Jon: Right, and you mentioned the tool. I think it’s worth it for anybody who’s never done this before: if you have a Mac, which most of you do (just guessing), go to the terminal and type dig +trace domain-name. The domain name could be www.kelsus.com, it could be mobycast.fm, or it could be google.com or whatever. Then just look at what comes out. It comes out in steps, there’s a lot of information there, and it’s worth checking out. It starts to uncover the complexity that Chris was talking about.

Chris: Normally, the intermediate lookups are hidden from you, so you just see the final result. When you add +trace, it actually shows you how resolution happens. It starts at the very top, going to the root servers of the internet. From there, it goes to the TLD servers for whatever TLD it […]. The TLD would be like, “Is it .com? Is it .net? .org?” There’s a whole separate set of servers that manage just those. From there, it can go and find out, “Okay, what’s the authoritative server for this particular address you’re looking for?” With the +trace option, you’ll see all of that. It makes it very explicit, very clear what’s happening: there is this hierarchy of DNS servers, and the lookup traverses it, narrowing the scope until it finds the actual information it’s looking for.

Jon: One of the things I used to wonder about at the beginning of my career, when I thought about DNS, was this. An A record points to an IP address, and an IP address is just a single machine. So if you’ve got a really big service, like maybe Yahoo! at the time, how can that single machine handle all that traffic?
I honestly don’t know when it became possible to have multiple A record answers, and I’m not sure if it was possible 20 years ago. But now it definitely is. Maybe all along you’ve been able to say, “Oh, there are four or five different A record answers. These four or five IP addresses can be round-robined and they can handle all the traffic.” It’s sort of like DNS-style load balancing. But I don’t know if that’s always been the case. Do you happen to know, Chris?

Chris: It’s been around for a while, at least since the early 1990s, perhaps even before then. Multi-answer DNS records have definitely been around for quite some time. And of course now we can couple that with things like load balancers. You can have a single domain name that, at the end of the day, is serviced by hundreds of machines behind it.

Jon: Right. I think we probably need to wrap up here. You know what we’ve shortchanged in this conversation? Security. It’s kind of funny, kind of ironic, that that’s the part we didn’t get to. And it’s also a little bit […] because you, our lovely audience, have shown us that your favorite episodes are not the security ones. Maybe we’ll try to tuck some of the security stuff into a future episode rather than do an entire episode on DNS security. I don’t know that that would be what everybody wants. For what it’s worth, there’s a bunch of cool stuff in DNS security that we totally want to talk about at some point soon.

Chris: Yeah, and it may be boring or not as interesting, but it’s actually really important. We talked about how DNS handles things like the routing of mail. If someone is able to spoof a response, or be a man in the middle, and change the MX record for your mail server for some domain, it’s pretty easy for them to hijack your mail. Whenever anyone sends you mail, instead of that mail being forwarded to the server that hosts your mailbox, it goes to some rogue mailbox.
You would never know, other than people saying, “Hey, why haven’t you answered my emails?” “I didn’t get any.”

Jon: Maybe that’s how we can get people to listen to the DNS security episode coming up. It will be titled something like How to Hack DNS and Get Free Stuff from Amazon for Life.

Chris: There you go.

Jon: All right. Thanks, everyone, for listening, and thanks, Chris, for explaining this so well. I’ll talk to you next week.

Chris: All right. See you. Bye.

Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources, is available at mobycast.fm/54. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you, and we’ll see you again next week.