# **What happens when we type a `URL` in our browser and press `Enter`?**

# **1. Parse the URL**

To visit any website on the Internet we need to know the `address` of the `server` that is hosting that website. That address is a number called the `IP` address. `IP` stands for `Internet Protocol`. A `protocol` is a set of rules that defines a method of exchanging information over a `computer network`. That is what the `internet` really is: a network made up of billions of connected computers, each with its own `IP`. For example, one of `google.com`'s IPs has been `172.217.5.101`, so typing `http://172.217.5.101` in your browser would take you to google.com.

The first thing the browser does is check if what we entered is an `IP`. If it is, then it will try to connect to it. In reality though, it's impossible for us to remember the `IPs` of the many websites we visit, so we remember their `URLs` instead. But a URL is not an address, and our browser needs the exact address of the domain named in the URL. So how does it figure that out? Naturally, it looks it up!

# **2. Lookup Address**

### `Local Lookup`

There are several different locations where the `IP` address can be found. First, two locations on the local computer are checked.

## `Browser Cache`

The browser searches its cache. A `cache` (pronounced *cash*) is a hardware or software component that stores data so future requests for that data can be served faster. The `Chrome` browser, for example, caches the IPs of the domains you visit for a short time (on the order of a minute or less), after which the entry expires. If you use Chrome you can type in this URL to see your cached addresses: `chrome://net-internals/#dns`.

## `Hosts File`

Second, the operating system's `hosts` file is checked. The hosts file is a text document that has a list of IPs with their associated domains.

Windows: `C:\Windows\System32\drivers\etc\hosts`

Mac and Linux: `/etc/hosts`

### `External Lookup`

If the IP address cannot be found locally, the browser must do an external search. This is where `DNS` comes to the rescue!
It stands for `Domain Name System`, a distributed `database` hosted on many servers around the world that contains `records` mapping each `domain` on the internet to its `IP` address(es). `DNS` makes browsing the internet human-friendly: we no longer have to remember any numbers, only the domain name, and `DNS` gives us the `IP`. DNS lookup proceeds as follows:

## `Router Cache`

If the `IP` is not found on the browser's computer, the next place to look is a `router` on the local `network` that keeps a DNS cache. If the record isn't there, the query goes to the ISP.

## `ISP DNS Cache`

`ISP` stands for `Internet Service Provider`, which is the company you pay monthly to have Internet access. ISPs have DNS cache servers in their data centers. Those servers are the next thing to get checked. If the record is still not found, the ISP's DNS server initiates a `recursive DNS query`.

## `Recursive DNS Query`

![](https://www.filepicker.io/api/file/zHW9lR74QauHvHhtl9zr)

Simply put, in a recursive DNS query the ISP's DNS server asks a chain of DNS servers on our behalf until it finds the correct record, which is then returned to the browser. Let's say we're trying to find the IP of `mail.google.com`. The ISP's DNS server contacts a root DNS server, which refers it to the top-level domain `.com` DNS server, which refers it to the `google.com` DNS server, which returns the IP of `mail.google.com`. That answer is then passed all the way back to the browser.

Note that even though there are many steps involved in getting the IP, it all happens extremely quickly, usually with no noticeable wait. Also, each DNS server caches any record it had to look up for a certain amount of time, so it can be answered quickly the next time it's requested.

# **3. Connect to Server**

## `Initiate TCP/IP Connection`

Once the browser (which is the `client` in a `server/client` relationship) receives the IP, it will attempt to establish a connection to the website's hosting server using `TCP (Transmission Control Protocol)`.
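
From the client's point of view, opening a TCP connection is a single call to the operating system's socket API. Here is a minimal sketch in Python. The function names `connect_tcp` and `demo_loopback` are made up for this illustration, and a throwaway listener on the local machine stands in for the website's server so that the example runs without Internet access.

```python
import socket


def connect_tcp(ip: str, port: int, timeout: float = 3.0) -> socket.socket:
    """Open a TCP connection to (ip, port).

    The operating system sets up the connection inside this call;
    it returns only once the connection is established (or raises OSError).
    """
    return socket.create_connection((ip, port), timeout=timeout)


def demo_loopback() -> tuple:
    """Connect to a temporary local listener and report both addresses."""
    # Assumption for the demo: a listener on 127.0.0.1 plays the web server.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        listener.listen(1)
        server_addr = listener.getsockname()

        with connect_tcp(*server_addr) as client:
            # getpeername() only succeeds on a connected socket.
            return client.getpeername(), server_addr


if __name__ == "__main__":
    peer, server = demo_loopback()
    print("connected to", peer, "which is the listener at", server)
```

A browser does the same thing with the IP it got from the lookup and port 80 (HTTP) or 443 (HTTPS), for example `connect_tcp(ip, 443)`.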
The connection is established using a `TCP/IP 3-way handshake`:

**1.** Client asks the server if it is open for connections by sending it a `SYN (synchronize)` packet.

**2.** If the server can accept the connection it responds with an `ACK (Acknowledgement)` of the SYN by sending a `SYN/ACK` packet.

**3.** Client receives the SYN/ACK packet from the server and acknowledges it by sending another `ACK` packet.

Now we have an `established` TCP/IP connection and we can `transfer` data back and forth!

Before we get to the server, we usually pass through other network components, most importantly the load balancer and the firewall described below.

## `Load Balancer`

The `IP` address we get for the website we're trying to visit usually belongs to a server called the `load balancer`. The load balancer does what you'd imagine: it splits the load, or web traffic, across multiple servers. Large websites that receive a lot of traffic need to do this because one server is not enough for the potential millions of users trying to connect.

## `Firewall`

Sometimes the IP we get belongs to a `firewall`, which is a very important `security` component in the `network stack`. A firewall can be implemented as software or hardware. Firewalls exist at different locations in the network, either before or after load balancers. The firewall is a barrier between a trusted and an untrusted network. It has rules that define who is allowed to access the network and who will be blocked.

## `Database`

Many websites store their information in an `SQL` database. Larger websites split their database across multiple servers. `SQL (Structured Query Language)` is used to manipulate the database. SQL commands such as `Select`, `Insert`, `Update`, `Delete`, `Create`, and `Drop` are used to accomplish almost everything one needs to do with a database.

# **4. HTTP(S) Request**

## `HTTP`

`HTTP` is the `Hypertext Transfer Protocol`.
It is the underlying protocol used by the `World Wide Web`, and it defines how messages are formatted and transmitted between a web server and a browser.

## `HTTPS`

If you look closely at the URL displayed in your browser, you'll see that it often starts with `https`, not `http`. HTTPS is the secure version of the HTTP protocol. That means that the connection between us and the server is encrypted and can't be read by anyone listening in on our communication. This is very important when you are dealing with sensitive information. You don't want anyone finding out your banking, credit card or social security number, for example.

## `SSL`

HTTPS uses `TLS (Transport Layer Security)`, the successor to the `SSL (Secure Sockets Layer)` protocol, to establish an encrypted connection between a web server and a browser. The name SSL is still widely used for both. An SSL certificate is necessary to create an SSL connection. You can see the SSL certificates being used by Chrome by clicking on `Manage certificates` under the `Privacy and security` preferences.

## `GET Request`

Once the TCP/IP connection is established (and, for HTTPS, the encrypted session on top of it), the browser uses HTTP to send a GET request to the web server. The server runs software called a `web server`, such as `Apache` or `Nginx`, that processes the GET request and sends the needed information back.

## `POST Request`

Sometimes we need to send information to the server, like login info, or we might be submitting a form. In this case the browser will send an `HTTP POST request` to the server instead of a GET request.

To see the client/server communication happening in the background when you browse the internet, check out the Network tab in your browser's developer tools (Firefox's `Firebug` plugin used to fill this role). A lot more communication is happening than you'd expect!

# **5. Server Response**

## `Server Response Codes`

A server has multiple kinds of responses to the GET/POST requests it receives. Each response carries a 3-digit status code, the first digit of which tells us what family of messages the response belongs to. The one we're usually most familiar with is the famous `404 Not Found`.
The families break down like this:

`1xx`: Informational response

`2xx`: Success

`3xx`: Redirection

`4xx`: Client error

`5xx`: Server error

# **6. Display Webpage**

Once the basic HTML code is received it is processed and displayed by the browser. After that the browser sends GET requests for the other items that the HTML code refers to, like `CSS` stylesheets, `JavaScript` files, `images` and `videos`. That's why it often seems to us that the page loads in parts, with the largest components tending to finish last.
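
To make that last step concrete, here is a small sketch of how the items an HTML page refers to can be discovered, using Python's standard `html.parser` module. The class `ResourceCollector`, the function `find_resources` and the sample page are all invented for this illustration. A real browser handles many more tags and attributes than the four covered here.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs of sub-resources referenced by an HTML page."""

    # Simplifying assumption: one URL-carrying attribute per tag.
    URL_ATTRS = {"link": "href", "script": "src", "img": "src", "video": "src"}

    def __init__(self):
        super().__init__()
        self.resources = []  # (tag, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)  # None for tags we ignore
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append((tag, value))


def find_resources(html: str) -> list:
    """Return the (tag, url) pairs a browser would need to request next."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body>
<img src="/logo.png">
<video src="/intro.mp4"></video>
</body></html>"""

if __name__ == "__main__":
    for tag, url in find_resources(SAMPLE_PAGE):
        print(f"GET {url}  (referenced by <{tag}>)")
```

Each URL found this way becomes another GET request, which is why a single page visit usually produces many requests.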