What happens when you type holbertonschool.com in your browser
Does it seem easy to google something or to open a page by typing its URL in your browser? Ever wondered what actually happens behind the scenes? Several processes take place after you hit the Enter key, and in this article I will briefly explain them, using “holbertonschool.com” as an example URL.
Servers and clients: Every device or system that visits a website is called a client. On the other hand, the computer that hosts the website’s content is called a server. A website can be static or dynamic. In the first case, the static content (HTML pages, images, stylesheets) is simply retrieved and sent back to the client. If the content is dynamic, the request goes to an application server, which builds the response by running code written in a programming language such as Python and by querying databases.
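To make the static/dynamic split concrete, here is a minimal sketch of how a server might dispatch a request. It is not how any particular web server works: the file contents, the `/profile` route and the helper names (`STATIC_FILES`, `render_profile`, `handle_request`) are all hypothetical, and the dynamic handler only stands in for real application code and database queries.

```python
from __future__ import annotations

# Hypothetical in-memory "web root"; a real web server would read these files from disk.
STATIC_FILES = {
    "/index.html": "<h1>Welcome</h1>",
    "/style.css": "body { margin: 0; }",
}


def render_profile(query: dict[str, str]) -> str:
    """Dynamic page: built per request (a stand-in for app logic plus a database query)."""
    name = query.get("name", "guest")
    return f"<p>Hello, {name}!</p>"


# Hypothetical routes handled by the application server.
DYNAMIC_ROUTES = {"/profile": render_profile}


def handle_request(path: str, query: dict[str, str] | None = None) -> tuple[int, str]:
    """Return (HTTP status code, body) for a requested path."""
    if path in STATIC_FILES:
        # Static content: sent back as-is, no application code runs.
        return 200, STATIC_FILES[path]
    if path in DYNAMIC_ROUTES:
        # Dynamic content: handed to application code that builds the page.
        return 200, DYNAMIC_ROUTES[path](query or {})
    return 404, "Not Found"


print(handle_request("/index.html"))               # (200, '<h1>Welcome</h1>')
print(handle_request("/profile", {"name": "Ada"}))  # (200, '<p>Hello, Ada!</p>')
print(handle_request("/missing"))                  # (404, 'Not Found')
```

The static branch is a plain lookup, which is why static files are cheap to serve; the dynamic branch runs code on every request.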
Databases: A database is where all non-temporary data is stored, such as usernames, email addresses, passwords, and much more. A website’s servers interact with databases through a DBMS (Database Management System), the software that lets us get, add, modify, or remove data.
There are two main types of databases: relational databases (SQL) and non-relational databases (NoSQL). Which type to use, and what it brings, depends on the functionality the website needs.
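As a small illustration of the four DBMS operations (get, add, modify, remove) on a relational database, the sketch below uses SQLite through Python’s built-in `sqlite3` module, with an in-memory database. SQLite is just a convenient example of a relational DBMS, not necessarily what any given website uses, and the `users` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database: a small relational (SQL) DBMS shipped with Python.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL, email TEXT NOT NULL)"
)

# Add data. The "?" placeholders keep user input out of the SQL text itself.
conn.execute("INSERT INTO users (username, email) VALUES (?, ?)", ("alice", "alice@example.com"))
conn.execute("INSERT INTO users (username, email) VALUES (?, ?)", ("bob", "bob@example.com"))

# Modify data.
conn.execute("UPDATE users SET email = ? WHERE username = ?", ("bob@example.org", "bob"))

# Remove data.
conn.execute("DELETE FROM users WHERE username = ?", ("alice",))
conn.commit()

# Get data.
rows = conn.execute("SELECT username, email FROM users ORDER BY id").fetchall()
print(rows)  # [('bob', 'bob@example.org')]
```

A NoSQL database would expose the same four kinds of operations through its own API, usually on documents or key/value pairs instead of rows in a fixed schema.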
When you open your web browser and type “holbertonschool.com” in the address bar, several requests and connections take place between the client, the server, other devices, and various protocols before the website is displayed.
Domain Name: “www.holbertonschool.com” is just a domain name, made up of a top-level domain (“com”), a second-level domain (“holbertonschool”), and a subdomain (“www”). A domain name is a human-readable label that lets people identify servers without having to remember numeric addresses.
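Domain names are read from right to left, one dot-separated label at a time. The hypothetical helper below splits a name into the parts described above. It is a deliberately naive sketch: it assumes the top-level domain is a single label, so it would mislabel names under multi-label suffixes such as “co.uk”.

```python
from __future__ import annotations


def split_domain(domain: str) -> dict[str, str]:
    """Split a simple domain name into TLD, second-level domain and subdomain."""
    labels = domain.lower().rstrip(".").split(".")  # "WWW.Example.com." -> ["www", "example", "com"]
    parts = {"top_level": labels[-1]}
    if len(labels) >= 2:
        parts["second_level"] = labels[-2]
    if len(labels) >= 3:
        # Everything to the left of the second-level domain.
        parts["subdomain"] = ".".join(labels[:-2])
    return parts


print(split_domain("www.holbertonschool.com"))
# {'top_level': 'com', 'second_level': 'holbertonschool', 'subdomain': 'www'}
```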
IP address: On the internet, servers are identified not by their domain names but by their IP addresses. IP stands for Internet Protocol; an IP address is assigned to each device connected to the internet and allows other devices to communicate with it directly. There are two types of IP addresses, IPv4 and IPv6, which differ in their format:
- IPv4 is a series of 4 numbers from 0 to 255 separated by dots, for example 127.0.0.1, which is the loopback address of the computer you are currently on.
- IPv6 is a series of 8 groups of hexadecimal numbers separated by colons. Its much larger address space solves the problem of running out of IPv4 addresses.
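Python’s standard `ipaddress` module can parse and inspect both formats, which makes the difference easy to see. The sketch below uses 127.0.0.1 (the IPv4 loopback address), `::1` (the IPv6 loopback address) and 2001:db8::1, which comes from a range reserved for documentation; `describe` is a hypothetical helper name.

```python
from __future__ import annotations

import ipaddress


def describe(text: str) -> tuple[int, bool, str]:
    """Return (IP version, is it a loopback address?, fully expanded form)."""
    addr = ipaddress.ip_address(text)  # raises ValueError for malformed addresses
    return addr.version, addr.is_loopback, addr.exploded


print(describe("127.0.0.1"))    # (4, True, '127.0.0.1')
print(describe("::1"))          # (6, True, '0000:0000:0000:0000:0000:0000:0000:0001')
print(describe("2001:db8::1"))  # (6, False, '2001:0db8:0000:0000:0000:0000:0000:0001')

# Size of each address space: this gap is why IPv6 relieves IPv4 exhaustion.
print(ipaddress.ip_network("0.0.0.0/0").num_addresses)        # 4294967296, i.e. 2**32
print(ipaddress.ip_network("::/0").num_addresses == 2**128)  # True

try:
    describe("256.1.1.1")  # each IPv4 number must be between 0 and 255
except ValueError as err:
    print("invalid:", err)
```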
Domain Name System: This is where the DNS comes in. The DNS is a distributed database system that maps the domain name of a website to its IP address, through a process called a DNS lookup. The browser first looks for the IP address of the domain name in its own cache, then in the cache of the computer’s operating system (e.g., Windows 10). If it is not there either, the router that connects us to the internet may have it cached, and finally the query is passed to a DNS resolver, usually run by our ISP. The resolver asks a root name server where to find the Top-Level Domain (TLD) servers for “.com”; the TLD server then points it to the authoritative name server for “holbertonschool.com”, which returns the IP address. The DNS therefore plays a very important role in serving websites. If the IP address is found, it is returned to the browser, which then knows exactly where to send its request. If no IP address can be found for the domain, the lookup fails with a “non-existent domain” (NXDOMAIN) error, and the browser shows a “server not found” message instead of the page.
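The lookup order can be simulated in a few lines. This is a toy model, not a DNS implementation: the “servers” are plain dictionaries with hypothetical names, the IP 192.0.2.10 comes from a range reserved for documentation, and real DNS details such as record types, TTL-based cache expiry and the resolver’s own cache are left out. `NXDomain`, `iterative_resolve` and `lookup` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical DNS data; 192.0.2.10 is from a range reserved for documentation.
ROOT_ZONE = {"com": "tld-com"}  # root servers know which server handles each TLD
TLD_ZONES = {"tld-com": {"holbertonschool.com": "ns-holberton"}}  # TLD -> domain's name server
AUTHORITATIVE = {"ns-holberton": {"www.holbertonschool.com": "192.0.2.10"}}  # final answer


class NXDomain(Exception):
    """Hypothetical error for a name that no DNS server knows ("non-existent domain")."""


def iterative_resolve(name: str) -> str:
    """What the resolver does on a cache miss: root -> TLD -> authoritative server."""
    labels = name.split(".")
    tld_server = ROOT_ZONE.get(labels[-1])
    if tld_server is None:
        raise NXDomain(name)
    name_server = TLD_ZONES[tld_server].get(".".join(labels[-2:]))
    if name_server is None:
        raise NXDomain(name)
    ip = AUTHORITATIVE[name_server].get(name)
    if ip is None:
        raise NXDomain(name)
    return ip


def lookup(name: str, caches: list[tuple[str, dict[str, str]]]) -> tuple[str, str]:
    """Check each cache in order (browser, OS, router); ask the resolver only on a full miss."""
    for where, cache in caches:
        if name in cache:
            return cache[name], where
    ip = iterative_resolve(name)
    for _, cache in caches:
        cache[name] = ip  # remember the answer so the next lookup is fast
    return ip, "resolver"


caches = [("browser", {}), ("os", {}), ("router", {})]
print(lookup("www.holbertonschool.com", caches))  # ('192.0.2.10', 'resolver')
print(lookup("www.holbertonschool.com", caches))  # ('192.0.2.10', 'browser')
```

The second call never reaches the resolver, which is why repeat visits to a site usually resolve almost instantly.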
TCP/IP: Once the browser has obtained the correct IP address, it establishes a connection between the client computer and the web server hosting the site. The Internet is made up of many smaller networks, so they all rely on a common, standardized set of protocols: TCP/IP (Transmission Control Protocol/Internet Protocol). Web pages are traditionally transferred over TCP rather than UDP (User Datagram Protocol) because TCP is reliable, whereas UDP is lighter but does not guarantee delivery.
Unlike UDP, TCP establishes a connection before sending any data, using a three-way handshake: the client sends a SYN packet to the server to start the dialogue, the server answers with a SYN-ACK packet to accept, and the client confirms with an ACK. Once the connection is established, the browser sends an HTTP request to the web server: a request line followed by headers, which are instructions that specify exactly what the client wants from the server. For example, a request often contains the header “Connection: keep-alive”, which asks the server to keep the connection open for further requests. TCP/IP breaks the information into small pieces called packets. TCP numbers and tracks these packets and retransmits any that are lost, so the data arrives complete, in order, and without errors.
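Here is a minimal sketch of what the browser actually writes onto the TCP connection: an HTTP/1.1 request line, a few headers, and a blank line. The helper names and the `User-Agent` value are hypothetical, and real browsers send many more headers. `fetch` shows where the handshake fits: `socket.create_connection` asks the operating system to open the TCP connection (the SYN, SYN-ACK, ACK exchange happens inside it), and only then is the request sent. Calling `fetch` needs network access, and this sketch speaks plain HTTP on port 80 only, without the encryption described in the next section.

```python
import socket


def build_get_request(host: str, path: str = "/", keep_alive: bool = True) -> bytes:
    """Build the raw bytes of an HTTP/1.1 GET request: request line, headers, blank line."""
    lines = [
        f"GET {path} HTTP/1.1",              # method, path, protocol version
        f"Host: {host}",                     # required in HTTP/1.1
        "User-Agent: example-browser/0.1",   # hypothetical client name
        "Accept: text/html",
        f"Connection: {'keep-alive' if keep_alive else 'close'}",
    ]
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send one request over TCP and return the raw response bytes (needs network access)."""
    # create_connection() returns once the OS has completed the TCP three-way handshake.
    with socket.create_connection((host, port), timeout=5) as sock:
        # "Connection: close" asks the server to close the connection after responding.
        sock.sendall(build_get_request(host, path, keep_alive=False))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server has closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_get_request("www.holbertonschool.com").decode("ascii"))
```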
SSL/TLS protocol: When you request something from a website, your request is sent in a format defined by the HTTP protocol. The problem is that plain HTTP sends data, including whatever you type into forms and text boxes, as unencrypted text, which makes it an easy target for hackers; that is why HTTPS was developed. HTTPS is HTTP sent over SSL (Secure Sockets Layer) or, today, its successor TLS (Transport Layer Security). The client receives a copy of the server’s certificate, and the two sides then agree on keys to encrypt the data, so the information becomes unreadable to everyone except the server you are exchanging it with. This keeps sensitive personal data such as social security numbers and credit card numbers from being revealed to hackers.
When a client connects to a server using SSL/TLS, it asks the server to present its certificate. The server sends the client a copy of its certificate, issued by a certificate authority (CA), and the client uses it to authenticate the website’s identity and to verify that the server is trustworthy.
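Python’s standard `ssl` module shows the client side of this check. `ssl.create_default_context()` loads the system’s trusted certificate authorities and turns on both certificate verification and hostname checking, much like a browser does. The helper names below are hypothetical, pinning the minimum version to TLS 1.2 is an illustrative choice rather than a universal rule, and `peer_certificate` needs network access to run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS settings a browser-like client would use: verify the certificate and the hostname."""
    context = ssl.create_default_context()  # trusts the system's certificate authorities
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older SSL/TLS versions
    return context


def peer_certificate(host: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server certificate that the client verified."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # wrap_socket() runs the TLS handshake; it raises ssl.SSLCertVerificationError
        # if the certificate is not trusted or does not match `host`.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Only after this handshake succeeds does the browser send its HTTP request, now encrypted inside the TLS connection.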
In the browser, a secure website can be identified by the “https://” prefix in its URL or by a padlock icon in the address bar.
Firewall: For a website to run properly it needs to ensure security and privacy. These things are taken care of through the use of HTTPS/SSL and firewalls.
A firewall is a network security device that filters incoming and outgoing network traffic, deciding which traffic should or should not be allowed in or out based on a defined set of security rules. For example, a firewall can block traffic coming from specific IP addresses or going to specific TCP/IP ports. If it detects a malicious incoming request, it rejects it to prevent it from harming the website’s infrastructure.
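Rule-based filtering can be sketched as an ordered list of rules where the first match wins and anything unmatched is denied, a common default for firewalls. The rule set, the `Rule` class and `filter_packet` are hypothetical; 203.0.113.0/24 and 198.51.100.0/24 are documentation ranges standing in for an attacker and a normal visitor. Real firewalls also track connection state and much more.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str               # "allow" or "deny"
    source: str               # source network in CIDR notation; "0.0.0.0/0" means anywhere
    port: int | None = None   # destination port; None matches any port
    protocol: str = "tcp"

    def matches(self, src_ip: str, port: int, protocol: str) -> bool:
        return (
            ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source)
            and (self.port is None or self.port == port)
            and self.protocol == protocol
        )


# Hypothetical rule set: block one malicious network, allow web traffic, deny the rest.
RULES = [
    Rule("deny", "203.0.113.0/24"),         # known-bad network, blocked on every port
    Rule("allow", "0.0.0.0/0", port=80),    # HTTP
    Rule("allow", "0.0.0.0/0", port=443),   # HTTPS
]


def filter_packet(src_ip: str, port: int, protocol: str = "tcp", rules=RULES) -> str:
    """First matching rule wins; traffic that matches no rule is denied."""
    for rule in rules:
        if rule.matches(src_ip, port, protocol):
            return rule.action
    return "deny"


print(filter_packet("198.51.100.7", 443))  # allow
print(filter_packet("203.0.113.5", 443))   # deny (blocked network)
print(filter_packet("198.51.100.7", 22))   # deny (port not allowed)
```

Because the first match wins, the order of the rules matters: the block on the malicious network must come before the broad “allow” rules.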
Load balancer: Servers have a maximum capacity for accepting requests and sending back responses. A large volume of traffic may overwork a single server and cause delays or connection issues. The main solution is to deploy a high-traffic website on several servers and to put a piece of hardware or software, called a “load balancer”, in front of them. Once the firewall and SSL/TLS have done their job and permitted a safe transfer of information, the load balancer forwards the request to one of the backend servers based on a predetermined algorithm. That algorithm sets the logic for distributing the load, depending on factors such as server capacity, the type of request, and server status (enabled or disabled).
Load balancers help keep your content always available, provide faster responses, and make the most efficient and effective use of the available resources.
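One simple distribution algorithm that takes capacity and status into account is weighted round robin: servers take turns, a server with a higher weight gets proportionally more turns, and disabled servers are skipped. The sketch below is a hypothetical, simplified version (the server names and classes are made up); real load balancers offer several algorithms, such as plain round robin or least connections, and often spread weighted turns more smoothly than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1       # relative capacity: weight 2 gets twice as many requests as weight 1
    enabled: bool = True  # disabled servers (e.g. failed health check) receive no traffic


class WeightedRoundRobin:
    """Hypothetical load balancer that cycles through enabled backends in proportion to weight."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self._counter = 0

    def pick(self) -> Backend:
        # Rebuild the rotation on every pick so status changes take effect immediately.
        rotation = [b for b in self.backends if b.enabled for _ in range(b.weight)]
        if not rotation:
            raise RuntimeError("no backend available")
        backend = rotation[self._counter % len(rotation)]
        self._counter += 1
        return backend


servers = [Backend("web-01", weight=2), Backend("web-02")]
lb = WeightedRoundRobin(servers)
print([lb.pick().name for _ in range(6)])
# ['web-01', 'web-01', 'web-02', 'web-01', 'web-01', 'web-02']

servers[0].enabled = False  # web-01 goes down: all traffic now goes to web-02
print(lb.pick().name)  # web-02
```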
Conclusion: The internet is a truly fascinating technology. It isn’t as simple as we might think, and all of the steps we explained happen in milliseconds.