The internet explained so that anyone can understand how it works (and why a part of it sometimes goes down)


The internet is a mystery. A huge one. Or at least it seems to be. Many of us use this network every day, often for many hours, without really being aware of the complexity that makes it possible; of everything underneath. Until something goes wrong and we are temporarily forced to stop using some of its services, something that has happened in recent months with Amazon, Facebook, Instagram, Twitch, WhatsApp and Reddit, among many other platforms.

Precisely, the reason why from time to time something breaks on the internet, something important enough for users to notice, is that all its gears must work with the precision of the mechanism of a mechanical watch. And the problem is that this network is so big, and involves so much technology, so much network equipment, so many people and so many interests that, inevitably, from time to time something goes wrong. The origin of the error can be human or strictly technical.

We all know that, in reality, there is no magic on the internet, just a lot of technology. In this article we propose to explore it with the aim of making it more familiar and less mysterious. A report with limited length and didactic ambition like this one cannot capture all the complexity of the innovation that allows this gigantic network to exist and function, so we will inevitably have to stick to some of its fundamental technologies.

Our intention is that readers who are less familiar with all this feel a little more comfortable when they read or hear about protocols, IP addresses or control algorithms, among other elements. Even so, this article only aims to offer an approachable and superficial look at the internet, so in the last section we will suggest some readings that can help you delve deeper and go much further if you want to know in more detail how this huge and complex network works.

Everything has a name on the internet: IP addresses

All devices that can connect to the internet, whether through our domestic fiber optics or through a 4G or 5G connection, as our mobile phones do, must be able to identify themselves with a unique name that is not being used by any other device. Otherwise, if two or more computers had the same name, an inconsistency would occur because the network would not know which one to deliver a particular data packet to. That unique name is precisely the IP address.

IPv4 has some important limitations because it is the first addressing infrastructure implemented on the Internet.

Its function is essentially the same as that of the postal addresses that allow us to receive packages at home and that identify us precisely to prevent them from going to the wrong destination. However, there is an important difference between IP addresses and postal addresses: the former can vary, and the latter do not (or usually do not, unless the name of the street in which we reside is changed). IP addresses that do not change are known as static or fixed, and those that vary are called dynamic precisely because they are likely to change within a given period of time.

Most internet access providers give us a dynamic IP address, and the devices we have at home that connect to the network through our ADSL or fiber optic connection also usually have a dynamic IP address. However, these devices do not choose which IP address they want to have; the element responsible for assigning it is a computer that is part of the network and that uses a protocol known as DHCP (Dynamic Host Configuration Protocol) to decide which one to hand to each device.
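If you are curious about which dynamic address your machine has received, a minimal Python sketch like the following one can reveal it (the 8.8.8.8 destination is just an arbitrary public address used as a reference; no traffic is actually sent when a UDP socket "connects"):

```python
import socket

def local_ip() -> str:
    # Opening a UDP socket "towards" a public address sends no packets,
    # but it forces the operating system to pick the local address it
    # would use, which is the one the DHCP server handed to us.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]

print(local_ip())  # e.g. 192.168.1.34 on a typical home network
```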

We already have a fairly good idea of what an IP address is, so now we can take a look at what it looks like. The protocol responsible for describing its format and meaning is known as IPv4 (Internet Protocol version 4), but it has some important limitations because it is the first addressing infrastructure implemented on the internet, and this network has grown at a staggering pace during the last two decades.

In fact, it has grown so large, and will grow even larger in the future, that it has become necessary to design a more advanced addressing infrastructure capable of accommodating the enormous number of devices that will connect to it in the coming years. This new architecture is called IPv6 (Internet Protocol version 6), and it proposes addresses with a size of 128 bits, four times longer than the 32-bit addresses of IPv4, precisely with the aim of providing a much larger map of IP addresses that can meet our needs in the medium and long term.

In the article that I link here we explain in great detail what IPv6 addresses look like and how they differ from those of IPv4, but here are some details about both. Before we go any further, though, it is worth knowing that IP addresses travel in the source address and destination address fields of data packets. And it is understandable that this is the case, because each of the packets that crosses the internet needs to carry with it the IP address of the machine that sent it and that of the machine that must receive it to ensure that it arrives at its destination correctly.


IP addresses assigned using the IPv4 protocol are 32-bit numbers and are written in dotted-decimal notation, so they have the format 192.36.7.22. The smallest IP address is 0.0.0.0, and the largest is 255.255.255.255. The addresses assigned by the IPv6 protocol, however, are 128 bits long, as we have seen, and are written as eight groups of four hexadecimal digits each, so they have the form 2001:0ab8:85b3:0000:0000:8a2e:0260:7224.

Using many more bits allows the IPv6 addressing architecture to give us a much larger address map than IPv4's. One last note: according to Google, currently slightly less than 38% of its users access this search engine through IPv6.
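If you want to play with both notations, Python's standard ipaddress module understands them out of the box; this small sketch reuses the example addresses from the text and also shows where the enormous difference between the two address maps comes from:

```python
import ipaddress

# The same example addresses used above, parsed with the standard library.
v4 = ipaddress.ip_address("192.36.7.22")
v6 = ipaddress.ip_address("2001:0ab8:85b3:0000:0000:8a2e:0260:7224")

print(v4.version, v4)  # 4 192.36.7.22
print(v6.version, v6)  # 6 2001:ab8:85b3::8a2e:260:7224 (zero groups compressed)

# The size of each address map follows directly from the number of bits.
print(2 ** 32)   # 4294967296 possible IPv4 addresses
print(2 ** 128)  # roughly 3.4 x 10**38 possible IPv6 addresses
```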

These are the heart and brain of the internet: the TCP/IP model and the IP protocol

The researchers who shaped the network that has evolved into the Internet we all know today realized that it was essential to define an architecture that accurately described what that network should be like and what layers or levels it should be made up of.

The TCP/IP reference model implements four layers: network access, internetwork, transport, and application.

Its purpose was to delimit the scope of those levels, so that each of them took charge of solving certain problems and the levels above them did not have to tackle those same challenges again. In some ways it is a strategy similar to the “divide and conquer” philosophy that we have probably all heard of at some point.

The OSI (Open Systems Interconnection) reference model proposed by the International Organization for Standardization (ISO) is not really a network architecture because it does not describe the services and protocols that each layer must use.

It is an important model that describes seven different layers or levels (physical, link, network, transport, session, presentation and application), but its relevance is mostly limited to the academic field, so we will not dig into it further in order to give our full attention to the TCP/IP reference model, which is the flexible network architecture used by the internet.

Unlike the OSI model, this architecture implements only four layers, not seven. At the base, in contact with the hardware of our equipment, resides the network access layer. TCP/IP does not state precisely how this layer should be implemented; it only stipulates, somewhat ambiguously, that the device must be able to connect to the network in order to send IP packets.

Just above lies the internetwork level, which many authors consider the true heart of this network architecture. Its function is to allow our devices to deliver data packets to the network to which they are connected.

The internet layer has a very interesting peculiarity: it does not guarantee that data packets arrive at their destination, and those that do arrive may do so in a different order than the one in which they were sent. The upper layers of the architecture are responsible for requesting missing packets and putting them back in the correct order when necessary.
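A toy Python sketch makes this idea tangible: if each packet carries a sequence number, the receiver can rebuild the original message even when the network delivers the pieces out of order (the "packets" below are, of course, invented):

```python
# Each (sequence number, payload) pair stands for a packet; the list is
# deliberately shuffled to mimic packets arriving out of order.
packets = [(2, b"wor"), (0, b"Hel"), (3, b"ld"), (1, b"lo, ")]

# The receiver sorts by sequence number and glues the payloads together.
message = b"".join(payload for _, payload in sorted(packets))
print(message.decode())  # Hello, world
```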

For Tanenbaum, the IP protocol is “the glue that holds the internet together”

To carry out its mission, this layer defines precisely what structure data packets must have, and it also uses a protocol known as IP (Internet Protocol) that the influential and highly respected Andrew S. Tanenbaum rightly describes as “the glue that holds the internet together”. His description clearly reflects how important this protocol is.

In fact, this protocol implements the resources necessary for our data packets to reach their destination; it doesn't matter if the machine that must receive them is in the next room or on the other side of the planet. The web, for its part, was born at CERN, where Tim Berners-Lee was researching, with one purpose: to facilitate access to a collection of documents linked and distributed by millions of machines across the internet.

The web was created with the purpose of facilitating access to a collection of documents linked and distributed by millions of machines throughout the Internet.

When we use our browser to visit our favorite web pages and jump from one to another by clicking on hyperlinks, we are using the web. However, it is important to remember that this resource is part of the internet, but not the same as the internet: it is contained within this great network and is only a portion of it. In essence it uses a client-server architecture, so what users see is a collection of documents that can incorporate hyperlinks, which are nothing more than links connecting two documents that allow us to easily jump from one to the other.

The other ingredient in this equation is the server, which is simply the program responsible for delivering the pages requested by the clients that connect to it. The dialog between the client and the server is established using a protocol known as HTTP (HyperText Transfer Protocol), or hypertext transfer protocol, while the pages themselves are described using the HTML language, which is constantly evolving to make it possible to implement new features and improve the user experience; its latest revision standardized by the W3C is HTML 5.2. We talk about it in the article that I link right here.
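To appreciate how plain that dialog really is, here is a minimal Python sketch that speaks HTTP/1.1 over a raw socket to example.com, a domain reserved precisely for demonstrations like this one; a real browser does essentially the same, plus a great deal more:

```python
import socket

# An HTTP/1.1 request is just a few lines of text ending in a blank line.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("example.com", 80)) as s:
    s.sendall(request.encode("ascii"))
    response = b""
    while chunk := s.recv(4096):  # read until the server closes the connection
        response += chunk

print(response.decode("utf-8", errors="replace")[:200])  # status line and first headers
```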

Why a part of the internet sometimes goes to hell

The vision of the internet that we intend to offer in this article is superficial and only aspires to make users minimally familiar with the most relevant protocols and technologies that we use almost daily. However, despite its limited scope, this report allows us to sense the complexity hidden under a gigantic network of computers and communications equipment that, according to the consulting firm Statista, was used by more than 4.6 billion people worldwide as of January 2021.

The architecture of the internet and a large part of the services it offers have been designed to be fault-tolerant, so that if something breaks, users do not even realize that something has gone wrong. One of the ingredients frequently used to implement this fault tolerance is redundancy, but there are many other strategies that also seek to keep the internet's services running when something goes wrong. However, despite all these efforts, from time to time something breaks. And users notice it.

If the people who use the internet realize that something is wrong, it is because a serious error has occurred from which one or more services have not been able to recover transparently. The root of the problem can be very diverse, so we can illustrate this situation by taking a look at the two big crashes that we have all witnessed in recent months. One of them took place at the beginning of last June, and it knocked Amazon, Twitch, Vimeo and Reddit, among many other services and social networks, out of action for a long and tedious hour.

On that occasion the origin of the outage was Fastly, a quite popular CDN (Content Delivery Network) with an enviable customer base (which is why so many important services were affected). A CDN is a hardware and software infrastructure designed to speed up access to a given set of web services; to make this possible, it stores a portion of the data handled by those services in a high-performance cache system. We explain everything in more detail in the article that I link right here.
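Stripped of all its hardware, the essential idea behind a CDN's cache fits in a few lines; this deliberately simplified Python sketch (fetch_from_origin and the 60-second lifetime are invented for the illustration) serves a stored copy while it is fresh and only goes back to the origin server when it is not:

```python
import time

TTL = 60.0  # seconds during which a cached copy is considered fresh (invented)
cache: dict[str, tuple[float, bytes]] = {}

def fetch_from_origin(url: str) -> bytes:
    # Stand-in for a real request to the customer's own servers.
    return b"<html>...</html>"

def get(url: str) -> bytes:
    now = time.monotonic()
    if url in cache:
        stored_at, body = cache[url]
        if now - stored_at < TTL:
            return body  # cache hit: the origin server is not contacted
    body = fetch_from_origin(url)
    cache[url] = (now, body)  # cache miss or stale copy: refresh the cache
    return body

print(get("https://example.com/index.html"))  # first call goes to the origin
print(get("https://example.com/index.html"))  # second call is served from the cache
```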

The other big crash we can look at to illustrate the extent to which a part of the internet can be knocked out took place just a few days ago, taking down Facebook, Instagram and WhatsApp, which literally disappeared from the network. On this occasion, the origin of the problem was the BGP protocol (Border Gateway Protocol), or exterior gateway protocol, which, broadly speaking, is responsible for optimizing the transport of data packets between different networks, helping each of them to inform the others of its presence. If you want to investigate a little more, you can take a look at the article in which we explain it in more detail.
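The mechanics of that outage can be caricatured in a few lines of Python: a router only knows how to reach the networks whose prefixes have been announced to it, so a withdrawn announcement makes the destination evaporate (the prefixes and peer names below are purely illustrative):

```python
# A router's view of the world: prefixes it has learned, and which
# neighbor ("next hop") leads towards each of them.
routes = {
    "203.0.113.0/24": "peer-A",
    "198.51.100.0/24": "peer-B",
}

def next_hop(prefix: str) -> str:
    return routes.get(prefix, "no route: destination unreachable")

print(next_hop("203.0.113.0/24"))  # peer-A

routes.pop("203.0.113.0/24")       # the network withdraws its announcement...
print(next_hop("203.0.113.0/24"))  # ...and the rest of the internet can no longer find it
```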

Some recommended reading if you want to learn more about the internet's architecture

Fortunately, on the web we can find a lot of literature that allows us to learn much more about the internet's architecture and better understand how its services are implemented. If you want to go deeper and this article has left you wanting more, we suggest you take a look at Mozilla's articles dedicated to the structure of the internet and the HTML language; at the Stanford University article that reviews the operation of this great network; or at the abundant documentation published by the W3C, the organization that promotes the adoption of the standards used by the World Wide Web.

The readings we have just proposed are a very small selection of the almost innumerable resources that the internet makes available to us. However, we also have another option: paper books. The range of options in this area is very wide, but I cannot resist recommending two of my favorites as a starting point: ‘Computer Networks’, by Andrew S. Tanenbaum, and ‘Data and Computer Communications’, by William Stallings. Both are academic publications that explain everything in depth and with unquestionable rigor, which has turned them into two authentic classics that, despite their age, are still used in many computer science schools and faculties.
