CDN and everything else

What is a CDN? Content delivery networks explained

A content delivery network (CDN) is a geographically distributed network of servers and their data centers that helps distribute content to users with minimal latency.

This is done by bringing content closer to users' geographic locations through strategically placed data centers called Points of Presence (PoPs). CDNs also include cache servers that store and deliver cached files to speed up web page load times and reduce bandwidth consumption. Below we go into more detail about how CDNs work.

CDN services are essential for businesses that rely on delivering content to users.

Consider the following:

Large news publications with readers in many countries

Social media sites that need to serve multimedia content in users' feeds

Entertainment websites like Netflix that offer high-quality web content in real time

E-commerce platforms with millions of customers

Gaming companies with graphics-heavy content accessed by geographically dispersed users

All these businesses need to ensure fast content delivery, service availability, resource scalability and web application security. This is where CDN services offer a distinct advantage.

A brief history of CDNs

CDNs were created almost twenty years ago to address the challenge of quickly delivering massive amounts of data to end users on the Internet. Today, they have become the driving force behind website content delivery and continue to be researched and improved by academics and commercial developers.

The first content delivery networks were built in the late 90s and still account for 15-30% of global Internet traffic. Since then, the growth of broadband content and the streaming of audio, video and related data over the Internet have led to the development of more CDNs. In general, the evolution of CDNs can be divided into four stages: a pre-formative period followed by three generations.

Pre-formative period: Before CDNs were created, the required technologies and infrastructure were being developed. This period was marked by the advent of server farms, hierarchical caching, improvements in web servers, and the deployment of proxy caches. Mirroring, caching and multihoming were also technologies that paved the way for the creation and growth of CDNs.

First generation: The first iteration of CDNs focused primarily on serving dynamic and static content, as these were the only two types of content on the web. The main mechanisms at that time were the creation and placement of replicas, intelligent routing, and edge computing methods. Applications and information were distributed across servers.

Second generation: Next came CDNs that focused on streaming video and audio content, such as video-on-demand services like Netflix and news services. This generation also paved the way for delivering website content to mobile users and saw the use of P2P techniques and cloud computing.

Third generation: The third generation of CDNs is where we are now, and it continues to evolve with new research and development. We can expect CDNs to become increasingly community-based in the future, meaning that the systems are driven by average users and ordinary people. Self-configuration is expected to be the new technical mechanism, along with self-management and automatic content delivery. Quality of experience for end users is expected to be the main driver.

CDNs originally evolved to deal with extreme bandwidth pressures, as the demand for streaming video grew along with the number of CDN service providers. With the advancements in connectivity and new consumption trends in each generation, the price of CDN services has decreased, allowing CDNs to become a mass-market technology. And as cloud computing has become more widely adopted, CDNs have played a key role in all layers of business operations. They are key to models such as SaaS (Software as a Service), IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and BPaaS (Business Process as a Service).

How does a CDN work?

CDNs work by reducing the physical distance between the user and the content. The origin (a web or application server) stays where it is, while a globally distributed network of servers stores copies of its content much closer to users. To better understand this, it helps to examine how a user accesses web content from a website with and without a CDN.

Without a CDN

When a user enters a website address in the browser, a connection similar to the one below is established. The website name is resolved to an IP address using a local DNS server, or LDNS (such as a DNS server provided by an ISP or a public DNS resolver). If the LDNS cannot resolve the name itself, it recursively queries upstream DNS servers. Eventually, the request reaches the authoritative DNS server that hosts the zone. This DNS server resolves the name, and the address is returned to the user.

Local DNS to authoritative DNS

The user's browser then connects directly to the origin and downloads the website's content. Each subsequent request is served directly by the origin, and static assets are cached locally on the user's device. If another user tries to access the same site from the same or another location, they go through the same sequence. Every user request hits the origin, and the origin responds with content. Each step along the way adds delay, or "latency". If the origin is located far from the user, the response time suffers significantly and the user experience is poor.
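
To make the cost of distance concrete, here is a rough back-of-the-envelope model in Python. It assumes a fresh connection needs one round trip for the TCP handshake, one for a TLS 1.3 handshake, and one for the HTTP request and first response byte. The function name and all millisecond figures are illustrative assumptions, not measurements, and server processing time is ignored.

```python
def time_to_first_byte_ms(rtt_ms: float, dns_ms: float, tls_round_trips: int = 1) -> float:
    """Rough time until the first response byte arrives on a new connection."""
    tcp_handshake = rtt_ms                     # SYN out, SYN-ACK back
    tls_handshake = tls_round_trips * rtt_ms   # 1 for TLS 1.3, 2 for TLS 1.2
    request_response = rtt_ms                  # HTTP request out, first byte back
    return dns_ms + tcp_handshake + tls_handshake + request_response


# Assumed figures: a server on another continent vs. one in the same region.
far_server = time_to_first_byte_ms(rtt_ms=180, dns_ms=40)
near_server = time_to_first_byte_ms(rtt_ms=20, dns_ms=40)
print(far_server, near_server)
```

With these assumed numbers the distant server needs 580 ms before the first byte arrives and the nearby one needs 100 ms, because every round trip is multiplied by the distance.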

With a CDN

In the presence of a CDN, however, the process is slightly different. When the user's LDNS receives a DNS request, it forwards the request to one of the CDN's DNS servers. These servers are part of the Global Server Load Balancer (or "GSLB") infrastructure. The GSLB provides the load-balancing function: it continuously measures the Internet and tracks information about all available resources and their performance. With this knowledge, the GSLB resolves the DNS query using the best edge address (usually one close to the user). An "edge" is a set of servers that store and serve web content.
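
The selection step can be sketched as follows. How a real GSLB scores edges is not described here, so this sketch simply assumes it knows each edge's health and a measured round-trip time per client region, and returns the healthy edge with the lowest value. The `Edge` class, the region labels, and the IP addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Edge:
    name: str
    ip: str
    healthy: bool
    rtt_ms_by_region: Dict[str, float]  # assumed measurements per client region


def resolve(edges: List[Edge], client_region: str) -> str:
    """Answer a DNS query with the IP of the best edge for this region."""
    candidates = [
        e for e in edges if e.healthy and client_region in e.rtt_ms_by_region
    ]
    if not candidates:
        raise LookupError(f"no healthy edge for region {client_region!r}")
    best = min(candidates, key=lambda e: e.rtt_ms_by_region[client_region])
    return best.ip


edges = [
    Edge("fra", "198.51.100.10", True, {"eu": 12.0, "us": 95.0}),
    Edge("nyc", "203.0.113.20", True, {"eu": 88.0, "us": 9.0}),
]
print(resolve(edges, "eu"))
```

If the closest edge is marked unhealthy, the same query falls back to the next best one, which is how a single DNS name can keep resolving during a regional outage.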

Global server load balancer diagram

After DNS resolution is complete, the user sends an HTTPS request to the edge. When the edge receives the request, the GSLB servers help the edge servers forward it along the optimal route to the origin. The edge servers then fetch the requested data, deliver it to the requesting end user, and store that data locally. All subsequent user requests are served from the locally stored copy without the need to re-query the origin server. Content stored at the edge can be delivered even if the origin is unavailable for any reason.
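
A minimal sketch of this fetch, deliver and store behaviour is shown below. The `EdgeCache` class and its `fetch_from_origin` callback are hypothetical names, and the sketch leaves out expiry (TTLs), which a real edge would need.

```python
from typing import Callable, Dict


class EdgeCache:
    """Serve from local storage; go to the origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how often the origin was contacted

    def get(self, path: str) -> bytes:
        if path in self._store:
            return self._store[path]   # hit: the origin is not involved
        self.origin_requests += 1
        body = self._fetch(path)       # miss: may raise if the origin is down
        self._store[path] = body
        return body
```

Because a hit never touches the origin, anything already stored keeps being served while the origin is unreachable; only requests for content the edge has never seen fail.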

Why use a CDN?

CDNs help businesses deliver content to end users efficiently by minimizing latency, improving website performance, and reducing bandwidth costs.

Another useful feature of CDNs is that they allow edge servers to pre-fetch content. This ensures that the data you are about to deliver is already stored in all CDN data centers. In CDN parlance, these data centers are called Points of Presence (or "PoPs"). PoPs help reduce round-trip time by bringing web content closer to the website visitor.

For example, suppose you run an advertising campaign and promote your service or product to millions of potential customers. You might expect a lot of customers to flock to your site after seeing the campaign. If you are working with influencers who have a good audience engagement rate, the traffic volume can increase even more. Can you be sure that your origin server can handle this increase all at once?

In such a scenario, a CDN distributes the load across edge servers so that every visitor receives a response. Because only a small fraction of requests reach the origin, your servers won't experience traffic spikes, 502 errors, or overloaded upstream network channels.
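
As a quick worked example of "only a small fraction", the number of requests that still reach the origin is the total multiplied by the share of cache misses. The function name and the 95% hit ratio below are assumptions chosen for illustration.

```python
def origin_requests(total_requests: int, cache_hit_ratio: float) -> int:
    """Requests that miss the edge cache and must be served by the origin."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - cache_hit_ratio))


# One million campaign visitors with an assumed 95% cache hit ratio.
print(origin_requests(1_000_000, 0.95))
```

Under that assumption the origin sees 50,000 requests instead of 1,000,000.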

Advantages of CDNs

Depending on the size and needs of your business, the benefits of CDNs can be divided into four areas:

Improved website page load time

By serving web content from a nearby CDN server (among other optimizations), a CDN gives visitors faster web page load times. Visitors are generally more inclined to click away from a website with a long page load time. Slow loading can also negatively affect the ranking of the web page in search engines. So having a CDN can reduce bounce rates and increase the amount of time people spend on the site. In other words, a website that loads quickly will keep more visitors around for longer.

Reduced bandwidth costs

Each time an origin server responds to a request, bandwidth is consumed. Bandwidth costs are a major expense for businesses. Through caching and other optimizations, CDNs can reduce the amount of data an origin server must serve, thus lowering hosting costs for website owners.

Increased content availability and redundancy

High volumes of web traffic or hardware failures can disrupt the normal functioning of the website and lead to crashes. Thanks to their distributed nature, CDNs can handle more web traffic and withstand hardware failure better than many origin servers. Additionally, if one or more CDN servers go offline for some reason, other operational servers can receive web traffic and keep the service uninterrupted.

Improved website security

The same mechanism that lets CDNs absorb traffic surges makes them well suited to mitigating DDoS attacks. These are attacks in which malicious actors overwhelm your application or origin servers by sending a large number of requests. When a server goes down under the volume, the website becomes unavailable to customers. A CDN essentially acts as a DDoS protection and mitigation platform, with the GSLB and edge servers distributing the load evenly across the entire network capacity. CDNs can also provide certificate management and automatic certificate generation and renewal.

How else can a CDN be useful?

A CDN is not limited to the benefits described above. A modern CDN platform offers many more benefits to your engineering and business teams.

It can be used to manage access from different regions of the globe. While you allow access to some areas, you can deny access to others.

You can easily offload application logic to the edge and close to your customers. You can process and transform request/response headers and bodies, route requests between different origins based on request attributes, or delegate authentication tasks to the edge.

A large volume of traffic requires an infrastructure to collect and process reports for further analysis. CDNs collect reports and provide an interface for convenient analysis of data generated by visitors.

It is natural that when you are familiar with something, it becomes easy to use it. For this reason, CDN Pro edges are based on NGINX. This means you can perform tasks using standard NGINX directives.
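
The regional access control and edge logic described in the points above can be sketched in a few lines. This is plain Python for illustration only, not NGINX configuration and not the API of any particular CDN. The denied country code, the origin hostnames, and the `X-Client-Country` header are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical policy: country codes that may not reach the site.
DENIED_COUNTRIES: FrozenSet[str] = frozenset({"XX"})

# Hypothetical origins, selected by path prefix (first match wins).
ORIGINS_BY_PREFIX = [
    ("/api/", "api.origin.example"),
    ("/media/", "media.origin.example"),
]
DEFAULT_ORIGIN = "www.origin.example"


def handle_at_edge(path: str, country: str, headers: Dict[str, str]) -> Optional[dict]:
    """Return the upstream request to make, or None if access is denied."""
    if country.upper() in DENIED_COUNTRIES:
        return None  # a real edge would answer with an HTTP 403 here
    origin = next(
        (host for prefix, host in ORIGINS_BY_PREFIX if path.startswith(prefix)),
        DEFAULT_ORIGIN,
    )
    upstream_headers = dict(headers)  # copy, so the caller's dict is untouched
    upstream_headers["X-Client-Country"] = country.upper()
    return {"origin": origin, "path": path, "headers": upstream_headers}
```

A request for `/api/users` from Germany would be routed to the assumed API origin with the country added as a header, while any request from the denied region is rejected before it reaches an origin at all.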

Data security and CDN

Information security is an integral part of CDN. CDNs help protect a website's data in the following ways.

By providing TLS/SSL certificates

A CDN can help protect a site by providing Transport Layer Security (TLS)/Secure Sockets Layer (SSL) certificates that ensure a high standard of authentication, encryption and integrity. These are certificates that guarantee that certain protocols are followed in the data transfer between the user and the website.

When data is transmitted over the Internet, it is vulnerable to interception by malicious actors. This is addressed by encrypting the data so that only the intended recipient can decrypt and read it. TLS and SSL are protocols that encrypt data sent over the Internet; TLS is the more advanced successor to SSL. You can tell that a website is using a TLS/SSL certificate if its address starts with https:// instead of http://, which indicates that the connection between the browser and the server is secured.
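
On the client side, the checks described here look roughly like the following sketch, which uses only Python's standard library. The `uses_https` helper is a hypothetical name; `ssl.create_default_context()` is the standard way to get a client context that validates the server's certificate.

```python
import ssl
from urllib.parse import urlsplit


def uses_https(url: str) -> bool:
    """True if the URL asks for an encrypted (https) connection."""
    return urlsplit(url).scheme.lower() == "https"


# The default client context requires a valid certificate chain and checks
# that the certificate matches the hostname being contacted.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The scheme only tells you that encryption is requested. It is the certificate verification performed with a context like this one that lets the client trust who is on the other end.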

Mitigate DDoS attacks

Since the CDN is deployed at the edge of the network, it acts as a high-security virtual fence against attacks on your website and web application. The distributed infrastructure and edge location also make a CDN well suited to blocking DDoS floods. Since these floods must be mitigated outside of your core network infrastructure, the CDN handles them at different PoPs depending on where they originate, preventing server saturation.

Block bots and crawlers

CDNs can also block threats and limit abusive bots and crawlers from using your bandwidth and server resources. This helps limit other spam and hacking attacks and lowers your bandwidth costs.
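
The text does not say which mechanism is used to limit abusive clients. One common technique is a token-bucket rate limiter, sketched below as an assumption about how such limiting could work. The class name and the rate and burst values are illustrative, and the caller passes in the current time so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow short bursts, but cap the sustained request rate."""

    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=1.0, burst=3)
print([bucket.allow(now=0.0) for _ in range(5)])
```

Five requests arriving at the same instant yield three allowed and two rejected. In practice an edge would keep one bucket per client (for example per IP address) so that a single crawler cannot consume everyone's capacity.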

Static and dynamic acceleration

Static content refers to assets that do not need to be generated, processed or modified before being delivered to end users. These may be images or other media files, binaries, or static parts of your application such as HTML, CSS and JavaScript libraries, or even JSON or any other dynamically generated response that doesn't change often. As mentioned earlier, you can pre-fetch such content. Then, when you need to invalidate such content and remove it from the edge servers, you can purge the corresponding paths.
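
Pre-fetching and invalidation can be pictured with a small in-memory model. The `EdgeStore` class and its method names are hypothetical, and matching by path prefix is just one possible way to select what to remove.

```python
from typing import Dict


class EdgeStore:
    """Toy model of one edge server's static content store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def prefetch(self, path: str, body: bytes) -> None:
        """Place content on the edge before any visitor asks for it."""
        self._objects[path] = body

    def has(self, path: str) -> bool:
        return path in self._objects

    def purge(self, path_prefix: str) -> int:
        """Remove every stored object under the prefix; return how many."""
        doomed = [p for p in self._objects if p.startswith(path_prefix)]
        for p in doomed:
            del self._objects[p]
        return len(doomed)
```

After a release, for instance, purging `/static/v1/` would drop the old assets from the edge so that the next request fetches or receives the new version.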

Dynamic acceleration applies to content that cannot be stored on an edge because of its dynamic nature. Imagine a WebSocket application that listens for events from a server, or an API endpoint whose response varies depending on credentials, geographic location, or other parameters. It is difficult to apply edge caching mechanisms to such content in the way they are applied to static content. In some cases, tighter integration between the application and the CDN may help. In others, something other than caching must be used. For dynamic acceleration, optimized CDN network infrastructure and advanced request/response routing algorithms are used.
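
The routing algorithms themselves are not described here. As one plausible illustration, the sketch below treats the CDN's network as a graph of links with measured latencies and uses Dijkstra's shortest-path algorithm to find the fastest route from an edge to the origin. The node names and latency figures are invented for the example.

```python
import heapq
from typing import Dict, List, Tuple


def fastest_route(
    links: Dict[str, Dict[str, float]], source: str, target: str
) -> Tuple[float, List[str]]:
    """Dijkstra over link latencies; returns (total_ms, nodes on the path)."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, source, [source])]
    settled: Dict[str, float] = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at least as cheaply
        settled[node] = cost
        for neighbor, latency in links.get(node, {}).items():
            heapq.heappush(queue, (cost + latency, neighbor, path + [neighbor]))
    raise ValueError(f"no route from {source} to {target}")


# Assumed latencies in milliseconds between an edge, a relay PoP, and the origin.
links = {
    "edge-sgp": {"origin-fra": 210.0, "pop-dxb": 70.0},
    "pop-dxb": {"origin-fra": 95.0},
}
print(fastest_route(links, "edge-sgp", "origin-fra"))
```

With these figures the route through the intermediate PoP (165 ms) beats the direct link (210 ms), which is the kind of gain a well-connected CDN backbone can offer for requests that cannot be cached.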

Billing model or "What do I pay for?"

Conventionally in a CDN, you pay for the traffic consumed by end users and the number of requests. In addition, HTTPS requests require more computing resources than HTTP requests, which places a greater burden on the CDN provider's equipment. Because of this, you may incur additional charges for HTTPS requests, while HTTP requests are not billed at an additional cost.

As computing moves to the edge, CPU time becomes a billing factor. Requests may pass through different processing pipelines and thus require different amounts of CPU time, so billing based on the number of requests alone is impractical. It is more practical to bill based on traffic plus CPU time.
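
A traffic plus CPU time model boils down to a simple sum. The prices below are made-up placeholders, not any provider's rates, and `Decimal` is used so that the currency arithmetic stays exact.

```python
from decimal import Decimal

# Hypothetical unit prices, in an arbitrary currency.
PRICE_PER_GB = Decimal("0.04")
PRICE_PER_CPU_SECOND = Decimal("0.0002")


def monthly_charge(traffic_gb: Decimal, cpu_seconds: Decimal) -> Decimal:
    """Charge = delivered traffic * price per GB + edge CPU time * price per second."""
    return traffic_gb * PRICE_PER_GB + cpu_seconds * PRICE_PER_CPU_SECOND


print(monthly_charge(Decimal(500), Decimal(12000)))
```

With these placeholder prices, 500 GB of traffic and 12,000 CPU-seconds come to 20.00 for traffic plus 2.40 for compute, 22.40 in total. Two customers with the same request count but different processing pipelines would differ only in the second term.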

CDN components explained

Various elements make up a CDN and enable it to function as it does.

Here is a brief explanation of the role of the three main components:

Points of Presence (PoPs) - PoPs are data centers that are strategically placed to shorten the communication path to users. Closing the distance between website content and the visitor makes for a much faster and smoother user experience.

Cache servers - These are the components responsible for storing and delivering cached files. Their main function is to increase website loading speed while reducing bandwidth consumption.

Storage (SSD/HDD + RAM) - All cached data resides on SSDs (solid-state drives), HDDs (hard disk drives) or in RAM (random-access memory). The files that are used the most are usually kept in the fastest medium, RAM.
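
The storage tiers can be modelled as a small fast layer in front of a larger slow one. The sketch below is an assumption about how this might work: it uses least-recently-used (LRU) eviction as a simple stand-in for "used the most", and a plain dictionary stands in for the disk. The class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Tuple


class TieredStore:
    """Small RAM tier (LRU) in front of a larger disk tier."""

    def __init__(self, ram_slots: int) -> None:
        self.ram_slots = ram_slots
        self.ram: "OrderedDict[str, bytes]" = OrderedDict()
        self.disk: Dict[str, bytes] = {}  # stand-in for SSD/HDD storage

    def put(self, key: str, value: bytes) -> None:
        self.disk[key] = value  # every object is kept on disk

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (value, tier the value was served from)."""
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as most recently used
            return self.ram[key], "ram"
        value = self.disk[key]         # raises KeyError if unknown
        self.ram[key] = value          # promote to the fast tier
        if len(self.ram) > self.ram_slots:
            self.ram.popitem(last=False)  # evict the least recently used
        return value, "disk"
```

The first read of an object comes from disk and promotes it to RAM; repeated reads are then served from RAM until other, more recently requested objects push it out.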

Different types of CDNs

Not all CDNs work the same way, and some are better positioned than others to serve certain types of content. Here are three types of CDNs to choose from:

Peer-to-peer CDN

If you've ever torrented, you probably already know how a peer-to-peer CDN works.

This type of CDN works using a peer-to-peer (P2P) protocol. In a peer-to-peer CDN, content is not stored on an edge server. Instead, every user on the network who accesses the content also shares it.

So, for example, when a user downloads a movie from a torrent, they are also sharing parts of the movie with other users in the background. This model is very affordable, as it does not require expensive hardware.

Push CDN

With a push CDN, you as the website owner or developer are solely responsible for the content it holds.

Instead of waiting for the PoP server to pull the web page data when it is requested, you send the content you want to the PoP servers before a request is even made. This content and its associated elements are then cached until they are purged or deleted.

With a push CDN, you have complete control. What you push to the PoP server is what appears on the visitor's device when a web request is made.
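
The push model can be sketched as a publisher writing to every PoP up front. The `PushCDN` class, its methods, and the PoP names are hypothetical; the key point of the sketch is that a PoP never contacts an origin on its own.

```python
from typing import Dict, Iterable, Optional


class PushCDN:
    """Toy push CDN: PoPs hold only what the site owner has uploaded."""

    def __init__(self, pop_names: Iterable[str]) -> None:
        self._pops: Dict[str, Dict[str, bytes]] = {name: {} for name in pop_names}

    def push(self, path: str, body: bytes) -> None:
        """Upload content to every PoP before any visitor requests it."""
        for store in self._pops.values():
            store[path] = body

    def delete(self, path: str) -> None:
        """Remove content from every PoP."""
        for store in self._pops.values():
            store.pop(path, None)

    def serve(self, pop: str, path: str) -> Optional[bytes]:
        """Return the pushed content, or None; there is no origin fallback."""
        return self._pops[pop].get(path)
```

If you forget to push a file, visitors simply do not get it, which is the flip side of having complete control.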

Origin Pull CDN

Origin Pull CDN, as the name suggests, consists of a PoP server that pulls web page data and other elements from the origin server.

A CDN determines what information is presented to a web user when a request is received.

For example, when a client sends a request for static assets and the CDN doesn't have them, the CDN fetches the up-to-date assets from the origin server, populates its cache with them, and then sends the newly cached assets to the user.

Unlike a push CDN, it requires less maintenance, because cache updates on CDN nodes are triggered by client requests and fetched from the origin server.