Routing to the Edge of Cloud

Jaynil Patel
Nov 4, 2021

In this post, I’ll attempt to stay away from all the buzzwords of IoT, Cloud, etc., and get to the bare bones of how a user request travels through the network to the nearest server to allow for low latency.

What’s on the Edge?

A couple of decades ago, when things moved a lot slower, computers used to connect to a centralized server to get all their content over the Internet. Imagine sitting in Mumbai waiting for content to arrive from Facebook’s data center in Virginia. If you thought internet links would be fast and there wouldn’t be any noticeable lag, think again. Even if we assume the request travels in ideal conditions over congestion-free optical fiber all the way, packets would need to cover thousands of kilometers each way, so a single round trip takes over a hundred milliseconds even at the speed of light in fiber. Add the multiple round trips a page load needs (DNS lookup, TCP handshake, TLS handshake, then the request itself) and it could take almost a second for the desired content to arrive. This not only seems slow by today’s standards, but is also a wasteful use of the global backbone network if millions of users request the same content.
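To put rough numbers on this, here’s a back-of-the-envelope sketch. The ~13,000 km Mumbai-to-Virginia distance, the ~200,000 km/s speed of light in fiber, and the count of four round trips are all illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope latency for a Mumbai -> Virginia fetch.
# Light travels at roughly 200,000 km/s inside optical fiber (~2/3 of c),
# and the distance is roughly 13,000 km; both are rough assumptions.
FIBER_KM_PER_S = 200_000
DISTANCE_KM = 13_000

rtt = 2 * DISTANCE_KM / FIBER_KM_PER_S  # one round trip, in seconds

# A fresh HTTPS fetch needs several round trips before content arrives:
# DNS lookup, TCP handshake, TLS handshake, then the HTTP request itself.
round_trips = 4
total = round_trips * rtt

print(f"RTT: {rtt * 1000:.0f} ms, ~{round_trips} round trips: {total * 1000:.0f} ms")
# -> RTT: 130 ms, ~4 round trips: 520 ms
```

Even under these generous assumptions, the physics alone puts a cross-continent page load near half a second, before any congestion or server-side time is counted.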

This is where CDN/edge compute comes in. There are service providers (Akamai, Fastly, etc.) that specialise in placing clusters of servers, called Points of Presence (PoPs), literally and geographically everywhere. These serve a variety of purposes like caching static content, web application firewall (WAF), authentication, etc., each of which deserves an article of its own. For this one, let’s just say that these CDN providers have put PoPs everywhere, forming the edge of the cloud for everything to go through.

Just a look at Akamai’s PoP map should give us an idea of the sheer scale of the edge network they’ve set up. Each of those dots represents anywhere from tens to thousands of servers 🤯

How does this change the scenario discussed earlier, you ask? Users will now connect to their nearest CDN server and get the cached content at the lowest possible latency, instead of connecting to the origin server located thousands of kilometers away.

This brings us to the crux of the matter: how does “connecting to the nearest server” really work under the hood? It’s often treated as something that happens automatically in the background, and software engineers in general tend to overlook it.

DNS (with Extension Mechanism)

Yup, DNS plays a crucial role in what we’re about to discuss. Traditional DNS servers would simply return a bunch of IP addresses for the queried domain name. The base DNS protocol does not allow for additional capabilities like sending meta info about the client along with the DNS query message (which was limited to 512 bytes over UDP). The Extension Mechanisms for DNS (EDNS) allow the DNS client to send additional parameters along with the DNS query. These additional parameters help the DNS server return the optimal IP address for the domain being queried.

The DNS client on a user’s computer sends out the client subnet (not the actual IP address) along with the DNS query. This additional option is passed on to the authoritative server by the recursive resolvers along the way. The authoritative server can look at the client subnet to get a sense of the user’s location and return the IP address of the closest CDN server. The user then connects to that nearest server, allowing for low latency. From this point, there are a variety of ways in which static/dynamic content is served, based on the architecture of the application in question.
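To make the mechanism concrete, here’s a minimal sketch of what such a query looks like on the wire, based on the EDNS Client Subnet option defined in RFC 7871 (option code 8, carried inside an EDNS0 OPT pseudo-record). The function hand-packs the bytes using only the standard library; a real client would use a DNS library, and the query ID and domain below are arbitrary:

```python
import struct

def build_ecs_query(domain: str, client_subnet: str, prefix_len: int) -> bytes:
    """Build a DNS A query carrying an EDNS Client Subnet option (RFC 7871)."""
    # Header: ID, flags (RD=1), 1 question, 0 answers, 0 authority, 1 additional (the OPT record)
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 1)

    # Question: QNAME as length-prefixed labels, QTYPE=A (1), QCLASS=IN (1)
    qname = b"".join(bytes([len(label)]) + label.encode() for label in domain.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)

    # ECS option data: FAMILY=1 (IPv4), source prefix length, scope=0,
    # then the address truncated to the prefix (a /24 sends only 3 bytes)
    addr = bytes(int(octet) for octet in client_subnet.split("."))[: (prefix_len + 7) // 8]
    ecs = struct.pack("!HBB", 1, prefix_len, 0) + addr

    # OPT pseudo-RR: root name, TYPE=OPT (41), CLASS=UDP payload size (4096),
    # TTL=extended RCODE/flags (0), then RDLENGTH and the option itself
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0, 4 + len(ecs))
    opt += struct.pack("!HH", 8, len(ecs)) + ecs  # option code 8 = Client Subnet

    return header + question + opt

packet = build_ecs_query("example.com", "203.0.113.0", 24)
```

Sending this packet over UDP to an ECS-aware recursive resolver lets the /24 subnet travel all the way to the authoritative server, which can then answer with a nearby PoP’s address. Note that only the subnet is forwarded, never the full client IP.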

Here’s an example of how adding a client subnet to a DNS query changes the IP address returned:

Boom. The first DNS query with the client subnet returns the unicast IP of the CDN server that is geographically/topologically closest to the client, while the plain DNS query points to a server that most likely wouldn’t be geographically close.

How it’s done at Priceline

Priceline uses Fastly as its CDN provider and Google Cloud Platform to host its application workloads. Here’s an example of how a user request is served.

A user’s request connects to the nearest Fastly edge server using the DNS extension mechanism discussed earlier. From there, the request is routed to the nearest GCLB instance before traversing Google’s proprietary fiber network to one of its N. Virginia data centers, where Priceline’s applications are hosted. With this kind of topology in place, user requests travel over a controlled, congestion-free route for the longest possible stretch, allowing for the best user experience possible. Not to mention that static content is cached on Fastly servers, eliminating the need for subsequent requests to hit the origin server again.


EDNS is an unsung hero of how the internet works today, and this post is an attempt to give it some credit. It’s also fair to mention at this point that proximity routing can be implemented in another interesting way, using an Anycast deployment to leverage the Border Gateway Protocol (BGP), but that’s a discussion for another time.



Jaynil Patel is a Software Engineer at Priceline (Booking Holdings) and a tech enthusiast.