The story began in 2004, when a group of Harvard students (Mark Zuckerberg, Eduardo Saverin, Andrew McCollum, Dustin Moskovitz and Chris Hughes) set out to solve the problem of student interaction. It was meant to be a social networking site for college students in the US, but as its popularity grew it opened up to the general public in 2006. From there, the business kept growing into what is now a company worth nearly a trillion dollars.
According to recent Facebook stats, there are over 2 billion users on Facebook, of whom over 1.5 billion visit every single day. But there has to be some technology stack that handles such a huge infrastructure, right? That's exactly what we'll be looking at in this story. 📝 So without further ado, let's get started.
1. LAMP Stack
LAMP (Linux, Apache, MySQL and PHP) is a very popular software stack made up of 4 individual components. It is widely used to build dynamic websites and web applications.
Although the individual components can vary (a different database, scripting language, server or operating system can be swapped in), the LAMP stack as a whole was one of the vital parts of the Facebook architecture.
Let's have a look at each of these components.
Linux is an operating system widely used by software engineers and developers. It holds a significant market share and is also used by major organizations like IBM, Dell, Sun Microsystems (remember Java? ;) ) and Nokia.
It also supports a wide array of programming languages such as Java, PHP, Python, Go and Haskell.
Apache HTTP Server is one of the most widely used web servers in the world, running about 29% of websites globally. It works with multi-processing modules (MPMs), which can be process-based, hybrid process-thread based or event-based.
Apache has various features, such as:
- Fault tolerance (the ability to keep working even if one server fails).
- Load balancing (spreading the load evenly across multiple servers so they work efficiently).
- WebSockets (a mechanism used to build messaging in social media applications).
- IPv6 compatibility.
- High scalability (handling an increased number of users on the website).
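To make the load balancing idea concrete, here is a minimal round-robin sketch in Python. It is only an illustration of the concept: the server names are hypothetical, and in Apache itself this behaviour is configured through its proxy balancer module rather than written by hand.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        # cycle() repeats the server list endlessly, in order.
        self._servers = cycle(servers)

    def pick(self):
        # Return the next server in the rotation.
        return next(self._servers)


# Hypothetical server names, used only for this demo.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Six requests over three servers means each server handles exactly two of them.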
Although a lot of businesses have recently switched to NGINX (an alternative), Apache is still popular.
MySQL is a well-known relational database used by many businesses for small to medium sized audiences.
Remember when we would update our timelines with statuses, post a birthday wish or share a meme? Behind each of those is a back-end query that does all the logical stuff to keep users glued ✨ to the experience.
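Here is a rough sketch of what such a back-end query could look like. To keep it runnable anywhere, Python's built-in sqlite3 module stands in for MySQL, and the table and column names (statuses, user_id, body) are made up for the example; this is not Facebook's actual schema.

```python
import sqlite3


def build_demo_db():
    # In-memory database; sqlite3 is a stand-in for MySQL in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statuses ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def post_status(conn, user_id, body):
    # Parameterized query: user input is never pasted into the SQL string.
    conn.execute(
        "INSERT INTO statuses (user_id, body) VALUES (?, ?)", (user_id, body)
    )
    conn.commit()


def timeline(conn, user_id):
    # Newest status first, like a timeline.
    rows = conn.execute(
        "SELECT body FROM statuses WHERE user_id = ? ORDER BY id DESC",
        (user_id,),
    )
    return [body for (body,) in rows]


conn = build_demo_db()
post_status(conn, 1, "Happy birthday!")
post_status(conn, 1, "Sharing a meme")
post_status(conn, 2, "Hello world")
print(timeline(conn, 1))
```

Posting a status is an INSERT, and loading a timeline is a SELECT filtered by user.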
PHP, whose name is the recursive acronym "PHP: Hypertext Preprocessor", is a server-side scripting language used to add dynamic features to a static webpage, like fetching data from a database or inserting a user's data into one (like MySQL above).
Another popular feature of PHP is that its code can be embedded in HTML. But one of the controversies before PHP 7.0 was its runtime speed, which was slower than languages like C and C++, and certainly not suitable for the growing Facebook user base.
Yeah pun intended
So engineers at Facebook came up with an alternative way to run PHP, which we'll see next.
2. HipHop Virtual Machine (HHVM)
HipHop Virtual Machine (HHVM) is a virtual machine (doesn't sound appealing, right 🥱?), or you could call it an accelerator, that first converts PHP code into bytecode, which is then compiled just in time into machine code.
This results in a big increase in speed (reportedly over 9x), which would not be possible by running the PHP code directly.
It also decreased latency (delay time) on Facebook's website when loading data.
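As an analogy for the first step (source code to bytecode), here is a small sketch using Python's own compiler. This is not HHVM and it does not show the second step of turning bytecode into machine code; the expression and variable names are made up for the demo.

```python
import dis

# A hypothetical expression, standing in for a line of application code.
source = "price * quantity + shipping"

# Step 1: compile the source text into a bytecode object.
code = compile(source, "<demo>", "eval")

# Look at the bytecode instructions the compiler produced.
# The exact instruction names differ between Python versions.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# The virtual machine then executes that bytecode.
result = eval(code, {"price": 3, "quantity": 4, "shipping": 5})
print(result)  # 17
```

The point is only that the program the machine runs is not the text you wrote, but a compiled form of it that is cheaper to execute.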
Well, you might be getting bored, right? Now comes the interesting stuff: the strategy used to store all your timeline photos 📸.
Haystack: sounds like one of those Windows 7 wallpapers, doesn't it? Yes, it does.
It is also the object store Facebook uses to store all the photos that users upload.
The name comes from the analogy of
Needle in a Haystack
where the needle is an image and the haystack is the large cluster of hardware where the data is stored.
But how do we locate one specific needle in such a huge haystack, where there are billions of other needles as well? The answer is simple.
Image source: Facebook Engineering
- Each needle (image) has a header attached, which contains a tuple of <Offset, Key, Alternate Key, Cookie>, and a footer.
- The records of the needles are kept in an object store file (like a diary), which is used to locate the images in the cluster of nodes (the nodes need not be in the same location; for example, one node could be in the USA and another in Canada).
The photos are also stored in 4 different sizes (small, the platform you're reading on ;-), large and thumbnail).
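The lookup idea can be sketched in a few lines of Python. This is a simplified illustration, not Facebook's implementation: the three-field header (key, cookie, size), the class name TinyHaystack and the in-memory buffer are all assumptions made for the demo, and the real needle format carries more fields.

```python
import io
import struct

# Hypothetical header layout: key (8 bytes), cookie (8 bytes), data size (4 bytes).
HEADER = struct.Struct(">QQI")


class TinyHaystack:
    def __init__(self):
        self.store = io.BytesIO()  # stands in for one big store file on disk
        self.index = {}            # key -> offset of the needle in the store

    def put(self, key, cookie, data):
        # Append the needle at the end of the store and remember where it starts.
        offset = self.store.seek(0, io.SEEK_END)
        self.store.write(HEADER.pack(key, cookie, len(data)))
        self.store.write(data)
        self.index[key] = offset

    def get(self, key, cookie):
        # One index lookup, one seek, one read: no scanning through the haystack.
        offset = self.index[key]
        self.store.seek(offset)
        stored_key, stored_cookie, size = HEADER.unpack(
            self.store.read(HEADER.size)
        )
        if stored_key != key or stored_cookie != cookie:
            raise KeyError("key or cookie does not match the stored needle")
        return self.store.read(size)


haystack = TinyHaystack()
haystack.put(1, 111, b"cat photo bytes")
haystack.put(2, 222, b"dog photo bytes")
print(haystack.get(2, 222))
```

Because the index maps each key straight to an offset, finding a photo costs the same whether the store holds two needles or two billion.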
Memcached is an open source caching service used by many social media sites like Twitter, YouTube, Reddit and Pinterest. But what exactly is caching?
Image source: Google
Caching is a mechanism for keeping the most frequently accessed data of a website in fast memory (RAM), so that it doesn't have to be fetched from the database every time.
For example, let's say it's the beginning of the Covid-19 lockdown. We're all locked in our homes binge-watching Netflix (and chilling, of course 🤟) and the new season of the web series "Money Heist" gets released. It's selling like hot cakes at release, so everyone wants a taste as early as possible. But isn't that much load at one moment a burden on the servers? This is where the episodes get cached.
This reduces the load on the servers and also speeds up loading time, hence improving user experience. But there's more to it.
As time passes and the craze for the series dies down, the cached episodes keep getting pushed down the order. Once at the bottom, the cached data gets kicked out by the eviction policy (the rule for removing data from the cache as it gets old). The most commonly used algorithm is Least Recently Used (LRU).
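LRU eviction is easy to sketch in Python. Note that this is a small in-process illustration of the policy, not Memcached's API; the class name LRUCache and the episode keys are made up for the demo.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` items; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self.items:
            return None  # cache miss: the caller would go to the database
        # A hit makes this item the most recently used.
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            # Drop the item at the bottom of the order.
            self.items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("episode-1", "video bytes 1")
cache.put("episode-2", "video bytes 2")
cache.get("episode-1")                   # episode-1 is fresh again
cache.put("episode-3", "video bytes 3")  # cache is full, episode-2 is evicted
print(list(cache.items))
```

Episode 2 was the one nobody had asked for most recently, so it is the one that gets kicked out when space runs short.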
This certainly reduces the load on the database, right? But what database are we talking about? That's what we'll be looking at next.
Cassandra is an open source, wide-column NoSQL database that was built internally at Facebook for storing users' data.
Its design is a combo of Amazon's Dynamo and Google's Bigtable, and it manages data across distributed systems.
Facebook initially used it for its inbox search feature and for interactions between users. Cassandra comes with a wide variety of features, such as:
- Fault tolerance (the ability to recover even if one server crashes).
- Horizontal scalability (the ability to expand as the user count on the website increases). It basically means adding more servers (nodes) instead of adding more RAM to one machine (vertical scaling).
- Wide-column store (no fixed restriction on the number of columns or rows).
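One reason adding nodes works smoothly in this kind of system is consistent hashing, which Cassandra uses to decide which node owns which piece of data. Below is a minimal sketch of the idea in Python. It is an illustration only: the node names and key format are hypothetical, MD5 is used just as a convenient hash, and real clusters add refinements such as replication.

```python
import bisect
import hashlib


def token(value):
    # Map any string to a position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self.tokens = []  # sorted ring positions
        self.owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        position = token(node)
        bisect.insort(self.tokens, position)
        self.owners[position] = node

    def node_for(self, key):
        # The first node at or after the key's position owns it (wrapping around).
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owners[self.tokens[i]]


keys = [f"user:{n}" for n in range(200)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")  # horizontal scaling: one more server joins
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved to the new node")
```

Only the keys that now belong to the new node move; everything else stays where it was, which is what makes scaling out cheap.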
Facebook also uses Ganglia, a monitoring system that keeps track of the nodes and watches them for failures. The work is divided among the nodes, which communicate with each other while a master oversees them.
GraphQL is an open source query language created internally at Facebook for fetching and manipulating data. It is mainly known for its advantages over REST APIs.
A GraphQL API works such that it returns only the data the client asked for, instead of every field of every record, saving on unnecessary data fetching.
For example, on a weekend a mom splits the grocery shopping among her kids, sending each one to get a specific kind of item, like dairy or vegetables. The mom assigning the tasks is like the client spelling out exactly what it wants in the query, and the kids play the role of the GraphQL API.
Image source: Toptal.com
A REST API, on the other hand, is like the mom doing all the grocery shopping herself, and so it fetches all the data.
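The difference can be sketched in plain Python. This is not a GraphQL implementation and uses no GraphQL library; the user record and the function names rest_style and graphql_style are invented for the demo, and the REST side is simplified to a single fixed endpoint.

```python
# A hypothetical user record, as it might sit in the database.
USER = {
    "id": 7,
    "name": "Asha",
    "email": "asha@example.com",
    "friends": [1, 2, 3],
    "bio": "Loves memes",
}


def rest_style(user):
    """A fixed endpoint: the client gets the whole record, needed or not."""
    return dict(user)


def graphql_style(user, requested_fields):
    """The client lists the fields it wants and gets back only those.

    The equivalent GraphQL query would look roughly like:
        { user(id: 7) { name friends } }
    """
    return {field: user[field] for field in requested_fields if field in user}


full = rest_style(USER)
slim = graphql_style(USER, ["name", "friends"])
print(len(full), "fields vs", len(slim), "fields")
```

A friends list widget only needs a name and some IDs, so there is no reason to ship the email and bio along with them.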
There's much more to Facebook's engineering infrastructure, and it would take a big manual to cover it all. These are some of the back-end technologies that Facebook has used over time, some of which it still uses (like PHP, which has stayed loyal to the developers).
ReactJS, Atomic CSS, Hive and Python are some more technologies that Facebook uses today.
If you liked this article, share it with your tech-savvy friends. I'd also appreciate it if you followed me for more upcoming blogs on tech, big data and social media. ✌️