Development Team Lead
Building web sites/apps that serve millions of visits per day is a real challenge, especially when it comes to keeping response time as low as possible. News websites are a perfect example of high-load web sites/apps. Building a few of them, and keeping them in good shape, required us to redesign and rewrite our data access layer from scratch.
“Rahawan” is a revamped design for the data access layer. It helped us improve our website response time from 250 milliseconds down to just 10 milliseconds, while reducing database load from 800 batch requests/second down to just 150 batch requests/second.
In its simplest form, “Rahawan” is nothing more than a data access layer powered by a data caching engine. Its main purpose is to instantly provide the fully loaded “Data Model” required to respond to a web request (like browsing a webpage).
Providing a “Data Model” is, most of the time, a direct key-value retrieve operation. Any change in database records triggers background tasks inside Rahawan that rebuild the “Data Models” related to the modified records.
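To make the idea concrete, here is a minimal sketch of that read path and the background rebuild. It is written in Python only to stay language-neutral, and it is not Rahawan's actual code: the `ModelStore` class, the `build_model` callback, and the `notify_changed` method are hypothetical names chosen for illustration.

```python
import queue
import threading


class ModelStore:
    """Key-value store of ready-made data models (illustrative only)."""

    def __init__(self, build_model):
        # build_model is a hypothetical callback that loads one fully
        # populated data model from the database for a given key.
        self._build_model = build_model
        self._models = {}
        self._lock = threading.Lock()
        self._changes = queue.Queue()
        worker = threading.Thread(target=self._rebuild_loop, daemon=True)
        worker.start()

    def get(self, key):
        # Read path: a plain dictionary lookup, no database access.
        with self._lock:
            return self._models.get(key)

    def notify_changed(self, key):
        # Called when a database record behind this key was modified.
        self._changes.put(key)

    def wait_until_idle(self):
        # Helper for demos and tests: block until pending rebuilds finish.
        self._changes.join()

    def _rebuild_loop(self):
        while True:
            key = self._changes.get()
            try:
                model = self._build_model(key)  # slow work, done off the request path
                with self._lock:
                    self._models[key] = model
            except Exception:
                pass  # sketch only: keep serving the previous model on failure
            finally:
                self._changes.task_done()


# Usage: a dict stands in for the database.
database = {"news:1": "Old headline"}
store = ModelStore(lambda key: {"headline": database[key]})
store.notify_changed("news:1")
store.wait_until_idle()
print(store.get("news:1"))
```

The point of the sketch is that the request path only ever calls `get`; all the expensive work happens in the background worker after a change is reported.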
Note: before talking about Rahawan itself, I will flash back to show how the solution evolved. If you wish, you can skip iterations one and two.
Like many developers, I have worked on projects where all lights were “green”: any team member could write any code, anywhere, in any way, as long as it resulted in a working website.
The client is in a rush to launch their website. This usually puts a lot of pressure on the development team. At this iteration, anything other than building a working website is a lower priority. There is no talk about code quality or better coordination between team members, which leads to fat Controllers communicating with the database directly. This is a simplified example:
As shown, this is the perfect start for “Spaghetti Code”. The Controller communicates with the database directly. Data is transferred from the Controller to the View in two different ways (ViewModel and ViewBag). To do partial rendering inside the View, “Html.RenderAction” is used in one place and “Html.Partial” in another.
Such code can hardly survive the continuous change requests and bug fixes made by many team members. When a developer fixes a bug here, a new bug appears there. Even if the team manages to shoot down all the bugs, no one can put an end to the bad performance. All this consumes a great amount of time and effort just to reach a near-stable website.
At the end of this iteration, the whole team agreed that they needed a break to catch their breath and think about better solutions for such problems. Which drives us to:
After facing many problems in the previous iteration, the team got it: not all green lights are good. Some red flags have to be put on the roads leading to problems, and some unified design guidelines have to be followed.
The following two points are real examples of red flags our team agreed on:
This is our simplified example after passing the second iteration:
The code has been cleaned up by:
To better understand Rahawan, let’s interview it.
Rahawan: Hi there.
Me: Can you please introduce yourself?
Rahawan: I’m an enhanced data access layer.
Me: Why do you claim that you are “enhanced”? What sets you apart from any other data access layer?
Rahawan: I’m designed to use a unidirectional data flow as much as possible.
Me: What do you mean by utilizing a unidirectional data flow?
Rahawan: Before answering, let me first explain the problem I’m here to address:
Me: That covers the problem. What about your solution?
Rahawan: My solution is to always move modifications made to database items into the web server’s in-memory caching engine, even if these data items have not been requested by the presentation layer yet. This ensures a mostly-unidirectional data flow: from the database, to memory, to the presentation layer.
Me: And how will this be achieved?
Rahawan: By this design:
Me: Can you explain it?
Rahawan: Sure, it shows the 4 components used to accomplish my mission:
Me: But any ORM such as Entity Framework can do that. Why all this?
Rahawan: There is a great difference. Entity Framework, for example, is not designed to be thread safe, which means multiple web requests can NOT safely share the same Entity Framework context. Rahawan, on the other hand, is designed to be thread safe.
Me: The thread-safety issue can be handled one way or another, so we could keep using the ORM instead of adding such complexity.
Rahawan: That way you only solve half of the puzzle. How about the other half?
Me: What other half?
Rahawan: Slow database queries. You only need five minutes of monitoring live database queries (using sp_WhoIsActive) before you can clearly tell that ORMs in general, Entity Framework included, can’t efficiently translate complex queries. This can cause many time-out errors, especially on high-load web sites/apps.
Me: And how did you handle the slow database queries issue?
Rahawan: By separating data caching logic from data retrieval logic. This gives us greater control over how data is retrieved (using an ORM or 3rd party libraries like Dapper), which unlocks the ability to rewrite specific queries manually to make them more efficient without affecting the way their results are cached.
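As a rough illustration of that separation (not the actual Rahawan code), the sketch below keeps the caching class unaware of how data is fetched: it only receives a retrieval function. Python and its built-in `sqlite3` module stand in here for the real stack, and `CachedSource`, `make_sql_retriever`, and the `news` table are hypothetical names invented for the example.

```python
import sqlite3


class CachedSource:
    """Caching logic only: it knows nothing about SQL or ORMs."""

    def __init__(self, retrieve):
        self._retrieve = retrieve  # any callable: item id -> dict or None
        self._cache = {}

    def get(self, item_id):
        if item_id not in self._cache:
            item = self._retrieve(item_id)
            if item is None:
                return None
            self._cache[item_id] = item
        return self._cache[item_id]


def make_sql_retriever(conn):
    """Retrieval logic only: a hand-written query that can be tuned freely."""

    def retrieve(news_id):
        row = conn.execute(
            "SELECT id, title FROM news WHERE id = ?", (news_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "title": row[1]}

    return retrieve


# Usage: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO news (id, title) VALUES (1, 'Hello')")

source = CachedSource(make_sql_retriever(conn))
print(source.get(1))
```

Swapping the hand-written query for an ORM call, or the other way around, only means passing a different function to `CachedSource`; the caching behavior stays the same.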
Me: How can the 4 components you mentioned earlier work together?
Rahawan: Take this real example:
The Controller paid no cost to get such a news item from the database, which translates into a better response time for website users.
Me: Does this mean you will copy the entire database into the Cache Engine?
Rahawan: Of course not. There is a way to decide which data is kept in the Cache Engine and which is not.
Me: Can you give details about that?
Rahawan: This is customized based on the situation. For example, any news item published during the last month will most probably be in high demand. Next in priority come news items with a high visit rate. Those can be kept in the Cache Engine, while others can be retrieved from the database on demand.
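A rule like this can be written as a small predicate. The sketch below is only an illustration in Python: the `NewsItem` fields, the 30-day window, and the visits threshold are assumptions made for the example, not values taken from our implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NewsItem:
    id: int
    published_at: datetime
    daily_visits: int


RECENT_WINDOW = timedelta(days=30)  # assumed meaning of "last month"
HIGH_VISITS_THRESHOLD = 1000        # assumed cut-off for a "high visit rate"


def should_keep_in_cache(item: NewsItem, now: datetime) -> bool:
    """Return True when the item qualifies to stay in the Cache Engine."""
    is_recent = now - item.published_at <= RECENT_WINDOW
    is_popular = item.daily_visits >= HIGH_VISITS_THRESHOLD
    return is_recent or is_popular


# Usage
now = datetime(2024, 1, 31)
fresh = NewsItem(id=1, published_at=datetime(2024, 1, 15), daily_visits=10)
print(should_keep_in_cache(fresh, now))
```

Items that fail the check are not lost; they are simply loaded from the database when someone asks for them.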
Me: This means it is not always unidirectional data flow.
Rahawan: That’s why I said “mostly-unidirectional”. A large percentage of data requests (based on the real-world situation) will be unidirectional, and a smaller percentage will be retrieved from the database before being served.
Me: If some item has been retrieved on demand from the database, what is its cache invalidation policy?
Rahawan: Cached items, as individual items, have no invalidation policy. Instead, the Cache Engine capacity is monitored. Once max capacity is reached, the least requested items are removed.
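The eviction rule just described can be sketched as a small capacity-bound cache that counts requests per item. This is a simplified Python illustration, assuming capacity is measured as a number of items; the `CapacityBoundCache` name is hypothetical and the real Cache Engine may measure capacity differently.

```python
class CapacityBoundCache:
    """Evicts the least requested item once capacity is reached."""

    def __init__(self, max_items):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._max_items = max_items
        self._items = {}
        self._hits = {}  # request count per key

    def get(self, key):
        if key in self._items:
            self._hits[key] += 1
            return self._items[key]
        return None

    def put(self, key, value):
        # Only a brand-new key can push the cache over capacity.
        if key not in self._items and len(self._items) >= self._max_items:
            self._evict_least_requested()
        self._items[key] = value
        self._hits.setdefault(key, 0)

    def _evict_least_requested(self):
        victim = min(self._hits, key=self._hits.get)
        del self._items[victim]
        del self._hits[victim]


# Usage
cache = CapacityBoundCache(max_items=2)
cache.put("a", "first")
cache.put("b", "second")
cache.get("a")
cache.put("c", "third")  # "b" has the fewest requests, so it is removed
print(cache.get("b"))
```

Note that nothing here expires on a timer: an item leaves the cache only when room is needed and it is the least requested one.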
Me: I got your idea. What about the implementation?
Rahawan: Although the implementation of the 4 components can change over time, these are the current details:
Me: What about the results?
Rahawan: At the server-side level, MiniProfiler has been used to measure response time. This is an example of the difference:
Rahawan: At the database level:
Currently, there is a plan to rebuild the Change Detector based on the Publish/Subscribe pattern instead of periodic SQL queries, another plan for a Redis-powered Cache Engine, and a lot more.
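For readers wondering what detecting changes with periodic SQL queries can look like, here is a minimal polling sketch. It is only an assumption about the general technique, not our Change Detector: it uses Python's built-in `sqlite3` module and a hypothetical `version` column that grows with every modification.

```python
import sqlite3


def detect_changes(conn, last_seen_version):
    """Return ids changed since last_seen_version, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, version FROM news WHERE version > ? ORDER BY version",
        (last_seen_version,),
    ).fetchall()
    changed_ids = [row[0] for row in rows]
    new_version = rows[-1][1] if rows else last_seen_version
    return changed_ids, new_version


# Usage: in a real service this call would run on a timer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)")
conn.execute("INSERT INTO news VALUES (1, 'First', 1), (2, 'Second', 2)")

changed, seen = detect_changes(conn, 0)
print(changed, seen)
```

Each poll costs a query even when nothing changed, which is the main reason a Publish/Subscribe design is attractive.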
Rahawan is not a silver bullet that eliminates every web site/app performance issue. It was born from the challenges of news-based websites, and it can be used in other web sites/apps facing similar challenges. Web sites/apps facing different challenges will need a more suitable solution. It may be Rahawan, a modified version of it, or even something else.
This article talked about the design of Rahawan and didn’t dig much into our implementation, as it would take many articles to share our implementation details and the issues encountered during development, such as memory leaks and a sudden process termination that, after a deep diagnostic journey, turned out to be an unhandled stack-overflow exception. I will try to write about these details in other articles if needed.
Thank You, Have a good day :)