
How we redesigned the analytics data processing in Document360

Category: Product Engineering

Last updated on Jan 20, 2023

The database is one of the most crucial factors in any application's performance. At Document360, we always strive to improve the application's performance, given that it runs in a highly scalable SaaS environment.

While closely monitoring the application for regressions and improvements, we found that a significant number of database writes were happening every day through one particular endpoint.

There are two sides to Document360. The first is the portal, where users write articles, review them, and manage the overall knowledge base. The second is the knowledge base site, where customers simply consume the content of the knowledge base.


Deep diving further

In layperson's terms, a public-facing knowledge base site gets more views than the portal where people write articles and manage the knowledge base. We started monitoring all the endpoints consumed by the customer-facing knowledge base sites and collected metrics for them.

For a knowledge base, tracking is one of the most important parts of the application: it is how we show the total number of views, reads, likes, dislikes, and so on. All this information must be collected when users visit the knowledge base site, and we then consolidate it and show the metrics in the portal. We noticed that, for collecting analytics, we had been using a single API (UpdateTrackingInformation), which was responsible for collecting all the metrics from the knowledge base site and writing them to the database.

Imagining a basic flow

Whenever a user comes to the knowledge base site, they may visit multiple articles, read them, spend more time on articles that require more context, or like or dislike various articles. Even with a rough estimate that each user views and interacts with at least 5 articles, that is 5 separate database hits to write the information to our database.
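To make the cost of this flow concrete, here is a minimal sketch in Python. It is not our production code: `CountingStore` and the event fields are hypothetical stand-ins, and the only assumption carried over from the text is that each tracking event results in one immediate database write.

```python
from dataclasses import dataclass, field


@dataclass
class CountingStore:
    """Hypothetical stand-in for the database; it only counts write calls."""
    write_calls: int = 0
    rows: list = field(default_factory=list)

    def write_one(self, row: dict) -> None:
        self.write_calls += 1
        self.rows.append(row)


def update_tracking_information(store: CountingStore, event: dict) -> None:
    # Old flow as described above: every tracking event is written immediately.
    store.write_one(event)


store = CountingStore()
for article_id in range(5):  # one visitor interacting with 5 articles
    update_tracking_information(store, {"article": article_id, "action": "view"})

print(store.write_calls)  # 5
```

One visitor already costs five writes, and that cost grows linearly with every additional visitor and interaction.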

[Figure: database hits before the redesign]

The number of hits made to this API was having a huge impact on our database. The average was about 145K hits per day, and over a week the number reached about 821K. It was clear that we had to do something to address this problem.

The solution we arrived at

Analytics is not information that is required instantly. Even when 500 users visit the site, we felt it does not make sense to show the resulting analytics in the portal immediately, as nobody managing the knowledge base would be looking at analytics for the current time or even the current day.

We completely changed our analytics architecture so that the number of hits made to the database is drastically lower than before. The final architecture we arrived at is shown in the diagram below.

[Figure: analytics architecture]

We started using a queuing mechanism to consolidate the results, grouping them by project before writing them to the database. For the sake of simplicity, let's consider that there are two websites.

There may be multiple users visiting the documentation site of each product for various use cases. With our new architecture, all the information collected for Document360 is grouped into a single set and written to the database in a single BulkWrite operation. In this way, we reduce the number of connections and writes made to the database.
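The queue-and-group idea can be sketched as follows. This is an illustration under stated assumptions, not our actual implementation: it uses Python's standard in-memory `queue.Queue` in place of a real message queue, and `CountingStore`, `drain`, `flush`, and the project names are all hypothetical.

```python
from collections import defaultdict
from queue import Empty, Queue


class CountingStore:
    """Hypothetical stand-in for the database; counts bulk write calls."""

    def __init__(self) -> None:
        self.bulk_write_calls = 0
        self.rows_by_project = defaultdict(list)

    def bulk_write(self, project_id: str, rows: list) -> None:
        self.bulk_write_calls += 1
        self.rows_by_project[project_id].extend(rows)


def drain(pending: Queue) -> list:
    """Take everything currently waiting in the queue without blocking."""
    events = []
    while True:
        try:
            events.append(pending.get_nowait())
        except Empty:
            return events


def flush(pending: Queue, store: CountingStore) -> int:
    """Group queued events by project and issue one bulk write per project."""
    grouped = defaultdict(list)
    for event in drain(pending):
        grouped[event["project_id"]].append(event)
    for project_id, rows in grouped.items():
        store.bulk_write(project_id, rows)
    return len(grouped)


tracking_queue = Queue()
store = CountingStore()

# Two hypothetical projects; each visitor interaction is enqueued, not written.
for i in range(5):
    tracking_queue.put({"project_id": "project-a", "article": i, "action": "view"})
for i in range(3):
    tracking_queue.put({"project_id": "project-b", "article": i, "action": "like"})

flush(tracking_queue, store)
print(store.bulk_write_calls)  # 2 bulk writes instead of 8 single writes
```

How often `flush` runs is the main design choice: a longer interval means fewer writes but a longer delay before the numbers appear in the portal, which is acceptable when the data is not needed immediately.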

The chart below shows the new metrics after we rolled out our analytics architecture. On average, the number of database writes per day is about 2K, and for a week the number stands at about 12.8K.

[Figure: database hits after the redesign]

Conclusion

The difference is huge, and we can also see a slight decrease in our overall operational costs, including the API servers and our database. So, consider queuing for any scenario where data such as analytics is not required immediately; the application's performance can then be improved by significantly reducing the overall load on the servers and database.

|                        | Old Implementation | New Architecture |
|------------------------|--------------------|------------------|
| Daily Database Writes  | 144,993            | 2,135            |
| Weekly Database Writes | 821,628            | 12,813           |
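The figures above can be turned into a relative reduction with a few lines of arithmetic. The helper name `reduction` is only illustrative; the inputs are the daily and weekly write counts from the table.

```python
def reduction(before: int, after: int) -> tuple[float, float]:
    """Return (percentage reduction, before/after ratio)."""
    return (1 - after / before) * 100, before / after


daily_pct, daily_ratio = reduction(144_993, 2_135)
weekly_pct, weekly_ratio = reduction(821_628, 12_813)

print(f"daily:  {daily_pct:.1f}% fewer writes, about {daily_ratio:.0f}x")
print(f"weekly: {weekly_pct:.1f}% fewer writes, about {weekly_ratio:.0f}x")
```

Both the daily and weekly figures work out to a reduction of more than 98% in database writes.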
