Evolutivo 32M Project: defining the data limits of EvolutivoFW

A little over a month ago we started a project (evo60m) to determine the viability of ingesting 60 million records into EvolutivoFW. After a few days it was clear that that volume was not feasible, because our average ingestion rate is around 1 million records per day, but since we had already started I decided to keep going and create a 60M-record test database for us to use as a performance and limit evaluation database.

After 35 days I had a database with 32 million records: 1 million records in each of 15 modules, 2 million records in Emails, and 14 million in Inventory Details. Document Folders and Campaigns serve as a stress point for the internal many-to-many relation table, which holds millions of relations. I then duplicated that database and added 5,000 users, so we now have two databases with a lot of information to play with.

With that we start the evo32m project.

Some Conclusions

Although the important discoveries are still to be found, I can already say that I was happy to see that the record creation rate did not degrade at all. Even though the number of records in the database was consistently increasing, the number of records we could create per day stayed fixed at around 1 million. I understand this to mean that the limit is in the database engine, and that we could increase the ingestion rate by giving more resources to the servers running both the code and the database. We are still very far from the requested 60 million, but I am convinced that we can go over that 1-million-per-day limit easily.

We have a serious memory leak. The maximum number of records we can create in one execution is around 5,000; beyond that we get a memory exhaustion error. We have to fix this.
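
Until the leak is fixed, the practical workaround is to keep each PHP process under that ceiling. The sketch below is only illustrative: createRecords.php is a hypothetical generator script, and the module name and batch size are placeholders.

    <?php
    // Illustrative batching loop: keep each run below the ~5k ceiling observed
    // before memory exhaustion by launching a fresh PHP process per batch, so
    // leaked memory is released when the child process exits.
    $totalRecords = 100000; // records wanted for this module
    $batchSize    = 4000;   // stay under the ~5k per-execution limit

    for ($offset = 0; $offset < $totalRecords; $offset += $batchSize) {
        $count = min($batchSize, $totalRecords - $offset);
        $cmd = sprintf(
            'php createRecords.php --module=Accounts --start=%d --count=%d',
            $offset,
            $count
        );
        passthru($cmd, $exitCode);
        if ($exitCode !== 0) {
            fwrite(STDERR, "Batch starting at $offset failed (exit $exitCode)\n");
            break;
        }
    }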

The dashboard/home page is seriously inefficient. Don't even try to land on that page; set the URL to some other module when you enter the application. We have to dedicate time to this, but first we have to decide what we want to do to give our users a basic control panel view.
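
In the meantime, the landing page can be changed in the configuration. Assuming EvolutivoFW keeps the default_module setting that comes from its coreBOS/vtiger lineage, the change would look like this (the module name is just an example):

    // config.inc.php (fragment): land users on a lighter module instead of
    // the Home dashboard after login. Assumes the $default_module setting
    // inherited from coreBOS/vtiger; adjust if EvolutivoFW names it differently.
    $default_module = 'Accounts'; // was 'Home'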

Popup screens are slower than the list view.

Analyzing Performance

To start understanding where the bottlenecks and performance issues are in such a big database, we need to measure the performance of the code. So the next step of the project was to define the tools the company is going to use to get those numbers. I define those tools in these three blog posts:
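
The common pattern behind those tools is to wrap a request with the XHProf extension's enable/disable calls and ship the result to XHGUI. A minimal sketch, assuming only the XHProf extension itself; how the data actually reaches XHGUI depends on the profiler setup described in the posts, so here it is just written to a temporary file:

    <?php
    // Collect call, CPU and memory data for one request with XHProf.
    if (extension_loaded('xhprof')) {
        xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

        register_shutdown_function(function () {
            $data = xhprof_disable();
            // Placeholder persistence: the real setup sends this to XHGUI.
            file_put_contents(
                sys_get_temp_dir() . '/xhprof_' . uniqid() . '.json',
                json_encode($data)
            );
        });
    }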

So, for those developers who will be working on this project, the steps are:

  • Read the three posts above
  • Install the XHProf PHP extension in your development environment
  • Clone the XHGUI repository
  • Set the token environment variable in the XHGUI docker-compose file
  • Start the docker containers: docker-compose up -d
  • Edit build/ProfileConfig.php and set the url and token values (see the sketch after this list)
  • Start profiling
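
I do not reproduce the exact contents of build/ProfileConfig.php here; the sketch below is only a hypothetical shape for the two values the steps refer to. The port and the XHGUI_TOKEN variable name are assumptions taken from a typical XHGUI docker-compose setup; use whatever your compose file defines.

    <?php
    // build/ProfileConfig.php (hypothetical shape): point 'url' at the XHGUI
    // collector started by docker-compose and 'token' at the same value
    // exported in its environment.
    return array(
        'url'   => 'http://localhost:8142', // XHGUI endpoint from docker-compose
        'token' => getenv('XHGUI_TOKEN'),   // token set in the compose file
    );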

The Tasks

As we move forward with the measurements, this project will raise many questions we cannot answer yet. New tasks will undoubtedly appear, and others will become clearer as we get into the performance analysis. To get us started, I have defined a set of initial tasks that we can adjust as needed. An essential part of this project is documenting what we find: my focus is not only on identifying and optimizing the application's bottlenecks, but also on understanding the limits of the application and recording them for future reference and decision-making in subsequent projects. So, developers/analysts, please dedicate time to writing down what you learn; project managers, please assign extra time for documenting.

Looking forward to seeing what we uncover :-)
