Case study: EuskalKultura.com. Improving the performance of a Plone Site
We published EuskalKultura.com in early June. EuskalKultura is a website for Basque people living far away from their homeland, mainly in the Americas (both South and North America). It's just a Plone 2.5 site with some custom products, such as birthday greetings, and one or two other custom Archetypes content types with 3 or 4 fields.
But the main work before publishing it was importing the content of the old website. The old site was a PHP-based website with lots and lots of items (mainly news items, but also events, restaurant information, interviews, ...), all of them multilingual, mainly in Basque and Spanish. So we had to write some scripts to pull the data from the MySQL database and create the content in Plone through invokeFactory, with the usual UnicodeDecodeErrors :)
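The UnicodeDecodeErrors mentioned above usually come from byte strings of mixed or unknown encoding reaching code that expects text. A minimal sketch of the kind of helper that tames them follows. It is not our actual import script: the function name to_unicode and the list of candidate encodings are assumptions made for illustration (the encodings used by the old database are not recorded here), and it is written for a current Python rather than the Python shipped with Plone 2.5.

```python
def to_unicode(raw, encodings=("utf-8", "cp1252")):
    """Return text for a value read from the old database.

    `encodings` is a hypothetical list of candidates, tried in order.
    """
    if isinstance(raw, str):
        # Already text, nothing to decode.
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Wrong guess, try the next candidate encoding.
            continue
    # Last resort: keep the item, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# In an import script, the cleaned values would then be passed on to
# Plone, for example (not executed here, names are placeholders):
#   folder.invokeFactory("News Item", id=item_id, title=to_unicode(row_title))
```

Decoding every field in one place, right after reading it from MySQL, keeps the rest of the script free of encoding guesses.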
After some testing, we managed to import all the data and create all those news items and events, all of them properly linked thanks to LinguaPlone, to get a fully translated website.
Shortly after publishing the website, we discovered that it was consuming a lot of RAM. We hosted it in a memory-limited FreeBSD account at HighSpeedRails, and we had both excessive RAM usage and constant restarts. HighSpeedRails provides some scripts to control the memory consumption of your Zope applications, which restart the instance when it passes the established limit. We also tried raising the memory limit, but our Plone started to eat all the memory available in the hosting service.
So we decided to reproduce the situation locally, fix it if possible, and publish the fixed website again.
Thanks to zc.buildout, reproducing the environment was quite easy: I just had to check out the corresponding buildout from our svn server and run ./bin/buildout. We downloaded the 1.5 GB Data.fs, and Lur wrote a Python script to reproduce the server's load by parsing the Apache logs. In the meantime, I wrote a harder test plan for JMeter, also based on the Apache logs, taking many ideas from the Plone Performance Sprint 2007.
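To give an idea of what replaying load from Apache logs involves, here is a minimal sketch of the parsing half. It is not Lur's script: the function name replayable_paths, the regular expression, and the choice to keep only successful GET requests are assumptions made for illustration, and it assumes the common or combined log format.

```python
import re

# Request part of a common/combined log line: "GET /path HTTP/1.1" 200
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def replayable_paths(log_lines):
    """Yield the paths of successful GET requests, in log order."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            # Not a recognizable request line, skip it.
            continue
        if match.group("method") == "GET" and match.group("status") == "200":
            yield match.group("path")
```

Each yielded path can then be requested against the local instance, in the original order, to approximate the traffic the live server received.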
In our initial tests, we easily got our Plone site to consume 700 MB of RAM (and growing) after running it for half an hour (or less).
It was our idea, and the client's, to be able to select the content featured on the home page, so we used CompositePack for it. We created a new layout and wrote a browser view to keep featured news items from appearing in the news listing. The code under the hood proved to be totally inefficient, and after some analysis of the website, we realized that the home page was effectively automatic: our client wasn't using that feature.
Our client also used news items (all of them saved inside one folder) to create different kinds of news: short articles, featured news items, common news items, ... and wanted the home page to show all news items except the ones keyworded as XXXX. Again, the code to get those news items was highly inefficient.
So we decided to get rid of CompositePack for the home page and to use AdvancedQuery, which allowed us to make NOT queries against the Plone catalog.
Those minor changes proved to be great: navigation on the website improved a lot, it was noticeably faster, and the response time of the home page decreased notably.
Another bottleneck was found in the keyword portlet. Our client uses keywords to tag news items and events, and wanted a list of all the keywords used in news items and events, each one in the corresponding section. We quickly created a view getting those keywords from the catalog (with a catalog query and an inefficient algorithm :)), and, using a strategy copied from Quills, we created a traversal adapter to provide a view with all the news items tagged with the selected keyword. The process of getting the list of keywords was slow, so we replaced it with a static list of keywords, updated daily through a cron job.
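The static keyword list boils down to computing the expensive result once a day and letting every page view read a small file instead. A minimal sketch of that idea follows. It is not the code running on the site: the function names, the JSON file, and the plain dictionaries standing in for catalog results are all assumptions made for illustration.

```python
import json


def collect_keywords(items):
    """Return the sorted, de-duplicated keywords used by the given items."""
    keywords = set()
    for item in items:
        keywords.update(item.get("keywords", ()))
    return sorted(keywords)


def write_keyword_cache(items, path):
    """Slow part, meant to run once a day (for example from cron)."""
    keywords = collect_keywords(items)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(keywords, fh)
    return keywords


def read_keyword_cache(path):
    """Fast part, meant to run on each request that renders the portlet."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The trade-off is freshness: a keyword added today only shows up in the portlet after the next scheduled run.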
We also moved all news items stored in plain Plone folders to Plone Large Folders (based on BTrees, and disabled out of the box in Plone 2.5). We had no ordering requirements in the news items or events folders, because their main view is a properly configured Topic/Collection, so the change wasn't dramatic, just time-consuming. It took hours to cut-and-paste and recatalog all those news items and events.
We also took a look at all the customized templates and made some improvements to avoid common Plone problems: avoid calling getObject again and again, use existing views instead of home-made scripts, ...
Finally, we also removed some unneeded products from our buildout, improved the packaging of our base products, and set the zodb-object-cache option back to its default (5000 objects).
After running the Python script and the JMeter test plan together again, we found that our Plone site wasn't consuming more than 300 MB of RAM, even with the script running all night. Incredible!!!
We ran some more tests, changing the zodb-object-cache option of our Zope instance to a larger value (following Hector Velarde's advice). We tried 10000, 20000, 30000 and 50000, but we didn't get any improvement in either memory consumption or CPU usage according to top (OK, perhaps this is not the way to monitor a process, but that's the way we were taught ;)), so we decided to leave the value at 5000.
So we downloaded the live Data.fs on Friday (thanks, cron), ran the news item and event cut-and-paste scripts on Saturday and Sunday, made all the changes on Monday, and re-uploaded the Data.fs at midnight on Monday. On Tuesday, we just made the last configuration changes and got it running.
After 8 hours running, it's consuming just 340 MB of RAM, a bit more than in our local tests, but far from the 700 MB after 5 minutes we had in the previous situation.