Performance -- a struggle

The Open Atrium application on which Wiscommunity is based is a big, complex Drupal distribution. It does a whole LOT of permission checking in the database, and page generation is sometimes slow and painful.

I'm trying to work around the performance issues on the site, but it's a big hairy pile of things to deal with. Open Atrium puts a lot of load on every part of the web infrastructure: piles of complex database queries, very heavy use of PHP to generate the pages, and a lot of other potential issues.

The worst side effect here is not really the slow performance of the site (which only affects logged-in users; I'll get to that later) but a problem caused, ironically, by one of the projects meant to improve performance. One of the ways around slow Drupal performance is to add caching. Lots of caching. The continual issue with this is that although caching makes it easy to get good performance out of a CMS, it mostly only works well for people who are not logged in to the site. Why? Well, if you are not logged in, a given page looks identical to every visitor, so once it has been generated we can just cache it and serve the stored copy to everyone. But if you are logged in, every page can potentially look different from how it is displayed to others. So caching breaks down there.
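To make the anonymous versus logged-in split concrete, here is a minimal Python sketch of a full-page cache. It is a toy, not how Drupal, Varnish or Cloudflare actually implement caching: the class `AnonymousPageCache`, the `render` callback and the test paths are hypothetical, and it assumes (as Drupal does) that a logged-in visitor can be recognized by a session cookie whose name starts with `SESS` or `SSESS`. Anonymous requests for the same path share one cached copy; any request carrying a session cookie is rendered fresh every time.

```python
from typing import Callable, Dict, List, Optional, Tuple


class AnonymousPageCache:
    """Toy full-page cache: anonymous visitors share one cached copy per path,
    logged-in visitors always get a freshly rendered page."""

    def __init__(self, render: Callable[[str, Optional[str]], str]) -> None:
        self._render = render            # hypothetical renderer: (path, session) -> HTML
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _session(cookies: Dict[str, str]) -> Optional[str]:
        # Assumption: Drupal-style session cookies are named SESS... (SSESS... over https).
        for name, value in cookies.items():
            if name.startswith(("SESS", "SSESS")):
                return value
        return None

    def get(self, path: str, cookies: Dict[str, str]) -> str:
        session = self._session(cookies)
        if session is not None:
            # Logged in: the page may differ per user, so bypass the cache entirely.
            return self._render(path, session)
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Anonymous miss: render once and keep the result for every later visitor.
        self.misses += 1
        html = self._render(path, None)
        self._store[path] = html
        return html


# Usage: record every call that reaches the (expensive) renderer.
renders: List[Tuple[str, Optional[str]]] = []


def render(path: str, session: Optional[str]) -> str:
    renders.append((path, session))
    extra = f"<p>Personalised for session {session}</p>" if session else ""
    return f"<h1>{path}</h1>{extra}"


cache = AnonymousPageCache(render)
cache.get("/blog", {})                     # anonymous: rendered and cached
cache.get("/blog", {})                     # anonymous: served from cache
cache.get("/blog", {"SESSx": "abc123"})    # logged in: rendered
cache.get("/blog", {"SESSx": "abc123"})    # logged in: rendered again
print(cache.hits, cache.misses, len(renders))  # 1 1 3
```

Two anonymous hits cost one render, while two logged-in hits cost two. On a site where most activity happens while logged in, that is exactly why the caching layers help so little.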

The site uses multiple caching mechanisms. Much of the caching is done with Memcache, so a large part of the site is held in a memory cache on the server. We're also using Varnish as an external cache for the site, though in our case I think it is largely rendered redundant by the final caching layer, Cloudflare. The entire site lives behind a Cloudflare reverse proxy, which mostly serves to cache the static resources of the site (images, CSS files, etc.) and to serve them from servers all around the world (though in our case, for most readers, from Minneapolis). This is fairly complex, so I will not go into it in detail.

But part of the issue is that the Cloudflare reverse proxy does a number of good things, including caching and providing a secure certificate for HTTPS connections. This is all great, but it brings one small problem: Cloudflare expects a fairly quick response from the web server. If the web server doesn't respond quickly enough, you get the dreaded Cloudflare 524 error page, which is really annoying; you'll probably see it occasionally when working on the site while logged in. This is the main problem I am currently trying to work around.

I'm doing some site analysis to see "where it hurts". I think the next step will be to get the site hooked up with New Relic to try to get more insight into where the performance bottlenecks are.

Comments

This afternoon I think I actually made some improvement: I fixed a few rules in the Varnish cache and changed the page rules in Cloudflare, and that seems to be fixing at least SOME of the issues I've been seeing. The gist of it is that I'm now getting much better caching performance from Varnish, which seems to be helping quite a lot.

After some conversation with Cloudflare I discovered that the 524 timeouts, which all seem incredibly premature, happen because I'm using Railgun with the Cloudflare account. The default timeout in the Railgun daemon is, for some obscure reason, 30 seconds, so any time a page takes longer than 30 seconds to load I get a timeout. Since they are my own Railgun servers, it turns out the fix is a simple, not-very-well-documented configuration change. Those slow admin functions are still slow, but they no longer produce error messages. YAY!!!!!
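The mechanism is easy to reproduce in miniature. The Python sketch below stands in for a proxy that waits a fixed time for the origin server and then gives up with a 524. It is not Railgun's code or configuration: the function names, the scaled-down timings (0.2 s and 1 s instead of 30 s) and the response bodies are illustrative assumptions. It shows why raising the timeout makes the errors go away without making the slow page any faster: the proxy simply stops abandoning the request.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Tuple


def proxy_fetch(origin: Callable[[], str], timeout_s: float) -> Tuple[int, str]:
    """Wait up to timeout_s for the origin; return (status, body) like a reverse proxy."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(origin)
    try:
        return 200, future.result(timeout=timeout_s)
    except FutureTimeout:
        # The origin is still working; the proxy has just stopped waiting for it.
        return 524, "A timeout occurred"
    finally:
        # Don't block on the slow origin here; its thread finishes in the background.
        pool.shutdown(wait=False)


def slow_admin_page() -> str:
    # Hypothetical stand-in for a heavy admin request.
    time.sleep(0.5)
    return "<h1>Admin report</h1>"


print(proxy_fetch(slow_admin_page, timeout_s=0.2))  # (524, 'A timeout occurred')
print(proxy_fetch(slow_admin_page, timeout_s=1.0))  # (200, '<h1>Admin report</h1>')
```

In both calls the origin spends the same 0.5 s generating the page; only the proxy's patience changes.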

Also, it APPEARS that the performance culprit in some cases (particularly when I log in) is the CiviCRM database I have set up in conjunction with the site. CiviCRM will be helpful in the long run, but it isn't heavily used at the moment. I'm debating whether to disable CiviCRM for the time being and figure out how to tune its performance later (oddly, I don't usually see it affect Drupal performance this much) or to leave it in and work on it on an ongoing basis.

Content Visibility

Public
Groups audience
Open Atrium Section
Wiscommunity Blog
randomness