Hey folks,
I feel compelled to announce this. Please, please, please read this post if you have done any multilingual Drupal development.
Do not use t() outside of a hook or before locale gets to build its cache
I was asked to do some profiling on a colleague’s site, and I found this little doozy in the location_views module:
http://drupal.org/node/253813
The offending line is:
define('LOCATION_VIEWS_UNKNOWN', t('unknown'));
Okay, so what is the problem? When Drupal starts up, it runs through many “bootstrapping states”:
Booting Drupal’s locale module
The last one is:
case DRUPAL_BOOTSTRAP_FULL:
require_once './includes/common.inc';
_drupal_bootstrap_full();
break;
}
Now let’s look at the end of _drupal_bootstrap_full:
// Load all enabled modules
module_load_all();
// Initialize the localization system. Depends on i18n.module being loaded already.
$locale = locale_initialize();
So module_load_all() loads (includes) every module file before the locale system is initialized. In location_views, the offending statement isn’t wrapped in any function or hook, so it is executed as soon as the file is included, which is before locale_initialize() has run.
This last call, $locale = locale_initialize(), sets the global $locale variable to the ISO code of the language the user is viewing the site in. If it is not set, see what happens when someone calls t():
function t($string, $args = 0) {
global $locale;
if (function_exists('locale') && $locale != 'en') {
$string = locale($string);
}
.......
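Okay, so look at the first conditional. With the locale module enabled, function_exists('locale') is true, and $locale != 'en' is also true, because $locale is still NULL when a module calls t() before locale is initialized. So the condition passes every time and we end up in locale(). If we dig into locale(), what do we see: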
function locale($string) {
global $locale;
static $locale_t;
// Store database cached translations in a static var.
if (!isset($locale_t)) {
$cache = cache_get("locale:$locale", 'cache');
if (!$cache) {
locale_refresh_cache();
$cache = cache_get("locale:$locale", 'cache');
}
$locale_t = unserialize($cache->data);
}
So what happens?
We get here, because $locale_t hasn’t been set yet:
$cache = cache_get("locale:$locale", 'cache');
We are trying to get a cache for “locale:”. Obviously, this does not exist. Because of this, locale says, okay, let’s refresh the cache.
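To make those two steps concrete, here is a minimal standalone sketch (no Drupal needed). The demo_ functions are made-up stand-ins that only copy the comparison from t() and the string interpolation from locale(); they are not Drupal functions.

```php
<?php
// Standalone sketch: $locale mimics the Drupal global before
// locale_initialize() has run, i.e. it is still NULL.
$locale = NULL;

// Same comparison t() makes. NULL != 'en' is TRUE, so t() would call locale().
function demo_needs_translation($locale) {
  return $locale != 'en';
}

// Same string interpolation locale() uses to build its cache ID.
function demo_cache_id($locale) {
  return "locale:$locale";
}

var_dump(demo_needs_translation($locale));  // bool(true)
var_dump(demo_cache_id($locale));           // string(7) "locale:"
var_dump(demo_needs_translation('en'));     // bool(false)
```

So an uninitialized $locale both passes the check in t() and produces a cache ID that no language will ever have written.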
The locale cache
The locale cache is a massive serialized array of strings and their matches from locales_source and locales_target. Let’s see how it is formed:
function locale_refresh_cache() {
$languages = locale_supported_languages();
foreach (array_keys($languages['name']) as $locale) {
$result = db_query("SELECT s.source, t.translation, t.locale FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' AND LENGTH(s.source) < 75", $locale);
$t = array();
while ($data = db_fetch_object($result)) {
$t[$data->source] = (empty($data->translation) ? TRUE : $data->translation);
}
cache_set("locale:$locale", 'cache', serialize($t));
}
}
OUCH!
This loops through each available language and runs a join that selects every source string shorter than 75 characters from locales_source along with its translation from locales_target. It then builds a massive array (the bigger the translated site, the bigger the array), which eats up a huge amount of RAM. For reference, on http://www.amnesty.org we’ve got over 10k translated strings, so we’re talking a few MB.
Then it serializes the whole thing - which uses a good amount of CPU, and then cache_set writes it to the cache table.
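If you want a rough feel for how big that serialized blob gets, a sketch like the following can help. The strings here are synthetic and made up for illustration, so the number it prints says nothing about any real site; on a real site you would measure the actual array inside locale_refresh_cache().

```php
<?php
// Builds a synthetic source => translation map. The strings are invented and
// only approximate the shape of a locale cache, not any real site's data.
function demo_build_locale_array($count) {
  $t = array();
  for ($i = 0; $i < $count; $i++) {
    $t["Source string number $i"] = "Translated string number $i";
  }
  return $t;
}

$t = demo_build_locale_array(10000);
$serialized = serialize($t);
printf("%d strings serialize to %d bytes\n", count($t), strlen($serialized));
```

Real interface strings are usually longer than these placeholders, so expect real numbers to be larger.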
The write itself is where it gets really bad. When cache_set writes to the cache table, it first runs
LOCK TABLES cache WRITE;
then INSERT INTO cache…
So the cache table is effectively frozen for anyone else who wants to read from it or write to it. Think about it: if you are inserting a 3MB locale cache array and the insert takes 200ms, every other request that needs to query the cache table (for variables, say) has to wait in line.
This creates a cascading effect: processes that could have finished quickly are now holding on to RAM in Apache while they wait for access to the DB. If your server isn’t fast enough, this runs MySQL’s process limit up to its max, people start being unable to connect, and the DB server gets partial inserts, deadlocks, all kinds of ugly stuff.
What’s worse, if t() is used outside of a hook, it will fire on EVERY page load. The site I was profiling uses a lot of AJAX, so every AJAX request was running this massive insert as well!
Profiling with xdebug on my sandbox, I was able to bring the average page load time down from about 3-4 seconds to 300-500ms with the simple patch referenced above.
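For anyone wondering what a fix of this kind looks like in general, here is a minimal sketch. It is not the actual patch from the issue above, just one common shape: replace the file-scope constant with a small function so the translation happens only when something asks for it. demo_t() and demo_location_views_unknown() are hypothetical names so the example runs without Drupal; in a real module you would call t() directly.

```php
<?php
// Stand-in for Drupal's t() so the sketch runs on its own.
// It counts calls to show when translation work actually happens.
function demo_t($string) {
  global $demo_t_calls;
  $demo_t_calls++;
  return $string;
}

// Before (problem): evaluated as soon as the module file is included.
// define('LOCATION_VIEWS_UNKNOWN', t('unknown'));

// After (sketch): a hypothetical helper that translates only when called,
// which in a module would be from inside a hook, after locale_initialize().
function demo_location_views_unknown() {
  return demo_t('unknown');
}

$demo_t_calls = 0;
// Including the file triggers no translation work...
var_dump($demo_t_calls);                  // int(0)
// ...the lookup happens only when the string is actually needed.
var_dump(demo_location_views_unknown());  // string(7) "unknown"
var_dump($demo_t_calls);                  // int(1)
```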
Conclusion
- Don’t ever run t() outside of a hook
- Don’t ever run t() on non-static strings (if you have enough of them, this same thing will happen every time a new one appears in the system)
- Watch out for cache_set calls in your application. They can be a real silent killer: everything will work fine, but you are killing yourself performance-wise. I suggest using xdebug and, if nothing else, going into cache_set and adding debug_print_backtrace(); to see everyone who is calling it.
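As a rough illustration of that last tip, the sketch below records who called a cache write and how much data they passed. demo_cache_set() and demo_locale_refresh_cache() are hypothetical stand-ins so this runs without Drupal; on a real site you would temporarily add similar logging inside cache_set itself. It uses debug_backtrace() rather than debug_print_backtrace() so the result can be collected instead of printed into the page.

```php
<?php
// Hypothetical stand-in for Drupal's cache_set(); the real function writes
// to the cache table. Here we only record who called it and with how much data.
function demo_cache_set($cid, $table, $data) {
  global $demo_cache_set_callers;
  $trace = debug_backtrace();
  // $trace[0] describes this call; $trace[1], if present, is the caller.
  $caller = isset($trace[1]['function']) ? $trace[1]['function'] : '(file scope)';
  $demo_cache_set_callers[] = "$caller wrote $cid (" . strlen($data) . " bytes)";
}

// Made-up caller with made-up data, just to produce one log entry.
function demo_locale_refresh_cache() {
  demo_cache_set('locale:fr', 'cache', serialize(array('unknown' => 'inconnu')));
}

$demo_cache_set_callers = array();
demo_locale_refresh_cache();
print_r($demo_cache_set_callers);
```

A log like this makes it easy to spot a caller that shows up on every single request.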