July 3rd, 2008

Helpers_user: The user API that never was

I was writing an out-of-the-box workflow install profile for media organizations today when I ran into the rote task of adding roles and permissions.

(For reference, see my last article on workflow configuration in Drupal.)

So being the programmer who will write something in a day which takes an hour to run to save 5 minutes, I decided to scratch a long standing itch:

The Drupal User System API.

I love the simplicity of the Drupal user system, but like many things in Drupal, it wants you to use forms to administer it, and the DB model is inherently tied to the controller (not good).

So I decided to contribute this to the helpers module. It provides some simple functions like:

helpers_user_help($section)
permissions_get($rid)
permissions_set($rid, $perms = array())
role_add(string $name)
role_get(int $rid = 0, string $name = "")
user_add_roles($uid, $roles = array())
user_delete_roles($uid, $roles = array())
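
To give a feel for how calls like these fit together in an install profile, here is a minimal in-memory sketch in PHP. It is not the helpers module code (which works against Drupal's role and permission tables); the class name `RoleStore`, its method names and the assumption that new rids start at 3 are all hypothetical.

```php
<?php
// Hypothetical in-memory stand-in for a role/permission API like the one listed
// above. NOT the helpers module's code: the real functions read and write Drupal's
// role, permission and users_roles tables. All names here are illustrative.
class RoleStore {
    private array $roles = [];      // rid => role name
    private array $perms = [];      // rid => list of permission strings
    private array $userRoles = [];  // uid => list of rids
    private int $nextRid = 3;       // assumption: rids 1 and 2 are anonymous/authenticated

    // Create a role, or return the existing rid if a role with that name exists.
    public function roleAdd(string $name): int {
        $existing = $this->roleGet(0, $name);
        if ($existing !== null) {
            return $existing;
        }
        $rid = $this->nextRid++;
        $this->roles[$rid] = $name;
        $this->perms[$rid] = [];
        return $rid;
    }

    // Look a role up by rid or by name; returns the rid, or null if not found.
    public function roleGet(int $rid = 0, string $name = ''): ?int {
        if ($rid !== 0 && isset($this->roles[$rid])) {
            return $rid;
        }
        $found = array_search($name, $this->roles, true);
        return $found === false ? null : $found;
    }

    public function permissionsGet(int $rid): array {
        return $this->perms[$rid] ?? [];
    }

    // Replace a role's permissions wholesale (duplicates removed).
    public function permissionsSet(int $rid, array $perms = []): void {
        $this->perms[$rid] = array_values(array_unique($perms));
    }

    public function userAddRoles(int $uid, array $rids = []): void {
        $current = $this->userRoles[$uid] ?? [];
        $this->userRoles[$uid] = array_values(array_unique(array_merge($current, $rids)));
    }

    public function userDeleteRoles(int $uid, array $rids = []): void {
        $current = $this->userRoles[$uid] ?? [];
        $this->userRoles[$uid] = array_values(array_diff($current, $rids));
    }

    public function userRoles(int $uid): array {
        return $this->userRoles[$uid] ?? [];
    }
}

// Usage, roughly as an install profile might script it:
$store = new RoleStore();
$editor = $store->roleAdd('editor');
$store->permissionsSet($editor, ['access content', 'create story content']);
$store->userAddRoles(5, [$editor]);
```

Having roleAdd() return the existing rid when the name is already taken keeps a profile safe to re-run.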

Simple stuff, but AFAIK, lacking in Drupal.

In addition, I wrote a full set of SimpleTest tests for it, which can be found in the tests directory.

The patch can be found at:
http://drupal.org/node/252058
Hope others find this helpful!

-Jacob

July 1st, 2008

Solving bad IA using enterprise search (Reverse Advanced Search)

Since I started working with Apache Solr in Drupal, I’ve realized how much client money has been wasted making ill-advised advanced searches. We’ve all gotten the requests for “advanced” searches, and they make any IA-god-fearing developer cringe. For the 1% of users who use them, you blow tons of budget, and the result is often quite poor because the client doesn’t really know their data or their users that well.

For those of you who are unfamiliar with faceted search, compare the following:

I searched two sites for WSXGA because I’m looking for a laptop with decent resolution:

Laptops Direct

vs.

New Egg


The New Egg search lets me filter: if I know I’m looking for a laptop between $750 and $1000, I get 5 results. After applying that filter, I know what’s available, the number of results per manufacturer, etc.

Contrast that with an advanced search form where I have to put in all my criteria, and hope I get a result. I might also miss certain results if my vocabulary is bad, or I don’t understand that the website says “high resolution” instead of WSXGA, so I don’t select it.
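
The filter-then-count idea behind faceted search can be sketched in a few lines of PHP. This is a toy under stated assumptions: the products and prices are made up, and a real engine like Solr computes facet counts from its index rather than by looping in PHP.

```php
<?php
// Minimal sketch of faceted filtering: narrow a result set to a price range, then
// count what is left per manufacturer (the facet). The products are hypothetical
// sample data, not New Egg's catalogue.
function filter_by_price(array $items, float $min, float $max): array {
    return array_values(array_filter(
        $items,
        fn(array $item): bool => $item['price'] >= $min && $item['price'] <= $max
    ));
}

// Count items per value of $field, most common value first.
function facet_counts(array $items, string $field): array {
    $counts = [];
    foreach ($items as $item) {
        $value = $item[$field];
        $counts[$value] = ($counts[$value] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}

$results = [
    ['name' => 'Laptop A', 'maker' => 'Lenovo', 'price' => 899],
    ['name' => 'Laptop B', 'maker' => 'Dell',   'price' => 1199],
    ['name' => 'Laptop C', 'maker' => 'Lenovo', 'price' => 779],
    ['name' => 'Laptop D', 'maker' => 'HP',     'price' => 999],
    ['name' => 'Laptop E', 'maker' => 'Dell',   'price' => 820],
];

$inRange = filter_by_price($results, 750, 1000); // 4 laptops (B is too expensive)
$byMaker = facet_counts($inRange, 'maker');       // Lenovo => 2, HP => 1, Dell => 1
```

The user never has to guess the right vocabulary up front; the counts tell them what is actually there.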

I think it’s obvious to anyone why faceted search is a good thing. In my next post, I’ll be exploring why it hasn’t gotten widespread adoption, particularly in the small business / NGO sector, and how I plan to help change that.

June 30th, 2008

Similar Nodes Module Released!

I’ve just released a new module which I hope will fill a general need in the Drupal community.

This module allows you to show nodes in a view which have the same taxonomy as a node which you pass in as an argument. BUT it goes a step beyond this. By using a weighting table, one can specify the weight of each vocabulary in making the comparison.

Say you have two vocabularies on a movie site, “Genre” and “Community Tags”, the latter being a free-tagging taxonomy. With this module, you can give Genre a weight of 10 and Community Tags a weight of 1. This means that if I’m looking at “Meet the Parents”, I’m most likely to see a list of similar romantic comedies like “Sleepless in Seattle” and “French Kiss”, with something like “Taxi” or “The Godfather II” coming lower in the list because someone tagged both with “Robert De Niro”.
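
Here is a rough sketch of the weighting idea in plain PHP, assuming a candidate's score is the sum of its vocabulary weights over every term it shares with the source node. The module itself works through Views, so the function names, the tags and the data layout below are hypothetical.

```php
<?php
// Sketch of vocabulary-weighted similarity. Assumption: score = sum, over each
// vocabulary, of (vocabulary weight * number of terms shared with the source node).
// Illustrative only; not the Similar Nodes module's actual query.
function similarity_score(array $source, array $candidate, array $weights): int {
    $score = 0;
    foreach ($weights as $vocab => $weight) {
        $shared = array_intersect($source[$vocab] ?? [], $candidate[$vocab] ?? []);
        $score += $weight * count($shared);
    }
    return $score;
}

// Score every candidate and sort best match first (titles are kept as keys).
function rank_similar(array $source, array $candidates, array $weights): array {
    $scores = [];
    foreach ($candidates as $title => $terms) {
        $scores[$title] = similarity_score($source, $terms, $weights);
    }
    arsort($scores);
    return $scores;
}

$weights = ['Genre' => 10, 'Community Tags' => 1];
$meetTheParents = [
    'Genre'          => ['Romantic Comedy', 'Comedy'],
    'Community Tags' => ['Robert De Niro', 'Ben Stiller'],
];
$candidates = [
    'The Godfather II'     => ['Genre' => ['Crime', 'Drama'], 'Community Tags' => ['Robert De Niro']],
    'Sleepless in Seattle' => ['Genre' => ['Romantic Comedy'], 'Community Tags' => ['Tom Hanks']],
    'French Kiss'          => ['Genre' => ['Romantic Comedy', 'Comedy'], 'Community Tags' => ['Meg Ryan']],
];

$ranked = rank_similar($meetTheParents, $candidates, $weights);
// French Kiss (20), Sleepless in Seattle (10), The Godfather II (1)
```

A single shared free tag is worth only a tenth of a shared genre, so the De Niro match still shows up, just near the bottom.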

See the Module Page for more information and how to use.

I’m going to be cleaning up the code a bit and providing a sample view soon, but any brave souls out there who would like to test / suggest, please go ahead!

June 29th, 2008

the t() that took down a webserver

Hey folks,

I feel compelled to announce this. Please, please, please read this post if you have done any multilingual Drupal development.

Do not use t() outside of a hook or before locale gets to build its cache

I was asked to do some profiling on a colleague’s site, and I found this little doozy in the location_views module:

http://drupal.org/node/253813

The offending line is:

 define('LOCATION_VIEWS_UNKNOWN', t('unknown')); 

Okay, so what is the problem? When Drupal starts up, it runs through many “bootstrapping states”:

Booting Drupal’s locale module

The last one is:



  switch ($phase) {
    // ... earlier phases ...
    case DRUPAL_BOOTSTRAP_FULL:
      require_once './includes/common.inc';
      _drupal_bootstrap_full();
      break;
  }

Now let’s look at the end of _drupal_bootstrap_full():


  // Load all enabled modules
  module_load_all();
  // Initialize the localization system.  Depends on i18n.module being loaded already.
  $locale = locale_initialize();

So the modules are loaded (module_load_all() includes every enabled module’s files) before the locale system is initialized. In location_views, the offending define() is not wrapped in any hook, so it is executed as soon as the file is included, which means t() gets called before locale_initialize().

That last call, $locale = locale_initialize(), sets the global $locale variable to the ISO code of the language the user is viewing the site in. Until it runs, $locale is not set. See what happens when someone calls t() at that point:


function t($string, $args = 0) {
  global $locale;
  if (function_exists('locale') && $locale != 'en') {
    $string = locale($string);
  }
  // ... (placeholder substitution omitted)
}

Okay, so look at the first conditional. Before locale is initialized, $locale is NULL, so $locale != 'en' is always true, and as long as the locale module is enabled, every early t() call goes straight into locale(). So if we dig into locale(), what do we see?



function locale($string) {
  global $locale;
  static $locale_t;

  // Store database cached translations in a static var.
  if (!isset($locale_t)) {
    $cache = cache_get("locale:$locale", 'cache');

    if (!$cache) {
      locale_refresh_cache();
      $cache = cache_get("locale:$locale", 'cache');
    }
    $locale_t = unserialize($cache->data);
  }
  // ... (lookup of $string in $locale_t omitted)
}


So what happens?

We get here because $locale_t isn’t set yet:

$cache = cache_get("locale:$locale", 'cache');

We are trying to get the cache entry “locale:” (with $locale unset, nothing follows the colon). Obviously, this does not exist, so locale() says: okay, let’s refresh the cache.

The locale cache

The locale cache is a massive serialized array of strings and their matches from locales_source and locales_target. Let’s see how it is formed:


function locale_refresh_cache() {
  $languages = locale_supported_languages();

  foreach (array_keys($languages['name']) as $locale) {
    $result = db_query("SELECT s.source, t.translation, t.locale FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' AND LENGTH(s.source) < 75", $locale);
    $t = array();
    while ($data = db_fetch_object($result)) {
      $t[$data->source] = (empty($data->translation) ? TRUE : $data->translation);
    }
    cache_set("locale:$locale", 'cache', serialize($t));
  }
}

OUCH!

This loops through each available language and runs a join selecting every string (shorter than 75 characters) from locales_source and locales_target. It then builds a massive array (the bigger the translated site, the bigger the array), which eats up a huge amount of RAM. For reference, on http://www.amnesty.org we’ve got over 10k translated strings, so we’re talking a few MB.

Then it serializes the whole thing, which uses a good amount of CPU, and then cache_set writes it to the cache table.
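
To see why this repeats on every request rather than once, here is an in-memory PHP simulation of the lookup and refresh shown above. The `sim_*` functions are hypothetical stand-ins, not Drupal’s cache API; the point is that the refresh only ever writes `locale:<code>` keys, so a lookup for `locale:` (the key you get when $locale is unset) can never hit.

```php
<?php
// In-memory simulation (stand-ins, not Drupal's cache.inc or locale.module) of why
// an unset $locale costs a full rebuild on every request.
$cache = [];        // cid => serialized data
$cache_writes = 0;

function sim_cache_get(string $cid): ?string {
    global $cache;
    return $cache[$cid] ?? null;
}

function sim_cache_set(string $cid, string $data): void {
    global $cache, $cache_writes;
    $cache[$cid] = $data;
    $cache_writes++;
}

// Mirrors locale_refresh_cache(): one serialized array per supported language.
function sim_refresh_cache(array $languages, array $strings): void {
    foreach ($languages as $code) {
        sim_cache_set("locale:$code", serialize($strings[$code] ?? []));
    }
}

// Mirrors the first-use branch of locale(): rebuild everything on a cache miss.
function sim_locale_lookup(?string $locale, array $languages, array $strings): void {
    $cid = "locale:$locale";  // NULL interpolates as "", giving "locale:"
    if (sim_cache_get($cid) === null) {
        sim_refresh_cache($languages, $strings);
    }
}

$languages = ['en', 'fr', 'es'];
$strings = [
    'fr' => ['unknown' => 'inconnu'],
    'es' => ['unknown' => 'desconocido'],
];

sim_locale_lookup(null, $languages, $strings);  // request 1: miss, writes 3 entries
sim_locale_lookup(null, $languages, $strings);  // request 2: "locale:" still missing, 3 more
$writes_with_unset_locale = $cache_writes;      // 6

sim_locale_lookup('fr', $languages, $strings);  // "locale:fr" exists: no rebuild
$writes_after_fr = $cache_writes;               // still 6
```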

That’s where it gets really bad. When cache_set writes to the cache table, it first runs
LOCK TABLES cache WRITE;
then UPDATE cache… (or INSERT INTO cache… if the entry doesn’t exist yet), and only unlocks the table afterwards.

So the cache table is effectively frozen for anyone who wants to read from it or write to it. Think about this: if you are inserting a 3MB locale cache array into cache, and that insert takes 200ms, every other user hitting the site who needs to query cache (for variables, for example) has to wait in line.

This creates a cascading effect where other processes, which could have finished quickly, are now holding RAM in Apache while waiting for access to the DB. If your server isn’t fast enough, this basically runs MySQL’s process limit up to its max, and people start being unable to connect, the DB server gets partial inserts, deadlocks, all kinds of ugly stuff.

What’s worse, if t() is used outside of a hook, this fires on EVERY page load. The site I was profiling uses a lot of AJAX, so every AJAX request was running this massive insert as well!
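
A quick back-of-the-envelope calculation shows why this snowballs. The sketch below assumes, hypothetically, 10 requests per second, each holding the cache table lock for 200ms; these numbers are illustrative, not measurements from the site.

```php
<?php
// Back-of-the-envelope model with hypothetical numbers. If each request holds the
// cache table's write lock for $lockSeconds, the lock is busy
// (requests per second * lock time) of every second. At 1.0 or more, waiting
// requests pile up faster than the lock can serve them. This ignores all other
// database work, so reality is worse.
function lock_utilization(float $requestsPerSecond, float $lockSeconds): float {
    return $requestsPerSecond * $lockSeconds;
}

// How many more requests are waiting after each second once the lock is saturated.
function backlog_growth_per_second(float $requestsPerSecond, float $lockSeconds): float {
    $capacity = 1.0 / $lockSeconds;  // locked writes that can complete per second
    return max(0.0, $requestsPerSecond - $capacity);
}

// 10 requests/s (page loads plus AJAX calls), each paying a 200 ms locked write:
$u = lock_utilization(10, 0.2);               // 2.0: the lock would need 2 s per second
$growth = backlog_growth_per_second(10, 0.2); // 5.0: the queue grows by about 5 requests/s
```

Because every AJAX call counts as a request here, a chatty page can push utilization past 1.0 on its own.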

In xdebug, running on my sandbox, I was able to bring the average page load time down from about 3-4 seconds to 300-500ms with the simple patch referenced above.
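
To make the ordering problem concrete, here is a small PHP simulation with stand-ins for the Drupal internals: the stub t() only counts calls made while $locale is still unset. The lazy function `location_views_unknown()` is one hypothetical way to restructure the code, not necessarily what the patch linked above does.

```php
<?php
// Hypothetical simulation with stand-ins for Drupal internals (not Drupal code).
// The stub t() counts calls made while $locale is unset, which in Drupal 5 is
// when locale() misses the cache and calls locale_refresh_cache().
$locale = null;
$cache_rebuilds = 0;

function t(string $string): string {
    global $locale, $cache_rebuilds;
    if ($locale === null) {
        $cache_rebuilds++;  // stand-in for the expensive rebuild
    }
    return $string;         // translation lookup omitted
}

// The problem pattern: t() runs as a side effect of including the module file,
// like define('LOCATION_VIEWS_UNKNOWN', t('unknown')).
function include_module_eagerly(): void {
    $GLOBALS['location_views_unknown'] = t('unknown');
}

// One way to avoid it: call t() only from code that runs inside a hook,
// after the locale has been initialized.
function location_views_unknown(): string {
    return t('unknown');
}

// One request: modules are included first, then the locale is initialized.
function page_load(bool $eager): string {
    global $locale;
    $locale = null;
    if ($eager) {
        include_module_eagerly();     // during module_load_all()
    }
    $locale = 'fr';                   // locale_initialize(); assumed French visitor
    return location_views_unknown();  // later, inside a hook
}

for ($i = 0; $i < 3; $i++) {
    page_load(true);
}
$eager_rebuilds = $cache_rebuilds;    // 3: one rebuild per request

$cache_rebuilds = 0;
for ($i = 0; $i < 3; $i++) {
    page_load(false);
}
$lazy_rebuilds = $cache_rebuilds;     // 0
```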

Conclusion

  • Don’t ever run t() outside of a hook
  • Don’t ever run t() on non-static strings (if you have enough of them, the same thing will happen every time a new one appears in the system)
  • Watch out for cache_set calls in your application. They can be a real silent killer: everything will work fine, but you are killing yourself performance-wise. I suggest using xdebug, and if nothing else, go into cache_set() and add debug_print_backtrace(); to see everyone who is calling it.
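
For the second point, the usual fix is to keep the source string static and pass the variable part as a placeholder (Drupal’s t() supports placeholders such as @name). The sketch below uses a hypothetical stand-in t(), with strtr() for substitution, no escaping, and an array in place of locales_source, to count how many distinct source strings each style registers.

```php
<?php
// Stand-in for t() (not Drupal's): records each distinct source string, the way the
// locale module adds unseen strings to locales_source, and substitutes placeholders
// with strtr(). Drupal's real t() also escapes @ placeholders; that is omitted here.
$seen_sources = [];

function t(string $string, array $args = []): string {
    global $seen_sources;
    $seen_sources[$string] = true;
    return strtr($string, $args);
}

$names = ['alice', 'bob', 'carol'];

// Bad: the source string changes with every value, so each name is a "new" string.
foreach ($names as $name) {
    t('Welcome back, ' . $name);
}
$bad = count($seen_sources);   // 3

$seen_sources = [];
// Good: one static source string; the variable part goes in a placeholder.
foreach ($names as $name) {
    t('Welcome back, @name', ['@name' => $name]);
}
$good = count($seen_sources);  // 1
```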

June 28th, 2008

Media Mover Workflow needs

I’ve been playing around with Media Mover this week for Arthur with the goal of improving the general use cases for the community and giving some code review.

For those who don’t know, here is an excerpt from the project page of media mover:

Media Mover is a set of modules which allows website administrators to easily create complex file conversion processes. The core of Media Mover is the media_mover_api module which creates a set of rules allowing multiple modules to interact with a file. Media Mover can take a file emailed to an email account, turn a file attachment into an FLV file, create a new node with the file data, and then save the file on an external file storage like Amazon’s S3 all at once. And that’s just the start.

Wow! So does MM live up to the hype? Of course. Arthur wrote it. :)

What I’ve been looking into is the integration of Media Mover with workflow-ng and asset. Media Mover’s strengths right now are harvesting from multiple sources like FTP servers, email accounts, local stores, etc., and processing the video / audio / images via ffmpeg.

I’d say its main weakness is really a Drupal weakness: how do files exist in the Drupal node paradigm? Where is their metadata stored?

There are so many different ways people do this, generally with modules like image or video (which treat the file as a node), or with modules like videofield, imagefield, etc. (which keep the file in the files table and reference it from a CCK field).

This kinda works, but asset provides a much more robust media integration framework, wherein an asset has metadata and formatting options, and can be embedded in the text body or attached as a CCK value.

The first goal of the integration is to use the store hook in Media Mover to store the incoming media as an asset. Second, we’re going to build a store function that creates a new node and places the asset in a CCK field for that node.

The ultimate scenario is to create multiple assets for multiple processing instructions, so we have a folder of low res videos and a folder of high res videos, and we are automagically somehow adding the created assets to CCK fields (think I have a store selling video, and I have a “preview” CCK asset field and a “full version” CCK asset field).

There are inherent data model problems here, both with asset and MM, but let’s see what we can do.

More next week when I’ve written the module, but if any Media Mover users are reading this, how is your experience with MM? What do you perceive as its lackings (if any) and what can be done to make it simpler / more intuitive for you?

-J

May 2nd, 2008

Workflow-ng is godly or flagging a dead content

As many people know, I’m really into workflow-ng.

I’ve integrated a few modules with it, and it allows for a killer amount of functionality and good (although probably complex) administrative interfaces for admins to do their own customization.

Simply put: this is my favorite module after CCK, Views and Panels, and it should be yours too.

More after the jump

April 24th, 2008

Advanced Workflow Configuration for Drupal

The workflow trinity: States, Owners and Rules

Amnesty International has 400+ employees in their London office who work in various capacities from research, to advocacy, to marketing and development of the organization. Their web and press divisions (primary admins of the website) need to create stories and press releases with input from all of these employees. As a result, workflow became a very important part of this project.

April 24th, 2008

Amnesty International goes Drupal


Introduction

Amnesty International has been advocating for and protecting human rights and human rights legislation internationally for the past 46 years. Its reputation and the foundation of Amnesty sections in most countries have also made it one of the most recognizable names in the world.

This project (code name IMPACT!) was the result of more than 5 years of attempts to upgrade Amnesty’s web presence and web CRM. Their previous site was based on a very antiquated Lotus Notes backend, a hodgepodge of Dreamweaver templates, and dozens of offshoot microsites.

Here are a few of the notable development efforts that were put forth for the project. Some are in contrib already, and others are on their way.

Workflow

Amnesty International has 400+ employees in their London office who work in various capacities from research, to advocacy, to marketing and development of the organization. Their web and press divisions (primary admins of the website) need to create stories and press releases with input from all of these employees. As a result, workflow became a very important part of this project.

Modules used:

More on our workflow setup

Stay Tuned:

Upcoming articles in this series include:

  • Asset Management using the Asset module
  • Alfresco integration
  • right-to-left Drupal theming
  • i18n + panels and views
  • CiviCRM + i18n
  • menutrails

April 24th, 2008

Looking for an application specialist

Some colleagues and I are starting a new consulting firm specializing in Drupal, with a focus on install profiles, Rapid Application Design and Information Architecture. We’re still exploring different opportunities to find our niche in the Drupal world, but we’ve already got a couple of contracts starting and need some help.

Specifically, I’m looking for people in the New Delhi area (that’s where I am now based), who have proven experience in the following:

  • Application configuration: This means being able to take a messy email from a client and produce easy-to-maintain, well-documented panels, views and content types. Must have experience with Panels 1, should have experience with Panels 2.
  • Light development: Most of us are heavy coders, so we don’t need a lot of heavy lifting. However, you should be able to at least create blocks, do light theming tasks (not necessarily design), and debug stuff yourself.
  • Bonus: Light Linux administration. If you can set up Drupal instances, create databases, etc., it saves us the trouble of doing it. :)

College degrees don’t mean anything. Just need a portfolio of work, some references and an hourly rate / # of hours per week available.

Contact me through:
My Contact Form

April 8th, 2008

Google Search Appliance / Google Mini Integration for Drupal

Google’s enterprise search technology is becoming an increasingly popular choice for IT managers to manage their intranets and pool data from multiple sources.

It provides:

  • Excellent keyword searching (obviously) based on PageRank as well as customizable weighting factors
  • Recommended links
  • Source and Date Biasing
  • Good support for different char sets
  • Incredible speed and performance
  • Support for multiple collections and frontends
  • An easy REST interface for API integration

Leveraging Google’s technology on your CMS (Drupal) gives us the following benefits:

  • Potential for more relevant results
  • Integration with 3rd party databases and sites
  • Advanced search features like synonyms, stemming and language detection

How the module works

See the Google Appliance for Drupal module page for more details on implementation.

The first thing you need to do is set up the module so it knows where to connect:

Google Appliance Settings

At the minimum, you need:

  • Search name (this will appear on a new tab on the search screen).
  • Host name (the URL or IP where your GSA or Mini is located).
  • Collection (which collection you wish to search).
  • Client (this doesn’t matter much; it just has to be valid. It is equivalent to the “frontend” in the GSA).
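
Under the hood, those settings map onto a search request to the appliance. A minimal sketch, assuming the standard GSA search protocol parameters (`q`, `site`, `client`, `output`) and a host name given without a scheme; the host and the `default_collection` / `default_frontend` values are placeholders, and the function name is hypothetical.

```php
<?php
// Sketch of the search URL built from the settings above, assuming the standard
// GSA/Mini search protocol parameters. gsa_search_url() is a hypothetical helper.
function gsa_search_url(string $host, string $query, string $collection, string $client): string {
    $params = [
        'q'      => $query,
        'site'   => $collection,   // the "Collection" setting
        'client' => $client,       // the "Client" (frontend) setting
        'output' => 'xml_no_dtd',  // raw XML results for Drupal to render
    ];
    // Explicit '&' separator so php.ini's arg_separator.output can't change it.
    return 'http://' . $host . '/search?' . http_build_query($params, '', '&');
}

$url = gsa_search_url('gsa.example.org', 'human rights', 'default_collection', 'default_frontend');
// http://gsa.example.org/search?q=human+rights&site=default_collection&client=default_frontend&output=xml_no_dtd
```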

Okay, done? If you want to, enable caching (which will cache results so you don’t need to re-query for the same search within the timeout period) and set the debug level.
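
The caching option amounts to memoizing results per query for a fixed timeout. Here is a rough PHP sketch; the `SearchCache` class, the injected clock and the fake fetcher are hypothetical, and the module presumably stores results in Drupal’s cache rather than in an array.

```php
<?php
// Hypothetical sketch: cache search results per query for $ttl seconds.
// The fetcher and clock are injected so this runs without an appliance.
class SearchCache {
    private array $entries = [];  // query => ['expires' => int, 'data' => mixed]
    private int $ttl;
    private Closure $clock;
    public int $misses = 0;       // how many times the appliance was actually queried

    public function __construct(int $ttl, Closure $clock) {
        $this->ttl = $ttl;
        $this->clock = $clock;
    }

    public function get(string $query, Closure $fetch) {
        $now = ($this->clock)();
        $entry = $this->entries[$query] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            return $entry['data'];  // fresh hit: no request
        }
        $this->misses++;
        $data = $fetch($query);     // stand-in for the HTTP request to the appliance
        $this->entries[$query] = ['expires' => $now + $this->ttl, 'data' => $data];
        return $data;
    }
}

$now = 1000;
$cache = new SearchCache(300, function () use (&$now) { return $now; });
$fetch = fn(string $q): array => ["results for $q"];

$cache->get('amnesty', $fetch);  // miss: queries the appliance
$cache->get('amnesty', $fetch);  // hit: served from the cache
$now += 301;                     // the 300 s timeout passes
$cache->get('amnesty', $fetch);  // expired: queries again
```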

Now you’ll need to tell the Mini where to crawl. For this, just go to your GSA administration screen and punch in the URL of your site. For node pages, the module will add meta tags for the following:

  • Taxonomy (Advanced search filter coming soon!)
  • Date Modified and Created (Date sorting coming soon!)
  • Author
  • Status (pub/unpub)
  • Language (if using i18n)
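
As an illustration of what those tags could look like, here is a sketch that renders them for a node. The meta names (`taxonomy`, `created`, `changed`, `author`, `status`, `language`), the function name and the shape of the `$node` array are assumptions, not the module’s actual output.

```php
<?php
// Sketch of per-node meta tags for the appliance to crawl. The meta names and the
// $node array shape are hypothetical; the module's real tag names may differ.
function node_meta_tags(array $node): string {
    $meta = [
        'taxonomy' => implode(', ', $node['terms']),
        'created'  => gmdate('Y-m-d', $node['created']),
        'changed'  => gmdate('Y-m-d', $node['changed']),
        'author'   => $node['author'],
        'status'   => $node['status'] ? 'published' : 'unpublished',
    ];
    if (!empty($node['language'])) {
        $meta['language'] = $node['language'];  // only present with i18n
    }
    $tags = [];
    foreach ($meta as $name => $content) {
        $tags[] = sprintf(
            '<meta name="%s" content="%s" />',
            htmlspecialchars($name, ENT_QUOTES),
            htmlspecialchars($content, ENT_QUOTES)
        );
    }
    return implode("\n", $tags);
}

$html = node_meta_tags([
    'terms'    => ['Asia', 'Death penalty'],
    'created'  => 1207612800,  // 2008-04-08 00:00 UTC
    'changed'  => 1207612800,
    'author'   => 'jacob',
    'status'   => 1,
    'language' => '',          // no i18n language set, so no language tag
]);
```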

After installing the module, you will see a new tab on the search screen:
Google Tab

Fire off a search and see your results (drupalified).

Google Results

In addition, you can enable the recommended links block, and if you have configured key matches in drupal, they will show up in this block.

This module is still in Beta, so expect some issues. Here are a few I know of:

  • No meta tags on non-node pages, which means they will be found but won’t have the type / author, etc. fields in the results
  • Does not use url() on incoming links, which means if the mini finds node/123 pages, they won’t get aliases

There are probably lots more, but hopefully, this will get people who are interested in this going and we can work on making it better.

I am available for GSA and Mini consulting and custom integrations; just contact me.

How To find me

Telephone: +1 510.277.0891 | Email: jacobsingh at gmail daht calm
