November 6th, 2009

Spam yourself. Spamalot or a spamalittle with DevelMailLog

I was upgrading The Watcher module to Drupal 7 today and found myself having to test a lot of email sending. Looking around in vain for a fake email system to log emails to the disk instead of sending them out into the interwebs to risk getting called the dreaded meat product, I decided to write one using the new pluggable mail system interface in Drupal 7.

Spam!

Similar tools have existed before, but I couldn’t find anything in devel currently. Here’s how it works. If you want to save your mails locally to files:

Step 1
Install Devel
Step 2
Apply this patch (until it gets committed) – Review here: http://drupal.org/node/625062.

cd sites/all/modules/devel
curl http://drupal.org/files/issues/develmail-625062-1.patch | patch -p0

Step 3
In your settings.php file:

$conf['mail_system'] = array('default-system' => 'DevelMailLog');

That’s it!

Unless you set anything else, mails are saved to files/mails/$to-$subject-$datetime.mail.txt

Example
[Screenshots: the “Contact | My Site” page and a bash terminal window]

Bonus
You can change the directory with
variable_set('devel_debug_mail_directory', file_directory_path() . '/mails');

Or the file format
variable_set('devel_debug_mail_file_format', '%to-%subject-%datetime.mail.txt');
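
To make the file format a bit more concrete, here is a rough Python sketch of how a pattern like that could be expanded into a file name. This is not the DevelMailLog code: the helper name `mail_filename`, the clean-up rule, and the timestamp layout are all assumptions made for illustration.

```python
import re
from datetime import datetime


def mail_filename(pattern, to, subject, when):
    """Expand %to, %subject and %datetime in a pattern (hypothetical helper)."""
    values = {
        "to": to,
        "subject": subject,
        # Assumed timestamp layout; the real module may format dates differently.
        "datetime": when.strftime("%Y%m%d-%H%M%S"),
    }

    def clean(text):
        # Replace anything awkward in a file name with an underscore.
        return re.sub(r"[^A-Za-z0-9@._-]+", "_", text).strip("_")

    # Swap each %token for its cleaned value.
    return re.sub(
        r"%(to|subject|datetime)",
        lambda match: clean(values[match.group(1)]),
        pattern,
    )


name = mail_filename(
    "%to-%subject-%datetime.mail.txt",
    "me@example.com",
    "Contact: hello there",
    datetime(2009, 11, 6, 12, 30, 0),
)
print(name)
```

The point of cleaning the values is simply that subjects tend to contain spaces, colons and slashes, which do not make for pleasant file names.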

Till next time spammers…

October 22nd, 2009

What can we do to make Drupal 7 faster?

Drupal 7’s major API code freeze is behind us, so it is time to take stock of the massive API overhauls and the hotly debated new interfaces and how they affect performance. As part of the last sprint at Acquia, I was tasked with comparing the performance of Drupal 6 and Drupal 7 in similar conditions so we know how much work we all have to do before Drupal 7 is ready for release.

So how does D7 match up with D6?

Of men who have a sense of honor, more come through alive than are slain, but from those who flee comes neither glory nor any help.
Homer, The Iliad


The Legend of the Drupal release cycle
In every Drupal development and release cycle, there is a period of rampant innovation by thousands of people distributed throughout the world. They get on IRC, email lists, in the flesh, etc and just bang out a ton of great code. Then the dust settles and the code freezes. Everyone wakes up after 3 days of recovery sleep, finishes their caffeinated beverage of choice, and tries out their new super-duper-fantastic toy to see how it works. It’s full of creaky limbs and flashy lightbulbs attached to misplaced handles. It gets poked and prodded, and then, we take it around the block to see how it runs and what we need to do to harden the security and make it as fast as possible (two areas Drupal has always excelled in). Then we all pitch in to get the bug fixing and performance tuning done.

Drupal 6 was of course a major refinement over Drupal 5 without creating too many waves for developers or user interfaces. It was also a little bit slower. Drupal 7 looks sexy and has much more consistency in its APIs, a kick-ass database abstraction layer, and a powerful ORM in fields. However, as is expected at this pre-release stage, I have found in my testing that Drupal 7 is currently slower, and now is the time to focus on performance.

Disclaimer: These are very preliminary numbers using a new benchmarking setup (which is described below). Neither the methodology nor the reports are perfect, so please do your own benchmarks (I also cover that later).

Summary (Cliff notes)

  • As expected, Drupal 7 is slower as it is pre-release and more feature rich.
  • For anonymous (cached) browsing, D7 is close on /taxonomy/1 and /node/1, but it is much slower on /node
  • Authenticated user browsing is about 2-3x slower
  • User operations (login, user page, logout) are about 3-4x slower

What / how are we testing – Check the lanes

Target machine: IBM T60p ThinkPad (Ubuntu 9.10, 2.16GHz Core Duo + 2GB dual-channel RAM).
Testing machine: MacBook Pro, 2.5GHz, 4GB of RAM

The testing machine ran JMeter, and the two boxes were connected via LAN cables to minimize the network effect.

Testing platform – Set up the pins

Bowling pins

To start with, we need a reproducible environment of fake data to test against. As an attempt to create such a standard, I started the NorthDrop project on Drupal.org. It is an install profile which uses devel_generate (a dummy content generation module) to make fake content depending on settings provided during the install.

I got devel_generate mostly working for D7 and I backported NorthDrop to D6 so we could get two installs with almost identical types and amounts of content.

For the purposes of this test and not having someone call the society for the preservation of old and battered laptops, I put it on the “small setting” which includes:

$sample_sizes['small'] = array();
$sample_sizes['small']['nodes'] = 200;
$sample_sizes['small']['comments'] = 4; // Per node
$sample_sizes['small']['users'] = 50;
$sample_sizes['small']['terms'] = 15;
$sample_sizes['small']['vocabs'] = 3;

Testing format – The approach…

For this basic profiling test, I built three thread groups. A thread group is a set of fake users who will do the same routine (hit a few paths / submit forms) for a certain number of loops. In this case, here are my thread groups:

BowlingThreads

All three thread groups run simultaneously and the results are saved to XML files which are later viewed / processed.
For verification of results, I used the highly scientific eye-ball method of, “eh… that’s pretty close” after running each test 3-4 times.
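
If you want something slightly less eye-ball based, a few lines of Python can boil the saved XML down to per-path numbers. This is only a sketch, not the script I used: it assumes the usual JMeter XML layout where each `httpSample` element carries the label in `lb` and the elapsed time in milliseconds in `t`, and the sample data below is made up.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up results in the shape JMeter writes when saving as XML.
SAMPLE_XML = """<testResults version="1.2">
  <httpSample t="120" lb="/node" s="true"/>
  <httpSample t="180" lb="/node" s="true"/>
  <httpSample t="150" lb="/node" s="true"/>
  <httpSample t="40" lb="/node/1" s="true"/>
  <httpSample t="60" lb="/node/1" s="true"/>
</testResults>"""


def summarize(xml_text):
    """Return {label: (count, mean_ms, median_ms)} for the samples in xml_text."""
    times = defaultdict(list)
    root = ET.fromstring(xml_text)
    for sample in root.iter("httpSample"):
        # "lb" is the sampler label, "t" the elapsed time in milliseconds.
        times[sample.get("lb")].append(int(sample.get("t")))
    return {
        label: (len(values), statistics.mean(values), statistics.median(values))
        for label, values in times.items()
    }


for label, (count, mean_ms, median_ms) in summarize(SAMPLE_XML).items():
    print(f"{label}: n={count} mean={mean_ms:.0f}ms median={median_ms:.0f}ms")
```

Run it against the results of each of the 3-4 runs and you can compare the numbers instead of squinting at them.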

Results – The Scorecard

Although I highlighted the important conclusions in the summary, here are the tables they are derived from:
Benchmark Data for Drupal 6 / Drupal 7

And here is a nice histogram built by this awesome script by the folks at Atlassian (makers of JIRA and Fisheye).

This shows the response times for authenticated users as a percentage of requests.

Drupal 6:

Drupal6-percent-auth

Drupal 7:

Drupal7-percent-stacked

What next? – The long drive home

Profiling

We need to examine the causes in more detail. This type of basic performance testing gives us some clues as to what pages / content cause load. Next we have to open up xcache and start getting into the nitty gritty to identify what needs to change.

Improved tests, isolation tests

I’ve posted the jmx I used for testing to a new github repo for this. If I get time, I will be writing another post to outline how this jmx was built and what it takes to run it (hint: it’s really easy!). It would be great if we built tests which just test one facet at a time, and also if we profiled more write-heavy ops like commenting, content creation, etc.

Automated testing

The framework is in place to automate this, especially in D7 since the northdrop profile can be installed from the CLI. JMeter can take params, and the jmetergraph.pl program can give us a good visual. Everyone’s eventual goal is something like testbot to run after every commit or perhaps on certain patches to give us an indication of what effect a change will have on general performance. We’re just dev hours and a server farm away from getting this set up.

Resource profiling

Looking from the outside, we just get response times. We also need to identify what is making it slow at a system level. Is it MySQL? Is it PHP? RAM or CPU? Are we I/O limited in some operations? Sometimes we can go backwards in performance, but forwards in scalability. This should be accounted for.

Good bye!

bowling-pins
If you want to get the test files used for this report they are here.
A special thanks to greg_harvey and Graham Taylor for sending me a starter jmx they had built previously for functional tests. I hope this post is useful in spurring the discussion of D7 performance, please feel free to leave a comment on pajamadesign.com or on my Acquia blog.

There is still time to fix these performance issues so dive in.

August 6th, 2009

Plugin Manager in Core (part deux)

Sorry, long time no blog.

It’s been a crazy three months working on the Plugin Manager in Core project.

For those not acquainted, the plan is to make a GUI based installer / updater for Drupal modules and themes.

Available updates | dev7

We were almost done, and even had it all accessible

Then, some concerns were raised in the community about security and reliability. If you would like the US Library of Congress ref number for this discussion and the issues about Plugin Manager in D7, please contact me directly, I’ll notify you when they have finished building a computer fast enough to import them into their collection.

At any rate, here is the gist:

I, Adrian Rossouw, and probably some others are working to get something in by September 1st.

I’ve developed a specification, a backlog, and worked with Dries to finalize its acceptance.
Here is the something we are building:
Plugin manager for D7 code freeze spec.

I’ve also started out on a few of the issues, namely adding chmod support to FileTransfers, and moving the security sensitive operations to a separate file.

But there is a lot of other work to do, and we need all the help we can get. So if you’re interested in volunteering, comment here, or the main specification issue, email me, call me, show up at my house, whatever.

Also, come to my session at DrupalCon. I’ll also be trying to organize a BoF to talk about future plans.

Take care!
Jacob

June 25th, 2009

The death of the Drupal programmer

Okay, so that’s going a bit too far. But we’re getting ever closer to the dream: module and theme updates and installs using a GUI in your browser!

Many thanks to cwgordon, Joshua Rogers, dww and especially chx for kicking some serious arse on this issue and getting us very close.


That’s right, in Drupal 7 you will be able to update your modules and themes without learning FTP, SSH or CVS.

Check out my latest screencast

and get involved.

Also PLEASE vote for my session at DrupalCon Paris.

June 10th, 2009

Updating modules and themes in Drupal 7

The problem: Updates in Drupal require FTP / SSH and a bit of know how

When the average Drupal site owner without ssh, cvs and other geek gadgets wants to update modules or themes on their Drupal site, they currently have to do the following:

  1. Go to update status and see the mod is out of date
  2. Take the site offline
  3. Make a backup (if they can)
  4. Know where to find the module on d.o., download the tarball
  5. Unzip the tarball
  6. Remove the current directory
  7. Use FTP to upload the new directory
  8. Run update.php

We’re trying to give users the same friendliness as a package manager like Synaptic, where updates and new installs are just a few clicks, no geek gadget belt required.

I’ve entered the D7 ux fray, specifically focusing my generous amount of Acquia community time on getting a project called the Plugin Manager spruced up and into core.

For more background on the effort, see: Plugin Manager in Core (part 1).

The solution: make Drupal update like everything else.

Mozilla Firefox

Here is the issue:
Plugin Manager Part 2 : The update status UI

I’ve been working out some wireframes of how the process might look, and I wanted to share them with the planet to see what people thought of them. So without further ado:

Check out the clickable wireframes

Round 2

June 9th, 2009

Wake up and smell the coffee (through an HMAC filter)

Hey, stay out of my index!

So when I first joined Acquia, my fledgling Solr hosting service had IP-based security. You, the customer, could tell me what IPs you were going to connect with, and I would allow access to your search index from those IPs.

One of the first major tasks was to implement HMAC-based authentication for the service to guard against man-in-the-middle attacks and provide a way to use the service from any IP. Also, it is standard operating procedure for other Acquia services.
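
For anyone who hasn’t played with HMAC before, here is a minimal Python sketch of the general idea. It is not Acquia’s actual scheme: the choice of SHA-256, the fields that get signed, the five-minute window and the key are all assumptions for illustration.

```python
import hashlib
import hmac
import time


def sign(secret, method, path, timestamp, nonce):
    """Return a hex HMAC over the parts of the request we want to protect."""
    message = "\n".join([method, path, str(timestamp), nonce]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, method, path, timestamp, nonce, signature, now, max_skew=300):
    """Check the signature and reject requests with a stale timestamp."""
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign(secret, method, path, timestamp, nonce)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)


secret = b"shared-secret-for-one-customer"  # made-up key
ts = int(time.time())
sig = sign(secret, "GET", "/solr/select?q=drupal", ts, "abc123")

print(verify(secret, "GET", "/solr/select?q=drupal", ts, "abc123", sig, now=ts))
print(verify(secret, "GET", "/solr/select?q=hacked", ts, "abc123", sig, now=ts))
```

Because the signature depends on a secret only the customer and the service know, it no longer matters which IP the request comes from, and a tampered request simply fails the check.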

Fail first!

In the first iteration, we built something on the load balancers (which run nginx) because it provided a central point of access control, the balancers were under-utilized and we didn’t have to mess with the Solr code.

This worked okay for a while, and was decently fast, but was quite flaky as some stupid developer had the brilliant idea to implement it as python middleware with fcgi (flup). That developer was me.

Don’t fail second!

So to combat the unstable nature of the fcgi protocol, and to make things a little more efficient, I (along with help from Peter Wolanin and Douglas Hubler) rebuilt it in Java using a Servlet Filter. This was a royal pain in the butt, as Java is pretty tricky when it comes to input streams and buffers.

Thankfully the results are worth it:

It’s hard to tell from this graph because of the peak, but the median stayed almost the same (blue line), while the average decreased pretty significantly (purple), as did the 90% line (yellow).

source=solr_nginx_access (eventtype=solr_search_request)| timechart span=2h median(request_time), perc90(request_time), avg(request_time) as avg_request_time - in the past 3 days - ip-10-251-75-227 - Splunk 3.4.8

This graph shows the standard deviation (blue) in addition to the previous numbers and describes more acutely what the previous graph suggests, that is, the previous implementation was not any slower really, but less consistent, causing some of the requests to take much longer than others.
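
If the difference between those lines feels abstract, this little Python sketch shows the same pattern with made-up numbers (these are not the measurements behind the graphs). A couple of slow outliers leave the median alone but drag up the average, the 90% line and the standard deviation.

```python
import statistics

# Made-up request times in seconds, not the real data.
flaky = [0.10, 0.11, 0.10, 0.12, 0.10, 0.11, 0.10, 0.95, 0.10, 1.40]
steady = [0.10, 0.11, 0.10, 0.12, 0.10, 0.11, 0.10, 0.13, 0.10, 0.12]


def describe(times):
    """Return (median, mean, 90th percentile, standard deviation)."""
    ordered = sorted(times)
    # Nearest-rank 90th percentile: the value at position ceil(0.9 * n).
    rank = -(-9 * len(ordered) // 10)
    return (
        statistics.median(ordered),
        statistics.mean(ordered),
        ordered[rank - 1],
        statistics.pstdev(ordered),
    )


for name, times in (("flaky", flaky), ("steady", steady)):
    median, mean, p90, stdev = describe(times)
    print(f"{name}: median={median:.3f} mean={mean:.3f} p90={p90:.3f} stdev={stdev:.3f}")
```

Both lists have the same median, which is why the median alone makes the two implementations look equally fast.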

source=solr_nginx_access (eventtype=solr_search_request)| timechart span=2h stdev(request_time), median(request_time), perc90(request_time), avg(request_time) as avg_request_time - in the past 3 days - ip-10-251-75-227 - Splunk 3.4.8

So there you have it: Acquia Search is both secure and fast, and now 200% more reliably fast :)

March 17th, 2009

Acquia Search is rocking!

Just want to make a quick update and say that at long last my search project is on the internets and getting decent uptake.

Peter Wolanin and I presented at DrupalCon (don’t laugh too hard at me).

I think the reception so far has been great, and the servers have been champs :) We’re getting more and more signups every day.

One really cool one is Bryan from the CMS Report:
http://cmsreport.com/search/apachesolr_search/Drupal%20Search

If any of you out there want to find out how search can change how you build your sites and bring more people together with the pages they need, just click this link now:

http://acquia.com/products-services/acquia-search

A lot of people are excited but worried about trying out the free beta because we haven’t released any pricing, and something this cool will certainly cost more than a few pesos.

Well, fear not Drupalers, we’ve heard your call and are working on releasing some preliminary pricing soon. We wanted to wait until the beta got rolling a bit before we did this, but we want people to know we have a commitment to making this technology available to as wide an audience as possible. Stay tuned to my blog and/or the planet/acquia.com for more updates on this.

In the meantime, sign up for the beta. It really only takes 15 minutes to set up and won’t break your site or lock you in at all. Please give us feedback! We know there are a few usability hiccups still in the signup process, and we’d love your input so we can fix ‘em. Thanks!

February 10th, 2009

The private lives of public IPs and EC2 security groups

As many of you know, I’m working on a Hosted Search product at Acquia. We’re building a pretty cool page where you can get some analytics on your search index and what people are searching for. Here are the deets on Dries’s site (hope he doesn’t mind ;0 )

Path Finder
Uploaded with plasq’s Skitch!

For this, we’re using Splunk, which is a tad more pricey than I’d like, but a really amazing tool. Basically, it is grep + awk + a kilo of coke + a dozen redbulls + a Ferrari Testarossa + the same HGH A-Rod has been chewing. I’ll write more about it at some point, but this screen shot should give you an idea:

Path Finder

Anyway, we use Splunk’s API to grab data into acquia.com and show the page above. The page was taking 10 seconds to load… I was stumped. Splunk seemed so fast, a couple seconds is reasonable for loading a report from millions of records, but 10 seconds was pretty extreme.

Eventually we discovered it was not Splunk at all, but a separate call in our code to a webservice (call it Info Server) in EC2 which was being firewalled by Amazon. This caused the request to sit there for 10 seconds, and then time out.

Here’s how security groups work:

I’ve got 2 servers:
Web Server – Serves static files (:80 and :443) and passes tough stuff to app server
App Server – serves requests back up to web server on :8080

Web Server needs to be able to access App server to push proxied requests through.

In EC2, each server has 1 (or more) security groups. A security group is a list of access rights. These can be by Port & IP Range or they can be references to other groups. (wtf?)

Yeah, so the rule for the web server would probably be something like:
IP:any Port:80
IP:any Port:443
IP:111.111.111.111/24 Port:10000 (maybe some admin port for a certain location to access)

For the App Server, we don’t want 8080 world readable. We also don’t know the IP of the web server because this is elastic baby, servers can’t stand still. That’s why we give group permissions. So it looks like:

Group: Web Server

Which means any server launched on your account with the security group “Web Server” will have total access to any server launched with the security group “App Server”. Got it?

If not, here is an FBI style blackout picture which might make it more clear:

Path Finder

In our case, we had a problem because we were referencing the external IP of our server (Info Server). See, in the depths of the Amazon, each machine has a public IP and a private IP. So when you look for infoserver.acquia.com (made up, btw), it will resolve to 74.x.x.x. When you try to look for ec2-10-45-123-41.compute.aws…. it will resolve to 10.24.134.41, and both point to the same place. The difference, of course, is that the security group settings only apply when you are using the internal IP, even if both servers are inside the cloud.
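
Here is a toy Python model of that behaviour, in case it helps. It is nothing like Amazon’s real implementation: the `Rule` class, the IP addresses and the lookup table are all made up, and the only thing it tries to show is that a group-based rule can match the private address of a server but not the public one.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    port: int
    cidr: Optional[str] = None          # allow by IP range, e.g. "0.0.0.0/0"
    source_group: Optional[str] = None  # or allow by membership in another group


# Hypothetical instances: private IP -> security group name.
PRIVATE_IP_GROUPS = {"10.0.0.5": "Web Server", "10.0.0.9": "App Server"}

# The App Server only accepts :8080 from members of the "Web Server" group.
APP_SERVER_RULES = [Rule(port=8080, source_group="Web Server")]


def allowed(rules, source_ip, port):
    """Return True if any rule admits traffic from source_ip to port."""
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr and ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.cidr):
            return True
        # Assumption taken from the behaviour described above: group rules
        # are matched against the private address only.
        if rule.source_group and PRIVATE_IP_GROUPS.get(source_ip) == rule.source_group:
            return True
    return False


print(allowed(APP_SERVER_RULES, "10.0.0.5", 8080))      # web server via its private IP
print(allowed(APP_SERVER_RULES, "203.0.113.10", 8080))  # same server seen via a public IP
```

In this model the second call is refused, which is the same thing that bit us: the request arrived from an address the group rule knew nothing about, so it sat there until it timed out.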

Hope you’ve been saved some pain.

Please come check out Peter Wolanin and me as we present the future of Drupal search (we hope) at DrupalCon!

January 14th, 2009

Making Module Installation Easy for Acquia Search

Jeff Noyes (our Simplicity Guru), Linea Rowe, Peter Wolanin and I sat down to discuss how the install process for our Hosted Search Service would look (yes, we’re getting close – Private Beta is out in two weeks)! Typically, when you have a faceted search engine, there is a set of filters on the left and search results on the right, with the sorting links generally horizontally aligned somewhere near the search box.

Here are a few examples from around the web:

Newegg.com - 15
stuff : Clearance : Target Search Results
pancakes, Books, DVDs Movies items on eBay.com

And here is our current implementation:

Search | Dries Buytaert

Here is the same shot, but broken down in “drupalish”:

Search | Dries Buytaert

I think it works okay, but we’re concerned that when people enable the module, they will have a hard time getting this together. Here is a series of screen shots of a user enabling and setting up the module:

Modules | ad

This part is simple (if you use Acquia’s hosted search), you just enable one module and you are done configuring the connection to Solr.

However, your standard search ends up looking like this:

Search | ad

To get all the nice sorting and facet filters, you need to know (somehow) to go to admin/build/blocks and drag the ApacheSolr: blocks into regions like this:

Blocks | ad

So what do people think? Should we just enable a few blocks “out of the box” and hope you are using garland and have a region named “left” or “left-sidebar”? If so, which blocks? Alternately, how can we provide a good workflow for people to know they need to do that extra step to set up their search? The other option Jeff suggested (which is most usable) is to have one block, where you can select what filters you want in it. The downside is that the user loses flexibility about where to put filters (maybe they want sorting on the right, etc).

I’d like to get some feedback on:

A). How to make this process so simple that it really is just checking that one box on the modules page and letting cron run and it looks great for 90% of users.

B). What the default blocks to enable are, and where they should be on the screen

C). How do we address this problem of multi-step installs which want to setup blocks in a more usable way for newbies?

See ya!
jacob

November 28th, 2008

What could search look like on d.o. and g.d.o

Robert Douglas, Peter Wolanin and I are scheming up what we hope to be a jaw-dropping presentation of ApacheSolr + Drupal integration at DrupalCon DC. We’re going to show a prototype of d.o. and g.d.o. hooked up to the Apache Solr search server. We all know that d.o. and g.d.o. are notoriously hard to search through.

For instance, take this query:

http://drupal.org/search/node/views (searching for views).

Search | drupal.org


How To find me

Telephone: +1 510.277.0891 | Email: jacobsingh at gmail daht calm
