<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grabbing Hanuman&#039;s Long Tail - Jacob Singh online &#187; Solr</title>
	<atom:link href="http://pajamadesign.com/category/solr/feed/" rel="self" type="application/rss+xml" />
	<link>http://pajamadesign.com</link>
	<description>Just because something doesn&#039;t do what you planned it to do doesn&#039;t mean it&#039;s useless.   - Thomas A. Edison</description>
	<lastBuildDate>Fri, 06 Nov 2009 08:34:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Wake up and smell the coffee (through an HMAC filter)</title>
		<link>http://pajamadesign.com/2009/06/09/wake-up-and-smell-the-coffee-through-an-hmac-filter/</link>
		<comments>http://pajamadesign.com/2009/06/09/wake-up-and-smell-the-coffee-through-an-hmac-filter/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 07:47:44 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[acquia]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[splunk]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=58</guid>
		<description><![CDATA[ Hey, stay out of my index! 
So when I first joined Acquia, my fledgling Solr hosting service had IP based security.  You, the customer could tell me what IPs you were going to connect with, and I would allow access to your search index from those IPs.
One of the first major tasks was [...]]]></description>
			<content:encoded><![CDATA[<h3> Hey, stay out of my index! </h3>
<p>So when I first joined Acquia, my fledgling Solr hosting service had IP based security.  You, the customer could tell me what IPs you were going to connect with, and I would allow access to your search index from those IPs.</p>
<p>One of the first major tasks was to implement HMAC-based authentication for the service to guard against man-in-the-middle attacks and provide a way to connect from any IP.  Also, it is standard operating procedure for other Acquia services.</p>
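<p>For anyone who hasn&#8217;t met HMAC before, here is a rough Python sketch of the idea: both sides share a secret, the client signs the request, and the server recomputes the signature and compares. This is not the actual Acquia scheme. The secret, the message layout, the timestamp window and the choice of SHA-256 are all assumptions made for illustration.</p>

```python
import hashlib
import hmac
import time

# Hypothetical shared secret; in practice each customer would have their own.
SECRET_KEY = b"example-shared-secret"


def sign_request(secret, method, path, body, timestamp):
    """Return a hex HMAC-SHA256 over the parts of the request we want to protect."""
    message = b"\n".join(
        [method.encode(), path.encode(), str(timestamp).encode(), body]
    )
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature,
                   now=None, max_skew=300):
    """Check the signature and reject requests whose timestamp is too far off."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(secret, method, path, body, timestamp)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

<p>Because the signature covers the body and a timestamp, a request that is altered in transit or replayed much later fails verification, and none of it depends on the client&#8217;s IP.</p>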
<h3> Fail first! </h3>
<p>In the first iteration, we built something on the load balancers (which run nginx) because it provided a central point of access control, the balancers were under-utilized and we didn&#8217;t have to mess with the Solr code.</p>
<p>This worked okay for a while, and was decently fast, but was quite flaky as some stupid developer had the brilliant idea to implement it as Python middleware with fcgi (flup). That developer was me.</p>
<h3> Don&#8217;t fail second!</h3>
<p>So to combat the unstable nature of the fcgi protocol, and to make things a little more efficient, I (along with help from Peter Wolanin and Douglas Hubler) rebuilt it in Java using a Servlet Filter.  This was a royal pain in the butt, as Java is pretty tricky when it comes to input streams and buffers.</p>
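<p>The stream problem is easy to show: to check a signature over the request body, the filter has to read the body, and once it has been read the application behind the filter gets an empty stream. Here is a sketch of the workaround, written as Python WSGI middleware rather than the Java filter we actually shipped. The <code>X-Signature</code> header name and the body-only signature are assumptions for the example; in a servlet filter the equivalent trick would be wrapping the request, for instance with <code>HttpServletRequestWrapper</code>.</p>

```python
import hashlib
import hmac
import io


class HmacFilter:
    """WSGI middleware sketch: verify a body HMAC, then hand the body on intact."""

    def __init__(self, app, secret):
        self.app = app
        self.secret = secret

    def __call__(self, environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Reading the stream consumes it, so keep the bytes in a buffer.
        body = environ["wsgi.input"].read(length)
        expected = hmac.new(self.secret, body, hashlib.sha256).hexdigest()
        # X-Signature is a made-up header name for this example.
        supplied = environ.get("HTTP_X_SIGNATURE", "")
        if not hmac.compare_digest(expected.encode(), supplied.encode()):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"bad signature\n"]
        # Replace the consumed stream so the wrapped app can read the body again.
        environ["wsgi.input"] = io.BytesIO(body)
        return self.app(environ, start_response)
```

<p>Without the line that swaps in a fresh <code>BytesIO</code>, every authenticated request would reach the application with no body at all.</p>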
<p>Thankfully the results are worth it:</p>
<p>It&#8217;s hard to tell from this graph because of the peak, but the median (blue line) stayed almost the same, while the average (purple) decreased pretty significantly, as did the 90% line (yellow). Click the image to see it larger.</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/b1mx4/source-solr-nginx-access-eventtype-solr-search-request-timechart-span-2h-median-request-time-perc90-request-time-avg-request-time-as-avg-request-time-in-the-past-3-days-ip-10-251-75-227-splunk-3.4.8"><img src="http://img.skitch.com/20090609-xeh6qw1t43hdk8kc162b5fjr7m.preview.jpg" alt="source=solr_nginx_access (eventtype=solr_search_request)| timechart span=2h median(request_time), perc90(request_time), avg(request_time) as avg_request_time - in the past 3 days - ip-10-251-75-227 - Splunk 3.4.8" /></a></div>
<p>This graph adds the standard deviation (blue) to the previous numbers and shows more clearly what the previous graph suggests: the previous implementation was not really any slower, but it was less consistent, causing some of the requests to take much longer than others.</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/b1m17/source-solr-nginx-access-eventtype-solr-search-request-timechart-span-2h-stdev-request-time-median-request-time-perc90-request-time-avg-request-time-as-avg-request-time-in-the-past-3-days-ip-10-251-75-227-splunk-3.4.8"><img src="http://img.skitch.com/20090609-jd61y5gjpew3m5159ypkcyeth5.preview.jpg" alt="source=solr_nginx_access (eventtype=solr_search_request)| timechart span=2h stdev(request_time), median(request_time), perc90(request_time), avg(request_time) as avg_request_time - in the past 3 days - ip-10-251-75-227 - Splunk 3.4.8" /></a></div>
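<p>If it isn&#8217;t obvious why the median can sit still while the average, the 90% line and the standard deviation all move, here is a small Python sketch with made-up request times. The numbers are invented, and the nearest-rank percentile is my own simplification; Splunk&#8217;s <code>perc90</code> may be computed differently.</p>

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(request_times):
    """Return the four statistics charted above for a list of request times in seconds."""
    return {
        "median": statistics.median(request_times),
        "avg": statistics.mean(request_times),
        "perc90": percentile(request_times, 90),
        "stdev": statistics.stdev(request_times),
    }


# Made-up samples: the typical request is the same, but one set has slow outliers.
steady = [0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.11]
flaky = [0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.10, 0.11, 2.50, 3.00]

for name, samples in (("steady", steady), ("flaky", flaky)):
    print(name, summarize(samples))
```

<p>Both lists have the same median, but the two slow requests in the flaky list drag the average, the 90th percentile and the standard deviation upward, which is the pattern in the graphs.</p>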
<p>So there you have it: Acquia Search is both secure and fast, and now 200% more reliably fast <img src='http://pajamadesign.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2009/06/09/wake-up-and-smell-the-coffee-through-an-hmac-filter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Acquia Search is rocking!</title>
		<link>http://pajamadesign.com/2009/03/17/acquia-search-is-rocking/</link>
		<comments>http://pajamadesign.com/2009/03/17/acquia-search-is-rocking/#comments</comments>
		<pubDate>Tue, 17 Mar 2009 08:42:04 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[acquia]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=54</guid>
		<description><![CDATA[Just want to make a quick update and say that at long last my search project is on the internets and getting decent uptake.
Peter Wolanin and I presented at DrupalCon (don&#8217;t laugh too hard at me).
I think the reception so far has been great, and the servers have been champs    We&#8217;re getting [...]]]></description>
			<content:encoded><![CDATA[<p>Just want to make a quick update and say that at long last my <a href="http://acquia.com/products-services/acquia-search">search project</a> is on the internets and getting decent uptake.</p>
<p>Peter Wolanin and I <a href="http://www.archive.org/details/DrupalconDc2009-MoreThanSearchHowApachesolrChangesTheWayYouBuild">presented at DrupalCon</a> (don&#8217;t laugh too hard at me).</p>
<p>I think the reception so far has been great, and the servers have been champs <img src='http://pajamadesign.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   We&#8217;re getting more and more signups every day.</p>
<p>One really cool one is Bryan from the CMS Report:<br />
<a href="http://cmsreport.com/search/apachesolr_search/Drupal%20Search">http://cmsreport.com/search/apachesolr_search/Drupal%20Search</a></p>
<p>If any of you out there want to find out how search can change how you build your sites and bring more people together with the pages they need, just click this link now:</p>
<p><a href="http://acquia.com/products-services/acquia-search">http://acquia.com/products-services/acquia-search</a></p>
<p>A lot of people are excited but worried about trying out the <a href="http://acquia.com/products-services/acquia-search">free beta</a> because we haven&#8217;t released any pricing, and something this cool will certainly cost more than a few pesos. </p>
<p>Well, fear not Drupalers, we&#8217;ve heard your call and are working on releasing some preliminary pricing soon.  We wanted to wait until the beta got rolling a bit before we did this, but we want people to know we have a commitment to making this technology available to as wide an audience as possible.  Stay tuned to my blog and/or the planet/acquia.com for more updates on this.  </p>
<p>In the meantime, sign up for the beta; it really only takes 15 minutes to set up and won&#8217;t break your site or lock you in at all.  Please give us feedback! We know there are a few usability hiccups still in the signup process, and we&#8217;d love your input so we can fix &#8216;em.  Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2009/03/17/acquia-search-is-rocking/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The private lives of public IPs and EC2 security groups</title>
		<link>http://pajamadesign.com/2009/02/10/the-private-lives-of-public-ips-and-ec2-security-groups/</link>
		<comments>http://pajamadesign.com/2009/02/10/the-private-lives-of-public-ips-and-ec2-security-groups/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 06:22:43 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[acquia]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[splunk]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=53</guid>
		<description><![CDATA[As many of you know, I&#8217;m working on a  Hosted Search product at Acquia.  We&#8217;re building a pretty cool page where you can get some analytics on your search index and what people are searching for.  Here are the deets on Dries&#8217;s site (hope he doesn&#8217;t mind ;0 )
Uploaded with plasq&#8217;s Skitch!
For [...]]]></description>
			<content:encoded><![CDATA[<p>As many of you know, I&#8217;m working on a <a href="http://www.lullabot.com/audiocast/podcast-69-solr-robert-douglass-jacob-singh"> Hosted Search product at </a><a href="http://acquia.com">Acquia.</a>  We&#8217;re building a pretty cool page where you can get some analytics on your search index and what people are searching for.  Here are the deets on Dries&#8217;s site (hope he doesn&#8217;t mind ;0 )</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/brr6c/path-finder"><img src="http://img.skitch.com/20090210-thaj9gubswr9md7bdmr9dha6f2.preview.jpg" alt="Path Finder" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>For this, we&#8217;re using <a href="http://splunk.com"> Splunk </a> which is a tad more pricey than I&#8217;d like, but a really amazing tool.  Basically, it is grep + awk + a kilo of coke + a dozen redbulls + a Ferrari Testarossa + the same HGH A-Rod has been chewing.  I&#8217;ll write more about it at some point, but this screen shot should give you an idea:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/brr6j/path-finder"><img src="http://img.skitch.com/20090210-xqgddf3f27n1mm8kebq627qg5n.preview.jpg" alt="Path Finder" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>Anyway, we use Splunk&#8217;s API to grab data into acquia.com and show the page above.  The page was taking 10 seconds to load&#8230; I was stumped.  Splunk seemed so fast, a couple seconds is reasonable for loading a report from millions of records, but 10 seconds was pretty extreme.</p>
<p>Eventually we discovered it was not Splunk at all, but a separate call in our code to a webservice (call it Info Server) in EC2 which was being firewalled by Amazon.  This caused the request to sit there for 10 seconds, and then time out.</p>
<p>Here&#8217;s how security groups work:</p>
<p>I&#8217;ve got 2 servers:<br />
Web Server &#8211; Serves static files (:80 and :443) and passes tough stuff to app server<br />
App Server &#8211; serves requests back up to web server on :8080</p>
<p>Web Server needs to be able to access App server to push proxied requests through.</p>
<p>In EC2, each server has one (or more) security groups.  A security group is a list of access rights.  These can be by Port &#038; IP Range or they can be references to other groups. (wtf?)</p>
<p>Yeah, so the rule for the web server would probably be something like:<br />
IP:any Port:80<br />
IP:any Port:443<br />
IP:111.111.111.111/24 Port:10000 (maybe some admin port for a certain location to access)</p>
<p>For the App Server, we don&#8217;t want 8080 world readable.  We also don&#8217;t know the IP of the web server because this is elastic baby, servers can&#8217;t stand still.  That&#8217;s why we give group permissions.  So it looks like:</p>
<p>Group: Web Server</p>
<p>Which means any server launched on your account with the security group &#8220;Web Server&#8221; will have total access to any server launched with the security group &#8220;App Server&#8221;.  Got it?</p>
<p>If not, here is an FBI style blackout picture which might make it more clear:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/brr61/path-finder"><img src="http://img.skitch.com/20090210-x589gk7mfbia3hm1k6dnqmjyj1.preview.jpg" alt="Path Finder" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>In our case, we had a problem because we were referencing the external IP of our server (Info Server).   See, in the depths of the Amazon, each machine has a <em>public IP</em> and a <em>private IP</em>.  So when you look for infoserver.acquia.com (made up, btw) it will resolve to 74.x.x.x. When you try to look for ec2-10-45-123-41.compute.aws&#8230;. it will resolve to 10.24.134.41, and both point to the same place.  The difference, of course, is that the group permissions only apply when you are using the internal IP, <strong> even if both servers are inside the cloud</strong>.</p>
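<p>To make the gotcha concrete, here is a toy model of the rules in Python. It is not the EC2 API: the <code>Rule</code> and <code>Instance</code> classes, the <code>allowed</code> function and all the addresses are invented for illustration. It also assumes that a connection made to a server&#8217;s public address shows up at the other end with the caller&#8217;s public address as its source, so the group lookup finds no match.</p>

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    port: int
    cidr: str = None          # allow by source IP range, e.g. "0.0.0.0/0"
    source_group: str = None  # or allow by membership in another group


@dataclass
class Instance:
    name: str
    groups: set
    private_ip: str
    public_ip: str


def allowed(rules, port, source_ip, instances):
    """Return True if any rule admits traffic arriving from source_ip on port."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr and addr in ipaddress.ip_network(rule.cidr, strict=False):
            return True
        if rule.source_group:
            # Assumption of this model: group membership is looked up by
            # private address only, so a public source IP never matches.
            for inst in instances:
                if inst.private_ip == source_ip and rule.source_group in inst.groups:
                    return True
    return False


# Made-up addresses for the two-server setup described above.
web = Instance("web", {"Web Server"}, "10.0.0.5", "74.0.0.5")
WEB_RULES = [
    Rule(80, cidr="0.0.0.0/0"),
    Rule(443, cidr="0.0.0.0/0"),
    Rule(10000, cidr="111.111.111.111/24"),
]
APP_RULES = [Rule(8080, source_group="Web Server")]

print(allowed(APP_RULES, 8080, web.private_ip, [web]))  # via the private IP
print(allowed(APP_RULES, 8080, web.public_ip, [web]))   # via the public IP
```

<p>In this model the first check passes and the second fails, which matches what bit us: same two machines, same cloud, but only the private address is recognised as belonging to the group.</p>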
<p>Hope you&#8217;ve been saved some pain.</p>
<p><a href="http://dc2009.drupalcon.org/session/more-search-how-apachesolr-changes-way-you-build-sites">Please come check out Peter Wolanin and me as we present the future of Drupal search (we hope) at DrupalCon!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2009/02/10/the-private-lives-of-public-ips-and-ec2-security-groups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Module Installation Easy for Acquia Search</title>
		<link>http://pajamadesign.com/2009/01/14/acquia-search-installation/</link>
		<comments>http://pajamadesign.com/2009/01/14/acquia-search-installation/#comments</comments>
		<pubDate>Tue, 13 Jan 2009 22:55:10 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[acquia]]></category>
		<category><![CDATA[acquia drupal planet]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=52</guid>
		<description><![CDATA[Jeff Noyes (our Simplicity Guru), Linea Rowe, Peter Wolanin and I sat down to discuss how the install process for our Hosted Search Service would look (yes, we&#8217;re getting close &#8211; Private Beta is out in two weeks)!  Typically, when you have a faceted search engine, there is a set of filters on the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://acquia.com/about-us/team">Jeff Noyes (our Simplicity Guru), Linea Rowe, Peter Wolanin and I</a> sat down to discuss how the install process for our <a href="http://acquia.com/blog/hosted-solr-site-search-for-drupal-is-on-the-way">Hosted Search Service</a> would look (yes, we&#8217;re getting close &#8211; Private Beta is out in two weeks)!  Typically, when you have a faceted search engine, there is a set of filters on the left and search results on the right, with the sorting links generally horizontally aligned somewhere near the search box.</p>
<p>Here are a few examples from around the web:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdbp/newegg.com-15-15.6-laptops-notebooks-laptops-notebooks-netbooks"><img src="http://img.skitch.com/20090113-8yw2urj2iwmqj28rkip3atu32i.preview.jpg" alt="Newegg.com - 15" - 15.6", Laptops / Notebooks, Laptops, Notebooks &#038; Netbooks" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdb4/stuff-clearance-target-search-results"><img src="http://img.skitch.com/20090113-8nxb298qfrx7ua1sycgceb5uwf.preview.jpg" alt="stuff : Clearance : Target Search Results" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdb6/pancakes-books-dvds-movies-items-on-ebay.com"><img src="http://img.skitch.com/20090113-f72wr8258pbs4nr4ri646sy7er.preview.jpg" alt="pancakes, Books, DVDs Movies items on eBay.com" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>And here is our current implementation:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdnb/search-dries-buytaert"><img src="http://img.skitch.com/20090113-c1qp2gbj6sn63sn46dj1mtq1gx.preview.jpg" alt="Search | Dries Buytaert" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>Here is the same shot, but broken down in &#8220;drupalish&#8221;</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdn1/search-dries-buytaert"><img src="http://img.skitch.com/20090113-pbudf8gqx59heje6bpcmkf97j6.preview.jpg" alt="Search | Dries Buytaert" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>I think it works okay, but we&#8217;re concerned that when people enable the module, they will have a hard time getting this together.  Here is a series of screen shots of a user enabling and setting up the module:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbddr/modules-ad"><img src="http://img.skitch.com/20090113-11f8h5i8tn89bbqu27f51ektjy.preview.jpg" alt="Modules | ad" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>This part is simple (if you use Acquia&#8217;s hosted search), you just enable one module and you are done configuring the connection to Solr.  </p>
<p>However, your standard search ends up looking like this:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbdda/search-ad"><img src="http://img.skitch.com/20090113-q2tnn6tgy466c4rp5yc1w8aj43.preview.jpg" alt="Search | ad" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>To get all the nice sorting and facet filters, you need to know (somehow) to go to admin/build/blocks and drag the ApacheSolr: blocks into regions like this:</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/bbddc/blocks-ad"><img src="http://img.skitch.com/20090113-na4yb27rsgb5m184kdmrhrmras.preview.jpg" alt="Blocks | ad" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>So what do people think?  Should we just enable a few blocks &#8220;out of the box&#8221; and hope you are using garland and have a region named &#8220;left&#8221; or &#8220;left-sidebar&#8221;?  If so, which blocks?  Alternately, how can we provide a good workflow so people know they need to do that extra step to set up their search?  The other option Jeff suggested (which is most usable) is to have one block, where you can select what filters you want in it.  The downside is that the user loses flexibility about where to put filters (maybe they want sorting on the right, etc).</p>
<p>I&#8217;d like to get some feedback on:</p>
<p>A). How to make this process so simple that it really is just checking that one box on the modules page and letting cron run and it looks great for 90% of users.</p>
<p>B). What the default blocks to enable are, and where they should be on the screen.</p>
<p>C).  How do we address this problem of multi-step installs which want to set up blocks in a more usable way for newbies?</p>
<p>See ya!<br />
jacob</p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2009/01/14/acquia-search-installation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What could search look like on d.o. and g.d.o</title>
		<link>http://pajamadesign.com/2008/11/28/what-could-search-look-like-on-do-and-gdo/</link>
		<comments>http://pajamadesign.com/2008/11/28/what-could-search-look-like-on-do-and-gdo/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 04:34:29 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[acquia]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=50</guid>
		<description><![CDATA[Robert Douglass, Peter Wolanin and I are scheming up what we hope to be a jaw dropping presentation of ApacheSolr + Drupal integration at DrupalCon DC.  We&#8217;re going to show a prototype of d.o. and g.d.o hooked up to the Apache Solr search server.   We all know that d.o. and g.d.o. are notoriously [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drupal.org/user/5449">Robert Douglass</a>, <a href="http://drupal.org/user/49851">Peter Wolanin</a> and I are scheming up what we hope to be a <a href="http://dc2009.drupalcon.org/session/more-search-how-apachesolr-changes-way-you-build-sites">jaw dropping presentation</a> of ApacheSolr + Drupal integration at DrupalCon DC.  We&#8217;re going to show a prototype of d.o. and g.d.o hooked up to the <a href="http://lucene.apache.org/solr/">Apache Solr</a> search server.   We all know that d.o. and g.d.o. are notoriously hard to search through.  </p>
<p>For instance, take this query:</p>
<p><a href="http://drupal.org/search/node/views">http://drupal.org/search/node/views</a> (searching for views).</p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/h75y/search-drupal.org"><img src="http://img.skitch.com/20081128-m1ha4mdsrbje2t4pth5scypr7i.preview.jpg" alt="Search | drupal.org" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p><span id="more-50"></span><br />
Umm&#8230; I would expect to be able to get to http://drupal.org/project/views here, but somehow this isn&#8217;t happening.  Okay, so I know I can always use the hidden advanced search and check off &#8220;project&#8221;.  Okay, now I&#8217;ve submitted:</p>
<p><a href="http://drupal.org/search/node/views+type%3Aproject_project">http://drupal.org/search/node/views+type%3Aproject_project</a></p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/h75b/search-drupal.org"><img src="http://img.skitch.com/20081128-t12cb73nei7y2petswcwuiw9kr.preview.jpg" alt="Search | drupal.org" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>Okay, so it is understandable that d.o. is a massive organism unto itself, and it will be challenging to get any search engine (and the Drupal core one isn&#8217;t shabby at all) to be able to think for users.  However, we&#8217;ll be proposing a new way to do a user interface for your site with faceted search (here&#8217;s an example):</p>
<p><a href="http://robshouse.net/search/apachesolr_search/drupal">http://robshouse.net/search/apachesolr_search/drupal</a></p>
<div class="thumbnail"><a href="http://skitch.com/jacobsingh/h75k/search-robshouse.net"><img src="http://img.skitch.com/20081128-xfgn5ku3raqdi52pp8adj68qtk.preview.jpg" alt="Search | RobsHouse.net" /></a><br /><span style="font-family: Lucida Grande, Trebuchet, sans-serif, Helvetica, Arial; font-size: 10px; color: #808080">Uploaded with <a href="http://plasq.com/">plasq</a>&#8217;s <a href="http://skitch.com">Skitch</a>!</span></div>
<p>Facets allow us to see what we will get before we click, so we can drill down into what we are looking for AND know there is something there before we go there.  So we will have a facet block for node type, which will show something like Projects(12), so the user knows there are 12 projects which were found for the keyword views.  After that, perhaps there will be another facet for Usage or download count which will allow them to further drill down to what they are after.</p>
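<p>Stripped down to its bones, a facet is just a count of results per field value, plus a filter for when the user clicks one. Here is a tiny Python sketch of that idea. The hits and the <code>facet_counts</code> and <code>drill_down</code> helpers are invented for illustration; Solr does this counting for you on the server when you ask for it with <code>facet=true</code> and <code>facet.field</code>.</p>

```python
from collections import Counter

# Hypothetical search hits; a real Solr response would carry many more fields.
RESULTS = [
    {"title": "Views", "type": "project"},
    {"title": "Views Bulk Operations", "type": "project"},
    {"title": "How do I theme a view?", "type": "forum"},
    {"title": "Views 2 documentation", "type": "book"},
    {"title": "Views Slideshow", "type": "project"},
]


def facet_counts(results, field):
    """Count how many hits carry each value of field, most common first."""
    return Counter(hit[field] for hit in results).most_common()


def drill_down(results, field, value):
    """Keep only the hits matching the facet value the user clicked."""
    return [hit for hit in results if hit[field] == value]


# Render the facet block the way the text describes it, e.g. "project (3)".
for value, count in facet_counts(RESULTS, "type"):
    print(f"{value} ({count})")
```

<p>The count shown next to each facet value is exactly the number of results the user will get after clicking it, so there are no dead ends.</p>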
<p>I hope this presentation will launch a discussion around search based information architecture in our community, and I sincerely hope you all attend.</p>
<p> Please stop for a moment to read the proposal (and vote on it if you think it will be as awesome as we do):</p>
<p><a href="http://dc2009.drupalcon.org/session/more-search-how-apachesolr-changes-way-you-build-sites">http://dc2009.drupalcon.org/session/more-search-how-apachesolr-changes-way-you-build-sites</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2008/11/28/what-could-search-look-like-on-do-and-gdo/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>250k nodes working to save our habitat</title>
		<link>http://pajamadesign.com/2008/08/12/250k-nodes-working-to-save-our-habitat/</link>
		<comments>http://pajamadesign.com/2008/08/12/250k-nodes-working-to-save-our-habitat/#comments</comments>
		<pubDate>Tue, 12 Aug 2008 17:45:44 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[Environmentalism]]></category>
		<category><![CDATA[India]]></category>
		<category><![CDATA[Srijan]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=36</guid>
		<description><![CDATA[I had the privilege of working with <a href="http://srijan.in">Srijan Technologies</a> this spring on Drupal and Agile Development trainings for their team and helping them get <strong>Apache Solr</strong> kicking for the <a href="http://www.indiaenvironmentportal.org.in"> India Environmental Portal </a> which just launched last week to much <a href="#press">fanfare</a>.

]]></description>
			<content:encoded><![CDATA[<p>I had the privilege of working with <a href="http://srijan.in">Srijan Technologies</a> this spring on Drupal and Agile Development trainings for their team and helping them get <strong>Apache Solr</strong> kicking for the <a href="http://www.indiaenvironmentportal.org.in"> India Environmental Portal </a> which just launched last week to much <a href="#press">fanfare</a>.</p>
<h3>The site is based on Drupal 5 and features:</h3>
<ul>
<li> Over 250,000 consciousness&#8211;changing-content-laden nodes </li>
<li> 7,500+ terms that would give any librarian a serious biblio-complex</li>
<li> Feeds from hundreds of in-house and out-of-house publications </li>
<li> A killer example of Apache Solr being amazing: <a href="http://www.indiaenvironmentportal.org.in/search/apachesolr_search/water%20tid%3A1639"> Apache Solr search for &#8220;water&#8221; </a>
</li>
<li> And probably a lot more.  You should read more about it at Srijan&#8217;s site: <a href="http://blogs.srijan.in/2008/08/10/a-year-to-live-a-powerful-way-of-living-your-life/">http://blogs.srijan.in/2008/08/10/a-year-to-live-a-powerful-way-of-living-your-life/</a> (A full fledged case study is in the offing) </li>
</ul>
<h3> What is an environmental portal? </h3>
<p>India Environmental Portal is an initiative of the <a href="http://www.cseindia.org/">Centre for Science and Environment</a>, one of India&#8217;s oldest and most revered environmental NGOs.  Here is an excerpt from their about page:</p>
<blockquote><p>
This is the age of environment. And to make a difference, in our lifestyle, in policy and in practice we need information, which is accessible, well categorized and easy to use. The India Environment Portal is our effort to put together a one-stop shop of all that you want to know about environment and development issues. Its politics is overt: to build open, networked and informed societies, who can use knowledge to make change&#8230;..</p>
<p>This is a people’s portal. It will actively collate and exchange data, research and information from people working in the field, in campaigns, in scientific institutions, in research and in industry.
</p></blockquote>
<p>I recommend checking out the about page to find out more about this exciting resource:<br />
<a href="http://www.indiaenvironmentportal.org.in/content/about-us">http://www.indiaenvironmentportal.org.in/content/about-us</a></p>
<p>Congratulations to Ipsita, Rahul, Syed, Shashank, and all the rest of the excellent team at Srijan!</p>
<p>And a special thanks to drunken monkey and <a href="http://robshouse.net">Robert Douglass</a> for their work to integrate Drupal and Apache Solr.</p>
<p><a name="press"></a></p>
<h3> Some press about the launch</h3>
<p>http://www.financialexpress.com/news/National-portal-on-environment/347761/</p>
<p>http://in.news.yahoo.com/43/20080811/812/tnl-national-online-portal-on-environmen.html</p>
<p>http://www.thehindu.com/holnus/002200808112067.htm</p>
<p>http://www.ecoearth.info/shared/reader/welcome.aspx?linkid=104697&#038;keybold=climate%20forest%20environment%20warming</p>
<p>http://www.indiaprwire.com/businessnews/20080811/32685.htm</p>
<p>http://alootechie.net/content/indiaenvironmentportalorgin-launched-provide-environmental-information</p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2008/08/12/250k-nodes-working-to-save-our-habitat/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Solving bad IA using enterprise search (Reverse Advanced Search)</title>
		<link>http://pajamadesign.com/2008/07/01/solving-bad-ia-using-enterprise-search-reverse-advanced-search/</link>
		<comments>http://pajamadesign.com/2008/07/01/solving-bad-ia-using-enterprise-search-reverse-advanced-search/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 05:32:41 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Professional]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[faceted search]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=30</guid>
		<description><![CDATA[Since I started working with Apache Solr in Drupal, I&#8217;ve realized how much client money has been wasted making ill advised advanced searches.  We&#8217;ve all gotten the requests for &#8220;advanced&#8221; searches and it makes any IA-god fearing developer cringe.  For the 1% of users who use them, you blow tons of budget, and [...]]]></description>
			<content:encoded><![CDATA[<p>Since I started working with Apache Solr in Drupal, I&#8217;ve realized how much client money has been wasted building ill-advised advanced searches.  We&#8217;ve all gotten requests for &#8220;advanced&#8221; searches, and they make any IA-god-fearing developer cringe.  You blow tons of budget for the 1% of users who use them, and the result is often quite poor because the client doesn&#8217;t really know their data or their users that well.</p>
<p>For those of you who are unfamiliar with faceted search, compare the following:</p>
<p>I searched two sites for WSXGA because I&#8217;m looking for a laptop with decent resolution.</p>
<p><a href="http://www.laptopsdirect.co.uk/asp/searchSite.asp?mode=2&#038;keywords=WSXGA&#038;submit=">Laptops Direct</a></p>
<p>vs.</p>
<p><a href="http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&#038;DEPA=0&#038;Description=WSXGA&#038;x=0&#038;y=0"> New Egg </a></p>
<p>(click to enlarge image in new window)<br />
<a href='http://pajamadesign.com/wp-content/uploads/2008/06/newegg_ld_compare.jpeg' target="_blank"><img src="http://pajamadesign.com/wp-content/uploads/2008/06/newegg_ld_compare.jpeg" alt="" title="newegg_ld_compare"  width="300px" class="aligncenter wp-image-31" /></a></p>
<p>The New Egg search lets me filter, so I know that if I&#8217;m looking for a laptop between $750 and $1000, I&#8217;ll get 5 results.  After applying that filter, I can see what&#8217;s available, the number of results per manufacturer, etc.</p>
<p>Contrast that with an advanced search form where I have to put in all my criteria, and hope I get a result.  I might also miss certain results if my vocabulary is bad, or I don&#8217;t understand that the website says &#8220;high resolution&#8221; instead of WSXGA, so I don&#8217;t select it.</p>
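<p>To make the difference concrete, here is a minimal Python sketch of what a faceted search does with a keyword, a price filter and per-manufacturer counts.  The catalog, field names and prices are all made up for illustration; this is not how New Egg or Solr actually implement it.</p>

```python
from collections import Counter

# Hypothetical catalog: every name, brand and price here is invented.
LAPTOPS = [
    {"name": "Laptop A", "brand": "Lenovo", "price": 899, "resolution": "WSXGA"},
    {"name": "Laptop B", "brand": "Dell", "price": 949, "resolution": "WSXGA"},
    {"name": "Laptop C", "brand": "Dell", "price": 1299, "resolution": "WSXGA"},
    {"name": "Laptop D", "brand": "HP", "price": 799, "resolution": "WXGA"},
    {"name": "Laptop E", "brand": "Lenovo", "price": 760, "resolution": "WSXGA"},
]


def search(items, keyword, filters=None):
    """Return items matching the keyword that also pass every range filter.

    filters maps a numeric field name to a (low, high) pair; low is
    inclusive and high is exclusive.
    """
    filters = filters or {}
    hits = []
    for item in items:
        if item["resolution"].lower() != keyword.lower():
            continue
        if all(low <= item[field] < high for field, (low, high) in filters.items()):
            hits.append(item)
    return hits


def facet_counts(hits, field):
    """Count how many hits fall under each value of the given field."""
    return Counter(item[field] for item in hits)


# Keyword only, then the same keyword narrowed by a price range.
all_hits = search(LAPTOPS, "WSXGA")
filtered = search(LAPTOPS, "WSXGA", {"price": (750, 1000)})
print(len(all_hits), dict(facet_counts(all_hits, "brand")))
print(len(filtered), dict(facet_counts(filtered, "brand")))
```

<p>The point of the counts is that the user sees what is available before committing to a filter, so they never land on an empty result page.</p>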
<p>I think it&#8217;s obvious to anyone why faceted search is a good thing.  In my next post, I&#8217;ll be exploring why it hasn&#8217;t gotten widespread adoption, particularly in the small business / NGO sector, and how I plan to help change that.</p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2008/07/01/solving-bad-ia-using-enterprise-search-reverse-advanced-search/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Solving bad IA using enterprise search (Vocabulary)</title>
		<link>http://pajamadesign.com/2008/06/30/apache-solr-based-cms-vocabulary/</link>
		<comments>http://pajamadesign.com/2008/06/30/apache-solr-based-cms-vocabulary/#comments</comments>
		<pubDate>Mon, 30 Jun 2008 04:21:55 +0000</pubDate>
		<dc:creator>Jacob Singh</dc:creator>
				<category><![CDATA[Professional]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[information architechture]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://pajamadesign.com/?p=28</guid>
		<description><![CDATA[Long time no blog&#8230;
I had a bit of a realization (or rather a resurgence of a recurring realization) that I enjoy writing.  It happened this weekend as I was &#8220;getting away from  it  all&#8221;.  
I&#8217;ve been interested in enterprise search for small and medium enterprises for a while now.  Having [...]]]></description>
			<content:encoded><![CDATA[<p>Long time no blog&#8230;</p>
<p>I had a bit of a realization (or rather a resurgence of a recurring realization) that I enjoy writing.  It happened this weekend as I was &#8220;getting away from <em> it </em> all&#8221;.  </p>
<p>I&#8217;ve been interested in enterprise search for small and medium enterprises for a while now.  Having implemented the Google Mini and the GSA, I&#8217;ve seen how a good search can really turn your information architecture on its head in a good way.  As in any conversation between two entities, be they two people or a person and a website, communication is difficult, and many of the same rules apply:</p>
<h2> Vocabulary </h2>
<p>You have to speak the same language.   This doesn&#8217;t mean a Thai can&#8217;t talk to a Nigerian, but it does mean that when you are communicating, if the same word doesn&#8217;t mean the same thing to both parties (which it never does), your intentions, delivery and content are worthless.  That is why non-verbal communication and communication over phones are so ineffective compared to face-to-face meetings.  The words may be the same, but the interpretation never is.</p>
<p>So what does this have to do with enterprise search?  When a user wants something from your website, they are looking for a keyword.  Many computer scientists have tried to make linguistically aware search engines which correctly interpret sentences and questions.  Some of these results are useful, but generally, I believe people don&#8217;t come to a site thinking &#8220;Do you have any <em>red Toyota Corollas</em>?&#8221;  Internally, they are simply thinking &#8220;Corolla&#8221; and &#8220;Red&#8221;.  For instance, I could speak only two words of English, Red and Corolla, and chances are, I could walk into any American city and rent a red Toyota Corolla.  </p>
<p>When you plan the information architecture for a site, you usually start with personas, or stereotypes of users, which have goals.  Then you define the actions they take to meet those goals, and try to use the same vocabulary and thought process as these users to make an interface which is organized like their brain.  But when you have 10 different personas, how is this possible?  Within your 10 stereotypes there can be huge variation, and outside of your assumptions there may be other users you never thought about.  With a good search engine, even if you have one page about red Mustangs which is buried in your site, people might find it.  With an excellent search engine which has synonyms, facets, spell checking, related results, etc., you may be able to help the user find not only what they thought they were looking for, but contextual information about it.  What if there is a mechanic looking for parts who types <em>1988 Corolla Fuel Pump</em> into a search?  Shouldn&#8217;t the search engine know which model years of the Corolla share the same fuel pump, show which ones are available, and allow him to filter?  Shouldn&#8217;t it know that in the late eighties the Chevy Nova was a clone of the Corolla, and had the same parts for less?</p>
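<p>As a rough illustration of the synonym idea, here is a small Python sketch of query-time synonym expansion.  The synonym table is hypothetical and borrows its entries from the examples in these posts; real engines such as Solr handle this inside their analysis chain, so treat it as a sketch of the concept and not of any product&#8217;s behaviour.</p>

```python
# Hypothetical synonym table; a real one would be curated from the
# site's own data and the vocabulary its users actually type.
SYNONYMS = {
    "corolla": ["nova"],
    "wsxga": ["high resolution"],
}


def expand(term):
    """Return the term itself plus any alternatives the table knows."""
    term = term.lower()
    return [term] + SYNONYMS.get(term, [])


def matches(query, text):
    """True if every query term, or one of its synonyms, occurs in the text.

    Uses plain substring matching to stay short; a real engine would
    tokenize and stem both the query and the document.
    """
    text = text.lower()
    return all(
        any(alternative in text for alternative in expand(term))
        for term in query.split()
    )


print(matches("Corolla fuel pump", "Chevy Nova fuel pump"))
print(matches("WSXGA laptop", "15 inch high resolution laptop"))
print(matches("Corolla fuel pump", "Mustang fuel pump"))
```

<p>With a table like this, the mechanic&#8217;s query can surface the Nova part even though the word &#8220;Corolla&#8221; never appears on that page.</p>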
<p>This is the type of high-value information which comes from dealing with a real human, and no amount of brilliant forethought in information architecture can anticipate what the person is actually looking for.  If I were doing IA for a parts website sans search, I&#8217;d have to have categories by model, by year, etc.  Even in a straightforward example like that, value is lost.  That&#8217;s why search engines need to ask the extra question, and the search engines most sites use today do not.</p>
<h2> Faceted Search </h2>
<p>Next time, whenever that is, I&#8217;ll write something about faceted search (a fancy name for search with fields and filters) and how I think the combination of Apache Solr and open source CMSs like Drupal, Typo3 and Joomla is going to pave the way to an entirely new concept of information architecture and of where we spend our usability testing money. </p>
]]></content:encoded>
			<wfw:commentRss>http://pajamadesign.com/2008/06/30/apache-solr-based-cms-vocabulary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
