March 24th, 2009

RayV is terrible, and NBA League Pass is a scam

I know this blog is about 99% work focused, but I just have to rant in the hopes that my measly page rank will make it known to the world that nba.com’s International League Pass (you pay $200 to watch NBA games on the internet) is a total scam.

Replayed Game - NBA League Pass Broadband

They use a player called RayV which is basically crowd sourcing bandwidth using a really badly made p2p player and running the games back on a schedule.

I’ve been fiddling with it for 5 months now with zero support. This product is a piece of garbage. Not even a piece; I can’t insult garbage, because at least garbage can, at times, be recycled and be of use. Hell, if I had a pile of garbage, I would spend an additional $200 to FedEx it to the CEO of RayV and the idiot at nba.com who decided this was a good company to partner with.

I have all the system requirements met and get a 1 Mbps connection, but constant buffering and stuttering (especially during crunch time in the 4th quarter) have made it the worst $200 and 20 hrs I have spent this year, and I still can’t watch my beloved Celtics!

Apparently, I’m far from alone in thinking that RayV and the NBA’s League Pass suck.

March 17th, 2009

Acquia Search is rocking!

Just want to make a quick update and say that at long last my search project is on the internets and getting decent uptake.

Peter Wolanin and I presented at DrupalCon (don’t laugh too hard at me).

I think the reception so far has been great, and the servers have been champs :) We’re getting more and more signups every day.

One really cool one is Bryan from the CMS Report:
http://cmsreport.com/search/apachesolr_search/Drupal%20Search

If any of you out there want to find out how search can change how you build your sites and bring more people together with the pages they need, just click this link now:

http://acquia.com/products-services/acquia-search

A lot of people are excited but worried about trying out the free beta because we haven’t released any pricing, and something this cool will certainly cost more than a few pesos.

Well, fear not Drupalers, we’ve heard your call and are working on releasing some preliminary pricing soon. We wanted to wait until the beta got rolling a bit before we did this, but we want people to know we have a commitment to making this technology available to as wide an audience as possible. Stay tuned to my blog and/or the planet/acquia.com for more updates on this.

In the meantime, sign up for the beta. It really only takes 15 minutes to set up and won’t break your site or lock you in at all. Please give us feedback! We know there are a few usability hiccups still in the signup process, and we’d love your input so we can fix ‘em. Thanks!

February 10th, 2009

The private lives of public IPs and EC2 security groups

As many of you know, I’m working on a Hosted Search product at Acquia. We’re building a pretty cool page where you can get some analytics on your search index and what people are searching for. Here are the deets on Dries’s site (hope he doesn’t mind ;0 )

Path Finder

For this, we’re using Splunk, which is a tad more pricey than I’d like, but a really amazing tool. Basically, it is grep + awk + a kilo of coke + a dozen redbulls + a Ferrari Testarossa + the same HGH A-Rod has been chewing. I’ll write more about it at some point, but this screen shot should give you an idea:

Path Finder

Anyway, we use Splunk’s API to grab data into acquia.com and show the page above. The page was taking 10 seconds to load… I was stumped. Splunk seemed so fast, a couple seconds is reasonable for loading a report from millions of records, but 10 seconds was pretty extreme.

Eventually we discovered it was not Splunk at all, but a separate call in our code to a web service (call it Info Server) in EC2 which was being firewalled by Amazon. This caused the request to sit there for 10 seconds and then time out.

Here’s how security groups work:

I’ve got 2 servers:
Web Server – Serves static files (:80 and :443) and passes tough stuff to app server
App Server – serves requests back up to web server on :8080

Web Server needs to be able to access App server to push proxied requests through.

In EC2, each server has one (or more) security groups. A security group is a list of access rights. These can be by Port & IP Range, or they can be references to other groups. (wtf?)

Yeah, so the rule for the web server would probably be something like:
IP:any Port:80
IP:any Port:443
IP:111.111.111.111/24 Port:10000 (maybe some admin port for a certain location to access)

For the App Server, we don’t want 8080 world readable. We also don’t know the IP of the web server because this is elastic baby, servers can’t stand still. That’s why we give group permissions. So it looks like:

Group: Web Server

Which means any server launched on your account with the security group “Web Server” will have total access to any server launched with the security group “App Server”. Got it?

If not, here is an FBI style blackout picture which might make it more clear:

Path Finder

In our case, we had a problem because we were referencing the external IP of our server (Info Server). See, in the depths of the Amazon, each machine has a public IP and a private IP. So when you look for infoserver.acquia.com (made up, btw) it will resolve to 74.x.x.x. When you look for ec2-10-45-123-41.compute.aws…. from inside EC2, it will resolve to 10.24.134.41, and both point to the same place. The difference, of course, is that the group-based security group rules only apply when you are using the internal IP, even if both servers are inside the cloud.
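To make the public-vs-private gotcha concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Amazon implements anything: the `Instance` and `Rule` classes, the `allowed` function and all of the IP addresses are made up for illustration. The one idea it encodes is that a group rule can only recognise the sender when the traffic arrives from the sender's private IP.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Instance:
    name: str
    public_ip: str
    private_ip: str
    groups: Set[str]


@dataclass
class Rule:
    port: int
    cidr: Optional[str] = None          # e.g. "0.0.0.0/0" for "IP:any"
    source_group: Optional[str] = None  # e.g. "Web Server"


def allowed(rules, sender, seen_ip, port):
    """Return True if any rule admits traffic that arrives from seen_ip."""
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr is not None:
            network = ipaddress.ip_network(rule.cidr, strict=False)
            if ipaddress.ip_address(seen_ip) in network:
                return True
        if rule.source_group is not None:
            # Group membership is only recognised via the private address.
            if seen_ip == sender.private_ip and rule.source_group in sender.groups:
                return True
    return False


# Hypothetical servers and rules, mirroring the example in this post.
web = Instance("web-1", "203.0.113.10", "10.0.0.10", {"Web Server"})
web_rules = [
    Rule(80, cidr="0.0.0.0/0"),
    Rule(443, cidr="0.0.0.0/0"),
    Rule(10000, cidr="111.111.111.111/24"),
]
app_rules = [Rule(8080, source_group="Web Server")]

print(allowed(app_rules, web, web.private_ip, 8080))  # True
print(allowed(app_rules, web, web.public_ip, 8080))   # False
```

The second call is our 10 second timeout: same two machines, same group, but the request went out via the public address, so the group rule never matched.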

Hope you’ve been saved some pain.

Please come check out Peter Wolanin and me as we present the future of Drupal search (we hope) at DrupalCon!

January 14th, 2009

Making Module Installation Easy for Acquia Search

Jeff Noyes (our Simplicity Guru), Linea Rowe, Peter Wolanin and I sat down to discuss how the install process for our Hosted Search Service would look (yes, we’re getting close: Private Beta is out in two weeks)! Typically, when you have a faceted search engine, there is a set of filters on the left and search results on the right, with the sorting links generally horizontally aligned somewhere near the search box.

Here are a few examples from around the web:

Newegg.com - 15
stuff : Clearance : Target Search Results
pancakes, Books, DVDs Movies items on eBay.com

And here is our current implementation:

Search | Dries Buytaert

Here is the same shot, but broken down in “drupalish”:

Search | Dries Buytaert

I think it works okay, but we’re concerned that when people enable the module, they will have a hard time getting this together. Here is a series of screen shots of a user enabling and setting up the module:

Modules | ad

This part is simple (if you use Acquia’s hosted search), you just enable one module and you are done configuring the connection to Solr.

However, your standard search ends up looking like this:

Search | ad

To get all the nice sorting and facet filters, you need to know (somehow) to go to admin/build/blocks and drag the ApacheSolr: blocks into regions like this:

Blocks | ad

So what do people think? Should we just enable a few blocks “out of the box” and hope you are using garland and have a region named “left” or “left-sidebar”? If so, which blocks? Alternatively, how can we provide a good workflow so people know they need to do that extra step to set up their search? The other option Jeff suggested (which is most usable) is to have one block, where you can select what filters you want in it. The downside is that the user loses flexibility about where to put filters (maybe they want sorting on the right, etc).

I’d like to get some feedback on:

A). How to make this process so simple that it really is just checking that one box on the modules page and letting cron run and it looks great for 90% of users.

B). What the default blocks to enable are, and where they should be on the screen

C). How do we address this problem of multi-step installs which want to setup blocks in a more usable way for newbies?

See ya!
jacob

November 28th, 2008

What could search look like on d.o. and g.d.o

Robert Douglas, Peter Wolanin and I are scheming up what we hope will be a jaw-dropping presentation of ApacheSolr + Drupal integration at DrupalCon DC. We’re going to show a prototype of d.o. and g.d.o. hooked up to the Apache Solr search server. We all know that d.o. and g.d.o. are notoriously hard to search through.

For instance, take this query:

http://drupal.org/search/node/views (searching for views).

Search | drupal.org


November 23rd, 2008

First Month @ Acquia

Hopefully with my new awesome job, I’ll have time to continue blogging here, but a couple quick hits about my impressions thus far at Acquia.

  • Acquia has really smart people running the show. Everyone, even the non-techies, is really techie.
  • They really take care of their people. I’m totally free to purchase whatever technology (within reason) that will help me do my job. They get it, it’s pound foolish to keep gadgets away from geeks.
  • Chris really pushes scrum hard. We set our own limits, and we spend a lot of time planning for contingencies. It’s tough, and annoying, but it is reality, and I like that I am forced to be real before problems occur, not after.
  • Acquia Drupal needs more adoption. I don’t know when the pickup will happen, but I really encourage Drupal shops to start using it. I fully believe there is a huge ROI here, and it will be healthy for the ecosystem
  • Kieran is trouble, don’t go drinking with him (but do go)

September 20th, 2008

On the Josh Howard Media Frenzy

I’m a big hoops fan, so I picked up on this one.

If you’re interested, I was responding to this post here:
http://www.celticsblog.com/index.php?option=com_content&task=view&id=3934&Itemid=189
But got to writing so damn much that I decided it was more of a blog post than a comment:

Your points are fine, but I think you are missing the crucial and more interesting one: Why is the media threatened by seeing a Black athlete, Howard (who is obviously an ass), protesting a given ritual in a White-owned sports complex where 99% of the players are Black? It is good to handle things maturely, etc. as you mention, but I don’t think you got on Bird for saying “we’re playing like women out there”, did you? It’s also about if and how much you are personally offended. How did you feel when Marvin Gaye sang at the all-star game? The country flipped their lids worse than in this case, and it is now being used on Team USA commercials!

This nation has been nothing but hate and violence against African Americans. I do not expect any African American to be patriotic. If they are, well that is their choice; but I think any white person should be ashamed to criticize the cynicism of an African American in regards to what this country has done for (to) them. That being said, I guess we should all be polite and protest respectfully, not tread on others, etc. Don’t get me wrong, Howard is no Jesse Owens, Tommie Smith, John Carlos or Muhammad Ali. But that doesn’t mean his point is not valid. Why can’t we discuss letting the players have some political freedom to dictate the rituals? Sure there are 80% white fans in the audience, but the players are almost all Black. Why don’t they sing “Lift Every Voice” or something else?

The National Anthem is flawed for a few reasons. Primarily, it is about a war fought by good old boys (many of whom were involved in the slave trade) for good old boys, not for poor Europeans, and certainly not for enslaved Africans or Native Americans. And secondarily, it has no relevance to African American history in the U.S. If anything, it only ushered in an economic expansion which caused a brutal acceleration of slavery. I’d be angry too if I had to shut up and not say anything about a song written and revered by some guys who enslaved my family and whose kids are still attacking me. Maybe I wouldn’t be respectful either.

September 19th, 2008

Adding custom placeholders to your log in python2.4

Python has a great Logging Engine which is at once powerful and simple (like python). One of the features added in 2.5 was the ability to provide a dictionary of extra params to the logging functions to allow for custom replacements in your formatter string. Unfortunately for me, I had to move my entire infrastructure for my Solr hosting project over to RightScale, and RightScale plays well with CentOS, and Python 2.5 doesn’t :(

If you are wondering how to get this same functionality in 2.4, here is how:


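import logging
import os
import sys
import types
from logging import _srcfile, LogRecord

# Custom logger that uses our log record.
# Python 2 only: it relies on sys.exc_traceback and types.TupleType.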
class Python25_Logger(logging.getLoggerClass()):
    def makeRecord(self, name, level, fn, lno, msg, args, exc_info, extra):
        """
        A factory method which can be overridden in subclasses to create
        specialized LogRecords.
        """
        rv = LogRecord(name, level, fn, lno, msg, args, exc_info)
        if extra:
            for key in extra:
                if (key in ["message", "asctime"]) or (key in rv.__dict__):
                    raise KeyError("Attempt to overwrite %r in LogRecord" % key)
                rv.__dict__[key] = extra[key]
        return rv

    def _log(self, level, msg, args, exc_info=None, extra=None):
        """
        Low-level logging routine which creates a LogRecord and then calls
        all the handlers of this logger to handle the record.
        """
        if _srcfile:
            fn, lno, func = self.findCaller()
        else:
            fn, lno, func = "(unknown file)", 0, "(unknown function)"
        if exc_info:
            if type(exc_info) != types.TupleType:
                exc_info = sys.exc_info()
        record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, extra)
        self.handle(record)

    def findCaller(self):
        """
        Unfortunately, this also needs to be overridden because the trace is one level up now.
        (HACK)
        """

        try:
            raise Exception
        except:
            f = sys.exc_traceback.tb_frame.f_back.f_back
        rv = "(unknown file)", 0, "(unknown function)"
        while hasattr(f, "f_code"):
            co = f.f_code
            filename = os.path.normcase(co.co_filename)
            if filename == _srcfile:
                f = f.f_back
                continue
            rv = (filename, f.f_lineno, co.co_name)
            break
        return rv

And to implement it:


logging.setLoggerClass(Python25_Logger)
log = logging.getLogger('testlogger')
formatter = logging.Formatter('%(name)s: level=%(levelname)s module=%(module)s special=%(special)s: %(message)s')

console = logging.StreamHandler()
console.setFormatter(formatter)
console.setLevel(logging.INFO)
log.addHandler(console)

#Here we are passing along the extra param "special"
log.error('hi mom', extra={'special':'cake'})

This code will output:

testlogger: level=ERROR module=WHATEVER special=cake: hi mom
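For comparison, here is a minimal sketch of the same thing on an interpreter where the stock logger already accepts extra (Python 2.5 and later; the snippet below is written for Python 3). No subclass is needed. The logger name testlogger_extra and the StringIO stream are just choices made for this example so the output can be captured and inspected.

```python
import io
import logging

# Send log output to an in-memory stream so we can look at it afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(name)s: level=%(levelname)s special=%(special)s: %(message)s"))

log = logging.getLogger("testlogger_extra")
log.addHandler(handler)
log.propagate = False  # keep the record away from any root handlers

# The extra dict is merged into the LogRecord, so %(special)s resolves.
log.error("hi mom", extra={"special": "cake"})

output = stream.getvalue()
print(output, end="")  # testlogger_extra: level=ERROR special=cake: hi mom
```

One thing to keep in mind with either version: once the format string mentions %(special)s, every call that goes through that formatter has to pass special in extra, otherwise formatting the record fails.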

September 12th, 2008

Delhi Drupal Meetup Tomorrow!!

Sorry for the late notice folks, but I figured we should just try to make this happen:

Calling all drupal wale in Delhi!

We’re meeting at the offices of Srijan in Nehru place.

Directions and contact info are here:
http://groups.drupal.org/node/14727#comment-49495

and the node regarding the meetup (organized in about 3 days) is here:
http://groups.drupal.org/node/14727

There will certainly be 4-5 people there, so please do show up!

At present there is no formal agenda, but Srijan will be providing a projector so we can do some lightning talks, or watch a thrilling DrupalCon presentation or two :) Anyway, see you then.

All the best,
Jacob

September 2nd, 2008

How to – gzip compression between Drupal and nginx / apache

I was working on trying to speed up the transmission and reduce the bandwidth of a web service I’ve been building in EC2. I’m using nginx (en-JINN-ex) as my load balancer in the cloud because it is awesome.

A little overview for people who have lives and think an HTTP Header is a football term:

When you send a request to a server via your browser, a series of headers (little variables) are sent across which tell the server what type of browser you are, what language you prefer, and a whole host of other interesting info.

When you get data back from the webserver, it also sends back some headers like the status of the request (200 is OK, 404 is a url with no content, etc), the length of the content, type of content, etc.

When a server is using mod_deflate or mod_gzip in apache (or The Gzip Module for Nginx), then it is capable of sending back its content in a compressed (gzipped) form. All modern browsers support transparent decompression on the browser side. What this means is that you see a normal HTML page which is 70k, but only 15 or 20k went over the pipe because of compression. Cool, right? Chances are it just happened as you saw this page.

Now how does a web server know you can accept gzip’d content? Well, you send one of those headers that looks like this:


Accept-Encoding: gzip,deflate

and the server responds with


Content-Encoding: gzip
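Here is a rough sketch, in Python, of the server side of that handshake. It is not what apache or nginx actually do internally; accepts_gzip and build_response are hypothetical helpers invented for this example, headers are plain dicts, and q-values in Accept-Encoding (like gzip;q=0) are deliberately ignored to keep it short.

```python
import gzip


def accepts_gzip(request_headers):
    """True if the client listed gzip in Accept-Encoding (q-values ignored)."""
    value = request_headers.get("Accept-Encoding", "")
    codings = [part.split(";")[0].strip().lower() for part in value.split(",")]
    return "gzip" in codings


def build_response(request_headers, body):
    """Return (headers, payload), compressing only when the client asked for it."""
    if accepts_gzip(request_headers):
        payload = gzip.compress(body)
        headers = {"Content-Encoding": "gzip",
                   "Content-Length": str(len(payload))}
        return headers, payload
    return {"Content-Length": str(len(body))}, body


page = b"<p>hello world</p>" * 200

headers, payload = build_response({"Accept-Encoding": "gzip,deflate"}, page)
print(headers.get("Content-Encoding"), len(page), len(payload))

headers, payload = build_response({}, page)
print(headers.get("Content-Encoding"), len(page), len(payload))
```

The first call prints a much smaller payload length than the page length; the second, with no Accept-Encoding header, sends the page back untouched.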

By default, PHP does not transparently support gzip encoding, but it can be done. See the following:



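    // Only ask for compressed content if we are able to inflate it.
    $gz_on = function_exists('gzinflate');

    $headers = array();
    if ($gz_on) {
        // Tells the drupal_http_request function to send this header over.
        // We only ask for gzip because that is the only encoding handled below.
        $headers['Accept-Encoding'] = 'gzip';
    }

    $return = drupal_http_request($url, $headers, 'GET');

    // This checks to make sure that we actually got gzip'd content back
    if ($gz_on && isset($return->headers['Content-Encoding'])
            && stristr($return->headers['Content-Encoding'], 'gzip')) {
        // The first 10 bytes are the fixed gzip header (this assumes the server
        // set no optional header fields); gzinflate() wants the raw deflate data.
        $string = substr($return->data, 10);
        $output = gzinflate($string);
    } else {
        $output = $return->data;
    }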

Go ahead and try it with $gz_on false and true. You'll see that $return->data will have different lengths if it is compressed or not, but $output will be the same.
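If you want to see why chopping off the first 10 bytes works, here is the same trick sketched in Python. decode_body is a hypothetical helper written for this example; it mirrors the substr + gzinflate approach above rather than being the recommended way to do it (Python's own gzip.decompress reads the whole header properly). The sketch assumes the gzip header carries no optional fields, and it checks the flag byte so that assumption fails loudly instead of silently.

```python
import gzip
import zlib

GZIP_FIXED_HEADER_LEN = 10  # magic(2) + method(1) + flags(1) + mtime(4) + xfl(1) + os(1)


def decode_body(response_headers, data):
    """Return the decoded body, inflating it if the server said it is gzip'd."""
    encoding = response_headers.get("Content-Encoding", "")
    if "gzip" not in encoding.lower():
        return data
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    if data[3] != 0:
        # Optional fields (file name, comment, ...) make the header longer.
        raise ValueError("gzip header has optional fields; skipping 10 bytes is not enough")
    # Negative wbits means a raw deflate stream, which is what is left after
    # the header. The 8-byte trailer (CRC32 + size) ends up in unused_data.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    return inflater.decompress(data[GZIP_FIXED_HEADER_LEN:]) + inflater.flush()


original = b"Happy Header Hacking! " * 100
wire = gzip.compress(original)

print(len(wire) < len(original))
print(decode_body({"Content-Encoding": "gzip"}, wire) == original)
print(decode_body({}, original) == original)
```

Just like the PHP version, the compressed and uncompressed responses have different lengths on the wire, but what comes out of decode_body is the same.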

One caveat when working with nginx (this one hurt after 2 hrs):
From the nginx manual, on the gzip_http_version directive:

Turns gzip compression on or off depending on the HTTP request version.

When HTTP version 1.0 is used, the Vary: Accept-Encoding header is not set. As this can lead to proxy cache corruption, consider adding it with add_header. Also note that the Content-Length header is not set when using either version. Keepalives will therefore be impossible with version 1.0, while for 1.1 it is handled by chunked transfers.

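Drupal uses HTTP 1.0 for some reason, and nginx's gzip_http_version defaults to 1.1, so requests from drupal_http_request come back uncompressed unless you set gzip_http_version to 1.0. I don't know why Drupal uses version 1.0, but I filed an issue about it, because 1.1 is a lot better and pretty much ubiquitous IIRC.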

Happy Header Hacking!

How To find me

Telephone: +1 510.277.0891 | Email: jacobsingh at gmail daht calm
