Thursday, July 3, 2008

Google App Engine

I've been tinkering around with Google App Engine (GAE) and I'm pretty impressed. It's clearly designed to compete with Amazon's Elastic Compute Cloud (EC2), and hopefully we'll see even more scalable app containers come on the scene. I kind of wonder if BEA will get into the mix anytime soon.

First, the good.

Getting a hello-world app up and running on GAE is a breeze. You can download the SDK, follow their tutorial and have something going in just a few minutes. Even growing your data model is a dream: just create a class that extends one of their datastore base classes (e.g. db.Model), use or extend the ready-made property classes, and that's it. No worrying about mappings, creating tables, etc.; all of that is handled automagically. The way the data models work seems very much like Django's models (and Django, apparently, runs like a champ inside GAE).

Most of the Python 2.5 standard library is available. A few things are missing (namely file- and system-related modules), but I haven't really run into problems with that yet. I could see it becoming a pain in the ass later, but hopefully I can just code around those problems or delegate that work to EC2 or some other host. That would be ideal. I haven't really tested the interoperability yet, but there appears to be at least a way to talk to another server via HTTP.

On to the bad.

I don't have a whole heck of a lot to add here (at least not yet). One thing I find a little difficult is the way relationships are handled; it's different from what I'm used to. It's sort of possible to model things the way you would for an RDBMS, but it definitely goes against the grain. Even the "officially sanctioned" ways of dealing with relationships seem a bit problematic. For example, say I want to build something like a discussion board. How would that be handled? I'm thinking along these lines:

Board has many Forums, each Forum has many Threads, and each Thread has many Posts

That's a pretty straightforward model. But when I start counting datastore calls (which matters, because the free quota limits them), I get something like this (the numbers are guesses):

  • select Board (1)
  • select Forum (10)
  • select Thread (25)
  • select Post (15)
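A minimal sketch of where those calls come from, in plain Python rather than the App Engine API. `Entity`, `CountingStore`, `build_board` and `view_thread_naive` are hypothetical names; each parent keeps a list of its children's keys (an assumption standing in for the datastore's reference properties), and every entity is fetched with its own call, so the tally matches the guessed numbers above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a datastore entity (not the GAE API)."""
    key: str
    kind: str                                          # "Board", "Forum", "Thread" or "Post"
    children: List[str] = field(default_factory=list)  # keys of child entities (assumption)


class CountingStore:
    """Toy store; each get() counts as one datastore API call.
    Only reads are counted, as in the post's tally."""

    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.calls = 0

    def put(self, entity: Entity) -> None:
        self.entities[entity.key] = entity

    def get(self, key: str) -> Entity:
        self.calls += 1
        return self.entities[key]


def build_board(store: CountingStore, n_forums=10, n_threads=25, n_posts=15) -> None:
    """Populate one board; only forum0 and thread0 get children, which is all one view needs."""
    posts = [Entity(f"post{i}", "Post") for i in range(n_posts)]
    threads = [Entity(f"thread{i}", "Thread") for i in range(n_threads)]
    threads[0].children = [p.key for p in posts]
    forums = [Entity(f"forum{i}", "Forum") for i in range(n_forums)]
    forums[0].children = [t.key for t in threads]
    board = Entity("board", "Board", [f.key for f in forums])
    for entity in [board, *forums, *threads, *posts]:
        store.put(entity)


def view_thread_naive(store: CountingStore, forum_index=0, thread_index=0) -> List[Entity]:
    """Fetch every entity on the path to one thread's posts, one call per entity."""
    board = store.get("board")                                      # 1 call
    forums = [store.get(k) for k in board.children]                 # 10 calls
    threads = [store.get(k) for k in forums[forum_index].children]  # 25 calls
    return [store.get(k) for k in threads[thread_index].children]   # 15 calls


store = CountingStore()
build_board(store)
posts = view_thread_naive(store)
print(len(posts), store.calls)  # 15 51
```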

So just a single pass for a single user to view the posts in a single thread would generate 1 + 10 + 25 + 15 = 51 datastore calls. Figure the average user probably clicks into, what, 5 or 10 threads? Let's just say 10. So that's 510 datastore calls burned per user for the discussion board. We get 2.5 million datastore API calls per day, so 2.5 million / 510 ≈ 4,902 users we can support per day.
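The same quota arithmetic as a few lines of Python, using the guessed per-level numbers and the 2.5 million calls/day figure from the post. `round` is used to match the post's rounding; integer division would give 4,901 whole users.

```python
DAILY_QUOTA = 2_500_000  # free datastore API calls per day, as quoted in the post

calls_per_thread_view = 1 + 10 + 25 + 15   # board + forums + threads + posts
threads_per_user = 10                      # "Let's just say 10"
calls_per_user = calls_per_thread_view * threads_per_user
users_per_day = round(DAILY_QUOTA / calls_per_user)  # 4901.96... rounds to 4902

print(calls_per_thread_view, calls_per_user, users_per_day)  # 51 510 4902
```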

That's plenty of room for a pet-project, just-my-buddies kind of application, but if you're thinking bigger you're going to fall down pretty fast. Of course, when GAE goes live you'll be able to pay to go beyond 2.5 million API calls, but I think the better approach is to rethink the design so that fewer calls are required. Some kind of custom-built cache that works well with the datastore (which I think is BigTable underneath) is probably the best bet. Let's consider an approach where we cache all (10) forums into a single block, all (25) threads of a forum into another block, and all (15) posts of a thread into one more, so each block comes back in one call. Now we're down to just 3 API calls per thread view (the initial board call would just fetch our 10-forum block), times 10 interesting threads, so roughly 30 API calls per user. That means we could potentially serve 2.5 million / 30 ≈ 83,333 customers per day. Now we're talking! Factor in memcache and you could see even more improvement.
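A minimal sketch of the block idea, again in plain Python with hypothetical names (`CountingStore`, `CachedStore`, `save_block`, `load_block`, `view_thread_blocked`). It assumes each block is one stored value holding a JSON list of child records, so a single get() returns all of them, and it uses a plain dict as a stand-in for memcache. None of this is the App Engine datastore or memcache API.

```python
import json
from typing import Any, Dict, List

DAILY_QUOTA = 2_500_000  # free datastore API calls per day, as quoted in the post
THREAD_VIEWS_PER_USER = 10


class CountingStore:
    """Toy key-value store; every get() counts as one datastore API call."""

    def __init__(self) -> None:
        self.blobs: Dict[str, str] = {}
        self.calls = 0

    def put(self, key: str, value: str) -> None:
        self.blobs[key] = value

    def get(self, key: str) -> str:
        self.calls += 1
        return self.blobs[key]


class CachedStore:
    """Cache-aside wrapper; a plain dict stands in for memcache (assumption)."""

    def __init__(self, store: CountingStore) -> None:
        self.store = store
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self.cache:  # miss: fall through to the store once
            self.cache[key] = self.store.get(key)
        return self.cache[key]


def save_block(store: CountingStore, key: str, items: List[Dict[str, Any]]) -> None:
    """Denormalize a whole list of child records into one stored value."""
    store.put(key, json.dumps(items))


def load_block(store, key: str) -> List[Dict[str, Any]]:
    """One get() returns every record in the block."""
    return json.loads(store.get(key))


def view_thread_blocked(store, forum_id: str, thread_id: str):
    forums = load_block(store, "board/forums")                        # all 10 forums
    threads = load_block(store, f"board/{forum_id}/threads")          # all 25 threads
    posts = load_block(store, f"board/{forum_id}/{thread_id}/posts")  # all 15 posts
    return forums, threads, posts


store = CountingStore()
save_block(store, "board/forums", [{"id": f"f{i}"} for i in range(10)])
save_block(store, "board/f0/threads", [{"id": f"t{i}"} for i in range(25)])
save_block(store, "board/f0/t0/posts", [{"id": f"p{i}"} for i in range(15)])

forums, threads, posts = view_thread_blocked(store, "f0", "t0")
calls_per_view = store.calls                             # 3
calls_per_user = calls_per_view * THREAD_VIEWS_PER_USER  # 30
users_per_day = round(DAILY_QUOTA / calls_per_user)      # 83333
print(len(forums), len(threads), len(posts), calls_per_view, calls_per_user, users_per_day)
# 10 25 15 3 30 83333

# With the memcache stand-in, repeat views of the same thread skip the store.
cached = CachedStore(store)
before = store.calls
for _ in range(THREAD_VIEWS_PER_USER):
    view_thread_blocked(cached, "f0", "t0")
store_calls_with_cache = store.calls - before  # only the first view misses
print(store_calls_with_cache)  # 3
```

The catch with this kind of denormalization is on the write side: every new post means rewriting its thread's block, and a block can only grow as large as a single stored entity allows, so busy threads would eventually need to be split across several blocks.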

So, back to "the bad". The reason this is bad is that I've never really had to think about caches like this before, so I find myself optimizing earlier than I should out of fear of scalability problems. OK, so that's probably a limitation of me and not of GAE.
