Archive for the ‘Web Technology’ Category

Google App Engine

April 8, 2008

Having spent time with extremely high-trafficked sites on the Sitecore CMS platform, I have always been intrigued by the challenges of sites that experience high traffic during short periods of the year. Think: the official World Cup homepage etc.

How many servers do you need running idle 51 weeks a year in order to handle the load 1 week a year? Would it somehow be possible to rent server resources when you expect traffic peaks? This morning a post appeared on Matt Cutts' blog about a new Google initiative called Google App Engine.

Quote: At tonight’s Campfire One we launched a preview release of Google App Engine — a developer tool that enables you to run your web applications on Google’s infrastructure. The goal is to make it easy to get started with a new web app, and then make it easy to scale when that app reaches the point where it’s receiving significant traffic and has millions of users.

Google App Engine gives you access to the same building blocks that Google uses for its own applications, making it easier to build an application that runs reliably, even under heavy load and with large amounts of data. The development environment includes the following features:

And continuing:

During this preview period, applications are limited to 500MB of storage, 200M megacycles of CPU per day, and 10GB bandwidth per day. We expect most applications will be able to serve around 5 million pageviews per month. In the future, these limited quotas will remain free, and developers will be able to purchase additional resources as needed.

So you get something like 5 million pageviews per month for free. That is a whole lot. On average close to 2 pageviews per second. Imagine the possibilities if your CMS can outsource HTML rendering to a service like this during peak hours. It is of course not limited to CMS-related tasks, but can be used for anything you can make fit into the framework.
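A quick sanity check of that average (plain arithmetic, nothing App Engine-specific):

```python
# 5 million pageviews spread evenly over a 30-day month.
seconds_per_month = 30 * 24 * 3600          # 2,592,000 seconds
pageviews_per_second = 5_000_000 / seconds_per_month
print(round(pageviews_per_second, 2))       # ~1.93, i.e. close to 2 per second
```

Real traffic is of course bursty, so the peak rate you must survive is a good deal higher than the average.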

Exciting initiative.



Advanced stress testing with Application Center Test

May 31, 2007

At work I am facing the challenge of testing how many websites can be hosted on a single server. As you never know the real circumstances that real-world websites and users create, you have to make some assumptions for such a test to make sense. In this case we know that all the sites will be variations of the same original site, so we will just use this site for testing. Regarding traffic patterns, we are lucky enough to have real-world logfiles from similar websites that we can use.

After spending most of my working hours trying to understand what goes on under the hood of websites (mainly ASP.NET, but 99% is HTTP requests anyway), I have learned not to care so much about simultaneous users and just cut to the bone by answering the question “… and how many requests is that per second?”. From my experience the webserver does not care much whether 10 users click once a minute or 5 users click every 30 seconds. To make a long story short, we are going to use LogParser from Microsoft to find the average number of requests per second during peak hours from these logfiles. This will give me a number of requests per second (or minute, or hour, whatever) PER website.
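LogParser does the heavy lifting against real IIS logs, but the idea is simple enough to sketch. The helper below is a hypothetical stand-in, assuming the default W3C log layout where the first two whitespace-separated fields are date and time:

```python
from collections import Counter

def peak_requests_per_second(log_lines):
    """Count requests per one-second bucket in a W3C-style log and
    return the busiest second's count. Lines starting with '#' are
    W3C directives and are skipped."""
    buckets = Counter()
    for line in log_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        buckets[(fields[0], fields[1])] += 1  # (date, time) to the second
    return max(buckets.values()) if buckets else 0

sample = [
    "#Fields: date time cs-uri-stem sc-status",
    "2007-05-30 10:00:00 /default.aspx 200",
    "2007-05-30 10:00:00 /products.aspx 200",
    "2007-05-30 10:00:01 /default.aspx 200",
]
print(peak_requests_per_second(sample))  # 2
```

In practice you would bucket per hour instead of per second and divide, to get the smoothed peak-hour average rather than a single spike.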

Say a website receives 2 requests per minute, and I want to stress test 17 identical websites. Overall this works out to 1765 milliseconds between each request. But what is the best way to test if this performs fine? Well, I have been looking at Microsoft's “Application Center Test” (ACT), made for exactly this purpose. ACT has a nice feature where I can record clicks in a browser and save them as VBScript files. However, having to click through 17 different (but identical) sites seems just too tiresome. And what do I do when I later want to test with 34 identical sites, and how do I get it right with the 1765 milliseconds? Not to think of the sizes of these VBScript files.
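The delay works out as follows (the helper name is mine, just to make the arithmetic explicit):

```python
def delay_ms(sites, requests_per_minute_per_site):
    """Milliseconds between consecutive requests when the combined load
    for all sites is funnelled through one sequential request loop."""
    total_per_minute = sites * requests_per_minute_per_site
    return round(60_000 / total_per_minute)

print(delay_ms(17, 2))  # 1765  -> 17 sites at 2 req/min each
print(delay_ms(34, 2))  # 882   -> doubling the sites halves the delay
```

So scaling the test to more sites is just a matter of recomputing one number.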

Then it hit me: VBScript = programming language. Work smarter, not harder. So I recorded a sample script and manipulated it into a highly dynamic one. It has two arrays, one with the list of host names and another with the list of relative URLs we are going to test. It also has a delay that I can set to anything I want, e.g. 1765 milliseconds (later on I might want to make the number random around an average to make it more realistic). The script then just picks a random domain and a random URL for each request to simulate real-life traffic. I think it is kinda neat. The script goes here:

Option Explicit
Dim fEnableDelays
fEnableDelays = True

Sub SendRequest1(p_domain, p_url, p_delay)
    Dim oConnection, oRequest, oResponse, oHeaders, strStatusCode
    If fEnableDelays = True Then Test.Sleep(p_delay)
    Set oConnection = Test.CreateConnection(p_domain, 80, false)
    If (oConnection is Nothing) Then
        Test.Trace "Error: Unable to create connection to " & p_domain
        Exit Sub
    End If
    Set oRequest = Test.CreateRequest
    oRequest.Path = p_url
    oRequest.Verb = "GET"
    oRequest.HTTPVersion = "HTTP/1.1"
    Set oHeaders = oRequest.Headers
    oHeaders.Add "Accept", "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, application/x-shockwave-flash, */*"
    oHeaders.Add "Accept-Language", "en-us"
    oHeaders.Add "Accept-Encoding", "gzip, deflate"
    oHeaders.Add "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
    oHeaders.Add "Host", "(automatic)"
    oHeaders.Add "Cookie", "(automatic)"
    Set oResponse = oConnection.Send(oRequest)
    If (oResponse is Nothing) Then
        Test.Trace "Error: Failed to receive response for URL " & p_url
    Else
        strStatusCode = oResponse.ResultCode
    End If
    oConnection.Close
End Sub

Sub Main()
    Dim arDomain, arUrl
    Dim i, delay
    Dim domainIndex, urlIndex
    Dim ubounddomain, uboundurl

    Randomize                ' seed Rnd so each test run differs
    delay = 500              ' milliseconds between requests
    arDomain = Array("")     ' put the host names to test here
    arUrl = Array("/product.aspx", "/customers.aspx", "/partners.aspx")
    ubounddomain = UBound(arDomain)
    uboundurl = UBound(arUrl)
    For i = 0 To 30
        ' +1 so the last array index can be picked too
        domainIndex = Int((ubounddomain + 1) * Rnd())
        urlIndex = Int((uboundurl + 1) * Rnd())
        Call SendRequest1(arDomain(domainIndex), arUrl(urlIndex), delay)
    Next
End Sub

When I set the delay to 500 milliseconds and run it with 2 simultaneous users, I can see the graph flatlining around 4 requests per second, which is exactly as expected. So to add more sites I just need to add the extra domains and adjust the delay to simulate the extra stress.

Happy testing 🙂

Keep your newsletters out of spam filters

May 28, 2007

Spam emails are becoming more and more of a plague in your and my inboxes. What is not yet common knowledge is that the fight against spam may also result in your legitimate newsletters getting caught in spam filters. Did you ever get complaints that Hotmail users never got your newsletters?

Well, Microsoft, who owns Hotmail, is playing hardball in the fight against spam by requiring senders to use the Sender Policy Framework (SPF) when sending emails to Hotmail users. If it is not used, emails are bounced and never received by the Hotmail user. It is not really a bad move, because SPF is kinda cool, is easy to implement, and it now gets a lot of deserved attention.

So what is it about? Well, more and more people have experienced receiving “undeliverable email” bounces for emails to users they never wrote to. The reason is that spammers have used their domain to send emails from, and when those emails are rejected, the bounce goes to them, the forged sender, not to the spammer. SPF is about making life hard for spammers who want to abuse YOUR domain. When implementing SPF, the sender sets up an SPF text record in their DNS, stating which servers are allowed to send emails from that domain. A receiving mailserver screening incoming mail can then check the sender domain against the IP address of the sending server and see if they match. If not, the mail is rejected, and that is what happens when Hotmail bounces your mail. Of course the receiving mail servers also have to support SPF to benefit from it, but that really isn't your headache.
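The check the receiving server performs can be sketched in a few lines. This is a deliberately minimal illustration, not a real SPF implementation (the real thing also handles a:, mx:, include:, redirects and more, per RFC 4408), and the record shown is hypothetical:

```python
import ipaddress

def spf_allows(spf_record, sender_ip):
    """Does sender_ip match any ip4: mechanism in the SPF record?
    Only ip4: is handled here; anything else falls through to a
    default fail, mimicking a record that ends in -all."""
    ip = ipaddress.ip_address(sender_ip)
    for mechanism in spf_record.split():
        if mechanism.startswith("ip4:"):
            if ip in ipaddress.ip_network(mechanism[4:], strict=False):
                return True
    return False

record = "v=spf1 ip4:192.0.2.0/24 -all"    # hypothetical DNS TXT record
print(spf_allows(record, "192.0.2.10"))    # True  -> mail accepted
print(spf_allows(record, "203.0.113.5"))   # False -> mail bounced
```

The whole trick is that only you can publish that TXT record for your domain, so a spammer's server IP will simply never match.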

What is your headache, though, is making sure that an SPF record is set up for your domain. That way your newsletter will also arrive at mailservers that check SPF records.

Read more about the Sender Policy Framework and how to set up SPF records here: