As I mentioned in several previous blog entries, on January 12 we re-launched InstantSpot after a complete bottom-to-top rewrite. In addition to a completely new code base, we made the unlikely choice of using Railo as our CFML processing engine.
“Why?” you ask. (You aren’t the first.)
The reasons were several, and I will detail a few of the key points that went into our decision.
- It’s free – InstantSpot is basically a small project of big ideas by two developers doing this out of our own pockets and – how can I put this delicately? – we are poor and cheap! Unfortunately, despite how much we will it to be so, InstantSpot has not made us bazillions of dollars (at least as of the time of this posting). From the beginning we have made an effort not to make InstantSpot a financial burden on our families, as they are already paying dearly in the time we spend tied up in code till all hours of the night, and we like to cut financial corners everywhere we can. A free CFML processing engine? That is an obvious avenue to at least explore.
- It’s fast – No lie… Railo is fast. From our very first development and tests, it just seemed to blow other engines away in the speed at which it processed code. This was backed up test after test. Not only does it shine in the speed of processing code, but it also has a tiny footprint on the server. In our environment, running it as a Tomcat application, you almost wouldn’t even guess it was there.
- It’s CFMX 7 compatible – To us this meant that we didn’t have to code anything differently simply because we chose to use Railo over ColdFusion. We had no issues whatsoever using our normal data model patterns we use in any other application, and we used BER releases of ColdSpring and Mach-II without the slightest hiccup. Eventually we found a couple of small places where we had to make a workaround (3 I can think of), but they were without question edge cases, and there were easy workarounds that didn’t feel as though we were compromising the application.
- It’s the underdog – If you were to poll the ColdFusion community at large, you would find that many people don’t even know there are any other choices besides Adobe ColdFusion, and many of those who do may have only heard of New Atlanta’s Blue Dragon. Railo hardly gets a mention in most circles. We thought it might be fun to be an advocate by example and help promote what we felt is a great alternative choice. Additionally, Aaron and I have a tendency to choose the road less traveled, so this certainly fit our m.o.
Sounds reasonable, right? Since we made that choice around June of ’07 and started moving forward with the rewrite, we have felt overwhelmingly positive about our decision.
All of that began to change about 1:00am January 13.
After making the DNS changes, as traffic began redirecting to the new server, we started seeing absolutely inexplicable errors. The closer we examined them, the more obvious it became that we had some *serious* threading issues in our application. We are extremely careful in this regard when it comes to our code, so this was very surprising. However, this *was* brand new code, and of course there could have been a hole somewhere, right?
As more traffic started coming in, the errors escalated. We started seeing errors at least every minute, each of which generated a painful new email to both Aaron and me. It became clear quite rapidly that the errors actually had nothing to do with our code. We started seeing errors from both Mach-II and ColdSpring that simply couldn’t happen. For instance, here is one we started seeing from ColdSpring:
Message: variable [beandefinition] doesnt exist
Tag Context: /www/instantspot/www/coldspring/beans/AbstractBeanFactory.cfc (211)
Really? That is pretty interesting since line 210 is:
<cfset var beanDefinition = getBeanDefinition(arguments.beanName) />
And how about this one from Mach-II?
Message: variable [nextevent] doesnt exist
Tag Context: /www/instantspot/www/MachII/framework/RequestHandler.cfc (115)
Oh yeah? Well… this is line 114:
<cfset nextEvent = appManager.getEventManager().createEvent(result.moduleName, result.eventName, eventArgs, result.eventName, result.moduleName) />
Clearly even our worst var-scoping misstep couldn’t have created those errors, and furthermore, these are well-tested frameworks used in hundreds if not thousands of applications. If these threading errors existed in them, Aaron and I would not be the ones discovering them in January 2008. We were also seeing some of our own objects attempting to call methods that belong to other objects, an obvious sign of serious threading issues. In two instances, a person’s RSS feed actually contained someone else’s content.
We began to wonder if Railo even recognized var-scoping at all. I pulled up an old blog entry in which I had written some code showing a simple example of a var-scoping error, set up a Railo scribble pad, and ran the test. Railo did pass, which tells me that it at least manages var scoping on a cursory level. However, under the load of our application it appeared that we were looking at something bigger than just var-scoping a few object methods.
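For anyone who hasn’t been bitten by var-scoping before, here is a minimal sketch of the kind of thing that test checks. This is not the code from my old blog entry; the component name ScopeDemo and its three methods are made up purely for illustration.

```cfml
<!--- ScopeDemo.cfc: hypothetical component, for illustration only --->
<cfcomponent output="false">

    <!--- Missing "var": greeting is written to the component's variables scope,
          which every request sharing this instance can read and overwrite. --->
    <cffunction name="unsafeGreeting" access="public" returntype="string" output="false">
        <cfargument name="name" type="string" required="true" />
        <cfset greeting = "Hello, " & arguments.name />
        <cfreturn greeting />
    </cffunction>

    <!--- With "var": greeting is local to this one call. --->
    <cffunction name="safeGreeting" access="public" returntype="string" output="false">
        <cfargument name="name" type="string" required="true" />
        <cfset var greeting = "Hello, " & arguments.name />
        <cfreturn greeting />
    </cffunction>

    <!--- Reports whether greeting has leaked into the shared variables scope. --->
    <cffunction name="hasLeaked" access="public" returntype="boolean" output="false">
        <cfreturn structKeyExists(variables, "greeting") />
    </cffunction>

</cfcomponent>
```

A scribble-pad template in the same directory could exercise it like this:

```cfml
<cfset demo = createObject("component", "ScopeDemo") />
<cfset demo.safeGreeting("Aaron") />
<cfoutput>After safeGreeting: #demo.hasLeaked()#<br /></cfoutput>
<cfset demo.unsafeGreeting("Aaron") />
<cfoutput>After unsafeGreeting: #demo.hasLeaked()#<br /></cfoutput>
```

On an engine that honors var, the first check should report that nothing leaked and the second that greeting is now sitting in the shared scope. If a component like this is cached in the application scope, two simultaneous requests can overwrite each other’s greeting, which is the class of bug we first suspected in our own code.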
At this point, there was no longer any question that in order to get out of this tailspin we needed to do something drastic. We quickly decided that the most logical step was to switch to Adobe ColdFusion 8. The cheap-gene which is so deeply embedded in our DNA had to be thrown out the window, and we had to act by getting a ColdFusion 8 license and implementing it as soon as possible. One immediate concern was how much we had modified the Railo WEB-INF to support our URL handling: we not only use mod_rewrite in Apache, but also have another Java application in the mix. After installing ColdFusion 8 Standard and digging into the /wwwroot/WEB-INF, we found that we could painlessly apply the same pieces to our ColdFusion application, and with some very small changes, we had InstantSpot running in our development environments.
After some heavy but rapid testing throughout our application, we felt that we could make the switch. Even if an error or two was discovered later, the benefits would strongly outweigh the utter nonsense we were dealing with at that time. So around midnight last Wednesday night, we pushed up the ColdFusion implementation of our application, crossed our fingers, held our breath, flipped the switch, and…
After the application initialized, suddenly there was peace… no errors… no emails… just an application purring along as it was intended. In fact, we had to push up a test template with broken code to ensure that our error notification was still working. Since that time we have not seen a single error occur in our application with well over 100K page requests since the move.
I want to be clear that this post is not meant to be an attack on Railo. I am sure that Gert and crew work extremely hard, and I tend to believe that Railo will mature into a nice alternative if they keep up the efforts they have shown to date. However, I do hope that this post serves as a warning, as we found that there are huge implications with using it as it stands today.