Category Archives: OpenStack

A Digital Ecommerce Transformation – Cloud Home Page – Part XV

Part XV in a multipart story.  To start at the beginning goto Part I.

Our goals for 2012 were to deliver two things, a new browse architecture for TWLER.com and a Holiday without issues. Not a small task for a team saddled down with a giant monolithic application and a mandate to deliver features and not affect the business during the rewrite.

We started in on the new browse architecture. It was our feeling that if we could simply scale up the browse capabilities of TWLER.com, we had a chance of making our aging ATG monolithic application survive another Holiday.

About 98% of traffic on an ecommerce system is people browsing the site, the other 2% is people actually trying to buy stuff. If you are planning for 10X increase in traffic for one week out of the year, than attacking the 98% seemed like a good place to start. Also, since we were dealing with a monolith, any traffic we removed from the 150 node ATG cluster was more power towards the checkout process. In fact, as we did the math, if we took off 70% of the traffic in the first year at Holiday, we’d actually have 3X the capacity we needed for the checkout process and remaining components left on the ATG servers.

We started with a project we called Cloud Home Page. The home page is the most served page on the site and at Holiday we had to make it static and cache it at the CDN. The business teams didn’t like this because, without dynamic content, there wasn’t any way to adjust what people were seeing as they landed on TWLER.com and lead them towards the new sales events. The Cloud Home Page plan was for a dynamic, cloud-based home page, with minimal personalization but modifiable within a 15 minute window.

Since we were coming from a J2EE style JSP on top of Servlets architecture, we first wanted to upgrade our front-end. The new architecture specified a thin and dynamic UI layer with zero coupling to the back end. That meant only HTML, CSS and Javascript were allowed. All data communication was done with JSON, and we tried to get the number of calls to the backend down to one.

That is, during a request for TWLER.com, the controller would make only a single request to the backend for all the data it needed to build the page. The request was handled by a service aggregator that would then manage the 20-50 service requests for data and build a JSON response in the specified SLA of less than one second (for the home page). By specifying a single request, we could let the service aggregator determine what calls to cut-off and what data to serve from cache to meet the page SLA. We also had a single point where we could add or remove functionality depending on the load. If the load was light, we could add a few more personalization services such as recommendations. If we were at peak loads, we could turn off all personalization and even some of the dynamic page elements to lessen the load on our servers. We created a highly dynamic Home Page that was tuneable given the system load.

However, to build just the home page we needed many teams to work together and deliver simultaneously. We had a new front-end team designing the Backbone based page structure and components. We had a team building a front end controller framework to drive the UX. Another team was building the service aggregation and distributed service management layer (sadly this was right before Hystrix became available). A team was building up a new distributed caching layer to gather together all the data needed to drive the home page. We continued the team that was extracting the item catalog from ATG and distributing it across our datacenters and AWS. And we imagined a team building a new customer management system we called Customer Graph, although we only had an architect working on that project. To ease the gridlock in our lower environments, we kicked off an OpenStack based lower environment PaaS just because we didn’t have enough other things going on. Topping it all off, our digital operations teams was learning how to manage systems in the AWS cloud.

We did have one final team, their job was to Mavenize the ATG build. This might sound trivial but it was the most harrowing undertaking of them all. We felt we needed to try to modernize the ATG deployment process to allow teams to move faster as we estimated it would take at least three years to exit the ATG platform. To make that palatable, we had to automate the entire ATG deployment process. Given that the current build process was a 10,000 line recursive ANT script, we put some of our most masochistic personalities on it. Besides automating ATG, they had to separate the thirteen intertwined applications that were deployed together as an ear file.  A 2GB ear file.

Goto Part XVI

Layered Cloud versus Hybrid Cloud Architecture

We had a great week at Openstack Summit in Portland.  See the article in Wired magazine for a short summary.  Or watch the Best Buy Openstack keynote.

One thing I learned from three days at the Openstack Summit is that I have always misconstrued the definition of Hybrid cloud architecture.  When we started making plans for our cloud architecture, I always thought of it as a Hybrid cloud.  At Openstack, there were numerous presentation on Hybrid cloud and all of them revolved around using the cloud to provide additional scaling for an application that runs in the datacenter.  In all cases, the datacenter architecture stack was simply recreated in the cloud and used for peak load.  The database master is in the datacenter and a slave exists in the cloud.  The Hybrid cloud architecture simply means using a cloud to elastically horizontally scale an existing application.

When I originally thought about Hybrid cloud I thought of an application that has one or more layers in the cloud, and the remaining layers in the datacenter.  I now call this a Layered Cloud architecture.  In our case we built our new product browse capability in the cloud and kept the remaining application in the datacenter.  All the data in the cloud was non-secure, basically public data so there was little to no security issues.  We are keeping the commerce pipeline in the datacenter simply because it is easier to keep the commerce data and transactions in our secure datacenter.

joel-breakout-sessionThis is a good example of assumptions clouding my view of reality.  I’ve read plenty of articles and information about Hybrid cloud, but until I was sitting in a presentation having someone tell me about Hybrid cloud, I never noticed my definition was incorrect.   Than after recognizing this, I watched every presentation to determine which definition was used more frequently.  Unfortunately for me, all the definitions were the same and they did not support my original view.