Part XVI of a multipart series, to start at the beginning, goto Part I.
As much as I love disparaging ATG, it was actually built as an all-inclusive platform, which, at the time, was a modern way to build systems. ATG had a built in Inversion of Control framework prior to Spring. It had a set of tag libraries and did its best to help teams build ecommerce sites quickly in a J2EE application server manner. It just wasn’t ever meant to scale to level we needed in 2012, the architecture was wrong. Everything you did trigged calls to the single database instance, there was no caching so a promotion calculation would be run every time a customer wanted to see the promotional price. With two million concurrent users, all interested in the latest promotions, this is a disaster. Any problem in any part of the system brought down the whole application. We needed to evolve from monolith to distributed.
Similarly, by 2012, deploying massive EAR files had been a dead architecture choice for at least five years. Tying everything together into one massive deployment was the opposite direction of the Agile/DevOps movement springing up with the Continuous Deployment mania.
But this was the legacy we faced at TWLER (The World’s Largest Electronics Retailer). We were ten years into unchecked ATG development with thirteen applications deployed on the platform, all rolled up into that 2GB EAR. You can only imagine the quarterly deployment nightmare, literally a waking nightmare as all deployments were overnight affairs, but that’s the fodder for another post.
We dedicated a team to figuring out how to deconstruct our ATG implementation and code. The codebase was series of interconnected, circularly coupled, packages. There were five or ten main packages that were used by all the applications so that a change to one package, would inevitably break 2-3 applications. Thus, an eight week QA cycle was necessary to ensure all the integrated code from 1000s of engineers would result in thirteen working applications.
But if you can’t throw it out, and you can’t break functionality, what do you do?
The first course of action was to modernize the build. It wasn’t actually possible to build the ATG application on a developer machine. The only way to build the application was by kicking off a build in the developer integration environment. Thus, if you were an engineer working on ATG, you could make a bunch of changes to your code, check them in to the CVS source control system, and kick off a build. Imagine 1000 engineers all trying to do this at the same time.
The developer integration environment was overwhelmed by the number of build requests so instead of on-demand, it would schedule 2-3 builds per day. So 1000 engineers trying to get their code in for 30 different projects to build together, guess the result of the build? If you guessed broken, than good job! You’ve worked in an enterprise environment in the mid-2000s!
To solve all these problems wasn’t easy, but the playbook is straightforward:
- We had to automate the build so anyone could run it.
- We had to separate the applications so you could just build the one you were working on.
- We had to come up with a solution for the developer laptop, so you could build and run the ATG server locally.
- We had to decide what portions of the ATG code to invest in and what to throw out.
- We had to break the dependencies between the ATG packages so developers could work on things in isolation.
- We had to automate the changes in the database so a local DB could be built and modified on the developer’s machine.
- We had to fix the integration environment so if wasn’t constantly failing.
Many might think this was a throwaway investment and not worth the time or effort. We faced those challenges internally to TWLER as well. But, we were able to convince leadership that in order to move quickly in the future, we needed to move faster now in our legacy architecture. We would be using portions of ATG for the next three years at least, and we wouldn’t be able to complete our new work without augmenting the existing systems.
Looking back on it, I’m happy to report that the decision to invest in ATG as we dismantled it, was a good one.
Goto Part XVII