I recently had the opportunity to sit down with Kelly Goetsch and Dirk Hoerig to record an interview on Target’s engineering culture as part of their Commerce Tomorrow podcast. Kelly was in town for the Open Source North conference which we were both speaking at. We sat down in the Target recording studio to tape the show. I’ve been using the studio to create an internal podcast for Target Engineering so we had an audio engineer and figured out how to include Dirk from Germany.
Part XXI of a multipart series. To start at the beginning goto Part I.
This is the last entry before we actually get to Holiday 2012. But before we get there I would be remiss if I didn’t document the organizational changes that happened over the course of 2012, and how we fought to keep our project intact. If you go back to 2012 and take a look at the TWLER stock price, you’ll see that bad things were afoot. The stock price dropped from somewhere around $25 to under $12 in the first few months of the year. Things needed a shakeup badly.
The CEO of TWLER in early 2012 had been brought up in the stores context and largely ignored the digital channel. But pressure to bring in some youthful talent must have been high, as in March of 2012 we learned we had a new President of Digital we’ll call Esteban Schick, or ES from now on. ES had the digital world and also took over IT, becoming the acting CIO as the current CIO was forced out. That meant we all had a new EVP over digital, and all investments would be reviewed.
But, before that could happen, more wacky changes occurred. Literally the next month, April 2012, the TWLER CEO was forced to resign due to inappropriate conduct that you’ll have to figure out for yourself. Additionally, the CMO left a few months later. With an interim CEO, ES was granted marketing and effectively was in charge of half the company after being with TWLER for only a few months.
Well, this kept ES quite busy so the review of all investments in IT and Digital went quite slowly. It wasn’t until mid-summer of 2012 that we finally got in front of ES to review the first $13M investment to rewrite TWLER.com. We brought our architecture team and business leads to this meeting with the necessary deck and strategic direction. As we introduced ourselves, we went around the table and seven people introduced themselves as various architects for TWLER.com. ES was not impressed, and after we finished introducing ourselves, his first statement was “at EstrellaDinero (pseudonym), we had one architect, everyone else was engineers.”
In 2018, the fact that a large company no longer employs architects is not a shocking statement; much of the market is going in this direction. In 2012 in the Midwest, this was devastating to my pride. I had spent my career building up to Chief Architect at a Fortune 50. Now, architecture was being disparaged by our new tech and digital leader!
I hid my feelings well and started in on our presentation. We always started with how we operated, rather than what we were building. We wanted to ensure we distinguished ourselves from the rest of TWLER by outlining our Agile/DevOps/Co-located/Small team strategy. No one else at TWLER was operating at scale in Agile. We had built up a team of close to 100 engineers, all on-site, working on 13 different teams. I’d hazard to say it was one of the largest all-Agile teams in the Twin Cities at the time.
As we worked through the presentation, ES started to appear more comfortable and asked a lot of questions, probing our understanding of Agile and Engineering practices. After the first 45 minutes of a scheduled hour, he was now energetic and highly engaged. His comment was, “this is the first team I’ve met that understands software engineering.”
Now ES was an EVP, and EVPs have tight schedules and people wait months to meet with them. But at this point, ES called his admin and rescheduled to stay another hour with us. I knew we had won him over at this point and dove into the technology strategy to rewrite TWLER.com.
How we built TWLER.com is the subject of most of this blog, but at a high level we were building a service-oriented (no microservices yet!) distributed cloud-based auto-scaling multi-layered cloud and datacenter zero downtime ecommerce platform.
ES was impressed with our team, said keep up the good work, and basically left us on our own for the rest of his tenure. Which lasted until December 2012.
There were two other notable changes prior to Holiday of 2012. TWLER hired a new CEO we’ll just call HJ, and ES hired a President of Digital to take over that part of the company for him, ScottyD.
With a new CEO and a new Digital leader, we ended up having two more similar investment reviews in 2012. Remember, this was the first year of completely converting TWLER.com to a new ecommerce platform architecture. It would have been an easy time for a new CEO or CDO to make their mark on the company by bringing in some “real experts” to take over the dotcom rebuild. In fact, the rumors were that the leadership team had multiple talks with large outside vendors to determine how they could take over TWLER.com. I’m not privy to what actually occurred in those talks and meetings, but in our time reviewing the project with the CEO and CDO, we did impress them with our direction and strategy.
To recap, from November of 2011 to December of 2012, I sold three EVPs and a CEO, in four separate meetings, on the fact that we were the best team possible to rebuild the fourth highest scaling ecommerce site in North America, with revenue approaching $2B. At the end of 2012 we were still there, so it must have worked.
Part XX of a multipart series, goto the beginning with Part I.
The next phase of our move to the cloud in 2012 was to build the Cloud Product Detail Page (PDP). The detail pages are where you land when you select a product from a list or advertisement. They generally have the price, product description, reviews, questions and the all-important Add to Cart button.
Cloud Home Page was the first foray into moving TWLER.com to the cloud, but we needed to move more parts of the site to the cloud to reach the scale we were looking for during Holiday. To achieve the goals of our architecture, we were attempting to move the 90+% of site traffic that is people landing on the home page, searching for products and viewing product details. In an eCommerce site, the majority of traffic is simply looking at products and learning about them; only a small portion of traffic makes it to the checkout process.
The Home Page was a fairly straightforward task since we were redesigning the whole page as we rebuilt it. The PDP was much more difficult. First, there were eleven PDPs which all spawned from a single PDP five or ten years ago. Second, the content in the content management system contained custom HTML embedded within the content making it extremely difficult to reuse the existing content. But with 500,000 items in the content system, it wasn’t possible to rework all the content while building out the new PDPs. Besides, the way we were building out the system, the business wanted the ability to fall back on the original PDPs in case the new ones failed. We hadn’t yet gained the trust of the business teams in our first year of rebuilding TWLER.com.
We decided an engineering solution was the only likely way to solve this problem of bridging the old content to the new system. We set some enterprising engineers on the problem; we simply tasked them with figuring out a way to make the existing content work with the new page templates.
We then set out to determine how many new PDP templates we could accomplish by Holiday of 2012. We decided to simply tackle the two highest trafficked PDP templates, computers and TVs. These two PDPs accounted for well over 50% of the traffic to the site. The schedule for the remaining PDP templates went until Holiday 2013.
But there was nothing easy about this task. We struggled with the business to define the new look of the PDPs. We needed to build a framework that allowed most of the PDP to be cached at the Content Delivery Network (CDN) so we could achieve 80-90% page offload to the CDN. We needed to push all that content out of the datacenter for the first time, and have it cached in the cloud. We needed to do this across multiple cloud availability zones. And we needed to meet our page construction SLAs of less than two seconds.
At the time, there were no other retailers pushing into clouds that we knew of, so we were designing the architecture as we went, working with the problems as they came. We tackled content replication via NoSQL databases in a hub and spoke model, only pushing the item data to the cloud and not allowing updates to that content. We built a massive service aggregation forward cache using Memcached that we used to cache the majority of service calls. Netflix Hystrix came out right at this time and we quickly switched from the homegrown version we had been making to the Netflix distributed service management framework. We had to add dynamic page construction to allow for a small amount of personalization. The problems of running a PDP out of a cloud were numerous and difficult.
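To illustrate the kind of protection a distributed service management layer provides, here is a minimal circuit breaker sketch in Python. It is not the Hystrix API and not the homegrown TWLER code; the class name, thresholds and the injected clock are all assumptions made for illustration. The idea is that after a few consecutive failures the caller stops hitting the struggling service and serves a fallback (for example, a cached value) until a cool-down period has passed.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: skip a failing service and use a fallback instead."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, func, fallback):
        if self.is_open():
            return fallback()  # do not touch the struggling service at all
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success closes the circuit fully
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_reviews, lambda: cached_reviews)`, where both names are placeholders.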
The overarching Architecture principle throughout this whole effort was to serve all browse traffic completely out of the cloud. That meant getting all product information, pricing, promotions and inventory to the cloud. For this year, we settled on drawing inventory from the datacenter and only calculating promotions when people entered the checkout process. This required business compromises such as language in the cart that said “promotions will be applied at checkout.” But without these compromises, there was simply too much work to get done.
We somehow managed to spin up all the teams necessary to make the new PDP a reality, and get the work done. We relied on great engineers solving problems on their own rather than waiting for high level decisions or guidance. They didn’t fail us. In fact, they worked out a way to strip out all the HTML tags from the content system on the fly so that the existing content would work for both the new and old PDPs simultaneously.
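The post does not describe how the engineers stripped the tags, so the following is only a minimal sketch of the general idea, using Python's standard `html.parser` module. The function name `strip_html` and the whitespace handling are assumptions. The original content is left untouched in the content system and the markup is removed at read time, so the old PDPs can keep using the embedded HTML while the new templates receive plain text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes and drops every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(content: str) -> str:
    """Return the content with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())
```

A real implementation would need more care than this, for example skipping the bodies of `script` and `style` elements, which this sketch would pass through as text.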
Chalk another win up for great engineering!
We were now ready to enter Holiday 2012 with two major pieces of cloud browsing in place, Cloud Home Page and Cloud PDP.
Goto Part XXI
Part XIX of a multipart series. Start at the beginning with Part I.
As we eased into 2012 and we continued to build up our teams at TWLER.com, the IT leadership decided that they needed to get their integrators more information on the architecture and technology we were pursuing. Over the course of a few weeks, I proceeded to have 2+ hour meetings with all the major IT integrators, C* G*, I**, Acc*, T*T*, W*P* and a few others. I think they thought I didn’t understand the value brought by these integrators, and that talking to them about my architecture and direction for TWLER.com might somehow win me over. I proceeded to give my well-oiled architecture presentation to each of the integration teams. At this point I’d probably given this presentation a hundred times. The integrators had sent their top technical architects and salespeople to these meetings. They would have been better off without the salespeople waylaying the conversation, but the tech people tried to tone them down.
They were all very excited to hear about what we were building and eager to bring in consultants to help me build the systems. It was just a matter of signing off on the work orders and people would start showing up next week. That’s not how I wanted to operate; here’s how you manage integrators:
- If they wanted to be part of the project, they needed to send me their three best candidates
- I would interview each of the candidates, and if they passed the screening, they could start as individual contributors on one of my projects
- I expect that they would stay at least one year
- If they performed well, in a couple months we would interview more of their candidates
- If they continued to perform well, and their resources were not removed from our account, we would continue to ramp up individuals
- At this rate, you might have 20 people on the account in a year’s time
For some reason, this set of criteria did not sit well with the integrators or the candidates. If you understand the incentive system of the integrators, then constructing a hiring process to make your project unappealing is fairly easy. The lead sales people are only interested in placing consultants onsite. They are incentivized by the number of people placed and number of hours billed. Since I was only looking for top tech talent, the individuals that showed up for the interviews were some of the best technical minds at their companies. Their job is to start up tech teams, pile on a hundred people in 3-6 months, and move on to the next company.
Probably half the candidates that were sent to interview were good enough to join the team and act as one of the coders. However, during the interview process I made it clear that I expected them to stay one year, and there would be no account building during that time. They were expected to code 100% of their billable hours. If they spent any time wandering the company looking for new projects, I would kick them out.
Given all those criteria, not a single consultant that came through was placed on our teams. The technical folks thought the project would be a great experience, but they were all in career building mode in their firms. My project would set them back a year against their peers, as they would only account for their own billable hours.
The sales people weren’t interested either; a ramp up of 20 people in a year is nothing for them and not worth their time.
After about two months of this charade, our IT team finally got angry. They decided to sit in on one of the interviews to determine why we either kept rejecting candidates, or the ones we liked declined to show up. I thought that was a great idea, and said they should get one of the integrators to send their best candidate to the interview.
So we scheduled an interview with T*T* later that week, I was sent a resume of a lead engineer that actually looked pretty good. When the day arrived and the interview was about to start, I met one of our IT Senior Directors at the interview room. The candidate was escorted in by the T*T* representative and we started introductions. It turned out the candidate wasn’t the person from the resume.
Well, that was a surprise and the TWLER Senior Director was ticked. But I said, give me this new person’s resume and let’s do the interview. So we all sat down with the new person they had sent in as their “best candidate” and started into the questions. This all turned out rather badly because the new candidate had no web background and rather limited technology skills. The interview consisted of repeated failures to answer easy, junior-level software engineering questions. Like, “what are the four basic SQL commands?” It was an unmitigated disaster driven home by my unmerciful continuation of questions well beyond the point where everyone knew the interview was over.
Afterwards, the TWLER Senior Director was actually abashed and said he’d pull back on the interview process with the integrators. Again, I thought I had won, but I continued to underestimate the persistence of the TWLER IT management team to instill amateur engineering hour on the TWLER.com project.
Goto Part XX
Part XVIII in a multipart series, to start at the beginning, goto Part I.
2012 was the year that I really began working for the VP of Operations at TWLER, Judy D. In 2011, the Chief Architect for TWLER decided to leave as the investment in the rewrite of TWLER.com was cut in half, see Part VII for more on that episode. Over the course of 2011, I became the main point of contact for the team, but didn’t completely take over till later that year.
To this point in my career, I had been leading many large Enterprise Java programs, mainly as a consultant, and had just finished an MBA. I had about 18 months of pure engineering management at United Health Group, having a team of about 25 writing online wellness programs. However, that management position didn’t end well, and I learned much and more about politics at large companies.
I did not have an extensive track record of management, but because I had spearheaded the funding effort and defined the vision for the future Ecommerce platform, I became the manager of the team and was promoted to Director. Much of the work we did is detailed here, or will be detailed here in the future, today the discussion is about having a great manager as coach.
I’m going to say, as a manager, I was very rough. During my time at UHG, I had four managers in 18 months, so I received very little direction or coaching. As the lead Architect and Engineer on many programs, I was only interested in software competency and delivering systems. If you couldn’t help me do that, you were dead to me. I was rightly accused of being blunt, straightforward and lacking empathy at this point in my career. I didn’t really care, I knew how to hire great engineers, build teams where everyone had fun while delivering more than asked, and being a total pain-in-the-ass to whoever was lucky enough to manage me.
But Judy decided to invest her time and started giving me feedback on how I was acting and how I was being perceived. It was the first time in my career that someone had explained perception being reality, not logic and data. In meetings with the VPs and Senior Directors in IT, I would get worked up because they knew how to push my buttons. Once you’re in attack mode, you’ve lost the room; everyone thinks that if you can’t manage yourself here, you must not be a good manager overall. Judy helped me understand how to deflect criticisms and attacks on my project’s direction, by acknowledging the speaker’s point, and then presenting why I thought our direction would work. Never directly saying the other person was wrong, just that our current direction was working.
Learning the managerial arts of controlling meetings, protecting my reputation, and letting others be right (even when they were wrong), helped my project thrive by giving me the tools to fend off the IT team without offending them or giving them ammunition to take back to their leaders.
Judy always took the high road. She didn’t care if others were being assholes; it wasn’t an excuse to be an asshole too. She would always let me know she’d heard about my latest escapades at meetings across the company. She no longer wanted to hear negative news; she only wanted to hear how I was helping people, nothing else.
This process was long; Judy would debrief me after meetings and point out exactly where I lost the room and when I went off on a rant. But I was willing to listen, and I corrected my mistakes. Over time, it was as if I had been in management my entire career. I had learned to manage my reactions, while also managing the outcomes of the meetings and the perception of my team and myself.
I can’t thank Judy enough for the coaching effort she put in to teach me to be a Director at TWLER, and later a VP for Fortune 100 companies.
Goto Part XIX
Part XVII of a multipart series, to start at the beginning, goto Part I.
In the ongoing development of TWLER.com (The World’s Largest Electronics Retailer), the actual process of building the system was severely neglected over the years. The first thing that we established as a team back in 2010 was a new set of build infrastructure (Git, Jenkins, Artifactory, Chef) so that engineers had actual continuous integration tools to work with.
Aside: Continuous Delivery was just appearing at this time. We had Jez Humble, author of the seminal Continuous Delivery book, in to talk to us about his ideas in early 2011. This was a small gathering of about 15 engineers and architects who were interested in the concept that we eventually implemented at TWLER.com.
The biggest problem with writing software at TWLER was the lower environments. As we were breaking up the monolithic ATG application into distributed components, the lower environments were still geared towards single ATG application delivery. We had DEV, TEST, INT, STAGE, BREAKFIX, and PROD environments. STAGE and BREAKFIX were marginally close replicas of PROD, but it was really not possible to create true replica environments. The environments were not well segregated either; on numerous occasions we found PROD applications accessing services and data from non-PROD environments. In the overwhelming realm of things to fix, this wasn’t even a top priority.
We decided we had to replace the DEV and TEST environments. DEV was basically non-functional; it was one large server that all 1000+ engineers were trying to use to integrate their code. Builds constantly failed, and even when they succeeded the results were often unusable. It could take you all day to just try and get one change built and hope the results were something you could work with. Most people just skipped DEV and went right to TEST, which was an actual managed environment.
We decided that a lower environment PaaS was the direction we wanted to go. We wanted every engineer at TWLER to be able to spin up every application at TWLER in their own private sandbox. That way engineers would have the ability to test their systems in isolation, while still using the latest component versions of everyone else’s system.
To achieve that we decided to build an OpenStack environment and a homegrown PaaS. But how to get the capital to create the OpenStack environment? At the time, mid-2011, I had excess dollars assigned as resource capital, which was meant to be used only for hiring engineers. In my projects at the time, we were still proving out the chosen architecture and were ending the year with excess resource capital. TWLER was extremely controlling of capital; you either had resource capital or hardware investment capital, but not both.
So I went on a reconnaissance mission to find someone with investment capital that needed resources to see if we could trade. After a couple weeks of hunting, and getting the financial managers of TWLER involved, we finally found a team that was willing to trade their investment capital for resources. We swapped $150k and used that money to buy three OpenCompute racks to build out our OpenStack environment. We estimated with high density OpenCompute, we could house about 5000 VMs on OpenStack. If we stuck with the company standard hardware, HP, we would only be able to build out an environment that housed 1000 VMs with those dollars.
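The hardware choice is easier to see as a cost per VM. The short calculation below uses only the figures quoted above and assumes, for simplicity, that the full $150k went to hardware in either scenario; the variable names are just for illustration.

```python
# Figures taken from the paragraph above.
budget_dollars = 150_000
opencompute_vms = 5_000   # estimated capacity on high density OpenCompute
standard_vms = 1_000      # estimated capacity on company standard hardware

# Assumption: the whole budget is spent on hardware in both cases.
cost_per_vm_opencompute = budget_dollars / opencompute_vms
cost_per_vm_standard = budget_dollars / standard_vms
density_advantage = opencompute_vms / standard_vms

print(f"OpenCompute: ${cost_per_vm_opencompute:.0f} per VM")
print(f"Standard:    ${cost_per_vm_standard:.0f} per VM")
print(f"Advantage:   {density_advantage:.0f}x more VMs for the same dollars")
```

Under those assumptions the OpenCompute route works out to about $30 of hardware per VM against $150, a fivefold difference.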
We knew that if we were successful, we would want to expand the OpenStack environment quickly. We decided on full automation from bare metal to operating OpenStack and proceeded down the path using Crowbar and Chef to build out the roll-in racks to OpenStack. At the same time we started building out the PaaS framework, Omnitank, and getting teams on-board to truly fully automate their deployments so their components could be included in the PaaS. This adventure was documented at OpenStack summit as one of the keynotes in front of 2000+ attendees.
This PaaS was so successful, we were able to shut down our DEV and TEST environments. We also allowed engineers from around the company to use the OpenStack environment. We actually had hundreds of forward thinking engineers from the enterprise side of TWLER using our OpenStack PaaS as it was the fastest way to get a test environment created.
It was so successful that when we had our first major outage and lost all environments, there were numerous enterprise teams that were down for weeks rebuilding their environments. We used this as a lesson to learn which teams actually automated their infrastructure, and which teams were polishing virtual chrome. But we did have to explain to IT VPs that this whole environment was still in beta, and it was use at your own risk.
Goto Part XVIII
Part XVI of a multipart series, to start at the beginning, goto Part I.
As much as I love disparaging ATG, it was actually built as an all-inclusive platform, which, at the time, was a modern way to build systems. ATG had a built in Inversion of Control framework prior to Spring. It had a set of tag libraries and did its best to help teams build ecommerce sites quickly in a J2EE application server manner. It just wasn’t ever meant to scale to the level we needed in 2012; the architecture was wrong. Everything you did triggered calls to the single database instance, and there was no caching, so a promotion calculation would be run every time a customer wanted to see the promotional price. With two million concurrent users, all interested in the latest promotions, this is a disaster. Any problem in any part of the system brought down the whole application. We needed to evolve from monolith to distributed.
Similarly, by 2012, deploying massive EAR files had been a dead architecture choice for at least five years. Tying everything together into one massive deployment was the opposite direction of the Agile/DevOps movement springing up with the Continuous Deployment mania.
But this was the legacy we faced at TWLER (The World’s Largest Electronics Retailer). We were ten years into unchecked ATG development with thirteen applications deployed on the platform, all rolled up into that 2GB EAR. You can only imagine the quarterly deployment nightmare, literally a waking nightmare as all deployments were overnight affairs, but that’s the fodder for another post.
We dedicated a team to figuring out how to deconstruct our ATG implementation and code. The codebase was a series of interconnected, circularly coupled packages. There were five or ten main packages that were used by all the applications, so that a change to one package would inevitably break 2-3 applications. Thus, an eight week QA cycle was necessary to ensure all the integrated code from 1000s of engineers would result in thirteen working applications.
But if you can’t throw it out, and you can’t break functionality, what do you do?
The first course of action was to modernize the build. It wasn’t actually possible to build the ATG application on a developer machine. The only way to build the application was by kicking off a build in the developer integration environment. Thus, if you were an engineer working on ATG, you could make a bunch of changes to your code, check them in to the CVS source control system, and kick off a build. Imagine 1000 engineers all trying to do this at the same time.
The developer integration environment was overwhelmed by the number of build requests, so instead of on-demand, it would schedule 2-3 builds per day. So 1000 engineers trying to get their code in for 30 different projects to build together, guess the result of the build? If you guessed broken, then good job! You’ve worked in an enterprise environment in the mid-2000s!
To solve all these problems wasn’t easy, but the playbook is straightforward:
- We had to automate the build so anyone could run it.
- We had to separate the applications so you could just build the one you were working on.
- We had to come up with a solution for the developer laptop, so you could build and run the ATG server locally.
- We had to decide what portions of the ATG code to invest in and what to throw out.
- We had to break the dependencies between the ATG packages so developers could work on things in isolation.
- We had to automate the changes in the database so a local DB could be built and modified on the developer’s machine.
- We had to fix the integration environment so it wasn’t constantly failing.
Many might think this was a throwaway investment and not worth the time or effort. We faced those challenges internally to TWLER as well. But, we were able to convince leadership that in order to move quickly in the future, we needed to move faster now in our legacy architecture. We would be using portions of ATG for the next three years at least, and we wouldn’t be able to complete our new work without augmenting the existing systems.
Looking back on it, I’m happy to report that the decision to invest in ATG as we dismantled it, was a good one.
Goto Part XVII
Part XV in a multipart story. To start at the beginning goto Part I.
Our goals for 2012 were to deliver two things, a new browse architecture for TWLER.com and a Holiday without issues. Not a small task for a team saddled down with a giant monolithic application and a mandate to deliver features and not affect the business during the rewrite.
We started in on the new browse architecture. It was our feeling that if we could simply scale up the browse capabilities of TWLER.com, we had a chance of making our aging ATG monolithic application survive another Holiday.
About 98% of traffic on an ecommerce system is people browsing the site; the other 2% is people actually trying to buy stuff. If you are planning for a 10X increase in traffic for one week out of the year, then attacking the 98% seemed like a good place to start. Also, since we were dealing with a monolith, any traffic we removed from the 150 node ATG cluster was more power towards the checkout process. In fact, as we did the math, if we took off 70% of the traffic in the first year at Holiday, we’d actually have 3X the capacity we needed for the checkout process and remaining components left on the ATG servers.
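One way to read that math is as a simple headroom calculation. The sketch below assumes that load on the ATG cluster scales linearly with the share of traffic it serves, which is a simplification and not something the original planning necessarily relied on; the function name is hypothetical.

```python
def headroom_multiplier(offload_fraction: float) -> float:
    """How much more capacity the same cluster has for the traffic that remains.

    Assumes load is proportional to the fraction of traffic still served.
    """
    if not 0.0 <= offload_fraction < 1.0:
        raise ValueError("offload_fraction must be in [0, 1)")
    return 1.0 / (1.0 - offload_fraction)


# Moving 70% of traffic off the cluster leaves 30% behind,
# so the same nodes have roughly 3.3x the capacity for what remains.
print(round(headroom_multiplier(0.70), 1))
```

Under that assumption, offloading 70% of requests gives a little over 3X, in line with the figure above.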
We started with a project we called Cloud Home Page. The home page is the most served page on the site and at Holiday we had to make it static and cache it at the CDN. The business teams didn’t like this because, without dynamic content, there wasn’t any way to adjust what people were seeing as they landed on TWLER.com and lead them towards the new sales events. The Cloud Home Page plan was for a dynamic, cloud-based home page, with minimal personalization but modifiable within a 15 minute window.
That is, during a request for TWLER.com, the controller would make only a single request to the backend for all the data it needed to build the page. The request was handled by a service aggregator that would then manage the 20-50 service requests for data and build a JSON response in the specified SLA of less than one second (for the home page). By specifying a single request, we could let the service aggregator determine what calls to cut-off and what data to serve from cache to meet the page SLA. We also had a single point where we could add or remove functionality depending on the load. If the load was light, we could add a few more personalization services such as recommendations. If we were at peak loads, we could turn off all personalization and even some of the dynamic page elements to lessen the load on our servers. We created a highly dynamic Home Page that was tuneable given the system load.
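The aggregator behavior described here can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not the TWLER implementation: the `aggregate` function, the `(fetch, optional)` service tuples, the dictionary cache and the `peak_load` flag are all hypothetical names. It shows the three ideas from the paragraph: one call fans out to many services, calls that would blow the page SLA are cut off and served from cache, and optional services such as personalization are skipped under peak load.

```python
import asyncio


async def aggregate(services, cache, budget_s=1.0, peak_load=False):
    """Fan out to backend services and build one response within the SLA.

    services: dict of name -> (coroutine function, is_optional)
    cache:    dict of last known good responses, updated on success
    """
    # Under peak load, optional services (e.g. recommendations) are not called.
    active = {
        name: asyncio.create_task(fetch())
        for name, (fetch, optional) in services.items()
        if not (peak_load and optional)
    }
    if not active:
        return {}

    done, pending = await asyncio.wait(active.values(), timeout=budget_s)
    for task in pending:
        task.cancel()  # cut off calls that would exceed the page SLA

    page = {}
    for name, task in active.items():
        if task in done and task.exception() is None:
            page[name] = task.result()
            cache[name] = page[name]   # remember the fresh value
        elif name in cache:
            page[name] = cache[name]   # slow or failed: serve cached data
    return page
```

A controller would then make a single call such as `await aggregate(services, cache, budget_s=1.0, peak_load=is_peak)` and render whatever comes back, with `is_peak` supplied by whatever load signal is available.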
However, to build just the home page we needed many teams to work together and deliver simultaneously. We had a new front-end team designing the Backbone based page structure and components. We had a team building a front end controller framework to drive the UX. Another team was building the service aggregation and distributed service management layer (sadly this was right before Hystrix became available). A team was building up a new distributed caching layer to gather together all the data needed to drive the home page. We continued the team that was extracting the item catalog from ATG and distributing it across our datacenters and AWS. And we imagined a team building a new customer management system we called Customer Graph, although we only had an architect working on that project. To ease the gridlock in our lower environments, we kicked off an OpenStack based lower environment PaaS just because we didn’t have enough other things going on. Topping it all off, our digital operations team was learning how to manage systems in the AWS cloud.
We did have one final team, whose job was to Mavenize the ATG build. This might sound trivial, but it was the most harrowing undertaking of them all. We felt we needed to modernize the ATG deployment process to allow teams to move faster, as we estimated it would take at least three years to exit the ATG platform. To make that palatable, we had to automate the entire ATG deployment process. Given that the current build process was a 10,000-line recursive Ant script, we put some of our most masochistic personalities on it. Besides automating ATG, they had to separate the thirteen intertwined applications that were deployed together as an EAR file. A 2GB EAR file.
Goto Part XVI
Part XIV of a multipart story, to start at the beginning goto Part 1.
Let’s get back to the main story. It was now December of 2011, and we had $13M to start rebuilding TWLER.com. Once you get a pile of money at TWLER, the large IT Integrators come out of the woodwork, aided and abetted by the TWLER IT leadership, who had generally recently transferred from a large IT Integrator whose name begins with A. But we weren’t playing complete favorites; we also had our choice of IT Integrators that started with I, or had initials like C.G. We also had our two favorite India-based consultancies, T.T and W.P.
The point was that once someone had a pile of capital to spend, they had to spend it on one, or better, two of the IT Integrators. Why? This was explained to me many times:
1. We have enterprise contracts with the integrators
2. These contracts guaranteed a certain amount of spend with the integrators
3. I had money to spend, thus it needed to go to an Integrator so we had enough dollars spread around to meet those guarantees
4. By having multiple choices for integrators, we could get “the best value” for TWLER
5. The Integrators were under contract to produce a team of up to 100 developers within a week
6. If they didn’t do that, we could fine them for not meeting the contract
7. We didn’t care if we got shitty engineers
Well, I made up number 7, but it was the unspoken truth.
If the first six reasons didn’t boil your blood, number 7 was more than I could take. I pushed back, told the Senior Director and VPs in IT that I wasn’t interested, and proceeded to start building my own teams. I thought that was that, but it turns out the effort to force bad engineering was stronger than I realized.
We put together an engineer interview for ourselves that covered ability to code, Agile mindset, open source knowledge, and web architecture (which at the time was still generally RDBMS driven with Java EE Servlets on Tomcat). The interview had many open-ended questions, forced people to think, forced them to code, and let us know whether they understood Agile software practices. We also ruthlessly let go of engineers who slipped through the interview but proved they couldn’t code in the first few weeks on the job. We were tough interviewers and might accept one out of eight interviewees. I’ll never forget how Michael N. described an interview he sat in with me during the previous year: “It felt like you were conducting a cross examination of an unfriendly witness.” I will admit, my interview style at the time lacked charm, but it did find engineering depth.
The interviewing process takes time and tons of energy. In mid-December of 2011, I personally interviewed 40 people in one week. That sounds crazy but we were trying to ramp up from a team of around 20, to a team of 100 in the next six months. I also knew I had to establish my teams, get them working, and start to show progress to fend off the IT vultures.
Back to those vultures: they did not go quietly. At least 2-3 times a week, I ended up in a meeting with Senior Directors or VPs of IT asking me why I hadn’t picked a vendor yet and what my project was doing. At the time I was a Senior Manager, 2-3 levels down the hierarchy. Luckily for me, hierarchies have never meant much to me; competency was all that drove me. Having spent years as a consultant, I was used to pushing back on upper management, eventually getting my way, and making everyone happy by delivering systems. I figured this would be no different. But the TWLER IT team was tenacious as well. I don’t think anyone had ever defied them so openly, and as they liked to remind me every day, I had the largest capital project running in the company! They couldn’t possibly be left out of that one!
The team was shielded from the intimidation tactics IT was using, and went on its merry way hiring as many great engineers as we could find. I pulled from my personal network, and once we had good people in, we leveraged theirs as well. By the end of January 2012, we had a team of 40 going. That was nothing like getting 100 people in one week, but adding 20 in a month and change (doubling the size of the team) was putting strain on our ability to operate.
I needed to start up 10 teams to get all the pieces in place for rebuilding the Home Page, PDP Page, and getting the base platform components and infrastructure in place. TWLER already had a UI team that was still separate from us at this point, and still working on ATG-based JSPs. We had two in place from the previous year, and each of the five remaining architects would take on one or two teams.
It was going into February 2012, and we were still only at half our desired staffing levels. The IT teams had not stopped pressuring me to bring on an integrator, and even my VP of Operations was losing patience with continually having to placate the VPs of IT.
Goto Part XV
Part XIII of a multipart series, to start at the beginning goto Part I.
If you’re keeping track at Part XII, we’ve gone through half of 2010 and all of 2011. It’s currently the last day of 2017, the temperature is -10F in Minneapolis going down to something around -20F later tonight. Whatever plans we might have come up with to celebrate the New Year got shelved for the option of staying home and being warm. Instead, let’s recap the story so far…
If you remember, I came to TWLER (The World’s Largest Electronics Retailer) in mid-2010 ostensibly to join a team rewriting TWLER.com. I took a role as a Hadoop Architect because for the previous two years I had been trying to start a Big Data consulting practice within the consulting company I was working for at the time. It was a few years too early for Big Data consulting so I jumped at the chance to put into practice everything I had been discussing with companies around town. But, that didn’t happen because the Hadoop project was killed before I even started. Instead I worked on Chef and learned Infrastructure as Code in the AWS cloud while establishing an automated testing team using JBehave to try and put in place functional testing of TWLER.com before making massive changes to the codebase.
While everyone knew TWLER.com was failing and needed help, there was no clear path and no engineering team that owned the site. It had been operated as a dumping ground for external integrators’ code for so long that everyone and no one claimed ownership. A separate team had sprung up to write the mobile site and their express purpose was to obsolete TWLER.com. The Enterprise IT team claimed ownership but the best they could do was point to all the developers from Accenture and Wipro working on various projects.
In these conditions, our team of seven architects began to claim ownership by asserting that, as employees of TWLER, we were responsible for the build and operation of the site. There was little resistance at first; no one wanted TWLER.com in 2010. It was seen as a sideline business, a necessary evil that distracted from the true engine of the company, big box stores.
However, the journey was not destined to be easy or straight, and the architect who hired us all left in March of 2011 as the funding for our rewrite was cut in half. It was in this limbo state that I gained the support of the architecture team and we collaborated to put forth a plan to rewrite TWLER.com. I began selling the plan across TWLER and gained a small amount of interest. But when the CEO declared we would double our ecommerce revenue in three years, my plan took off because it was the only plan available.
As we start 2018, I’ll harken back to 2012 and the twists and turns required to put in place the first layered, cloud-distributed, service-oriented ecommerce platform built by a major retailer.
Goto Part XIV