The Chief Architect’s Purpose

A few years ago I was attending the O’Reilly Software Architecture conference in London where I had prepared a talk on Target’s new vision for the future, a Target Retail Platform. After the conference I had a couple days of vacation and I stopped by an art exhibit at the Saatchi Gallery on Alexander Calder. It was there that I learned that Calder defined the Chief Architect’s purpose better than anyone.

The Calder quote at the top says: “I think I am a realist, because I make what I see. It’s only the problem of seeing it. If you can imagine a thing, then you can make it and tout de suite you’re a realist. The universe is real, but you can’t see it. You have to imagine it. Once you imagine it, you can be realistic about producing it.”

This quote struck me because I had spent the last few days thinking about architecture, and about how the architecture conference I’d been attending rarely talked about architecture. I had attempted to create a presentation on Target’s architecture vision: nothing about how we would build it or the technologies used, simply the vision, principles and structure that would guide a 3,000-engineer team to leave its current reality and build a new one, a retail platform. I would guess that 80% of the presentations at the conference were on microservices. It was 2017, and all I could think was that architects are behind the curve if this is their first introduction to microservices. However, the microservices presentations were far better attended than my own, so the joke was on me.

I loved this quote because I often used the term Architecture Realist when people would ask me about my architecture style. I was only interested in creating architectures that would and could be implemented. I had found that most Chief Architects rarely present anything that engineering teams can actually use to guide their development, and instead fall back to edicts on technologies.

But what really struck me was the simple definition of Calder’s artistic approach, which coalesced for me into the Chief Architect’s purpose.

“The universe is real, but you can’t see it. You have to imagine it.”

This is it, this is the purpose of being a Chief Architect. It’s your role to take yourself out of your current reality and imagine a new one.

“Once you imagine it, you can be realistic about producing it.”

Calder could then go about executing his vision via a painting or mobile. Producing what you imagine within a large enterprise is not such a straightforward task.

The presentation I had created, “Platform Architecture for Omnichannel Retail,” was my attempt to convey what I had imagined for Target. Communicating a vision so that 3,000 individuals can follow it is the Chief Architect’s job, not its purpose. I prepared the presentation for O’Reilly to force myself to document the vision in a way that could be consumed by a technical workforce.

Over the ensuing three years I’ve given some version of this presentation hundreds of times to small and large groups within Target. In the real world, producing an architecture means understanding and engagement from teams, which is best done in small groups where people feel comfortable asking hard questions.

What you’ll find is that you don’t always have the right answers; your reality, which you thought was perfect, was half baked. But if your principles are sound, working through the hard questions to answers that follow the principles of the architecture builds the patterns and practices necessary to produce the new reality.

For Target in 2020, the new reality is here, we have a Retail Platform.

The Simple Formula to IT Modernization

Having instigated and led through the digital transformation of BestBuy.com, and then moving to Target where we have been completely modernizing IT across the entire company, I’ve learned the simple formula to IT success.  While the formula is simple, the execution is what is inherently difficult.

In my first year at Best Buy, I spent months pitching how we would rebuild BestBuy.com and move it to the cloud back in 2010 and 2011.  At the time, this was a radical idea as clouds were new and unstable, and were considered highly insecure by corporate IT leaders.  The pitch was centered on the standard IT formula of People, Process and Technology.  This formula is good, but it also has a big miss, there is no emphasis on outcomes.  The pitch worked and I was granted $13M to start the rebuild of BestBuy.com, a project that took four years and well over $100M to complete.

While we continued to use the People, Process and Technology formula in our decks and communication to leadership, I managed my team of 200+ engineers to outcomes.  We had to deliver a flexible, maintainable, layered cloud ecommerce platform that scaled infinitely or we were failures.  We implemented Product Management and Agile and morphed into a Product Engineering team that brought BestBuy.com into the modern world and did our part in the overall turnaround of the company.

For more on that see my 22 part series on the Digital Transformation of BestBuy.com.

The People, Process, Technology formula was great for selling to VPs and EVPs, but the dissonance between the sales pitch and the implementation kept me wondering about a better way.  Then I moved to Target which helped me understand that I had found a better way, I just didn’t have the name for it yet.

At Target, our CIO is the most architecture centric CIO that’s ever existed.  Most CIOs pay lip service to architecture, but then hand it off to an Enterprise Architecture team and say “go implement architecture.”  But when the architects try, they are constantly overruled or ignored because all good architecture decisions require tradeoffs in feature delivery in the short term.  Without an overarching vision, no business or IT leaders will make feature tradeoffs.

With little support or understanding from the CIO, the various IT VPs are free to flout architecture rules or governance, and therefore go off and implement their locally optimized solutions.  This is the core cause of IT inconsistency and sprawl, and why every SOA ever designed failed at enterprise scale.

At Target, we’ve used a different formula consistently for the last four years:

  • Architecture First
  • Team Second
  • Value Third

How does this compare to People, Process and Technology?  Let’s analyze it.

People and Team sound the same, but they carry different connotations.  People is generic and generally boils down to something about only hiring A players, because B players hire C players out of insecurity.  This assumes you already somehow have a bunch of A players, and that you are an A player too.  This is obviously ridiculous.

Team, however, is about getting people of all types and capabilities to work together.  It’s about maximizing talent by having the group come together and create something more than the sum of its parts.  At Target we’ve strived to create a learning culture where the most granular breakdown of the organization is the team rather than the individual.  Inside the team, there are ranges of capabilities, often based on experience.  The environment encourages helping each other through pairing or mobbing, increasing everyone’s capabilities.

Process isn’t even part of the new formula though it is important.  Process actually gets rolled up into Value.  Value is what you are striving for, it’s the outcome of working on features and technology.  But if a feature doesn’t resonate with the customer, no actual value is delivered, although we have learned something that didn’t work.  Value, in the end, is how the customer perceives it, and how you measure it.  Delivery of value uses a process, in Target’s case Product and Agile.  Making Value measurable is the hard part.  Saying you delivered value by adding a new payment type is great, but measuring the impact in incremental sales through an experiment which tests whether a new payment type actually increases sales is better.
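To make the payment type example concrete, here is a minimal sketch of how such an experiment might be scored. It assumes a simple two-group test (a control group without the new payment type, a treatment group with it) and uses a standard two-proportion z-test. The function name and all the visitor and order counts are hypothetical illustrations, not Target data or Target’s actual method.

```python
import math


def conversion_lift(control_orders, control_visitors, treat_orders, treat_visitors):
    """Score a two-group experiment.

    Returns (relative_lift, z_score, two_sided_p_value).
    All inputs are hypothetical counts supplied by the caller.
    """
    p_control = control_orders / control_visitors
    p_treat = treat_orders / treat_visitors

    # Pooled conversion rate under the null hypothesis of "no difference".
    pooled = (control_orders + treat_orders) / (control_visitors + treat_visitors)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / treat_visitors)
    )

    z = (p_treat - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (p_treat - p_control) / p_control
    return relative_lift, z, p_value


# Made-up numbers: 10,000 visitors in each group.
lift, z, p = conversion_lift(500, 10_000, 560, 10_000)
print(f"lift={lift:.1%} z={z:.2f} p={p:.3f}")
```

The point of the sketch is the distinction in the paragraph above: shipping the payment type is output, while the measured lift (and whether it is distinguishable from noise) is the value.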

Technology and Architecture are often tied together, but the reality is Architecture is technology independent.  Technologies are tools, or, as architects like to say, implementation details.  Architecture is the vision, strategy and principles underlying and overlaying how every system is built and how it fits into the larger picture.  Getting the architecture right gives every engineering team a place to fit their work into how Target’s guests benefit.  Too often, engineering teams have no idea how they fit into the enterprise, so they make choices and build solely to please themselves and their sponsors.  But if the team understands how they benefit the company, they have a higher calling and are willing to make architecture tradeoffs.  

Getting the architecture right allows the company to achieve both known and unknown outcomes.  If we learned anything from the last two months of COVID lockdowns, it is that a good architecture allows you to flex, scale and build new capabilities overnight.  It allows you to withstand an instant 30% channel shift from store customer to online customer.

Architecture, Team, Value is the simple formula to IT modernization.  Just look at the recent outcomes.  

A Digital Ecommerce Transformation – 2012, The First Cloud Holiday – Part XXII

This will be the last entry in this series, at least for the foreseeable future.   Writing all this out has finally put this era of life firmly in the past, even though there were three more years of work to deliver an infinitely scalable cloud based ecommerce system. This really only scratched the surface of the amount of work and amazing team that delivered mobile and browser based ecommerce to TWLER during a time of incredible hardship for the company. New CEOs, falling revenue, and insanely tough competition left many to think TWLER was going to wind up as another retailer on the trash heap of history. But steady 30% growth online and an ecommerce platform that allowed for intra-day changes to the system gave the business teams the tools they needed to drive growth and make up for lost revenue in the stores. If you look at the revenue numbers for 2012-2014, store revenue declined, online increased and it all more or less evened out. Now onto the story.

There’s a running joke at TWLER that this Holiday is the most important Holiday ever! It’s funny because it actually is true, if you don’t deliver at Holiday, you go into oblivion like all the other failed electronics retailers like Circuit City and HH Gregg.

We went into early November scorching hot, cloud home page ramped up through the summer to take 100% of traffic, and cloud PDPs were still ramping up as we went into November. We had adopted the strategy of shunting small amounts of traffic to our new systems, then turning up the dial with global load balancing as we learned how to operate them. Remember, everything was new. We were operating in AWS for the first time, we had built all new UIs, controllers, caches, data services, and content management systems from scratch in about 10 months. We were also beating the hell out of Akamai as we load tested our systems and determined what we would cache in the CDN. Given how quickly inventory and pricing could change, we had to choose carefully what was cached in Akamai.
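The “turning up the dial” idea can be illustrated with a small weighted-routing sketch. This is not how our global load balancer was configured; the `TrafficDial` class and the `"cloud"` and `"datacenter"` labels are hypothetical names used only to show the mechanism of sending a percentage of requests to the new system and raising that percentage over time.

```python
import random


class TrafficDial:
    """Send a configurable percentage of requests to the new platform."""

    def __init__(self, new_percent=0, rng=None):
        # An injectable random source keeps the sketch testable.
        self.rng = rng or random.Random()
        self.set_percent(new_percent)

    def set_percent(self, new_percent):
        if not 0 <= new_percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.new_percent = new_percent

    def route(self):
        # random() returns a value in [0.0, 1.0), so a dial of 0 never
        # routes to the cloud and a dial of 100 always does.
        if self.rng.random() * 100 < self.new_percent:
            return "cloud"
        return "datacenter"


# Start small, then turn the dial up as confidence grows.
dial = TrafficDial(new_percent=5)
for percent in (5, 25, 50, 100):
    dial.set_percent(percent)
    sample = [dial.route() for _ in range(10_000)]
    print(percent, sample.count("cloud"))
```

The useful property is that the dial also turns down: if the new system misbehaves at 25%, setting it back to 5% (or 0%) is a configuration change rather than a redeploy.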

We were running three load tests a week trying to get the entire system up to a point that was 50% higher in traffic and transactions from the previous year. We figured 50% was a fairly safe bet considering historical traffic trends showed that we would likely only get a 20% traffic increase. The only place to truly stress test for scale was in production, so we would run the tests starting after midnight and ramp them up to full scale by 3AM. Obviously that meant a lot of late nights for the testing teams and many of the development teams as problems were uncovered at higher and higher scale.

The second week in November, the marketing team decided that the current pricing signage wasn’t good enough; it was going to be a highly promotional holiday and we needed everyone to know our pricing was as low as anyone’s. The team wanted to change all price messaging to “Guaranteed lowest price” from something else, with info on how we would price match any major competitor. The UI team estimated that in the old system it would take 3 months to make those changes; in the new cloud home page and PDPs, it took us two hours. Most of it was testing. This, more than anything else, guaranteed that we would go into holiday with the cloud site taking full load even though business teams still weren’t completely bought in to the new world.

By the week of Thanksgiving, we were seeing good load tests and meeting our estimated peak loads for the Black Friday sale, which started at 2AM on Thanksgiving morning. Everyone knew when the sale was supposed to drop, and most shoppers were willing to wait up until 2AM central time, buy the limited inventory items, and then go to bed.

As 2AM rolled around, we felt like we were ready, we estimated we would take 60% of the load in AWS for home page and PDPs, and the remaining load for search and checkout would go to the ATG clusters in the datacenter. Caches were warmed, systems were scaled out to meet the load, and teams were in place to monitor everything that happened.

When the sale dropped, traffic ramped up instantly and kept rising minute over minute. We blew past 50% increase in traffic that was estimated and approached 150% increase in peak traffic. We figured out that the old systems simply couldn’t scale to demand, so the peak traffic was likely depressed and spread out over more time when the system throttled, but with systems that elastically scaled, the peak traffic just kept growing. This was ok in the cloud systems, but we were approaching scale way beyond expected in the commerce back end. As ATG servers heated up and the database was being pounded, we were minutes away from throttling traffic when the peak subsided and we had weathered the first storm.

Over the course of Thanksgiving, Black Friday and Cyber Monday, TWLER had its most successful online holiday ever, as competitor ecommerce sites melted down all around us. Virtually every retailer was down at some point over that time period, but we survived. Lots of things failed, including tax and inventory systems, but tax can be estimated and fixed before shipping, and you can guess at inventory for some amount of time until you run into trouble. While we remained up, the days and nights were spent fighting scaling issues, database issues, back end services issues, and network issues. When Cyber Monday ended successfully, the team felt like it had been fighting fires for five days straight.

The post mortem on the new cloud based systems was that they saved TWLER from catastrophe. Previously skeptical business and tech team members saw the front ends take massive loads and scale up accordingly. All the problems occurred in the legacy systems and enterprise services. It wasn’t flawless, but it showed that the investment was worth it and the direction was correct. It was my third holiday, and certainly the most exciting one ever. The architecture that I envisioned was starting to come to life, the teams we created banded together and bonded over intense problem solving, and the cloud future of TWLER.com was cemented with a successful holiday showing.

The next three years continued the evolution from ATG to cloud distributed architecture. While there were some rocky times and numerous outages during holidays, we survived them all and ended up at a fully automated future. Holidays became boring, and the only fun was watching daily revenue increase and pass $200M in a single day. Otherwise we just hung out and watched the metrics roll in. The mantra became “bring it on!” as we wished for higher traffic to truly test our systems.

And with that boredom, came the desire for a new challenge, which eventually took me away from TWLER in 2016 to take on restructuring the architecture for an entire company. That story may appear here someday, but it is still in progress.

Thanks to everyone that worked on TWLER.com from 2010-2015, it was truly a journey worth taking.

Commerce Tomorrow Podcast

I recently had the opportunity to sit down with Kelly Goetsch and Dirk Hoerig to record an interview on Target’s engineering culture as part of their Commerce Tomorrow podcast.  Kelly was in town for the Open Source North conference which we were both speaking at.  We sat down in the Target recording studio to tape the show.  I’ve been using the studio to create an internal podcast for Target Engineering so we had an audio engineer and figured out how to include Dirk from Germany.

Click here to hear the podcast.

A Digital Ecommerce Transformation – 2012 Organizational Diversions – Part XXI

Part XXI of a multipart series.  To start at the beginning goto Part I.

This is the last entry before we actually get to Holiday 2012. But before we get there I would be remiss if I didn’t document the organizational changes that happened over the course of 2012, and how we fought to keep our project intact. If you go back to 2012 and take a look at the TWLER stock price, you’ll see that bad things were afoot. The stock price dropped from somewhere around $25 to under $12 in the first few months of the year. Things needed a shakeup badly.

The CEO of TWLER in early 2012 had been brought up in the stores context and largely ignored the digital channel. But pressure to bring in some youthful talent must have been high, as in March of 2012 we learned we had a new President of Digital, whom we’ll call Esteban Schick, or ES from now on. ES took charge of the digital world, took over IT, and became the acting CIO as the current CIO was forced out. That meant that we all had a new EVP over digital and all investments would be reviewed.

But, before that could happen, more wacky changes occurred. Literally the next month, April 2012, the TWLER CEO was forced to resign due to inappropriate conduct that you’ll have to figure out for yourself. Additionally, the CMO left a few months later. With an interim CEO, ES was granted marketing and effectively was in charge of half the company after being with TWLER for only a few months.

Well, this kept ES quite busy so the review of all investments in IT and Digital went quite slowly. It wasn’t until mid-summer of 2012 that we finally got in front of ES to review the first $13M investment to rewrite TWLER.com.   We brought our architecture team and business leads to this meeting with the necessary deck and strategic direction. As we introduced ourselves, we went around the table and seven people introduced themselves as various architects for TWLER.com. ES was not impressed, and after we finished introducing ourselves, his first statement was “at EstrellaDinero (pseudonym), we had one architect, everyone else was engineers.”

In 2018, the fact that a large company no longer employs architects is not a shocking statement; much of the market is going this direction. In 2012 in the Midwest, this was devastating to my pride. I had spent my career building up to Chief Architect at a Fortune 50. Now, architecture was being disparaged by our new tech and digital leader!

I hid my feelings well and started in on our presentation. We always started with how we operated, rather than what we were building. We wanted to ensure we distinguished ourselves from the rest of TWLER by outlining our Agile/DevOps/Co-located/Small team strategy. No one else at TWLER was operating at scale in Agile, we had built up a team of close to 100 engineers, all on-site, working on 13 different teams. I’d hazard to say it was one of the largest all-Agile teams in the Twin Cities at the time.

As we worked through the presentation, ES started to appear more comfortable and asked a lot of questions, probing our understanding of Agile and Engineering practices. After the first 45 minutes of a scheduled hour, he was now energetic and highly engaged. His comment was, “this is the first team I’ve met that understands software engineering.”

Now ES was an EVP, and EVPs have tight schedules and people wait months to meet with them. But at this point, ES called his admin and rescheduled to stay another hour with us. I knew we had won him over at this point and dove into the technology strategy to rewrite TWLER.com.

How we built TWLER.com is the subject of most of this blog, but at a high level we were building a service-oriented (no microservices yet!) distributed cloud-based auto-scaling multi-layered cloud and datacenter zero downtime ecommerce platform.

ES was impressed with our team, said keep up the good work, and basically left us on our own for the rest of his tenure. Which lasted until December 2012.

There were two other notable changes prior to Holiday of 2012. TWLER hired a new CEO we’ll just call HJ, and ES hired a President of Digital to take over that part of the company for him, ScottyD.

With a new CEO and a new Digital leader, we ended up having two more similar investment reviews in 2012. Remember, this was the first year of completely converting TWLER.com to a new ecommerce platform architecture. It would have been an easy time for a new CEO or CDO to make their mark on the company by bringing in some “real experts” to take over the dotcom rebuild. In fact, the rumors were that the leadership team had multiple talks with large outside vendors to determine how they could take over TWLER.com. I’m not privy to what actually occurred in those talks and meetings, but in our time reviewing the project with the CEO and CDO, we did impress them with our direction and strategy.

To recap, from November of 2011 to December of 2012, I sold three EVPs and a CEO, in four separate meetings, on the fact that we were the best team possible to rebuild the fourth highest scaling ecommerce site in North America, with revenue approaching $2B.  At the end of 2012 we were still there, so it must have worked.

Goto Part XXII

A Digital Ecommerce Transformation – Cloud PDP – Part XX

Part XX of a multipart series, goto the beginning with Part I.

The next phase of our move to the cloud in 2012 was to build the Cloud Product Detail Page (PDP). The detail pages are where you land when you select a product from a list or advertisement. It generally has the price, product description, reviews, questions and the all important Add to Cart button.

Cloud Home Page was the first foray into moving TWLER.com to the cloud, but we needed to move more parts of the site to the cloud to reach the scale we were looking for during Holiday. To achieve the goals of our architecture, we were attempting to move the 90+% of site traffic that is people landing on the home page, searching for products and viewing product details. In an eCommerce site, the majority of traffic is simply looking at products and learning about them, only a small portion of traffic makes it to the checkout process.

The Home Page was a fairly straightforward task since we were redesigning the whole page as we rebuilt it. The PDP was much more difficult. First, there were eleven PDPs which all spawned from a single PDP five or ten years ago. Second, the content in the content management system contained custom HTML embedded within the content making it extremely difficult to reuse the existing content. But with 500,000 items in the content system, it wasn’t possible to rework all the content while building out the new PDPs. Besides, the way we were building out the system, the business wanted the ability to fall back on the original PDPs in case the new ones failed. We hadn’t yet gained the trust of the business teams in our first year of rebuilding TWLER.com.

We decided an engineering solution was the only likely way to solve this problem of bridging the old content to the new system. We set some enterprising engineers on the problem; we simply tasked them with figuring out a way to make the existing content work with the new page templates.

We then set out to determine how many new PDP templates we could accomplish by Holiday of 2012. We decided to simply tackle the two highest trafficked PDP templates, computers and TVs. These two PDPs accounted for well over 50% of the traffic to the site. The schedule for the remaining PDP templates went until Holiday 2013.

But there was nothing easy about this task, we struggled with the business to define the new look of the PDPs, we needed to build a framework that allowed most of the PDP to be cached at the Content Delivery Network (CDN) so we could achieve 80-90% page offload to the CDN. We needed to push all that content out of the datacenter for the first time, and have it cached in the cloud. We needed to do this across multiple cloud availability zones. And we needed to meet our page construction SLAs of less than two seconds.

At the time, there were no other retailers pushing into clouds that we knew of, so we were designing the architecture as we went, working with the problems as they came. We tackled content replication via NoSQL databases in a hub and spoke model, only pushing the item data to the cloud and not allowing updates to that content. We built a massive service aggregation forward cache using Memcached that we used to cache the majority of service calls. Netflix Hystrix came out right at this time and we quickly switched from the homegrown version we had been making to the Netflix distributed service management framework. We had to add dynamic page construction to allow for a small amount of personalization. The problems of running a PDP out of a cloud were numerous and difficult.
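For readers who haven’t used Hystrix, the core idea it popularized is the circuit breaker: stop calling a failing service for a while and serve a fallback instead. Hystrix itself is a Java library, and the sketch below is not its API or our homegrown version. It is a minimal Python illustration of the pattern, with a hypothetical class name and made-up thresholds.

```python
import time


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures, then retry later."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # failures allowed before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: skip the struggling service entirely.
                return fallback()
            # Half-open: the wait has elapsed, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky_inventory_service():
    # Stand-in for a slow or failing back-end call (hypothetical).
    raise TimeoutError("inventory service timed out")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
for _ in range(4):
    print(breaker.call(flaky_inventory_service, lambda: "cached inventory"))
```

In a setup like the one described above, the fallback would typically be a cached response, which is why a breaker and a forward cache pair naturally.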

The overarching Architecture principle throughout this whole effort was to serve all browse traffic completely out of the cloud. That meant getting all product information, pricing, promotions and inventory to the cloud. For this year, we settled on drawing inventory from the datacenter and only calculating promotions when people entered the checkout process. These required business compromises such as language in the cart that said “promotions will be applied at checkout.” But without these compromises, there was simply too much work to get done.

We somehow managed to spin up all the teams necessary to make the new PDP a reality, and get the work done. We relied on great engineers solving problems on their own rather than waiting for high level decisions or guidance. They didn’t fail us. In fact, they worked out a way to strip out all the HTML tags from the content system on the fly so that the existing content would work for both the new and old PDPs simultaneously.
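I won’t reproduce the engineers’ actual solution here, but the general technique of stripping tags on the fly is easy to illustrate. The sketch below uses Python’s standard `html.parser` module; the `strip_html` function name and the sample content are hypothetical, and the real system would have had to handle far messier markup.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes, discarding every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(content):
    """Return the content with HTML tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()  # flush any text still buffered by the parser
    # Text from adjacent tags is joined without adding a space between them.
    return " ".join("".join(parser.parts).split())


legacy_content = '<p>55" <b>LED</b> TV &amp; stand</p>'
print(strip_html(legacy_content))
```

Doing this at read time, rather than rewriting 500,000 items, is what lets the same stored content feed both the old and new templates.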

Chalk another win up for great engineering!

We were now ready to enter Holiday 2012 with two major pieces of cloud browsing in place, Cloud Home Page and Cloud PDP.

Goto Part XXI

A Digital Ecommerce Transformation – Interview Everyone – Part XIX

Part XIX of a multipart series.  Start at the beginning with Part I.

As we eased into 2012 and we continued to build up our teams at TWLER.com, the IT leadership decided that they needed to get their integrators more information on the architecture and technology we were pursuing. We were rapidly hiring software engineers to staff up to 50 people over the next couple months, which meant we were sourcing engineers wherever we could find them. Over the course of a few weeks, I proceeded to have 2+ hour meetings with all the major IT integrators, C* G*, I**, A*, T*T*, W*P* and a few others. I gave my TWLER.com architecture presentation to each of the integration teams. At this point I’d probably given this presentation a hundred times. The integrators had sent their top technical architects and salespeople to these meetings. They would have been better off without the salespeople waylaying the conversation, but the tech people tried to tone them down.

Having worked with most of the major integrators in the past, I wanted only a few things: experienced engineers and continuity. That is, I didn’t want a whole team of fresh out of college engineers that I would have to train up on engineering, we just didn’t have time for extensive training. (A few years later we started an internship and college hiring program to give new graduates a great experience at TWLER.) Secondly, I wanted the engineers to stay long enough to be effective, like a year or two.

Probably a third of the candidates that were sent to interview were good enough to join the team. However, during the interview process I made it clear that I expected them to stay one year, and they were expected to code 100% of their billable hours.

Given these simple criteria, the individual consultants that came through self selected out of the project. The technical folks thought the project would be a great experience, but they were all in career building mode in their firms. My project would set them back a year against their peers, as they would only account for their own billable hours.

After about two months of interviews, our IT team finally got angry. They decided to sit in on one of the interviews to determine why we either kept rejecting candidates, or the ones we liked declined to show up. I thought that was a great idea, and said they should get one of the integrators to send their best candidate to the interview.

So we scheduled an interview with T*T* later that week, and I was sent a resume of a lead engineer that looked good. When the day arrived and the interview was about to start, I met one of our IT Senior Directors at the interview room. The candidate was escorted in by the T*T* representative and we started introductions. It turned out the candidate wasn’t the person from the resume.

Well, that was a surprise and the TWLER Senior Director was ticked. But I said, give me this new person’s resume and let’s do the interview. So we all sat down with the new person they had sent in as their “best candidate” and started into the questions. This all turned out rather badly because the new candidate had no web background and rather limited technology skills. The interview consisted of repeated failures to answer junior level easy software engineering questions. Like, “what are the four basic SQL commands?” It was an unmitigated disaster.

Afterwards, I reiterated that this was why we interviewed everyone that applied. It wasn’t the normal practice, but if I’m paying a high hourly rate for an engineer, then I want to make sure they’re good at engineering.

Over the course of the next few months this continued to be a good practice. We had video interviews where the candidates refused to turn on their video feeds and you could hear someone else typing in the background and feeding the candidate answers. We had more instances of random people showing up for an interview whose resumes were never screened. We had good interviews where we wanted the candidate, and someone else showed up for the first day of work! When engineers are hard to find, all sorts of shenanigans begin to happen.

Goto Part XX

A Digital Ecommerce Transformation – Lucky to Have a Coach – Part XVIII

Part XVIII in a multipart series, to start at the beginning, goto Part I.

2012 was the year that I really began working for the VP of Operations at TWLER, Judy D. In 2011, the Chief Architect for TWLER decided to leave as the investment in the rewrite of TWLER.com was cut in half, see Part VII for more on that episode. Over the course of 2011, I became the main point of contact for the team, but didn’t completely take over till later that year.

To this point in my career, I had been leading many large Enterprise Java programs, mainly as a consultant, and had just finished an MBA. I had about 18 months of pure engineering management at United Health Group, having a team of about 25 writing online wellness programs. However, that management position didn’t end well, and I learned much and more about politics at large companies.

I did not have an extensive track record of management, but because I had spearheaded the funding effort and defined the vision for the future Ecommerce platform, I became the manager of the team and was promoted to Director. Much of the work we did is detailed here, or will be detailed here in the future; today the discussion is about having a great manager as coach.

I’m going to say, as a manager, I was very rough. During my time at UHG, I had four managers in 18 months, so I received very little direction or coaching. As the lead Architect and Engineer on many programs, I was only interested in software competency and delivering systems. If you couldn’t help me do that, you were dead to me. I was rightly accused of being blunt, straightforward and lacking empathy at this point in my career. I didn’t really care. I knew how to hire great engineers, build teams where everyone had fun while delivering more than asked, and be a total pain-in-the-ass to whoever was lucky enough to manage me.

But Judy decided to invest her time and started giving me feedback on how I was acting and how I was being perceived. It was the first time in my career that someone had explained that perception, not logic and data, is reality. In meetings with the VPs and Senior Directors in IT, I would get worked up because they knew how to push my buttons. Once you’re in attack mode, you’ve lost the room; everyone thinks that if you can’t manage yourself here, you must not be a good manager overall. Judy helped me understand how to deflect criticisms and attacks on my project’s direction by acknowledging the speaker’s point, and then presenting why I thought our direction would work. Never directly saying the other person was wrong, just that our current direction was working.

Learning the managerial arts of controlling meetings, protecting my reputation, and letting others be right (even when they were wrong) helped my project thrive by giving me the tools to fend off the IT team without offending them or giving them ammunition to take back to their leaders.

Judy always took the high road. She didn’t care if others were being assholes; it wasn’t an excuse to be an asshole too. She would always let me know she’d heard about my latest escapades at meetings across the company. She no longer wanted to hear negative news; she only wanted to hear how I was helping people, nothing else.

This process was long; Judy would debrief me after meetings and point out exactly where I lost the room and when I went off on a rant. But I was willing to listen, and I corrected my mistakes. Over time, it was as if I had been in management my entire career. I had learned to manage my reactions, while also managing the outcomes of the meetings and the perception of my team and myself.

I can’t thank Judy enough for the coaching effort she put in to teach me to be a Director at TWLER, and later a VP for Fortune 100 companies.

Goto Part XIX

A Digital Ecommerce Transformation – Lower Environment PaaS – Part XVII

Part XVII of a multipart series, to start at the beginning, goto Part I.

In the ongoing development of TWLER.com (The World’s Largest Electronics Retailer), the actual process of building the system was severely neglected over the years. The first thing that we established as a team back in 2010 was a new set of build infrastructure (Git, Jenkins, Artifactory, Chef) so that engineers had actual continuous integration tools to work with.

Aside: Continuous Delivery was just appearing at this time. We had Jez Humble, author of the seminal Continuous Delivery book, in to talk to us about his ideas in early 2011. This was a small gathering of about 15 engineers and architects who were interested in the concept that we eventually implemented at TWLER.com.

The biggest problem with writing software at TWLER was the lower environments. As we were breaking up the monolithic ATG application into distributed components, the lower environments were still geared towards single ATG application delivery. We had DEV, TEST, INT, STAGE, BREAKFIX, and PROD environments. STAGE and BREAKFIX were marginally close replicas of PROD, but it was really not possible to create true replica environments. The environments were not well segregated either; on numerous occasions we found PROD applications accessing services and data from non-PROD environments. In the overwhelming realm of things to fix, this wasn’t even a top priority.

We decided we had to replace the DEV and TEST environments. DEV was basically non-functional; it was one large server that all 1000+ engineers were trying to use to integrate their code. Builds constantly failed, and even when they succeeded the results were often unusable. It could take you all day to just try and get one change built and hope the results were something you could work with. Most people just skipped DEV and went right to TEST, which was an actual managed environment.

We decided that a lower environment PaaS was the direction we wanted to go. We wanted every engineer at TWLER to be able to spin up every application at TWLER in their own private sandbox. That way engineers would have the ability to test their systems in isolation, while still using the latest component versions of everyone else’s system.

To achieve that we decided to build an OpenStack environment and a homegrown PaaS. But how to get the capital to create the OpenStack environment? In mid-2011, I had excess dollars assigned as resource capital, which was meant to be used only on hiring engineers. In my projects, we were still proving out the chosen architecture and were ending the year with excess resource capital. TWLER was extremely controlling of capital back then: you either had resource capital or hardware investment capital, but not both.

So I went on a reconnaissance mission to find someone with investment capital that needed resources to see if we could trade. After a couple weeks of hunting, and getting the financial managers of TWLER involved, we finally found a team that was willing to trade their investment capital for resources. We swapped $150k and used that money to buy three OpenCompute racks to build out our OpenStack environment. We estimated with high density OpenCompute, we could house about 5000 VMs on OpenStack. If we stuck with the company standard hardware, HP, we would only be able to build out an environment that housed 1000 VMs with those dollars.
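To put those numbers side by side, here is a rough back-of-the-envelope sketch. The dollar figure and VM counts are the estimates quoted above; the function name and the assumption that the whole $150k goes to hardware either way are mine, purely for illustration.

```python
# Back-of-the-envelope comparison using the estimates quoted in the text.
# Assumption: the full $150k is spent on hardware in both scenarios.
BUDGET_USD = 150_000
OPENCOMPUTE_VMS = 5000   # estimated capacity on high-density OpenCompute
STANDARD_HP_VMS = 1000   # estimated capacity on company-standard HP hardware


def cost_per_vm(budget_usd: float, vm_capacity: int) -> float:
    """Return the hardware dollars spent per VM slot (hypothetical helper)."""
    return budget_usd / vm_capacity


opencompute_cost = cost_per_vm(BUDGET_USD, OPENCOMPUTE_VMS)
standard_cost = cost_per_vm(BUDGET_USD, STANDARD_HP_VMS)

print(f"OpenCompute: ${opencompute_cost:.0f} per VM")
print(f"Standard HP: ${standard_cost:.0f} per VM")
print(f"Capacity advantage: {OPENCOMPUTE_VMS / STANDARD_HP_VMS:.0f}x")
```

On those estimates, the same dollars bought roughly five times the capacity, which is what made going off the company standard worth the fight.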

We knew that if we were successful, we would want to expand the OpenStack environment quickly. We decided on full automation from bare metal to operating OpenStack and proceeded down the path using Crowbar and Chef to build out the roll-in racks to OpenStack. At the same time we started building out the PaaS framework, Omnitank, and getting teams on board to fully automate their deployments so their components could be included in the PaaS. This adventure was documented at the OpenStack Summit as one of the keynotes in front of 2000+ attendees.

This PaaS was so successful, we were able to shut down our DEV and TEST environments. We also allowed engineers from around the company to use the OpenStack environment. We actually had hundreds of forward-thinking engineers from the enterprise side of TWLER using our OpenStack PaaS as it was the fastest way to get a test environment created.

It was so successful that when we had our first major outage and lost all environments, there were numerous enterprise teams that were down for weeks rebuilding their environments. We used this as a lesson to learn which teams actually automated their infrastructure, and which teams were polishing virtual chrome. But we did have to explain to IT VPs that this whole environment was still in beta, and it was use at your own risk.

Goto Part XVIII