Category Archives: Software Development

A Digital Ecommerce Transformation – Making the New Mission: A Whole New Architecture – Part VIII

Part VIII – To start at the beginning go to Part I.

I’ll admit it: the deck I made was terrible. I’m not a master of PowerPoint, and the color scheme left a lot to be desired. I had crude animations showing how we would shift our monolithic application into the cloud while retaining the customer data and checkout processes in the datacenter.

For about three weeks I worked mainly with another architect to take the many ideas we had discussed over the last year, and what we’d learned about operating in a cloud, and turn that into an architecture vision and implementation plan. We settled on three years to transform the ATG system into a distributed, service-oriented, layered cloud architecture. The deck outlined the current issues with the ATG system, the future state architecture and how we would get there, and the cost of the first year of development.

My colleague urged me to begin presenting the deck to interested parties to get feedback and learn what resonated with the various digital teams. He was instrumental in networking across the organization and arranging meetings with Directors, Senior Directors and VPs in Digital and Business teams.

The first presentations did not go well. The business leaders didn’t get much from a highly technical deck with $13M of capital tied to it in the first year. Mostly the feedback was, “We’ve heard this pitch multiple times over the last ten years, why should we believe you?” They had a point: numerous consulting firms had been through with grand plans to rewrite TWLER.com. It had already been attempted twice, the last attempt being a failed implementation of the Microsoft Commerce system that was relegated to powering the Canadian site and was failing miserably even at that.

We regrouped and tried to determine what would make this a better presentation. We knew many of the core problems with the site and that the business teams had been unable to make changes in the homepage or product detail pages (PDPs) for years. There were a few decks kicking around that defined the UX driven future of TWLER.com that would never be implemented due to technology failure. We decided to modify the deck and highlight that in the first year we would transform the homepage and PDPs into a new architecture that would allow fast changes and high scale utilizing the CDN for more caching and isolating all calls to the cloud layer. In that way we would severely limit the number of calls making it back to the ATG commerce system running in the datacenter allowing it to scale by relegating it to the Cart and Checkout functions.
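As a rough illustration of that split (not the actual TWLER routing rules), the idea can be sketched in Swift. The `Backend` names and the `/cart` and `/checkout` path prefixes are hypothetical. The only point is that cart and checkout traffic reaches ATG in the datacenter, while everything else is answered by the CDN and cloud layer.

```swift
// Hypothetical tiers; the names are illustrative only.
enum Backend {
    case cloudLayer      // homepage, PDPs, browse: cached behind the CDN
    case datacenterATG   // cart and checkout only
}

// Assumed path prefixes for the functions that stay on ATG.
let atgPrefixes = ["/cart", "/checkout"]

// Decide which tier should answer a request path.
func backend(forPath path: String) -> Backend {
    let needsATG = atgPrefixes.contains { prefix in
        // Match "/cart" and "/cart/...", but not "/cartoon".
        path == prefix || path.hasPrefix(prefix + "/")
    }
    return needsATG ? .datacenterATG : .cloudLayer
}

print(backend(forPath: "/cart/items"))
print(backend(forPath: "/site/tv/12345"))
```

Every request that resolves to `.cloudLayer` is one that never reaches the ATG cluster, which is the scaling effect described above.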

There wasn’t anything we could find that outlined a similar architecture so, as far as we knew, we were embarking on a bold new way to use clouds at scale.

GOTO Part IX

A Digital Ecommerce Transformation – Disaster Strikes, Twice – Part VII

Part VII – Start at the beginning with Part I

In late 2010 as budgets were being prepared for 2011, TWLER’s (The World’s Largest Electronics Retailer) fortunes were taking a nosedive. The stock price was steadily declining from the 40s to the 30s on its way to the 10s in 2011. In that environment, increasing the funding for the web architecture team to continue its revamp of TWLER.com must not have appeared as a viable project. The $7M in funding the VP Operations secured in 2010 was slashed to $3.5M for 2011. Disaster number one. It meant we had to cut back on projects and reduce headcount to meet that budget.

I was feeling restless with the limited ability to start new work and had found a CTO role at a local startup that I was considering. But even with the lowered funding, it was clear to me that TWLER.com had to be rewritten, and it was just a matter of time before someone got the funding to do it. Since there were three teams vying for that role, I felt that our team needed to make a larger effort toward securing the funding. I decided to pass on the CTO role in February of 2011, as I felt the team we had was able to accomplish the rewrite once the funding returned.

But, under the funding circumstances, the digital Chief Architect, Michael N., decided to pursue more interesting options, as it was clear to him that TWLER wasn’t serious about investing in the TWLER.com rewrite. That bombshell was dropped on us in March of 2011 at the Mongolian Barbecue place nearby, which earned a reputation as the exit announcement lunch spot after a few more exits were communicated at that restaurant. The team was devastated and left to consider our options. Most of us were prepared to leave. I was told later that a dead pool had been started on the business side and I was at the top of the list. However, I had just decided to rededicate myself to the goal of the TWLER.com rewrite and, even though I was now angry that I had let the CTO opportunity slip by, I still knew this had to happen.

As the VP Operations was deciding whom to put in the role to manage the team, I knew I had to step up and lead. I hadn’t had much interaction with the VP and was sure she didn’t know my background in managing teams and delivering large projects. I had also just finished an MBA and was itching to use the new knowledge.

The VP began looking for a replacement and considered recruiting a Chief Architect level dude from Accenture. While that architect was quite good by Accenture standards, we had worked with him enough to know he wasn’t cut out for the digital chief architect role and would have been a step down from any of us. We revolted en masse and made it clear we wouldn’t work for him. That left the VP without a candidate, and we all decided to work that way for a few months.

During that time, I slowly started making team decisions and pushing direction with the rest of the team, and everyone decided to follow. I had extensive experience taking over teams without direct authority, as I had done it at every consulting gig for the last 10 years; eventually the teams let me make the decisions. Additionally, I made sure to get time with the VP and let her know what decisions the team was making, to give her confidence we were making progress in this interim state.

But I knew the time had come to step up. It was time to put together a comprehensive plan to rewrite TWLER.com. I dropped all my current responsibilities and did what anyone at TWLER.com did when they were pursuing investment capital: I made a deck.

GOTO Part VIII

A Digital Ecommerce Transformation – The Business Viewpoint in 2010 – Part V

Part V – Start at the beginning with Part I

When I arrived at TWLER (The World’s Largest Electronics Retailer), it was clear that the digital business teams were sad, sad, sad, sad, sad, sad, sad, sad and very sad. After a couple weeks of reading through the ATG codebase I was also sad. Sad enough that I seriously considered searching for a new job because the code was such a mess. A massive mess. We were supposed to fix this?

The business teams were in charge of managing the site, adding new items, changing pricing, removing items from the site, fixing orders, making content, creating sale landing pages, emails, etc. Everything that keeps a large ecommerce site moving, usually referred to as site operations or business ops. The business teams had created a small shadow IT organization to try to maintain stability and make changes in the only way they could with IT controlling the ATG codebase. The slogans for the shadow IT teams were things like “Do more with less!” and “Any way to get it done!” The only recourse they had was to make changes to the UI via JavaScript and use a bypass of the deployment systems to post new JavaScript files directly onto the production servers. Since this was the largest known ATG cluster at more than 400 servers, this procedure was fraught with danger. Appalling, yes, but if that’s the only way to get something done then it falls within creative license.

The process to start a new project went something like this:

  1. Write an RFP for a new thing such as adding a marketplace to the browse and commerce portions of the site.
  2. Seek bids from the three IT integrators that were approved by IT.
  3. Receive bids back with one IT integrator, we’ll call them A, as the project manager and the other two IT integrators vying for delivery.
  4. Bids start at $1M and only go up. For something like a marketplace, $27M was closer to the mark.
  5. Sign the contract to start the work.
  6. Within a week, 20 onshore coordinators and 100 offshore developers magically appear and start wreaking havoc on the shared codebase.
  7. 9-19 months later, severely over budget, something resembling a marketplace appears and an attempt is made to merge it with the existing headstream, using a branch that started 9-19 months earlier.
  8. Chaos ensues as every other project delivered in the meantime is broken and the IT integrators’ teams start fighting amongst themselves.
  9. After another two months, victory is declared, something buggy and barely working is deployed, the contract is finished and the 120 people disappear within a week.
  10. Bug fixes are now the responsibility of the shadow IT team mentioned above; re-engaging the IT integrators to fix all the problems they created needs a new RFP.
  11. Repeat this process until spirit is broken.

Surprisingly (sarcasm) the business teams were not very receptive to a new IT-like team coming in and telling them they were going to fix everything with Agile, DevOps, Cloud and really small engineering teams. As I was told numerous times when trying to engage the business to act as the SME for the Agile teams, “heard it before, new process, SOA architecture, will be able to work magic two years from now.” “Not buying it this time!”

There’s really only one solution to this problem (besides hiring a whole new business team) and that is to start delivering on your promises. That’s what we set out to do, but the environment made it unduly difficult for us.

GOTO Part VI

A Digital Ecommerce Transformation – First Architecture Forays – Part IV

Part IV – Start at the beginning with Part I

As the first team of architects to work for TWLER.com, we started mapping out the current situation and planning for the future. ATG was the base eCommerce engine and Oracle the RDBMS. Some attempt to provide enterprise services had been undertaken in the past, and tax, inventory and payment had all been removed from the ATG codebase and were now called as enterprise services. There were a number of other integrations, but these were the main services that caused issues in the digital world due to the disconnect in service levels that was present in the TWLER (The World’s Largest Electronics Retailer) environment.

The basic architecture of where we started is below; I take no responsibility for the IT side of the architecture:

 

It’s worth mapping out the organizational structure to start to understand the additional frictions present in TWLER’s attempts to run a digital eCommerce system.

The Digital teams were separated out from the IT teams many years ago and had remained in that state. Digital was viewed as unimportant since stores drove the revenue; Dotcom was a sideline play, even in 2010. Digital was run like a business and software development was matrixed out to the IT team. The Digital team had taken over its own operations at some point because the IT teams were unable to support 24×7 operations in their model. There were dotted lines between the Digital VP Operations and the IT VP Digital Portfolio, as well as between the Digital Chief Architect and IT Senior Director Digital.

The Business VPs in the Digital team drove the business projects, with input from the wider enterprise business teams in marketing and merchandising. Between the Digital Business and IT was a Business Relationship Manager, who was supposed to translate the business asks into IT requirements. The IT requirements were then shipped off to an enterprise integrator to spin up a new team and deliver the projects. Your success varies.

In addition, internally to the Digital team was an innovation team that had spun up a completely separate Mobile site on ATG, and managed the Mobile Apps. This team was quite clear in their goals to replace all of Dotcom with their mobile site at some point in the future. So now I knew that there were two teams with the goal to replace the main www.TWLER.com eCommerce site.

Finally, there was the Senior Director of Digital in the IT team, who was mainly responsible for hiring Accenture to manage projects, and Wipro/TCS to execute projects. This was a very important role as someone had to manage our vendor partners and ensure quality delivery (since I can’t get sarcasm across in writing, read this last line with as much sarcasm as you can imagine). This Senior Director also had plans to rewrite the Dotcom site, but he had not gained support from the Digital VPs because they were extraordinarily unhappy with the quantity and quality delivered thus far.

To recap, in the first six months at TWLER I learned that there were at least three teams who felt they were responsible for rewriting TWLER.com. Additionally, the software development on the ATG platform was spread across multiple divisions, multiple vendors and multiple countries. All features were added by project teams who appeared when the money started and vanished the second the money stopped, so all support fell to the Digital VP of Operations. This VP was forced to contend with the mess of code and integrations that 1000s of developers were contributing to every day. I finally understood why she had moved forward with hiring her own team to rewrite TWLER.com. We owned the codebase but lacked capital: in year one we had $7M to get the team started. By the time 2011 rolled around, that amount was cut back to $3.5M.

GOTO Part V

A Digital Ecommerce Transformation – A Little More Background Before We Get Started – Part III

Part III – Start at the beginning with Part I

The TWLER.com (The World’s Largest Electronics Retailer) architecture team started with the chief architect and five additional people: myself for high scale Java applications and NoSQL, a TWLER consultant/architect who had lived through the entire life of TWLER.com and recently converted to an employee, a second TWLER employee architect specializing in APIs and product catalogs, a cloud infrastructure architect, and another systems thinking high scale Java architect. With a six person team, we tasked ourselves with converting TWLER.com from its massive monolithic state to a new, not yet known state. What we decided early, or was decided for us, was that we were going to evolve out of the current state to a future state. It was decided for us by two constraints: not enough capital to build a new system separately while running the current system, and the stipulation that business must continue unaffected by our efforts. At this time revenue was around $1.5 billion.

There were multiple projects that we started that first year: infrastructure automation of a modern build infrastructure deployed in a cloud, cloud based outage site, distributed product catalog and QA automation. There was one more task assigned to me as exploratory which was to look into the ATG Ant build and see if there was any way of automating it with a dependency management framework.

The selection of these projects was built around the strategy to establish a modern engineering infrastructure that we could then leverage with all our future work. Having a robust Continuous Integration environment, universally available Git and SVN version control systems, artifact repository, wiki, task management and user authentication system would allow us to move quickly in the future. The distributed product catalog was built off the theory that to exit ATG, the basis of that plan required a new distributed product catalog outside of the ATG system. QA automation was the final card needed to move past six week manual release testing windows so that we could run thousands of regression tests in hours and speed up the release process.

My main project was QA automation, which was a successful disaster. Successful in that we were able to automate many of the manual test cases using Selenium and JBehave. A disaster in that the UI was so inconsistently rendered that all fields had to be accessed directly with XPath, and even then the pages sometimes rendered differently, causing the tests to fail. Additionally, we were unable to set up test data in a consistent manner, causing tests to fail randomly when underlying data was changed or deleted. In reality it was still better than the manual tests, but we struggled to maintain even 80% test coverage for more than a few days.

The most interesting work for me was delving into the ANT build file for ATG. I had plenty of experience with ANT in its heyday in the late 90s and early 2000s. I thought I knew ANT pretty well, but when faced with the 20+ ANT files that made up the build of 14 separate ATG applications, I had met my match. In my spare time I started tracing through the ANT files, determining how variables were set up and finding the main path through the files. I had to diagram out the actual workings of the files as there were multiple instances of recursion occurring within the build process. My goal was to figure out how to convert the build over to Maven, which I’m sure many of you hate, but we felt that bringing in dependency management and forcing a standard file system layout were the most important considerations.

After a few weeks, it was my estimate that it would probably take a couple people 2-3 months to convert the ATG build from ANT to Maven, and that we could tackle it in smaller pieces by starting with the 13 builds that were not the main TWLER.com site. We shelved this idea for the time being, but when we did get back to it a year later, my estimate proved exceedingly optimistic.

GOTO Part IV

A Digital Ecommerce Transformation – Front End Madness – Part II

Part II – Start at the beginning with Part I

In 2010, the cloud was not new but it was ignored by large companies, particularly in non-technology focused segments such as retail. While it is clear now, at the time large retailers had not yet awakened to the new reality that a company’s prowess in software might decide its future outcomes.

As an example, in my first year at TWLER (The World’s Largest Electronics Retailer), store sales during December were not going well. Since many retailers make 50% or more of their annual sales in November and December, this spelled impending disaster. Traffic was down in stores and the company’s reaction was to propose that the digital channel stop its free shipping offer. This clearly showed the company leadership’s inability to fathom online shopping. The logic was that if a customer could buy it online and have it shipped for free, they would not go to a TWLER store. Seems reasonable, but if the customer is shopping online and we did not offer free shipping, they would simply click to the next store, probably Amazon, and buy their electronics from them with free shipping. This customer was not going to a TWLER store that Holiday, ever. We were actually saving sales for the company; it just wasn’t understood.

A minor digression on the state of the ATG system is necessary to understand what we were dealing with. The original ATG system was built in 2003. At the time it was an excellent decision for a mid-sized retailer to build its first ecommerce engine. But over time, and numerous one-off projects, the codebase had morphed, intertwined and been generally neglected and abused. As an example, one ongoing project when I arrived was to widen the product detail page (PDP) and move the Add To Cart button from the left side of the page to the right side. This seemed fairly innocuous, but it took six months and well over one million dollars to accomplish this task. This seemed ridiculous to me, so as an architect I dug into why this was happening.

It turns out there were multiple reasons why this project was practically impossible to complete. To start with, there were nine separate versions of the PDP, each made for a different category such as TVs, Music, Computers, etc. The nine separate PDPs all had common origins in some ancestral PDP, but after years of projects aimed at the individual categories, they had all strayed in different ways, including using different JavaScript frameworks and versions to accomplish dynamic page elements. These PDPs were written in JSP/JavaScript and were each well over 10,000 lines long, intermixing actual Java into the JSPs themselves. Imagine trying to figure out how to change nine different pages all implemented slightly differently in monstrous JSP files, with no test automation to determine if you broke anything in the process.

This sounds bad enough, but the executable for ATG was in the GB range, built as an ear, with a special Ant file which only one or two people understood (more on this later). It was necessary to build and run the entire ear to determine if the page changes worked since the JSP code was so intertwined with the server side code. However, it was impossible to actually run this ear on a developer’s machine because it also required a full working copy of the Oracle database. No one had actually figured out how to make all these things work yet on a single desktop or laptop machine.

Instead, there was one shared development server for the entire Dotcom division. This server was a large Unix box but still too small to serve thousands of developers trying to build and run the ATG codebase. This server alone routinely failed due to lack of disk space and not enough CPU. But, it was the only place to build and run code you had worked on in your IDE, so everyone had to deal with it.

If you weren’t crying yet, the next step was to actually deploy the code to the staging environment (skipping the integration environment altogether) because the reality was the front end code only worked if it could access the Internet as there were so many externally downloaded components to the page. Even though you deployed the code to the shared developer environment, you couldn’t actually run it there. The staging build happened once every night.

To sum this up, the normal front end development cycle is change some code, save it locally, have it automatically picked up by your running app server and test it. This cycle time should be in the seconds range so you can work quickly and efficiently through all the little tweaks necessary to make a UI page look good and work as expected. The cycle at TWLER was change some code, save it locally, do your best to make sure the page compiled, check in the code, push to your developer environment, do your best to check it compiled, wait for the overnight stage push (go home), come back the next morning and see if the change worked in stage (assuming the push didn’t fail, which it often did). Instead of a cycle time of seconds for each code change, the cycle time was one day. One entire 24 hour day!
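To put rough numbers on that difference, here is a back-of-the-envelope sketch in Swift. The 30-second cycle and the 8-hour working day are assumptions chosen for illustration. Only the one-change-per-day figure comes from the description above.

```swift
// Assumed values, for illustration only.
let workdaySeconds = 8 * 60 * 60   // an 8-hour working day
let normalCycleSeconds = 30        // save, hot reload, check the page

// How many change-and-test iterations fit into one working day.
let normalIterationsPerDay = workdaySeconds / normalCycleSeconds
let twlerIterationsPerDay = 1      // one overnight stage push

print("Normal cycle: \(normalIterationsPerDay) iterations per day")
print("TWLER cycle: \(twlerIterationsPerDay) iteration per day")
```

Under those assumptions a developer gets 960 feedback loops in a day with a normal cycle, against a single one at TWLER.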

Did I mention zero automated regression tests?

Now I bet you think that $1M was cheap. In fact, I still don’t know how anyone actually got any work done in these conditions, but I do know that the churn in the front end development team was enormous.

GOTO Part III

What Architects Do? – Part 2

Governance or Strategy

I’ve fielded many questions lately on how I am governing and reviewing the architectures of Target to make sure they conform to enterprise standards. This is a common question asked of Architecture teams. After all, many people believe the main responsibility of Enterprise Architecture is governance.

But governance is the last thing I like to discuss about what Architects actually do. Governance or discussions about being governed means that I’ve actually failed to deliver a simple, clear and implementable technology strategy. If I’ve clearly communicated a technology strategy with desired, but not necessarily mandated, system structure, implementation stacks and supporting platforms, then engineers will happily fit into the architecture to deliver their systems.

Let’s unpack that last statement and define point #2 of what architects do:

Answer #2: Architects define the types of systems and their boundaries that are possible within an organization.

When we discuss technology strategy my goals are to deliver composable systems which meet current and future business demands. The key to this statement is that we will meet unknown future demands without rebuilding our current systems, or having to build large new systems. There’s an assumption here that your future unknown demands are extensions of your current capabilities. If you’re headed into new business models, you’ll likely need a few new systems.

In today’s world, unknown future demands are presented every day, and the expectation is that they can be delivered in days, weeks or months, not years. For an organization to stay relevant with their consumers, a technology strategy must meet this demand for speed above all else. To achieve speed at scale the systems are constrained to deliver one thing only. A system is generally composed of services (or microservices if you like) that deliver either data or process through an API. The only governance that architects do is to ensure that data or process is comprehensive and unique. Data systems are comprehensive when they are complete for an organization, and take all changes generated throughout the organization. Process systems are comprehensive when they contain the basic services necessary to complete a process, and can be flexibly orchestrated by any entity within the organization.

Done correctly, an organization is composed of hundreds of APIs that each implement a narrow set of functionality. Systems don’t continue to expand to take on more business demands and processes. Actually, systems never want to do this; people expand systems to become more important in their organizations. Governance falls out of strategy: the only governance necessary is ensuring teams don’t construct competing data systems or alternative processes. If the current systems don’t meet their needs, fix them by doing the work and submitting a pull request. That’s also much easier than standing up a new system and engineers generally like this solution.
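A minimal sketch of that shape, in Swift and with entirely hypothetical names (`PriceAPI`, `InventoryAPI`, `SellableCheck`), might look like the following. Two narrow data systems each deliver one thing, and a process system owns no data of its own and only orchestrates them. The in-memory structs are stand-ins so the example runs without a network.

```swift
// Hypothetical data systems: each is the one comprehensive source
// for its kind of data.
protocol PriceAPI {
    func price(forSKU sku: String) -> Double?
}

protocol InventoryAPI {
    func unitsAvailable(forSKU sku: String) -> Int
}

// In-memory stand-ins for real services.
struct InMemoryPrices: PriceAPI {
    let prices: [String: Double]
    func price(forSKU sku: String) -> Double? {
        return prices[sku]
    }
}

struct InMemoryInventory: InventoryAPI {
    let stock: [String: Int]
    func unitsAvailable(forSKU sku: String) -> Int {
        return stock[sku] ?? 0
    }
}

// A process system: it composes the narrow APIs above
// instead of keeping its own copy of prices or stock.
struct SellableCheck {
    let prices: PriceAPI
    let inventory: InventoryAPI

    func isSellable(sku: String) -> Bool {
        let hasPrice = prices.price(forSKU: sku) != nil
        let inStock = inventory.unitsAvailable(forSKU: sku) > 0
        return hasPrice && inStock
    }
}

let check = SellableCheck(
    prices: InMemoryPrices(prices: ["TV-1": 499.0, "CD-9": 9.99]),
    inventory: InMemoryInventory(stock: ["TV-1": 3, "CD-9": 0])
)
print(check.isSellable(sku: "TV-1"))
print(check.isSellable(sku: "CD-9"))
```

A team that needs a new sellability rule would change `SellableCheck` or one of the APIs behind it, rather than stand up a second price or inventory store.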

What Do Architects Do? – Part 1

Having been Chief Architect for Best Buy and currently Chief Architect for Target, I get this question frequently.  The question occasionally comes at your neighborhood party but generally it’s from people in Engineering or Marketing.  At the neighborhood party you attempt to answer this question at risk of becoming known as the “boring tech guy.”

What do architects do?

People ask this question because they truly don’t understand.  They are really asking “Are architects necessary?”  They ask this question because they have rarely seen value from the architects they’ve seen in the past.  This is the sad reality of much of the architecture world, enterprise architecture in particular.  My opinion on why architecture has devolved to a place where many companies are eliminating the practice altogether is simple.  Most architects in the upper echelons of companies were never software engineers.

This is important, and is a point I harp on continuously, because if you haven’t spent your 10 years writing code and building and running systems with higher and higher business complexity, you cannot do the first thing architects should do.

Answer #1:  Architects create the environment for engineering culture to thrive.

To create an engineering-centric culture, you have to have been an engineer.  You have to have a few large scale Agile/DevOps systems under your belt.  You have to understand what drives and motivates engineers that want to work on six person teams tackling the toughest problems facing enterprises.  You need to feel it in your gut when one person on the team is a hack and can’t pull their weight and your management won’t address the problem.  You must have found the rock and through sheer force of will, pushed it up the hill, leading in such a way that the rest of the team helps push it along with you.   If you haven’t done these things over the course of years, you haven’t been a software engineer.

Architects who understand how engineers want to work spend all of their time and energy creating an environment where hard problems are solved, new solutions are found, and everyone sleeps well at night. Architects make decisions based on whether engineers will understand them and choose them regardless of what else is available.

Architects also create the constraints that allow engineers to solve their problems quickly.  With the near infinite array of tools, frameworks and packages available, to remain economically competitive, enterprises need to limit the scope of technology in some way.  Make no mistake, architects select and limit technologies, but only those that involve large expenditures.  If a technology selection will cost a company in excess of $1M for licenses, subscriptions or maintenance over a five year lifespan, architects should be involved.

Architects essentially act as the aggregated will of the engineers. Architects are there to make engineers’ lives easier. Architects are in effect servants of both the engineers and the enterprise, and walk the fine line that brings the maximum value to both.

 

Using the Altimeter on iPhone 6S with Swift

I was playing around with the new Xcode 7.2, which allows you to beta test applications on your own phone without an Apple developer account. Add to that the barometer available on the iPhone 6s and I just wanted to see how hard it was to write an app that can show you your altitude.

This seems like it would be pretty straightforward but it turns out there’s practically zero documentation on how to use the altimeter, and for a Swift newbie, this turned out to be quite a problem.

All I wanted to do was create an app with one button to start the altimeter, and a label which constantly updated with the relative altitude.  It seemed like something that might take an hour or two max.  However, with the incomplete solutions available and cryptic problems with refreshing the display while in a closure, it became much harder than I ever anticipated.

Which is why I’m publishing the code, however amateur, because it does actually work.

import UIKit
import CoreMotion

// The members below live inside a UIViewController subclass.

@IBOutlet weak var altitudeLabel: UILabel!

lazy var altimeter = CMAltimeter()
lazy var queue = NSOperationQueue()

@IBAction func trackButton(sender: UIButton) {

  if CMAltimeter.isRelativeAltitudeAvailable() {

    altimeter.startRelativeAltitudeUpdatesToQueue(
        queue,
        withHandler: { (data: CMAltitudeData?,
                        error: NSError?) in

      // Skip updates that carry no data
      guard let data = data else { return }

      // relativeAltitude is in meters; convert to feet
      let feet = 3.28 * data.relativeAltitude.doubleValue

      // Needed to refresh the screen from inside the
      // closure: UI updates belong on the main queue
      dispatch_async(dispatch_get_main_queue(), {
        self.altitudeLabel.text =
          String(format: "%.2f feet", feet)
      })
    })
  } else {
    self.altitudeLabel.text =
       "No barometer available"
  }
}

It did finally work and looks like this:

[Screenshot: AltimeterScreen]
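One way to make the conversion and formatting easier to check without a phone is to pull it out of the closure into a small pure function. The sketch below is written in current Swift syntax rather than the Swift that shipped with Xcode 7.2. The `altitudeText(meters:)` helper and its fallback string are hypothetical names, not part of CoreMotion. It also uses the slightly more precise factor 3.28084 instead of the 3.28 in the listing above.

```swift
import Foundation

// Feet per meter (rounded); the listing above uses 3.28.
let feetPerMeter = 3.28084

// Formats a relative altitude given in meters as feet with two decimals.
// Returns a fallback string when no reading is available.
func altitudeText(meters: Double?) -> String {
    guard let meters = meters else { return "No reading yet" }
    return String(format: "%.2f feet", meters * feetPerMeter)
}

print(altitudeText(meters: 10))   // prints "32.81 feet"
print(altitudeText(meters: nil))  // prints "No reading yet"
```

Inside the altimeter handler, the label could then be set from `altitudeText(meters: data?.relativeAltitude.doubleValue)`, still dispatched to the main queue.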