Last week, I was quoted on CNN online regarding problems with engineering fundamentals on the Healthcare.gov site and the new “tech surge” to fix it (http://cnnmon.ie/16pKPnW). For anyone who has built a business-critical software product or scalable website, the major mistakes are pretty clear, but here is my simple take as an outside observer. Full disclosure: I’m a big fan of Obamacare and want it to be successful, which is what makes me really annoyed at what appears to be a $300 million debacle. I have no doubt that the best and brightest people and technology from Silicon Valley could have done this at one-tenth the cost if we had been asked to help earlier on.
1) Always have multiple levels of redundancy with automatic recovery
Verizon reports that the 1-day outage last week was caused by a network component failure (http://cnnmon.ie/178bnN3). OK, maybe in 1999 this was excusable – I personally lived through many such notorious outages back then when I was at VERITAS, including an epic fail at eBay. But Verizon touts Terremark as an enterprise-grade cloud, so how can they not have multiple levels of redundancy and network automation to recover from a single network failure? It certainly can’t be due to lack of money given the $15 million service contract.
I would expect that Terremark can quickly provision, or already has provisioned, extra network capacity, routers, and DNS servers to solve this problem. Hopefully CMS is also putting in place multiple data center locations with replicated databases across the US to survive any significant regional power outage or an event like Hurricane Sandy. The question is why no one thought of this in the design phase.
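To make the idea of redundancy with automatic recovery concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a description of how Terremark or CMS built anything: the function names (call_with_failover, primary, secondary) are hypothetical, and each "replica" is a stand-in for a real data center or network path.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed")


def primary() -> str:
    # Hypothetical primary site with a failed network component.
    raise ConnectionError("network component failure")


def secondary() -> str:
    # Hypothetical standby site in another region.
    return "served from secondary"


print(call_with_failover([primary, secondary]))
```

The point of the sketch is that a single component failure is absorbed automatically by the next replica, and users only see an outage if every level of redundancy fails at once.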
2) Too much code, too fast = Big Problems
The Healthcare.gov site is composed of 500 million lines of code (http://cnnmon.ie/18KbwAB) and wasn’t stress tested until 3 weeks before the go-live date.
This truly boggles the mind. As mentioned in the CNN article, this is 10 times the size of a typical online banking system and more than 5 times the size of Windows 8! Those products go through beta testing cycles of 12 months or longer.
My rule of thumb is that great software is best built by a small team of engineers who understand what users want and bring a great sense of craft and design. Every line of code is a potential problem, so great new products take a Spartan approach to coding and rely on well-thought-through architectures that support ongoing innovation. Clearly that was not the case here. It also appears they took a legacy waterfall approach to the web site code, with a lot of custom coding, rather than a modern agile development approach with ready-made building blocks (standard practice for web applications), where the code is broken into small chunks and constantly tested by the design teams before integration testing. With an agile development approach, maybe there was a chance that three months of stress testing would create success. Unfortunately, with a big bang integration, at best, Healthcare.gov can be barely stabilized near term and will likely be plagued by usability problems and glitches until all of the code is rewritten.
3) Too many organizations with poor accountability
One of the great VPs of Engineering who worked for me once said, “All software quality problems are ultimately people problems.”
The biggest problems with the site appear to be too many organizations, with too many people being paid too much money with no one really in charge.
There were six different consulting organizations involved, as described here (http://cnnmon.ie/Hg4CNt). In the hearings this last week, it was clear there was no “czar” to pound the table on key technical decisions throughout the process or to drive complex program management with an overall judgment of whether the design goals were being met. And there was probably very little common agreement on quality or design practices across the multiple firms.
Silicon Valley to the Rescue?
I am personally rooting for Googlers like Michael Dickerson and the Silicon Valley dream team to quickly “stabilize the patient,” so to speak. My bigger hope is that he’ll bring in the new cloud computing approaches that can best handle massive, unpredictable user workloads cost-effectively, with a reliable and elastically scalable approach. Though with Oracle getting closely involved near term, it’s probably lots of big iron. Once through the crisis, they should rethink every layer of the stack using true “scale-out” architectures. It will save the taxpayers a lot of money in the long run.
Looking Beyond Healthcare.gov: The Big Economic Opportunity with Obamacare
Perhaps buried in the website distractions are the amazing opportunities for business innovation related to Obamacare: opportunities both to deliver better, more affordable patient care and to create brand new hyper-growth businesses.
One of our ClustrixDB customers, MedExpert, is showing the way by providing innovative online healthcare services to patients through Medicare insurance providers. MedExpert’s online staff helps patients with treatment advice in minutes by combining the most up-to-date medical research with the patient’s medical symptoms and current treatment plan. The result is reducing costly emergency visits by 14 percent – wasn’t cost reduction through preventative care always the point of Obamacare? The full story published by GigaOM can be found here.
And MedExpert is not alone. As reported in the NY Times (http://nyti.ms/1iKxTeM), private equity and venture capital money is readily flowing into similar online healthcare businesses, creating the potential for a billion-dollar business in both the healthcare industry and the cloud technologies that power it. Maybe that will then get the attention of the politicians blocking the once-in-a-lifetime modernization of healthcare.