The Unforgiving Law of Large Numbers

Preface:  This is a long post.  And it lacks pretty pictures of tropical construction sites.  Writing thoughts out long-form helps me examine and refine, which is one of our core values and a pillar of our success so far.  I would be very interested in your comments.

The Law of Large Numbers has been the water cooler topic here at Atlas World Headquarters for the past couple of weeks, motivated by a couple of events:

  1. NASA designed, built, launched, and landed the unmanned rover Curiosity on Mars, and
  2. Chicago Bridge & Iron bought Shaw, and 27,000 engineering company employees got a new logo on their business cards.

The Law of Large Numbers (LLN) states that the average outcome of a long series of Bernoulli trials will almost surely converge to the expected value. Any process that requires a large number of events is subject to the LLN and its attendant probabilistic effects. Tossing a fair coin is the classic Bernoulli trial, a random event that yields roughly equal numbers of heads and tails when performed a large number of times. Turning left or right on a random walk is similar, and also a great way to get lost. Large numbers of engineering decisions are exposed to the LLN despite our efforts to avoid making important decisions with a coin toss, and large groups of engineers will have, on average, just as many underperformers as stars.
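The convergence described above is easy to see in a few lines of code. This is a minimal sketch, not anything from the original discussion; the function name and seed are illustrative choices:

```python
import random

def heads_fraction(n_tosses, seed=1):
    """Simulate n fair coin tosses and return the fraction that land heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The fraction of heads drifts toward the expected value of 0.5 as the
# number of tosses grows -- small samples wander, large samples converge.
for n in (10, 1_000, 100_000):
    print(n, round(heads_fraction(n), 3))
```

The individual toss is unpredictable; only the aggregate is predictable. That asymmetry is exactly why an engineering decision must not resemble a coin toss.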

Whether it’s a spacecraft, dam, refinery, or interstate highway system, the only way to avoid the averaging effect of the LLN is to make each design decision unlike a coin toss. Engineering success is earned by trained individuals striving to understand and control the design outcome. Each engineer’s skill, and his or her diligence in implementing effective quality control, affects the chance of failure. Large project teams are made up of numerous individual engineers whose competence, on average, converges on “average”. The problem in complex projects is the web of dependencies between design decisions and the disproportionate failure risk introduced by even a single below-average decision. Consider how few bad decisions or incompetent engineers were necessary to create the circumstances that led to these failures:

  1. Lockheed Martin’s decision to compute booster thrust in US customary rather than metric (SI) units doomed NASA’s Mars Climate Orbiter in 1999.  The program cost $125 million, consumed thousands of engineering hours, and required innumerable individual decisions.  It all went irredeemably bad because one of those decisions was regrettably poor.
  2. The 1905 attempt to divert most of the Colorado River into the Imperial Valley was abysmally ill-conceived, nearly ruined a large part of southern California, re-filled the Salton Sea, and was finally remediated by the construction of the Hoover Dam.
  3. The management decision to allow local control over New Orleans levees led to a piecemeal flood protection system whose weak links failed when exposed to Hurricane Katrina, a significant but not unprecedented storm, and rendered the entire system unserviceable.

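The compounding effect behind these examples can be made concrete with a series-reliability sketch: if a project chains together n independent decisions, each made correctly with probability p, the whole chain succeeds only with probability p^n. The numbers below are illustrative assumptions, not figures from any of the projects named above:

```python
def chain_success(p_each, n_decisions):
    """Probability that every one of n independent decisions is made correctly,
    assuming each succeeds independently with probability p_each."""
    return p_each ** n_decisions

# Even near-perfect individual decision-making compounds badly at project scale.
# With 10,000 chained decisions:
for p in (0.9999, 0.999, 0.99):
    print(f"p = {p}: project success ~ {chain_success(p, 10_000):.2%}")
```

At 99.99% per-decision reliability, a 10,000-decision project still fails roughly two times out of three; at 99.9%, success is effectively impossible. This is the arithmetic behind the claim that a single below-average contributor carries disproportionate risk.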
Think of a failure or near-miss in your engineering career, and think of the bad decision, technical or managerial, that allowed random variables like weather such influence over how your finished design performed. Then think about how that decision could have been made differently if you, at the time, had more experience, more knowledge, or more direct control over the project. The Law of Large Numbers describes how these types of influence are harder to exert when projects are larger, more complex, and designed by a larger group.

Exceptional engineers identify and exterminate design and construction risks, sometimes overriding project schedules and seemingly insurmountable business constraints to avoid identified risks. Given a large enough group of engineers, though, the LLN states that the group’s competence converges on “average” despite diligent efforts from the competent engineers. The 27,000 Shaw employees now working on large energy projects for CB&I have, among their number, at least a few individuals whose engineering decisions behave more like Bernoulli trials than calculated intent. It is improbable that every specialty in such a large and diverse group operates at world-class ability. All else being equal, using the employee group that your board of directors just bought, instead of collaborating with the best independent specialist engineers you know, leaves your project exposed to increased risk of a bad decision. As NASA just demonstrated, it’s possible for a very large team to succeed at a very complex design, but the question remains: how many times in a row can they enjoy that outcome? And are there better organizational structures that would improve reliability and efficiency?

The point of all this, then: Does the Atlas business model, with its emphasis on flexible teams of highly qualified specialists, protect our designs from the unforgiving Law of Large Numbers? Or do we expose ourselves to organizational and communication risk when we assemble a specialty team for a challenging project in a remote location?

Atlas’ in-house staff is small enough to be a known, non-random, factor. We’ve got strengths and weaknesses, like all engineers, but it’s been a while since we were surprised by an unexpected weakness. Our in-house engineering process is as unlike a Bernoulli trial as it’s possible to be. For larger projects that exceed our in-house capacity, Atlas teams with specialist groups who share a similar commitment to eradicating chance from design. For each project we build a reliable organization block by block and then implement the systems and controls that we all agree are necessary for good engineering. Sometimes a provisional team member turns out to be of average or lesser competence. Those parts of the organization are easy to spot, because interactions with them are so different, and easy to correct because of our inherently independent nature. We take immediate action to replace the weak link and restore our immunity to the averaging effect of the Law of Large Numbers.

I believe that the “collaborating specialists” business model is the future of infrastructure engineering. Exceptional engineers will move up and on to our team, leaving behind the engineers unable or uninterested in working at the highest level of rigor, and further reducing the average competence of large semi-anonymous groups who, alarmingly, are increasingly responsible for the safety and reliability of critical infrastructure. I’m very interested to see how this trend develops over the next decade or so, and am looking forward to further expanding Atlas as more and more exceptional engineers recognize the advantages of collaborative teams.