Friday, November 20, 2009

More on Six Sigma Metrics

There is an excellent article in this month's Six Sigma Forum magazine about the process sigma. The authors have examined a lot of the literature about the metric and come to some very interesting (and, in my opinion, accurate) conclusions. Six Sigma Forum offers an email address to respond to each article; this was my response:

I was very excited to see this article. I have been questioning this metric for years, and had just started doing some research with a view toward writing a similar article. I heartily agree with most of the authors’ points. They did a great job of capturing at least the broad outlines of the controversy, and their enumeration of the advantages, disadvantages, and myths should be required reading for anyone in a Six Sigma role, especially Master Black Belts and Black Belts.
An area to which I had planned to give a bit more attention is the Statistical Process Control component of the metrics equation. Before you can make any claims about capability or process performance over time, you must measure the process over time, and it must display a reasonable degree of statistical control. Only then do you have the homogeneity of data that makes any assumption about an underlying distribution valid.
A foundation in SPC also sharpens the focus of the discussion surrounding the shift and the short-term/long-term question. While it is possible for a process in a state of statistical control to have some underlying shifts that go undetected on Shewhart charts, a sustained mean shift of up to 1.5 sigma will almost certainly be detected within 10 subgroups following the shift if the four most common Western Electric zone tests are applied.
Now, if you’re taking four samples per day for your monitoring subgroup, from a high volume operation—say, 2,000 units per hour—that might mean 9 days before the signal shows up; you’d have run approximately 64,000 units from a process whose mean had shifted. In those situations, CUSUM or other schemes more sensitive to gradual sustained shifts might be more appropriate.
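To make that detection claim concrete, here is a quick simulation sketch of my own (not from the article), applying the four common Western Electric zone tests to subgroup means after a sustained 1.5 sigma shift. The subgroup size of 4, the in-control mean of 0, and the sigma of 1 are assumptions chosen to match the discussion above.

    import numpy as np

    rng = np.random.default_rng(0)

    def first_we_signal(z):
        """Index of the first subgroup mean (in sigma-of-x-bar units) that trips
        one of the four common Western Electric zone tests, or None."""
        for i in range(len(z)):
            if abs(z[i]) > 3:                                  # Test 1: one point beyond 3 sigma
                return i
            if i >= 2:
                w = z[i-2:i+1]
                if (w > 2).sum() >= 2 or (w < -2).sum() >= 2:  # Test 2: 2 of 3 beyond 2 sigma, same side
                    return i
            if i >= 4:
                w = z[i-4:i+1]
                if (w > 1).sum() >= 4 or (w < -1).sum() >= 4:  # Test 3: 4 of 5 beyond 1 sigma, same side
                    return i
            if i >= 7:
                w = z[i-7:i+1]
                if (w > 0).all() or (w < 0).all():             # Test 4: 8 in a row on one side of center
                    return i
        return None

    n, shift = 4, 1.5                      # subgroup size; sustained mean shift in individual-sigma units
    sigma_xbar = 1.0 / np.sqrt(n)          # sigma of the subgroup means (individual sigma assumed = 1)
    trials, caught = 10_000, 0
    for _ in range(trials):
        xbars = rng.normal(shift, sigma_xbar, size=10)   # 10 subgroups drawn after the shift
        if first_we_signal(xbars / sigma_xbar) is not None:
            caught += 1
    print(f"Fraction detected within 10 subgroups: {caught / trials:.3f}")

With subgroups of four, a 1.5 sigma shift of the individual values is a 3 sigma shift of the subgroup means, which is why detection within 10 subgroups is essentially certain.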
Having said all that, though, it’s unlikely that shifts of that sort will go completely undetected in a well-monitored process. What we are essentially saying is that some assignable-cause variation is going to show up randomly, and for time periods too short to be detected by our charts. In that case, I believe the local measures of dispersion used for control chart factors provide a reasonable way to operationally define short-term and long-term variation, if you must. R-bar/d2 and S-bar/c4, used to calculate control limits, provide very good estimates of short-term (within-subgroup) variation. Comparing that estimate with the standard deviation of the entire data set provides a fairly unambiguous test for whether any significant shifting has taken place. Whether and how you want to define and measure the magnitude of a shift detected this way could be another discussion; the fact that the argument is taking place without any agreed-upon method is another source of confusion.
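As a rough illustration of that comparison (my own sketch, using simulated numbers rather than data from the article), the within-subgroup estimates and the overall standard deviation can be put side by side; d2 = 2.059 and c4 = 0.9213 are the standard control-chart constants for subgroups of size 4.

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated data just for illustration: 25 subgroups of size 4, with the
    # last 10 subgroup means shifted up by 1.5 sigma.
    subgroups = rng.normal(0.0, 1.0, size=(25, 4))
    subgroups[15:] += 1.5

    d2, c4 = 2.059, 0.9213                               # control-chart constants for n = 4

    rbar = np.ptp(subgroups, axis=1).mean()              # average subgroup range
    sbar = subgroups.std(axis=1, ddof=1).mean()          # average subgroup standard deviation

    sigma_short_r = rbar / d2                            # short-term estimate, R-bar/d2
    sigma_short_s = sbar / c4                            # short-term estimate, S-bar/c4
    sigma_long    = subgroups.std(ddof=1)                # overall (long-term) standard deviation

    print(f"R-bar/d2 = {sigma_short_r:.2f}, S-bar/c4 = {sigma_short_s:.2f}, "
          f"overall s = {sigma_long:.2f}")
    # If the overall value is materially larger than the within-subgroup
    # estimates, the mean has been shifting between subgroups.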
It seems that we not only have to discriminate between short- and long-term sigma, but we also have to have a “short-term sigma assuming long-term data” and a “long-term sigma.” Apparently, we can also have negative process sigmas, with DPMO greater than one million! Just look at the commonly used sigma calculator at www.isixsigma.com, and click on the link for more information about the calculations. Their explanation, that sigma is just a z-score, shows how far we have come from an understanding of capability in some of these discussions. I think most of this falls under Wheeler’s category of victories “of computation over common sense.”
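For anyone who hasn’t clicked through, the convention amounts to something like the sketch below. This is my own rendering of the usual table logic, not the site’s actual calculation; the example DPMO values are taken from the familiar conversion tables, plus one deliberately extreme value.

    from scipy.stats import norm

    def process_sigma(dpmo, shift=1.5):
        """The usual calculator convention: z-score of the yield plus the
        conventional 1.5 sigma shift."""
        return norm.ppf(1 - dpmo / 1_000_000) + shift

    for dpmo in (3.4, 66_807, 308_538, 933_193, 990_000):
        print(f"DPMO {dpmo:>9,.1f} -> process sigma {process_sigma(dpmo):5.2f}")
    # 3.4 DPMO gives the familiar 6.0; anything above about 933,193 DPMO goes
    # negative, and a DPMO above one million cannot be converted at all.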
I can understand if you want to gig yourself 1.5 sigma to make your process sigma align with the one in all the tables, accepting the Motorola shift. What I don’t understand is why you would then decide that, in the longer term, it’s going to shift another 1.5 sigma (this seems to be the logic behind the “benchmark Z” used in some software packages these days). So…now six sigma is actually three sigma by default? What’s the point?
I recently taught a Six Sigma certification exam prep course for my local ASQ chapter, and the primer for that course—a popular reference used by a huge number of applicants for that ASQ certification—suggested that, given binomial data (a proportion defective), you should use a log transformation to force the data into an approximation of the Poisson. I don’t know why anyone would do this, unless you really have to be able to have that negative process sigma and a DPMO of more than one million. For years, I have been doing just the opposite: transforming Poisson data to binomial, using e^(-DPU) to estimate DPMO.
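For what it’s worth, the direction I use looks roughly like this (a minimal sketch with made-up counts): under the Poisson model, the probability that a unit carries at least one defect is 1 - e^(-DPU), which keeps the per-million figure below one million by construction.

    import math

    defects, units = 137, 500               # made-up counts, purely for illustration
    dpu = defects / units                   # defects per unit; can exceed 1
    p_defective = 1 - math.exp(-dpu)        # Poisson: P(a unit has at least one defect)
    print(f"DPU = {dpu:.3f}, defective units per million = {p_defective * 1e6:,.0f}")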
This brings up another excellent point from your authors: we need to figure out how we are going to count units and opportunities. My own belief is that we should limit ourselves to definitions that end up providing an estimate of proportion defective. An opportunity, in that case, would be the most discrete thing we could count; in other words, there could be no more than one defect per opportunity. This would get rid of the “negative sigma” nonsense.
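With opportunities defined that way, the arithmetic reduces to the standard textbook calculation (which may or may not match the article’s exact procedure); the counts below are invented purely for illustration.

    units = 1_200
    opportunities_per_unit = 5      # discrete pass/fail checks, at most one defect each
    defects = 87                    # total defects found across all units

    dpo = defects / (units * opportunities_per_unit)   # defects per opportunity, never more than 1
    dpmo = dpo * 1_000_000
    print(f"DPO = {dpo:.4f}, DPMO = {dpmo:,.0f}")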
I strongly endorse their recommendation that we use DPMO. The procedure they outline for finding DPMO is straightforward, and calculating DPMO that way would provide a reasonable estimate. If we want to err on the side of safety, we might continue to apply the 1.5 sigma shift for high-volume processes and no shift for lower-volume processes. DPMO is also a more intuitive metric, and it would keep people from having to go to the table to translate DPMO to process sigma, and then decode it again later for anyone who wants to know what it means. That’s unnecessary rework, something we’d all like to avoid.