Wednesday, March 30, 2011

Data Homogeneity - An excerpt from "Data Analysis with Minitab"

The following is an excerpt from my Data Analysis with Minitab course. I thought it was too important not to share; it's ignored far too often. For more information, see Davis Balestracci's Data Sanity (the paper or the book), or Don Wheeler's The Six Sigma Practitioner's Guide to Data Analysis.

We have discussed simple graphical analysis in histograms. Remember, a histogram lets us get a visual feel for the shape, center and spread of a set of data. Adding the specification limits to a histogram allows us to see performance in relation to the specifications, and any outliers may show up on a histogram.

Important to note: A histogram is a snapshot in time. It shows how the data are “piled.” If the process is not stable, we can’t make any assumptions about the distribution. So, while a histogram is a very useful tool, it’s more useful in conjunction with some kind of time-series plot. The following scenarios, adapted from Davis Balestracci’s Data Sanity, illustrate the importance of looking at process data over time.

These scenarios depict the percentage of calls answered within 2 minutes for three different clinics in a metropolitan area. All three sets of data were collected during the same 60-day time period.

What can you say about the performance of the clinics, based on the histograms and data summaries?

The summaries presented in the histograms all show unimodal, fairly symmetrical, bell-shaped piles of data. The p-values for the Anderson-Darling tests for normality are all high, indicating no significant departures from a normal distribution. There are no apparent outliers. The mean percentage for each clinic is a little over 85%, and the standard deviations are all around 2.5%.

The histogram, though, is a snapshot. It only reveals how the data piled up at a particular point in time. The graphic, and its associated summary statistics, can only represent what’s happening at the clinics if the data are homogeneous. These data were gathered over time: what would a picture of the data over time reveal?

The control chart for clinic A is below. Although the histogram showed the same bell-shaped pattern and high p-value for the normality test, you can easily see that the histogram can’t represent the data for clinic A; we caught it in an overall upward trend, and so a histogram of the next sixty days will no doubt look very different from the histogram of the first sixty days.

Likewise, the control chart for Clinic B…

This chart shows that what we are actually looking at is three different processes, the data for which just appear to stack up to a single, not-different-from-normal distribution. In fact, by slicing the chart at the shifts, we can see that there are three distinct time periods when the variation is in control:

The only one of the three clinics with a stable process is Clinic C. Looking at Clinic C’s plot over time, we see a random pattern of variation within the control limits. We can now expect that the histogram will not change shape significantly over time and that the parameters will remain about the same, so our assumptions about the distribution will be valid and useful.
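The point that a histogram can't see time order is easy to demonstrate. Here is a minimal Python sketch using hypothetical data (not the clinic data from the course): the same 60 values are arranged once as a steady upward drift and once in random order. The mean and standard deviation, and therefore the histogram, are identical, but individuals (ImR) chart limits, computed here as the mean ± 2.66 × the average moving range, tell two very different stories. The function names are my own.

```python
import random
import statistics

def imr_limits(xs):
    """Individuals (ImR) chart limits: mean +/- 2.66 x the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    mr_bar = statistics.mean(moving_ranges)
    center = statistics.mean(xs)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def count_outside(xs):
    """Number of points beyond the ImR limits computed from the same data."""
    lcl, _, ucl = imr_limits(xs)
    return sum(1 for x in xs if x < lcl or x > ucl)

# Hypothetical 60 days of "% of calls answered within 2 minutes", drifting upward.
trend = [80 + 0.2 * day for day in range(60)]

# The same 60 values (same histogram, same mean and SD) in a random time order.
shuffled = trend[:]
random.Random(1).shuffle(shuffled)

for name, xs in (("drifting", trend), ("shuffled", shuffled)):
    print(name, round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2),
          count_outside(xs))
```

The drifting series puts 54 of its 60 points outside its own limits; the shuffled copy of the very same values almost certainly shows none. Only the time-ordered view can tell them apart.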

Friday, March 11, 2011

Contra the 1.5-Sigma Shift

I'm currently working up some simulations to try once again to put the "1.5-Sigma Shift" to bed for good. The simulations seem to bear out what I've long felt about the shift, but I still have one more to run, to demonstrate the effects of the shift (and its detectability) on a high-volume operation.
As I understand it, the use of the shift originated like this: people at Motorola apparently had some data showing that you could have undetected shifts of up to 1.5 sigma. This would certainly be a valid concern in high-volume production with low monitoring rates.
As an example of what can happen when you get shifts in high-volume enterprises, I'll mention Don Wheeler's Japanese Control Chart story from Tokai Rika. They were running about 17,000 cigarette lighter sockets per day, and had found that they could detect shifts using one subgroup of four sockets per day. They selected one socket each at 10 AM, 12 PM, 2 PM and 4 PM, and kept an XbarR chart on the data. The only rule they used was rule 1 (a single point outside the control limits).
Suppose they had decided to add rule 4 of the Western Electric Zone Tests (a run of eight above or below the centerline; Minitab and JMP call this rule 2 and use a run of nine). With one subgroup per day, this would mean that if a shift in the mean occurred and the first signal was a rule 4 signal, they might run 8 x 17,000 = 136,000 sockets at the changed level. This would be unlikely to result in any nonconforming product (since they were using less than half the specified tolerance), but from a Taguchi Loss perspective, it's not desirable.
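To put numbers on the Tokai Rika discussion, here is a sketch of the average run length (ARL) for rule 1 alone on an Xbar chart with subgroups of four, one subgroup per day. It assumes normal data, known sigma and independent subgroups; the ARL formula is the standard textbook one, not something taken from Wheeler's story, and the function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rule1_arl(shift_sigma, n):
    """Average run length (in subgroups) for rule 1 on an Xbar chart with
    3-sigma limits, after the mean shifts by shift_sigma individual-value sigmas."""
    d = shift_sigma * math.sqrt(n)  # shift expressed in standard errors of Xbar
    p_signal = norm_cdf(-3.0 + d) + norm_cdf(-3.0 - d)
    return 1.0 / p_signal

DAILY_VOLUME = 17_000  # sockets per day, from the Tokai Rika story

# A shift of 0.0 gives the false-alarm ARL for comparison.
for shift in (0.0, 0.5, 1.0, 1.5):
    arl = rule1_arl(shift, n=4)
    print(f"{shift:.1f}-sigma shift: about {arl:.1f} days, {arl * DAILY_VOLUME:,.0f} sockets")
```

With n = 4, a 1.5-sigma shift moves the subgroup average by three of its own standard errors, so rule 1 alone has about a coin-flip chance of catching it on any given day (an ARL of about 2 days, or roughly 34,000 sockets).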
So it might be prudent to study your processes and sample more frequently; or you can "play the slice" as Motorola did, and assume that you might have undetected shifts of up to 1.5 sigma on a regular basis. If you do this, you will end up giving yourself credit for a Cpk of only 1.5 when you actually have a Cpk of 2, and you will estimate much higher proportions defective than you actually get. As a fudge factor for setting specifications, it's sloppy but safe, I guess.
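The Cpk arithmetic in the paragraph above, as a minimal sketch. The process numbers (mean 50, sigma 0.5, specifications at 47 and 53) are hypothetical, chosen so the specs sit six sigma from the mean on each side.

```python
def cpk(mean, sigma, lsl, usl):
    """Distance from the mean to the nearest spec limit, in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)

sigma = 0.5
actual = cpk(50.0, sigma, 47.0, 53.0)                  # 2.0
credited = cpk(50.0 + 1.5 * sigma, sigma, 47.0, 53.0)  # 1.5 after assuming a 1.5-sigma shift
print(actual, credited)
```

Moving the mean 1.5 sigma toward a limit removes 1.5 of the 6 sigma of clearance, and 4.5 / 3 = 1.5.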
So let's talk about what Motorola might have gotten wrong.
1. My understanding is that they (much like Tokai Rika) only used rule 1. This would keep them from picking up some of the other signals. I don't have the data from the studies they based their conclusions on, but they might have used a different value than 1.5 had they had the added sensitivity lent by the rest of the Western Electric Zone Tests.
2. "Undetected shifts" are, logically, undefined. If we operationally define a shift in the mean by using some combination of the Western Electric Zone Tests, then any long run without a signal is not (by definition) an undetected shift. Logically, you can't detect an undetected shift. We can define the difference between long-term variation (dispersion characterized by the standard deviation of the entire data set) and short-term variation (dispersion characterized by R-bar/d2 or s-bar/c4). If you want an operational definition of "undetected shifts," the delta between those two measures of variation might be useful. It's silly to assume, however, that there are some bursts of variation that average 1.5 sigma and somehow escape detection. Not only that, but the false alarm rate itself induces false signals.
3. It's damned difficult to induce a shift in a simulation that isn't picked up within a few subgroups. In one of the simulations I've been working on recently, I created 10,000 random values from a normal distribution with a mean of 50 and a standard deviation of 0.5. I cleaned up the false signals by substituting other randomly generated numbers for those outside the control limits, and rearranging the order to kill off the rule 2, 3 and 4 signals. I then ramped up a 1.5-sigma shift in 0.05-sigma increments, 50 at a time. An ImR chart caught the shift within the first 8 subgroups (and I had only shifted 0.05 sigma at that time). That was for a gradual shift; an abrupt 1.5-sigma shift signalled immediately.
4. The only way you get the results the process sigma calculations give you is if all the data are shifted 1.5 sigma; in other words, the mean has to shift 1.5 sigma and stay there. So you have a control chart, the centerline is at 50, the upper control limit is at 51.5, and you don't have any out-of-control signals...but the actual process mean is 50.75? In what world can that happen? Those are the conditions you would need, though, to actually get "3.4 defects per million opportunities" in any process showing six sigma units between the process mean and the nearest specification limit (a process sigma of six). Occasional process meandering as far as 1.5 sigma in either direction, if it could go undetected, would result in significantly lower DPMO than the Process Sigma Table predicts.
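Items 2 and 3 can be sketched together in Python. This is a minimal sketch under my own assumptions, not the simulation described in item 3: individuals data, ImR limits (mean ± 2.66 × average moving range) computed from a normal baseline, a run-of-eight test alongside rule 1, and the gap between long-term sigma (standard deviation of all the data) and short-term sigma (average moving range / d2, with d2 = 1.128) standing in for the operational definition suggested in item 2. The parameters follow item 3 (mean 50, sigma 0.5, 0.05-sigma steps up to 1.5 sigma), reading "50 at a time" as 50 values per step. Unlike the original, the baseline is not scrubbed of false signals, so where the first signal lands depends on the random seed. All function names are hypothetical.

```python
import random
import statistics

D2 = 1.128  # bias-correction constant for moving ranges of two points

def moving_ranges(xs):
    return [abs(b - a) for a, b in zip(xs, xs[1:])]

def short_term_sigma(xs):
    """Within-process sigma estimated from the average moving range (MR-bar / d2)."""
    return statistics.mean(moving_ranges(xs)) / D2

def long_term_sigma(xs):
    """Overall sigma: the standard deviation of the entire data set."""
    return statistics.stdev(xs)

def ramp_series(n_base=10_000, mean=50.0, sigma=0.5, step=0.05,
                per_step=50, max_shift=1.5, seed=7):
    """Stable normal baseline, then a mean that climbs `step` sigma every `per_step` values."""
    rng = random.Random(seed)
    baseline = [rng.gauss(mean, sigma) for _ in range(n_base)]
    ramp = []
    for k in range(1, round(max_shift / step) + 1):
        shifted_mean = mean + k * step * sigma
        ramp.extend(rng.gauss(shifted_mean, sigma) for _ in range(per_step))
    return baseline, ramp

def first_signal(xs, center, mr_bar, run_length=8):
    """(index, rule) of the first point beyond ImR limits or ending a run of
    `run_length` points on one side of the centerline; None if nothing signals."""
    ucl, lcl = center + 2.66 * mr_bar, center - 2.66 * mr_bar
    run, side = 0, 0
    for i, x in enumerate(xs):
        if x > ucl or x < lcl:
            return i, "rule 1"
        s = (x > center) - (x < center)  # +1 above, -1 below, 0 on the centerline
        if s == 0:
            run = 0
        elif s == side:
            run += 1
        else:
            run = 1
        side = s
        if run >= run_length:
            return i, f"run of {run_length}"
    return None

baseline, ramp = ramp_series()
center = statistics.mean(baseline)
mr_bar = statistics.mean(moving_ranges(baseline))
print("first signal in the ramp:", first_signal(ramp, center, mr_bar))
both = baseline + ramp
print("long-term minus short-term sigma:",
      round(long_term_sigma(both) - short_term_sigma(both), 4))
```

Moving-range limits are used because item 3 mentions an ImR chart; with subgrouped data, R-bar/d2 or s-bar/c4 would take the place of MR-bar/d2.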
I believe it was a mistake for the statistical community to allow this to become an informal standard. We are about quantifying uncertainty, not about arbitrarily adding large chunks of uncertainty. The "process sigma" is already counterintuitive. If you tell managers their process sigma is 3.2, the first question they always ask is, "So what does that mean?" It's much better, I think, to use DPMO...it makes sense to most people, doesn't require translation, and doesn't require assumptions about shifts that probably don't exist. It also acts as a sort of Rosetta Stone, allowing us to translate between data from counts and data from measurements. We do have to remind managers that DPMO is still just a best estimate based on current data, but it's certainly more meaningful than the "process sigma."
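The DPMO arithmetic from item 4, and the "Rosetta Stone" role of DPMO, can be sketched in Python. This assumes normally distributed measurements with specification limits six sigma on either side of target; the 90/5/5 "meandering" mix and the defect counts are hypothetical numbers chosen only for illustration, and the function names are my own.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def dpmo_from_measurements(offset_sigma, half_width_sigma=6.0):
    """Expected DPMO when the spec limits sit half_width_sigma on either side of
    target and the process mean is displaced offset_sigma from target."""
    p_out = upper_tail(half_width_sigma - offset_sigma) + upper_tail(half_width_sigma + offset_sigma)
    return p_out * 1_000_000

def dpmo_from_counts(defects, units, opportunities_per_unit):
    """Observed DPMO from attribute (count) data."""
    return defects / (units * opportunities_per_unit) * 1_000_000

centered = dpmo_from_measurements(0.0)   # about 0.002 DPMO
shifted = dpmo_from_measurements(1.5)    # about 3.4 DPMO: the Process Sigma Table figure
# Hypothetical meandering: on target 90% of the time, 1.5 sigma high or low 5% each.
meandering = (0.90 * centered
              + 0.05 * dpmo_from_measurements(1.5)
              + 0.05 * dpmo_from_measurements(-1.5))
print(f"centered {centered:.4f}, shifted {shifted:.2f}, meandering {meandering:.2f}")

# Hypothetical count data: 27 defects in 1,500 units with 12 opportunities each.
print(dpmo_from_counts(27, 1_500, 12))
```

Both routes end in the same unit, defects per million opportunities, which is what lets DPMO compare count data with measurement data. The meandering line shows why occasional 1.5-sigma excursions produce far fewer defects than a permanent 1.5-sigma shift.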
There is a danger that it will soon become more than just an informal standard. There is a proposal for a new ISO international standard for DMAIC; it does include the Process Sigma, and the language in the proposed standard says we will adjust by 1.5 sigma "by convention." Anyone interested should watch for opportunities for public comment on the standard, either through TAG 69, NIST, or ISO.

Tuesday, November 2, 2010

"Best Practices"

Whenever I hear someone talking about a "best practice," I always add the Homer Simpson modifier: "Best practice SO FAR..." What this term means is just that it's the best solution yet to some set of problems or circumstances.

My experience has been that best practices don't stifle creativity in creative people...they can serve as springboards for further creativity or improvement. I think they are best used just that way: as you're studying a process and analyzing the cause systems that create the outputs and outcomes, you will look for aspects of the systems that can be worked on to optimize the outcomes. Looking at "best practices" is like looking at any other process...we're just starting with a process that has already been improved before (at least for this set of inputs).

The downside to "best practices" comes from leaders who hear the term "best" and decide that it must actually mean "best it could be." Managers who do this will try to force replication, without knowing what to replicate or why it worked in its original environment (and whether it will work in the new environment). In that case, it will certainly create road blocks and slow down process improvement.

Tuesday, June 1, 2010

In a class a few years ago, we asked students to talk about quality-related projects on which they were currently working. The class comprised a number of people from several business units. At one table, a project leader stood and told us all about his project for the marketing unit: they were exploring server consolidation. They knew that only a fraction of the capacity of each of many of their servers was in use; they had a large number of servers, therefore, that could be consolidated. Because this business unit "rented" the servers from the Shared Services unit, they figured they could save $250,000 per year by consolidating servers and turning them back over to Shared Services. The class politely applauded.

Next up was a person from the Shared Services unit, who talked about his project, which was developing a new service they could "sell" to the marketing unit, which would generate over $250,000 in new revenue for Shared Services. The class again politely applauded.

I asked, "What's wrong with these stories?"

Blank stares (I'm the idiot!)

I tried to give them a hint: "How does the company benefit from these projects?"

A tentative hand, then (in a tone that indicates that surely, I AM the idiot), "Well, the company saves half a million dollars! Why wouldn't THAT be a benefit?"

I asked, "How is the company saving a half-million dollars?"

Again, incredulous stares..."You're the stats guy...maybe you should have taken accounting instead...250,000 plus 250,000...isn't that half a million?"

I pointed out that marketing was "saving" a quarter of a million by not "renting" a quarter of a million's worth of servers from Shared Services, but that Shared Services was "making" a quarter million by "selling" a quarter-million's worth of new services to marketing. So they just dipped a bucket into one end of the lake and dumped it into the other end...and some evaporated while they were transporting it, because of the cost of the project.

Eventually, we did work out that there were benefits...increased server capacity, benefits from the new service, etc. Most of these numbers (the actual benefits) were "unknown and unknowable" numbers. None of those benefits had been discussed originally, because the "knowable" numbers were easily calculated (and wrong)...

Friday, April 23, 2010

What is "Productivity?"

In one of my stats classes, a nursing student mentioned that they measure productivity at her hospital. It's measured this way:

"To get the productivity ratio; you take the total number of hours worked by nursing ( all nurses on the unit) and divide that by the total number of patients on the unit at midnight. For example if there are 4 nurses per shift and they work 12 hour shifts then that is 96 hours; then say there are 30 patients on the unit at midnight; divide 96(nursing hours worked) by 30(# of patients) = 3.2."
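The ratio in the quote is simple enough to write down in a couple of lines, which is part of the problem. A minimal sketch (the function name is hypothetical); note that 4 nurses × 12 hours only comes to 96 hours if there are two 12-hour shifts per day, a step the quote leaves implicit.

```python
def nursing_hours_per_patient(nurses_per_shift, shift_hours, shifts_per_day, midnight_census):
    """Total nursing hours worked in a day divided by the patient count at midnight."""
    nursing_hours = nurses_per_shift * shift_hours * shifts_per_day
    return nursing_hours / midnight_census

ratio = nursing_hours_per_patient(nurses_per_shift=4, shift_hours=12,
                                  shifts_per_day=2, midnight_census=30)
print(ratio)  # 3.2
```

Notice what the function doesn't take as input: acuity, outcomes, or anything the patient values.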

In my consulting practice, my clients often tell me about productivity numbers. This, to me, is one of the compelling questions for those of us in the quality profession: what is "productivity?" To keep the discussion going with my student, I posted the following, to raise some of the issues I've seen organizations struggle with over the years:

This is one problem with many of the metrics used for "productivity." By trying to boil it down to the simplest, easiest to use ratio, you leave out a lot of important information. What is productivity in nursing? Is it just being there? Clocking in and clocking out? Most of the nurses I know work pretty hard, but even the amount of work completed wouldn't necessarily reflect the value of a nurse. A number of years ago, a paradigm came out called ABC (for Activity-Based Costing) that measured productivity in terms of activity...how much were you actually doing? Seems reasonable, but it doesn't necessarily reflect value, any more than motion reflects progress.

Nursing can be a lot like being in the Military. I can't tell you how many watches I stood in 20 years...tens of thousands of hours where no one took a shot at anyone. If my job was to kill enemies, then most of the time, I was a waste of taxpayer dollars. Did that mean we didn't need to be there? Our job was not to be constantly doing something, but to be alert and vigilant so that if something did happen, we could take immediate action.
Similarly, there are nights, even in Emergency Rooms, that are slow. Would you send everyone home, to keep your productivity numbers high? Or is there value in having some knowledgeable and experienced caregivers there for the probable event of an emergency?
What is the productivity measure tied to? Can you show that a higher ratio correlates to better outcomes? Higher profits? If it's just cost-cutting, it's hardly "productivity;" it's just lack of having to pay for "non-productivity."
The point is, productivity is difficult to measure, and productivity is in the eye of the recipient. What the patient may value, the administrator may not. What the doctor may value, the HMO may not. What the nurse may value, the patient may not (one example: waking a surgical patient up every hour during the night to check vitals).

Of course, I guess the whole point boils down to value...who defines that, how you prioritize the "whos." This is where you must be able to understand something about systems thinking.

Thursday, March 11, 2010

Creating a Culture of Process Improvement

This morning, one of the readers of IQ Six Sigma posed the following question:

“My department is charged with creating a "culture of process improvement" within our zone. We're struggling with what that looks like once we've created this culture. Looking at the Toyota model, they challenge employees to look for PI opportunities every day. What exactly does that look like, and what measurements should we consider (i.e. number of PI suggestions with managers being held accountable for X number per quarter, etc.) I'd like some ideas.”

Well, one thing you for sure don't want to do is set some quota for suggestions. You may already be faced with an uphill battle, because the leadership at your organization is actually the entity that has to create that culture of process improvement. If they are just rolling it downhill like any other MBO, it suggests that they don't know what they are doing.

Toyota does challenge employees to look for improvement ideas. One of the ways they do that is by implementing them. Most suggestion boxes go unheeded by employees because they go unheeded by management. At companies like Toyota, they use mechanisms such as Quality Function Deployment to communicate the voice of the customer to everyone in the organization. It allows people on the production line a clear line of sight to the mind of the customer and the organization's leadership.

How do you establish this culture? Well, if you have to do it locally, start by knowing that you may not be as successful as you would if your leaders were leading. Empowerment is a big piece of the pie...you have to let people know they are empowered to make changes. You have to have mechanisms in place that let changes be approved at the lowest possible level. This doesn't mean that any line worker should be empowered to make design changes that require retooling the entire line without some study, but small local changes should be able to be made and standardized locally, as long as they don't suboptimize the system.

So, start by listening to people. I once found an operator potting an assembly with epoxy, using a pneumatic syringe...one of the primary quality characteristics in this assembly was that the epoxy had to be free from air bubbles! This line worker had been telling people about it for some time, but no one would listen; after all, an engineer had designed that workstation--who was this uneducated line worker to question the engineers? So, again, listen! Your people have the answers to most of your quality problems. It may take some time before they will talk (because it's a culture change for them, too).

It's not enough just to listen, though, you have to act! If you don't act on what you hear, and act promptly and visibly, soon you won't have anything to listen to. If you listen and act, you'll soon find that you can't keep up with the suggestions for improvement. That will be the beginning of changing the culture to one of improvement.

You also have to be a champion. You have to be out there talking it up, walking the talk, aggressively and visibly removing obstacles to improvement. Align whatever passes for reward and recognition in your zone with PI, to let people know that it's important. Constantly let people know what you value; proactively seek (and take) opportunities to demonstrate those values and beliefs. Measure important process and throughput measures...use SPC so you don't make boneheaded decisions about those measures.

As to what to measure to gauge progress along the cultural change path...well, there are lots of things you can measure. Probably the most important are results and employee morale. If your error rates, rework rates and scrap rates are going down and your throughput is going up, it's working. You can also measure suggestions received; but you should use that number as the basis for a perhaps more important metric: percentage of suggestions implemented. This is certainly not an exhaustive list...there are numerous things you can measure. Deming said that the most important numbers are unknown and unknowable; this is what makes measuring what we can measure so important.

Standardize, do 5S, and start holding 5-10 minute meetings at every cell every day to go over quality metrics, suggestions entered and suggestions implemented (and to get ideas for implementing suggestions). Recognize people for advancing continuous improvement.

Monday, February 8, 2010

Bonus Plans

In one of my LinkedIn Discussion Groups, we have been going back and forth on the idea of bonus schemes for a couple of weeks now. Today, we got a thoughtful post from John, who said that "Incentives and reinforcement are part of what I design." He offered insights as to how a system might be designed. I responded to one of his ideas.
He pointed out that "bonuses have been factored into sales compensation since the dawn of time because we know that vigorous sustained effort is required," then asked, "Why here and not in all key jobs?" One of his reasons: "Execs are unfamiliar with the ways that objective measures can be designed for staff, managers, and production people." He went on later to suggest that "Incentives need to be based on objective measures of performance," and that "ALL incentives are ultimately individual."
While these ideas seem to make some common sense, things that we've learned over the last 30 years or so suggest that they bear some scrutiny. Here's my reply:

_________________________________________________

I think Scott points to a couple of drawbacks to many bonus schemes. There are some problems with one of his fixes, though.

Let's talk about objective criteria: sometimes they do exist, but not as often as we think, and they are never (and I do mean NEVER) as clear-cut as we think. Anyone who's ever seen the Red Bead Experiment can attest to that. It's also almost never possible to separate the performance of the person from the performance of the system in which they operate. So, even when we talk about "anyone who reaches the goal gets the bonus," we assume that it's possible for everyone to reach that goal, completely independent of all the factors that drive the system.

Let me illustrate with an example from my days in the Military:

An Army school convenes twice per year, and runs for 5 months. One class starts in late Fall, the other in late Spring. Each class is led and instructed by two soldiers. During a study of these classes 10-11 years back, one of these instructor teams clearly excelled, by all the “objective” criteria used to measure performance: very low dropout rates, very high academic achievement with very little remediation, almost no legal or medical problems, excellent advancement rates for graduates, etc. The other team, however, didn’t fare so well; their dropout rates were very high, most of their students struggled to pass the weekly exams (despite extensive remediation and night study), they had numerous problems reported from base security, military police and community police, a high incidence of sick days, and most students who graduated required a lot of extra work to gain adequate proficiency, once they arrived at their units.

Of course, the team with the highest scores on all the criteria won Instructor of the Quarter/Year, Soldier of the Quarter/Year and other achievement awards given by the training command, and were consistently ranked in the top 5 by their commanders; all this, of course, led to rapid advancement for these soldiers.

The low-scoring team ended up at the bottom of the heap, in the “not ranked” category, and received letters of reprimand for their poor performance.

Eventually, someone noticed that this difference in performance transcended the soldiers themselves…ALL the Fall classes were better, and ALL the Spring classes were worse. As it turned out, there was a great logical explanation for all of it.

The classes that convened in the late Fall comprised students who had come into the Army right after High School graduation, many on delayed entry programs. They had enlisted for this particular specialization. They were highly qualified and highly motivated, both for the Army and for this school. In contrast, the Spring classes were made up of people for whom the Army was something to do after they had failed to find a job, and who had been put into this class to fill a quota. Some had needed waivers to get into the Army; many had required waivers to get into the class.

Ironically, if you looked at the workloads for the instructor teams, the hardest-working and most creative teams were those for the Spring class. They had to be, just to survive. They had to conduct remedial sessions during night study, as well as before classes, at lunchtime, on weekends, etc. They had to continually push the envelope to find new and better ways to get these challenged students to learn. The other team largely skated through the duty…very little extra time, no extra thought needed.

This same sorry story still happens every day in Military recruiting. Recruiters in very populous areas in more patriotic-leaning states have very few problems meeting quota. They get awards, advancements, etc. Those in rural areas work many times harder and often don't make quota, and are forced to accept low evaluations and sometimes humiliating "remedial" sessions where senior recruiters come in and yell at them like drill sergeants...many of these are just back from Iraq or Afghanistan.