It’s Time to Bury Forced Ranking — Once and For All

The heated debate over how to assess employee performance was highlighted recently by two back-to-back articles on BusinessWeek.com.

One day, Yahoo’s adoption of a forced ranking system was a headline. The next day, Microsoft’s decision to end its forced ranking policy was featured. The Microsoft story had appeared earlier in Vanity Fair, in an article titled “How Microsoft Lost Its Mojo.”

Within days, former General Electric CEO Jack Welch added his 2 cents, defending the practice in a Wall Street Journal opinion column headlined “ ‘Rank-and-Yank’? That’s Not How It’s Done.”

Variations on the forced ranking logic, known as “rank and yank,” have been used for decades. New managers often learn they need to limit the “outstanding” ratings. But GE was the first to require forced rankings and, if reports are accurate, one of the first to change the policy.

Jack Welch’s argument

Welch contends that rank-and-yank is a “media invented … sledgehammer of a pejorative that perpetuates a myth …” He argues that the policy should be referred to as “differentiation” and that its purpose is to make sure that “all employees know where they stand – how they’re doing today.”

He refers to the 20-70-10 distribution as “grading” and states it’s “not set in stone. … Some companies use A, B, and C grades.”

He understands that “some believe the bell-curve aspect of differentiation is ‘cruel.’ ” He argues that, “We grade children in school, often as young as 9 or 10, and no one calls that cruel. But somehow adults can’t take it? Explain that one to me.”

The bottom line is that Welch has not backed away from the practice. He sees it as a “powerfully effective real practice.” His defense does not acknowledge any problems. In fact, he contends it “provides dignity, develops future leaders, and creates winning companies.”

Where Forced Ranking is wrong

At this point, it’s not clear what prompted GE and others to adopt a forced ranking or bell curve appraisal policy, but Welch’s defense of the practice confirms for me it was a response to a common problem – inflated ratings.

Inflated ratings may well occur in every organization. They are symptomatic of the Lake Wobegon effect that has crept into almost every aspect of our lives. In some organizations the problem is far worse than in others.

In the federal government, where ratings have few consequences, as many as 95 percent of employees in some agencies are rated as a “4” or “5” on a five-level rating scale, with a shockingly small number rated at the lowest level.

For years, performance reviews were pass/fail and simply confirmed that employees satisfied performance standards. Employers adopted multi-level rating scales for white-collar jobs roughly half a century ago, but inflation has only recently been recognized as a problem.

Employers lived with the problem – it was not an issue studied by HR experts – until GE adopted its “forced distribution” policy and other companies jumped on the bandwagon.

Shifting the focus of performance reviews

In introducing the idea, GE shifted the focus of performance reviews in a way that violates a basic tenet of performance management.

It would also be impossible to defend if members of protected classes are adversely impacted.

In textbooks and under the law, performance ratings should be narrowly focused on an employee’s performance in the current job. With forced ranking policies, managers compare and rank employees, across occupations and departments. Comparisons like that are arbitrary – there are no valid performance-related criteria that would justify comparing, for example, accountants and HR specialists.

There are actually few job families where it would be valid to compare employees. Employees of course differ in experience and credentials.

Even jobs that appear to be similar often involve different performance expectations. Sales is one example, nursing is another. In both jobs, the best performers are commonly assigned to the most difficult customers or patients.

At least with similar jobs it would be possible to document the differences in expectations along with the rationale that explains the ranking of employees. With different occupations and jobs, it is purely arbitrary.

Forced Ranking: No support in theory or practice

The idea of compelling managers to force ratings to fit a normal distribution has no basis in theory. A normal distribution or bell curve first surfaced as part of the thinking when intelligence tests were used by the Army in World War I. In that context it made sense; recruits came from the large population of males in the age range sought by the military.

Employers are more selective, however, recruiting typically from a narrowly defined cohort with education and experience credentials matching job specifications. College graduates, for example, account for less than 30 percent of the population. The most highly recruited graduates from elite colleges come from a small slice of the population.

Students in a basic stat course learn that a small sample from a much larger population is rarely going to resemble a bell-shaped curve on any measure.

Jack Welch is correct, of course, that high-performing employees are more valuable. Employers certainly should identify them. However, forced ranking assumes ratings are valid and free of bias or discrimination. It’s also very likely, if not guaranteed, that some managers have higher standards than others.

Different grades for different occupations

What is commonly overlooked is the obvious difference in rating employees in different occupations and at different career stages. It is not at all clear how or why a “5” rating in marketing, for example, can be compared with one in accounting or engineering. For the same reason a “5” for a new hire would not have the same interpretation as the same rating for someone late in their career.

Beyond that of course a policy that requires terminating lower rated employees each year is going to impact solidly satisfactory workers who get closer and closer to the bottom. Companies have been aware of that reality but continued to live with these disastrous policies.

If objective performance measures were broadly available, it would be possible for employers to defend ratings. Reliable data, however, are available for few jobs.

In far too many organizations the performance criteria used to rate performance are only loosely tied to job duties. Even the typical use of individual performance goals is far from a consistent practice across occupations and job levels.

As a comment on Welch’s argument, yes of course teachers sometimes “grade on a curve,” but that’s to the benefit of students when the class as a whole performs poorly on tests. Moreover, they all took the same test and are marked on the same performance criteria. His analogy doesn’t work.

There are better alternatives

As the manager responsible for the performance system in two large companies, along with my consulting, I have discussed ratings with hundreds of managers. Some were good, some not so good. The overwhelming majority wanted to be fair but were uncomfortable with their role.

The reasons forced ranking became popular are clear to anyone who reviews ratings in a large organization and/or discusses performance ratings with managers. It would be unfair to call the ratings dishonest but many could never be validated. Psychologists tried for years to develop valid rating methods but essentially abandoned their research in the 1980s.

As Microsoft and other companies have learned, forced ranking contributes to problems that are far more toxic than inflated ratings. There is no silver bullet but companies can minimize the inflation problem.

First, my experience has convinced me that three-level rating scales are the most practical – Outstanding, Fully Satisfactory and Unsatisfactory. With the more common five-level scales, it’s typical to see ratings bunched in the top three levels.

Research some years ago disclosed that workers typically know and agree on the outstanding performers and the few who are unsatisfactory — they stand out. The ratings using a three-level scale will be more valid and defensible.

Where Welch’s argument is on point

Plus, of course, the two groups, the A Players and C Players, are the most important to workforce management. Here Welch’s argument is on point.

As a change in planning performance systems, relying on performance criteria specific to key jobs or job families helps everyone agree on performance issues and enhances the validity of ratings. With guidance, a small group of high performers can do that in a couple of sessions.

A useful analogy is a football team, where each position has a unique set of performance criteria. Those small groups can also define outstanding as well as unsatisfactory performance. Adding the two levels helps managers. There is no practical reason to rely on the same generic, vaguely defined criteria for every position.

Job-specific criteria contribute to effective performance management. They facilitate performance planning, feedback and coaching discussions, developmental planning, and career management.

The real key, of course, is the cadre of middle managers and supervisors. The basics include adequate training, but far more important are top management’s expectations.

Where leaders make performance management a priority, it is far more likely managers will take their responsibility seriously. It would also be advantageous to have coaches to work with ineffective managers. Finally the best managers need to be rewarded and the least effective moved back to non-supervisory roles. That sends an important message.

Forced ranking: not a viable long-term strategy

Ed Lawler was the first to write about a step in the review process that is now a proven strategy to ensure that ratings are more credible and valid. He recommends having managers explain and defend the high and low ratings at meetings with peers serving on “calibration committees.”

The added step makes it important for managers to develop solid justification. It introduces pressure to be honest, raises the prominence of good performance, and increases recognition. It does not completely eliminate inflated ratings but it introduces a shared sense of fairness – and is seen as a positive. Second-level manager reviews are not sufficient.

Forced ranking is by no means a viable long-term strategy to create more productive workplaces. It is fully possible to achieve Welch’s goal of “differentiation” without triggering the counterproductive behaviors that have surfaced in companies like Microsoft.

To quote from the Vanity Fair article, “Every current and former Microsoft employee … cited stack ranking as the most destructive process inside of Microsoft, something that drove out untold numbers of employees.”

Yahoo should talk to Microsoft.