Quality In The News

Archive


The issue of testing for academic proficiency has been growing in recent years, and now the new federal law will require additional testing. Two recent New York Times articles raised issues that prompted two of the Society's Division Chairs to respond to . . .

About the New York Times Articles

The articles "Right Answer, Wrong Score: Test Flaws Take Toll" and "When a Test Fails the Schools, Careers and Reputations Suffer" were written by Diana B. Henriques and Jacques Steinberg and were published May 20 and May 21, 2001.

In recent years, educational testing companies have experienced serious breakdowns in quality control. Here is some of what the articles uncovered:

  • Testing industry errors in the last three years have affected millions of students who took standardized proficiency tests in at least 20 states.
  • The company that scored tests in Minnesota gave 47,000 students lower scores than they deserved. In the last three years, errors by the same company produced incorrect scores for students in Arizona, Michigan, and Washington, and deadlines for delivering test results in Florida and California were missed.
  • Nearly 9,000 students in New York City were mistakenly assigned to summer school in 1999 because of an error by another big company.
  • A mistake in 1997 by a smaller company denied $2 million in achievement awards to deserving schools in Kentucky.
  • In 1998, nearly 700 of California's 8,500 schools got inaccurate test results, and more than 750,000 students were not included in the statewide analysis of the test results.
  • Scoring of writing samples is handled by temporary workers earning as little as $9 an hour, some of whom said that they were pressed to score essays without adequate training and that they saw tests scored in an arbitrary and inconsistent manner.
  • Many industry executives attribute these errors to growing pains. The industry says its error rate is infinitesimal on the millions of tests scored annually.
  • Interviews with people involved in the testing process suggest the industry cannot guarantee the error-free, high-speed testing that parents, educators and politicians expect.
  • Unable to uncover serious testing errors on their own, school districts must rely on the testing companies to do so voluntarily. Lacking federal oversight, the companies decide what they will disclose and when.
  • President Bush has proposed a 50% increase in the workload of the tiny testing industry. No one has addressed whether the industry can handle the additional business.
  • Testing specialists say educators and politicians share the blame for the rash of testing errors because they are asking too much of the industry.
  • The testing industry's code of conduct specifies that the tests not be the basis for life-altering decisions about students. But many states use them to determine whether students can be promoted or granted a diploma, to evaluate teachers and principals, and to decide how much tax money school districts receive. Schools' performance on tests can even affect property values in surrounding neighborhoods.

ASQ Offers Advice to Congressional Health Policy Staff on Prescription Drug Errors

Although there exists widespread agreement on the causes of prescription drug errors, there is little consensus on methods for addressing those problems. A new paper presented to Congressional health policy staff by the American Society for Quality advocates the wider adoption of proven quality methods in efforts to reduce medication errors.

As part of ASQ’s ongoing effort to cultivate contacts with national policymakers in Washington, officials from ASQ met earlier this year with the health policy advisor to Congressman Michael Bilirakis of Florida, who asked for advice from ASQ on ways in which quality methods could be used to reduce prescription drug errors.

A position paper titled “Using Quality Methods to Reduce Prescription Drug Errors” is ASQ’s response to that request. It was prepared by ASQ staff with input from members of the Health Care Division, the Food, Drug & Cosmetic Division and others.

This new paper draws upon ideas presented in the position paper titled “Quality and Quality Improvement in Health Care Services,” which was prepared earlier this year by the Health Care Quality Special Interest Group of the ASQ Health Care Division and the Society for Health Care Epidemiology of America.



Industry Needs Measurement Quality Assurance

by Duane Allen, Measurement Quality Division Chair

Recent articles in the New York Times reported on problems resulting from errors in test results reported by academic testing programs. Many of the errors could have been identified or discounted sooner if the educational testing industry had used measurement quality assurance. The industrial and scientific communities have developed measurement quality assurance techniques to reduce the chance of mistakes in judgments based on measurement results.

Several factors contributed to the errors in scoring. These included mistakes in grading keys, subjective interpretation of test answers by scorers, and incorrect adjustment factors to equate test results.

The educational testing industry maintains there is always a risk of some errors in reported test results. But there are techniques for minimizing the occurrence of such errors. More important, the established discipline of measurement science provides ways of identifying and reporting potential errors associated with a measurement system. Requiring a report of potential errors and their magnitude would temper decisions based on test results and encourage the educational testing industry to improve the reliability of its products.

An important concept developed in the test and measurement community is measurement uncertainty: a numerical expression of the reasonable doubt associated with a measurement. For example, a bathroom scale at home rarely shows the same weight as the scale at a doctor's office. Is the reading on the doctor's office scale more accurate? Without more information about the two measurement systems, one cannot know. Both scales may be close enough to the true weight for us to judge our health. If the uncertainty of one's weight were important, then we might describe an overweight person's weight as lying between lower and upper values; that is, not as 240 pounds but as between 236 and 243 pounds. For other health measurements, such as blood pressure, knowing the uncertainty may be important.

The expression of measurement uncertainty is not an arbitrary guess. It is based on analysis of the measurement process, and it allows for factors that affect the reported value. To return to the same example, one's weight in the doctor's office is often greater because one wears more clothes there than at home.

An important requirement of the International Organization for Standardization's "Guide to the Expression of Uncertainty in Measurement" (GUM) is that any report of measurement uncertainty must include the method used to calculate it. The GUM requires identification of potential sources of error in a measurement and provides methods for combining their numerical contributions into a single expression of uncertainty. By referring to the reported uncertainty and the method used to calculate it, decision makers can judge how much to depend on reported measurement results.
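The GUM's simplest recipe for combining uncertainty components can be sketched in a few lines. This is a minimal sketch, assuming independent components that each enter the result with a sensitivity coefficient of 1, so the combined standard uncertainty is the root sum of squares, and assuming a coverage factor of k = 2 for the expanded uncertainty. The component names and values are hypothetical, loosely modeled on the bathroom-scale example, and are not taken from any real uncertainty budget.

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-of-squares of standard uncertainties.

    Assumes independent components with unit sensitivity coefficients
    (the simplest case covered by the GUM).
    """
    return math.sqrt(sum(u ** 2 for u in components.values()))

def report(value, components, k=2.0):
    """Return (value, expanded uncertainty U, interval) for a reading."""
    u_c = combined_standard_uncertainty(components)
    U = k * u_c  # expanded uncertainty with coverage factor k
    return value, U, (value - U, value + U)

# Hypothetical uncertainty budget for a scale reading, in pounds.
budget = {
    "scale calibration": 1.5,
    "reading resolution": 0.3,
    "clothing variation": 1.0,
}
value, U, (low, high) = report(240.0, budget)
print(f"{value:.1f} lb +/- {U:.1f} lb (k=2): {low:.1f} to {high:.1f} lb")
```

Stating the method alongside the number (root sum of squares, k = 2) is what lets a reader judge the interval; a different coverage factor or an overlooked component would change it.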

Given that there are methods for gauging the accuracy of a measurement system, the educational testing industry should be able to provide objective evidence of the accuracy of its measurement processes. Until the industry can prove the accuracy and reliability of its measurement system, relying only on test results risks reaching the wrong conclusions about the condition of our schools and our students' abilities. If, instead, the educational testing industry exercises measurement quality assurance and provides objective evidence and quantitative support for the accuracy of student testing, we can more reliably use test results as a potential basis for determining the course of improvement in the educational system.

Testing Raises Variety of Concerns

by Robert J. Kattman, Education Division Chair

Quality professionals understand that "If you can't measure it, you can't improve it." Forty-nine states have developed academic standards and required that students be tested to determine their level of proficiency. Many states have mandated testing to determine grade-level advancement and graduation. Because few states have the ability, time, or finances to develop a testing program, they have turned to large corporations that specialize in this area. But the tests and their scoring and use have spawned debate.

The major educational testing companies have had quality control problems that resulted in scoring errors caused by inadequate software design and/or human error, with negative effects on students. When test questions are changed each year and companies constantly try to obtain different analyses from the data, software must be rewritten, and errors are introduced. Because security is a major issue, the number of individuals who review the tests and the software is limited. Tight deadlines add further pressure. These problems must be addressed through quality control measures. It's a matter of improving processes to get it right.

Another concern is whether students should be retained or prevented from graduating on the basis of one exam. Certainly test results should be considered, but they should not be the only determinant. Most tests are norm-referenced: a student's score is compared with the scores of the group on which the test was developed (the norm group). Results of a norm-referenced test tell where the student placed relative to the norm group, not what the student knows and is capable of. What is needed are criterion-referenced tests, designed to determine what a student really knows about a subject.
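The difference between the two interpretations can be illustrated with a short sketch. The norm-group scores, the student's score, the 40-item test length and the 70% mastery cut are all hypothetical; real tests use much larger norm groups and formal standard-setting procedures.

```python
from bisect import bisect_left

def percentile_rank(score, norm_scores):
    """Norm-referenced: percent of the norm group scoring below this score."""
    ranked = sorted(norm_scores)
    below = bisect_left(ranked, score)
    return 100.0 * below / len(ranked)

def meets_criterion(items_correct, items_total, cut=0.70):
    """Criterion-referenced: judged against a fixed mastery cut (hypothetical 70%)."""
    return items_correct / items_total >= cut

# Hypothetical norm group of raw scores on a 40-item test.
norm_group = [18, 20, 22, 24, 25, 26, 27, 29, 31, 34]
student_score = 26

print(percentile_rank(student_score, norm_group))  # → 50.0
print(meets_criterion(student_score, 40))          # → False
```

The same raw score looks "average" against this norm group yet falls short of the mastery cut; against a weaker norm group, the percentile rank would rise even though the student knows no more.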

Finally, we need to measure progress as well as proficiency. Most tests are designed to provide a reading of students' proficiency, like a defect survey at the end of an assembly line. While this may be appropriate for graduation, it is not appropriate within grades. Even automobile manufacturers want more than just a count of the number of defects coming off the assembly line; they want information that will help them improve their product. If real improvement is to be made, we need information on progress over time as well as proficiency levels.
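A minimal sketch of the difference between a proficiency snapshot and a progress measure, using simple fall-to-spring gain scores for two hypothetical classes. This is deliberately simplified: it is not the statistical model used by the Tennessee Value-Added Assessment System or any other value-added program, which account for far more than a raw difference.

```python
from statistics import mean

# Hypothetical fall and spring scale scores for students in two classes.
classes = {
    "Class A": {"fall": [72, 75, 78, 80], "spring": [74, 76, 79, 83]},
    "Class B": {"fall": [50, 55, 58, 61], "spring": [60, 66, 67, 71]},
}

def proficiency(scores):
    """End-of-line snapshot: average spring score."""
    return mean(scores["spring"])

def progress(scores):
    """Average per-student gain from fall to spring."""
    return mean(s - f for f, s in zip(scores["fall"], scores["spring"]))

for name, scores in classes.items():
    print(f"{name}: proficiency {proficiency(scores):.1f}, "
          f"progress {progress(scores):+.1f}")
```

In this made-up data, Class A looks better on the snapshot, while Class B's students gained far more over the year, which is the information a teacher or school could act on.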

When it comes to measuring schools, the tests used today tell us little that we haven't known for years. Students' proficiency levels correlate with the socio-economic and educational status of their parents. Teachers can do little to control the overall proficiency of student bodies, but they can control the progress made by students in their classes. Recent research confirms that the single most important factor in the progress of a student over a year's time is the quality of the teacher.

There are few valid and reliable measures of student progress. Using a system like the Tennessee Value-Added Assessment System (the only statewide testing program I'm aware of that measures progress) would allow schools to determine student progress and lack of progress, take corrective action during the school year or summer, identify teachers who are obtaining the best results, and provide training to teachers to improve results.

The political process has created a situation in which students undergo high-stakes testing to determine whether or not they will be promoted or graduate. For the most part, the tests are inappropriate to the task at hand. They do not assist in improvement efforts, and may produce results at odds with other work a student has done. Worse yet, test results may not even be scored correctly.

Parents, students, and educators are justifiably angry. If the situation is not remedied, the backlash will put the standards movement in jeopardy. If the concept of standards is thrown out, the agenda for educational improvement will be set back many years.


4/09/01

Can Quality Concepts and Tools Fix the U.S. Election Process?

Commission Should be Convened; Base Recommendations on Statistics and Technology

by Howard R. Schussler

Just five months ago, the citizens of the United States set out to elect a new president. Yet a full month after the Nov. 7 election day, the name of the winner was still not known due to problems with the election process.

Even now, with the new Bush administration fully in operation, the true impact of election process errors and the importance of continuous improvement to the process are still becoming apparent to Americans, perhaps for the first time.

The concepts, methods and tools of quality can provide the framework for understanding the systemic causes of errors, variation and ultimately inaccuracy, and can provide a basis for improving a system that includes not just voting, but also the judicial and legislative processes related to elections.

A difference of roughly 500 votes out of 5.8 million cast in Florida, 0.009%, is smaller than the predictable number of errors for miscounted votes, miscast votes, incorrectly rejected ballots, and other vote casting and vote counting errors caused by the systems and processes (people + equipment + methods + materials + environment). While no single political party would be favored over time, the number of errors that go unaccounted for would almost certainly impact more than 0.009% of the total votes and could swing the outcome of a single election one way or another.

According to numerous reports and sources, the predictable number of errors in counting prescored punch cards like those used in Palm Beach County, FL, can be as great as 5%. What this means is that the methods used to accurately account for each person's intended vote through the two essential technical processes, vote casting and vote counting, may lack the precision needed to facilitate clear, unambiguous determination of a winner in a very close election. The vote casting and counting methods, equipment, materials, people and environment cannot consistently account for each vote, thereby making the final vote tally questionable.
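The arithmetic behind this comparison is easy to check. The sketch below uses the article's round figures (about 500 votes out of 5.8 million cast) and compares the margin with the number of ballots affected at several error rates; the 0.1% and 1% rates are illustrative, and 5% is the upper figure reported for prescored punch cards.

```python
BALLOTS_CAST = 5_800_000
MARGIN = 500

margin_share = MARGIN / BALLOTS_CAST
print(f"Margin: {margin_share:.4%} of ballots cast")

# Illustrative error rates; 5% is the reported upper bound for punch cards.
for error_rate in (0.001, 0.01, 0.05):
    affected = BALLOTS_CAST * error_rate
    verdict = "exceeds" if affected > MARGIN else "does not exceed"
    print(f"At {error_rate:.1%}: ~{affected:,.0f} ballots affected; "
          f"{verdict} the margin")
```

Not every error shifts the margin, since errors can partly cancel between candidates, but when the number of affected ballots is many times the margin, the count alone cannot establish the intended winner with confidence.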

Flawed process

The 2000 presidential election proved the fundamental importance of process design and improvement. In the minds of many people, both in the United States and worldwide, a flawed process (one that probably was intended to be fair) damaged the credibility and legitimacy of nearly every major public institution that touched it: the legislative, executive and judicial branches of government, at both the federal level and the state level in Florida.

All measuring systems have uncertainties and inherent levels of error. The unit of measure in this election was votes for president of the United States, with the outcome measure being the will of the people. Errors can be introduced into the measurement process at any of a number of process steps, such as the casting of ballots, machine counting of ballots and hand recounting of ballots. This list oversimplifies the process, but it provides a sound basis for discussion.

The process is flawed at multiple levels, including ballot design, failure of the voters or voting machines to clearly record the vote, unsatisfactory counting and recounting mechanisms, and inconsistent standards and procedures. What sense does it make to have a law requiring a recount if the vote differential between the top two candidates is 0.5% or less, but the predictable level of vote casting and counting errors is 1% or more?

While media attention focused on Florida, the problem is far more widespread. There are between 4,000 and 5,000 election jurisdictions in the United States, often with more than 100 precincts each and little standardization among them. While rigid processes are in place in most jurisdictions to ensure the quality of the system, factors such as prescored punch cards (used both in Palm Beach County and in my own Lane County, OR) can facilitate errors such as incorrect punching. Rough handling of the cards can also cause errors.

The current system

As Deming's system of profound knowledge tells us, we must understand the system. The current system achieved exactly what it was designed to achieve; it selected the next president of the United States. The system was not necessarily designed to determine the will of the people as measured by a precise system of counting votes.

What do we know about current processes? What data are there to tell us how the processes are operating? Without reliable data to inform decisions about improving election processes, all decision making will be political or economic. There is no reason to assume that the solution that is easiest to support or costs the least will adequately improve the process.

Reducing errors requires a systematic approach to problem solving. While this sort of thinking has been embraced by successful private and public sector organizations for many years now, it has not become a part of the public policy dialogue and has certainly not been institutionalized as a part of any national debate.

It is important to determine the essential characteristics of an election system and voting process that can accurately measure the intent of all the people. At a minimum, it must be easy for people to register their intent and for that intent to be accurately recorded. Election results must also be verifiable. When the people are evenly divided, the purpose of the process must focus even more narrowly on accurately determining the winner of an election.

Ballots

The design of the ballot can contribute to voting errors. Ballots should be designed to make it easy for people to select their choices accurately and quickly. Of the most common voting methods used in the United States (for example, prescored punch card, optical scan, paper ballot, lever machine and electronic), the punch card method has emerged as the most problematic.

First, the use of prescored punch cards requires that people punch the cards in a way that meets machine specifications, which has nothing to do with the reason people punch the cards: to cast their votes. The stylus used to punch out the chad (the little piece that gets punched out of the ballot card) could be inadequate in terms of shape or sharpness.

There is also a certain rate of error related to the manufacture of the cards. For example, the die that prescores the cards wears down over time. As the die wears, it leaves thicker connections, or paper tabs, holding the chad to the rest of the card. The cards can be too thin or too thick, causing an extra chad to fall out or a hanging chad to be forced back into the ballot card, misstating the voter's choices to the computer.

Additionally, if the cards are not stored in the proper environmental conditions, they could become wet or be exposed to too much humidity and therefore not be counted accurately by the machines.

Then there is the issue of the infamous butterfly ballots used in Palm Beach County, which increased the potential for voting errors. Many voters complained that the ballots were confusing, causing them to punch their selection in the wrong place or vote for more than one candidate.

Statistical principles overlooked

In the postelection media frenzy, some statistical principles seem to have been missed. The accuracy of the final vote tally depends on our ability to precisely account for votes through the entire process. Errors can occur at several places in the process:

  • Was the voter able to, and did he or she actually cast a vote?
  • Was the voter able to vote for the candidate of choice?
  • Was the ballot counted?
  • Was the ballot counted correctly?
  • If there was a manual recount, was the voter's intent recorded correctly?

If the answer for any particular vote was no at any place in the process, then one or more errors were introduced into the process.

If at any of these points the error was not detected and corrected, then the overall vote count would be inaccurate.

For example, in a hypothetical election, say 1,000,000 ballots were cast. If one out of every 1,000 people cast their ballots incorrectly (that is, somehow selected a candidate they did not intend to choose), 1,000 ballots in the final vote count would not reflect voter intent.

If this also happened in the machine counting and manual counting process steps, the final vote tally could be even more inaccurate. If the official vote differential was 500 votes, we would have no way to determine whether the winning candidate was actually the candidate of choice for a larger number of voters than was the losing candidate. Once errors occur and go uncorrected, they move from one process step to the next and can accumulate to negatively impact the overall process accuracy.
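The accumulation can be sketched numerically. This minimal sketch assumes, hypothetically, that each of three steps (casting, machine counting and manual recounting) independently mishandles one ballot in 1,000 and that no later step catches an earlier error; it computes the expected number of ballots that come through at least one step wrong.

```python
BALLOTS = 1_000_000
MARGIN = 500
# Hypothetical, independent per-step error rates (one in 1,000 each).
STEP_ERROR_RATES = {
    "casting": 0.001,
    "machine count": 0.001,
    "manual recount": 0.001,
}

p_all_correct = 1.0
for step, rate in STEP_ERROR_RATES.items():
    p_all_correct *= 1.0 - rate  # the ballot must also survive this step
    expected_bad = BALLOTS * (1.0 - p_all_correct)
    print(f"after {step}: ~{expected_bad:,.0f} ballots with at least one error")

print("exceeds the margin" if expected_bad > MARGIN else "within the margin")
```

Under these assumptions, roughly 3,000 ballots end up in error against a 500-vote margin. Detecting and correcting errors at each step is what breaks this compounding.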

Lack of standards

Because there is no national standard procedure, the likelihood of errors varies from jurisdiction to jurisdiction. Due to a lack of standardization and data collection under realistic circumstances, there is little reliable data available to predict process error rates nationally or locally.

All voters should have the same opportunity to have their votes recorded accurately, regardless of local resources and political barriers. National standards should therefore be in place for the next presidential election in 2004 and should cover ballot design, performance standards for error rates, automatic recounts, manual recounts and voter turnout.

Defining quality in elections

A good first step in examining the process of national elections would be to convene a nonpartisan, quality-oriented commission. While political perspectives are a necessary and vital part of our system of governance, this commission should base its dialogue and recommendations on sound statistical and technological principles to ensure that all Americans have an equal opportunity to cast votes and have those votes counted accurately. (The study should also include the issues of increasing voter participation and ensuring equal protection with respect to minority populations, income levels and location.)

The first step in this process must be defining quality in a national election, and this definition is a political question. The only answer to this question so far has been the one implied by the U.S. Supreme Court: that all voters must have an equal opportunity (or equal protection) to cast a vote and have that vote counted.

From a citizen's perspective, on election night the definition could be an accurate and fast determination of the winner. As time moves on in a close election, quality can become the most accurate count possible. If the system lacks the precision to determine the actual vote count, quality might mean the ability of the system to select a winner who will be viewed as legitimate.

These definitions of quality can be affected by the public's expectation of participation in the process. Just as Thomas Jefferson and Alexander Hamilton argued more than 200 years ago about the role of citizens in self-governance, it is possible that our definition of election quality is changing in an age of information to one that values individual participation over representation. If that is true, then the system must support equal opportunity for all citizens to cast votes and much more accuracy in counting votes than is currently possible.

But a long-term decline in citizen participation, as evidenced by U.S. voter turnout rates, raises questions about that expectation. In spite of easier voter registration, 63% of those eligible voted in 1952, while only 50% did so in 2000.

Some solutions

If we hope to reduce the variation in the total national election process, standardization is an obvious improvement from a professional quality perspective.

Unfortunately, a number of serious barriers to standardization must be overcome. Costs for replacing obsolete technology are carried by local governments that often do not have resources adequate to the task. If faced with a choice of repairing a school, adding police officers or investing in new voting technology, the school and the police will win more often than not.

Following, however, are some general recommendations for improving electoral processes in the United States.

Ballot design. Many ballot formats can contribute to voting errors. The ballots should be standardized and designed to make it easy for people to select their choices accurately and quickly.

Punch cards are inherently prone to errors. The ballots used with optical scanning technology are not only easier to mark but are easier and more accurate in the counting process. Perhaps the most visible improvement from this technology would be in manual vote recounting. When voters circle a box rather than filling it in, the intent of the voter is obvious.

Emerging electronic technologies should also be explored. While some of these technologies can reduce error rates, they may be problematic in terms of providing an audit trail. Furthermore, if voting by mail increases turnout, then electronic voting that excludes the possibility of voting by mail would not be advantageous.

Ballots like the Palm Beach County butterfly ballot could be improved by applying the concepts of poka-yoke (mistake proofing) and signal detection theory to design experiments to identify processes that accommodate voters of all ages and abilities. An experiment does not need to reproduce the entire ballot or all voting conditions. It simply needs to test the improvement hypothesis.
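One common way to test such an improvement hypothesis (chosen here for illustration; the article does not prescribe a particular test) is a one-sided two-proportion z-test comparing the error rate of a current ballot design with that of a redesigned one in a mock-election experiment. The counts below are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math

def two_proportion_z_test(errors_a, n_a, errors_b, n_b):
    """Test whether design B's error rate is lower than design A's.

    Uses the pooled-proportion normal approximation and returns
    (z, one-sided p-value).
    """
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under H0
    return z, p_value

# Hypothetical mock election: voters who marked an unintended choice.
z, p = two_proportion_z_test(errors_a=24, n_a=800, errors_b=8, n_b=800)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

A small p-value here would support the hypothesis that the redesigned ballot reduces marking errors, without reproducing the whole ballot or real voting conditions.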

Performance standards for error rates. Standards should ensure the same likelihood for correct voting and vote counting regardless of jurisdiction or local ability to pay for adequate technology. Performance standards should also demand improvement over time, not simply represent a snapshot of what is possible today.

Automatic recounts. If there is a predictable number of errors based on the technology, the ballots used and all other contributing factors (say, 1% of all votes cast and counted), then the legislature should set the automatic recount threshold at no less than that predictable level. Further, the recount method must have a smaller predictable error rate than the initial counting method.
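The recount rule can be written as a pair of simple decision functions. This is a minimal sketch; the function names and the 40,000-vote margin are hypothetical, the 0.5% and 1% figures echo the article's example, and in practice the predictable error rate would have to come from measured data for each technology and jurisdiction.

```python
def automatic_recount_required(margin_votes, total_votes, predictable_error_rate):
    """Trigger a recount when the margin is within the predictable error level."""
    return margin_votes / total_votes <= predictable_error_rate

def recount_method_acceptable(initial_error_rate, recount_error_rate):
    """The recount method must be more precise than the initial count."""
    return recount_error_rate < initial_error_rate

margin, total = 40_000, 5_800_000  # hypothetical margin of about 0.69%
print(automatic_recount_required(margin, total, 0.005))  # 0.5% statutory trigger
print(automatic_recount_required(margin, total, 0.01))   # trigger at the 1% error level
print(recount_method_acceptable(initial_error_rate=0.01, recount_error_rate=0.002))
```

A margin of about 0.69% escapes a 0.5% statutory trigger even though it lies within a 1% predictable error level, which is the mismatch the recommendation is meant to remove.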

Manual recounts. When machine recounts are still inadequate (for example, the counted vote differential is still smaller than the predictable number of counting errors), there must be clear standards for manually recounting ballots.

The example of the punch card ballots became front-page news, with elections staff holding ballots up to the light to determine the intent of individual voters. Discussion about how to interpret chads hanging by one or two corners or a dimpled chad should have occurred prior to the election as a part of legislative or administrative rule development. In other words, quality should have been designed into the process.

Case in point: Broward County counted dimpled chads as votes, while Palm Beach County did not. Which was truly an accurate count of voter intent? Statewide guidelines in California, for example, state that if a ballot has not been fully punched it can be counted only if the paper chad hangs by a single corner. In the absence of clear and consistent standards, a high rate of error is unavoidable.

Voter turnout. Some experts believe that increased ease of registration and loosening of late registration requirements lead to higher voter turnout. In the State of Oregon it is believed that vote by mail increased turnout by at least 5%, resulting in an 80% turnout in 2000. The tools of quality can be employed jointly by government agencies and their communities to determine the causes for low turnout and to develop strategies to reverse the trend.

For example, if it were determined that computer programmers in Silicon Valley were not voting, local elections officials, elected officials and people from the affected population (Silicon Valley programmers) could form a problem-solving team. This team could then use a quality tool such as cause-and-effect analysis to determine the root causes of the problem (low voter turnout) as a first step to solving it (increasing voter turnout).

Develop the will to improve and begin now

The recent spectacle created by flaws in the national election system and process generated the attention necessary to begin change. If Philip Crosby is right that "Quality is everybody's job," then it's very important to ask what we can do now to reduce or eliminate opportunities for vote casting and counting errors.

Close elections make it absolutely clear that every vote should be counted. Therefore, it behooves us to take the time to make the voting process as mistake proof as possible. In an era when people demand Six Sigma quality in manufactured products, it is not enough that most people had no problem voting when we can make it possible for all people to cast their ballots with no problem.
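For scale, a rough sketch of what Six Sigma quality would mean for an election the size of Florida's. It assumes the conventional Six Sigma figure of about 3.4 defects per million opportunities, which comes from the one-sided normal tail beyond 4.5 standard deviations (six sigma less the customary 1.5-sigma shift), and it treats each ballot as a single opportunity for error; both are simplifying assumptions.

```python
import math

def defects_per_million(sigma_level, shift=1.5):
    """One-sided normal tail beyond (sigma_level - shift), per million opportunities."""
    z = sigma_level - shift
    return 1e6 * 0.5 * math.erfc(z / math.sqrt(2))

BALLOTS_CAST = 5_800_000
dpmo = defects_per_million(6)
expected_errors = BALLOTS_CAST * dpmo / 1e6
print(f"{dpmo:.1f} defects per million -> ~{expected_errors:.0f} ballot errors")
```

At that level, a 5.8-million-ballot election would carry roughly 20 errors, well below a 500-vote margin.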

As our population continues to get older and more diverse, it is critical that we make every effort to accommodate difficulties resulting from increasing age and differing expectations rooted in cultural diversity.

All technologies have numerous inherent opportunities for error, but some have many more than others. If the process or standards applied to a manual recount of punch card ballots are not clear and consistent, the accuracy will be questionable. The presidential election may have been the knockout blow for punch card voting technology. It's noteworthy that in a report written in 1988, Roy G. Saltman of the National Institute of Standards and Technology recommended that this technology be retired from the voting process.1

While it is not my place or intent in this article to identify the many specific improvements that can or should take place, it is clear that the application of quality concepts, methods and tools will lead to improvements in all aspects of the elections process. It is also essential that the process be examined holistically or systemically.

If the purpose of voting and tabulating votes is to accurately reflect the will of the people, then we must reduce errors from each step of the process rather than merely focus on ballots or vote counting technology. If the voice of the people is to be heard accurately, errors must be eliminated from all aspects of voting, and the best available path to removing those errors is by designing quality into the system.


Acknowledgments

The author thanks Peter Sandrock, former district attorney, Benton County, OR; Doug Lewis, director, Voting Systems Secretariat; Gordon Booth, ASQ Statistics Division; and Duane Allen, ASQ Quality Measurement Division, for their assistance with this article.

For more on this subject, go to ASQ's website and the latest issue of Quality Progress.

Reference

1. Roy G. Saltman, "CFP '93: Assuring Accuracy, Integrity and Security in National Elections: The Role of the U.S. Congress," National Institute of Standards and Technology (NIST), Feb. 12, 1993.

Author

Howard R. Schussler is chair of ASQ's Government Division and organizational development and service improvement manager for the city of Portland, OR. He earned a bachelor's degree in business from Mount Olive College, Mount Olive, NC. Schussler is a Senior Member of ASQ and an ASQ certified quality manager.