Archive
The issue of testing for academic proficiency has been growing in recent
years, and now the new federal law will require additional testing. Two recent
New York Times articles raised issues that prompted two of the Society's
Division Chairs to respond to . . .
About the New York Times Articles
The articles "Right Answer, Wrong Score: Test Flaws Take Toll" and
"When a Test Fails the Schools, Careers and Reputations Suffer" were
written by Diana B. Henriques and Jacques Steinberg and were published May 20
and May 21, 2001.
In recent years, educational testing companies have experienced serious
breakdowns in quality control. Here is some of what the articles uncovered:
- Testing industry errors in the last three years have affected millions of
students who took standardized proficiency tests in at least 20 states.
- The company that scored tests in Minnesota gave 47,000 students lower
scores than they deserved. In the last three years, errors by the same company
produced incorrect scores for students in Arizona, Michigan, and Washington,
and deadlines for delivering test results in Florida and California were
missed.
- Nearly 9,000 students in New York City were mistakenly assigned to summer
school in 1999 because of an error by another big company.
- A mistake in 1997 by a smaller company denied $2 million in achievement
awards to deserving schools in Kentucky.
- In 1998, nearly 700 of California's 8,500 schools got inaccurate test
results, and more than 750,000 students were not included in the statewide
analysis of the test results.
- Scoring of writing samples is handled by temporary workers earning as
little as $9 an hour, some of whom said that they were pressed to score essays
without adequate training and that they saw tests scored in an arbitrary and
inconsistent manner.
- Many industry executives attribute these errors to growing pains. The
industry says its error rate is infinitesimal on the millions of tests scored
annually.
- Interviews with people involved in the testing process suggest the industry
cannot guarantee the error-free, high-speed testing that parents, educators and
politicians expect.
- Unable to uncover serious testing errors on their own, school districts
must rely on the testing companies to do so voluntarily. Lacking federal
oversight, the companies decide what they will disclose and when.
- President Bush has proposed a 50% increase in the workload of the tiny
testing industry. No one has addressed whether the industry can handle the
additional business.
- Testing specialists say educators and politicians share the blame for the
rash of testing errors because they are asking too much of the industry.
- The testing industry's code of conduct specifies that the tests not be the
basis for life-altering decisions about students. But many states use them to
determine whether students can be promoted or granted a diploma, to evaluate
teachers and principals, and to decide how much tax money school districts
receive. Schools' performance on tests can even affect property values in
surrounding neighborhoods.
ASQ Offers Advice to Congressional Health Policy Staff on Prescription Drug
Errors
Although there exists widespread agreement on the causes of prescription
drug errors, there is little consensus on methods for addressing those
problems. A new paper presented to Congressional health policy staff by the
American Society for Quality advocates the wider adoption of proven quality
methods in efforts to reduce medication errors.
As part of ASQ's ongoing effort to cultivate contacts with national
policymakers in Washington, officials from ASQ met earlier this year with the
health policy advisor to Congressman Michael Bilirakis of Florida, who asked
for advice from ASQ on ways in which quality methods could be used to reduce
prescription drug errors.
A position paper titled "Using Quality Methods to Reduce Prescription
Drug Errors" is ASQ's response to that request. It was prepared by
ASQ staff with input from members of the Health Care Division, the Food, Drug
& Cosmetic Division and others.
This new paper draws upon ideas presented in the position paper titled
"Quality and Quality Improvement in Health Care Services," which was
prepared earlier this year by the Health Care Quality Special Interest Group of
the ASQ Health Care Division and the Society for Health Care Epidemiology of
America.
Review the complete prescriptions paper. You will need Adobe
Acrobat Reader to view and print the complete paper. Adobe Acrobat Reader is
available as a free download at
http://www.adobe.com/products/acrobat/readstep2.html.
Industry Needs Measurement Quality Assurance
by Duane Allen, Measurement Quality Division Chair
Recent articles in the New York Times reported on problems resulting from
errors in reported test results from academic testing programs. Many of the
errors could have been identified or discounted sooner if the educational
testing industry used measurement quality assurance. The industrial and
scientific communities have developed measurement quality assurance techniques
to reduce the chance of mistakes in making judgments based on measurement
results.
Several factors contributed to the errors in scoring. These included
mistakes in grading keys, subjective interpretation of test answers by scorers,
and incorrect adjustment factors to equate test results.
The educational testing industry maintains there is always a risk of some
errors in reported test results. But there are techniques for minimizing the
occurrence of such errors. More important, the established discipline of
measurement science provides ways of identifying and reporting potential errors
associated with a measurement system. Requiring a report of potential errors
and their magnitude would temper decisions made based on test results as well
as encourage the educational testing industry to improve the reliability of
their products.
An important concept developed in the test community is the idea of
measurement uncertainty, a numerical expression of reasonable doubt that is
associated with measurement. For example, a bathroom scale at home rarely shows
one's weight to be the same as on the scale at a doctor's office. Is the weight
reading in the doctor's office scale more accurate? Without more information
about the measurement systems in the doctor's office and at home, one cannot
know. Both scales may be close enough to the true weight for us to judge our
health. If the uncertainty of one's weight were important, then we might talk
about an overweight person's weight as lying between lower and upper values;
that is, as being between 236 and 243 pounds instead of exactly 240 pounds. For
other health measurements, such as with blood pressure, knowing the uncertainty
may be important.
The expression of measurement uncertainty is not a randomly drawn estimate.
It is based on analysis of the measurement process. The uncertainty allows for
factors that affect the reported measurement value. Using the same example,
one's weight in the doctor's office is often greater because one wears more
clothes than at home.
An important requirement in the International Organization for
Standardization's "Guide to the Expression of Uncertainty in
Measurement" (GUM) is that any report of measurement uncertainty must
include the method used to calculate that uncertainty. The GUM requires
identification of potential sources of errors in measurements and provides
methods for combining numerical factors in an expression of uncertainty. By
referring to the reported uncertainty and methods used to calculate it,
decision makers can know how much to depend on reported measurement results.
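As a minimal sketch of the kind of combination the GUM describes, the
following Python snippet combines several independent uncertainty components
by root-sum-of-squares and reports an expanded interval around a reading. The
component names and values are hypothetical illustrations, not figures for any
real scale; the snippet also assumes the components are uncorrelated and uses
a coverage factor of k = 2, a common convention.

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-of-squares of independent standard uncertainties."""
    return math.sqrt(sum(u ** 2 for u in components.values()))

def expanded_interval(value, components, k=2.0):
    """Return (low, high) = value -/+ k * combined standard uncertainty."""
    u_c = combined_standard_uncertainty(components)
    return value - k * u_c, value + k * u_c

# Hypothetical standard uncertainties, in pounds, for one weight reading.
weight_components = {
    "scale_resolution": 0.5,
    "clothing": 1.0,
    "calibration": 1.0,
}

low, high = expanded_interval(240.0, weight_components)
print(f"Reported weight: 240 lb, plausible range {low:.1f} to {high:.1f} lb")
```

Listing the components alongside the result is what lets a decision maker see
where the doubt comes from, not just how large it is.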
Given that there are methods for gauging the accuracy of a measurement
system, the educational testing industry should be able to provide objective
evidence of the accuracy of its measurement processes. Until the industry can
prove the accuracy and reliability of its measurement system, if we rely only
on test results we risk reaching the wrong conclusions about the conditions of
our schools and our students' abilities. If, instead, the educational testing
industry exercises measurement quality assurance and provides objective
evidence and quantitative support for the accuracy of student testing, we can
more reliably use test results as a potential basis for determining the course
of improvement in the educational system.
Testing Raises Variety of Concerns
by Robert J. Kattman, Education Division Chair
Quality professionals understand that "If you can't measure it, you
can't improve it." Forty-nine states have developed academic standards and
required that students be tested to determine their level of proficiency. Many
states have mandated testing to determine grade-level advancement and
graduation. Because few states have the ability, time, or finances to develop a
testing program, they have turned to large corporations that specialize in this
area. But the tests and their scoring and use have spawned debate.
The major educational testing companies have had quality control problems
that resulted in scoring errors caused by inadequate software design and/or
human error, with negative effects on students. When test questions are changed
each year and companies constantly try to obtain different analyses from the
data, software must be rewritten and errors are introduced. Because security is
a major issue, the number of individuals who review the tests and the software
is limited. Tight deadlines add additional pressure. These problems must be
addressed through quality control measures. It's a matter of improving
processes to get it right.
Another concern is whether students should be retained or prevented from
graduating on the basis of one exam. Certainly test results should be
considered but they should not be the only determinant. Most tests are norm
referenced: The student's score is compared to the norm of the group through
which the test was developed. Results of a norm-referenced test tell where the
student placed in regard to the norm group, not what the student knows and is
capable of. Criterion-referenced tests designed to determine what a student
really knows about a subject are what's needed.
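The difference between the two kinds of test can be sketched in a few lines of
Python. This is only an illustration: the norm group scores, the student's
score, and the proficiency cutoff are all hypothetical, and real testing
programs use far more elaborate scaling.

```python
def percentile_rank(score, norm_group_scores):
    """Norm-referenced view: percent of the norm group scoring below `score`."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)

def meets_criterion(score, cutoff):
    """Criterion-referenced view: did the student reach the fixed standard?"""
    return score >= cutoff

norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]  # hypothetical scores
student_score = 62                                     # hypothetical
proficiency_cutoff = 70                                # hypothetical standard

print("Percentile rank:", percentile_rank(student_score, norm_group))
print("Meets standard:", meets_criterion(student_score, proficiency_cutoff))
```

The same score yields two different statements: where the student stands
relative to other test takers, and whether the student has met a defined
standard.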
Finally, we need to measure progress as well as proficiency. Most tests are
designed to provide a reading of the students' proficiency, like a defect survey
at the end of an assembly line. While this may be appropriate for graduation,
within grades it is not. Even automobile manufacturers want more than just a
count of the number of defects coming off the assembly line; they want
information that will help them improve their product. If real improvement is
to be made, we need information on progress over time as well as proficiency
levels.
When it comes to measuring schools, the tests used today tell us little that
we haven't known for years. Students' proficiency levels correlate with the
socio-economic and educational status of their parents. Teachers can do little
to control the overall proficiency of student bodies, but they can control the
progress made by students in their classes. Recent research confirms that the
single most important factor in the progress of a student over a year's time is
the quality of the teacher.
There are few valid and reliable measures of student progress. Using tests
like the Tennessee Value-Added Assessment System (the only statewide testing
program to measure progress of which I'm aware) would allow schools to determine
student progress and lack of progress, take corrective action during the school
year or summer, identify teachers who are obtaining the best results, and
provide training to teachers to improve results.
The political process has created a situation in which students undergo
high-stakes testing to determine whether or not they will be promoted or
graduate. For the most part, the tests are inappropriate to the task at hand.
They do not assist in improvement efforts, and may produce results at odds with
other work a student has done. Worse yet, test results may not even be scored
correctly.
Parents, students, and educators are justifiably angry. If the situation is
not remedied, the backlash will put the standards movement in jeopardy. If the
concept of standards is thrown out, the agenda for educational improvement will
be set back many years.
4/09/01
Can Quality Concepts and Tools Fix the U.S. Election
Process?
Commission Should be Convened; Base Recommendations on Statistics and
Technology
by Howard R. Schussler
Just five months ago, the citizens of the United States set out to elect a
new president. Yet a full month after the Nov. 7 election day, the name of the
winner was still not known due to problems with the election process.
Even now, with the new Bush administration fully in operation, the true
impact of election process errors and the importance of continuous improvement
to the process are still becoming apparent to Americans, perhaps for the first
time.
The concepts, methods and tools of quality can provide the framework for
understanding the systemic causes of errors, variation and ultimately
inaccuracy, and can provide a basis for improvement to a system that includes
not just voting, but also judicial and legislative processes related to
elections.
A difference of roughly 500 votes out of 5.8 million cast in Florida,
0.009%, is smaller than the predictable number of errors for miscounted votes,
miscast votes, incorrectly rejected ballots, and other vote casting and vote
counting errors caused by the systems and processes (people + equipment +
methods + materials + environment). While no single political party would be
favored over time, the number of errors that go unaccounted for would almost
certainly impact more than 0.009% of the total votes and could swing the
outcome of a single election one way or another.
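The arithmetic behind this comparison can be checked with a short Python
sketch. The margin and vote total are the approximate figures given above; the
error rate of one vote in 1,000 is purely an assumption chosen for
illustration.

```python
def margin_share(margin_votes, total_votes):
    """Margin of victory as a fraction of all votes cast."""
    return margin_votes / total_votes

def margin_within_error(margin_votes, total_votes, error_rate):
    """True when the expected number of erroneous votes is at least the margin."""
    expected_errors = total_votes * error_rate
    return expected_errors >= margin_votes

florida_margin = 500         # approximate figure from the text
florida_total = 5_800_000    # approximate figure from the text
assumed_error_rate = 0.001   # hypothetical: 1 error per 1,000 votes

share = margin_share(florida_margin, florida_total)
print(f"Margin: {share:.3%} of votes cast")
print("Margin within expected error:",
      margin_within_error(florida_margin, florida_total, assumed_error_rate))
```

Under that assumed rate, the expected number of erroneous votes is more than
ten times the margin, which is the situation the paragraph above describes.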
According to numerous reports and sources, the predictable number of errors
in counting prescored punch cards like those used in Palm Beach County, FL, can
be as great as 5%. What this means is that the methods used to accurately
account for each person's intended vote through the two essential technical
processes, vote casting and vote counting, may lack the precision needed to
facilitate clear, unambiguous determination of a winner in a very close
election. The vote casting and counting methods, equipment, materials, people
and environment cannot consistently account for each vote, thereby making the
final vote tally questionable.
Flawed process
The 2000 presidential election proved the fundamental importance of process
design and improvement. In the minds of many people both in the United States
and worldwide, a flawed process (one that probably was intended to be
fair) damaged the credibility and legitimacy of nearly every major public
institution that touched it: legislative, executive and judicial branches of
government at both federal and state of Florida levels.
All measuring systems have uncertainties and inherent levels of error. The
unit of measure in this election was votes for president of the United States,
with the outcome measure being the will of the people. Errors can be introduced
into the measurement process in any of a number of process steps, such as the
casting of ballots, machine counting of ballots and hand recounting of ballots.
This example oversimplifies the process, but it provides a sound basis
for discussion.
The process is flawed at multiple levels, including ballot design, failure
of the voters or voting machines to clearly record the vote, unsatisfactory
counting and recounting mechanisms, and inconsistent standards and procedures.
What sense does it make to have a law requiring a recount if the vote
differential between the top two candidates is 0.5% or less, but the
predictable level of vote casting and counting errors is 1% or more?
While media attention focused on Florida, the problem is far more
widespread. There are between 4,000 and 5,000 jurisdictions in the United
States, often with more than 100 precincts each and little standardization
among them. While rigid processes are in place in most jurisdictions to ensure
the quality of the system, factors such as the prescored punch cards, like
those used in Palm Beach County and my own Lane County, OR, can facilitate
errors such as incorrect punching. Rough handling of the cards can also cause
errors.
The current system
As Deming's system of profound knowledge tells us, we must understand the
system. The current system achieved exactly what it was designed to achieve; it
selected the next president of the United States. The system was not
necessarily designed to determine the will of the people as measured by a
precise system of counting votes.
What do we know about current processes? What data are there to tell us how
the processes are operating? Without reliable data to inform decision making
related to improvement of elections processes, all decision making will be
political or economic. It is not logical to think the solution that is easily
supportable or costs the least will adequately improve the process.
Reducing errors requires a systematic approach to problem solving. While
this sort of thinking has been embraced by successful private and public sector
organizations for many years now, it has not become a part of the public policy
dialogue and has certainly not been institutionalized as a part of any national
debate.
It is important to determine what the essential characteristics are for an
election system and voting process that can accurately measure the intent of
all the people. As a minimum, it must be easy for people to register their
intent and for that intent to be accurately recorded. Election results must
also be verifiable. When the people are evenly divided, the purpose of the
process must focus even more narrowly on accurately determining the winner of
an election.
Ballots
The design of the ballot can contribute to voting errors. Ballots should be
designed to make it easy for people to select their choices accurately and
quickly. Of the most common voting methods used in the United States (for
example, prescored punch card, optical scan, paper ballot, lever machine and
electronic), the punch card method has risen to the surface as the most
problematic.
First, the use of prescored punch cards requires that people punch the cards
in a way that meets machine specifications, which is not related to the reason
people punch the cards: to cast their vote. The stylus used to punch out the
chad (the little piece that gets punched out of the ballot card) could be
inadequate in terms of shape or sharpness.
There is also a certain rate of error related to the manufacture of the
cards. For example, the die that prescores the cards wears down over time. As
the dies wear, they leave thicker connections or paper tabs holding the chad to
the rest of the card. The cards can be too thin or too thick, causing an extra
chad to fall out or causing a hanging chad to be forced back into the ballot
card, misstating the voter's choices to the computer.
Additionally, if the cards are not stored in the proper environmental
conditions, they could become wet or be exposed to too much humidity and
therefore not be counted accurately by the machines.
Then there is the issue of the infamous butterfly ballots used in Palm Beach
County, which increased the potential for voting errors. Many voters complained
that the ballots were confusing, causing them to punch their selection in the
wrong place or vote for more than one candidate.
Statistical principles overlooked
In the postelection media frenzy some statistical principles seem to have
been missed. The accuracy of the final vote tally depends on our ability to
precisely account for votes through the entire process. Errors can occur at
several places in the process:
- Was the voter able to cast a vote, and did he or she actually do so?
- Was the voter able to vote for the candidate of choice?
- Was the ballot counted?
- Was the ballot counted correctly?
- If there was a manual recount, was the voter's intent recorded correctly?
If the answer for any particular vote was no at any place in the process,
then one or more errors were introduced into the process.
If at any of these points the error was not detected and corrected, then the
overall vote count would be inaccurate.
For example, in a hypothetical election, say 1,000,000 ballots were cast. If
one out of every 1,000 people cast their ballots incorrectly - that is, somehow
selected a candidate they did not intend to choose - the final vote count would
be off by 1,000 from voter intent.
If this also happened in the machine counting and manual counting process
steps, the final vote tally could be even more inaccurate. If the official vote
differential was 500 votes, we would have no way to determine whether the
winning candidate was actually the candidate of choice for a larger number of
voters than was the losing candidate. Once errors occur and go uncorrected,
they move from one process step to the next and can accumulate to negatively
impact the overall process accuracy.
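The hypothetical above can be worked through in Python. This is a rough
sketch: only the vote casting rate of one in 1,000 comes from the example; the
rates for the other two steps are assumed to be the same for illustration, and
the sketch takes the worst case in which no errors cancel or are corrected.

```python
def expected_errors_by_step(ballots, step_error_rates):
    """Expected erroneous ballots introduced at each process step.

    Simplifying assumption: each step's rate applies to the full ballot
    total, and errors neither cancel one another nor get corrected.
    """
    return {step: round(ballots * rate) for step, rate in step_error_rates.items()}

ballots_cast = 1_000_000
rates = {
    "vote casting": 1 / 1000,      # from the hypothetical in the text
    "machine counting": 1 / 1000,  # assumed for illustration
    "manual recount": 1 / 1000,    # assumed for illustration
}

errors = expected_errors_by_step(ballots_cast, rates)
total_errors = sum(errors.values())
official_margin = 500

print(errors)
print("Total expected errors:", total_errors)
print("Margin exceeds expected errors:", official_margin > total_errors)
```

With a 500-vote margin and several thousand ballots potentially in error, the
count alone cannot establish which candidate more voters intended to choose.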
Lack of standards
Because there is no national standard procedure, the likelihood of errors
varies from jurisdiction to jurisdiction. Due to a lack of standardization and
data collection under realistic circumstances, there is little reliable data
available to predict process error rates nationally or locally.
All voters should have the same opportunity to have their votes recorded
accurately, regardless of local resources and political barriers. National
standards should therefore be in place for the next presidential election in
2004 and should cover ballot design, performance standards for error rates,
automatic recounts, manual recounts and voter turnout.
Defining quality in elections
A good first step in examining the process of national elections would be to
convene a nonpartisan, quality oriented commission. While political
perspectives are a necessary and vital part of our system of governance, this
commission should base its dialogue and recommendations on sound statistical
and technological principles to ensure that all Americans have an equal
opportunity to cast votes and have those votes counted accurately. (The study
should also include the issues of increasing voter participation and ensuring
equal protection with respect to minority populations, income levels and
location.)
The first step in this process must be defining quality in a national
election, and this definition is a political question. The only answer to this
question so far has been the one implied by the U.S. Supreme Court: that all
voters must have an equal opportunity (or equal protection) to cast a vote and
have that vote counted.
From a citizen's perspective, on election night the definition could be an
accurate and fast determination of the winner. As time moves on in a close
election, quality can become the most accurate count possible. If the system
lacks the precision to determine the actual vote count, quality might mean the
ability of the system to select a winner who will be viewed as legitimate.
These definitions of quality can be affected by the public's expectation of
participation in the process. Just as Thomas Jefferson and Alexander Hamilton
argued more than 200 years ago about the role of citizens in self-governance,
it is possible that our definition of election quality is changing in an age of
information to one that values individual participation over representation. If
that is true, then the system must support equal opportunity for all citizens
to cast votes and much more accuracy in counting votes than is currently
possible.
But a long-term decline in citizen participation as evidenced by voter
turnout rates in the United States has raised questions. In spite of easier
voter registration, 63% of those eligible voted in 1952, while only 50% did so
in 2000.
Some solutions
If we hope to reduce the variation in the total national election process,
standardization is an obvious improvement from a professional quality
perspective.
Unfortunately, a number of serious barriers to standardization must be
overcome. Costs for replacing obsolete technology are carried by local
governments that often do not have resources adequate to the task. If faced
with a choice of repairing a school, adding police officers or investing in new
voting technology, the school and the police will win more often than not.
Following, however, are some general recommendations for improving electoral
processes in the United States.
Ballot design. Many ballot formats can contribute to voting errors.
The ballots should be standardized and designed to make it easy for people to
select their choices accurately and quickly.
Punch cards are inherently prone to errors. The ballots used with optical
scanning technology are not only easier to mark but are easier and more
accurate in the counting process. Perhaps the most visible improvement from
this technology would be in manual vote recounting. When voters circle a box
rather than filling it in, the intent of the voter is still obvious to a
person reviewing the ballot.
Emerging electronic technologies should also be explored. While some of
these technologies can reduce error rates, they may be problematic in terms of
providing an audit trail. Furthermore, if voting by mail increases turnout,
then electronic voting that excludes the possibility of voting by mail would
not be advantageous.
Ballots like the Palm Beach County butterfly ballot could be improved by
applying the concepts of poka-yoke (mistake proofing) and signal detection
theory to design experiments to identify processes that accommodate voters of
all ages and abilities. An experiment does not need to reproduce the entire
ballot or all voting conditions. It simply needs to test the improvement
hypothesis.
Performance standards for error rates. Standards should ensure the
same likelihood for correct voting and vote counting regardless of jurisdiction
or local ability to pay for adequate technology. Performance standards should
also demand improvement over time, not simply represent a snapshot of what is
possible today.
Automatic recounts. If there is a predictable number of errors based
on technology, the ballots used and all other contributing factors (say 1% of
all votes cast and counted), then the margin that triggers an automatic
recount should be set by the legislature to at least that predictable level.
Further, the recount methods must have a smaller predictable error rate than
the initial counting method.
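The two conditions in this recommendation can be expressed as a simple check.
The Python sketch below is illustrative only; the function name and the rates
passed to it are hypothetical, with the first call echoing the earlier example
of a 0.5% recount trigger paired with a 1% error rate.

```python
def recount_policy_issues(trigger_threshold, initial_error_rate, recount_error_rate):
    """List the ways a recount rule falls short of the two conditions.

    All arguments are fractions of votes cast (0.01 means 1%).
    """
    issues = []
    if trigger_threshold < initial_error_rate:
        issues.append("trigger threshold is below the predictable error rate")
    if recount_error_rate >= initial_error_rate:
        issues.append("recount method is no more accurate than the initial count")
    return issues

# Hypothetical rule: 0.5% trigger, 1% error rate, recount no better than count.
print(recount_policy_issues(0.005, 0.01, 0.01))

# Hypothetical rule that satisfies both conditions.
print(recount_policy_issues(0.01, 0.01, 0.002))
```

An empty list means the rule meets both conditions under the stated rates.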
Manual recounts. When machine recounts are still inadequate (for
example, the counted vote differential is still smaller than the predictable
number of counting errors), there must be clear standards for manually
recounting ballots.
The example of the punch card ballots became front-page news, with
elections staff holding ballots up to the light to determine the intent of
individual voters. Discussion about how to interpret chads hanging by one or
two corners or a dimpled chad should have occurred prior to the election as a
part of legislative or administrative rule development. In other words, quality
should have been designed into the process.
Case in point: Broward County counted dimpled chads as votes, while Palm
Beach County did not. Which was truly an accurate count of voter intent?
Statewide guidelines in California, for example, state that if a ballot has not
been fully punched it can be counted only if the paper chad hangs by a single
corner. In the absence of clear and consistent standards, a high rate of error
is unavoidable.
Voter turnout. Some experts believe that increased ease of
registration and loosening of late registration requirements lead to higher
voter turnout. In the State of Oregon it is believed that vote by mail
increased turnout by at least 5%, resulting in an 80% turnout in 2000. The
tools of quality can be employed jointly by government agencies and their
communities to determine the causes for low turnout and to develop strategies
to reverse the trend.
For example, if it was determined that computer programmers in the Silicon
Valley were not voting, a project team of local elections officials, elected
officials and people from the affected population (Silicon Valley programmers)
could form a problem solving team. This team could then use a quality tool like
cause and effect analysis to determine root causes of the problem (low voter
turnout) as a first step to solving the problem (increasing voter turnout).
Develop the will to improve and begin now
The recent spectacle created by national election system and process flaws
generated the attention necessary to begin change. If it's true, as Philip
Crosby says, that "Quality is everybody's job," then it's very important to
ask what we can do now to reduce or eliminate the opportunities for vote
casting and counting errors.
Close elections make it absolutely clear that every vote should be counted.
Therefore, it behooves us to take the time to make the voting process as
mistake proof as possible. In an era when people demand Six Sigma quality in
manufactured products, it is not enough that most people had no problem voting
when we can make it possible for all people to cast their ballots with no
problem.
As our population continues to get older and more diverse, it is critical
that we make every effort to accommodate difficulties resulting from increasing
age and differing expectations rooted in cultural diversity.
All technologies have numerous inherent opportunities for error, but some
have many more than others. If the process or standards applied to a manual
recount of punch card ballots are not clear and consistent, the accuracy will
be questionable. The presidential election may have been the knockout blow for
punch card voting technology. It's noteworthy that in a report written in 1988,
Roy G. Saltman of the National Institute of Standards and Technology
recommended that this technology be retired from the voting process.1
While it is not my place or intent in this article to identify the many
specific improvements that can or should take place, it is clear that the
application of quality concepts, methods and tools will lead to improvements in
all aspects of the elections process. It is also essential that the process be
examined holistically or systemically.
If the purpose of voting and tabulating votes is to accurately reflect the
will of the people, then we must reduce errors from each step of the process
rather than merely focus on ballots or vote counting technology. If the voice
of the people is to be heard accurately, errors must be eliminated from all
aspects of voting, and the best available path to removing those errors is by
designing quality into the system.
Acknowledgments
The author thanks Peter Sandrock, former district attorney, Benton County,
OR; Doug Lewis, director, Voting Systems Secretariat; Gordon Booth, ASQ
Statistics Division; and Duane Allen, ASQ Measurement Quality Division, for
their assistance with this article.
For more on this subject, go to ASQ's website and the latest issue of
Quality Progress.
Reference
1. Roy G. Saltman, "CFP '93-Assuring Accuracy, Integrity and Security
in National Elections: The Role of the U.S. Congress," National Institute
of Standards and Technology (NIST), Feb. 12, 1993.
Author
Howard R. Schussler is chair of ASQ's Government Division and
organizational development and service improvement manager for the city of
Portland, OR. He earned a bachelor's degree in business from Mount Olive
College, Mount Olive, NC. Schussler is a Senior Member of ASQ and an ASQ
certified quality manager.