Fundamentals of Statistics
- The Five Basic Words of Statistics
- The Branches of Statistics
- Sources of Data
- Sampling Concepts
- Sample Selection Methods
- One-Minute Summary
- Test Yourself
- Answers to Test Yourself Questions
- References
Every day, you encounter numerical information that describes or analyzes some aspect of the world you live in. For example, here are some news items that appeared in the pages of The New York Times during a one-month period:
Between 1969 and 2001, the rate of forearm fractures rose 52% for girls and 32% for boys, with the largest increases among children in early puberty, according to a recent Mayo Clinic study.
Across the New York metropolitan area, the median sales price of a single-family home has risen by 75% since 1998, an increase of more than $140,000.
A study that explored the relationship between the price of a book and the number of copies of a book sold found that raising prices by 1% reduced sales by 4% at BN.com, but reduced sales by only 0.5% at Amazon.com.
Stories such as these would not be possible to understand without statistics, the branch of mathematics that consists of methods of processing and analyzing data to better support rational decision-making processes. Using statistics to better understand the world means more than just producing a new set of numerical information: you must interpret the results by reflecting on their significance and importance to the decision-making process you face. Interpretation also means knowing when to ignore results, either because they are misleading, are produced by incorrect methods, or just restate the obvious, as this news story "reported" by the comedian David Letterman illustrates:
USA Today has come out with a new survey. Apparently, 3 out of every 4 people make up 75% of the population.
As newer technologies allow people to process and analyze ever-increasing amounts of data, statistics plays an increasingly important part in many decision-making processes today. Reading this chapter will help you understand the fundamentals of statistics and introduce you to concepts that are used throughout this book.
1.1 The Five Basic Words of Statistics
The five words population, sample, parameter, statistic (singular), and variable form the basic vocabulary of statistics. You cannot learn much about statistics unless you first learn the meanings of these five words.
Population
CONCEPT All the members of a group about which you want to draw a conclusion.
EXAMPLES All U.S. citizens who are currently registered to vote, all patients treated at a particular hospital last year, the entire daily output of a cereal factory's production line.
Sample
CONCEPT The part of the population selected for analysis.
EXAMPLES The registered voters selected to participate in a recent survey concerning their intention to vote in the next election, the patients selected to fill out a patient-satisfaction questionnaire, 100 boxes of cereal selected from a factory's production line.
Parameter
CONCEPT A numerical measure that describes a characteristic of a population.
EXAMPLES The percentage of all registered voters who intend to vote in the next election, the percentage of all patients who are very satisfied with the care they received, the average weight of all the cereal boxes produced on a factory's production line on a particular day.
Statistic
CONCEPT A numerical measure that describes a characteristic of a sample.
EXAMPLES The percentage in a sample of registered voters who intend to vote in the next election, the percentage in a sample of patients who are very satisfied with the care they received, the average weight of a sample of cereal boxes produced on a factory's production line on a particular day.
INTERPRETATION Calculating statistics for a sample is the most common activity, because collecting population data is impractical for most actual decision-making situations.
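The difference between a parameter and a statistic can be illustrated with a minimal Python sketch built on the cereal-box example. The weights here are simulated, and the target weight of 368 grams, the spread of 15 grams, and the population size of 10,000 boxes are assumptions made only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: the weight in grams of every cereal box
# produced on one day (simulated values, not real data).
population = [random.gauss(368, 15) for _ in range(10_000)]

# Parameter: a numerical measure computed from the entire population.
parameter_mean = statistics.mean(population)

# Sample: 100 boxes selected from the population without replacement.
sample = random.sample(population, 100)

# Statistic: the same measure computed from the sample only.
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.1f} g")
print(f"sample mean (statistic):     {statistic_mean:.1f} g")
```

The two means will usually be close but not identical. In practice only the sample mean is available, which is why it is used to draw a conclusion about the population.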
Variable
CONCEPT A characteristic of an item or an individual that will be analyzed using statistics.
EXAMPLES Gender, the household income of the citizens who voted in the last presidential election, the publishing category (hardcover, trade paperback, mass-market paperback, textbook) of a book, the number of varieties of a brand of cereal.
INTERPRETATION All the variables taken together form the data of an analysis. Although you may have heard people saying that they are analyzing their data, they are, more precisely, analyzing their variables.
You should distinguish between a variable, such as gender, and its value for an individual, such as male. An observation is all the values for an individual item in the sample. For example, a survey might contain two variables, gender and age. The first observation might be male, 40. The second observation might be female, 45. The third observation might be female, 55. A variable is sometimes known as a column of data because of the convention of entering each observation as a unique row in a table of data. (Likewise, you may hear some refer to an observation as a row of data.)
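The row-and-column convention can be sketched in a few lines of Python using the three survey observations given above. Storing each observation as a dictionary is just one possible representation, chosen here for readability.

```python
# Each observation is one row; each variable (gender, age) is one column.
observations = [
    {"gender": "male", "age": 40},
    {"gender": "female", "age": 45},
    {"gender": "female", "age": 55},
]

# A value: the gender recorded for the first observation.
first_gender = observations[0]["gender"]

# A variable viewed as a column: the age value from every observation.
age_column = [row["age"] for row in observations]

print(first_gender)  # male
print(age_column)    # [40, 45, 55]
```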
Variables can be divided into the following types:
| | Categorical Variables | Numerical Variables |
|---|---|---|
| Concept | The values of these variables are selected from an established list of categories. | The values of these variables involve a counted or measured value. |
| Subtypes | None. | Discrete values are counts of things. Continuous values are measures, and any value can theoretically occur, limited only by the precision of the measuring process. |
| Examples | Gender, a variable that has the categories male and female. Academic major, a variable that might have the categories English, Math, Science, and History, among others. | The number of previous presidential elections in which a citizen voted, a discrete numerical variable. The household income of a citizen who voted, a continuous variable. |
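One practical consequence of this distinction is that the two types are summarized differently. The sketch below, in Python, tallies a categorical variable and averages two numerical ones. All of the values are invented for illustration, and the variable names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey values (invented for illustration).
academic_major = ["English", "Math", "Math", "History", "Science", "Math"]  # categorical
elections_voted = [0, 3, 1, 2, 5, 1]  # discrete numerical: counts of things
household_income = [41250.00, 58310.75, 36990.20,
                    72104.10, 50500.00, 64875.55]  # continuous numerical: measured

# Categorical values are tallied by category; averaging them has no meaning.
major_counts = Counter(academic_major)

# Numerical values support arithmetic such as a mean.
mean_elections = mean(elections_voted)
mean_income = mean(household_income)

print(major_counts.most_common())
print(mean_elections)
print(round(mean_income, 2))
```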
All variables should have an operational definition, that is, a universally accepted meaning that is clear to all associated with an analysis. Without operational definitions, confusion can occur. A famous example of such confusion was the tallying of votes in Florida during the 2000 U.S. presidential election in which, at various times, nine different definitions of a valid ballot were used. (A later analysis [1] determined that three of these definitions, including one pursued by Al Gore, led to margins of victory for George Bush that ranged from 225 to 493 votes and that the six others, including one pursued by George Bush, led to margins of victory for Al Gore that ranged from 42 to 171 votes.)