## Statistics and Probability: Overview

In a survey, data are collected from a population, or a group of individuals that fit a certain description. When a population is so large that collecting data from every member of that group is impractical, a **sample**, or part of the population, is used. One of the most important tasks of a statistician when collecting data from a sample is to make sure that the sample is a **random sample** and not a **biased sample**. In a random sample, all members of a population have an equal chance of being selected. In a biased sample, the data do not reflect the population accurately. A common mistake in obtaining a sample occurs when a researcher collects a random sample from a subset of the population that does not reflect the whole population, as in data collected about political candidates from the readers of a certain newspaper. Since most readers of that newspaper may have the same political leanings, the results would not reflect the voter population in general. A sample that does have characteristics similar to those of the entire population is called a **representative sample**.

Assume we want to find the three most popular flavors of ice cream among 300 sixth-grade students at a school, without having to ask every student. Let's discuss the advantages and disadvantages of the following samples.

- Randomly select 10 students from the list of 300 students.
- Select the first 30 sixth-grade students who get off a particular bus.
- Randomly select 30 students from the list of 300 students.

The first sample, consisting of only 10 students, is too small and may not reflect the whole population of students. The second sample, 30 students getting off a bus, is a biased sample, because not all students come to school on buses, so not all students would have an equal chance of being selected for the sample. The third sample seems to be large enough to provide a good idea about the views of the 300 students, and since the students are being selected randomly, everyone would have an equal chance of being chosen.
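The third sampling plan can be sketched in Python. This is a minimal illustration, assuming the class list is simply a list of hypothetical student ID numbers from 1 to 300. The helper name `draw_random_sample` is made up for this example. `random.sample` draws without replacement, so every student has the same chance of being chosen.

```python
import random

# Hypothetical roster: ID numbers 1..300 stand in for the real class list.
roster = list(range(1, 301))

def draw_random_sample(population, size, seed=None):
    """Return `size` distinct members chosen uniformly at random."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, size)

sample = draw_random_sample(roster, 30, seed=2024)
print(sorted(sample))
```

Changing `size` to 10 reproduces the first plan. The draw is still random, but each run can give very different results because so few students are asked.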

The primary purpose of finding a representative sample is so that predictions can be made about the whole population. This is usually done by setting up a proportion between the outcomes from the representative sample and the predicted outcomes of the whole population. Consider the following problem. Assume that in a state's race for governor, 6 million voters are expected to vote. A recent poll of 1,500 voters from a representative sample showed 920 voters for Candidate *A* and 580 voters for Candidate *B*. How many voters would we expect to vote for Candidate *A* if the vote were taken today? Setting up the proportion $\frac{920}{1{,}500} = \frac{x}{6{,}000{,}000}$ and solving for $x$ gives $x = 3{,}680{,}000$, so we would expect about 3.68 million votes for Candidate *A*.
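The proportion for the governor's race can also be checked with a few lines of Python. This is only a sketch: the function name `predict_from_sample` is hypothetical, and it assumes the poll really is representative, so the sample share carries over to all 6 million voters.

```python
from fractions import Fraction

def predict_from_sample(sample_count, sample_size, population_size):
    """Scale a count from a representative sample up to the whole population."""
    share = Fraction(sample_count, sample_size)  # exact share, e.g. 920/1500
    return share * population_size

votes_for_a = predict_from_sample(920, 1_500, 6_000_000)
votes_for_b = predict_from_sample(580, 1_500, 6_000_000)
print(votes_for_a)  # 3680000
print(votes_for_b)  # 2320000
```

Using `Fraction` keeps the arithmetic exact, which mirrors solving the proportion by hand.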

It is important that your students learn to look critically at displays of data. The use of incorrect scales often misleads readers. For example, if you want to make a small difference between two amounts look large, you would use a small scale and not show the whole picture. In the graph shown below, the price difference between two brands of adhesive-bandage strips is 4 cents, $1.39 compared to $1.43. Yet the graph makes it look as if there is a great difference in price.

It is also important that your students read the keys and carefully examine the scales provided with many data displays, especially if they are comparing one company's data display to another company's. Examine the two graphs below.

At first glance, it might look as if the Gabby Communications company has grown more than the Dial-Me Communications company because of the steepness of the line graph. However, when you compare the scales, the Gabby company increases by 50 as the line moves vertically on the graph, and the Dial-Me company increases by 200 as the line moves vertically. The Gabby company increased by only about 200 customers during the four-month span, while the Dial-Me company increased by almost 900 customers.

Probability is another topic that is receiving more attention at earlier levels of mathematics education. Students need to understand the fundamental concepts underlying probability. The result of an experiment is an outcome, and one or more outcomes make up an event. The probability of an event is the likelihood that the event will occur. If the outcomes of an experiment are all equally likely, such as flipping a coin and getting either heads or tails, then the probability of an event *A* is given by the formula:

$$P(A) = \frac{\text{number of outcomes in event } A}{\text{total number of possible outcomes}}$$

If the probability of an event is based on data collected through an experiment, then it is referred to as **experimental probability**. A baseball player's batting average is an experimental probability. It is found by dividing the number of hits a player had by the total number of official at-bats. If the probability of an event is found by calculating the expected result through mathematical reasoning, then it is called **theoretical probability**. For example, heads or tails are the only outcomes possible when you flip a coin, and they are equally likely outcomes. Therefore, the theoretical probability of flipping a coin and getting heads is $\frac{1}{2}$. When calculating the probability of an event, you need to know the number of outcomes in the event and the total number of outcomes. The set of all possible outcomes is called the **sample space**.
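The difference between the two kinds of probability can be illustrated with a simulated coin. The sketch below assumes a fair coin modeled with Python's `random` module, and the function name `experimental_probability` is invented for this example. The experimental value comes from counting heads in repeated flips, while the theoretical value comes from reasoning about two equally likely outcomes.

```python
import random
from fractions import Fraction

def experimental_probability(trials, seed=None):
    """Flip a simulated fair coin `trials` times; return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.choice(("H", "T")) == "H" for _ in range(trials))
    return heads / trials

# Theoretical: one favorable outcome out of two equally likely outcomes.
theoretical = Fraction(1, 2)

# Experimental: based on data from (simulated) flips.
estimate = experimental_probability(10_000, seed=1)
print(f"theoretical: {float(theoretical)}, experimental: {estimate:.3f}")
```

With only a handful of flips the experimental value can be far from one half. It tends to settle closer as the number of trials grows.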

The **fundamental counting principle** states that if an event has two parts with *m* different ways to do the first part and *n* different ways to do the second part, then the total number of ways for that event to occur is *m* × *n*. You can use this principle to calculate probabilities. Suppose you draw two marbles, one at a time, from a bag containing 5 red marbles, 3 blue marbles, and 1 yellow marble of identical size without replacing the first marble after it is drawn. Since there are 9 possible outcomes for the first draw and 8 possible outcomes for the second draw, the event can occur in 9 × 8, or 72, different ways. The number of ways of selecting two red marbles is 5 × 4, or 20, ways. Therefore, the probability of drawing two red marbles, one after the other, is $\frac{20}{72}$, or $\frac{5}{18}$.
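The counts in the marble example can be confirmed by listing every ordered pair of draws. This is a brute-force sketch rather than a method from the text. It assumes each marble is labeled by its position in a list so that marbles of the same color are treated as distinct objects.

```python
from fractions import Fraction
from itertools import permutations

# The bag from the text: 5 red, 3 blue, and 1 yellow marble.
bag = ["red"] * 5 + ["blue"] * 3 + ["yellow"]

# Every ordered way to draw two different marbles (no replacement).
ordered_draws = list(permutations(range(len(bag)), 2))

# Keep only the draws in which both marbles are red.
both_red = [d for d in ordered_draws
            if bag[d[0]] == "red" and bag[d[1]] == "red"]

p_two_red = Fraction(len(both_red), len(ordered_draws))
print(len(ordered_draws))  # 72
print(len(both_red))       # 20
print(p_two_red)           # 5/18
```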

**Disjoint events** are events that have no outcome in common. Using the same bag of marbles mentioned in the previous problem, suppose you draw one marble instead of two. The probability of drawing a red marble is $\frac{5}{9}$, and the probability of drawing a blue marble is $\frac{3}{9}$. Since the events are disjoint, the probability of drawing either a red or blue marble in one draw is $\frac{5}{9} + \frac{3}{9}$, or $\frac{8}{9}$.
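The addition rule for disjoint events can be checked with exact fractions. In this sketch the bag is stored as a dictionary of color counts, and `p_color` is a hypothetical helper name. The sum of the two probabilities is compared with a direct count of red and blue marbles.

```python
from fractions import Fraction

# The bag from the text, stored as color -> number of marbles.
bag = {"red": 5, "blue": 3, "yellow": 1}
total = sum(bag.values())

def p_color(color):
    """Probability of drawing one marble of the given color."""
    return Fraction(bag[color], total)

# Disjoint events: a single marble cannot be both red and blue,
# so the probabilities add.
p_red_or_blue = p_color("red") + p_color("blue")

# Direct count of favorable marbles for comparison.
p_direct = Fraction(bag["red"] + bag["blue"], total)

print(p_red_or_blue)  # 8/9
print(p_direct)       # 8/9
```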

A **compound event** consists of the outcomes of two or more events. The events that make up a compound event can be independent events or dependent events. Two events are said to be **independent events** if the outcome of one has no effect on the outcome of the other. For example, if you were to roll a 1–6 number cube and then flip a coin, the events of rolling a 4 and of getting a heads are independent events. The probability of these two events both occurring is the product of the two probabilities. In this case, the probability of rolling a 4 and then flipping heads is $\frac{1}{6} \times \frac{1}{2}$, or $\frac{1}{12}$. Similarly, if you were to toss a coin three times, since each toss is not affected by the previous toss, the probability of getting three heads in a row is $\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2}$, or $\frac{1}{8}$.
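For independent events, the product rule agrees with a direct count over the combined sample space. The sketch below assumes a fair number cube and a fair coin, and uses `itertools.product` to list every equally likely combined outcome.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)   # faces of a 1-6 number cube
coin = ("H", "T")   # heads or tails

# Combined sample space for one roll followed by one flip: 6 x 2 outcomes.
outcomes = list(product(die, coin))
favorable = [o for o in outcomes if o == (4, "H")]
p_four_then_heads = Fraction(len(favorable), len(outcomes))
print(p_four_then_heads)  # 1/12

# Sample space for three coin tosses: 2 x 2 x 2 outcomes.
tosses = list(product(coin, repeat=3))
p_three_heads = Fraction(tosses.count(("H", "H", "H")), len(tosses))
print(p_three_heads)  # 1/8
```

Both results match the products of the individual probabilities given in the text.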

When the outcome of one event in a compound event *is* affected by the outcome of another event, the events are said to be **dependent events**. Going back to the bag of marbles, drawing two marbles without replacing the first one is an example of dependent events. Another way to find the probability of dependent events is to multiply the probability of the first event by the probability of the second event, given that the first event occurred. Let's look at the event of drawing two red marbles. The probability of drawing a red marble to start with is $\frac{5}{9}$. Now that you have drawn a red marble, the probability of drawing a second red marble is $\frac{4}{8}$, or $\frac{1}{2}$. Therefore, the probability of drawing two red marbles is $\frac{5}{9} \times \frac{1}{2}$, or $\frac{5}{18}$. Notice that the value is the same as when we calculated the probability by using the fundamental counting principle.

Similarly, we could calculate the probability of drawing a blue marble followed by a red marble. The probability of drawing a blue marble is $\frac{3}{9}$, or $\frac{1}{3}$, and the probability of drawing a red marble after having drawn a blue marble is $\frac{5}{8}$. Therefore, the probability of drawing a blue marble followed by a red marble is $\frac{1}{3} \times \frac{5}{8}$, or $\frac{5}{24}$.
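The step-by-step multiplication for dependent events can be written as a short loop. This is a sketch under the assumption that marbles are drawn one at a time without replacement. The function name `p_sequence` is hypothetical. After each draw the count for that color is reduced by one, which is what makes the later probabilities depend on the earlier draws.

```python
from fractions import Fraction

def p_sequence(bag, colors):
    """Probability of drawing `colors` in order, without replacement."""
    remaining = dict(bag)          # copy so the original bag is unchanged
    probability = Fraction(1)
    for color in colors:
        total = sum(remaining.values())
        # Probability of this draw, given the draws made so far.
        probability *= Fraction(remaining[color], total)
        remaining[color] -= 1      # the drawn marble is not replaced
    return probability

bag = {"red": 5, "blue": 3, "yellow": 1}
print(p_sequence(bag, ["red", "red"]))   # 5/18
print(p_sequence(bag, ["blue", "red"]))  # 5/24
```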