## Collect and Analyze Data

### Data

If a collection of data contains only a few numbers, there is little need to organize them before analyzing them. When a collection contains many numbers, it becomes important to organize the data in a meaningful way so that they can be analyzed. Notice that the word "data" is the plural of "datum." A collection of numbers, or data, is also called a data set.

One way of representing such data is to express the associated numerical values with tally marks rather than with our base-ten positional number system. This primitive form of notation represents a number n by making n marks. To make the marks easier to count, they are often organized in groups of five, each group having four vertical marks followed by a fifth mark drawn diagonally across them.

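For example, 2 is written as two vertical marks, 7 as one group of five followed by two marks, and 14 as two groups of five followed by four marks.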
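
The grouping rule can be sketched in a few lines of Python. This is an illustration, not part of the original lesson: `tally` is a hypothetical helper name, and the ASCII string `||||/` is an assumed stand-in for a group of four vertical marks crossed by a fifth.

```python
def tally(n: int) -> str:
    """Return n as tally marks: full groups of five first, then leftover single marks."""
    if n < 0:
        raise ValueError("tally marks represent whole numbers only")
    fives, ones = divmod(n, 5)
    groups = ["||||/"] * fives  # assumed ASCII stand-in: four marks crossed by a fifth
    if ones:
        groups.append("|" * ones)  # remaining marks, fewer than five
    return " ".join(groups)

for n in (2, 7, 14):
    print(n, "is written", tally(n))
```

Because each group is the same width, a larger number always produces a longer string, which mirrors the point that the size of the symbol tracks the size of the number.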

One advantage of this kind of representation of numbers is that it allows a number to be replaced by a greater one by simply adding more marks. Also, the size of the symbol corresponds to the size of the number represented. The fact that 81 is much greater than 29 becomes vividly clear if tally marks are used in place of positional notation.

### Statistics

Statistics is the field of mathematics that provides ways of analyzing and interpreting data.

In analyzing data, there are three useful measures of central tendency. These are the mean, the median, and the mode. Each describes the data in a different way. Look at the following example.

Example:
Suppose 9 children practice their music lessons for the following numbers of hours in a given week.

2, 2, 2, 2, 3, 4, 4, 4, 13

How much did a typical child practice? There are three ways to answer this question. Each is correct.

**Mean.** The mean, or average, of a data set is the sum of the numbers divided by the number of entries.

Mean: (2 + 2 + 2 + 2 + 3 + 4 + 4 + 4 + 13) ÷ 9 = 36 ÷ 9 = 4

Four hours is a good measure for a typical child because if each child had practiced exactly 4 hours, the total number of hours practiced (36) would be the same.
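
The mean computation can be checked with a short Python sketch. It assumes the nine practice times from the example and keeps the result as an exact `Fraction` so that non-whole means (such as 2.875) are not rounded; the `mean` helper is written out here for illustration rather than taken from the lesson.

```python
from fractions import Fraction

def mean(data):
    """Sum of the entries divided by the number of entries, as an exact fraction."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return Fraction(sum(data), len(data))

hours = [2, 2, 2, 2, 3, 4, 4, 4, 13]
m = mean(hours)
print(m)  # 4
# If every child practiced exactly m hours, the total would be unchanged.
print(m * len(hours) == sum(hours))  # True
```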

**Median.** The median of a data set is the middle number when the numbers are arranged in increasing order. When there is an even number of data entries, the median is the sum of the two middle numbers divided by two.

The data are already in increasing order, and the fifth of the nine entries is 3, so the median is 3 hours. Three hours is a good measure for a typical child because four children practiced for more than 3 hours and four children practiced for less than 3 hours.
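
A minimal Python sketch of the median rule, covering both the odd and even cases described above. The `median` helper is written here for illustration; removing the last entry (13) from the example gives an even-sized set to exercise the "two middle numbers" branch.

```python
from fractions import Fraction

def median(data):
    """Middle value after sorting; with an even count, the sum of the two middle values divided by 2."""
    if not data:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return Fraction(ordered[mid])  # odd count: the single middle value
    return Fraction(ordered[mid - 1] + ordered[mid], 2)  # even count

hours = [2, 2, 2, 2, 3, 4, 4, 4, 13]
med = median(hours)
print(med)  # 3
# Check the claim in the text: as many children are above the median as below it.
print(sum(h > med for h in hours), sum(h < med for h in hours))  # 4 4
print(median(hours[:-1]))  # 5/2, that is (2 + 3) / 2
```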

**Mode.** The mode is the number that appears most often in a data set. Sometimes there is more than one mode. Sometimes there is no mode.

The number 2 appears most often, so 2 is the mode. Two hours is a good measure for a typical child because more children practiced for 2 hours than any other amount of time.
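
The mode can be found by counting how often each value occurs. This Python sketch uses `collections.Counter` and returns a list so that several modes can be reported. The "no mode" case assumes a common classroom convention that the text does not spell out: if no value appears more than once, the data set has no mode (an empty list).

```python
from collections import Counter

def modes(data):
    """Every value that occurs most often, in increasing order.

    Assumed convention: if no value occurs more than once, there is no mode ([]).
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so (by this convention) no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([2, 2, 2, 2, 3, 4, 4, 4, 13]))  # [2]
print(modes([1, 1, 5, 5, 7]))  # [1, 5]  (more than one mode)
print(modes([1, 3, 5]))  # []  (no mode)
```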

Identifying the range of the data as well as clusters, gaps, and outliers is also useful for analyzing data.

**Range.** The range of a data set is the difference between the greatest and the least values in the set. The range tells how spread out the data are.

Range: 13 − 2 = 11
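
The range is a one-line computation. In this Python sketch the helper is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`; the second call drops the 13 to show how much one large value widens the spread.

```python
def data_range(data):
    """Difference between the greatest and least values in the data set."""
    if not data:
        raise ValueError("the range of an empty data set is undefined")
    return max(data) - min(data)

hours = [2, 2, 2, 2, 3, 4, 4, 4, 13]
print(data_range(hours))  # 11
print(data_range(hours[:-1]))  # 2, once the 13 is removed
```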

**Outliers.** An outlier is a value that is much greater or much less than the other data in the set. An outlier may significantly affect the mean of a data set. A single outlier will not affect the mode(s) and is likely to affect the median only slightly. Outliers are easy to see when the data are shown on a line plot. In the practice-hours example, 13 is an outlier; the table compares the three measures with and without it.

| Measure | With outlier (13) | Without outlier |
|---|---|---|
| Mean | 4 | 2.875 |
| Median | 3 | 2.5 |
| Mode | 2 | 2 |
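
The comparison can be reproduced with Python's standard `statistics` module (`mean`, `median`, `multimode`). This sketch simply removes the value 13, which the example treats as the outlier; it does not implement any rule for detecting outliers. Note that `multimode` returns every value when no value repeats, so it does not follow a "no mode" convention.

```python
from statistics import mean, median, multimode

with_outlier = [2, 2, 2, 2, 3, 4, 4, 4, 13]
without_outlier = [h for h in with_outlier if h != 13]  # drop the outlier

def summarize(data):
    """Mean, median, and list of modes for one data set."""
    return {"Mean": mean(data), "Median": median(data), "Mode": multimode(data)}

a = summarize(with_outlier)
b = summarize(without_outlier)
print(f"{'':8}{'with':>9}{'without':>9}")
for key in ("Mean", "Median", "Mode"):
    print(f"{key:8}{str(a[key]):>9}{str(b[key]):>9}")
```

Removing the single large value moves the mean from 4 to 2.875, moves the median only from 3 to 2.5, and leaves the mode at 2.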

Teaching Model 14.5: Stem-and-Leaf Plots