Initially, students learn about data by gathering and organizing the information.
For example, they might chart the daily high temperatures over the course
of a week. These activities help them build the foundation for more sophisticated
data analysis, like conducting opinion polls and analyzing population trends.
For a task from the National Council of Teachers of Mathematics that further
clarifies the concepts in this topic, click here.
The skills that students develop through studying data have real-world
applications. Students can use and understand the everyday data found in news
reports, advertising, research reports, and sports events. Data analysis
skills also have many important connections with other areas of mathematics.
Even though students will not receive a score for the Mathematical Processes
standard on the Ohio Graduation Test (OGT), it is still an important part
of the curriculum. Content and processes should be taught in tandem. To better
understand Data Analysis and Probability, click on the dropdown menu and select
Mathematical Processes.
The content in this Teaching Tool is largely based on the Ohio
Mathematics Content Standards and Benchmarks and includes released
items from the OGT. Additionally, these materials are aligned with the NCTM standards.
While there are various suggestions and activities here to use when working
with students, this Teaching Tool is designed to complement a rigorous, research-based
curriculum, not to substitute for one.

Click on the following benchmarks for more information and for links to
annotated OGT items.
 
a.
 Benchmark A: Create, interpret, and use graphical displays and statistical
measures to describe data; e.g., box-and-whisker plots, histograms, scatter
plots, measures of center and variability.
The focus of this benchmark is on creating graphical displays. Students
should be able to create a variety of graphical displays, classify the data
into such categories as single-variable (univariate), two-variable (bivariate),
quantitative or qualitative, and analyze data using frequency distributions,
lines of best fit and measures of center.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
b.
 Benchmark B: Evaluate different graphical representations of the same
data to determine which is the most appropriate representation for an identified
purpose.
There are many kinds of graphs, and they display information in different
ways. Students should be able to select the best graph type for the particular
data collected. In selecting a graphical display, students should consider
the kind of data involved. For example, is it discrete or continuous? Discrete
data can be counted; e.g., the number of people in a town is discrete (there
is no such thing as a fractional person). Continuous data can take any of
an infinite number of values between whole numbers, so recorded values are
approximations; e.g., the size of the apples on an apple tree is
continuous data.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
c.
 Benchmark C: Compare the characteristics of the mean, median and mode
for a given set of data, and explain which measure of center best represents
the data.
At this level students move beyond simply calculating mean, median and
mode to determining which measure best represents the data. For example, suppose
the median home price in a town is $230,000 while the mean home price is $500,000.
Students should be able to answer such questions as: Which of these measures
is a more accurate representation of ALL the home prices in this town? Why
is there such a difference in these two measures of center? How does the mode
compare to the other measures of center?
Students should also be able to define range, mean, median and mode and
represent them verbally and graphically.

 
d.
 Benchmark D: Find, use and interpret measures of center and spread,
such as mean and quartiles, and use those measures to compare and draw conclusions
about sets of data.
The focus of this benchmark is on measures of center and spread. Students should be
able to compare two sets of data using measures of center and spread. They
should also be able to show the relationship between two variables using a
variety of graphical displays.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
e.
 Benchmark E: Evaluate the validity of claims and predictions that are
based on data by examining the appropriateness of the data collection and
analysis.
This benchmark focuses on the ways that data are collected. Students first
analyze the size of a sample compared to the target population. Next they
analyze the different kinds of studies that can be conducted and how data
can be misused in each case.

 
f.
 Benchmark F: Construct convincing arguments based on analysis of data
and interpretation of graphs.
The focus of this benchmark is on constructing arguments based on data.
Students should be able to make conjectures about scatter plots and approximate
the line of best fit. They should also recognize the difference between correlation
and causation when looking at bivariate data. Correlation is a measure of
the interdependence between two variables or sets of data. Causation is the
relationship between two variables in which a change in one variable produces
a change in the other. A correlation between two variables does not by itself
show that one causes the other.
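As one way to make the line of best fit and correlation concrete, the sketch below computes a least-squares line and the correlation coefficient for a small data set. The hours-studied and quiz-score values, along with the function name, are hypothetical and chosen only for illustration; they do not come from the OGT or the Ohio standards.

```python
from math import sqrt


def best_fit_and_correlation(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient, between -1 and 1
    return slope, intercept, r


# Hypothetical bivariate data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept, r = best_fit_and_correlation(hours, scores)
print(f"Line of best fit: y = {slope:.1f}x + {intercept:.1f}")
print(f"Correlation coefficient: r = {r:.2f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, but it says nothing about whether studying longer causes the higher scores.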

 
g.
 Benchmark G: Describe sampling methods and analyze the effects of method
chosen on how well the resulting sample represents the population.
At this level students should be able to identify different sampling methods.
They should also be able to identify the limitations of each method and analyze
the effects of random versus biased sampling. Finally, students should be
able to explain how bias can be present (either intentionally or unintentionally)
in a sample.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
h.
 Benchmark H: Use counting techniques, such as permutations and combinations,
to determine the total number of options and possible outcomes.
In this benchmark students focus on using counting techniques. They study
permutations and combinations and determine when each should be used. A permutation
is an ordered arrangement of a set of events or items. A combination
is a selection of a group of items or events from a set without regard to
order; e.g., the number of ways to select 3 game pieces from a set of game
pieces.
Students also learn to find the number of possible outcomes in a situation
using the Fundamental Counting Principle.
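The difference between these counting techniques can be checked by listing the possibilities directly. The following is a minimal sketch; the five game pieces and the clothing choices are hypothetical examples, not items from the OGT.

```python
from itertools import combinations, permutations, product
from math import comb, perm

# Hypothetical set of five game pieces.
pieces = ["car", "hat", "dog", "shoe", "iron"]

# Permutations: ordered arrangements of 3 pieces (order matters).
ordered = list(permutations(pieces, 3))

# Combinations: selections of 3 pieces (order does not matter).
unordered = list(combinations(pieces, 3))

print("Permutations of 3 from 5:", len(ordered), "formula:", perm(5, 3))
print("Combinations of 3 from 5:", len(unordered), "formula:", comb(5, 3))

# Fundamental Counting Principle: multiply the number of choices at each step.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis"]
shoes = ["sneakers", "boots", "sandals", "loafers"]
outfits = list(product(shirts, pants, shoes))
print("Outfits:", len(outfits), "=", len(shirts) * len(pants) * len(shoes))
```

Each combination of 3 pieces can be arranged in 6 different orders, which is why there are 6 times as many permutations as combinations.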
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
i.
 Benchmark I: Design an experiment to test a theoretical probability,
and record and explain results.
The benchmark focuses on theoretical probability. Students should know
that theoretical probability uses mathematical expectations to identify the
number of ways an event could happen compared to all the events that could
happen. They should also be able to explain what a sample space is (a
list of all possible outcomes of an activity) and be able to use it to
calculate probability.
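A sample space can be listed out and then used to calculate a theoretical probability. The sketch below assumes a hypothetical activity, tossing a fair coin three times, and finds the probability of getting exactly two heads.

```python
from fractions import Fraction
from itertools import product

# Sample space: every possible result of tossing a coin three times.
sample_space = list(product("HT", repeat=3))

# Favorable outcomes: exactly two heads.
favorable = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Theoretical probability = favorable outcomes / all outcomes
# (valid here because every outcome in the list is equally likely).
p_two_heads = Fraction(len(favorable), len(sample_space))

print("Outcomes in sample space:", len(sample_space))
print("Favorable outcomes:", ["".join(o) for o in favorable])
print("P(exactly two heads) =", p_two_heads)
```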

 
j.
 Benchmark J: Compute probabilities of compound events, independent events,
and simple dependent events.
This benchmark focuses on finding probabilities of different kinds of events.
Students should be able to identify, explain and calculate the probabilities
of dependent, independent and compound events. Dependent events are events
for which the outcome of one event affects the probability of another event.
Independent events are events for which the outcome of one does not affect
the probability of the other. Compound events involve considering two or more
separate events or outcomes as one single event or outcome.
Students should also be able to represent geometric probability with area
models. Geometric probability is the probability that a random point is located
in a particular part, or subregion, of a larger region.
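The calculations for these kinds of events can be sketched with exact fractions. The bag of marbles and the rectangle dimensions below are hypothetical values chosen for illustration.

```python
from fractions import Fraction

# Hypothetical bag: 3 red marbles and 2 blue marbles.
red, blue = 3, 2
total = red + blue

# Independent events: draw a marble, put it back, then draw again.
# The first draw does not change the second, so the probabilities multiply as-is.
p_two_reds_with_replacement = Fraction(red, total) * Fraction(red, total)

# Dependent events: draw twice WITHOUT putting the first marble back.
# After one red is removed, only red - 1 reds remain among total - 1 marbles.
p_two_reds_without_replacement = Fraction(red, total) * Fraction(red - 1, total - 1)

# Geometric probability with an area model: a random point lands in a
# 2-by-2 target square inside a 4-by-5 rectangle (hypothetical dimensions).
target_area = 2 * 2
region_area = 4 * 5
p_point_in_target = Fraction(target_area, region_area)

print("P(red, red) with replacement:   ", p_two_reds_with_replacement)
print("P(red, red) without replacement:", p_two_reds_without_replacement)
print("P(point lands in target):       ", p_point_in_target)
```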
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.
Click here for an annotated item from the 2005 Ohio Graduation Test that addresses this benchmark.

 
k.
 Benchmark K: Make predictions based on theoretical probabilities and
experimental results.
The focus of this benchmark is on using probabilities to make predictions.
To succeed in this benchmark students should understand theoretical probability
and experimental results, and use them to solve problems. Students should
also understand, use and convert between odds and probability.
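Converting between odds and probability, and using a probability to make a prediction, might look like the sketch below. It assumes odds are stated as odds in favor (favorable to unfavorable); the function names and the spinner example are hypothetical.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Convert a probability to odds in favor, as (favorable, unfavorable)."""
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("probability must be between 0 and 1")
    # A probability of a/n means a favorable outcomes and n - a unfavorable ones.
    return p.numerator, p.denominator - p.numerator


def odds_to_probability(favorable, unfavorable):
    """Convert odds in favor (favorable : unfavorable) to a probability."""
    return Fraction(favorable, favorable + unfavorable)


def expected_count(p, trials):
    """Predict how many times an event should occur in a number of trials."""
    return Fraction(p) * trials


# Hypothetical spinner with four equal sections, one of them blue.
p_blue = Fraction(1, 4)
odds_blue = probability_to_odds(p_blue)

print(f"P(blue) = {p_blue}, odds in favor = {odds_blue[0]}:{odds_blue[1]}")
print("Odds of 1:3 as a probability:", odds_to_probability(1, 3))
print("Predicted blues in 200 spins:", expected_count(p_blue, 200))
```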

ABOUT THE MATH

Selecting a measure of center
Selecting the best measure of center depends on the situation at hand.
In the following two situations, the median and the mode might be better than
the mean as a representation of average:
 If some of the numbers in a data set are extreme, the median would
probably best represent an average. For example, the prices of the last 8
houses sold in a particular area are: $120,000; $125,000; $150,000; $169,000;
$189,000; $325,000; $450,000; $515,000. The mean cost of houses, $255,375,
is the average of a few very expensive houses and a majority of inexpensive
houses. Because the mean is thrown off by the high cost of the few expensive
ones, the median of $179,000 gives us a better understanding of the cost of
houses in the area.
 If many of the numbers in a data set are the same, but there are
just a few numbers that are different, the mode would probably be the best
indication of average. For example, if a shoe store sells 30 size 5 shoes,
110 size 6 shoes, 700 size 7 shoes, 750 size 7.5 shoes and 600 size 8 shoes,
the mode is size 7.5 shoes. This number represents the size shoe that most
people bought from this shoe store.
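The two examples above can be checked with a short calculation. This sketch uses the house prices and shoe sales given in the text; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Prices of the last 8 houses sold, from the example above.
house_prices = [120_000, 125_000, 150_000, 169_000,
                189_000, 325_000, 450_000, 515_000]

house_mean = mean(house_prices)      # pulled upward by the few expensive houses
house_median = median(house_prices)  # halfway between the two middle prices

# Shoe sales from the example above: size -> number of pairs sold.
shoe_sales = {5: 30, 6: 110, 7: 700, 7.5: 750, 8: 600}

# Expand the tally into one entry per pair sold, then find the mode.
sizes_sold = [size for size, count in shoe_sales.items() for _ in range(count)]
shoe_mode = mode(sizes_sold)

print(f"Mean house price:   ${house_mean:,.0f}")
print(f"Median house price: ${house_median:,.0f}")
print(f"Most common shoe size sold: {shoe_mode}")
```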

Outliers
An outlier is a data point in a sample that is widely separated from the
main cluster of points. For example, if a set of test scores is: 20, 75, 78,
78, 78, 80, 80, 85, 85, 90, then the outlier for this set of data is 20. There
are different definitions of what an outlier is. The most common is a data
set element that lies more than 1.5 times the interquartile range (IQR) below
the first quartile or above the third quartile. To find the cutoffs for
outliers using this definition, subtract 1.5 * IQR from the first quartile
and add 1.5 * IQR to the third quartile. If the element is less than the
first value or greater than the second, then it is an outlier.
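The 1.5 * IQR rule can be applied to the test scores above as follows. This sketch assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, a common classroom method; other quartile conventions can give slightly different cutoffs. The function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When the count is odd, the middle
    value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return median(lower), median(upper)


def find_outliers(values):
    """Return (low_cutoff, high_cutoff, outliers) using the 1.5 * IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_cutoff or v > high_cutoff]
    return low_cutoff, high_cutoff, outliers


scores = [20, 75, 78, 78, 78, 80, 80, 85, 85, 90]
q1, q3 = quartiles(scores)
low, high, outliers = find_outliers(scores)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"Cutoffs: below {low} or above {high}")
print("Outliers:", outliers)
```

For these scores the first quartile is 78 and the third quartile is 85, so the IQR is 7 and the cutoffs are 67.5 and 95.5. Only the score of 20 falls outside them.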

Selecting an appropriate graph
Help students select an appropriate graph type by having them answer the
following questions:
 Does my data set collectively represent something whole (for example,
my whole class, everyone in the school, the whole package of jelly beans)?
 Am I trying to show what part of the whole each item represents
(for example, the part of the class that is female or male; the percentage
of the school that likes Pizza Day best; the part of the package of jelly
beans that is strawberry, orange or grape)?
If the answer to both these questions is yes, then students should use
a circle graph.
 Does my data set represent a comparison of the frequencies of different
categories?
If the answer is yes, then students should use a line plot or a histogram.
 Does my data set represent a change over time?
 Am I trying to show a relationship between two data sets?
If the answer to both these questions is yes, then students should use
a scatterplot. In cases showing change over time, time should be graphed along
the horizontal axis.

Finding simple probabilities
Students should be able to find the total number of possible outcomes and
the number of favorable outcomes. They should not oversimplify situations that
involve outcomes that are not equally likely. The simple ratio of favorable
outcomes to total outcomes applies only when each possible outcome is just as
likely to happen as each of the others.
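The sum of two dice is a familiar case where the outcomes are not equally likely. The sketch below, a hypothetical example, compares the oversimplified answer (treating each of the 11 possible sums as equally likely) with the answer found from the 36 equally likely rolls.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely results of rolling two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# Count how many rolls produce each sum from 2 through 12.
sum_counts = Counter(a + b for a, b in rolls)

# Oversimplified: pretend each of the 11 possible sums is equally likely.
naive_p_seven = Fraction(1, len(sum_counts))

# Correct: count the favorable rolls among the 36 equally likely rolls.
p_seven = Fraction(sum_counts[7], len(rolls))

print("Possible sums:", sorted(sum_counts))
print("Oversimplified P(sum is 7) =", naive_p_seven)
print("Correct P(sum is 7) =", p_seven)
```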

Experimental versus theoretical probability
Theoretical probability uses mathematical expectations to compare
the number of ways an event could happen to all the events that could
happen. Experimental probability is the probability based on a series of trials.
Experiments can be conducted on all events that can be described with theoretical
probability. As the number of trials gets larger and larger (more tosses or spins,
for example), experimental probability should get closer to theoretical probability.
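A simulation is one way to see experimental probability approach theoretical probability. The sketch below simulates rolling a fair six-sided die and tracks how often a 4 appears; the event, the trial counts, and the seed are arbitrary choices made for illustration, and the exact results depend on the random numbers drawn.

```python
import random
from fractions import Fraction


def experimental_probability(trials, rng):
    """Roll a fair six-sided die `trials` times; return the fraction of 4s."""
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
    return hits / trials


theoretical = Fraction(1, 6)  # one favorable face out of six equally likely faces
rng = random.Random(2005)     # arbitrary seed so the run can be repeated

# Run the experiment with more and more trials.
results = {n: experimental_probability(n, rng) for n in (10, 100, 1_000, 100_000)}

print(f"Theoretical probability: {float(theoretical):.4f}")
for n, p in results.items():
    gap = abs(p - float(theoretical))
    print(f"{n:>7} trials: experimental = {p:.4f} (off by {gap:.4f})")
```

Small runs can land far from 1/6, which is a useful talking point: the experimental result tends toward the theoretical value only as the number of trials grows.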
