

Ohio Online Assessment Reporting System

High School — Mathematics
Data Analysis and Probability

Initially, students learn about data by gathering and organizing the information. For example, they might chart the daily high temperatures over the course of a week. These activities help them build the foundation for more sophisticated data analysis, like conducting opinion polls and analyzing population trends.

The National Council of Teachers of Mathematics (NCTM) provides a task that further clarifies the concepts in this topic.

The skills that students develop through studying data have real-world applications. Students can use and understand the everyday data found in news reports, advertising, research reports, and sports events. Data analysis skills also have many important connections with other areas of mathematics.

Even though students will not receive a score for the Mathematical Processes standard on the Ohio Graduation Test (OGT), it is still an important part of the curriculum. Content and processes should be taught in tandem. To better understand Data Analysis and Probability, see the Mathematical Processes section of this Teaching Tool.

The content in this Teaching Tool is largely based on the Ohio Mathematics Content Standards and Benchmarks and includes released items from the OGT. Additionally, these materials are aligned with the NCTM standards. While there are various suggestions and activities here to use when working with students, this Teaching Tool is designed to complement a rigorous, research-based curriculum, not to substitute for one.





1. Data Analysis and Probability

Each benchmark below includes more information and, where available, annotated OGT items.

a.

Benchmark A: Create, interpret, and use graphical displays and statistical measures to describe data; e.g., box-and-whisker plots, histograms, scatter plots, measures of center and variability.

The focus of this benchmark is on creating graphical displays. Students should be able to create a variety of graphical displays, classify data into categories such as single-variable (univariate), two-variable (bivariate), quantitative or qualitative, and analyze data using frequency distributions, lines of best fit and measures of center.
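
As a sketch of building a frequency distribution (the table behind a histogram), the Python below groups a week of daily high temperatures into 5-degree intervals. The temperatures and the helper name `frequency_table` are hypothetical and are used only for illustration.

```python
from collections import Counter

def frequency_table(values, width):
    """Count how many whole-number values fall in each interval start..start + width - 1."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical daily high temperatures (degrees F) for one week.
highs = [58, 62, 65, 71, 74, 68, 77]
width = 5
for start, count in frequency_table(highs, width).items():
    print(f"{start}-{start + width - 1}: {'*' * count}")  # one * per day, like a text histogram
```

This sketch skips empty intervals. A real histogram would still show them as bars of height zero.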

Two annotated items from the 2005 Ohio Graduation Test address this benchmark.

b.

Benchmark B: Evaluate different graphical representations of the same data to determine which is the most appropriate representation for an identified purpose.

There are many kinds of graphs, and they display information in different ways. Students should be able to select the best graph type for the particular data collected. In selecting a graphical display, students should consider the kind of data involved. For example, is it discrete or continuous? Discrete data can be counted; e.g., the number of people in a town is discrete (there is no such thing as a fractional person). Continuous data can take on infinitely many values between any two whole numbers, so recorded values are approximations; e.g., the size of the apples on an apple tree is continuous data.

An annotated item from the 2005 Ohio Graduation Test addresses this benchmark.

c.

Benchmark C: Compare the characteristics of the mean, median and mode for a given set of data, and explain which measure of center best represents the data.

At this level students move beyond simply calculating mean, median and mode to determining which measure best represents the data. For example, suppose the median home price in a town is $230,000 while the mean home price is $500,000. Students should be able to answer such questions as: Which of these measures is a more accurate representation of ALL the home prices in this town? Why is there such a difference in these two measures of center? How does the mode compare to the other measures of center?

Students should also be able to define range, mean, median and mode and represent them verbally and graphically.

d.

Benchmark D: Find, use and interpret measures of center and spread, such as mean and quartiles, and use those measures to compare and draw conclusions about sets of data.

The focus of this benchmark is on measures of center and spread. Students should be able to compare two sets of data using measures of center and spread. They should also be able to show the relationship between two variables using a variety of graphical displays.

Two annotated items from the 2005 Ohio Graduation Test address this benchmark.

e.

Benchmark E: Evaluate the validity of claims and predictions that are based on data by examining the appropriateness of the data collection and analysis.

This benchmark focuses on the ways that data are collected. Students first analyze the size of a sample compared to the target population. Next they analyze the different kinds of studies that can be conducted and how data can be misused in each case.

f.

Benchmark F: Construct convincing arguments based on analysis of data and interpretation of graphs.

The focus of this benchmark is on constructing arguments based on data. Students should be able to make conjectures about scatter plots and approximate the line of best fit. They should also recognize the difference between correlation and causation when looking at bivariate data. Correlation is a measure of the interdependence between two variables or sets of data. Causation is a relationship between two variables in which a change in one variable produces a change in the other. Two variables can be strongly correlated even when neither one causes the other.
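
Here is a minimal Python sketch of a least-squares line of best fit and the correlation coefficient r. It uses a small hypothetical data set (hours studied versus quiz score), and the function name `best_fit_and_r` is illustrative. An r near 1 shows a strong positive correlation. As the benchmark stresses, though, r alone does not show that studying causes the higher scores.

```python
import math

def best_fit_and_r(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient, always between -1 and 1
    return slope, intercept, r

# Hypothetical data: hours studied and quiz score (out of 5) for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept, r = best_fit_and_r(hours, scores)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")  # y = 0.60x + 2.20, r = 0.77
```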

g.

Benchmark G: Describe sampling methods and analyze the effects of method chosen on how well the resulting sample represents the population.

At this level students should be able to identify different sampling methods. They should also be able to identify the limitations of each method and analyze the effects of random versus biased sampling. Finally, students should be able to explain how bias can be present (either intentionally or unintentionally) in a sample.
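
The Python sketch below contrasts a convenience sample with a simple random sample. Both are drawn from a hypothetical school roster listed in grade order, and the roster and function names are assumptions for illustration. Taking the first names on a sorted list produces a biased sample. Random selection gives every student the same chance of being chosen.

```python
import random
from collections import Counter

# Hypothetical roster: 25 students in each of grades 9-12, listed in grade order.
population = [(grade, student) for grade in (9, 10, 11, 12) for student in range(25)]

def convenience_sample(pop, n):
    # Picks whoever is easiest to reach: here, the first n names on the list.
    return pop[:n]

def simple_random_sample(pop, n, seed=None):
    # Every group of n students is equally likely to be chosen (no repeats).
    rng = random.Random(seed)
    return rng.sample(pop, n)

def grade_counts(sample):
    return Counter(grade for grade, _ in sample)

print(grade_counts(convenience_sample(population, 20)))  # Counter({9: 20}): only 9th graders
print(grade_counts(simple_random_sample(population, 20, seed=1)))  # usually a mix of grades
```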

An annotated item from the 2005 Ohio Graduation Test addresses this benchmark.

h.

Benchmark H: Use counting techniques, such as permutations and combinations, to determine the total number of options and possible outcomes.

In this benchmark students focus on using counting techniques. They study permutations and combinations and determine when each should be used. A permutation is an ordered arrangement of a set of items or events, so order matters. A combination is a selection of items or events from a set without regard to order; e.g., the number of ways to choose 3 game pieces from a set of game pieces.

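Students also learn to find the number of possible outcomes in a situation using the Fundamental Counting Principle: if one choice can be made in m ways and a second choice can then be made in n ways, the two choices together can be made in m × n ways.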
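
For example, the Python below counts outfits from a hypothetical 3 shirts, 4 pairs of pants and 2 pairs of shoes in two ways. It first multiplies (the Fundamental Counting Principle), then lists every combination with `itertools.product`, so students can see that the two counts agree.

```python
import math
from itertools import product

# Hypothetical wardrobe choices.
shirts = ["red", "blue", "white"]
pants = ["jeans", "khakis", "shorts", "sweatpants"]
shoes = ["sneakers", "boots"]

by_multiplying = math.prod(len(group) for group in (shirts, pants, shoes))  # 3 * 4 * 2
by_listing = len(list(product(shirts, pants, shoes)))  # every (shirt, pants, shoes) triple

print(by_multiplying, by_listing)  # 24 24
```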

Two annotated items from the 2005 Ohio Graduation Test address this benchmark.

i.

Benchmark I: Design an experiment to test a theoretical probability, and record and explain results.

This benchmark focuses on theoretical probability. Students should know that theoretical probability uses mathematical expectations to identify the number of ways an event could happen compared to all the outcomes that could happen. They should also be able to explain what a sample space is (a list of all possible outcomes of an activity) and use it to calculate probabilities.
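
Here is a short Python sketch, assuming two fair number cubes. It lists the sample space of 36 ordered outcomes and computes a theoretical probability as favorable outcomes over all outcomes. The helper name `theoretical_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair number cubes: all 36 ordered pairs (first, second).
sample_space = list(product(range(1, 7), repeat=2))

def theoretical_probability(event, space):
    """Favorable outcomes / all outcomes; valid only when outcomes are equally likely."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))

print(theoretical_probability(lambda roll: sum(roll) == 7, sample_space))  # 1/6
```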

j.

Benchmark J: Compute probabilities of compound events, independent events, and simple dependent events.

This benchmark focuses on finding probabilities of different kinds of events. Students should be able to identify, explain and calculate the probabilities of dependent, independent and compound events. Dependent events are events in which the outcome of one event affects the probability of the other. Independent events are events in which the outcome of one event does not affect the probability of the other. Compound events involve considering two or more separate events or outcomes together as one single event or outcome.
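
Drawing two marbles and asking for "red, then red" is a compound event. Whether the two draws are independent or dependent depends on whether the first marble is put back. A minimal Python sketch with a hypothetical bag of 3 red and 2 blue marbles:

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bag: 3 red marbles and 2 blue marbles; draw two marbles.
red, blue = 3, 2
total = red + blue

# Independent: the first marble is replaced, so the second draw has the same probabilities.
p_with_replacement = Fraction(red, total) * Fraction(red, total)

# Dependent: the first red marble stays out, leaving one fewer red and one fewer marble.
p_without_replacement = Fraction(red, total) * Fraction(red - 1, total - 1)

# Check the dependent case by listing every ordered pair of two different marbles.
marbles = ["R"] * red + ["B"] * blue
pairs = list(permutations(range(total), 2))
both_red = sum(1 for i, j in pairs if marbles[i] == "R" and marbles[j] == "R")

print(p_with_replacement)              # 9/25
print(p_without_replacement)           # 3/10
print(Fraction(both_red, len(pairs)))  # 3/10
```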

Students should also be able to represent geometric probability with area models. Geometric probability is the probability that a random point is located in a particular part, or subregion, of a larger region.
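
With an area model, geometric probability is the area of the target subregion divided by the area of the whole region. This assumes the point is equally likely to land anywhere. The Python sketch below uses a hypothetical dartboard: a circle of radius 1 inscribed in a square of side 2.

```python
import math

def geometric_probability(target_area, total_area):
    # Assumes the random point is equally likely to land anywhere in the larger region.
    return target_area / total_area

square_side = 2
circle_radius = 1  # circle inscribed in the square
p = geometric_probability(math.pi * circle_radius ** 2, square_side ** 2)
print(round(p, 4))  # 0.7854, which is pi / 4
```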

Two annotated items from the 2005 Ohio Graduation Test address this benchmark.

k.

Benchmark K: Make predictions based on theoretical probabilities and experimental results.

The focus of this benchmark is on using probabilities to make predictions. To succeed in this benchmark students should understand theoretical probability and experimental results, and use them to solve problems. Students should also understand, use and convert between odds and probability.
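
Odds in favor of an event compare favorable outcomes to unfavorable outcomes. Probability compares favorable outcomes to all outcomes. Here is a minimal Python sketch of the conversion in both directions. The helper names are illustrative, and `probability_to_odds` assumes the probability is less than 1.

```python
from fractions import Fraction

def probability_to_odds(p):
    """Odds in favor, as (favorable, unfavorable), for a probability 0 <= p < 1."""
    p = Fraction(p)
    ratio = p / (1 - p)  # favorable : unfavorable, in lowest terms
    return ratio.numerator, ratio.denominator

def odds_to_probability(favorable, unfavorable):
    """Probability of an event whose odds in favor are favorable : unfavorable."""
    return Fraction(favorable, favorable + unfavorable)

print(probability_to_odds(Fraction(1, 6)))  # (1, 5): rolling a 6 has odds of 1 to 5
print(odds_to_probability(3, 2))            # 3/5
```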

 

ABOUT THE MATH
  • Measures of center

    Measures of center are numbers that describe where a collection of data clusters, or what a typical (average) value is. The three measures of center are mean, median and mode.

    • The mean is the sum of a set of numbers divided by the number of elements in the set.
    • The median is the middle number or item in a set of numbers or objects arranged from least to greatest, or the mean of the two middle numbers when the set has two middle numbers.
    • The mode is the number or object that appears most frequently in a set of numbers or objects.
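
The sketch below implements the three definitions directly in Python, using the test scores from the Outliers discussion. The helper names are illustrative; Python's standard `statistics` module also provides `mean`, `median` and `multimode`.

```python
from collections import Counter

def mean(values):
    # Sum of the values divided by how many values there are.
    return sum(values) / len(values)

def median(values):
    # Middle value after sorting; the mean of the two middle values when the count is even.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # Value(s) that appear most often; a data set can have more than one mode.
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

scores = [20, 75, 78, 78, 78, 80, 80, 85, 85, 90]
print(mean(scores), median(scores), modes(scores))  # 74.9 79.0 [78]
```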

  • Selecting a measure of center

    Selecting the best measure of center depends on the situation at hand. In the following two situations, the median and the mode might be better than the mean as a representation of average:

    • If some of the numbers in a data set are extreme, the median would probably best represent an average. For example, the prices of the last 8 houses sold in a particular area are: $120,000; $125,000; $150,000; $169,000; $189,000; $325,000; $450,000; $515,000. The mean cost of houses, $255,375, is the average of a few very expensive houses and a majority of inexpensive houses. Because the mean is thrown off by the high cost of the few expensive ones, the median of $179,000 gives us a better understanding of the cost of houses in the area.
    • If many of the numbers in a data set are the same, but there are just a few numbers that are different, the mode would probably be the best indication of average. For example, if a shoe store sells 30 size 5 shoes, 110 size 6 shoes, 700 size 7 shoes, 750 size 7.5 shoes and 600 size 8 shoes, the mode is size 7.5. This is the shoe size that customers bought most often at this store.
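
The two examples above can be checked with Python's standard `statistics` module. The `shoe_sales` dictionary simply restates the store's sales figures. The sketch also shows why the mean shoe size is a poor "average" here.

```python
import statistics

# House prices from the example above (dollars).
house_prices = [120_000, 125_000, 150_000, 169_000, 189_000, 325_000, 450_000, 515_000]
print(f"mean   ${statistics.mean(house_prices):,.0f}")    # mean   $255,375
print(f"median ${statistics.median(house_prices):,.0f}")  # median $179,000

# Shoe sales from the example above: size -> number of pairs sold.
shoe_sales = {5: 30, 6: 110, 7: 700, 7.5: 750, 8: 600}
mode_size = max(shoe_sales, key=shoe_sales.get)  # the size sold most often
mean_size = sum(size * pairs for size, pairs in shoe_sales.items()) / sum(shoe_sales.values())
print(mode_size)            # 7.5
print(round(mean_size, 2))  # 7.37, not a size the store actually sells
```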

  • Outliers

    An outlier is a data point in a sample that is widely separated from the main cluster of points. For example, if a set of test scores is 20, 75, 78, 78, 78, 80, 80, 85, 85, 90, then the outlier for this set of data is 20. There are different definitions of an outlier. A common one is any data value that lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. To find the cutoffs for outliers using this definition, subtract 1.5 × IQR from the first quartile and add 1.5 × IQR to the third quartile. A data value less than the first cutoff or greater than the second is an outlier.
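
Here is a minimal Python sketch of the 1.5 × IQR rule, applied to the test scores above. Quartile conventions vary. This sketch assumes the common classroom method: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. Other tools, such as Python's `statistics.quantiles`, may report slightly different quartiles for the same data.

```python
def median_of_sorted(s):
    # Middle value of an already-sorted list (mean of the two middle values if even).
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    # Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd).
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]
    return median_of_sorted(lower), median_of_sorted(upper)

def iqr_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff, high_cutoff = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]

scores = [20, 75, 78, 78, 78, 80, 80, 85, 85, 90]
print(quartiles(scores))     # (78, 85), so IQR = 7 and the cutoffs are 67.5 and 95.5
print(iqr_outliers(scores))  # [20]
```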

  • Selecting an appropriate graph

    Help students select an appropriate graph type by having them answer the following questions:

    • Does my data set collectively represent something whole (for example, my whole class, everyone in the school, the whole package of jelly beans)?
    • Am I trying to show what part of the whole each item represents (for example, the part of the class that is female or male; the percentage of the school that likes Pizza Day best; the part of the package of jelly beans that is strawberry, orange or grape)?

    If the answer to both these questions is yes, then students should use a circle graph.

    • Does my data set represent a comparison of the frequencies of different categories?

    If the answer is yes, then students should use a line plot or a histogram.

    • Does my data set represent a change over time?
    • Am I trying to show a relationship between two data sets?

    If the answer to either of these questions is yes, then students should use a scatterplot. In cases showing change over time, time should be graphed along the horizontal axis.

  • Creating a graph

    Ensure students' graphs include:

    • Labels for axes (including units);
    • Titles;
    • Even intervals.

  • Formulas for combinations and permutations

    There are two important formulas to remember when r things are chosen from n things:

    • Permutations are possible orders or arrangements of a set of events or items. Permutations should be used when the order of the events or items is important. For example, there are 6 people from your school on the cross-country team. How many different ways can these 6 runners place in the top 3 places in the race? The formula for finding permutations is $_nP_r = \frac{n!}{(n-r)!}$. In this case n is 6 and r is 3.
    • Combinations are selections of a group of items or events from a set without regard to order. For example, 4 representatives are chosen from a group of 12. How many different ways can those 4 representatives be selected? The formula for finding combinations is $_nC_r = \frac{n!}{r!\,(n-r)!}$. In this case n is 12 and r is 4.
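
The two formulas can be applied to the runner and representative examples in a few lines of Python. The helper names are illustrative. Python 3.8 and later also provide `math.perm` and `math.comb`, which give the same results.

```python
import math

def permutations_count(n, r):
    # nPr = n! / (n - r)!  (order matters)
    return math.factorial(n) // math.factorial(n - r)

def combinations_count(n, r):
    # nCr = n! / (r! (n - r)!)  (order does not matter)
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations_count(6, 3))   # 120 ways for 6 runners to fill 1st, 2nd and 3rd place
print(combinations_count(12, 4))  # 495 ways to choose 4 representatives from 12
```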

  • Finding simple probabilities

    Students should be able to find the total number of possible outcomes and the number of favorable outcomes. The probability of an event is then the number of favorable outcomes divided by the total number of possible outcomes, but this rule applies only when each possible outcome is just as likely to happen as each of the others. Students should not oversimplify situations that involve outcomes that are not equally likely.
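
To see why equally likely outcomes matter, consider a hypothetical spinner where red covers half the circle. The Python sketch below compares the oversimplified count with the correct probability. It also shows how splitting the spinner into equal sections restores the favorable-over-total rule.

```python
from fractions import Fraction

# Hypothetical spinner: red covers half the circle; blue and green cover a quarter each.
spinner = {"red": Fraction(1, 2), "blue": Fraction(1, 4), "green": Fraction(1, 4)}

# Oversimplified: treats the 3 colors as equally likely outcomes.
naive_p_red = Fraction(1, len(spinner))  # 1/3, which is wrong for this spinner

# Correct: weight red by the share of the circle it covers.
actual_p_red = spinner["red"]  # 1/2

# Splitting the circle into 4 equal quarters restores equally likely outcomes,
# so favorable / total works again.
quarters = ["red", "red", "blue", "green"]
counted_p_red = Fraction(quarters.count("red"), len(quarters))  # 2/4 = 1/2

print(naive_p_red, actual_p_red, counted_p_red)  # 1/3 1/2 1/2
```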

  • Experimental versus theoretical probability

    Theoretical probability uses mathematical expectations to identify the number of ways an event could happen compared to all the outcomes that could happen. Experimental probability is probability based on the results of a series of trials. Experiments can be conducted on all events that can be described with theoretical probability. As the number of trials increases (more tosses or spins, for example), experimental probability tends to get closer to theoretical probability.
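
Here is a minimal Python simulation, assuming a fair number cube. It estimates the probability of rolling a 6 from increasingly many simulated rolls and compares each estimate with the theoretical 1/6. The seeds are arbitrary and fixed only so that results can be repeated. Small runs can land far from 1/6, while large runs usually land close to it.

```python
import random

def experimental_probability(trials, seed=None):
    """Roll a simulated fair number cube `trials` times and return the fraction of 6s."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

theoretical = 1 / 6
for trials in (10, 100, 1_000, 100_000):
    experimental = experimental_probability(trials, seed=trials)  # arbitrary fixed seeds
    print(f"{trials:>7} rolls: experimental {experimental:.4f}, theoretical {theoretical:.4f}")
```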



Strategies

Help With Fundamentals

Listed here are some of the difficulties students might have with this topic, along with a few suggestions for addressing them.



Additional Instruction and Practice

If your students need additional instruction and practice, here are a few activities you might want to try.

Activity 1

Let your students play games using a variety of spinners, game pieces and number cubes, so that they can become familiar with the patterns that emerge over time in a succession of random events. (For example, you might hear your students say things like, "It hardly ever lands on 2 for some reason.") Provide opportunities for students to discuss and write about what they have discovered. Encourage students to think about such questions as: How is the probability of landing on a specific space of a fair spinner determined? When rolling a number cube, do the previous results affect the next number I will roll? Using spinners and/or number cubes, what is an example of a dependent event? An independent event? A compound event? Questions like these help students make the necessary connection between the game and the mathematics.

Activity 2

Have students create different graph types using the same data. For example, have your students create graphs of daily high temperatures as a box-and-whisker plot, histogram, and scatter plot. The goal is for students to become competent at creating and reading data from a variety of graphs. Students should be aware that some graph types may be more suited than others for displaying certain kinds of information.

Activity 3

Students at this level can take a closer look at sports statistics. For regular practice, spend 5-10 minutes discussing and analyzing statistics from a professional or college-level sports team. For example, have students calculate the mean, median, mode and range of the Cleveland Browns' team scores. They could also compare these measures of center for home and away games.



Advanced Work

The Standard Deviation

The standard deviation tells us whether the data is spread over a large interval of values or clustered in a small interval. Students who study this topic should understand how to compute the standard deviation for a small data set by hand and how to use a statistics package to compute it for a large data set.
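
The text does not say whether to divide by n (population standard deviation) or by n - 1 (sample standard deviation), so the Python sketch below shows both. It computes the population version by hand on a small hypothetical data set, then uses Python's standard `statistics` module as a stand-in for "a statistics package".

```python
import math
import statistics

def population_std_dev(values):
    """Square root of the mean squared distance from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# A small hypothetical data set that is easy to check by hand (mean 5, squared deviations sum to 32).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(data))  # 2.0
print(statistics.pstdev(data))   # library version of the same (population) formula
print(statistics.stdev(data))    # sample standard deviation: divides by n - 1 instead
```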

Problem 1

Based on the following data for two track teams, which team is more likely to have the first place runner in an upcoming 5 kilometer race?

Team                      Mean 5K Time    Standard Deviation
Hillside Hares            18 minutes      2 minutes
Gainesville Greyhounds    18 minutes      5 minutes


Method

The two track teams have the same mean time, so on average the teams perform similarly. However, the question asks whether the winner is more likely to come from Hillside or Gainesville, so we examine the standard deviations for an answer. Since the Hillside Hares have a standard deviation of 2 minutes, their times are clustered closely around the mean of 18 minutes. The Greyhounds have a larger standard deviation, which means that their runners' times are more spread out. This suggests that the Greyhounds may have a runner who runs far faster than 18 minutes (and one who runs far slower). We cannot predict the winner for certain, but we can reasonably predict that a member of the Greyhounds will win the race.


Answer


A member of the Gainesville Greyhounds is more likely to win the race.
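
The Method's reasoning can be made concrete under an extra assumption that the problem does not state: each team's 5K times are roughly normally distributed with the given mean and standard deviation. The Python below uses the standard library's `statistics.NormalDist` (Python 3.8 and later) to estimate the share of each team's times under a hypothetical 15-minute mark.

```python
from statistics import NormalDist

# Assumption (not stated in the problem): times are roughly normal with the given mean and SD.
hares = NormalDist(mu=18, sigma=2)       # Hillside Hares, minutes
greyhounds = NormalDist(mu=18, sigma=5)  # Gainesville Greyhounds, minutes

fast_time = 15  # hypothetical "very fast" 5K time, in minutes
print(f"Hares under {fast_time} min:      {hares.cdf(fast_time):.3f}")       # 0.067
print(f"Greyhounds under {fast_time} min: {greyhounds.cdf(fast_time):.3f}")  # 0.274
```

Under this model, about 7% of the Hares' times and about 27% of the Greyhounds' times fall under 15 minutes, which agrees with the answer. The same wide spread also gives the Greyhounds more very slow times.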

Extension

Have your students work in groups to design a survey and collect data from their peers. They should compute the standard deviation of their data. Then, they can compare their methodology and results with other groups.