Finding, collecting and organising data

  • Select a sample using a simple random sample and explain why randomness matters.
  • Identify and avoid bias; understand representativeness.
  • Describe sample surveys, observational studies, and designed experiments.
  • Plan how to collect and organise data to answer a statistics question.

Select a sample using a simple random sample and explain why randomness matters

A simple random sample gives every member of a population an equal chance of being chosen. Random selection removes personal choice, so the person choosing cannot favour particular members; this helps make the sample fair and unbiased.

Examples of random sampling: using random numbers, names drawn from a hat, or a random generator.

Sample Question:
A teacher wants to ask 10 students from a class of 30 about their study habits. Suggest a fair way to choose the 10 students and explain why this method is suitable.
Assign numbers 1–30 and use a random number generator to pick 10. Each student has the same chance of selection, so the sample is random and fair.
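
The selection described in the answer can be sketched in a few lines of Python. This is only an illustration: the function name `choose_students` and the optional `seed` parameter are assumptions made for the example, not part of the question.

```python
import random

def choose_students(class_size, sample_size, seed=None):
    """Return a simple random sample of student numbers from 1..class_size."""
    # The seed is optional; fix it only if you need to reproduce a draw.
    rng = random.Random(seed)
    # sample() picks without replacement, so no student is chosen twice.
    return sorted(rng.sample(range(1, class_size + 1), sample_size))

chosen = choose_students(30, 10)
print(chosen)  # ten different numbers between 1 and 30
```

Because the generator, not the teacher, makes the choice, every student has the same chance of being picked.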

Identify and avoid bias; understand representativeness

Bias occurs when some groups are over- or under-represented in a sample. A sample should be representative — reflecting the mix of the full population.

Bias can arise through poor wording, convenience sampling, or asking only one type of person.

Sample Question:
A survey on healthy eating is carried out by interviewing people as they leave a gym. Identify one problem with this sampling method.
The sample is biased — gym users are more likely to eat healthily, so the results won’t represent the general population.

Describe sample surveys, observational studies, and designed experiments

Sample surveys ask questions of a group chosen from the population to estimate population values.

Observational studies involve watching and recording without influencing outcomes.

Designed experiments involve deliberate changes to see their effect on another variable.

Sample Question:
For each situation below, decide the study type:
(a) Students record how many hours they sleep each night.
(b) Scientists test the effect of a new fertiliser on plant growth.
(c) A survey asks 200 people which streaming service they use.
(a) Observational study
(b) Designed experiment
(c) Sample survey

Plan how to collect and organise data to answer a statistics question

When collecting data, clearly define your question, population, sample size, and method of collection. Data should be recorded systematically and displayed in organised tables or charts.

Always consider: Who will you ask? How will you record results? What graph or table will make patterns easy to see?

Sample Question:
You want to find out how first-year students travel to school. Outline a plan showing how you would collect and present your data.
  • Define the population — all first-year students.
  • Take a random sample of about 30 students.
  • Use a short questionnaire (bus / walk / car / cycle).
  • Record results in a tally table and display in a bar chart or pie chart.
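
As a rough sketch of the recording step, the tally table could be built like this. The responses are made-up data for 12 students (fewer than the 30 in the plan, to keep the example short), and `tally_table` is a hypothetical helper name.

```python
from collections import Counter

# Made-up questionnaire answers from 12 students (illustration only)
responses = ["bus", "walk", "car", "bus", "cycle", "walk",
             "bus", "car", "walk", "bus", "cycle", "bus"]

def tally_table(answers):
    """Count how often each answer occurs, most common first."""
    return Counter(answers).most_common()

for method, count in tally_table(responses):
    # One stroke per response, followed by the total for that category
    print(f"{method:<6} {'|' * count:<8} {count}")
```

The counts in the last column are the frequencies you would plot in a bar chart or convert to angles for a pie chart.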

Representing data graphically and numerically

  • Represent data using graphs such as bar charts, histograms, and stem-and-leaf plots (including back-to-back plots).
  • Display relationships between two variables using a scatterplot.
  • Understand and interpret correlation — identify positive, negative, and no correlation.
  • Recognise that correlation does not imply causation.
  • Describe data numerically using mean, median, mode, and range.
  • Measure spread using interquartile range (IQR) and standard deviation.
  • Recognise outliers and describe skewness (symmetric, left, or right).

Histograms (continuous numerical data)

Use a histogram for measurements (e.g., heights, times). The horizontal axis shows class intervals; the vertical axis shows frequency (or frequency density if classes are unequal).

  • Equal-width classes: bar height ∝ frequency.
  • Unequal-width classes: use frequency density = frequency ÷ class width; bar area ∝ frequency.
  • Describe shape: symmetric / left-skewed / right-skewed; look for outliers.
  • Compare centres (median/mean) and spreads (IQR/SD) between groups where appropriate.
Sample Question: A histogram with class widths 5, 10, 10 shows the tallest bar in the narrowest class. Can we conclude that class has the largest frequency?
Not necessarily. With unequal class widths, bar height represents frequency density. You must compare areas to compare frequencies.
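
To see why, here is a small sketch with made-up frequencies for class widths 5, 10 and 10. The class boundaries and counts are assumptions chosen to match the situation in the question.

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency). Returns one density per class."""
    # frequency density = frequency / class width
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Made-up data: class widths are 5, 10 and 10
classes = [(0, 5, 8), (5, 15, 12), (15, 25, 10)]
densities = frequency_densities(classes)

for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, density {density}")
```

Here the narrowest class (0-5) has the tallest bar, because its density is 8 ÷ 5 = 1.6, yet the 5-15 class has the largest frequency (12).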

Pie charts (categorical proportions)

Use a pie chart to show how a whole is divided into categories (each slice a proportion of 360°).

  • Angle for a category = (category frequency ÷ total) × 360°.
  • Best for showing proportions, not exact counts; avoid too many tiny slices.
  • Always include a clear title and legend/labels (percentages or angles).
Sample Question: Out of 120 students, 36 cycle to school. What angle should the “Cycle” slice have?
\( \frac{36}{120}\times 360^\circ = 0.3\times 360^\circ = 108^\circ \).
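
The same calculation can be repeated for every category at once. In this sketch only the 36 cyclists out of 120 come from the question; the other counts and the name `pie_angles` are assumptions for illustration.

```python
def pie_angles(frequencies):
    """Map each category to its slice angle in degrees."""
    total = sum(frequencies.values())
    # angle = (category frequency / total) x 360
    return {category: count * 360 / total
            for category, count in frequencies.items()}

# "cycle": 36 of 120 is from the question; the other counts are made up
travel = {"bus": 48, "walk": 24, "car": 12, "cycle": 36}
angles = pie_angles(travel)

for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```

A quick check on any pie chart: the angles should add up to 360°.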

Stem-and-leaf plots (quick distribution sketch)

Useful for small to medium data sets: the raw values stay visible while the plot shows the shape of the distribution.

  • Choose a stem (leading digits) and leaf (last digit). Include a key (e.g., 6 | 3 = 63).
  • Use a back-to-back stem plot to compare two groups on the same stems.
  • Comment on centre, spread, shape, and outliers.
Sample Question: A back-to-back stem plot shows two classes with similar centres but Class B has leaves spread further out. What does this mean?
Class B has greater variability (wider spread). The typical value is similar, but results are more dispersed.
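
A single stem-and-leaf plot can be sketched as follows, assuming whole-number values where the tens digit is the stem and the units digit is the leaf. The scores are made-up data, and stems with no leaves are simply not shown in this simplified version.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens (stem), with the units digit as the leaf."""
    plot = defaultdict(list)
    for value in sorted(values):      # sorting keeps the leaves in order
        plot[value // 10].append(value % 10)
    return dict(plot)

scores = [63, 71, 58, 67, 75, 82, 69, 74, 60, 77]  # made-up data

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 6 | 3 = 63")
```

For a back-to-back plot you would build one set of leaves per group on the same stems and write one group's leaves to the left of the stem.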

Scatter graphs (bivariate relationships)

A scatter graph shows the relationship between two numerical variables. Look for direction, strength, and linearity.

  • Direction: positive / negative / none. Strength: how tightly points cluster.
  • Correlation coefficient \(r\) is between −1 and +1 (at Ordinary Level you match plots to descriptions such as “strong positive” or “weak negative”).
  • Correlation ≠ causation: a strong link does not prove one variable causes the other.
  • Identify outliers and consider their effect on the relationship.
Sample Question: A scatter graph of study time vs. test score shows a strong positive trend, with one point far below the line of most points. What should you report?
There is a strong positive association overall, but there is one outlier: a student whose mark is much lower than expected for the time studied. Mention both in your interpretation.
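
To illustrate the effect of an outlier, the sketch below computes the correlation coefficient \(r\) with and without one unusual point. The data are made up, the point (7 hours, 40 marks) is the assumed outlier, and `pearson_r` is a hand-written helper rather than a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: study hours and test scores; (7, 40) is the outlier
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [42, 50, 55, 63, 70, 74, 40, 88]

r_all = pearson_r(hours, scores)

# Recalculate with the outlier left out
kept = [(h, s) for h, s in zip(hours, scores) if (h, s) != (7, 40)]
r_without = pearson_r([h for h, _ in kept], [s for _, s in kept])

print(f"r with outlier:    {r_all:.2f}")
print(f"r without outlier: {r_without:.2f}")
```

One point far from the trend pulls \(r\) well below the value for the other points, which is why the outlier should be reported alongside the overall pattern rather than ignored.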

Supporting numerical summaries

  • Centre: mean, median, mode (choose what suits the context and data type).
  • Spread: range, interquartile range (IQR), standard deviation (SD).
  • Use median & IQR if the data are skewed or contain outliers; mean & SD for roughly symmetric data.
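
The sketch below computes these summaries for a small made-up data set with one outlier (95). Two assumptions to note: the quartiles are found as the medians of the lower and upper halves, which is one common convention (others give slightly different values), and `pstdev` is the population standard deviation.

```python
from statistics import mean, median, pstdev

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.
    Needs at least two values; the middle value is left out when n is odd."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

marks = [52, 55, 58, 60, 61, 63, 65, 95]  # made-up data; 95 is an outlier

q1, q3 = quartiles(marks)
print("mean  ", mean(marks))              # pulled upwards by the outlier
print("median", median(marks))
print("range ", max(marks) - min(marks))
print("IQR   ", q3 - q1)
print("SD    ", round(pstdev(marks), 2))
```

The outlier raises the mean above the median and inflates the range and SD, while the median and IQR barely notice it. That is the reason for preferring median and IQR with skewed data.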

Analysing, interpreting and drawing inferences from data

  • Interpret and summarise data in context, explaining what it means about the population studied.
  • Understand how sampling variability can affect conclusions and why larger samples are more reliable.
  • Use graphs, averages, and measures of spread to describe and compare data sets.
  • Draw inferences and make judgements, using everyday language to explain findings clearly.
  • Recognise when conclusions may not be valid due to bias or insufficient data.
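
Sampling variability can be explored with a short simulation. This is a sketch under stated assumptions: the population is a made-up list of the numbers 1 to 200, and `spread_of_sample_means` is a hypothetical helper that takes many random samples of one size and measures how much their means vary.

```python
import random
from statistics import mean, pstdev

def spread_of_sample_means(population, sample_size, repeats, seed=0):
    """Take `repeats` simple random samples and return the standard
    deviation of their means (a measure of sampling variability)."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    means = [mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return pstdev(means)

population = list(range(1, 201))  # made-up population of 200 values

small = spread_of_sample_means(population, sample_size=5, repeats=500)
large = spread_of_sample_means(population, sample_size=50, repeats=500)

print(f"spread of means, samples of 5:  {small:.1f}")
print(f"spread of means, samples of 50: {large:.1f}")
```

The means of the larger samples cluster more tightly around the population mean, so a conclusion based on one large sample is less likely to be far off than one based on a small sample.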