
MATH2801: Data Science & Statistical Modelling II

Please ensure you check the module availability box for each module outline, as not all modules will run in each academic year. Each module description relates to the year indicated in the module availability box, and this may change from year to year, due to, for example: changing staff expertise, disciplinary developments, the requirements of external bodies and partners, and student feedback.

Type Open
Level 2
Credits 20
Availability Available in 2025/2026
Module Cap
Location Durham
Department Mathematical Sciences

Prerequisites

  • One of: Calculus I (Maths Hons) (MATH1081) OR Calculus I (MATH1061)
  • AND
  • One of: Linear Algebra I (Maths Hons) (MATH1091) OR Linear Algebra I (MATH1071)
  • AND
  • Probability I (MATH1597)
  • AND
  • Statistics I (MATH1617)

Corequisites

  • Statistical Inference (MATH2761)

Excluded Combinations of Modules

  • None

Aims

  • To equip students with the skills to import, explore, manipulate, visualise and report real data sets using the statistical programming language R.
  • To introduce students to the concepts and mathematics behind sampling and sampling-based estimators.
  • To provide a working knowledge of the theory, computation and practice of the linear model.

Content

  • First half (Data Science):
  • Modern usage of R: fundamentals of vectors, lists, data frames, data types, data visualisation (base graphics).
  • Data wrangling: tidy data with tidyr, data manipulation with dplyr, pipelines.
  • Advanced graphics: ggplot2.
  • Reporting tools and interactive dashboards: R Markdown, Shiny.
  • Dates and strings.
  • Monte Carlo hypothesis testing.
  • Bootstrap resampling: parametric and non-parametric.
  • Monte Carlo integration: approximating expectations, accuracy of approximation, sources of randomness.
  • Generating random variables: inverse transform, rejection methods, importance sampling, discrete.
  • Second half (Statistical Modelling):
  • Review / introduction: multivariate normal distribution, Mahalanobis distance.
  • Linear models: Estimation, inference and prediction.
  • Factors, analysis of variance (ANOVA): full and partial F-tests, sequential ANOVA.
  • Model selection: Akaike information criterion (AIC), Mallows's statistic (Cp).
  • Diagnostics & Transformations: residuals, influence, Cook's distance, Box-Cox transformation.
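
As an illustration of the Monte Carlo integration topic listed above (approximating expectations and assessing the accuracy of the approximation), the following is a minimal sketch and not part of the module materials. The module itself works in R; Python is used here purely for illustration. The function name mc_expectation, the seed, the sample size and the example integrand are all assumptions chosen for this sketch.

```python
import math
import random


def mc_expectation(g, sampler, n, seed=0):
    """Estimate E[g(X)] by averaging g over n independent draws of X.

    Returns the estimate together with its estimated standard error.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = [g(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance of g(X); the standard error shrinks like 1/sqrt(n).
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Example (an assumption for illustration): E[X^2] for X ~ Uniform(0, 1),
# whose exact value is 1/3.
estimate, std_error = mc_expectation(
    lambda x: x * x,            # integrand g
    lambda rng: rng.random(),   # one draw from Uniform(0, 1)
    10_000,
)
print(f"estimate = {estimate:.4f}, standard error = {std_error:.4f}")
```

The standard error gives a rough measure of accuracy: quadrupling the number of draws would be expected to halve it.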

Learning Outcomes

Subject-specific Knowledge:

  • By the end of the module students will:
  • Have a solid foundation in the R programming language;
  • Be able to import and manipulate real world data sets using modern libraries in the R ecosystem;
  • Be able to perform an exploratory data analysis including a variety of visualisations;
  • Understand the mathematics (methodology and theory) of sampling-based estimators and simple Monte Carlo simulation;
  • Be able to use simulation approaches and apply the mathematics of sampling-based estimators to real world statistics problems;
  • Be able to formulate a given problem in terms of the linear model and use the acquired skills to solve it;
  • Have developed a set of skills to assess the suitability of a given linear model, and to compare it with competing models;
  • Have a systematic and coherent understanding of the theory and mathematics underlying the statistical methods studied;
  • Be able to relate the conceptual framework to practical implementations of the methods;
  • Have acquired a coherent body of knowledge on regression methodology, based on which extensions of the linear model such as generalized models or nonparametric regression can be learnt and understood.

Subject-specific Skills:

  • Students will have foundational skills in data science, specifically in data import, manipulation and exploration.
  • Students will have basic mathematical and statistical skills in the following areas: modelling, computation, simulation and sampling-based methodology.

Key Skills:

  • Students will have basic skills in the following: synthesis of data, critical and analytical thinking, computer skills.

Modes of Teaching, Learning and Assessment and how these contribute to the learning outcomes of the module

  • Lectures demonstrate what is required to be learned and the application of the theory to practical examples.
  • Problem classes show how to solve example problems in an ideal way, revealing also the thought processes behind such solutions.
  • Tutorials provide active problem-solving engagement and immediate feedback to the learning process.
  • Practicals consolidate the studied material, explore theoretical ideas in practice, enhance practical understanding, and develop practical data analysis skills.
  • Formative assessments provide feedback to guide students in the correct development of their knowledge and skills in preparation for the summative assessment.
  • Computer-based examinations assess the ability to use statistical software and basic programming to solve predictable and unpredictable problems.
  • The end-of-year examination assesses the knowledge acquired and the ability to solve predictable and unpredictable problems.

Teaching Methods and Learning Hours

Activity | Number | Frequency | Duration | Total | Monitored
Lectures | 42 | 4 per week in Epiphany; 2 in Easter | 1 Hour | 42 |
Tutorials | 6 | Weeks 12, 14, 16, 18, 20 (Epiphany), 22 (Easter) | 1 Hour | 6 | Yes
Problem Classes | 2 | Weeks 17, 19 | 1 Hour | 2 |
Computer Classes | 8 | Weeks 11-16, 18, 20 | 1 Hour | 8 | Yes
Preparation and Reading | | | | 142 |
Total | | | | 200 |

Summative Assessment

Component: Exam (Component Weighting: 70%)
Element | Length / Duration | Element Weighting | Resit Opportunity
On Campus Written Examination | 2 hours | 100% |

Component: Computer-based Exam (Component Weighting: 30%)
Element | Length / Duration | Element Weighting | Resit Opportunity
Practical | 2 hours | 100% |

Formative Assessment

Fortnightly assignments
