Introduction to Statistics

A.Y. 2024/2025
Bioinformatics and Computational Genomics
Politecnico di Milano

Course Information

Instructor

Dott. Ing. Alessandra Menafoglio
MOX, Department of Mathematics
Building 14 (Nave), VI Floor
Email: alessandra.menafoglio@polimi.it

Teaching Assistant

Dott. Ing. Giulia Patané
MOX, Department of Mathematics
Building 14 (Nave), V Floor
Email: giulia.patane@polimi.it

Course Details

  • Credits: 6 CFU
  • Course Goals:
      • Summarize and visualize data.
      • Understand tools and models for analyzing random phenomena.
      • Learn fundamental methods of statistical inference.
      • Apply statistical methods to real data using appropriate software.

Timetable

  • Monday: 10:30 - 13:15 (Room T.0.3)
  • Wednesday: 11:30 - 13:15 (Room 25.1.3)
  • Additional Tutoring: Tuesday 15:15 - 17:15 (Room T.0.3)

Software

Course Program

  • Descriptive Statistics: Data types, frequency distributions, summary statistics (mean, median, variance), visualizations (histograms, boxplots).
  • Probability and Random Variables: Properties of probability, discrete and continuous distributions (Bernoulli, Binomial, Poisson, Gaussian).
  • Estimation and Hypothesis Testing: Confidence intervals, hypothesis testing, t-tests.
  • Comparing Samples: Comparing means of independent and dependent samples.
  • Regression Models: Linear models, parameter estimation, model evaluation, and residual analysis.

Exam and Assessment

  • Written Exam: Two exercises to be solved in 2 hours, graded on a scale of 0 to 30; a score of at least 18 is required to pass.
  • Team Project: Analysis of real data in teams of 2-4 students, evaluated during a final seminar.

Exam details

The official exam details are reproduced below:

Grading weights

  • 70% written exam
  • 30% project

Written Exam

  • Format: Two exercises to be solved independently within a maximum of 2 hours.
  • Evaluation:
      • At the end of the exam, the student decides whether to submit it for evaluation.
      • Scores range from 0 to 30, with a maximum possible grade of 32/30.
      • The exam is passed with a score ≥ 18/30.
      • Evaluation considers clarity of exposition and correctness of computations.
  • Materials Allowed:
      • Calculator, statistical tables, and an A4-format formula sheet containing any material the student deems useful.
  • Prohibited Items: Books, notes, mobile phones, and other electronic devices.

Team Project

  • Objective: Analyze a real dataset in teams of 2 to 4 students, using the models and methods introduced in the course.
  • Presentation: Projects will be presented in a seminar during an open workshop held after the end of the semester.
  • Evaluation: Each team will receive a score from 0 to 30.
  • Final Evaluation: The final course evaluation will be a weighted average of the scores, with weights of 0.7 for the written exam and 0.3 for the team project.
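The final mark described above is a plain weighted average. Here is a minimal sketch in Python. The function name `final_grade` and the example scores are purely illustrative. The rules above do not say how the result is rounded, so no rounding is applied.

```python
def final_grade(written: float, project: float) -> float:
    """Weighted average per the course rules: 0.7 * written exam + 0.3 * project.

    Assumption: no rounding or capping is applied, since the rules do not specify it.
    """
    # The written exam can reach 32/30; the project is scored from 0 to 30
    if not 0 <= written <= 32:
        raise ValueError("written exam score must be in [0, 32]")
    if not 0 <= project <= 30:
        raise ValueError("project score must be in [0, 30]")
    return 0.7 * written + 0.3 * project


# Hypothetical scores: 24/30 on the written exam, 28/30 on the project
print(final_grade(24, 28))
```

In this example the weighted average is 0.7 · 24 + 0.3 · 28 = 16.8 + 8.4 = 25.2.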

Data Analysis Project Overview

Every student must participate in a data analysis project developed by an independently formed team of 2-4 members.

Dataset Selection

  • Groups will autonomously choose the dataset to analyze.
  • Approval of the dataset by a course teacher is required before starting the analysis.
  • Deadline for dataset approval: October 14th.

Group Composition

Each group must communicate its composition, leader, and project title to the Project Manager:

  • Francesco Brossa: francesco.brossa@mail.polimi.it

Suggested Timeline

  • Selection of Dataset: Before October 14th
  • First Work in Progress: November 13th
  • Explorative Analyses: Before November 15th
  • Inferential Analyses: Before December 30th
  • Project Review and Final Presentation: January
  • Final Presentation Date: January (exact date to be announced)

Data Requirements and Data Sources

Data Requirements

  • At least two numerical variables.
  • Observations structured into two or more groups (one or more categorical variables).
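Before asking for approval, a team might check a candidate dataset against these two requirements. The sketch below is hypothetical and makes three assumptions:

  • The data are held as a dict of columns.
  • A column counts as numerical if every value is a real number (booleans excluded).
  • A column counts as a grouping variable if every value is a string and there are at least two distinct levels.

```python
from numbers import Real


def meets_requirements(columns: dict[str, list]) -> bool:
    """Hypothetical check: at least two numerical columns and at least one
    categorical column that splits the observations into two or more groups."""
    numeric = [
        name for name, values in columns.items()
        if values and all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    ]
    grouping = [
        name for name, values in columns.items()
        if values and all(isinstance(v, str) for v in values) and len(set(values)) >= 2
    ]
    return len(numeric) >= 2 and len(grouping) >= 1


# Hypothetical dataset: two numerical variables and one grouping variable
data = {
    "age": [34, 51, 29, 45],
    "expression_level": [2.3, 4.1, 1.8, 3.6],
    "condition": ["control", "treated", "control", "treated"],
}
print(meets_requirements(data))
```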

Data Sources


Project Steps

  1. Identify Stakeholders: Company manager, competitors, students, etc.
  2. Identify Research Questions: What problem do you want to solve based on the data?
  3. Build Dataset: Select data, create labels, etc.
  4. Data Analysis: Conduct explorative analysis, null hypothesis testing, and regression.
  5. Report Findings: Present answers to research questions using natural language, tables, and plots.
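For step 4, a typical null hypothesis on grouped data is that two groups share the same mean. One possible sketch uses the standard library to compute Welch's t statistic and its approximate (Welch-Satterthwaite) degrees of freedom. This is not necessarily the procedure taught in class, which may use the pooled-variance t-test. The function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    for H0: the two population means are equal (variances not assumed equal)."""
    # Squared standard errors of each sample mean (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements for two groups defined by a categorical variable
control = [5.1, 4.9, 5.6, 5.8, 6.0]
treated = [4.2, 4.5, 4.8, 4.1, 4.4]
t, df = welch_t(control, treated)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting |t| can then be compared with the critical value from the t tables at the chosen significance level.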

Course Bibliography

  • Montgomery, D.C., Runger, G.C., Hubele, N.F. Engineering Statistics, Wiley, 5th Edition, 2010.
  • Ieva, F., Masci, C., Paganoni, A.M. Laboratorio di Statistica con R, Pearson, 2016 [in Italian].
  • Freedman, D., Pisani, R., Purves, R. Statistics, W.W. Norton & Company, 4th Edition, 2007.

Course Resources