Introduction to Statistics
A.Y. 2024/2025
Bioinformatics and Computational Genomics
Politecnico di Milano
Course Information
Instructor
Dott. Ing. Alessandra Menafoglio
MOX, Department of Mathematics
Building 14 (Nave), VI Floor
Email: alessandra.menafoglio@polimi.it
Teaching Assistant
Dott. Ing. Giulia Patané
MOX, Department of Mathematics
Building 14 (Nave), V Floor
Email: giulia.patane@polimi.it
Course Details
- Credits: 6 CFU
- Course Goals:
  - Summarize and visualize data.
  - Understand tools and models for analyzing random phenomena.
  - Learn fundamental methods of statistical inference.
  - Apply statistical methods to real data using appropriate software.
Timetable
- Monday: 10:30 - 13:15 (Room T.0.3)
- Wednesday: 11:30 - 13:15 (Room 25.1.3)
- Additional Tutoring: Tuesday 15:15 - 17:15 (Room T.0.3)
Software
Course Program
- Descriptive Statistics: Data types, frequency distributions, summary statistics (mean, median, variance), visualizations (histograms, boxplots).
- Probability and Random Variables: Properties of probability, discrete and continuous distributions (Bernoulli, Binomial, Poisson, Gaussian).
- Estimation and Hypothesis Testing: Confidence intervals, hypothesis testing, t-tests.
- Comparing Samples: Comparing means of independent and dependent samples.
- Regression Models: Linear models, parameter estimation, model evaluation, and residual analysis.
Exam and Assessment
- Written Exam: Two exercises to be solved in at most 2 hours. Graded on a scale of 0 to 30, with a passing score of 18.
- Team Project: Analysis of real data in teams of 2-4 students. The project will be evaluated during a final seminar.
Exam details
Below is a copy of the exam details:
Grading weights
- 70% written exam
- 30% project
Written Exam
- Format: Two exercises to be solved independently within a maximum of 2 hours.
- Evaluation:
  - At the end of the exam, each student decides whether to submit it for evaluation.
  - Scores range from 0 to 30, with a maximum evaluation of 32/30.
  - The exam is passed with a score ≥ 18/30.
  - Evaluation considers clarity of exposition and correctness of computations.
- Materials Allowed:
  - Calculator, statistical tables, and an A4 formula sheet containing any material the student deems useful.
- Prohibited Items: Books, notes, mobile phones, and other electronic devices.
Team Project
- Objective: Analyze a real dataset in teams of 2 to 4 students, using the models and methods introduced in the course.
- Presentation: Projects will be presented in a seminar held as an open workshop after the end of the semester.
- Evaluation: Each team will receive a score from 0 to 30.
- Final Evaluation: The final course evaluation will be a weighted average of the scores, with weights of 0.7 for the written exam and 0.3 for the team project.
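The weighting rule above amounts to final = 0.7 × exam + 0.3 × project. Below is a minimal Python sketch of that arithmetic. The function name `final_score` and its range checks are illustrative assumptions. No rounding is applied because the rounding rule is not stated here.

```python
# Weights stated in the course description: 70% written exam, 30% team project.
EXAM_WEIGHT = 0.7
PROJECT_WEIGHT = 0.3


def final_score(exam: float, project: float) -> float:
    """Weighted average of the written-exam and team-project scores.

    Hypothetical helper for illustration only.
    exam: written-exam score, 0 to 32 (32/30 is the maximum evaluation).
    project: team-project score, 0 to 30.
    No rounding is applied, since the rounding rule is not specified.
    """
    if not 0 <= exam <= 32:
        raise ValueError("exam score must be between 0 and 32")
    if not 0 <= project <= 30:
        raise ValueError("project score must be between 0 and 30")
    return EXAM_WEIGHT * exam + PROJECT_WEIGHT * project


# 0.7 * 24 + 0.3 * 28 = 25.2 (up to floating-point rounding)
print(final_score(24, 28))
```

With these weights, the written exam carries more than twice the influence of the project on the final grade.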
Data Analysis Project Overview
Every student must participate in a data analysis project developed by an independently formed team of 2-4 members.
Dataset Selection
- Groups will autonomously choose the dataset to analyze.
- Approval of the dataset by a course teacher is required before starting the analysis.
- Deadline for dataset approval: October 14th.
Group Composition
Each group must communicate its composition, leader, and project title to the Project Manager:
- Francesco Brossa: francesco.brossa@mail.polimi.it
Suggested Timeline
- Selection of Dataset: Before October 14th
- First Work in Progress: November 13th
- Explorative Analyses: Before November 15th
- Inferential Analyses: Before December 30th
- Project Review and Final Presentation: January (exact date to be confirmed)
Data Requirements and Data Sources
Data Requirements
- At least two numerical variables.
- Observations divided into two or more groups (defined by one or more categorical variables).
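The sketch below shows one quick way to check a candidate dataset against these two requirements. For illustration only, it assumes the data are stored as a list of records (Python dictionaries) and that a variable's type can be read from its values. The function `meets_requirements` is hypothetical. A real dataset may still need manual inspection, for example when numeric codes actually denote categories.

```python
from numbers import Real


def meets_requirements(rows: list[dict]) -> bool:
    """Check the two project data requirements on a list of records.

    Illustrative assumptions, not part of the course rules:
    - a column is numerical if every value is a real number (bool excluded);
    - a column is categorical if every value is a string, and it defines
      groups only if it takes at least two distinct values.
    """
    if not rows:
        return False
    numeric_cols, grouping_cols = [], []
    for col in rows[0]:
        # Missing keys become None, which is neither numerical nor categorical.
        values = [row.get(col) for row in rows]
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            numeric_cols.append(col)
        elif all(isinstance(v, str) for v in values) and len(set(values)) >= 2:
            grouping_cols.append(col)
    # At least two numerical variables and at least one grouping variable.
    return len(numeric_cols) >= 2 and len(grouping_cols) >= 1


# Toy records with placeholder names and values, for illustration only.
toy = [
    {"group": "A", "x": 1.0, "y": 2.5},
    {"group": "B", "x": 3.0, "y": 4.0},
]
print(meets_requirements(toy))  # True
```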
Data Sources
- Useful websites:
  - Comune di Milano
  - Regione Lombardia
  - ISTAT
  - EUROSTAT
  - NASA
- Own data or data from others (check for confidentiality).
Project Steps
- Identify Stakeholders: Company manager, competitors, students, etc.
- Identify Research Questions: What problem do you want to solve based on the data?
- Build Dataset: Select data, create labels, etc.
- Data Analysis: Conduct explorative analysis, null hypothesis testing, regression.
- Report Findings: Present answers to research questions using natural language, tables, and plots.
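A natural first step in the explorative analysis is to compute the summary statistics from the Descriptive Statistics part of the program separately for each group. The sketch below does this in plain Python with the standard `statistics` module. The function name `summarize_by_group` and the record layout (a list of dictionaries) are assumptions, and this is only one way to organize the step.

```python
from collections import defaultdict
from statistics import mean, median, variance


def summarize_by_group(rows, group_col, value_col):
    """Sample size, mean, median and sample variance of one numerical
    variable within each level of a categorical variable (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    summary = {}
    for level, values in groups.items():
        summary[level] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            # statistics.variance uses the n - 1 denominator and needs n >= 2.
            "variance": variance(values) if len(values) >= 2 else None,
        }
    return summary


# Toy records with placeholder names and values, for illustration only.
toy = [
    {"group": "A", "x": 1}, {"group": "A", "x": 3},
    {"group": "B", "x": 2}, {"group": "B", "x": 4}, {"group": "B", "x": 9},
]
for level, stats in summarize_by_group(toy, "group", "x").items():
    print(level, stats)
```

Comparing these per-group summaries is a typical lead-in to the inferential step, such as comparing group means.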
Course Bibliography
- Montgomery, D.C., Runger, G.C., Hubele, N.F. Engineering Statistics, 5th Edition, Wiley, 2010.
- Ieva, F., Masci, C., Paganoni, A.M. Laboratorio di Statistica con R, Pearson, 2016 [in Italian].
- Freedman, D., Pisani, R., Purves, R. Statistics, 4th Edition, W.W. Norton & Company, 2007.