Last update 20 / Feb / 2026
Project Purpose
Doing data science requires knowledge of several areas (Statistics, Mathematics, Programming, Visualization, Machine Learning, etc.), but with practice the concepts and terminology of the field soon become familiar. Besides reviewing the literature, the best way to gain experience in Data Science is to carry out practical projects that apply data analysis and Machine Learning techniques to datasets, following a methodology that covers these steps:
- Data extraction
- Data cleaning
- Exploratory analysis
- Feature Engineering
- Baseline model creation
- Model Selection
- Cross Validation
- Model evaluation
The purpose of this project is to provide a practical guide, with a worked methodology, for developing a data science project with Python, from data extraction, analysis, and preparation to the evaluation and selection of Machine Learning models. Throughout the project, we will use libraries such as pandas, matplotlib, plotly, seaborn, and scikit-learn for data cleaning, exploratory analysis, Feature Engineering, and predictive modeling.
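To make the methodology concrete, the following is a minimal sketch of how cleaning, encoding, model fitting, and evaluation can be chained with scikit-learn. It is not the project's actual code: the data frame is synthetic, and the column names (`age`, `fare`, `sex`, `survived`), the median imputation, and the choice of logistic regression are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (NOT real passengers); column names are illustrative.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.uniform(1, 70, n),
    "fare": rng.uniform(5, 100, n),
    "sex": rng.choice(["male", "female"], n),
})
# Simulate missing values so the cleaning step has something to do.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan
# Artificial target tied to the features, only so the model has a signal to learn.
df["survived"] = ((df["sex"] == "female") | (df["fare"] > 80)).astype(int)

X, y = df.drop(columns="survived"), df["survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Cleaning and encoding: impute numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

# Preprocessing and classifier are fitted together on the training split only.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluation on rows the model has not seen.
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Wrapping the preprocessing in a `Pipeline` means the imputation statistics are learned from the training split alone, which avoids leaking information from the held-out rows.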
The dataset
The sinking of the RMS Titanic in 1912 remains one of the greatest maritime disasters in history, causing a significant loss of life. More than 1500 passengers and crew members perished that fateful night. Understanding the factors that contributed to survival can provide valuable information about safety protocols and social dynamics during crises.
The dataset used in this project is the well-known Titanic dataset, which can be downloaded from the Kaggle competition platform. It contains information about 891 passengers aboard the Titanic, including name, age, sex, travel class, the number of siblings and spouses aboard, the number of parents and children aboard, the ticket fare, the port of embarkation, and whether the passenger survived. The goal is to train a supervised classification model that predicts whether each individual aboard the Titanic survived.
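As a first look at data of this shape, the sketch below loads a few rows with pandas and checks the dimensions, the missing values, and the survival rate. The rows are invented placeholders, not real passengers; the header follows the column names of Kaggle's `train.csv`, and the file path mentioned in the comment is an assumption about where the download is stored.

```python
import io

import pandas as pd

# Placeholder rows invented for illustration; only the header mirrors train.csv.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,30,1,0,T-001,8.0,,S
2,1,1,Passenger B,female,41,1,0,T-002,70.0,C10,C
3,1,3,Passenger C,female,,0,0,T-003,8.5,,S
4,0,2,Passenger D,male,55,0,2,T-004,21.0,,Q
"""

# With the downloaded file this would be pd.read_csv("train.csv") (assumed path).
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                       # (rows, columns)
missing_per_column = df.isna().sum()  # empty fields are read as NaN
print(missing_per_column[missing_per_column > 0])

# The target column is 0/1, so its mean is the share of survivors.
survival_rate = df["Survived"].mean()
print(f"survival rate: {survival_rate:.2f}")
```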
Contents
A practical guide to developing a data science project with Python (pandas and scikit-learn), from data extraction, analysis, and preparation to the evaluation and selection of Machine Learning models.
1. Data Download
Data download and variable selection.
2. Data Exploration
Data exploration.
3. Data Analysis (EDA)
Exploratory Data Analysis.
4. Feature Engineering
This chapter covers feature engineering: selecting and transforming variables so that they can be used to build a predictive model.
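As an illustration of the kind of transformation meant here, the sketch below derives two new variables from the sibling/spouse and parent/child counts and encodes the categorical columns as numbers. The derived features `FamilySize` and `IsAlone` are hypothetical examples, not necessarily the ones built in the chapter, and the rows are placeholders.

```python
import pandas as pd

# Placeholder rows using Titanic-style column names.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
    "Embarked": ["S", "C", "S", "Q"],
})

features = df.copy()

# Combine two related counts into one variable (the passenger counts as 1).
features["FamilySize"] = features["SibSp"] + features["Parch"] + 1

# Binary flag for passengers travelling without family.
features["IsAlone"] = (features["FamilySize"] == 1).astype(int)

# Encode a two-valued category as 0/1.
features["Sex"] = features["Sex"].map({"male": 0, "female": 1})

# One-hot encode a category with more than two values.
features = pd.get_dummies(features, columns=["Embarked"], prefix="Embarked", dtype=int)

print(features)
```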
5. Baseline Model
Baseline models that serve as a reference point for comparison with more complex models later on.
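One common form of baseline, shown here as a sketch rather than as the chapter's actual choice, is a classifier that always predicts the majority class. scikit-learn provides this as `DummyClassifier`; the labels below are made up so that 6 of 10 belong to class 0.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: six non-survivors (0) and four survivors (1).
y = np.array([0] * 6 + [1] * 4)
# The dummy model ignores the features, so a column of zeros is enough.
X = np.zeros((len(y), 1))

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

baseline_accuracy = baseline.score(X, y)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

Any later model should beat this number to be worth its added complexity; with imbalanced classes, the majority-class accuracy can be deceptively high.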
6. Model Selection
Machine learning model selection.
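A minimal sketch of how candidate models can be compared with cross-validation follows. The two candidates, the 5-fold stratified split, and the accuracy metric are assumptions chosen for illustration, and the data comes from scikit-learn's `make_classification` generator rather than from the Titanic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification problem standing in for the prepared features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hypothetical shortlist of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Use the same folds for every candidate so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Pick the candidate with the highest mean cross-validated accuracy.
best_name = max(mean_scores, key=mean_scores.get)
print("selected:", best_name)
```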
7. Model Interpretation
Machine learning model interpretation.
8. Streamlit Demo
Demo of the machine learning model with Streamlit.
