🗂️ Directory Hierarchy
🗃️ Project structure
Folder structure for data science projects (why?)
[Data structure]
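Creating the skeleton by script instead of by hand makes the layout repeatable across projects. The following is a minimal sketch that assumes only the Python standard library. The folder names are copied from the tree. The `scaffold` function, the `TOP_LEVEL` / `DATA_LAYERS` / `NOTEBOOK_STAGES` constants, and the `.gitkeep` placeholders are hypothetical additions for illustration and are not part of the template. The sketch does not create configuration files such as `mypy.ini` or `ruff.toml`.

```python
"""Hypothetical helper: scaffold the folder skeleton of the project tree."""
from __future__ import annotations

from pathlib import Path

# Folder names copied from the project tree; extend them as the project grows.
DATA_LAYERS = (
    "01_raw",
    "02_intermediate",
    "03_primary",
    "04_feature",
    "05_model_input",
    "06_models",
    "07_model_output",
    "08_reporting",
)
NOTEBOOK_STAGES = (
    "1-data",
    "2-exploration",
    "3-analysis",
    "4-feat_eng",
    "5-models",
    "6-evaluation",
    "7-deploy",
    "8-reports",
)
TOP_LEVEL = ("conf", "docs", "models", "src/libs", "src/pipelines")


def scaffold(root: Path) -> list[Path]:
    """Create the folder skeleton under ``root`` and return every target path.

    Existing folders and files are left in place (``exist_ok=True`` and
    ``touch()`` do not fail on existing paths), so re-running is harmless.
    """
    targets = [root / name for name in TOP_LEVEL]
    targets += [root / "data" / layer for layer in DATA_LAYERS]
    targets += [root / "notebooks" / stage for stage in NOTEBOOK_STAGES]
    for path in targets:
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders; an empty .gitkeep file (a common
        # convention, assumed here) keeps them in the repository.
        (path / ".gitkeep").touch()
    return targets
```

For example, `scaffold(Path("my-project"))` creates 21 folders under `my-project/`. Each one gets a `.gitkeep` file, and a second call leaves everything as it was.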
```bash
.
├── .code_quality
│ ├── mypy.ini # mypy configuration
│ └── ruff.toml # ruff configuration
├── .github # GitHub configuration
│ ├── actions
│ │ └── python-poetry-env
│ │   └── action.yml # GitHub Action to set up the Python environment
│ ├── dependabot.yml # Dependabot configuration to update dependencies
│ ├── pull_request_template.md # template for pull requests
│ └── workflows # GitHub Actions workflows
│   ├── ci.yml # run continuous integration (tests, pre-commit, etc.)
│   ├── dependency_review.yml # review dependencies
│   ├── docs.yml # build documentation (mkdocs)
│   └── pre-commit_autoupdate.yml # update pre-commit hooks
├── .vscode # VS Code configuration
│ ├── extensions.json # list of recommended extensions
│ ├── launch.json # VS Code launch configuration
│ └── settings.json # VS Code settings
├── conf # configuration files
│ └── config.yaml # main configuration file
├── data
│ ├── 01_raw # raw immutable data
│ ├── 02_intermediate # typed data
│ ├── 03_primary # domain model data
│ ├── 04_feature # model features
│ ├── 05_model_input # often called 'master tables'
│ ├── 06_models # serialized models
│ ├── 07_model_output # data generated by model runs
│ ├── 08_reporting # reports, results, etc.
│ └── README.md # description of the data structure
├── docs # documentation for your project
│ └── index.md # documentation homepage
├── models # store final models
├── notebooks
│ ├── 1-data # data extraction and cleaning
│ ├── 2-exploration # exploratory data analysis (EDA)
│ ├── 3-analysis # statistical analysis, hypothesis testing
│ ├── 4-feat_eng # feature engineering (creation, selection, and transformation)
│ ├── 5-models # model training, experimentation, and hyperparameter tuning
│ ├── 6-evaluation # evaluation metrics, performance assessment
│ ├── 7-deploy # model packaging, deployment strategies
│ ├── 8-reports # storytelling, summaries, and analysis conclusions
│ ├── notebook_template.ipynb # template for notebooks
│ └── README.md # information about the notebooks
├── src # source code for use in this project
│ ├── libs # custom Python modules
│ │ ├── data_etl # data extraction, transformation, and loading
│ │ ├── data_validation # data validation
│ │ ├── feat_cleaning # feature engineering data cleaning
│ │ ├── feat_encoding # feature engineering encoding
│ │ ├── feat_imputation # feature engineering imputation
│ │ ├── feat_new_features # feature engineering new features
│ │ ├── feat_pipelines # feature engineering pipelines
│ │ ├── feat_preprocess_strings # feature engineering string preprocessing
│ │ ├── feat_scaling # feature engineering data scaling
│ │ ├── feat_selection # feature engineering feature selection
│ │ ├── feat_strings # feature engineering strings
│ │ ├── metrics # evaluation metrics
│ │ ├── model # model training and prediction
│ │ ├── model_evaluation # model evaluation
│ │ ├── model_selection # model selection
│ │ ├── model_validation # model validation
│ │ └── reports # reports
│ ├── pipelines
│ │ ├── data_etl # data extraction, transformation, and loading
│ │ ├── feature_engineering # prepare data for modeling