🗂️ Directory Hierarchy
Folder structure for data science projects (why?)
.
├── .code_quality
│ ├── mypy.ini # mypy configuration
│ └── ruff.toml # ruff configuration
├── .github # github configuration
│ ├── actions
│ │ └── python-poetry-env
│ │   └── action.yml # github action to set up python environment
│ ├── dependabot.md # github action to update dependencies
│ ├── pull_request_template.md # template for pull requests
│ └── workflows # github actions workflows
│   ├── ci.yml # run continuous integration (tests, pre-commit, etc.)
│   ├── dependency_review.yml # review dependencies
│   ├── docs.yml # build documentation (mkdocs)
│   └── pre-commit_autoupdate.yml # update pre-commit hooks
├── .vscode # vscode configuration
│ ├── extensions.json # list of recommended extensions
│ ├── launch.json # vscode launch configuration
│ └── settings.json # vscode settings
├── conf # folder for configuration files
│ └── config.yaml # main configuration file
├── data
│ ├── 01_raw # raw immutable data
│ ├── 02_intermediate # typed data
│ ├── 03_primary # domain model data
│ ├── 04_feature # model features
│ ├── 05_model_input # often called 'master tables'
│ ├── 06_models # serialized models
│ ├── 07_model_output # data generated by model runs
│ ├── 08_reporting # reports, results, etc.
│ └── README.md # description of the data structure
├── docs # documentation for your project
│ └── index.md # documentation homepage
├── models # store final models
├── notebooks
│ ├── 1-data # data extraction and cleaning
│ ├── 2-exploration # exploratory data analysis (EDA)
│ ├── 3-analysis # statistical analysis, hypothesis testing
│ ├── 4-feat_eng # feature engineering (creation, selection, and transformation)
│ ├── 5-models # model training, evaluation, and hyperparameter tuning
│ ├── 6-interpretation # model interpretation
│ ├── 7-deploy # model packaging, deployment strategies
│ ├── 8-reports # storytelling, summaries, and analysis conclusions
│ ├── notebook_template.ipynb # template for notebooks
│ └── README.md # information about the notebooks
├── src # source code for use in this project
│ ├── README.md # description of src structure
│ ├── tmp_mock.py # example python file
│ ├── data # data extraction, validation, processing, transformation
│ ├── model # model training, evaluation, validation, export
│ ├── inference # model prediction, serving, monitoring
│ └── pipelines # orchestration of pipelines
│   ├── feature_pipeline # transforms raw data into features and labels
│   ├── training_pipeline # transforms features and labels into a model
│   └── inference_pipeline # takes features and a trained model for predictions
├── tests # test code for your project
│ ├── test_mock.py # example test file
│ ├── data # tests for data module
│ ├── model # tests for model module
│ ├── inference # tests for inference module
│ └── pipelines # tests for pipelines module
├── .editorconfig # editor configuration
├── .gitignore # files to ignore in git
├── .pre-commit-config.yaml # configuration for pre-commit hooks
├── codecov.yml # configuration for codecov
├── Makefile # useful commands to set up the environment, run tests, etc.
├── mkdocs.yml # configuration for mkdocs documentation
├── pyproject.toml # project dependencies and configuration file
├── uv.lock # locked dependencies
└── README.md # description of your project
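
The three folders under `src/pipelines` describe a data flow: raw data becomes features and labels, those become a model, and the model plus features yield predictions. The sketch below shows one way that hand-off could look. It is only an illustration: the function names mirror the folder names, while `MeanModel`, the `x`/`y` record keys, and the sample data are hypothetical and not part of this template.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MeanModel:
    """Hypothetical toy model: always predicts the mean training label."""

    value: float

    def predict(self, features: list[list[float]]) -> list[float]:
        return [self.value for _ in features]


def feature_pipeline(raw_rows: list[dict]) -> tuple[list[list[float]], list[float]]:
    """Transform raw records into features and labels."""
    features = [[float(row["x"])] for row in raw_rows]
    labels = [float(row["y"]) for row in raw_rows]
    return features, labels


def training_pipeline(features: list[list[float]], labels: list[float]) -> MeanModel:
    """Transform features and labels into a model (features are unused by this toy model)."""
    return MeanModel(value=mean(labels))


def inference_pipeline(features: list[list[float]], model: MeanModel) -> list[float]:
    """Take features and a trained model and return predictions."""
    return model.predict(features)


# Assumed sample records standing in for files under data/01_raw.
raw = [{"x": "1", "y": "2.0"}, {"x": "3", "y": "4.0"}]
features, labels = feature_pipeline(raw)
model = training_pipeline(features, labels)
predictions = inference_pipeline(features, model)
print(predictions)
```

Keeping each stage as a function with explicit inputs and outputs makes it straightforward to test the stages separately under `tests/pipelines`.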