
🗂️ Directory Hierarchy

Folder structure for data science projects (why?)

.
├── .code_quality
│   ├── mypy.ini                        # mypy configuration
│   └── ruff.toml                       # ruff configuration
├── .github                             # github configuration
│   ├── actions
│   │   └── python-poetry-env
│   │       └── action.yml              # github action to set up python environment
│   ├── dependabot.md                   # github action to update dependencies
│   ├── pull_request_template.md        # template for pull requests
│   └── workflows                       # github actions workflows
│       ├── ci.yml                      # run continuous integration (tests, pre-commit, etc.)
│       ├── dependency_review.yml       # review dependencies
│       ├── docs.yml                    # build documentation (mkdocs)
│       └── pre-commit_autoupdate.yml   # update pre-commit hooks
├── .vscode                             # vscode configuration
│   ├── extensions.json                 # list of recommended extensions
│   ├── launch.json                     # vscode launch configuration
│   └── settings.json                   # vscode settings
├── conf                                # configuration files folder
│   └── config.yaml                     # main configuration file
├── data
│   ├── 01_raw                          # raw immutable data
│   ├── 02_intermediate                 # typed data
│   ├── 03_primary                      # domain model data
│   ├── 04_feature                      # model features
│   ├── 05_model_input                  # often called 'master tables'
│   ├── 06_models                       # serialized models
│   ├── 07_model_output                 # data generated by model runs
│   ├── 08_reporting                    # reports, results, etc.
│   └── README.md                       # description of the data structure
├── docs                                # documentation for your project
│   ├── index.md                        # documentation homepage
├── models                              # store final models
├── notebooks
│   ├── 1-data                          # data extraction and cleaning
│   ├── 2-exploration                   # exploratory data analysis (EDA)
│   ├── 3-analysis                      # statistical analysis, hypothesis testing
│   ├── 4-feat_eng                      # feature engineering (creation, selection, and transformation)
│   ├── 5-models                        # model training, evaluation, and hyperparameter tuning
│   ├── 6-interpretation                # model interpretation
│   ├── 7-deploy                        # model packaging, deployment strategies
│   ├── 8-reports                       # storytelling, summaries, and analysis conclusions
│   ├── notebook_template.ipynb         # template for notebooks
│   └── README.md                       # information about the notebooks
├── src                                 # source code for use in this project
│   ├── README.md                       # description of src structure
│   ├── tmp_mock.py                     # example python file
│   ├── data                            # data extraction, validation, processing, transformation
│   ├── model                           # model training, evaluation, validation, export
│   ├── inference                       # model prediction, serving, monitoring
│   └── pipelines                       # orchestration of pipelines
│       ├── feature_pipeline            # transforms raw data into features and labels
│       ├── training_pipeline           # transforms features and labels into a model
│       └── inference_pipeline          # takes features and a trained model for predictions
├── tests                               # test code for your project
│   ├── test_mock.py                    # example test file
│   ├── data                            # tests for data module
│   ├── model                           # tests for model module
│   ├── inference                       # tests for inference module
│   └── pipelines                       # tests for pipelines module
├── .editorconfig                       # editor configuration
├── .gitignore                          # files to ignore in git
├── .pre-commit-config.yaml             # configuration for pre-commit hooks
├── codecov.yml                         # configuration for codecov
├── Makefile                            # useful commands to set up environment, run tests, etc.
├── mkdocs.yml                          # configuration for mkdocs documentation
├── pyproject.toml                      # project dependencies and configuration file
├── uv.lock                             # locked dependencies
└── README.md                           # description of your project
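
The three folders under src/pipelines describe a hand-off: raw data becomes features and labels, features and labels become a model, and features plus a trained model become predictions. The sketch below illustrates that contract in plain Python. Everything in it is hypothetical: the function names only mirror the folder names, and the toy threshold model, the `value`/`label` record keys, and the sample data are assumptions made for illustration, not code from the template.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MeanThresholdModel:
    """Hypothetical toy model: predicts 1 when a feature exceeds the threshold."""

    threshold: float

    def predict(self, features: list[float]) -> list[int]:
        return [int(x > self.threshold) for x in features]


def feature_pipeline(raw_rows: list[dict[str, str]]) -> tuple[list[float], list[int]]:
    """Transform raw records into features and labels."""
    features = [float(row["value"]) for row in raw_rows]
    labels = [int(row["label"]) for row in raw_rows]
    return features, labels


def training_pipeline(features: list[float], labels: list[int]) -> MeanThresholdModel:
    """Transform features and labels into a model.

    The threshold is the midpoint between the two class means.
    """
    positives = [x for x, y in zip(features, labels) if y == 1]
    negatives = [x for x, y in zip(features, labels) if y == 0]
    return MeanThresholdModel(threshold=(mean(positives) + mean(negatives)) / 2)


def inference_pipeline(features: list[float], model: MeanThresholdModel) -> list[int]:
    """Take features and a trained model and return predictions."""
    return model.predict(features)


# Assumed sample records, standing in for files under data/01_raw.
raw = [
    {"value": "1.0", "label": "0"},
    {"value": "2.0", "label": "0"},
    {"value": "8.0", "label": "1"},
    {"value": "9.0", "label": "1"},
]

features, labels = feature_pipeline(raw)
model = training_pipeline(features, labels)
predictions = inference_pipeline([0.5, 10.0], model)
print(model.threshold, predictions)
```

Keeping each stage a function with explicit inputs and outputs is one way to let tests/pipelines exercise the stages independently; a real project would likely read from and write to the numbered data folders between stages.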

✨ Features and Tools