Directory Hierarchy
Folder structure for data science projects:
.
├── .code_quality
│   ├── mypy.ini # mypy configuration
│   └── ruff.toml # ruff configuration
├── .github # GitHub configuration
│   ├── actions
│   │   └── python-poetry-env
│   │       └── action.yml # GitHub action to set up the Python environment
│   ├── dependabot.md # GitHub action to update dependencies
│   ├── pull_request_template.md # template for pull requests
│   └── workflows # GitHub Actions workflows
│       ├── ci.yml # run continuous integration (tests, pre-commit, etc.)
│       ├── dependency_review.yml # review dependencies
│       ├── docs.yml # build documentation (mkdocs)
│       └── pre-commit_autoupdate.yml # update pre-commit hooks
├── .vscode # VS Code configuration
│   ├── extensions.json # list of recommended extensions
│   ├── launch.json # VS Code launch configuration
│   └── settings.json # VS Code settings
├── conf # configuration files
│   └── config.yaml # main configuration file
├── data
│   ├── 01_raw # raw immutable data
│   ├── 02_intermediate # typed data
│   ├── 03_primary # domain model data
│   ├── 04_feature # model features
│   ├── 05_model_input # often called 'master tables'
│   ├── 06_models # serialized models
│   ├── 07_model_output # data generated by model runs
│   ├── 08_reporting # reports, results, etc.
│   └── README.md # description of the data structure
├── docs # documentation for your project
│   └── index.md # documentation homepage
├── models # store final models
├── notebooks
│   ├── 1-data # data extraction and cleaning
│   ├── 2-exploration # exploratory data analysis (EDA)
│   ├── 3-analysis # statistical analysis, hypothesis testing
│   ├── 4-feat_eng # feature engineering (creation, selection, and transformation)
│   ├── 5-models # model training, evaluation, and hyperparameter tuning
│   ├── 6-interpretation # model interpretation
│   ├── 7-deploy # model packaging, deployment strategies
│   ├── 8-reports # storytelling, summaries, and analysis conclusions
│   ├── notebook_template.ipynb # template for notebooks
│   └── README.md # information about the notebooks
├── src # source code for use in this project
│   ├── README.md # description of src structure
│   ├── tmp_mock.py # example Python file
│   ├── data # data extraction, validation, processing, transformation
│   ├── model # model training, evaluation, validation, export
│   ├── inference # model prediction, serving, monitoring
│   └── pipelines # orchestration of pipelines
│       ├── feature_pipeline # transforms raw data into features and labels
│       ├── training_pipeline # transforms features and labels into a model
│       └── inference_pipeline # takes features and a trained model for predictions
├── tests # test code for your project
│   ├── test_mock.py # example test file
│   ├── data # tests for data module
│   ├── model # tests for model module
│   ├── inference # tests for inference module
│   └── pipelines # tests for pipelines module
├── .editorconfig # editor configuration
├── .gitignore # files to ignore in git
├── .pre-commit-config.yaml # configuration for pre-commit hooks
├── codecov.yml # configuration for codecov
├── Makefile # useful commands to set up the environment, run tests, etc.
├── mkdocs.yml # configuration for mkdocs documentation
├── pyproject.toml # project dependencies and configuration
├── uv.lock # locked dependencies
└── README.md # description of your project
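
The numbered `data/` layers are easier to keep consistent when code refers to them through one small helper rather than hard-coded strings. The following is a minimal sketch, not part of the template itself: the names `DATA_LAYERS`, `create_data_layers`, and `layer_path` are hypothetical, and the `.gitkeep` placeholder is an assumed convention for keeping empty folders under version control.

```python
from __future__ import annotations

from pathlib import Path

# Data layer folder names, copied from the tree above.
DATA_LAYERS = (
    "01_raw",
    "02_intermediate",
    "03_primary",
    "04_feature",
    "05_model_input",
    "06_models",
    "07_model_output",
    "08_reporting",
)


def create_data_layers(project_root: Path) -> list[Path]:
    """Create the data/ layer folders under project_root and return their paths."""
    created = []
    for layer in DATA_LAYERS:
        layer_dir = project_root / "data" / layer
        layer_dir.mkdir(parents=True, exist_ok=True)
        # Assumed convention: an empty .gitkeep so git tracks the empty folder.
        (layer_dir / ".gitkeep").touch()
        created.append(layer_dir)
    return created


def layer_path(project_root: Path, layer: str) -> Path:
    """Return the path of a data layer, rejecting names outside the layout."""
    if layer not in DATA_LAYERS:
        raise ValueError(f"unknown data layer: {layer!r}")
    return project_root / "data" / layer
```

A notebook or module could then call `layer_path(Path("."), "01_raw")` to locate raw inputs, and a typo in the layer name fails early with a `ValueError` instead of silently creating a new folder.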