🗂️ Directory Hierarchy

cruft update

🗃️ Project structure

Folder structure for data science projects (why?)

[Data structure]

```bash
.
├── .code_quality
│   ├── mypy.ini                        # mypy configuration
│   └── ruff.toml                       # ruff configuration
├── .github                             # github configuration
│   ├── actions
│   │   └── python-poetry-env
│   │       └── action.yml              # github action to setup python environment
│   ├── dependabot.md                   # github action to update dependencies
│   ├── pull_request_template.md        # template for pull requests
│   └── workflows                       # github actions workflows
│       ├── ci.yml                      # run continuous integration (tests, pre-commit, etc.)
│       ├── dependency_review.yml       # review dependencies
│       ├── docs.yml                    # build documentation (mkdocs)
│       └── pre-commit_autoupdate.yml   # update pre-commit hooks
├── .vscode                             # vscode configuration
│   ├── extensions.json                 # list of recommended extensions
│   ├── launch.json                     # vscode launch configuration
│   └── settings.json                   # vscode settings
├── conf                                # folder configuration files
│   └── config.yaml                     # main configuration file
├── data
│   ├── 01_raw                          # raw immutable data
│   ├── 02_intermediate                 # typed data
│   ├── 03_primary                      # domain model data
│   ├── 04_feature                      # model features
│   ├── 05_model_input                  # often called 'master tables'
│   ├── 06_models                       # serialized models
│   ├── 07_model_output                 # data generated by model runs
│   ├── 08_reporting                    # reports, results, etc
│   └── README.md                       # description of the data structure
├── docs                                # documentation for your project
│   └── index.md                        # documentation homepage
├── models                              # store final models
├── notebooks
│   ├── 1-data                          # data extraction and cleaning
│   ├── 2-exploration                   # exploratory data analysis (EDA)
│   ├── 3-analysis                      # Statistical analysis, hypothesis testing.
│   ├── 4-feat_eng                      # feature engineering (creation, selection, and transformation.)
│   ├── 5-models                        # model training, experimentation, and hyperparameter tuning.
│   ├── 6-evaluation                    # evaluation metrics, performance assessment
│   ├── 7-deploy                        # model packaging, deployment strategies.
│   ├── 8-reports                       # story telling, summaries and analysis conclusions.
│   ├── notebook_template.ipynb         # template for notebooks
│   └── README.md                       # information about the notebooks
├── src                                 # source code for use in this project
│   ├── libs                            # custom python scripts
│   │   ├── data_etl                    # data extraction, transformation, and loading
│   │   ├── data_validation             # data validation
│   │   ├── feat_cleaning               # feature engineering data cleaning
│   │   ├── feat_encoding               # feature engineering encoding
│   │   ├── feat_imputation             # feature engineering imputation
│   │   ├── feat_new_features           # feature engineering new features
│   │   ├── feat_pipelines              # feature engineering pipelines
│   │   ├── feat_preprocess_strings     # feature engineering pre process strings
│   │   ├── feat_scaling                # feature engineering scaling data
│   │   ├── feat_selection              # feature engineering feature selection
│   │   ├── feat_strings                # feature engineering strings
│   │   ├── metrics                     # evaluation metrics
│   │   ├── model                       # model training and prediction
│   │   ├── model_evaluation            # model evaluation
│   │   ├── model_selection             # model selection
│   │   ├── model_validation            # model validation
│   │   └── reports                     # reports
│   ├── pipelines
│   │   ├── data_etl                    # data extraction, transformation, and loading
│   │   ├── feature_engineering         # prepare data for modeling