
```shell
uv tool install cookiecutter # Install cookiecutter in an isolated environment
```

Or install with pip:

```shell
pip install --user cookiecutter # Install cookiecutter on your path for easy access
```

```shell title="create project"
cookiecutter gh:JoseRZapata/data-science-project-template
```

Note: Cookiecutter uses `gh:` as shorthand for `https://github.com/`.

🔗 Linking an Existing Project

If the project was originally generated with [Cookiecutter], you must first use [Cruft] to link it to the original template:

```shell
cruft link https://github.com/JoseRZapata/data-science-project-template
```

Then, or directly if the project was already created with Cruft, pull in the latest template changes:

```shell
cruft update
```
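The link and update steps can be combined into a small script. This is only a sketch, assuming it is run from the project root with `cruft` installed. The function name `sync_with_template` is hypothetical. It relies on Cruft recording the template link in `.cruft.json` and on `cruft check` exiting with a non-zero status when the project is behind the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the template): link the project to the
# template once, then apply template updates only when Cruft reports changes.
sync_with_template() {
  local template_url="https://github.com/JoseRZapata/data-science-project-template"

  # Cruft stores the template link in .cruft.json at the project root,
  # so only run `cruft link` when that file is missing.
  if [ ! -f .cruft.json ]; then
    cruft link "$template_url"
  fi

  # `cruft check` exits non-zero when newer template changes are available.
  if ! cruft check; then
    cruft update
  fi
}
```

Usage: `sync_with_template` from the root of the generated project. Checking first keeps the script quiet when the project is already up to date, which can make it convenient to run on a schedule.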

🗃️ Project structure

Folder structure for data science projects (why?)

[Data structure]
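The `data/` folder in the tree that follows is split into numbered layers, from raw immutable data to reporting. The following is a rough sketch for recreating only those layer folders in an existing repository that was not generated from the template. The function name `make_data_layers` and the empty `.gitkeep` placeholder files are assumptions, not part of the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the template): create the numbered data
# layers from the project tree under <root>/data.
make_data_layers() {
  local root="${1:-.}"
  local layers=(
    01_raw 02_intermediate 03_primary 04_feature
    05_model_input 06_models 07_model_output 08_reporting
  )
  local layer
  for layer in "${layers[@]}"; do
    mkdir -p "$root/data/$layer"
    # Assumption: an empty .gitkeep lets Git track the otherwise empty folder.
    touch "$root/data/$layer/.gitkeep"
  done
}
```

Usage: `make_data_layers my-project` creates `my-project/data/01_raw` through `my-project/data/08_reporting`. The numeric prefixes keep the layers sorted in pipeline order in any file browser.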

```bash
.
├── .code_quality
│   ├── mypy.ini # mypy configuration
│   └── ruff.toml # ruff configuration
├── .github # github configuration
│   ├── actions
│   │   └── python-poetry-env
│   │       └── action.yml # github action to set up the python environment
│   ├── dependabot.md # github action to update dependencies
│   ├── pull_request_template.md # template for pull requests
│   └── workflows # github actions workflows
│       ├── ci.yml # run continuous integration (tests, pre-commit, etc.)
│       ├── dependency_review.yml # review dependencies
│       ├── docs.yml # build documentation (mkdocs)
│       └── pre-commit_autoupdate.yml # update pre-commit hooks
├── .vscode # vscode configuration
│   ├── extensions.json # list of recommended extensions
│   ├── launch.json # vscode launch configuration
│   └── settings.json # vscode settings
├── conf # configuration files
│   └── config.yaml # main configuration file
├── data
│   ├── 01_raw # raw immutable data
│   ├── 02_intermediate # typed data
│   ├── 03_primary # domain model data
│   ├── 04_feature # model features
│   ├── 05_model_input # often called 'master tables'
│   ├── 06_models # serialized models
│   ├── 07_model_output # data generated by model runs
│   ├── 08_reporting # reports, results, etc.
│   └── README.md # description of the data structure
├── docs # documentation for your project
│   └── index.md # documentation homepage
├── models # store final models
├── notebooks
│   ├── 1-data # data extraction and cleaning
│   ├── 2-exploration # exploratory data analysis (EDA)
│   ├── 3-analysis # statistical analysis, hypothesis testing
│   ├── 4-feat_eng # feature engineering (creation, selection, and transformation)
│   ├── 5-models # model training, experimentation, and hyperparameter tuning
│   ├── 6-evaluation # evaluation metrics, performance assessment
│   ├── 7-deploy # model packaging, deployment strategies
│   ├── 8-reports # storytelling, summaries, and analysis conclusions
│   ├── notebook_template.ipynb # template for notebooks
│   └── README.md # information about the notebooks