[SLURM][FSDP][CONTAINER] Docker build fails for slurm example #814

@aravneelaws

Description

The docker build command in the Slurm section of README.md is `docker build -f ../Dockerfile -t fsdp:pytorch2.7.1 .`, which fails because the Docker build context is incorrect:

 => ERROR [stage-0 5/8] COPY src/ /fsdp/                                                                                                                     0.0s
------
 > [stage-0 5/8] COPY src/ /fsdp/:
------
Dockerfile:8
--------------------
   6 |     RUN ln -s /usr/bin/python3 /usr/bin/python
   7 |     
   8 | >>> COPY src/ /fsdp/
   9 |     RUN --mount=type=cache,target=/root/.cache/pip pip install -r /fsdp/requirements.txt
  10 |     RUN pip install hyperpod-elastic-agent
--------------------
ERROR: failed to build: failed to solve: failed to compute cache key: failed to calculate checksum of ref 95e3c277-5b6d-4dfd-88ce-9edbed5e8e1d::1038ja6eurg77w7ujnt5uz6jn: "/src": not found

This can be fixed by changing the build context. The command needs to be updated to `docker build -f ../Dockerfile -t fsdp:pytorch2.7.1 ../.` so that the parent directory, which contains src/, becomes the build context and the `COPY src/ /fsdp/` step can find it.
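As a rough illustration of why the context matters, the sketch below checks whether a candidate build context contains the src/ directory that the Dockerfile copies. The `check_context` helper is hypothetical and not part of the repository. It assumes, as the paths in this issue suggest, that src/ sits next to the Dockerfile, one level above the Slurm directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the repo): reports whether the
# given build context directory contains src/, which the Dockerfile's
# "COPY src/ /fsdp/" step needs. Assumes src/ lives next to the Dockerfile.
check_context() {
  local context_dir="$1"
  if [ -d "${context_dir}/src" ]; then
    echo "ok: ${context_dir} contains src/"
  else
    echo "missing: ${context_dir} has no src/ (COPY src/ /fsdp/ would fail)"
    return 1
  fi
}
```

Run from the Slurm directory, `check_context .` should report the missing directory, while `check_context ..` should succeed if the layout matches the assumption above. In that case `docker build -f ../Dockerfile -t fsdp:pytorch2.7.1 ..` uses the same context as the `../.` form proposed here.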

Labels

documentation: Improvements or additions to documentation
