Cluster Environments
Anyscale provides optimized Docker images with common ML libraries for Cluster Environments.
You may need to add dependencies to the base images Anyscale provides, for example pip packages or post-build commands.
Let’s walk through setting up your cluster environment so that it has the dependencies needed to run your batch processing workload.
The Step-by-Step section below explains each of these tasks in detail.
Step-by-Step Instructions: Creating a Cluster Environment
- Navigate to the cluster environment creation page.
- Base docker image: Use Anyscale-provided docker image
- Ray version: 2.5.1 (latest)
- Python version: 3.10
- Base docker image:
See here for details about the Anyscale-provided docker images.
- CPU optimized image, no ML libraries: anyscale/ray:2.5.1-py310-cpu
- CPU optimized image, includes ML libraries like TensorFlow: anyscale/ray-ml:2.5.1-py310-cpu
- GPU support, includes ML libraries like TensorFlow, PyTorch, etc.: anyscale/ray-ml:2.5.1-py310-gpu
Note: You can also build a docker image with your own infrastructure, then simply link it with the “Use my own docker image” option! See here for more information.
- Cluster environment name: Give your cluster environment a meaningful name!
- Pip packages: Specify any additional pip packages to install.
- Conda packages: Add any you need!
- Debian packages: Add any you need!
- Environment variables: Set any cluster-wide environment variables if needed!
- Post build commands: Any additional commands needed to complete setup. These may be needed for packages like RFdiffusion, whose installation steps on GitHub include downloading models to ~/RFdiffusion/models.
- Finally, click Create to build your cluster environment!
Note that the build takes 5-10 minutes. The Docker image is built only once, unless you create new versions of the cluster environment in the future. The image contains the base set of dependencies needed to run your script.
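The image names listed in the steps above appear to follow a repository:rayversion-pyXY-device pattern. As a rough sketch, and assuming that pattern holds (it is inferred from the examples on this page, not taken from an official grammar), a small hypothetical helper can check that a chosen image matches the Ray and Python versions selected in the form:

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    repository: str      # e.g. "anyscale/ray-ml"
    ray_version: str     # e.g. "2.5.1"
    python_version: str  # e.g. "3.10"
    device: str          # "cpu" or "gpu"


# Assumed pattern, inferred from the image names on this page.
_TAG_RE = re.compile(
    r"^(?P<repo>[\w./-]+):(?P<ray>\d+\.\d+\.\d+)"
    r"-py(?P<major>\d)(?P<minor>\d+)-(?P<device>cpu|gpu)$"
)


def parse_image_tag(tag: str) -> ImageTag:
    """Split an image name into repository, Ray version, Python version, device."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag!r}")
    python_version = f"{match['major']}.{match['minor']}"
    return ImageTag(match["repo"], match["ray"], python_version, match["device"])


def matches_selection(tag: str, ray_version: str, python_version: str) -> bool:
    """Return True if the image agrees with the versions picked in the form."""
    parsed = parse_image_tag(tag)
    return parsed.ray_version == ray_version and parsed.python_version == python_version
```

For instance, matches_selection("anyscale/ray-ml:2.5.1-py310-gpu", "2.5.1", "3.10") is true, while an image built for a different Ray patch release would be flagged before you start the 5-10 minute build.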
If you need to install packages while your workspace is active, you can install them to shared cluster storage with pip install --user. They will then be available on all nodes in the cluster.
See here for more info.
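If you prefer to trigger the user-level install from a script or notebook rather than a terminal, the following is a minimal sketch. The helper names and the tqdm package are placeholders chosen for illustration; the code only builds and optionally runs a standard pip command, and relies on the workspace behaviour described above to make the result visible on other nodes.

```python
import subprocess
import sys
from typing import List


def user_install_command(packages: List[str]) -> List[str]:
    """Build a `pip install --user` command for the current interpreter."""
    if not packages:
        raise ValueError("at least one package is required")
    # Using `python -m pip` ensures pip targets the interpreter running this script.
    return [sys.executable, "-m", "pip", "install", "--user", *packages]


def install_for_user(packages: List[str], dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False."""
    command = user_install_command(packages)
    if not dry_run:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command
```

Calling install_for_user(["tqdm"], dry_run=False) performs the install; the default dry run just returns the command so you can inspect it first.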
There is also a command line interface for defining a Cluster Environment. You can read more here.
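When defining the environment from the command line, it can help to keep the same settings you would enter in the form in a version-controlled file. The sketch below is only an illustration: the key names (base_image, env_vars, debian_packages, python.pip_packages, python.conda_packages, post_build_cmds) are assumptions that mirror the form fields, and the output is written as JSON for simplicity. Check the CLI documentation for the schema and file format it actually expects.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterEnvConfig:
    """Hypothetical container mirroring the fields of the creation form."""
    base_image: str
    pip_packages: List[str] = field(default_factory=list)
    conda_packages: List[str] = field(default_factory=list)
    debian_packages: List[str] = field(default_factory=list)
    env_vars: Dict[str, str] = field(default_factory=dict)
    post_build_cmds: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Key names are assumed, not confirmed against the CLI schema.
        return {
            "base_image": self.base_image,
            "env_vars": dict(self.env_vars),
            "debian_packages": list(self.debian_packages),
            "python": {
                "pip_packages": list(self.pip_packages),
                "conda_packages": list(self.conda_packages),
            },
            "post_build_cmds": list(self.post_build_cmds),
        }


def write_config(config: ClusterEnvConfig, path: str) -> None:
    """Serialize the settings to a file that can be reviewed and versioned."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config.to_dict(), handle, indent=2)
```

A call such as write_config(ClusterEnvConfig(base_image="anyscale/ray-ml:2.5.1-py310-gpu", pip_packages=["pandas"]), "cluster_env.json") records the choices in one place; the pandas package and the file name are placeholders.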