Jetstream2 Storage Overview
Need more storage space?
Jetstream2 storage is an ACCESS-allocated resource. Every allocation receives a default storage quota (as noted on the Quotas page); any needs beyond this initial quota require a separate allocation on the “Indiana Jetstream2 Storage” resource.
Jetstream2 offers three primary methods for data storage, each with different use cases and limitations:
- Volumes: Simple block storage devices that can be attached to a single instance at a time
- Manila Shares: Shared-Filesystems-as-a-Service that can be network mounted to many instances at once
- Object Store: Cloud storage buckets compatible with Amazon S3 that serve hosted objects over HTTP
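As a rough first pass, the choice among the three services can be sketched as a small decision helper. This is only an illustration of the guidance on this page, not an official Jetstream2 tool; the `Workload` fields and the `suggest_storage` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a storage need (illustrative fields only)."""
    shared_across_instances: bool = False  # many instances must mount it at once
    served_over_http: bool = False         # data must be reachable without an instance
    needs_s3_api: bool = False             # software expects an S3-compatible API


def suggest_storage(w: Workload) -> str:
    """Return a rough first suggestion based on the summary above."""
    if w.served_over_http or w.needs_s3_api:
        return "Object Store"
    if w.shared_across_instances:
        return "Manila Share"
    # Dedicated storage used by one instance at a time.
    return "Volume"


print(suggest_storage(Workload(shared_across_instances=True)))
```

Real workloads often mix these needs, so treat the result as a starting point and read the sections below before deciding.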
When to Use Volumes
Volumes are ideal for workloads requiring dedicated storage that doesn’t need to be shared across multiple instances simultaneously. For example:
- Scratch space for datasets and results: Place your large datasets on a volume so you can compute against them and save results without permanently tying the data to an instance.
- Persistent database storage: A volume can act as the data directory for a database (e.g. PostgreSQL or MySQL), allowing you to expand its size as the data grows and to take snapshots for quick-and-dirty backups.
- Custom boot disk size: A volume-backed instance can give you more root disk space for data that is tricky to move to an external location, such as cached Docker images.
Pros:
- Simple: Volumes can be created, attached, and mounted via Exosphere in only a couple of clicks.
- Flexible: Since volumes are raw block devices, advanced users have the freedom to choose what filesystem type to use (e.g., ext4 or xfs).
- Bootable: A volume can be marked as bootable and attached to an instance as its root disk.
- Snapshots: Immutable, point-in-time snapshots can easily be created from a volume.
- Extendable: A volume’s size can be increased at any point after its creation.
Cons:
- Single-attach: Volumes can only be attached to a single instance at a time. A more complicated setup such as an NFS server is required to share data across multiple machines.
- Instance required: To access the data on a volume, the volume must be attached to a running Jetstream2 instance.
- Unshrinkable: A volume’s size can never be decreased later to reclaim empty space.
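Because a volume can be extended but never shrunk, one reasonable approach is to start with a modest size and extend it only when free space runs low. The sketch below checks free space on a mounted filesystem using Python's standard `shutil.disk_usage`. The function names, the 10% threshold, and the example mount point are assumptions for illustration; use the path where your volume is actually mounted.

```python
import shutil


def free_fraction(mount_point: str) -> float:
    """Return the fraction of the filesystem at `mount_point` that is free."""
    usage = shutil.disk_usage(mount_point)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def needs_extension(mount_point: str, min_free: float = 0.10) -> bool:
    """True when free space has dropped below `min_free` (an arbitrary threshold)."""
    return free_fraction(mount_point) < min_free


# "/" is used here only so the example runs anywhere; replace it with
# your volume's mount point.
MOUNT_POINT = "/"
print(f"{free_fraction(MOUNT_POINT):.0%} free; extend soon: {needs_extension(MOUNT_POINT)}")
```

This only reports usage. Extending the volume itself, and growing the filesystem on it afterwards, are separate steps.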
When to Use Manila Shares
Choose a Manila Share when you need shared, networked storage accessible across multiple instances. Some examples are:
- Collaborative research data: If your team is computing against the same data on multiple instances, a Manila Share lets everyone access that data without wasting space on multiple copies.
- Classroom assignments: Similar to the above example, if you’re teaching a class with Jetstream2, a read-only Manila Share can let students easily copy assignment instructions or starter files from a central location on their instances.
- Scaling batch/distributed workloads: A Manila Share can be used for shared input/output data and software installs in a cluster of instances.
Pros:
- Networked: Manila Shares can be mounted on many instances at once, including across multiple allocations, making sharing data easy.
- Resizable: A Manila Share’s size can be either increased or decreased at any point after its creation (though we advise caution when shrinking filesystems).
- Access control: Users can limit mount credentials’ access level to read-only. For example, this is how the Jetstream2 team implements the Software Share.
- Easy to create: Manila Shares can be created and quickly mounted through Exosphere.
Cons:
- Instance required: To access the data on a Manila Share, the share must be mounted on a running Jetstream2 instance.
- Unsupported workflows: Certain behaviors (e.g. lock files) are not permitted on Manila Shares due to the disproportionate load they place on our Ceph cluster’s metadata servers, and may cause your instances to be evicted. Metadata-heavy workflows are better served using volumes.
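For the classroom example above, a student could copy starter files from a read-only share into their own home directory and work on the local copy, which also keeps lock files and similar activity off the share. The following is a minimal sketch using only the Python standard library. The function name and the example paths are hypothetical; substitute the path where the share is mounted on your instance.

```python
import shutil
from pathlib import Path


def copy_starter_files(share_dir: str, dest_dir: str) -> list[str]:
    """Copy everything under `share_dir` into `dest_dir`.

    Returns the relative paths of the files found in the share.
    Existing files in `dest_dir` with the same names are overwritten.
    """
    src = Path(share_dir)
    dest = Path(dest_dir)
    # dirs_exist_ok lets the copy be repeated when new files are published.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sorted(str(p.relative_to(src)) for p in src.rglob("*") if p.is_file())


# Example call (hypothetical paths):
# copy_starter_files("/mnt/class-share/assignment1", "/home/exouser/assignment1")
```

Only read access to the share is needed for this to work, since all writes go to the destination directory.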
When to Use the Object Store
The Object Store is an ideal resource for web applications, public data hosting, and any software that advertises S3 compatibility. For example:
- Public dataset hosting: The Object Store can easily serve files over direct download URLs, making sharing with collaborators dead-simple, even outside of Jetstream2.
- Web application assets: Object Store buckets can serve static assets (e.g. images or code), share data between microservices, or directly drop into frameworks made for Amazon S3.
Pros:
- Web hosting (no instance required): Objects in storage buckets are served over the internet by Jetstream2 infrastructure, meaning the data is accessible without going through an instance.
- Cloud-native: The Object Store is compatible with Amazon’s S3 APIs and usually works as a drop-in replacement for AWS in S3-compatible software.
- Access from anywhere: The Object Store is the only Jetstream2 storage service that can be accessed from outside the Jetstream2 network without additional user-level hosting.
- Dynamic sizing/usage: Object Store buckets don’t have a preset size, so quota usage follows space filled (not space allocated).
- Scalable: The Object Store can be suitable for extremely large (even petabyte-scale) collections of files.
Cons:
- Not a traditional filesystem: The Object Store can’t be mounted as a traditional, POSIX-compliant filesystem without additional software like s3fs.
- Performance: Due to the overhead of uploads and downloads over HTTP, we expect I/O throughput on the Object Store to be lower than on a volume or Manila Share.
- Learning curve: Using the Object Store isn’t as straightforward as a traditional filesystem, and buckets/containers can only be managed via Horizon or the CLI.
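To illustrate the "access from anywhere" point, the sketch below downloads a publicly readable object over HTTP using only the Python standard library, with no Jetstream2 instance involved. It assumes the object has been made publicly readable and that you already have its direct download URL; the URL in the comment is a placeholder, not a real Jetstream2 endpoint, and `download_object` is a name invented for this example.

```python
import shutil
import urllib.request


def download_object(url: str, dest_path: str) -> int:
    """Stream the object at `url` to `dest_path` and return the bytes written."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj streams in chunks, so large objects are not held in memory.
        shutil.copyfileobj(response, out)
        return out.tell()


# Example call (placeholder URL):
# download_object("https://object-store.example.org/my-bucket/data.csv", "data.csv")
```

Private buckets need authenticated requests instead, typically through an S3-compatible client library configured with your own credentials.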