Inference Service API Usage Examples¶
Info
First read “Access the APIs” for guidance on how to connect to the APIs.
Python example (from Jetstream2 instance or tunnelled connection)¶
Run `pip install openai`, then create a Python script with these contents, and run it.
```python
from openai import OpenAI

# The service accepts any placeholder API key
client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the difference between SSH and SSL?",
        }
    ],
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
)
print(chat_completion.choices[0].message.content)
```
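For long answers, you may want to print tokens as they arrive instead of waiting for the full response. Here is a minimal sketch using the standard OpenAI streaming interface (`stream=True`); it assumes the endpoint supports streaming, as OpenAI-compatible servers generally do:

```python
from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

# stream=True yields chunks as the model generates them
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the difference between SSH and SSL?"}],
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the reply; content may be None
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```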
Command line example (from Jetstream2 instance or tunnelled connection)¶
You can also use the `llm` command-line tool to access the LLM from the command line; this is particularly convenient because you can integrate it with bash commands.
First, install `llm` in your favorite Python virtual environment:

```bash
pip install llm
```
Then, find where the configuration files are located:

```bash
dirname "$(llm logs path)"
```

Add a file named `extra-openai-models.yaml` to the directory that was printed by the previous command, with the following content:
```yaml
- model_id: llama3.370B
  model_name: "Llama-3.3-70B-Instruct-FP8-Dynamic"
  api_base: "https://llm.jetstream-cloud.org/vllm/v1/"
```
And set it as the default model:

```bash
llm models default llama3.370B
```
Finally, you can use it in a shell pipeline (`-s` sets the system prompt):

```bash
curl https://docs.jetstream-cloud.org/general/inference-service/ | html2text | llm -s "make a 1 paragraph summary"
```
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69412  100 69412    0     0   105k      0 --:--:-- --:--:-- --:--:--  105k
```
Here is a 1-paragraph summary of the Jetstream2 Large Language Model Inference Service documentation:
**Summary**: Jetstream2 offers a free, unlimited-use Large Language Model (LLM) Inference Service, powered by Llama 3.3, for its community. The service provides an OpenAI-compatible API and a browser-based chat interface (Open WebUI) for tasks like programming assistance, literature reviews, brainstorming, and writing aid. Access is restricted to Jetstream2 or IU Research Cloud networks and instances, but can be tunneled through from external computers. The service runs on an NVIDIA Grace Hopper server with an H100 GPU, supporting up to 4 simultaneous requests, and is subject to Jetstream2's acceptable use policies, primarily for research, education, or learning purposes.
Or you can start a chat session on the command line (`-c` continues the conversation):

```bash
llm chat -c
```
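If you prefer to script a multi-turn conversation in Python instead of using `llm chat`, note that the API itself is stateless: you keep the conversation history yourself and resend it with each request. A minimal sketch, assuming the same endpoint and model as in the Python example above:

```python
from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

# The API is stateless: keep the history and resend it each turn
history = [{"role": "user", "content": "What is SSH?"}]
reply = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct-FP8-Dynamic", messages=history
)
history.append({"role": "assistant", "content": reply.choices[0].message.content})

# Follow-up question that depends on the earlier turn
history.append({"role": "user", "content": "How does it differ from SSL?"})
reply = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct-FP8-Dynamic", messages=history
)
print(reply.choices[0].message.content)
```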
Using API with Your IDE¶
Using with Cline extension for VSCode or VSCodium (from Jetstream2 instance or tunnelled connection)¶
Cline is a semi-autonomous coding agent with two modes (“plan” and “act”). Cline can also use your terminal to test and debug its own code changes.
- Install the “Cline” extension, if you haven’t already. Then, in the extension pane:
- If you’re starting the extension for the first time, choose “Use your own API key”. Otherwise, go to the settings part of the extension pane (gear icon).
- Choose the “OpenAI Compatible” API Provider.
- For Base URL, enter one of:
    - `https://llm.jetstream-cloud.org/sglang/v1` to use DeepSeek R1 (slower but higher-quality model, verbose thinking output in the extension pane)
    - `https://llm.jetstream-cloud.org/v1` to use Llama 3.3 (faster but lower-quality model, more concise output in the extension pane)
- For API key, enter anything (like a space character).
- For Model ID, enter `DeepSeek-R1` if using DeepSeek R1, or `Llama-3.3-70B-Instruct` if using Llama 3.3.
- Click “let’s go!” or “Done”.
You should now be able to give Cline a task to do.
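Before giving Cline a long task, it can be worth sanity-checking the Base URL and Model ID from a terminal on the same instance or tunnelled connection. A minimal sketch in Python, assuming the DeepSeek R1 settings entered above (the space-character API key mirrors the Cline configuration):

```python
from openai import OpenAI

# Same Base URL and Model ID as entered in the Cline settings
client = OpenAI(base_url="https://llm.jetstream-cloud.org/sglang/v1", api_key=" ")

reply = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)
```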
Using with Continue extension for VSCode or VSCodium (from Jetstream2 instance or tunnelled connection)¶
Continue is an in-editor AI assistant.
Install the Continue extension. In the extension’s `config.json`, set the `models` key like so:
"models": [
{
"provider": "openai",
"title": "Jetstream2 Inference Service",
"apiBase": "https://llm.jetstream-cloud.org/vllm/v1/",
"model": "Llama-3.3-70B-Instruct-FP8-Dynamic",
"useLegacyCompletionsEndpoint": true
}
],
The chat pane should now work.
Using with JupyterLab via JupyterAI (from Jetstream2 instance or tunnelled connection)¶
Install the `jupyter-ai` package, version 2.29.1 or higher, and `langchain-openai`.
In JupyterLab, open the JupyterAI settings, and configure:
- Completion model = `OpenRouter :: *`
- API Base url = `https://llm.jetstream-cloud.org/sglang/v1/` for DeepSeek, or `https://llm.jetstream-cloud.org/vllm/v1/` for Llama.
- Local model ID = currently `DeepSeek-R1` or `Llama-3.3-70B-Instruct-FP8-Dynamic`. You can find the most recent available models by appending `models` to the API Base url and checking the output in your browser, for example https://llm.jetstream-cloud.org/vllm/v1/models for vLLM; see the sketch after this list.
- `OPENROUTER_API_KEY` = “EMPTY”
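To check the currently served model IDs programmatically rather than in the browser, here is a minimal sketch using the `openai` Python client against the vLLM endpoint; it assumes the service accepts a placeholder API key, as elsewhere on this page:

```python
from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="EMPTY")

# GET /models returns the models this endpoint currently serves
for model in client.models.list():
    print(model.id)
```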
Now you should be able to use the JupyterLab chat and the code assistant in the notebooks.