Z.ai

GLM-5 (glm-5)

High-end GLM lane for production reasoning and long-context workflows

Public model detail · Limited preview · MoE Transformer

Params: 744B / 40B active
Context: 198K
Max Output: 64K
License: GLM
TTFT: 520ms
Throughput: 42 tok/s
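As a back-of-envelope check, the TTFT and throughput figures above translate into an end-to-end latency estimate. This is a sketch only; it assumes a constant 42 tok/s decode rate after the 520 ms first token, while real latency varies with load and prompt length.

```python
# Rough wall-clock estimate from the listed TTFT and throughput figures.
TTFT_S = 0.520        # time to first token, in seconds
THROUGHPUT_TPS = 42   # steady-state decode speed, tokens per second

def estimated_latency(output_tokens: int) -> float:
    """Approximate seconds to stream a complete response."""
    return TTFT_S + output_tokens / THROUGHPUT_TPS

# A 1,000-token answer lands in roughly 24 seconds at these rates
print(round(estimated_latency(1000), 1))  # → 24.3
```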

Why pick it

  • Standard $2.25/M rate with a $1.80/M batch lane
  • Useful for GLM-heavy enterprise builds

Pricing

This model does not currently expose public self-serve pricing. Public rates appear only after backend verification.
Tier      Public      Cached      Price source      Note
Realtime  Not public  Not public  SiliconFlow lane  Public price reflects the runtime catalog without claimed savings comparisons
Batch     Not public  Not public  SiliconFlow lane  Batch public pricing follows the same runtime source
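Given the $2.25/M standard and $1.80/M batch rates listed under "Why pick it", per-run cost is simple arithmetic. A sketch, assuming a flat per-token rate; this page does not publish separate input/output pricing, and the table above notes the public rates are pending verification.

```python
# Token cost at the listed per-million rates.
STANDARD_PER_M = 2.25  # $ per 1M tokens, standard lane
BATCH_PER_M = 1.80     # $ per 1M tokens, batch lane

def run_cost(tokens: int, rate_per_m: float) -> float:
    """Dollar cost for a token count at a $/1M-token rate."""
    return tokens / 1_000_000 * rate_per_m

# 10M tokens: $22.50 standard vs $18.00 batch, a 20% saving
print(run_cost(10_000_000, STANDARD_PER_M))  # → 22.5
print(run_cost(10_000_000, BATCH_PER_M))     # → 18.0
```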

Quick start

OpenAI-compatible surface: swap the base URL and ship.

Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.luminapath.tech/v1",
    api_key=os.environ["BATCHIN_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Summarize why this model is a fit for my workload"}]
)

print(resp.choices[0].message.content)
JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminapath.tech/v1",
  apiKey: process.env.BATCHIN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "Summarize why this model is a fit for my workload" }],
});

console.log(resp.choices[0]?.message?.content);
cURL
curl https://api.luminapath.tech/v1/chat/completions \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5",
    "messages": [{"role":"user","content":"Summarize why this model is a fit for my workload"}]
  }'
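For production use against any OpenAI-compatible endpoint, transient 429/5xx responses are worth retrying with jittered exponential backoff. A minimal sketch; the retry policy here is an illustration, not a documented behavior of this service, and a real client would retry only on retryable status codes.

```python
import random
import time

def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule with full jitter, capped at `cap` seconds."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(retries)]

def with_retries(call, retries: int = 4):
    """Run `call`, sleeping between failed attempts per the schedule above."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except Exception:
            time.sleep(delay)
    return call()  # final attempt lets its error propagate

# The ceiling of each delay doubles per attempt: 0.5, 1.0, 2.0, 4.0 seconds
print([min(8.0, 0.5 * 2 ** i) for i in range(4)])  # → [0.5, 1.0, 2.0, 4.0]
```

Wrap the `client.chat.completions.create(...)` call from the quick start in `with_retries` to smooth over brief capacity hiccups.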

Specs

Architecture: MoE Transformer
Vendor group: Z.ai
Context window: 198K
Max output: 64K
Best for: GLM workloads
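The 198K context window and 64K max-output cap above imply a simple budget check before each request. A sketch only; real token counts come from the model's tokenizer, and the exact accounting of prompt vs. output tokens against the window is an assumption here.

```python
CONTEXT_WINDOW = 198_000  # total tokens the model can attend to
MAX_OUTPUT = 64_000       # hard cap on generated tokens

def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if a request respects both the output cap and the context window."""
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_WINDOW)

print(fits(150_000, 48_000))  # → True  (198,000 total: exactly at the window)
print(fits(150_000, 64_000))  # → False (214,000 total: over the window)
```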

Related models

  • GLM-5.1 (Z.ai, glm-5-1): Open flagship route for coding, reasoning, and long-horizon agent execution
  • GLM-4.7 (Z.ai, glm-4-7): Mid-tier GLM route for cost-aware engineering teams
  • Qwen3.5-397B (Qwen, qwen3-5-397b): Large Qwen route for premium general reasoning with a public batch discount
  • DeepSeek V4 Flash (DeepSeek, deepseek-v4-flash): Fast production DeepSeek route with standard, Asia, and batch pricing lanes