Compose Market provides multimodal AI capabilities through OpenAI-compatible endpoints.

Image Generation

POST /v1/images/generations
model (string, required)
  Image model ID: fal-ai/flux-2, bytedance/seedream-4-5, dall-e-3
prompt (string, required)
  Text description of the image to generate
size (string)
  Image size: 1024x1024, 1024x1792, 1792x1024
n (number)
  Number of images to generate (1-4). Default: 1

Response

{
  "created": 1735000000,
  "data": [
    {
      "url": "https://...",
      "revised_prompt": "..."
    }
  ]
}
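
As a minimal sketch, the request and response above could be handled with the Python standard library as follows. The base URL, the API key placeholder, and the Bearer-token header are assumptions (the page does not state them), and `build_image_request` and `image_urls` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: placeholder host and Bearer-token auth; neither is given on this page.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_image_request(prompt, model="fal-ai/flux-2", size="1024x1024", n=1):
    """Build (but do not send) a POST /v1/images/generations request."""
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    body = {"model": model, "prompt": prompt, "size": size, "n": n}
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def image_urls(response_json):
    """Collect the "url" field of every entry in the response's "data" list."""
    return [item["url"] for item in response_json["data"]]


if __name__ == "__main__":
    request = build_image_request("a lighthouse at dusk", n=2)
    # Sending requires a real host and key in place of the placeholders.
    with urllib.request.urlopen(request) as response:
        print(image_urls(json.load(response)))
```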

Video Generation

POST /v1/videos/generations
Video generation takes longer to process, so it uses an async job pattern: the request returns a job immediately, and you poll that job until it finishes.
model (string, required)
  Video model ID: openai/sora-2, google/veo3-1, kwai/kling-2.6
prompt (string, required)
  Text description of the video
duration (number)
  Video duration in seconds (5-60)

Job Response

{
  "id": "job_abc123",
  "status": "processing",
  "created": 1735000000
}

Poll for Status

GET /v1/videos/generations/{job_id}
{
  "id": "job_abc123",
  "status": "completed",
  "video_url": "https://..."
}
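
The polling step might be sketched as below. Only the "processing" and "completed" status values appear on this page, so the loop simply stops on any status other than "processing" and leaves the interpretation to the caller. The interval, timeout, host, and Bearer-token header are assumptions, and `wait_for_job` and `fetch_video_job` are hypothetical names.

```python
import json
import time
import urllib.request

# Assumptions: placeholder host and Bearer-token auth; neither is given on this page.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def fetch_video_job(job_id):
    """GET /v1/videos/generations/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/videos/generations/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(job_id, fetch_status=fetch_video_job, interval=5.0,
                 timeout=600.0, sleep=time.sleep):
    """Poll until the job leaves the "processing" state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] != "processing":
            return job  # caller checks for "completed" and reads "video_url"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout}s")
        sleep(interval)
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without network access or real delays.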

Text-to-Speech

POST /v1/audio/speech
model (string, required)
  TTS model ID: tts-1, tts-1-hd
input (string, required)
  Text to synthesize (max 4096 characters)
voice (string, required)
  Voice: alloy, echo, fable, onyx, nova, shimmer

Returns the audio as binary data in the specified format.
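
Because the response is raw audio rather than JSON, a client writes the bytes straight to a file. The sketch below assumes a placeholder host, Bearer-token auth, and a caller-chosen output path. `build_speech_request` and `save_speech` are hypothetical helpers, and the local checks mirror the limits listed above.

```python
import json
import urllib.request

# Assumptions: placeholder host and Bearer-token auth; neither is given on this page.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MAX_INPUT_CHARS = 4096


def build_speech_request(text, voice="alloy", model="tts-1"):
    """Build (but do not send) a POST /v1/audio/speech request."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    body = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path):
    """Send the request and write the binary audio response to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```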

Speech Recognition

POST /v1/audio/transcriptions
file (file, required)
  Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm)
model (string, required)
  ASR model ID: whisper-1
language (string)
  ISO-639-1 language code

Response

{
  "text": "Transcribed audio content here..."
}
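
Since `file` is an upload, the request body is presumably multipart/form-data, as in other OpenAI-compatible APIs; the page does not say so explicitly, so treat that as an assumption. The hypothetical helper below encodes such a body with the standard library only.

```python
import uuid


def encode_multipart(fields, filename, file_bytes,
                     file_content_type="application/octet-stream"):
    """Encode text fields plus one "file" part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                f"\r\n"
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The audio goes in the part named "file", matching the parameter above.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {file_content_type}\r\n"
            f"\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body would be sent as the data of a POST to /v1/audio/transcriptions, with the second value as the Content-Type header, for example `encode_multipart({"model": "whisper-1", "language": "en"}, "clip.wav", audio_bytes)`.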

Embeddings

POST /v1/embeddings
model (string, required)
  Embedding model: text-embedding-3-large, intfloat/e5-mistral-7b-instruct
input (string | array, required)
  Text or array of texts to embed

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292, ...],
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
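
A common use of the returned vectors is comparing two texts by cosine similarity. The sketch below works on a decoded response of the shape shown above. `extract_embeddings` and `cosine_similarity` are hypothetical helper names, and ordering by `index` assumes that field identifies each input's position.

```python
import math


def extract_embeddings(response_json):
    """Return the embedding vectors ordered by their "index" field."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With an array `input` of two texts, `cosine_similarity(*extract_embeddings(response))` gives a score between -1 and 1, where higher means the embeddings point in more similar directions.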

Music Generation

POST /v1/audio/music
model (string, required)
  Music model: google/lyria2
prompt (string, required)
  Text description of the music to generate
duration (number)
  Duration in seconds (10-300)

Uses an async job pattern similar to video generation.
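
A request body for this endpoint could be assembled as in the sketch below, which checks `duration` against the documented range before sending. `build_music_payload` is a hypothetical helper. The page does not give a status path for music jobs, so polling is left out.

```python
import json


def build_music_payload(prompt, duration=None, model="google/lyria2"):
    """Return the JSON body for POST /v1/audio/music as a string."""
    payload = {"model": model, "prompt": prompt}
    if duration is not None:
        if not 10 <= duration <= 300:
            raise ValueError("duration must be between 10 and 300 seconds")
        payload["duration"] = duration  # optional, so only sent when provided
    return json.dumps(payload)
```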