Video Creation Process

Learn the steps and key endpoints for creating videos with the Visla Open API.

Setup Open API Key/Secret

Visla provides an API key and secret for secure access to the OpenAPI. Include these credentials in your API request headers using an encrypted format. For authentication and encryption guidelines, see authenticated interactions.

Video Creation Methods

Visla’s Open API supports multiple ways to create videos, depending on your input type: an idea, a webpage, a text script, visual assets, speech, or a document.

Idea to Video

Provide a brief idea or topic to generate a video. Visla uses AI to expand the idea into a structured script and automatically create a cohesive video with narration and visuals.

For full request and response details, see idea to video.

Webpage to Video

Provide a webpage URL in your request, and Visla will automatically extract the content and generate a video. This is the simplest way to create videos directly from online articles or landing pages.

For full request and response details, see webpage to video.

Script to Video

Submit your own text script to generate a video. Visla uses the provided script as narration and visual guidance to automatically create a cohesive video.

You can control how the script is processed using the script_text_mode parameter:

  • ai_rewrite – Visla will rewrite and summarize the provided script using AI to create a more concise narration structure.
  • direct_script – Visla will use the script exactly as provided without rewriting.

For full request and response details, see script to video.
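
As a minimal sketch of choosing between the two script_text_mode values, the helper below builds a request body and rejects an unknown mode before any request is sent. Only script_text_mode and its two values come from this page. The `script` field name and the function name are hypothetical placeholders, so check the script to video reference for the real schema.

```python
# Values documented on this page for script_text_mode.
SCRIPT_TEXT_MODES = {"ai_rewrite", "direct_script"}


def build_script_payload(script: str, mode: str) -> dict:
    """Build an illustrative request body for script to video.

    The "script" key is an assumption for illustration only;
    "script_text_mode" is the documented parameter.
    """
    if mode not in SCRIPT_TEXT_MODES:
        raise ValueError(f"unsupported script_text_mode: {mode!r}")
    if not script.strip():
        raise ValueError("script must not be empty")
    return {"script": script, "script_text_mode": mode}
```

Validating the mode locally keeps a typo from turning into a failed API call.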

Visual to Video

Provide images or short video clips to generate a story-driven video. Visla organizes the visual assets and automatically creates a cohesive video with transitions, pacing, and supporting narration.

To create a video from visual assets:

  1. Upload your images or video clips using the resource upload process below.
  2. Create a video project using the visual to video endpoint.

You can control the structure of the generated video using the video_style parameter:

  • montage – A silent visual montage without narration.
  • storytelling – An AI-generated narrative video that tells a story.
  • explainer – An AI-generated video designed to explain a concept or topic.

Speech to Video

Submit an audio file or speech-centric video to generate a video. Visla analyzes the spoken content and transforms it into a polished video with synchronized visuals and narration.

To create a video from speech content:

  1. Upload your audio or video file using the resource upload process below.
  2. Create a video project using the speech to video endpoint with the uploaded resource URLs.

You can control the output format of the generated video using the project_function parameter:

  • SPEECH_TO_VIDEO_SUMMARY – Generates a concise highlight-style video summarizing the key points from the speech.
  • SPEECH_TO_VIDEO_FULL_LENGTH – Produces a full-length video that closely follows the entire spoken content with synchronized visuals.

Document to Video

To create a video from a document:

  1. Upload your document using the resource upload process below.
  2. Create a video project using the document to video endpoint with the uploaded document URL.

Control how the document is used to generate the video with the doc_usage parameter:

  • page_by_page_walkthrough (default)
    • Equivalent to: Visla AI agent "Present Your Visuals".
    • Scenes: One scene per page. The target video duration is determined by the number of pages and user-defined pacing.
    • Narration: AI explains each page based on its visuals and text content.
    • Footage: The page image only (no stock footage).
  • content_source
    • Equivalent to: Visla AI agent "Convert Text to Video".
    • Scenes: AI composes scenes based on the user-defined target video duration and pacing.
    • Narration: Document text is the content source for the voiceover script.
    • Footage: Mix of relevant page images and stock footage.

PPT override: For .ppt / .pptx files, page_by_page_walkthrough is always applied. Any value provided for doc_usage is ignored.

A related parameter applies to PowerPoint input:

  • speaker_notes_verbatim – When set to true for a PowerPoint (PPT) file that contains speaker notes, the notes are read verbatim in the video narration. Default is false.
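
The doc_usage rules above, including the PPT override, can be sketched as a small helper that reports which mode will actually apply to a given file. The function name is hypothetical. The mode names, the default, and the .ppt / .pptx override are taken from this section.

```python
from pathlib import Path
from typing import Optional

DOC_USAGE_VALUES = {"page_by_page_walkthrough", "content_source"}
PPT_SUFFIXES = {".ppt", ".pptx"}
DEFAULT_DOC_USAGE = "page_by_page_walkthrough"


def effective_doc_usage(filename: str, requested: Optional[str] = None) -> str:
    """Return the doc_usage mode that applies to this file."""
    # PPT override: any requested value is ignored for .ppt / .pptx files.
    if Path(filename).suffix.lower() in PPT_SUFFIXES:
        return DEFAULT_DOC_USAGE
    if requested is None:
        return DEFAULT_DOC_USAGE
    if requested not in DOC_USAGE_VALUES:
        raise ValueError(f"unsupported doc_usage: {requested!r}")
    return requested
```

For example, `effective_doc_usage("deck.pptx", "content_source")` returns `"page_by_page_walkthrough"`, while the same request for `report.pdf` keeps `"content_source"`.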

Key Workflow Steps

| Step | Endpoint | Purpose | Wait Time |
|------|----------|---------|-----------|
| 1 | GET /openapi/v1/project/get-asset-upload-url | Get S3 upload URL | Instant |
| 2 | PUT {uploadUrl} | Upload PDF to S3 | 10s–2min |
| 3 | POST /openapi/v1/project/doc-to-video | Create project & wait for editing status | 2–10min |
| 4 | POST /openapi/v1/project/{projectUuid}/export-video | Export video | Async via webhook |

video_duration_in_seconds applies when creating a video from a script, a webpage, or a document with doc_usage = content_source. In other modes (e.g., page_by_page_walkthrough), it may be ignored, and the system chooses the length.
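
The rule for video_duration_in_seconds can be written as a short predicate, which may help when deciding whether to send the parameter at all. This is only a sketch: the short method labels ("script", "webpage", "doc") and the function name are illustrative and are not API values.

```python
from typing import Optional

# Creation methods where the duration is documented to apply unconditionally.
# These labels are illustrative names, not values accepted by the API.
DURATION_METHODS = {"script", "webpage"}


def duration_is_honored(method: str, doc_usage: Optional[str] = None) -> bool:
    """Return True when video_duration_in_seconds is documented to apply."""
    if method in DURATION_METHODS:
        return True
    if method == "doc":
        # Only the content_source mode uses the requested duration.
        return doc_usage == "content_source"
    # For any other method the parameter may be ignored.
    return False
```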

Resource upload process

Step 1: Get Upload URL

Obtain a pre-signed URL for file upload. This URL allows you to securely upload your resource file to S3.

  • Parameters:
    • mediaType: Resource type
    • suffix: File extension
  • Purpose: Generate a secure, temporary URL for file upload
  • Response: Returns upload URL and upload ID
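
A minimal sketch of building the Step 1 request URL follows. It assumes that mediaType and suffix are passed as query parameters, which this page does not state explicitly. The base URL and the `"document"` media type in the usage note are placeholders, and the authentication headers described earlier are left out.

```python
from urllib.parse import urlencode

# Endpoint path as listed under Key Workflow Steps.
UPLOAD_URL_PATH = "/openapi/v1/project/get-asset-upload-url"


def upload_url_request(base_url: str, media_type: str, suffix: str) -> str:
    """Return the full GET URL for requesting a pre-signed upload URL.

    Assumes both parameters travel in the query string.
    """
    query = urlencode({"mediaType": media_type, "suffix": suffix})
    return f"{base_url.rstrip('/')}{UPLOAD_URL_PATH}?{query}"
```

For instance, `upload_url_request("https://api.example.com", "document", "pdf")` yields a URL ending in `get-asset-upload-url?mediaType=document&suffix=pdf`.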

Step 2: Upload File to S3

Once you have the upload URL, upload your resource file directly to S3 using the HTTP PUT method.

  • Method: HTTP PUT
  • URL: Use the exact uploadUrl from step 1 response
  • Headers:
    • Content-Type: Set appropriate MIME type (application/octet-stream, application/pdf, application/vnd.ms-powerpoint, etc.)
    • Do NOT include authentication headers for S3 upload
  • Body: Raw binary file data
  • Timeout: Complete upload within 10 minutes (URL expires after 600 seconds)
  • Security: Uses pre-signed URL for secure, time-limited access
import requests

# Pre-signed URL returned by Step 1. No auth headers are needed for S3.
upload_url = "<uploadUrl from Step 1>"

with open("resource.pdf", "rb") as file:
    response = requests.put(
        upload_url,
        data=file,  # Stream the raw binary data from disk
        headers={
            "Content-Type": "application/octet-stream"
        },
        timeout=600,  # The pre-signed URL expires after 600 seconds
    )

if response.status_code == 200:
    print("Upload successful")
else:
    print(f"Upload failed: {response.status_code}")

File Constraints

  • Maximum file size: 100 MB
  • Supported formats: PDF, PPT, PPTX, PPS, PPSX, MP3, WAV, M4A, AAC, FLAC, OGG, MP4, MOV, AVI, WEBM, MKV
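
Checking these constraints locally before requesting an upload URL avoids a wasted round trip. The sketch below assumes that the 100MB figure is an upper limit and that it means 100 × 1024 × 1024 bytes. Neither detail is spelled out here, and the function name is hypothetical.

```python
import os

# Assumption: "100MB" is a maximum and is measured in binary megabytes.
MAX_FILE_BYTES = 100 * 1024 * 1024

SUPPORTED_SUFFIXES = {
    "pdf", "ppt", "pptx", "pps", "ppsx",
    "mp3", "wav", "m4a", "aac", "flac", "ogg",
    "mp4", "mov", "avi", "webm", "mkv",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercase file suffix if the file passes both checks."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")
    return suffix
```

A typical call is `check_upload(path, os.path.getsize(path))`; the returned suffix can then be reused as the suffix parameter in Step 1.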

Video Status Lifecycle

Understanding the project status is crucial for successful integration:

| Status | Description | Next Action |
|--------|-------------|-------------|
| init | Project created; initial setup in progress. AI is analyzing document content. | Wait (typically 1–2 minutes) |
| preparation | Basic configuration is being set (sound, avatar, subtitles, etc.) and initial video generation begins. | Wait while video is generated (typically 3–10 minutes) |
| generated_videos | Video content is being created. | Wait 5–15 minutes |
| editing | Project is ready for preview. | Proceed with editing or exporting |
| error | The process failed. | |

After creating a video project, poll the Get Project Info API repeatedly to check the project's status. Once the status reaches editing, the project is ready for preview or export.
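
The polling loop described above might look like the following sketch. To stay independent of the exact Get Project Info request and response shape, it takes a caller-supplied `fetch_status` function that returns the current status string. That function, the 15-second interval, and the 30-minute timeout are all assumptions rather than documented values.

```python
import time
from typing import Callable


def wait_for_editing(
    fetch_status: Callable[[], str],
    interval_s: float = 15.0,   # assumed polling interval
    timeout_s: float = 1800.0,  # assumed overall limit
) -> int:
    """Poll until the project reaches 'editing'; return the number of polls."""
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status == "editing":
            return polls
        if status == "error":
            raise RuntimeError("project entered the error status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in status {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

In practice, `fetch_status` would wrap an authenticated call to Get Project Info and extract the status field from the response.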

Clip Status Lifecycle

After exporting a project, monitoring the clip status lifecycle ensures you know exactly when the video is ready for use:

| Status | Description | Typical Use Case |
|--------|-------------|------------------|
| init | Clip just created; initial setup in progress | Immediately after export request |
| downloading | Downloading source content | Processing remote assets |
| uploading | Uploading to storage | File transfer in progress |
| processing | Video processing in progress | Encoding, compression, etc. |
| publishing | Finalizing and publishing | Making clip available |
| completed | Clip ready for download or viewing | Fully processed and available |
| failed | Processing failed | An error occurred during processing |

After exporting a project, poll the Get Clip Info API repeatedly to monitor clip progress. The lifecycle ends when the status reaches completed, indicating the video is fully processed and ready for download or viewing.
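
When polling clip status, it helps to separate the two terminal statuses from the in-progress ones. The sketch below classifies a status string from the table above. The function name and the three outcome labels are illustrative and are not part of the API.

```python
# In-progress statuses from the clip lifecycle, in the order listed.
IN_PROGRESS = ("init", "downloading", "uploading", "processing", "publishing")


def clip_outcome(status: str) -> str:
    """Map a clip status to 'ready', 'failed', or 'pending'."""
    if status == "completed":
        return "ready"
    if status == "failed":
        return "failed"
    if status in IN_PROGRESS:
        return "pending"
    # Surface unexpected values instead of polling forever.
    raise ValueError(f"unknown clip status: {status!r}")
```

A polling loop can keep calling Get Clip Info while the outcome is `"pending"` and stop on either of the other two.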