Overview

KrispVivaTurn is a turn analyzer that uses Krisp’s VIVA SDK turn-taking (TT) API to determine when a user has finished speaking. Unlike the Smart Turn model, which analyzes audio in batches when VAD detects a pause, KrispVivaTurn processes audio frame-by-frame in real time using Krisp’s streaming model.

Installation

KrispVivaTurn requires the Krisp Python SDK. See the Krisp VIVA guide for installation instructions.

Environment Variables

You need to provide the path to the Krisp turn detection model file (.kef extension), either by setting the KRISP_VIVA_TURN_MODEL_PATH environment variable or by passing model_path to the constructor. For SDK v1.6.1+, you also need to provide a Krisp API key, either via the api_key constructor parameter or the KRISP_VIVA_API_KEY environment variable.
KRISP_VIVA_TURN_MODEL_PATH=/path/to/krisp-viva-tt-v2.kef
KRISP_VIVA_API_KEY=your_api_key_here

Configuration

The KrispTurnParams class configures turn detection behavior:
threshold (float, default: 0.5)
Probability threshold for turn completion (0.0 to 1.0). Higher values require more confidence before marking a turn as complete.

frame_duration_ms (int, default: 20)
Frame duration in milliseconds for turn detection. Supported values: 10, 15, 20, 30, 32.
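As a sketch, these defaults can be overridden by passing a KrispTurnParams instance to the constructor. The import path for KrispTurnParams is assumed to match the KrispVivaTurn module used in the Example below; the model path and API key values are placeholders.

```python
from pipecat.audio.turn.krisp_viva_turn import KrispTurnParams, KrispVivaTurn

# A stricter configuration: more confidence required before ending the
# turn, with 10 ms frames (one of the supported frame sizes).
turn_analyzer = KrispVivaTurn(
    model_path="/path/to/krisp-viva-tt-v2.kef",  # or KRISP_VIVA_TURN_MODEL_PATH
    api_key="your_api_key_here",                 # or KRISP_VIVA_API_KEY
    params=KrispTurnParams(
        threshold=0.7,
        frame_duration_ms=10,
    ),
)
```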

Constructor Parameters

model_path (Optional[str], default: None)
Path to the Krisp turn detection model file (.kef extension). If not provided, falls back to the KRISP_VIVA_TURN_MODEL_PATH environment variable.

sample_rate (Optional[int], default: None)
Audio sample rate (will be set by the transport if not provided).

params (KrispTurnParams, default: KrispTurnParams())
Configuration parameters for turn detection.

api_key (str, default: "")
Krisp SDK API key for licensing (required for SDK v1.6.1+). If empty, falls back to the KRISP_VIVA_API_KEY environment variable.

Example

from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

# Configure Krisp turn detection via user turn strategies.
# `context` is an LLM context object created earlier in your pipeline setup.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[TurnAnalyzerUserTurnStopStrategy(
                turn_analyzer=KrispVivaTurn()
            )]
        ),
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

How It Works

KrispVivaTurn processes audio as a streaming model, analyzing each audio frame in real time:
  1. Frame-by-frame processing: Each incoming audio frame is processed by the Krisp turn detection model, which outputs a probability that the user’s turn is complete.
  2. Speech tracking: VAD signals are used to track when speech starts and stops.
  3. Threshold crossing: When the model’s probability exceeds the configured threshold after speech has been detected, the turn is marked as complete.
This differs from the Smart Turn model, which buffers audio and runs batch inference when VAD detects a pause. KrispVivaTurn makes its decision continuously as audio flows through, which can result in faster turn detection.
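The threshold-crossing logic above can be illustrated with a toy sketch. This is not the Krisp model or Pipecat's implementation; the per-frame probabilities are hypothetical values standing in for the model's output.

```python
def detect_turn_end(frames, threshold=0.5):
    """Return the index of the frame where the turn is marked complete.

    Each frame is a (is_speech, turn_end_probability) pair. The turn is
    complete the first time the probability crosses the threshold after
    speech has been observed; None means the turn is still in progress.
    """
    speech_seen = False
    for i, (is_speech, prob) in enumerate(frames):
        if is_speech:
            speech_seen = True  # VAD tracks that the user has spoken
        if speech_seen and prob >= threshold:
            return i  # model is confident the user's turn is over
    return None


# User speaks for two frames, then pauses; the probability rises and
# crosses the 0.5 threshold on the fourth frame.
frames = [
    (True, 0.1), (True, 0.2),
    (False, 0.3), (False, 0.6),
]
print(detect_turn_end(frames))  # → 3
```

Raising the threshold delays the decision until the model is more confident, trading latency for fewer premature turn endings.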

Notes

  • Requires a valid Krisp SDK license and turn detection model file
  • Works with any VAD analyzer (Silero is recommended)
  • Emits TurnMetricsData with end-to-end processing time, measuring the interval from VAD speech-to-silence transition to the model crossing the probability threshold