
Architecture Overview

The predictive buffering system uses the Gemini API to anticipate audio requirements and tune playback parameters before audio chunks arrive from Lyria RealTime. It operates on two levels: predictive analysis and real-time audio processing.

Core Components

BufferManager Class

The main BufferManager handles both predictive analysis and audio playback:
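// GoogleGenAI is assumed to come from the @google/genai SDK
import { GoogleGenAI } from '@google/genai';

export class BufferManager {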
  constructor(geminiApiKey) {
    this.geminiApiKey = geminiApiKey;
    this.genAI = null;
    this.isInitialized = false;
    
    // Essential audio playback (Web Audio API)
    this.audioContext = null;
    this.audioQueue = [];
    this.isQueueProcessing = false;
    this.nextStartTime = 0;
    this.masterGain = null;
    
    // Predictive buffering state
    this.currentInterpretation = null;
    this.targetBufferSize = 4096;
    
    // Rate limiting for Gemini API calls (cooldown in milliseconds)
    this.lastGeminiCall = 0;
    this.geminiCallCooldown = 3000;
  }
}

Initialization

If no API key is provided, or the Gemini client fails to initialize, the manager falls back to basic buffering and still reports itself as initialized:
async initialize() {
  try {
    if (!this.geminiApiKey) {
      console.warn('⚠️ No Gemini API key - using basic buffering');
      this.isInitialized = true;
      return true;
    }

    // Initialize the Gemini API client
    this.genAI = new GoogleGenAI({ apiKey: this.geminiApiKey });
    
    this.isInitialized = true;
    return true;
  } catch (error) {
    console.warn('⚠️ Buffer manager initialization failed, using basic buffering:', error.message);
    this.isInitialized = true; // Still allow basic buffering
    return true;
  }
}

Predictive Processing

Pre-Audio Analysis

The system processes interpretations before audio arrives to prepare optimal buffering:
async processInterpretation(interpretation) {
  try {
    console.log('🧠 PREDICTIVE BUFFERING: Processing interpretation BEFORE audio arrives');
    
    this.currentInterpretation = interpretation;
    
    // PREDICTIVE: Prepare buffer parameters
    this.preparePredictiveBuffering(interpretation);
    
    // PREDICTIVE: Generate code for upcoming transition
    if (interpretation.requiresCrossfade) {
      await this.prepareCrossfadeBuffering(interpretation);
    } else {
      await this.prepareLayerBuffering(interpretation);
    }
    
  } catch (error) {
    console.error('❌ Predictive buffering failed:', error);
  }
}

Buffer Parameter Optimization

The system calculates optimal buffer sizes based on musical characteristics:
preparePredictiveBuffering(interpretation) {
  const bpm = interpretation.lyriaConfig?.bpm || 140;
  // Use ?? so an explicit density of 0 is not replaced by the default
  const density = interpretation.lyriaConfig?.density ?? 0.5;
  
  // Predict optimal buffer size
  let optimalBufferSize;
  if (bpm > 160 && density > 0.7) {
    optimalBufferSize = 8192; // Fast, dense music
  } else if (bpm < 100) {
    optimalBufferSize = 2048; // Slow music
  } else {
    optimalBufferSize = 4096; // Standard techno/house
  }
  
  this.targetBufferSize = optimalBufferSize;
  console.log('🧠 PREDICTIVE: Buffer optimized for', bpm, 'BPM →', optimalBufferSize, 'samples');
}
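
To put these sizes in perspective, a buffer of N sample frames at 48 kHz lasts N / 48000 seconds. The sketch below converts the three sizes into milliseconds and fractions of a beat. The helper name `describeBuffer` is hypothetical, and the sketch assumes the sizes count sample frames per channel rather than interleaved samples, which the source does not state.

```javascript
const SAMPLE_RATE = 48000; // Lyria output rate, per the audio specifications

// Hypothetical helper: express a buffer size as milliseconds and as beats.
// Assumption: bufferSize counts sample frames per channel.
function describeBuffer(bufferSize, bpm) {
  const seconds = bufferSize / SAMPLE_RATE;
  const secondsPerBeat = 60 / bpm;
  return {
    milliseconds: seconds * 1000,
    beats: seconds / secondsPerBeat,
  };
}

for (const [size, bpm] of [[2048, 90], [4096, 140], [8192, 170]]) {
  const { milliseconds, beats } = describeBuffer(size, bpm);
  console.log(`${size} frames @ ${bpm} BPM: ${milliseconds.toFixed(1)} ms, ${beats.toFixed(3)} beats`);
}
```

Under that assumption, the standard 4096-frame buffer holds about 85 ms of audio, roughly a fifth of a beat at 140 BPM.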

Audio Processing Pipeline

Base64 to Float32 Conversion

Lyria audio chunks arrive as base64-encoded 16-bit PCM and must be converted for Web Audio API:
base64ToFloat32AudioData(base64String) {
  const byteCharacters = atob(base64String);

  // Copy each character code (0-255) straight into a byte array
  const audioChunks = Uint8Array.from(byteCharacters, (ch) => ch.charCodeAt(0));

  // Convert Uint8Array (16-bit PCM) to Float32Array for Web Audio API
  const length = Math.floor(audioChunks.length / 2); // 16-bit audio, so 2 bytes per sample; ignore a trailing odd byte
  const float32AudioData = new Float32Array(length);

  for (let i = 0; i < length; i++) {
    // Combine two bytes into one 16-bit signed integer (little-endian)
    let sample = audioChunks[i * 2] | (audioChunks[i * 2 + 1] << 8);
    // Convert from 16-bit PCM to Float32 (range -1 to 1)
    if (sample >= 32768) sample -= 65536;
    float32AudioData[i] = sample / 32768;
  }

  return float32AudioData;
}
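
As a cross-check on the conversion above, the same decoding can be sketched with a DataView, which reads signed little-endian 16-bit values directly and avoids the manual sign fix-up. The function name `decodePcm16` is hypothetical and not part of the BufferManager; the three test samples are chosen so the expected floats are exact.

```javascript
// Alternative sketch: decode base64 16-bit little-endian PCM with a DataView.
function decodePcm16(base64String) {
  const binary = atob(base64String);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  const view = new DataView(bytes.buffer);
  const sampleCount = Math.floor(bytes.length / 2); // ignore a trailing odd byte
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// Three known samples as little-endian bytes: 0, 16384 (0.5), -32768 (-1.0)
const pcmBytes = new Uint8Array([0x00, 0x00, 0x00, 0x40, 0x00, 0x80]);
const encoded = btoa(String.fromCharCode(...pcmBytes));
console.log(Array.from(decodePcm16(encoded))); // values: 0, 0.5, -1
```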

Seamless Audio Playback

The system implements seamless audio playback with crossfading:
async playAudioData() {
  this.isQueueProcessing = true;

  try {
    if (!this.audioContext || this.audioContext.state === "closed") {
      this.audioContext = new (window.AudioContext || window.webkitAudioContext)({
        sampleRate: 48000,
        latencyHint: 'interactive'
      });
      this.nextStartTime = this.audioContext.currentTime;

      // Create master gain node
      this.masterGain = this.audioContext.createGain();
      this.masterGain.gain.setValueAtTime(0.8, this.audioContext.currentTime);
      this.masterGain.connect(this.audioContext.destination);
    }

    // Resume audio context if suspended
    if (this.audioContext.state === 'suspended') {
      await this.audioContext.resume();
    }

    // Drain the queue in arrival order
    while (this.audioQueue.length > 0) {
      const audioChunks = this.audioQueue.shift();
      await this.processAudioChunk(audioChunks);
    }
  } finally {
    // Always clear the flag so a failed chunk cannot block later playback
    this.isQueueProcessing = false;
  }
}

Stereo Channel Processing

Lyria provides interleaved stereo data that must be split for Web Audio API:
// Create AudioBuffer (Lyria uses 48kHz stereo)
const frameCount = Math.floor(audioChunks.length / 2); // one frame = one left + one right sample
const audioBuffer = this.audioContext.createBuffer(2, frameCount, 48000);

// Split interleaved stereo data into separate channels
const leftChannel = audioBuffer.getChannelData(0);
const rightChannel = audioBuffer.getChannelData(1);

for (let i = 0; i < frameCount; i++) {
  leftChannel[i] = audioChunks[i * 2];
  rightChannel[i] = audioChunks[i * 2 + 1];
}
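
The split itself does not depend on the Web Audio API, so it can be checked in isolation. The following is a minimal sketch with a hypothetical helper, `deinterleaveStereo`, that assumes the input is ordered left, right, left, right.

```javascript
// Hypothetical helper: split interleaved [L0, R0, L1, R1, ...] into two channels.
function deinterleaveStereo(interleaved) {
  const frames = Math.floor(interleaved.length / 2); // drop an unpaired trailing sample
  const left = new Float32Array(frames);
  const right = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    left[i] = interleaved[i * 2];
    right[i] = interleaved[i * 2 + 1];
  }
  return { left, right };
}

const { left, right } = deinterleaveStereo(
  new Float32Array([0.5, -0.5, 0.25, -0.25, 0.125, -0.125])
);
console.log(Array.from(left));  // values: 0.5, 0.25, 0.125
console.log(Array.from(right)); // values: -0.5, -0.25, -0.125
```

In a browser, the resulting arrays could be written into an AudioBuffer with `audioBuffer.copyToChannel(left, 0)` and `audioBuffer.copyToChannel(right, 1)`.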

Crossfading System

Predictive Crossfade Preparation

The system prepares crossfade parameters based on musical characteristics:
async prepareCrossfadeBuffering(interpretation) {
  try {
    const expectedBPM = interpretation.lyriaConfig?.bpm || 140;
    // Use ?? so an explicit density of 0 is not replaced by the default
    const expectedDensity = interpretation.lyriaConfig?.density ?? 0.5;
    
    // Adjust crossfade timing based on BPM
    this.crossfadeTime = expectedBPM > 150 ? 0.08 : expectedBPM > 120 ? 0.12 : 0.15;
    
    console.log('🌊 PREDICTIVE: Crossfade ready -', {
      crossfadeTime: this.crossfadeTime,
      expectedBPM,
      expectedDensity
    });
  } catch (error) {
    console.warn('⚠️ Crossfade preparation failed:', error.message);
  }
}

Real-time Crossfade Implementation

Each audio chunk is played with automatic crossfading for seamless transitions:
// Apply crossfading for seamless transitions
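// 50ms crossfade, clamped so fade-in and fade-out cannot overlap on very short chunks
const crossfadeTime = Math.min(0.05, audioBuffer.duration / 2);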
fadeGain.gain.setValueAtTime(0, startTime);
fadeGain.gain.linearRampToValueAtTime(1, startTime + crossfadeTime);
fadeGain.gain.setValueAtTime(1, startTime + audioBuffer.duration - crossfadeTime);
fadeGain.gain.linearRampToValueAtTime(0, startTime + audioBuffer.duration);

source.start(startTime);
source.stop(startTime + audioBuffer.duration);

// Update next start time for seamless playback
this.nextStartTime = startTime + audioBuffer.duration - crossfadeTime;
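
Because each chunk starts `crossfadeTime` before the previous one ends, the stream plays back slightly faster than the sum of chunk durations. The sketch below reproduces only the timing arithmetic from the snippet above; `scheduleChunks` is a hypothetical helper, and the 2-second chunk length is an illustrative assumption, not a documented Lyria value.

```javascript
// Hypothetical helper: compute when each chunk starts if every chunk overlaps
// the previous one by crossfadeTime seconds.
function scheduleChunks(durations, firstStart, crossfadeTime) {
  const startTimes = [];
  let nextStartTime = firstStart;
  for (const duration of durations) {
    startTimes.push(nextStartTime);
    // Same update as in the playback code: the next chunk begins before this one ends
    nextStartTime += duration - crossfadeTime;
  }
  // The last chunk still plays its full duration
  return { startTimes, endOfStream: nextStartTime + crossfadeTime };
}

const { startTimes, endOfStream } = scheduleChunks([2, 2, 2], 0, 0.05);
console.log(startTimes.map((t) => t.toFixed(2)), endOfStream.toFixed(2));
```

With three 2-second chunks and a 50 ms crossfade, the chunks start at about 0, 1.95, and 3.90 seconds, and playback ends at about 5.90 seconds instead of 6.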

Layer Addition System

Predictive Layer Preparation

For additive layering (non-crossfade transitions), the system optimizes for longer overlaps:
async prepareLayerBuffering(interpretation) {
  try {
    console.log('📫 PREDICTIVE: Setting up layer addition parameters...');
    
    // Optimize for additive layering
    this.overlapBuffer = 0.05; // Longer overlap for layers
    
    console.log('📫 PREDICTIVE: Layer buffering ready');
  } catch (error) {
    console.warn('⚠️ Layer preparation failed:', error.message);
  }
}

Performance Monitoring

Buffer Status Tracking

The system provides real-time buffer status for monitoring:
getBufferStatus() {
  return {
    isInitialized: this.isInitialized,
    hasAudioContext: !!this.audioContext,
    queueLength: this.audioQueue.length,
    isPlaying: this.isQueueProcessing
  };
}
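
The queue length counts chunks, not time. If a monitor needs to know how many seconds of audio are waiting, it can be derived from the queued sample counts. The helper below is a hypothetical sketch, not part of the BufferManager, and assumes every queue entry is an interleaved stereo Float32Array at 48 kHz.

```javascript
// Hypothetical helper: seconds of audio waiting in a queue of interleaved
// stereo Float32Array chunks.
function queuedSeconds(audioQueue, sampleRate = 48000, channels = 2) {
  const totalSamples = audioQueue.reduce((sum, chunk) => sum + chunk.length, 0);
  return totalSamples / channels / sampleRate;
}

// 96000 + 48000 interleaved samples = 72000 stereo frames
const pendingChunks = [new Float32Array(96000), new Float32Array(48000)];
console.log(queuedSeconds(pendingChunks)); // 1.5
```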

Rate Limiting

Gemini API calls are limited to one every 3 seconds (geminiCallCooldown, in milliseconds) to avoid exhausting the quota:
// Rate-limiting state (set in the constructor)
this.lastGeminiCall = 0;
this.geminiCallCooldown = 3000; // milliseconds
this.lastInterpretationSignature = null;

// Check rate limiting before API calls
const now = Date.now();
if (now - this.lastGeminiCall < this.geminiCallCooldown) {
  console.log('⏰ Rate limiting: Skipping Gemini call');
  return;
}
this.lastGeminiCall = now; // record the accepted call so the cooldown restarts
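
A cooldown check like this only holds if the timestamp of the last accepted call is recorded each time a call goes through. The following is a minimal, self-contained sketch of that pattern. `createCooldownGate` and its injectable clock are hypothetical names introduced here so the logic can be exercised without waiting in real time.

```javascript
// Hypothetical cooldown gate; `now` is injectable so the logic can be tested.
function createCooldownGate(cooldownMs, now = () => Date.now()) {
  let lastCall = -Infinity; // no call accepted yet
  return function tryAcquire() {
    const t = now();
    if (t - lastCall < cooldownMs) return false; // still cooling down
    lastCall = t; // record the accepted call
    return true;
  };
}

let fakeTime = 0;
const canCallGemini = createCooldownGate(3000, () => fakeTime);
console.log(canCallGemini()); // true  (first call)
fakeTime = 1000;
console.log(canCallGemini()); // false (1 s after the accepted call)
fakeTime = 3000;
console.log(canCallGemini()); // true  (cooldown elapsed)
```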

Integration with Lyria

Audio Chunk Handling

The system processes Lyria audio chunks for immediate playback:
async handleLyriaAudioChunk(base64AudioData) {
  if (!base64AudioData) return;

  try {
    console.log('🎵 BUFFERING: Processing Lyria audio chunk for immediate playback');
    
    // Convert base64 to Float32Array for Web Audio API
    const float32AudioData = this.base64ToFloat32AudioData(base64AudioData);
    
    // Add to audio queue for immediate playback
    this.audioQueue.push(float32AudioData);

    // Start playback if not already playing
    if (!this.isQueueProcessing) {
      await this.playAudioData();
    }
    
  } catch (error) {
    console.error('❌ Failed to handle Lyria audio chunk:', error);
  }
}

Resource Management

Cleanup System

Proper cleanup prevents memory leaks and audio artifacts:
cleanup() {
  try {
    console.log('🧹 Cleaning up Buffer Manager...');
    
    // Stop audio playback
    this.audioQueue = [];
    this.isQueueProcessing = false;
    
    // Cleanup master gain
    if (this.masterGain) {
      try {
        this.masterGain.disconnect();
      } catch (error) {
        // Ignore disconnect errors
      }
      this.masterGain = null;
    }
    
    // Cleanup AudioContext (close() returns a promise; ignore rejections)
    if (this.audioContext && this.audioContext.state !== 'closed') {
      this.audioContext.close().catch(() => {});
    }
    this.audioContext = null;
    
    // Reset state
    this.isInitialized = false;
    this.nextStartTime = 0;
    
  } catch (error) {
    console.error('❌ Cleanup failed:', error);
  }
}

Factory Function

The system provides a factory function for easy instantiation:
/**
 * Create and return a configured buffer manager instance
 */
export function createBufferManager(geminiApiKey) {
  return new BufferManager(geminiApiKey);
}

export default BufferManager;

Configuration

Environment Variables

# Gemini API Configuration
EXPO_PUBLIC_GEMINI_API_KEY=your_gemini_api_key

Audio Specifications

The system is configured for Lyria’s audio format:
  • Sample Rate: 48kHz
  • Channels: 2 (stereo)
  • Format: 16-bit PCM
  • Encoding: Base64 (from Lyria)
  • Processing: Float32 (for Web Audio API)
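
These figures fix the data rate: 48000 frames × 2 channels × 2 bytes = 192000 bytes of PCM per second, or 256000 base64 characters once encoded. The sketch below uses that to estimate a chunk's duration from the length of its base64 string. The helper names are hypothetical, and the sketch assumes standard padded base64 with no whitespace or line breaks.

```javascript
const SAMPLE_RATE = 48000;
const CHANNELS = 2;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

const bytesPerSecond = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE; // 192000

// Hypothetical helper: decoded byte count of a padded base64 string.
function base64DecodedLength(base64String) {
  const padding = base64String.endsWith('==') ? 2 : base64String.endsWith('=') ? 1 : 0;
  return (base64String.length / 4) * 3 - padding;
}

// Hypothetical helper: playback duration of one encoded chunk, in seconds.
function chunkDurationSeconds(base64String) {
  return base64DecodedLength(base64String) / bytesPerSecond;
}

// 256000 base64 characters decode to 192000 bytes, i.e. one second of audio
console.log(chunkDurationSeconds('A'.repeat(256000))); // 1
```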