Improve Latency
Tips for getting the lowest latency from your application
Minimize Client-Server Communication
Avoid multiple round trips between the client (the user’s machine) and the server (cloud processes). Instead, aim to:
Make a single call to the cloud
Keep cross-service communication within the cloud infrastructure
Batch requests where possible
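As a rough illustration of why batching helps, the sketch below counts round trips against a stand-in server. `FakeServer`, `get`, and `get_many` are hypothetical names invented for this example and are not part of the duohub API; no real network is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical stand-in for a cloud endpoint; counts round trips."""
    round_trips: int = 0
    store: dict = field(default_factory=lambda: {"a": 1, "b": 2, "c": 3})

    def get(self, key):
        # One round trip per key.
        self.round_trips += 1
        return self.store[key]

    def get_many(self, keys):
        # One round trip for the whole batch.
        self.round_trips += 1
        return {k: self.store[k] for k in keys}


keys = ["a", "b", "c"]

# Unbatched: the client pays for three round trips.
naive = FakeServer()
naive_result = {k: naive.get(k) for k in keys}

# Batched: the same data arrives in a single round trip.
batched = FakeServer()
batched_result = batched.get_many(keys)

print(naive.round_trips, batched.round_trips)
```

Both approaches return the same data; only the number of client-server round trips differs, and each round trip adds network latency.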
Consider Geographic Proximity
Location matters significantly for latency:
duohub automatically routes queries to the nearest server
Ensure your servers are strategically located close to your user base
Use geo-distributed infrastructure when possible
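When deciding where to place your own servers, one option is to measure round-trip times from a representative client to each candidate region and compare medians. The sketch below assumes you have already collected such samples; the region names and numbers are made up for illustration, and `pick_lowest_latency_region` is a hypothetical helper.

```python
from statistics import median


def pick_lowest_latency_region(samples_ms):
    """Return the region whose median round-trip time is lowest.

    samples_ms maps a region name to a non-empty list of measured
    round-trip times in milliseconds.
    """
    if not samples_ms:
        raise ValueError("no latency samples provided")
    return min(samples_ms, key=lambda region: median(samples_ms[region]))


# Illustrative measurements only; collect real samples from your users' locations.
samples = {
    "us-east": [82.0, 79.5, 90.1],
    "eu-west": [24.3, 26.0, 23.8],
    "ap-southeast": [210.4, 198.7, 205.0],
}

best = pick_lowest_latency_region(samples)
print(best)
```

The median is used rather than the mean so that a single slow outlier does not dominate the comparison.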
Implement Data Streaming
Leverage streaming capabilities:
Stream data whenever possible
Begin processing data before the entire stream completes; this enables parallel processing and faster response times
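A minimal sketch of processing a stream chunk by chunk, using Python generators. `fake_stream` is a hypothetical stand-in for a streamed response body, and upper-casing stands in for real per-chunk work; the `events` log is there only to show the order of operations.

```python
from typing import Iterable, Iterator, List

events: List[str] = []


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streamed response; yields chunks one at a time."""
    for chunk in ["low ", "latency ", "matters"]:
        events.append(f"received {chunk!r}")
        yield chunk


def process_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Handle each chunk as soon as it arrives instead of buffering the whole body."""
    for chunk in chunks:
        events.append(f"processed {chunk!r}")
        yield chunk.upper()


output = "".join(process_stream(fake_stream()))

print(output)
for event in events:
    print(event)
```

The event log alternates between "received" and "processed" entries, which shows that the first chunk is handled before the second one arrives.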
Utilize Caching Effectively
Implement caching strategies:
Use caching for frequently accessed data
Consider Redis for high-performance caching needs
Implement appropriate cache invalidation policies
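The caching and invalidation ideas above can be sketched with a small in-process cache that expires entries after a time-to-live (TTL). `TTLCache`, `get_or_load`, and `slow_lookup` are hypothetical names for this example; in production a shared store such as Redis would typically replace the dictionary, but the pattern is the same.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: entry has not expired yet
        value = loader(key)  # cache miss or expired: do the slow lookup
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Explicitly drop an entry, for example after the underlying data changes."""
        self._entries.pop(key, None)


load_count = 0


def slow_lookup(key: str) -> str:
    """Hypothetical expensive call; counts how often it actually runs."""
    global load_count
    load_count += 1
    return f"value-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user:1", slow_lookup)
second = cache.get_or_load("user:1", slow_lookup)  # served from the cache
cache.invalidate("user:1")
third = cache.get_or_load("user:1", slow_lookup)  # reloaded after invalidation

print(load_count)
```

Time-based expiry bounds how stale a value can get, while explicit invalidation covers cases where you know the underlying data has changed.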
Choose Optimized Services
Select services designed for low latency:
Use platforms such as duohub, Vercel, or AWS Lambda@Edge that are optimized for performance
Look for services that offer:
Geo-distribution
Built-in caching
Optimized network routing