Similarity Threshold Configuration
Overview
The similarity threshold lets you control the precision of vector search results by setting the minimum similarity score a result must reach to be returned. Results that score below the threshold are discarded, so only close matches are included in search results.
Default Configuration
- Default Threshold: 0.85 (85% similarity)
- Environment Variable: VECTOR_SIMILARITY_THRESHOLD
- Range: 0.0 to 1.0 (0% to 100% similarity)
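How the variable is turned into a number is not shown in this document. The following is a minimal sketch under stated assumptions: the function name loadSimilarityThreshold is hypothetical, an unset or blank variable falls back to the documented default of 0.85, and a value outside 0.0 to 1.0 is rejected at startup instead of being silently replaced.

```typescript
// Hypothetical loader for VECTOR_SIMILARITY_THRESHOLD (not the project's actual code).
// Assumption: invalid values throw instead of silently falling back to the default.

const DEFAULT_SIMILARITY_THRESHOLD = 0.85;

// Same shape as Node's process.env, so loadSimilarityThreshold(process.env) type-checks.
type EnvVars = { [key: string]: string | undefined };

function loadSimilarityThreshold(env: EnvVars): number {
  const raw = env["VECTOR_SIMILARITY_THRESHOLD"];
  // Unset or blank: use the documented default of 0.85.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SIMILARITY_THRESHOLD;
  }
  // Number() rejects trailing garbage such as "0.9abc" (parseFloat would accept it).
  const value = Number(raw);
  // Enforce the documented 0.0 to 1.0 range.
  if (!isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(
      `VECTOR_SIMILARITY_THRESHOLD must be a number from 0.0 to 1.0, got "${raw}"`
    );
  }
  return value;
}
```

In Node you would call it once at startup, for example loadSimilarityThreshold(process.env). Failing fast on a typo like "8.5" avoids running production searches with an unintended threshold.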
API Endpoints
1. Get Current Threshold
```http
GET /api/pgvector/threshold
```
Response:
```json
{
  "threshold": 0.85,
  "description": "Minimum similarity score required for search results (0.0 - 1.0)"
}
```
2. Set Threshold
```http
POST /api/pgvector/threshold
Content-Type: application/json

{
  "threshold": 0.90
}
```
Response:
```json
{
  "message": "Similarity threshold updated successfully",
  "threshold": 0.9,
  "previousThreshold": 0.85
}
```
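The two threshold endpoints above can be sketched as framework-agnostic handler functions that produce the documented response shapes. ThresholdStore, handleGetThreshold, handleSetThreshold and the in-memory store are hypothetical names for illustration; the validation rule (reject anything that is not a number from 0.0 to 1.0) is an assumption based on the documented range.

```typescript
// Hypothetical handlers for GET/POST /api/pgvector/threshold, independent of any HTTP framework.

interface ThresholdStore {
  get(): number;
  set(value: number): void;
}

interface GetThresholdResponse {
  threshold: number;
  description: string;
}

interface SetThresholdResponse {
  message: string;
  threshold: number;
  previousThreshold: number;
}

// GET /api/pgvector/threshold
function handleGetThreshold(store: ThresholdStore): GetThresholdResponse {
  return {
    threshold: store.get(),
    description: "Minimum similarity score required for search results (0.0 - 1.0)",
  };
}

// POST /api/pgvector/threshold; `body` is the already-parsed JSON payload.
function handleSetThreshold(store: ThresholdStore, body: unknown): SetThresholdResponse {
  const candidate =
    typeof body === "object" && body !== null
      ? (body as { threshold?: unknown }).threshold
      : undefined;
  if (typeof candidate !== "number" || !isFinite(candidate) || candidate < 0 || candidate > 1) {
    // A real handler would translate this into an HTTP 400 response.
    throw new RangeError("threshold must be a number between 0.0 and 1.0");
  }
  // Capture the old value before overwriting so it can be echoed back.
  const previousThreshold = store.get();
  store.set(candidate);
  return {
    message: "Similarity threshold updated successfully",
    threshold: candidate,
    previousThreshold,
  };
}

// Simple in-memory store, seeded with the documented default.
let currentThreshold = 0.85;
const memoryStore: ThresholdStore = {
  get: () => currentThreshold,
  set: (value: number) => {
    currentThreshold = value;
  },
};
```

Because the handlers only depend on the small ThresholdStore interface, the same logic can sit behind Express, Fastify, or NestJS routes, and the store can be swapped for one backed by the search service.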
3. Advanced Vector Search
```http
POST /api/pgvector/advanced-search
Content-Type: application/json

{
  "query": "diabetes mellitus type 2",
  "limit": 10,
  "category": "ICD10",
  "threshold": 0.90
}
```
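The advanced-search body mixes required and optional fields. A server might normalize it roughly as follows. This is a sketch under assumptions: a per-request threshold is taken to override the configured one, DEFAULT_LIMIT (10) is a hypothetical fallback rather than a documented value, and resolveSearchOptions is an illustrative name.

```typescript
// Hypothetical normalization of an advanced-search request body.

interface AdvancedSearchRequest {
  query: string;
  limit?: number;
  category?: string;
  threshold?: number;
}

interface ResolvedSearchOptions {
  query: string;
  limit: number;
  category: string | null;
  threshold: number;
}

// Assumed fallback when the client omits "limit"; not taken from the documentation.
const DEFAULT_LIMIT = 10;

function resolveSearchOptions(
  req: AdvancedSearchRequest,
  configuredThreshold: number
): ResolvedSearchOptions {
  if (typeof req.query !== "string" || req.query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  const limit = req.limit === undefined ? DEFAULT_LIMIT : req.limit;
  // Math.floor(NaN) !== NaN, so NaN is rejected here as well.
  if (limit < 1 || Math.floor(limit) !== limit) {
    throw new Error("limit must be a positive integer");
  }
  // Assumption: a threshold in the request overrides the configured value for this call only.
  const threshold = req.threshold === undefined ? configuredThreshold : req.threshold;
  if (!isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error("threshold must be between 0.0 and 1.0");
  }
  return {
    query: req.query.trim(),
    limit,
    category: req.category === undefined ? null : req.category,
    threshold,
  };
}
```

With the example body above and a configured threshold of 0.85, the resolved threshold would be 0.90; omitting "threshold" would fall back to 0.85.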
Search Methods
Standard Vector Search
- Uses cosine similarity
- Default threshold from environment variable
- Good for general use cases
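To make the threshold's role concrete, here is an in-memory illustration of standard search: score each candidate with cosine similarity, drop anything below the threshold, and return the best matches first. In the service itself this work happens in SQL; the Candidate shape and the function names are hypothetical.

```typescript
// In-memory illustration of cosine-similarity search with a minimum-score threshold.

interface Candidate {
  id: string;
  embedding: number[];
}

interface ScoredResult {
  id: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function standardSearch(
  query: number[],
  candidates: Candidate[],
  threshold: number,
  limit: number
): ScoredResult[] {
  return candidates
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .filter((r) => r.similarity >= threshold) // discard results below the threshold
    .sort((x, y) => y.similarity - x.similarity) // highest similarity first
    .slice(0, limit);
}
```

For a query [1, 0], a candidate at [1, 1] scores about 0.707: it is returned with a threshold of 0.70 but excluded at the default 0.85.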
Advanced Vector Search
- Combines cosine and Euclidean similarity metrics
- Weighted scoring: 70% cosine + 30% Euclidean
- Higher precision results
- Recommended for production use
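The 70/30 weighting can be sketched as below. The document does not say how Euclidean distance is converted into a similarity; this sketch assumes 1 / (1 + distance), which maps a distance of 0 to 1.0 and shrinks toward 0 as vectors move apart. All function names are hypothetical.

```typescript
// Sketch of a weighted score: 70% cosine similarity + 30% Euclidean-based similarity.
// Assumption: Euclidean distance is turned into a similarity via 1 / (1 + distance).

const COSINE_WEIGHT = 0.7;
const EUCLIDEAN_WEIGHT = 0.3;

function cosineSim(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

function advancedScore(a: number[], b: number[]): number {
  const euclideanSimilarity = 1 / (1 + euclideanDistance(a, b));
  return COSINE_WEIGHT * cosineSim(a, b) + EUCLIDEAN_WEIGHT * euclideanSimilarity;
}
```

Unlike cosine similarity alone, this score also penalizes vectors that point the same way but differ in magnitude, which is one plausible reason for combining the two metrics.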
Hybrid Search
- Combines vector similarity with text search
- Uses threshold from environment variable
- Best balance of semantic and text matching
Threshold Recommendations
Medical Coding Use Cases
| Use Case | Recommended Threshold | Description |
|---|---|---|
| High Precision Diagnosis | 0.90 - 0.95 | Very strict matching for critical diagnoses |
| Standard Medical Coding | 0.85 - 0.90 | Recommended for most medical coding scenarios |
| General Medical Search | 0.80 - 0.85 | Good balance between precision and recall |
| Research & Exploration | 0.70 - 0.80 | More lenient for research purposes |
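If you want the recommendations available in code (for example, to warn when a configured value falls outside every recommended band), the table can be expressed as data. The preset keys and the matchingUseCases helper are hypothetical; the numeric ranges come directly from the table, including its overlapping boundaries.

```typescript
// The recommendation table as data, plus a lookup of which bands contain a threshold.

interface ThresholdRange {
  min: number;
  max: number;
  description: string;
}

const THRESHOLD_PRESETS: { [useCase: string]: ThresholdRange } = {
  highPrecisionDiagnosis: { min: 0.9, max: 0.95, description: "Very strict matching for critical diagnoses" },
  standardMedicalCoding: { min: 0.85, max: 0.9, description: "Recommended for most medical coding scenarios" },
  generalMedicalSearch: { min: 0.8, max: 0.85, description: "Good balance between precision and recall" },
  researchExploration: { min: 0.7, max: 0.8, description: "More lenient for research purposes" },
};

// Returns every use case whose recommended range includes the threshold (bounds inclusive).
function matchingUseCases(threshold: number): string[] {
  const matches: string[] = [];
  for (const key in THRESHOLD_PRESETS) {
    const range = THRESHOLD_PRESETS[key];
    if (threshold >= range.min && threshold <= range.max) {
      matches.push(key);
    }
  }
  return matches;
}
```

Because adjacent ranges share their endpoints, a value such as 0.85 matches both standard medical coding and general medical search; an empty result means the value is outside every recommended band.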
Environment-Specific Settings
Production Environment
```bash
VECTOR_SIMILARITY_THRESHOLD=0.85
```
Development Environment
```bash
VECTOR_SIMILARITY_THRESHOLD=0.70
```
Testing Environment
```bash
VECTOR_SIMILARITY_THRESHOLD=0.75
```
Implementation Details
Environment Variable
```bash
# Set in .env file
VECTOR_SIMILARITY_THRESHOLD=0.85

# Or set as system environment variable
export VECTOR_SIMILARITY_THRESHOLD=0.85
```
Runtime Configuration
```typescript
// Get current threshold
const currentThreshold = pgVectorService.getSimilarityThreshold();

// Set new threshold
pgVectorService.setSimilarityThreshold(0.9);
```
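The getter and setter above could be backed by something as small as the following. This is a sketch, not the real pgVectorService: the class name SimilarityThresholdHolder is hypothetical, and returning the previous value from the setter is an assumption that mirrors the previousThreshold field of the POST response.

```typescript
// Hypothetical holder for the runtime threshold with range validation.

class SimilarityThresholdHolder {
  private threshold: number;

  constructor(initial: number = 0.85) {
    this.threshold = SimilarityThresholdHolder.validate(initial);
  }

  getSimilarityThreshold(): number {
    return this.threshold;
  }

  // Returns the previous value so callers can report or roll back the change.
  setSimilarityThreshold(value: number): number {
    const previous = this.threshold;
    // validate() throws before assignment, so a bad value leaves the old one in place.
    this.threshold = SimilarityThresholdHolder.validate(value);
    return previous;
  }

  private static validate(value: number): number {
    if (typeof value !== "number" || !isFinite(value) || value < 0 || value > 1) {
      throw new RangeError(`similarity threshold must be between 0.0 and 1.0, got ${value}`);
    }
    return value;
  }
}
```

In this sketch the value lives only in process memory, so a restart would bring back whatever VECTOR_SIMILARITY_THRESHOLD provides; persisting runtime changes would need separate storage.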
SQL Query Optimization
Search queries are built to:
- Filter results at the database level using the threshold
- Order results by similarity score
- Use the appropriate vector similarity operators
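A parameterized pgvector query that filters and orders inside PostgreSQL might be built like this. The table name embeddings and the columns code, description, category and embedding are hypothetical. The <=> operator is pgvector's cosine distance, so cosine similarity is 1 minus that distance; ordering by the raw distance expression (ascending) gives the same order as similarity descending and is the form a pgvector index built with vector_cosine_ops can serve.

```typescript
// Hypothetical builder for a threshold-filtered pgvector query with positional parameters.

interface SqlQuery {
  text: string;
  values: (string | number)[];
}

function buildVectorSearchQuery(
  queryEmbedding: number[],
  threshold: number,
  limit: number,
  category?: string
): SqlQuery {
  // pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to ::vector.
  const values: (string | number)[] = ["[" + queryEmbedding.join(",") + "]", threshold];
  // $1 = query vector, $2 = threshold; similarity = 1 - cosine distance.
  let where = "1 - (embedding <=> $1::vector) >= $2";
  if (category !== undefined) {
    values.push(category);
    where += ` AND category = $${values.length}`;
  }
  values.push(limit);
  const text =
    "SELECT code, description, 1 - (embedding <=> $1::vector) AS similarity " +
    "FROM embeddings " +
    `WHERE ${where} ` +
    "ORDER BY embedding <=> $1::vector " +
    `LIMIT $${values.length}`;
  return { text, values };
}
```

With node-postgres, the result could be passed as pool.query(q.text, q.values). Keeping the threshold, category and limit as bound parameters avoids string-concatenating user input into SQL.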
Performance Impact
Higher Threshold (0.90+)
- ✅ Fewer results to process
- ✅ Higher precision
- ❌ May miss relevant results
- ❌ Slower query execution (more filtering)
Lower Threshold (0.70 and below)
- ✅ Faster query execution
- ✅ More comprehensive results
- ❌ Lower precision
- ❌ More irrelevant results
Optimal Range (0.80-0.90)
- ✅ Good balance of precision and performance
- ✅ Suitable for most medical coding scenarios
- ✅ Reasonable query execution time
Troubleshooting
Common Issues
- No Results Returned
  - Check whether the threshold is too high
  - Verify that embeddings have been generated
  - Check the database connection
- Too Many Results
  - Increase the threshold value
  - Use the advanced search method
  - Add category filters
- Performance Issues
  - Tune the threshold for your use case
  - Add a vector index (for example, pgvector's HNSW or IVFFlat) on the embedding column
  - Consider batch processing
Debug Commands
```bash
# Check current threshold
curl -X GET http://localhost:3000/api/pgvector/threshold

# Get embedding statistics
curl -X GET http://localhost:3000/api/pgvector/stats

# Test with different thresholds
curl -X POST http://localhost:3000/api/pgvector/advanced-search \
  -H "Content-Type: application/json" \
  -d '{"query": "test", "threshold": 0.80}'
```
Best Practices
- Start with the Default: Begin with the default threshold of 0.85
- Test Incrementally: Adjust the threshold in small increments (e.g., 0.05)
- Monitor Results: Evaluate precision vs. recall trade-offs
- Environment-Specific: Use different thresholds for different environments
- Document Changes: Keep track of threshold changes and their impact
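Testing incrementally and monitoring precision vs. recall can be combined into a simple sweep: score a labeled sample once, then measure precision and recall at each threshold step. The LabeledScore shape and sweepThresholds are hypothetical; you would gather the samples by running real queries and marking which returned codes are correct.

```typescript
// Hypothetical threshold sweep over labeled search results.

interface LabeledScore {
  similarity: number;
  relevant: boolean;
}

interface SweepRow {
  threshold: number;
  precision: number;
  recall: number;
}

function sweepThresholds(samples: LabeledScore[], start: number, end: number, step: number): SweepRow[] {
  const totalRelevant = samples.filter((s) => s.relevant).length;
  const rows: SweepRow[] = [];
  // Loop over integer step counts to avoid accumulating floating-point error.
  const steps = Math.round((end - start) / step);
  for (let i = 0; i <= steps; i++) {
    // Round to two decimals so thresholds read as 0.75, 0.8, ... rather than 0.8500000000000001.
    const threshold = Math.round((start + i * step) * 100) / 100;
    const kept = samples.filter((s) => s.similarity >= threshold);
    const truePositives = kept.filter((s) => s.relevant).length;
    rows.push({
      threshold,
      // Convention: report 0 when the denominator is empty.
      precision: kept.length === 0 ? 0 : truePositives / kept.length,
      recall: totalRelevant === 0 ? 0 : truePositives / totalRelevant,
    });
  }
  return rows;
}
```

Calling sweepThresholds(samples, 0.7, 0.95, 0.05) yields one row per 0.05 step, which makes it easy to record why a given threshold was chosen when documenting changes.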
Migration Guide
From Previous Version
If upgrading from a version without a configurable threshold:
1. Set Environment Variable:
```bash
VECTOR_SIMILARITY_THRESHOLD=0.85
```
2. Update Search Calls:
```typescript
// Old way (hardcoded 0.7)
// const results = await service.vectorSearch(query, limit, category, 0.7);

// New way (uses the threshold from the environment variable)
const results = await service.vectorSearch(query, limit, category);
```
3. Test New Thresholds:
```bash
# Test with current threshold
curl -X GET http://localhost:3000/api/pgvector/threshold

# Adjust if needed
curl -X POST http://localhost:3000/api/pgvector/threshold \
  -H "Content-Type: application/json" \
  -d '{"threshold": 0.90}'
```