Try to fix similarity and add seed for master Excel ICD
73
docs/ENVIRONMENT_VARIABLES.md
Normal file
@@ -0,0 +1,73 @@
# Environment Variables

## Database Configuration

- `DATABASE_URL`: PostgreSQL connection string
  - Example: `postgresql://username:password@localhost:5432/claim_guard_db`

## OpenAI Configuration

- `OPENAI_API_KEY`: Your OpenAI API key for embeddings
- `OPENAI_API_MODEL`: OpenAI model for embeddings (default: `text-embedding-ada-002`)

## Vector Search Configuration

- `VECTOR_SIMILARITY_THRESHOLD`: Minimum similarity score a result must reach to be returned by vector search (default: `0.85`)
  - Range: 0.0 to 1.0
  - Higher values mean stricter matching
  - Recommended: 0.85 for production, 0.7 for development
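
The range and default above could be enforced when the value is read. The following is a minimal sketch, not the application's actual code: the function name `parseSimilarityThreshold` and the choice to throw on invalid input are assumptions made for illustration.

```typescript
// Hypothetical helper (not taken from the application source).
// Reads the raw string value of VECTOR_SIMILARITY_THRESHOLD and
// returns a validated number.
const DEFAULT_THRESHOLD = 0.85; // default stated in this document

function parseSimilarityThreshold(raw: string | undefined): number {
  // Unset or empty: fall back to the documented default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_THRESHOLD;
  }
  const value = Number(raw);
  // Reject non-numeric input and anything outside the 0.0 to 1.0 range.
  if (!isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(
      `VECTOR_SIMILARITY_THRESHOLD must be between 0.0 and 1.0, got "${raw}"`
    );
  }
  return value;
}
```

In a Node process this would typically be called as `parseSimilarityThreshold(process.env.VECTOR_SIMILARITY_THRESHOLD)`. Failing fast on a bad value is one design choice; silently falling back to the default is another.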

## Application Configuration

- `PORT`: Application port (default: `3000`)
- `NODE_ENV`: Environment mode (`development` or `production`)
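
As an illustration of how the variables and defaults listed above might be gathered in one place, here is a small sketch. It makes several assumptions that this document does not confirm: that `DATABASE_URL` and `OPENAI_API_KEY` are required, that `NODE_ENV` falls back to `development`, and the names `AppConfig`, `requireVar`, and `loadConfig` are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical shape; field names are illustrative only.
interface AppConfig {
  databaseUrl: string;
  openaiApiKey: string;
  openaiApiModel: string;
  port: number;
  nodeEnv: string;
}

// Assumption: a missing required variable is treated as a startup error.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  const port = Number(env["PORT"] ?? "3000"); // documented default
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError(`PORT must be an integer from 1 to 65535, got "${env["PORT"]}"`);
  }
  return {
    databaseUrl: requireVar(env, "DATABASE_URL"),
    openaiApiKey: requireVar(env, "OPENAI_API_KEY"),
    // Documented default embedding model.
    openaiApiModel: env["OPENAI_API_MODEL"] ?? "text-embedding-ada-002",
    port,
    // Assumption: fall back to development when NODE_ENV is unset.
    nodeEnv: env["NODE_ENV"] ?? "development",
  };
}
```

Passing the environment in as an argument (for example `loadConfig(process.env)`) keeps the function easy to test without touching the real process environment.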

## Example .env file

```bash
# Database
DATABASE_URL="postgresql://username:password@localhost:5432/claim_guard_db"

# OpenAI
OPENAI_API_KEY="your-openai-api-key-here"
OPENAI_API_MODEL="text-embedding-ada-002"

# Vector Search
VECTOR_SIMILARITY_THRESHOLD=0.85

# App
PORT=3000
NODE_ENV=development
```
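
To show how a file in this shape maps to key/value strings, here is a deliberately simplified sketch. It is not how the application loads its configuration (that is not stated here); `parseDotEnv` is a hypothetical name, and it only handles comments, blank lines, and one pair of surrounding double quotes. Libraries such as `dotenv` cover many more cases.

```typescript
// Simplified illustration only: handles `KEY=VALUE`, `#` comments,
// blank lines, and optional surrounding double quotes.
function parseDotEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comment lines.
    if (trimmed === "" || trimmed.charAt(0) === "#") continue;
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // not in KEY=VALUE form
    const key = trimmed.slice(0, eq).trim();
    let value = trimmed.slice(eq + 1).trim();
    // Strip one pair of matching surrounding double quotes.
    if (
      value.length >= 2 &&
      value.charAt(0) === '"' &&
      value.charAt(value.length - 1) === '"'
    ) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}
```

Every value comes back as a string, so numeric settings such as `PORT` and `VECTOR_SIMILARITY_THRESHOLD` still need to be converted and validated before use.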

## Similarity Threshold Guidelines

### Production Environment

- **High Precision**: 0.90 - 0.95 (very strict matching)
- **Standard**: 0.85 - 0.90 (recommended for most use cases)
- **Balanced**: 0.80 - 0.85 (good balance between precision and recall)

### Development Environment

- **Testing**: 0.70 - 0.80 (more lenient for testing)
- **Debugging**: 0.60 - 0.70 (very lenient for development)

### How to Set Threshold

#### Via Environment Variable

```bash
export VECTOR_SIMILARITY_THRESHOLD=0.90
```

#### Via .env file

```bash
VECTOR_SIMILARITY_THRESHOLD=0.90
```

#### Via API (Runtime)

```http
POST /api/pgvector/threshold

{
  "threshold": 0.90
}
```
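
The request above could be assembled from client code roughly as follows. This is a sketch under assumptions: the base URL, the `Content-Type: application/json` header, and the helper name `buildThresholdRequest` are not specified in this document, and the endpoint's response format is unknown.

```typescript
// Hypothetical helper: builds the URL and request options for the
// runtime threshold endpoint without sending anything.
interface ThresholdRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildThresholdRequest(baseUrl: string, threshold: number): ThresholdRequest {
  // Mirror the documented 0.0 to 1.0 range before calling the API.
  if (!isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new RangeError(`threshold must be between 0.0 and 1.0, got ${threshold}`);
  }
  return {
    // Drop trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + "/api/pgvector/threshold",
    init: {
      method: "POST",
      // Assumption: the endpoint expects a JSON body.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ threshold }),
    },
  };
}

// Usage (assumes the app is reachable on localhost at the default port 3000):
//   const { url, init } = buildThresholdRequest("http://localhost:3000", 0.9);
//   const response = await fetch(url, init);
```

Separating request construction from the network call keeps the validation testable on its own.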

## Impact of Threshold Changes

- **Higher Threshold (0.90 and above)**: Fewer results, higher precision, more relevant matches
- **Lower Threshold (0.70 and below)**: More results, lower precision, may include less relevant matches
- **Optimal Range (0.80 - 0.90)**: Good balance between precision and recall for most medical coding use cases
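
The trade-off described above can be seen with a toy example. The scores and labels below are made up for illustration, the names `ScoredMatch` and `filterByThreshold` are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption about how the search applies it.

```typescript
interface ScoredMatch {
  label: string;
  similarity: number; // 0.0 to 1.0, higher means more similar
}

// Keep only matches whose similarity reaches the threshold.
// Assumption: the comparison is inclusive.
function filterByThreshold(matches: ScoredMatch[], threshold: number): ScoredMatch[] {
  return matches.filter((m) => m.similarity >= threshold);
}

// Made-up candidates, not real search output.
const candidates: ScoredMatch[] = [
  { label: "candidate-a", similarity: 0.93 },
  { label: "candidate-b", similarity: 0.86 },
  { label: "candidate-c", similarity: 0.74 },
];

const strict = filterByThreshold(candidates, 0.9);    // only candidate-a remains
const standard = filterByThreshold(candidates, 0.85); // candidate-a and candidate-b
const lenient = filterByThreshold(candidates, 0.7);   // all three remain
```

Raising the threshold shrinks the result set to the closest matches, while lowering it admits weaker candidates that may need manual review.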