This code implements a REST API service layer on top of the RAG engine we previously discussed. Here's a detailed explanation of each component:
1. Pydantic Request/Response Models
QueryRequest
Function: Defines the structure for standard RAG query requests
Features:
- query: The user's question/text to process
- context: Optional additional context for the query
- max_tokens: Override for maximum tokens in response
- temperature: Override for response creativity
- include_sources: Whether to include source documents in response
- stream: Whether to stream the response (though streaming implementation isn't shown)
ChatMessage & ChatRequest
Function: Support chat-style interactions with conversation history
Features:
- role: Participant role (system, user, assistant)
- content: Message content
- timestamp: When the message was created
- messages: Complete conversation history for context-aware responses
AnalysisRequest
Function: Specialized request for semiconductor data analysis
Features:
- data_type: Type of data to analyze (process_data, test_results, etc.)
- data: The actual data to analyze
- analysis_type: Type of analysis to perform (trend, correlation, etc.)
- context: Additional context for the analysis
TroubleshootingRequest
Function: Specialized request for equipment/process troubleshooting
Features:
- issue_description: Description of the problem
- symptoms: List of observed symptoms
- equipment_id: Specific equipment involved
- process_step: Manufacturing process step where issue occurs
- recent_changes: Any recent changes that might be relevant
- data_context: Additional data context for troubleshooting
OptimizationRequest
Function: Specialized request for process optimization
Features:
- target_metric: What to optimize (yield, throughput, etc.)
- current_performance: Current performance metrics
- constraints: Any constraints on the optimization
- process_parameters: Current process parameters
- historical_data: Historical data for analysis
HealthResponse
Function: Standardized health check response format
Features:
- status: Overall service status
- timestamp: When the health check was performed
- rag_status: Detailed status of RAG components
- service_info: General service information
2. Configuration Management
load_config()
Function: Loads service configuration from YAML file or provides defaults
Features:
- Looks for config file at config/rag_config.yaml
- Falls back to sensible defaults if file doesn't exist
- Handles both RAG engine and security configurations
- Provides robust error handling with logging
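The fall-back-to-defaults behavior can be sketched roughly as follows. This is only an illustration, not the service's actual code: `DEFAULT_CONFIG`, its values, and `merge_with_defaults()` are hypothetical names, and the YAML parsing step (which would typically use PyYAML's `yaml.safe_load`) is left out so the sketch needs only the standard library.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical defaults; the article does not list the real ones.
DEFAULT_CONFIG: Dict[str, Any] = {
    "rag": {
        "vector_db_url": "http://vector-db:8091",
        "llm_service_url": "http://llm-service:8092",
        "top_k_documents": 10,
        "temperature": 0.3,
    },
    "security": {"enable_auth": False, "api_keys": []},
}


def merge_with_defaults(loaded: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay values from a parsed config file onto a copy of the defaults."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    for section, values in (loaded or {}).items():
        if isinstance(values, dict) and isinstance(config.get(section), dict):
            # Known section: keep defaults for keys the file does not set.
            config[section].update(values)
        else:
            # Unknown section or non-mapping value: take it as-is.
            config[section] = values
    return config


# A file that only overrides the temperature keeps every other default.
config = merge_with_defaults({"rag": {"temperature": 0.5}})
print(config["rag"]["temperature"], config["rag"]["top_k_documents"])
```

Passing `None` (for example when the file is missing) simply returns a copy of the defaults.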
verify_api_key()
Function: API key authentication middleware
Features:
- Checks if authentication is enabled in configuration
- Validates API key against configured list of valid keys
- Integrates with FastAPI's dependency injection system
- Returns appropriate HTTP 401 errors for invalid/missing keys
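The core decision inside such a dependency could look like the sketch below. It assumes the `security` section is a plain dict with `enable_auth` and `api_keys`; `check_api_key()` and `AuthError` are hypothetical stand-ins (a FastAPI handler would raise an HTTP 401 instead), and the constant-time comparison is my own choice rather than something the article confirms.

```python
import hmac
from typing import Any, Dict, Optional


class AuthError(Exception):
    """Stand-in for the HTTP 401 response the real service would return."""


def check_api_key(token: Optional[str], security: Dict[str, Any]) -> bool:
    """Return True if the request may proceed, raise AuthError otherwise."""
    if not security.get("enable_auth", False):
        return True  # authentication disabled: every request passes
    if not token:
        raise AuthError("Missing API key")
    for valid_key in security.get("api_keys", []):
        # compare_digest avoids leaking key contents through timing differences
        if hmac.compare_digest(token.encode("utf-8"), valid_key.encode("utf-8")):
            return True
    raise AuthError("Invalid API key")
```

With `{"enable_auth": True, "api_keys": ["k1"]}`, the token `"k1"` passes and any other token raises `AuthError`.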
3. Service Lifecycle Management
initialize_services()
Function: Initializes the RAG engine with proper configuration
Features:
- Loads configuration using load_config()
- Creates RAGConfig object with appropriate values
- Initializes the global RAG engine instance using get_rag_engine()
- Provides comprehensive error handling and logging
startup_event()
Function: FastAPI startup event handler
Features:
- Calls initialize_services() during application startup
- Logs successful service initialization
- Records metrics for service startup
- Exits application on initialization failure
shutdown_event()
Function: FastAPI shutdown event handler
Features:
- Gracefully shuts down the RAG engine
- Logs service shutdown completion
- Records metrics for service stoppage
- Handles errors during shutdown process
signal_handler()
Function: Handles OS signals for graceful shutdown
Features:
- Catches SIGINT and SIGTERM signals
- Logs the received signal
- Initiates graceful application shutdown
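A minimal sketch of this kind of handler, using only the standard library. The `shutdown_requested` event and `install_signal_handlers()` are hypothetical; how the real service triggers its shutdown is not shown in the article.

```python
import signal
import threading

# Hypothetical flag that the main loop or server wrapper could poll.
shutdown_requested = threading.Event()


def signal_handler(signum, frame):
    """Log the received signal and request a graceful shutdown."""
    name = signal.Signals(signum).name
    print(f"Received {name}, initiating graceful shutdown")
    shutdown_requested.set()


def install_signal_handlers():
    """Register the handler for SIGINT and SIGTERM (main thread only)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, signal_handler)
```

Calling `install_signal_handlers()` early in `main()` means Ctrl+C and `kill <pid>` both go through the same cleanup path.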
4. API Endpoint Handlers
health_check() - GET /health
Function: Service health monitoring endpoint
Features:
- Checks if RAG engine is initialized
- Gets detailed health status from RAG engine
- Returns standardized health response format
- Provides service metadata (name, version, etc.)
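As a rough illustration, the response could be assembled like this. The field names follow HealthResponse above; the status strings (`healthy`, `degraded`, `unhealthy`) and the `service_info` values are assumptions made for the sketch.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_health_response(rag_status: Optional[Dict[str, str]]) -> Dict[str, Any]:
    """Derive an overall status from per-component states."""
    if rag_status is None:
        overall = "unhealthy"  # engine was never initialized
    elif all(state == "healthy" for state in rag_status.values()):
        overall = "healthy"
    else:
        overall = "degraded"  # at least one dependency reports a problem
    return {
        "status": overall,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rag_status": rag_status or {},
        # Hypothetical metadata; the real name and version are not given.
        "service_info": {"name": "rag-service", "version": "0.0.0"},
    }


print(build_health_response({"vector_db": "healthy", "llm_service": "down"})["status"])
```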
process_query() - POST /query
Function: Core RAG query processing endpoint
Features:
- Validates authentication
- Allows overriding model parameters per request
- Processes query using RAG engine
- Formats response with answer and metadata
- Optionally includes source documents (truncated for efficiency)
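The source truncation step might look something like the following sketch. The 200-character limit and the `format_sources()` helper are assumptions; the field names mirror the sample `/query` response shown later in this post.

```python
from typing import Any, Dict, List


def format_sources(documents: List[Dict[str, Any]], max_chars: int = 200) -> List[Dict[str, Any]]:
    """Keep only the fields a client needs and shorten long passages."""
    formatted = []
    for doc in documents:
        content = doc.get("content", "")
        if len(content) > max_chars:
            content = content[:max_chars] + "..."  # mark the cut explicitly
        formatted.append({
            "source": doc.get("source", "unknown"),
            "collection": doc.get("collection", "unknown"),
            "score": round(float(doc.get("score", 0.0)), 2),
            "content": content,
        })
    return formatted
```

Truncating here keeps response payloads small while still letting a reader trace each answer back to its document.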
chat_completion() - POST /chat
Function: Chat-completion style endpoint
Features:
- Extracts the latest user message from conversation history
- Builds conversation context from message history
- Processes query with conversation context
- Returns response in OpenAI-compatible format
- Includes RAG-specific metadata in response
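The first and fourth steps can be sketched as below. Both helpers are hypothetical, and the way history is flattened into text is an assumption; the response shape uses the standard `chat.completion` fields of OpenAI's chat format.

```python
from typing import Any, Dict, List, Tuple


def split_conversation(messages: List[Dict[str, str]]) -> Tuple[str, str]:
    """Return (latest user message, earlier turns rendered as plain text)."""
    for index in range(len(messages) - 1, -1, -1):
        if messages[index]["role"] == "user":
            query = messages[index]["content"]
            history = messages[:index]
            break
    else:
        raise ValueError("No user message found in conversation")
    context = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return query, context


def to_chat_completion(answer: str, model: str) -> Dict[str, Any]:
    """Wrap an answer in an OpenAI-style chat completion envelope."""
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
    }


query, context = split_conversation([
    {"role": "user", "content": "What causes overlay drift?"},
    {"role": "assistant", "content": "Thermal expansion..."},
    {"role": "user", "content": "How do I fix it?"},
])
print(query)
```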
analyze_data() - POST /analyze
Function: Specialized semiconductor data analysis endpoint
Features:
- Constructs analysis-specific query from request data
- Processes with RAG engine
- Returns analysis results with type-specific metadata
troubleshoot_issue() - POST /troubleshoot
Function: Specialized troubleshooting endpoint
Features:
- Constructs detailed troubleshooting query from symptoms and context
- Processes with RAG engine
- Returns troubleshooting guidance with issue context
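To make the query construction step concrete, here is one way structured fields could be turned into a natural language prompt. The wording of the prompt and the `build_troubleshooting_query()` helper are assumptions; only the field names come from TroubleshootingRequest.

```python
from typing import List, Optional


def build_troubleshooting_query(
    issue_description: str,
    symptoms: Optional[List[str]] = None,
    equipment_id: Optional[str] = None,
    process_step: Optional[str] = None,
    recent_changes: Optional[List[str]] = None,
) -> str:
    """Assemble a single query string, skipping fields that were not provided."""
    parts = [f"Troubleshoot the following issue: {issue_description}."]
    if symptoms:
        parts.append("Observed symptoms: " + ", ".join(symptoms) + ".")
    if equipment_id:
        parts.append(f"Equipment: {equipment_id}.")
    if process_step:
        parts.append(f"Process step: {process_step}.")
    if recent_changes:
        parts.append("Recent changes: " + ", ".join(recent_changes) + ".")
    parts.append("Provide likely root causes and recommended corrective actions.")
    return " ".join(parts)


print(build_troubleshooting_query(
    "High defect count",
    symptoms=["particles", "scratches"],
    equipment_id="ETCH-007",
))
```

The same pattern would presumably apply to the analysis and optimization endpoints, each with its own template.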
optimize_process() - POST /optimize
Function: Specialized process optimization endpoint
Features:
- Constructs optimization query from performance data and constraints
- Processes with RAG engine
- Returns optimization recommendations with technical justification
search_knowledge_base() - GET /knowledge/search
Function: Direct knowledge base access endpoint
Features:
- Performs semantic search without LLM generation
- Supports collection filtering
- Returns raw search results with scores and metadata
- Includes query intent classification
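Retrieval without generation boils down to scoring, filtering, and ranking. The toy sketch below assumes documents carry precomputed embeddings and uses cosine similarity; the real service delegates this to its vector database, so `search()` and the document layout here are purely illustrative.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    documents: List[Dict[str, Any]],
    collections: Optional[Set[str]] = None,
    top_k: int = 10,
    threshold: float = 0.7,
) -> List[Dict[str, Any]]:
    """Return the best-scoring documents, optionally limited to some collections."""
    results = []
    for doc in documents:
        if collections and doc["collection"] not in collections:
            continue  # collection filter
        score = cosine(query_vec, doc["embedding"])
        if score >= threshold:
            results.append({
                "source": doc["source"],
                "collection": doc["collection"],
                "score": score,
            })
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

No LLM call appears anywhere in this path, which is what makes the endpoint useful for debugging retrieval quality on its own.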
get_metrics() - GET /metrics
Function: Service metrics endpoint
Features:
- Returns basic service metrics
- Designed for integration with monitoring systems
- Placeholder implementation (would connect to actual metrics system)
5. Main Application Setup
FastAPI App Configuration
Features:
- Sets application title, description, and version
- Adds CORS middleware for cross-origin requests
- Uses HTTPBearer security scheme for API key authentication
main()
Function: Application entry point
Features:
- Sets up signal handlers for graceful shutdown
- Configures server host and port
- Starts uvicorn server with appropriate settings
Key Architectural Features
- RESTful API Design: Clean, standardized endpoints with appropriate HTTP methods
- Domain Specialization: Semiconductor-specific endpoints with tailored request formats
- Authentication: Configurable API key authentication with proper security practices
- Error Handling: Comprehensive error handling with appropriate HTTP status codes
- Logging and Metrics: Integrated logging and metrics collection throughout
- Configuration Management: Flexible configuration with file-based and default options
- Health Monitoring: Built-in health checks for service monitoring
- Graceful Shutdown: Proper handling of shutdown signals and cleanup
- OpenAPI Compliance: Automatic API documentation generation through FastAPI
- Extensibility: Modular design makes it easy to add new endpoints or functionality
This service layer provides a robust, production-ready API for interacting with the semiconductor RAG engine, with appropriate security, monitoring, and domain-specific functionality.
RAG Engine Service Architecture with REST API
A Mermaid diagram of the architecture and information flow for this RAG Engine Service with REST API accompanied this section (the diagram is not recoverable from the text).
Information Flow for API Requests
API Endpoints Detail
Endpoint | Method | Purpose | Input | Output |
---|---|---|---|---|
/query | POST | Process a standard RAG query | QueryRequest with query text | RAG response with answer and sources |
/chat | POST | Chat completion with conversation history | ChatRequest with message history | OpenAI-compatible chat response |
/analyze | POST | Analyze semiconductor data | AnalysisRequest with data and analysis type | Analysis results with technical insights |
/troubleshoot | POST | Troubleshoot process issues | TroubleshootingRequest with symptoms and context | Root cause analysis and solutions |
/optimize | POST | Optimize process parameters | OptimizationRequest with metrics and constraints | Optimization recommendations |
/knowledge/search | GET | Direct knowledge base search | Query string and collection filters | Raw search results without LLM generation |
/health | GET | Service health status | None | Health status of RAG engine and dependencies |
/metrics | GET | Service performance metrics | None | Operational metrics and statistics |
Security Architecture
Key Architectural Features
- RESTful API Design:
  - Standard HTTP methods and status codes
  - JSON request/response format
  - Comprehensive error handling
  - OpenAPI documentation automatically generated by FastAPI
- Authentication & Security:
  - Optional API key authentication
  - CORS enabled for cross-origin requests
  - Input validation using Pydantic models
  - Secure dependency injection for auth
- Specialized Semiconductor Endpoints:
  - Domain-specific endpoints for analysis, troubleshooting, and optimization
  - Structured input formats for technical queries
  - Context-aware processing for manufacturing scenarios
- Service Management:
  - Health checks for all dependencies
  - Metrics endpoint for monitoring
  - Graceful startup and shutdown procedures
  - Configuration management with YAML files
- Integration Capabilities:
  - Compatible with OpenAI's chat format
  - Planned support for streaming responses (SSE); the stream flag exists, but the streaming implementation isn't shown
  - Direct access to knowledge base without generation
  - Flexible context passing for complex workflows
This architecture provides a robust, scalable foundation for deploying RAG capabilities in semiconductor manufacturing environments, with appropriate security, monitoring, and domain-specific functionality.
Excellent! You've shared the REST API service layer that exposes the previously defined RAG engine as a production-ready, semiconductor-focused AI backend using FastAPI.
Below is a comprehensive breakdown of this code, followed by a Mermaid architecture diagram showing the full system with information flow, components, and endpoints.
Semiconductor RAG Engine API: Full Analysis
This service wraps the SemiconductorRAGEngine from rag_manager.py into a RESTful microservice with:
- Multiple domain-specific endpoints
- Authentication support
- Health checks
- Configurable behavior
- Streaming capability (planned)
- Observability & metrics
Key Components Overview
Component | Purpose |
---|---|
FastAPI | Modern Python web framework for building APIs with automatic OpenAPI/Swagger docs |
RAGConfig, SemiconductorRAGEngine | Reused from core engine |
Pydantic Models | Define request/response schemas |
HTTPBearer | Optional API key authentication |
YAML Config Loader | External configuration via config/rag_config.yaml |
Uvicorn | ASGI server for async performance |
SSE (planned) | Server-Sent Events for streaming responses |
Configuration: load_config() & RAGConfig
config/rag_config.yaml (Example)
rag:
  vector_db_url: http://vector-db:8091
  llm_service_url: http://llm-service:8092
  max_context_length: 4000
  top_k_documents: 10
  similarity_threshold: 0.7
  default_model: llama2-7b-semiconductor
  max_tokens: 1024
  temperature: 0.3
security:
  enable_auth: true
  api_keys:
    - "semiconductor_api_key_123"
Allows external configuration without code changes.
Authentication: verify_api_key()
- Uses HTTPBearer scheme (Authorization: Bearer <api_key>).
- If security.enable_auth == True, validates against api_keys list.
- Disabled by default if not configured.
Optional but production-ready security model.
Lifecycle: Startup & Shutdown
@app.on_event("startup")
- Loads config
- Initializes RAGConfig
- Creates singleton SemiconductorRAGEngine via get_rag_engine()
- Logs startup success or exits on failure
@app.on_event("shutdown")
- Safely shuts down the RAG engine (closes HTTP clients)
- Logs shutdown event
Ensures graceful boot and cleanup.
API Endpoints Summary
Endpoint | Type | Purpose |
---|---|---|
GET /health | Health check | System status and dependent services |
POST /query | Core RAG | General Q&A with optional config override |
POST /chat | Chat interface | Conversational AI with history |
POST /analyze | Domain-specific | Analyze process/test/yield data |
POST /troubleshoot | Domain-specific | Root cause + solutions for issues |
POST /optimize | Domain-specific | Process improvement recommendations |
GET /knowledge/search | Retrieval-only | Raw semantic search (no LLM) |
GET /metrics | Monitoring | Service-level metrics (stubbed for Prometheus) |
Request Models (Pydantic)
1. QueryRequest
{
  "query": "Why is CD uniformity poor?",
  "context": { "step": "lithography" },
  "max_tokens": 512,
  "temperature": 0.5,
  "include_sources": true,
  "stream": false
}
Used for general queries with optional overrides.
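For a sense of how a client might send such a request, here is a small standard-library sketch. It assumes the service listens on `http://localhost:8093` (the port mentioned in the deployment notes) and that authentication uses the Bearer scheme; `build_query_request()` is a hypothetical helper, and the actual network call is left commented out so the snippet runs without a live server.

```python
import json
import urllib.request


def build_query_request(query: str, api_key: str,
                        base_url: str = "http://localhost:8093",
                        **overrides) -> urllib.request.Request:
    """Prepare a POST /query request with optional per-request overrides."""
    payload = {"query": query, **overrides}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_query_request(
    "Why is CD uniformity poor?",
    api_key="semiconductor_api_key_123",
    max_tokens=512,
    include_sources=True,
)

# Against a running service, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["answer"])
```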
2. ChatRequest
{
  "messages": [
    { "role": "user", "content": "What causes overlay drift?" },
    { "role": "assistant", "content": "Thermal expansion..." },
    { "role": "user", "content": "How do I fix it?" }
  ],
  "context": {},
  "stream": false
}
Supports multi-turn conversations. Extracts last user message as query.
3. AnalysisRequest
{
  "data_type": "process_data",
  "data": { "temp": 120, "pressure": 50 },
  "analysis_type": "anomaly",
  "context": "After chamber cleaning"
}
Tailored for data-driven analysis (trend, correlation, anomaly, summary).
4. TroubleshootingRequest
{
  "issue_description": "High defect count",
  "symptoms": ["particles", "scratches"],
  "equipment_id": "ETCH-007",
  "process_step": "plasma etch",
  "recent_changes": ["new gas line installed"]
}
Guides structured troubleshooting with root cause analysis.
5. OptimizationRequest
{
  "target_metric": "yield",
  "current_performance": { "current_yield": 92.1 },
  "constraints": { "max_temp": 150 },
  "process_parameters": { "power": 800, "pressure": 45 }
}
For process optimization with constraints and justification.
Response Format Example (/query)
{
  "answer": "Poor CD uniformity can result from...",
  "confidence": 0.87,
  "response_time_ms": 1420,
  "tokens_generated": 234,
  "model_used": "llama2-7b-semiconductor",
  "query_intent": {
    "type": "analysis",
    "confidence": 0.91,
    "entities": ["CD", "nm"],
    "process_modules": ["lithography"],
    "equipment_types": []
  },
  "sources": [
    {
      "source": "litho_handbook_v3.pdf",
      "collection": "technical_reports",
      "score": 0.88,
      "content": "Critical dimension variation..."
    }
  ]
}
Rich, structured, traceable output.
Special Endpoints
GET /knowledge/search
Bypasses LLM β returns raw search results only.
Useful for:
- Building search UIs
- Debugging retrieval
- Integrating with other systems
GET /metrics
Stubbed for future Prometheus integration.
Could expose:
rag_queries_total
rag_response_time_seconds_bucket
rag_confidence_avg
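As a hedged illustration of what the stub could grow into, the sketch below keeps two of these metrics in memory and renders them in the Prometheus text exposition format. The `Metrics` class is hypothetical, and the histogram (`rag_response_time_seconds_bucket`) is omitted to keep the example short; a real deployment would more likely use an official Prometheus client library.

```python
class Metrics:
    """Tiny in-memory metrics holder (illustrative only, not thread-safe)."""

    def __init__(self) -> None:
        self.queries_total = 0
        self.confidence_sum = 0.0

    def record(self, confidence: float) -> None:
        """Record one processed query and its confidence score."""
        self.queries_total += 1
        self.confidence_sum += confidence

    def render(self) -> str:
        """Render the current values as Prometheus text exposition lines."""
        avg = self.confidence_sum / self.queries_total if self.queries_total else 0.0
        lines = [
            "# TYPE rag_queries_total counter",
            f"rag_queries_total {self.queries_total}",
            "# TYPE rag_confidence_avg gauge",
            f"rag_confidence_avg {avg:.2f}",
        ]
        return "\n".join(lines) + "\n"


metrics = Metrics()
metrics.record(0.5)
metrics.record(1.0)
print(metrics.render())
```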
Architecture Diagram (Mermaid)
Here's the complete system architecture with all components and data flow.
Information Flow Example: /troubleshoot
- User → POST /troubleshoot with issue details
- FastAPI validates input using TroubleshootingRequest
- Auth checks API key
- Service builds a natural language query from structured fields
- RAG Engine:
  - Classifies intent → troubleshooting
  - Searches failure_analysis, equipment_manuals, etc.
  - Builds context
  - Sends to LLM → generates troubleshooting steps
- Response includes:
  - Guidance
  - Sources
  - Confidence
  - Timing
- Logs & metrics recorded
Turns structured inputs into actionable expert advice.
Auto-Generated API Docs
FastAPI automatically provides:
- Swagger UI: http://localhost:8093/docs
- ReDoc: http://localhost:8093/redoc
Includes:
- Interactive testing
- Schema definitions
- Example payloads
- Status codes
Instant developer documentation.
Error Handling & Resilience
Failure Mode | Handling |
---|---|
RAG engine not ready | 503 Service Unavailable |
Invalid request | 400 Bad Request (note that FastAPI's built-in Pydantic validation returns 422 Unprocessable Entity by default) |
Missing auth | 401 Unauthorized |
Internal error | 500 Internal Server Error + log |
Service down | Health check reflects degraded status |
Robust and observable.
Deployment Notes
Run with:
python rag_service.py
Exposes:
- Port 8093: Main API
- Endpoints:
  - GET /health → Monitoring
  - GET /docs → Swagger UI
Suggested Docker Setup:
FROM python:3.10
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "rag_service.py"]
Deployable in Kubernetes, Docker, or bare metal.
Summary: Key Features
Feature | Benefit |
---|---|
Domain-Specific Endpoints | Tailored for semiconductor engineers |
Unified RAG Backend | Single engine powers chat, analysis, troubleshooting |
Configurable via YAML | No rebuild needed for endpoint tuning |
Secure (Optional) | API key protection |
Observable | Logging, metrics, health checks |
Extensible | Easy to add new endpoints (e.g., /predict_yield) |
Production-Ready | Async, scalable, resilient |
Future Enhancements
Idea | Description |
---|---|
Streaming Responses | Use StreamingResponse or EventSourceResponse for real-time LLM output |
Prometheus Integration | Export metrics for monitoring dashboards |
Rate Limiting | Prevent abuse using fastapi-limiter |
Caching | Cache frequent queries with Redis |
Feedback Endpoint | Allow users to rate responses for improvement |
Multi-Tenant Support | Isolate knowledge by fab or product line |
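To give a feel for the streaming idea, the sketch below formats generated tokens as Server-Sent Events. The `sse_events()` generator is hypothetical, the JSON payload shape is an assumption, and the closing `[DONE]` marker is borrowed from OpenAI's streaming convention rather than anything this service defines.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final end-of-stream marker."""
    for token in tokens:
        payload = json.dumps({"token": token})
        # Each SSE frame is a "data:" line terminated by a blank line.
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"


for frame in sse_events(["Check", " the", " chiller"]):
    print(frame, end="")
```

In FastAPI, a generator like this could be passed to `StreamingResponse` with `media_type="text/event-stream"`.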
Conclusion
This rag_service.py file transforms the core RAG engine into a full-fledged AI microservice tailored for semiconductor manufacturing.
It enables:
- Engineers to ask questions in natural language
- Systems to analyze data and recommend actions
- Support teams to troubleshoot faster
- Process engineers to optimize yield
With clean separation of concerns, strong typing, async support, and excellent observability, this is a model implementation of an enterprise RAG service.