Vibe Coding Forem

Y.C Lee

Code: Implement RAG Engine with Semantic Search (Service Layer)

This code implements a REST API service layer on top of the RAG engine we previously discussed. Here's a detailed explanation of each component:

1. Pydantic Request/Response Models

QueryRequest

Function: Defines the structure for standard RAG query requests
Features:

  • query: The user's question/text to process
  • context: Optional additional context for the query
  • max_tokens: Override for maximum tokens in response
  • temperature: Override for response creativity
  • include_sources: Whether to include source documents in response
  • stream: Whether to stream the response (though streaming implementation isn't shown)
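
As a concrete illustration, here is a minimal Pydantic sketch of how these fields might be declared (field names follow the list above; the defaults and validation bounds are assumptions, since the original source is not reproduced here):

from typing import Any, Dict, Optional
from pydantic import BaseModel, Field

class QueryRequest(BaseModel):
    """Standard RAG query request (sketch; defaults are assumed)."""
    query: str                                              # the user's question
    context: Optional[Dict[str, Any]] = None                # optional extra context
    max_tokens: Optional[int] = Field(default=None, gt=0)   # per-request override
    temperature: Optional[float] = Field(default=None, ge=0.0, le=2.0)
    include_sources: bool = True                             # attach source documents
    stream: bool = False                                     # streaming not implemented yet

The other request models described below follow the same pattern with their own fields.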

ChatMessage & ChatRequest

Function: Support chat-style interactions with conversation history
Features:

  • role: Participant role (system, user, assistant)
  • content: Message content
  • timestamp: When the message was created
  • messages: Complete conversation history for context-aware responses

AnalysisRequest

Function: Specialized request for semiconductor data analysis
Features:

  • data_type: Type of data to analyze (process_data, test_results, etc.)
  • data: The actual data to analyze
  • analysis_type: Type of analysis to perform (trend, correlation, etc.)
  • context: Additional context for the analysis

TroubleshootingRequest

Function: Specialized request for equipment/process troubleshooting
Features:

  • issue_description: Description of the problem
  • symptoms: List of observed symptoms
  • equipment_id: Specific equipment involved
  • process_step: Manufacturing process step where issue occurs
  • recent_changes: Any recent changes that might be relevant
  • data_context: Additional data context for troubleshooting

OptimizationRequest

Function: Specialized request for process optimization
Features:

  • target_metric: What to optimize (yield, throughput, etc.)
  • current_performance: Current performance metrics
  • constraints: Any constraints on the optimization
  • process_parameters: Current process parameters
  • historical_data: Historical data for analysis

HealthResponse

Function: Standardized health check response format
Features:

  • status: Overall service status
  • timestamp: When the health check was performed
  • rag_status: Detailed status of RAG components
  • service_info: General service information

2. Configuration Management

load_config()

Function: Loads service configuration from YAML file or provides defaults
Features:

  • Looks for config file at config/rag_config.yaml
  • Falls back to sensible defaults if file doesn't exist
  • Handles both RAG engine and security configurations
  • Provides robust error handling with logging
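
A minimal sketch of such a loader, assuming PyYAML and the config/rag_config.yaml path mentioned above (the fallback values are placeholders taken from the example configuration later in this post, not the actual defaults):

import logging
from pathlib import Path

import yaml  # PyYAML

logger = logging.getLogger(__name__)

# Assumed fallback values; the real defaults live in the original rag_service.py.
DEFAULT_CONFIG = {
    "rag": {"default_model": "llama2-7b-semiconductor", "max_tokens": 1024, "temperature": 0.3},
    "security": {"enable_auth": False, "api_keys": []},
}

def load_config(path: str = "config/rag_config.yaml") -> dict:
    """Load service configuration from YAML, falling back to defaults."""
    config_path = Path(path)
    if not config_path.exists():
        logger.warning("Config file %s not found, using defaults", config_path)
        return DEFAULT_CONFIG
    try:
        with config_path.open("r", encoding="utf-8") as f:
            return yaml.safe_load(f) or DEFAULT_CONFIG
    except Exception:
        logger.exception("Failed to load %s, using defaults", config_path)
        return DEFAULT_CONFIG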

verify_api_key()

Function: API key authentication middleware
Features:

  • Checks if authentication is enabled in configuration
  • Validates API key against configured list of valid keys
  • Integrates with FastAPI's dependency injection system
  • Returns appropriate HTTP 401 errors for invalid/missing keys
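
A hedged sketch of what this dependency could look like with FastAPI's HTTPBearer scheme (the security.enable_auth and security.api_keys keys come from the post; everything else is illustrative):

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer(auto_error=False)  # don't auto-reject; decide based on config
config = load_config()                   # loader sketched above

async def verify_api_key(
    credentials: HTTPAuthorizationCredentials = Depends(security),
) -> None:
    """Allow the request if auth is disabled or the bearer token is a valid key."""
    sec = config.get("security", {})
    if not sec.get("enable_auth", False):
        return  # authentication disabled in configuration
    if credentials is None or credentials.credentials not in sec.get("api_keys", []):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or missing API key",
        )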

3. Service Lifecycle Management

initialize_services()

Function: Initializes the RAG engine with proper configuration
Features:

  • Loads configuration using load_config()
  • Creates RAGConfig object with appropriate values
  • Initializes the global RAG engine instance using get_rag_engine()
  • Provides comprehensive error handling and logging

startup_event()

Function: FastAPI startup event handler
Features:

  • Calls initialize_services() during application startup
  • Logs successful service initialization
  • Records metrics for service startup
  • Exits application on initialization failure

shutdown_event()

Function: FastAPI shutdown event handler
Features:

  • Gracefully shuts down the RAG engine
  • Logs service shutdown completion
  • Records metrics for service stoppage
  • Handles errors during shutdown process

signal_handler()

Function: Handles OS signals for graceful shutdown
Features:

  • Catches SIGINT and SIGTERM signals
  • Logs the received signal
  • Initiates graceful application shutdown
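
Taken together, the lifecycle pieces might look roughly like this (get_rag_engine, RAGConfig, and an engine shutdown method are referenced in the post, but their exact signatures are assumptions; the app title is illustrative):

import logging
import signal
import sys

from fastapi import FastAPI

logger = logging.getLogger(__name__)
app = FastAPI(title="Semiconductor RAG Service")
rag_engine = None  # global engine instance, set during startup

@app.on_event("startup")
async def startup_event():
    # Initialize the RAG engine from config; exit if the service cannot start.
    global rag_engine
    try:
        cfg = load_config()
        rag_engine = get_rag_engine(RAGConfig(**cfg["rag"]))  # assumed factory signature
        logger.info("RAG service initialized")
    except Exception:
        logger.exception("Service initialization failed")
        sys.exit(1)

@app.on_event("shutdown")
async def shutdown_event():
    # Gracefully release engine resources (e.g. pooled HTTP clients).
    if rag_engine is not None:
        await rag_engine.shutdown()  # assumed method name
    logger.info("RAG service shut down")

def signal_handler(signum, frame):
    # Log the signal and exit; the server then runs its shutdown hooks.
    logger.info("Received signal %s, shutting down", signum)
    sys.exit(0)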

4. API Endpoint Handlers

health_check() - GET /health

Function: Service health monitoring endpoint
Features:

  • Checks if RAG engine is initialized
  • Gets detailed health status from RAG engine
  • Returns standardized health response format
  • Provides service metadata (name, version, etc.)

process_query() - POST /query

Function: Core RAG query processing endpoint
Features:

  • Validates authentication
  • Allows overriding model parameters per request
  • Processes query using RAG engine
  • Formats response with answer and metadata
  • Optionally includes source documents (truncated for efficiency)
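
A sketch of what this handler could look like, reusing the app, rag_engine, QueryRequest, and verify_api_key from the sketches above (the engine's query() call and the result's attribute names are assumptions):

from fastapi import Depends, HTTPException

@app.post("/query")
async def process_query(request: QueryRequest, _: None = Depends(verify_api_key)):
    """Run a RAG query and return the answer plus optional, truncated sources."""
    if rag_engine is None:
        raise HTTPException(status_code=503, detail="RAG engine not initialized")
    result = await rag_engine.query(       # assumed engine API
        request.query,
        context=request.context,
        max_tokens=request.max_tokens,     # per-request parameter overrides
        temperature=request.temperature,
    )
    response = {"answer": result.answer, "confidence": result.confidence}
    if request.include_sources:
        # Truncate source content so the payload stays small.
        response["sources"] = [
            {"source": s.source, "score": s.score, "content": s.content[:500]}
            for s in result.sources
        ]
    return response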

chat_completion() - POST /chat

Function: Chat-completion style endpoint
Features:

  • Extracts the latest user message from conversation history
  • Builds conversation context from message history
  • Processes query with conversation context
  • Returns response in OpenAI-compatible format
  • Includes RAG-specific metadata in response
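
The interesting part is collapsing a message history into a single RAG query and wrapping the answer in an OpenAI-style envelope; a hedged sketch under the same assumptions as above:

import time
import uuid

from fastapi import Depends, HTTPException

@app.post("/chat")
async def chat_completion(request: ChatRequest, _: None = Depends(verify_api_key)):
    """Answer the latest user message, using earlier turns as conversation context."""
    user_messages = [m for m in request.messages if m.role == "user"]
    if not user_messages:
        raise HTTPException(status_code=400, detail="No user message provided")
    latest = user_messages[-1].content
    # Flatten earlier turns into a plain-text conversation context.
    history = "\n".join(f"{m.role}: {m.content}" for m in request.messages[:-1])
    result = await rag_engine.query(latest, context={"conversation": history})  # assumed API
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": result.answer},
            "finish_reason": "stop",
        }],
        "rag_metadata": {"confidence": result.confidence},  # RAG-specific extras
    }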

analyze_data() - POST /analyze

Function: Specialized semiconductor data analysis endpoint
Features:

  • Constructs analysis-specific query from request data
  • Processes with RAG engine
  • Returns analysis results with type-specific metadata

troubleshoot_issue() - POST /troubleshoot

Function: Specialized troubleshooting endpoint
Features:

  • Constructs detailed troubleshooting query from symptoms and context
  • Processes with RAG engine
  • Returns troubleshooting guidance with issue context

optimize_process() - POST /optimize

Function: Specialized process optimization endpoint
Features:

  • Constructs optimization query from performance data and constraints
  • Processes with RAG engine
  • Returns optimization recommendations with technical justification

search_knowledge_base() - GET /knowledge/search

Function: Direct knowledge base access endpoint
Features:

  • Performs semantic search without LLM generation
  • Supports collection filtering
  • Returns raw search results with scores and metadata
  • Includes query intent classification
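
A retrieval-only handler might look like this (the engine's search() and classify_intent() methods and the result fields are assumptions based on the description above):

from typing import Optional

from fastapi import Depends

@app.get("/knowledge/search")
async def search_knowledge_base(
    query: str,
    collection: Optional[str] = None,   # optional collection filter
    limit: int = 10,
    _: None = Depends(verify_api_key),
):
    """Return raw semantic-search hits without invoking the LLM."""
    results = await rag_engine.search(query, collection=collection, top_k=limit)  # assumed API
    return {
        "query": query,
        "intent": rag_engine.classify_intent(query),  # assumed helper
        "results": [
            {"source": r.source, "collection": r.collection,
             "score": r.score, "content": r.content}
            for r in results
        ],
    }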

get_metrics() - GET /metrics

Function: Service metrics endpoint
Features:

  • Returns basic service metrics
  • Designed for integration with monitoring systems
  • Placeholder implementation (would connect to an actual metrics system)

5. Main Application Setup

FastAPI App Configuration

Features:

  • Sets application title, description, and version
  • Adds CORS middleware for cross-origin requests
  • Uses HTTPBearer security scheme for API key authentication

main()

Function: Application entry point
Features:

  • Sets up signal handlers for graceful shutdown
  • Configures server host and port
  • Starts uvicorn server with appropriate settings
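
A sketch of the app wiring and entry point, continuing the earlier sketches (the CORS settings and the 0.0.0.0 bind address are assumptions; port 8093 is the one mentioned in the deployment notes later in the post):

import signal

import uvicorn
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],    # tighten for production deployments
    allow_methods=["*"],
    allow_headers=["*"],
)

def main():
    # Register signal handlers (sketched earlier), then hand control to uvicorn.
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
    uvicorn.run(app, host="0.0.0.0", port=8093)

if __name__ == "__main__":
    main()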

Key Architectural Features

  1. RESTful API Design: Clean, standardized endpoints with appropriate HTTP methods
  2. Domain Specialization: Semiconductor-specific endpoints with tailored request formats
  3. Authentication: Configurable API key authentication with proper security practices
  4. Error Handling: Comprehensive error handling with appropriate HTTP status codes
  5. Logging and Metrics: Integrated logging and metrics collection throughout
  6. Configuration Management: Flexible configuration with file-based and default options
  7. Health Monitoring: Built-in health checks for service monitoring
  8. Graceful Shutdown: Proper handling of shutdown signals and cleanup
  9. OpenAPI Compliance: Automatic API documentation generation through FastAPI
  10. Extensibility: Modular design makes it easy to add new endpoints or functionality

This service layer provides a robust, production-ready API for interacting with the semiconductor RAG engine, with appropriate security, monitoring, and domain-specific functionality.

RAG Engine Service Architecture with REST API

I'll create a comprehensive Mermaid diagram showing the architecture and information flow for this RAG Engine Service with REST API.

Information Flow for API Requests

API Endpoints Detail

| Endpoint | Method | Purpose | Input | Output |
| --- | --- | --- | --- | --- |
| /query | POST | Process a standard RAG query | QueryRequest with query text | RAG response with answer and sources |
| /chat | POST | Chat completion with conversation history | ChatRequest with message history | OpenAI-compatible chat response |
| /analyze | POST | Analyze semiconductor data | AnalysisRequest with data and analysis type | Analysis results with technical insights |
| /troubleshoot | POST | Troubleshoot process issues | TroubleshootingRequest with symptoms and context | Root cause analysis and solutions |
| /optimize | POST | Optimize process parameters | OptimizationRequest with metrics and constraints | Optimization recommendations |
| /knowledge/search | GET | Direct knowledge base search | Query string and collection filters | Raw search results without LLM generation |
| /health | GET | Service health status | None | Health status of RAG engine and dependencies |
| /metrics | GET | Service performance metrics | None | Operational metrics and statistics |

Security Architecture

Key Architectural Features

  1. RESTful API Design:

    • Standard HTTP methods and status codes
    • JSON request/response format
    • Comprehensive error handling
    • OpenAPI documentation automatically generated by FastAPI
  2. Authentication & Security:

    • Optional API key authentication
    • CORS enabled for cross-origin requests
    • Input validation using Pydantic models
    • Secure dependency injection for auth
  3. Specialized Semiconductor Endpoints:

    • Domain-specific endpoints for analysis, troubleshooting, and optimization
    • Structured input formats for technical queries
    • Context-aware processing for manufacturing scenarios
  4. Service Management:

    • Health checks for all dependencies
    • Metrics endpoint for monitoring
    • Graceful startup and shutdown procedures
    • Configuration management with YAML files
  5. Integration Capabilities:

    • Compatible with OpenAI's chat format
    • Support for streaming responses (SSE)
    • Direct access to knowledge base without generation
    • Flexible context passing for complex workflows

This architecture provides a robust, scalable foundation for deploying RAG capabilities in semiconductor manufacturing environments, with appropriate security, monitoring, and domain-specific functionality.


Excellent! You've shared the REST API service layer that exposes the previously defined RAG engine as a production-ready, semiconductor-focused AI backend using FastAPI.

Below is a comprehensive breakdown of this code, followed by a Mermaid architecture diagram showing the full system with information flow, components, and endpoints.


🧠 Semiconductor RAG Engine API – Full Analysis

This service wraps the SemiconductorRAGEngine from rag_manager.py into a RESTful microservice with:

  • Multiple domain-specific endpoints
  • Authentication support
  • Health checks
  • Configurable behavior
  • Streaming capability (planned)
  • Observability & metrics

🔧 Key Components Overview

| Component | Purpose |
| --- | --- |
| FastAPI | Modern Python web framework for building APIs with automatic OpenAPI/Swagger docs |
| RAGConfig, SemiconductorRAGEngine | Reused from the core engine in rag_manager.py |
| Pydantic models | Define request/response schemas |
| HTTPBearer | Optional API key authentication |
| YAML config loader | External configuration via config/rag_config.yaml |
| Uvicorn | ASGI server for async performance |
| SSE (planned) | Server-Sent Events for streaming responses |

📦 Configuration: load_config() & RAGConfig

🔹 config/rag_config.yaml (Example)

rag:
  vector_db_url: http://vector-db:8091
  llm_service_url: http://llm-service:8092
  max_context_length: 4000
  top_k_documents: 10
  similarity_threshold: 0.7
  default_model: llama2-7b-semiconductor
  max_tokens: 1024
  temperature: 0.3

security:
  enable_auth: true
  api_keys:
    - "semiconductor_api_key_123"

✅ Allows external configuration without code changes.


πŸ” Authentication: verify_api_key()

  • Uses HTTPBearer scheme (Authorization: Bearer <api_key>).
  • If security.enable_auth == True, validates against api_keys list.
  • Disabled by default if not configured.

🔒 Optional but production-ready security model.
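
From a client's point of view, a call using the example key from the YAML above might look like this (the host and port match the deployment notes later in the post):

import requests

resp = requests.post(
    "http://localhost:8093/query",
    headers={"Authorization": "Bearer semiconductor_api_key_123"},  # key from the example config
    json={"query": "Why is CD uniformity poor?", "include_sources": True},
    timeout=30,
)
resp.raise_for_status()  # raises on 401 if auth is enabled and the key is wrong
print(resp.json()["answer"])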


🚀 Lifecycle: Startup & Shutdown

@app.on_event("startup")

  • Loads config
  • Initializes RAGConfig
  • Creates singleton SemiconductorRAGEngine via get_rag_engine()
  • Logs startup success or exits on failure

@app.on_event("shutdown")

  • Safely shuts down the RAG engine (closes HTTP clients)
  • Logs shutdown event

✅ Ensures graceful boot and cleanup.


📑 API Endpoints Summary

| Endpoint | Method | Category | Purpose |
| --- | --- | --- | --- |
| /health | GET | Health check | System status and dependent services |
| /query | POST | Core RAG | General Q&A with optional config override |
| /chat | POST | Chat interface | Conversational AI with history |
| /analyze | POST | Domain-specific | Analyze process/test/yield data |
| /troubleshoot | POST | Domain-specific | Root cause + solutions for issues |
| /optimize | POST | Domain-specific | Process improvement recommendations |
| /knowledge/search | GET | Retrieval-only | Raw semantic search (no LLM) |
| /metrics | GET | Monitoring | Service-level metrics (stubbed for Prometheus) |

📥 Request Models (Pydantic)

1. QueryRequest

{
  "query": "Why is CD uniformity poor?",
  "context": { "step": "lithography" },
  "max_tokens": 512,
  "temperature": 0.5,
  "include_sources": true,
  "stream": false
}

Used for general queries with optional overrides.


2. ChatRequest

{
  "messages": [
    { "role": "user", "content": "What causes overlay drift?" },
    { "role": "assistant", "content": "Thermal expansion..." },
    { "role": "user", "content": "How do I fix it?" }
  ],
  "context": {},
  "stream": false
}

Supports multi-turn conversations; the last user message is extracted as the query.


3. AnalysisRequest

{
  "data_type": "process_data",
  "data": { "temp": 120, "pressure": 50 },
  "analysis_type": "anomaly",
  "context": "After chamber cleaning"
}

Tailored for data-driven analysis (trend, correlation, anomaly, summary).


4. TroubleshootingRequest

{
  "issue_description": "High defect count",
  "symptoms": ["particles", "scratches"],
  "equipment_id": "ETCH-007",
  "process_step": "plasma etch",
  "recent_changes": ["new gas line installed"]
}

Guides structured troubleshooting with root cause analysis.


5. OptimizationRequest

{
  "target_metric": "yield",
  "current_performance": { "current_yield": 92.1 },
  "constraints": { "max_temp": 150 },
  "process_parameters": { "power": 800, "pressure": 45 }
}

For process optimization with constraints and justification.


📤 Response Format Example (/query)

{
  "answer": "Poor CD uniformity can result from...",
  "confidence": 0.87,
  "response_time_ms": 1420,
  "tokens_generated": 234,
  "model_used": "llama2-7b-semiconductor",
  "query_intent": {
    "type": "analysis",
    "confidence": 0.91,
    "entities": ["CD", "nm"],
    "process_modules": ["lithography"],
    "equipment_types": []
  },
  "sources": [
    {
      "source": "litho_handbook_v3.pdf",
      "collection": "technical_reports",
      "score": 0.88,
      "content": "Critical dimension variation..."
    }
  ]
}

✅ Rich, structured, traceable output.


πŸ› οΈ Special Endpoints

GET /knowledge/search

Bypasses the LLM and returns raw search results only.

Useful for:

  • Building search UIs
  • Debugging retrieval
  • Integrating with other systems

GET /metrics

Stubbed for future Prometheus integration.

Could expose:

  • rag_queries_total
  • rag_response_time_seconds_bucket
  • rag_confidence_avg
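
If that integration were done with the prometheus_client library, a sketch could look like this (the metric names come from the list above; the endpoint wiring reuses the app from earlier sketches and is otherwise an assumption):

from fastapi import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest

RAG_QUERIES_TOTAL = Counter("rag_queries_total", "Total RAG queries processed", ["endpoint"])
RAG_RESPONSE_TIME = Histogram("rag_response_time_seconds", "RAG response latency in seconds")
RAG_CONFIDENCE_AVG = Gauge("rag_confidence_avg", "Most recent average answer confidence")

@app.get("/metrics")
async def get_metrics():
    # Expose every registered metric in the Prometheus text exposition format.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

The Histogram automatically emits the rag_response_time_seconds_bucket series mentioned above.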

🧱 Architecture Diagram (Mermaid)

Here's the complete system architecture with all components and data flow.


πŸ” Information Flow Example: /troubleshoot

  1. User → POST /troubleshoot with issue details
  2. FastAPI validates input using TroubleshootingRequest
  3. Auth checks API key
  4. Service builds a natural language query from structured fields
  5. RAG Engine:
    • Classifies intent → troubleshooting
    • Searches failure_analysis, equipment_manuals, etc.
    • Builds context
    • Sends to LLM → generates troubleshooting steps
  6. Response includes:
    • Guidance
    • Sources
    • Confidence
    • Timing
  7. Logs & metrics recorded

🎯 Turns structured inputs into actionable expert advice.
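
Step 4 of this flow, building a natural-language query from the structured fields, might be implemented along these lines (field names come from the TroubleshootingRequest model described earlier; the phrasing is illustrative):

def build_troubleshooting_query(req: TroubleshootingRequest) -> str:
    """Turn structured troubleshooting fields into one natural-language query."""
    parts = [f"Issue: {req.issue_description}"]
    if req.symptoms:
        parts.append("Observed symptoms: " + ", ".join(req.symptoms))
    if req.equipment_id:
        parts.append(f"Equipment: {req.equipment_id}")
    if req.process_step:
        parts.append(f"Process step: {req.process_step}")
    if req.recent_changes:
        parts.append("Recent changes: " + ", ".join(req.recent_changes))
    parts.append("Suggest likely root causes and recommended corrective actions.")
    return "\n".join(parts)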


🌐 Auto-Generated API Docs

FastAPI automatically provides:

  • Swagger UI: http://localhost:8093/docs
  • ReDoc: http://localhost:8093/redoc

Includes:

  • Interactive testing
  • Schema definitions
  • Example payloads
  • Status codes

📚 Instant developer documentation.


πŸ›‘οΈ Error Handling & Resilience

| Failure Mode | Handling |
| --- | --- |
| RAG engine not ready | 503 Service Unavailable |
| Invalid request | 400 Bad Request |
| Missing auth | 401 Unauthorized |
| Internal error | 500 Internal Server Error + log |
| Service down | Health check reflects degraded status |

✅ Robust and observable.


🚀 Deployment Notes

Run with:

python rag_service.py

Exposes:

  • Port 8093: Main API
  • Endpoints:
    • GET /health → Monitoring
    • GET /docs → Swagger UI

Suggested Docker Setup:

FROM python:3.10
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "rag_service.py"]

Deployable in Kubernetes, Docker, or bare metal.


✅ Summary: Key Features

| Feature | Benefit |
| --- | --- |
| Domain-specific endpoints | Tailored for semiconductor engineers |
| Unified RAG backend | Single engine powers chat, analysis, troubleshooting |
| Configurable via YAML | No rebuild needed for endpoint tuning |
| Secure (optional) | API key protection |
| Observable | Logging, metrics, health checks |
| Extensible | Easy to add new endpoints (e.g., /predict_yield) |
| Production-ready | Async, scalable, resilient |

📈 Future Enhancements

| Idea | Description |
| --- | --- |
| Streaming responses | Use StreamingResponse or EventSourceResponse for real-time LLM output |
| Prometheus integration | Export metrics for monitoring dashboards |
| Rate limiting | Prevent abuse using fastapi-limiter |
| Caching | Cache frequent queries with Redis |
| Feedback endpoint | Allow users to rate responses for improvement |
| Multi-tenant support | Isolate knowledge by fab or product line |
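
For the streaming idea specifically, a minimal SSE sketch with FastAPI's StreamingResponse might look like this (the /query/stream path and the engine's stream_query() generator are hypothetical, and the app, QueryRequest, rag_engine, and verify_api_key come from the earlier sketches):

from fastapi import Depends
from fastapi.responses import StreamingResponse

@app.post("/query/stream")
async def stream_query(request: QueryRequest, _: None = Depends(verify_api_key)):
    """Stream generated tokens to the client as Server-Sent Events."""
    async def event_stream():
        async for token in rag_engine.stream_query(request.query):  # hypothetical async generator
            yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")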

🏁 Conclusion

This rag_service.py file transforms the core RAG engine into a full-fledged AI microservice tailored for semiconductor manufacturing.

It enables:

  • Engineers to ask questions in natural language
  • Systems to analyze data and recommend actions
  • Support teams to troubleshoot faster
  • Process engineers to optimize yield

With clean separation of concerns, strong typing, async support, and excellent observability, this is a model implementation of an enterprise RAG service.
