# AI Station - Document Analysis Platform

## 📋 Overview

AI Station is an AI-powered document analysis platform that uses Retrieval-Augmented Generation (RAG) to analyze PDF and text documents with the GLM-4.6:Cloud model.
### Technology Stack

- **Backend**: Python + Chainlit (LLM UI framework)
- **LLM**: GLM-4.6:Cloud (via Ollama Cloud)
- **Vector DB**: Qdrant (semantic search)
- **PDF Processing**: PyMuPDF (fitz)
- **Database**: PostgreSQL + SQLAlchemy ORM
- **Containerization**: Docker Compose
- **Embeddings**: nomic-embed-text (via local Ollama)
## 🚀 Quick Start

### Prerequisites

- Docker & Docker Compose
- Ollama installed locally (for embeddings)
- An Ollama Cloud account (for glm-4.6:cloud)
### 1️⃣ Clone & Setup

```bash
git clone git@github.com:your-username/ai-station.git
cd ai-station

# Configure environment
cat > .env << 'EOF'
DATABASE_URL=postgresql+asyncpg://ai_user:secure_password_here@postgres:5432/ai_station
OLLAMA_URL=http://192.168.1.243:11434
QDRANT_URL=http://qdrant:6333
EOF
```
### 2️⃣ Authenticate Ollama Cloud

```bash
ollama signin
# Follow the link to authenticate with your Ollama account
```
### 3️⃣ Start Services

```bash
docker compose up -d
docker compose logs -f chainlit-app
```
### 4️⃣ Access UI

Navigate to: http://localhost:8000
## 📁 Project Structure

```
ai-station/
├── app.py               # Main Chainlit application
├── requirements.txt     # Python dependencies
├── docker-compose.yml   # Docker services config
├── .env                 # Environment variables (gitignored)
├── workspaces/          # User workspace directories
│   └── admin/           # Admin user files
└── README.md            # This file
```
## 🔧 Features

### ✅ Implemented

- **PDF Upload & Processing**: Extract text from PDF documents using PyMuPDF
- **Document Indexing**: Automatic chunking and semantic indexing via Qdrant
- **RAG Search**: Retrieve relevant document chunks based on semantic similarity
- **Intelligent Analysis**: GLM-4.6:Cloud analyzes documents with full context
- **Code Extraction**: Automatically save Python code blocks from responses (see the sketch after this list)
- **Chat History**: Persistent conversation storage via SQLAlchemy
- **Streaming Responses**: Real-time token streaming via Chainlit
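A minimal sketch of the code-extraction step, assuming the model response arrives as plain markdown text; the `save_code_blocks` name and the `workspaces/admin` target directory are illustrative, not necessarily the exact names used in app.py:

```python
import re
from pathlib import Path

# Match fenced ```python blocks in a markdown response
CODE_BLOCK_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)

def save_code_blocks(response_text: str, out_dir: str = "workspaces/admin") -> list[Path]:
    """Extract Python code blocks from an LLM response and save each to a .py file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for i, block in enumerate(CODE_BLOCK_RE.findall(response_text)):
        path = Path(out_dir) / f"snippet_{i}.py"
        path.write_text(block)
        saved.append(path)
    return saved
```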
### 🔄 Workflow

1. User uploads a PDF or TXT file
2. System extracts the text and creates semantic chunks
3. Chunks are indexed in the Qdrant vector database
4. User asks questions about the documents
5. RAG retrieves the relevant chunks
6. GLM-4.6:Cloud analyzes them with full context
7. The response streams back to the user
## 📊 Technical Details

### Document Processing Pipeline

```
PDF Upload
    ↓
PyMuPDF Text Extraction
    ↓
Text Chunking (1500 chars, 200-char overlap)
    ↓
nomic-embed-text Embeddings (local Ollama)
    ↓
Qdrant Vector Storage
    ↓
Semantic Search on User Query
    ↓
GLM-4.6:Cloud Analysis with RAG Context
    ↓
Chainlit Streaming Response
```
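A minimal sketch of the first two pipeline stages, using the chunk size and overlap listed above; the function names match the Key Functions table below, though the exact implementation in app.py may differ:

```python
import fitz  # PyMuPDF

def extract_text_from_pdf(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)

def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that repeat the previous chunk's tail for context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```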
### Key Functions

| Function | Purpose |
|---|---|
| `extract_text_from_pdf()` | Convert PDF to text using PyMuPDF |
| `chunk_text()` | Split text into overlapping chunks |
| `get_embeddings()` | Generate embeddings via Ollama |
| `index_document()` | Store chunks in Qdrant |
| `search_qdrant()` | Retrieve relevant context |
| `on_message()` | Process user queries with RAG |
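A condensed sketch of how `get_embeddings()`, `search_qdrant()`, and `on_message()` fit together; the `documents` collection name, the `text` payload field, and top-5 retrieval are assumptions, and the real handler in app.py adds file handling and persistence:

```python
import os

import chainlit as cl
import ollama
from qdrant_client import QdrantClient

qdrant = QdrantClient(url=os.getenv("QDRANT_URL", "http://localhost:6333"))
llm = ollama.AsyncClient()

def get_embeddings(text: str) -> list[float]:
    """Embed a query or chunk with the local nomic-embed-text model."""
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def search_qdrant(query: str, top_k: int = 5) -> list[str]:
    """Return the text payloads of the chunks closest to the query."""
    hits = qdrant.search(
        collection_name="documents",
        query_vector=get_embeddings(query),
        limit=top_k,
    )
    return [hit.payload["text"] for hit in hits]

@cl.on_message
async def on_message(message: cl.Message):
    """Answer a user query with RAG context and stream the tokens back."""
    context = "\n\n".join(search_qdrant(message.content))
    reply = cl.Message(content="")
    stream = await llm.chat(
        model="glm-4.6:cloud",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": message.content},
        ],
        stream=True,
    )
    async for part in stream:
        await reply.stream_token(part["message"]["content"])
    await reply.send()
```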
## 🔐 Environment Variables

```bash
DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/ai_station
OLLAMA_URL=http://192.168.1.243:11434   # Local Ollama for embeddings
QDRANT_URL=http://qdrant:6333           # Vector database
```

**Note**: GLM-4.6:Cloud authentication is handled automatically via `ollama signin`.
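A minimal sketch of loading these variables with python-dotenv (already in requirements.txt); the localhost fallback defaults are assumptions for running outside Docker:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the project root

DATABASE_URL = os.getenv("DATABASE_URL", "postgresql+asyncpg://user:pass@localhost:5432/ai_station")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
```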
## 🐳 Docker Services

| Service | Port | Purpose |
|---|---|---|
| `chainlit-app` | 8000 | Chainlit UI & API |
| `postgres` | 5432 | Conversation persistence |
| `qdrant` | 6333 | Vector database |
| `ollama` | 11434 | Local embeddings (external) |
**Start/Stop:**

```bash
docker compose up -d      # Start all services
docker compose down       # Stop all services
docker compose logs -f    # View logs
docker compose restart    # Restart services
```
## 📝 Usage Examples

### Example 1: Analyze a Tax Document

User: "What is the total amount in the document?"

AI Station:
- ✅ Extracts the PDF content
- ✅ Searches the relevant sections
- ✅ Analyzes with GLM-4.6:Cloud
- 📄 Returns: "Based on the document, the total amount is..."

### Example 2: Multi-Document Analysis

1. Upload multiple PDFs (invoices, contracts)
2. All documents are automatically indexed
3. Query across all documents simultaneously
4. RAG retrieves the most relevant chunks
5. GLM-4.6:Cloud synthesizes the answer
## 🛠️ Development

### Install Dependencies

```bash
pip install -r requirements.txt
```

### Requirements

```
chainlit==1.3.2
pydantic==2.9.2
ollama>=0.1.0
asyncpg>=0.29.0
psycopg2-binary
qdrant-client>=1.10.0
sqlalchemy>=2.0.0
greenlet>=3.0.0
sniffio
aiohttp
alembic
pymupdf
python-dotenv
```

### Local Testing (without Docker)

```bash
# Start Ollama, PostgreSQL, and Qdrant manually
ollama serve &
chainlit run app.py
```
## 🔄 Model Details

### GLM-4.6:Cloud

- **Provider**: Zhipu AI via Ollama Cloud
- **Capabilities**: Long context, reasoning, multilingual
- **Cost**: Free tier available
- **Authentication**: Device key (automatic via `ollama signin`)

### nomic-embed-text

- Local embedding model for chunking/retrieval
- **Dimensions**: 768
- **Speed**: Fast, runs locally
- **Used for**: RAG semantic search
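A quick way to confirm the 768-dimension embeddings from a Python shell, assuming nomic-embed-text has already been pulled into the local Ollama:

```python
import ollama

# Embed a sample sentence with the local model
result = ollama.embeddings(model="nomic-embed-text", prompt="AI Station indexes documents.")
print(len(result["embedding"]))  # expected: 768
```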
## 📈 Monitoring & Logs

### Check Service Health

```bash
# View all logs
docker compose logs

# Follow live logs
docker compose logs -f chainlit-app

# Check a specific container
docker inspect ai-station-chainlit-app
```
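A small scripted health check using aiohttp and asyncpg (both already in requirements.txt); the Qdrant `/readyz` probe, localhost URLs, and credentials are assumptions for a host-side check:

```python
import asyncio

import aiohttp
import asyncpg

async def check_health():
    # Qdrant readiness probe
    async with aiohttp.ClientSession() as session:
        async with session.get("http://localhost:6333/readyz") as resp:
            print("qdrant:", "ok" if resp.status == 200 else f"status {resp.status}")
    # PostgreSQL connectivity
    conn = await asyncpg.connect("postgresql://ai_user:secure_password_here@localhost:5432/ai_station")
    ok = await conn.fetchval("SELECT 1") == 1
    print("postgres:", "ok" if ok else "failed")
    await conn.close()

asyncio.run(check_health())
```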
### Common Issues

| Issue | Solution |
|---|---|
| `unauthorized` error | Run `ollama signin` on the server |
| Database connection failed | Check that PostgreSQL is running |
| Qdrant unavailable | Verify `docker compose up` completed |
| PDF not extracted | Ensure PyMuPDF is installed: `pip install pymupdf` |
## 🚀 Deployment

### Production Checklist

- Set secure PostgreSQL credentials in `.env`
- Enable SSL/TLS for Chainlit endpoints
- Configure CORS for the frontend
- Set up log aggregation (ELK, Datadog, etc.)
- Implement rate limiting
- Add API authentication
- Configure a backup strategy for Qdrant

### Cloud Deployment Options

- **AWS**: ECS + RDS + VectorDB
- **Google Cloud**: Cloud Run + Cloud SQL
- **DigitalOcean**: App Platform + Managed Databases
## 📚 API Reference

### REST Endpoints (via Chainlit)

- `POST /api/chat` - Send a message with context
- `GET /api/threads` - List conversations
- `POST /api/upload` - Upload a document

### WebSocket

- Real-time streaming responses via the Chainlit protocol
## 🔮 Future Features
- OAuth2 Google authentication
- Document metadata extraction (dates, amounts, entities)
- Advanced search filters (type, date range, language)
- Export results (PDF, CSV, JSON)
- Analytics dashboard
- Multi-language support
- Document versioning
- Compliance reporting (GDPR, audit trails)
## 📞 Support

### Troubleshooting

- Check logs: `docker compose logs chainlit-app`
- Verify Ollama authentication: `ollama show glm-4.6:cloud`
- Test the Qdrant connection: `curl http://localhost:6333/healthz`
- Inspect PostgreSQL: `docker compose exec postgres psql -U ai_user -d ai_station`
### Performance Tips

- Increase chunk overlap for better context retrieval
- Adjust the embedding model based on latency requirements
- Monitor Qdrant memory usage for large document sets
- Implement caching for frequent queries (a sketch follows below)
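One way to cache frequent queries is to memoize the embedding call, since identical questions produce identical vectors; a minimal sketch with `functools.lru_cache` (an assumption, not what app.py currently does):

```python
from functools import lru_cache

import ollama

@lru_cache(maxsize=1024)
def get_embeddings_cached(text: str) -> tuple[float, ...]:
    """Embed text once; repeated identical queries hit the in-memory cache."""
    result = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return tuple(result["embedding"])  # tuples are hashable, so lru_cache can store them
```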
## 📄 License

MIT License - see the LICENSE file.

## 👤 Author

AI Station Team

---

**Last Updated**: December 26, 2025
**Version**: 1.0.0
**Status**: Production Ready ✅