AI Station - Document Analysis Platform

📋 Overview

AI Station is an AI-powered document analysis platform that uses Retrieval-Augmented Generation (RAG) to analyze PDF and plain-text documents with the GLM-4.6:Cloud model.

Stack Tecnologico

  • Backend: Python + Chainlit (LLM UI framework)
  • LLM: GLM-4.6:Cloud (via Ollama Cloud)
  • Vector DB: Qdrant (semantic search)
  • PDF Processing: PyMuPDF (fitz)
  • Database: PostgreSQL + SQLAlchemy ORM
  • Containerization: Docker Compose
  • Embeddings: nomic-embed-text (via Ollama local)

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Ollama installed locally (for embeddings)
  • Ollama Cloud account (for glm-4.6:cloud)

1. Clone & Setup

git clone git@github.com:your-username/ai-station.git
cd ai-station

# Configure environment
cat > .env << 'EOF'
DATABASE_URL=postgresql+asyncpg://ai_user:secure_password_here@postgres:5432/ai_station
OLLAMA_URL=http://192.168.1.243:11434
QDRANT_URL=http://qdrant:6333
EOF

2. Authenticate Ollama Cloud

ollama signin
# Follow the link to authenticate with your Ollama account

3. Start Services

docker compose up -d
docker compose logs -f chainlit-app

4. Access UI

Navigate to: http://localhost:8000


📁 Project Structure

ai-station/
├── app.py                 # Main Chainlit application
├── requirements.txt       # Python dependencies
├── docker-compose.yml     # Docker services config
├── .env                   # Environment variables (gitignored)
├── workspaces/            # User workspace directories
│   └── admin/             # Admin user files
└── README.md             # This file

🔧 Features

Implemented

  • PDF Upload & Processing: Extract text from PDF documents using PyMuPDF
  • Document Indexing: Automatic chunking and semantic indexing via Qdrant
  • RAG Search: Retrieve relevant document chunks based on semantic similarity
  • Intelligent Analysis: GLM-4.6:Cloud analyzes documents with full context
  • Code Extraction: Automatically save Python code blocks from responses
  • Chat History: Persistent conversation storage via SQLAlchemy
  • Streaming Responses: Real-time token streaming via Chainlit

🔄 Workflow

  1. User uploads PDF or TXT file
  2. System extracts text and creates semantic chunks
  3. Chunks indexed in Qdrant vector database
  4. User asks questions about documents
  5. RAG retrieves relevant chunks
  6. GLM-4.6:Cloud analyzes with full context
  7. Streaming response to user
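Steps 5–6 of the workflow above can be sketched as a prompt-assembly helper. This is a hypothetical `build_rag_prompt`, not the exact code in app.py; the real application passes the result to GLM-4.6:Cloud.

```python
# Sketch of steps 5-6: assemble the RAG prompt from retrieved chunks.
# build_rag_prompt is a hypothetical helper name, not the function in app.py.
def build_rag_prompt(question: str, chunks: list[str], max_chars: int = 6000) -> str:
    """Concatenate retrieved chunks into a context block for the LLM."""
    context, used = [], 0
    for i, chunk in enumerate(chunks, 1):
        if used + len(chunk) > max_chars:
            break  # stay inside the model's context budget
        context.append(f"[Chunk {i}]\n{chunk}")
        used += len(chunk)
    return (
        "Answer using ONLY the context below.\n\n"
        "Context:\n" + "\n\n".join(context) +
        f"\n\nQuestion: {question}"
    )

prompt = build_rag_prompt("What is the total amount?", ["Invoice total: 1,200 EUR."])
```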

📊 Technical Details

Document Processing Pipeline

PDF Upload
    ↓
PyMuPDF Text Extraction
    ↓
Text Chunking (1500 chars, 200 char overlap)
    ↓
nomic-embed-text Embeddings (Ollama local)
    ↓
Qdrant Vector Storage
    ↓
Semantic Search on User Query
    ↓
GLM-4.6:Cloud Analysis with RAG Context
    ↓
Chainlit Streaming Response
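The chunking stage above (1500 chars with 200-char overlap) can be sketched as a sliding window. The parameters mirror the pipeline diagram; the actual implementation in app.py may differ (e.g. sentence-aware splitting).

```python
# Sliding-window chunker matching the pipeline's 1500/200 parameters.
def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # slide forward, keeping `overlap` chars of context
    return chunks

chunks = chunk_text("x" * 3000)
# 3000 chars -> windows starting at 0, 1300, 2600
```

The overlap means each chunk repeats the tail of the previous one, so a sentence cut at a boundary is still retrievable as a whole from one of the two chunks.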

Key Functions

| Function | Purpose |
|----------|---------|
| extract_text_from_pdf() | Convert PDF to text using PyMuPDF |
| chunk_text() | Split text into overlapping chunks |
| get_embeddings() | Generate embeddings via Ollama |
| index_document() | Store chunks in Qdrant |
| search_qdrant() | Retrieve relevant context |
| on_message() | Process user queries with RAG |
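What `search_qdrant()` does conceptually is cosine-similarity ranking. This in-memory stand-in illustrates the ranking; in the real app Qdrant performs the same computation server-side over the stored vectors.

```python
import math

# In-memory stand-in for search_qdrant(): rank stored vectors by cosine
# similarity to the query vector. Qdrant does this ranking server-side.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, top_k=3):
    """index: list of (chunk_text, vector) pairs; returns best-matching chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

index = [("about cats", [1.0, 0.0]),
         ("about dogs", [0.0, 1.0]),
         ("cats and dogs", [0.7, 0.7])]
print(search([1.0, 0.1], index, top_k=2))
# -> ['about cats', 'cats and dogs']
```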

🔐 Environment Variables

DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/ai_station
OLLAMA_URL=http://192.168.1.243:11434          # Local Ollama for embeddings
QDRANT_URL=http://qdrant:6333                  # Vector database

Note: GLM-4.6:Cloud authentication is handled automatically via ollama signin
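A minimal loader for these variables might look like this. python-dotenv (already in requirements) is typically used to populate os.environ from the .env file first; the fallback values here are illustrative, not the app's actual defaults.

```python
import os

# Illustrative config loader. In the real app, python-dotenv's load_dotenv()
# would run first so the .env file populates os.environ.
def load_config() -> dict:
    return {
        "database_url": os.getenv(
            "DATABASE_URL",
            "postgresql+asyncpg://user:pass@postgres:5432/ai_station",
        ),
        "ollama_url": os.getenv("OLLAMA_URL", "http://localhost:11434"),
        "qdrant_url": os.getenv("QDRANT_URL", "http://qdrant:6333"),
    }

cfg = load_config()
```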


🐳 Docker Services

| Service | Port | Purpose |
|---------|------|---------|
| chainlit-app | 8000 | Chainlit UI & API |
| postgres | 5432 | Conversation persistence |
| qdrant | 6333 | Vector database |
| ollama | 11434 | Local embeddings (external) |

Start/Stop:

docker compose up -d      # Start all services
docker compose down       # Stop all services
docker compose logs -f    # View logs
docker compose restart    # Restart services

📝 Usage Examples

Example 1: Analyze Tax Document

User: "What is the total amount in the document?"
AI Station: 
  ✅ Extracts PDF content
  ✅ Searches relevant sections
  ✅ Analyzes with GLM-4.6:Cloud
  📄 Returns: "Based on the document, the total amount is..."

Example 2: Multi-Document Analysis

1. Upload multiple PDFs (invoices, contracts)
2. All documents automatically indexed
3. Query across all documents simultaneously
4. RAG retrieves most relevant chunks
5. GLM-4.6:Cloud synthesizes answer

🛠️ Development

Install Dependencies

pip install -r requirements.txt

Requirements

chainlit==1.3.2
pydantic==2.9.2
ollama>=0.1.0
asyncpg>=0.29.0
psycopg2-binary
qdrant-client>=1.10.0
sqlalchemy>=2.0.0
greenlet>=3.0.0
sniffio
aiohttp
alembic
pymupdf
python-dotenv

Local Testing (without Docker)

# Start Ollama, PostgreSQL, Qdrant manually
ollama serve &
chainlit run app.py

🔄 Model Details

GLM-4.6:Cloud

  • Provider: Zhipu AI via Ollama Cloud
  • Capabilities: Long context, reasoning, multilingual
  • Cost: Free tier available
  • Authentication: Device key (automatic via ollama signin)

nomic-embed-text

  • Local embedding model for chunking/retrieval
  • Dimensions: 768
  • Speed: Fast, runs locally
  • Used for: RAG semantic search
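The embedding call is a plain HTTP POST to the local Ollama server. This sketch only builds the request; the endpoint and field names follow Ollama's /api/embeddings API, and actually sending it requires a running server.

```python
import json
import urllib.request

# Build (but don't send) the request used to embed one chunk with
# nomic-embed-text. Ollama's response JSON carries an "embedding" list
# (768 dimensions for this model).
def build_embedding_request(text: str, base_url: str = "http://localhost:11434"):
    payload = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request("hello world")
# urllib.request.urlopen(req) would return the embedding JSON from Ollama
```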

📈 Monitoring & Logs

Check Service Health

# View all logs
docker compose logs

# Follow live logs
docker compose logs -f chainlit-app

# Check specific container
docker inspect ai-station-chainlit-app

Common Issues

| Issue | Solution |
|-------|----------|
| unauthorized error | Run ollama signin on the server |
| Database connection failed | Check that PostgreSQL is running |
| Qdrant unavailable | Verify that docker compose up completed |
| PDF not extracted | Ensure PyMuPDF is installed: pip install pymupdf |

🚀 Deployment

Production Checklist

  • Set secure PostgreSQL credentials in .env
  • Enable SSL/TLS for Chainlit endpoints
  • Configure CORS for frontend
  • Set up log aggregation (ELK, Datadog, etc.)
  • Implement rate limiting
  • Add API authentication
  • Configure backup strategy for Qdrant

Cloud Deployment Options

  • AWS: ECS + RDS + self-hosted Qdrant
  • Google Cloud: Cloud Run + Cloud SQL
  • DigitalOcean: App Platform + Managed Databases

📚 API Reference

REST Endpoints (via Chainlit)

  • POST /api/chat - Send message with context
  • GET /api/threads - List conversations
  • POST /api/upload - Upload document

WebSocket

  • Real-time streaming responses via Chainlit protocol

🔮 Future Features

  • OAuth2 Google authentication
  • Document metadata extraction (dates, amounts, entities)
  • Advanced search filters (type, date range, language)
  • Export results (PDF, CSV, JSON)
  • Analytics dashboard
  • Multi-language support
  • Document versioning
  • Compliance reporting (GDPR, audit trails)

📞 Support

Troubleshooting

  1. Check logs: docker compose logs chainlit-app
  2. Verify Ollama authentication: ollama show glm-4.6:cloud
  3. Test Qdrant connection: curl http://localhost:6333/health
  4. Inspect PostgreSQL: docker compose exec postgres psql -U ai_user -d ai_station

Performance Tips

  • Increase chunk overlap for better context retrieval
  • Adjust embedding model based on latency requirements
  • Monitor Qdrant memory usage for large document sets
  • Implement caching for frequent queries
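The last tip can start as simple memoization of the embedding lookup for repeated queries. This is a sketch with a fake embedding function; the real cached call would hit Ollama.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the "expensive" call actually runs

@lru_cache(maxsize=1024)
def embed_query(text: str) -> tuple:
    """Stand-in for the real embedding call; cached so repeats skip the work."""
    calls["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])  # fake vector for illustration

embed_query("total amount?")
embed_query("total amount?")  # served from cache: the function body does not rerun
# calls["count"] is 1
```

For a multi-process deployment, the same idea would move to a shared cache (e.g. Redis) keyed by the query text, since lru_cache is per-process.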

📄 License

MIT License - See LICENSE file

👤 Author

AI Station Team


Last Updated: December 26, 2025
Version: 1.0.0
Status: Production Ready