RAG Data Prep
Transform PDFs into vector-DB-ready chunks. Structure detection, smart chunking, PII scrubbing — all in your browser.
Free plan: 5 exports/day.
Your file stays in browser memory. Nothing is uploaded to any server.
PII detection is enhanced by an AI model that loads in your browser.
Your Files Never Leave Your Device
All processing happens entirely in your browser. Your files are never uploaded to our servers. Want proof? Once the page has loaded, turn off your Wi-Fi or enable Airplane Mode: our tools will keep working.
How It Works
1. Upload PDF: drop a document for RAG preparation.
2. Configure: choose a chunking strategy and options.
3. Export: download JSONL for your vector database.
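JSONL (JSON Lines) stores one JSON object per line, so an export can be read record by record. The sketch below is written in Python for brevity and only illustrates the format; the tool itself runs in the browser, and its actual record schema is not documented here. The function name `to_jsonl` and the fields `id`, `text`, and `metadata` are hypothetical, chosen to resemble what vector database import tools commonly expect.

```python
import json

def to_jsonl(chunks, source):
    """Serialize text chunks as JSON Lines: one JSON object per line.

    Hypothetical schema: {"id", "text", "metadata": {"source", "chunk_index"}}.
    """
    lines = []
    for i, text in enumerate(chunks):
        record = {
            "id": f"{source}#chunk-{i}",  # hypothetical ID scheme
            "text": text,
            "metadata": {"source": source, "chunk_index": i},
        }
        # json.dumps escapes newlines and other control characters (and, by
        # default, all non-ASCII), so each record occupies exactly one line.
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"

chunks = [
    "Introduction\nRAG combines retrieval with generation.",
    "Methods\nWe split PDFs into chunks.",
]
out = to_jsonl(chunks, "report.pdf")
for line in out.splitlines():
    print(json.loads(line)["id"])  # report.pdf#chunk-0, then report.pdf#chunk-1
```

Because each line parses independently, a consumer can stream a large export without loading it all at once. Map the hypothetical field names onto whatever schema your vector database client expects.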
Frequently Asked Questions
What is RAG data prep?
RAG (Retrieval-Augmented Generation) answers queries using passages retrieved from a vector database, so documents must first be split into chunks and stored there. This tool automates the preparation steps: parsing PDFs, detecting structure, smart chunking, and exporting in formats compatible with Pinecone, Weaviate, Qdrant, and ChromaDB.
Does my document leave my browser?
No. All processing — text extraction, structure detection, PII scrubbing, and chunking — happens entirely in your browser. The exported JSONL file is generated locally. No data is sent to any server.
Which chunking strategy should I use?
Document Structure (default) works best for most documents. It groups headings with their body text and keeps tables intact. Use Semantic for well-structured documents with clear sections. Use Sentence for narrative text like reports or correspondence.
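To make the difference between strategies concrete, here is a minimal Python sketch of two of them. It is not the tool's implementation: the `Block` type, its `kind` values, the function names, and the `max_chars` limit are all hypothetical, and it assumes structure detection has already labeled each block as a heading, paragraph, or table. Semantic chunking is not sketched. `structure_chunks` starts a new chunk at every heading so the heading stays with its body text, and never splits a block, so tables stay intact. `sentence_chunks` packs whole sentences up to a size limit, using a naive split on `.`, `!`, or `?` followed by whitespace (abbreviations such as "Dr." will be split incorrectly).

```python
import re
from dataclasses import dataclass

@dataclass
class Block:
    kind: str  # hypothetical labels: "heading", "paragraph", or "table"
    text: str

def structure_chunks(blocks, max_chars=1000):
    """Group each heading with the blocks that follow it; never split a block."""
    chunks, current, size = [], [], 0

    def flush():
        nonlocal current, size
        if current:
            chunks.append("\n".join(current))
        current, size = [], 0

    for block in blocks:
        # A heading always opens a new chunk; otherwise flush only on overflow.
        # Oversized blocks (e.g. a long table) are kept whole in their own chunk.
        if block.kind == "heading" or (current and size + len(block.text) > max_chars):
            flush()
        current.append(block.text)
        size += len(block.text)
    flush()
    return chunks

# Naive sentence boundary: ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentence_chunks(text, max_chars=1000):
    """Pack whole sentences into chunks of at most max_chars (a single long sentence may exceed it)."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks

doc = [
    Block("heading", "1. Scope"),
    Block("paragraph", "This policy covers all staff."),
    Block("heading", "2. Rates"),
    Block("table", "Role | Rate\nAnalyst | 40"),
]
print(structure_chunks(doc))
# ['1. Scope\nThis policy covers all staff.', '2. Rates\nRole | Rate\nAnalyst | 40']
print(sentence_chunks("First point. Second point! Third?", max_chars=25))
# ['First point.', 'Second point! Third?']
```

The structure-based grouping favors chunks that read as self-contained sections, at the cost of uneven chunk sizes. Sentence packing gives more uniform sizes but can separate a heading from the text it introduces.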
What is PII scrubbing?
PII (Personally Identifiable Information) scrubbing detects and replaces sensitive data like names, emails, phone numbers, and SSNs with tokens before export. This is critical for compliance when building RAG systems over sensitive documents.
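As an illustration of token replacement, the Python sketch below scrubs pattern-shaped PII: emails, US-style phone numbers, and SSNs. It is a simplified assumption, not the tool's detector. The patterns, the `[LABEL_n]` token format, and the function name `scrub` are hypothetical. Names generally cannot be caught by patterns like these and need a named-entity recognition model. Repeated values map to the same token, so a scrubbed chunk still shows that two mentions refer to the same person.

```python
import re

# Hypothetical, deliberately simple US-centric patterns; real detectors cover many more formats.
# SSNs are matched before phone numbers so the more specific pattern wins.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?<!\w)(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")),
]

def scrub(text):
    """Replace each PII match with a numbered token such as [EMAIL_1].

    Returns the scrubbed text and a mapping from original value to token.
    """
    mapping = {}
    counters = {}
    for label, pattern in PII_PATTERNS:
        def replace(match, label=label):
            value = match.group(0)
            if value not in mapping:
                # First time this exact value is seen: assign the next number for its label.
                counters[label] = counters.get(label, 0) + 1
                mapping[value] = f"[{label}_{counters[label]}]"
            return mapping[value]
        text = pattern.sub(replace, text)
    return text, mapping

sample = "Contact jane.doe@example.com or (555) 123-4567. SSN 123-45-6789. Again: jane.doe@example.com"
scrubbed, mapping = scrub(sample)
print(scrubbed)
# Contact [EMAIL_1] or [PHONE_1]. SSN [SSN_1]. Again: [EMAIL_1]
```

The returned mapping still contains the raw values. If you keep it for re-identification, store it separately and never include it in the exported chunks.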