RAG Data Prep

Transform PDFs into vector-DB-ready chunks. Structure detection, smart chunking, PII scrubbing — all in your browser.

Free plan: 5 exports/day

Your file stays in browser memory. Nothing is uploaded to any server.

Your Files Never Leave Your Device

All processing happens entirely in your browser. Your files are never uploaded to our servers. Want proof? Turn off your Wi-Fi or enable Airplane Mode, and the tools will still work.

How It Works

1. Upload PDF: Drop a document for RAG preparation.

2. Configure: Choose a chunking strategy and options.

3. Export: Download JSONL for your vector database.

Frequently Asked Questions

What is RAG data prep?

RAG (Retrieval-Augmented Generation) systems retrieve relevant passages at query time, so documents are typically split into chunks and stored in a vector database first. This tool automates the preparation steps: parsing PDFs, detecting structure, smart chunking, and exporting in formats compatible with Pinecone, Weaviate, Qdrant, and ChromaDB.
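
As a rough illustration of the export step, the sketch below serializes chunks to JSONL (one JSON object per line). The field names `id`, `text`, and `metadata` and the helper `chunks_to_jsonl` are assumptions made for this example, not the tool's documented schema.

```python
import json


def chunks_to_jsonl(chunks):
    """Serialize a list of chunk dicts to JSONL: one JSON object per line."""
    return "\n".join(json.dumps(chunk, ensure_ascii=False) for chunk in chunks)


# Hypothetical chunk records; the real export may use different fields.
example_chunks = [
    {"id": "doc-0", "text": "Introduction to the report.", "metadata": {"page": 1}},
    {"id": "doc-1", "text": "Quarterly results table.", "metadata": {"page": 2}},
]

jsonl = chunks_to_jsonl(example_chunks)
print(jsonl)
```

Each line can then be parsed independently with `json.loads`, which is why JSONL is convenient for loading chunks into a vector database in batches.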

Does my document leave my browser?

No. All processing — text extraction, structure detection, PII scrubbing, and chunking — happens entirely in your browser. The exported JSONL file is generated locally. No data is sent to any server.

Which chunking strategy should I use?

Document Structure (default) works best for most documents. It groups headings with their body text and keeps tables intact. Use Semantic for well-structured documents with clear sections. Use Sentence for narrative text like reports or correspondence.
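
To make the difference between strategies concrete, here is a minimal sketch of two of them. It assumes a hypothetical parsed representation in which a document is a list of `(kind, text)` blocks; the function names and the `max_chars` parameter are illustrative and not taken from the tool.

```python
import re


def chunk_by_structure(blocks):
    """Group each heading with the body blocks (paragraphs, tables) after it.

    `blocks` is a list of (kind, text) tuples where kind is "heading",
    "paragraph", or "table". Tables are never split because each block
    is appended whole.
    """
    chunks, current = [], []
    for kind, text in blocks:
        # A new heading closes the chunk that was being built.
        if kind == "heading" and current:
            chunks.append("\n".join(current))
            current = []
        current.append(text)
    if current:
        chunks.append("\n".join(current))
    return chunks


def chunk_by_sentence(text, max_chars=200):
    """Pack whole sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Structure-based chunking keeps related material together at the cost of uneven chunk sizes, while sentence packing gives more uniform sizes but ignores headings.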

What is PII scrubbing?

PII (Personally Identifiable Information) scrubbing detects sensitive data like names, emails, phone numbers, and SSNs and replaces it with placeholder tokens before export. This is critical for privacy compliance when building RAG systems over sensitive documents.
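
The token-replacement idea can be sketched with regular expressions. This is a simplified example and not the tool's actual detector: the patterns, the token format (`[EMAIL]`, `[PHONE]`, `[SSN]`), and the helper `scrub_pii` are assumptions, and it covers only US-style phone numbers and SSNs. Names are left out because they generally need a named-entity model rather than a pattern.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)")),
]


def scrub_pii(text):
    """Replace each matched PII span with a bracketed token such as [EMAIL]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(scrub_pii("Contact jane@example.com or 555-123-4567; SSN 123-45-6789."))
```

Scrubbing before chunking and export means the sensitive values never reach the vector database or the prompts built from it.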