Data Preparation
DeepTutor provides demo knowledge bases and sample questions to help you get started quickly.
Demo Knowledge Bases
We provide two pre-built knowledge bases on Google Drive:
1. Research Papers Collection
A curated collection of recent research papers from our lab, covering retrieval-augmented generation (RAG) and agent systems.
Included Papers:
- AI-Researcher - Automated research paper generation
- AutoAgent - Autonomous agent framework
- RAG-Anything - Multimodal RAG system
- LightRAG - Simple and fast RAG
- VideoRAG - Video understanding with RAG
Best for: Research scenarios, broad knowledge coverage
2. Data Science Textbook
A comprehensive deep learning textbook from UC Berkeley.
Source: Deep Representation Learning Book
Topics Covered:
- Neural Network Fundamentals
- Representation Learning
- Deep Learning Architectures
- Advanced Topics
Best for: Learning scenarios, in-depth coverage of a single subject
Download & Setup
Step 1: Download
Visit our Google Drive folder and download:
- knowledge_bases.zip - Pre-built knowledge bases with embeddings
- questions.zip - Sample questions and usage examples (optional)
Step 2: Extract
Extract the downloaded files into the data/ directory:
DeepTutor/
├── data/
│   └── knowledge_bases/
│       ├── research_papers/      # Research papers KB
│       ├── data_science_book/    # Textbook KB
│       └── kb_config.json        # Knowledge base config
└── user/                         # User data (auto-created)
Step 3: Verify
After extracting, your knowledge bases will be automatically available when you start DeepTutor.
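Before starting DeepTutor, it can help to confirm that the archive was extracted to the right place. The following is a minimal sketch, not part of DeepTutor itself: the helper name `missing_paths` is hypothetical, and the expected paths are simply copied from the directory tree above, so adjust them if your layout differs.

```python
from pathlib import Path

# Paths copied from the directory tree shown in Step 2 (assumed layout).
EXPECTED = [
    "data/knowledge_bases/research_papers",
    "data/knowledge_bases/data_science_book",
    "data/knowledge_bases/kb_config.json",
]


def missing_paths(project_root):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the DeepTutor/ project root.
    missing = missing_paths(".")
    if missing:
        print("Missing after extraction:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All demo knowledge base paths are present.")
```

An empty result only means the files are where the tree says they should be; it does not check the contents of the knowledge bases.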
Embedding Compatibility
Our demo knowledge bases use text-embedding-3-large with dimensions = 3072.
If your embedding model produces vectors with a different number of dimensions, the pre-built embeddings cannot be used with it, and you'll need to create your own knowledge base instead.
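A quick way to check compatibility is to embed a short test string with your own model and compare the vector length against 3072. The sketch below assumes you already have such a vector as a Python list; the function name `is_compatible_with_demo_kb` and the stand-in vectors are illustrative only and are not part of DeepTutor.

```python
# Dimension of the demo knowledge base embeddings, as stated above.
DEMO_EMBEDDING_DIM = 3072


def is_compatible_with_demo_kb(embedding):
    """Return True if the vector has the same length as the demo KB vectors."""
    return len(embedding) == DEMO_EMBEDDING_DIM


# Stand-in vectors; in practice, use a real embedding from your configured model.
sample_3072 = [0.0] * 3072
sample_1536 = [0.0] * 1536

print(is_compatible_with_demo_kb(sample_3072))  # True
print(is_compatible_with_demo_kb(sample_1536))  # False
```

A matching dimension is a necessary condition, not a guarantee: vectors produced by a different model are generally not comparable with the demo embeddings even when the lengths agree.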
Creating Custom Knowledge Bases
Supported File Formats
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Supports text extraction and layout analysis |
| Text | .txt | Plain text files |
| Markdown | .md | Markdown with formatting support |
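When preparing a batch of documents, it may be useful to separate supported files from unsupported ones before uploading. This is a small sketch under stated assumptions: the extension set comes from the table above, while the helper name `split_by_support` and the case-insensitive matching are choices made here, not documented DeepTutor behavior.

```python
from pathlib import Path

# Extensions listed in the supported formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}


def split_by_support(paths):
    """Split file paths into (supported, unsupported) lists of file names."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        # Assumption: extensions are compared case-insensitively.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported


ok, skipped = split_by_support(["paper.pdf", "slides.docx", "notes.MD", "readme.txt"])
print(ok)       # ['paper.pdf', 'notes.MD', 'readme.txt']
print(skipped)  # ['slides.docx']
```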
Via Web Interface
- Navigate to http://localhost:3782/knowledge
- Click "New Knowledge Base"
- Enter a unique name for your knowledge base
- Upload your documents (single or batch upload)
- Wait for processing to complete
Processing Time
- Small documents (< 10 pages): ~1 minute
- Medium documents (10-100 pages): ~5-10 minutes
- Large documents (100+ pages): May take longer
Via Command Line
# Initialize a new knowledge base with documents
python -m src.knowledge.start_kb init <kb_name> --docs <pdf_path>
# Add documents to existing knowledge base
python -m src.knowledge.add_documents <kb_name> --docs <new_document.pdf>
Data Storage Structure
All user data is stored in the data/ directory:
data/
├── knowledge_bases/          # Knowledge base storage
│   ├── <kb_name>/
│   │   ├── documents/        # Original documents
│   │   ├── chunks/           # Chunked content
│   │   ├── embeddings/       # Vector embeddings
│   │   └── graph/            # Knowledge graph data
└── user/                     # User activity data
    ├── solve/                # Problem solving results
    ├── question/             # Generated questions
    ├── research/             # Research reports
    ├── notebook/             # Notebook records
    └── logs/                 # System logs
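To see which knowledge bases exist on disk and whether each has the subdirectories listed above, a short script can walk the storage tree. This is a hedged sketch: `summarize_knowledge_bases` is a hypothetical helper, and the subdirectory names are taken from the tree in this section rather than from DeepTutor's code, so a real installation may contain additional files.

```python
from pathlib import Path

# Subdirectories shown for each knowledge base in the storage tree above.
KB_SUBDIRS = ["documents", "chunks", "embeddings", "graph"]


def summarize_knowledge_bases(data_dir="data"):
    """Map each knowledge base name to which expected subdirectories exist."""
    kb_root = Path(data_dir) / "knowledge_bases"
    summary = {}
    if not kb_root.is_dir():
        return summary
    # Only directories are treated as knowledge bases; plain files are skipped.
    for kb in sorted(p for p in kb_root.iterdir() if p.is_dir()):
        summary[kb.name] = {sub: (kb / sub).is_dir() for sub in KB_SUBDIRS}
    return summary


if __name__ == "__main__":
    for name, parts in summarize_knowledge_bases().items():
        present = [sub for sub, found in parts.items() if found]
        print(f"{name}: {', '.join(present) or 'no expected subdirectories found'}")
```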
