Document Processing - SystemSculpt Documentation

Extract text from PDFs, Word docs, Excel sheets, and PowerPoints directly into Obsidian

Process documents in 30 seconds

Drag any document into chat or note
Choose extraction mode (full text, summary, key points)
Get markdown ready to edit, link, and search

Turn locked content into living notes!

What document processing does for you

Research papers: PDF → Searchable notes with AI summaries
Meeting docs: Word files → Action items and key decisions
Data tables: Excel → Markdown tables for analysis
Presentations: PowerPoint → Structured notes with slide content

Supported formats

What you can process

Format	Extensions	Extracts
PDF	.pdf	Text, structure, pages
Word	.docx, .doc	Text, tables, formatting
Excel	.xlsx, .xls	All sheets, tables, values
PowerPoint	.pptx, .ppt	Slides, speaker notes
Text	.txt, .rtf, .csv	Direct content
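As a rough illustration of the table above, format detection by file extension could be sketched as follows. The names `SUPPORTED_FORMATS` and `detect_format` are hypothetical and are not part of SystemSculpt; only the extension list comes from this page.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format map copied from the table above.
# SUPPORTED_FORMATS and detect_format are illustrative names, not SystemSculpt APIs.
SUPPORTED_FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".doc": "Word",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".pptx": "PowerPoint",
    ".ppt": "PowerPoint",
    ".txt": "Text",
    ".rtf": "Text",
    ".csv": "Text",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a file, or None if the extension is not listed."""
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower())
```

The lookup lowercases the extension, so `Report.PDF` and `report.pdf` are treated the same way.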

Format details

PDF capabilities:

Multi-page documents
Preserves headings
Maintains paragraphs
Tables when possible
Note: Scanned PDFs need OCR first

Word features:

Bold and italic preserved
Lists maintained
Tables converted
Basic structure kept

Excel conversion:

```markdown
| Product | Q1 | Q2 | Q3 |
|---------|----|----|----|
| Widget  | 100| 150| 200|
```
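To make the conversion concrete, here is a minimal sketch of turning spreadsheet rows into a markdown table. It assumes the rows have already been read into plain Python lists; `to_markdown_table` is a hypothetical helper and not how SystemSculpt itself performs the conversion.

```python
def to_markdown_table(headers, rows):
    """Render a header list and a list of rows as a markdown table string."""

    def line(cells):
        # Escape pipes so a cell value cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "| " + " | ".join("---" for _ in headers) + " |"
    return "\n".join([line(headers), separator, *(line(r) for r in rows)])


print(to_markdown_table(["Product", "Q1", "Q2", "Q3"], [["Widget", 100, 150, 200]]))
```

The column widths are not padded here; markdown renderers do not require aligned pipes.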

PowerPoint structure:

```markdown
## Slide 1: Title
Content...

## Slide 2: Main Points
- Bullet point
- Another point

Speaker Notes: Remember to emphasize...
```
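The slide layout above could be produced by something like the following sketch. The `Slide` class and `slides_to_markdown` function are assumptions made for illustration; the real extractor's internals are not described on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    """Hypothetical container for one slide's extracted content."""
    title: str
    bullets: List[str] = field(default_factory=list)
    notes: str = ""


def slides_to_markdown(slides):
    """Render slides as '## Slide N: Title' sections with bullets and notes."""
    parts = []
    for number, slide in enumerate(slides, start=1):
        lines = [f"## Slide {number}: {slide.title}"]
        lines += [f"- {bullet}" for bullet in slide.bullets]
        if slide.notes:
            # Blank line first, so the notes are not read as part of the list.
            lines += ["", f"Speaker Notes: {slide.notes}"]
        parts.append("\n".join(lines))
    return "\n\n".join(parts)
```

Numbering slides from 1 keeps the headings in line with what the presenter sees in PowerPoint.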

How to process

Method 1: Drag and drop

Fastest approach:

Drag document into chat/note
SystemSculpt detects format
Choose processing option
Done!

Method 2: Command palette

Cmd/Ctrl + P → "Process Document"
→ Select file → Choose options

Method 3: Right-click

File Explorer → Right-click document
→ "Process with SystemSculpt"

Processing options

Extraction modes

Mode	What you get	Best for
Full Text	Complete content	Archives, reference
Summary	AI-condensed version	Quick review
Key Points	Main ideas only	Busy professionals
Structured	Headings preserved	Study notes

Output options

Where to put extracted text:

Create new note
Insert in current note
Copy to clipboard
Add to chat context

Custom prompts

Extract specific content:

"Extract only the methodology section"
"Find all dates and deadlines"
"Get financial data as tables"
"List all action items mentioned"

Real-world workflows

Research workflow

PDF papers → Knowledge base:

Collect PDFs in folder
Select all → Process batch
Choose "Structured + Summary"
Each becomes searchable note
AI helps synthesize findings

Example result:

```markdown
# Paper Title - Smith et al. 2024

## Summary
[AI-generated overview]

## Introduction
[Extracted text...]

## Methods
[Extracted text...]

## Key Findings
- Finding 1
- Finding 2

## My Notes
- [ ] Follow up on methodology
- [ ] Compare with Jones 2023
```
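Because follow-ups in a note like this are ordinary markdown task items, they can also be collected outside the plugin. The sketch below is one possible approach using a regular expression; `open_tasks` is a hypothetical helper, not a SystemSculpt feature.

```python
import re

# Matches unchecked task items such as "- [ ] Follow up on methodology".
OPEN_TASK = re.compile(r"^[ \t]*[-*] \[ \] (.+)$", re.MULTILINE)


def open_tasks(note_text):
    """Return the text of every unchecked '- [ ]' item in a markdown note."""
    return [item.strip() for item in OPEN_TASK.findall(note_text)]
```

Checked items (`- [x]`) are skipped, so the result lists only work that is still open.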

Meeting documentation

Agenda → Minutes workflow:

Process agenda (Word/PDF)
Add notes during meeting
Process minutes after
AI extracts all action items

Financial reports

Excel → Analysis:

Drop spreadsheet
Get markdown tables
Ask AI to:
- Calculate trends
- Identify anomalies
- Create summaries
- Generate insights

Course materials

Slides → Study notes:

Process all PowerPoints
Organize by lecture
AI creates:
- Study guides
- Practice questions
- Key concepts list
- Flashcards

Advanced techniques

Batch processing

Multiple files at once:

Select 10 PDFs → Right-click 
→ Process all → Creates 10 notes

Consistent naming:

Original: Report_Q1_2024.pdf
Output: Report_Q1_2024 - Extracted.md
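The naming pattern shown above can be expressed as a one-line rule. This is a sketch of the convention only; `extracted_note_name` is a hypothetical name.

```python
from pathlib import Path


def extracted_note_name(original: str) -> str:
    """Map 'Report_Q1_2024.pdf' to 'Report_Q1_2024 - Extracted.md'."""
    return f"{Path(original).stem} - Extracted.md"
```

Only the final extension is replaced, so a file named `notes.v2.docx` would become `notes.v2 - Extracted.md`.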

Combine with AI

After extraction:

Drop into chat
Ask questions:
- "Summarize the main arguments"
- "What actions are required?"
- "Compare with [other doc]"
- "Create outline for presentation"

Smart organization

Documents/
├── Original/
│   ├── contract-v1.pdf
│   └── contract-v2.pdf
├── Extracted/
│   ├── contract-v1.md
│   └── contract-v2.md
└── Analysis/
    └── contract-comparison.md
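If you keep originals and extractions in sibling folders as in the tree above, the target path for each note follows mechanically from the source path. A minimal sketch, assuming the folder names `Original` and `Extracted` from the example (`extracted_path` is a hypothetical helper):

```python
from pathlib import PurePosixPath


def extracted_path(original: str) -> str:
    """Map 'Documents/Original/x.pdf' to 'Documents/Extracted/x.md'."""
    path = PurePosixPath(original)
    if path.parent.name != "Original":
        raise ValueError(f"expected the file to sit in an 'Original' folder: {original}")
    # Swap the parent folder name and replace the extension with .md.
    return str(path.parent.with_name("Extracted") / path.with_suffix(".md").name)
```

Raising an error for files outside `Original/` avoids silently writing notes to an unexpected location.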

Tips for best results

Before processing

✅ Check file:

Under 50MB size
Has selectable text (PDFs)
Not corrupted
Supported format
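Part of this checklist can be automated before you drag files in. The sketch below checks existence, extension, and size only; it cannot tell whether a PDF has selectable text or whether a file is corrupted. The `preflight` function is hypothetical, and reading "50MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "50MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024

# Extensions listed under "Supported formats" in this guide.
SUPPORTED = {".pdf", ".docx", ".doc", ".xlsx", ".xls",
             ".pptx", ".ppt", ".txt", ".rtf", ".csv"}


def preflight(path: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    file = Path(path)
    if not file.is_file():
        return ["file not found"]
    problems = []
    if file.suffix.lower() not in SUPPORTED:
        problems.append(f"unsupported format: {file.suffix or '(no extension)'}")
    size = file.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append("file is larger than 50MB")
    return problems
```

Returning a list rather than raising on the first failure lets you report every problem with a file at once.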

✅ Prepare docs:

OCR scanned PDFs first
Save Word as .docx
Clean up Excel data
Add speaker notes to slides

Quality optimization

PDFs:

Text-based, not scanned
Clear formatting
Avoid complex layouts

Word docs:

Use styles for structure
Clean formatting
Avoid track changes

Excel sheets:

Clear headers
Simple table structure
Replace formulas with their values (only results are extracted)
Delete empty rows

PowerPoints:

Include speaker notes
Use slide titles
Keep text in shapes
Avoid heavy graphics

Limitations & workarounds

What's not extracted

Content	Workaround or behavior
Images	Add descriptions manually
Charts	Screenshot separately
Formulas	Shows results only
Macros	Ignored
Comments	Not included

Size limits

Max file: 50MB
Recommended: Under 10MB
Large files: Split first or process in sections

Format issues

"Can't extract text":

The PDF may be scanned (image-only)
Run OCR first
Try a different tool

"Formatting lost":

Complex layouts are simplified during extraction
Some manual cleanup may be needed
Focus on the content rather than the appearance

Integration examples

With templates

```markdown
## Document: {{filename}}
Processed: {{date}}
Type: {{document-type}}

### Extracted Content
<!-- Extraction appears here -->

### AI Analysis
- Summary:
- Key points:
- Actions needed:

### Related
- [[Original file]]
- [[Project notes]]
```
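How the `{{...}}` placeholders get resolved is not specified on this page. As an illustration of the idea only, a simple substitution could look like the sketch below; `fill_template` is a hypothetical function, and unknown placeholders are deliberately left untouched.

```python
import re

# Matches placeholders such as {{filename}} or {{document-type}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")


def fill_template(template: str, values: dict) -> str:
    """Replace {{name}} placeholders with values; leave unknown names as-is."""

    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)
```

For example, `fill_template("Type: {{document-type}}", {"document-type": "PDF"})` produces `Type: PDF`.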

With embeddings

Process documents
Embeddings index content
Semantic search finds passages by meaning, not just keywords
Connect related documents
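To give a feel for how indexed notes are ranked against a query, here is a toy sketch. It uses word-count vectors and cosine similarity as a stand-in for real embeddings, which come from a language model and capture meaning rather than exact word overlap. All names (`embed`, `cosine`, `search`) are hypothetical and unrelated to SystemSculpt's implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (a stand-in, not semantic)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, notes, top_k=3):
    """Return up to top_k note names ranked by similarity to the query."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(text)), name) for name, text in notes.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

Swapping `embed` for a model-generated vector is what turns this keyword-overlap ranking into semantic search; the ranking step stays the same.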

With chat

Research assistant:

You: [Drop 5 PDFs into chat]
You: "Compare the methodologies"
AI: [Analyzes all 5 documents]

You: "Create literature review outline"
AI: [Generates comprehensive outline]

Privacy & security

Secure upload: Encrypted transmission
No storage: Processed and deleted
Your data: Never used for training
Quick process: Usually under 1 minute

Troubleshooting

Issue	Fix
"Processing failed"	Check the file format, file size, and your connection
"No text found"	The PDF may need OCR, or the file may be empty
"Takes too long"	The file is large; split it or try a smaller one
"Wrong content"	Check the extraction mode

Next steps

Audio Features - Transcribe recordings too
Premium Overview - All premium benefits
Try it: Drag a PDF into Obsidian now!

📄 Start simple: Try with a small PDF first, then tackle your document backlog!