Document Processing

Extract text from PDFs, Word docs, Excel sheets, and PowerPoints directly into Obsidian

Process documents in 30 seconds

  1. Drag any document into chat or note
  2. Choose extraction mode (full text, summary, key points, or structured)
  3. Get markdown ready to edit, link, and search

Turn locked content into living notes!

What document processing does for you

  • Research papers: PDF → Searchable notes with AI summaries
  • Meeting docs: Word files → Action items and key decisions
  • Data tables: Excel → Markdown tables for analysis
  • Presentations: PowerPoint → Structured notes with slide content

Supported formats

What you can process

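| Format     | Extensions       | Extracts                   |
|------------|------------------|----------------------------|
| PDF        | .pdf             | Text, structure, pages     |
| Word       | .docx, .doc      | Text, tables, formatting   |
| Excel      | .xlsx, .xls      | All sheets, tables, values |
| PowerPoint | .pptx, .ppt      | Slides, speaker notes      |
| Text       | .txt, .rtf, .csv | Direct content             |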
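
To check ahead of time whether a file falls into one of the supported formats, the extension list can be turned into a small lookup. This is a minimal sketch in Python, not SystemSculpt's own detection logic; the names `SUPPORTED_FORMATS` and `detect_format` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format mapping copied from the supported formats table.
SUPPORTED_FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".doc": "Word",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".pptx": "PowerPoint",
    ".ppt": "PowerPoint",
    ".txt": "Text",
    ".rtf": "Text",
    ".csv": "Text",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a file, or None if the extension is not listed."""
    # Lower-case the suffix so "REPORT.PDF" and "report.pdf" are treated alike.
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower())
```

Calling `detect_format("Report_Q1_2024.pdf")` returns `"PDF"`, while a file such as `notes.md` returns `None` and can be skipped before any upload is attempted.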

Format details

PDF capabilities:

  • Multi-page documents
  • Preserves headings
  • Maintains paragraphs
  • Tables when possible
  • Note: Scanned PDFs need OCR first

Word features:

  • Bold and italic preserved
  • Lists maintained
  • Tables converted
  • Basic structure kept

Excel conversion:

```markdown
| Product | Q1 | Q2 | Q3 |
|---------|----|----|----|
| Widget  | 100| 150| 200|
```
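
The same kind of table can be produced by hand when you already have the spreadsheet rows as plain values. The sketch below is an illustration in Python of how rows map to Markdown table syntax; it is not the plugin's converter, and `rows_to_markdown` is a hypothetical helper.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table."""
    # Convert every cell to text and escape pipes so they cannot break the table.
    cleaned = [[str(cell).replace("|", "\\|") for cell in row] for row in rows]
    header, *body = cleaned
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


table = rows_to_markdown([
    ["Product", "Q1", "Q2", "Q3"],
    ["Widget", 100, 150, 200],
])
print(table)
```

Column padding is omitted on purpose: Markdown renderers ignore it, so the separator row only needs three dashes per column.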

PowerPoint structure:

```markdown
## Slide 1: Title
Content...

## Slide 2: Main Points
- Bullet point
- Another point

Speaker Notes: Remember to emphasize...
```
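
To make the slide layout concrete, here is a rough Python sketch that renders slide data into the structure shown above. The input shape (dictionaries with `title`, `bullets`, and `notes` keys) is an assumption made for illustration and says nothing about how SystemSculpt represents slides internally.

```python
def slides_to_markdown(slides):
    """Render slides as Markdown, one '## Slide N: Title' section per slide."""
    sections = []
    for number, slide in enumerate(slides, start=1):
        lines = [f"## Slide {number}: {slide['title']}"]
        # Each bullet on the slide becomes a Markdown list item.
        lines += [f"- {bullet}" for bullet in slide.get("bullets", [])]
        # Speaker notes, when present, follow the slide content after a blank line.
        if slide.get("notes"):
            lines += ["", f"Speaker Notes: {slide['notes']}"]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


deck = [
    {"title": "Title"},
    {"title": "Main Points", "bullets": ["Bullet point", "Another point"],
     "notes": "Remember to emphasize..."},
]
print(slides_to_markdown(deck))
```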

How to process

Method 1: Drag and drop

Fastest approach:

  1. Drag document into chat/note
  2. SystemSculpt detects format
  3. Choose processing option
  4. Done!

Method 2: Command palette

Cmd/Ctrl + P → "Process Document"
→ Select file → Choose options

Method 3: Right-click

File Explorer → Right-click document
→ "Process with SystemSculpt"

Processing options

Extraction modes

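| Mode       | What you get         | Best for            |
|------------|----------------------|---------------------|
| Full Text  | Complete content     | Archives, reference |
| Summary    | AI-condensed version | Quick review        |
| Key Points | Main ideas only      | Busy professionals  |
| Structured | Headings preserved   | Study notes         |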

Output options

Where to put extracted text:

  • Create new note
  • Insert in current note
  • Copy to clipboard
  • Add to chat context

Custom prompts

Extract specific content:

"Extract only the methodology section"
"Find all dates and deadlines"
"Get financial data as tables"
"List all action items mentioned"

Real-world workflows

Research workflow

PDF papers → Knowledge base:

  1. Collect PDFs in folder
  2. Select all → Process batch
  3. Choose "Structured + Summary"
  4. Each becomes searchable note
  5. AI helps synthesize findings

Example result:

```markdown
# Paper Title - Smith et al. 2024

## Summary
[AI-generated overview]

## Introduction
[Extracted text...]

## Methods
[Extracted text...]

## Key Findings
- Finding 1
- Finding 2

## My Notes
- [ ] Follow up on methodology
- [ ] Compare with Jones 2023
```
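
The follow-ups under "My Notes" use standard Markdown task syntax, so open items can be collected from a processed note with plain text matching. The Python sketch below is independent of the plugin's AI features; `open_tasks` is a hypothetical helper name.

```python
import re

# Matches unchecked task lines such as "- [ ] Follow up on methodology".
OPEN_TASK = re.compile(r"^[ \t]*[-*] \[ \] (.+)$", re.MULTILINE)


def open_tasks(note: str) -> list:
    """Return the text of every unchecked task in a Markdown note."""
    return [text.strip() for text in OPEN_TASK.findall(note)]


note = """## My Notes
- [ ] Follow up on methodology
- [x] Read abstract
- [ ] Compare with Jones 2023
"""
print(open_tasks(note))
```

Checked items (`- [x]`) are skipped, so the result lists only the work that is still outstanding.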

Meeting documentation

Agenda → Minutes workflow:

  1. Process agenda (Word/PDF)
  2. Add notes during meeting
  3. Process minutes after
  4. AI extracts all action items

Financial reports

Excel → Analysis:

  1. Drop spreadsheet
  2. Get markdown tables
  3. Ask AI to:
    • Calculate trends
    • Identify anomalies
    • Create summaries
    • Generate insights

Course materials

Slides → Study notes:

  1. Process all PowerPoints
  2. Organize by lecture
  3. AI creates:
    • Study guides
    • Practice questions
    • Key concepts list
    • Flashcards

Advanced techniques

Batch processing

Multiple files at once:

Select 10 PDFs → Right-click 
→ Process all → Creates 10 notes

Consistent naming:

  • Original: Report_Q1_2024.pdf
  • Output: Report_Q1_2024 - Extracted.md
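
The naming pattern above is simple enough to reproduce if you want to predict or look up output note names yourself. This is a minimal Python sketch of that pattern; `extracted_note_name` is a hypothetical helper, not part of the plugin.

```python
from pathlib import Path


def extracted_note_name(original: str) -> str:
    """Build the note name: the original file stem plus ' - Extracted.md'."""
    return f"{Path(original).stem} - Extracted.md"


print(extracted_note_name("Report_Q1_2024.pdf"))
```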

Combine with AI

After extraction:

  1. Drop into chat
  2. Ask questions:
    • "Summarize the main arguments"
    • "What actions are required?"
    • "Compare with [other doc]"
    • "Create outline for presentation"

Smart organization

Documents/
├── Original/
│   ├── contract-v1.pdf
│   └── contract-v2.pdf
├── Extracted/
│   ├── contract-v1.md
│   └── contract-v2.md
└── Analysis/
    └── contract-comparison.md
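
If you keep this layout by hand, the location of each extracted note follows directly from the location of its source file. The Python sketch below assumes the folder names shown in the tree (`Original/` and `Extracted/` as siblings); `extracted_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def extracted_path(original: str) -> PurePosixPath:
    """Map Documents/Original/<name>.<ext> to Documents/Extracted/<name>.md."""
    source = PurePosixPath(original)
    # Step up out of "Original/" and into the sibling "Extracted/" folder.
    return source.parent.parent / "Extracted" / (source.stem + ".md")


print(extracted_path("Documents/Original/contract-v1.pdf"))
```

`PurePosixPath` is used so the result is the same forward-slash path on every operating system, which matches how Obsidian displays vault paths.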

Tips for best results

Before processing

Check file:

  • Under 50MB size
  • Has selectable text (PDFs)
  • Not corrupted
  • Supported format

Prepare docs:

  • OCR scanned PDFs first
  • Save Word as .docx
  • Clean up Excel data
  • Add speaker notes to slides

Quality optimization

PDFs:

  • Text-based, not scanned
  • Clear formatting
  • Avoid complex layouts

Word docs:

  • Use styles for structure
  • Clean formatting
  • Avoid track changes

Excel sheets:

  • Clear headers
  • Simple table structure
  • Remove formulas
  • Delete empty rows

PowerPoints:

  • Include speaker notes
  • Use slide titles
  • Keep text in shapes
  • Avoid heavy graphics

Limitations & workarounds

What's not extracted

| Content  | Workaround                |
|----------|---------------------------|
| Images   | Add descriptions manually |
| Charts   | Screenshot separately     |
| Formulas | Shows results only        |
| Macros   | Ignored                   |
| Comments | Not included              |

Size limits

  • Max file: 50MB
  • Recommended: Under 10MB
  • Large files: Split first or process in sections
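
A quick pre-flight check against these limits can save a failed upload. The sketch below is illustrative Python; it assumes 1 MB means 1024 × 1024 bytes, which the limits above do not specify, and `size_status` is a hypothetical helper.

```python
MAX_BYTES = 50 * 1024 * 1024          # maximum file size (assumes binary megabytes)
RECOMMENDED_BYTES = 10 * 1024 * 1024  # recommended ceiling


def size_status(size_bytes: int) -> str:
    """Classify a file size against the documented limits."""
    if size_bytes > MAX_BYTES:
        return "too large: split the file or process it in sections"
    if size_bytes > RECOMMENDED_BYTES:
        return "allowed, but above the recommended size"
    return "ok"


print(size_status(12 * 1024 * 1024))
```

For a file on disk, pass the result of `os.path.getsize(path)` as `size_bytes`.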

Format issues

"Can't extract text":

  • PDF might be scanned
  • Run OCR first
  • Try different tool

"Formatting lost":

  • Complex layouts are simplified
  • Manual cleanup needed
  • Focus on content

Integration examples

With templates

```markdown
## Document: {{filename}}
Processed: {{date}}
Type: {{document-type}}

### Extracted Content
<!-- Extraction appears here -->

### AI Analysis
- Summary:
- Key points:
- Actions needed:

### Related
- [[Original file]]
- [[Project notes]]
```
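
To show how the `{{...}}` placeholders in such a template resolve, here is a small Python sketch that substitutes them from a dictionary. It only illustrates the substitution idea; the plugin's own template handling may differ, and `fill_template` is a hypothetical helper.

```python
import re

# Placeholder names may contain letters, digits, underscores, and hyphens.
PLACEHOLDER = re.compile(r"\{\{([\w-]+)\}\}")


def fill_template(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; names without a value are left untouched."""
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


header = "## Document: {{filename}}\nProcessed: {{date}}\nType: {{document-type}}"
print(fill_template(header, {
    "filename": "contract-v1.pdf",
    "date": "2024-03-01",
    "document-type": "PDF",
}))
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a missing value easy to spot in the finished note.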

With embeddings

  1. Process documents
  2. Embeddings index content
  3. Semantic search finds everything
  4. Connect related documents

With chat

Research assistant:

You: [Drop 5 PDFs into chat]
You: "Compare the methodologies"
AI: [Analyzes all 5 documents]

You: "Create literature review outline"
AI: [Generates comprehensive outline]

Privacy & security

  • Secure upload: Encrypted transmission
  • No storage: Processed and deleted
  • Your data: Never used for training
  • Quick process: Usually under 1 minute

Troubleshooting

| Issue               | Fix                            |
|---------------------|--------------------------------|
| "Processing failed" | Check format, size, connection |
| "No text found"     | PDF needs OCR, file empty      |
| "Takes too long"    | Large file, try smaller        |
| "Wrong content"     | Check extraction mode          |

Next steps


📄 Start simple: Try with a small PDF first, then tackle your document backlog!