PDF to markdown without images?

I'm looking for a way to convert large PDF or EPUB files to Markdown without images. Does anybody have a workflow for Marked 3 to accomplish this? I can turn a PDF into a Word file and then open it as Markdown in Marked, but that leaves the images intact. Does anybody have an idea how to go about this?

2 Answers

Thanks, Brett, I'll check these out. And what about converting docx to Markdown without images through Marked? Is that possible yet by using a certain style?
By the way, I want to keep the original structure of the text as much as possible. Extracting the text directly from the PDF makes a jumble of paragraphs and headers, so Preview, for example, is not an option.

There's no option to exclude images; you'd have to remove them manually. Even my PDF->Markdown importer still tries to convert images. I don't think an option to exclude images would make sense to most people, and I'm not sure when it would be useful. Care to elaborate?

Sure. I want to use the Markdown files in Claude Co-pilot in order to optimize speed and token usage. I know more people are looking for ways to create "clean" md files to feed their systems, and there will be more of them as usage proliferates and/or prices go up. So I think there is certainly demand for a tool to easily create minimal md files from image-heavy PDFs or rich text files.
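
For anyone landing here with the same need: the manual removal step mentioned above can probably be scripted after the export. Below is a minimal sketch, not a Marked feature. It assumes the Markdown has already been saved to a file and that images appear as inline `![alt](url)`, reference-style `![alt][ref]`, or raw HTML `<img>` tags. The name `strip_images` is made up for this example.

```python
import re

# Assumed image forms (adjust if your export uses something else):
# inline images such as ![alt](path "optional title")
INLINE_IMG = re.compile(r'!\[[^\]]*\]\([^)]*\)')
# reference-style images such as ![alt][ref]
REF_IMG = re.compile(r'!\[[^\]]*\]\[[^\]]*\]')
# raw HTML image tags
HTML_IMG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)


def strip_images(markdown: str) -> str:
    """Return the Markdown text with image markup removed (hypothetical helper)."""
    for pattern in (INLINE_IMG, REF_IMG, HTML_IMG):
        markdown = pattern.sub('', markdown)
    # Collapse the runs of blank lines left behind by removed images.
    return re.sub(r'\n{3,}', '\n\n', markdown)
```

To use it, read the exported file, pass the text through `strip_images`, and write the result back out. Known gaps in this sketch: image URLs containing parentheses are not handled, images wrapped in links leave an empty `[](link)` behind, and reference definitions at the bottom of the file are left in place.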