<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>IBM</title>
    <link>https://augmentedresilience.com/tags/ibm/</link>
    <description>Recent content in IBM</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 02 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://augmentedresilience.com/tags/ibm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Upgrading My PDF Converter to IBM&#39;s Docling</title>
      <link>https://augmentedresilience.com/posts/augmented-resilience-posts/upgrading-my-pdf-converter-to-ibm-docling/</link>
      <pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate>
      
      <guid>https://augmentedresilience.com/posts/augmented-resilience-posts/upgrading-my-pdf-converter-to-ibm-docling/</guid>
      <description>&lt;h2 id=&#34;when-my-own-tool-couldnt-handle-my-work&#34;&gt;When My Own Tool Couldn&amp;rsquo;t Handle My Work&lt;/h2&gt;
&lt;p&gt;The error message was easy to dismiss: &lt;code&gt;RapidOCR returned empty result!&lt;/code&gt;. It appeared twice in the terminal, then silence — a blank .md file where a 40-page Oracle HCM implementation guide should have been. The PDF had come straight from Oracle&amp;rsquo;s support portal, the same format I use for every triage session. But this one stored its pages as images, and PyMuPDF4LLM had nothing to work with.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;when-my-own-tool-couldnt-handle-my-work&#34;&gt;When My Own Tool Couldn&amp;rsquo;t Handle My Work&lt;/h2&gt;
&lt;p&gt;The error message was easy to dismiss: &lt;code&gt;RapidOCR returned empty result!&lt;/code&gt;. It appeared twice in the terminal, then silence — a blank .md file where a 40-page Oracle HCM implementation guide should have been. The PDF had come straight from Oracle&amp;rsquo;s support portal, the same format I use for every triage session. But this one stored its pages as images, and PyMuPDF4LLM had nothing to work with.&lt;/p&gt;
&lt;p&gt;That was one category of failure. The other was quieter. For documents that did convert, I started noticing the tables were wrong — not corrupted, just structurally dissolved. An eligibility matrix that should have had six clearly labeled columns came back as a run of loosely connected text. Useful for nothing.&lt;/p&gt;
&lt;p&gt;I had built this tool to serve my Oracle work. Then my Oracle work showed me exactly where it fell short.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;the-problem-with-pymupdf4llm&#34;&gt;The Problem with PyMuPDF4LLM&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;ve followed this series, you know that PyMuPDF4LLM was a solid choice when I first &lt;a href=&#34;https://augmentedresilience.com/posts/when-your-pdf-workflow-breaks-building-a-markdown-converter-with-claude-code/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;built the converter&lt;/a&gt;. It handled text-based PDFs cleanly, installed without friction, and required almost no configuration. For research papers and simple documentation, it worked well.&lt;/p&gt;
&lt;p&gt;But Oracle HCM documentation is a different category of document. Oracle&amp;rsquo;s guides are dense with tables: configuration reference grids, eligibility matrices, step-and-action setup tables. These are not decorative — they carry most of the meaning. When PyMuPDF4LLM dissolved those tables into unstructured text, it was silently degrading the most important parts of the document.&lt;/p&gt;
&lt;p&gt;The image-based PDF problem was a hard wall. If a document was captured as page images rather than extractable text, the converter returned nothing. No partial output, no meaningful error, just empty files.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;discovering-docling&#34;&gt;Discovering Docling&lt;/h2&gt;
&lt;p&gt;IBM Research Zurich&amp;rsquo;s AI for Knowledge team open-sourced &lt;a href=&#34;https://github.com/docling-project/docling&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Docling&lt;/a&gt;
 in July 2024. The project has a specific focus: turning complex documents into structured, AI-ready output. In April 2025, IBM donated it to the Linux Foundation AI &amp;amp; Data, and it now powers data ingestion for Red Hat Enterprise Linux AI. As of this writing it has over 24,000 GitHub stars.&lt;/p&gt;
&lt;p&gt;What makes Docling different is that it treats document conversion as a computer vision problem, not just a text extraction problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layout analysis:&lt;/strong&gt; Docling uses an RT-DETR-derived model trained on DocLayNet — IBM&amp;rsquo;s human-annotated dataset of real-world documents — to detect and classify every region on the page: tables, figures, headers, footers, section titles, body text. It knows the structure before it extracts any content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table reconstruction:&lt;/strong&gt; This is where Docling earns its place for Oracle documentation. It uses a vision transformer called TableFormer that predicts row/column structure and header roles directly from the page image. The result is a proper Markdown table, not a stream of cell values.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Image-based PDFs:&lt;/strong&gt; For documents stored as page images, Docling integrates OCR into its pipeline natively. The same converter handles text-based and image-based PDFs without any changes on your end.&lt;/p&gt;
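&lt;p&gt;To make the table difference concrete, here is a toy illustration in plain Python. It is not Docling code and does not model TableFormer. It only contrasts the two kinds of output for the same cells. The column names and values are hypothetical. &lt;code&gt;flatten&lt;/code&gt; mimics a dissolved table, while &lt;code&gt;to_markdown_table&lt;/code&gt; renders the GitHub-flavored Markdown pipe table you want at the end of the pipeline.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def flatten(header, rows):
    """Join every cell into one run of text, roughly what a dissolved table looks like."""
    cells = list(header) + [cell for row in rows for cell in row]
    return " ".join(cells)


def to_markdown_table(header, rows):
    """Render a header and rows as a GitHub-flavored Markdown pipe table."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row width does not match header")
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Hypothetical sample data, not taken from a real Oracle guide.
header = ["Plan", "Option", "Eligible"]
rows = [["Medical", "Employee Only", "Yes"], ["Dental", "Family", "No"]]
print(flatten(header, rows))
print(to_markdown_table(header, rows))
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the table version, a reader (or an LLM) can still tell which value belongs to which header. In the flattened string, that relationship is gone.&lt;/p&gt;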
&lt;hr&gt;
&lt;h2 id=&#34;the-switch&#34;&gt;The Switch&lt;/h2&gt;
&lt;p&gt;The API change was minimal. The old code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; pymupdf4llm
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;md_text &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pymupdf4llm&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;to_markdown(pdf_path)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The new code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; docling.document_converter &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; DocumentConverter
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;converter &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; DocumentConverter()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;result &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; converter&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;convert(pdf_path)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;md_text &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; result&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;document&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;export_to_markdown()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Three lines instead of one, but the extra structure pays dividends: &lt;code&gt;DocumentConverter&lt;/code&gt; can be initialized once and reused across an entire batch, so its models are loaded once rather than once per file. That matters when processing a folder of 50 Oracle guides.&lt;/p&gt;
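&lt;p&gt;Here is a minimal sketch of that reuse pattern, assuming one output folder per batch. &lt;code&gt;convert_batch&lt;/code&gt; and its arguments are hypothetical names and are not part of Docling. The sketch relies only on the two calls shown above, &lt;code&gt;convert()&lt;/code&gt; and &lt;code&gt;document.export_to_markdown()&lt;/code&gt;. Because the converter is passed in, the caller builds it once and every file shares it.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;from pathlib import Path


def convert_batch(converter, pdf_paths, out_dir):
    """Convert every PDF with one shared converter and write a .md file per input.

    The converter is expected to behave like Docling's DocumentConverter:
    convert(path) returns a result whose .document has export_to_markdown().
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in pdf_paths:
        # The same converter instance handles every file, so setup cost is paid once.
        result = converter.convert(pdf_path)
        md_text = result.document.export_to_markdown()
        target = out_dir / (Path(pdf_path).stem + ".md")
        target.write_text(md_text, encoding="utf-8")
        written.append(target)
    return written
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With Docling installed, a call would look like &lt;code&gt;convert_batch(DocumentConverter(), pdf_paths, "markdown-out")&lt;/code&gt;. Passing the converter in also makes the loop easy to test with a stand-in object, without downloading any models.&lt;/p&gt;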
&lt;p&gt;&lt;strong&gt;A note on startup:&lt;/strong&gt; The first time you run Docling, it downloads its ML models from Hugging Face. You will see this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Loading weights: 100%|██████████| 770/770 [00:00&amp;lt;00:00, 1656.35it/s]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This is normal. The models are cached locally after the first download, so subsequent runs skip the download step. If you see a warning about &lt;code&gt;HF_TOKEN&lt;/code&gt;, that is also expected: Docling works without one, but setting a token removes the rate-limit warning:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-zsh&#34; data-lang=&#34;zsh&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;export HF_TOKEN=&amp;#34;hf_your_token_here&amp;#34;&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.zshrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id=&#34;what-changed-in-practice&#34;&gt;What Changed in Practice&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Oracle documentation:&lt;/strong&gt; Tables that previously collapsed into text now render as proper Markdown tables. A 6-column configuration reference comes back with headers intact and every row correctly aligned.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI books:&lt;/strong&gt; My knowledge base includes dense technical books on LLM engineering and machine learning. These have complex layouts — sidebars, multi-column sections, figures with captions. Docling&amp;rsquo;s layout model handles these significantly better than PyMuPDF4LLM&amp;rsquo;s heuristic approach.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Image-based PDFs:&lt;/strong&gt; Documents that previously produced empty output now convert cleanly. The two-step workaround (running ocrmypdf to add a text layer, then pdf2md) is no longer necessary in most cases.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;two-other-improvements&#34;&gt;Two Other Improvements&lt;/h2&gt;
&lt;p&gt;While I was updating the engine, I added two things that were overdue:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DOCX support.&lt;/strong&gt; The converter now handles Word documents using pandoc as a backend. The same &lt;code&gt;pdf2md&lt;/code&gt; command works for both file types. This matters for Oracle support exports and study notes from my reMarkable.&lt;/p&gt;
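&lt;p&gt;Here is a minimal sketch of how a single command might route files by extension, assuming Docling handles PDFs and pandoc handles DOCX. The &lt;code&gt;BACKENDS&lt;/code&gt; table and the function names are hypothetical. The pandoc options (&lt;code&gt;--from&lt;/code&gt;, &lt;code&gt;--to&lt;/code&gt;, &lt;code&gt;--output&lt;/code&gt;) and the &lt;code&gt;gfm&lt;/code&gt; output format are standard pandoc. The real converter may pick a different Markdown flavor.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import subprocess
from pathlib import Path

# Hypothetical routing table: which backend handles which file extension.
BACKENDS = {".pdf": "docling", ".docx": "pandoc"}


def choose_backend(path):
    """Return the backend name for a source file, or raise for unsupported types."""
    suffix = Path(path).suffix.lower()
    if suffix not in BACKENDS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return BACKENDS[suffix]


def pandoc_command(src, dst):
    """Build a pandoc argument list that converts DOCX to GitHub-flavored Markdown."""
    return ["pandoc", str(src), "--from", "docx", "--to", "gfm", "--output", str(dst)]


def convert_docx(src, dst):
    """Run pandoc; check=True raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(src, dst), check=True)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Keeping the extension check in one table means adding another format later is a one-line change, and the command builder can be tested without pandoc installed.&lt;/p&gt;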
&lt;p&gt;&lt;strong&gt;Batch manifest.&lt;/strong&gt; When processing a large folder, the converter now writes a manifest file tracking which files have been converted and their checksums. Re-running on the same folder skips files that haven&amp;rsquo;t changed. A &lt;code&gt;--force&lt;/code&gt; flag overrides this when you need a fresh conversion.&lt;/p&gt;
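&lt;p&gt;The skip logic can be sketched with the standard library alone, assuming a JSON manifest that maps each file name to its SHA-256 checksum. The manifest file name, its layout, and every function name here are hypothetical, because the post does not describe the real format. A file is queued when it is new, when its checksum has changed, or when &lt;code&gt;force&lt;/code&gt; is set, which mirrors the &lt;code&gt;--force&lt;/code&gt; flag.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import hashlib
import json
from pathlib import Path

# Hypothetical manifest name; the real tool's file name and format may differ.
MANIFEST_NAME = ".pdf2md-manifest.json"


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(folder):
    """Read the manifest, or return an empty mapping on the first run."""
    path = Path(folder) / MANIFEST_NAME
    if path.exists():
        return json.loads(path.read_text())
    return {}


def save_manifest(folder, manifest):
    path = Path(folder) / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def files_to_convert(folder, force=False):
    """Return source files that are new or changed since the last run (all of them if force)."""
    manifest = load_manifest(folder)
    pending = []
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() not in (".pdf", ".docx"):
            continue  # skips the manifest itself and unrelated files
        if force or manifest.get(src.name) != file_sha256(src):
            pending.append(src)
    return pending


def record_converted(folder, converted):
    """Store the current checksum of each file that converted successfully."""
    manifest = load_manifest(folder)
    for src in converted:
        manifest[src.name] = file_sha256(src)
    save_manifest(folder, manifest)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Recording a checksum only after its conversion succeeds means that if a batch crashes halfway through, the failed files stay queued for the next run.&lt;/p&gt;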
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pdf2md --batch ~/oracle-pdfs/                         &lt;span style=&#34;color:#75715e&#34;&gt;# skips already-converted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pdf2md --batch ~/oracle-pdfs/ --force                 &lt;span style=&#34;color:#75715e&#34;&gt;# reconverts everything&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s Next&lt;/h2&gt;
&lt;p&gt;The web UI — which I added in the &lt;a href=&#34;https://augmentedresilience.com/posts/adding-a-web-ui-to-my-pdf-to-markdown-converter/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;last post&lt;/a&gt;
 — has also been updated to use Docling. Drag a PDF onto it, click Convert, and the same deep-learning pipeline runs behind the scenes.&lt;/p&gt;
&lt;p&gt;The next thing I want to add is direct output to the Obsidian inbox. Right now the flow is: convert → download ZIP → move to vault. A toggle that sends output directly to &lt;code&gt;~/projects/obsidian-vault/00-inbox/&lt;/code&gt; would cut that manual step entirely.&lt;/p&gt;
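&lt;p&gt;Here is a minimal sketch of that toggle, assuming the converter has already written a finished &lt;code&gt;.md&lt;/code&gt; file to disk. The inbox path comes from the paragraph above. &lt;code&gt;deliver&lt;/code&gt; and &lt;code&gt;send_to_inbox&lt;/code&gt; are hypothetical names, since the feature does not exist yet.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import shutil
from pathlib import Path

# Vault inbox path from the post; expanduser() resolves the leading "~".
OBSIDIAN_INBOX = Path("~/projects/obsidian-vault/00-inbox").expanduser()


def deliver(md_file, send_to_inbox=False, inbox=OBSIDIAN_INBOX):
    """Return where the Markdown file ends up, copying it into the inbox when the toggle is on."""
    md_file = Path(md_file)
    if not send_to_inbox:
        return md_file  # toggle off: leave the file where the converter wrote it
    inbox = Path(inbox)
    inbox.mkdir(parents=True, exist_ok=True)
    target = inbox / md_file.name
    shutil.copy2(md_file, target)  # copy2 also preserves timestamps
    return target
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Copying rather than moving keeps the existing ZIP download flow working while the toggle is tried out.&lt;/p&gt;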
&lt;p&gt;The tool is doing what I originally wanted: converting my Oracle documentation and AI library into clean, searchable Markdown. Docling is what makes that reliable for the documents that actually matter.&lt;/p&gt;
</content>
    </item>
    
  </channel>
</rss>
