How to Copy a Webpage as Plain Text (for AI)

The problem: a webpage isn’t really text

You copy a page from a company website, paste it into Copilot, and ask for a summary. The answer starts with “Skip to main content” and ends with the cookie policy. Half of what you pasted was navigation, footer, chat widget, and ad copy.

A language model doesn’t see a page. It sees a wall of text with no distinction between content and interface. The less noise you feed in, the better the answer. Here are two ways to extract just the content.

Option 1: Immersive Reader in Microsoft Edge

Edge has a button that strips a page down to its readable content. No menus, no banners, no ads. Just the text.

  1. Open the page in Edge
  2. Click the book icon in the address bar (or press F9)
  3. Choose “Immersive reader”
  4. Select all (Cmd+A or Ctrl+A), copy, and paste into your AI assistant

(Screenshot: Immersive Reader sits in the Microsoft Edge address bar, next to the star icon.)

What you get back is the page’s text without formatting or interface elements. It works on most news pages, knowledge bases, and articles. On heavy web apps (dashboards, forms) the button sometimes doesn’t appear, because there’s no clear “reading content” for Edge to extract.

Option 2: defuddle.md in front of the URL

Sometimes you don’t want to click buttons, and sometimes you’re not using Edge. There’s a trick that works in any browser: put https://defuddle.md/ in front of the URL.

So instead of:

https://www.bngbank.nl/en/about-bng

You go to:

https://defuddle.md/https://www.bngbank.nl/en/about-bng

(Screenshot: with defuddle.md in front of the URL, you get a markdown version of the page with title, source, and word count on top.)

What comes back is markdown: the text with headings, lists, and links intact, without the rest of the page. At the top you get useful metadata (title, source, language, word count) that you can copy along into your AI.
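If you do this often, the prefixing is easy to script. The sketch below is a minimal Python example, not an official defuddle.md client: `defuddle_url` and `fetch_markdown` are hypothetical helper names, and it assumes the service answers a plain scripted GET request with the markdown as text (the code uses the charset the server reports and falls back to UTF-8).

```python
from urllib.request import urlopen

DEFUDDLE_PREFIX = "https://defuddle.md/"


def defuddle_url(page_url: str) -> str:
    """Put the defuddle.md prefix in front of a full page URL."""
    page_url = page_url.strip()
    # The trick only works with the complete URL, including the scheme.
    if not page_url.startswith(("http://", "https://")):
        raise ValueError(f"expected a full URL including https://, got {page_url!r}")
    return DEFUDDLE_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Download the markdown version of a page via defuddle.md (needs network access).

    Assumption: the service accepts a plain GET request from a script.
    """
    with urlopen(defuddle_url(page_url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Only builds the URL; no request is sent here.
    print(defuddle_url("https://www.bngbank.nl/en/about-bng"))
```

Calling `fetch_markdown("https://www.bngbank.nl/en/about-bng")` would return the same markdown you see in the browser, ready to paste or pass on.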

Two alternatives that do the same thing:

  • https://markdown.new/[url]
  • https://r.jina.ai/[url]

Which one you pick is a matter of taste. Defuddle gives slightly cleaner output; Jina sometimes handles JavaScript-heavy pages better.
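To see which service suits a particular page, you can run the same URL through all three and compare. A rough Python sketch, assuming all three services accept a plain scripted GET request; `service_urls`, `rough_word_count`, and `compare` are hypothetical names, and the word count is a crude whitespace split, not the count defuddle.md reports in its metadata.

```python
from urllib.request import urlopen

# Prefixes from the article; each goes directly in front of the full page URL.
SERVICES = {
    "defuddle": "https://defuddle.md/",
    "markdown.new": "https://markdown.new/",
    "jina": "https://r.jina.ai/",
}


def service_urls(page_url: str) -> dict[str, str]:
    """Build the prefixed URL for every service."""
    page_url = page_url.strip()
    return {name: prefix + page_url for name, prefix in SERVICES.items()}


def rough_word_count(text: str) -> int:
    """Count whitespace-separated tokens: a crude size measure."""
    return len(text.split())


def compare(page_url: str, timeout: float = 30.0) -> dict[str, int]:
    """Fetch the page through each service and return a rough word count per service.

    A service whose request fails (network error, timeout, HTTP error) gets -1.
    """
    counts = {}
    for name, url in service_urls(page_url).items():
        try:
            with urlopen(url, timeout=timeout) as response:
                text = response.read().decode("utf-8", errors="replace")
                counts[name] = rough_word_count(text)
        except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
            counts[name] = -1
    return counts
```

Calling `compare("https://www.bngbank.nl/en/about-bng")` gives one count per service. A much lower count can mean less noise or missing content, so read the output before you decide.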

When to use which

Situation | Method
One page, quick summary | Immersive Reader
Behind a login or paywall | Immersive Reader (you’re already signed in)
No Edge at hand | defuddle.md in front of the URL
Page needs JavaScript to load | r.jina.ai in front of the URL
You want to save the text as markdown | defuddle.md (save the file)
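Saving the markdown to a file takes only a few lines of Python. A minimal sketch, assuming defuddle.md returns UTF-8 markdown for a plain scripted GET request; the file-naming scheme in `markdown_filename` is a hypothetical choice, not something defuddle.md provides.

```python
import re
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def markdown_filename(page_url: str) -> str:
    """Turn a page URL into a safe file name (hypothetical scheme: host + path, dashes)."""
    parts = urlparse(page_url.strip())
    # Replace every run of non-alphanumeric characters with a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parts.netloc + parts.path).strip("-")
    return (slug or "page") + ".md"


def save_markdown(page_url: str, folder: str = ".", timeout: float = 30.0) -> Path:
    """Fetch the page through defuddle.md and write the markdown to a .md file."""
    with urlopen("https://defuddle.md/" + page_url.strip(), timeout=timeout) as response:
        markdown = response.read().decode("utf-8", errors="replace")
    target = Path(folder) / markdown_filename(page_url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

`save_markdown("https://www.bngbank.nl/en/about-bng")` would write `www-bngbank-nl-en-about-bng.md` to the current folder and return its path.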

Why this matters

A lot of teams paste pages straight into Copilot or ChatGPT and then wonder why the answer is messy. The input is messy. A language model doesn’t distinguish between “the article” and “the button in the navigation” unless you draw that line for it.

This is part of context management: the content you give AI shapes the answer as much as the question you ask. Clean input, better result.

It’s the same reason you convert PDFs to plain text before handing them to AI. It takes ten seconds, and it saves you an unusable answer.

Written by Casimir Morreau

Co-founder & Lead Trainer

20+ years of experience, incl. Professor of Digital at HvA, Leadership training.
