Web Scraping & Data Pipelines · 2025
Resilient Web Content Extractor
Pulls clean markdown from JS-heavy pages, via CLI or HTTP.
Node.js · Playwright · CLI · HTTP server
The problem
Other extraction tools choked on modern, client-rendered sites, returning empty shells or broken markup, which made downstream content processing unreliable.
What I built
I built it on a headless browser that fully renders each page before extraction, then converts the result to clean markdown. The same core is wrapped in a CLI for one-off pulls and an HTTP server for programmatic use, so it slots into both scripts and larger pipelines.
The outcome
Gave every downstream system a single dependable way to turn a URL into usable, structured text.