Puppeteer, Node.js, and a 1500+ article migration
Notes from automating a large WordPress article migration with scripts instead of repeating manual browser work.
Large content migrations are rarely technically glamorous, but they are good tests of engineering judgment. A manual migration can look faster at the start, then become slow, error-prone, and difficult to verify once the article count grows.
One automation project involved migrating 1500+ WordPress articles. The useful decision was to stop treating it as repeated browser work and start treating it as a controlled workflow.
The shape of the automation
The script work centered on a few practical needs:
- reading source content consistently
- preserving titles, body structure, categories, and metadata where possible
- handling browser sessions with Puppeteer
- recording progress so failed runs could resume
- separating extraction, transformation, and publishing steps (sketched below)
The goal was not to write a clever script. The goal was to reduce manual error and make the migration observable.
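To make those boundaries concrete, here is a minimal sketch of the three steps as separate functions. Everything specific in it is an assumption for illustration: the selectors, the inline-style cleanup, the REST call shape, and the parameter names are not the original script, and real category handling would need mapping names to WordPress term IDs.

```js
// migrate.js — illustrative sketch only; selectors, cleanup rule, and the
// publishing call are assumptions, not the original migration script.
// `page` is a Puppeteer Page created by the caller (launch and login handled elsewhere).

// Extraction: browser work only; returns raw data from the source page.
async function extractArticle(page, url) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  return page.evaluate(() => ({
    title: document.querySelector('h1.entry-title')?.textContent ?? '',
    bodyHtml: document.querySelector('.entry-content')?.innerHTML ?? '',
    categories: [...document.querySelectorAll('.cat-links a')].map(a => a.textContent.trim()),
  }));
}

// Transformation: a pure function with no browser dependency, easy to test.
// (Category names would still need mapping to WordPress term IDs; omitted here.)
function transformArticle(raw) {
  return {
    title: raw.title.trim(),
    content: raw.bodyHtml.replace(/\s+style="[^"]*"/g, ''), // e.g. strip inline styles
    status: 'draft',
  };
}

// Publishing: a separate step; here a simplified WordPress REST API call
// (Node 18+ global fetch).
async function publishArticle(post, { baseUrl, auth }) {
  const res = await fetch(`${baseUrl}/wp-json/wp/v2/posts`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: auth },
    body: JSON.stringify(post),
  });
  if (!res.ok) throw new Error(`publish failed with HTTP ${res.status}`);
  return res.json();
}

// One article end to end, with each step behind its own boundary.
async function migrateOne(page, url, target) {
  const raw = await extractArticle(page, url);
  return publishArticle(transformArticle(raw), target);
}

module.exports = { extractArticle, transformArticle, publishArticle, migrateOne };
```

Keeping the transformation step free of browser calls is what makes the testing and retry pieces later in these notes straightforward.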
What mattered most
Retries and logging mattered more than speed. A fast migration that fails silently is worse than a slower one that tells you exactly which article failed and why.
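A minimal version of that idea is a retry wrapper that records every failure before giving up. The log file name, attempt limit, and backoff values below are placeholders, not the original script's settings.

```js
// retry.js — a minimal sketch; log file name and attempt limits are placeholders.
const fs = require('node:fs');

async function withRetries(label, fn, { attempts = 3, baseDelayMs = 2000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Record exactly which article failed, on which attempt, and why.
      const line = `${new Date().toISOString()}\t${label}\tattempt ${attempt}\t${err.message}\n`;
      fs.appendFileSync('migration-errors.log', line);
      if (attempt === attempts) throw err; // surface the failure, never swallow it
      await new Promise(r => setTimeout(r, baseDelayMs * attempt)); // simple linear backoff
    }
  }
}

// Usage: await withRetries(articleUrl, () => migrateOne(page, articleUrl, target));
module.exports = { withRetries };
```

Linear backoff is enough here; the point is that every failed attempt leaves a trace that can be reviewed after the run.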
I also learned to keep transformation logic separate from browser automation. Browser automation is already fragile because it depends on page structure, timing, and logged-in state. Mixing it with content cleanup makes the whole script harder to reason about.
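Because the transformation step has no browser dependency, it can be exercised with a plain assert run and a made-up sample, without launching Puppeteer at all. The sample input below is invented for illustration and checks the hypothetical `transformArticle` from the earlier sketch.

```js
// transform.test.js — the transform is pure, so it can be checked without a browser.
const assert = require('node:assert');
const { transformArticle } = require('./migrate');

const sample = {
  title: '  Hello World  ',
  bodyHtml: '<p style="color:red">Body</p>',
  categories: ['News '],
};

const out = transformArticle(sample);
assert.strictEqual(out.title, 'Hello World');
assert.ok(!out.content.includes('style='));
console.log('transform ok');
```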
The practical lesson
Automation becomes valuable when it turns repeated work into a reviewable process. For migration work, that means checkpoints, logs, resumability, and clear boundaries between steps.
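Concretely, resumability can be as simple as an append-only progress file that a re-run consults before touching anything. The file name and record fields below are assumptions for illustration, not the original checkpoint format.

```js
// checkpoint.js — sketch of resumable progress tracking; file name and record
// fields are assumptions.
const fs = require('node:fs');

const CHECKPOINT_FILE = 'migration-progress.jsonl';

// Read back every article that previous runs already finished.
function loadCompleted() {
  if (!fs.existsSync(CHECKPOINT_FILE)) return new Set();
  return new Set(
    fs.readFileSync(CHECKPOINT_FILE, 'utf8')
      .split('\n')
      .filter(Boolean)
      .map(line => JSON.parse(line).sourceUrl)
  );
}

// Append one record per successful article, immediately after it is published.
function markCompleted(sourceUrl, targetId) {
  const record = { sourceUrl, targetId, at: new Date().toISOString() };
  fs.appendFileSync(CHECKPOINT_FILE, JSON.stringify(record) + '\n');
}

module.exports = { loadCompleted, markCompleted };

// A resumed run then just skips what is already recorded:
//   const done = loadCompleted();
//   for (const url of articleUrls) {
//     if (done.has(url)) continue;
//     const result = await withRetries(url, () => migrateOne(page, url, target));
//     markCompleted(url, result.id);
//   }
```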