How do ad spy tools capture native ads without clicking them?

They request the same JSON recommendation feed the publisher's widget calls to fill its slots, then read the ad data straight from that response. The advertiser, headline, creative image URL, and click URL all arrive in the payload, so the tool never has to render or click a live unit to get the data.

Is scraping native ads from Taboola or Outbrain legal?

Reading publicly served ad data is generally treated like accessing public web content, and transparency regulation is trending toward more public ad data, not less. Reputable tools capture only public ads and never click live units in a way that bills an advertiser. This is general information, not legal advice, so review each platform's terms for your use case.

How do spy tools follow a click to the landing page without spending the advertiser's budget?

They resolve the click URL out-of-band, replaying the redirect chain server-side (often from a clean residential exit in the right geo) instead of firing a billable click on a live impression. That returns the destination URL and the landing page, including any pre-lander, without charging the advertiser a cent.

What does it mean to classify the ad supply chain?

It means parsing every intermediary between the widget and the landing page (the demand platform, exchanges, trackers, and redirect domains) and labelling each one. Done right, it reveals the real advertiser behind a generic branding name, the network that served the ad, and the full tracking stack behind a creative.

Why do screenshot-based ad spy tools miss ads?

Native widgets are personalized and rotate constantly, so a single page render only captures the handful of ads served to one session, in one geo, at one moment. Feed-level harvesting across many geos, devices, and personas catches far more of the rotation, which is how OpenAdLibrary's index reached 157,727 Taboola creatives and 84,252 on Outbrain by June 2026.


How Ad Spy Tools Capture Native Ads (Supply Chain Explained)

The best native ad spy tools don't screenshot pages; they read the same JSON feed the widget calls, store the full-quality creative, and trace the click to the landing page without ever spending an advertiser's budget. Here's exactly how.

The OpenAdLibrary Team · Ad intelligence & native advertising research

June 30, 2026 · 12 min read

Diagram of the native ad supply chain showing the publisher widget, demand platform, redirect and tracking hops, and the advertiser landing page captured by an ad spy tool

Ask most people to picture an ad spy tool and they imagine a robot that loads web pages and screenshots the ads it finds. That picture is wrong, and the gap is the whole story. A screenshot only captures what one browser session was handed at one moment, in one country. That is a tiny, biased slice of a feed that rotates thousands of times a day and personalizes for every visitor.

The tools that build reliable native-ad datasets work a layer below the pixels. They speak the same machine-to-machine language a publisher's widget uses to fetch its ads, and they rebuild the entire path from the slot on the page to the advertiser's landing page. This piece walks that path end to end, and it leans on the actual architecture OpenAdLibrary runs to do it at scale: 589,036 creatives, 25,933 advertisers, and 5.4 million ad observations across 42 networks (OpenAdLibrary index, June 2026).

If you want to know what is under the hood of a native ad spy tool before you trust its data, this is for you.

How do ad spy tools capture native ads?

Ad spy tools capture native ads by calling the same recommendation API that fills a publisher's widget, then reading the advertiser, headline, image, and click URL straight from the JSON response. They store the creative at full quality, resolve the click URL out-of-band to reach the landing page, and parse the redirect chain to label every intermediary in the supply chain. No live ad ever gets clicked.

That is the whole game in one paragraph. The rest of this piece unpacks each step, explains why the obvious approaches fail, and shows what separates a thin dataset from a deep one. For a conceptual primer, start with what a native ad spy tool actually is and how it differs from social-ad libraries.

The native ad supply chain, hop by hop

You can't capture something you don't understand. A native ad is not a static image dropped into a page. It is the visible tip of a supply chain with several moving parts. Here is the path a single Taboola or Outbrain unit travels:

1. The publisher page loads a native ad widget. That is the "Around the Web" or "You May Like" box. At first paint it is mostly an empty container.
2. The widget script calls a recommendation endpoint. This is a JSON API on the demand platform's domain. It passes the publisher ID, the slot, the user's approximate geo, and device signals.
3. The demand platform runs an auction. In programmatic native, advertisers bid for that impression in milliseconds, and the winners come back in the response.
4. The response returns the ads as structured data. Each item carries a title, a thumbnail image URL, a branding or advertiser name, and a click-tracking URL.
5. The click URL is a redirect, not a destination. Click it and you bounce through one or more tracker and exchange domains before you reach the advertiser's real page, which is frequently a pre-lander or advertorial rather than a product page.

The single most useful realization in native ad intelligence: the ad data you want already arrives as clean JSON before a single pixel is painted. Screenshotting the rendered widget is reverse-engineering something you could have just read.

Every step is a capture opportunity and a classification problem at the same time. Steps 2 and 4 hand you the creative and the advertiser. Step 5 hands you the supply chain and the landing page. Tools differ wildly in how many of these they actually use.
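As a minimal sketch of steps 2 and 4, the snippet below parses a feed response into ad records. The JSON shape and the field names (`items`, `title`, `thumbnail_url`, `branding`, `click_url`) are assumptions made for illustration; real feeds differ per network and change over time.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class NativeAd:
    title: str
    image_url: str
    branding: str
    click_url: str


def parse_feed(raw: str) -> list[NativeAd]:
    """Turn a recommendation-feed response into ad records.

    The schema here is hypothetical; map the keys to whatever the
    network you are studying actually returns.
    """
    payload = json.loads(raw)
    ads = []
    for item in payload.get("items", []):
        # Skip items that lack a click URL: they cannot be traced later.
        if not item.get("click_url"):
            continue
        ads.append(
            NativeAd(
                title=item.get("title", "").strip(),
                image_url=item.get("thumbnail_url", ""),
                branding=item.get("branding", ""),
                click_url=item["click_url"],
            )
        )
    return ads


# Made-up sample response; every host and value is a placeholder.
SAMPLE = json.dumps({
    "items": [
        {
            "title": " Example headline ",
            "thumbnail_url": "https://cdn.example.com/a.jpg",
            "branding": "Example Brand",
            "click_url": "https://track.example.com/c?id=1",
        },
        {"title": "Organic item without a click URL"},
    ]
})

ads = parse_feed(SAMPLE)
print(ads[0].title, "|", ads[0].branding)
```

Nothing in this path renders a page or touches a live unit: the record is built entirely from the response body.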

API-only harvesting vs browser screenshotting

There are two fundamentally different ways to harvest native ads, and the choice cascades into cost, coverage, and data quality.

Approach	How it works	Coverage	Cost to run	Creative quality
Browser screenshot	Headless browser loads pages, renders widgets, captures pixels	Low: one rotation per render	High: full Chromium per page	Lossy, cropped to viewport
API-only harvest	Requests the recommendation feed directly, reads JSON	High: many rotations, geos, personas	Low: no browser needed	Original asset, full resolution

The browser route is the obvious one, and plenty of legacy tools started there. It is also expensive and shallow. Spinning up real Chromium for every page is heavy, slow, and easy to fingerprint and block. Worse, a render only ever shows you the few ads served to that one session. You are sampling a slot that personalizes and rotates thousands of times a day and calling it a dataset.

The API-only route is what OpenAdLibrary's native harvester runs on. Instead of rendering pages, it requests the same recommendation feeds the widgets call, across many geographies and rotating device and identity personas, and reads the ads directly. In practice that is roughly an order of magnitude cheaper per ad than driving a browser, which is exactly why it can run continuously and catch far more of the rotation. No browser also means no clicking live ads. The data comes off the feed, not off a rendered impression.

That difference shows up in the numbers. On Taboola alone the index holds 157,727 creatives, and on Outbrain 84,252 (OpenAdLibrary, June 2026). You do not get to tens or hundreds of thousands of creatives per network by screenshotting pages one at a time. Here is a live Taboola finance ad pulled straight from the feed, not a render:

Figure: A live Taboola finance ad, headline "2026 - IRS Forgives Millions By June 30th Tax Deadline", captured by OpenAdLibrary, June 2026

This is the difference between a tool that shows you a handful of a competitor's ads and one that shows you the full spread. For affiliates and media buyers, the spread is the whole point. See how affiliate marketers use native spy tools to find rotations worth modelling.

Capturing the creative at full quality

Once the feed is parsed, the creative image URL points to the advertiser's original asset on a CDN. A good harvester fetches that asset directly and stores it at native resolution instead of keeping a downscaled screenshot crop.

Why it matters in practice:

Reverse image search and dedup only work on the original asset. Cropped screenshots break perceptual hashing and bloat your dataset with near-duplicates.
Creative analysis needs the real pixels. The hook, the face, the text overlay, the color treatment. You cannot study what made a winner win from a thumbnail.
Cross-network asset reuse detection depends on the source file. Matching the same image running on Taboola, MGID, and Revcontent at once is only reliable when you hold the original.
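A rough sketch of why holding the original file helps: byte-identical assets hash to the same digest, so reuse across networks falls out of a simple group-by. This uses exact SHA-256 matching as a stand-in; perceptual hashing needs an image library and is not shown. The `(network, asset_bytes)` record shape and the sample data are illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def asset_digest(data: bytes) -> str:
    """Exact-content fingerprint of an original creative file."""
    return hashlib.sha256(data).hexdigest()


def find_cross_network_reuse(captures):
    """Group captures by asset digest and keep assets seen on 2+ networks.

    `captures` is an iterable of (network, asset_bytes) pairs.
    Returns {digest: sorted list of networks}.
    """
    networks_by_digest = defaultdict(set)
    for network, data in captures:
        networks_by_digest[asset_digest(data)].add(network)
    return {
        digest: sorted(networks)
        for digest, networks in networks_by_digest.items()
        if len(networks) > 1
    }


# Placeholder bytes standing in for downloaded image files.
captures = [
    ("taboola", b"image-A"),
    ("mgid", b"image-A"),
    ("revcontent", b"image-A"),
    ("outbrain", b"image-B"),
]
reuse = find_cross_network_reuse(captures)
for digest, networks in reuse.items():
    print(digest[:12], networks)
```

A screenshot crop of the same creative would produce different bytes on every capture, which is why exact matching like this only works on the source asset.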

Storing the original also means the creative survives after the campaign ends and the CDN URL 404s, which is most of why an ad library has value at all. Across the index, OpenAdLibrary has held onto 926,259 landing-page captures tied to those creatives. The campaign dies; the record does not.

The health vertical is where this pays off hardest, because the creatives are aggressive and they recycle constantly. Health is the third-largest vertical in the index at 14,895 creatives, behind finance (17,232) and insurance (15,629). Here is one running on Taboola for 26 days straight:

Figure: A Taboola health ad, headline "Americans Are Ditching Hearing Aids for This New Device", observed for 26 days by OpenAdLibrary, June 2026

Following the click to the landing page (without clicking)

This is the step most tools skip, and it is where the real intelligence lives. The click URL in the feed is a tracking redirect. The destination behind it, the landing page or pre-lander and the advertiser whose domain it sits on, answers the only question that matters: who is actually running this, and where are they sending traffic?

The naive approach is to fire the click in a browser. That can register as a billable click on a live impression and burn budget you do not own. Not acceptable. The correct approach is to resolve the redirect chain out-of-band: replay the hops server-side, usually from a clean residential exit in the relevant geo, to retrieve the final URL and capture the landing page itself, without ever triggering a live billable click.
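The hop-replay logic can be sketched without touching the network by injecting the fetch step. Here `fetch` is a hypothetical callable that returns `(status_code, location_header)` for one request with automatic redirects disabled; in a real resolver it would wrap an HTTP client routed through the chosen exit. The sketch only covers HTTP-level redirects (not meta-refresh or JavaScript hops) and does not model how any platform counts requests.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_chain(start_url, fetch, max_hops=10):
    """Follow HTTP redirects one hop at a time and return every URL visited.

    `fetch(url)` must return (status_code, location_or_None) without
    following redirects itself. The last element of the result is the
    final destination, or the last hop reached before max_hops.
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in chain:  # guard against redirect loops
            break
        chain.append(url)
    return chain


# Fake responses standing in for real HTTP calls (all hosts are made up).
FAKE = {
    "https://click.example-network.com/c?id=1": (302, "https://trk.example-tracker.com/r"),
    "https://trk.example-tracker.com/r": (302, "/go"),
    "https://trk.example-tracker.com/go": (301, "https://lander.example-advertiser.com/offer"),
    "https://lander.example-advertiser.com/offer": (200, None),
}

chain = resolve_chain("https://click.example-network.com/c?id=1", FAKE.__getitem__)
for hop in chain:
    print(hop)
```

Keeping every intermediate URL, not just the last one, is what makes the later classification step possible.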

Done well, this surfaces three things at once:

The real advertiser behind a generic-looking branding name in the feed.
The pre-lander or advertorial, the bridge page native buyers rely on, which never appears in the widget.
The geo-gated destination. The same ad often routes to different landers by country, and only a multi-geo resolver catches that. The example below is geo-targeted at Australia, which you would miss entirely capturing from a US exit:

Figure: A geo-targeted Taboola insurance ad, headline "Australians looking for life insurance should read this", captured by OpenAdLibrary, June 2026

OpenAdLibrary traces each click to its landing page this way and stores the destination next to the creative, so a creative is never an orphaned image. It is tied to the advertiser and the funnel it feeds.

Classifying the ad-tech supply chain

Capturing the hops is one thing. Making them legible is another. Between the widget and the lander, a single ad can pass through a demand platform, an exchange, several tracker domains, and a redirect service. Classification is the work of parsing that chain and labelling each node.

A capable system keeps a dynamic registry of known networks, trackers, and redirect domains, then matches each hop against it to answer:

Which demand platform served the ad (Taboola, Outbrain, MGID, Revcontent, MediaGo, Yahoo, Microsoft Audience Network)?
Which trackers and exchanges sit in the redirect chain?
Who is the end advertiser, normalized across the many branding aliases the same buyer hides behind?

Hard-coding tracker lists is a losing game. The domains rotate constantly, so it turns into whack-a-mole. A registry that updates as new patterns appear is the only approach that holds up. Get the supply chain right and you can answer questions a flat ad list never can: which advertisers concentrate on which networks, which trackers signal a particular affiliate stack, which redirect services correlate with the most aggressive offers.
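One way to picture registry matching is a suffix lookup on each hop's hostname. The sketch below is an assumption-laden illustration: the registry entries, role names, and the "last unknown hop is probably the advertiser" heuristic are all hypothetical, and a production registry would live in a store that is updated as new domains appear instead of a literal dict.

```python
from urllib.parse import urlsplit

# Hypothetical registry: domain suffix -> (role, label).
REGISTRY = {
    "example-network.com": ("demand_platform", "ExampleNetwork"),
    "example-tracker.com": ("tracker", "ExampleTracker"),
}


def classify_hop(url, registry=REGISTRY):
    """Label one hop by the longest registry suffix matching its host."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then progressively shorter suffixes,
    # stopping before the bare top-level domain.
    for i in range(len(parts) - 1):
        suffix = ".".join(parts[i:])
        if suffix in registry:
            role, label = registry[suffix]
            return {"host": host, "role": role, "label": label}
    return {"host": host, "role": "unknown", "label": None}


def classify_chain(urls, registry=REGISTRY):
    """Classify every hop; a final unknown hop is a candidate advertiser."""
    hops = [classify_hop(u, registry) for u in urls]
    if hops and hops[-1]["role"] == "unknown":
        hops[-1]["role"] = "advertiser_candidate"
    return hops


# Made-up redirect chain, ordered from click URL to final destination.
sample_chain = [
    "https://click.example-network.com/c?id=1",
    "https://trk.example-tracker.com/go",
    "https://lander.example-advertiser.com/offer",
]
labelled = classify_chain(sample_chain)
for hop in labelled:
    print(hop["role"], hop["host"])
```

Because the registry is passed in as data, adding a newly spotted tracker domain is a registry update, not a code change.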

The pattern is visible in the data. Both Taboola and Outbrain are dominated by the same three verticals, but the ranking differs. On Taboola health leads (6,048 creatives), then finance (5,558) and insurance (4,303). On Outbrain finance leads (2,640), then insurance (2,615), then health (2,016) (OpenAdLibrary, June 2026). That kind of per-network vertical concentration is exactly what supply-chain classification exists to expose.

If the terminology is new, the glossary entry on the ad supply chain lays out the roles, and programmatic native advertising explains the auction that decides which ads you capture in the first place.

Coverage, geo, and personas: why one capture isn't enough

A native slot is personalized. The ads it serves depend on geo, device, time of day, and inferred interest. So a single capture, from one IP, one device profile, one moment, is a biased sample of a much larger rotation.

Serious harvesting treats this as a sampling problem:

Geo rotation. The same widget on the same publisher serves different advertisers in the US, UK, AU, and DE. One geo gives you one slice.
Device and identity rotation. Desktop, Android, and iOS sessions, plus rotated personas, surface different demand and different creative formats.
Cadence. Rotations turn over throughout the day, so capture has to be continuous, not a one-off crawl.
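Treated as a sampling problem, one capture round is simply the cross product of the dimensions above. The sketch below builds that plan; the persona labels are hypothetical placeholders, and how each context maps to an actual request (exit IP, user agent, identifiers) is left out.

```python
import itertools
import random

GEOS = ["US", "UK", "AU", "DE"]
DEVICES = ["desktop", "android", "ios"]
PERSONAS = ["persona_a", "persona_b"]  # hypothetical identity profiles


def capture_plan(geos, devices, personas, seed=None):
    """Return every (geo, device, persona) combination in shuffled order.

    Shuffling avoids always hitting the feed in the same sequence. One
    pass over the plan is a single capture round, to be repeated on a
    schedule to cover cadence.
    """
    plan = list(itertools.product(geos, devices, personas))
    random.Random(seed).shuffle(plan)
    return plan


plan = capture_plan(GEOS, DEVICES, PERSONAS, seed=42)
print(len(plan), "capture contexts per round")
```

Even this small grid yields 24 contexts per widget per round, which is why per-request cost dominates the economics of coverage.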

This is also why longevity and spread are the most honest winner signals available without inside data. You cannot see a competitor's spend, but you can see how long a creative has run and how widely it has spread across publishers, geos, and networks. Right now the longest continuously observed creatives in the index sit at 28 days of unbroken observation. SmartAsset has been running "Ask a Pro: How Can I Avoid Paying Taxes on IRA Withdrawals?" on Outbrain for all 28 of them:

Figure: An Outbrain finance ad from SmartAsset observed continuously for 28 days by OpenAdLibrary, June 2026

Worth being precise here: 28 days is the span of our observation window, not a claim that the creative has run for exactly 28 days and no more. The industry lore about 90-day winners is a separate thing, useful as a rule of thumb but not something our index measures. What we can stand behind is the observed run: an ad that has been live across dozens of placements for weeks is telling you something a spend number cannot.
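To make "observed run" and "spread" concrete, here is a small sketch that summarises raw observations per creative. The record shape `(creative_id, day, publisher, geo)` and the metric names are assumptions for illustration, not a description of how OpenAdLibrary scores creatives.

```python
from collections import defaultdict
from datetime import date


def longevity_and_spread(observations):
    """Summarise observations per creative.

    `observations` is an iterable of (creative_id, day, publisher, geo).
    Returns {creative_id: {"observed_days", "span_days",
                           "publishers", "geos"}}.
    """
    days = defaultdict(set)
    publishers = defaultdict(set)
    geos = defaultdict(set)
    for creative_id, day, publisher, geo in observations:
        days[creative_id].add(day)
        publishers[creative_id].add(publisher)
        geos[creative_id].add(geo)

    summary = {}
    for cid, seen in days.items():
        summary[cid] = {
            "observed_days": len(seen),  # distinct days with a sighting
            "span_days": (max(seen) - min(seen)).days + 1,  # first to last, inclusive
            "publishers": len(publishers[cid]),
            "geos": len(geos[cid]),
        }
    return summary


# Made-up observations for two creatives.
obs = [
    ("c1", date(2026, 6, 1), "pub-a", "US"),
    ("c1", date(2026, 6, 2), "pub-b", "US"),
    ("c1", date(2026, 6, 5), "pub-a", "AU"),
    ("c2", date(2026, 6, 3), "pub-c", "DE"),
]
stats = longevity_and_spread(obs)
print(stats["c1"])
```

Keeping `observed_days` separate from `span_days` preserves the distinction drawn above: the span is bounded by the observation window, while the count of observed days shows how unbroken the run was within it.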

What this means for the data you actually get

Pull the threads together and the harvesting method directly decides what you can do with the output.

Capability	Needs feed-level capture?	Needs click-trace?	Needs supply-chain classification?
See a creative ran at all	Yes	No	No
Identify the real advertiser	Partly	Yes	Yes
Find the landing page or pre-lander	No	Yes	No
Detect cross-network asset reuse	Yes	No	Yes
Rank winners by longevity and spread	Yes	No	No

A tool that only screenshots gives you the first row. A tool that harvests feeds, traces clicks, and classifies the supply chain gives you all of them. That is the difference between a curiosity and a research instrument. On top of clean capture, OpenAdLibrary layers tooling that uses it: Creative Studio to remix what works, Optimize to act on it, Copy DNA to break down the angle, plus a full API and MCP so you can pull the data into your own stack.

The transparency tailwind

For anyone weighing the legitimacy of all this, the regulatory direction of travel is toward more public ad data, not less. The EU's Digital Services Act now requires very large platforms (Meta, TikTok, Google, and similar) to maintain public, queryable ad repositories listing the advertiser, the payer, and the run dates for each ad. Native discovery widgets are not covered by that mandate, but the principle is established: ads served to the public are increasingly treated as public information. Tools that capture only public ads, and never click live units in a billable way, sit comfortably inside that trend. (General context, not legal advice. Check each platform's terms for your specific use.)

The bottom line

Stripped to essentials, here is how the good ad spy tools work. They read the recommendation feed instead of screenshotting the page. They store the original creative instead of a crop. They resolve the click out-of-band to reach the landing page instead of firing a billable click. And they classify every hop in the supply chain instead of leaving you a flat list. Coverage comes from rotating geos, devices, and personas continuously, and the most trustworthy winner signal is longevity and spread.

When you compare vendors, those are the questions to ask. Our tested, ranked breakdown of the best native ad spy tools scores each one on exactly these axes. If budget is the constraint, there is also a guide to running native ad research for free.

OpenAdLibrary built its native harvester on every principle above: API-only capture, full-quality creatives, click-traced landing pages, and a live supply-chain registry. It is open and affordable, rather than priced at $80 to $400 a month. Start free and browse 200 ads, no credit card required.


Related reading

Best Native Ad Spy Tools in 2026 (Compared, Ranked & Priced)

What Is a Native Ad Spy Tool? How to Spy on Taboola & Outbrain Ads

Free Native Ad Spy Tool: Research Competitor Native Ads at No Cost

Best Ad Spy Tools for Affiliate Marketers in 2026 (Native Focus)

Ad Supply Chain

Native Ad Widget