Duplicate Content Between HTML & PDF Pages? Google Should Figure It Out

February 17, 2024

A Google Webmaster Help thread discusses a potential duplicate content issue between HTML and PDF documents. In this case, the content on the HTML pages is the same as in the PDFs, whether the PDFs come from an automated “print as PDF” feature or a manual download of the content in PDF format.

How does Google handle the duplicate nature of such content available on the web?

JohnMu at Google chimed in, saying that in most cases Google will use the HTML file. He does recommend that in these cases you block the PDFs from being crawled and indexed, but ultimately, he said, that is your call. Google will likely just want to keep the HTML version in its index.
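
For readers who want to act on that advice, here is a minimal sketch of one way to keep PDFs out of the index: sending an `X-Robots-Tag: noindex` response header for PDF URLs. The thread does not say which blocking method John had in mind, so this is an assumption, and the function name `robots_headers_for` and the example paths are hypothetical. Note that this header only works if the PDF can still be crawled; a robots.txt `Disallow` rule is the alternative that stops crawling instead.

```python
from typing import Dict


def robots_headers_for(path: str) -> Dict[str, str]:
    """Return extra response headers for a requested URL path.

    Hypothetical helper: a web app or server hook would merge these
    headers into the response it sends for the given path.
    """
    # Drop any query string or fragment before checking the extension.
    clean = path.split("?", 1)[0].split("#", 1)[0]
    if clean.lower().endswith(".pdf"):
        # Ask search engines not to index the PDF copy of the page.
        return {"X-Robots-Tag": "noindex"}
    # HTML pages and everything else stay indexable.
    return {}


if __name__ == "__main__":
    for example in ("/guides/setup.html", "/guides/setup.pdf"):
        print(example, robots_headers_for(example))
```

The check is deliberately limited to the file extension. A site that generates PDFs from URLs without a `.pdf` suffix would need to key off the response content type instead.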

John said:

Forum discussion at Google Webmaster Help.

Barry Schwartz http://ikanju.net
