Duplicate Content Between HTML & PDF Pages? Google Should Figure It Out

A Google Webmaster Help thread discusses potential duplicate content issues between HTML and PDF documents. In this case, the content on the HTML pages is the same as in the PDFs, whether the PDF comes from an automated “print as PDF” feature or is offered as a manual download.

How does Google handle the duplicate nature of such content available on the web?

JohnMu at Google chimed in, saying that in most cases Google will use the HTML file. He does recommend that in these cases you block the PDFs from being crawled and indexed, but ultimately, he said, that is your call. Google will likely just want to keep the HTML version in its index.
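If you do decide to block the PDFs, one common way is a robots.txt rule. Below is a minimal sketch, not something from the thread: it assumes the PDF copies live under a hypothetical `/pdf/` directory on a hypothetical `example.com` site, and it uses Python's standard `urllib.robotparser` to check that the rule blocks the PDFs while leaving the HTML pages crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its PDF copies under /pdf/.
# The directory name is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if ROBOTS_TXT allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: the PDF copy and the HTML page it duplicates.
pdf_allowed = is_crawlable("https://example.com/pdf/guide.pdf")
html_allowed = is_crawlable("https://example.com/guide.html")
```

The sketch sticks to a plain path prefix because that is what the standard library parser matches on. If your PDFs are scattered across the site, you would need a different rule or a different mechanism, and it is worth testing whatever you choose before relying on it.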

John said:

Forum discussion at Google Webmaster Help.
