Dawn Anderson followed up with Google’s Gary Illyes on what counts as near-duplicate content, asking whether it is the same thing as content stitching and quilting. As Dawn suspected, Gary said it is not. On Twitter, Dawn asked “‘Content stitching / quilting’… this is not the same as near-duplicate as defined in ur prev tweet?” and Gary responded that she is correct.
Here are the tweets:
Dawn then sent me some more technical information on this. She said that Marc Najork, who is now at Google, wrote a paper on the topic while at Microsoft, titled “Detecting Quilted Web Pages at Scale.” Here is the abstract:
There is no doubt Google and other search engines are on to this type of behavior, but it is always nice to point to research papers when we can. Thanks, Dawn.
Forum discussion at Twitter.