On Friday, Google’s JohnMu tweeted an important tip. Let’s say you block a directory from being crawled in your robots.txt file, and now you want to add or change content in that directory. John explained that since Google caches your robots.txt file, you should update the robots.txt file at least 24 hours before updating content within that directory; otherwise Googlebot may act on a stale, cached copy of your rules.
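As a minimal sketch of the scenario (the /private/ directory name here is just an example), the blocking rule in robots.txt would look like this:

```
# robots.txt - tells all crawlers not to crawl anything under /private/
User-agent: *
Disallow: /private/
```

If you want content in /private/ to become crawlable, remove or change that Disallow line at least a day before publishing the content, since Googlebot may still be working from its cached copy of the file.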
Here is JohnMu’s tweet:
Robots-tip: crawlers cache your robots.txt; update it at least a day before adding content that is disallowed. Q&A in Buzz. — John Mueller (JohnMu) on Twitter
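To illustrate how a crawler interprets such a disallow rule, here is a small sketch using Python’s standard urllib.robotparser module (the domain and /private/ path are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Parse rules the way a crawler would from a fetched (and then cached) robots.txt.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Until the cached copy is refreshed, the crawler keeps applying these rules.
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
```

The point of John’s tip is the caching step: even after you edit the file on your server, a crawler may keep answering from the rules it parsed up to a day earlier.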
Tedster posted this tip in a WebmasterWorld thread. The discussion there turns to whether you should use the robots.txt protocol to block content at all, but I won’t get into that debate in this post.
Forum discussion at WebmasterWorld.