During the Stone Temple Consulting Q&A with Gary Illyes on YouTube the other week, he spoke about crawl budget and said Google doesn’t necessarily crawl based on a crawl budget. Internally, he said, Google calls its crawl scheduling “host load.” As he described it, host load sets up a bucket of URLs in order of importance, and Googlebot crawls them in that order on the schedule host load decides. If Google thinks your server can handle it, Googlebot will crawl the whole bucket; if not, it will stop.
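To make that description a little more concrete, here is a toy Python sketch of the behavior Illyes outlined: crawl a bucket of URLs from most to least important and stop as soon as the server no longer looks able to take more. This is purely an illustration, not Google's code. The `Url` class, the `importance` score, and the `can_handle_fetch` callback (a stand-in for Google's judgment of what your server can handle) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Url:
    path: str
    importance: float  # hypothetical score; higher means more important


def crawl_bucket(urls: List[Url], can_handle_fetch: Callable[[int], bool]) -> List[str]:
    """Crawl URLs in importance order until the (hypothetical) capacity check says stop."""
    crawled: List[str] = []
    for url in sorted(urls, key=lambda u: u.importance, reverse=True):
        # can_handle_fetch receives how many fetches have been made so far
        if not can_handle_fetch(len(crawled)):
            break  # server looks like it can't take more: stop, leave the rest
        crawled.append(url.path)
    return crawled


bucket = [Url("/a", 0.2), Url("/b", 0.9), Url("/c", 0.5)]
print(crawl_bucket(bucket, lambda n: n < 2))  # server "handles" only two fetches
```

With a server that can handle every fetch, the whole bucket gets crawled; with a tighter limit, only the most important URLs are fetched before the crawl stops.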
Eric Enge asked him about crawl budget at the 17:25 mark: “Historically, people have talked about Google having a crawl budget. Is that a correct notion, like Google comes in they’re going to take 327 pages from your site today?”
Gary Illyes’s response was hard to transcribe…
I recommend you listen to it yourself; I am embedding it at the start time:
I did find one Google document on host load, related to its Search Appliance, which says:
It goes on at length and is an interesting read.
Forum discussion at YouTube.