In addition to the language indexing diversity, Gary Illyes from Google said on the Search Off the Record podcast that Google uses different indexing tiers. He said the search company “might use different kinds of storages to build the index.” Some of the index goes on cheaper storage, and some goes on more expensive storage so it can be served and accessed faster.
Which type of storage a document lands on depends on how often Google expects to serve it. The goal is to balance cost against speed.
This part starts about 7:03 into the podcast.
To explain why Google uses different types of storage for its indexing tiers, Gary first walked through how computers are built. Gary said:
He then explained that, based on “how many times we think that the document might be served, we might store the documents in our index in these different kinds of storage mechanisms.” That, he said, is how Google defines its indexing tiers: “And that’s what practically defines the index tiers that we have.” He added, “So for example, for documents that we know that might be surfaced every second, for example, they will end up on something super fast. And the super fast would be the RAM. Like part of our serving index is on RAM.”
He went on: “Then will have another tier, for example, for solid state drives because they are fast and not as expensive as RAM. But still not– the [bulk] of the index wouldn’t be on that. The bulk of the index would be on something that’s cheap, accessible, easily replaceable, and doesn’t break the bank.”
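Gary did not say how Google decides which tier a document belongs on beyond its expected serving frequency. Purely as an illustration of the idea, here is a minimal Python sketch of frequency-based tiering. The thresholds, the tier names and the `Document` and `assign_tier` names are all hypothetical assumptions, not anything Google has described.

```python
from dataclasses import dataclass

# Hypothetical cutoffs in expected serves per second; Google has not published any.
RAM_THRESHOLD = 1.0
SSD_THRESHOLD = 0.01


@dataclass
class Document:
    url: str
    expected_serves_per_second: float  # hypothetical popularity estimate


def assign_tier(doc: Document) -> str:
    """Pick a storage tier for a document from its expected serving rate."""
    if doc.expected_serves_per_second >= RAM_THRESHOLD:
        return "ram"   # fastest and most expensive: only the hottest documents
    if doc.expected_serves_per_second >= SSD_THRESHOLD:
        return "ssd"   # fast, cheaper than RAM
    return "bulk"      # cheap, easily replaceable storage for the rest


if __name__ == "__main__":
    for doc in [
        Document("https://example.com/popular", 5.0),
        Document("https://example.com/niche", 0.05),
        Document("https://example.com/rare", 0.0001),
    ]:
        print(doc.url, assign_tier(doc))
```

The point of the sketch is the shape of the trade-off Gary described: the faster the storage, the fewer documents are worth keeping on it, so most of the index falls through to the cheapest tier.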
It makes sense that Google would store its search index this way.
Now, you might ask: how does one optimize to get onto the most expensive indexing tier? 🙂
Forum discussion at Twitter.