One of the foundations of Google's business is robots.txt, the file that sites use to exclude some of their content from the search engine's web crawler, Googlebot. It limits needless indexing and sometimes keeps sensitive information private. Google believes its crawler technology can improve, however, so it's shedding some of its secrecy: the company is open-sourcing the parser it uses to interpret robots.txt in a bid to foster a true standard for web crawling. Ideally, this takes much of the mystery out of how to interpret robots.txt files and leads to more of a common format.
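For context, a robots.txt file is a plain-text file at a site's root that tells crawlers which paths they may fetch. A minimal example might look like this (the paths here are purely illustrative):

```
# Rules for Google's crawler (paths are hypothetical)
User-agent: Googlebot
Disallow: /private/
Allow: /private/annual-report.html

# Rules for all other crawlers
User-agent: *
Disallow: /drafts/
```

Each `User-agent` group applies to the named crawler, and `Disallow`/`Allow` lines list URL path prefixes that crawler should skip or may fetch.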
While the Robots Exclusion Protocol has been around for a quarter of a century, it was only an unofficial standard, and that has created problems as teams interpreted the format differently: one crawler might handle an edge case differently than another. Google's initiative, which includes submitting its approach to the Internet Engineering Task Force, would "better define" how crawlers should handle robots.txt and create fewer rude surprises.
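The kind of ambiguity at stake can be sketched in a few lines. The draft resolves conflicting rules by preferring the most specific (longest) matching path, with `Allow` winning ties; without such a rule, two crawlers could legitimately disagree. This is a simplified illustration of that precedence idea, not Google's open-sourced parser:

```python
# Sketch of the draft's precedence rule: the longest matching path
# prefix wins, and "Allow" beats "Disallow" on a tie in length.
# Hypothetical illustration only -- not Google's actual parser.

def is_allowed(rules, path):
    """rules: list of (directive, prefix) pairs, e.g. ("Disallow", "/private")."""
    best = None  # (prefix_length, directive) of the best match so far
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            length = len(prefix)
            # A longer prefix wins; on equal length, "Allow" wins.
            if best is None or length > best[0] or (
                length == best[0] and directive == "Allow"
            ):
                best = (length, directive)
    # No matching rule means the path is allowed by default.
    return best is None or best[1] == "Allow"

rules = [("Disallow", "/private"), ("Allow", "/private/report")]
print(is_allowed(rules, "/private/secret.html"))    # False: "Disallow /private" is the only match
print(is_allowed(rules, "/private/report/a.html"))  # True: the longer "Allow" rule wins
print(is_allowed(rules, "/index.html"))             # True: no rule matches
```

Pinning down exactly this sort of tie-breaking behavior in a specification is what keeps one crawler from blocking a page that another crawler fetches.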
The draft isn't fully available yet, but it would work with more than just websites, include a minimum file size, set a maximum one-day cache time, and give sites a break when there are server issues.
There's no guarantee this will become a standard, at least in its current form. If it does, though, it could help web visitors as much as it helps site creators: you might see more consistent search results that respect sites' wishes. If nothing else, this shows Google isn't completely averse to opening up important assets when it thinks doing so will advance both its technology and the industry at large.
Source: Google Webmaster Central Blog (Google pushes for an official web crawler standard)