Playing with Google’s new Indexing API and getting pages crawled immediately

By Dave, January 8, 2019

Google’s new Indexing API support page says “it can only be used to crawl pages with either job posting or livestream structured data”, but of course I was curious, and it turns out that we can get regular pages crawled as well, and damn fast.

How fast did it crawl my pages? Within 1 minute, and I’m not even kidding.

After you ping the API, Google will come by with 2 different user-agents within 2 minutes (according to my 3 tests):

  1. Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  2. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
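If you want to spot these two visits in your own access logs, here is a minimal sketch that classifies a user-agent string. The function name classifyGooglebot and its return labels are made up for illustration, and it only matches on the strings listed above (user-agents can be spoofed, so this is a quick filter, not verification).

```javascript
// Hypothetical helper: classify a user-agent string taken from an access log.
function classifyGooglebot(userAgent) {
  if (!/Googlebot\/2\.1/.test(userAgent)) {
    return "not-googlebot";
  }
  // The smartphone crawler announces an Android device and "Mobile Safari".
  return /Android/.test(userAgent) && /Mobile/.test(userAgent)
    ? "googlebot-smartphone"
    : "googlebot-desktop";
}

const smartphoneUa = "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const desktopUa = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

console.log(classifyGooglebot(smartphoneUa)); // → googlebot-smartphone
console.log(classifyGooglebot(desktopUa));    // → googlebot-desktop
```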

How to test it out (Node JS setup)

I’m literally typing this post out on a cruise so please excuse the lack of a full tutorial with batching, but basically here’s what I did to get this up and running:

Head to https://developers.google.com/search/apis/indexing-api/v3/prereqs and make sure you follow the instructions to the letter.

When you’re assigning a role to your service account, just choose “owner” and take note of the member email. You’ll be using it in Google Search Console later.

Also, when you’re adding this “member” email to Google Search Console, make sure you add it as a verified owner and not as a user.
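The “member” email is the client_email field of the JSON key you download for the service account. A small sketch for pulling it out and checking that the key has the two fields the script needs; getMemberEmail is a made-up helper name and the key values below are placeholders, not real credentials.

```javascript
// Hypothetical helper: return the "member" email and fail early if the key looks incomplete.
function getMemberEmail(key) {
  if (!key || typeof key.client_email !== "string" || typeof key.private_key !== "string") {
    throw new Error("service account key is missing client_email or private_key");
  }
  return key.client_email;
}

// Placeholder values. In practice: const key = require("./service_account.json");
const exampleKey = {
  client_email: "indexing-bot@my-project.iam.gserviceaccount.com",
  private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
};

// This is the address to add as a verified owner in Search Console.
console.log(getMemberEmail(exampleKey));
```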

For your Node JS setup, go ahead and use this adapted code (below), which works. The code in the quickstart didn’t work for me: I had to import the googleapis module as an object (pulling google out of it), and there was an error in the options object literal.

// npm install request googleapis
const request = require("request");
const { google } = require("googleapis");
const key = require("./service_account.json");

// Authenticate as the service account with the Indexing API scope
const jwtClient = new google.auth.JWT(
  key.client_email,
  null,
  key.private_key,
  ["https://www.googleapis.com/auth/indexing"],
  null
);

jwtClient.authorize(function (err, tokens) {
  if (err) {
    console.log(err);
    return;
  }

  const options = {
    url: "https://indexing.googleapis.com/v3/urlNotifications:publish",
    method: "POST",
    // Your options, which must include the Content-Type and auth headers
    headers: {
      "Content-Type": "application/json"
    },
    auth: {
      bearer: tokens.access_token
    },
    // The URL to notify Google about and the type of notification
    json: {
      url: "https://www.davidsottimano.com/doesnotexist",
      type: "URL_UPDATED"
    }
  };

  request(options, function (error, response, body) {
    // Network-level failures arrive here, not in the body
    if (error) {
      console.log(error);
      return;
    }
    // Handle the response
    console.log(body);
  });
});

A successful response
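As a rough sketch of what to look for in the logged body, assuming the response follows the shape in Google’s reference docs (a urlNotificationMetadata object on success, an error object otherwise). The helper name isSuccessfulPublish and the sample values are illustrative only, not output captured from a real call.

```javascript
// Hypothetical check: does the parsed response body look like a successful publish?
function isSuccessfulPublish(body) {
  return Boolean(
    body &&
    typeof body.urlNotificationMetadata === "object" &&
    body.urlNotificationMetadata !== null
  );
}

// Illustrative bodies with placeholder values.
const sampleSuccess = {
  urlNotificationMetadata: {
    url: "https://www.davidsottimano.com/doesnotexist",
    latestUpdate: {
      url: "https://www.davidsottimano.com/doesnotexist",
      type: "URL_UPDATED",
      notifyTime: "2019-01-08T00:00:00Z"
    }
  }
};
const sampleError = {
  error: { code: 403, message: "placeholder message", status: "PERMISSION_DENIED" }
};

console.log(isSuccessfulPublish(sampleSuccess)); // → true
console.log(isSuccessfulPublish(sampleError));   // → false
```

A 403 like the second sample is what I would expect if the service account was not added as a verified owner, but treat that as an assumption.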

What can you do right now with this API?

You can do a few things with the Indexing API right now (https://developers.google.com/search/apis/indexing-api/v3/using-api). Here’s what Google says:

  1. Update a URL: Notify Google of a new URL to crawl or that content at a previously-submitted URL has been updated.
  2. Remove a URL: After you delete a page from your servers, notify Google so that we can remove the page from our index and so that we don’t attempt to crawl the URL again.
  3. Get the status of a request: Check the last time Google received each kind of notification for a given URL.
  4. Send batch indexing requests: Reduce the number of HTTP connections your client has to make by combining up to 100 calls into a single HTTP request.
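The first three operations only differ in the options you hand to request. A sketch under a few assumptions: the builder names are made up, "ACCESS_TOKEN" is a placeholder for tokens.access_token, and the URL_DELETED type and the metadata endpoint are taken from Google’s using-api page rather than from my own tests.

```javascript
const API_BASE = "https://indexing.googleapis.com/v3/urlNotifications";

// Options for an update ("URL_UPDATED") or a removal ("URL_DELETED") notification.
function buildPublishOptions(accessToken, pageUrl, type) {
  if (type !== "URL_UPDATED" && type !== "URL_DELETED") {
    throw new Error("type must be URL_UPDATED or URL_DELETED");
  }
  return {
    url: API_BASE + ":publish",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    auth: { bearer: accessToken },
    json: { url: pageUrl, type: type }
  };
}

// Options for checking the last notifications Google received for a URL.
function buildStatusOptions(accessToken, pageUrl) {
  return {
    url: API_BASE + "/metadata?url=" + encodeURIComponent(pageUrl),
    method: "GET",
    auth: { bearer: accessToken },
    json: true // ask request to parse the JSON response
  };
}

const removal = buildPublishOptions("ACCESS_TOKEN", "https://www.example.com/old-page", "URL_DELETED");
const status = buildStatusOptions("ACCESS_TOKEN", "https://www.example.com/old-page");
console.log(removal.json.type); // → URL_DELETED
console.log(status.url);
```

Either object can be passed to request(options, callback) in place of the options literal in the script above.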

Things that I really want to test out:

  1. Decrease lag time for hreflang, directives, and redirects with the Indexing API
  2. Use it on Google News approved publishers to see if we can get new and updated articles back into the featured stories carousel
  3. Batching requests – apparently they’ll let you batch 100. However, I don’t see the harm in just using single HTTP requests, since you’re not spamming, right? RIGHT?
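For the batching idea, here is an untested sketch of how the request body could be put together. It assumes the multipart/mixed format from Google’s batch docs (POSTed to https://indexing.googleapis.com/batch with a Content-Type of multipart/mixed and the same boundary); the helper names, the boundary string, and the Content-ID values are my own placeholders.

```javascript
// Split a list into groups of at most `size` items (100 is the documented batch cap).
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Build a multipart/mixed body with one embedded publish call per URL.
function buildBatchBody(urls, boundary) {
  const parts = urls.map(function (url, index) {
    const payload = JSON.stringify({ url: url, type: "URL_UPDATED" });
    return [
      "--" + boundary,
      "Content-Type: application/http",
      "Content-ID: <item-" + (index + 1) + ">",
      "",
      "POST /v3/urlNotifications:publish",
      "Content-Type: application/json",
      "",
      payload
    ].join("\r\n");
  });
  // The closing delimiter is the boundary followed by two dashes.
  return parts.join("\r\n") + "\r\n--" + boundary + "--\r\n";
}

const manyUrls = [];
for (let i = 1; i <= 250; i++) {
  manyUrls.push("https://www.example.com/page-" + i);
}
const groups = chunk(manyUrls, 100);
console.log(groups.map(function (g) { return g.length; })); // → [ 100, 100, 50 ]

const body = buildBatchBody(groups[2].slice(0, 2), "batch_boundary");
console.log(body);
```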

Last note & Limits

I don’t know if this is ready for primetime yet, but I hope Google lets us keep this gem. If this keeps up, I’m more than happy to drop XML sitemaps completely – wouldn’t that be great?

Limits are 600 requests per minute and 200 publishing requests per day, and it’s free!
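To put those limits in perspective, a quick back-of-the-envelope sketch. The constant and function names are made up for illustration, and the numbers are just the two limits quoted above.

```javascript
const REQUESTS_PER_MINUTE = 600;
const PUBLISH_REQUESTS_PER_DAY = 200;

// How many days it takes to publish a given number of URLs under the daily cap.
function daysNeeded(urlCount) {
  return Math.ceil(urlCount / PUBLISH_REQUESTS_PER_DAY);
}

// Smallest even spacing between requests that stays under the per-minute cap.
function minDelayMs() {
  return 60000 / REQUESTS_PER_MINUTE;
}

console.log(daysNeeded(1000)); // → 5
console.log(minDelayMs());     // → 100
```

So the daily publishing cap, not the per-minute one, is what you would run into first on any site of real size.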

Dave

Author Dave
