How to extract title & meta data using Gdocs, Xpath and ImportXml

I’m pretty sure everyone knows I have an unhealthy obsession with Google docs, and the wonderful things it can achieve. I’ve actually switched from my beloved Microsoft suite to Gdocs full time. Just waiting for those clever Google engineers to up the capacity on the 400k rows of data in a spreadsheet.

This one goes out to Paul from Clixfuel.com who’s asked how to get important meta data from a webpage quickly using ImportXML.

Okay, so the first thing I’m going to say is to always do a quick Google for any Xpath around your topic, and then understand you’ll need to adapt it for Google docs – you just need to know what you need to change đŸ™‚ I found a brilliant response here, which I’m actually going to use.

Normal Xpath like this doesn't work:
 /html/head/meta[@name="description"]/@content

Oooh, cool. Only one teeny, tiny problem – Google docs uses a slightly different syntax for Xpath (don’t ask me why exactly, I’m not a programmer, I’m a hack job). This is right, notice the single quotes around description:

/html/head/meta[@name='description']/@content

We don’t even need to step backwards to the <html> or <head> tag either! This will work too:

//meta[@name='description']/@content

Okay, let’s go ahead and pull out the title, meta description and meta keywords from a webpage:

=importxml("http://www.davidsottimano.com/how-to-extract-title-meta-data-using-gdocs-xpath-and-importxml/","//title")

=importxml("http://www.davidsottimano.com/how-to-extract-title-meta-data-using-gdocs-xpath-and-importxml/","//meta[@name='description']/@content")

=importxml("http://www.davidsottimano.com/how-to-extract-title-meta-data-using-gdocs-xpath-and-importxml/","//meta[@name='keywords']/@content")

Go check out the sheet to see it work!

Hope that helps, leave any questions in the comments and I’ll get back to you.

About The Author

Related Posts

Leave a Reply