Sitemap XML contribution

XML sitemaps were developed to list all the URLs in a website so that a search engine indexer (such as Google or Yahoo) can find web content more easily.

The simple harvesting solution we’ve designed for DigitalNZ uses XML sitemaps in a similar way.

Making a sitemap.xml file

Generate a sitemap.xml file on the fly

The ideal approach is to build or install a plugin for your repository or website that first creates the XML sitemap and then automatically updates it every time new content is added. This means your content will always be current in the DigitalNZ search system.
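As a rough illustration of what such a plugin might do, here is a minimal Python sketch that rebuilds the sitemap from a list of records. It assumes your system can supply each item's URL and last-modified date; the function name render_sitemap and the shape of the records are our own invention for this example, not part of any particular repository software.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap(records):
    """Build sitemap XML (bytes) from (url, last_modified) pairs.

    `records` is a hypothetical input: an iterable of
    (str, datetime.date) tuples supplied by your own system.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in records:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the YYYY-MM-DD form shown on sitemaps.org
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Call this again whenever content is added or changed.
    print(render_sitemap([("http://www.example.com/", date(2005, 1, 1))]).decode("utf-8"))
```

A plugin would typically call a function like this from whatever hook your software runs when an item is saved, then write the result to sitemap.xml.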

Manually generate a sitemap.xml file yourself

If your content doesn't change (or you want to get started quickly), it is easy to manually generate an XML sitemap.

Free tools for generating sitemaps exist online (just search for "sitemap generator"). But they will only work if your content is on the 'surface' web, that is, if internet search engines can already index your site.

If your content can't be easily indexed, tools such as Sitemap Writer (which carries a small charge) may be useful for you. We use this at DigitalNZ.

Sitemap Writer includes an option for creating a sitemap from a list of URLs in a text file, which means we can generate a Sitemap XML file for you if you just send us URLs.
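If you would rather script this step yourself, the same idea can be sketched in a few lines of Python. This is not how Sitemap Writer works internally; it is only an illustration, and the file names urls.txt and sitemap.xml are assumptions. It expects a text file with one URL per line.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (bytes) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & for us
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def sitemap_from_file(url_list_path, sitemap_path):
    """Read URLs (one per line) and write the sitemap file."""
    with open(url_list_path, encoding="utf-8") as url_file:
        xml_bytes = build_sitemap(url_file)
    with open(sitemap_path, "wb") as out:
        out.write(xml_bytes)


if __name__ == "__main__":
    # Assumed file names; change them to suit your setup.
    sitemap_from_file("urls.txt", "sitemap.xml")
```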

Generating URLs for a sitemap.xml file

How you generate the URLs to include in your sitemap.xml file will depend on your database. Databases generally allow you to run a query against content objects to either generate the URLs directly or at least return an object ID. Once you've got the object IDs, you can usually construct the URLs using search and replace in a text editor. Here is a free XML editor you can use.
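If search and replace gets unwieldy, a short script can do the same job. The sketch below assumes your item pages follow a single URL pattern with the object ID at the end; the pattern http://www.example.com/objects/{object_id} is purely hypothetical, so swap in whatever your own site uses.

```python
from urllib.parse import quote

# Hypothetical URL pattern; replace with the one your site really uses.
URL_TEMPLATE = "http://www.example.com/objects/{object_id}"


def urls_from_ids(object_ids, template=URL_TEMPLATE):
    """Turn a list of object IDs into a list of item URLs."""
    urls = []
    for object_id in object_ids:
        object_id = str(object_id).strip()
        if not object_id:
            continue  # ignore empty IDs
        # Percent-encode the ID so spaces and slashes are safe in a URL
        urls.append(template.format(object_id=quote(object_id, safe="")))
    return urls


if __name__ == "__main__":
    for url in urls_from_ids([101, 102, "map 7"]):
        print(url)
```

The resulting list can be saved to a text file, one URL per line, ready to be turned into a sitemap.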

If you are unable to create the sitemap.xml file yourself, send DigitalNZ the object IDs or URLs and we will help you out.

Example of a sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

The XML sitemap above (from sitemaps.org) is a basic XML sitemap. This example lists one URL and shows:

  • Where the content is located (loc)
  • How important it is that it is harvested relative to other content (priority)
  • When the content was last modified (lastmod)
  • How frequently the content changes (changefreq)

If you know a little bit about XML, you'll see it's really easy to understand.
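To make the structure concrete, here is a small Python sketch that reads a sitemap and pulls out those four fields for each URL. It illustrates the format only and is not the DigitalNZ harvester; the function name read_sitemap is our own.

```python
import xml.etree.ElementTree as ET

# Sitemap elements live in this XML namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def read_sitemap(xml_data):
    """Return one dict per <url> entry; missing fields are None."""
    root = ET.fromstring(xml_data)
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {}
        for field in FIELDS:
            value = url.findtext(f"sm:{field}", default=None, namespaces=NS)
            entry[field] = value.strip() if value else None
        entries.append(entry)
    return entries


if __name__ == "__main__":
    # Assumes a sitemap.xml file in the current directory.
    with open("sitemap.xml", "rb") as sitemap_file:
        for entry in read_sitemap(sitemap_file.read()):
            print(entry)
```

Only loc is required by the sitemap format, so the sketch returns None for any of the other three fields that an entry leaves out.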

What happens next?

Once the sitemap has been created, it needs to go onto your web server. Just upload it and let us know its URL.

We'll then send in our harvester to index the content as part of DigitalNZ.

Remember, this is just one of the available technical options. If you're not sure whether this is right for you, drop us a line.