For the website project I’ve been working on recently, we’ve focussed on ensuring that we have good Search Engine Optimisation. There are a lot of clever technical things that can be attempted here, such as replacing 302 redirects, but there are also some very simple things that can and should be done too!
One of those simple things is submitting a sitemap to the major search engines, such as Google and Bing, through their webmaster tools. This is a no-brainer and should be done for any website. The only potentially tricky part is making sure that the sitemap you’ve submitted stays up to date with the ever-changing content in a CMS, like our website built on SharePoint 2010.
My buddy Tristan Watkins and I took a look at the various options out there for dynamically generating the sitemap file in the standard XML format for our SharePoint site. A great resource we investigated for a while was the IIS SEO Toolkit, which can perform a site analysis and even generate a robots.txt and sitemap.xml. It is an especially useful tool for website editors because it can be installed on a client, rather than having to be on the web front end. Tristan summarised this on his blog.
We may be missing a trick with the IIS SEO Toolkit, but we couldn’t find a way to schedule an update without running it manually, so ultimately we chose to use PowerShell to generate the sitemap, which can be run as a scheduled task on the web front ends as necessary.
The script below has to be largely credited to Jie Li, who first produced this excellent post on using PowerShell for this particular task. When running Jie’s script we noticed that it produced a perfect sitemap, but only for the root web, i.e. the top-level site in a site collection.
With my rudimentary scripting skills we extended the original script to run through each of the sites in the site collection, and also to take parameters for the URL of the site and the desired location on the file system for the sitemap file:
param($Site,$FilePath)

Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue

function New-SPSiteMap ($SavePath, $Url)
{
    try
    {
        $site=Get-SPSite $Url
        # Full URL (spaces encoded) of every item in every list of every web in the site collection
        $list = $site.Allwebs | ForEach { $_.Lists } | ForEach-Object -Process {$_.Items} | ForEach-Object -Process {$_.web.url.Replace(" ","%20") + "/" + $_.url.Replace(" ","%20")}
        #excludes directories you don’t want in sitemap. you can put multiple lines here:
        $list= $list | ? {$_ -notmatch "_catalogs"}
        $list= $list | ? {$_ -notmatch "Cache%20Profiles"}
        $list= $list | ? {$_ -notmatch "Reports%20List"}
        $list= $list | ? {$_ -notmatch "Long%20Running%20Operation%20Status"}
        $list= $list | ? {$_ -notmatch "Relationships%20List"}
        $list= $list | ? {$_ -notmatch "ReusableContent"}
        $list= $list | ? {$_ -notmatch "Style%20Library"}
        $list= $list | ? {$_ -notmatch "PublishingImages"}
        $list= $list | ? {$_ -notmatch "SiteCollectionDocuments"}
        $list= $list | ? {$_ -notmatch "Lists/"}
        $list= $list | ? {$_ -notmatch "Documents/"}
        $list= $list | ? {$_ -notmatch "PublishedLinks"}
        $list= $list | ? {$_ -notmatch "WorkflowHistory"}
        $list= $list | ? {$_ -notmatch "WorkflowTasks"}
        $list | New-Xml -RootTag urlset -ItemTag url -ChildItems loc -SavePath $SavePath
    }
    catch
    {
        write-host "Unable to create sitemap." -foregroundcolor red
        break
    }
}

function New-Xml
{
    param($RootTag="urlset",$ItemTag="url",$ChildItems="*",$SavePath)
    Begin {
        $xml="<?xml version=""1.0"" encoding=""UTF-8""?><urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">"
    }
    Process {
        # One <url> element per URL received from the pipeline
        $xml += "<$ItemTag>"
        foreach ($child in $_){
            $xml += "<$ChildItems>$child</$ChildItems>"
        }
        $xml += "</$ItemTag>"
    }
    End {
        $xml += "</$RootTag>"
        # Cast the string to an XML document; a plain string has no Save method
        $xmltext=[xml]$xml
        $xmltext.Save($SavePath)
    }
}

New-SPSiteMap -Url $Site -SavePath $FilePath
write-host "SiteMap for $Site saved to $FilePath" -foregroundcolor green
Produces an output such as:
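To give a feel for the shape of that output without needing a SharePoint farm, here is a minimal sketch that applies the same string-building approach to two made-up URLs. The function name ConvertTo-SitemapXml and the example.com addresses are assumptions for illustration only; they are not part of the script above. Each URL ends up in a loc element inside a url element, all wrapped in a single urlset root.

```powershell
# Hypothetical helper (not part of the original script): builds a sitemap
# document from plain URL strings by concatenating XML, as the script does.
function ConvertTo-SitemapXml {
    param([string[]]$Urls)

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>'
    $xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    foreach ($u in $Urls) {
        # Escape characters such as & that are not allowed raw in XML text.
        $escaped = [System.Security.SecurityElement]::Escape($u)
        $xml += "<url><loc>$escaped</loc></url>"
    }
    $xml += '</urlset>'

    # Casting to [xml] parses the string, so malformed markup fails here.
    return [xml]$xml
}

# Made-up URLs standing in for real SharePoint pages.
$doc = ConvertTo-SitemapXml -Urls @(
    'http://example.com/Pages/default.aspx',
    'http://example.com/news/Pages/My%20Article.aspx'
)

# Show the generated markup.
$doc.OuterXml
```

The escaping step is an addition of this sketch; the script above only encodes spaces, so a URL containing an ampersand would need similar treatment before it could be saved as valid XML.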
To see this script in action, copy the above into an appropriate file on one of your web front ends, e.g. newsitemap.ps1, then open up PowerShell and run it similar to the below:
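As a rough example, a call might look like the following. The site URL, the output path and the C:\Scripts folder are placeholders rather than values from our environment, and the scheduled task line is only one possible way of keeping the file current; the task would need to run under an account that has access to the SharePoint farm.

```powershell
# Placeholder values: swap in your own site URL and output path.
.\newsitemap.ps1 -Site "http://mysharepointsite" -FilePath "C:\inetpub\wwwroot\sitemap.xml"

# Optional: register the same command as a daily scheduled task.
# The task name, script location and start time are all assumptions.
schtasks.exe /Create /TN "Generate SharePoint sitemap" /SC DAILY /ST 02:00 `
    /TR "powershell.exe -File C:\Scripts\newsitemap.ps1 -Site http://mysharepointsite -FilePath C:\inetpub\wwwroot\sitemap.xml"
```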
[…] sitemap stays current with the changing content in your CMS. Thankfully, my colleague Glyn Clough whipped up some PowerShell to produce a full sitemap for your web application based on Jie Li's initial script, which was […]
[…] up a couple of posts on how I’ve been harnessing the power of Powershell with SharePoint 2010 (to generate a sitemap and to list features) I was contacted by the guys over at the awesome Hey, Scripting Guy! Blog and […]
Thanks for your scripts.
Here is the batch script:
@echo off
rem #### computer-anleitung.de ####
%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe c:\sitemap.ps1 http://MyWebsite.de C:\inetpub\wwwroot\wss\VirtualDirectories\MyWebsite\sitemap.xml
iisreset
exit
Do you have any idea about underwebsites? I have an underwebsite, but it’s not in the sitemap.
Hi Freaque,
Thanks for the batch script. It’s been a while since I did this, but I think we ended up doing something similar and running it as a scheduled windows task – so that the sitemap was kept up-to-date.
I’m not sure what you mean by ‘underwebsites’ sorry. If you mean sub-webs in the same site collection then this should be displaying those in the sitemap. If you have other site collections at managed paths ‘below’ your root site collection then I think you’d need to have separate sitemaps or modify the script quite a bit.
Hope that helps,
Thanks for the batch script. This is working fine for me in the Win7 environment.
I got this working in one environment (internal) then tried it at a client site and it seemed to do everything right. (I was using PowerGUI to step through in debug mode).
The last line, where it tries to save the XML, keeps failing. I’ve tried different locations and inspected access rights with no luck.
Any suggestions on why .SAVE would not work?
Thanks.
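One possible cause, sketched here as an assumption rather than a diagnosis of that particular environment: Save is a method of an XML document, not of a string, so the text has to be cast to [xml] before it can be saved. It is also worth passing a full path, because the underlying .NET method resolves relative paths against the process working directory, which is not always the current PowerShell location. The file name sitemap-test.xml and the example.com URL are made up.

```powershell
# A small sitemap held as plain text, as the script builds it.
$xml = '<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>http://example.com/</loc></url></urlset>'

# A [string] has no Save method, so calling $xml.Save(...) would fail.
$hasSaveOnString = [bool]($xml | Get-Member -Name Save)

# Casting to [xml] gives a System.Xml.XmlDocument, which does have Save.
$doc = [xml]$xml

# Use a full path (here in the temp folder; the file name is hypothetical).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sitemap-test.xml'
$doc.Save($path)
```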
Glyn, thanks for the script. How can we generate a sitemap that also includes the fields below?
url, loc, lastmod, changefreq, priority. Any sample would be helpful.
Thanks in advance
Hi – I think that might be quite involved. Generating the XML should be the simple (simpler, anyway) bit, but actually you’d have to change the $list object to include this information – which I think would be quite difficult. At the moment it’s quite straight-forward as it’s more of a simple directory listing rather than looking for modified dates etc on each individual item.
Sorry I can’t be more help.
Thanks Glyn for your quick response.
I decided to go down the C# route, but thanks again, I learned something new 🙂
I took Glyn’s example and tweaked it a bit to output the remaining nodes for the sitemap XML. The only issue is that I have hard-coded the “changefreq” and “priority”. The “lastmod” node is pulling the date from the list item object. Hope this helps:
param($Site,$FilePath)
Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue
function New-SPSiteMap ($SavePath, $Url)
{
    try
    {
        $site=Get-SPSite $Url
        $items = $site.Allwebs | ForEach { $_.Lists } | ForEach-Object -Process {$_.Items}
        $items | New-Xml -RootTag urlset -ItemTag url -ChildItems loc -ChildItems2 lastmod -ChildItems3 changefreq -ChildItems4 priority -SavePath $SavePath
    }
    catch
    {
        write-host "Unable to create sitemap." -foregroundcolor red
        break
    }
}
function New-Xml
{
    param($RootTag="urlset",$ItemTag="url",$ChildItems="*",$ChildItems2="*",$ChildItems3="*",$ChildItems4="*",$SavePath)
    Begin {
        $xml="<?xml version=""1.0"" encoding=""UTF-8""?><urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">"
    }
    Process {
        foreach ($item in $_){
            $itemUrl = $item.web.url.Replace(" ","%20") + "/" + $item.url.Replace(" ","%20")
            #excludes directories you don’t want in sitemap. you can put multiple lines here:
            $itemUrl= $itemUrl | ? {$_ -notmatch "_catalogs"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Cache%20Profiles"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Reports%20List"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Long%20Running%20Operation%20Status"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Relationships%20List"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "ReusableContent"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Style%20Library"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "PublishingImages"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "SiteCollectionDocuments"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Lists/"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "Documents/"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "PublishedLinks"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "WorkflowHistory"}
            $itemUrl= $itemUrl | ? {$_ -notmatch "WorkflowTasks"}
            # A filtered-out URL comes back as $null, so test for a non-empty value
            if ($itemUrl)
            {
                # Date-only W3C format for lastmod
                $lastMod = "{0:yyyy-MM-dd}" -f $item["Modified"]
                # changefreq and priority values are hard-coded
                $xml += "<$ItemTag><$ChildItems>$itemUrl</$ChildItems><$ChildItems2>$lastMod</$ChildItems2><$ChildItems3>daily</$ChildItems3><$ChildItems4>0.5</$ChildItems4></$ItemTag>"
            }
        }
    }
    End {
        $xml += "</$RootTag>"
        # Cast the string to an XML document so that Save is available
        $xmltext=[xml]$xml
        $xmltext.Save($SavePath)
    }
}
New-SPSiteMap -Url $Site -SavePath $FilePath
write-host "SiteMap for $Site saved to $FilePath" -foregroundcolor green
Hi Sean,
Thanks for sharing this – sorry it took so long for me to approve the comment. I especially like your addition of the last modified date; very handy.
Glyn
Glyn:
I attempted to run your script against our internal SharePoint 2010 Environment both using Powershell and as a bat file from the command line. Both times, I get the message “Unable to Create SiteMap”… Could you help me out?
Batch File:
@echo off
rem #### computer-anleitung.de ####
%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe c:\testsitemap.ps1 http://intranetSB.emersonclimate.com C:\inetpub\wwwroot\wss\VirtualDirectories\IntranetSB.emersonclimate.com80\testsitemap.xml
iisreset
exit
———————————————————————————-
Using Powershell:
PS C:\Users\ctsidspsetupsvc> cd Desktop
PS C:\Users\ctsidspsetupsvc\Desktop> cd SiteMapScript
PS C:\Users\ctsidspsetupsvc\Desktop\SiteMapScript> .\NewSiteMap.ps1 http://intranetsb.emersonclimate.com C:\inetpub\wwwroot\wss\VirtualDirectories\IntranetSB.emersonclimate.com80\test-sitemap.xml
Unable to create sitemap.
PS C:\Users\ctsidspsetupsvc\Desktop\SiteMapScript>
———————————————————————————–
param($Site,$FilePath)
Add-PSSnapin microsoft.sharepoint.powershell -ErrorAction SilentlyContinue
function New-SPSiteMap ($SavePath, $Url)
{
    try
    {
        $site=Get-SPSite $Url
        $list = $site.Allwebs | ForEach { $_.Lists } | ForEach-Object -Process {$_.Items} | ForEach-Object -Process {$_.web.url.Replace(" ","%20") + "/" + $_.url.Replace(" ","%20")}
        #excludes directories you don’t want in sitemap. you can put multiple lines here:
        $list= $list | ? {$_ -notmatch "_catalogs"}
        $list= $list | ? {$_ -notmatch "Cache%20Profiles"}
        $list= $list | ? {$_ -notmatch "Reports%20List"}
        $list= $list | ? {$_ -notmatch "Long%20Running%20Operation%20Status"}
        $list= $list | ? {$_ -notmatch "Relationships%20List"}
        $list= $list | ? {$_ -notmatch "ReusableContent"}
        $list= $list | ? {$_ -notmatch "Style%20Library"}
        $list= $list | ? {$_ -notmatch "PublishingImages"}
        $list= $list | ? {$_ -notmatch "SiteCollectionDocuments"}
        $list= $list | ? {$_ -notmatch "Lists/"}
        $list= $list | ? {$_ -notmatch "Documents/"}
        $list= $list | ? {$_ -notmatch "PublishedLinks"}
        $list= $list | ? {$_ -notmatch "WorkflowHistory"}
        $list= $list | ? {$_ -notmatch "WorkflowTasks"}
        $list | New-Xml -RootTag urlset -ItemTag url -ChildItems loc -SavePath $SavePath
    }
    catch
    {
        write-host "Unable to create sitemap." -foregroundcolor red
        break
    }
}
function New-Xml
{
    param($RootTag="urlset",$ItemTag="url",$ChildItems="*",$SavePath)
    Begin {
        $xml="<?xml version=""1.0"" encoding=""UTF-8""?><urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">"
    }
    Process {
        $xml += "<$ItemTag>"
        foreach ($child in $_){
            $xml += "<$ChildItems>$child</$ChildItems>"
        }
        $xml += "</$ItemTag>"
    }
    End {
        $xml += "</$RootTag>"
        # Cast the string to an XML document; a plain string has no Save method
        $xmltext=[xml]$xml
        $xmltext.Save($SavePath)
    }
}
New-SPSiteMap -Url $Site -SavePath $FilePath
write-host "SiteMap for $Site saved to $FilePath" -foregroundcolor green
Mitch, take a look at your PowerShell execution permissions. I think you may not see the error.
Also, this is a nice script!
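A small sketch of how the underlying error could be surfaced, under the assumption that the catch block in the script is hiding it. The failing Get-Item call and the file name no-such-file.xml are stand-ins for the real work, not part of the original script. Checking the execution policy first rules out the case where the script is not allowed to run at all.

```powershell
# "Restricted" here would stop any .ps1 file from running at all.
Get-ExecutionPolicy

try {
    # Stand-in for the real work: a call that is expected to fail.
    Get-Item '.\no-such-file.xml' -ErrorAction Stop
}
catch {
    # Keep the friendly message, but append what actually went wrong.
    $detail = $_.Exception.Message
    Write-Host "Unable to create sitemap: $detail" -ForegroundColor Red
}
```

The same one-line change to the catch block in the sitemap script would show whether the failure comes from Get-SPSite, from the list enumeration or from saving the file.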
Very handy script! Much appreciated for sharing.
I am trying to also include the item title in the site map not just the url. Is there a way you can alter this script to include item titles?
e.g.: http://mysp2010site/listurl/mypage?
I am not PowerShell literate, so I cannot do it myself… :'(
[…] http://www.glynblogs.com/2010/07/generate-a-sitemap-for-sharepoint-2010-using-powershell.html – Glyn Clough […]
Hi!
Thanks for the post. Really good way to generate a sitemap for one site collection but I really would like to know how to generate a sitemap of all my site collections.
(I have One web application with several different site collections.)
I have googled and googled AND googled but haven’t found anything useful.
Can anybody help me please?!
Thanks!
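One way this might be approached, offered as an untested sketch rather than a finished solution: loop over every site collection in the web application and run the script from the post once per site collection, writing a separate file each time. The parameter names $WebAppUrl and $OutputFolder, the sitemapN.xml naming and the assumption that the script is saved as newsitemap.ps1 in the current folder are all hypothetical.

```powershell
# Sketch only: assumes it runs on a SharePoint 2010 server and that the
# script from the post is saved as .\newsitemap.ps1 (hypothetical location).
param($WebAppUrl, $OutputFolder)

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$index = 0
# -Limit All lifts the default cap on the number of site collections returned.
Get-SPSite -WebApplication $WebAppUrl -Limit All | ForEach-Object {
    $index++
    # One sitemap file per site collection: sitemap1.xml, sitemap2.xml, ...
    $file = Join-Path $OutputFolder ("sitemap{0}.xml" -f $index)
    .\newsitemap.ps1 -Site $_.Url -FilePath $file
    # Release the site collection object once it has been processed.
    $_.Dispose()
}
```

Note that the break in the original script's catch block would stop this loop at the first site collection that fails, so it may be worth replacing it with return. The separate files could then be submitted individually or listed in a sitemap index file.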