One of my clients had a simple request:
"Please provide a list in Excel of every single valid URL on the live global site, please."
After running Screaming Frog and getting back a report with missing URLs (likely because the crawler couldn't reach links that are only available via AJAX-rendered components), we had a couple of options on the table:
- Similar to the functionality found in a Sitemap component, build a simple ASPX page that loops through the content, filters out everything but the global English version, generates the URLs, triggers a web request to determine each URL's status, and displays the results on the page (or offers a Download button to get the list). Code this, deploy it, etc.
- Do all of the above, but with PowerShell, which happened to already be installed on the CMS
GUESS which I opted for? :)
Yeah!...you guessed it!
Let's get right into it.
Using the Get-ChildItem command and targeting a specific part of the content tree (explicitly using the Web DB), we get the initial list of English-versioned items.
$itemsWithMatchingCondition = Get-ChildItem -Path web:'/sitecore/content/WebsiteName/Home' -Language 'en' -Version * -Recurse
With this specific implementation, I was lucky enough to have a stable template naming convention where all items using a template that ended with "Page" were always going to be...well...pages.
(Without this luck, I may have had to check if the item contained at least a main layout within the renderings).
To filter the list, we'll use a simple if statement with the -like operator against each item's template name:
if ($item.Template.Name -like $script:pageString)
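To see how the wildcard match behaves outside of Sitecore, here's a minimal sketch. The template names below are made up for illustration; only the "* Page" pattern comes from the full script.

```powershell
# Hypothetical template names, for illustration only.
$templateNames = @('Article Page', 'Landing Page', 'Page Settings', 'Image Folder', 'HomePage')

# Same pattern the script uses: anything, then a space, then "Page" at the very end.
$pageString = '* Page'

# -like does a case-insensitive wildcard match against the whole string.
$pageTemplates = @($templateNames | Where-Object { $_ -like $pageString })

$pageTemplates
```

Only "Article Page" and "Landing Page" pass. "Page Settings" doesn't end with "Page", and "HomePage" has no space before "Page", so a pattern like "*Page" would be needed if your templates are named that way.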
Now that we have a list of page items we want to process, we need to generate the item's URL.
This handy function that sets the site context, configures the UrlOptions, and gets the URL via the LinkManager does just that:
function Get-ItemUrl($itemToProcess){
    [Sitecore.Context]::SetActiveSite("website")
    $urlop = New-Object ([Sitecore.Links.UrlOptions]::DefaultOptions)
    $urlop.AddAspxExtension = $false
    $urlop.AlwaysIncludeServerUrl = $true
    $linkUrl = [Sitecore.Links.LinkManager]::GetItemUrl($itemToProcess,$urlop)
    $linkUrl
}
Here's the fun part!
Per the requirement, we need to validate that the URLs Sitecore generates actually work. Only URLs that return status code 200 should make it into the final report; anything else gets left out.
PowerShell lets us make web requests, which we can then check the status of.
All we need to do here is pass in the URL we generated and expect a true or false value in return:
function IsValidPageStatus($urStr){
    $return = $false
    try {
        $HTTP_Request = [System.Net.WebRequest]::Create($urStr)
        $HTTP_Response = $HTTP_Request.GetResponse()
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        if ($HTTP_Status -eq 200) {
            $return = $true
        } else {
            Write-Host $urStr
            Write-Host "Response: " $HTTP_Status
        }
        $HTTP_Response.Close()
    } catch {
        # GetResponse() throws on error statuses such as 404 or 500, so log the failure here
        Write-Host $urStr
        Write-Host "Response: " $_.Exception.Message
    }
    return $return
}
(Note: Any page URL that fails will be listed in the console after the script completes.)
Every URL goes through this check, and each item whose URL passes gets added to the array list:
if($isValidUrl){ $script:itemIDsWithPassedCriteria.Add($item) > $null }
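In case the "> $null" looks odd: ArrayList's Add method returns the index of the new element, and PowerShell treats any unassigned value inside a function as output. Here's a small sketch of the difference (the function names are hypothetical, just for the demo):

```powershell
$list = New-Object System.Collections.ArrayList

# Without suppression, the index returned by Add() leaks out as function output.
function Add-WithoutSuppression { $list.Add('first') }

# Redirecting to $null discards that index, so the function emits nothing.
function Add-WithSuppression { $list.Add('second') > $null }

$leaked = Add-WithoutSuppression   # receives the index of the new element
$clean  = Add-WithSuppression      # receives nothing

"Leaked output: $leaked"
"Clean output is null: $($null -eq $clean)"
"Items in list: $($list.Count)"
```

Both items end up in the list either way; the redirect just keeps stray index numbers out of the script's output.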
Finally, build out the report, which can then be exported via the PowerShell ISE in CSV/Excel format:
if ($script:itemIDsWithPassedCriteria.Count -eq 0) {
    Write-Warning "No page items found."
} else {
    $props = @{
        InfoTitle = "Live Page Urls"
        InfoDescription = "Provides a list of all valid page URLs"
        PageSize = 100
    }
    $script:itemIDsWithPassedCriteria |
        Show-ListView @props -Property @{ Label = "Url"; Expression = { Get-ItemUrl ($_) } }
    Close-Window
}
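If you'd rather write the file directly instead of exporting from the list view, something along these lines should work. This is a sketch, not part of the original script: the URLs and the file name are placeholders, and in the real script you'd feed in the URLs generated for the items that passed the check.

```powershell
# Placeholder URLs standing in for the validated list.
$urls = @('https://www.example.com/', 'https://www.example.com/about-us')

# Wrap each URL in an object so Export-Csv produces a "Url" column.
$rows = $urls | ForEach-Object { [PSCustomObject]@{ Url = $_ } }

# Assumed output location: a file in the temp folder.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'live-page-urls.csv'
$rows | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8

# Read the file back to confirm what was written.
$written = @(Import-Csv -Path $csvPath)
"Wrote $($written.Count) rows to $csvPath"
```

The resulting CSV opens straight in Excel, which covers the client's "list in Excel" request.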
Here's the full script:
<#
    .SYNOPSIS
        Provides a list report of all valid page URLs
    .AUTHOR
        Written by Gabe Streza
#>

# Variables
$script:pageString = "* Page" # page string

function GetItemsWhichUsePageTemplate() {
    $itemsWithMatchingCondition = Get-ChildItem -Path web:'/sitecore/content/WebsiteName/Home' -Language 'en' -Version * -Recurse
    foreach ($item in $itemsWithMatchingCondition) {
        if ($item.Template.Name -like $script:pageString) {
            $linkUrl = Get-ItemUrl($item)
            $isValidUrl = IsValidPageStatus($linkUrl)
            if($isValidUrl){
                $script:itemIDsWithPassedCriteria.Add($item) > $null # The output of the Add is ignored
            }
        }
    }
}

function Get-ItemUrl($itemToProcess){
    [Sitecore.Context]::SetActiveSite("website")
    $urlop = New-Object ([Sitecore.Links.UrlOptions]::DefaultOptions)
    $urlop.AddAspxExtension = $false
    $urlop.AlwaysIncludeServerUrl = $true
    $linkUrl = [Sitecore.Links.LinkManager]::GetItemUrl($itemToProcess,$urlop)
    $linkUrl
}

function IsValidPageStatus($urStr){
    $return = $false
    try {
        $HTTP_Request = [System.Net.WebRequest]::Create($urStr)
        $HTTP_Response = $HTTP_Request.GetResponse()
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        if ($HTTP_Status -eq 200) {
            $return = $true
        } else {
            Write-Host $urStr
            Write-Host "Response: " $HTTP_Status
        }
        $HTTP_Response.Close()
    } catch {
        # GetResponse() throws on error statuses such as 404 or 500, so log the failure here
        Write-Host $urStr
        Write-Host "Response: " $_.Exception.Message
    }
    return $return
}

$script:itemIDsWithPassedCriteria = New-Object System.Collections.ArrayList
GetItemsWhichUsePageTemplate

if ($script:itemIDsWithPassedCriteria.Count -eq 0) {
    Write-Warning "No page items found."
} else {
    $props = @{
        InfoTitle = "Live Page Urls"
        InfoDescription = "Provides a list of all valid page URLs"
        PageSize = 100
    }
    $script:itemIDsWithPassedCriteria |
        Show-ListView @props -Property @{ Label = "Url"; Expression = { Get-ItemUrl ($_) } }
    Close-Window
}
Write-Host "Done."
This took about 8 minutes to process a 2,000-page site, which is fine for a one-time run. If this were a report the client would run repeatedly, there are certainly some optimizations we should make to speed it up. For this purpose, we're all set!
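One optimization that might be worth trying (I haven't benchmarked it against this site) is requesting only the headers and capping how long each request can hang. Here's a minimal sketch; Test-UrlStatus and its 10-second default timeout are my own assumptions, not part of the script above, and some servers answer HEAD differently than GET, so spot-check the results.

```powershell
function Test-UrlStatus {
    param(
        [string]$Url,
        [int]$TimeoutMs = 10000   # assumed default; tune for your environment
    )
    $response = $null
    try {
        $request = [System.Net.WebRequest]::Create($Url)
        $request.Method = 'HEAD'        # ask for headers only, skip the page body
        $request.Timeout = $TimeoutMs   # give up on slow pages instead of waiting
        $response = $request.GetResponse()
        return ([int]$response.StatusCode -eq 200)
    }
    catch {
        # Invalid URLs, timeouts, and error statuses (404, 500, ...) all land here.
        Write-Host "$Url failed: $($_.Exception.Message)"
        return $false
    }
    finally {
        if ($response) { $response.Close() }
    }
}

# A malformed URL never reaches the network and simply comes back as $false.
$result = Test-UrlStatus -Url 'not a valid url'
"Result for malformed URL: $result"
```

Swapping this in for IsValidPageStatus would keep the rest of the script unchanged, since both take a URL and return true or false.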
Feel free to grab this, tinker with it, and make it your own!
Let me know in the comments if this has helped - or if you have any additional recommendations.