SEO TIP!
Address NON INDEXED page issues from Google Search Console by doing the following:
1. Crawl your site in full with Screaming Frog SEO Spider - ensuring you set the following:
Go to CONFIGURATION > CRAWL CONFIG
Then go to
> Crawl: Check follow internal nofollow, crawl linked XML sitemaps
> Rendering: JavaScript
> Robots.txt: Ignore robots.txt but report status
> API Access: Google Search Console - connect it, select last 16 months
2. Run your crawl
3. Export to a sheet and upload to Google Sheets
4. Export each NON INDEXED ITEM from Search Console, e.g. crawled currently not indexed, discovered currently not indexed, duplicate Google chose different canonical etc.
5. Make sure your CRAWL and NON INDEXED items are in the same spreadsheet (different tabs), e.g. name your site crawl tab FULL SITE CRAWL, then name each non indexed tab after its issue
6. Add the HTTPSTATUSCODE script in APPS SCRIPT, then call it from the sheet to check each NON INDEXED URL's status code (script is in comments)
7. Use VLOOKUP to pair up the CRAWLED HTTP Status with each NON INDEXED URL - this is great for identifying orphan pages (if your POLLED HTTP Status is 200 but the crawled status has no value, the page wasn't found on the crawl - potential orphan)
You can pair anything up - I typically tend to look for:
Polled HTTP Status
Crawled HTTP Status
Word Count
Internal Links (unique)
Index states
Canonical (you can also use IMPORTXML here)
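For step 6, here is a rough idea of what an HTTPSTATUSCODE custom function can look like in Apps Script. This is only a minimal sketch, assuming it is built on Apps Script's UrlFetchApp service; the script shared in the comments may well differ. Choosing not to follow redirects is my assumption too, so that a 301 shows up as a 301 rather than as the final 200.

```javascript
/**
 * Sketch of a Google Sheets custom function, called as =HTTPSTATUSCODE(A2).
 * Returns the HTTP status code for a URL without following redirects.
 */
function HTTPSTATUSCODE(url) {
  if (!url) return ""; // empty cell: nothing to poll
  try {
    var response = UrlFetchApp.fetch(String(url).trim(), {
      muteHttpExceptions: true, // return 4xx/5xx responses instead of throwing
      followRedirects: false    // report the redirect status itself
    });
    return response.getResponseCode();
  } catch (e) {
    // DNS failures, timeouts, malformed URLs etc. end up here
    return "ERROR: " + e.message;
  }
}
```

Paste it into Extensions > Apps Script, save, then use =HTTPSTATUSCODE(A2) next to each NON INDEXED URL.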
The VLOOKUP CODE is here:
=IFERROR(VLOOKUP(#cellref,'#tabref'!$A$1:$YM$145124,#sheetlookupindex,FALSE),"-")
You need to adjust the VLOOKUP to match your data sets, e.g.
=IFERROR(VLOOKUP(A2,'Full Site Crawl'!$A$1:$YM$145124,2,FALSE),"-")
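In the example above I look up the value in cell A2 (the URL) in the FULL SITE CRAWL sheet. The index number is the column of the value I want returned, i.e. column 2 = HTTP status. VLOOKUP searches the first column of the range, so the URL needs to be in column A of the crawl tab, and anything not found comes back as "-".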
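If it helps to see the lookup as logic rather than a formula, here is a small sketch in the same Apps Script style. The function names lookupOrDash and isPotentialOrphan are made up for illustration, and the rows are assumed to be a plain 2D array of sheet values with the URL in the first column.

```javascript
// Hypothetical helper mirroring =IFERROR(VLOOKUP(key, range, index, FALSE), "-").
// Note: VLOOKUP in Sheets ignores letter case; this sketch compares exactly.
function lookupOrDash(key, rows, columnIndex) {
  for (var i = 0; i < rows.length; i++) {
    if (rows[i][0] === key) {
      var value = rows[i][columnIndex - 1]; // VLOOKUP column indexes are 1-based
      return value === undefined ? "-" : value;
    }
  }
  return "-"; // no exact match in the first column
}

// A polled 200 with no match in the crawl is a potential orphan page.
function isPotentialOrphan(polledStatus, crawledStatus) {
  return polledStatus === 200 && crawledStatus === "-";
}
```

So a URL that polls as 200 but comes back as "-" from the crawl tab is the one to investigate.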
When you build your data set you can then do the following:
1. Create a DATA FILTER
2. Make sure you copy and paste special > values only on your HTTP status column so it doesn't rerun the script every time you sort/filter
3. You can apply a HTTP STATUS FILTER on, say, pages under crawled currently not indexed. I tend to duplicate the tab and apply filters for different HTTP status codes, e.g. I would apply 200 to the status filter for crawled/discovered currently not indexed to find all ACTIVE PAGES that are not indexed, then look at word count and internal link counts
4. You can also export GSC data to a sheet and VLOOKUP the URLs to see if anything NOT INDEXED had appeared in search in the last 16 months
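The filtering in step 3 can also be sketched in code. This is only an illustration: the row fields (polledStatus, wordCount, internalLinks) and the function names are hypothetical, and sorting the thinnest, least-linked pages to the top is just one way to prioritise.

```javascript
// Treat "-" (no lookup match) or other non-numeric values as 0.
function toNumber(value) {
  var n = Number(value);
  return isNaN(n) ? 0 : n;
}

// Keep only ACTIVE (200) pages from a non indexed tab, then sort by
// word count and internal links so the weakest pages come first.
function activeNotIndexed(rows) {
  return rows
    .filter(function (row) { return toNumber(row.polledStatus) === 200; })
    .sort(function (a, b) {
      return toNumber(a.wordCount) - toNumber(b.wordCount) ||
             toNumber(a.internalLinks) - toNumber(b.internalLinks);
    });
}
```

Pages near the top of that list are live but thin or poorly linked, which is where I would start.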
Enjoy :)
#seo #seotips #digitalmarketing