If you use the Page Structure API to analyse one or more URLs, you will notice that the JSON response contains a field called 'Page Type'.
Please refer to the relevant page on the website above to see the latest data JSON returned and the full list of data elements analysed and parsed.
Page Type Classification
We download the full HTML of every page (unless blocked by a site) and try to extract the main content of each page (we say 'try' as there are some terribly coded websites out there and this may not always be possible!).
We've trained an algorithm capable of recognising patterns and signals that the following types of page have in common.
This is not 100% foolproof. The algorithm sometimes finds it difficult to classify pages that are a hybrid of two types, e.g. an ecommerce product category page with so much text on it that it looks like an article, or an article with so many repeated product links or blocks that it resembles a normal ecommerce category page.
We have not included links to sites as they are likely to have changed by the time you visit them. But hopefully the descriptions give you a good idea of what types of pages we classify into each bucket. We use the 'generic' category as a catch-all for pages that have no distinguishing factors. You see a lot of company home pages and templated pages in this category.
A blog post or other long-form narrative on a subject (lots of text, often with images and videos). A blog will typically have an RSS feed and more than one post. Also includes news pages.
A category or search listings page, e.g. a typical ecommerce page showing multiple products, or a page featuring search functionality and/or result listings, such as a directory of used cars for sale.
A document file, e.g. PDF, TXT or Word doc.
A normal page on any website not otherwise classified, e.g. the home pages on many sites.
An image or image sharing site.
A typical ecommerce product page.
A page or profile on any recognised social network, e.g. Twitter, Instagram or Facebook.
A page on YouTube, Vimeo or another video hosting site. This does not include pages where a video is just part of the content. (This is detected based on the domain.)
e.g. Wikipedia. (We could add similar sites later.) (This is detected based on the domain.)
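To illustrate what "detected based on the domain" can look like in practice, here is a minimal sketch in Python. It is not the API's actual implementation: the domain lists, the label strings 'video' and 'wiki', and the function name domain_page_type are all assumptions made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical domain lists for illustration only; the lists the API
# really uses are not documented here.
VIDEO_DOMAINS = {"youtube.com", "vimeo.com"}
WIKI_DOMAINS = {"wikipedia.org"}


def domain_page_type(url):
    """Return a hypothetical page type label if the URL's host matches
    a known domain, otherwise None (meaning: classify by content)."""
    host = (urlparse(url).hostname or "").lower()
    for label, domains in (("video", VIDEO_DOMAINS), ("wiki", WIKI_DOMAINS)):
        # Match the bare domain or any subdomain, e.g. en.wikipedia.org.
        if any(host == d or host.endswith("." + d) for d in domains):
            return label
    return None
```

A URL that matches neither list returns None, which in this sketch would mean falling back to the content-based classification described earlier.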
How can SEOs use this data?
SEOs will find this useful to analyse the "Smell of the SERP" (a term coined by Laurent Bourrelly, a leading French SEO) for a keyword or group of keywords. It can help you identify the type of content that Google prefers to rank on page 1 of the SERP, so you can compare it to your own page(s) and see whether you have the right type of content to rank well.
For example, if Google is showing 10 articles on the first page for "best family SUV 2023", then it's unlikely your used SUV directory page will rank (no matter how powerful your domain and backlink profile) unless its content provides expert editorial on how to select the best SUV, as the top-ranking pages do.
So, by running the Page Structure API in bulk against the top n competing pages and your own ranking pages for all your ranking keywords, you can easily see whether there is a match or mismatch between your page structure and content and what Google believes users want and prefers to show at the top of the SERPs.
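The comparison described above could be sketched roughly as follows in Python. It assumes you have already called the API and parsed each response into a dict containing the 'Page Type' field. The function names and the label values 'article' and 'category' are illustrative assumptions, not values confirmed by the API documentation.

```python
from collections import Counter


def page_type_mix(results):
    """Count 'Page Type' values across a list of parsed API responses."""
    return Counter(r.get("Page Type", "unknown") for r in results)


def matches_serp(own_result, competitor_results):
    """Return (is_match, dominant_type) for your page vs the top results."""
    mix = page_type_mix(competitor_results)
    if not mix:
        return False, None
    dominant_type, _ = mix.most_common(1)[0]
    return own_result.get("Page Type") == dominant_type, dominant_type


# Hypothetical data: 8 of the top 10 results are articles, 2 are listings,
# and your own ranking page is a listings page.
top_results = [{"Page Type": "article"}] * 8 + [{"Page Type": "category"}] * 2
own_page = {"Page Type": "category"}
print(matches_serp(own_page, top_results))
```

With this made-up data the dominant type is 'article' and your page does not match it, which is the used SUV directory situation from the example above. Using only the single most common type is a simplification; for mixed SERPs you may prefer to look at the full counts returned by page_type_mix.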