Unraveling Panda Patterns

Posted by billslawski

This is my first official blog post at Moz.com, and I’m going to be requesting your help and expertise and imagination.

I’m going to be asking you to take over as Panda for a little while to see if you can identify the kinds of things that Google’s Navneet Panda addressed when faced with what looked like an incomplete patent created to identify sites as parked domain pages, content farm pages, and link farm pages. You’re probably better at this now than he was then.

You’re a subject matter expert.

To put things in perspective, I’m going to include some information about what appears to be the very first Panda patent, and some of Google’s effort behind what they were calling the “high-quality site algorithm.”

I’m going to then include some of the patterns they describe in the patent to identify lower-quality pages, and then describe some of the features I personally would suggest to score and rank a higher-quality site of one type.

Google’s Amit Singhal identified a number of questions about higher quality sites that he might use, and told us in the blog post where he listed those that it was an incomplete list because they didn’t want to make it easy for people to abuse their algorithm.

In my opinion though, any discussion about improving the quality of webpages is one worth having, because it can help improve the quality of the Web for everyone, which Google should be happy to see anyway.

Warning searchers about low-quality content

In “Processing web pages based on content quality,” the original patent filing for Panda, there’s a somewhat mysterious statement that makes it sound as if Google might warn searchers before sending them to a low quality search result, and give them a choice whether or not they might actually click through to such a page.

As it notes, the types of low quality pages the patent was supposed to address included parked domain pages, content farm pages, and link farm pages (yes, link farm pages):

“The processor 260 is configured to receive from a client device (e.g., 110), a request for a web page (e.g., 206). The processor 260 is configured to determine the content quality of the requested web page based on whether the requested web page is a parked web page, a content farm web page, or a link farm web page.

Based on the content quality of the requested web page, the processor is configured to provide for display, a graphical component (e.g., a warning prompt). That is, the processor 260 is configured to provide for display a graphical component (e.g., a warning prompt) if the content quality of the requested web page is at or below a certain threshold.

The graphical component provided for display by the processor 260 includes options to proceed to the requested web page or to proceed to one or more alternate web pages relevant to the request for the web page (e.g., 206). The graphical component may also provide an option to stop proceeding to the requested web page.

The processor 260 is further configured to receive an indication of a selection of an option from the graphical component to proceed to the requested web page, or to proceed to an alternate web page. The processor 260 is further configured to provide for display, based on the received indication, the requested web page or the alternate web page.”

This did not sound like a good idea.

Recently, Google announced in a post on the Google Webmaster Central blog, “Promoting modern websites for modern devices in Google search results,” that they would start providing warning notices on mobile versions of sites if there were issues on those pages that visitors might go to.

I imagine that as a site owner, you might be disappointed to see such a warning notice shown to searchers about technology used on your site possibly not working correctly on a specific device. That recent blog post mentions Flash as an example of a technology that might not work correctly on some devices. For example, we know that Apple’s mobile devices and Flash don’t work well together.

That’s not a bad warning in that it provides enough information to act upon and fix to the benefit of a lot of potential visitors. :)

But imagine if you tried to visit your website in 2011, and instead of getting to the site, you received a Google warning that the page you were trying to visit was a content farm page or a link farm page, and it provided alternative pages to visit as well.

That “your website sucks” warning still doesn’t sound like a good idea. One of the inventors listed on the patent is described in LinkedIn as presently working on the Google Play store. The warning for mobile devices might have been something he brought to Google from his work on this Panda patent.

We know that when the Panda Update was released, it was targeting specific types of pages that people at places such as The New York Times were complaining about, such as parked domains and content farm sites. A follow-up from the Times after the algorithm update was released puts it into perspective for us.

It wasn’t easy to know that your pages might have been targeted by that particular Google update either, or if your site was a false positive—and many site owners ended up posting in the Google Help forums after a Google search engineer invited them to post there if they believed that they were targeted by the update when they shouldn’t have been.

The wording of that invitation is interesting in light of the original name of the Panda algorithm. (Note that the thread was broken into multiple threads when Google did a migration of posts to new software, and many appear to have disappeared at some point.)

As we were told in the invite from the Google search engineer:

“According to our metrics, this update improves overall search quality. However, we are interested in hearing feedback from site owners and the community as we continue to refine our algorithms. If you know of a high-quality site that has been negatively affected by this change, please bring it to our attention in this thread.

Note that as this is an algorithmic change we are unable to make manual exceptions, but in cases of high quality content we can pass the examples along to the engineers who will look at them as they work on future iterations and improvements to the algorithm.

So even if you don’t see us responding, know that we’re doing a lot of listening.”

The timing for such in-SERP warnings might have been troublesome. A site that mysteriously stops appearing in search results for queries that it used to rank well for might be said to have run afoul of Google’s guidelines. Instead, such a warning might be a little like the purposefully embarrassing “Scarlet A” in Nathaniel Hawthorne’s novel The Scarlet Letter.

A page that shows up in search results with a warning to searchers stating that it was a content farm, or a link farm, or a parked domain probably shouldn’t be ranking well to begin with. Having Google continuing to display those results ranking highly, showing both a link and a warning to those pages, and then diverting searchers to alternative pages might have been more than those site owners could handle. Keep in mind that the fates of those businesses are usually tied to such detoured traffic.

My imagination is filled with the filing of lawsuits against Google based upon such tantalizing warnings, rather than site owners filling up a Google Webmaster Help Forum with information about the circumstances involving their sites being impacted by the upgrade.

In retrospect, it is probably a good idea that the warnings hinted at in the original Panda Patent were avoided.

Google seems to think that such warnings are appropriate now when it comes to multiple devices and technologies that may not work well together, like Flash and iPhones.

But there were still issues with how well or how poorly the algorithm described in the patent might work.

In the March 2011 interview with Google’s Head of Search Quality, Amit Singhal, and his team member and Head of Web Spam at Google, Matt Cutts, titled TED 2011: The “Panda” That Hates Farms: A Q&A With Google’s Top Search Engineers, we learned that “Panda” was the code name Google claimed to be using for the algorithm update, after an engineer with that name came along and provided suggestions on patterns that could be used by the patent to identify high- and low-quality pages.

His input seems to have been pretty impactful—enough for Google to have changed the name of the update, from the “High Quality Site Algorithm” to the “Panda” update.

How the High-Quality Site Algorithm became Panda

Danny Sullivan named the update the “Farmer update” since it supposedly targeted content farm web sites. Soon afterwards the joint interview with Singhal and Cutts identified the Panda codename, and that’s what it’s been called ever since.

Google didn’t completely abandon the name found in the original patent, the “high quality sites algorithm,” as can be seen in the titles of these Google Blog posts:

The most interesting of those is the “more guidance” post, in which Amit Singhal lists 23 questions about things Google might look for on a page to determine whether or not it was high-quality. I’ve spent a lot of time since then looking at those questions thinking of features on a page that might convey quality.

The original patent is at:

Processing web pages based on content quality
Inventors: Brandon Bilinski and Stephen Kirkham
Assigned to Google
US Patent 8,775,924
Granted July 8, 2014
Filed: March 9, 2012

Abstract

“Computer-implemented methods of processing web pages based on content quality are provided. In one aspect, a method includes receiving a request for a web page.

The method includes determining the content quality of the requested web page based on whether it is a parked web page, a content farm web page, or a link farm web page. The method includes providing for display, based on the content quality of the requested web page, a graphical component providing options to proceed to the requested web page or to an alternate web page relevant to the request for the web page.

The method includes receiving an indication of a selection of an option from the graphical component to proceed to the requested web page or to an alternate web page. The method further includes providing, based on the received indication, the requested web page or an alternate web page.”

The patent expands on what are examples of low-quality web pages, including:

  • Parked web pages
  • Content farm web pages
  • Link farm web pages
  • Default pages
  • Pages that do not offer useful content, and/or pages that contain advertisements and little else

An invitation to crowdsource high-quality patterns

This is the section I mentioned above where I am asking for your help. You don’t have to publish your thoughts on how quality might be identified, but I’m going to start with some examples.

Under the patent, a content quality value score is calculated for every page on a website based upon patterns found on known low-quality pages, “such as parked web pages, content farm web pages, and/or link farm web pages.”

For each of the patterns identified on a page, the content quality value of the page might be reduced based upon the presence of that particular pattern—and each pattern might be weighted differently.
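
A minimal sketch of that kind of weighted scoring follows. The pattern names, weights, starting value, and warning threshold are all assumptions made up for illustration; the patent as described here does not publish any of them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pattern:
    name: str
    weight: float                    # hypothetical penalty, not a value from the patent
    matches: Callable[[str], bool]   # test applied to the page's visible text


def content_quality_value(page_text: str, patterns: list[Pattern], start: float = 1.0) -> float:
    """Start from a full score and subtract the weight of every pattern that matches."""
    score = start
    for pattern in patterns:
        if pattern.matches(page_text):
            score -= pattern.weight
    return max(score, 0.0)  # never report a negative score


def needs_warning(score: float, threshold: float = 0.5) -> bool:
    """The patent text describes acting when quality is at or below a threshold."""
    return score <= threshold


# Illustrative patterns only; each one carries a different (assumed) weight.
PATTERNS = [
    Pattern("parked_phrase", 0.6, lambda t: "this page is parked" in t.lower()),
    Pattern("for_sale_phrase", 0.5, lambda t: "domain is for sale" in t.lower()),
    Pattern("very_little_text", 0.3, lambda t: len(t.split()) < 20),
]

if __name__ == "__main__":
    score = content_quality_value("This page is parked", PATTERNS)
    print(round(score, 2), needs_warning(score))
```

Because each pattern has its own weight, a single strong signal (a parking phrase) can push a page under the threshold on its own, while a weak signal (thin text) cannot.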

Some simple patterns that might be applied to a low-quality web page might be one or more references to:

  • A known advertising network,
  • A web page parking service, and/or
  • A content farm provider

One of these references may be in the form of an IP address that the destination hostname resolves to, a Domain Name Server (“DNS server”) that the destination domain name is pointing to, an “a href” attribute on the destination page, and/or an “img src” attribute on the destination page.

That’s a pretty simple pattern, but a web page resolving to an IP address known to exclusively serve parked web pages provided by a particular Internet domain registrar can be deemed a parked web page, so it can be pretty effective.

A web page with a DNS server known to be associated with web pages that contain little or no content other than advertisements may very well provide little or no content other than advertising. So that one can be effective, too.
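
As a rough illustration of the reference checks described above, the sketch below collects the hosts a page points to through “a href” and “img src” attributes and compares them with lists of known offenders. The host names and IP address are placeholders invented for the example, and the IP check assumes the hostname has already been resolved elsewhere.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical lists for illustration; a real system would maintain its own data.
KNOWN_LOW_QUALITY_HOSTS = {"ads.example.net", "parking.example.com"}
KNOWN_PARKING_IPS = {"203.0.113.10"}


class ReferenceCollector(HTMLParser):
    """Collects the hosts referenced by <a href> and <img src> attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        url = None
        if tag == "a":
            url = attributes.get("href")
        elif tag == "img":
            url = attributes.get("src")
        if url:
            host = urlparse(url).hostname  # None for relative URLs
            if host:
                self.hosts.add(host)


def flagged_references(html: str) -> set:
    """Return the referenced hosts that appear on the known low-quality list."""
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.hosts & KNOWN_LOW_QUALITY_HOSTS


def resolves_to_parking_ip(resolved_ip: str) -> bool:
    """Check an already-resolved IP address against the known parking IPs."""
    return resolved_ip in KNOWN_PARKING_IPS
```

Relative links are ignored here because they point back at the page's own host, which is covered by the IP and DNS checks rather than the attribute checks.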

Some of the patterns listed in the patent don’t seem quite as useful or informative. For example, one states that a web page containing a common typographical error of a bona fide domain name may likely be a low-quality web page, or a non-existent web page. I’ve seen more than a couple of legitimate sites with common misspellings of good domains, so I’m not too sure how helpful a pattern that is.

Of course, the patent tells us that some textual content is a dead giveaway, such as pages containing terms like “domain is for sale,” “buy this domain,” and/or “this page is parked.”

Likewise, a web page with little or no content is probably (but not always) a low-quality web page.

This is a simple but effective pattern, even if not too imaginative:

… page providing 99% hyperlinks and 1% plain text is more likely to be a low-quality web page than a web page providing 50% hyperlinks and 50% plain text.
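
One way to measure that split is to compare the amount of text inside links with the amount outside them. The sketch below counts characters, which is an assumption; the patent excerpt does not say how the percentages are calculated.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts visible characters inside <a> elements versus everywhere else.

    Simplification: text inside <script> or <style> is counted as plain text.
    """

    def __init__(self):
        super().__init__()
        self._anchor_depth = 0
        self.link_chars = 0
        self.plain_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1

    def handle_data(self, data):
        length = len(data.strip())
        if self._anchor_depth:
            self.link_chars += length
        else:
            self.plain_chars += length


def hyperlink_share(html: str) -> float:
    """Fraction of visible text that sits inside hyperlinks (0.0 for an empty page)."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    total = counter.link_chars + counter.plain_chars
    return counter.link_chars / total if total else 0.0
```

A page scoring close to 1.0 would resemble the 99% hyperlink example, while a page near 0.5 would resemble the 50/50 example.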

Another pattern is one that I often check upon and address in site audits, and it involves how functional and responsive pages on a site are.

The determination of whether a web site is fully functional may be based on an HTTP response code, information received from a DNS server (e.g., hostname records), and/or a lack of a response within a certain amount of time. As an example, an HTTP response that is anything other than 200 (e.g., “404 Not Found”) would indicate that a web site is not fully functional.

As another example, a DNS server that does not return authoritative records for a hostname would indicate that the web site is not fully functional. Similarly, a lack of a response within a certain amount of time, from the IP address of the hostname for a web site would indicate that the web site is not fully functional.
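
The three checks in that passage can be combined into a simple decision, sketched below. The `ProbeResult` structure and the ten-second timeout are assumptions for illustration; gathering the actual HTTP and DNS results is left out so that the logic can be read on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Hypothetical summary of one probe of a site."""
    status_code: Optional[int]         # None if no HTTP response arrived
    dns_authoritative: bool            # did DNS return authoritative records?
    elapsed_seconds: Optional[float]   # None if the request never completed


def is_fully_functional(result: ProbeResult, timeout_seconds: float = 10.0) -> bool:
    """Apply the three signals literally as the patent text describes them."""
    if not result.dns_authoritative:
        return False
    if result.elapsed_seconds is None or result.elapsed_seconds > timeout_seconds:
        return False
    # Read literally, anything other than 200 counts against the site.
    return result.status_code == 200
```

Taken at face value, this would also flag redirects such as a 301, so a production system would presumably follow redirects before judging the final response.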

As for user-data, sometimes it might play a role as well, as the patent tells us:

“A web page may be suggested for review and/or its content quality value may be adapted based on the amount of time spent on that page.

For example, if a user reaches a web page and then leaves immediately, the brief nature of the visit may cause the content quality value of that page to be reviewed and/or reduced. The amount of time spent on a particular web page may be determined through a variety of approaches. For example, web requests for web pages may be used to determine the amount of time spent on a particular web page.”
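
One plausible reading of using web requests to determine time on page is to take the gap between a visitor's request for a page and their next request. The sketch below does that for a single visitor's session; the five-second cutoff for a “brief” visit is an assumption, not a figure from the patent.

```python
def dwell_times(requests):
    """Estimate time on each page as the gap until the visitor's next request.

    `requests` is a list of (timestamp_seconds, url) pairs for one visitor.
    The last page has no following request, so its dwell time is unknown
    and it is left out of the result.
    """
    ordered = sorted(requests)
    return [
        (url, next_timestamp - timestamp)
        for (timestamp, url), (next_timestamp, _) in zip(ordered, ordered[1:])
    ]


def quick_exits(requests, min_seconds=5.0):
    """Pages the visitor left almost immediately, as candidates for review."""
    return [url for url, seconds in dwell_times(requests) if seconds < min_seconds]


if __name__ == "__main__":
    session = [(0.0, "/landing"), (2.0, "/article"), (62.0, "/contact")]
    print(dwell_times(session))
    print(quick_exits(session))
```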

My example of some patterns for an e-commerce website

There are a lot of things that you might want to include on an ecommerce site that help to indicate that it’s high quality. If you look at the questions that Amit Singhal raised in the last Google Blog post I mentioned above, one of his questions was “Would you be comfortable giving your credit card information to this site?” Patterns that might fit with this question could include:

  • Is there a privacy policy linked to on pages of the site?
  • Is there a “terms of service” page linked to on pages of the site?
  • Is there a “customer service” page or section linked to on pages of the site?
  • Do ordering forms function fully on the site? Do they return 404 pages or 500 server errors?
  • If an order is made, does a thank-you or acknowledgement page show up?
  • Does the site use an https protocol when sending data or personally identifiable data (like a credit card number)?
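
Several of the patterns in this list can be checked mechanically. The sketch below looks for policy and customer service links and checks whether forms submit over https. The keyword matching is deliberately crude and the function names are made up for this example; it does not cover the order-flow checks, which would need a real test order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class TrustSignalParser(HTMLParser):
    """Collects link hrefs, link text, and form actions from a page."""

    def __init__(self):
        super().__init__()
        self.link_strings = []   # lowercased hrefs and anchor text
        self.form_actions = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self._in_anchor = True
            self.link_strings.append((attributes.get("href") or "").lower())
        elif tag == "form":
            self.form_actions.append(attributes.get("action") or "")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self.link_strings.append(data.strip().lower())


def trust_signals(html: str, page_url: str) -> dict:
    """Report which of the checklist patterns appear on a single page."""
    parser = TrustSignalParser()
    parser.feed(html)
    parser.close()

    def linked(*needles):
        return any(needle in text for text in parser.link_strings for needle in needles)

    # Resolve relative form actions against the page URL before checking the scheme.
    # Note: a page with no forms passes this check trivially.
    forms_secure = all(
        urlparse(urljoin(page_url, action)).scheme == "https"
        for action in parser.form_actions
    )
    return {
        "privacy_policy": linked("privacy"),
        "terms_of_service": linked("terms"),
        "customer_service": linked("customer service", "customer-service"),
        "forms_use_https": forms_secure,
    }
```

Each missing signal could then feed a weighted score, with an insecure payment form presumably weighing far more than a missing terms page.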

As I mentioned above, the patent tells us that the effect on a page’s content quality score might differ from one pattern to another, since each pattern might be weighted differently.

The questions from Amit Singhal imply a lot of other patterns, but as SEOs who work on and build and improve a lot of websites, this is an area where we probably have more expertise than Google’s search engineers.

What other questions would you ask if you were tasked with looking at this original Panda Patent? What patterns would you suggest looking for when trying to identify high or low quality pages?  Perhaps if we share with one another patterns or features on a site that Google might look for algorithmically, we could build pages that might not be interpreted by Google as being a low quality site. I provided a few patterns for an ecommerce site above. What patterns would you suggest?

(Illustrations: Devin Holmes @DevinGoFish)

How to Simplify Google Analytics Reporting

When you look at all your Google Analytics data, does it make your head hurt?

Google Analytics is very useful but it includes an awful lot of information. Whenever you talk to anyone about Analytics, they always feel they should be doing more in this area.

Of course, Google engineers keep on adding functionality so, although it gets more useful, it can also get more complex with regular, new functionality that you need to figure out.

But sometimes less is more!

What I mean is that focussing on a couple of good pieces of information can be more valuable than having a sore head looking at lots of data all the time.

Head Hurting Syndrome (HHS) is a well known condition and it is particularly noticeable when someone is trying to figure out pages and pages of Analytics :-)

So, instead of contracting this condition, you may instead want to think of ways of reducing the stress and anxiety and aim for an easier life.

Here are some ways to simplify your Google Analytics reporting:

1.  Use Quillengage to Simplify the Data

Quillengage is a completely free tool that you connect to your Google Analytics account.  It will retrieve all relevant data from Google Analytics, parse it, and present it in an easy-to-understand format.

You get initial statistics showing your overall performance comparing the current week to the previous week.

Next, you get paragraphs of text explaining what has happened over the previous week.  So, for example, it will say something similar to the following in your report:

“Your organic traffic went up by 10% last week compared to the previous week and your most popular post was ‘5 Tools to Grow Instagram followers’.”

This really helps with HHS (Head Hurting Syndrome) because you don’t have to parse all the data yourself.  This is all done automatically by Quillengage.

Here is an example of how a report looks:

[Image: QuillEngage report. View an analysis of your Google Analytics data in an easy-to-understand format]

You can see the overall data at the top, followed by text that explains what has happened with your website over the past week.

When you start using Quillengage, you’ll enjoy getting the reports because it gives you very relevant information and it’s as easy to read as an email.  Each week, you’ll get a comparison with previous weeks, so you’ll instantly know if you are making progress.

2.  Use Google Analytics Summary Reports

If you like looking at data but you want a report emailed to you on a weekly basis, you can set this up in Google Analytics.

As you go through the menu options on the left of the Google Analytics interface, you will see a reporting tab for every page.

[Image: Google Analytics reporting option. The report tab is available on all screens]

Based on the information displayed on that screen, you can create an e-mail report that is sent to you regularly.  To set this up, select the Email option under Reporting.

[Image: Google Analytics email reports. Select the email option]

When you select this option, you are presented with the following screen:

[Image: Email setup for Google Analytics]

On this screen, you can configure the following:

  • Frequency – Do you want it daily, weekly, monthly?
  • Format – You may want a CSV (a text file with data separated by commas), a PDF, an Excel document, etc.
  • Day of the week – The day of the week it should be sent, assuming you haven’t picked a daily report!
  • Recipients – You can specify multiple recipients.

If you are not happy with the standard report on the screen, you can customize the report to suit your needs.

So, consider setting up a PDF report that will give you more visual data than Quillengage provides.  Some people like the visual data, and some prefer text!

3.  Set up Custom Dashboards

Custom Dashboards in Google Analytics give you a configurable overview of how you are doing for specific areas of Analytics.  You can create custom dashboards yourself, or add a custom dashboard made and shared by someone else.

Here is an example of a couple of widgets that are available on a social media dashboard.  All of the widgets are about your social media activity.  You can have many widgets on a page and you can drag and drop them to reorganize them.

[Image: Social Media Custom Dashboard]

When you want to add a widget you click the ‘add widget’ button and then specify the type of information you want to include.

Here’s an article from Kristi Hynes which explains how to set up a custom dashboard.  And here’s one from Koozai about a range of custom dashboards you can click on and add to your page.

So, your dashboards can be a good way of providing a nice visual representation of how things are going.  You can have multiple dashboards for different areas also (e.g. an SEO dashboard, a social dashboard, etc.).  When you log into Google Analytics, it may be the first and only place you need to go.

4.   Pick out three stats

If none of the above work for you, just pick one to three statistics that are really important to your business and focus on these.

For example, one of them could be the goals achieved in the current week, compared to previous weeks.  If  you have set up goals in Google Analytics, and each goal is achieved when a product is purchased, then your most important statistic could be the goals achieved.

If the number of goals achieved is going down, you’ll need to investigate the Analytics in more depth.  If it’s going up all the time, maybe you don’t need to.
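
If you track that one statistic by hand, the week-over-week comparison is simple arithmetic. Here is a small sketch; the function name and the sample numbers are made up for illustration and are not taken from Google Analytics.

```python
def week_over_week_change(previous: int, current: int):
    """Percentage change in goals achieved compared with the previous week.

    Returns None when the previous week had no goals, because a percentage
    change from zero is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


if __name__ == "__main__":
    change = week_over_week_change(previous=40, current=34)
    if change is not None and change < 0:
        print(f"Goals fell by {abs(change):.1f}%: time to look deeper.")
```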

Of course, you should be looking at Analytics in detail, but if you don’t have time and if you’ve ignored it up to now maybe this will work better for you.

Summary

There is no shortage of Analytics data to look at, but that doesn’t mean to say that you need to look at it all!  You are better off simplifying your Analytics reporting and focussing in on the data that really matters.

How do you manage your reporting?  Which tools do you use? What data do you monitor?

I’d love to hear from you!

Ian

Analytics image by Shutterstock

The post How to Simplify Google Analytics Reporting appeared first on RazorSocial and was written by Ian Cleary