Duplicate Content & ECom

  • 8/4/2019 Duplicate Content & ECom

    1/15

    Is Your E-commerce System Harming Your Search Engine Rankings?

    www.altruik.com

    Hamlet Batista
    Chief Search [email protected]

    Copyright 2011 Altruik, Inc.

    Table of Contents

    What Is Duplicate Content?
    How Duplicate Content Affects Your Search Engine Rankings
    How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
    When Duplicate Content Is Not Really Duplicate At All
    Sound Like Too Much Manual Labor?
    Will You Profit From Addressing Duplicate Content Issues?
    Here's What You Should Do Next

    As an online retailer, your search engine strategy is your business strategy. Have you noticed your search engine rankings slipping away recently? Do you wonder what the cause might be? It is critical that every page selling your products ranks as highly as possible in search engines like Google. That's why it is important that you optimize your site for search engine spiders, especially if you are using a CMS (content management system).

    There is a hidden danger, an issue that affects the majority of e-commerce websites, which most business owners don't know about until it is too late. That problem is duplicate content.

    If you have multiple copies of the same page, different URLs that point to the same content, and navigation systems that track your users, there is a good chance that you have an issue with duplicate content, too.

    Most content management systems, as useful as they are, surprisingly are not designed with SEO in mind. Your CMS features tools that make finding products easier for visitors to your website. But those same features that duplicate product pages into multiple categories often make it difficult for Google to crawl, index, and rank all of the pages on your site.

    Duplicate content causes serious problems because it:

    - Weakens the rank of your most popular pages
    - Sends Google on a wild goose chase, causing it to abandon your site altogether
    - Blocks large portions of your website from getting indexed
    - Prevents your most profitable pages from reaching the top of Google's rankings
    - Cripples your best link-building efforts

    Duplicate content problems are like leaky faucets. As more sites link to your duplicate URLs, the reputation and rank of your top-selling product pages go down the drain. Products that once ranked very high suddenly begin tumbling down the rankings, and your competition gains the upper hand. The question now is: how can you identify duplicate content and patch up the leaks that are ruining your search engine rankings?

    Keep on reading because we're going to teach you what most people don't know about the mess their CMS is leaving behind. Your priority is to patch these leaks before they drown your entire online business. With the right tools, you can build an even stronger search presence.

    If you have a duplicate content problem, huge portions of your website might not be in Google's index.

    What Is Duplicate Content?

    First, the basics. Duplicate content is any page on the Internet that is either exactly the same or nearly identical to another page. Google compares the text of multiple pages to determine a match. If the written content is exactly the same, or almost exactly the same, Google considers the newer page to be duplicate content.

    Most duplicate content is created when your CMS allows visitors (and Google) to access the same page from different URLs. Let's say your online store has a category for shoes and another category for all products in the color black. The same pair of black shoes can be accessed from two different category combinations, one in which the user selects "shoes" first, and another in which the user chooses "black" first.

    These pages are almost identical. The URLs are different but lead you to the same product, the Jessica Simpson Women's Leve Black Leather shoe. In each example, users selected various categories in different orders and were able to access the same content via different paths.

    Google's search engine robot crawls your website like a nosy visitor, following each link for every category. It will find the same page twice, once under one combination of categories, and another under the other. You don't actually have two copies of the same page, but your CMS setup certainly makes it look like you do.
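The way category order multiplies URLs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URL layout (`base/category/category/product`), the function names, and the product slug are hypothetical, and sorting the categories alphabetically is just one arbitrary way to agree on a single preferred URL, not a rule from Google or from any particular CMS.

```python
from itertools import permutations


def category_urls(base, categories, product):
    """List every URL a shopper could reach by picking categories in a different order."""
    return [f"{base}/{'/'.join(order)}/{product}" for order in permutations(categories)]


def canonical_url(base, categories, product):
    """Pick one preferred URL by always listing categories in sorted order (an assumption)."""
    return f"{base}/{'/'.join(sorted(categories))}/{product}"


# Hypothetical store: one shoe, reachable through two category orders.
urls = category_urls("http://www.example.com", ["shoes", "black"], "leve-leather")
preferred = canonical_url("http://www.example.com", ["shoes", "black"], "leve-leather")

for url in urls:
    print(url, "(preferred)" if url == preferred else "(duplicate)")
```

With two categories there are two URLs for one product; with three categories there would be six, which is why guided navigation can produce duplicates so quickly.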

    Duplicate content is also created when:

    - You use multiple subdomains. Google thinks you have duplicate content when you put the same page on http://example.com as you do on http://www.example.com. That "www." makes a big difference to Google.
    - Your CMS creates separate pages for different product colors. Google can't tell the difference between an image of a blue shoe and a red shoe (it relies on textual descriptions). It will conclude that one of these pages is a duplicate.
    - Your CMS dynamically generates pages as your users click on links. A good example of this is a calendar that creates a new page every time you click on the "next month" link.
    - You, or people linking to your pages, add extra parameters to URLs (sometimes for tracking), creating multiple URLs that direct Google to the same page over and over again.

    As you can see, duplicate content can arise from a variety of sources. Each of these is another leak in your faucet. It creates a number of nasty problems, both for Google's search engine robot and for other search engines. Sometimes it sends the robot on an endless chase that Google eventually abandons, and at other times, it simply dilutes the reputation of all your affected pages. When only a small portion of your site makes it into the search engine rankings, your overall ranking suffers.

    Page reputation is diluted when the same content is accessible through multiple URLs. You can recapture reputation and prevent duplicate content by consolidating non-canonical versions with 301 redirects. Source: Google's SEO Report Card, Google Webmaster Central

    In the next section, we'll show you how duplicate content prevents your most profitable pages from making it to the top of Google's rankings.

    Glossary

    301 Redirect: An HTTP status code (Moved Permanently) that automatically redirects users to a specific URL.
    200: An HTTP status code for a successful request; the content is returned.

    How Duplicate Content Affects Your Search Engine Rankings

    Now we're ready to see how duplicate content affects Google's impression of your content, wreaking havoc on your search rankings in the process.

    How duplicate content dilutes the ranking of your top-rated pages

    Let's say you just wrote a popular article that went viral. Would you rather see the entire article getting a million views, or would you prefer to split the article in two and assign 500,000 views to each section? If you chose the former option, you're on the right track. As a single page receives more views, it increases the chances of receiving natural links. More people share the page, blog about it, and link to it.

    You want your website listed in the prime real estate of the results page. Splitting links will dilute rankings of your strongest pages.

    When you have multiple versions of the same article, video, or page, Google splits your reputation between all of the pages. Your duplicate pages siphon off a large portion of your inbound links, and it takes longer for your article, video, or page to rank highly in Google. No matter how many links you get, some of them are going down the drain.

    How duplicate content cripples your best link-building efforts

    Consider another example. A "Doggy Care" website using a CMS creates two URLs for "dog bone": one under the category "food" and another under the category "treats." To a search engine, the result is once again duplicate content.

    That's only the half of it. What happens when customers really like the dog bone and want to tell others about it? They link to it on their website. However, because there are two different pages created by the CMS for the same dog bone, they might link to either one of them. A product that would have received 100 links only receives half that. The rest leak over to the duplicate page.

    When you're trying to rank highly in Google, you must avoid wasting your links and reputation on duplicate pages. If these duplicates make it into Google's index, they will almost certainly be filtered out of the rankings. 100% of your inbound links should go to the same page. In Google's eyes, that gives you 100% of the reputation.

    How duplicate content sends Google on a wild goose chase

    Another problem to consider can be even more tragic for your search rankings. What might happen if Google decides that your website is composed mostly of duplicate content? The short answer is that it will stop indexing your pages and move on to other websites. Here is what Matt Cutts, head of Google's Webspam team, has to say about duplicate content:

    "Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like it has less good content. So we might tend to not crawl quite as much from that site ... [T]he fact that you had duplicate content and we discarded those pages meant you missed an opportunity to have other pages with good, unique quality content show up in the index."

    There are a number of scenarios in which Google's robot will give up crawling your website, leaving vast numbers of pages completely out of the index and your site flagged as mostly spam. Here are some of the most common ones caused by your CMS:

    - Your CMS creates a calendar that generates a new page for a new month every time you click on the "next month" link. Because your website keeps generating a new link every time Googlebot follows the "next month" link, Googlebot keeps following this link as long as it can and eventually times out.
    - Your website features a guided navigation shopping cart with categories for different brands and types of products. Because the products and categories are linked to each other (often in very complex ways), Googlebot keeps following the links in circles until it times out.
    - Your website uses a session ID in its URLs to track users who have cookies disabled ("jsessionid" is a common example of an in-URL session ID that gets indexed as duplicate content). If these IDs are present in the path_info portion of your URL, they are particularly dangerous.

    This last one can be particularly nasty. When a search engine bot crawls the site, it acts like a user with browser cookies disabled. Each time Googlebot requests a page, it is given a new page with a new jsessionid. This quickly causes the bot to see millions of pages that are identical, differing only in the URL: an "infinite space" that Googlebot treats as duplicate content.
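To make the session ID problem concrete, here is a minimal Python sketch that shows several crawled URLs collapsing into one page once the `;jsessionid=...` path parameter is removed. The URLs, the helper name `strip_session_id`, and the regular expression are illustrative assumptions; a real site may encode its session IDs differently.

```python
import re

# Matches a ";jsessionid=..." path parameter up to the next "?", "#", or "/".
_JSESSIONID = re.compile(r";jsessionid=[^?#/]*", re.IGNORECASE)


def strip_session_id(url):
    """Return the URL with any in-path jsessionid parameter removed."""
    return _JSESSIONID.sub("", url)


# Hypothetical URLs a crawler without cookies might be handed for the same page.
crawled = [
    "http://www.example.com/shop/item.jsp;jsessionid=A1B2C3?id=7",
    "http://www.example.com/shop/item.jsp;jsessionid=D4E5F6?id=7",
    "http://www.example.com/shop/item.jsp?id=7",
]

unique_pages = {strip_session_id(url) for url in crawled}
print(f"{len(crawled)} crawled URLs, {len(unique_pages)} real page(s)")
```

Every request with a fresh session ID adds another entry to the first list, while the second count stays at one.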

    Once Googlebot understands that it is going in circles (or down an endless drain like the calendar example), it concludes that your site is composed mostly of duplicate content and stops crawling your website. This is a very bad thing, and it can cause large portions of your site to go unnoticed. You can make vast improvements in your search engine rankings by tackling just this problem alone.

    Now that you understand how duplicate content can harm your search engine rankings, we want to show you what you can do to stop your CMS from creating so much of it. You'll be happy to know that all of these problems can be solved, and you can use automated tools to help you handle most of them.

    Source: Google's SEO Report Card, Google Webmaster Central

    How To Put An End To Duplicate Content So You Can Reclaim Your Ranking

    As you have already seen, duplicate content problems happen all on their own. If you don't do something to address them before they affect your ranking, your competitors will gain the edge. There are solutions to duplicate content problems, and we'll take a look at how to solve the most dangerous ones.

    Is Your Content Accessible From Multiple Subdomains?

    As we discussed earlier, when your website is accessible from multiple subdomains (for example, both example.com and www.example.com), Google treats the content on one of the subdomains as a duplicate. It can also happen when your CMS uses multiple URLs to point to the same content. If Google follows the links http://www.dogtoys.com/chewybone.php and http://www.dogtoys.com/bones/chewybone.php to the same page, Google will index a duplicate page for one of the URLs.

    But the fix is relatively easy. You just need to tell Google which subdomain contains the original source material. There are two ways to do this:

    1. Implement 301 redirects to send people to the right subdomain with the original content.
    2. Or use Google's Webmaster Tools to choose which domain contains the original content. This process is sometimes called canonicalization.
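The first option can be sketched as a small decision function. This is a hedged illustration rather than a drop-in server configuration: it assumes the "www" host is the one you want to keep, that every other host serves the same site, and the names `CANONICAL_HOST` and `redirect_for` are hypothetical. In practice the same rule usually lives in the web server or CMS configuration.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the "www" version is the preferred one


def redirect_for(url):
    """Return (301, target_url) when the URL is on a non-canonical host, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() == CANONICAL_HOST:
        return None  # already canonical: serve the page normally (HTTP 200)
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(redirect_for("http://example.com/chewybone.php?color=red"))
print(redirect_for("http://www.example.com/chewybone.php"))
```

Keeping the path and query string intact in the redirect target means every old link still lands on the matching page of the preferred host.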

    When this page is selected in the search engine results page, users are automatically directed to the canonical URL.

    Once you have indicated to Google where it can find the original content, it will no longer index your subdomain. Because your primary (canonical) page will be the only page that can get links and reputation, it will start performing much better in the search engine rankings. Congratulations: you've just fixed one of the leaks in your faucet.

    Are some pages near duplicates of others? What to do when your product descriptions only differ by a few words.

    This problem usually affects online retailers who sell many different versions of the same product. Perhaps you sell a golden chocolate basket, a silver chocolate basket, and a bronze chocolate basket. If the only difference between one product description and the next is the color or the image, you need to indicate this to Google so that it does not conclude that you have duplicate content.

    You can do this by using the rel="canonical" link tag on the pages with the near-identical content. Make sure you place this tag somewhere in the <head> section of these near-duplicate pages, just as you would with meta tags. Here's an example.
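The tag is a single link element that names the preferred URL. As a minimal sketch, the Python helper below builds one; the helper name `canonical_link_tag` and the chocolate-basket URL are hypothetical stand-ins, not the address used in the original paper.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the <link> element that belongs in the <head> of each near-duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical preferred URL shared by the golden, silver, and bronze basket pages.
print(canonical_link_tag("http://www.example.com/chocolate-basket"))
# prints: <link rel="canonical" href="http://www.example.com/chocolate-basket">
```

Each color variant would carry the same tag, so all three pages point at one address.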

    Whenever you use this tag, you are telling Google that the current page is either a duplicate or a near-duplicate, and the original page can be found at the address you have specified.

    Do your URLs contain extra parameters for tracking and sorting? They might accidentally convince Google that you have a duplicate content problem.

    Some shopping carts add parameters to your URLs for the purposes of sorting, dividing products into pages by category, and tracking users. Google's search engine robot unwittingly follows all of these URLs, and it keeps finding more duplicate content. If you don't tell Google which parameters to ignore, Googlebot will keep spinning its proverbial wheels. Here's what you can do: use Google Webmaster Tools to define which parameters Google should ignore when crawling your website.
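One practical starting point is to work out which of your URLs collapse to the same page once the sorting and tracking parameters are removed. The following is a minimal Python sketch of that check. The parameter names in `IGNORED_PARAMS` are hypothetical examples; the right list depends entirely on which parameters your own shopping cart adds without changing the page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change sorting or tracking but not the content.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def normalize(url):
    """Drop ignorable query parameters and order the rest so equivalent URLs match."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "http://www.example.com/shoes?sort=price&color=black",
    "http://www.example.com/shoes?color=black&utm_source=newsletter",
    "http://www.example.com/shoes?color=black",
]
print({normalize(url) for url in variants})
```

Any parameter whose removal never changes the normalized set's content is a candidate for the "ignore" list.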

    How to stop Google from going on a wild goose chase.

    Sometimes Google finds large sections of your website that contain links to pages with no original content. This is called the "infinite space" problem because Googlebot gets stuck in these sections, continually crawling the same series of dynamically generated pages or URLs with session IDs and tracking parameters, over and over again. As we discussed, often the culprit is the jsessionid parameter. Thankfully, there is a way to stop it.

    Google knows about the infinite space problem, and will tell you if your website has this issue when you log in to Google Webmaster Tools. Specifically, it will list which links lead to an infinite space, and offer a few tips to patch things up.

    Once you've found the links that lead to an infinite space, do one of the following:

    - Set the rel attribute in the suspicious link to "nofollow". When you do this, your new link should look like the following:

        <a href="http://www.calendar.com/nextmonth.php" rel="nofollow">next month</a>

    - Block the infinite space URLs in your robots.txt file.
    - Make it impossible for search engines to extract these URLs. You can do this by hiding them within JavaScript.
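For the robots.txt option, it is worth checking that a rule really blocks the infinite space before relying on it. The sketch below uses Python's standard `urllib.robotparser` module to test a rule set offline. The two-line robots.txt and the `events.php` page are hypothetical; the rule reuses the calendar URL from the example above. Note that this standard-library parser only does simple path-prefix matching, so it is a rough check rather than an exact model of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the endlessly repeating calendar link.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nextmonth.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The infinite-space URL should be blocked; ordinary pages should stay crawlable.
calendar_allowed = parser.can_fetch("Googlebot", "http://www.calendar.com/nextmonth.php")
events_allowed = parser.can_fetch("Googlebot", "http://www.calendar.com/events.php")

print("next month link crawlable:", calendar_allowed)
print("events page crawlable:", events_allowed)
```

Running the same check against a list of your most important product URLs is a quick way to confirm that a new rule has not blocked pages you want indexed.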

    Now that you have the tools to clean up duplicate content, in the next section we'll consider a few important cases where duplicate content is not only acceptable, but necessary.

    Google Webmaster Tools allows users to define what parameters Google should ignore when crawling a website.

    When Duplicate Content Is Not Really Duplicate At All

    Sometimes you end up with exact duplicate pages for legitimate reasons. This is no crime, of course, but it does require you to let Google know so that your site may be indexed appropriately by the search engine robot. It also prevents your website from being flagged as mostly duplicate content. Here's the fix:

    Use a 301 redirect if you have duplicate pages that just can't be avoided.

    Using a 301 redirect not only sends your users to the canonical page, it also tells Google that the page is an exact or near duplicate. Google continues to crawl your site because you are no longer using up its bandwidth unnecessarily.

    There are also two minor cases worth understanding where duplicate content can actually help your rankings. Keep in mind, these are very specific and do not apply to every website.

    You don't have to worry about localized content on international domains.

    What happens when you host the same content on different regional servers and international domains? For example, suppose you copy the same content on http://www.example.com to your local servers at http://www.example.fr. Will the content make it into the search engine results page abroad, or will it also be deemed duplicate content?

    In this case, there is nothing to worry about. When the same content is hosted on different international domains, search engines like Google do not consider it duplicate content.

    That said, the issues concerning subdomains that we discussed previously also apply to your international websites. That means you will have to go through the time-consuming task of canonicalizing your URLs so that they all point to the same international pages, just like you did on your home website domain.

    Sometimes you don't need to consolidate your duplicate pages. Here's how to know when...

    As you've learned, in most cases it is beneficial for a single page to garner the highest possible rank. After all, if this page features one of your bestselling products, you are practically guaranteed more sales. But there is one case when using canonical tags and giving all of your reputation to a single page isn't the best idea.

    When your visitors really care about your product's attributes (e.g. the product's color), it might be smart to separate your pages. Let's return to the example about shoes. If your online store offers the same shoe in multiple colors, and you have found that customers are specifically searching for products in the color turquoise, you might benefit from treating each color of the product as a separate page.

    Both your "shoes" and your "turquoise" pages will get traffic from color-based searches. Your competitors are probably doing the same thing.

    Whenever you separate your pages, you need to make them stand on their own. Your product page for the turquoise shoes must be distinct enough from the page for the black shoes to pass Google's duplicate content filter. Otherwise, Google will not rank the page at all. It is not enough to swap out a few words and reorganize paragraphs to create a new description. Google is too smart for that. You'll need to rewrite each new product description from scratch.

    Once again, it bears repeating that this is an exceptional case. You must really understand your customers, and more importantly, pay attention to their search behavior.

    If your customers are not typically searching for different variations of the same product, it is safe to use canonical tags and consolidate duplicate content. But if they usually search for items by their color, size, weight, etc., you should keep the pages separate and write new descriptions to individualize the content.

    Sound Like Too Much Manual Labor? There's Good News. Most Of It Can Be Automated.

    It doesn't matter if you own one website or many websites on several international domains. By now, you have the knowledge to understand and tackle the problem of duplicate content. However, you have probably realized just how time-consuming the process of consolidating your content can be. Do you really want to go through every duplicate or near-duplicate page, every subdomain, and every extra parameter in your URLs?

    You are a businessperson, so like us your answer will be an emphatic NO! You have better ways of spending your valuable time. Luckily for you, we developed our Lighthouse software originally to solve our own duplicate content problems. We were slaving away, consolidating content for one of our clients, and we simply grew tired of the whole process. You can manually implement only so many 301 redirects before you start thinking, "There has to be a better way."

    Lighthouse does everything we've discussed so far, and it does a few more things beyond the scope of this paper. Here is a quick rundown:

    - Automated 301 redirects and rel="canonical" tags. Lighthouse spots your duplicate pages and automatically implements 301 redirects and rel="canonical" tags.
    - Automated robots.txt analysis. Lighthouse finds and corrects problems with sitemap accessibility, infinite spaces, and crawl delays.

    We understand how all of this can seem like a huge project at first. That's why we'd like to show you a way to measure the direct business benefit you'll get from tackling each of these issues head on. In the next section, you'll learn what you need to know before you decide to launch an all-out assault on your website's duplicate content.

    Will You Profit From Addressing Duplicate Content Issues? Here's a Surefire Way to Know.

    It's one thing to suspect you have a problem. It's quite another to know the severity of the problem and identify where it is located. You wouldn't fix a faucet that isn't leaking, so why would you tackle a duplicate content problem that is practically nonexistent? We want to show you how to measure the direct business benefit you'll get from patching up the leaks your CMS leaves behind. It works wonders for us, and we are sure it will for you too.

    Step one: establish a baseline for measurement. First, determine how many pages your site has. Add up your product pages, category pages, and ancillary pages. The total number is your real number of site pages. Consider two key monthly metrics: 1) Revenue per page (total site revenue ÷ pages indexed in Google) and 2) Searches per page (total search clicks to your site ÷ pages indexed in Google).

    Step two: implement the change and wait. Fix your duplicate content issues, or hire a professional to do it for you. Then sit back and wait at least one month before you make another measurement. Sometimes it takes a while before Google returns to crawl the extra pages on your site.

    Step three: look for an increase in active pages and traffic. What you measure next depends on your goals. If you are looking primarily for increased revenue, as we all are, you want to see an increase in the number of pages indexed. Compare your revenue per page before and after the duplicate content fix. The second metric to consider is search clicks per page. You'll notice an increase here if your site suffered from duplicate pages that divided your audience and your links, reducing your primary page reputation in Google.

    If all went well, your canonical pages will rank higher, and as they perform better you'll also increase your revenue. You should see an increase in the number of unique pages receiving regular search traffic as well as an overall increase in traffic to your website. This increase usually happens because Google crawled more of your website and more of your pages made it onto the search engine results page.
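The before-and-after comparison from the three steps can be sketched as a short Python calculation. All of the figures below are made up purely for illustration, and the function name `per_page_metrics` is hypothetical; substitute your own monthly revenue, search clicks, and indexed-page counts.

```python
def per_page_metrics(revenue, search_clicks, pages_indexed):
    """Compute the two monthly metrics: revenue per page and searches per page."""
    if pages_indexed <= 0:
        raise ValueError("pages_indexed must be positive")
    return {
        "revenue_per_page": revenue / pages_indexed,
        "searches_per_page": search_clicks / pages_indexed,
    }


# Illustrative numbers only: one month before the fix, one month (or more) after.
before = per_page_metrics(revenue=50_000, search_clicks=20_000, pages_indexed=1_000)
after = per_page_metrics(revenue=66_000, search_clicks=30_000, pages_indexed=1_200)

for metric in before:
    change = after[metric] - before[metric]
    print(f"{metric}: {before[metric]:.2f} -> {after[metric]:.2f} ({change:+.2f})")
```

Tracking the indexed-page count alongside both ratios matters, because a rise in indexed pages with flat revenue would lower revenue per page even though the fix worked.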

    Here's What You Should Do Next

    By now you realize that duplicate content problems happen all on their own, and it is up to you to stop them before you lose your rankings to the competition. Even if you take care of every piece of duplicate content today, you still will have to deal with it periodically in the future. The more content you add to your website, the more likely duplicate pages will pop up. It's nice to have a way to constantly keep it in check.

    Most business owners wait until their next website redesign to start tackling their duplicate content problems, but this approach comes at a huge cost. Each low-ranking page amounts to customers who never made it to your store. Can you really afford to lose a single sale between now and your next redesign?

    Automatic duplicate content management is the only solution that makes sense. When you allow our Lighthouse software to consolidate your content as you create it, your pages start to rank better right out of the gate. You don't have to stop what you are doing to handle a situation that can easily get out of control. It's something we like to call peace of mind.

    If you are interested in ridding your site of duplicate content problems for good, we encourage you to give us a call. We'll tell you more about Lighthouse and how you can use it to take care of your duplicate content automatically. Why go through page after page when software can do all the dirty work? We created Lighthouse because you've got better things to do.