How to Noindex parts of a web page?

There are many ways to tell Googlebot not to index pages:

Using the Allow directive in robots.txt
User-Agent: gsa-crawler
Disallow: /folder1/
Allow: /folder1/myfile.html
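
To see why the Allow line matters, here is a minimal Python sketch of how such a rule group could be evaluated. It assumes the longest-match precedence described in RFC 9309 and in Google's robots.txt documentation (the most specific rule wins, and Allow wins a tie). The names is_allowed and RULES are my own illustration and not part of any crawler, and wildcards are not handled.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under one user-agent group.

    `rules` is a list of (directive, path_prefix) tuples. This is an
    illustrative sketch of longest-match precedence, not a full parser.
    """
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longest matching prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rule group from the robots.txt example above.
RULES = [
    ("Disallow", "/folder1/"),
    ("Allow", "/folder1/myfile.html"),
]

if __name__ == "__main__":
    for url_path in ("/folder1/myfile.html", "/folder1/other.html", "/index.html"):
        print(url_path, is_allowed(RULES, url_path))
```

Under these assumptions, everything in /folder1/ is blocked except myfile.html, which is the whole point of combining the two directives.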

Using robots meta tags to control access to a web page

Using .htaccess

Using a Disallow rule in robots.txt
User-agent: *
Disallow: /tralllala
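
For the robots meta tag option, a page opts out of indexing with a tag such as meta name="robots" content="noindex" in its head. The following is a small Python sketch, using only the standard library, that checks whether a page carries such a tag. The class and function names (RobotsMetaParser, is_noindexed) are hypothetical helpers for illustration, and the check only looks at meta tags, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            for part in content.split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def is_noindexed(html):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(is_noindexed(page))
```

Note that this works per page only: the tag switches indexing off for the whole document, not for a part of it.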

But how does it work if you only want to keep parts of your pages from being crawled and indexed?

Excluding unwanted text from the Google index is still not as easy as it should be.

Here is a video answer from Matt Cutts to this question:




Google still doesn't offer any solution that lets us exclude parts of a page from getting indexed, and using noindexed iframes is not a real solution either.

So how do you noindex parts of a web page?

It would be cool if there were a solution like the one Google offers for the Google Search Appliance, where you can exclude parts of a page from getting indexed.

The Google Search Appliance supports “googleon” and “googleoff” tags, special proprietary HTML tags that can be embedded in the HTML of crawled documents to prevent searching of text between these special tags.

The googleoff/googleon tags disable the indexing of a part of a web page. The result is that those pages do not appear in search results when users search for the tagged word or phrase. For example, some customers use googleoff/googleon tags to comment out a navigation bar in static HTML pages.

You can use googleon/off to tell the Google Search Appliance to ignore portions of a page. Insert <!--googleoff: index--> at the point where you want the Google Search Appliance to stop indexing, then insert <!--googleon: index--> where you want it to resume indexing the page.
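
To make the effect concrete, here is a rough Python sketch that mimics what the tags ask the appliance to do: it drops everything between a <!--googleoff: index--> comment and the next <!--googleon: index--> comment. This is only my own simulation for illustration, not how the Google Search Appliance is implemented; the function name strip_unindexed and the sample page are made up, and only the index variant of the tags is handled.

```python
import re

# Matches a googleoff comment, the text after it, and the next googleon comment.
GOOGLEOFF_BLOCK = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_unindexed(html):
    """Return the HTML with all googleoff/googleon index blocks removed."""
    return GOOGLEOFF_BLOCK.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>Product description</p>"
        "<!--googleoff: index--><ul><li>Home</li><li>Contact</li></ul>"
        "<!--googleon: index-->"
        "<p>More product text</p>"
    )
    # The navigation list is gone; the two paragraphs remain.
    print(strip_unindexed(page))
```

In this sketch the navigation bar never reaches the "index", while the product text on both sides of it does.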

The googleon and googleoff tags (which you can see live on adobe.com) are ignored by the regular Google spiders and by other search engines. They only make sense when used with the Google Search Appliance or possibly the Google Mini.

http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html

I think Google should offer googleon and googleoff tags not just for the Google Search Appliance. These tags should work for the normal Googlebot too. This would help get web pages indexed more accurately.

Website Architecture

Website architecture: SEO folks know that once a web portal grows to thousands, hundreds of thousands, or even millions of pages, it gets difficult to direct the bulk of the link juice to the most valuable pages of the project or to the most profitable product content pages.

Here is a talk by Dr. Pete with Rand Fishkin. My thanks go to SEOmoz for this useful information.

SEOmoz Whiteboard Friday – Architecture for Commerce with Dr. Pete from Scott Willoughby on Vimeo.

How can I make sure Google reaches my deeper pages?

Pai from Portugal asks Matt Cutts (Google): “How can I make sure that Google reaches and indexes pages that are on a lower (deeper) level of a website?” Here is Matt Cutts' answer:

SEO-Friendly Website Architecture

How do you develop an SEO-friendly website architecture?




My contact details: Ortwin Oberhauser . 6900 Bregenz . Austria. Mobile.: 0043 (0)664 7501 3618 . Email: ortwin@oberhauser.at