Just spotted that a lot of pages on this site have ended up confined to Google's supplemental results, effectively dropping them from the main results.
This is an annoying quirk of Google in particular: it doesn't really know how to deal properly with sessions (a way of storing information, such as login status, while you are on a site). Basically, every time Googlebot visited, the session ID was added to the links, so it saw a different URL each time and assumed there were hundreds of near-identical pages on the site.
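To make the duplicate-URL problem concrete, here is a toy simulation. It is only a sketch, not a model of how Googlebot really crawls: it assumes the default session name PHPSESSID and that each cookieless visit gets a fresh session ID appended to its links, which is what URL-based sessions (session.use_trans_sid) do. The helper url_with_sid() is hypothetical.

```php
<?php
// Hypothetical helper: append a session ID to a link, the way URL-based
// sessions do for visitors that don't send a session cookie.
function url_with_sid($path, $sid)
{
    return $path . '?' . http_build_query(array('PHPSESSID' => $sid));
}

// Simulate three crawler visits. No cookie is sent, so each visit starts
// a brand new session with a new random 32-character hex ID.
$urls = array();
for ($i = 0; $i < 3; $i++) {
    $urls[] = url_with_sid('/index.php', bin2hex(random_bytes(16)));
}

// One page, three different URLs.
foreach ($urls as $url) {
    echo $url, "\n";
}
```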
As a quick fix I have simply added the following to my .htaccess file:
php_value session.use_trans_sid 0
php_value session.use_only_cookies 1
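The php_value lines are only honoured when PHP runs as an Apache module (mod_php); under CGI or FastCGI they typically cause a 500 error instead. In that case, a sketch of the same two settings applied from PHP itself is below. It assumes the code runs on every page before session_start() is called, for example from a shared include file (an assumption about how the site is organised).

```php
<?php
// Same effect as the two .htaccess lines: never put the session ID in
// URLs, and only accept it from a cookie. This must run before
// session_start(), otherwise PHP refuses to change session settings.
ini_set('session.use_trans_sid', '0');
ini_set('session.use_only_cookies', '1');

// ...then call session_start() as usual.
```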
This forces PHP to keep the session ID in a cookie only, so sessions simply won't work for visitors who have cookies disabled. Not really the best solution, but one that will do for now at least. A better approach would be to do something like this on the pages instead:
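$spiders = array("Googlebot", "WebCrawler"); // add further crawler names here
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
$from_spider = FALSE;
foreach ($spiders as $spider)
{
    // Case-insensitive substring match (eregi() was removed in PHP 7)
    if (stripos($user_agent, $spider) !== FALSE)
    {
        $from_spider = TRUE;
        break;
    }
}
// Only start a session for visitors that are not recognised spiders
if (!$from_spider)
{
    session_start();
}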
This only starts a session if the user agent is not a recognised search engine spider. It is effectively cloaking (showing spiders something different from what visitors see, which is usually seriously frowned upon), but this is one of the few legitimate uses for it.
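If you want to check which user agents the detection catches, the same loop can be wrapped in a function. This is a sketch: is_spider() is a hypothetical name, the default list holds only the two crawlers named above, and the sample user agent strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: TRUE if the user agent contains any of the listed
// crawler names (case-insensitive substring match).
function is_spider($user_agent, array $spiders = array('Googlebot', 'WebCrawler'))
{
    foreach ($spiders as $spider) {
        if (stripos($user_agent, $spider) !== FALSE) {
            return TRUE;
        }
    }
    return FALSE;
}

// Example user agents (assumed for illustration).
$google  = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$browser = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36';

var_dump(is_spider($google));   // bool(true)
var_dump(is_spider($browser));  // bool(false)
```

A page would then call session_start() only when is_spider() returns FALSE for the current user agent.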
The second solution is what I would use on a commercial site, but search engine results are not as important for this one, so I have just gone for the quick fix.