Robots.txt file

Submitted by Perce2 on Sat, 2018-03-31 10:16

Hi David,

I have a question that I can't seem to find an answer to; hopefully you can advise.
Your normal PT install includes a robots.txt file containing the following:

User-agent: *
Disallow: /categories.php
Disallow: /brands.php
Disallow: /reviews.php
Disallow: /category/
Disallow: /brand/
Disallow: /review/
Disallow: /admin/
Disallow: /search.php
Disallow: /jump.php

Assuming a normal, recommended install of PT on WP, with Price Tapestry installed at wp-root/pt, what would the revised robots.txt look like, and which unwanted files / folders would you remove from the pt folder?

Thanks.

Submitted by support on Tue, 2018-04-03 09:26

The equivalent rules (kept under the existing User-agent: * line) would be:

Disallow: /productcategory
Disallow: /brand
Disallow: /review
Disallow: /shopping?pto_q=
Disallow: /pt/jump.php

(If your site is review-focussed you might want to permit the /review path; the reason for the exclusion is to avoid duplicate content with the /product path.)
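
As a quick sanity check, the rules above can be tried against a few paths with Python's standard urllib.robotparser module. This is only a sketch: it assumes the rules sit under a "User-agent: *" line (carried over from the stock robots.txt), and the sample paths such as /brand/acme and /product/widget are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules from the reply above. The "User-agent: *" line is an assumption,
# carried over from the stock Price Tapestry robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /productcategory
Disallow: /brand
Disallow: /review
Disallow: /shopping?pto_q=
Disallow: /pt/jump.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, path, agent="*"):
    """Return True if the given agent may crawl the path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Hypothetical sample paths, for illustration only.
for path in [
    "/product/widget",
    "/brand/acme",
    "/review/widget",
    "/productcategory/shoes",
    "/shopping?pto_q=shoes",
    "/pt/jump.php?id=1",
]:
    print(path, "allowed" if is_allowed(parser, path) else "blocked")
```

Note that Disallow values are matched as prefixes, so /brand also covers /brands and anything else starting with those characters.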

Cheers,
David.
--
PriceTapestry.com

Submitted by Perce2 on Tue, 2018-04-03 10:07

Hi David,

thanks for your reply, although I am a little confused.

What about the files / folders in the pt directory ? (as asked)
As I understand it, crawlers only take notice of the robots.txt file that is included at root, not from a sub-directory / folder.

Originally, files / folders were excluded from being crawled by your included robots.txt file, but now that the root robots.txt does not fully cover the /pt directory, they will presumably create duplicate content between the two installs. Would it not be better to either remove some of the unwanted files / folders from /pt or use Disallow: /pt/ in the new WP robots.txt?

What is actually still needed in /pt for WP PTO to function correctly ?

Thanks.

Submitted by support on Tue, 2018-04-03 11:01

Hi,

You could do both:

Disallow: /pt/

(in which case there is no need for the Disallow: /pt/jump.php line)
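
To illustrate why the folder-level rule makes the separate jump.php line redundant, here is a minimal Python sketch using the standard urllib.robotparser module. The "User-agent: *" line and the helper name allowed() are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, path):
    """Check a path against rules placed under an assumed 'User-agent: *' line."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", *rules])
    return parser.can_fetch("*", path)


FOLDER_RULE = ["Disallow: /pt/"]

# Everything under /pt/ is matched by the one prefix rule, jump.php included.
for path in ["/pt/jump.php", "/pt/config.php", "/product/widget"]:
    print(path, "allowed" if allowed(FOLDER_RULE, path) else "blocked")
```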

And you can cut the top level files in /pt/ to just:

/pt/config.php
/pt/config.advanced.php
/pt/jump.php

In addition, I would upload a placeholder /pt/index.php:

<?php
?>

...and /pt/index.html:

<!-- -->
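
Before deleting anything, it may help to list which top-level files in /pt fall outside the keep list above. The following Python sketch only reports names and removes nothing; the keep list is taken from this post (the three files plus the two placeholders), sub-folders are deliberately ignored, and the function name is made up for the example.

```python
from pathlib import Path

# Top-level files to keep, per the list above, plus the two placeholders.
KEEP = {
    "config.php",
    "config.advanced.php",
    "jump.php",
    "index.php",
    "index.html",
}


def extra_top_level_files(pt_dir):
    """Return sorted names of top-level files in pt_dir that are not in KEEP.

    Sub-folders are skipped, and nothing is deleted.
    """
    return sorted(
        entry.name
        for entry in Path(pt_dir).iterdir()
        if entry.is_file() and entry.name not in KEEP
    )
```

For example, extra_top_level_files("/path/to/wordpress/pt") (a hypothetical path) returns the candidates for removal. Review the list by hand first, since it will also include any dotfiles such as .htaccess.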

Cheers,
David.
--
PriceTapestry.com

Submitted by Perce2 on Tue, 2018-04-03 11:30

Thanks David, much appreciated.

Submitted by Perce2 on Tue, 2018-04-03 12:07

Hi David, back again!

Now that I have the correct robots.txt in place, it has at last removed about 90% of the duplicate page titles I was seeing. However, I still seem to have quite a few left, all attributed to /merchant/.

Would you have any idea what could be causing them?
A couple of examples:

https://example.com/merchant/merchant-name
https://example.com/merchant/merchant-name/

Permalinks are set up with no trailing forward slash.

Submitted by support on Wed, 2018-04-04 12:06

Hi,

First, double check that there are no links being generated to the versions ending "/". Specifically, from the /shopping page go to Merchant A-Z and ensure that you have:

https://example.com/merchant

...and then select a merchant, which should then be:

https://example.com/merchant/merchant-name

If that all looks OK, try adding to your WordPress .htaccess the following rule to make sure that any request to the old "/" versions is 301 (Moved Permanently) redirected to the new version:

RewriteRule ^merchant/(.*)/$ merchant/$1 [L,R=301]
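
To preview which requests the rule would catch, the same pattern can be tried in Python. This is a rough sketch rather than a full mod_rewrite emulation: it assumes the rule lives in the root .htaccess (where Apache matches against the path without its leading slash) and that RewriteBase / is set, as in a stock WordPress .htaccess. The function name is made up for the example.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^merchant/(.*)/$")


def redirect_target(path):
    """Return the path the rule would redirect to, or None if it does not match.

    Assumes .htaccess context at the site root with RewriteBase /.
    """
    match = PATTERN.match(path.lstrip("/"))
    if match is None:
        return None
    return "/merchant/" + match.group(1)


for path in ["/merchant/merchant-name/", "/merchant/merchant-name", "/merchant/"]:
    print(path, "->", redirect_target(path))
```

Only the version ending in "/" is redirected; the slash-less URL and the bare /merchant/ listing are left alone.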

Hope this helps!

Cheers,
David.
--
PriceTapestry.com