Duplicated product URL keys in Magento Community Edition

Recently, we have noticed an interesting issue in Magento Community Edition and decided to share our findings with our readers. There is no restriction on the product URL key duplicates in the Community Edition, so, multiple products can have the same URL key and the product URL path generation should handle it. This article is about how the URL path generation manages the duplicated URL keys, the issues that appear during that process and how to cope with all these problems.

It turned out that the duplicated URL keys can appear when there are products with the similar or the same names created. Magento creates the URL keys automatically, basing on the product name, and if this URL key was not manually set.

First of all, let’s create two products with the same URL key “test-duplicated-url-key”. In our sample data environment they get IDs 173 and 174. Let’s call the ID 173 the ORIGINAL and ID 174 the DUPLICATE further in this article. If we try to reach the ORIGINAL page, it will be found in www.magento-test-store.com/test-duplicated-url-key.html and the DUPLICATE will be located in www.magento-test-store.com/test-duplicated-url-key-174.html. The URL path for the DUPLICATE was generated by adding the product’s ID to the URL key, and it is a good solution. The thing is that the URL path generator has checked that there was another product with the “test-duplicated-url-key.html” path and solved the problem.

Now, let’s assume that we have made some changes in our www.magento-test-store.com and decided to make a reindex, including the URL rewrites reindex. After reindexing we will see that DUPLICATE’s URL path has been changed. However, these things can be noticed in a significant amount of time, after many reindexes and in case, we try to check the core_url_rewrite table for the ORIGINAL’s and the DUPLICATE’s (we have made 2 reindexes in this example):

2014-10-31_1504

So, the ORIGINAL currently has three URL rewrites (one per store) because it was not assigned to any category. But the DUPLICATE has already nine rewrites (one system rewrite per store and two custom rewrites per store as there were two reindexes). What happened? The URL path generator checked the DUPLICATE’s URL key and founded that the respective URL path is occupied, and even the “test-duplicated-url-key-174.html” is occupied too. So, it simply created a new one, and then created a custom redirect from the old URL to the NEW one. And as a result:

1) The DUPLICATE’s URL path is being changed on each reindex, which is not good for Magento SEO.

2) The core_url_rewrites table is increased for url_key_duplicate x stores number every reindex.

3) And the reindex process time is being increased in geometrical progression.

Even more problems will appear if URL keys ending with digits.

What is the solution? You may create an observer on the product save, which will add something unique to the product URL key if such URL key is already created in the database for the new products. But, what will you do if the problem is present on the website at the current moment, and the rewrites table already contains millions of records? That is the reason why we have developed a shell script, which will update the duplicated URL keys with their product’s SKUs and remove the unnecessary rewrites from the database. Here is the code of that script:

<?php

require_once 'abstract.php';

class Atwix_Shell_Rewrites_Doctor extends Mage_Shell_Abstract
{
    const PAGE_SIZE = 1000;
    const LOG_MESSAGE_ROWS = 100;

    public function run()
    {
        if($left = $this->getArg('remove_rewrites')){
            $this->clearExtraRewrites($left);
        } elseif($this->getArg('update_keys')) {
            $this->updateDuplicatedKeys();
        } elseif($this->getArg('remove_sku_spaces')) {
            $this->removeSkuSpaces();
        } else {
            echo $this->usageHelp();
        }
    }

    /**
     * Update duplicated url keys by adding product SKU to the duplicated key
     */
    public function updateDuplicatedKeys()
    {
        try {
            $counter = 0;
            $logMessage = '';
            $start = time();
            $storeId = Mage::app()->getStore()->getId() . PHP_EOL;

            //url key attriubte load for further use
            $entityType = Mage::getModel('eav/entity_type')->loadByCode('catalog_product');
            $attributes = $entityType->getAttributeCollection()
                ->addFieldToFilter('attribute_code', array('eq' => 'url_key'))
            ;
            $urlKeyAttribute = $attributes->getFirstItem();
            $urlKeyAttributeTable = $attributes->getTable($entityType->getEntityTable());

            //loading collection with number of duplicated url keys
            $duplicatesCollection = Mage::getModel('catalog/product')->getCollection();
            $duplicatesCollection->getSelect()
                ->joinLeft(
                    array('url_key' => $urlKeyAttributeTable . '_' . $urlKeyAttribute->getBackendType()),
                    'e.entity_id' . ' = url_key.entity_id AND url_key.attribute_id = ' . $urlKeyAttribute->getAttributeId() . ' AND url_key.store_id = ' . $storeId,
                    array($urlKeyAttribute->getAttributeCode() => 'url_key.value')
                )
                ->columns(array('duplicates_calculated' => new Zend_Db_Expr ('COUNT(`url_key`.`value`)')))
                ->group('url_key.value')
                ->order('duplicates_calculated DESC')
            ;

            foreach($duplicatesCollection as $item) {
                if($item->getData('duplicates_calculated') > 1) {
                    //loading product ids with duplicated url keys
                    $duplicatedUrlKey = $item->getData('url_key');
                    $productCollection = Mage::getModel('catalog/product')->getCollection()
                        ->addAttributeToSelect('url_key')
                        ->addAttributeToFilter('url_key', array('eq' => $duplicatedUrlKey))
                    ;
                    $ids = $productCollection->getAllIds();

                    foreach($ids as $id){
                        try {
                            //update product url key
                            $product = Mage::getModel('catalog/product')->load($id);
                            $sku = $product->getData('sku');
                            $urlKey = $product->getData('url_key');
                            $product->setData('url_key', $urlKey . '-' . strtolower(str_replace(' ', '-', $sku)));
                            $product->save();
                            $counter++;
                            $message = 'Product id# ' . $product->getId() . ' "' . $product->getName() . '" ' . ' url key was changed from "' . $urlKey . '" to "' . $product->getData('url_key') . '"' . PHP_EOL;
                            $logMessage .= $message;
                            //log will be update with the packs of messages
                            if($counter % self::LOG_MESSAGE_ROWS == 0) {
                                Mage::log($logMessage, null, 'atwix_rewrites_doctor.log', true);
                                $logMessage = '';
                            }
                            echo $message;
                        } catch (Exception $e) {
                            echo $e->getMessage() . PHP_EOL;
                            Mage::log($e->getMessage(), null, 'atwix_rewrites_doctor.log', true);
                        }
                    }
                } else {
                    //we will break the cycle after all duplicates in query were processed
                    break;
                }
            }

            if($counter % self::LOG_MESSAGE_ROWS != 0) {
                Mage::log($logMessage, null, 'atwix_rewrites_doctor.log', true);
            }

            $end = time();
            $message = $counter . ' products were updated, time spent: ' . $this->timeSpent($start, $end);
            Mage::log($message, null, 'atwix_rewrites_doctor.log', true);
            echo $message . PHP_EOL;

        } catch (Exception $e) {
            echo $e->getMessage() . PHP_EOL;
            Mage::log($e->getMessage(), null, 'atwix_rewrites_doctor.log', true);
        }
    }

    /**
     * Remove extra product url rewrites leaving $left of last
     *
     * @var $left
     */
    public function clearExtraRewrites($left)
    {
        try {
            $start = time();
            //Get product collection
            $productCollection = Mage::getModel('catalog/product')->getCollection();
            $productCollection->setPageSize(self::PAGE_SIZE);
            $pages = $productCollection->getLastPageNumber();
            $currentPage = 1;
            $counter = 0;

            while($currentPage <= $pages) {
                $productCollection->setCurPage($currentPage);
                $productCollection->load();

                $ids = $productCollection->getAllIds();
                foreach($ids as $id) {
                    //get rewrites collection for current product id
                    $urlRewritesCollection = Mage::getModel('core/url_rewrite')->getCollection()
                        ->addFieldToFilter('product_id', array('eq' => $id))
                        ->addFieldToFilter('is_system', array('eq' => '0'))
                        ->setOrder('url_rewrite_id', 'DESC')
                    ;
                    $urlRewritesCollection->getSelect()->limit(null, $left);

                    foreach($urlRewritesCollection as $urlRewrite) {
                        try {
                            $urlRewrite->delete();
                            $counter++;
                        } catch(Exception $e) {
                            echo "An error was occurred: " . $e->getMessage() . PHP_EOL;
                            Mage::log($e->getMessage(), null, 'atwix_rewrites_doctor.log', true);

                        }
                    }
                }

                echo $counter . " URL rewrites were deleted" . PHP_EOL;
                $currentPage++;
                $productCollection->clear();
            }

            $end = time();
            $message = 'Total URL rewrites deleted: ' . $counter . ', time spent: ' . $this->timeSpent($start, $end);
            Mage::log($message, null, 'atwix_rewrites_doctor.log', true);
            echo $message . PHP_EOL;

        } catch (Exception $e) {
            echo "An error was occurred: " . $e->getMessage() . PHP_EOL;
            Mage::log($e->getMessage(), null, 'atwix_rewrites_doctor.log', true);
        }
    }

    public function timeSpent($start, $end)
    {
        $seconds = $end - $start;
        $hours = floor($seconds / 3600);
        $mins = floor(($seconds - ($hours*3600)) / 60);
        $secs = floor($seconds % 60);

        return $hours . ' hours ' . $mins . ' minutes ' . $secs . ' seconds';
    }

    /**
     * Retrieve Usage Help Message
     *
     */
    public function usageHelp()
    {
        return "
\n
\n Usage:  php -f fix_attributes -- [options]
\n
\n    --remove_rewrites number    Remove old product rewrites, leaving the 'number' of last ones
\n    update_keys                 Update duplicated product keys
\n    remove_sku_spaces           Remove space from all product SKU's if they are present
\n
\n    help                        This help
\n
";
    }
}

$shell = new Atwix_Shell_Rewrites_Doctor();
$shell->run();

Just put the script with the code above to your_magento_root/shell/rewrites_doctor.php. Moreover, you can see the “usage help” if you attempt to run the script. First of all, you will need to update the duplicated URL keys. So, type this in your shell, if you are already in your Magento installation root folder:

php shell/rewrites_doctor.php update_keys

After the script finished its task, perform the reindex to create new URLs of the modified product URL keys:

php shell/indexer.php --reindex url_redirect

Furthermore, when it is done, you can remove the unnecessary URL rewrites leaving the specified number of the most recent ones with this shell command:

php shell/rewrites_doctor.php --remove_rewrites 4

If we were performing reindex weekly, then 4 last rewrites would spare us the URLs of last month, which should be enough for the crawlers.

Summing it up, both script procedures can be repeated if necessary and they can be run separately due to your requirements. These methods are quite time consuming, especially the URL keys updating since it is using the EAV structure. From the other side, all code is written in Magento way and should not use much server resources, so it can be run safely. All the changes are added to the log file of your_magento_root/var/log/atwix_rewrites_doctor.log.

We hope that this information and script will save your time in resolving the described problem. You are welcome to add your reviews in the comments below. Thanks!

You may also want to read: