Multipage sitemap.xml Magento module

Magento already has a built-in Google sitemap generator. However, it has one problem: it generates a single file that includes all the category, product and CMS page links. According to the sitemap protocol that Google follows, a single sitemap file must not contain more than 50,000 URLs. So, if your shop is large enough, the generated XML sitemap will exceed this limit and crawlers will report an error. That is why we decided to write an article describing an extension that generates a sitemap index file plus a number of sub-files for each page type (products, categories, CMS).

Let’s start. First of all, we need to inform Magento about our new extension (note that the location of this and every subsequent file is specified in a comment at the top of the file):

<?xml version="1.0"?>
<!--

Location:
magento_root/app/etc/modules/Atwix_Sitemap.xml

-->
<config>
    <modules>
        <Atwix_Sitemap>
            <active>true</active>
            <codePool>local</codePool>
        </Atwix_Sitemap>
    </modules>
</config>

Next, we need a config file. It is pretty simple for our needs, so let’s look at the full code first. Each section is explained later, together with the part of the extension it relates to:

<?xml version="1.0"?>
<!--

Location:
magento_root/app/code/local/Atwix/Sitemap/etc/config.xml

-->
<config>
    <modules>
        <Atwix_Sitemap>
            <version>1.0.0</version>
        </Atwix_Sitemap>
    </modules>
    <global>
        <models>
            <atwix_sitemap>
                <class>Atwix_Sitemap_Model</class>
            </atwix_sitemap>
            <sitemap>
                <rewrite>
                    <sitemap>Atwix_Sitemap_Model_Sitemap</sitemap>
                </rewrite>
            </sitemap>
        </models>
        <helpers>
            <atwix_sitemap>
                <class>Atwix_Sitemap_Helper</class>
            </atwix_sitemap>
        </helpers>
    </global>
    <adminhtml>
        <acl>
            <resources>
                <all>
                    <title>Allow Everything</title>
                </all>
                <admin>
                    <children>
                        <system>
                            <children>
                                <config>
                                    <children>
                                        <atwix_sitemap>
                                            <title>Atwix Sitemap</title>
                                        </atwix_sitemap>
                                    </children>
                                </config>
                            </children>
                        </system>
                    </children>
                </admin>
            </resources>
        </acl>
    </adminhtml>
</config>

To begin with, we need a way to enable and disable the extension and to set the limit of items per sub-file. Both are handled through the Magento admin panel settings: we simply add a section to the System Configuration. The corresponding code in config.xml declares the ACL resource for that section:

        <acl>
            <resources>
                <all>
                    <title>Allow Everything</title>
                </all>
                <admin>
                    <children>
                        <system>
                            <children>
                                <config>
                                    <children>
                                        <atwix_sitemap>
                                            <title>Atwix Sitemap</title>
                                        </atwix_sitemap>
                                    </children>
                                </config>
                            </children>
                        </system>
                    </children>
                </admin>
            </resources>
        </acl>

The system.xml file contains only two settings so far:

<?xml version="1.0"?>
<!--

Location:
magento_root/app/code/local/Atwix/Sitemap/etc/system.xml

-->
<config>
    <tabs>
        <atwix translate="label" module="atwix_sitemap">
            <label>Atwix Extensions</label>
            <sort_order>100</sort_order>
        </atwix>
    </tabs>
    <sections>
            <atwix_sitemap translate="label" module="atwix_sitemap">
                <class>separator-top</class>
                <label>Atwix Sitemap</label>
                <tab>atwix</tab>
                <sort_order>10</sort_order>
                <show_in_default>1</show_in_default>
                <show_in_website>1</show_in_website>
                <show_in_store>1</show_in_store>
                <groups>
                    <general translate="label">
                        <label>General Options</label>
                        <frontend_type>text</frontend_type>
                        <sort_order>10</sort_order>
                        <show_in_default>1</show_in_default>
                        <show_in_website>0</show_in_website>
                        <show_in_store>0</show_in_store>
                        <fields>
                            <enabled translate="label">
                                <label>Enable extension</label>
                                <frontend_type>select</frontend_type>
                                <source_model>adminhtml/system_config_source_yesno</source_model>
                                <sort_order>100</sort_order>
                                <show_in_default>1</show_in_default>
                                <show_in_website>1</show_in_website>
                                <show_in_store>1</show_in_store>
                            </enabled>
                            <limit translate="label">
                                <label>Item limit per page</label>
                                <frontend_type>text</frontend_type>
                                <sort_order>200</sort_order>
                                <show_in_default>1</show_in_default>
                                <show_in_website>1</show_in_website>
                                <show_in_store>1</show_in_store>
                                <comment>
                                    <![CDATA[<span class="notice">50000 if empty</span>]]>
                                </comment>
                            </limit>
                        </fields>
                    </general>
                </groups>
            </atwix_sitemap>
    </sections>
</config>
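
The "50000 if empty" comment on the limit field describes a fallback that the model applies when it reads the setting. Here is a minimal standalone sketch of that fallback in plain PHP. The function name resolveItemLimit() and the constant are hypothetical, and rejecting negative or oversized values is an extra assumption on top of what the extension itself does (it only replaces 0 with 50000):

```php
<?php
// Hypothetical helper, not part of the extension.
const DEFAULT_ITEM_LIMIT = 50000; // per-file URL cap from the sitemap protocol

function resolveItemLimit($rawValue)
{
    // An empty string and null both cast to 0.
    $limit = (int) $rawValue;

    // Assumption: negative and oversized values also fall back to the default.
    if ($limit <= 0 || $limit > DEFAULT_ITEM_LIMIT) {
        return DEFAULT_ITEM_LIMIT;
    }

    return $limit;
}

echo resolveItemLimit(''), "\n";     // empty field
echo resolveItemLimit('1000'), "\n"; // explicit limit
```

In the real model the raw value would come from Mage::getStoreConfig('atwix_sitemap/general/limit').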

The last thing we need to make the backend settings work is an empty helper. It is declared in config.xml like this:

        <helpers>
            <atwix_sitemap>
                <class>Atwix_Sitemap_Helper</class>
            </atwix_sitemap>
        </helpers>

So, here is our “dummy helper” code:

<?php
/**
 * Location:
 * magento_root/app/code/local/Atwix/Sitemap/Helper/Data.php
 */

class Atwix_Sitemap_Helper_Data extends Mage_Core_Helper_Data
{

}

Okay, now we are ready to rewrite the core model responsible for sitemap generation. The following config.xml code declares our extension’s model and rewrites Mage_Sitemap_Model_Sitemap:

        <models>
            <atwix_sitemap>
                <class>Atwix_Sitemap_Model</class>
            </atwix_sitemap>
            <sitemap>
                <rewrite>
                    <sitemap>Atwix_Sitemap_Model_Sitemap</sitemap>
                </rewrite>
            </sitemap>
        </models>
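
To get an intuition for what the rewrite node does, here is a toy lookup in plain PHP. It is only a simplified illustration and not Magento's actual code; the resolveModelClass() function and the array layout are made up for this example:

```php
<?php
// Toy illustration only: a simplified lookup, not Magento's real implementation.
function resolveModelClass(array $models, $alias)
{
    list($group, $name) = explode('/', $alias, 2);

    // A <rewrite> entry for this exact model wins over the default class.
    if (isset($models[$group]['rewrite'][$name])) {
        return $models[$group]['rewrite'][$name];
    }

    // Otherwise: class prefix + the name with every part capitalized.
    $suffix = str_replace(' ', '_', ucwords(str_replace('_', ' ', $name)));

    return $models[$group]['class'] . '_' . $suffix;
}

// Array version of the <models> configuration (core prefix plus our rewrite).
$models = array(
    'sitemap' => array(
        'class'   => 'Mage_Sitemap_Model',
        'rewrite' => array('sitemap' => 'Atwix_Sitemap_Model_Sitemap'),
    ),
);

echo resolveModelClass($models, 'sitemap/sitemap'), "\n";  // rewritten model
echo resolveModelClass($models, 'sitemap/observer'), "\n"; // untouched model
```

The point is that only the sitemap/sitemap alias is redirected to our class, while every other model of the core module keeps its original class.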

The model code is as follows:

<?php
/**
 * Location:
 * magento_root/app/code/local/Atwix/Sitemap/Model/Sitemap.php
 */

class Atwix_Sitemap_Model_Sitemap extends Mage_Sitemap_Model_Sitemap
{
    const ITEM_LIMIT = 50000;
    protected $_io;
    protected $_subfiles = array();

    public function generateXml()
    {
        $enabled = (bool) Mage::getStoreConfig('atwix_sitemap/general/enabled');
        if(!$enabled) {
            return parent::generateXml();
        }
        $helper = Mage::helper('atwix_sitemap');
        
        $limit = (int) Mage::getStoreConfig('atwix_sitemap/general/limit');
        if ($limit == 0) {
            $limit = self::ITEM_LIMIT;
        }
        $this->fileCreate();

        $storeId = $this->getStoreId();
        $date = Mage::getSingleton('core/date')->gmtDate('Y-m-d');
        $baseUrl = Mage::app()->getStore($storeId)->getBaseUrl(Mage_Core_Model_Store::URL_TYPE_LINK);

        /**
         * Generate categories sitemap
         */
        $changefreq = (string) Mage::getStoreConfig('sitemap/category/changefreq');
        $priority = (string) Mage::getStoreConfig('sitemap/category/priority');
        $collection = Mage::getResourceModel('sitemap/catalog_category')->getCollection($storeId);

        /**
         * Delete old category files
         */
        try {
            foreach(glob($this->getPath() . substr($this->getSitemapFilename(), 0, strpos($this->getSitemapFilename(), '.xml')) . '_cat_*.xml') as $f) {
                unlink($f);
            }
        } catch(Exception $e) {
            Mage::getSingleton('adminhtml/session')->addError(
                $helper->__('Unable to delete old categories sitemaps') . ': ' . $e->getMessage()
            );
        }

        /**
         * Break into pages
         */
        $pages = ceil(count($collection) / $limit);
        $i = 0;
        while ($i < $pages) {
            $name = '_cat_' . $i . '.xml';
            $this->subFileCreate($name);
            $subCollection = array_slice($collection, $i * $limit, $limit);
            foreach ($subCollection as $item) {
                $xml = sprintf(
                    '<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>',
                    htmlspecialchars($baseUrl . $item->getUrl()),
                    $date,
                    $changefreq,
                    $priority
                );
                $this->sitemapSubFileAddLine($xml, $name);
            }
            $this->subFileClose($name);
            /**
             * Add link of the subfile to the main file
             */
            $xml = sprintf('<sitemap><loc>%s</loc><lastmod>%s</lastmod></sitemap>', htmlspecialchars( $this->getSubFileUrl($name)), $date);
            $this->sitemapFileAddLine($xml);
            $i++;
        }

        unset($collection);

        /**
         * Generate products sitemap
         */
        $changefreq = (string) Mage::getStoreConfig('sitemap/product/changefreq');
        $priority = (string) Mage::getStoreConfig('sitemap/product/priority');
        $collection = Mage::getResourceModel('sitemap/catalog_product')->getCollection($storeId);

        /**
         * Delete old products files
         */
        try {
            foreach(glob($this->getPath() . substr($this->getSitemapFilename(), 0, strpos($this->getSitemapFilename(), '.xml')) . '_prod_*.xml') as $f) {
                unlink($f);
            }
        } catch(Exception $e) {
            Mage::getSingleton('adminhtml/session')->addError(
                $helper->__('Unable to delete old products sitemaps') . ': ' . $e->getMessage()
            );
        }

        /**
         * Break into pages
         */
        $pages = ceil(count($collection) / $limit);
        $i = 0;
        while ($i < $pages) {
            $name = '_prod_' . $i . '.xml';
            $this->subFileCreate($name);
            $subCollection = array_slice($collection, $i * $limit, $limit);
            foreach ($subCollection as $item) {
                $xml = sprintf(
                    '<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>',
                    htmlspecialchars($baseUrl . $item->getUrl()),
                    $date,
                    $changefreq,
                    $priority
                );
                $this->sitemapSubFileAddLine($xml, $name);
            }
            $this->subFileClose($name);
            /**
             * Add link of the subfile to the main file
             */
            $xml = sprintf('<sitemap><loc>%s</loc><lastmod>%s</lastmod></sitemap>', htmlspecialchars($this->getSubFileUrl($name)), $date);
            $this->sitemapFileAddLine($xml);
            $i++;
        }

        unset($collection);

        /**
         * Generate cms pages sitemap
         */
        $changefreq = (string) Mage::getStoreConfig('sitemap/page/changefreq');
        $priority = (string) Mage::getStoreConfig('sitemap/page/priority');
        $collection = Mage::getResourceModel('sitemap/cms_page')->getCollection($storeId);

        /**
         * Delete old cms pages files
         */
        try {
            foreach(glob($this->getPath() . substr($this->getSitemapFilename(), 0, strpos($this->getSitemapFilename(), '.xml')) . '_pages_*.xml') as $f) {
                unlink($f);
            }
        } catch(Exception $e) {
            Mage::getSingleton('adminhtml/session')->addError(
                $helper->__('Unable to delete old CMS pages sitemaps') . ': ' . $e->getMessage()
            );
        }

        /**
         * Break into pages
         */
        $pages = ceil(count($collection) / $limit);
        $i = 0;
        while ($i < $pages) {
            $name = '_pages_' . $i . '.xml';
            $this->subFileCreate($name);
            $subCollection = array_slice($collection, $i * $limit, $limit);
            foreach ($subCollection as $item) {
                $xml = sprintf(
                    '<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>',
                    htmlspecialchars($baseUrl . $item->getUrl()),
                    $date,
                    $item->getUrl() == 'home' ? 'always' : $changefreq,
                    $item->getUrl() == 'home' ? '1' : $priority
                );
                $this->sitemapSubFileAddLine($xml, $name);
            }
            $this->subFileClose($name);
            /**
             * Adding link of the subfile to the main file
             */
            $xml = sprintf('<sitemap><loc>%s</loc><lastmod>%s</lastmod></sitemap>', htmlspecialchars($this->getSubFileUrl($name)), $date);
            $this->sitemapFileAddLine($xml);
            $i++;
        }
        unset($collection);

        $this->fileClose();

        $this->setSitemapTime(Mage::getSingleton('core/date')->gmtDate('Y-m-d H:i:s'));
        $this->save();

        return $this;
    }

    /**
     * Create sitemap subfile by name in sitemap directory
     *
     * @param $name
     */
    protected function subFileCreate($name)
    {
        $io = new Varien_Io_File();
        $io->setAllowCreateFolders(true);
        $io->open(array('path' => $this->getPath()));
        $io->streamOpen( substr($this->getSitemapFilename(), 0, strpos($this->getSitemapFilename(), '.xml')) . $name);

        $io->streamWrite('<?xml version="1.0" encoding="UTF-8"?>' . "\n");
        $io->streamWrite('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">');
        $this->_subfiles[$name] = $io;
    }

    /**
     * Add line to sitemap subfile
     *
     * @param $xml
     * @param $name
     */
    public function sitemapSubFileAddLine($xml, $name)
    {
        $this->_subfiles[$name]->streamWrite($xml);
    }

    /**
     * Create main sitemap file
     */
    protected function fileCreate()
    {
        $io = new Varien_Io_File();
        $io->setAllowCreateFolders(true);
        $io->open(array('path' => $this->getPath()));
        $io->streamOpen($this->getSitemapFilename());

        $io->streamWrite('<?xml version="1.0" encoding="UTF-8"?>' . "\n");
        $io->streamWrite('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">');
        $this->_io = $io;
    }

    /**
     * Add closing tag and close sitemap file
     */
    protected function fileClose()
    {
        $this->_io->streamWrite('</sitemapindex>');
        $this->_io->streamClose();
    }

    /**
     * Add closing tag and close sitemap subfile by the name
     *
     * @param $name
     */
    protected function subFileClose($name)
    {
        $this->_subfiles[$name]->streamWrite('</urlset>');
        $this->_subfiles[$name]->streamClose();
    }

    /**
     * Get URL of sitemap subfile by the name
     *
     * @param $name
     * @return string
     */
    public function getSubFileUrl($name)
    {
        $fileName = substr($this->getSitemapFilename(), 0, strpos($this->getSitemapFilename(), '.xml')) . $name;
        $filePath = Mage::app()->getStore($this->getStoreId())->getBaseUrl(Mage_Core_Model_Store::URL_TYPE_LINK) . $this->getSitemapPath();
        $filePath = str_replace('//','/',$filePath);
        $filePath = str_replace(':/','://',$filePath);
        return $filePath . $fileName;
    }

    /**
     * Add line to the main file
     *
     * @param $xml
     */
    public function sitemapFileAddLine($xml)
    {
        $this->_io->streamWrite($xml);
    }
}
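
One detail of the cleanup loops in the model above is worth spelling out. In plain PHP, glob() can return false on error and unlink() reports a failure through its return value (plus a warning) rather than through an exception, so a try/catch around them may never fire. Below is a minimal standalone sketch of the same cleanup with explicit checks. The function name deleteStaleSubFiles() and its parameters are hypothetical and not part of the extension, and basename() is used as an assumed stand-in for the substr()/strpos() expression:

```php
<?php
// Hypothetical standalone helper; names are illustrative only.
function deleteStaleSubFiles($directory, $sitemapFilename, $typePrefix)
{
    // "sitemap.xml" -> "sitemap"
    $baseName = basename($sitemapFilename, '.xml');
    $pattern  = rtrim($directory, '/') . '/' . $baseName . $typePrefix . '*.xml';

    $failed = array();

    // glob() may return false on error, so fall back to an empty array.
    foreach (glob($pattern) ?: array() as $file) {
        // unlink() returns false on failure instead of throwing.
        if (!@unlink($file)) {
            $failed[] = $file;
        }
    }

    // The caller decides how to report the files that could not be removed.
    return $failed;
}
```

Calling deleteStaleSubFiles($path, 'sitemap.xml', '_cat_') would remove sitemap_cat_0.xml, sitemap_cat_1.xml and so on, but leave sitemap.xml and the sub-files of other types alone.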

The main method here is generateXml(). Basically, it creates the main file, generates the items of each type (categories, products and CMS pages) and closes the file (saving the model) at the end. This is true for both our method and the original one. The important difference is that we split each item collection into pages and write every page to a separate file, while the main file becomes a sitemap index that links to those files. Moreover, on every sitemap generation we delete the old sub-files of each type before writing the new ones.
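
The page-splitting step can be illustrated in isolation. The sketch below uses array_chunk() in place of the ceil()/array_slice() loop from the model; planSubFiles() is a hypothetical name, and the URLs are made-up sample data:

```php
<?php
// Hypothetical helper: plans the sub-files for one page type.
function planSubFiles(array $items, $limit, $typePrefix)
{
    $plan = array();

    // array_chunk() yields ceil(count($items) / $limit) slices,
    // numbered from 0, each holding at most $limit items.
    foreach (array_chunk($items, $limit) as $page => $chunk) {
        $plan[$typePrefix . $page . '.xml'] = $chunk;
    }

    return $plan;
}

$urls = array('a.html', 'b.html', 'c.html', 'd.html', 'e.html');

foreach (planSubFiles($urls, 2, '_prod_') as $name => $chunk) {
    echo $name, ': ', implode(', ', $chunk), "\n";
}
```

With a limit of 2, the five sample URLs end up in three sub-files: _prod_0.xml and _prod_1.xml with two URLs each and _prod_2.xml with the remaining one. Note that $limit must be at least 1 here, so the configured value should be validated first.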

If your store has additional components with their own pages, such as a blog, you should add the corresponding logic to the generateXml() method.
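
As a starting point for such an extra page type, the entry formatting can be pulled into a small function. This is only a sketch: buildUrlEntry() is a hypothetical helper that mirrors the sprintf() call used in generateXml(), and the blog paths, base URL and date are made-up values (a real store would load the paths from the blog module):

```php
<?php
// Hypothetical helper mirroring the sprintf() call used in generateXml().
function buildUrlEntry($baseUrl, $path, $date, $changefreq, $priority)
{
    return sprintf(
        '<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>',
        htmlspecialchars($baseUrl . $path), // escapes &, <, > and " in the URL
        $date,
        $changefreq,
        $priority
    );
}

// Made-up blog post paths used only for this example.
$blogPosts = array('blog/hello-world', 'blog/news?tag=a&page=2');

foreach ($blogPosts as $path) {
    echo buildUrlEntry('http://example.com/', $path, '2014-01-01', 'weekly', '0.5'), "\n";
}
```

Inside generateXml() the resulting string would be passed to sitemapSubFileAddLine() for a sub-file with its own suffix, in the same way as for categories, products and CMS pages.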

The extension repo is available through this link. You can check our Magento ERP extension guide too. Thanks for reading, and do not hesitate to leave your feedback about this post in the comments.