Working with large Magento Collections

This time, topic of our article is “Collections”, a great key to data manipulation in Magento. All of the developers are able to use and operate this powerful instrument. Isn’t it cool to be able to grab all needed data from the database tables with no SQL? We can find many articles and tutorials in the WEB explaining collection features, but the main goal of our article is a discussion of the situations when the collections become a serious trouble for a developer.

You can face with most of the problems while dealing with the large collections. As you know, Magento uses EAV structure for entities, so, grabbing a product with all the attributes generates a lot of SQL. Imagine, that you are writing a custom indexer or some kind of the CSV export for a store with 200k products. Let’s suppose that we have created/updated the function of some indexer:

public function updateIndex()
{
    $productsCollection = Mage::getModel('catalog/product')
        ->getCollection()
        ->addAttributeToSelect(array('name', 'image', 'url_key', 'price', 'visibility'));
    foreach ($productsCollection as $_product) {
        $data = array(
            'entity_id' => $_product->getId(),
            'title' => $_product->getName(),
            'image' => $_product->getImage(),
            'url' => $_product->getUrlKey(),
            'price' => $_product->getFinalPrice(),
        );

        //inserting data to custom table
        $this->_getWriteAdapter()->insertOnDuplicate(
            $this->getTable('atwix_tutorial/sometable'),
            $data,
            array('title', 'image', 'url', 'price')
        );
    }
}

Everything is simple here: get collection, add attributes to collection, then, iterating through the collection items in the foreach loop. But, the problem is that running this code you will get the memory leak very quickly. Collection load() method is being called before looping, and attributes that have been specified in addAttributeToSelect are being loaded for every item in the collection. Of course, it is not a problem for the small collections, but such operation will eat all your memory in case you have a lot of items. Moreover, you will find the following message at the error log:

PHP Fatal error: Allowed memory size of xxxx bytes exhausted

For this purpose, we would like to describe at least two possibly solutions of this problem.

 

Using core/resource_iterator model.

 

In simple words, the core/resource_iterator model allows you to get the collection item data one by one executing corresponding SQL query. The Walk function of Mage_Core_Model_Resource_Iterator is used here:

public function walk($query, array $callbacks, array $args=array(), $adapter = null)

It takes the following parameters:

  • $query – is query string or select object to execute
  • $callbacks – callback function(s) to process an item data
  • $args – extra arguments that can be used in the callback function (optional)
  • $adapter – write/read the adapter (optional)

See the usage example below:

public function updateIndex()
{
    $productsCollection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect(array('name', 'image', 'url_key', 'price', 'visibility'), 'inner');

    Mage::getSingleton('core/resource_iterator')->walk(
        $productsCollection->getSelect(),
        array(array($this, 'productCallback')),
        array('store_id' => $storeId)
    );
}

public function productCallback($args)
{
    $_product = Mage::getModel('catalog/product');
    $_product->setData($args['row']);

        $insertData = array(
            'entity_id' => $_product->getId(),
            'title'     => $_product->getName(),
            'image'     => $_product->getImage(),
            'url'       => $_product->getUrlKey(),
            'price'     => $_product->getFinalPrice(),
            'store_id'  => $args['store_id']
        );

        $this->_getWriteAdapter()->insertOnDuplicate(
            $this->getTable('atwix_tutorial/sometable'),
            $insertData,
            array('title', 'image', 'url', 'price', 'visible')
        );
}

productCallback – is our callback function. All grabbed from the database data is available in $args[‘row’] here.

 

Using Collection Page Size.

 

Then, the goal of the second method is to divide the collection into pages and load data by page. setPageSize sets size of the collection and setCurPage sets iterator to the page you want to load.
Check the example below:

public function updateIndex()
{
    $productsCollection = Mage::getModel('catalog/product')->getCollection()
        ->addAttributeToSelect(array('name', 'image', 'url_key', 'price', 'visibility'));

    $productsCollection->setPageSize(100);

    $pages = $productsCollection->getLastPageNumber();
    $currentPage = 1;

    do {
        $productsCollection->setCurPage($currentPage);
        $productsCollection->load();

        foreach ($productsCollection as $_product) {

            $insertData = array(
                'entity_id' => $_product->getId(),
                'title' => $_product->getName(),
                'image' => $_product->getImage(),
                'url' => $_product->getUrlKey(),
                'price' => $_product->getFinalPrice(),
            );

            $this->_getWriteAdapter()->insertOnDuplicate(
                $this->getTable('atwix_sonar/suggestions'),
                $insertData,
                array('title', 'image', 'url', 'price')
            );
        }

        $currentPage++;
        //clear collection and free memory
        $productsCollection->clear();
    } while ($currentPage <= $pages);
}

We hope those techniques will help you to use Magento collection more effective. You are welcome to share your thoughts and methods in the comments.