Detecting possible issues with serialized data after Magento 2 upgrade

The new Magento 2.2 release replaces the usage of default PHP serialized format to JSON format. The release also upgrades scripts that convert Magento data that is stored in serialized format e.g. Magento\Sales\Setup\SerializedDataConverter, Magento\Sales\Setup\SalesOrderPaymentDataConverter, Magento\Framework\DB\DataConverter\SerializedToJson classes. These major changes may affect the correct functionality of already existing custom modules and related data that is stored in the database.

If your external module stores its own serialized data, you would need the get more information from Magento offical tutorial on how to upgrade PHP serialized data to JSON format.

The converters from the latest release may cause some issues during the database upgrade in case of invalid data, which is processed by the converters.

For example, the data might be broken in the following cases:

  • Some values have been incorrectly migrated from another eCommerce platform;
  • Empty values could have been saved during an import when data was missing;
  • Data or its format has been changed by an existing third-party solution;
  • Some values could be manually removed or adjusted;
  • And so on…

Let’s imagine we have few invalid serialized records in product_options field of the sales_order_item table. In this case, during the database upgrade processing you will receive an exception similar to the following:

Error converting field product_options in table sales_order_item where item_id= 1, 2, 6, 8, 9, 11
using Magento\Sales\Setup\SerializedDataConverter. Fix data or replace with a valid value.

First of all, we need to fix the existing invalid records in order to make it possible to convert them.

From the error message it’s clear, which field exactly contains invalid data (product_options), in what table (sales_order_item), and how many and which records exactly could not be converted (item_id in 1, 2, 6, 8, 9, 11).

Unfortunately, there is no common solution to fix invalid data. Moreover, most of the issues should be fixed manually. However, we’ve implemented a small extension intended to simplify the investigation flow and to fix issues. The module provides two Magento command line interface commands:

atwix:serialized-to-json:empty-values-fix [table_name] [identifier_field_name] [field_name]
atwix:serialized-to-json:validate [table_name] [identifier_field_name] [field_name]

Here is a quick guide on how to create your own Magento CLI command.

The process involves two stages – getting the list of invalid serialized data, and converting empty string values.

Get list of invalid serialized data

The field value should be unserialized before converting it to JSON format. We need to check whether the value is not empty and whether it can be unserialized with unserialize() function. This will ensure that the field contains a valid record. Also, if the field value is already in JSON format, it will be skipped. This way we can detect invalid records and make a list with them.

The source code of this CLI command is the following:

<?php
/**
 * @author Atwix Team
 * @copyright Copyright (c) 2017 Atwix (https://www.atwix.com/)
 * @package Atwix_SerializedToJson
 */

namespace Atwix\SerializedToJson\Console\Command;

use Exception;
use Magento\Framework\App\Filesystem\DirectoryList;
use Magento\Framework\App\ResourceConnection;
use Magento\Framework\Db\Select;
use Magento\Framework\Filesystem\Io\File as FileIO;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Magento\Framework\Serialize\Serializer\Json as JsonSerializer;

/**
 * Class ValidateFieldCommand
 */
class ValidateFieldCommand extends Command
{
    /**
     * Argument key for table name
     */
    const ARGUMENT_TABLE = 'table';

    /**
     * Argument key for identifier field name
     */
    const ARGUMENT_ID_FIELD = 'id-field';

    /**
     * Argument key for field name to be processed
     */
    const ARGUMENT_FIELD = 'field';

    /**
     * Page size
     */
    const ROWS_PER_PAGE = 1000;

    /**
     * Output file name
     */
    const OUTPUT_FILE_NAME = 'serialized_to_json_validation.log';

    /**
     * Error message pattern
     */
    const ERROR_MESSAGE_PATTERN = '%s: %s - %s';

    /**
     * Resource Connection
     *
     * @var ResourceConnection
     */
    protected $resource;

    /**
     * Filesystem IO
     *
     * @var FileIO
     */
    protected $ioFile;

    /**
     * Directory list
     *
     * @var DirectoryList
     */
    protected $directoryList;

    /**
     * Json Serializer
     *
     * @var JsonSerializer
     */
    protected $jsonSerializer;

    /**
     * ValidateFieldCommand constructor
     *
     * @param ResourceConnection $resource
     * @param FileIO $ioFile
     * @param DirectoryList $directoryList
     * @param JsonSerializer $jsonSerializer
     * @param null $name
     */
    public function __construct(
        ResourceConnection $resource,
        FileIO $ioFile,
        DirectoryList $directoryList,
        JsonSerializer $jsonSerializer,
        $name = null
    ) {
        parent::__construct($name);
        $this->resource = $resource;
        $this->ioFile = $ioFile;
        $this->directoryList = $directoryList;
        $this->jsonSerializer = $jsonSerializer;
    }

    /**
     * {@inheritdoc}
     */
    protected function configure()
    {
        $this->setName('atwix:serialized-to-json:validate');
        $this->setDescription('Get list of invalid serialized data');
        $this->addArgument(self::ARGUMENT_TABLE, InputArgument::REQUIRED, 'Table name');
        $this->addArgument(self::ARGUMENT_ID_FIELD, InputArgument::REQUIRED, 'Identifier field name');
        $this->addArgument(self::ARGUMENT_FIELD, InputArgument::REQUIRED, 'Field to be validated');
    }

    /**
     * {@inheritdoc}
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $tableName = $input->getArgument(self::ARGUMENT_TABLE);
        $idFieldName = $input->getArgument(self::ARGUMENT_ID_FIELD);
        $fieldName = $input->getArgument(self::ARGUMENT_FIELD);
        $invalidRecordsCount = 0;
        $resultOutput = [];
        $resultOutput[] = sprintf('Validation result for field "%s" in table "%s":', $fieldName, $tableName);

        $connection = $this->resource->getConnection();

        /** @var Select $countQuery */
        $countQuery = $connection->select();
        $countQuery->from($tableName, ['rows_count' => 'COUNT(*)']);
        $count = (int)$connection->fetchOne($countQuery);
        $pagesCount = ceil($count / self::ROWS_PER_PAGE);

        /** @var Select $select */
        $select = $connection->select();
        $select->from($tableName, [$idFieldName, $fieldName]);

        for ($currPage = 1; $currPage <= $pagesCount; $currPage++) {
            $select->limitPage($currPage, self::ROWS_PER_PAGE);
            $rows = $connection->fetchAll($select);

            foreach ($rows as $row) {
                $rowId = $row[$idFieldName];
                $value = $row[$fieldName];

                if ($value === false || $value === null || $value === '') {
                    // The field should not contain the empty values
                    $row[$fieldName];
                    $invalidRecordsCount++;
                    $resultOutput[] = sprintf(self::ERROR_MESSAGE_PATTERN, $idFieldName, $rowId, 'contains empty value');

                    continue;
                }

                if ($this->isValidJson($value)) {
                    // If the field contains valid JSON, it should not be considered as an invalid record
                    continue;
                }

                try {
                    // Try to unserialize field value
                    unserialize($value);
                } catch (Exception $e) {
                    $invalidRecordsCount++;
                    $resultOutput[] = sprintf(self::ERROR_MESSAGE_PATTERN, $idFieldName, $rowId, $e->getMessage());
                }
            }
        }

        $summaryMessage = sprintf('Invalid records count: %s', $invalidRecordsCount);
        $resultOutput[] = $summaryMessage;

        $filePath = $this->directoryList->getPath(DirectoryList::LOG) . DIRECTORY_SEPARATOR . self::OUTPUT_FILE_NAME;
        $this->ioFile->checkAndCreateFolder(dirname($filePath));
        $this->ioFile->open();
        $this->ioFile->write($filePath, implode($resultOutput, PHP_EOL));
        $this->ioFile->close();

        $output->writeln($summaryMessage);
        $output->writeln(sprintf('Result output: %s', $filePath));

        return 0;
    }

    /**
     * Check whether the value has been already converted to json
     *
     * @param string $value
     *
     * @return bool
     */
    protected function isValidJson($value)
    {
        try {
            $this->jsonSerializer->unserialize($value);
        } catch (Exception $e) {
            return false;
        }

        return true;
    }
}

Here is an example of atwix:serialized-to-json:validate command’s call. Where the sales_order_item is a table name, the item_id is an identifier field used as a reference for records, and the product_options is the field with values to be validated. As a result we will get a log report file which contains a list of invalid records and related errors.

bin/magento atwix:serialized-to-json:validate sales_order_item item_id product_options
Invalid records count: 7
Result output: /home/dev/sites/bravo/2.2/var/log/serialized_to_json_validation.log

Convert empty string values

It’s simple to fix empty values if they exist. First of all, we are retrieving all the empty records and replacing them with a valid empty serialized value.

Keep in mind, that it allows us to fix the issue in order to get rid of the errors during conversion to JSON format. But in future, we might need to replace these values with the correct data, as some data might be required due to an existing logic. In this case, the additional investigation is required: why the values for some records have been empty, which object is proper, and what structure was expected?

<?php
/**
 * @author Atwix Team
 * @copyright Copyright (c) 2017 Atwix (https://www.atwix.com/)
 * @package Atwix_SerializedToJson
 */

namespace Atwix\SerializedToJson\Console\Command;

use Exception;
use Magento\Framework\App\ResourceConnection;
use Magento\Framework\Db\Select;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Magento\Framework\Console\Cli;

/**
 * Class ConvertEmptyValuesCommand
 */
class ConvertEmptyValuesCommand extends Command
{
    /**
     * Argument key for table name
     */
    const ARGUMENT_TABLE = 'table';

    /**
     * Argument key for identifier field name
     */
    const ARGUMENT_ID_FIELD = 'id-field';

    /**
     * Argument key for field name to be processed
     */
    const ARGUMENT_FIELD = 'field';

    /**
     * Resource Connection
     *
     * @var ResourceConnection
     */
    protected $resource;

    /**
     * ConvertEmptyValuesCommand constructor
     *
     * @param ResourceConnection $resource
     * @param null $name
     */
    public function __construct(ResourceConnection $resource, $name = null)
    {
        parent::__construct($name);
        $this->resource = $resource;
    }

    /**
     * {@inheritdoc}
     */
    protected function configure()
    {
        $this->setName('atwix:serialized-to-json:empty-values-fix');
        $this->setDescription('Convert empty string values to valid serialized object');
        $this->addArgument(self::ARGUMENT_TABLE, InputArgument::REQUIRED, 'Table name');
        $this->addArgument(self::ARGUMENT_ID_FIELD, InputArgument::REQUIRED, 'Identifier field name');
        $this->addArgument(self::ARGUMENT_FIELD, InputArgument::REQUIRED, 'Field to be validated');
    }

    /**
     * {@inheritdoc}
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $tableName = $input->getArgument(self::ARGUMENT_TABLE);
        $idFieldName = $input->getArgument(self::ARGUMENT_ID_FIELD);
        $fieldName = $input->getArgument(self::ARGUMENT_FIELD);
        // Generate a valid serialised empty object as string
        $emptySerializedValue = serialize('');
        $connection = $this->resource->getConnection();

        /** @var Select $select */
        $select = $connection->select();
        $select->from($tableName);
        $select->where(sprintf('%s = \'\'', $fieldName));
        $rows = $connection->fetchAll($select);
        $count = count($rows);

        if (!$count) {
            $output->writeln(
                sprintf('The "%s" field does not contain empty values in "%s" table', $fieldName, $tableName)
            );

            return 0;
        }

        $output->writeln(sprintf('Empty values count: %s', $count));
        $connection->beginTransaction();

        try {
            foreach ($rows as $row) {
                // Replace empty value
                $row[$fieldName] = $emptySerializedValue;
                $where = [$idFieldName . ' = ?' => $row[$idFieldName]];
                $connection->update($tableName, $row, $where);
            }
            $connection->commit();
        } catch (Exception $e) {
            $connection->rollBack();
            $output->writeln("<error>{$e->getMessage()}</error>");

            return Cli::RETURN_FAILURE;
        }

        $output->writeln(sprintf('Successfully replaced empty values'));

        return 0;
    }
}

Here is an example of atwix:serialized-to-json:empty-values-fix command’s call.

bin/magento atwix:serialized-to-json:empty-values-fix sales_order_item item_id product_options
Empty values count: 4
Successfully replaced empty values

If there are no empty values to be fixed for a specified field, we will receive a corresponding message.

bin/magento atwix:serialized-to-json:empty-values-fix sales_order_item item_id product_options
The "product_options" field does not contain empty values in "sales_order_item" table

The full module’s source code can be fond in Atwix_SerializedToJson GIT repository.

Thanks for reading!

Read more: