Mysql4all Blog

Ivan Zoratti's blog on MySQL and around

Archive for the ‘Uncategorized’ Category

Yet Again On Subqueries

with 10 comments

…with a bit of strategic thinking

They come back, every now and then. Subqueries are far from being perfect at MySQL and they can give you some serious headaches.

Skilled MySQL developers know it better. They avoid subqueries as much as they can. It is not that subqueries do not work, it is just that the optimizer sometimes is, well, “not that optimised”.

So you may stay away from subqueries with some good SQL review. But what happens when the subquery is automatically generated by a script or a tool? If you can change the statement, I’m afraid you need to find some serious workarounds that vary case by case.

Here is an example that I found few weeks ago when I visited one of our customers.

Our customer used Magento for its site. Magento used a couple of queries that I will report here as sales and sales_items, although they are not their real name:

CREATE TABLE `sales` (
   `sales_id`     int(11) NOT NULL,
   `sales_entity` int(11) NOT NULL,
   `sales_status` int(11) NOT NULL,
   PRIMARY KEY (`sales_id`),
   KEY `ix2` (`sales_entity`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
CREATE TABLE `sales_items` (
   `sales_item_id` int(11) NOT NULL,
   `product_id`    int(11) NOT NULL,
   `quantity`      int(11) NOT NULL,
   PRIMARY KEY (`sales_item_id`),
   KEY `ix2` (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 

Both queries had tens of thousands of rows, but in our example I will use only few rows:

mysql> select * from sales;
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
|        1 |            1 |            1 |
|        2 |            1 |            2 |
|        3 |            2 |            1 |
|        4 |            2 |            2 |
|        5 |            3 |            1 |
|        6 |            3 |            2 |
|        7 |            4 |            1 |
|        8 |            4 |            2 |
|        9 |            5 |            1 |
|       10 |            5 |            2 |
+----------+--------------+--------------+
10 rows in set (0.00 sec)
 
mysql> select * from sales_items;
+---------------+------------+----------+
| sales_item_id | product_id | quantity |
+---------------+------------+----------+
|             1 |          1 |        1 |
|             2 |          1 |        1 |
|             3 |          2 |        1 |
|             4 |          2 |        1 |
|             5 |          3 |        1 |
|             6 |          3 |        1 |
+---------------+------------+----------+
6 rows in set (0.00 sec)
 

The application was running very slow. When we analysed the query log using SkySQL Enterprise Monitor, we found an UPDATE like this:

UPDATE sales SET sales_status = ?
 WHERE sales_entity IN (
       SELECT DISTINCT sales_item_id
         FROM sales_items
        WHERE product_id IN (?)
 )

The update was executed several times in a minute and it took several seconds to be executed. We already knew that the issue was in the subquery, but just to explain the situation to the customer we wanted to show an execution plan.

As you may know, you cannot use EXPLAIN with and UPDATE statement, therefore we use a SELECT that acts like the UPDATE. This SELECT did the trick:

SELECT * FROM sales
 WHERE sales_entity IN (
       SELECT DISTINCT sales_item_id
         FROM sales_items
        WHERE product_id IN (3)
);
 
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
|        9 |            5 |            1 |
|       10 |            5 |            2 |
+----------+--------------+--------------+
2 rows in set (0.00 sec)
 

The execution plan of the query is:

EXPLAIN SELECT * FROM sales
         WHERE sales_entity IN (
               SELECT DISTINCT sales_item_id
                 FROM sales_items
                WHERE product_id IN (3)
        );
 
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
| id | select_type        | table       | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | sales       | ALL             | NULL          | NULL    | NULL    | NULL |   10 | Using where |
|  2 | DEPENDENT SUBQUERY | sales_items | unique_subquery | PRIMARY,ix2   | PRIMARY | 4       | func |    1 | Using where |
+----+--------------------+-------------+-----------------+---------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)
 

As you can see, the query uses the index in the subquery to retrieve the rows from sales_items, but then it does not use the index on sales_entity in sales. The result is that the UPDATE takes several seconds to be executed, because mysqld performs a full scan of the sales table.

If you want to check the details of this issue, you can have a look at the infamous bug 18826 –  http://bugs.mysql.com/bug.php?id=18826

So what should you do to fix the problem? If you cannot change the statement, I am afraid there is not much you can do. We were luck and the customer could change the statement. By changing the select with a JOIN that replaces the subquery, something like:

SELECT sales.* FROM sales JOIN sales_items
    ON  (sales.sales_entity = sales_items.sales_item_id )
 WHERE sales_items.product_id in (3);
 
+----------+--------------+--------------+
| sales_id | sales_entity | sales_status |
+----------+--------------+--------------+
|        9 |            5 |            1 |
|       10 |            5 |            2 |
+----------+--------------+--------------+
2 rows in set (0.00 sec) 

 

The SELECT gives the same result of the previous SELECT and the EXPLAIN command shows:

EXPLAIN SELECT sales.* FROM sales JOIN sales_items
            ON  (sales.sales_entity = sales_items.sales_item_id )
         WHERE sales_items.product_id in (3);
 
+----+-------------+-------------+------+---------------+------+---------+--------------------------------+------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref                            | rows | Extra       |
+----+-------------+-------------+------+---------------+------+---------+--------------------------------+------+-------------+
|  1 | SIMPLE      | sales_items | ref  | PRIMARY,ix2   | ix2  | 4       | const                          |    2 | Using index |
|  1 | SIMPLE      | sales       | ref  | ix2           | ix2  | 4       | test.sales_items.sales_item_id |    1 |             |
+----+-------------+-------------+------+---------------+------+---------+--------------------------------+------+-------------+
2 rows in set (0.00 sec)
 

So ix2 can be used to filter the rows in the SELECT, and ultimately in the UPDATE.

In fact, the new UPDATE is:

UPDATE sales JOIN sales_items
    ON ( sales.sales_entity = sales_items.sales_item_id )
   SET sales_status = ?
 WHERE sales_items.product_id IN ( ? )
 

The server tales few milliseconds to execute the query, fixing also the headache of our developers better than a pain killer.

Some people may argue that the use of subqueries is not efficient and developers should learn some good SQL. Unfortunately, this is not the point. Products like Magento and others – I have primarily in mind BI and reporting tools – are supposed to be database agnostic and generate queries for many RDBMSs with minor customisation. Bugs like this give the end users the impression that MySQL is terribly slow and is simply not good enough for the job, so they should refer to expensive commercial RDBMSs – the ones that have subqueries working properly.

I really, really hope that the optimiser team at Monty Program one day will fix this bug. We will see even more adoption of MySQL in the Enterprise world, something that has been always underestimated. More projects in the “Enterprise world” mean more resources injected into the MySQL ecosystem, resources (funds, skilled developers and DBAs, expertise) that may be used to improve the product and the solutions around it.

Written by Ivan Zoratti

25 January 2011 at 9:39 pm

Heading to Oracle Open World

leave a comment »

Today is all airports and flights. The flight between London and San Francisco is not too bad, but I don’t like long haul flights and for 12 hours I can sympathise with sardines and anchovies…

Tomorrow it will be different. The weather looks good, the sessions very interesting. Just few hours to settle, review the presentation and relax a bit before it begins. Then, Oracle Open World is going to open on Sunday evening at the Moscone Center.

What about MySQL?

Sunday is all MySQL! We will have opening keynotes from Edward Screven and the MySQL team. We will have technical sessions all afternoon and a closing keynote from Marten Mickos in the evening. The program for MySQL Sunday is here.

If you cannot participate in person, I am sure you will find presentations and videos soon after the event.

My presentation

I am going to present in the mainstream set of Oracle Open World, On Monday at 5pm Pacific. The topic is “Getting the best of both worlds with Oracle Database 11g and MySQL Enterprise” – details are here.

In the light of the adoption of MySQL in the Enteprise, it is important to find the best ways to integrate Oracle and MySQL. This integration is possible now, but there are many other options to come in the near future.

Anything else re MySQL at the Oracle Open World?

Of course there are hundreds of other sessions and a MySQL booth. If you can stop there you will find me and other colleagues of the MySQL team.

What’s next?

Well, will see. September and October are hot months for the business. Lots of presentations in Europe and customers meetings will keep me busy. I hope there will be time to meet many people at the MySQL User Groups and in customers events.

Written by Ivan Zoratti

18 September 2010 at 2:49 am

Posted in MySQL, Uncategorized