How can I improve the insertion times in a large table?


We have the following data structure in a MySQL table, which basically logs user actions on pages:

id int
page_id int
user_id int
action_type enum(6)
date_created datetime

We have the following indexes:

id Primary key
user_id-page_id-date_created unique
page_id-user_id-date_created
user_id
page_id-date_created

Our issue is that this table currently has 125 million rows and is growing by about 0.8 million a day, which makes the inserts take about 2 hours to complete. The inserts are made through 3 queries that select data from 3 other tables. What could we do to improve this time? Should we drop MySQL and try another database solution?

Later edit: Based on your feedback, I'm trying to give more info. First of all, the tables are MyISAM, these inserts happen once every night in a cron job, and we never delete any data from them. Here is how I handle the inserts. I will refer to the big table as big_table and to each of the 3 source tables as content_table, because they are similar in structure. The EXPLAINs are for the largest of the 3 tables, which has around 108.5 million rows. First I get the id from which I should start inserting, using PHP. (I'm OK with the 3 minutes for the non-indexed query that finds it.)

SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+-----------+
| id        |
+-----------+
| 107278872 |
+-----------+
1 row in set (3 min 15.52 sec)

EXPLAIN SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | content_table    | index | NULL          | PRIMARY | 4       | NULL |    1 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.06 sec)

Then, using this id, I do the following:

INSERT IGNORE INTO big_table (user_id, page_id, action_type, date_created)
SELECT user_id, page_id, IF(is_admin, "admin_action", "action") AS action_type, created_time FROM content_table WHERE id >= "107278872";

Here's what the EXPLAIN for that SELECT looks like:

EXPLAIN SELECT user_id, page_id, IF(is_admin, "admin_action", "action") AS action_type, created_time FROM content_table WHERE id >= "107278872";
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | content_table    | range | PRIMARY       | PRIMARY | 4       | NULL | 777864 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)

I've also tried it in phpMyAdmin and got times of around 0.004s, so I think it's the inserts that take the time, not the data fetching. All I know about the server is that it's a quad-core Xeon @ 2.4 GHz with 16 GB of RAM; I don't know anything about the storage yet (I'll come back as soon as I have that info). And the data is not used for logging only: we need statistics like which users were most active on pages, various groupings, etc., and the user can specify any interval for these.


You could:

  1. Run EXPLAIN on the queries you run against the 3 source tables to see whether they're properly indexed. Full table scans should be eliminated.
  2. Add indexes to those three other tables for each WHERE clause in the queries.
  3. Partition the data by day, week, month, or some other suitable mechanism so you can move the oldest data out to a reporting/warehousing solution.
  4. You could see if a trigger solution could help you.
  5. Profile the database and monitor network traffic to see where the time is being spent.
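Since the load is a nightly batch on MyISAM, one concrete thing worth trying is disabling the non-unique indexes on big_table during the load and rebuilding them in one pass afterwards, and splitting the single huge INSERT ... SELECT into primary-key ranges. A sketch, assuming the table and column names from the question (the 100000-row batch size is illustrative; note that DISABLE KEYS only affects non-unique indexes, so the UNIQUE index that INSERT IGNORE relies on is still maintained per row):

```sql
-- MyISAM only: stop maintaining non-unique indexes row by row during the load.
ALTER TABLE big_table DISABLE KEYS;

-- Load in primary-key ranges rather than one huge statement, so each
-- statement finishes (and releases the table lock) sooner.
INSERT IGNORE INTO big_table (user_id, page_id, action_type, date_created)
SELECT user_id, page_id, IF(is_admin, 'admin_action', 'action'), created_time
FROM content_table
WHERE id >= 107278872 AND id < 107278872 + 100000;
-- ...repeat for subsequent id ranges from PHP...

-- Rebuild the disabled indexes in a single bulk pass, which is usually
-- much faster than updating them once per inserted row.
ALTER TABLE big_table ENABLE KEYS;
```

It's also worth checking whether all five indexes on big_table are needed at insert time: for example, the single-column user_id index is a left prefix of the user_id-page_id-date_created unique index, so it may be redundant, and every extra index multiplies the per-row insert cost.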

All relational databases struggle when a single table grows too large. Your first thought should not be to drop MySQL; it should be to figure out what your archiving strategy needs to be. You have to decide exactly how much of that data needs to live in your transactional store at any given time.
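To illustrate what a partition-based archiving strategy could look like, here is a hedged sketch of a range-partitioned version of the log table (MySQL 5.1+; the table name, partition names, and month boundaries are illustrative, and only the two enum values visible in the question's queries are listed, since the full enum definition isn't shown). MySQL requires the partitioning column to appear in every unique key, hence the composite keys below:

```sql
-- Hypothetical partitioned layout for the action log (names are illustrative).
CREATE TABLE big_table_partitioned (
  id int NOT NULL AUTO_INCREMENT,
  page_id int NOT NULL,
  user_id int NOT NULL,
  action_type enum('admin_action','action') NOT NULL,
  date_created datetime NOT NULL,
  -- Every unique key must include the partitioning column (date_created).
  PRIMARY KEY (id, date_created),
  UNIQUE KEY uniq_user_page_date (user_id, page_id, date_created)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(date_created)) (
  PARTITION p2012_03 VALUES LESS THAN (TO_DAYS('2012-04-01')),
  PARTITION p2012_04 VALUES LESS THAN (TO_DAYS('2012-05-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- After a month has been copied to the warehouse, removing it is a cheap
-- metadata operation instead of a massive DELETE:
-- ALTER TABLE big_table_partitioned DROP PARTITION p2012_03;
```

Nightly inserts then only touch the newest partition's index trees, and date-bounded statistics queries can be pruned to the relevant partitions.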