Aurora MySQL Storage Monitoring & Reduction

Using Aurora MySQL is a great way to host a MySQL database at low cost while providing a scalable and fault tolerant environment. Once you’ve setup your multi-zone cluster AWS takes care of much of the scaling for the database server. One of the components it manages automatically is the volume, or disk storage, up to a maximum of 128TB. However, this does not mean volume storage is fully managed.

All AWS RDS implementations have several components that can impact the monthly billing. While things like CPU and memory are based on the workload and often require application restructuring to reduce consumption, disk space — or volume storage — can often be reduced with routine maintenance.

Implementing routine maintenance to reduce volume consumption should be part of a monthly health-check for your AWS services.

Checking The Aurora MySQL Volume Size

The easiest way to check the amount of storage being used by the nodes in the data cluster is to use CloudWatch. You can follow this AWS article on How To View Storage to view the volume sizes. For monthly reviews, add a Number widget for this metric to your main Cloudwatch dashboard.

The Store Locator Plus® CloudWatch dashboard.

Checking MySQL Database and Table Sizes

You can check table sizes with this MySQL command:

           ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS `Size MB`
        FROM   information_schema.tables
        GROUP  BY table_name) AS tmp_table

Checking database sizes in MySQL can be done with this command:

           ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS `Size MB`
        FROM   information_schema.tables
        GROUP  BY TABLE_SCHEMA) AS tmp_table
Our database sizes

Additional File & Table Size Commands

Temp files in Aurora MySQL

SELECT file_name, ROUND(SUM(total_extents * extent_size)/1024/1024/1024,2) AS "TableSizeinGB" from information_schema.files;

Table and fragmented space (GB)

SELECT table_schema AS "DB_NAME", SUM(size) "DB_SIZE", SUM(fragmented_space) APPROXIMATED_FRAGMENTED_SPACE_GB FROM (SELECT table_schema, table_name, ROUND((data_length+index_length+data_free)/1024/1024/1024,2) AS size, ROUND((data_length - (AVG_ROW_LENGTH*TABLE_ROWS))/1024/1024/1024,2)
AS fragmented_space FROM information_schema.tables WHERE table_type='BASE TABLE' AND table_schema NOT IN ('performance_schema', 'mysql', 'information_schema') ) AS TEMP GROUP BY DB_NAME ORDER BY APPROXIMATED_FRAGMENTED_SPACE_GB DESC;

Reducing Volume Size

Yes, Amazon did add automatic volume reduction as part of the Aurora MySQL and PostgreSQL implementations a couple of years ago, but there are some caveats. First, deleting records from a table does not release the disk space; MySQL keeps that space for future inserts and updates. Second, Optimize Table should help but it temporarily increases volume usage as it replicates the table during the process; Also InnoDB tables — the default unless overridden should be auto-optimized so you won’t gain much if anything from this step.

This Stack Overflow post has some great details about MySQL and volume storage.

Dropping unused tables will reduce volume size IF innodb_file_per_table is on. It should be for default Aurora MySQL instances. You can check that setting with this MySQL command.

show variables like "innodb_file_per_table";

Cloning a database with dump/restore and dropping the original will reduce volume size and may be the only option to “reclaim” space if a lot of records have been dropped. You’ll be doubling volume use during the operation, but once the original source database is removed you should see a reduction in space.

You may be interested in following this AWS article on performance and scaling for Aurora DB clusters as well.

Or this article from AWS on troubleshooting storage space.

WordPress 4.8.2 Data Query Woes

Millions of websites across the Internet were automatically updated to WordPress 4.8.2 yesterday.    Thousands of those sites have key components that are no longer working.   Many of those sites will be looking for new plugins and themes unless new patches are made to make those themes and plugins 4.8.2 compatible.   WordPress 4.8.2 Data Query WoesWordPress 4.8.2 Data Query WoesWordPress 4.8.2 Data Query Woesn the meantime there are going to be a lot more partially broken websites online for the next month.

Unfortunately for my Store Locator Plus add ons, we build and test against the latest “nightly” builds which has been on WordPress 4.9 beta for a while.  Somehow the changes to the prepare method did not make it to that build on our test systems until 36 hours ago; that is about 3 hours AFTER we finished testing the latest 4.8.4 updates.  The very same locator updates that we published just over 30 hours ago.

What Changed?

WordPress changed a key component of the system architecture that is used by thousands of plugins, the database management interface.    The prepare() function of the WordPress Database class (WPDB) is used to manage data queries and replace placeholders with variables when performing searches.

As part of the WordPress 4.8.2 security update the ability to use a variety of different standardized printf formats, where a %d represents a number or a %s represents a string as a placeholder, was severely restricted.    As of 4.8.2 any plugin or theme that uses “undocumented’ or “unsupported” formats will no longer work.    To be more specific, any plugins or themes using the placeholders WordPress deems undocumented are not supported.

The officially supported placeholders are limited to:

*   %d (integer)
*   %f (float)
*   %s (string)
*   %% (literal percentage sign - no argument needed)

Many plugins and themes use positional arguments like %1s or , as is the case in my Store Locator Plus Power add on, use the more accurate %u placeholder instead of %d.   These plugins and themes no longer work as of 4.8.2.

The Fix

Any plugin and theme that uses the wpdb->prepare() call, and they all should if they are processing data queries with variables being passed in, will need to change the string formats to use the 4 approved placeholders noted above.

Why %d and %u Are Important

In my case I elected to use the standard printf %u format which is an unsigned integer format.   The reason I chose this option is because the WordPress standard for data storage of keys in WordPress tables is the MySQL unsigned integer.

What’s he difference?   Both PHP and MySQL store 2 billion values in a signed integer and 4 billion in an unsigned integer.    That means you can have TWICE as many records in your database with unsigned integers but more importantly you avoid possible confusion.   Can you imagine telling someone “No, no, no, I meant record number NEGATIVE 1357 not POSITITVE 1357”?

WordPress decided that they would limit the standard prepare to the %d , signed integer format which is limited to 2 billion unique entries.  Granted that will cover nearly any real world WordPress install, but for the one site that hits record 2-billion-and-one there is likely to be trouble.

Today, I brought this up on the WP Core ticket system.   Maybe they’ll change it.  Maybe not.  In the meantime I’m replacing all %u references with %d because my client’s don’t care about technical merit but rather just want their sites to stay running.


Update:  on my PHP7 based system the “internal workings” of PHP don’t change how %d and %u are working.  That’s good news.

sprintf( 'Unsigned %u<br/>. Signed %d <br/>' , PHP_INT_MAX , PHP_INT_MAX )

Unsigned 9223372036854775807
  Signed 9223372036854775807

You can run this same sprintf() on your box to verify %d and %u are interchangeable, but for WPDB ALWAYS use %d.   Now to change a lot of code.

Performance Benefits Of COUNT(*) In SQL

This is something I often come across in code.  Just came across it in our MySLP project today , in fact.

SELECT count(id) FROM my_table

While the performance impact of that statement in itself is negligible, it does add up.  For most database engines that statement incurs additional overhead as it processes logic related to a specific column containing a specific data type.   Some database engines check if the column is null for EVERY RECORD, for example.   Not a big deal if you have an index on the column but even with an index there is overhead.

The better option is to use count(*)

SELECT count(*) FROM my_table

COUNT(*) is a semantic placeholder, NOT an operation directive in-and-of-itself.   Most database engines, MySQL included, recognize this construct and employ “shortcuts”.

Performing this operation on a moderate-sized table, for example will show a slight difference in the speed of getting the count of records in the table with count(*) almost-always pulling the desired count of records in ZERO seconds and other variants taking a second or more.

Not much, granted, but it IS a big deal with you are working in large volume instances where you are processing billions of records not thousands.  Same thing if you have thousands of tables not dozens.   A site with hundreds-of-thousands of visitors not hundreds.

The performance differential is even more pronounced when using joins in your count.

Don’t be fooled into thinking count(id) only reads one column from your database while count(*) reads ALL of them and therefore count(id) MUST be faster.  That is not how it works.  Not for most database engines.


Here is an article by Percona on the same subject you may be interested in:

%d bloggers like this: