PHP Session Management On Container Clusters

The new Store Locator Plus® SaaS platform that is launching later this month has been built from the ground-up on a modern cloud infrastructure. The new application leverages AWS Elastic Container Service which manages underlying Docker containers. This move was made in order to move toward a better continuous integration and deployment process for the application. This new configuration allows for automated building and testing of the new software as it is being updated. It also provides a managed service that automatically handles horizontal and vertical scaling across multiple application services that support the SaaS platform.

While the application is still a very monolithic design with one big code base and a database cluster, it allows for future revisions to move toward a service-oriented architecture. In short, we will be able to continue our move away from WordPress As An Application Framework toward a more modern and responsive set of microservices. Each service can employ the technology best suited to the task without being locked into PHP and MySQL.

Updating the Store Locator Plus® SaaS platform to use AWS ECS is the first step toward modernizing the platform to provide an updated user experience with faster response times.

One of the challenges in running in this environment is handling the PHP session management across cluster containers. Relying solely on cookie and file based sessions will no longer work.

PHP Session Management On EC2 Clusters

The Store Locator Plus® SaaS platform has been running on a scalable cluster since it was launched nearly a decade ago. However, the legacy architecture relied on a series of individual EC2 servers each running an identical copy of the software. Since the application was running on the host operating system that basically has direct access to the rest of the cloud fabric, it allowed for shortcuts. With the legacy configuration you can get around PHP session store issues by utilizing what are essentially network-mounted disk volumes.

On the legacy cluster configuration, each EC2 server that spun up and joined the cluster would mount a shared drive from the cloud where the session data was kept. This ensured that the right session tokens would exist when your browser connected to any server in the cluster. This was mostly stable, but not very fast. It also meant that the cloud drive was a single point of failure (though it was set up with redundancy as well), which made the application a little more brittle than we like (though we’ve never had a session or drive loss in 10 years). Not only that, mounted network drives are SLOW. Granted, a long 3 or 4 second response time isn’t much when logging in, but it can still be noticeable.

Session Management Across Cluster Containers

The new platform, however, is designed to be broken into services. Our approach starts with using ECS to spin up our host servers, which also happen to be EC2 instances. The difference with the new configuration is that each EC2 server is not running PHP and WordPress directly on the server. Instead it runs a Docker host which runs one or more containers.

For now we run a single Docker container on each EC2 instance but in the future each EC2 host will run multiple containers. One container may run the WordPress PHP part of the app, another a NodeJS service that supports specific features. This will allow us to pick from the best-implementation options and do things like run a NodeJS service in a Docker container or migrate it to a Lambda function or possibly a separate Amplify mini-app. Taking on the pain of implementing hosted ECS container clusters now means far more flexibility with future application design.

Unlike the EC2 clusters, however, it is not easy to mount shared drives with session tokens between the containers. Not only is there a lot more work to punch holes from Docker guests to hosts and out to the AWS cloud, it would re-introduce the performance issues and fragility of the legacy design.

Instead, we need to employ off-server PHP session data stores that can be easily read from each container as needed. For this project we are employing Valkey (an open source fork of Redis) caching via the Amazon ElastiCache service. We think of it as our first microservice for our SaaS platform, separating session management from the PHP host server.

Setting Up ElastiCache For Session Tokens

We opted to use an AWS ElastiCache Valkey server. Valkey is an open source fork of Redis, so the Redis tooling and the PHP Redis extension work with it.

One of the things we learned along the way is you are probably better off creating your own ElastiCache server via “Design your own cache”. If you use the default Serverless quick setup on AWS it is going to be 4x as costly in the long run and you have less flexibility on the configuration. For PHP session tokens this would be overkill.

Here are the settings that we used for setting up our Redis PHP session management stores:

Deployment option: Design your own
Creation method: Cluster cache – you have more control over each configuration option here
Cluster mode: Enabled – for our production and staging services
Location: AWS Cloud
Multi-AZ: Enable – makes it more fault tolerant
Replicas: 2 – the default, good enough for now
Subnet Group: we chose our pre-existing subnet where our ECS clusters live. This includes a private local cloud network and a bridge for our port 443 web services to a network gateway for public Internet traffic. The important part is being on the same subnet as the ECS cluster and, by inference, the underlying EC2 hosts.

All encryption options are enabled both at rest and in transit.
We added our security group used in the ECS cluster which ensures the Redis ports are protected.

We do have backups for production, not for development or staging.

Setup of the Redis server is fairly simple. The key items are ensuring the network connectivity is correct which means choosing the right VPC and subnets. Check to make sure the security group allows communication between the ECS servers and Redis servers on the AWS cloud.

It can take up to 15 minutes to spin up the server. When it is finished you will get an endpoint that will be used to configure PHP connectivity.
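
Before wiring the endpoint into PHP, it can help to confirm that a container can actually open a TCP connection to it. The following is a minimal sketch, not part of our deployed code: the function name can_reach_endpoint and the SESSION_CACHE_HOST environment variable are hypothetical names chosen for illustration. It only shows that the port is reachable through the VPC and security group. It does not speak the Redis protocol or negotiate TLS.

```php
<?php
/**
 * Return true if a TCP connection to $host:$port opens within $timeout seconds.
 * Hypothetical helper for illustration only.
 */
function can_reach_endpoint( string $host, int $port = 6379, float $timeout = 2.0 ): bool {
	$errno  = 0;
	$errstr = '';

	// Suppress the warning fsockopen() raises on failure; the result is reported via the return value.
	$socket = @fsockopen( $host, $port, $errno, $errstr, $timeout );
	if ( $socket === false ) {
		error_log( "Cannot reach $host:$port ($errno): $errstr" );
		return false;
	}

	fclose( $socket );
	return true;
}

// SESSION_CACHE_HOST is an assumed variable name; set it to the ElastiCache endpoint.
$host = getenv( 'SESSION_CACHE_HOST' );
if ( is_string( $host ) && $host !== '' ) {
	echo can_reach_endpoint( $host ) ? "reachable\n" : "NOT reachable\n";
}
```

If this reports a failure from inside a container, the security group and subnet settings are the first places to look.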

Configuring The Docker Image For PHP Session Management With Redis

Next on the list is configuring the Docker image that will be used to build the environment for the containers that run the SaaS platform. In our case we are using PHP with WordPress so we need to update our default WordPress PHP docker image to add the necessary Redis extension and ensure it loads properly.

One of the key elements to the implementation is ensuring our Redis server is not hard-coded into the configuration. Instead we are going to use environment variables that we can set in our ECS service task definitions. This allows our development, staging, and production services to all have their own session management data.

Create ./Docker/Images/Files/php/docker-php-ext-redis.ini

extension=redis.so
session.save_handler = ${PHP_SESSION_SAVE_HANDLER}
session.save_path = ${PHP_SESSION_SAVE_PATH}

This PHP ini file is copied over to the container image and allows the runtime PHP settings for these elements to be read from environment variables. Those variables are set with either Docker Compose files (local development) or via our ECS task definitions, which work very much like Docker Compose files but for the AWS ECS setup.
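
To confirm that the variables actually made it into the PHP runtime, a small diagnostic along these lines can be run inside the container. This is a sketch that assumes the ini file above has been loaded; the function name effective_session_settings is hypothetical.

```php
<?php
/**
 * Collect the session settings PHP is actually using at runtime.
 * Hypothetical helper for illustration only.
 */
function effective_session_settings(): array {
	return [
		'save_handler' => ini_get( 'session.save_handler' ),
		'save_path'    => ini_get( 'session.save_path' ),
		'redis_loaded' => extension_loaded( 'redis' ),
	];
}

// Print each setting so it can be compared against the task definition.
foreach ( effective_session_settings() as $name => $value ) {
	echo $name . ' = ' . var_export( $value, true ) . PHP_EOL;
}
```

If the save path ever carries credentials, avoid printing it to shared logs.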

Create ./Docker/Images/Dockerfile

This is our complete Dockerfile to build WordPress on PHP 8.3 with multisite, our SSL certs, and some other OS level items we need to get our SaaS platform running.

# -- base image

FROM public.ecr.aws/docker/library/wordpress:6.4.2-php8.3-apache
LABEL authors="lancecleveland" \
      image="WordPress Multisite on Apache"

# -- ports

EXPOSE 443

# -- os utilities

RUN set -eux; \
	apt-get update; \
	apt-get install -y --no-install-recommends \
		dnsutils \
        inetutils-traceroute \
        iputils-ping \
        libz-dev \
        libssl-dev \
        libmagickwand-dev \
	; \
	rm -rf \
        /var/lib/apt/lists/* \
	    /usr/src/wordpress/wp-content/themes/* \
	    /usr/src/wordpress/wp-content/plugins/* \
	    /usr/src/wordpress/wp-config-example.php \
    ;

# -- install Redis PHP extension
RUN pecl channel-update pecl.php.net \
    && pecl install redis \
    && docker-php-ext-enable redis

# -- PHP redis
COPY ./Files/php/docker-php-ext-redis.ini /usr/local/etc/php/conf.d/docker-php-ext-redis.ini

# -- apache rewrite

RUN a2enmod ssl && a2enmod rewrite; \
    mkdir -p /etc/apache2/ssl

# -- apache SSL

COPY ./Files/ssl/*.pem /etc/apache2/ssl/
COPY ./Files/apache/sites-available/*.conf /etc/apache2/sites-available/

# -- WordPress, gets copied to the Apache root /var/www/html
COPY ./Files/wordpress/ /usr/src/wordpress/

# -- php xdebug

RUN pecl channel-update pecl.php.net
RUN pecl install xdebug \
    && docker-php-ext-enable xdebug

# -- Standard WordPress Env Vars

ENV WORDPRESS_DB_USER="blah_blah_user"
ENV WORDPRESS_DB_NAME="blah_blah_database"
ENV WORDPRESS_TABLE_PREFIX="wp_"
ENV WORDPRESS_DB_CHARSET="utf8"
ENV WORDPRESS_DB_COLLATE=""

Update Docker Compose and ECS Task Definitions

We set our environment variables via Docker Compose files locally or in the ECS Task Definitions for the AWS cluster service. This allows each deployment environment to connect to various servers or services depending on the need.

Here is our example third-layer “composer secrets” setup for the WordPress server on our local laptop. This bypasses the Redis server as we develop locally in a non-cluster/single container environment. No need for the extra complexity or server costs in that mode.

services:
  wp:
    environment:
      PHP_SESSION_SAVE_HANDLER: 'files'
      PHP_SESSION_SAVE_PATH: ''

For a PHP connection to a cluster, like our fault-tolerant AWS container clusters paired with fault-tolerant ElastiCache clusters, you need to set something similar in the Task Definition environment variables using the same names as above.

      PHP_SESSION_SAVE_HANDLER: 'redis'
      PHP_SESSION_SAVE_PATH: 'tcp://blah-saas-staging.blah.blah.blah.amazonaws.com:6379?persistent=1&failover=1&timeout=2&read_timeout=2&serialize=php&cluster=redis'
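
The save path is an ordinary URL-style string, so its pieces can be pulled apart with PHP's built-in parse_url() and parse_str(). The sketch below uses a made-up hostname (cache.example.internal) and a hypothetical helper name, parse_session_save_path. It is only meant to show which parts of the string are the host, the port, and the options.

```php
<?php
/**
 * Split a session save path into host, port, and query-string options.
 * Hypothetical helper for illustration only.
 */
function parse_session_save_path( string $savePath ): array {
	$parts = parse_url( $savePath );
	if ( $parts === false || empty( $parts['host'] ) ) {
		throw new InvalidArgumentException( "Unparseable save path: $savePath" );
	}

	// Everything after the "?" is a list of key=value options.
	$options = [];
	parse_str( $parts['query'] ?? '', $options );

	return [
		'host'    => $parts['host'],
		'port'    => $parts['port'] ?? 6379, // fall back to the usual Redis port
		'options' => $options,
	];
}

// cache.example.internal is a placeholder, not a real endpoint.
$parsed = parse_session_save_path( 'tcp://cache.example.internal:6379?persistent=1&timeout=2&read_timeout=2' );
echo $parsed['host'] . ':' . $parsed['port'] . PHP_EOL; // prints cache.example.internal:6379
```

Note that parse_str() returns every option value as a string, so numeric options such as the timeouts need a cast before use.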

PHP Code Updates For Redis Clusters

When we spun up our servers for the first time the above configuration did not work as expected. Turns out PHP (and WordPress) does not like to work with clustered Redis (or Valkey) servers. The problem is that a token may not always reside on the Redis server that the PHP app first connects to. Remember, this is a cluster, which means there are going to be 2..n Redis servers running. One of the first things that happened when session_start() was called from within our PHP app was a warning followed by a PHP fatal error:

PHP Warning:  session_start(): Error communicating with Redis server 

PHP Fatal error:  Uncaught RedisException: MOVED

Turns out that MOVED item is a problem. Our session token was NOT on the initial Redis server our app connected to. PHP did not handle the moved key by default, even with the Redis extension installed.
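
Some background on why a key ends up "somewhere else": per the Redis Cluster specification, every key maps to one of 16384 hash slots (CRC16 of the key, modulo 16384), and each node in the cluster owns a subset of those slots. A MOVED reply is the node telling the client that a different node owns the key's slot. The sketch below reproduces that slot calculation in plain PHP purely as an illustration. The function names are hypothetical, and a cluster-aware client does this internally, so none of this is needed in application code.

```php
<?php
/**
 * CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), as used by Redis Cluster.
 */
function crc16_xmodem( string $data ): int {
	$crc = 0;
	$len = strlen( $data );
	for ( $i = 0; $i < $len; $i++ ) {
		$crc ^= ord( $data[ $i ] ) << 8;
		for ( $bit = 0; $bit < 8; $bit++ ) {
			$crc = ( $crc & 0x8000 ) ? ( ( $crc << 1 ) ^ 0x1021 ) : ( $crc << 1 );
			$crc &= 0xFFFF; // keep the value at 16 bits
		}
	}
	return $crc;
}

/**
 * Hash slot (0..16383) for a key, honoring {hash tags}.
 */
function redis_key_slot( string $key ): int {
	$open = strpos( $key, '{' );
	if ( $open !== false ) {
		$close = strpos( $key, '}', $open + 1 );
		// Only a non-empty tag between the first "{" and the next "}" is hashed.
		if ( $close !== false && $close > $open + 1 ) {
			$key = substr( $key, $open + 1, $close - $open - 1 );
		}
	}
	return crc16_xmodem( $key ) % 16384;
}

echo redis_key_slot( 'foo' ) . PHP_EOL; // prints 12182
echo redis_key_slot( 'PHPREDIS_SESSION:abc123' ) . PHP_EOL; // made-up session ID
```

Because session IDs are effectively random, session keys spread across all slots, so a client that only talks to one node will hit MOVED for most of them.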

We needed to write some code to intercept the session management and override the default PHP session handler with our own setup. We are still investigating WHY this is not working the same as the php.ini settings – but for now we ended up having to implement our own PHP session handler and hook it into our WordPress muplugins_loaded hook to ensure it takes over PHP sessions early in the process.

<?php
defined( 'MYSLP_VERSION' ) || exit;


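/**
 * PHP session handler that stores session data in a Redis (Valkey) cluster
 * through the phpredis RedisCluster client, which follows MOVED redirects.
 */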
class RedisClusterSessionHandler implements SessionHandlerInterface {
	private $redis;

	public function __construct() {
		$redisClusterEndpoint = get_cfg_var( 'session.save_path' );
		if ( empty( $redisClusterEndpoint ) ) {
			throw new RuntimeException( 'No Redis Cluster endpoint configured' );
		}


		// Parse and extract host/port (handle both single node and cluster)
		$parsedUrl = parse_url( $redisClusterEndpoint );
		$redisHost = $parsedUrl['host'] ?? 'localhost';
		$redisPort = $parsedUrl['port'] ?? 6379;

		// Use an array format required by RedisCluster
		$redisClusterNodes = [ "$redisHost:$redisPort" ];

		try {
			// Initialize RedisCluster
			$this->redis = new RedisCluster( null, $redisClusterNodes, 2.5, 2.5, true );
		} catch ( RedisClusterException $e ) {
			throw new RuntimeException( 'Failed to connect to Redis Cluster: ' . $e->getMessage() );
		}

	}

	/**
	 * Initialize session
	 * @link https://php.net/manual/en/sessionhandlerinterface.open.php
	 *
	 * @param $savePath
	 * @param $sessionName
	 *
	 * @return bool <p>
	 * The return value (usually TRUE on success, FALSE on failure).
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function open( $savePath, $sessionName ): bool {
		return true; // No need to do anything here
	}

	/**
	 * Close the session
	 * @link https://php.net/manual/en/sessionhandlerinterface.close.php
	 * @return bool <p>
	 * The return value (usually TRUE on success, FALSE on failure).
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function close(): bool {
		return true; // No need to close anything explicitly
	}

	/**
	 * Read session data
	 * @link https://php.net/manual/en/sessionhandlerinterface.read.php
	 *
	 * @param $sessionId
	 *
	 * @return string <p>
	 * Returns an encoded string of the read data.
	 * If nothing was read, it must return false.
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function read( $sessionId ): string {
		$sessionData = $this->redis->get( "PHPREDIS_SESSION:$sessionId" );

		return $sessionData ?: ''; // Return session data or empty string if not found
	}

	/**
	 * Write session data
	 * @link https://php.net/manual/en/sessionhandlerinterface.write.php
	 *
	 * @param $sessionId
	 * @param string $data <p>
	 * The encoded session data. This data is the
	 * result of the PHP internally encoding
	 * the $_SESSION superglobal to a serialized
	 * string and passing it as this parameter.
	 * Please note sessions use an alternative serialization method.
	 * </p>
	 *
	 * @return bool <p>
	 * The return value (usually TRUE on success, FALSE on failure).
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function write( $sessionId, $data ): bool {
		return $this->redis->setex( "PHPREDIS_SESSION:$sessionId", 3600, $data ); // 1-hour TTL
	}

	/**
	 * Destroy a session
	 * @link https://php.net/manual/en/sessionhandlerinterface.destroy.php
	 *
	 * @param $sessionId
	 *
	 * @return bool <p>
	 * The return value (usually TRUE on success, FALSE on failure).
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function destroy( $sessionId ): bool {
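		// DEL returns the number of keys removed; a key that is already gone still counts as destroyed.
		return $this->redis->del( [ "PHPREDIS_SESSION:$sessionId" ] ) !== false;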
	}

	/**
	 * Cleanup old sessions
	 * @link https://php.net/manual/en/sessionhandlerinterface.gc.php
	 *
	 * @param $maxLifetime
	 *
	 * @return int|false <p>
	 * Returns the number of deleted sessions on success, or false on failure. Prior to PHP version 7.1, the function returned true on success.
	 * Note this value is returned internally to PHP for processing.
	 * </p>
	 * @since 5.4
	 */
	public function gc( $maxLifetime ): int|false {
		return 0; // Redis expires keys via TTL, so nothing is deleted here
	}
}

/**
 * Registers RedisClusterSessionHandler as the PHP session handler.
 */
class MySLP_RedisCluster extends MySLP_Base {
	private $redis;

	/**
	 * Catch cluster redirects (MOVED) using the built-in PHP RedisCluster lib
	 * @return void
	 * @throws RedisClusterException
	 */
	final function initialize() {
		if ( class_exists( 'RedisCluster' ) ) {
			try {
				$handler = new RedisClusterSessionHandler();
				session_set_save_handler( $handler, true );

				if ( session_status() === PHP_SESSION_NONE ) {
					session_start();
				}
			} catch ( RuntimeException $e ) {
				error_log( 'Error initializing RedisClusterSessionHandler: ' . $e->getMessage() );
			}
		}
	}
}

Set Your Salt And Keys

Once we got our Redis (Valkey) server set up and configured for our ECS containers, we were still having a problem with users remaining logged in across containers. Turns out the default WordPress Docker image is set up to generate random keys and salts for the WordPress configuration. These salts affect the validation token that is stored within the “logged in cookie” for WordPress. As such, a cookie issued when a user logs in on server A in the cluster is not a valid authentication cookie on server B.

To solve this discrepancy we used the built-in mechanism to force keys and salts to be read from environment variables. Each level of deployment (development, staging, and production) uses its own salts and keys; however, they are the SAME for every node in the cluster at each level. All of the staging servers that are spun up use the same task definition, which sets the WordPress salts and keys.

Looking at the wp-config-docker override which is built into the WordPress Docker images you will find the following:

define( 'AUTH_KEY',         getenv_docker('WORDPRESS_AUTH_KEY',         'put your unique phrase here') );
define( 'SECURE_AUTH_KEY',  getenv_docker('WORDPRESS_SECURE_AUTH_KEY',  'put your unique phrase here') );
define( 'LOGGED_IN_KEY',    getenv_docker('WORDPRESS_LOGGED_IN_KEY',    'put your unique phrase here') );
define( 'NONCE_KEY',        getenv_docker('WORDPRESS_NONCE_KEY',        'put your unique phrase here') );
define( 'AUTH_SALT',        getenv_docker('WORDPRESS_AUTH_SALT',        'put your unique phrase here') );
define( 'SECURE_AUTH_SALT', getenv_docker('WORDPRESS_SECURE_AUTH_SALT', 'put your unique phrase here') );
define( 'LOGGED_IN_SALT',   getenv_docker('WORDPRESS_LOGGED_IN_SALT',   'put your unique phrase here') );
define( 'NONCE_SALT',       getenv_docker('WORDPRESS_NONCE_SALT',       'put your unique phrase here') );

This allows you to set environment variables like “WORDPRESS_SECURE_AUTH_SALT” in the ECS Task Definition and ensure that a login token on server A will hash out to the same value when the user’s session is switched over to server B in the cluster.
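
To see why mismatched salts break logins, here is a deliberately simplified sketch. It is not WordPress's actual cookie algorithm; it only assumes the general idea that the cookie carries a keyed hash (HMAC) of some user data, and that each server recomputes that hash with its own key and salt. The function names, payload, and key values are all made up for illustration.

```php
<?php
/**
 * Sign a payload with a key and salt (simplified stand-in for a login cookie hash).
 */
function sign_token( string $payload, string $key, string $salt ): string {
	return hash_hmac( 'sha256', $payload, $key . $salt );
}

/**
 * Check a signature using the verifying server's own key and salt.
 */
function token_is_valid( string $payload, string $signature, string $key, string $salt ): bool {
	return hash_equals( sign_token( $payload, $key, $salt ), $signature );
}

$payload = 'alice|1735689600'; // made-up "user|expiration" data

// Server A and a second server sharing the same task definition values.
$shared = [ 'key' => 'shared-key-from-task-definition', 'salt' => 'shared-salt-from-task-definition' ];
// A server that generated its own random values at startup.
$random = [ 'key' => bin2hex( random_bytes( 16 ) ), 'salt' => bin2hex( random_bytes( 16 ) ) ];

$signature = sign_token( $payload, $shared['key'], $shared['salt'] );

var_dump( token_is_valid( $payload, $signature, $shared['key'], $shared['salt'] ) ); // bool(true)
var_dump( token_is_valid( $payload, $signature, $random['key'], $random['salt'] ) ); // bool(false)
```

The same cookie validates only on servers that hold the same key and salt, which is why every node at a given deployment level has to receive identical values.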

From here our session state was secure even though our connections would wander through various servers in the ECS cluster.

Hope this helps you get your WordPress cluster working. If you have more hints or tips please leave a comment.

Image by Corey Dupree from Pixabay
