AWS Cloud services have been challenging to get to behave as expected while building the new DevOps setup for 2024. Part of the problem is that a lot of the AWS documentation is outdated. Same goes for most of the solutions and suggestions provided by various forums such as Stack Overflow. Forays into ChatGPT and other AI services to try to dig through disparate online resources yielded incomplete and inaccurate results. Thankfully I am stubborn AF and kept pushing ahead until there was a solution. This is what I learned along the way.
While the initial rudimentary tests went well, allowing me to update the ECS baseline server image any time I updated a subordinate code module, it fell apart quickly when I put all the pieces together. I did have a simple test working where I could update the DevOps repository that ultimately created the container image hosted at AWS ECR. I also got CodePipeline to monitor this repo, pull in the latest code, build the image and push it to ECR, and even fire up the entire IT services stack in the cloud complete with load balancers and all. It worked great. Until it didn’t.
CodeBuild Git Submodules Is A Fickle Bitch
Turns out CodeBuild’s Use Git submodules feature is very particular. Rather than describe all the nuances and wrong turns, here is what actually worked.
YES, Use Git submodules works to pull in code from other git projects and place it in directories CodeBuild can access when running what is essentially the docker build command.
However, it needs to be configured exactly right.
Obviously you’ll want to check off “Use Git Submodules” in the advanced section of the source settings of the build project.
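If you drive the build project through the CLI or CloudFormation instead of the console, that checkbox appears to map to the source configuration’s gitSubmodulesConfig block. A rough sketch of the relevant fragment (the repo URL and region are placeholders, not our actual project):

```json
{
  "source": {
    "type": "CODECOMMIT",
    "location": "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/devops-builder",
    "gitSubmodulesConfig": {
      "fetchSubmodules": true
    }
  }
}
```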
Source: Git clone depth
While I have not yet played with putting this setting back to the default value of “1”, I did find that you get access to the entire repo history with this set to “Full”. Our builds are now working with this setting, though “Full” may not be as important as I thought. I will check that out in a future build.
Environment: Image
This is VERY important. It seems to have a major impact on what is built and how. Since I want the same images available for testing on our local macOS machines and for running in the cloud, I am using an aarch64/ARM based OS for the builds and for the final image production. I’ve tried other mixes, like the “builder container” running a different architecture, but trust me: it is faster and has fewer side effects to have the image builder running aarch64/ARM if the image you are building is aarch64/ARM.
Now, here is the BIGGER part of that setting. The image that actually works: aws/codebuild/amazonlinux2-aarch64-standard:2.0
I’ll come back to why this image matters later.
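For reference, here is roughly what that environment looks like when set through the CodeBuild API rather than the console; ARM builds use the ARM_CONTAINER environment type, and the compute type shown is just an example:

```json
{
  "environment": {
    "type": "ARM_CONTAINER",
    "image": "aws/codebuild/amazonlinux2-aarch64-standard:2.0",
    "computeType": "BUILD_GENERAL1_LARGE"
  }
}
```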
Adding Your Submodules
On your local development box you’ll want to make sure you add your submodules using HTTPS URLs only.
Since I am using CodeCommit to store all of our modules, that means setting up an IAM user and granting that user read-only access to the CodeCommit repos. That user also needs HTTPS Git credentials for CodeCommit set up under Security Credentials.
This also means that for the local builds you’ll want to fetch the username/password in a secure fashion. I used AWS Secrets Manager for that, storing the username and password for this HTTPS CodeCommit user there and retrieving them from the AWS CLI as needed. I baked this into the shell scripts that add the submodules, so I don’t have to type in a username and password 20x and the actual credentials are never stored in the script. The shell script tools are stored alongside the image builder definitions like Dockerfile, buildspec.yml, etc., so I don’t want to be storing passwords there.
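A minimal sketch of that flow, assuming the secret is a JSON blob with username and password keys and that the AWS CLI and jq are installed; the secret id, repo name, and paths are all made-up examples, not our actual names:

```shell
#!/bin/sh
# Fetch the CodeCommit HTTPS credentials from Secrets Manager, unless the
# caller already exported SECRET_JSON (handy for testing offline).
# "codecommit/https-reader" is a hypothetical secret id.
SECRET_JSON="${SECRET_JSON:-$(aws secretsmanager get-secret-value \
  --secret-id codecommit/https-reader \
  --query SecretString --output text)}"

GIT_USER=$(printf '%s' "$SECRET_JSON" | jq -r '.username')
GIT_PASS=$(printf '%s' "$SECRET_JSON" | jq -r '.password')
export GIT_USER GIT_PASS

# Hand the credentials to git through a one-off credential helper so they
# never land in .gitmodules, .git/config, or this script.
git -c credential.helper='!f() { echo "username=$GIT_USER"; echo "password=$GIT_PASS"; }; f' \
  submodule add \
  "https://git-codecommit.us-east-1.amazonaws.com/v1/repos/my-module" \
  modules/my-module
```

The inline credential helper is the key design choice here: the HTTPS URL that gets recorded in .gitmodules stays credential-free.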
I dropped these submodules into a directory inside our Docker image builder directory. This allows me to bring the files into the image with a COPY command in the Dockerfile versus a mount. I want our SaaS server image to be stable, after all. Yes, I mount some EFS stuff, but only for persistent user uploads like images. I don’t want code there; I want it on the “baked in” server image.
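The Dockerfile side of that is just a plain COPY of the submodule checkout into the image (the paths here are hypothetical):

```dockerfile
# Bake the submodule code into the image so the running container never
# depends on a mount for application code.
COPY modules/my-module /opt/app/my-module
```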
SSH Breaks CodeBuild Git Submodules
Why HTTPS and not SSH?
Well, it turns out using SSH with the proper keys makes building the images on the local box easy. I can even push the images to ECR when I am done. However, this skips over the CodeBuild script and makes it extremely difficult to set up an efficient and effective CodePipeline for CI/CD container image deployments.
Also, SSH does NOT work on CodeBuild. Although the documentation on CodeBuild seems to indicate that both HTTPS and SSH access vectors should work for pulling down submodules into the source, it absolutely DOES NOT work with SSH on version 2.0 of the Amazon Linux 2 server image.
If you keep SSH URLs in place you will get the following error:
Submodule error error creating SSH agent: "SSH agent requested but SSH_AUTH_SOCK not-specified" for primary source and source version refs/heads/develop
Now, despite a whole lot of suggestions on AWS forums and via “AI” helpers like ChatGPT and others, adding any number of ssh-agent commands or pushing your SSH keys into a .ssh directory in the install or pre-build phases of the buildspec.yml is useless.
This error happens way BEFORE the image build process, which is Docker running inside a Docker container to build a Docker image. The problem is on the “outermost shell” of that stack of Russian dolls. Nothing you add to buildspec.yml is going to help resolve this situation. It needs to be resolved in the Amazon Linux 2 image or in the CodeBuild execution stack. (Side note: you could build your own custom CodeBuild Docker image to run this DinD process, but I elected not to go that deep down the rabbit hole.)
So, bottom line: Amazon Linux 2 does NOT have an SSH agent installed and cannot pull in SSH-based URLs for the git repos. Time to change to HTTPS as noted above.
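After the switch, each entry in .gitmodules should look something like this (the module name, path, and region are placeholders); run git submodule sync after editing so the change propagates to .git/config:

```ini
[submodule "modules/my-module"]
	path = modules/my-module
	url = https://git-codecommit.us-east-1.amazonaws.com/v1/repos/my-module
```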
Amazon Linux 2 Version 3.0 And Submodules
As far as I can tell, the Amazon Linux 2 version 3.0 image does not pull down submodules AT ALL. Ever. While it does not throw the SSH_AUTH_SOCK error and is able to move on to the install, pre-build, and build stages, it completely ignores the submodules.
I tried all kinds of tricks to get this to work. Nothing did. No IAM users, no roles or policies, nothing.
Yes, I could completely rewrite the buildspec and/or Dockerfile to manually set up the SSH keys, init the submodules, and build the submodule stack, but that seems to defeat the entire purpose of AWS CodeBuild having a Use git submodules setting in the first place. After all, you can do that with any Dockerfile and Docker Compose setup without having to tell the “host launcher” for a DinD setup that you are going to do so.
From what I can tell, Amazon Linux 2 version 3.0 simply does not work with the CodeBuild Submodules feature.