Solving AWX Error “unexpected EOF make: *** [Makefile:508: docker-compose-build] Error 1”

My Problem

Attempting to build the containers for a Docker Compose installation of AWX fails with the error:

unexpected EOF
make: *** [Makefile:508: docker-compose-build] Error 1

My Solution

Give up on trying to clone a specific tag or release and simply clone the default branch (devel, the project’s equivalent of master). As of this writing, the documentation contains an error about using tagged releases, and the tagged release version numbers don’t line up with the official Docker image naming scheme.

If you really want to use a specific image, check out the published images here and then find a specific release that you want. For example, release 4.2. Edit the AWX Makefile and change the line that assigns a value to the variable COMPOSE_TAG towards the top of the file. Assign it the value of the tag that you desire. For example COMPOSE_TAG ?= release_4.2 which will then pull image ghcr.io/ansible/awx_devel:release_4.2
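If you would rather not edit the Makefile, the same override can probably be passed on the command line, because the Makefile assigns COMPOSE_TAG with ?= and make lets command-line variables win over such assignments. Here is a minimal sketch of that behavior. The Makefile it writes is a hypothetical stand-in that only mimics the COMPOSE_TAG assignment; it is not the AWX Makefile.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in Makefile in a throwaway directory.
demo=$(mktemp -d)
printf 'GIT_BRANCH ?= HEAD\nCOMPOSE_TAG ?= $(GIT_BRANCH)\nshow:\n\t@echo ghcr.io/ansible/awx_devel:$(COMPOSE_TAG)\n' > "$demo/Makefile"

# Without an override, the tag falls through to GIT_BRANCH.
default_image=$(cd "$demo" && make -s show)

# A command-line variable takes precedence over the ?= assignment.
override_image=$(cd "$demo" && make -s show COMPOSE_TAG=release_4.2)

echo "default:  $default_image"
echo "override: $override_image"
```

Against a real checkout the equivalent invocation would be make docker-compose-build COMPOSE_TAG=release_4.2, assuming the Makefile still uses ?= for COMPOSE_TAG.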

The Long Story

I’m attempting to create a test installation of Ansible’s AWX project following the Docker Compose instructions. After installing the required prerequisite packages, I follow the official documentation and clone the latest stable tag (at the time of this blog post):

git clone https://github.com/ansible/awx.git --branch 21.4.0

From there I modify the inventory file using settings appropriate to my environment:

localhost ansible_connection=local ansible_python_interpreter="/usr/bin/env python"

[all:vars]

# pg_password="mega_Secret"
# broadcast_websocket_secret="giga_Secret"
# secret_key="tera_Secret"

Then I run make docker-compose-build and wait for a few minutes before hitting the error:

unexpected EOF
make: *** [Makefile:508: docker-compose-build] Error 1

Let’s examine the relevant lines around 508:

    506 docker-compose-build:
    507         ansible-playbook tools/ansible/dockerfile.yml -e build_dev=True -e receptor_image=$(RECEPTOR_IMAGE)
    508         DOCKER_BUILDKIT=1 docker build -t $(DEVEL_IMAGE_NAME) \
    509             --build-arg BUILDKIT_INLINE_CACHE=1 \
    510             --cache-from=$(DEV_DOCKER_TAG_BASE)/awx_devel:$(COMPOSE_TAG) .

As we can see, the docker-compose-build target starts on line 506, and the whole target only lasts until line 510. It’s really just two commands: the first runs the dockerfile.yml playbook that builds the Dockerfile, and the second runs docker build to actually create the Docker image. Simple enough, and yet something is causing it to bomb out.

I decided to run these commands manually in my shell, but to do so meant I also had to manually populate the variables. Some variables were themselves created from a combination of other variables. Ultimately, this is what I had to create:

  • RECEPTOR_IMAGE is assigned on line 30: RECEPTOR_IMAGE ?= quay.io/ansible/receptor:devel
  • DEVEL_IMAGE_NAME is assigned on line 28: DEVEL_IMAGE_NAME ?= $(DEV_DOCKER_TAG_BASE)/awx_devel:$(COMPOSE_TAG)
  • DEV_DOCKER_TAG_BASE is assigned on line 27: DEV_DOCKER_TAG_BASE ?= ghcr.io/ansible
  • COMPOSE_TAG is assigned on line 12: COMPOSE_TAG ?= $(GIT_BRANCH)
  • GIT_BRANCH is assigned on line 5: GIT_BRANCH ?= $(shell git rev-parse --abbrev-ref HEAD)

Ultimately I assigned the variables these values:

RECEPTOR_IMAGE="quay.io/ansible/receptor:devel"
DEV_DOCKER_TAG_BASE="ghcr.io/ansible"
GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
COMPOSE_TAG="$GIT_BRANCH"
DEVEL_IMAGE_NAME="$DEV_DOCKER_TAG_BASE/awx_devel:$COMPOSE_TAG"

Let’s double check to make sure they look right:

echo -e "RECEPTOR_IMAGE = $RECEPTOR_IMAGE \nDEV_DOCKER_TAG_BASE = $DEV_DOCKER_TAG_BASE \nGIT_BRANCH = $GIT_BRANCH \nCOMPOSE_TAG = $COMPOSE_TAG \nDEVEL_IMAGE_NAME = $DEVEL_IMAGE_NAME"
RECEPTOR_IMAGE = quay.io/ansible/receptor:devel
DEV_DOCKER_TAG_BASE = ghcr.io/ansible
GIT_BRANCH = HEAD
COMPOSE_TAG = HEAD
DEVEL_IMAGE_NAME = ghcr.io/ansible/awx_devel:HEAD

Now I can continue on by manually executing the next command that the docker-compose-build target performs:

ansible-playbook tools/ansible/dockerfile.yml -e build_dev=True -e receptor_image=$RECEPTOR_IMAGE

The four tasks complete successfully and a Dockerfile and some config files are created. Next up, it’s time to build the Docker image:

DOCKER_BUILDKIT=1 docker build -t $DEVEL_IMAGE_NAME --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=$DEV_DOCKER_TAG_BASE/awx_devel:$COMPOSE_TAG .

And the build process kicks off. After a few minutes, we bomb out with:

------
 > importing cache manifest from ghcr.io/ansible/awx_devel:HEAD:
------
importing cache manifest from ghcr.io/ansible/awx_devel:HEAD:
unexpected EOF

Maybe for the first time in this process I really start to think. More importantly, I start to look up at the tidal wave of Docker build logs in my terminal. Towards the top, I see this:

=> ERROR importing cache manifest from ghcr.io/ansible/awx_devel:HEAD

Trying to manually pull that image receives a slightly more helpful error:

docker pull ghcr.io/ansible/awx_devel:HEAD
Error response from daemon: manifest unknown

Maybe if we don’t use a tag and just let docker use the default tag of latest?

docker pull ghcr.io/ansible/awx_devel
Using default tag: latest
Error response from daemon: manifest unknown

Looking at the latest published container images for the AWX project, I see what’s wrong:

There’s no HEAD tag, there’s no latest. Instead there’s devel among other release and version tags.

Apparently we’re not getting what the Makefile authors expected when we populate the GIT_BRANCH variable with git rev-parse --abbrev-ref HEAD.

Let’s refresh our memories from a few minutes ago when I explained how I cloned the AWX repository. The official documentation very clearly states:

We generally recommend that you view the releases page and clone the latest stable tag, e.g., git clone -b x.y.z https://github.com/ansible/awx.git

Please note that deploying from HEAD (or the latest commit) is not stable, and that if you want to do this, you should proceed at your own risk.

https://github.com/ansible/awx/blob/devel/tools/docker-compose/README.md#clone-the-repo

The documentation has simultaneously told us to check out a specific tag and scared us away from using HEAD. This I dutifully did:

git clone https://github.com/ansible/awx.git /awx --branch 21.4.0

However, cloning a tag puts you in a detached HEAD state. Thus when you git rev-parse --abbrev-ref HEAD you get back… well… HEAD. The logic of the Makefile then searches for a base Docker image named ghcr.io/ansible/awx_devel:HEAD which does not exist.
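The detached HEAD behavior is easy to reproduce without touching AWX at all. The sketch below builds a throwaway repository (the branch name devel and the tag 21.4.0 are reused here purely for illustration) and compares what git rev-parse --abbrev-ref HEAD reports on a branch and on a tag. It also shows git describe --tags --exact-match, which does recover the tag name, although the Makefile does not use it.

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches a real AWX checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/devel
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git tag 21.4.0

# On a branch, the abbreviated ref is the branch name.
on_branch=$(git rev-parse --abbrev-ref HEAD)

# Checking out a tag detaches HEAD, so the abbreviated ref is just "HEAD".
git checkout -q 21.4.0
on_tag=$(git rev-parse --abbrev-ref HEAD)

# The tag name is still recoverable another way.
tag_name=$(git describe --tags --exact-match)

echo "on branch: $on_branch"   # prints devel
echo "on tag:    $on_tag"      # prints HEAD
echo "describe:  $tag_name"    # prints 21.4.0
```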

If we were bad users and just cloned the default branch like we were told not to, what would we get as the value of COMPOSE_TAG?

git rev-parse --abbrev-ref HEAD
devel

And that would ‘solve’ the issue. Except, would it? Now we’re using the Docker image built from the latest commit, which we were told not to. If we look at all of the tagged images we see something curious. None of the image tags appear to be named in any way similar to the git tags. What’s release_4.2? What’s release_3.8.7? Looking at tagged releases shows a completely different numbering system. Even if I did get the tagged release version with something like git show -s --pretty=%d HEAD, there would be no Docker image to correlate it to. I can’t even find correlations between the SHAs for tags vs images.

The AWX project uses GitHub Actions to build official images. One action builds and pushes the Docker images, and we can see that it’s only triggered on pushes to branches named devel or release_*. Furthermore, it tags the image with GITHUB_REF##, which now explains why images are named release_4.2 for example and not anything to do with the tagged release names like the documentation would indicate.
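Assuming the truncated GITHUB_REF## is a shell prefix strip along the lines of ${GITHUB_REF##*/} (my assumption, not confirmed from the workflow file), the image tags follow directly from the branch refs. A minimal sketch with a hypothetical helper name, ref_to_tag:

```shell
#!/bin/sh
# ref_to_tag REF: strip everything up to and including the last "/".
# The "##*/" expansion is an assumed reading of the workflow, not a quote from it.
ref_to_tag() {
  ref=$1
  printf '%s\n' "${ref##*/}"
}

ref_to_tag refs/heads/release_4.2   # prints release_4.2
ref_to_tag refs/heads/devel         # prints devel
ref_to_tag refs/tags/21.4.0         # prints 21.4.0, but tag pushes never trigger the build
```

Under that reading, a git tag such as 21.4.0 can never become an image tag, because only pushes to devel and release_* branches reach this step.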

It was then that I began to realize something humorous. Every blog and forum post I’ve encountered in the last month which explained how an individual installed the Docker Compose version of AWX always cloned from the default branch. No one seems to have read or cared about the documentation enough to follow directions to deploy only tagged releases. When I cloned the default branch (devel) and ran the docker-compose-build target, everything completed perfectly.

Further proof that reading the documentation is overrated and one should just wing it whenever possible.

Solving “zsh: no matches found” When Using ‘*’ Characters

My Problem

When attempting to scp a directory full of files from a remote machine to a local machine I encountered the error “zsh: no matches found”.

scp user@remote-host:/remote/filesystem/* /local/filesystem
zsh: no matches found: user@remote-host:/remote/filesystem/*

My Solution

Escape the glob, either with a backslash, or quotes around the shell word that has the glob in it. In my example, a backslash would look like:

 scp user@remote-host:/remote/filesystem/\* /local/filesystem 

And quoting would look like:

 scp 'user@remote-host:/remote/filesystem/*' /local/filesystem 

You could also choose to use the noglob precommand modifier before the use of scp but that might not be useful if you’re using multiple globs and want one to actually expand locally. Nevertheless, this will work for simple uses:

noglob scp user@remote-host:/remote/filesystem/* /local/filesystem 
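To see what scp actually receives in the escaped and quoted cases, a tiny helper can print its arguments one per line. The name show_args is hypothetical. Both forms hand the pattern over untouched, which leaves the remote side to expand it:

```shell
#!/bin/sh
# show_args ARGS...: print each argument on its own line, wrapped in <>.
show_args() {
  printf '<%s>\n' "$@"
}

# Backslash-escaped glob: the shell passes a literal "*".
show_args user@remote-host:/remote/filesystem/\* /local/filesystem

# Quoted word: same result.
show_args 'user@remote-host:/remote/filesystem/*' /local/filesystem
```

Each call prints the remote argument with the * still in place, followed by the local path.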

Solved: “error: cannot open Packages database in /var/lib/rpm” and “Error: rpmdb open failed”

My Problem

After a failed yum update I was no longer able to run yum. I received errors that culminated in:

Error: rpmdb open failed

My Solution

Back up the rpm directory with something like tar -czvf /var/lib/rpm.bak /var/lib/rpm/
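Since everything that follows depends on that backup being usable, it may be worth confirming the archive can be read back before touching anything. A minimal sketch, with a hypothetical helper named backup_dir and an example destination path of my own choosing:

```shell
#!/bin/sh
# backup_dir SRC DEST: archive directory SRC into DEST, then list the
# archive to confirm it is readable. Returns non-zero if either step fails.
backup_dir() {
  src=$1
  dest=$2
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" &&
    tar -tzf "$dest" > /dev/null
}

# Example use as root (the destination path is only a suggestion):
#   backup_dir /var/lib/rpm /root/rpm-backup.tar.gz
```

Keeping the archive outside /var/lib also avoids leaving stray files next to the database directory.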

Then use the Berkeley DB tools db_dump and db_load to reload the Packages database:

$ mv /var/lib/rpm/Packages /var/lib/rpm/Packages.borked
$ db_dump -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages.borked
$ db_load -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages

The Long Story (and alternative fixes)

When attempting to do pretty much anything with yum, such as yum update or yum-complete-transaction, I would receive a rather nasty looking error:

error: rpmdb: BDB0113 Thread/process 8630/140300564338496 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed

These errors were preceded by an attempt to simply yum update -y a CentOS 7 machine; however, the update failed because the kernel OOM-killed the yum process. As a result, the rpm database was a bit unhappy.

The Packages file is a Berkeley DB database, so the first thing I did was check the file with db_verify /var/lib/rpm/Packages. Oddly, it appeared to be fine.

$ db_verify /var/lib/rpm/Packages
BDB5105 Verification of /var/lib/rpm/Packages succeeded.

Attempting to recover the database was of no use since there were apparently no log files to replay:

$ db_recover -fv
BDB2525 No log files found

Another trick is to dump and reload the database. db_dump has a few options to consider. Most notably -r and -R — both used for recovery, one being more aggressive than the other. Check man pages for more info.

$ mv /var/lib/rpm/Packages /var/lib/rpm/Packages.borked
$ db_dump -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages.borked
$ db_load -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages

In my specific case, that did the trick, but a few other options remained had it not:

Most notably, db_dump itself has -r and -R options for recovery from a corrupted database file. I’d strongly suggest 1) Having a backup of the Packages file (and a backup of the backup), and 2) Reading the man page for db_dump a few times to have a basic idea of what each flag is going to do.

Another option would be to completely rebuild the RPM database with rpm --rebuilddb.

Adding Simple base64 Decoding to Your Shell

I had a need to repeatedly decode some base64 strings quickly and easily. Easier than typing out openssl base64 -d -in -out, or even base64 --decode file.

The simplest solution that I found and prefer is a shell function with a here string. Crack open your preferred shell’s profile file. In my case, .zshrc. Make a shell function thusly:

decode() {
  base64 --decode <<<"$1"
}

Depending on your shell and any addons, you may need to echo an extra newline to make the decoded text appear on its own line and not have the next shell prompt append to the decoded text.
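One way to handle the newline is to bake it into the function. The variant below is a sketch under my own naming (decode_nl is hypothetical); it also swaps the here string for printf and a pipe, so it should work in plain sh as well as zsh and bash:

```shell
#!/bin/sh
# decode_nl STRING: decode a base64 string and end the output with a newline.
decode_nl() {
  printf '%s' "$1" | base64 --decode
  echo
}

decode_nl aGVsbG8=   # prints hello
```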

Workaround: “Unable to Change Virtual Machine Power State: Cannot Find a Valid Peer Process to Connect to”

My Problem

Attempting to start a virtual machine in VMware Workstation 15 Pro (15.0.3) on a RedHat based Linux workstation caused the following error: “Unable to Change Virtual Machine Power State: Cannot Find a Valid Peer Process to Connect to”

I was able to start other virtual machines in the VM library, however.

My Workaround

Note that this is simply a workaround. I don’t yet know the ultimate cause, but I’m documenting how I work around it until I or someone else can figure out the ultimate cause of this problem.

First, check to see if the virtual machine is actually running, in spite of there being no visual indicators within VMware Workstation: vmrun list

You’ll probably see that the virtual machine is running. If you don’t, then this workaround isn’t likely to help you. Attempt to shut the running virtual machine down softly: vmrun stop /path/to/virtual_machine.vmx soft
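The two steps can be combined into one small script. This is only a sketch: the function names are hypothetical, and it assumes vmrun list prints each running .vmx path on its own line.

```shell
#!/bin/sh
# vm_listed VMX: succeed if VMX appears as a whole line on stdin.
vm_listed() {
  grep -Fxq -- "$1"
}

# stop_if_running VMX: soft-stop the VM only when vmrun lists it as running.
stop_if_running() {
  vmx=$1
  if vmrun list | vm_listed "$vmx"; then
    vmrun stop "$vmx" soft
  else
    echo "not listed as running: $vmx" >&2
    return 1
  fi
}

# Example: stop_if_running /path/to/virtual_machine.vmx
```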

After that, you should be able to start the machine again, until the next time it crashes for unknown reasons. More news as I discover it.

Dumping Grounds (Turn Back Now):

I’ll dump some of my notes here and they’ll be updated periodically as I find out more info about this issue. You’re completely safe to ignore everything past this point. Abandon all hope, ye who proceed.

I had recently upgraded from Fedora 29 to Fedora 30, and was experiencing some minor instability with my main workstation. I’m not sure if that was the ultimate cause of this issue, but I’m suspicious since I never had this issue until after the upgrade.

My first act was to go to the Help menu, select the “Support” menu and then “Collect Support Data…” I chose to collect data for the specific VM that was having this issue. This took quite a while, by my standards. About 20 minutes. It basically creates a giant zipped dump of pertinent files across your physical machine that pertain to VMware and that specific virtual machine. It’s not super easy to parse and know what to look for.

I searched through /var/log/vmware/ for any clues in any of the log files found therein. Grepping for all files that had the pertinent virtual machine’s name, and looking for surrounding context didn’t turn anything up.

I attempted to start the vmware-workstation-server service but that failed. I don’t think that’s the issue since the virtual machine isn’t a shared VM.

I tried vmrun list and saw that the Windows VM was actually listed as running. I stopped it soft: vmrun stop /path/to/my/virtual_machine.vmx soft and was then able to start the virtual machine. I’m not sure what’s causing the VM to crash, what’s causing VMware Workstation Pro itself to crash, or why, when I start it back up, it doesn’t appear to know that the VM it was previously working with is actually running.

Solved: “bad input file size” When Attempting to `setfont` to a New Console Font

My Problem

In a Linux distribution of one kind or another, when attempting to set a new console font in a TTY, you may receive the following error:

# setfont -32 ter-u32n.bdf
bad input file size

My Solution

First, if you’re coming to this blog post because you’re attempting to install larger Terminus fonts for your TTY, you probably just want to search your distribution’s package manager for Terminus, specifically the console fonts package:

$ yum search terminus
== Name Matched: terminus ==
terminus-fonts.noarch : Clean fixed width font
terminus-fonts-grub2.noarch : Clean fixed width font (grub2 version)
terminus-fonts-console.noarch : Clean fixed width font (console version)
$ yum install terminus-fonts-console

However, if you’re coming to this blog post for other reasons, then you’re probably attempting to setfont with a .bdf file or just something that isn’t a .psf file. You most likely need to follow the instructions for your font, in my case Terminus, to make the files into the proper .psf format. The Linux From Scratch project has a good quick primer on the topic that you can use to mine for search terms and further information.
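A quick way to tell whether a file is a PSF font at all is to look at its first bytes: PSF1 files begin with 36 04 and PSF2 files with 72 b5 4a 86, while a .bdf file is plain text. The helper below is a rough sketch (font_kind is a hypothetical name); it also reports gzip, since packaged console fonts are often shipped compressed.

```shell
#!/bin/sh
# font_kind FILE: classify FILE by its leading magic bytes.
font_kind() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
  case $magic in
    3604*)    echo psf1 ;;   # PSF version 1
    72b54a86) echo psf2 ;;   # PSF version 2
    1f8b*)    echo gzip ;;   # compressed; decompress and check again
    *)        echo other ;;  # e.g. a BDF source file
  esac
}

# Example: font_kind ter-u32n.bdf
```

A result of other for the file passed to setfont would be consistent with the error above.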

With my specific font, what worked for me was:

$ sudo ./configure --psfdir=/usr/lib/kbd/consolefonts
$ sudo make -j8 psf
# Stuff happens here
$ sudo make install-psf

After that, I had the fonts installed into my /usr/lib/kbd/consolefonts directory and was able to setfont and further change my TTY font to my preferences.

Solved: Attempting to Install and Configure Wireguard Fails with “Unknown device type” and “FATAL: Module wireguard not found in directory”

My Problem

Attempting to install and use Wireguard (version 0.0.20190406-1) on Fedora release 29 is unsuccessful with a variety of symptoms. The first being:

ip link add dev wg0 type wireguard
Error: Unknown device type.

Attempting to get some info about the module with modprobe shows:

$ modprobe wireguard
modprobe: FATAL: Module wireguard not found in directory /lib/modules/5.0.4-2004

The dkms tool shows that the wireguard module is added:

$ dkms status
wireguard, 0.0.20190406: added

However, attempting to build it shows:

$ dkms build wireguard/0.0.20190406
Error! echo
Your kernel headers for kernel 5.0.4-200.fc29.x86_64 cannot be found at /lib/modules/5.0.4-200.fc29.x86_64/build or /lib/modules/5.0.4-200.fc29.x86_64/.

My Solution

Make sure that your running kernel and your kernel headers are the same version, or at least that the running version of the kernel is newer than your kernel headers.

For example, I’m running on a RedHat based system, and checked the following:

$ uname --kernel-release
5.0.4-200.fc29.x86_64

But then the kernel headers were newer:

$ rpm -q kernel-headers
kernel-headers-5.0.9-200.fc29.x86_64

My solution was to yum update the kernel and reboot. I didn’t have to re-install the headers or the wireguard packages. Another possible solution would have been to manually install 5.0.4 kernel headers, but that would require uninstalling packages that marked 5.0.9 kernel headers as a dependency. I believe the cleaner solution is to simply update the kernel.
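The dkms error names the exact directory it looks in, so that path can be checked directly before attempting a build. A minimal sketch with hypothetical helper names; the optional second argument is a filesystem root, included only so the check can be tried against a scratch directory:

```shell
#!/bin/sh
# build_dir_for RELEASE [ROOT]: print the build path dkms reported for RELEASE.
build_dir_for() {
  printf '%s/lib/modules/%s/build\n' "${2:-}" "$1"
}

# has_build_dir RELEASE [ROOT]: succeed if that path exists.
has_build_dir() {
  [ -e "$(build_dir_for "$1" "${2:-}")" ]
}

running=$(uname -r)
if has_build_dir "$running"; then
  echo "build files present for $running"
else
  echo "build files missing for $running: $(build_dir_for "$running")"
fi
```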

The Long Story

First, I checked that I even had kernel headers installed in the first place:

$ rpm -q kernel-headers
kernel-headers-5.0.9-200.fc29.x86_64

Well that’s interesting, because:

$ uname --kernel-release
5.0.4-200.fc29.x86_64

So I’m running kernel 5.0.4, but the kernel-headers package that I’m offered is for 5.0.9. I attempted to install the specific kernel header package by version:

yum install kernel-headers-5.0.4-200.fc29.x86_64
[...]
No match for argument: kernel-headers-5.0.4-200.fc29.x86_64

At this point, I had two viable options.

  1. I could update the running kernel, since 5.0.10-200.fc29 was released and waiting for me.
  2. I could go into Fedora’s build system, Koji, and pull out the specific kernel headers package that I needed to then install manually.

Choosing #2, however, would require me to uninstall the current 5.0.9 kernel headers, and anything that had it as a dependency. This includes things like binutils and gcc, among many others. I decided to update the system. A quick yum update and reboot later, and:

$ uname -or
5.0.10-200.fc29.x86_64 GNU/Linux

My only concern was that the headers that are in the official yum repo are 5.0.9, one patch release behind the new kernel:

rpm -q kernel-headers
kernel-headers-5.0.9-200.fc29.x86_64

Nevertheless, my fears were allayed with dkms:

$ dkms status
wireguard, 0.0.20190406, 5.0.10-200.fc29.x86_64, x86_64: installed

Previously, wireguard had only been added, but not successfully installed. I quickly tried to add a wireguard interface:

$ ip link add dev wg0 type wireguard
$ ip link show wg0
3: wg0: <POINTOPOINT,NOARP> mtu 1420 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/none

Success!

Solved: WordPress – “An unexpected error occurred.” when installing plugins, themes, and more.

My Problem

Attempting to add things to WordPress like plugins or themes causes the following error:

An unexpected error occurred. Something may be wrong with WordPress.org or this server’s configuration. If you continue to have problems, please try the support forums.

My Solution

Check your SELinux audit logs for signs of denials. Your web server software (probably Apache / httpd) or a module being used by the software is most likely having outbound connection attempts denied.
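A minimal sketch of that check, assuming the web server process is named httpd and that the audit tools are installed. The filter name httpd_denials is hypothetical; it simply narrows audit records down to denials attributed to httpd.

```shell
#!/bin/sh
# httpd_denials: read audit records on stdin and keep AVC denials whose
# command name is httpd (assumed process name; adjust for your server).
httpd_denials() {
  grep 'avc:.*denied' | grep 'comm="httpd"'
}

# Example use as root:
#   ausearch -m avc -ts recent | httpd_denials
#   httpd_denials < /var/log/audit/audit.log
```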
