Solving AWX Error “unexpected EOF make: *** [Makefile:508: docker-compose-build] Error 1”

My Problem

Attempting to build the containers for a Docker Compose installation of AWX fails with the error:

unexpected EOF
make: *** [Makefile:508: docker-compose-build] Error 1

My Solution

Give up on cloning a specific tag or release and simply clone master. As of the writing of this blog post, the documentation’s advice about tagged releases is wrong, and the version numbering of tagged releases doesn’t match the naming scheme of the official Docker images.

If you really want to use a specific image, check out the published images here and then find a specific release that you want. For example, release 4.2. Edit the AWX Makefile and change the line that assigns a value to the variable COMPOSE_TAG towards the top of the file. Assign it the value of the tag that you desire. For example COMPOSE_TAG ?= release_4.2 which will then pull image

The Long Story

I’m attempting to create a test installation of Ansible’s AWX project following the Docker Compose instructions. After installing the required prerequisite packages, I follow the official documentation and clone the latest stable tag (at the time of this blog post):

git clone --branch 21.4.0

From there I modify the inventory file using settings appropriate to my environment:

localhost ansible_connection=local ansible_python_interpreter="/usr/bin/env python"


# pg_password="mega_Secret"
# broadcast_websocket_secret="giga_Secret"
# secret_key="tera_Secret"

Then I run make docker-compose-build and wait for a few minutes before hitting the error:

unexpected EOF
make: *** [Makefile:508: docker-compose-build] Error 1

Let’s examine the relevant lines around 508:

    506 docker-compose-build:
    507         ansible-playbook tools/ansible/dockerfile.yml -e build_dev=True -e receptor_image=$(RECEPTOR_IMAGE)
    508         DOCKER_BUILDKIT=1 docker build -t $(DEVEL_IMAGE_NAME) \
    509             --build-arg BUILDKIT_INLINE_CACHE=1 \
    510             --cache-from=$(DEV_DOCKER_TAG_BASE)/awx_devel:$(COMPOSE_TAG) .

As we can see, the docker-compose-build target starts on line 506 and ends on line 510. It’s really just two commands: the first runs the dockerfile.yml playbook, which generates the Dockerfile, and the second runs docker build to actually create the Docker image. Simple enough, and yet something is causing it to bomb out.

I decided to run these commands manually in my shell, but to do so meant I also had to manually populate the variables. Some variables were themselves created from a combination of other variables. Ultimately, this is what I had to create:

  • RECEPTOR_IMAGE is assigned on line 30: RECEPTOR_IMAGE ?=
  • DEVEL_IMAGE_NAME is assigned on line 28: DEVEL_IMAGE_NAME ?= $(DEV_DOCKER_TAG_BASE)/awx_devel:$(COMPOSE_TAG)
  • DEV_DOCKER_TAG_BASE is assigned on line 27: DEV_DOCKER_TAG_BASE ?=
  • COMPOSE_TAG is assigned on line 12: COMPOSE_TAG ?= $(GIT_BRANCH)
  • GIT_BRANCH is assigned on line 5: GIT_BRANCH ?= $(shell git rev-parse --abbrev-ref HEAD)

Ultimately I assigned the variables these values:

GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

Let’s double check to make sure they look right:


Now I can continue on by manually executing the next command that the docker-compose-build target performs:

ansible-playbook tools/ansible/dockerfile.yml -e build_dev=True -e receptor_image=$RECEPTOR_IMAGE

The four tasks complete successfully and a Dockerfile and some config files are created. Next up, it’s time to build the Docker image:

DOCKER_BUILDKIT=1 docker build -t $DEVEL_IMAGE_NAME --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=$DEV_DOCKER_TAG_BASE/awx_devel:$COMPOSE_TAG .

And the build process kicks off. After a few minutes, we bomb out with:

 > importing cache manifest from
importing cache manifest from
unexpected EOF

Maybe for the first time in this process I really start to think. More importantly, I start to look up at the tidal wave of Docker build logs in my terminal. Towards the top, I see this:

=> ERROR importing cache manifest from

Trying to manually pull that image receives a slightly more helpful error:

docker pull
Error response from daemon: manifest unknown

Maybe if we don’t use a tag and just let docker use the default tag of latest?

docker pull
Using default tag: latest
Error response from daemon: manifest unknown

Looking at the latest published container images for the AWX project, I see what’s wrong:

There’s no HEAD tag, there’s no latest. Instead there’s devel among other release and version tags.

Apparently we’re not getting what the Makefile authors expected when we populate the GIT_BRANCH variable with git rev-parse --abbrev-ref HEAD.

Let’s refresh our memories from a few minutes ago when I explained how I cloned the AWX repository. The official documentation very clearly states:

We generally recommend that you view the releases page and clone the latest stable tag, e.g., git clone -b x.y.z

Please note that deploying from HEAD (or the latest commit) is not stable, and that if you want to do this, you should proceed at your own risk.

The documentation has simultaneously told us to check out a specific tag, and also scared us away from using HEAD. This I dutifully did:

git clone /awx --branch 21.4.0

However, cloning a tag puts you in a detached HEAD state. Thus when you git rev-parse --abbrev-ref HEAD you get back… well… HEAD. The logic of the Makefile then searches for a base Docker image named which does not exist.
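The detached HEAD behaviour is easy to reproduce in a throwaway repository. The sketch below is not the AWX repository; the directory names and the 1.0.0 tag are made up for illustration, and it assumes git is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway upstream repository with one commit and one tag (hypothetical names).
workdir=$(mktemp -d)
git init --quiet "$workdir/upstream"
git -C "$workdir/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial commit"
git -C "$workdir/upstream" tag 1.0.0

# Cloning a tag leaves the working copy in a detached HEAD state.
git -c advice.detachedHead=false clone --quiet --branch 1.0.0 \
    "$workdir/upstream" "$workdir/from-tag"
tag_ref=$(git -C "$workdir/from-tag" rev-parse --abbrev-ref HEAD)

# Cloning without --branch checks out the default branch instead.
git clone --quiet "$workdir/upstream" "$workdir/from-branch"
branch_ref=$(git -C "$workdir/from-branch" rev-parse --abbrev-ref HEAD)

echo "cloned from tag:    $tag_ref"      # the literal string HEAD
echo "cloned from branch: $branch_ref"   # the default branch name
rm -rf "$workdir"
```

The first result is the literal string HEAD, which is exactly what ends up in COMPOSE_TAG. The second is whatever the default branch is called on your machine.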

If we were bad users and just cloned MASTER like we were told not to, what would we get as the value of COMPOSE_TAG?

git rev-parse --abbrev-ref HEAD

And that would ‘solve’ the issue. Except, would it? Now we’re using the HEAD docker image, which we were told not to. If we look at all of the tagged images we see something curious. None of the image tags appear to be named in any way similar to the branch tags. What’s release_4.2? What’s release_3.8.7? Looking at tagged releases shows a completely different numbering system. Even if I did get the tagged release version with something like git show -s --pretty=%d HEAD, there would be no docker image to correlate it to. I can’t even find correlations between the SHAs for tags vs images.

The AWX project uses GitHub Actions to build official images. One action builds and pushes the Docker images, and we can see that it’s only triggered on pushes to branches named devel or release_*. Furthermore, it tags the image with GITHUB_REF##, which now explains why images are named release_4.2 for example and not anything to do with the tagged release names like the documentation would indicate.
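The workflow’s exact expression is truncated above, so treat the following as an assumption: a common way to turn a ref into a tag is the ## prefix-removal form of parameter expansion. The ref value here is a hypothetical example of what a push to a release branch would produce.

```shell
#!/usr/bin/env bash
# Hypothetical ref of the kind GitHub Actions sets for a branch push.
GITHUB_REF="refs/heads/release_4.2"

# "##*/" removes the longest prefix matching "*/",
# leaving only the last path component.
image_tag="${GITHUB_REF##*/}"

echo "$image_tag"   # release_4.2
```

Under that assumption, the image tag is derived from the branch name alone, which would explain why no image is ever tagged 21.4.0.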

It was then that I began to realize something humorous. Every blog and forum post I’ve encountered in the last month which explained how an individual installed the Docker Compose version of AWX always cloned from master. No one seems to have read or cared about the documentation enough to follow directions to deploy only tagged releases. When I cloned master and ran the docker-compose-build target, everything completed perfectly.

Further proof that reading the documentation is overrated and one should just wing it whenever possible.

Solving “zsh: no matches found” When Using ‘*’ Characters

My Problem

When attempting to scp a directory full of files from a remote machine to a local machine I encountered the error “zsh: no matches found”.

scp user@remote-host:/remote/filesystem/* /local/filesystem
zsh: no matches found: user@remote-host:/remote/filesystem/*

My Solution

Escape the glob, either with a backslash, or quotes around the shell word that has the glob in it. In my example, a backslash would look like:

 scp user@remote-host:/remote/filesystem/\* /local/filesystem 

And quoting would look like:

 scp 'user@remote-host:/remote/filesystem/*' /local/filesystem 

You could also choose to use the noglob precommand modifier before the use of scp but that might not be useful if you’re using multiple globs and want one to actually expand locally. Nevertheless, this will work for simple uses:

noglob scp user@remote-host:/remote/filesystem/* /local/filesystem 

Making a Double Clickable PowerShell One Liner

It’s not uncommon for me to need a quick, single focus task to be performed on a Windows machine. The task can often be performed with a PowerShell or cmd one-liner. I do not, however, like to alter my PowerShell execution policy without serious deliberation. Thus, running a script isn’t usually the way to go.

My solution? I use a Windows shortcut that points to the powershell.exe executable, and pass it the -Command option with my simple commands. For example, I frequently have to kill off the parsecd.exe process, so I use this as the shortcut target:

%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe -Command "& taskkill /T /F /fi 'IMAGENAME eq parsecd.exe'"

Solved: ansible-lint on macOS “FATAL: Ansible CLI and python module versions do not match.”

My Problem

While attempting to run ansible-lint on macOS, I received the error:

FATAL: Ansible CLI (2.10.8) and python module (2.11.5) versions do not match. This indicates a broken execution environment.

My Solution

I ran the following destructive command, but before you even think about running it yourself, please familiarize yourself with what it will do:

brew link --overwrite ansible

The Long Story

While attempting to lint some Ansible .yml files, I realized that the specific macOS machine I was on didn’t have ansible-lint. I begrudgingly use Homebrew on it to manage packages. I ran brew install ansible-lint in a hurry, and attempted to immediately use it. I then received this error:

FATAL: Ansible CLI (2.10.8) and python module (2.11.5) versions do not match. This indicates a broken execution environment.

My first thought was to check if the ansible command corroborated that it was indeed 2.10.8 and it did:

ansible --version
ansible 2.10.8
config file = None
configured module search path = ['/Users/username/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.9.7 (default, Sep 3 2021, 12:37:55) [Clang 12.0.5 (clang-1205.0.22.9)]

Then, I checked pip list and saw:

ansible-base 2.10.8

I even checked the file:

grep __version__ /usr/local/lib/python3.9/site-packages/ansible/
__version__ = '2.10.8'

Clearly there’s a different version of Ansible somewhere, so I brute-forced the situation by searching for every file named that was in a path with the word ansible in it:

mdfind "kMDItemDisplayName == ''c"


Hold up. There appear to be two Ansibles installed. One is in the standard filesystem paths, and another is in Homebrew’s Cellar path. Let’s check the Homebrew version:

grep __version__ /usr/local/Cellar/ansible/4.6.0/libexec/lib/python3.9/site-packages/ansible/
__version__ = '2.11.5'

Because I used Homebrew to install ansible-lint, it also installed ansible in Homebrew’s path. In the past, I had installed Ansible using pip which installed binaries in standard OS paths.

If I had paid attention when I was installing ansible-lint I would have noticed this very informative error:

==> Pouring ansible--4.6.0.big_sur.bottle.tar.gz
Error: The brew link step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/ansible
Target /usr/local/bin/ansible
already exists. You may want to remove it:
rm '/usr/local/bin/ansible'

To force the link and overwrite all conflicting files:
brew link --overwrite ansible

To list all files that would be deleted:
brew link --overwrite --dry-run ansible

Possible conflicting files are:

When considering the situation, I realized that I’d rather be managing all of Ansible’s binaries in Homebrew, so I was comfortable with running brew link --overwrite ansible. The problem was solved, and I could continue on until the next problem.

Solved: “error: cannot open Packages database in /var/lib/rpm” and “Error: rpmdb open failed”

My Problem

After a failed yum update I was no longer able to run yum. I received errors that culminated in:

Error: rpmdb open failed

My Solution

Back up the rpm directory with something like tar -czvf /var/lib/rpm.bak /var/lib/rpm/

Then use the Berkeley DB tools db_dump and db_load to reload the Packages database:

$ mv /var/lib/rpm/Packages /var/lib/rpm/Packages.borked
$ db_dump -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages.borked
$ db_load -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages

The Long Story (and alternative fixes)

When attempting to do pretty much anything with yum, such as yum update or yum-complete-transaction, I would receive a rather nasty looking error:

error: rpmdb: BDB0113 Thread/process 8630/140300564338496 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm

Error: rpmdb open failed

These errors were preceded by an attempt to simply run yum update -y on a CentOS 7 machine. The update failed because the kernel OOM-killed the yum process, and as a result the rpm database was a bit unhappy.

The Packages file is a Berkeley DB database, so the first thing I did was check the file with db_verify /var/lib/rpm/Packages. Oddly, it appeared to be fine.

$ db_verify /var/lib/rpm/Packages
BDB5105 Verification of /var/lib/rpm/Packages succeeded.

Attempting to recover the database was of no use since there were apparently no log files to replay:

$ db_recover -fv
BDB2525 No log files found

Another trick is to dump and reload the database. db_dump has a few options to consider. Most notably -r and -R — both used for recovery, one being more aggressive than the other. Check man pages for more info.

$ mv /var/lib/rpm/Packages /var/lib/rpm/Packages.borked
$ db_dump -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages.borked
$ db_load -f /var/lib/rpm/Packages.dump /var/lib/rpm/Packages

In my specific case, that did the trick, however there were some possible options that remained for me:

Most notably, db_dump itself has -r and -R options for recovery from a corrupted database file. I’d strongly suggest 1) Having a backup of the Packages file (and a backup of the backup), and 2) Reading the man page for db_dump a few times to have a basic idea of what each flag is going to do.

Another option would be to completely rebuild the RPM database with rpm --rebuilddb.

Bash Parameter Expansion: Variable Unset or Null ${YOU_BE_THE:-“Judge”}

I came across a lesser-used Bash idiom while attempting to implement ZeroSSL TLS certificates, specifically in ZeroSSL’s wrapper script that installs their implementation of certbot. The idiom is seen here:


My interest was piqued by that one little dash in between CERTBOT_SCRIPT_LOCATION and "". To understand it, I’ll pull back and think about the whole line, component by component.


Let’s look at this line as if it was two sides of a seesaw with the middle being the = sign. The left half CERTBOT_SCRIPT_LOCATION= is simply a variable assignment. Whatever the right side of the = expands to is going to be put inside the variable CERTBOT_SCRIPT_LOCATION.

So far, so simple.

${ }

On the right side of the =, we have a dollar sign and a bunch of stuff within a pair of braces. Let’s ignore the content within the braces for now and examine the use of ${} as our next element.

The dollar sign character is interpreted by Bash to introduce a number of possible things, including command substitution, arithmetic evaluation, accessing the value of a named variable, or a parameter expansion.

Command substitution is triggered in Bash with $( and ends with a closing ). You could fill a variable with the output of any command like this:

MY_VARIABLE=$( command )

Arithmetic expansion is triggered in Bash with $(( and ends with a matching )). Whatever is between the double parentheses is expanded and treated as an arithmetic expression.

Variable values are accessed when a $ is followed by a named variable. You’ve already seen one named variable in this article: CERTBOT_SCRIPT_LOCATION. However, it currently has no value. In fact, as you read this, we’re currently in the midst of figuring out what value is going to be assigned to that variable.

Parameter expansion is introduced in Bash with ${ and ends with a corresponding }. Any shell parameter found within the braces is expanded. There are a lot of arcane and esoteric shell parameters, but you’ve already been introduced to one type of shell parameter in this article: a variable. That’s right, shell variables are parameters. This brings us to the final piece of this puzzle.
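Before moving on, here is a minimal sketch of the uses of $ described so far, side by side. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Command substitution: capture a command's standard output.
greeting=$(printf 'hello')

# Arithmetic expansion: evaluate an integer expression.
total=$(( 6 * 7 ))

# Plain variable access, and the equivalent braced parameter expansion.
plain="$greeting"
braced="${greeting}"

echo "$greeting $total $plain $braced"   # hello 42 hello hello
```

The braced form does nothing extra here. It becomes interesting once operators such as the dash appear inside the braces.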


We know that CERTBOT_SCRIPT_LOCATION is a variable and thus a shell parameter, so Bash will attempt to expand it within the ${} construct. However, we’re pretty sure that it’s empty at this point. And what’s with the double-quoted string that contains a URL? And why is a dash separating them?! That lowly dash is the linchpin that holds all of this together.

Within a parameter expansion expression, the dash will test if the variable on the left is set or not. If it is set, the variable is expanded and what’s on the right is discarded.

However, if the parameter on the left of the dash is not set, then the thing on the right side of the dash is expanded (if it needs to be) and used as the result of the whole expression. The dash on its own does not assign anything to the variable; in our line, the surrounding assignment does that. Let’s take a look at our specimen:


The above says, in plain language: “Does the variable CERTBOT_SCRIPT_LOCATION exist? If it does, return the variable’s value. If the variable doesn’t exist, return the string "" instead.”
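Here is a small sketch of the dash form in isolation, using a made-up variable name (DEMO_LOCATION) and made-up default strings. Note that the expansion by itself only substitutes a value. It is the assignment wrapped around it that stores anything.

```shell
#!/usr/bin/env bash
unset DEMO_LOCATION                          # hypothetical variable, not set

substituted=${DEMO_LOCATION-fallback}        # the expansion yields "fallback"

# DEMO_LOCATION itself is still unset at this point.
if [ -z "${DEMO_LOCATION+x}" ]; then state="unset"; else state="set"; fi

DEMO_LOCATION=${DEMO_LOCATION-fallback}      # the assignment stores the default
DEMO_LOCATION=${DEMO_LOCATION-ignored}       # already set, so the default is discarded

echo "$substituted $state $DEMO_LOCATION"    # fallback unset fallback
```

If you want the expansion itself to assign, Bash offers the = operator in place of the dash, as in ${DEMO_LOCATION=fallback}.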

Putting it all Together

Whew! We’ve been through a lot, but there’s still a bit more to go. Let’s take a look at the whole line again, explain what’s happening, and then put it in context:


We’re creating a variable named CERTBOT_SCRIPT_LOCATION and assigning it the final value of the parameter expansion on the right side of the = sign.

Within that parameter expansion expression, we’re checking if CERTBOT_SCRIPT_LOCATION already exists. If it does, return the value of that variable which is immediately assigned to that exact same variable. This looks a little weird, but it’s a Bash idiom that means “If CERTBOT_SCRIPT_LOCATION already exists, leave it alone.”

However, if the variable CERTBOT_SCRIPT_LOCATION does not exist, then create it and put the string "" inside.

To put things into greater context, that variable is later used within a call to curl:


The question you may now be asking is: “Why?!” Why not avoid the use of a seldom used, single character test that took so long to explain? Why not use curl and supply the URL directly? Without asking the script author, here are three reasons that I think the script was written this way:

Abstraction. We use variables for any information that has a reasonable chance of being changed. A URL can easily change, and if we assign it to a variable once, we can more easily change that value at a later date. We never need to worry about changing the URL in every spot that we used it.

Documentation. When you assign a value to a variable, you name the variable. In this case, our value is a URL. What exactly does that URL do? What is its purpose? When we assign the URL to a variable named CERTBOT_SCRIPT_LOCATION, now we have an explanation. Every time we use that variable it reminds us of what it’s doing.

Safety. The two reasons above explain the use of variables, but not that lone dash. I believe the dash idiom was chosen for safety. Maybe we ran the script multiple times before, or perhaps something else set it previously. We don’t need to keep repeating the process of setting the variable, and if it was set previously, let’s not overwrite it.

Final Thoughts

I noticed that the script does not check CERTBOT_SCRIPT_LOCATION for a value that makes sense. What if it’s set, but has a number in it? Or a string that isn’t an HTTP URL? Those are more complex problems. How would you solve them?

In the title of this article, I used a slightly different bash idiom: the use of :- rather than the lonesome -. If we look to Bash’s documentation, we find:

When not performing substring expansion, using the form described below (e.g., ‘:-’), Bash tests for a parameter that is unset or null. Omitting the colon results in a test only for a parameter that is unset.

The dash merely checks for the existence of the variable on its left. The colon-dash additionally checks whether the variable exists but is null (empty). If the variable is unset or null, Bash substitutes the value on the right; neither form changes the variable itself. Ask yourself which logic makes the most sense for your own scripts.
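A quick sketch comparing the two forms across the three possible states of a variable, borrowing the variable name from this article’s title. The value "Jury" is made up for the example.

```shell
#!/usr/bin/env bash
unset YOU_BE_THE                       # state 1: unset
unset_dash="${YOU_BE_THE-Judge}"       # Judge
unset_colon="${YOU_BE_THE:-Judge}"     # Judge

YOU_BE_THE=""                          # state 2: set, but null (empty)
null_dash="${YOU_BE_THE-Judge}"        # empty: the variable is set
null_colon="${YOU_BE_THE:-Judge}"      # Judge: null counts as missing

YOU_BE_THE="Jury"                      # state 3: set to a real value
set_dash="${YOU_BE_THE-Judge}"         # Jury
set_colon="${YOU_BE_THE:-Judge}"       # Jury

printf 'unset: [%s] [%s]\n' "$unset_dash" "$unset_colon"
printf 'null:  [%s] [%s]\n' "$null_dash" "$null_colon"
printf 'set:   [%s] [%s]\n' "$set_dash" "$set_colon"
```

The only row where the two forms disagree is the null one, so that is the case to think about when choosing between them.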

Do you have any scripts you’d like me to tear down? Any shell idioms that you’re not sure about? Comment below!

Solved: The connection to the server localhost:8080 was refused – did you specify the right host or port?

My Problem

When attempting to perform any kubectl command, you receive the error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

I was not on the Kubernetes cluster nodes or master, and I did not need to initialize the cluster or move /etc/kubernetes/admin.conf.

My Solution

Your kubeconfig file is jacked up. No really, it is. It’s most likely because you attempted to add or remove clusters to a monolithic config file rather than using multiple config files and having them merged together into one running config.

Go back to basics and create the simplest possible kubeconfig file that works to access your cluster. If you’re having trouble with that, leave a comment below and perhaps we can step through the issue to find the specific bit of yaml that tripped you up.
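As a starting point, here is a sketch of what “simplest possible” can look like: one cluster, one user, one context, and a current-context that names it. Every name, the server URL, and the token are placeholders I made up. A real cluster will usually also need certificate data, which is left out here.

```shell
#!/usr/bin/env bash
# Write a minimal single-cluster kubeconfig to a scratch file.
# All names, the server URL, and the token below are placeholders.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
apiVersion: v1
kind: Config
clusters:
- name: demo-cluster
  cluster:
    server: https://demo-cluster.example.com:443
users:
- name: demo-user
  user:
    token: REPLACE_WITH_REAL_TOKEN
contexts:
- name: demo-context
  context:
    cluster: demo-cluster
    user: demo-user
current-context: demo-context
EOF
echo "wrote $config_file"
```

You can point kubectl at the scratch file without touching your real config, for example with kubectl --kubeconfig "$config_file" config view. Check that the context’s cluster and user names match entries that actually exist, and that current-context names a real context, since those cross-references are easy to break when editing by hand.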

The Long Story

When hacking around on a kubectl config file, I ended up getting it into a state where any kubectl command responded with the error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

When searching around on the internet, most of the solutions focus on creating a nonexistent config file, or initializing the Kubernetes cluster. However in my case, I was not on a cluster member itself, and I already had a config file. The problem was somewhere in the config file itself.

Interestingly, when attempting to use the same config file on a Windows machine, the error was slightly different:

error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Well now that’s interesting. It’s complaining that there’s no master, which seems like it would be a root cause of kubectl attempting to connect to localhost for the control plane server.

What happened next was a painstaking comparison against known-good config files, which turned up the misconfigurations. It would be nice if kubectl linted the config by default and offered somewhat better errors.

After starting from basics and using the simplest possible kubeconfig file, and adding in more contexts and users, the monolithic file eventually worked correctly, and peace reigned in the land.

Solving ModuleNotFoundError: No module named ‘ansible’

My Problem

When running any ansible command, I see a stack trace similar to:

Traceback (most recent call last):
  File "/usr/local/bin/ansible", line 34, in <module>
    from ansible import context
ModuleNotFoundError: No module named 'ansible'

My Solution

pip install ansible or brew install ansible or yum install ansible or…

Somehow your Ansible Python modules were removed, but the Ansible scripts in your $PATH remained. Install Ansible’s python package however makes the most sense for your platform and preferences. E.g. via pip directly or Homebrew or your package manager of choice.

The Long Story

Let’s break the error down line by line:

File "/usr/local/bin/ansible", line 34, in <module>

Ansible is just a Python script, so let’s check out line 34:

31 import sys
32 import traceback
34 from ansible import context
35 from ansible.errors import AnsibleError, AnsibleOptionsError, AnsibleParserError
36 from ansible.module_utils._text import to_text

The second line in the stack trace shows that from ansible import context is just another module import in the larger context of the Python application. With that larger context clarified, this error may snap a bit more into focus:

ModuleNotFoundError: No module named 'ansible'

It’s just a Python application that can’t find a module. If there’s no module, let’s check with Python to see what packages it knows about:

$ pip list

Package    Version
---------- -------
gpg        1.14.0
pip        20.1.1
protobuf   3.13.0
setuptools 49.2.0
six        1.15.0
wheel      0.34.2

There’s no Ansible package listed. Wait, which version of Python did I just check?

$ which pip
pip: aliased to pip3

Let’s check pip2 just to make sure there’s no version weirdness going on:

$ pip2 list

Package      Version
------------ -------
altgraph     0.10.2
asn1crypto   0.24.0
bdist-mpkg   0.5.0
bonjour-py   0.3
boto         2.49.0
cffi         1.12.2
cryptography 2.6.1
enum34       1.1.6
future       0.17.1

Nope, no Ansible. Since I’m on a Mac, let’s check Brew just to see what comes back:

brew list ansible
Error: No such keg: /usr/local/Cellar/ansible

I’m not really sure what happened. I’ve got the Ansible scripts in my path, but I don’t have the Python modules. I prefer to install Ansible via pip, so I simply ran pip install ansible and everything was right with the world.
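In hindsight, a more direct check than reading pip list is to ask a specific interpreter whether it can import the module. The helper below is a sketch of that idea. The function name can_import is my own, and python3 is an assumption about which interpreter you care about.

```shell
#!/usr/bin/env bash
# Return success if the given interpreter can import the given module.
# can_import is a hypothetical helper name, not part of Ansible or pip.
can_import() {
    local interpreter=$1 module=$2
    "$interpreter" -c "import ${module}" >/dev/null 2>&1
}

if can_import python3 ansible; then
    echo "python3 can import ansible"
else
    echo "python3 cannot import ansible"
fi
```

The interpreter worth testing is the one named on the first line of the script itself, which you can read with head -n 1 "$(command -v ansible)". If that interpreter cannot import ansible, you will see the ModuleNotFoundError above no matter what another Python’s pip list says.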

Docker: Error response from daemon: manifest not found: manifest unknown

I was seeing the rather character dense and yet information sparse error from Docker:

Error response from daemon: manifest for graylog/graylog:latest not found: manifest unknown: manifest unknown

Yes, I was hacking around with Graylog in this specific instance.

As it turns out, Graylog doesn’t have a latest tag on Docker Hub, and Docker will add :latest to any image that you attempt to pull without explicitly adding a tag.

What happens if there’s no :latest tag on the registry? You get the above error. Search your container registry and repo for what tags they use and find the one that makes most sense for you.

Solving Kubectl “Error from server (InternalError): an error on the server (“”) has prevented the request from succeeding”

My Problem

When switching to a Linode Kubernetes Engine (LKE) cluster context, any command such as kubectl get pods or kubectl cluster-info hangs for about a minute before ultimately showing the following error:

Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

My Solution

It’s super simple. Check your kubectl config view and make sure that your authentication information is accurate. In my case the user token was wrong since I had been bringing up and tearing down LKE clusters and forgot to change my token. The error could probably be a bit more verbose or otherwise narrow the context down a bit, but alas.

The Long Story

Incidentally, I was running Windows 10 and running kubectl from PowerShell, but that doesn’t seem to be germane to the situation.

Running kubectl cluster-info --v=10 provided a ton of information. Note that --v is perhaps underdocumented (or was at one point).

What I found was that I was getting numerous responses like Got a Retry-After 1s response for attempt 8 to https://my-cluster:443/api?timeout=32s until the whole request timed out. I checked my Linode control panel and the cluster was indeed up and running.

The whole thing smelled like some kind of auth issue to me, so I double checked the kubectl config file that Linode offers in the UI (and via API), and noticed that the tokens weren’t matching with what I had in my .kube/config file. It was then that I remembered I had been tearing down and re-creating k8s clusters via Terraform and had forgotten to update my config file with the proper user token. Oh the joys of late-night hacking.

Once I updated my config file, I was able to access kubernetes.