Solving a problem about AWS SAM and Docker

Background

After we moved our product towards a serverless architecture, our daily development came to rely heavily on AWS. For local debugging, we use aws-sam-cli to run Lambda functions in Docker and then attach our IDE’s remote debugger, which lets us set breakpoints.

However, on some of my colleagues’ machines, the Docker container fails immediately after it starts and prints the following message:

error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory

Since it would be very inconvenient if we couldn’t debug Lambda locally, I decided to investigate the issue.

A first attempt

After checking that their SAM and Docker versions were the same as mine, I realized there was no trivial solution. Since I didn’t know much about Docker, I started by googling the error message.

First I found this post about an issue in the java:8-jre-alpine Docker image. It says that when you run ldd /usr/bin/java in that image, the libjli.so error shows up, because musl can’t resolve the library path properly. I confirmed that lambci/lambda:java8 (the Docker image used by AWS SAM) is based on openjdk:8-alpine, and that running ldd /usr/bin/java in it shows the same error. So I guessed this was the cause.
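To repeat this check on another machine, the ldd call can be scripted. The following is a minimal sketch, not part of the original investigation: the helper names ldd_command, has_libjli_error, and check_image are hypothetical, it assumes the docker CLI is on the PATH and that the image ships an ldd binary, and the string match is a guess based on the error message quoted earlier, so the exact wording printed by ldd may differ.

```python
import subprocess


def ldd_command(image, binary="/usr/bin/java"):
    """Build a `docker run` command that runs ldd on `binary` inside `image`."""
    # --rm deletes the container afterwards; --entrypoint replaces the
    # image's own entry point (java) so that ldd runs instead.
    return ["docker", "run", "--rm", "--entrypoint", "ldd", image, binary]


def has_libjli_error(output):
    """Return True if any line reports that libjli.so could not be found."""
    # Assumption: the failing line mentions both the library and "No such file".
    return any(
        "libjli.so" in line and "No such file" in line
        for line in output.splitlines()
    )


def check_image(image):
    """Run ldd in the image and report whether the libjli.so error shows up."""
    result = subprocess.run(ldd_command(image), capture_output=True, text=True)
    # ldd may write the error to stderr, so look at both streams.
    return has_libjli_error(result.stdout + result.stderr)
```

Calling check_image("lambci/lambda:java8") starts a throwaway container and returns True when the output contains a matching libjli.so line.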

Then I tried the methods described in the post, including setting LD_LIBRARY_PATH in the Dockerfile and adding an /etc/ld-musl-x86_64.path file. However, none of them worked.

A second attempt

Then I found this post, which suggests another possibility. It says that Java tries to inject LD_LIBRARY_PATH back into the parent, which may get blocked by the isolation protection on some machines. The solution is to replace the java ... command with bash -c "java ...".

Checking the Dockerfile of lambci/lambda:java8 again, I saw that its entry point looks like this:

ENTRYPOINT ["/usr/bin/java", "-XX:MaxHeapSize=2834432k", "-XX:MaxMetaspaceSize=163840k", "-XX:ReservedCodeCacheSize=81920k", \
    "-XX:+UseSerialGC", "-Xshare:on", "-XX:-TieredCompilation", "-Djava.net.preferIPv4Stack=true", \
    "-jar", "/var/runtime/lib/LambdaJavaRTEntry-1.0.jar"]

It does indeed call java directly rather than going through bash. So I replaced the direct /usr/bin/java call with a bash -c one, and it worked!

The next step was getting SAM to use the new Docker image. Fortunately, the sam local start-api command accepts a --skip-pull-image flag, which makes it use the local Lambda image that I had modified. Now SAM could successfully run Lambda on my colleagues’ machines!

However, when I used SAM’s -d option (debug mode), the libjli.so error still occurred. I then dug into the source code and realized that SAM actually overrides the entry point specified in the Dockerfile:

if runtime == Runtime.java8.value:

    entrypoint = ["/usr/bin/java"] \
        + debug_args_list \
        + [
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,quiet=y,address=" + str(debug_port),
            ...
        ]

So I made a small hack to the source code, just as I had done to the Dockerfile. Finally, everything worked.
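The change in both places amounts to wrapping the java command line in bash -c. Below is a minimal sketch of that idea, not the actual patch to SAM: wrap_in_bash is a hypothetical helper, the port number and the shortened argument list are placeholders, and it assumes bash is on the container’s PATH and Python 3.8 or later (for shlex.join). The trailing "$@" and "--" are my own addition rather than something the posts describe; they forward any arguments Docker appends after the entry point to java instead of leaving them with bash.

```python
import shlex


def wrap_in_bash(entrypoint):
    """Return an entry point that runs the same command through `bash -c`."""
    # shlex.join quotes each argument, so bash reconstructs the original argv.
    command = shlex.join(entrypoint) + ' "$@"'
    # The final "--" becomes $0 inside the shell; anything Docker appends
    # after the entry point arrives as "$@" and is passed on to java.
    return ["bash", "-c", command, "--"]


debug_port = 5858  # placeholder value, not taken from SAM
original = [
    "/usr/bin/java",
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,quiet=y,address=" + str(debug_port),
    "-jar", "/var/runtime/lib/LambdaJavaRTEntry-1.0.jar",  # shortened argument list
]
entrypoint = wrap_in_bash(original)
```

The resulting list has the same shape as the one SAM builds, so it can be passed wherever the original entrypoint list was used.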

Another solution

While running docker commands on my colleagues’ machines, I discovered another problem: running docker stop on a container fails with a permission denied error, which doesn’t happen on my machine. This, together with the libjli.so problem, suggests that the protection level on their machines is higher than on mine. So, following this post, I removed AppArmor from their machines. Now SAM works fine even with the original Lambda Docker image.

Some useful commands

Since I knew little about Docker before, I also learned several useful Docker commands while solving this problem, as shown below.

docker run -it --entrypoint /bin/sh image

This will run the image and open a shell. This is useful when you need to inspect an image which specifies its own entry point.

docker commit [options] [container ID] [repository:tag]

This commits a Docker container as an image. We can run an image, make some modifications in the container, and commit the result back as an image.

But if you just want to modify the Dockerfile and overwrite an existing image, run:

docker build -t lambci/lambda:java8 .