Background
After moving our product towards a serverless architecture, our daily development now relies heavily on AWS. For local debugging, we use aws-sam-cli to run Lambda in Docker and then use the remote debugging feature of our IDE, which allows us to set breakpoints.
However, on some of my colleagues’ machines, the Docker container fails immediately on startup and prints the following message:
    error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory
Since it would be very inconvenient if we couldn’t debug Lambda locally, I decided to investigate the issue.
A first attempt
After checking that their SAM and Docker versions were the same as mine, I realized there was no trivial solution. Since I didn’t know much about Docker, I started by googling the error message.
First, I found this post about an issue in the java:8-jre-alpine Docker image. It says that when you run ldd /usr/bin/java in that image, the libjli.so error shows up. The reason is that musl can’t resolve the library path properly. I confirmed that lambci/lambda:java8 (the Docker image used by AWS SAM) is based on openjdk:8-alpine, and that running ldd /usr/bin/java shows the same error. So I guessed this was the cause.
Then I tried several methods described in the post, including setting LD_LIBRARY_PATH in the Dockerfile and adding an /etc/ld-musl-x86_64.path file. However, none of them worked.
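The ldd check is easy to repeat on each machine if it is scripted. The following is only a minimal sketch: it assumes the docker CLI is on the PATH and that the image ships /bin/sh, and the helper names ldd_command and run_ldd are made up for illustration.

```python
import subprocess
from typing import List

IMAGE = "lambci/lambda:java8"  # the image named in this post


def ldd_command(image: str = IMAGE) -> List[str]:
    """Build a docker command that runs `ldd /usr/bin/java` inside the image."""
    return [
        "docker", "run", "--rm",
        "--entrypoint", "/bin/sh",  # bypass the image's own Java entry point
        image,
        "-c", "ldd /usr/bin/java",  # arguments handed to /bin/sh
    ]


def run_ldd(image: str = IMAGE) -> str:
    """Run the check and return stdout and stderr combined.

    Needs a working Docker daemon, so it is not called automatically here.
    """
    result = subprocess.run(ldd_command(image), capture_output=True, text=True)
    return result.stdout + result.stderr
```

Calling print(run_ldd()) on a machine that fails and on one that works makes it easy to compare how libjli.so is resolved in each case.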
A second attempt
Then I found this post suggesting another possibility. It says that Java tries to inject LD_LIBRARY_PATH back into the parent, which may get blocked by the isolation protection on some machines. The solution is to replace the java ... command with bash -c java ...
Checking the Dockerfile of lambci/lambda:java8 again, I saw that its entry point looks like this:
    ENTRYPOINT ["/usr/bin/java", "-XX:MaxHeapSize=2834432k", "-XX:MaxMetaspaceSize=163840k", "-XX:ReservedCodeCacheSize=81920k", \
It does indeed call java directly rather than through bash. So I replaced /usr/bin/java with a bash -c invocation, and it worked!
The next step was getting SAM to use the new Docker image. Fortunately, the sam local start-api command accepts a --skip-pull-image flag, which makes it use the local Lambda image that I modified. Now SAM could successfully run Lambda on my colleagues’ machines!
However, when I used SAM’s -d option (debug mode), the libjli.so error still occurred. I then dug into the source code and realized that SAM actually overrides the entry point specified in the Dockerfile:
    if runtime == Runtime.java8.value:
So I made a small hack to the source code, just as I had to the Dockerfile. Finally, everything worked fine.
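The bash -c wrapping is the same idea in both places: take the exec-form entry point list and hand it to bash as a single command string. Here is a minimal sketch in Python, the language SAM is written in. The helper wrap_in_bash is hypothetical and is not SAM's actual code, and it assumes bash is available at /bin/bash inside the image.

```python
import shlex
from typing import List


def wrap_in_bash(entry_point: List[str]) -> List[str]:
    """Wrap an exec-form entry point so it is started through `bash -c`.

    "$@" forwards any arguments Docker appends after the entry point;
    the trailing "bash" only fills $0 for the wrapper shell.
    """
    # Quote each element so it survives being re-parsed by bash.
    command = " ".join(shlex.quote(arg) for arg in entry_point)
    return ["/bin/bash", "-c", command + ' "$@"', "bash"]


# The original ENTRYPOINT is truncated in this post, so only its first
# two elements are used here for illustration.
original = ["/usr/bin/java", "-XX:MaxHeapSize=2834432k"]
print(wrap_in_bash(original))
```

Passing the result through json.dumps gives a list in the form a Dockerfile ENTRYPOINT expects, so the same helper can cover the Dockerfile change as well.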
Another solution
When running docker commands on my colleagues’ machines, I discovered another problem: running docker stop on a container fails with permission denied, which doesn’t happen on my machine. This, together with the libjli.so problem, suggests that the protection level on their machines may be higher than on mine. So, following this post, I removed apparmor from their machines. Now SAM works fine even with the original Lambda Docker image.
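Before removing anything, it can help to confirm whether AppArmor is active on a given machine. The sketch below is an assumption-laden illustration rather than part of the original fix: it reads the sysfs flag that Linux kernels with AppArmor normally expose, and the helper name apparmor_enabled is made up.

```python
from pathlib import Path

# Usual sysfs location on Linux kernels built with AppArmor (assumed here).
APPARMOR_FLAG = Path("/sys/module/apparmor/parameters/enabled")


def apparmor_enabled(flag_file: Path = APPARMOR_FLAG) -> bool:
    """Return True if the flag file exists and its content starts with 'Y'."""
    try:
        return flag_file.read_text().strip().upper().startswith("Y")
    except OSError:
        # File missing or unreadable: treat AppArmor as not enabled.
        return False


print("AppArmor enabled:", apparmor_enabled())
```

Running this on a failing machine and on a working one gives a quick way to test the guess that the difference lies in the protection level.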
Some useful commands
Since I knew little about Docker before, I also learned several useful docker commands while solving this problem, as shown below.
    docker run -it --entrypoint /bin/sh image
This will run the image and open a shell. This is useful when you need to inspect an image which specifies its own entry point.
    docker commit [options] [container ID] [repository:tag]
This will commit a Docker container as an image. We can run an image, make some modifications in the container, and commit it back as an image.
But if you just want to modify the Dockerfile and overwrite an existing image, just run
    docker build -t lambci/lambda:java8 .