Background
After moving our product towards a serverless architecture, our daily development now relies heavily on AWS. For local debugging, we use aws-sam-cli to run Lambda functions in Docker, and then use the remote debugging feature of our IDE to set breakpoints.
However, on some of my colleagues’ machines, a Docker container fails immediately after it starts and prints the following message:

```
error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory
```
Since not being able to debug Lambda locally would be very inconvenient, I decided to investigate the issue.
A first attempt
After confirming that their SAM and Docker versions were the same as mine, I realized there was no trivial solution. Since I didn’t know much about Docker, I started by googling the error message.
First I found this post about an issue in the java:8-jre-alpine Docker image. It says that when you run ldd /usr/bin/java in that image, the libjli.so error shows up, because musl can’t read the library path properly. I confirmed that lambci/lambda:java8 (the Docker image used by AWS SAM) is based on openjdk:8-alpine, and that running ldd /usr/bin/java in it shows the same error. So I guessed this was the cause.
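To check for this symptom without reading ldd output by eye, a small Python helper could scan it for unresolved libraries. This is a minimal sketch, not part of the original investigation; it assumes the two message formats shown in the comments (glibc’s "=> not found" lines and musl’s "Error loading shared library" lines), which may vary between versions.

```python
from __future__ import annotations

import re

# Assumed formats of an unresolved library in ldd output:
#   glibc: "\tlibjli.so => not found"
#   musl:  "Error loading shared library libjli.so: No such file or directory (needed by /usr/bin/java)"
GLIBC_MISSING = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")
MUSL_MISSING = re.compile(r"^Error loading shared library (\S+?):")


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of shared libraries ldd could not resolve, without duplicates."""
    missing: list[str] = []
    for line in ldd_output.splitlines():
        match = GLIBC_MISSING.match(line) or MUSL_MISSING.match(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    sample = (
        "\t/lib/ld-musl-x86_64.so.1 (0x7f0c4d2e3000)\n"
        "Error loading shared library libjli.so: No such file or directory "
        "(needed by /usr/bin/java)\n"
    )
    print(missing_libraries(sample))
```

One way to feed it would be the combined stdout and stderr of docker run --rm --entrypoint ldd lambci/lambda:java8 /usr/bin/java.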
Then I tried several methods described in the post, including setting LD_LIBRARY_PATH in the Dockerfile and adding an /etc/ld-musl-x86_64.path file. However, none of them worked.
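For reference, both workarounds amount to a couple of Dockerfile lines. The Python sketch below generates them; the function name and the JDK directory in the usage part are hypothetical placeholders (the real location of libjli.so inside the image should be checked first), and musl is assumed to accept one directory per line in its path file. In this case neither workaround fixed the error.

```python
from __future__ import annotations

import shlex


def loader_workaround_lines(lib_dirs: list[str], arch: str = "x86_64") -> list[str]:
    """Build the two Dockerfile instructions that point the loader at lib_dirs."""
    quoted = " ".join(shlex.quote(d) for d in lib_dirs)
    return [
        # Workaround 1: expose the directories through LD_LIBRARY_PATH.
        "ENV LD_LIBRARY_PATH=" + ":".join(lib_dirs),
        # Workaround 2: write one directory per line into musl's path file.
        f'RUN printf "%s\\n" {quoted} > /etc/ld-musl-{arch}.path',
    ]


if __name__ == "__main__":
    # Hypothetical directory; locate libjli.so in the actual image first.
    for line in loader_workaround_lines(["/usr/lib/jvm/java-1.8-openjdk/lib/amd64/jli"]):
        print(line)
```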
A second attempt
Then I found this post suggesting another possibility. It says Java will try to inject LD_LIBRARY_PATH back into the parent process, which may be blocked by the isolation protection on some machines. The solution is to replace the java ... command with bash -c "java ...".
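Note that bash -c takes the whole command as a single string, so the original arguments have to be joined and shell-quoted rather than passed separately. A minimal Python sketch of that conversion, assuming /bin/bash exists in the image (the helper name wrap_in_bash is hypothetical):

```python
from __future__ import annotations

import shlex


def wrap_in_bash(argv: list[str], shell: str = "/bin/bash") -> list[str]:
    """Turn an exec-form command into [shell, "-c", "<command string>"].

    bash -c expects the command as ONE argument, so each original argument
    is shell-quoted and joined to keep arguments with spaces intact.
    """
    return [shell, "-c", shlex.join(argv)]


if __name__ == "__main__":
    print(wrap_in_bash(["/usr/bin/java", "-jar", "/var/runtime/lib/LambdaJavaRTEntry-1.0.jar"]))
```

Quoting with shlex.join means shlex.split on the resulting string gives back the original argument list, so no argument is split or merged by the shell.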
Looking at the Dockerfile of lambci/lambda:java8 again, I found that its entry point looks like this:

```dockerfile
ENTRYPOINT ["/usr/bin/java", "-XX:MaxHeapSize=2834432k", "-XX:MaxMetaspaceSize=163840k", "-XX:ReservedCodeCacheSize=81920k", \
    "-XX:+UseSerialGC", "-Xshare:on", "-XX:-TieredCompilation", "-Djava.net.preferIPv4Stack=true", \
    "-jar", "/var/runtime/lib/LambdaJavaRTEntry-1.0.jar"]
```
It indeed calls java directly rather than going through bash. So I replaced /usr/bin/java with a bash -c wrapper, and it worked!
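One way to apply this change without editing the upstream Dockerfile is a small derived Dockerfile that inherits from lambci/lambda:java8 and only overrides ENTRYPOINT. This is an assumed variant, not necessarily the exact edit described above. The Python sketch renders such a file; since exec-form ENTRYPOINT is a JSON array, json.dumps produces valid syntax. The function name derived_dockerfile is hypothetical.

```python
from __future__ import annotations

import json
import shlex

BASE_IMAGE = "lambci/lambda:java8"
# The original exec-form entry point of lambci/lambda:java8, as quoted above.
JAVA_ENTRYPOINT = [
    "/usr/bin/java", "-XX:MaxHeapSize=2834432k", "-XX:MaxMetaspaceSize=163840k",
    "-XX:ReservedCodeCacheSize=81920k", "-XX:+UseSerialGC", "-Xshare:on",
    "-XX:-TieredCompilation", "-Djava.net.preferIPv4Stack=true",
    "-jar", "/var/runtime/lib/LambdaJavaRTEntry-1.0.jar",
]


def derived_dockerfile(base: str, argv: list[str]) -> str:
    """Return a Dockerfile that reuses `base` but starts `argv` via bash -c."""
    wrapped = ["/bin/bash", "-c", shlex.join(argv)]
    # Exec-form ENTRYPOINT is a JSON array of strings.
    return f"FROM {base}\nENTRYPOINT {json.dumps(wrapped)}\n"


if __name__ == "__main__":
    print(derived_dockerfile(BASE_IMAGE, JAVA_ENTRYPOINT))
```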
The next step was letting SAM use the new Docker image. Fortunately, the sam local start-api command accepts a --skip-pull-image flag, which makes it use the local Lambda image I had modified instead of pulling a fresh one. Now SAM can successfully run Lambda on my colleagues’ machines!
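When the command is scripted, the SAM invocation can be assembled as an argument list. A minimal Python sketch (the function name sam_start_api_cmd is hypothetical, and the port in the test is arbitrary); the optional debug port is passed with -d, SAM’s short form of --debug-port, which is the option discussed next:

```python
from __future__ import annotations

import shlex


def sam_start_api_cmd(debug_port: int | None = None) -> list[str]:
    """Build a `sam local start-api` command that reuses the locally built image."""
    cmd = ["sam", "local", "start-api", "--skip-pull-image"]
    if debug_port is not None:
        # -d / --debug-port starts the function container in debug mode on this port.
        cmd += ["-d", str(debug_port)]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(shlex.join(sam_start_api_cmd()))
```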
However, when I used SAM’s -d option (debug mode), the libjli.so error still occurred. I then dug into the source code and realized that SAM actually overrides the entry point specified in the Dockerfile:

```python
if runtime == Runtime.java8.value:
    entrypoint = ["/usr/bin/java"] \
        + debug_args_list \
        + [
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,quiet=y,address=" + str(debug_port),
            ...
        ]
```
So I applied the same small hack to SAM’s source code as I had to the Dockerfile. Finally, everything worked.
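The patch amounts to building the same java command and then wrapping it in bash -c. The function below is a hypothetical, self-contained restatement of the SAM snippet above, not SAM’s actual code; extra_args stands in for the arguments elided as ... in the original, and /bin/bash is assumed to exist in the image.

```python
from __future__ import annotations

import shlex


def java8_debug_entrypoint(debug_args_list: list[str], debug_port: int,
                           extra_args: list[str]) -> list[str]:
    """Hypothetical patched java8 debug entry point.

    Builds the java command the way the SAM snippet does, then wraps it in
    bash -c instead of letting Docker exec /usr/bin/java directly.
    """
    java_cmd = (
        ["/usr/bin/java"]
        + debug_args_list
        + [
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,quiet=y,address="
            + str(debug_port),
        ]
        + extra_args  # placeholder for the arguments elided in the original snippet
    )
    return ["/bin/bash", "-c", shlex.join(java_cmd)]
```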
Another solution
While running docker commands on my colleagues’ machines, I discovered another problem: running docker stop on a container fails with permission denied, which doesn’t happen on my machine. This, together with the libjli.so problem, suggests that the protection level on their machines may be higher than on mine. So, following this post, I removed AppArmor from their machines. Now SAM works fine even with the original Lambda Docker image.
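Before going as far as removing AppArmor, it may be worth confirming that it is actually active. A minimal Python sketch, assuming a Linux kernel that exposes the /sys/module/apparmor/parameters/enabled parameter (it reads Y when AppArmor is enabled and is typically absent when the module isn’t present); the function name apparmor_enabled is hypothetical:

```python
from pathlib import Path

# Kernel parameter file for AppArmor (assumed location on AppArmor-capable kernels).
APPARMOR_FLAG = Path("/sys/module/apparmor/parameters/enabled")


def apparmor_enabled(flag_file: Path = APPARMOR_FLAG) -> bool:
    """Return True if the AppArmor flag file exists and reads 'Y'."""
    try:
        return flag_file.read_text().strip() == "Y"
    except OSError:
        # Missing or unreadable file: treat AppArmor as not enabled.
        return False


if __name__ == "__main__":
    print("AppArmor enabled:", apparmor_enabled())
```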
Some useful commands
Since I knew little about Docker before, I also learned many useful Docker commands while solving this problem:
```shell
docker run -it --entrypoint /bin/sh image
```

This runs the image and opens a shell, which is useful when you need to inspect an image that specifies its own entry point.
```shell
docker commit [options] [container ID] [repository:tag]
```

This commits a Docker container as an image. We can run an image, make some modifications, and commit the result back to the image.
But if you just want to modify the Dockerfile and overwrite an existing image, just run:

```shell
docker build -t lambci/lambda:java8 .
```
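When these steps are scripted, keeping each command as an argument list avoids shell-quoting problems. A minimal Python sketch; the helper names are hypothetical, the image tag assumes the lambci/lambda:java8 image that SAM uses, and the commands are only printed, not executed:

```python
from __future__ import annotations

import shlex

IMAGE = "lambci/lambda:java8"


def inspect_cmd(image: str) -> list[str]:
    """Open an interactive shell, bypassing the image's own ENTRYPOINT."""
    return ["docker", "run", "-it", "--entrypoint", "/bin/sh", image]


def commit_cmd(container_id: str, image: str) -> list[str]:
    """Save a modified container under the given repository:tag."""
    return ["docker", "commit", container_id, image]


def build_cmd(image: str, context: str = ".") -> list[str]:
    """Rebuild from the Dockerfile in `context`, overwriting the tag."""
    return ["docker", "build", "-t", image, context]


if __name__ == "__main__":
    # Print the commands; pass any list to subprocess.run() to execute it.
    for cmd in (inspect_cmd(IMAGE), commit_cmd("<container-id>", IMAGE), build_cmd(IMAGE)):
        print(shlex.join(cmd))
```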