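1. Background
The production system fetches the branches of remote repositories by executing git commands. An endpoint that had always worked suddenly started timing out and stopped responding. Investigation showed that only a few repositories triggered the problem; branch lists for all other repositories were still returned normally.
2. Analyzing the cause
Reviewing the endpoint code revealed nothing obviously wrong, so why would some repositories block and hang? Logging in to the server and checking the processes showed that every git process for the affected repository was stuck.
a. Run jstack pid to inspect the threads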
"http-nio-7629-exec-9" #250 daemon prio=5 os_prio=0 tid=0x00007f047f37c000 nid=0x5f0d0 runnable [0x00007f03cb411000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        - locked <0x00000006d499c890> (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        - locked <0x00000006d499c918> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        - locked <0x00000006d499c918> (a java.io.InputStreamReader)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at com.utils.GitUtils.actualExcute(GitUtils.java:106)
        at com.utils.GitUtils.getRemoteBranchesByCmd(GitUtils.java:50)
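Every thread that runs a git command is RUNNABLE, so the thread state itself looks fine, yet each of them shows locked in the stream-reading frames, for example java.io.BufferedInputStream.read locked <0x00000006d499c890>. Note that in jstack output "- locked" means the thread holds that monitor, and a thread blocked inside a native read such as FileInputStream.readBytes is still reported as RUNNABLE.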
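To make the RUNNABLE state less surprising, here is a minimal sketch that blocks a thread on the stdout of a silent child process and then samples its state. The class and method names are hypothetical, and the child is assumed to be a POSIX sleep command. On a HotSpot JVM the sampled state should be RUNNABLE, because the thread is waiting inside native code rather than on a Java monitor.

```java
import java.io.InputStream;

public class NativeReadStateDemo {

    /** Returns the state of a thread that is blocked reading a silent child's stdout. */
    public static Thread.State stateWhileBlockedInRead() throws Exception {
        // Assumption: a POSIX "sleep" binary is on the PATH. It prints nothing.
        Process child = new ProcessBuilder("sleep", "3").start();

        Thread reader = new Thread(() -> {
            try (InputStream in = child.getInputStream()) {
                in.read(); // blocks in the native read until data or end-of-file arrives
            } catch (Exception ignored) {
                // not relevant for this demo
            }
        });
        reader.setDaemon(true);
        reader.start();

        Thread.sleep(500); // give the reader time to enter the native read
        Thread.State state = reader.getState();

        child.destroy();   // the child exits, the pipe closes, read() returns -1
        reader.join(5000);
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stateWhileBlockedInRead());
    }
}
```

So a RUNNABLE thread whose top frame is FileInputStream.readBytes is usually a thread waiting for the other end of a pipe, which is the real lead in the dump above.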
b. Why does UNIXProcess#ProcessPipeInputStream show locked?
Reading the source shows that the lock closeLock is acquired when the processExited and close methods run.
/**
 * A buffered input stream for a subprocess pipe file descriptor
 * that allows the underlying file descriptor to be reclaimed when
 * the process exits, via the processExited hook.
 *
 * This is tricky because we do not want the user-level InputStream to be
 * closed until the user invokes close(), and we need to continue to be
 * able to read any buffered data lingering in the OS pipe buffer.
 */
private static class ProcessPipeInputStream extends BufferedInputStream {
    private final Object closeLock = new Object();

    ProcessPipeInputStream(int fd) {
        super(new FileInputStream(newFileDescriptor(fd)));
    }

    private static byte[] drainInputStream(InputStream in)
            throws IOException {
        int n = 0;
        int j;
        byte[] a = null;
        while ((j = in.available()) > 0) {
            a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
            n += in.read(a, n, j);
        }
        return (a == null || n == a.length) ? a : Arrays.copyOf(a, n);
    }

    /** Called by the process reaper thread when the process exits. */
    synchronized void processExited() {
        synchronized (closeLock) {
            try {
                InputStream in = this.in;
                // this stream is closed if and only if: in == null
                if (in != null) {
                    byte[] stragglers = drainInputStream(in);
                    in.close();
                    this.in = (stragglers == null) ?
                        ProcessBuilder.NullInputStream.INSTANCE :
                        new ByteArrayInputStream(stragglers);
                }
            } catch (IOException ignored) {}
        }
    }

    @Override
    public void close() throws IOException {
        // BufferedInputStream#close() is not synchronized unlike most other
        // methods. Synchronizing helps avoid race with processExited().
        synchronized (closeLock) {
            super.close();
        }
    }
}
c. Local reproduction
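import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BlockingReadRepro {
    public static void main(String[] args) throws InterruptedException {
        Runtime runtime = Runtime.getRuntime();
        try {
            // Runtime.exec(String) does not start a shell: the command is split on
            // whitespace, so "&&", ">" and "&2" reach yes as plain arguments.
            // All output therefore goes to stdout and stderr stays empty.
            Process process = runtime.exec("yes \"This is a normal log message\" && yes \"This is a normal log message\" > &2");
            BufferedReader stdoutReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
            BufferedReader stderrReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
            String line;
            System.out.println("ERROR");
            // Blocks here: stderr has no data and no end-of-file while yes is alive,
            // and yes is itself blocked because nobody drains its stdout.
            while ((line = stderrReader.readLine()) != null) {
                System.out.println(line);
            }
            System.out.println("OUTPUT");
            while ((line = stdoutReader.readLine()) != null) {
                System.out.println(line);
            }
            int exitVal = process.waitFor();
            System.out.println("process exit value is " + exitVal);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}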
Log output
Connected to the target VM, address: '127.0.0.1:58960', transport: 'socket'
ERROR
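My guess was that reading the streams caused the block. Going back to the Process source, the documentation explains that the standard I/O of a child process created by Runtime.exec() is redirected to the parent process through buffers of limited size. If the parent does not drain them, a buffer can fill up, the child blocks, and the call never returns.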
By default, the created process does not have its own terminal or console. All its standard I/O (i.e. stdin, stdout, stderr) operations will be redirected to the parent process, where they can be accessed via the streams obtained using the methods getOutputStream(), getInputStream(), and getErrorStream(). The I/O streams of characters and lines can be written and read using the methods outputWriter(), outputWriter(Charset), inputReader(), inputReader(Charset), errorReader(), and errorReader(Charset). The parent process uses these streams to feed input to and get output from the process. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the process may cause the process to block, or even deadlock.
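When a child process is started in Java with ProcessBuilder or Runtime.exec(), its standard input (stdin), standard output (stdout), and standard error (stderr) streams are redirected to the parent process. The parent accesses them through getOutputStream(), getInputStream(), and getErrorStream(). Because some native platforms (operating systems) provide only a limited buffer for the standard input and output streams, a parent that does not handle these streams promptly (that is, does not read the child's output or write the child's input in time) can cause the child to block or even deadlock. The child has to wait until the buffered data has been read, or until input arrives, before it can continue.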
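d. Causes of blocking or deadlock
Output buffer full: if the child produces a lot of output (stdout or stderr) and the parent does not read it in time, the buffer fills up. Once it is full, the child cannot write any more data and blocks until space is freed.
Input buffer empty: if the child needs to read from standard input (stdin) and the parent does not supply input in time, the child blocks waiting for it.
Our code reads the error stream and the normal input stream synchronously, one after the other, and it reads the error stream first. Given the explanation above, if the child writes little or nothing to stderr, the parent blocks in readLine() on the error stream, which only reaches end-of-file when the child exits. Meanwhile the child fills the unread stdout buffer and blocks on its next write, so neither side can make progress. This would also explain why only some repositories hung: presumably their branch list is large enough to fill the buffer.
After changing the code to read the normal stream first and the error stream second, the program ran normally.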
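3. Summary
The thread blocked because the streams were read in the wrong way. Reading the normal input stream first made the program work, but the mirror case remains: if the child writes mostly to the error stream, the error buffer fills while the parent is still waiting on the normal stream, and the same hang occurs. The best approach is therefore to read both streams asynchronously and at the same time.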
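Reading both streams at the same time can be sketched as follows. This is a minimal illustration, not the project's actual GitUtils code: the class ConcurrentDrain, its Result type, and the timeout parameter are hypothetical names, and the git ls-remote call in main is only an assumed example of a branch-listing command.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConcurrentDrain {

    /** Captured result of a finished child process (hypothetical helper type). */
    public static final class Result {
        public final int exitCode;
        public final List<String> stdout;
        public final List<String> stderr;

        Result(int exitCode, List<String> stdout, List<String> stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Reads one stream line by line until end-of-file. */
    private static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static Result run(long timeoutSeconds, String... command) throws Exception {
        Process process = new ProcessBuilder(command).start();
        process.getOutputStream().close(); // child sees end-of-file on stdin

        // One thread per output stream, so neither pipe can fill up unread.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> stdout = pool.submit(() -> readLines(process.getInputStream()));
            Future<List<String>> stderr = pool.submit(() -> readLines(process.getErrorStream()));

            // Bound the wait so a stuck child cannot hang the caller forever.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new TimeoutException("command did not finish within " + timeoutSeconds + "s");
            }
            return new Result(process.exitValue(), stdout.get(), stderr.get());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ConcurrentDrain <repository-url>");
            return;
        }
        // Assumed example command; the original article does not show its git invocation.
        Result result = run(30, "git", "ls-remote", "--heads", args[0]);
        result.stdout.forEach(System.out::println);
        System.out.println("exit code: " + result.exitCode);
    }
}
```

The order in which the two futures are collected no longer matters, because both pipes are being drained the whole time the child runs.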
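Solutions
- Create a separate thread for each I/O stream so that the child's output and error streams are read promptly and its input stream is written promptly.
- If the child's output or error streams do not need to be processed, use Redirect.INHERIT so that the child inherits the parent's I/O streams.
- Use suitable buffer sizes when reading and writing the streams to make I/O more efficient. This is not very reliable, because the size itself is not under your control.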
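When stdout and stderr do not need to be kept apart, a simpler variant than the ones listed is to merge stderr into stdout with ProcessBuilder.redirectErrorStream(true), so that a single reader drains everything. This is a swapped-in alternative rather than something the original code used. The class and method names below are hypothetical, and the commented-out lines show the redirect options as they might be applied.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MergedOutput {

    /** Runs a command with stderr merged into stdout and returns all output lines. */
    public static List<String> runMerged(String... command) throws Exception {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // stderr is written into the stdout pipe

        // Alternatives when the output is not needed inside the JVM:
        //   builder.inheritIO();  // child uses the parent's own stdin/stdout/stderr
        //   builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);  // Java 9+

        Process process = builder.start();
        process.getOutputStream().close(); // child sees end-of-file on stdin

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor(); // the only pipe is fully drained, so this cannot block on output
        return lines;
    }

    public static void main(String[] args) throws Exception {
        runMerged(args).forEach(System.out::println);
    }
}
```

The trade-off is that error messages and regular output arrive interleaved in one list, so this fits best when the caller only needs the combined log or the exit code.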