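Source code base: Android R
0. Preface
An app crash (short for application crash) is either a Java crash or a native crash, also known as a Java layer exception (JE) and a native layer exception (NE) respectively. Most developers run into crashes sooner or later. This article summarizes how crashes work and examines how the system catches and handles them. Because the material is long, JE and NE are covered in two separate parts.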
1. RuntimeInit.commonInit()
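In Android, upper-layer application processes are all forked from Zygote; they comprise the system_server system process and ordinary app processes. When such a process is created, it installs a handler for uncaught exceptions, and any uncaught exception thrown later is ultimately passed to that handler. The handler is installed in RuntimeInit.commonInit():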
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

protected static final void commonInit() {
    ...
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    ...
}
1.1 Thread.setDefaultUncaughtExceptionHandler()
libcore/ojluni/src/main/java/java/lang/Thread.java

private static volatile UncaughtExceptionHandler defaultUncaughtExceptionHandler;

public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
    defaultUncaughtExceptionHandler = eh;
}
The parameter type is UncaughtExceptionHandler, the handler type for uncaught exceptions. It is an interface, and its uncaughtException() method is called later to do the actual handling:
libcore/ojluni/src/main/java/java/lang/Thread.java

public interface UncaughtExceptionHandler {
    void uncaughtException(Thread t, Throwable e);
}
When a thread terminates because of an uncaught exception, the Java virtual machine calls uncaughtException().
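The mechanism can be tried outside Android with plain Java. The following is a minimal sketch, not framework code: the class name DefaultHandlerDemo, the thread name worker-1, and the exception message are all made up for illustration. It installs a process-wide default handler, lets a worker thread die with an uncaught exception, and records what the handler received.

```java
import java.util.concurrent.atomic.AtomicReference;

class DefaultHandlerDemo {
    /** Runs a thread that throws, and returns what the default handler observed. */
    static String runAndCapture() {
        AtomicReference<String> seen = new AtomicReference<>();
        // Remember the previous handler so the demo leaves global state untouched.
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler(
                (t, e) -> seen.set(t.getName() + ": " + e.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-1");
            worker.start();
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
    }
}
```

Note that the handler is invoked on the thread that is dying, which is why the Android handler can still query the current process and thread before the process is killed.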
1.2 The KillApplicationHandler class
As the code above shows, the UncaughtExceptionHandler passed in is a KillApplicationHandler object. Here is that class:
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    private final LoggingHandler mLoggingHandler;

    public KillApplicationHandler(LoggingHandler loggingHandler) {
        this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        ...
    }
    ...
}
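The class implements the Thread.UncaughtExceptionHandler interface and provides uncaughtException().
Its constructor also requires a LoggingHandler object, which is stored in the private field mLoggingHandler.
The core handling method uncaughtException() is covered in detail in section 2 below.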
1.3 The LoggingHandler class
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
    public volatile boolean mTriggered = false;

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        mTriggered = true;

        // Already in the crash flow (KillApplicationHandler is handling it), so do not log again
        if (mCrashing) return;

        if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
            Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
        } else {
            logUncaught(t.getName(), ActivityThread.currentProcessName(), Process.myPid(), e);
        }
    }
}
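uncaughtException() in this class is called at the very start of a crash and prints the opening lines of the crash log:
- When the system process crashes, it prints *** FATAL EXCEPTION IN SYSTEM PROCESS: [thread name]
- When an app process crashes, it prints three items (thread name, process name, and PID) on two lines:
- FATAL EXCEPTION: [thread name]
- Process: [process name], PID: [pid]
If processName is null, only the PID is printed on the second line.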
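To make the header format concrete, here is a small sketch of how such a message could be assembled. It only mirrors the output described above; the class CrashHeaderSketch and the method buildHeader are hypothetical names for illustration, not the framework's logUncaught() itself.

```java
class CrashHeaderSketch {
    /**
     * Builds the two header lines described above.
     * processName may be null, in which case only the PID appears on the second line.
     */
    static String buildHeader(String threadName, String processName, int pid) {
        StringBuilder message = new StringBuilder();
        message.append("FATAL EXCEPTION: ").append(threadName).append("\n");
        if (processName != null) {
            message.append("Process: ").append(processName).append(", ");
        }
        message.append("PID: ").append(pid);
        return message.toString();
    }

    public static void main(String[] args) {
        // Example values are made up.
        System.out.println(buildHeader("main", "com.example.app", 1234));
        System.out.println(buildHeader("main", null, 1234));
    }
}
```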
2. KillApplicationHandler.uncaughtException()
When a thread terminates because of an uncaught exception, the Java virtual machine calls uncaughtException(), which here means uncaughtException() in KillApplicationHandler. Building on section 1.2, this section examines that method in detail:
frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

public void uncaughtException(Thread t, Throwable e) {
    try {
        // Calls LoggingHandler.uncaughtException(); it is not invoked repeatedly
        ensureLogging(t, e);

        // Global flag that prevents re-entering the crash flow; set to true on first entry
        if (mCrashing) return;
        mCrashing = true;

        // Try to stop profiling: the process is about to be killed and the in-memory
        // buffer would be lost, so stop now to flush it
        if (ActivityThread.currentActivityThread() != null) {
            ActivityThread.currentActivityThread().stopProfiling();
        }

        // Show the crash dialog and wait for handling to complete
        ActivityManager.getService().handleApplicationCrash(
                mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
    } catch (Throwable t2) {
        ...
    } finally {
        // Make sure the current process is killed for good
        Process.killProcess(Process.myPid());
        System.exit(10);
    }
}
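The key step is the call to AMS's handleApplicationCrash(), which reports the crash. It takes two arguments:
- The first identifies the crashing process (mApplicationObject)
- The second is a ParcelableCrashInfo (which extends CrashInfo and implements Parcelable)
ParcelableCrashInfo wraps CrashInfo and adds Parcelable support so that it can be sent across Binder.
CrashInfo mainly stores the crash details: file name, class name, method name, line number, exception message, and so on.
For a detailed analysis of handleApplicationCrash(), see section 3 below.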
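The control flow of this handler (a re-entrancy guard, a report step that may itself fail, and a kill step in finally that always runs) can be sketched in plain Java. This is an illustration under stated assumptions: CrashGuardSketch, Reporter, and the recorded step names are hypothetical, and the kill step is only recorded rather than executed.

```java
import java.util.ArrayList;
import java.util.List;

class CrashGuardSketch {
    /** Hypothetical stand-in for the Binder call into AMS. */
    interface Reporter {
        void report(Throwable error);
    }

    private final Reporter reporter;
    private final List<String> steps = new ArrayList<>();
    private boolean crashing = false; // plays the role of mCrashing

    CrashGuardSketch(Reporter reporter) {
        this.reporter = reporter;
    }

    void onUncaught(Throwable error) {
        try {
            if (crashing) return; // re-entrant call: skip reporting, finally still runs
            crashing = true;
            steps.add("report");
            reporter.report(error);
        } catch (Throwable t2) {
            // Reporting failed; nothing more can be done except fall through to the kill step.
            steps.add("report-failed");
        } finally {
            // The real handler kills the process here; this sketch only records the step.
            steps.add("kill");
        }
    }

    List<String> steps() {
        return steps;
    }
}
```

Because the kill step sits in finally, it runs on every path, including the early return taken on re-entry and the case where the report call throws.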
3. AMS.handleApplicationCrash()
frameworks/base/services/core/java/com/android/server/am/AMS.java

public void handleApplicationCrash(IBinder app,
        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);

    handleApplicationCrashInner("crash", r, processName, crashInfo);
}
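This function does two things:
- determines the process name;
- calls handleApplicationCrashInner();
For the process name:
- when the app argument is null, the process is system_server;
- when the app argument is not null, findAppProcess() looks up the ProcessRecord, which supplies the process name (or "unknown" if no record is found);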
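The name resolution above is a nested ternary, which is easy to misread. The sketch below isolates it with stand-in types so the three outcomes can be checked directly; ProcessNameSketch and Record are hypothetical and only model the processName field used here.

```java
class ProcessNameSketch {
    /** Hypothetical stand-in for ProcessRecord; only the name matters here. */
    static final class Record {
        final String processName;

        Record(String processName) {
            this.processName = processName;
        }
    }

    /**
     * appToken models the IBinder argument: null means the caller is system_server.
     * record models the lookup result, which may be null if no process was found.
     */
    static String resolve(Object appToken, Record record) {
        return appToken == null ? "system_server"
                : (record == null ? "unknown" : record.processName);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));
        System.out.println(resolve(new Object(), null));
        System.out.println(resolve(new Object(), new Record("com.example.app")));
    }
}
```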
3.1 AMS.handleApplicationCrashInner()
frameworks/base/services/core/java/com/android/server/am/AMS.java

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    EventLogTags.writeAmCrash(Binder.getCallingPid(),
            UserHandle.getUserId(Binder.getCallingUid()), processName,
            r == null ? -1 : r.info.flags,
            crashInfo.exceptionClassName,
            crashInfo.exceptionMessage,
            crashInfo.throwFileName,
            crashInfo.throwLineNumber);

    FrameworkStatsLog.write(FrameworkStatsLog.APP_CRASH_OCCURRED,
            Binder.getCallingUid(),
            eventType,
            processName,
            Binder.getCallingPid(),
            (r != null && r.info != null) ? r.info.packageName : "",
            (r != null && r.info != null) ? (r.info.isInstantApp()
                    ? FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__TRUE
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__FALSE)
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__UNAVAILABLE,
            r != null ? (r.isInterestingToUserLocked()
                    ? FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__FOREGROUND
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__BACKGROUND)
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__UNKNOWN,
            processName.equals("system_server") ? ServerProtoEnums.SYSTEM_SERVER
                    : (r != null) ? r.getProcessClassEnum()
                            : ServerProtoEnums.ERROR_SOURCE_UNKNOWN);

    final int relaunchReason = r == null ? RELAUNCH_REASON_NONE
            : r.getWindowProcessController().computeRelaunchReason();
    final String relaunchReasonString = relaunchReasonToString(relaunchReason);
    if (crashInfo.crashTag == null) {
        crashInfo.crashTag = relaunchReasonString;
    } else {
        crashInfo.crashTag = crashInfo.crashTag + " " + relaunchReasonString;
    }

    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, null, crashInfo);

    mAppErrors.crashApplication(r, crashInfo);
}
The function is fairly long, but it mainly does the following:
- Writes an event log entry;
For example:
12-01 16:45:29.663198 1260 3220 I am_crash: [21597,0,com.qualcomm.qti.PresenceApp,550026821,java.lang.NoSuchMethodException,com.qualcomm.qti.PresenceApp.SubsriptionTab.<init> [],Class.java,2363]
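- addErrorToDropBox() writes the crash information under /data/system/dropbox/; for example, the dropbox file for a system_server crash is named system_server_crash@xxx.txt (where xxx is a timestamp);
- crashApplication() continues the crash flow, sends SHOW_ERROR_UI_MSG, and shows the crash dialog;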
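As a small illustration of the dropbox naming mentioned above, the sketch below builds a file name of the form shown in the system_server example. It is only a sketch: the class DropBoxNameSketch, the method entryName, the split into a process class plus an event type, and the use of a millisecond timestamp are assumptions made for illustration, not the framework's implementation.

```java
class DropBoxNameSketch {
    /**
     * Builds a name shaped like "system_server_crash@<timestamp>.txt".
     * processClass and eventType are joined with '_' to form the tag (assumed layout).
     */
    static String entryName(String processClass, String eventType, long timestampMillis) {
        return processClass + "_" + eventType + "@" + timestampMillis + ".txt";
    }

    public static void main(String[] args) {
        // The timestamp value is made up.
        System.out.println(entryName("system_server", "crash", 1638348329663L));
    }
}
```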
4. AppErrors.crashApplication()
frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final int callingPid = Binder.getCallingPid();
    final int callingUid = Binder.getCallingUid();

    final long origId = Binder.clearCallingIdentity();
    try {
        crashApplicationInner(r, crashInfo, callingPid, callingUid);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}
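This mainly calls the private function crashApplicationInner(). That function contains too much logic to analyze in full here, so the focus is on the following points:
- handleAppCrashInActivityController()
If a registered IActivityController handles the crash, no dialog is shown; otherwise the flow moves on to makeAppCrashingLocked().
- makeAppCrashingLocked()
If the process cannot be identified, or it has already exceeded its crash quota, no dialog is shown and the function returns directly to its caller.
- Sends the SHOW_ERROR_UI_MSG message to show the crash dialog
- Waits for the user's choice and continues handling according to that choice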
5. makeAppCrashingLocked()
frameworks/base/services/core/java/com/android/server/am/AppErrors.java

private boolean makeAppCrashingLocked(ProcessRecord app,
        String shortMsg, String longMsg, String stackTrace, AppErrorDialog.Data data) {
    app.setCrashing(true);
    app.crashingReport = generateProcessError(app,
            ActivityManager.ProcessErrorStateInfo.CRASHED, null, shortMsg, longMsg, stackTrace);
    app.startAppProblemLocked();
    app.getWindowProcessController().stopFreezingActivities();
    return handleAppCrashLocked(app, "force-crash" /*reason*/, shortMsg, longMsg, stackTrace,
            data);
}