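Conclusion: an app process "crash" is not a process crash in the strict sense (compare a crash in native code). What happens is that Java code throws an exception nobody handles, and the app then kills itself.
At work I ran into a background Service that died (the "has stopped" dialog popped up) and then was not restarted for a long time. The log showed that the process threw a FATAL EXCEPTION but was not killed at that point; it was killed and restarted only much later. Puzzled, I decided to look at what the app death flow actually is.
What it looks like
When an Android app process throws an exception that is never caught, whatever the cause, the user sees an "XX has stopped" dialog. I used to assume that the app process was already dead at that point.
What actually happens
I had never analyzed the stop mechanism closely. In fact, when the "has stopped" dialog appears, the process has not fully exited; the real exit happens when the process kills itself. What follows records the whole path from the app throwing an uncaught exception to the process actually vanishing.
App process creation
To analyze how an app process goes away, first look at how it comes into being.
Key code
App process creation flow: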
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
startResult = Process.start(entryPoint,
app.processName, uid, uid, gids, debugFlags, mountExternal,
app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
app.info.dataDir, invokeWith, entryPointArgs);
frameworks/base/core/java/android/os/ZygoteProcess.java
// ZygoteState maintains the socket connection to the Zygote process
private ZygoteState openZygoteSocketIfNeeded(String abi) throws ZygoteStartFailedEx {
Preconditions.checkState(Thread.holdsLock(mLock), "ZygoteProcess lock not held");
if (primaryZygoteState == null || primaryZygoteState.isClosed()) {
try {
primaryZygoteState = ZygoteState.connect(mSocket);
} catch (IOException ioe) {
throw new ZygoteStartFailedEx("Error connecting to primary zygote", ioe);
}
}
if (primaryZygoteState.matches(abi)) {
return primaryZygoteState;
}
// The primary zygote didn't match. Try the secondary.
if (secondaryZygoteState == null || secondaryZygoteState.isClosed()) {
try {
secondaryZygoteState = ZygoteState.connect(mSecondarySocket);
} catch (IOException ioe) {
throw new ZygoteStartFailedEx("Error connecting to secondary zygote", ioe);
}
}
if (secondaryZygoteState.matches(abi)) {
return secondaryZygoteState;
}
throw new ZygoteStartFailedEx("Unsupported zygote ABI: " + abi);
}
private static Process.ProcessStartResult zygoteSendArgsAndGetResult(
ZygoteState zygoteState, ArrayList<String> args)
throws ZygoteStartFailedEx {
try {
// Throw early if any of the arguments are malformed. This means we can
// avoid writing a partial response to the zygote.
int sz = args.size();
for (int i = 0; i < sz; i++) {
if (args.get(i).indexOf('\n') >= 0) {
throw new ZygoteStartFailedEx("embedded newlines not allowed");
}
}
/**
* See com.android.internal.os.SystemZygoteInit.readArgumentList()
* Presently the wire format to the zygote process is:
* a) a count of arguments (argc, in essence)
* b) a number of newline-separated argument strings equal to count
*
* After the zygote process reads these it will write the pid of
* the child or -1 on failure, followed by boolean to
* indicate whether a wrapper process was used.
*/
final BufferedWriter writer = zygoteState.writer;
final DataInputStream inputStream = zygoteState.inputStream;
writer.write(Integer.toString(args.size()));
writer.newLine();
for (int i = 0; i < sz; i++) {
String arg = args.get(i);
writer.write(arg);
writer.newLine();
}
writer.flush();
// Should there be a timeout on this?
Process.ProcessStartResult result = new Process.ProcessStartResult();
// Always read the entire result from the input stream to avoid leaving
// bytes in the stream for future process starts to accidentally stumble
// upon.
result.pid = inputStream.readInt();
result.usingWrapper = inputStream.readBoolean();
if (result.pid < 0) {
throw new ZygoteStartFailedEx("fork() failed");
}
return result;
} catch (IOException ex) {
zygoteState.close();
throw new ZygoteStartFailedEx(ex);
}
}
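The wire format described in the comment inside zygoteSendArgsAndGetResult (a count line followed by one argument per line) can be sketched in plain Java. This is only an illustration: ZygoteWireSketch and its encode method are hypothetical names, the output goes into a string rather than the real Zygote socket, and the sample arguments in main are made up for the demo.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of AOSP: mimics the
// "argument count, then one argument per line" request format.
final class ZygoteWireSketch {
    static String encode(List<String> args) {
        // Reject malformed arguments before producing any output,
        // so a partial request is never written.
        for (String arg : args) {
            if (arg.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("embedded newlines not allowed");
            }
        }
        StringBuilder out = new StringBuilder();
        // a) the number of arguments
        out.append(args.size()).append('\n');
        // b) that many newline-terminated argument strings
        for (String arg : args) {
            out.append(arg).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] argv) {
        // Illustrative arguments only.
        System.out.print(encode(Arrays.asList("--setuid=10001", "--nice-name=com.example.app")));
    }
}
```

The newline check is what makes the format safe: since each argument occupies exactly one line, an embedded newline would shift every later argument.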
The command that zygoteSendArgsAndGetResult sends over the LocalSocket is received by Zygote:
frameworks/base/core/java/com/android/internal/os/ZygoteConnection.java
pid = Zygote.forkAndSpecialize(parsedArgs.uid, parsedArgs.gid, parsedArgs.gids,
parsedArgs.debugFlags, rlimits, parsedArgs.mountExternal, parsedArgs.seInfo,
parsedArgs.niceName, fdsToClose, fdsToIgnore, parsedArgs.instructionSet,
parsedArgs.appDataDir);
This is where the real app process is forked; the forked child process then runs:
ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs,
null /* classLoader */);
The command executed:
Execution eventually enters the main function of ActivityThread.java, which starts the app's lifecycle.
RuntimeInit.commonInit()
In the flow above, after the app process is forked, it runs this function:
RuntimeInit.commonInit()
Inside it:
Thread.setUncaughtExceptionPreHandler(new LoggingHandler());
Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());
/**
* Dispatch an uncaught exception to the handler. This method is
* intended to be called only by the runtime and by tests.
*
* @hide
*/
// @VisibleForTesting (would be private if not for tests)
public final void dispatchUncaughtException(Throwable e) {
Thread.UncaughtExceptionHandler initialUeh =
Thread.getUncaughtExceptionPreHandler();
if (initialUeh != null) {
try {
initialUeh.uncaughtException(this, e);
} catch (RuntimeException | Error ignored) {
// Throwables thrown by the initial handler are ignored
}
}
getUncaughtExceptionHandler().uncaughtException(this, e);
}
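setUncaughtExceptionPreHandler installs LoggingHandler as the "uncaught exception pre-handler", and setDefaultUncaughtExceptionHandler installs KillApplicationHandler as the actual "default uncaught exception handler". Going by the names and by dispatchUncaughtException, when an exception goes uncaught, LoggingHandler is called first and KillApplicationHandler second. LoggingHandler is what prints FATAL EXCEPTION and the stack trace: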
E AndroidRuntime: FATAL EXCEPTION: main
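The two-stage order (pre-handler first, then the real handler) can be imitated outside Android. The sketch below is an assumption-laden stand-in: setUncaughtExceptionPreHandler is a hidden Android API, so TwoStageDispatchSketch.dispatch merely mirrors the shape of dispatchUncaughtException with two handlers passed in explicitly, and the handler bodies are invented for the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the runtime's dispatch logic; all names are illustrative.
final class TwoStageDispatchSketch {
    static void dispatch(Thread t, Throwable e,
                         Thread.UncaughtExceptionHandler preHandler,
                         Thread.UncaughtExceptionHandler finalHandler) {
        if (preHandler != null) {
            try {
                preHandler.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // A failing pre-handler must not stop the final handler from running.
            }
        }
        finalHandler.uncaughtException(t, e);
    }

    static List<String> demo() {
        List<String> events = new ArrayList<>();
        // Plays the role of LoggingHandler; it deliberately fails after logging.
        Thread.UncaughtExceptionHandler logging = (t, e) -> {
            events.add("log: FATAL EXCEPTION: " + t.getName());
            throw new IllegalStateException("simulated logging failure");
        };
        // Plays the role of KillApplicationHandler (without killing anything).
        Thread.UncaughtExceptionHandler kill = (t, e) -> events.add("kill: " + e.getMessage());
        // The thread object is never started; it only supplies a name.
        dispatch(new Thread("worker-1"), new RuntimeException("boom"), logging, kill);
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Even though the logging stage throws here, the second handler still runs, which matches the "Throwables thrown by the initial handler are ignored" comment in the framework code.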
KillApplicationHandler:
/**
* Handle application death from an uncaught exception. The framework
* catches these for the main threads, so this should only matter for
* threads created by applications. Before this method runs,
* {@link LoggingHandler} will already have logged details.
*/
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
public void uncaughtException(Thread t, Throwable e) {
try {
// Don't re-enter -- avoid infinite loops if crash-reporting crashes.
if (mCrashing) return;
mCrashing = true;
// Try to end profiling. If a profiler is running at this point, and we kill the
// process (below), the in-memory buffer will be lost. So try to stop, which will
// flush the buffer. (This makes method trace profiling useful to debug crashes.)
if (ActivityThread.currentActivityThread() != null) {
ActivityThread.currentActivityThread().stopProfiling();
}
final String processName = ActivityThread.currentProcessName();
if (processName != null) {
if (Build.IS_USERDEBUG && processName.equals(SystemProperties.get("persist.debug.process"))) {
Log.w(TAG, "process: " + processName + " crash message is skip");
return;
}
}
// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
} catch (Throwable t2) {
if (t2 instanceof DeadObjectException) {
// System process is dead; ignore
} else {
try {
Clog_e(TAG, "Error reporting crash", t2);
} catch (Throwable t3) {
// Even Clog_e() fails! Oh well.
}
}
} finally {
// Try everything to make sure this process goes away.
Process.killProcess(Process.myPid());
System.exit(10);
}
}
}
The following code talks to ActivityManagerService to bring up the "has stopped" dialog. Note the comment: execution continues only after the dialog is dismissed.
// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
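The overall shape of KillApplicationHandler (guard against re-entry, report the crash, and kill the process in finally no matter what) can be sketched with the side effects injected as Runnables. Everything here is hypothetical: KillHandlerSketch, reportCrash and killSelf are illustrative names, and killSelf merely records an event instead of calling Process.killProcess.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "report, then always kill" pattern; not the AOSP class.
final class KillHandlerSketch implements Thread.UncaughtExceptionHandler {
    private final Runnable reportCrash; // stands in for the blocking handleApplicationCrash call
    private final Runnable killSelf;    // stands in for Process.killProcess + System.exit
    private volatile boolean crashing;

    KillHandlerSketch(Runnable reportCrash, Runnable killSelf) {
        this.reportCrash = reportCrash;
        this.killSelf = killSelf;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter if crash reporting itself crashes.
            if (crashing) return;
            crashing = true;
            reportCrash.run();
        } catch (Throwable reportingFailure) {
            // Reporting failed; nothing more can be done except dying below.
        } finally {
            // Runs on every path, including the early return above.
            killSelf.run();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        KillHandlerSketch handler = new KillHandlerSketch(
                () -> events.add("report"),
                () -> events.add("kill"));
        handler.uncaughtException(Thread.currentThread(), new RuntimeException("boom"));
        System.out.println(events);
    }
}
```

The point to notice is that the kill step sits in finally, so the process dies only after the reporting call returns or fails; while that call is blocked, nothing kills the process.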
On the ActivityManagerService side, execution eventually stops at the following code:
AppErrors.java, crashApplicationInner():
synchronized (mService) {
/**
* If crash is handled by instance of {@link android.app.IActivityController},
* finish now and don't show the app error dialog.
*/
if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
timeMillis, callingPid, callingUid)) {
return;
}
/**
* If this process was running instrumentation, finish now - it will be handled in
* {@link ActivityManagerService#handleAppDiedLocked}.
*/
if (r != null && r.instr != null) {
return;
}
// Log crash in battery stats.
if (r != null) {
mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
}
AppErrorDialog.Data data = new AppErrorDialog.Data();
data.result = result;
data.proc = r;
// If we can't identify the process or it's already exceeded its crash quota,
// quit right away without showing a crash dialog.
if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
return;
}
final Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
task = data.task;
msg.obj = data;
mService.mUiHandler.sendMessage(msg);
}
int res = result.get();
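result is an AppErrorResult. result.get() calls wait(), which blocks the current Binder call until the matching notify arrives. The code before it brings up the "has stopped" dialog, AppErrorDialog; result is passed into AppErrorDialog along with data, and when the dialog is dismissed it calls result.set(), which wakes the waiting Binder thread: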
AppErrorResult
final class AppErrorResult {
public void set(int res) {
synchronized (this) {
mHasResult = true;
mResult = res;
notifyAll();
}
}
public int get() {
synchronized (this) {
while (!mHasResult) {
try {
wait();
} catch (InterruptedException e) {
}
}
}
return mResult;
}
boolean mHasResult = false;
int mResult;
}
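The handshake between the blocked Binder thread and the dialog can be reproduced in a small standalone program. This is a sketch under stated assumptions: BlockingResultSketch copies the wait/notifyAll structure of AppErrorResult, a scheduled task stands in for the dialog calling set() when it is dismissed, and TIMEOUT_RESULT and the 200 ms delay in main are arbitrary demo values.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical re-creation of the AppErrorResult pattern for experimentation.
final class BlockingResultSketch {
    static final int TIMEOUT_RESULT = 1; // arbitrary value for the demo

    private boolean hasResult = false;
    private int result;

    synchronized void set(int res) {
        hasResult = true;
        result = res;
        notifyAll(); // wake every thread parked in get()
    }

    synchronized int get() {
        while (!hasResult) {
            try {
                wait(); // releases the monitor until set() notifies
            } catch (InterruptedException ignored) {
                // Mirrors the original: keep waiting for a result.
            }
        }
        return result;
    }

    // Returns how long the caller stayed blocked, in milliseconds.
    static long blockUntilDismissed(long dismissAfterMillis) {
        BlockingResultSketch pending = new BlockingResultSketch();
        ScheduledExecutorService ui = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        // Stands in for the dialog being dismissed (by the user or by its timeout).
        ui.schedule(() -> pending.set(TIMEOUT_RESULT), dismissAfterMillis, TimeUnit.MILLISECONDS);
        pending.get(); // the "Binder thread" blocks here
        ui.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("blocked for ~" + blockUntilDismissed(200) + " ms");
    }
}
```

The caller of get() has no way out other than set(), so how long it stays blocked is entirely up to whoever owns the dialog.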
Only after the remaining handling finishes and the Binder call returns does the app process finally kill itself:
finally {
// Try everything to make sure this process goes away.
Process.killProcess(Process.myPid());
System.exit(10);
}
Note that the AppErrorDialog constructor contains:
// After the timeout, pretend the user clicked the quit button
mHandler.sendMessageDelayed(
mHandler.obtainMessage(TIMEOUT),
DISMISS_TIMEOUT);
If the user never responds, the dialog returns after 5 minutes. The following log is worth watching for:
Slog.w(TAG, "handleApplicationStrictModeViolation; res=" + res);
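Returning only after the timeout means the app process can linger in the crashed state for as long as 5 minutes. Every thread other than the one that threw keeps working, so odd things may happen. And because the process that should have died and been restarted is still alive, ActivityManagerService never receives the binderDied notification, so the restart does not happen until the timeout expires.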
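The "other threads keep working" effect is easy to observe in plain Java. The following is a minimal sketch, not Android code: CrashLimboSketch is a hypothetical name, a per-thread handler is used so the JVM's default handler is untouched, and Thread.sleep stands in for the blocking call that waits for the dialog to be dismissed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: one thread "crashes" and blocks in its handler
// while a healthy thread in the same process keeps making progress.
final class CrashLimboSketch {
    static int ticksWhileHandlerBlocked(long blockMillis) {
        AtomicInteger ticks = new AtomicInteger();
        AtomicInteger ticksDuringBlock = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        CountDownLatch handlerDone = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            while (running.get()) {
                ticks.incrementAndGet(); // "useful work" done by a healthy thread
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "healthy-worker");

        Thread crasher = new Thread(() -> {
            throw new IllegalStateException("simulated uncaught exception");
        }, "crasher");
        crasher.setUncaughtExceptionHandler((t, e) -> {
            int before = ticks.get();
            try {
                Thread.sleep(blockMillis); // stands in for waiting on the crash dialog
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            ticksDuringBlock.set(ticks.get() - before);
            handlerDone.countDown();
        });

        worker.start();
        crasher.start();
        try {
            handlerDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.set(false); // let the worker loop end so the JVM can exit
        }
        return ticksDuringBlock.get();
    }

    public static void main(String[] args) {
        System.out.println("ticks while handler was blocked: " + ticksWhileHandlerBlocked(300));
    }
}
```

A non-zero tick count shows the healthy thread running throughout the window in which the crashed thread's handler was blocked, which is the state an app stays in until the dialog goes away.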