Android Stability - gdb and coredump

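When analyzing Android native errors (NE), having a coredump of the crashed process makes the job far easier. However, capturing a coredump consumes a lot of memory and CPU, and the resulting files are large, so the feature is disabled by default in the builds shipped to users. Even during internal development it is only enabled for specific test runs, such as monkey tests that target system stability. As a result, stability problems are often not that hard to analyze; the hard part is collecting useful logs. Once you have the coredump and the symbols that match the firmware, you can analyze the problem with GDB.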

Coredump files

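A coredump can be thought of as a snapshot of a process's memory and registers at a given moment, wrapped in an ELF file so that tools such as GDB can analyze it. The kernel supports coredumps by default, but on Android several other factors determine whether one is actually captured.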

On Linux, the resources each process may use are limited. You can inspect the limits through /proc/$PID/limits, for example:

(Figure: process rlimit)

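The limits in this example show that the process may open at most 1024 files and that its core file size is 0. With that setting the process cannot produce a coredump even when it receives one of the relevant signals, so you usually have to raise the process's rlimit first.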
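The check described above can be sketched in Python. This is only an illustration: the sample text follows the usual /proc/<pid>/limits column layout, but apart from the two soft limits quoted in the article (core file size 0, open files 1024) its values are assumed, and the helper names are made up for this example.

```python
import re

# Illustrative sample in the /proc/<pid>/limits layout. Only the soft limits
# for core file size (0) and open files (1024) come from the article; the
# hard limits are assumed values.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max core file size        0                    unlimited            bytes
Max open files            1024                 4096                 files
"""


def parse_limits(text):
    """Return {limit name: (soft, hard)} parsed from /proc/<pid>/limits text."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def can_dump_core(text):
    """True if the soft core-size limit is non-zero.

    This matches the file-based core_pattern setup used in the article.
    """
    soft, _hard = parse_limits(text)["Max core file size"]
    return soft != "0"


print(parse_limits(SAMPLE_LIMITS)["Max open files"])  # ('1024', '4096')
print(can_dump_core(SAMPLE_LIMITS))                   # False
```

On a real device the input would come from reading /proc/<pid>/limits for the process you care about, for example over adb shell.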

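/proc/sys/kernel/core_pattern sets where coredump files are saved, for example: echo "/data/corefile/core-%e-%p" > /proc/sys/kernel/core_pattern (%e expands to the executable name and %p to the PID).
You may also need to run echo 1 > /proc/sys/fs/suid_dumpable.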
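To see what file name such a pattern produces, here is a small Python sketch. It is not the kernel's implementation: it handles only the %e, %p and %% specifiers, and expand_core_pattern is a name invented for this example.

```python
def expand_core_pattern(pattern, exe_name, pid):
    """Expand the %e, %p and %% specifiers of a core_pattern string.

    Sketch only: the kernel supports more specifiers, which are simply
    dropped here.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            if spec == "e":
                out.append(exe_name)      # executable (comm) name
            elif spec == "p":
                out.append(str(pid))      # PID of the dumped process
            elif spec == "%":
                out.append("%")           # literal percent sign
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_core_pattern("/data/corefile/core-%e-%p", "system_server", 3060))
# /data/corefile/core-system_server-3060
```

With the pattern from the article, a crash of system_server with PID 3060 yields a file named core-system_server-3060.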

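A process dumps core only when it receives certain signals, such as SIGSEGV, SIGABRT, or SIGBUS. Note that you must not send SIGKILL to a process while its coredump is being written: SIGKILL aborts the dump, leaving an incomplete coredump that cannot be analyzed.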
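As a quick reference, the sketch below lists the signals whose default action on Linux is to terminate the process and dump core, as documented in signal(7). It assumes a Linux host and that the process has not installed its own handler for the signal; dumps_core_by_default is a helper name invented here.

```python
import signal

# Signals whose default action is "terminate and dump core" per signal(7).
# SIGKILL is deliberately absent: it terminates the process without a dump.
CORE_SIGNALS = {
    signal.SIGQUIT, signal.SIGILL, signal.SIGABRT, signal.SIGFPE,
    signal.SIGSEGV, signal.SIGBUS, signal.SIGSYS, signal.SIGTRAP,
    signal.SIGXCPU, signal.SIGXFSZ,
}


def dumps_core_by_default(signum):
    """True if the default action for signum includes a core dump."""
    return signum in CORE_SIGNALS


for sig in (signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS, signal.SIGKILL):
    print(sig.name, dumps_core_by_default(sig))
```

This is why sending SIGSEGV to a healthy but stuck process is a common way to force a coredump, while SIGKILL never produces one.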

GDB
  • GDB online debugging environment

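GDB, the GNU Project Debugger, is a famous debugging tool that most programmers have at least heard of even if they have never used it. It can debug a live process, and it can also debug memory dumps such as coredumps offline. In day-to-day stability work we mainly use it to analyze coredump files offline.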

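  • adb shell gdbserver remote:1234 --attach 4321
    1234 is the port on the phone, and 4321 is the PID of the process you want to debug.
  • adb forward tcp:1234 tcp:1234
    Sets up adb TCP port forwarding. The first tcp:1234 is the port on the PC; the second is the port on the target, that is, the phone.
  • aarch64-linux-android-gdb
    aarch64-linux-android-gdb is the gdb client for ARM64. For processes running in AArch32 mode, choose the corresponding 32-bit gdb client.
  • Run the following commands at the GDB prompt: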
    (gdb) set solib-absolute-prefix out/target/product/general/symbols/
    (gdb) set solib-search-path out/target/product/general/symbols/
    (gdb) target remote :1234
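
The steps above can be collected in a small helper so that the port and PID stay consistent across commands. This Python sketch only assembles the command lines from the article and does not run anything; the function name and its defaults are assumptions made for illustration.

```python
def remote_debug_steps(pid, port=1234,
                       symbols="out/target/product/general/symbols/"):
    """Return (host shell commands, gdb commands) for attaching to pid.

    The commands mirror the ones listed in the article. Note that gdbserver
    keeps running in the foreground, so in practice the first command is
    started in its own terminal.
    """
    shell = [
        ["adb", "shell", "gdbserver", f"remote:{port}", "--attach", str(pid)],
        ["adb", "forward", f"tcp:{port}", f"tcp:{port}"],
    ]
    gdb = [
        f"set solib-absolute-prefix {symbols}",
        f"set solib-search-path {symbols}",
        f"target remote :{port}",
    ]
    return shell, gdb


shell_cmds, gdb_cmds = remote_debug_steps(4321)
for cmd in shell_cmds:
    print(" ".join(cmd))
for cmd in gdb_cmds:
    print("(gdb)", cmd)
```

The gdb commands could also be written to a file and passed to the gdb client with its -x option.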

For more information, see "Setting up an Android GDB online debugging environment".

  • GDB + Eclipse offline debugging

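A craftsman who wants to do good work must first sharpen his tools. NE problems can be analyzed with command-line GDB, and if you know the GDB commands well the command line is very comfortable to work with. Alternatively, you can combine GDB with Eclipse to build a graphical debugging environment. It is less powerful than the command line, but it is enough for simple problems. The setup is as follows:
1. Open ADT, click Run → Debug Configurations, and select C/C++ Postmortem Debugger.

2. Click the "+" icon in the upper-left corner to create a new configuration and give it any name, for example "android_gdb". For C/C++ Application, choose the executable that the coredump belongs to. For SurfaceFlinger, for example, choose /symbols/system/bin/surfaceflinger; for processes forked from zygote, choose /symbols/system/bin/app_process64. Set Post Mortem file type to Core file, then click Browse and locate the coredump file.

3. Switch to the Debugger tab. For GDB debugger, choose the gdb executable for your platform. GDB command file is a file of gdb commands to run after the coredump is opened. My gdbinit file contains: set solib-search-path /media/xxxx/SSD/tmp/Log/0622/symbols/system/lib64 This sets the path GDB searches for libraries, so that it can load the shared objects that carry symbol information.

4. Click the Debug button, and the full debug view appears.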

  • GDB scripts

GDB supports two kinds of scripts: Python scripts and command scripts. In a command script we can define our own commands, in a form like this:

  define commandName 
   statement 
   ...... 
  end

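Here statement can be any valid GDB command. A user-defined command also accepts up to 10 arguments, $arg0, $arg1 ... $arg9, and $argc holds the number of arguments actually passed. Scripts also provide if/else conditionals and while loops. You can type a script directly at the GDB prompt, or put it in a separate file and load it with the source command.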

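  • GDB coredump debugging example
    During a monkey test, one device froze. The logs suggested that the ART runtime in system_server had timed out in SuspendAll while dumping traces or running GC. We had seen this before and had analyzed it with a coredump, so this time we again sent kill -11 to system_server and captured a coredump. Besides the coredump, the analysis also needs the symbols files that match the firmware.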
[Linux@Linux w]$ls
core-system_server-3060  symbols  symbols.zip
[Linux@Linux w]$aarch64-linux-android-gdb ./symbols/system/bin/app_process64 ./core-system_server-3060 
GNU gdb (GDB) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://source.android.com/source/report-bugs.html>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./symbols/system/bin/app_process64...done.
[New LWP 3060]
[New LWP 3065]
[New LWP 3173]
[New LWP 3067]
[New LWP 3066]
......
......
[New LWP 3954]
[New LWP 3086]
[New LWP 3128]

warning: Could not load shared library symbols for 194 libraries, e.g. /system/bin/linker64.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000007962221cac in ?? ()
(gdb) set solib-search-path ./symbols/system/lib64/
Reading symbols from /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so
......
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
#3  0x000000795f3272e8 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (self=<optimized out>, new_thread_state=art::kRunnable, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:224
#6  ScopedObjectAccess (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:255
#7  art::JNI::NewStringUTF (env=<optimized out>, utf=<optimized out>) at art/runtime/jni_internal.cc:1646
#8  0x0000007961acce64 in NewStringUTF (bytes=<optimized out>, this=0x795f63e180) at libnativehelper/include/nativehelper/jni.h:842
#9  android::android_content_AssetManager_getArrayStringResource (env=0x795f63e180, clazz=<optimized out>, arrayResId=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1977
#10 0x00000000748f498c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

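The device hung while the runtime was suspending all threads. From the code, a hang here usually means that some thread did not respond to the suspend flag in time, and a thread that does not respond is usually in the kRunnable state. Note that this is the ART thread state, not the Linux R state; the two are different. So the plan is to find out from the coredump which thread is still kRunnable. The art::Thread objects of all Java threads are stored in the list_ member of ThreadList, so printing the contents of list_ will tell us which Java thread is in the kRunnable state.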

  // The actual list of all threads.
  std::list<Thread*> list_ GUARDED_BY(Locks::thread_list_lock_);

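To print list_ we first need to find the ThreadList object. It can be derived from the global Runtime instance, or taken from the stack of a thread that has the ThreadList object in its call context. We use the second approach here, because ThreadList::SuspendAllInternal has a this parameter that leads straight to the ThreadList object. In the runtime, the only callers are the SignalCatcher and HeapTaskDaemon threads: one dumps traces and the other runs GC tasks. So we first get the thread IDs of these two threads from the device or the logs, then look at their current stacks in GDB. The two thread IDs turned out to be 3065 and 3070.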

(gdb) info threads
  Id   Target Id         Frame 
  180  LWP 3128          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  179  LWP 3086          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  178  LWP 3954          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  177  LWP 3218          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  ......
  127  LWP 3167          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  ---Type <return> to continue, or q <return> to quit---

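GDB renumbers the threads, so we need to find the GDB thread numbers that correspond to 3065 and 3070. The output above also contains the line
"---Type <return> to continue, or q <return> to quit---". This appears because GDB paginates long output by default; set pagination off changes that behavior.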
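
With 180 threads, looking up the GDB thread number by eye is tedious. As a hedged sketch, the mapping can be extracted from saved info threads output with a few lines of Python. The sample rows are in the format GDB prints in this session; the regular expression and function name are my own.

```python
import re

# Rows in the format printed by "info threads" in this session.
SAMPLE = """\
  180  LWP 3128          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  9    LWP 3070          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  2    LWP 3065          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
"""

# Optional "*" marks the current thread, then the GDB id, then "LWP <tid>".
THREAD_ROW = re.compile(r"^\*?\s*(\d+)\s+LWP\s+(\d+)\b")


def lwp_to_gdb_id(info_threads_output):
    """Map each kernel thread ID (LWP) to its GDB thread number."""
    mapping = {}
    for line in info_threads_output.splitlines():
        match = THREAD_ROW.match(line)
        if match:
            mapping[int(match.group(2))] = int(match.group(1))
    return mapping


ids = lwp_to_gdb_id(SAMPLE)
print(ids[3065], ids[3070])  # 2 9
```

The input can be captured by enabling GDB logging (set logging on) before running info threads.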

(gdb) set pagination off
(gdb) info threads
......
9    LWP 3070          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
......
2    LWP 3065          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41

(gdb) t 2
[Switching to thread 2 (LWP 3065)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
#3  0x000000795f120234 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (new_thread_state=art::kRunnable, this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:231
#6  ScopedObjectAccess (self=<optimized out>, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:261
#7  art::ClassLinker::DumpForSigQuit (this=<optimized out>, os=...) at art/runtime/class_linker.cc:7752
#8  0x000000795f415950 in art::Runtime::DumpForSigQuit (this=0x795f6ec000, os=...) at art/runtime/runtime.cc:1401
#9  0x000000795f41c27c in art::SignalCatcher::HandleSigQuit (this=<optimized out>) at art/runtime/signal_catcher.cc:145
#10 0x000000795f41ad3c in art::SignalCatcher::Run (arg=<optimized out>) at art/runtime/signal_catcher.cc:214
#11 0x000000796226e0f0 in __pthread_start (arg=<optimized out>) at bionic/libc/bionic/pthread_create.cpp:198
#12 0x0000007962223944 in __start_thread (fn=0x62, arg=0x795f6fa910) at bionic/libc/bionic/clone.cpp:41
#13 0x0000000000000000 in ?? ()

(gdb) t 9
[Switching to thread 9 (LWP 3070)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  in bionic/libc/arch-arm64/bionic/syscall.S
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f43bb2c in futex (val3=0, uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
#2  art::ThreadList::SuspendAllInternal (this=<optimized out>, self=<optimized out>, ignore1=<optimized out>, ignore2=<optimized out>, debug_suspend=<optimized out>) at art/runtime/thread_list.cc:586
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
#4  0x000000795f1c6d4c in art::gc::collector::MarkSweep::RunPhases (this=<optimized out>) at art/runtime/gc/collector/mark_sweep.cc:153
#5  0x000000795f1bf490 in art::gc::collector::GarbageCollector::Run (this=0x795f687500, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=false) at art/runtime/gc/collector/garbage_collector.cc:87
#6  0x000000795f1ef0a4 in art::gc::Heap::CollectGarbageInternal (this=<optimized out>, gc_type=<optimized out>, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/heap.cc:2719
#7  0x000000795f1f65dc in art::gc::Heap::ConcurrentGC (this=0x795f64b700, self=<optimized out>, force_full=true) at art/runtime/gc/heap.cc:3722
#8  0x000000795f1fd668 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x0) at art/runtime/gc/heap.cc:3685
#9  0x000000795f21f2c4 in art::gc::TaskProcessor::RunAllTasks (this=<optimized out>, self=<optimized out>) at art/runtime/gc/task_processor.cc:124
#10 0x0000000072739114 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) 

From the GDB output above, the ThreadList object is at 0x795f6fb000, and from it we can reach list_, which holds all the Thread objects.

#3 0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476.
(gdb) set print pretty on
(gdb) f 3
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
476 in art/runtime/thread_list.cc
(gdb) p *this
$2 = {
  static kMaxThreadId = 65535, 
  static kInvalidThreadId = 0, 
  static kMainThreadId = 1, 
  allocated_ids_ = {
    <std::__1::__bitset<1024, 65535>> = {
      static __bits_per_word = 64, 
      __first_ = {18446744073709551615, 18446744073709551615, 38654705663, 0 <repeats 1021 times>}
    }, 
    members of std::__1::bitset<65535>: 
    static __n_words = 1024
  }, 
  list_ = {
    <std::__1::__list_imp<art::Thread*, std::__1::allocator<art::Thread*> >> = {
      __end_ = {
        __prev_ = 0x792df94ee0, 
        __next_ = 0x795f6fa9a0
      }, 
      __size_alloc_ = {
        <std::__1::__libcpp_compressed_pair_imp<unsigned long, std::__1::allocator<std::__1::__list_node<art::Thread*, void*> >, 2>> = {
          <std::__1::allocator<std::__1::__list_node<art::Thread*, void*> >> = {<No data fields>}, 
          members of std::__1::__libcpp_compressed_pair_imp<unsigned long, std::__1::allocator<std::__1::__list_node<art::Thread*, void*> >, 2>: 
          __first_ = 161
        }, <No data fields>}
    }, <No data fields>}, 
  suspend_all_count_ = 1, 
  debug_suspend_all_count_ = 0, 
 ......
}

list_ is a long list, so we first define a custom GDB command that prints every Thread object automatically:

(gdb) def dump_all_threads_state
Type commands for definition of "dump_all_threads_state".
End with a line saying just "end".
>    set $current = list_.__end_.__next_
>    while $current != 0
 >        p * $current.__value_
 >        set $current = $current.__next_
 >    end
>end 

(gdb) dump_all_threads_state
$3 = {
  static kStackOverflowImplicitCheckSize = 8192, 
  static kMaxCheckpoints = 3, 
  static kMaxSuspendBarriers = 3, 
  static is_started_ = true, 
  static pthread_key_self_ = -2147483634, 
  static resume_cond_ = 0x795f6fa900, 
  static is_sensitive_thread_hook_ = 0x7961a76f20 <android::runtime_isSensitiveThread()>, 
  static jit_sensitive_thread_ = 0x0, 
  tls32_ = {
    state_and_flags = {
      as_struct = {
        flags = 1, 
        state = 89
      }, 
      as_atomic_int = {
        <std::__1::atomic<int>> = {
          <std::__1::__atomic_base<int, true>> = {
            <std::__1::__atomic_base<int, false>> = {
              __a_ = 5832705
            }, <No data fields>}, <No data fields>}, <No data fields>}, 
      as_int = 5832705
    }, 
    suspend_count = 1, 
    debug_suspend_count = 0, 
    thin_lock_thread_id = 1, 
    tid = 3060, 
    daemon = 0, 
    throwing_OutOfMemoryError = 0, 
    no_thread_suspension = 0, 
    thread_exit_check_count = 0, 
    handling_signal_ = 0, 
    suspended_at_suspend_check = 0, 
    ready_for_debug_invoke = 0, 
    debug_method_entry_ = 0, 
    is_gc_marking = 0, 
    weak_ref_access_enabled = 1, 
    disable_thread_flip_count = 0
  }, 
  tls64_ = {
    trace_clock_base = 0, 
    stats = {
      allocated_objects = 0, 
      allocated_bytes = 0, 
      freed_objects = 0, 
      freed_bytes = 0, 
      gc_for_alloc_count = 0, 
      class_init_count = 2682, 
      class_init_time_ns = 1043034049
    }
  }, 
  tlsPtr_ = {
    card_table = 0x795ad01070 "", 
    exception = 0x0, 
    stack_end = 0x7ffbc34000 "", 
    managed_stack = {
      top_quick_frame_ = 0x7ffc42d940, 
      link_ = 0x7ffc42e050, 
      top_shadow_frame_ = 0x0
    }, 
    suspend_trigger = 0x0, 
    jni_env = 0x795f63e180, 
    tmp_jni_env = 0x0, 
    self = 0x0, 
    opeer = 0x762523e8, 
    jpeer = 0x0, 
    stack_begin = 0x7ffbc32000 "", 
    stack_size = 8388608, 
    stack_trace_sample = 0x0, 
    wait_next = 0x0, 
    monitor_enter_object = 0x0, 
    top_handle_scope = 0x7ffc42d948, 
    class_loader_override = 0x10070a, 
    long_jump_context = 0x795f687c80, 
    instrumentation_stack = 0x795f716e90, 
    debug_invoke_req = 0x0, 
    single_step_control = 0x0, 
    stacked_shadow_frame_record = 0x0, 
    deoptimization_context_stack = 0x0, 
    frame_id_to_shadow_frame = 0x0, 
    name = 0x795f6fa980, 
    pthread_self = 521358133912, 
    last_no_thread_suspension_cause = 0x0, 
    checkpoint_functions = {0x0, 0x0, 0x0}, 
    active_suspend_barriers = {0x0, 0x0, 0x0}, 
    jni_entrypoints = {
      pDlsymLookup = 0x795f0b04d0 <art_jni_dlsym_lookup_stub>
    }, 
    quick_entrypoints = {
      pAllocArray = 0x795f0b4420 <art_quick_alloc_array_rosalloc>, 
      pAllocArrayResolved = 0x795f0b44e0 <art_quick_alloc_array_resolved_rosalloc>, 
      pAllocArrayWithAccessCheck = 0x795f0b45a0 <art_quick_alloc_array_with_access_check_rosalloc>, 
      pAllocObject = 0x795f0b9cc0 <art_quick_alloc_object_rosalloc>, 
      pAllocObjectResolved = 0x795f0b41e0 <art_quick_alloc_object_resolved_rosalloc>, 
      pAllocObjectInitialized = 0x795f0b42a0 <art_quick_alloc_object_initialized_rosalloc>, 
      pAllocObjectWithAccessCheck = 0x795f0b4360 <art_quick_alloc_object_with_access_check_rosalloc>, 
      pCheckAndAllocArray = 0x795f0b4660 <art_quick_check_and_alloc_array_rosalloc>, 
      pCheckAndAllocArrayWithAccessCheck = 0x795f0b4720 <art_quick_check_and_alloc_array_with_access_check_rosalloc>, 
      pAllocStringFromBytes = 0x795f0b47e0 <art_quick_alloc_string_from_bytes_rosalloc>, 
      pAllocStringFromChars = 0x795f0b48f0 <art_quick_alloc_string_from_chars_rosalloc>, 
      pAllocStringFromString = 0x795f0b49b0 <art_quick_alloc_string_from_string_rosalloc>, 
      pInstanceofNonTrivial = 0x795f516374 <artIsAssignableFromCode(art::mirror::Class*, art::mirror::Class*)>, 
      pCheckCast = 0x795f0b17f0 <art_quick_check_cast>, 
      pInitializeStaticStorage = 0x795f0b1a40 <art_quick_initialize_static_storage>, 
      pInitializeTypeAndVerifyAccess = 0x795f0b1bc0 <art_quick_initialize_type_and_verify_access>, 
      pInitializeType = 0x795f0b1b00 <art_quick_initialize_type>, 
      pResolveString = 0x795f0b2e80 <art_quick_resolve_string>, 
      pSet8Instance = 0x795f0b2a00 <art_quick_set8_instance>, 
      pSet8Static = 0x795f0b2700 <art_quick_set8_static>, 
      pSet16Instance = 0x795f0b2ac0 <art_quick_set16_instance>, 
      pSet16Static = 0x795f0b27c0 <art_quick_set16_static>, 
      pSet32Instance = 0x795f0b2b80 <art_quick_set32_instance>, 
      pSet32Static = 0x795f0b2880 <art_quick_set32_static>, 
      pSet64Instance = 0x795f0b2c40 <art_quick_set64_instance>, 
      pSet64Static = 0x795f0b2dc0 <art_quick_set64_static>, 
      pSetObjInstance = 0x795f0b2d00 <art_quick_set_obj_instance>, 
      pSetObjStatic = 0x795f0b2940 <art_quick_set_obj_static>, 
      pGetByteInstance = 0x795f0b2280 <art_quick_get_byte_instance>, 
      pGetBooleanInstance = 0x795f0b21c0 <art_quick_get_boolean_instance>, 
      pGetByteStatic = 0x795f0b1d40 <art_quick_get_byte_static>, 
      pGetBooleanStatic = 0x795f0b1c80 <art_quick_get_boolean_static>, 
      pGetShortInstance = 0x795f0b2400 <art_quick_get_short_instance>, 
      pGetCharInstance = 0x795f0b2340 <art_quick_get_char_instance>, 
      pGetShortStatic = 0x795f0b1ec0 <art_quick_get_short_static>, 
      pGetCharStatic = 0x795f0b1e00 <art_quick_get_char_static>, 
      pGet32Instance = 0x795f0b24c0 <art_quick_get32_instance>, 
      pGet32Static = 0x795f0b1f80 <art_quick_get32_static>, 
      pGet64Instance = 0x795f0b2580 <art_quick_get64_instance>, 
      pGet64Static = 0x795f0b2040 <art_quick_get64_static>, 
      pGetObjInstance = 0x795f0b2640 <art_quick_get_obj_instance>, 
      pGetObjStatic = 0x795f0b2100 <art_quick_get_obj_static>, 
      pAputObjectWithNullAndBoundCheck = 0x795f0b1870 <art_quick_aput_obj_with_null_and_bound_check>, 
      pAputObjectWithBoundCheck = 0x795f0b1880 <art_quick_aput_obj_with_bound_check>, 
      pAputObject = 0x795f0b18a0 <art_quick_aput_obj>, 
      pHandleFillArrayData = 0x795f0b1980 <art_quick_handle_fill_data>, 
      pJniMethodStart = 0x795f523c2c <art::JniMethodStart(art::Thread*)>, 
      pJniMethodStartSynchronized = 0x795f523db0 <art::JniMethodStartSynchronized(_jobject*, art::Thread*)>, 
      pJniMethodEnd = 0x795f523dec <art::JniMethodEnd(unsigned int, art::Thread*)>, 
      pJniMethodEndSynchronized = 0x795f5240d4 <art::JniMethodEndSynchronized(unsigned int, _jobject*, art::Thread*)>, 
      pJniMethodEndWithReference = 0x795f5242f4 <art::JniMethodEndWithReference(_jobject*, unsigned int, art::Thread*)>, 
      pJniMethodEndWithReferenceSynchronized = 0x795f5243a4 <art::JniMethodEndWithReferenceSynchronized(_jobject*, unsigned int, _jobject*, art::Thread*)>, 
      pQuickGenericJniTrampoline = 0x795f0ba500 <art_quick_generic_jni_trampoline>, 
      pLockObject = 0x795f0b1430 <art_quick_lock_object>, 
      pUnlockObject = 0x795f0b1610 <art_quick_unlock_object>, 
      pCmpgDouble = 0x0, 
      pCmpgFloat = 0x0, 
      pCmplDouble = 0x0, 
      pCmplFloat = 0x0, 
      pCos = 0x7961075168 <cos>, 
      pSin = 0x7961079e78 <sin>, 
      pAcos = 0x796106b978 <acos>, 
      pAsin = 0x796106c128 <asin>, 
      pAtan = 0x7961074400 <atan>, 
      pAtan2 = 0x796106c55c <atan2>, 
      pCbrt = 0x7961074844 <cbrt>, 
      pCosh = 0x796106cbb4 <cosh>, 
      pExp = 0x796106cd98 <exp>, 
      pExpm1 = 0x7961077cdc <expm1>, 
      pHypot = 0x796106d688 <hypot>, 
      pLog = 0x7961071960 <log>, 
      pLog10 = 0x79610712c0 <log10>, 
      pNextAfter = 0x79610792c8 <nextafter>, 
      pSinh = 0x79610730f0 <sinh>, 
      pTan = 0x796107a72c <tan>, 
      pTanh = 0x796107aefc <tanh>, 
      pFmod = 0x796106d204 <fmod>, 
      pL2d = 0x0, 
      pFmodf = 0x796106d4f4 <fmodf>, 
      pL2f = 0x0, 
      pD2iz = 0x0, 
      pF2iz = 0x0, 
      pIdivmod = 0x0, 
      pD2l = 0x0, 
      pF2l = 0x0, 
      pLdiv = 0x0, 
      pLmod = 0x0, 
      pLmul = 0x0, 
      pShlLong = 0x0, 
      pShrLong = 0x0, 
      pUshrLong = 0x0, 
      pIndexOf = 0x795f0ba930 <art_quick_indexof>, 
      pStringCompareTo = 0x795f0baa00 <art_quick_string_compareto>, 
      pMemcpy = 0x79622208c8 <memcpy>, 
      pQuickImtConflictTrampoline = 0x795f0ba290 <art_quick_imt_conflict_trampoline>, 
      pQuickResolutionTrampoline = 0x795f0ba3c0 <art_quick_resolution_trampoline>, 
      pQuickToInterpreterBridge = 0x795f0ba650 <art_quick_to_interpreter_bridge>, 
      pInvokeDirectTrampolineWithAccessCheck = 0x795f0b0a70 <art_quick_invoke_direct_trampoline_with_access_check>, 
      pInvokeInterfaceTrampolineWithAccessCheck = 0x795f0b0870 <art_quick_invoke_interface_trampoline_with_access_check>, 
      pInvokeStaticTrampolineWithAccessCheck = 0x795f0b0970 <art_quick_invoke_static_trampoline_with_access_check>, 
      pInvokeSuperTrampolineWithAccessCheck = 0x795f0b0b70 <art_quick_invoke_super_trampoline_with_access_check>, 
      pInvokeVirtualTrampolineWithAccessCheck = 0x795f0b0c70 <art_quick_invoke_virtual_trampoline_with_access_check>, 
      pTestSuspend = 0x795f0ba090 <art_quick_test_suspend>, 
      pDeliverException = 0x795f0b0660 <art_quick_deliver_exception>, 
      pThrowArrayBounds = 0x795f0b0760 <art_quick_throw_array_bounds>, 
      pThrowDivZero = 0x795f0b0710 <art_quick_throw_div_zero>, 
      pThrowNoSuchMethod = 0x795f0b0810 <art_quick_throw_no_such_method>, 
      pThrowNullPointer = 0x795f0b06c0 <art_quick_throw_null_pointer_exception>, 
      pThrowStackOverflow = 0x795f0b07c0 <art_quick_throw_stack_overflow>, 
      pDeoptimize = 0x795f0ba8d0 <art_quick_deoptimize_from_compiled_code>, 
      pA64Load = 0x795f4253b8 <art::UnimplementedEntryPoint()>, 
      pA64Store = 0x795f4253b8 <art::UnimplementedEntryPoint()>, 
      pNewEmptyString = 0x70e8c810, 
      pNewStringFromBytes_B = 0x70e8c848, 
      pNewStringFromBytes_BI = 0x70e8c880, 
      pNewStringFromBytes_BII = 0x70e8c8b8, 
      pNewStringFromBytes_BIII = 0x70e8c8f0, 
      pNewStringFromBytes_BIIString = 0x70e8c928, 
      pNewStringFromBytes_BString = 0x70e8c998, 
      pNewStringFromBytes_BIICharset = 0x70e8c960, 
      pNewStringFromBytes_BCharset = 0x70e8c9d0, 
      pNewStringFromChars_C = 0x70e8ca40, 
      pNewStringFromChars_CII = 0x70e8ca78, 
      pNewStringFromChars_IIC = 0x70e8ca08, 
      pNewStringFromCodePoints = 0x70e8cab0, 
      pNewStringFromString = 0x70e8cae8, 
      pNewStringFromStringBuffer = 0x70e8cb20, 
      pNewStringFromStringBuilder = 0x70e8cb58, 
      pReadBarrierJni = 0x795f523c28 <art::ReadBarrierJni(art::mirror::CompressedReference<art::mirror::Object>*, art::Thread*)>, 
      pReadBarrierMark = 0x795f5235c4 <artReadBarrierMark(art::mirror::Object*)>, 
      pReadBarrierSlow = 0x795f5236e8 <artReadBarrierSlow(art::mirror::Object*, art::mirror::Object*, uint32_t)>, 
      pReadBarrierForRootSlow = 0x795f5236f0 <artReadBarrierForRootSlow(art::GcRoot<art::mirror::Object>*)>
    }, 
    thread_local_objects = 0, 
    thread_local_start = 0x0, 
    thread_local_pos = 0x0, 
    thread_local_end = 0x0, 
    mterp_current_ibase = 0x795f0a0280 <artMterpAsmInstructionStart>, 
    mterp_default_ibase = 0x795f0a0280 <artMterpAsmInstructionStart>, 
    mterp_alt_ibase = 0x795f0a8280 <artMterpAsmSisterStart>, 
    rosalloc_runs = {0x795f5fcf08 <art::gc::allocator::RosAlloc::dedicated_full_run_storage_>, 0x13754000, 0x149d7000, 0x14e1f000, 0x14595000, 0x1457c000, 0x13e65000, 0x14186000, 0x14828000, 0x12f71000, 0x13f18000, 0x795f5fcf08 <art::gc::allocator::RosAlloc::dedicated_full_run_storage_>, 0x795f5fcf08 <art::gc::allocator::RosAlloc::dedicated_full_run_storage_>, 0x13c29000, 0x795f5fcf08 <art::gc::allocator::RosAlloc::dedicated_full_run_storage_>, 0x795f5fcf08 <art::gc::allocator::RosAlloc::dedicated_full_run_storage_>}, 
    thread_local_alloc_stack_top = 0x795a52b8b8, 
    thread_local_alloc_stack_end = 0x795a52ba00, 
    held_mutexes = {0x0 <repeats 63 times>}, 
    nested_signal_state = 0x795f67f300, 
    flip_function = 0x0, 
    method_verifier = 0x0, 
    thread_local_mark_stack = 0x0
  }, 
  wait_mutex_ = 0x795f719080, 
  wait_cond_ = 0x795f6fa960, 
  wait_monitor_ = 0x0, 
  interrupted_ = false, 
  debug_disallow_read_barrier_ = 0 '\000'
}
...... // output for the other N Thread objects omitted
Cannot access memory at address 0xa1
(gdb) 
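
A note on the final "Cannot access memory at address 0xa1" message: a libc++ std::list is circular, so the next pointer never becomes 0 and the while $current != 0 loop only stops when a read fails. 0xa1 is 161, the list size shown in __size_alloc_, which suggests the loop wrapped around to the __end_ sentinel and treated the size field as a Thread pointer. The Python sketch below models the same layout and stops at the sentinel instead; it is a simulation for illustration, not GDB code, and the class and function names are invented.

```python
class Node:
    """A doubly linked node; the sentinel carries no value of its own."""

    def __init__(self, value=None):
        self.prev = self
        self.next = self
        self.value = value


def make_list(values):
    """Build a circular list and return its sentinel (the role of __end_)."""
    end = Node()
    for value in values:
        node = Node(value)
        node.prev, node.next = end.prev, end
        end.prev.next = node
        end.prev = node
    return end


def walk(end):
    """Visit every element once, stopping when the walk is back at the sentinel."""
    seen = []
    cur = end.next
    while cur is not end:  # the sentinel, not a null pointer, ends the list
        seen.append(cur.value)
        cur = cur.next
    return seen


print(walk(make_list([3060, 3065, 3070])))  # [3060, 3065, 3070]
```

In the GDB command, the equivalent change would be to compare $current against the address of list_.__end_ rather than 0. I have not tested that variant against this coredump, and the dump above is complete either way because the error only occurs after the last element.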

Among the N Thread objects printed above, it is easy to find the thread in the kRunnable state: its tid is 3093, and its state = 67, which is kRunnable.

enum ThreadState {
  //                                   Thread.State   JDWP state
  kTerminated = 66,                 // TERMINATED     TS_ZOMBIE    Thread.run has returned, but Thread* still around
  kRunnable,                        // RUNNABLE       TS_RUNNING   runnable
  kTimedWaiting,                    // TIMED_WAITING  TS_WAIT      in Object.wait() with a timeout
  kSleeping,                        // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()
  kBlocked,                         // BLOCKED        TS_MONITOR   blocked on a monitor
  kWaiting,                         // WAITING        TS_WAIT      in Object.wait()
  kWaitingForGcToComplete,          // WAITING        TS_WAIT      blocked waiting for GC
  kWaitingForCheckPointsToRun,      // WAITING        TS_WAIT      GC waiting for checkpoints to run
  kWaitingPerformingGc,             // WAITING        TS_WAIT      performing GC
  kWaitingForDebuggerSend,          // WAITING        TS_WAIT      blocked waiting for events to be sent
  kWaitingForDebuggerToAttach,      // WAITING        TS_WAIT      blocked waiting for debugger to attach
  kWaitingInMainDebuggerLoop,       // WAITING        TS_WAIT      blocking/reading/processing debugger events
  kWaitingForDebuggerSuspension,    // WAITING        TS_WAIT      waiting for debugger suspend all
  kWaitingForJniOnLoad,             // WAITING        TS_WAIT      waiting for execution of dlopen and JNI on load code
  kWaitingForSignalCatcherOutput,   // WAITING        TS_WAIT      waiting for signal catcher IO to complete
  kWaitingInMainSignalCatcherLoop,  // WAITING        TS_WAIT      blocking/reading/processing signals
  kWaitingForDeoptimization,        // WAITING        TS_WAIT      waiting for deoptimization suspend all
  kWaitingForMethodTracingStart,    // WAITING        TS_WAIT      waiting for method tracing to start
  kWaitingForVisitObjects,          // WAITING        TS_WAIT      waiting for visiting objects
  kWaitingForGetObjectsAllocated,   // WAITING        TS_WAIT      waiting for getting the number of allocated objects
  kWaitingWeakGcRootRead,           // WAITING        TS_WAIT      waiting on the GC to read a weak root
  kWaitingForGcThreadFlip,          // WAITING        TS_WAIT      waiting on the GC thread flip (CC collector) to finish
  kStarting,                        // NEW            TS_WAIT      native thread started, not yet ready to run managed code
  kNative,                          // RUNNABLE       TS_RUNNING   running in a JNI native method
  kSuspended,                       // RUNNABLE       TS_RUNNING   suspended by GC or debugger
};
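
Counting enum entries by hand is error-prone, so here is a small Python sketch that decodes the as_int value printed in each tls32_ dump. The state names and their order are copied from the enum above. The packing (flags in the low 16 bits, state in the high 16 bits) matches the as_struct values in the dumps. The flag names are an assumption based on ART's ThreadFlag enum in this Android generation and should be checked against the matching source tree.

```python
THREAD_STATES = [
    "kTerminated", "kRunnable", "kTimedWaiting", "kSleeping", "kBlocked",
    "kWaiting", "kWaitingForGcToComplete", "kWaitingForCheckPointsToRun",
    "kWaitingPerformingGc", "kWaitingForDebuggerSend",
    "kWaitingForDebuggerToAttach", "kWaitingInMainDebuggerLoop",
    "kWaitingForDebuggerSuspension", "kWaitingForJniOnLoad",
    "kWaitingForSignalCatcherOutput", "kWaitingInMainSignalCatcherLoop",
    "kWaitingForDeoptimization", "kWaitingForMethodTracingStart",
    "kWaitingForVisitObjects", "kWaitingForGetObjectsAllocated",
    "kWaitingWeakGcRootRead", "kWaitingForGcThreadFlip", "kStarting",
    "kNative", "kSuspended",
]
FIRST_STATE = 66  # kTerminated = 66 in the enum above

# Assumed flag bits (ThreadFlag in art/runtime/thread.h); verify before relying on them.
THREAD_FLAGS = {
    1: "kSuspendRequest",
    2: "kCheckpointRequest",
    4: "kActiveSuspendBarrier",
}


def decode_state_and_flags(as_int):
    """Split as_int into (state name, list of set flag names)."""
    flags = as_int & 0xFFFF
    state = as_int >> 16
    name = THREAD_STATES[state - FIRST_STATE]
    set_flags = [n for bit, n in sorted(THREAD_FLAGS.items()) if flags & bit]
    return name, set_flags


print(decode_state_and_flags(5832705))  # ('kNative', ['kSuspendRequest'])
print(decode_state_and_flags(4390917))  # ('kRunnable', ['kSuspendRequest', 'kActiveSuspendBarrier'])
```

The first value is from the main thread (tid 3060), which is in kNative; the second is from tid 3093, the one thread still in kRunnable with a suspend request pending.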

tls32_ = {
    state_and_flags = {
      as_struct = {
        flags = 5, 
        state = 67
      }, 
      as_atomic_int = {
        <std::__1::atomic<int>> = {
          <std::__1::__atomic_base<int, true>> = {
            <std::__1::__atomic_base<int, false>> = {
              __a_ = 4390917
            }, <No data fields>}, <No data fields>}, <No data fields>}, 
      as_int = 4390917
    }, 
    suspend_count = 1, 
    debug_suspend_count = 0, 
    thin_lock_thread_id = 20, 
    tid = 3093, 
    daemon = 0, 
    throwing_OutOfMemoryError = 0, 
    no_thread_suspension = 0, 
    thread_exit_check_count = 0, 
    handling_signal_ = 0, 
    suspended_at_suspend_check = 0, 
    ready_for_debug_invoke = 0, 
    debug_method_entry_ = 0, 
    is_gc_marking = 0, 
    weak_ref_access_enabled = 1, 
    disable_thread_flip_count = 0
  }
(gdb) t 176
[Switching to thread 176 (LWP 3093)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41  bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000796226eb84 in __futex (op=<optimized out>, value=<optimized out>, timeout=0x0, bitset=-1, ftx=<optimized out>) at bionic/libc/private/bionic_futex.h:48
#2  __futex_wait_ex (value=<optimized out>, ftx=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout=<optimized out>) at bionic/libc/private/bionic_futex.h:70
#3  __pthread_normal_mutex_lock (abs_timeout_or_null=<optimized out>, mutex=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:327
#4  __pthread_mutex_lock_with_timeout (mutex=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout_or_null=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:430
#5  0x0000007961ad0354 in android::android_content_AssetManager_applyStyle (env=0x795127c740, themeToken=520810520368, defStyleAttr=<optimized out>, defStyleRes=16974731, xmlParserToken=1982366608, attrs=0x795f0bdcb4 <art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+256>, outValues=0x795f197c50 <art::gc::allocator::RosAlloc::AllocFromRun(art::Thread*, unsigned long, unsigned long*, unsigned long*, unsigned long*)+348>, outIndices=0x7942b9dee0, clazz=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1430
#6  0x00000000748f3ecc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

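This stack shows that the thread has already entered a JNI function, so it should be in the kNative state, yet it is kRunnable. That is odd, so let's look at the code that runs on entry to a JNI function: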

extern uint32_t JniMethodStart(Thread* self) {
  JNIEnvExt* env = self->GetJniEnv();
  DCHECK(env != nullptr);
  uint32_t saved_local_ref_cookie = env->local_ref_cookie;
  env->local_ref_cookie = env->locals.GetSegmentState();
  ArtMethod* native_method = *self->GetManagedStack()->GetTopQuickFrame();
  if (!native_method->IsFastNative()) {  // if this JNI method is not fast native, leave the Runnable state
    // When not fast JNI we transition out of runnable.
    self->TransitionFromRunnableToSuspended(kNative);
  }
  return saved_local_ref_cookie;
}

So if the native method is a fast native method, its state stays kRunnable. Now look at where the JNI function android_content_AssetManager_applyStyle is registered:

{ "applyStyle","!(JIIJ[I[I[I)Z",(void*) android_content_AssetManager_applyStyle }

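The signature is registered with a leading "!", so this function really is a fast native method, which is why its state is kRunnable. Fast native presumably means a JNI method that returns quickly, so the state transition can be skipped as an optimization. But the stack above shows this fast native method waiting for a lock, and once it waits for a lock it may no longer finish quickly. Marking it as fast native therefore does not look appropriate. The leading "!" should be removed so that the thread becomes kNative on entering JNI, and ART would then not hang.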
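
To audit other registrations for the same pattern, a check along these lines could be used. It is a minimal sketch that relies only on the "!" prefix convention described above; the second table entry is made up for contrast, and the function names are hypothetical.

```python
REGISTRATIONS = [
    ("applyStyle", "!(JIIJ[I[I[I)Z"),      # from the article
    ("hypotheticalSlowCall", "(J)V"),      # made-up entry for contrast
]


def is_fast_native(signature):
    """True if the JNI signature carries the fast-native '!' prefix."""
    return signature.startswith("!")


def fast_native_methods(registrations):
    """Return the names of all methods registered as fast native."""
    return [name for name, sig in registrations if is_fast_native(sig)]


print(fast_native_methods(REGISTRATIONS))  # ['applyStyle']
```

Any method this reports that can block, for example on a mutex as in the stack above, is a candidate for the same fix.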
