[toc]
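1. Reference structure in Java 1.8
In JDK 1.8, the Reference classes live in the java.lang.ref package.
The main classes are Reference, SoftReference, WeakReference, PhantomReference, FinalReference, and Finalizer. The core of the package is the abstract class Reference, which all the other reference classes extend. SoftReference, WeakReference, and PhantomReference correspond to Java's soft, weak, and phantom references. A strong reference is the default relationship created by ordinary assignment, so it has no dedicated class. FinalReference exists mainly to support the Finalizer mechanism. Finalization has many known problems; JDK 9 deprecated it and introduced a Cleaner mechanism built on PhantomReference. This article covers JDK 1.8 only and does not go into JDK 9.
Another key class is ReferenceQueue, the queue to which reference objects are appended once the collector has processed them.
The relationships among the classes in the java.lang.ref package are shown in the figure below:
They can also be viewed with the Diagram feature in IDEA:
The reference classes are summarized in the table below: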
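Class | Reference type | Description |
---|---|---|
SoftReference | Soft reference | The garbage collector reclaims the referent when heap memory runs low |
WeakReference | Weak reference | The referent is reclaimed at the next garbage collection once it is only weakly reachable |
PhantomReference | Phantom reference | Does not affect the referent's lifetime; used only to be notified that the object has been reclaimed |
FinalReference | - | An internal class Java uses to implement finalization |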
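As a minimal sketch of how the three public reference classes in the table are created, the following example wraps one object in each of them. The class name `ReferenceKindsDemo` and the method `describe` are made up for illustration. It only exercises behavior that does not depend on GC timing: soft and weak references return the referent while it is strongly reachable, and `PhantomReference.get()` always returns null.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    static String describe() {
        Object target = new Object();                       // ordinary (strong) reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        // While 'target' is still strongly reachable, soft and weak references return it.
        boolean softOk = soft.get() == target;
        boolean weakOk = weak.get() == target;
        // A phantom reference never hands out its referent.
        boolean phantomNull = phantom.get() == null;
        return softOk + "," + weakOk + "," + phantomNull;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```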
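2. References and reachability
To understand Reference, we need a closer look at how GC works.
The previous article showed how the reference types defined by the JVM are used.
As we know, the GC decides whether to reclaim an object by searching downward from the GC Root nodes and computing reachability. There are five levels of reachability: four correspond to the four reference types, and the fifth is unreachable.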
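Reachability | Reference type | Description |
---|---|---|
Strongly reachable | Strong reference | An object is strongly reachable if some thread can reach it through strong references. |
Softly reachable | Soft reference | An object is softly reachable if it is not strongly reachable but can be reached through a soft reference. |
Weakly reachable | Weak reference | An object is weakly reachable if it is neither strongly nor softly reachable but can be reached through a weak reference. |
Phantom reachable | Phantom reference | An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and a phantom reference refers to it. |
Unreachable | - | An object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways. |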
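That is the concept of reachability. The following example illustrates it further:
In the example above, each of the objects A through D has exactly one reference: A a strong reference, B a soft reference, C a weak reference, and D a phantom reference. Their reachability is therefore: A strongly reachable, B softly reachable, C weakly reachable, D phantom reachable. E has no reference chain from a GC Root, so it is unreachable.
Now consider a more complex example:
- A still has only a strong reference, so A is strongly reachable
- B has two references, one strong and one soft; because B can be reached through the strong reference, B is strongly reachable
- C can only be reached through a weak reference, so it is weakly reachable
- D has a weak reference and a phantom reference, so it is weakly reachable
- E is strongly referenced by F, but no GC Root can reach it, so it is still unreachable.
These are the five kinds of reachability in the JVM. The JVM relies on the four subclasses of Reference to decide how the GC treats an object once it is no longer strongly reachable.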
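The rule that the strongest path wins (object B in the second example) can be sketched as follows. The class and method names are invented for illustration. `System.gc()` is only a hint, so the sketch asserts nothing about when a weakly reachable object is actually cleared; it only checks that an object with both a strong and a weak reference survives.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    /** An object with both a strong and a weak reference stays strongly reachable. */
    static boolean survivesWhileStronglyReachable() {
        Object b = new Object();                          // strong reference
        WeakReference<Object> weak = new WeakReference<>(b);
        System.gc();                                      // a hint only; may or may not collect
        // 'b' is used here, so it is still strongly reachable and cannot have been cleared.
        return weak.get() == b;
    }

    /** The referent is only weakly reachable once this method returns. */
    static WeakReference<Object> weakOnly() {
        return new WeakReference<>(new Object());
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileStronglyReachable());
        WeakReference<Object> w = weakOnly();
        System.gc();
        // Often null after a collection, but the timing is not guaranteed.
        System.out.println(w.get());
    }
}
```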
3. Reference source code
3.1 Core source code
First, the Reference source:
/**
* Abstract base class for reference objects. This class defines the
* operations common to all reference objects. Because reference objects are
* implemented in close cooperation with the garbage collector, this class may
* not be subclassed directly.
*
* @author Mark Reinhold
* @since 1.2
*/
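The comment says that this abstract class is the base class of all reference objects, that it defines the operations common to them, and that it is implemented in close cooperation with the garbage collector, so it may not be subclassed directly. In other words, the JVM treats this class and its subclasses specially: the GC code hard-codes recognition of the concrete classes SoftReference, WeakReference, PhantomReference, and so on, and handles their referent field specially, which is what produces the different reference semantics. Without that treatment, Reference would be no different from an ordinary class.
Reference implements two core functions:
- Implementing the specific reference types
- Letting users be notified after an object has been reclaimed
The first function should be clear by now. As for the second: how does the GC deliver a notification after reclaiming an object? For something as performance-critical as GC, a traditional callback model is clearly unsuitable. If a callback blocked or ran slowly during a Full GC, the whole JVM would stall. Reference therefore takes a different approach: the Reference whose referent was reclaimed is added to a queue, and users fetch it from the queue when they need to. This also explains why soft and weak references offer two constructors, one with a ReferenceQueue and one without. A phantom reference, whose only purpose is notification, must be used together with a ReferenceQueue.
Now let's look at the Reference source: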
public abstract class Reference<T> {
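// The referent is treated specially by the GC
private T referent; /* Treated specially by GC */
// After its referent is reclaimed, the Reference is added to this queue
volatile ReferenceQueue<? super T> queue;
/* -- Constructors -- */
// For users who only need the reference semantics and do not care about GC notifications, so no ReferenceQueue is required
Reference(T referent) {
this(referent, null);
}
// A queue is passed to the constructor; once the referent is reclaimed by the GC, the reference is added to that queue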
Reference(T referent, ReferenceQueue<? super T> queue) {
this.referent = referent;
this.queue = (queue == null) ? ReferenceQueue.NULL : queue;
}
}
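A minimal sketch of the two construction styles and the queue-based notification follows. The class name `QueueNotificationDemo` is made up. To stay independent of GC timing, the sketch calls the public `Reference.enqueue()` method itself as a stand-in for the collector; in a real program the JVM does the enqueueing after the referent becomes unreachable.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class QueueNotificationDemo {
    // Held in a static field so the collector never touches the references below.
    private static final Object REFERENT = new Object();

    static boolean notifiedThroughQueue() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> withQueue = new WeakReference<>(REFERENT, queue);
        WeakReference<Object> withoutQueue = new WeakReference<>(REFERENT);

        // enqueue() stands in for the GC here (an assumption made for the demo).
        boolean first = withQueue.enqueue();       // registered with a queue: succeeds
        boolean second = withoutQueue.enqueue();   // no queue registered: nothing to do

        // The consumer side: poll() is non-blocking and returns null when empty.
        Reference<?> polled = queue.poll();
        return first && !second && polled == withQueue && queue.poll() == null;
    }

    public static void main(String[] args) {
        System.out.println(notifiedThroughQueue());
    }
}
```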
3.2 Reference states
Reference defines the following states for a reference instance:
/* A Reference instance is in one of four possible internal states:
*
* Active: Subject to special treatment by the garbage collector. Some
* time after the collector detects that the reachability of the
* referent has changed to the appropriate state, it changes the
* instance's state to either Pending or Inactive, depending upon
* whether or not the instance was registered with a queue when it was
* created. In the former case it also adds the instance to the
* pending-Reference list. Newly-created instances are Active.
*
* Pending: An element of the pending-Reference list, waiting to be
* enqueued by the Reference-handler thread. Unregistered instances
* are never in this state.
*
* Enqueued: An element of the queue with which the instance was
* registered when it was created. When an instance is removed from
* its ReferenceQueue, it is made Inactive. Unregistered instances are
* never in this state.
*
* Inactive: Nothing more to do. Once an instance becomes Inactive its
* state will never change again.
*
* The state is encoded in the queue and next fields as follows:
*
* Active: queue = ReferenceQueue with which instance is registered, or
* ReferenceQueue.NULL if it was not registered with a queue; next =
* null.
*
* Pending: queue = ReferenceQueue with which instance is registered;
* next = this
*
* Enqueued: queue = ReferenceQueue.ENQUEUED; next = Following instance
* in queue, or this if at end of list.
*
* Inactive: queue = ReferenceQueue.NULL; next = this.
*
* With this scheme the collector need only examine the next field in order
* to determine whether a Reference instance requires special treatment: If
* the next field is null then the instance is active; if it is non-null,
* then the collector should treat the instance normally.
*
* To ensure that a concurrent collector can discover active Reference
* objects without interfering with application threads that may apply
* the enqueue() method to those objects, collectors should link
* discovered objects through the discovered field. The discovered
* field is also used for linking Reference objects in the pending list.
*/
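That is a long English comment. When studying the Java source, understanding comments like this is often more important than the code itself: the code only shows how something is implemented, while the reason it is implemented this way is frequently explained in the comments.
The comment divides the state of a Reference into four kinds:
State | Description |
---|---|
Active | A newly created instance is Active. Some time after the collector detects that the referent's reachability has changed to the appropriate state, it moves the instance to Pending or Inactive. If the instance was registered with a ReferenceQueue when it was created, it becomes Pending and is added to the pending-Reference list; otherwise it becomes Inactive |
Pending | The state of an instance on the pending-Reference list, waiting to be added to its ReferenceQueue by the Reference-handler thread. An instance that was not registered with a ReferenceQueue is never in this state |
Enqueued | The state of an instance that is on its ReferenceQueue. When it is removed from the ReferenceQueue it becomes Inactive |
Inactive | The final state of a Reference. Once an instance becomes Inactive, its state never changes again |
For these four states, the queue and next fields of Reference hold the following values: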
State | queue | next |
---|---|---|
Active | ReferenceQueue or ReferenceQueue.NULL | null |
Pending | ReferenceQueue | this |
Enqueued | ReferenceQueue.ENQUEUED | Next instance in the queue, or this if at the end of the list |
Inactive | ReferenceQueue.NULL | this |
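The transitions that user code can observe may be sketched with the public API alone. The class name `ReferenceStateDemo` is invented, and `enqueue()` is again used in place of the collector, so the Pending state (which exists only between the GC and the Reference-handler thread) never shows up here. `isEnqueued()` is deprecated in recent JDKs but is available in JDK 1.8.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceStateDemo {
    // Static field keeps the referent strongly reachable for the whole demo.
    private static final Object REFERENT = new Object();

    static String walkStates() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(REFERENT, queue);
        StringBuilder sb = new StringBuilder();

        sb.append(ref.isEnqueued());                    // Active: not on the queue yet
        sb.append(',').append(ref.enqueue());           // Active -> Enqueued
        sb.append(',').append(ref.isEnqueued());        // Enqueued
        sb.append(',').append(queue.poll() == ref);     // Enqueued -> Inactive
        sb.append(',').append(ref.isEnqueued());        // Inactive: no longer on the queue
        sb.append(',').append(ref.enqueue());           // Inactive is final: cannot re-enqueue
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(walkStates());
    }
}
```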
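The state diagram is shown below:
The comment above mentions both a pending-Reference list and a ReferenceQueue. What is the list for? One might expect the JVM to add a Reference directly to its ReferenceQueue after GC, but that is not what happens. The GC must stay fast, and the data in a ReferenceQueue does not need to be delivered immediately, so the GC only links the Reference into the pending-Reference list. This is a lightweight operation and very cheap. Reference has a static field, pending, which is the head of the pending-Reference list, and the discovered field is the pointer to the next node.
Let's look at the Reference source again: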
/* List of References waiting to be enqueued. The collector adds
* References to this list, while the Reference-handler thread removes
* them. This list is protected by the above lock object. The
* list uses the discovered field to link its elements.
*/
private static Reference<Object> pending = null;
/* When active: next element in a discovered reference list maintained by GC (or this if last)
* pending: next element in the pending list (or null if last)
* otherwise: NULL
*/
transient private Reference<T> discovered; /* used by VM */
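In other words, the GC takes Active references that are registered with a queue and whose referent's reachability has changed, and links them into the pending list.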
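The two-stage hand-off can be illustrated with a simplified model. Everything in this sketch is hypothetical: `PendingListModel`, `Node`, `discover`, and `handleOne` are invented names, a `BlockingQueue` stands in for ReferenceQueue, and a plain method call stands in for the collector. It is not the JDK implementation; it only shows the shape of the design, in which the producer does a cheap push under a lock and a separate thread does the enqueueing.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PendingListModel {
    /** Hypothetical stand-in for a Reference; 'discovered' links the pending list. */
    static final class Node {
        final String name;
        Node discovered;
        Node(String name) { this.name = name; }
    }

    private final Object lock = new Object();
    private Node pending;                                   // head of the pending list
    final BlockingQueue<String> userQueue = new LinkedBlockingQueue<>();

    /** "Collector" side: push onto the pending list and wake the handler. */
    void discover(Node n) {
        synchronized (lock) {
            n.discovered = pending;
            pending = n;
            lock.notifyAll();
        }
    }

    /** "Handler" side: unlink one node under the lock, enqueue it outside the lock. */
    void handleOne() throws InterruptedException {
        Node n;
        synchronized (lock) {
            while (pending == null) {
                lock.wait();
            }
            n = pending;
            pending = n.discovered;
            n.discovered = null;
        }
        userQueue.add(n.name);
    }

    Thread startHandler() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    handleOne();
                }
            } catch (InterruptedException e) {
                // The model's handler simply stops when interrupted.
            }
        }, "model-handler");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Pushes one node and waits for the handler to deliver it. */
    static String runOnce() {
        PendingListModel model = new PendingListModel();
        Thread handler = model.startHandler();
        model.discover(new Node("ref-1"));
        try {
            return model.userQueue.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            handler.interrupt();
        }
    }
}
```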
3.3 ReferenceHandler
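As mentioned above, the GC only adds a reference to the pending-Reference list. When is it moved to its ReferenceQueue? That work is done by a dedicated thread, ReferenceHandler, which is a nested class of Reference. For thread safety there is also a global lock: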
/* Object used to synchronize with the garbage collector. The collector
* must acquire this lock at the beginning of each collection cycle. It is
* therefore critical that any code holding this lock complete as quickly
* as possible, allocate no new objects, and avoid calling user code.
*/
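// During a collection the GC must acquire this lock on Reference, which synchronizes it with the ReferenceHandler thread and avoids races.
// Because the GC also needs this lock, code in ReferenceHandler that holds it must finish quickly, allocate no new objects, and call no user code, so that it does not interfere with the GC.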
static private class Lock { }
private static Lock lock = new Lock();
/* High-priority thread to enqueue pending References
*/
private static class ReferenceHandler extends Thread {
private static void ensureClassInitialized(Class<?> clazz) {
try {
Class.forName(clazz.getName(), true, clazz.getClassLoader());
} catch (ClassNotFoundException e) {
throw (Error) new NoClassDefFoundError(e.getMessage()).initCause(e);
}
}
static {
// pre-load and initialize InterruptedException and Cleaner classes
// so that we don't get into trouble later in the run loop if there's
// memory shortage while loading/initializing them lazily.
ensureClassInitialized(InterruptedException.class);
ensureClassInitialized(Cleaner.class);
}
ReferenceHandler(ThreadGroup g, String name) {
super(g, name);
}
public void run() {
while (true) {
tryHandlePending(true);
}
}
}
The thread's core logic is in tryHandlePending:
/**
* Try handle pending {@link Reference} if there is one.<p>
* Return {@code true} as a hint that there might be another
* {@link Reference} pending or {@code false} when there are no more pending
* {@link Reference}s at the moment and the program can do some other
* useful work instead of looping.
*
* @param waitForNotify if {@code true} and there was no pending
* {@link Reference}, wait until notified from VM
* or interrupted; if {@code false}, return immediately
* when there is no pending {@link Reference}.
* @return {@code true} if there was a {@link Reference} pending and it
* was processed, or we waited for notification and either got it
* or thread was interrupted before being notified;
* {@code false} otherwise.
*/
static boolean tryHandlePending(boolean waitForNotify) {
Reference<Object> r;
Cleaner c;
try {
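// Acquire the lock to avoid operating on the list concurrently with the garbage collector
synchronized (lock) {
// Check whether the pending-Reference list has any entries
if (pending != null) {
// There is a pending Reference; take it from the head of the list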
r = pending;
// 'instanceof' might throw OutOfMemoryError sometimes
// so do this before un-linking 'r' from the 'pending' chain...
c = r instanceof Cleaner ? (Cleaner) r : null;
// unlink 'r' from 'pending' chain
pending = r.discovered;
r.discovered = null;
} else {
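// No pending Reference: wait on the lock.
//
// Waiting on the lock can throw an OOME:
// an InterruptedException may occur, which requires instantiating the exception object,
// and if memory is short at that moment an OOME can be thrown. OutOfMemoryError is therefore caught
// so that the ReferenceHandler thread does not exit silently because of an OOME.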
// The waiting on the lock may cause an OutOfMemoryError
// because it may try to allocate exception objects.
if (waitForNotify) {
lock.wait();
}
// retry if waited
return waitForNotify;
}
}
} catch (OutOfMemoryError x) {
// Give other threads CPU time so they hopefully drop some live references
// and GC reclaims some space.
// Also prevent CPU intensive spinning in case 'r instanceof Cleaner' above
// persistently throws OOME for some time...
Thread.yield();
// retry
return true;
} catch (InterruptedException x) {
// retry
return true;
}
// If the reference is a Cleaner, invoke its clean method
// Fast path for cleaners
if (c != null) {
c.clean();
return true;
}
ReferenceQueue<? super Object> q = r.queue;
// Enqueue the reference unless it was not registered with a queue (ReferenceQueue.NULL)
if (q != ReferenceQueue.NULL) q.enqueue(r);
return true;
}
The ReferenceHandler thread is started in the static initializer of the Reference class:
static {
ThreadGroup tg = Thread.currentThread().getThreadGroup();
for (ThreadGroup tgn = tg;
tgn != null;
tg = tgn, tgn = tg.getParent());
Thread handler = new ReferenceHandler(tg, "Reference Handler");
/* If there were a special system-only priority greater than
* MAX_PRIORITY, it would be used here
*/
handler.setPriority(Thread.MAX_PRIORITY);
handler.setDaemon(true);
handler.start();
// provide access in SharedSecrets
SharedSecrets.setJavaLangRefAccess(new JavaLangRefAccess() {
@Override
public boolean tryHandlePendingReference() {
return tryHandlePending(false);
}
});
}
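As shown, ReferenceHandler runs at Thread.MAX_PRIORITY, the highest priority. Its main job is to move References from the pending-Reference list to their ReferenceQueue. Note that, to avoid interfering with the GC, the code in ReferenceHandler that holds the lock allocates no new objects and calls no user code.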
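The handler thread can be observed from ordinary code. The sketch below looks for a live thread with the name used in the static initializer above, "Reference Handler". The class name `FindReferenceHandler` is made up, and whether the thread is visible through `Thread.getAllStackTraces()` is an assumption that holds on HotSpot-based JDKs as far as I know; other JVMs may differ.

```java
class FindReferenceHandler {
    /** Returns the live thread named "Reference Handler", or null if none is visible. */
    static Thread find() {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if ("Reference Handler".equals(t.getName())) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Thread t = find();
        if (t == null) {
            System.out.println("Reference Handler thread not visible on this JVM");
        } else {
            System.out.println(t.getName()
                    + " priority=" + t.getPriority()
                    + " daemon=" + t.isDaemon());
        }
    }
}
```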
4. ReferenceQueue
Now let's look at the ReferenceQueue source.
/**
* Reference queues, to which registered reference objects are appended by the
* garbage collector after the appropriate reachability changes are detected.
*
* @author Mark Reinhold
* @since 1.2
*/
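Once a reference has been registered with a queue, it is appended to that queue after the GC detects the appropriate reachability change. The queue itself is also a linked list.
// Head node of the reference list
private volatile Reference<? extends T> head = null;
// Queue length: incremented on enqueue, decremented on dequeue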
private long queueLength = 0;
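To work correctly under multiple threads, it also uses a lock:
// Static nested class used as the lock object
static private class Lock { };
/* Mutex that synchronizes enqueue (called by ReferenceHandler) with the remove and poll operations called by user threads */
private Lock lock = new Lock();
// Marks a reference that is not registered with any queue
static ReferenceQueue<Object> NULL = new Null<>();
// Marks a reference that is already on its queue
static ReferenceQueue<Object> ENQUEUED = new Null<>();
The key method is enqueue: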
boolean enqueue(Reference<? extends T> r) { /* Called only by Reference class */
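// Acquire the lock
synchronized (lock) {
// Check whether the reference still needs to be enqueued
// Check that since getting the lock this reference hasn't already been
// enqueued (and even then removed)
ReferenceQueue<?> queue = r.queue;
// If the reference's queue is ReferenceQueue.NULL or ReferenceQueue.ENQUEUED, enqueueing fails and false is returned
if ((queue == NULL) || (queue == ENQUEUED)) {
return false;
}
assert queue == this;
// Mark the reference as enqueued by replacing its queue field with the shared ENQUEUED sentinel
r.queue = ENQUEUED;
// If the list is empty, r points to itself to mark the end of the list; otherwise the old head becomes r's successor
r.next = (head == null) ? r : head;
// r becomes the new head: every newly enqueued reference is inserted at the head, ahead of the existing ones
head = r;
// Increment the queue length
queueLength++;
// FinalReference is special-cased: the VM keeps a count of them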
if (r instanceof FinalReference) {
sun.misc.VM.addFinalRefCount(1);
}
lock.notifyAll();
return true;
}
}
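The poll and reallyPoll methods:
// Internal poll operation of the reference queue; must be called with the lock held
private Reference<? extends T> reallyPoll() { /* Must hold lock */
Reference<? extends T> r = head;
if (r != null) {
@SuppressWarnings("unchecked")
Reference<? extends T> rn = r.next;
// Advance head to the next node; if next points to r itself, r was the last element and the queue becomes empty
head = (rn == r) ? null : rn;
r.queue = NULL;
// Self-link the removed node: next = this encodes the Inactive state and prevents the node from being dequeued twice
r.next = r;
// Decrement the queue length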
queueLength--;
if (r instanceof FinalReference) {
sun.misc.VM.addFinalRefCount(-1);
}
return r;
}
return null;
}
// Public poll operation: acquires the lock and then calls reallyPoll
public Reference<? extends T> poll() {
if (head == null)
return null;
synchronized (lock) {
return reallyPoll();
}
}
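The remove method, which removes the next reference from the queue:
// Removes the next reference from the queue, blocking until one is available or the timeout expires; it relies on reallyPoll and on the wait/notify mechanism provided by Object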
public Reference<? extends T> remove(long timeout)
throws IllegalArgumentException, InterruptedException
{
if (timeout < 0) {
throw new IllegalArgumentException("Negative timeout value");
}
synchronized (lock) {
Reference<? extends T> r = reallyPoll();
if (r != null) return r;
long start = (timeout == 0) ? 0 : System.nanoTime();
for (;;) {
lock.wait(timeout);
r = reallyPoll();
if (r != null) return r;
if (timeout != 0) {
long end = System.nanoTime();
timeout -= (end - start) / 1000_000;
if (timeout <= 0) return null;
start = end;
}
}
}
}
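A short usage sketch of `remove(long timeout)`: on an empty queue it blocks for roughly the timeout and returns null, and when an element is already present it returns at once. The class name `RemoveTimeoutDemo` is invented, and `enqueue()` is called by hand in place of the collector so that the result does not depend on GC timing.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class RemoveTimeoutDemo {
    // Static field keeps the referent strongly reachable for the whole demo.
    private static final Object REFERENT = new Object();

    static String demo() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        try {
            // Empty queue: waits about 50 ms, then gives up and returns null.
            Reference<?> none = queue.remove(50);

            WeakReference<Object> ref = new WeakReference<>(REFERENT, queue);
            ref.enqueue();                           // stand-in for the GC
            // Element already queued: returns without waiting.
            Reference<?> got = queue.remove(50);

            return (none == null) + "," + (got == ref);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // preserve the interrupt status
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```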
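As we can see, ReferenceQueue itself stores only the head node of the Reference list. The nodes of the list are the Reference instances themselves, linked through their next field. ReferenceQueue provides the enqueue, poll, and remove operations on that list.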
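The intrusive list can be modelled in a few lines. This is a simplified sketch, not the JDK code: `IntrusiveQueueModel` and `Node` are invented names, and the model uses `next != null` to detect an already enqueued or already removed node, whereas the real class tracks this through the queue field. It keeps the two conventions from the source above: insertion at the head, and a self-link that marks both the tail of the list and a removed node.

```java
class IntrusiveQueueModel {
    /** Hypothetical stand-in for Reference: the link lives in the element itself. */
    static final class Node {
        final String name;
        Node next;                          // null until the node is enqueued
        Node(String name) { this.name = name; }
    }

    private Node head;                      // the queue stores only the head

    synchronized boolean enqueue(Node n) {
        if (n.next != null) {
            return false;                   // already enqueued, or already removed
        }
        n.next = (head == null) ? n : head; // the last element points to itself
        head = n;                           // new elements go in at the head
        return true;
    }

    synchronized Node poll() {
        Node n = head;
        if (n == null) {
            return null;
        }
        Node rest = n.next;
        head = (rest == n) ? null : rest;   // a self-link means n was the last element
        n.next = n;                         // self-link the removed node
        return n;
    }

    static String demo() {
        IntrusiveQueueModel q = new IntrusiveQueueModel();
        Node a = new Node("a");
        Node b = new Node("b");
        q.enqueue(a);
        q.enqueue(b);
        // Head insertion makes the order last-in, first-out.
        return q.poll().name + q.poll().name + (q.poll() == null) + q.enqueue(a);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One consequence of head insertion is that elements come out in last-in, first-out order, so callers should not rely on the queue preserving the order in which references were enqueued.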
The complete relationship between Reference and ReferenceQueue is shown in the figure below:
5. Source of the other Reference classes
5.1 SoftReference
The implementation of SoftReference is simple: it extends Reference and only adds a timestamp.
/**
* Timestamp clock, updated by the garbage collector
*/
static private long clock;
/**
* Timestamp updated by each invocation of the get method. The VM may use
* this field when selecting soft references to be cleared, but it is not
* required to do so.
*/
private long timestamp;
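SoftReference has a global variable, clock (the class variable clock of java.lang.ref.SoftReference), which holds the time of the most recent GC in milliseconds; the collector resets it on every GC. Each SoftReference instance also has a timestamp field, which is set to the current value of clock (the time of the most recent GC) whenever the referent is successfully obtained through the SoftReference. The timestamp of an instance therefore records the time of the most recent GC as of the last access to that instance.
The get method is as follows: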
/**
* Returns this reference object's referent. If this reference object has
* been cleared, either by the program or by the garbage collector, then
* this method returns <code>null</code>.
*
* @return The object to which this reference refers, or
* <code>null</code> if this reference object has been cleared
*/
public T get() {
T o = super.get();
if (o != null && this.timestamp != clock)
this.timestamp = clock;
return o;
}
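Each call to get only updates this timestamp. The GC updates clock on every collection, and get copies clock into timestamp, so the difference between the two shows how long the SoftReference has gone unaccessed. The GC uses that to decide whether to clear it.
When a GC occurs, two factors determine whether the referent of a SoftReference is reclaimed:
1. How old the timestamp of the SoftReference instance is;
2. How much free memory is available.
The reclamation process itself is not covered in detail here.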
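How the two factors could combine is sketched below. This is a model based on my understanding of HotSpot's LRU policy and its `-XX:SoftRefLRUPolicyMSPerMB` flag, not code taken from the JDK: the class `SoftRefPolicyModel`, the method `shouldClear`, and the constant are all invented for illustration, and the actual policy is an implementation detail of the VM that may differ between collectors and versions.

```java
class SoftRefPolicyModel {
    /** Assumed default of -XX:SoftRefLRUPolicyMSPerMB: 1000 ms of grace per free MB. */
    static final long MS_PER_FREE_MB = 1000;

    /**
     * Models the decision: clear the soft reference when it has been idle
     * longer than the grace period allowed by the amount of free heap.
     *
     * @param clock      time of the most recent GC, in ms
     * @param timestamp  value of clock when the reference was last read via get()
     * @param freeHeapMb free heap after the last GC, in MB
     */
    static boolean shouldClear(long clock, long timestamp, long freeHeapMb) {
        long idleMs = clock - timestamp;
        long maxIdleMs = freeHeapMb * MS_PER_FREE_MB;
        return idleMs > maxIdleMs;
    }

    public static void main(String[] args) {
        // Idle for 10 s with 5 MB free: exceeds the 5 s grace period.
        System.out.println(shouldClear(10_000, 0, 5));
        // Idle for 2 s with 5 MB free: still within the grace period.
        System.out.println(shouldClear(10_000, 8_000, 5));
    }
}
```

Under this model, the less free memory there is, the shorter the grace period, which matches the intuition that soft references are cleared more aggressively as the heap fills up.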
5.2 WeakReference
WeakReference declares only constructors; all of its other methods are inherited from Reference.
/**
* Creates a new weak reference that refers to the given object. The new
* reference is not registered with any queue.
*
* @param referent object the new weak reference will refer to
*/
public WeakReference(T referent) {
super(referent);
}
/**
* Creates a new weak reference that refers to the given object and is
* registered with the given queue.
*
* @param referent object the new weak reference will refer to
* @param q the queue with which the reference is to be registered,
* or <tt>null</tt> if registration is not required
*/
public WeakReference(T referent, ReferenceQueue<? super T> q) {
super(referent, q);
}
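A common way to use the two-argument constructor is to subclass WeakReference so that each reference remembers which entry it belongs to, and to purge stale entries by polling the queue. The sketch below is a minimal, single-threaded illustration of that pattern; `WeakValueCache`, `Entry`, and `expunge` are invented names, and the demo calls `enqueue()` itself to simulate the collector.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class WeakValueCache {
    /** Weak reference that remembers its key so the stale map entry can be found. */
    static final class Entry extends WeakReference<Object> {
        final String key;
        Entry(String key, Object value, ReferenceQueue<Object> q) {
            super(value, q);
            this.key = key;
        }
    }

    private final Map<String, Entry> map = new HashMap<>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    void put(String key, Object value) {
        expunge();
        map.put(key, new Entry(key, value, queue));
    }

    Object get(String key) {
        expunge();
        Entry e = map.get(key);
        return e == null ? null : e.get();
    }

    int size() {
        expunge();
        return map.size();
    }

    /** Drops every entry whose reference has shown up on the queue. */
    private void expunge() {
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Entry e = (Entry) r;
            map.remove(e.key, e);           // only if the key still maps to this entry
        }
    }

    static String demo() {
        WeakValueCache cache = new WeakValueCache();
        Object value = new Object();
        cache.put("k", value);
        boolean hit = cache.get("k") == value;
        cache.map.get("k").enqueue();       // stand-in for the GC clearing the value
        return hit + "," + cache.size();
    }
}
```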
5.3 PhantomReference
PhantomReference has only one constructor, which takes a ReferenceQueue, so in practice it must be used together with a ReferenceQueue.
/**
* Creates a new phantom reference that refers to the given object and
* is registered with the given queue.
*
* <p> It is possible to create a phantom reference with a <tt>null</tt>
* queue, but such a reference is completely useless: Its <tt>get</tt>
* method will always return null and, since it does not have a queue, it
* will never be enqueued.
*
* @param referent the object the new phantom reference will refer to
* @param q the queue with which the reference is to be registered,
* or <tt>null</tt> if registration is not required
*/
public PhantomReference(T referent, ReferenceQueue<? super T> q) {
super(referent, q);
}
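So at the code level PhantomReference and WeakReference differ very little: PhantomReference offers only the constructor that takes a queue, and its get method always returns null.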
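A typical use of a phantom reference is post-mortem cleanup: a subclass carries the cleanup action, and a consumer runs it when the reference appears on the queue. The sketch below is a minimal illustration under stated assumptions: `PhantomCleanupDemo` and `ResourceRef` are invented names, and `enqueue()` is called by hand to simulate the collector. Note that the cleanup action must not capture the owner object, otherwise the owner would stay strongly reachable and never be collected.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomCleanupDemo {
    /** Phantom reference that carries the action to run once its owner is gone. */
    static final class ResourceRef extends PhantomReference<Object> {
        final Runnable cleanup;
        ResourceRef(Object owner, ReferenceQueue<Object> q, Runnable cleanup) {
            super(owner, q);
            this.cleanup = cleanup;
        }
    }

    static String demo() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        StringBuilder log = new StringBuilder();
        Object owner = new Object();
        ResourceRef ref = new ResourceRef(owner, queue, () -> log.append("released"));

        // get() never returns the referent, even while the owner is alive.
        log.append(ref.get() == null).append(',');

        ref.enqueue();                      // stand-in for the GC

        // Consumer side: drain the queue and run each cleanup action.
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            ((ResourceRef) r).cleanup.run();
            r.clear();
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```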
Finalizer and FinalReference will be covered separately in a later article.