
Friday, January 2, 2015

Understanding about CMSInitiatingOccupancyFraction and UseCMSInitiatingOccupancyOnly

While reading Useful JVM Flags – Part 7 (CMS Collector), I got the impression that CMSInitiatingOccupancyFraction is useless when UseCMSInitiatingOccupancyOnly is false (the default), except for the first CMS collection:
We can use the flag -XX:+UseCMSInitiatingOccupancyOnly to instruct the JVM not to base its decision when to start a CMS cycle on run time statistics. Instead, when this flag is enabled, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS cycle, not just for the first one.
After checking the source code, I found that this statement is inaccurate; a more accurate statement would be:
When UseCMSInitiatingOccupancyOnly is false (the default), a CMS collection may be triggered even if the actual occupancy is lower than the specified CMSInitiatingOccupancyFraction value. At the same time, whenever the actual occupancy is greater than the specified CMSInitiatingOccupancyFraction value, a CMS collection will still be triggered.

Detailed Explanation

Code snippet from OpenJDK (openjdk/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp):

  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }

  // Otherwise, we start a collection cycle if either the perm gen or
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }
In the above code, _cmsGen->should_concurrent_collect() is always called, unless it has already been determined that a collection is needed. In the implementation of _cmsGen->should_concurrent_collect(), the CMSInitiatingOccupancyFraction value is checked at the beginning.

bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {

  assert_lock_strong(freelistLock());
  if (occupancy() > initiating_occupancy()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because of occupancy %f / %f  ",
        short_name(), occupancy(), initiating_occupancy());
    }
    return true;
  }
  if (UseCMSInitiatingOccupancyOnly) {
    return false;
  }
  if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because expanded for allocation ",
        short_name());
    }
    return true;
  }
  if (_cmsSpace->should_concurrent_collect()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because cmsSpace says so ",
        short_name());
    }
    return true;
  }
  return false;
}
From the above code, it is also easy to see that CMSBootstrapOccupancy (exposed as _bootstrap_occupancy in the first snippet) is used for the first collection when UseCMSInitiatingOccupancyOnly is false.

Summary

UseCMSInitiatingOccupancyOnly needs to be set to true only if you want to prevent CMS collections from starting early, before occupancy reaches the specified value. That is usually not what you want when CMSInitiatingOccupancyFraction is set to a small value: for example, if your application allocates direct buffers frequently, you may want garbage to be collected even though old generation utilization is quite low.
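
If you want to double-check which values a running JVM actually ended up with, the two flags can be queried at runtime. Below is a minimal sketch assuming a HotSpot JVM; it uses the non-standard com.sun.management.HotSpotDiagnosticMXBean API, and the class name PrintCmsFlags is only for illustration:

import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class PrintCmsFlags {
    public static void main(String[] args) {
        // Look up the HotSpot-specific diagnostic bean and print the effective
        // values of the two flags discussed above.
        HotSpotDiagnosticMXBean diag =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {"CMSInitiatingOccupancyFraction",
                                         "UseCMSInitiatingOccupancyOnly"}) {
            System.out.println(flag + " = " + diag.getVMOption(flag).getValue());
        }
    }
}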

Wednesday, December 31, 2014

Java RSS increase caused by memory fragmentation

Recently, I ran into a strange memory-related problem in our production system: the RSS (resident set size) of the Java process kept increasing over time. The Java heap utilization was less than 50%, so it looked like there could be a native memory leak, but it turned out to be something else.

Leaking Direct Buffers?

Direct buffers are one of the usual suspects for native memory leaks, so I first checked them with the tool from Alan Bateman's blog. It showed the direct buffers as follows:
          direct                        mapped
 Count   Capacity     Memory   Count   Capacity     Memory
   419  123242031  123242031       0          0          0
   419  123242031  123242031       0          0          0
   421  123299674  123299674       0          0          0
There was no strong evidence that the growth was caused by direct buffers.
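
For reference, the same kind of numbers can also be read from inside the JVM through the standard BufferPoolMXBean API available since Java 7; a minimal sketch (the class name is only for illustration):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferUsage {
    public static void main(String[] args) {
        // Print count / total capacity / memory used for each buffer pool,
        // i.e. the "direct" and "mapped" pools shown in the table above.
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-7s count=%d capacity=%d memory=%d%n",
                pool.getName(), pool.getCount(),
                pool.getTotalCapacity(), pool.getMemoryUsed());
        }
    }
}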

Per-thread malloc?

While checking the memory usage of the Java process with pmap, I found some strange 64MB memory blocks, similar to those described in Lex Chou's blog (Chinese). So I tried setting the MALLOC_ARENA_MAX environment variable. Unfortunately, that did not resolve the problem.
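
One detail worth keeping in mind is that MALLOC_ARENA_MAX must already be in the environment when the JVM process starts; it cannot be changed from inside a running JVM. Below is a minimal sketch of injecting it when launching a child JVM, where app.jar and the value 4 are only placeholders:

import java.io.IOException;

public class LaunchWithArenaLimit {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Put MALLOC_ARENA_MAX into the child's environment before it starts,
        // then launch the target JVM and wait for it to finish.
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", "app.jar");
        pb.environment().put("MALLOC_ARENA_MAX", "4");
        pb.inheritIO().start().waitFor();
    }
}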

Native Heap Fragmentation?

With further investigation, I found that this problem could be caused by memory fragmentation, as described in this bug report. The glibc malloc() implementation works fine for general applications, but it cannot (and does not need to) fit every allocation pattern equally well.
By using gdb, I found the real evidence:

gdb --pid <pid>
(gdb) call malloc_stats()
And got the following output:

Arena 0:
system bytes     = 2338504704
in use bytes     =   69503376
Arena 1:
system bytes     =   48705536
in use bytes     =   19162544
Arena 2:
system bytes     =     806912
in use bytes     =     341776
Arena 3:
system bytes     =   17965056
in use bytes     =   17505488
Total (incl. mmap):
system bytes     = 2444173312
in use bytes     =  144704288
max mmap regions =         59
max mmap bytes   =  154546176
So about 2.4GB of memory had been allocated from the system, but only about 144MB (roughly 6%) was actually in use. This is a strong indicator of fragmentation, so I set MALLOC_MMAP_THRESHOLD_ to 131072 and monitored the result. The RSS could now drop back down after running for a long time, but it still rose too high (9GB).

Conclusion

After monitoring the application for a long time, it turned out that the actual problem is complicated and caused by multiple factors. First, native heap fragmentation is the major contributor. Second, this application creates lots of transient objects, and some direct byte buffers are kept alive a little longer, which means those byte buffers are promoted to the old generation by the frequent young GCs. After that, the old generation is rarely collected because it never gets full, so those byte buffers (and the native memory they hold) are never garbage collected.
To resolve this problem, a small CMSInitiatingOccupancyFraction is now used together with the UseCMSInitiatingOccupancyOnly option, and the total RSS looks quite stable.

Thursday, December 11, 2008

A Dispute over Byte Streams (Part 1)

For quite a while now I have been arguing with a British colleague. There were actually only a few emails back and forth, but each reply took him a day or two, so the whole thing dragged on for a long time and in the end was never settled. Apparently my influence still needs a lot of work.

The problem itself is simple. Our code needs to serialize a data object which, before serialization, is held as many byte arrays. The object works somewhat like a ByteArrayOutputStream: it grows gradually, and we don't know up front how big it will get. If we used a single array, we would have to keep allocating ever larger blocks of memory while it grows, which could lead to out-of-memory errors. Since all we care about is the data, after deserialization it only needs to be restored into a single array.

So our main logic is roughly the following pseudocode:
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

class ValueObject implements Serializable {
    private transient List<byte[]> values; // chunks accumulated as the object grows
    private transient byte[] data;         // single array restored on the reading side
    private transient int totalLength;     // sum of the lengths of all chunks

    private void writeObject(ObjectOutputStream output) throws IOException {
        output.writeInt(totalLength); // write out the total length of bytes
        for (byte[] value : values) {
            output.write(value, 0, value.length); // write each chunk separately
        }
    }

    private void readObject(ObjectInputStream input) throws IOException {
        int length = input.readInt();
        data = new byte[length];
        input.read(data, 0, length); // read everything back as one array
    }
}

In other words, our basic idea is to write the data piece by piece and read it back in one go. There is nothing wrong with this logic, because a stream in Java means a stream of bytes; for example, OutputStream explicitly states:
This abstract class is the superclass of all classes representing an output stream of bytes.
Since it is a stream, how I write data into it and how the other end reads it back should be arbitrary; as long as the total number of bytes written and read match up, everything should be OK. Most I/O classes are implemented that way too, so this code has always worked well and never seemed to cause any problem.
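
To make that concrete, here is a minimal, self-contained sketch using plain in-memory streams: the payload is written in several chunks and read back in one pass, and the reader never notices how the writer split it up (the read loop is only there because InputStream.read may legally return fewer bytes than requested):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamGranularity {
    public static void main(String[] args) throws IOException {
        // Write the same logical payload in three separate chunks.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3});
        out.write(new byte[] {4, 5});
        out.write(new byte[] {6, 7, 8, 9});

        // Read everything back as one array; only the total byte count matters.
        byte[] data = new byte[9];
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        int off = 0;
        while (off < data.length) {
            int n = in.read(data, off, data.length - off);
            if (n < 0) {
                throw new IOException("unexpected end of stream");
            }
            off += n;
        }
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}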

However, we still ran into a problem: when our object was transferred over RMI-IIOP, it actually caused an error. After some testing we found a strange phenomenon: the IIOP I/O streams do not behave like ordinary streams. Specifically, when the arrays are large, writing a byte array means the other side must read exactly one byte array back; if you write twice, you must read twice. And this only goes wrong when the byte arrays are large; small arrays are fine.

Why does such a strange problem exist? After carefully reading the relevant implementation, CDRInputStream, I found that IIOP transmits data in blocks. For example, if this side performs many small writes and the data fits within one block (i.e. it is no larger than the block size), it is all transmitted as a single block; but a byte array larger than the block size becomes a block of its own. The catch is that blocks have boundary markers, and a reader should check those markers. CDRInputStream does in fact do this, but unfortunately, when the user reads a byte array it assumes the requested length is exactly the size of the current block and never checks whether the read would cross the block boundary. That is where the problem arises.

Any working programmer reading this will realize almost immediately: isn't this just a simple bug? Just check the boundary. So I promptly filed a bug against the team in our company responsible for the JDK ORB, asking them to fix it. I assumed it would be an easy change, but to my surprise this too turned into an endless back-and-forth, and in the end it could not be resolved at all!

The details are a bit complicated; I will write about them next time.