Friday, November 19, 2021

Light tinkering with the China Unicom HN8346X6 fiber modem

The HN8346X6 provided by China Unicom is decent, and its built-in Wi-Fi 6 reviews well online, but the Unicom firmware's web UI omits anything even slightly advanced, such as bridge mode and port forwarding. To get back these features, which the modem itself actually supports, I spent some time tinkering.

My first thought was to find the administrator password. According to methods posted online, the first step is to restore the full shell. After a lot of searching, I finally found someone who shared a way to enable it:
https://www.right.com.cn/forum/thread-4060870-1-1.html

Disconnect the fiber, connect a wired client to any LAN port, then in the latest ONT tool choose the upgrade option, load the shell-enabling file (it appears to be just a configuration file, so it is very small), and wait for it to report success.
After that you can telnet in with account root and password adminHW. Once logged in, run su, compute the challenge response with the Huawei password calculator, and after it is accepted type shell to drop into the BusyBox shell.
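Put together, the console session looks roughly like the transcript below. Note that 192.168.1.1 is the usual factory-default LAN address and is my assumption, not something confirmed for this model; substitute your modem's actual address.

```
telnet 192.168.1.1        # assumed factory-default LAN address
  Login:    root
  Password: adminHW
su                        # prints a challenge string
                          # answer it with the Huawei password calculator
shell                     # drops into the BusyBox shell
```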

Although I got root, found hw_ctree.xml, and decrypted it successfully, the file contains no administrator account entry at all. I tried adding one myself, but the modem's login page still refused the login; it turned out to be a check done in JavaScript. Since I am not a front-end engineer and didn't feel like fighting that code, I decided to flash the modem into Huawei mode instead, so I ran the bundled restorehwmode.sh, which restored the Huawei web UI.

A special warning: once restored to Huawei mode, it seems there is no way back to the Unicom UI, so before flashing be sure to record the INTERNET, IPTV, and multicast VLAN settings; otherwise you will have to hunt online for the values used in your area.

After the Huawei UI is restored, log in to the web interface with the Huawei default account telecomadmin and password admintelecom; the rest is straightforward.

Tools shared: https://115.com/s/swnenir3zb5 access code: b2d1

Wednesday, April 1, 2015

Hidden command options of ExpressCache (OEM version)

My Lenovo ThinkPad came with a 16GB SSD installed as a cache drive. I later installed a 128GB M.2 SSD as the primary OS drive, and after reinstalling Windows the cache software was gone, leaving the 16GB SSD useless.

Fortunately, the cache software is available for download from Lenovo's website, so I downloaded and installed it, realizing that the cache should be used for the hard drive only, not for the SSD. However, the bundled eccmd offers only very limited options:


ExpressCache Command Version 1.3.110.0
Copyright (c) 2010-2013 Condusiv Technologies.
Date Time: 1/21/2015 20:27:16:729 (THINKPAD #2)

USAGE:

 ECCmd [-NOLOGO]
       [-INFO | -PARTITION | -FORMAT]

 -NOLOGO              - No copyright message will be displayed.
 -INFO                - Display ExpressCache activity information.
 -PARTITION           - Create an ExpressCache partition.
         [DriveId]    - Optional drive ID
         [PartSize]   - Optional partition size in MB
 -FORMAT              - Format the cache volume.


After searching Google, I found that a previous version offered some additional options:

 -EXCLUDE             - Exclude a drive from being cached.
         DriveLetter  - Drive letter
 -CLEAREXCLUSIONS     - Clear all cache exclusions.
 -PRELOAD             - Preload a folder or a file into the cache
         FileName     - File or folder name to be cached
         [UsageCount] - Optional file usage count.

This is just what I need. I am keeping the list here since very few pages mention these hidden options.
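Going by the option list above, excluding the SSD from caching and preloading a frequently used folder would presumably look like the following. The drive letter, path, and usage count are placeholders of mine; I have only verified the option names, not every argument form.

```
ECCmd -EXCLUDE D
ECCmd -PRELOAD "C:\Games\SomeGame" 5
```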

Monday, February 9, 2015

Stumbling upon the LMAX Disruptor

By pure chance I came across the LMAX Disruptor, and the Ring Buffer at the heart of the project is a truly ingenious solution. Back at Intel we needed to solve a similar problem, but unfortunately the project died before we ever reached the point of needing such extreme performance.

Tuesday, January 6, 2015

sun.nio.ch.Util managed direct buffer and thread reusing

Background

As I mentioned in a previous blog, there was a memory fragmentation problem in our application, and tuning the JVM parameters resolved it. But that doesn't answer the question: who is creating so many direct buffers and using them only briefly? This goes against the recommendation of ByteBuffer.allocateDirect():
It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations.

Analysis

To answer this question, I created a simple program that monitors the callers of ByteBuffer.allocateDirect(int) through JDI, and got the following output (some stack frames are omitted):

java.nio.ByteBuffer.allocateDirect(int): count=124, size=13463718
  sun.nio.ch.Util.getTemporaryDirectBuffer(int): count=124, size=13463718
    sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher): count=1, size=447850
      sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer): count=1, size=447850
        org.eclipse.jetty.io.ChannelEndPoint.flush(java.nio.ByteBuffer[]): count=1, size=447850
    sun.nio.ch.IOUtil.write(java.io.FileDescriptor, java.nio.ByteBuffer[], int, int, sun.nio.ch.NativeDispatcher): count=102, size=12819260
      sun.nio.ch.SocketChannelImpl.write(java.nio.ByteBuffer[], int, int): count=102, size=12819260
        org.eclipse.jetty.io.ChannelEndPoint.flush(java.nio.ByteBuffer[]): count=102, size=12819260
    sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher): count=21, size=196608
      sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer): count=21, size=196608
        org.eclipse.jetty.io.ChannelEndPoint.fill(java.nio.ByteBuffer): count=21, size=196608
The count is the number of direct buffers, and the size is the total size in bytes. According to the monitoring result, these direct buffers are all allocated by the JDK's default SocketChannel implementation when the user passes a non-direct buffer to an I/O operation. After reading the source code of sun.nio.ch.Util, I found that it keeps direct buffers in a thread-local cache. The class is carefully designed to limit the number of cached buffers (8 per thread) and to free a direct buffer when it is evicted from the cache. The only possible explanation is that too many new threads are being created, so the buffers cached by old threads are never reused.
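To make the mechanism concrete, here is a minimal sketch of the kind of per-thread cache sun.nio.ch.Util maintains. This is my own simplification, not the JDK source; it only demonstrates why a buffer cached by one thread is invisible to every other thread:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

public class PerThreadBufferCache {
    private static final int MAX_CACHED = 8; // same per-thread cap sun.nio.ch.Util uses
    private static final ThreadLocal<ArrayDeque<ByteBuffer>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Reuse a cached buffer that is large enough, otherwise allocate a new one.
    static ByteBuffer take(int size) {
        for (ByteBuffer b : CACHE.get()) {
            if (b.capacity() >= size) {
                CACHE.get().remove(b);
                b.clear();
                return b;
            }
        }
        return ByteBuffer.allocateDirect(size);
    }

    // Return a buffer to this thread's cache; drop it when the cache is full.
    static void give(ByteBuffer b) {
        ArrayDeque<ByteBuffer> cache = CACHE.get();
        if (cache.size() < MAX_CACHED) cache.push(b);
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer first = take(4096);
        give(first);
        // Same thread: the cached buffer is found and reused.
        System.out.println("same thread reuses: " + (take(4096) == first));
        give(first);
        // New thread: its ThreadLocal cache starts empty, so a fresh
        // direct buffer is allocated and the old one is simply wasted.
        Thread t = new Thread(() ->
                System.out.println("new thread reuses: " + (take(4096) == first)));
        t.start();
        t.join();
    }
}
```

Since the cap is only 8 buffers per thread, this design is fine for long-lived threads; the waste appears only when the threads themselves are short-lived.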

Root Cause

Our application uses Jetty to handle HTTP requests, and a custom class wraps the Jetty Server. In this wrapper class, a ThreadPoolExecutor is used as follows:

ExecutorService executor = new ThreadPoolExecutor(
    minThreads,
    maxThreads,
    maxIdleTimeMs,
    TimeUnit.MILLISECONDS,
    new SynchronousQueue<Runnable>());
Server server = new Server(new ExecutorThreadPool(executor));
Unfortunately, minThreads is 2 and maxIdleTimeMs is 5000 ms, while the Jetty Server takes 12 threads from the pool for its acceptors and selectors, which means the total number of threads is always larger than minThreads. When the service is not very busy, an idle worker thread is discarded, and a new worker thread is created when the next request comes in. In this situation, the buffers cached in sun.nio.ch.Util by the discarded thread are never used again and are only reclaimed when a GC is triggered.
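The churn can be demonstrated with a small standalone program. This is my own illustration with a deliberately tiny keep-alive, not the actual wrapper code (which used minThreads=2 and a 5000 ms timeout):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadChurnDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger created = new AtomicInteger();
        // Core size 0 and a 100 ms keep-alive: every idle worker dies quickly.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                0, 4, 100, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                r -> { created.incrementAndGet(); return new Thread(r); });
        for (int i = 0; i < 3; i++) {
            executor.submit(() -> { }).get(); // one short task
            Thread.sleep(300);                // longer than keep-alive: worker exits
        }
        executor.shutdown();
        // Three tasks end up creating three threads; each new thread starts
        // with an empty sun.nio.ch.Util cache, so old cached buffers are wasted.
        System.out.println("threads created: " + created.get());
    }
}
```

With a pool sized so that workers stay below the core size (or with a longer keep-alive), the same three tasks would reuse a single thread and its buffer cache.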

Friday, January 2, 2015

Understanding about CMSInitiatingOccupancyFraction and UseCMSInitiatingOccupancyOnly

While reading Useful JVM Flags – Part 7 (CMS Collector), I was struck by the claim that CMSInitiatingOccupancyFraction is useless when UseCMSInitiatingOccupancyOnly is false (the default), except for the first CMS collection:
We can use the flag -XX:+UseCMSInitiatingOccupancyOnly to instruct the JVM not to base its decision when to start a CMS cycle on run time statistics. Instead, when this flag is enabled, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS cycle, not just for the first one.
After checking the source code, I found this statement inaccurate; a more accurate statement would be:
When UseCMSInitiatingOccupancyOnly is false (the default), a CMS collection may be triggered even when the actual occupancy is smaller than the specified CMSInitiatingOccupancyFraction value. In other words, when the actual occupancy is greater than the specified value a CMS collection will always be triggered, and when it is smaller a collection may still be triggered based on runtime statistics.

Detailed Explanation

Code snippet from OpenJDK (openjdk/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp):

  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }

  // Otherwise, we start a collection cycle if either the perm gen or
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }
In the above code, _cmsGen->should_concurrent_collect() is always called unless it has already been determined that a collection is needed. In the implementation of should_concurrent_collect(), the CMSInitiatingOccupancyFraction value is checked at the beginning:

bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {

  assert_lock_strong(freelistLock());
  if (occupancy() > initiating_occupancy()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because of occupancy %f / %f  ",
        short_name(), occupancy(), initiating_occupancy());
    }
    return true;
  }
  if (UseCMSInitiatingOccupancyOnly) {
    return false;
  }
  if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because expanded for allocation ",
        short_name());
    }
    return true;
  }
  if (_cmsSpace->should_concurrent_collect()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because cmsSpace says so ",
        short_name());
    }
    return true;
  }
  return false;
}
From the first snippet above, it is also easy to see that CMSBootstrapOccupancy is used for the first collection when UseCMSInitiatingOccupancyOnly is false.

Summary

UseCMSInitiatingOccupancyOnly needs to be set to true only if you want to avoid collections starting before occupancy reaches the specified value. That is hardly a concern when CMSInitiatingOccupancyFraction is set to a small value: for example, when your application allocates direct buffers frequently, you may want to collect garbage even though old generation utilization is quite low.
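For reference, the flag combination discussed above would look like this on the command line (CMS exists only in JDK 8 and earlier; the 50% threshold and MyApp are only illustrative placeholders):

```
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=50 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     MyApp
```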

Wednesday, December 31, 2014

Java RSS increased by memory fragmentation

Recently I found a strange memory-related problem in our production system: the RSS (resident set size) increased over time. The Java heap utilization was less than 50%, so it looked like there could be a native memory leak, but it turned out to be something else.

Leaking Direct Buffer?

A direct buffer leak is one of the usual causes of native memory growth, so I first checked the direct buffers with the tool from Alan Bateman's blog. It showed the direct buffers as follows:
          direct                        mapped
 Count   Capacity     Memory   Count   Capacity     Memory
   419  123242031  123242031       0          0          0
   419  123242031  123242031       0          0          0
   421  123299674  123299674       0          0          0
There was no strong evidence that the problem was caused by direct buffers.

Per-thread malloc?

While checking the memory usage of the Java process with pmap, I found some strange 64MB memory blocks, similar to those described in Lex Chou's blog (Chinese). So I tried setting the MALLOC_ARENA_MAX environment variable. Unfortunately, the problem was still not resolved.
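Lowering glibc's arena count is done by setting the variable in the environment the JVM inherits, for example (the value 4 here is arbitrary, and MyApp is a placeholder):

```
export MALLOC_ARENA_MAX=4
java MyApp
```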

Native Heap Fragmentation?

With further investigation, I found that the problem could be caused by memory fragmentation, as described in this bug report. The malloc() implementation works fine for general applications, but it cannot (and need not) fit every kind of workload.
Using gdb, I found the real evidence:

gdb --pid <pid>
(gdb) call malloc_stats()
This produced the following output:

Arena 0:
system bytes     = 2338504704
in use bytes     =   69503376
Arena 1:
system bytes     =   48705536
in use bytes     =   19162544
Arena 2:
system bytes     =     806912
in use bytes     =     341776
Arena 3:
system bytes     =   17965056
in use bytes     =   17505488
Total (incl. mmap):
system bytes     = 2444173312
in use bytes     =  144704288
max mmap regions =         59
max mmap bytes   =  154546176
So about 2.4GB of memory had been allocated from the system, but only about 144MB was in use. This is a strong indicator of the problem, so I set MALLOC_MMAP_THRESHOLD_ to 131072 and monitored the result. The RSS did come down after long runs, but it still rose too high (9GB).

Conclusion

After monitoring the application for a long time, I concluded that the actual problem is complicated and has multiple causes. First, native heap fragmentation is the major contributor. Second, the application creates lots of transient objects, and some direct byte buffers are kept a little longer, so frequent young GCs promote those byte buffers to the old generation. After that, the old generation is rarely collected because it never fills up, so those byte buffers are not garbage collected.
To resolve the problem, a small CMSInitiatingOccupancyFraction was used together with the UseCMSInitiatingOccupancyOnly option, and the total RSS now looks quite stable.

Wednesday, December 24, 2008

A Perpetual Calendar

On my year-end vacation, I was looking for something to do. My wife said she wanted to buy a wall calendar, and I said I'd make her one, because long ago I had seen someone write an excellent perpetual calendar program and found it fascinating. I once tried to write a Java version of the Chinese lunar calendar in imitation, but it never got off the ground.

With a few days of vacation on hand, I was itching to try again, so I went online to look for material and see whether something ready-made existed; I had no intention of reinventing the wheel.

There are plenty of perpetual calendars online, but very few genuine authors, since most people just copy from one another. I had previously seen a perpetual calendar by someone called 知来者 ("the one who knows what is to come"); it was well written, highly accurate, and, rarely for China, open source, but the author had vanished without a trace and even his contact information had gone stale. This time I found a blog of his; although it hadn't been updated in over a year, I learned two things from it: he is also the real author of the mobile perpetual calendar MobCal, and his email address. Although I have never dealt with him, in today's Chinese IT circle an open-source programmer who does serious work deserves respect.

I also easily found, on a lunar-calendar forum, a very good JavaScript version called 寿星万年历 (Shouxing Perpetual Calendar) by 许剑伟 (Xu Jianwei), who appears to be a teacher at Putian No. 10 Middle School in Fujian. The program's calendar computations are based on astronomical algorithms, so its accuracy is very high, and it can even compute solar and lunar eclipses and other astronomy-related data.
Reading some of Mr. Xu's posts carefully, I found he is a remarkably determined researcher. Before writing this program he did not have much knowledge of astronomical calendars, yet through sustained effort he read a large amount of astronomical material, produced solid results, and translated the book Astronomical Algorithms, all within a single year.

Looking back at my own year, I have made little progress, always using "too busy" as an excuse while never actually starting the things I wanted to do; it is truly embarrassing. "Getting it done" is what I most need to keep in mind right now.