JGroups setting and tuning

Contributed by a site user, 2022-10-30



UDP configuration:

The UDP transport bundles small messages into larger ones and sends them together to reduce network overhead. The relevant settings are:

bundler_type: selects the bundler implementation.
  old -> DefaultBundler: uses a TimerScheduler to send the messages.
  new -> TransferQueueBundler: uses an internal LinkedBlockingQueue to hold the messages. Messages are added to the queue and sent once the total message size reaches max_bundle_size or the queue is 90% full (bundler_capacity).

max_bundle_timeout: only used by the DefaultBundler; this is the scheduling delay.

mcast_send_buf_size, mcast_recv_buf_size, ucast_send_buf_size, ucast_recv_buf_size: send and receive buffer sizes of the multicast and unicast sockets.

timer_type, timer.min_threads, timer.max_threads, timer.keep_alive_time, timer.queue_max_size: the timer thread pool performs scheduled tasks, such as bundling.

thread_pool.min_threads, thread_pool.max_threads, thread_pool.keep_alive_time, thread_pool.queue_enabled, thread_pool.queue_max_size, thread_pool.rejection_policy: the thread pool handles received batches of messages and executes the BatchHandler.

enable_batching: passes a batch of messages up the stack directly instead of processing them one by one; defaults to true.
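The size-based trigger can be sketched in plain Java. This is a simplified illustration and not the JGroups implementation: the class BundlerSketch and its methods are hypothetical, it models only the max_bundle_size limit, and an explicit flush() stands in for the timeout or queue-capacity trigger.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative size-based bundler (hypothetical, not JGroups code). */
class BundlerSketch {
    private final int maxBundleSize;                 // plays the role of max_bundle_size, in bytes
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private final List<Integer> sentBundleSizes = new ArrayList<>();

    BundlerSketch(int maxBundleSize) {
        this.maxBundleSize = maxBundleSize;
    }

    /** Queue a message; flush first if adding it would exceed the size limit. */
    void send(byte[] msg) {
        if (!pending.isEmpty() && pendingBytes + msg.length > maxBundleSize) {
            flush();
        }
        pending.add(msg);
        pendingBytes += msg.length;
    }

    /** Send everything queued so far as one bundle (here we only record its size). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sentBundleSizes.add(pendingBytes);
        pending.clear();
        pendingBytes = 0;
    }

    List<Integer> sentBundleSizes() {
        return sentBundleSizes;
    }

    /** Number of bundles produced for `count` messages of `size` bytes each. */
    static int bundlesFor(int count, int size, int maxBundleSize) {
        BundlerSketch bundler = new BundlerSketch(maxBundleSize);
        for (int i = 0; i < count; i++) {
            bundler.send(new byte[size]);
        }
        bundler.flush(); // in a timer-based bundler the timeout would trigger this
        return bundler.sentBundleSizes().size();
    }

    public static void main(String[] args) {
        // 100 messages of 1000 bytes with a 64000-byte limit: one full bundle plus a remainder
        System.out.println(bundlesFor(100, 1000, 64000)); // prints 2
    }
}
```

With these numbers, 100 network sends collapse into 2, which is the overhead reduction bundling aims for. The cost is latency: a message can sit in the bundle until the size limit or the timeout is hit.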

Other protocols in the stack:

FD: failure detection.

pbcast.NAKACK2, UNICAST: message reliability; they use an internal sliding window to ensure that messages are delivered in order.

STABLE: message stability; ensures that messages have been seen by every member of the cluster, and periodically broadcasts the latest stable message.
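The idea of a receive window that delivers messages in order can be sketched as follows. This is a minimal illustration and not the NAKACK2 or UNICAST implementation: the class ReceiveWindowSketch, its methods, and the starting sequence number of 1 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Illustrative in-order receive window (hypothetical, not JGroups code). */
class ReceiveWindowSketch {
    private long nextToDeliver = 1;                              // lowest seqno not yet delivered
    private final TreeMap<Long, String> buffered = new TreeMap<>();
    private final List<String> delivered = new ArrayList<>();

    /** Accept a message with its sequence number and deliver any in-order run. */
    void receive(long seqno, String payload) {
        if (seqno < nextToDeliver) {
            return;                                              // duplicate of a delivered message
        }
        buffered.putIfAbsent(seqno, payload);
        while (buffered.containsKey(nextToDeliver)) {
            delivered.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }
    }

    /** Seqnos missing below the highest buffered one (candidates for retransmission). */
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (buffered.isEmpty()) {
            return gaps;
        }
        for (long s = nextToDeliver; s < buffered.lastKey(); s++) {
            if (!buffered.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }

    List<String> delivered() {
        return delivered;
    }

    static String demo() {
        ReceiveWindowSketch window = new ReceiveWindowSketch();
        window.receive(1, "a");
        window.receive(3, "c");   // arrives early and is buffered
        window.receive(2, "b");   // fills the gap and releases 2 and 3
        window.receive(2, "b");   // duplicate, ignored
        return String.join(",", window.delivered());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints a,b,c
    }
}
```

A real reliability protocol would also ask the sender to retransmit the sequence numbers reported by missing(); the sketch only shows the buffering and ordering side.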

Monitoring:

java -cp jgroups-xxx.jar -Djava.net.preferIPv4Stack=true org.jgroups.tests.Probe jmx=[Protocol name]

You can view num_msgs_sent, num_msgs_received, num_bytes_sent, num_bytes_received, num_rejected_msgs, num_oob_msgs_received, etc.

JGroups optimization for invalidation usage:

The JGroups config shipped by default is taken directly from the JGroups documentation. It does not take into account our requirements or later improvements in JGroups, and it provides functionality we do not need.

Flow Control: We use MFC and UFC and by default allow them 4M credits. This seems to hurt the performance of import and catalog sync, as the system soon blocks waiting for more credits. Disabling Flow Control floods the network and quickly causes problems. Testing with 200M let CPU utilisation climb to 100%, but timeouts on cluster sync and heartbeat messages then caused the importing node to be evicted from the cluster, which caused further problems. Testing with 40M gave much better performance than 4M, with no errors reported.

Recommendations:

BARRIER and pbcast.STATE_TRANSFER: Not useful for invalidation usage (order is not important), so they can be removed.

MERGE: We use MERGE2; the more recent MERGE3 should be used, as it uses a more efficient algorithm.

FRAG2: We set 60K by default, but this can be optimised by using the network's max frame size.

UDP: Set your send and receive buffer sizes to match your operating system settings. Set your timer type to "new3". Ensure thread_pool.enabled="true" and thread_pool.queue_enabled="true" are set. Make your queue massive: 1000000 is fine, or more. Increase the number of threads to 2 x the number of cores on the box.

Bundling: This is very important. max_bundle_timeout="5" or less should be set (the value is in milliseconds), as 30 ms is a very long time between invalidation messages. max_bundle_size should be slightly bigger than your FRAG2 frag_size setting.
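For the recommendation to match the UDP send and receive buffer sizes to the operating system settings, one way to see what the OS will actually grant is to request a size on a throwaway socket and read it back. The sketch below uses only the standard java.net API; the class name UdpBufferCheck and the 20MB request are assumptions for illustration, and the value reported back depends on the platform.

```java
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.SocketException;

/** Requests UDP socket buffer sizes and reports what the OS granted (illustrative helper). */
class UdpBufferCheck {

    /** Requests a receive buffer of the given size and returns the size the OS reports. */
    static int grantedReceiveBuffer(int requestedBytes) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Requests a send buffer of the given size and returns the size the OS reports. */
    static int grantedSendBuffer(int requestedBytes) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSendBufferSize(requestedBytes);
            return socket.getSendBufferSize();
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int requested = 20 * 1024 * 1024; // assumed example value, not a recommendation
        int receive = grantedReceiveBuffer(requested);
        int send = grantedSendBuffer(requested);
        System.out.printf("requested=%d receive=%d send=%d%n", requested, receive, send);
        if (receive < requested || send < requested) {
            System.out.println("OS limit is lower than the requested size; "
                    + "check net.core.rmem_max and net.core.wmem_max on Linux");
        }
    }
}
```

If the granted size is well below the request, either raise the OS limits or lower the buffer sizes in the JGroups config so the two agree.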

To simulate 2 network interfaces on my CentOS virtual server I have defined an alias for my eth0 interface. In a real customer environment, skip this step, assuming you already have the dedicated network interface for cluster messages configured.

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0:0
#I have pasted the lines below. Make sure you change these settings according to your local network settings:
DEVICE=eth0:0
NETWORK=10.5.0.0
NETMASK=255.255.255.0
IPADDR=10.5.0.246
#end of ifcfg-eth0:0 settings
sudo ifup eth0:0
ifconfig #verify you have an eth0:0 interface

JGroups UDP setting:

mcast_addr="230.0.0.1"
bind_addr="10.5.0.246"

Sometimes multicast traffic chooses to use IPv6 network interfaces even if we specifically bind to an IPv4 address. To make sure the JVM uses IPv4, add the following settings:

-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true

Verification:

netstat -ng
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.75.75
lo              1      224.0.0.1
eth0            1      224.0.75.75
eth0            1      230.0.0.1
eth0            1      224.0.0.1
lo              1      ff02::1
eth0            1      ff02::202
eth0            1      ff02::1:ffa4:7c61
eth0            1      ff02::1
eth1            1      ff02::1

netstat -anp | grep 230.0.0.1
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
udp        0      0 230.0.0.1:9997              0.0.0.0:*                               3761/java

Server console log:

GMS: address=node-1, cluster=broadcast, physical address=10.5.0.246:38815

Or:

netstat -an | grep udp | grep 10.5.0.246
udp        0      0 10.5.0.246:38815            0.0.0.0:*

sudo tcpdump -ni eth0:0 udp port 9997
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0:0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:05:11.457605 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 78
16:05:11.963756 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 78
16:05:13.184506 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 91

In case of any issues, also make sure that the JGroups settings receive_on_all_interfaces and send_on_all_interfaces are set to false. By default they should be, at least receive_on_all_interfaces according to the JGroups documentation.

Other JGroups settings that could be explored are receive_interfaces and send_interfaces. It is not clear whether, in a large environment, one would want to separate receive and send traffic onto their own interfaces for performance reasons. For stricter control, monitoring or troubleshooting, bind_port and port_range could also be explored.

Besides the multicast traffic, JGroups also uses unicast sockets for node-to-node communication. The physical address of the node is the unicast socket. If a network filter blocks communication on these sockets, the cluster cannot form and you get errors in the log similar to this:

[OOB-434,broadcast,node-17] [UDP] node-17: no physical address for 8a0e9532-54df-d91e-79af-185b2cadaf1f, dropping message.

By default the unicast socket uses a random port number. This is fine in most environments, but where iptables or other local firewalls are enabled you need to set bind_port and port_range, otherwise the cluster cannot form. Define your port range using these 2 values and add rules to your iptables to allow communication on these ports.

On most servers the out-of-the-box sysctl settings do not seem to be optimal for JGroups. Check the console log, right after JGroups has started, for warnings that look similar to the ones below, and ask your admin to adjust these settings accordingly.

WARN  [main] [UDP] [JGRP00014] the send buffer of socket DatagramSocket was set to 640KB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max send buffer in the OS correctly (e.g. net.core.wmem_max on Linux)
WARN  [main] [UDP] [JGRP00014] the receive buffer of socket DatagramSocket was set to 20MB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)
[java] WARN  [main] [UDP] [JGRP00014] the send buffer of socket MulticastSocket was set to 640KB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max send buffer in the OS correctly (e.g. net.core.wmem_max on Linux)
[java] WARN  [main] [UDP] [JGRP00014] the receive buffer of socket MulticastSocket was set to 25MB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)

JGroups with large cluster:

Newer versions of JGroups (2.2.9 and higher) can use TCP NIO for TCP clusters. In a small cluster NIO does not seem to help; if anything it seems to be slower. It is expected to yield better performance in larger clusters.

Analysis:

WARN  [INT-2,hybris-broadcast,hybrisnode-1] [UDP] JGRP000012: discarded message from different cluster EH_CACHE (our cluster is xxx). Sender was 7d979f10-cadd-6813-d5c4-21e4cca405c5

Check the mcast_port setting in the JGroups configuration file: it probably conflicts with another cluster.

WARN  [TransferQueueBundler,hybris-broadcast,hybrisnode-2] [UDP] JGRP000032: %s: no physical address for %s, dropping message

Add the system properties -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true

Check bind_addr in the JGroups configuration file, for example bind_addr="match-interface:eth0".
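When checking bind_addr, it can help to list which interfaces and IPv4 addresses the JVM itself sees. The sketch below uses only the standard java.net API; the class name InterfaceCheck is hypothetical, and treating an interface that is up, supports multicast and has an IPv4 address as a bind_addr candidate is an assumption of this example rather than a JGroups rule.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Lists interfaces that are up, support multicast and have an IPv4 address. */
class InterfaceCheck {

    /** Returns entries of the form "<interface name> <IPv4 address>". */
    static List<String> ipv4MulticastCandidates() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result;                       // no interfaces visible to the JVM
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || !nic.supportsMulticast()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(nic.getName() + " " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shows whether the IPv4 system property was actually passed to this JVM
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));
        ipv4MulticastCandidates().forEach(System.out::println);
    }
}
```

If the interface named in bind_addr does not appear in the output, the "no physical address" warning is more likely a binding problem than a firewall problem.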

