 			RAMSTER HOW-TO

 Author: Dan Magenheimer
 Ramster maintainer: Konrad Wilk <konrad.wilk@oracle.com>

 This is a HOWTO document for ramster which, as of this writing, is in
 the kernel as a subdirectory of zcache in drivers/staging, called ramster.
 (Zcache can be built with or without ramster functionality.)  If enabled
 and properly configured, ramster allows memory capacity load balancing
 across multiple machines in a cluster.  Further, the ramster code serves
 as an example of asynchronous access for zcache (as well as cleancache and
 frontswap) that may prove useful for future transcendent memory
 implementations, such as KVM and NVRAM.  While ramster works today on
 any network connection that supports kernel sockets, its features may
 become more interesting on future high-speed fabrics/interconnects.

 Ramster requires both kernel and userland support.  The userland support,
 called ramster-tools, is known to work with EL6-based distros, but is a
 set of poorly-hacked slightly-modified cluster tools based on ocfs2, which
 includes an init file, a config file, and a userland binary that interfaces
 to the kernel.  This state of userland support reflects the abysmal userland
 skills of this suitably-embarrassed author; any help/patches to turn
 ramster-tools into more distributable rpms/debs useful for a wider range
 of distros would be appreciated.  The source RPM that can be used as a
 starting point is available at:
     http://oss.oracle.com/projects/tmem/files/RAMster/

 As a result of this author's ignorance, userland setup described in this
 HOWTO assumes an EL6 distro and is described in EL6 syntax.  Apologies
 if this offends anyone!

 Kernel support has only been tested on x86_64.  Systems with an active
 ocfs2 filesystem should work, but since ramster leverages a lot of
 code from ocfs2, there may be latent issues.  A kernel configuration that
 includes CONFIG_OCFS2_FS should build OK, and should certainly run OK
 if no ocfs2 filesystem is mounted.

 This HOWTO demonstrates memory capacity load balancing for a two-node
 cluster, where one node called the "local" node becomes overcommitted
 and the other node called the "remote" node provides additional RAM
 capacity for use by the local node.  Ramster is capable of more complex
 topologies; see the last section titled "ADVANCED RAMSTER TOPOLOGIES".

 If you find any terms in this HOWTO unfamiliar or don't understand the
 motivation for ramster, the following LWN reading is recommended:
 -- Transcendent Memory in a Nutshell (lwn.net/Articles/454795)
 -- The future calculus of memory management (lwn.net/Articles/475681)
 And since ramster is built on top of zcache, this article may be helpful:
 -- In-kernel memory compression (lwn.net/Articles/545244)

 Now that you've memorized the contents of those articles, let's get started!

 A. PRELIMINARY

 1) Install two x86_64 Linux systems that are known to work when
    upgraded to a recent upstream Linux kernel version.

 On each system:

 2) Configure, build and install, then boot Linux, just to ensure it
    can be done with an unmodified upstream kernel.  Confirm you booted
    the upstream kernel with "uname -a".

 3) If you plan to do any performance testing, or to test anything
    other than swapping, the "WasActive" patch is also highly recommended.
    (Search lkml.org for WasActive, apply the patch, rebuild your kernel.)
    For a demo or simple testing, the patch can be ignored.

 4) Install ramster-tools as root.  An x86_64 rpm for EL6-based systems
    can be found at:
     http://oss.oracle.com/projects/tmem/files/RAMster/
    (Sorry but for now, non-EL6 users must recreate ramster-tools on
    their own from source.  See above.)

 5) Ensure that debugfs is mounted at each boot.  Examples below assume it
    is mounted at /sys/kernel/debug.
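
 The debugfs requirement in step 5 can be checked with a short shell
 sketch like the one below.  The helper name debugfs_mounted_at is made
 up for this example, and the optional mounts-table argument exists only
 so that the check can be tried against a sample file; on a live system
 it defaults to /proc/mounts.

```shell
# Hypothetical helper (not part of ramster-tools): succeed if debugfs is
# mounted at the given mount point according to a mounts table.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
debugfs_mounted_at() {
    awk -v mp="$1" '$2 == mp && $3 == "debugfs" { found = 1 }
                    END { exit !found }' "${2:-/proc/mounts}"
}
```

 If the check fails, debugfs can be mounted by hand, for example:

 	# debugfs_mounted_at /sys/kernel/debug || mount -t debugfs none /sys/kernel/debug

 To mount it at each boot, the usual approach is an /etc/fstab line such
 as "debugfs /sys/kernel/debug debugfs defaults 0 0".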

 B. BUILDING RAMSTER INTO THE KERNEL

 Do the following on each system:

 1) Using the kernel configuration mechanism of your choice, change
    your config to include:

 	CONFIG_CLEANCACHE=y
 	CONFIG_FRONTSWAP=y
 	CONFIG_STAGING=y
 	CONFIG_CONFIGFS_FS=y # NOTE: MUST BE y, not m
 	CONFIG_ZCACHE=y
 	CONFIG_RAMSTER=y

    For a linux-3.10 or later kernel, you should also set:

 	CONFIG_ZCACHE_DEBUG=y
 	CONFIG_RAMSTER_DEBUG=y

    Before building the kernel please doublecheck your kernel config
    file to ensure all of the settings are correct.

 2) Build this kernel and change your boot file (e.g. /etc/grub.conf)
    so that the new kernel will boot.

 3) Add "zcache" and "ramster" as kernel boot parameters for the new kernel.

 4) Reboot each system approximately simultaneously.

 5) Check dmesg to ensure there are some messages from ramster, prefixed
    by "ramster:"

 	# dmesg | grep ramster

    You should also see a lot of files in:

 	# ls /sys/kernel/debug/zcache
 	# ls /sys/kernel/debug/ramster

    These are mostly counters for various zcache and ramster activities.
    You should also see files in:

 	# ls /sys/kernel/mm/ramster

    These are sysfs files that control ramster as we shall see.

    Ramster now will act as a single-system zcache on each system
    but doesn't yet know anything about the cluster so can't yet do
    anything remotely.
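
 The doublecheck of the kernel config file suggested in step 1 can be
 scripted.  The following is a minimal sketch, assuming the config file
 uses the usual one-option-per-line "CONFIG_FOO=y" form; the function
 name check_ramster_config is invented for this example.  It checks only
 the six required options, not the two debug options.

```shell
# Hypothetical helper: report any option from step B.1 that is not set
# to "y" in a kernel config file.  Returns non-zero if any is missing.
# $1 = path to the kernel config (e.g. the .config in the build tree)
check_ramster_config() {
    missing=0
    for opt in CONFIG_CLEANCACHE CONFIG_FRONTSWAP CONFIG_STAGING \
               CONFIG_CONFIGFS_FS CONFIG_ZCACHE CONFIG_RAMSTER; do
        # Anchor the match so that CONFIG_ZCACHE does not also accept
        # CONFIG_ZCACHE_DEBUG, and so that "=m" is rejected.
        if ! grep -q "^${opt}=y\$" "$1"; then
            echo "missing or not built in: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

 Run it from the kernel build directory before building:

 	# check_ramster_config .config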

 C. CONFIGURING THE RAMSTER CLUSTER

 This part can be error prone unless you are familiar with clustering
 filesystems.  We need to describe the cluster in a /etc/ramster.conf
 file and the init scripts that parse it are extremely picky about
 the syntax.

 1) Create a /etc/ramster.conf file and ensure it is identical on both
    systems.  This file mimics the ocfs2 format, and a good amount of
    documentation can be found by searching for ocfs2.conf, but you can use:

 	cluster:
 		name = ramster
 		node_count = 2
 	node:
 		name = system1
 		cluster = ramster
 		number = 0
 		ip_address = my.ip.ad.r1
 		ip_port = 7777
 	node:
 		name = system2
 		cluster = ramster
 		number = 1
 		ip_address = my.ip.ad.r2
 		ip_port = 7777

    You must ensure that the "name" field in the file exactly matches
    the output of "hostname" on each system; if "hostname" shows a
    fully-qualified hostname, ensure the name is fully qualified in
    /etc/ramster.conf.  Obviously, substitute my.ip.ad.rx with proper
    ip addresses.

 2) Enable the ramster service and configure it.  If you used the
    EL6 ramster-tools, this would be:

 	# chkconfig --add ramster
 	# service ramster configure

    Set "load on boot" to "y", the cluster to start to "ramster" (or whatever
    name you chose in ramster.conf), the heartbeat dead threshold to "500",
    and the network idle timeout to "1000000".  Leave the others as default.

 3) Reboot both systems.  After reboot, try (assuming EL6 ramster-tools):

 	# service ramster status

    You should see "Checking RAMSTER cluster "ramster": Online".  If you do
    not, something is wrong and ramster will not work.  Note that you
    should also see that the driver for "configfs" is loaded and mounted,
    the driver for ocfs2_dlmfs is not loaded, and some numbers for network
    parameters.  You will also see "Checking RAMSTER heartbeat: Not active".
    That's all OK.

 4) Now you need to start the cluster heartbeat; the cluster is not "up"
    until all nodes detect a heartbeat.  In a real cluster, heartbeat detection
    is done via a cluster filesystem, but ramster doesn't require one.  Some
    hack-y kernel code in ramster can start the heartbeat for you though if
    you tell it what nodes are "up".  To enable the heartbeat, do:

 	# echo 0 > /sys/kernel/mm/ramster/manual_node_up
 	# echo 1 > /sys/kernel/mm/ramster/manual_node_up

    This must be done on BOTH nodes and, to avoid timeouts, must be done
    approximately concurrently on both nodes.  On an EL6 system, it is
    convenient to put these lines in /etc/rc.local.  To confirm that the
    cluster is now up, on both systems do:

 	# dmesg | grep ramster

    You should see ramster "Accepted connection" messages in dmesg on both
    nodes after this.  Note that if you check userland status again with

 	# service ramster status

    you will still see "Checking RAMSTER heartbeat: Not active".  That's
    still OK... the ramster kernel heartbeat hack doesn't communicate to
    userland.

 5) You now must tell each node the node to which it should "remotify" pages.
    On this two node cluster, we will assume the "local" node, node 0, has
    memory overcommitted and will use ramster to utilize RAM capacity on
    the "remote node", node 1.  To configure this, on node 0, you do:

 	# echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

    You should see "ramster: node 1 set as remotification target" in dmesg
    on node 0.  Again, on EL6, /etc/rc.local is a good place to put this
    on node 0 so you don't forget to do it at each boot.

 6) One more step:  By default, the ramster code does not "remotify" any
    pages; this is primarily for testing purposes, but sometimes it is
    useful.  This may change in the future, but for now, on node 0, you do:

 	# echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
 	# echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable

    The first enables remotifying swap (persistent, aka frontswap) pages,
    the second enables remotifying of page cache (ephemeral, cleancache)
    pages.

    On EL6, these lines can also be put in /etc/rc.local (AFTER the
    node_up lines), or at the beginning of a script that runs a workload.

 7) Note that most testing has been done with both/all machines booted
    roughly simultaneously to avoid cluster timeouts.  Ideally, you should
    do this too unless you are trying to break ramster rather than just
    use it. ;-)
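
 Steps 4 through 6 for node 0 can be collected into one shell function,
 for example for use from /etc/rc.local.  This is only a sketch of the
 sequence described above: the function name and the optional directory
 argument are assumptions made for this example (the argument lets the
 sequence be tried against a scratch directory instead of sysfs).

```shell
# Sketch of the node 0 bring-up from steps 4-6; the function name and
# the directory argument are assumptions made for this example.
# $1 = ramster sysfs directory (defaults to /sys/kernel/mm/ramster)
ramster_bringup_node0() {
    dir="${1:-/sys/kernel/mm/ramster}"

    # Step 4: declare both nodes up to start the heartbeat.
    echo 0 > "$dir/manual_node_up" || return 1
    echo 1 > "$dir/manual_node_up" || return 1

    # Step 5: node 1 is the remotification target.
    echo 1 > "$dir/remote_target_nodenum" || return 1

    # Step 6: enable remotifying of persistent and ephemeral pages.
    echo 1 > "$dir/pers_remotify_enable" || return 1
    echo 1 > "$dir/eph_remotify_enable" || return 1
}
```

 On node 1, only the two manual_node_up writes from step 4 are needed
 for the two-node topology described in this HOWTO.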

 D. TESTING RAMSTER

 1) Note that ramster has no value unless pages get "remotified".  For
    swap/frontswap/persistent pages, this doesn't happen unless/until
    the workload would cause swapping to occur, at which point pages
    are put into frontswap/zcache, and the remotification thread starts
    working.  To get to the point where the system swaps, you either
    need a workload for which the working set exceeds the RAM in the
    system; or you need to somehow reduce the amount of RAM one of
    the systems sees.  This latter is easy when testing in a VM, but
    harder on physical systems.  In some cases, "mem=xxxM" on the
    kernel command line restricts memory, but for some values of xxx
    the kernel may fail to boot.  One may also try creating a fixed
    RAMdisk, doing nothing with it, but ensuring that it eats up a fixed
    amount of RAM.

 2) To see if ramster is working, on the "remote node", node 1, try:

 	# grep . /sys/kernel/debug/ramster/foreign_*
         # # note, that is space-dot-space between grep and the pathname

    to monitor the number (and max) of ephemeral and persistent pages
    that ramster has sent.  If these stay at zero, ramster is not working
    either because the workload on the local node (node 0) isn't creating
    enough memory pressure or because "remotifying" isn't working.  On the
    local system, node 0, you can watch lots of useful information also.
    Try:

 	grep . /sys/kernel/debug/zcache/*pageframes* \
 		/sys/kernel/debug/zcache/*zbytes* \
 		/sys/kernel/debug/zcache/*zpages* \
 		/sys/kernel/debug/ramster/*remote*

    Of particular note are the remote_*_pages_succ_get counters.  These
    show how many disk reads and/or disk writes have been avoided on the
    overcommitted local system by storing pages remotely using ramster.

    At the risk of information overload, you can also grep:

         /sys/kernel/debug/cleancache/* and /sys/kernel/debug/frontswap/*

    These show, for example, how many disk reads and/or disk writes have
    been avoided by using zcache to optimize RAM on the local system.
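
 For repeated monitoring, the grep commands above can be wrapped in a
 small helper that prints one counter per line.  This is a hedged
 sketch; the name show_counters is invented here, and it assumes each
 debugfs counter file holds a single value.

```shell
# Hypothetical helper: print "name value" for each counter file in a
# directory whose name matches a shell pattern.
# $1 = directory, $2 = pattern (quote it, e.g. 'foreign_*')
show_counters() {
    # $2 is deliberately unquoted so the shell expands the pattern.
    for f in "$1"/$2; do
        # Skip the literal pattern left behind when nothing matches.
        [ -f "$f" ] || continue
        printf '%s %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

 For example, on node 1:

 	# show_counters /sys/kernel/debug/ramster 'foreign_*'

 and on node 0:

 	# show_counters /sys/kernel/debug/ramster '*remote*'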


 AUTOMATIC SWAP REPATRIATION

 You may notice that while the systems are idle, the foreign persistent
 page count on the remote machine slowly decreases.  This is because
 ramster implements "frontswap selfshrinking":  When possible, swap
 pages that have been remotified are slowly repatriated to the local
 machine.  This is so that local RAM can be used when possible and
 so that, in case of remote machine crash, the probability of loss
 of data is reduced.

 REBOOTING / POWEROFF

 If a system is shut down while some of its swap pages still reside
 on a remote system, the system may lock up during the shutdown
 sequence.  This will occur if the network is shut down before the
 swap mechanism is shut down, which is the default ordering on many
 distros.  To avoid this annoying problem, simply shut off the swap
 subsystem before starting the shutdown sequence, e.g.:

 	# swapoff -a
 	# reboot

 Ideally, this swapoff-before-ifdown ordering should be enforced permanently
 using shutdown scripts.
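
 Until such a script exists, the two commands can be wrapped so that the
 reboot happens only if swapoff succeeds.  This is a minimal sketch with
 an invented function name; note that "swapoff -a" can fail if the local
 system does not have enough free RAM to take the swapped pages back, in
 which case the wrapper stops rather than rebooting.

```shell
# Hypothetical wrapper: turn off swap first and reboot only if that
# succeeded, so remote swap pages are pulled back while the network
# is still up.
ramster_safe_reboot() {
    if ! swapoff -a; then
        echo "swapoff failed; not rebooting" >&2
        return 1
    fi
    reboot
}
```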

 KNOWN PROBLEMS

 1) You may periodically see messages such as:

     ramster_r2net, message length problem

    This is harmless but indicates that a node is sending messages
    containing compressed pages that exceed the maximum for zcache
    (PAGE_SIZE*15/16).  The sender side needs to be fixed.

 2) If you see a "No longer connected to node..." message or a "No connection
    established with node X after N seconds" message, you may be in an
    unrecoverable state.  If you are certain all of the appropriate cluster
    configuration steps described above have been performed, try rebooting
    the two servers concurrently to see if the cluster starts.

    Note that "Connection to node... shutdown, state 7" is an intermediate
    connection state.  As long as you later see "Accepted connection", the
    intermediate states are harmless.

 3) There are known issues in counting certain values.  As a result
    you may see periodic warnings from the kernel.  Almost always you
    will see "ramster: bad accounting for XXX".  There are also "WARN_ONCE"
    messages.  If you see kernel warnings with a tombstone, please report
    them.  They are harmless but reflect bugs that need to be eventually fixed.
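
 When chasing connection problems like those in item 2, one
 configuration step that is easy to get wrong is the requirement that a
 "name" field in /etc/ramster.conf exactly match the output of
 "hostname".  The sketch below checks that; the function name is made up
 for this example, and it assumes the "name = value" layout shown in
 section C.

```shell
# Hypothetical helper: succeed if some "name = ..." line in the config
# matches the given host name exactly.  The cluster stanza also has a
# "name" line, so this is only a sanity check, not a parser.
# $1 = config file, $2 = host name
conf_has_node_name() {
    awk -v host="$2" '$1 == "name" && $2 == "=" && $3 == host { found = 1 }
                      END { exit !found }' "$1"
}
```

 Run it on each node:

 	# conf_has_node_name /etc/ramster.conf "$(hostname)" || echo "hostname not in ramster.conf"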

 ADVANCED RAMSTER TOPOLOGIES

 The kernel code for ramster can support up to eight nodes in a cluster,
 but no testing has been done with more than three nodes.

 In the example described above, the "remote" node serves as a RAM
 overflow for the "local" node.  This can be made symmetric by appropriate
 settings of the sysfs remote_target_nodenum file.  For example, by setting:

 	# echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

 on node 0, and

 	# echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

 on node 1, each node can serve as a RAM overflow for the other.

 For more than two nodes, a "RAM server" can be configured.  For a
 three node system, set:

 	# echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

 on node 1, and

 	# echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

 on node 2.  Then node 0 is a RAM server for node 1 and node 2.
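
 For this three-node case, /etc/ramster.conf presumably needs a third
 node stanza and a matching node_count.  The fragment below is an
 untested extrapolation of the two-node file in section C; the name
 "system3" and the address "my.ip.ad.r3" are placeholders, and the
 indentation should follow whatever the ramster-tools init scripts
 accept.

```
cluster:
	name = ramster
	node_count = 3
node:
	name = system1
	cluster = ramster
	number = 0
	ip_address = my.ip.ad.r1
	ip_port = 7777
node:
	name = system2
	cluster = ramster
	number = 1
	ip_address = my.ip.ad.r2
	ip_port = 7777
node:
	name = system3
	cluster = ramster
	number = 2
	ip_address = my.ip.ad.r3
	ip_port = 7777
```

 As in the two-node case, the file must be identical on all nodes.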

 In this implementation of ramster, any remote node is potentially a single
 point of failure (SPOF).  Though the probability of failure is reduced
 by automatic swap repatriation (see above), a proposed future enhancement
 to ramster would improve high availability for the cluster by sending a
 copy of each page of data to two other nodes.  Patches welcome!