mtd-utils-1.3.1/tests/checkfs/README - nest-learning-thermostat/5.1.1/mtd-utils - Git at Google

 $Id: README,v 1.2 2001/06/21 23:07:06 dwmw2 Exp $
 $Log: README,v $
 Revision 1.2  2001/06/21 23:07:06  dwmw2
 Initial import to MTD CVS

 Revision 1.1  2001/06/11 19:34:40  vipin
 Added README file to dir.


 This is the README file for the "checkfs" power fail test program.
 By: Vipin Malik

 NOTE: This program requires an external "power cycling box"
 connected to one of the com ports of the system under test.
 This power cycling box should wait for a random amount of time
 after it receives a "ok to power me down" message over the
 serial port, and then yank power to the system under test.
 (The box that I rigged up tested with waits anywhere from
 0 to ~40 seconds).


 It should then restore power after a few seconds and wait for the
 message again.


 ABOUT:

 This program's primary purpose it to test the reliiability
 of various file systems under Linux.

 SETUP:

 You need to setup the file system you want to test and run the
 "makefiles" program ONCE. This creates a set of files that are
 required by the "checkfs" program.

 Also copy the "checkfs" executable program to the same dir.

 Then you need to make sure that the program "checkfs" is called
 automatically on startup. You can customise the operation of
 the "checkfs" program by passing it various cmd line arguments.
 run "checkfs -?" for more details.

 ****NOTE*******
 Make sure that you call the checkfs program only after you have
 mounted the file system you want to test (this is obvious), but
 also after you have run any "scan" utilities to check for and
 fix any file systems errors. The e2fsck is one utility for the
 ext2 file system. For an automated setup you of course need to
 provide these scan programs to run in standalone mode (-f -y
 flags for e2fsck for example).

 File systems like JFFS and JFFS2 do not have any such external
 utilities and you may call "checkfs" right after you have mounted
 the respective file system under test.

 There are two ways you can mount the file system under test:

 1. Mount your root fs on a "standard" fs like ext2 and then
 mount the file system under test (which may be ext2 on another
 partition or device) and then run "checkfs" on this mounted
 partition OR

 2. Make your fs AND device that you have put this fs as your
 root fs and run "checkfs" on the root device (i.e. "/").
 You can of course still run checkfs under a separate dir
 under your "/" root dir.

 I have found the second method to be a particularly stringent
 arrangement (and thus preferred when you are trying to break
 something).

 Using this arrangement I was able to find that JFFS clobbered
 some "sister" files on the root fs even though "checkfs" would
 run fine through all its own check files.

 (I found this out when one of the clobbered sister file happened
 to be /bin/bash. The system refused to run rc.local thus
 preventing my "checkfs" program from being launched :)

 "checkfs":

 The "formatting" reliability of the fs as well as the file data integrity
 of files on the fs can be checked using this program.

 "formatiing" reliability can only be checked via an indirect method.
 If there is severe formatting reliability issues with the file system,
 it will most likely cause other system failures that will prevent this
 program from running successfully on a power up. This will prevent
 a "ok to power me down" message from going out to the power cycling
 black box and prevent power being turned off again.

 File data reliability is checked more directly. A fixed number of
 files are created in the current dir (using the program "makefiles").

 Each file has a random number of bytes in it (set by using the
 -s cmd line flag). The number of "ints" in the file is stored as the
 first "int" in it (note: 0 length files are not allowed). Each file
 is then filled with random data and a 16 bit CRC appended at the end.

 When "checkfs" is run, it runs through all files (with predetermined
 file names)- one at a time- and checks for the number of "int's"
 in it as well as the ending CRC.

 The program exits if the numbers of files that are corrupt are greater
 that a user specified parameter (set by using the -e cmd line flag).

 If the number of corrupt files is less than this parameter, the corrupt
 files are repaired and operation resumes as explained below.

 The idea behind allowing a user specified amount of corrupt files is as
 follows:

 If you are testing for "formatting" reliability of a fs, and for
 the data reliability of "other" files present of the fs, use -e 1.
 "other" files are defined as sister files on the fs, not being written to
 by the "checkfs" test program.

 As mentioned, in this case you would set -e 1, or allow at most 1 file
 to be corrupt each time after a power fail. This would be the file
 that was probably being written to when power failed (and CRC was not
 updated to reflect the  new data being written). You would check file
 systems like ext2 etc. with such a configuration.
 (As you have no hope that these file systems provide for either your
 new data or old data to be present in the file if power failed during
 the write. This is called "roll back and recover".)

 With JFFS2 I tested for such "roll back and recover" file data reliability
 by setting -e 0 and making sure that all writes to the file being
 updated are done in a *single* write().

 This is how I found that JFFS2 (yet) does NOT support this functionality.
 (There was a great debate if this was a bug or a feature that was lacking
 or even an issue at all. See the mtd archives for more details).

 In other words, JFFS2 will partially update a file on FLASH even before
 the write() command has completed, thus leaving part old data part new
 data in your file if power failed in the middle of a write().

 This is bad functionality if you are updating a binary structure or a
 CRC protected file (as in our case).


 If All Files Check Out OK:

 On the startup scan, if there are less errors than specified by the "-e flag"
 a "ok to power me down message" is sent via the specified com port.

 The actual format of this message will depend on the format expected
 by the power cycling box that will receive this message. One may customise
 the actual message that goes out in the "do_pwr_dn)" routine in "comm.c".

 This file is called with an open file descriptor to the comm port that
 this message needs to go out over and the count of the current power
 cycle (in case your power cycling box can display/log this count).

 After this message has been sent out, the checkfs program goes into
 a while(1) loop of writing new data (with CRC), one at a time, into
 all the "check files" in the dir.

 Its life comes to a sudden end when power is asynchronously pulled from
 under its feet (by your external power cycling box).

 It comes back to life when power is restored and the system boots and
 checkfs is called from the rc.local script file.

 The cycle then repeats till a problem is detected, at which point
 the "ok to power me down" message is not sent and the cycle stops
 waiting for the user to examine the system.
	$Id: README,v 1.2 2001/06/21 23:07:06 dwmw2 Exp $
	$Log: README,v $
	Revision 1.2 2001/06/21 23:07:06 dwmw2
	Initial import to MTD CVS

	Revision 1.1 2001/06/11 19:34:40 vipin
	Added README file to dir.


	This is the README file for the "checkfs" power fail test program.
	By: Vipin Malik

	NOTE: This program requires an external "power cycling box"
	connected to one of the com ports of the system under test.
	This power cycling box should wait for a random amount of time
	after it receives a "ok to power me down" message over the
	serial port, and then yank power to the system under test.
	(The box that I rigged up tested with waits anywhere from
	0 to ~40 seconds).


	It should then restore power after a few seconds and wait for the
	message again.


	ABOUT:

	This program's primary purpose it to test the reliiability
	of various file systems under Linux.

	SETUP:

	You need to setup the file system you want to test and run the
	"makefiles" program ONCE. This creates a set of files that are
	required by the "checkfs" program.

	Also copy the "checkfs" executable program to the same dir.

	Then you need to make sure that the program "checkfs" is called
	automatically on startup. You can customise the operation of
	the "checkfs" program by passing it various cmd line arguments.
	run "checkfs -?" for more details.

	**NOTE*****
	Make sure that you call the checkfs program only after you have
	mounted the file system you want to test (this is obvious), but
	also after you have run any "scan" utilities to check for and
	fix any file systems errors. The e2fsck is one utility for the
	ext2 file system. For an automated setup you of course need to
	provide these scan programs to run in standalone mode (-f -y
	flags for e2fsck for example).

	File systems like JFFS and JFFS2 do not have any such external
	utilities and you may call "checkfs" right after you have mounted
	the respective file system under test.

	There are two ways you can mount the file system under test:

	1. Mount your root fs on a "standard" fs like ext2 and then
	mount the file system under test (which may be ext2 on another
	partition or device) and then run "checkfs" on this mounted
	partition OR

	2. Make your fs AND device that you have put this fs as your
	root fs and run "checkfs" on the root device (i.e. "/").
	You can of course still run checkfs under a separate dir
	under your "/" root dir.

	I have found the second method to be a particularly stringent
	arrangement (and thus preferred when you are trying to break
	something).

	Using this arrangement I was able to find that JFFS clobbered
	some "sister" files on the root fs even though "checkfs" would
	run fine through all its own check files.

	(I found this out when one of the clobbered sister file happened
	to be /bin/bash. The system refused to run rc.local thus
	preventing my "checkfs" program from being launched :)

	"checkfs":

	The "formatting" reliability of the fs as well as the file data integrity
	of files on the fs can be checked using this program.

	"formatiing" reliability can only be checked via an indirect method.
	If there is severe formatting reliability issues with the file system,
	it will most likely cause other system failures that will prevent this
	program from running successfully on a power up. This will prevent
	a "ok to power me down" message from going out to the power cycling
	black box and prevent power being turned off again.

	File data reliability is checked more directly. A fixed number of
	files are created in the current dir (using the program "makefiles").

	Each file has a random number of bytes in it (set by using the
	-s cmd line flag). The number of "ints" in the file is stored as the
	first "int" in it (note: 0 length files are not allowed). Each file
	is then filled with random data and a 16 bit CRC appended at the end.

	When "checkfs" is run, it runs through all files (with predetermined
	file names)- one at a time- and checks for the number of "int's"
	in it as well as the ending CRC.

	The program exits if the numbers of files that are corrupt are greater
	that a user specified parameter (set by using the -e cmd line flag).

	If the number of corrupt files is less than this parameter, the corrupt
	files are repaired and operation resumes as explained below.

	The idea behind allowing a user specified amount of corrupt files is as
	follows:

	If you are testing for "formatting" reliability of a fs, and for
	the data reliability of "other" files present of the fs, use -e 1.
	"other" files are defined as sister files on the fs, not being written to
	by the "checkfs" test program.

	As mentioned, in this case you would set -e 1, or allow at most 1 file
	to be corrupt each time after a power fail. This would be the file
	that was probably being written to when power failed (and CRC was not
	updated to reflect the new data being written). You would check file
	systems like ext2 etc. with such a configuration.
	(As you have no hope that these file systems provide for either your
	new data or old data to be present in the file if power failed during
	the write. This is called "roll back and recover".)

	With JFFS2 I tested for such "roll back and recover" file data reliability
	by setting -e 0 and making sure that all writes to the file being
	updated are done in a single write().

	This is how I found that JFFS2 (yet) does NOT support this functionality.
	(There was a great debate if this was a bug or a feature that was lacking
	or even an issue at all. See the mtd archives for more details).

	In other words, JFFS2 will partially update a file on FLASH even before
	the write() command has completed, thus leaving part old data part new
	data in your file if power failed in the middle of a write().

	This is bad functionality if you are updating a binary structure or a
	CRC protected file (as in our case).


	If All Files Check Out OK:

	On the startup scan, if there are less errors than specified by the "-e flag"
	a "ok to power me down message" is sent via the specified com port.

	The actual format of this message will depend on the format expected
	by the power cycling box that will receive this message. One may customise
	the actual message that goes out in the "do_pwr_dn)" routine in "comm.c".

	This file is called with an open file descriptor to the comm port that
	this message needs to go out over and the count of the current power
	cycle (in case your power cycling box can display/log this count).

	After this message has been sent out, the checkfs program goes into
	a while(1) loop of writing new data (with CRC), one at a time, into
	all the "check files" in the dir.

	Its life comes to a sudden end when power is asynchronously pulled from
	under its feet (by your external power cycling box).

	It comes back to life when power is restored and the system boots and
	checkfs is called from the rc.local script file.

	The cycle then repeats till a problem is detected, at which point
	the "ok to power me down" message is not sent and the cycle stops
	waiting for the user to examine the system.