Windows Parallel Filesystems

I was recently involved in some development work on a quasi-parallel filesystem for Microsoft Windows.  That work piqued my interest, so I decided to research the current state of research and development in parallel filesystems designed specifically for Microsoft Windows.

First, a quick review of what I mean by a parallel file system.  There are many different types of parallel file systems available.  Some allow multiple systems and applications to share a common pool of storage, as in a clustered filesystem.  Some split the data across two or more nodes to improve access time and redundancy.  Other variants split files into many small chunks, store these chunks on different disks in a round-robin fashion, and recombine them upon reading to recover the original file.
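To make the round-robin variant concrete, here is a minimal Python sketch, assuming a fixed stripe size and an in-memory list standing in for each disk.  The function names `stripe` and `unstripe` are my own and are not taken from any of the systems discussed in this post.

```python
def stripe(data: bytes, stripe_size: int, num_disks: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them out to disks round-robin."""
    if stripe_size <= 0 or num_disks <= 0:
        raise ValueError("stripe_size and num_disks must be positive")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), stripe_size):
        chunk_index = start // stripe_size
        # Chunk k goes to disk k mod N, so disk d holds chunks d, d+N, d+2N, ...
        disks[chunk_index % num_disks].append(data[start:start + stripe_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Recombine chunks by taking one chunk from each disk in turn."""
    out = bytearray()
    round_no = 0
    while True:
        progressed = False
        for disk in disks:
            if round_no < len(disk):
                out += disk[round_no]
                progressed = True
        if not progressed:  # every disk is exhausted
            break
        round_no += 1
    return bytes(out)
```

With a 3-byte stripe size and two disks, `b"abcdefghij"` ends up as `abc`, `ghi` on the first disk and `def`, `j` on the second, and `unstripe` returns the original bytes.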

The earliest instance of a Microsoft Windows-specific parallel filesystem that I have found to date was developed by the ARGOS group at Universidad Carlos III de Madrid, Madrid, ES.  This research group developed a prototype parallel file system for a network of Microsoft Windows nodes, which they called WinPFS.  They presented their work at COSET 2004 and at a number of other workshops.  WinPFS was implemented as a new filesystem type fully integrated within the Microsoft Windows kernel.  This has the advantage that user applications need no modification or recompilation to take advantage of the parallel filesystem.

The goal of this research group was to build a parallel file system for networks of Microsoft Windows computers that uses Microsoft Windows shared folders to access remote data in parallel.  The implementation is based on file system redirectors, which redirect requests to remote nodes using UNC (Universal Naming Convention) paths and the SMB and/or CIFS protocols.  WinPFS is registered as a virtual remote file system, and remote data is accessed through a new shared folder, \\PFS.  The basic file operation primitives are create, read, write, and create directory.
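The redirector idea can be illustrated with a toy path mapping.  This Python sketch is hypothetical and is not how WinPFS itself names or places data: the share name `pfsdata`, the node names, and the round-robin block-to-node rule are all my own assumptions.  It only shows how a path under the virtual \\PFS share could be rewritten into an ordinary UNC path on one of the nodes.

```python
PFS_PREFIX = "\\\\PFS\\"  # the virtual share from the text, i.e. \\PFS\


def redirect(pfs_path: str, block_index: int, nodes: list[str],
             share: str = "pfsdata") -> str:
    """Map one block of a file under \\PFS to a UNC path on a node.

    Hypothetical layout: block i lives on nodes[i % len(nodes)] in a share
    named `share`, under the same relative path as in \\PFS.
    """
    if not pfs_path.upper().startswith(PFS_PREFIX):
        raise ValueError(f"not a \\\\PFS path: {pfs_path!r}")
    relative = pfs_path[len(PFS_PREFIX):]
    node = nodes[block_index % len(nodes)]
    return f"\\\\{node}\\{share}\\{relative}"
```

For example, block 4 of `\\PFS\data\file.dat` on three nodes maps to `\\nodeB\pfsdata\data\file.dat`, which an SMB/CIFS client can then open like any other remote file.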

The prototype was developed on the Windows XP platform and has been tested with a cluster of seven Windows XP nodes and a Windows 2003 Server node in various configurations.  Maximum throughput was 250 Mbit/s for write operations and 1200 Mbit/s for read operations.  The research team reported that the bottleneck was the disks for writes and the network for reads.  As far as I can tell, this project is no longer under active development.

Another interesting experimental parallel file system for Microsoft Windows was developed by Lungpin Yeh, Juei-Ting Sun, Sheng-Kai Hung & Yarsun Hsu of the National Tsing-Hua University, ROC.  They presented a paper on their initial implementation at the 2007 High Performance Computation Conference.

Their implementation consists of three main components: a metadata server, I/O node daemons (IODs), and an API library (libwpvfs) that enables users to develop their own applications on top of the parallel file system.  libwpvfs, which uses the .NET Framework, handles all communication with the metadata server and the IODs and supports six basic file operation primitives: open, create, read, write, seek, and close.
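The six primitives describe a conventional handle-based file API.  The Python sketch below is a hypothetical stand-in to show that shape: the class and method names are mine, it is not the real .NET libwpvfs interface, and a local dictionary replaces the metadata server and I/O daemons.

```python
import os
from dataclasses import dataclass


@dataclass
class Handle:
    """Per-open-file state kept by the client: name, current position, open flag."""
    name: str
    pos: int = 0
    closed: bool = False


class WpvfsClient:
    """Hypothetical client exposing the six primitives; storage is a local dict."""

    def __init__(self) -> None:
        self._files: dict[str, bytearray] = {}

    def _buf(self, h: Handle) -> bytearray:
        if h.closed:
            raise ValueError("operation on closed handle")
        return self._files[h.name]

    def create(self, name: str) -> Handle:
        self._files[name] = bytearray()  # create (or truncate) an empty file
        return Handle(name)

    def open(self, name: str) -> Handle:
        if name not in self._files:
            raise FileNotFoundError(name)
        return Handle(name)

    def write(self, h: Handle, data: bytes) -> int:
        buf = self._buf(h)
        end = h.pos + len(data)
        if end > len(buf):
            buf.extend(bytes(end - len(buf)))  # zero-fill up to the new end
        buf[h.pos:end] = data
        h.pos = end
        return len(data)

    def read(self, h: Handle, size: int) -> bytes:
        buf = self._buf(h)
        chunk = bytes(buf[h.pos:h.pos + size])
        h.pos += len(chunk)
        return chunk

    def seek(self, h: Handle, offset: int, whence: int = os.SEEK_SET) -> int:
        buf = self._buf(h)
        base = {os.SEEK_SET: 0, os.SEEK_CUR: h.pos, os.SEEK_END: len(buf)}[whence]
        if base + offset < 0:
            raise ValueError("negative seek position")
        h.pos = base + offset
        return h.pos

    def close(self, h: Handle) -> None:
        h.closed = True
```

In a real client, `read` and `write` would translate the handle's position into stripe requests sent to the IODs; here they just slice a local buffer.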

The metadata server maps each filename to a unique 64-bit file ID and maintains other information about the file, such as the striping size, the node locations, and the node count.  While a single metadata server is obviously a potential single point of failure, mirroring and redundancy can be used to improve reliability.
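A minimal sketch of such a metadata table follows.  The field names, the sequential ID allocation, and the round-robin `locate` helper are my assumptions for illustration; the paper only states which pieces of information the server keeps.

```python
import itertools
from dataclasses import dataclass

MAX_FILE_ID = 2**64 - 1  # file IDs must fit in 64 bits


@dataclass(frozen=True)
class FileMetadata:
    file_id: int            # unique 64-bit file ID
    stripe_size: int        # bytes per stripe
    nodes: tuple[str, ...]  # I/O nodes holding the stripes (assumed round-robin order)

    @property
    def node_count(self) -> int:
        return len(self.nodes)


class MetadataServer:
    """Hypothetical in-memory table mapping filename -> FileMetadata."""

    def __init__(self) -> None:
        self._table: dict[str, FileMetadata] = {}
        self._ids = itertools.count(1)  # assumption: IDs handed out sequentially

    def create(self, name: str, stripe_size: int, nodes: list[str]) -> FileMetadata:
        if name in self._table:
            raise FileExistsError(name)
        file_id = next(self._ids)
        if file_id > MAX_FILE_ID:
            raise OverflowError("64-bit file ID space exhausted")
        meta = FileMetadata(file_id, stripe_size, tuple(nodes))
        self._table[name] = meta
        return meta

    def lookup(self, name: str) -> FileMetadata:
        return self._table[name]  # raises KeyError for unknown names


def locate(meta: FileMetadata, offset: int) -> tuple[str, int]:
    """Return (node, stripe_index) for a byte offset, assuming round-robin placement."""
    stripe_index = offset // meta.stripe_size
    return meta.nodes[stripe_index % meta.node_count], stripe_index
```

With a 64 KiB stripe size over three nodes, byte offset 200000 falls in stripe 3, which wraps back around to the first node.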

When an application wants to access a file, the library first connects to the metadata server to acquire the metadata for the file.  The library then connects to the I/O node daemons listed in the metadata; these daemons access the corresponding file data and send the appropriate stripes back to the library, which hands them off to the calling application.
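The read path above can be sketched in a few lines of Python.  Everything here is a hypothetical stand-in: a dictionary plays the metadata server, `IODaemon` objects play the I/O node daemons, and the metadata tuple of file ID, stripe count, and node list is my own simplification.  Stripes are fetched concurrently with a thread pool, which is where the parallel speed-up comes from.

```python
from concurrent.futures import ThreadPoolExecutor


class IODaemon:
    """Hypothetical I/O node daemon holding stripes keyed by (file_id, stripe_index)."""

    def __init__(self) -> None:
        self.stripes: dict[tuple[int, int], bytes] = {}

    def get(self, file_id: int, stripe_index: int) -> bytes:
        return self.stripes[(file_id, stripe_index)]


def read_file(name: str,
              metadata: dict[str, tuple[int, int, tuple[str, ...]]],
              daemons: dict[str, IODaemon]) -> bytes:
    # Step 1: ask the metadata server (a plain dict here) for the file's layout:
    # (file_id, number of stripes, I/O nodes in round-robin order).
    file_id, num_stripes, nodes = metadata[name]

    # Step 2: fetch each stripe from the daemon that holds it, several at once.
    def fetch(i: int) -> bytes:
        return daemons[nodes[i % len(nodes)]].get(file_id, i)

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(fetch, range(num_stripes)))  # map preserves stripe order

    # Step 3: reassemble the stripes and hand the result back to the caller.
    return b"".join(parts)
```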

Under test conditions with 5 nodes, a maximum of 109 MB/s for writes and 85 MB/s for reads was measured.
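Because the two projects report throughput in different units, it helps to put them side by side.  This short Python calculation converts the WinPFS figures from Mbit/s to MB/s, assuming 8 bits per byte and the same decimal prefix on both sides (1 MB = 10^6 bytes).  The testbeds, node counts, and hardware differ, so this is only a rough comparison, not a benchmark.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert Mbit/s to MB/s, assuming 8 bits per byte and matching prefixes."""
    return mbit_per_s / 8


winpfs = {"write": mbit_to_mbyte(250), "read": mbit_to_mbyte(1200)}  # MB/s
nthu = {"write": 109.0, "read": 85.0}  # MB/s, measured with 5 nodes

for op in ("write", "read"):
    print(f"{op:5}: WinPFS {winpfs[op]:7.2f} MB/s   NTHU {nthu[op]:7.2f} MB/s")
```

This works out to 31.25 MB/s for writes and 150 MB/s for reads on WinPFS.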

If one of the nodes fails, the parallel file system can still work, i.e., the library can use the remaining healthy nodes, but the data on the failed node is no longer available.  To overcome this limitation, the research team plans to add node fault tolerance in a future version so that the parallel filesystem remains fully functional even if some of the I/O nodes fail.
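The effect of a node failure on round-robin striping is easy to model.  This Python sketch assumes round-robin placement; the optional second copy on the next node is a simple mirroring scheme I chose for illustration, since the paper does not say which fault-tolerance technique the team intends to use.

```python
def readable_stripes(num_stripes: int, nodes: list[str], failed: set[str],
                     replicas: int = 1) -> list[int]:
    """Indices of stripes that can still be read while the nodes in `failed` are down.

    Stripe i is stored on nodes[(i + r) % len(nodes)] for r in range(replicas).
    replicas=1 is plain round-robin with no redundancy; replicas=2 is a simple
    mirroring scheme used here only for illustration.
    """
    n = len(nodes)
    if not 1 <= replicas <= n:
        raise ValueError("replicas must be between 1 and the number of nodes")
    readable = []
    for i in range(num_stripes):
        holders = {nodes[(i + r) % n] for r in range(replicas)}
        if holders - failed:  # at least one copy lives on a healthy node
            readable.append(i)
    return readable


nodes = ["io0", "io1", "io2", "io3", "io4"]
print(readable_stripes(10, nodes, {"io2"}))              # stripes 2 and 7 are lost
print(readable_stripes(10, nodes, {"io2"}, replicas=2))  # every stripe survives
```

The trade-off is the usual one: a second copy of every stripe doubles the storage and write traffic in exchange for surviving any single node failure.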

I am certain that there are other examples of parallel filesystems specifically targeted at networks of Microsoft Windows computers which have been developed by the academic research community and probably by Microsoft itself.  As I come across such work, I plan to add details to this post.

In conclusion, a parallel filesystem can not only provide a larger and/or global storage space for applications by combining storage resources on different nodes but also increase the performance of an application because the application can access files in parallel.

Traditionally, this kind of solution was available only for Unix and GNU/Linux systems.  Examples include PVFS, GPFS, ParFiSys and Vesta.  However, because of the advantages that parallel filesystems can offer, and because of the ubiquity of Microsoft Windows computers, I expect to see a number of commercial parallel filesystems targeted at networks of Microsoft Windows computers emerge over the next 5 to 10 years.

P.S. The graphics in this post were copied from published papers of the respective research teams.

