Difference between revisions of "Namespaced IO Layer"

From HaskellWiki

Revision as of 21:25, 19 December 2010

Introduction

The Haskell I/O library is based on the underlying Unix/POSIX concepts and repeats their well-known design quirks and inconsistencies. The namespaced IO library provides an IO abstraction based on the ideas found in Plan 9 and Inferno: each IO-capable resource is represented as a virtual file server exposing a tree of files and directories, and those trees are organized using per-process configurable namespaces.

Availability

Project summary (licensing, etc.): http://code.google.com/p/hs-ogl-misc/

Source code: http://code.google.com/p/hs-ogl-misc/source/browse/ under the io-layer directory.

Checkout: see http://code.google.com/p/hs-ogl-misc/source/checkout (Mercurial repo)

Note on the Haskell Runtime

This library heavily depends on the most recent improvements in the Glasgow Haskell Compiler runtime, and thus cannot be used with other Haskell implementations. Throughout this text, the term "Haskell Runtime" means "GHC Runtime".

Structure

Base Layer

This layer is represented by GHC's own IO library (the IO monad). Handles provided by the standard library are used to perform the actual IO operations.

Device Layer

This layer contains servers providing file operations to access the resources represented by the Base Layer. These operations are not directly available to applications.

File servers defined within this layer expose an interface that resembles the interface of a Plan 9 kernel-level driver. The set of operations implemented by such servers is close to the 9P2000 set of operations, but has been adjusted where that makes the implementation more convenient.

Attaching to a Device

This operation corresponds to the ATTACH operation of 9P2000. Its purpose is to establish a relationship between an application process and a portion of the filesystem that the device exposes. The result of this operation is a so-called Attachment Descriptor for the root of the filesystem exposed by the device.

An Attachment Descriptor roughly corresponds to the Chan structure that the Plan 9 kernel maintains for each process-device attachment. The most important parts of this structure are:

  • Qid (same as 9P2000 Qid) which holds the internal reference to a file or a directory served by the attached device.
  • Attachment privileges which tell the device which files may be accessed through this attachment descriptor, and in which fashion. Attachment privileges may in some cases differ from the running privileges of an application process: thus a non-privileged process may have higher access levels with certain devices.
  • Path to the file or directory from the device's filesystem root.
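
The three parts listed above can be sketched as a Haskell record. This is only an illustration, not the library's actual definition: the names Qid, Privilege, Attachment and rootAttachment are hypothetical, the Qid layout follows the 9P2000 wire format (one type byte, a 32-bit version, a 64-bit path), and the privilege levels are borrowed from the running privileges this page describes for processes.

```haskell
import Data.Word (Word8, Word32, Word64)

-- 9P2000 Qid: type byte, version, and a unique path identifier.
data Qid = Qid
  { qidType    :: Word8
  , qidVersion :: Word32
  , qidPath    :: Word64
  } deriving (Eq, Show)

-- Assumed privilege levels; constructor order gives Init > Admin > HostOwner > None.
data Privilege = None | HostOwner | Admin | Init
  deriving (Eq, Ord, Show)

-- Hypothetical attachment descriptor with the three parts named in the text.
data Attachment = Attachment
  { attQid  :: Qid        -- internal reference to the file or directory
  , attPriv :: Privilege  -- attachment privileges
  , attPath :: [String]   -- path components from the device's filesystem root
  } deriving (Eq, Show)

-- Descriptor for a device root, as an attach operation might return it:
-- the path from the root is empty.
rootAttachment :: Privilege -> Qid -> Attachment
rootAttachment priv q = Attachment { attQid = q, attPriv = priv, attPath = [] }
```

Keeping the path as a list of components rather than a single string makes it easy to extend the descriptor by one component per walk step.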

Walking the File System Tree

This operation corresponds to the WALK operation of 9P2000. Its purpose is to obtain an attachment descriptor for a file or directory (the target of the walk) other than the one for which the application process already holds an attachment descriptor (the start of the walk). A successful device attachment yields an attachment descriptor for the device's filesystem root only; to reach an arbitrary file or directory further down the tree, one or more walk operations must be performed. One step of a filesystem walk searches for an entry with the given name (one component of the target path) in the directory related to the given attachment descriptor (a walk on a regular file is not allowed). If the search succeeds, the next component of the target path is searched for in the directory found at the previous step, and so on until all components of the target path are processed or an error occurs (entry not found, access violation, IO error, etc.). The result of the walk operation is an Attachment Descriptor for the target path. The target Attachment Descriptor will likely contain the same attachment privileges as the start Attachment Descriptor.
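
The step-by-step search described above can be sketched as a fold over the path components. This is a simplified model under stated assumptions: the device tree is held in memory as a hypothetical Node type, and privileges, Qids and IO errors are left out. None of the names below come from the library.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M

-- Hypothetical in-memory model of a device's file tree.
data Node = File | Dir (M.Map String Node)
  deriving (Eq, Show)

data WalkError = NotFound String | NotADirectory String
  deriving (Eq, Show)

-- One walk step: look up a single name in the current directory.
-- Walking from a regular file is an error.
step :: Node -> String -> Either WalkError Node
step (Dir entries) name = maybe (Left (NotFound name)) Right (M.lookup name entries)
step File          name = Left (NotADirectory name)

-- A full walk applies one step per path component and stops at the first error.
walk :: Node -> [String] -> Either WalkError Node
walk = foldM step
```

Because Either short-circuits, the first failing component ends the walk, which matches the "until all components are processed or an error occurs" wording.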

Opening a Handle

This operation corresponds to the OPEN operation of 9P2000. Given an Attachment Descriptor for a file or a directory and the requested IO mode (read, write, append, etc.), a Handle is opened by means of the underlying Haskell (GHC) runtime. To read, write, seek or close a Handle, use the standard Haskell library interface.

Getting/Setting File Status Attributes

These operations correspond to the STAT and WSTAT operations of 9P2000. Their purpose is to obtain and modify file status information. The data structure to describe file status is directly derived from the 9P2000 specification.

Creation and Removal of Files

These operations correspond to the CREATE and REMOVE operations of 9P2000. Their purpose is to create and delete directory entries in the filesystem exposed by a device driver. To create a new entry, an Attachment Descriptor for the directory where the new entry is to be created is required, with attachment privileges sufficient to create a new file. To remove an entry, a properly privileged Attachment Descriptor for the file or directory to be removed is required (not one for the parent directory).

Namespace Layer

This layer provides facilities to organize file systems presented by the Device Layer into per-process (thread) namespaces. Operations such as binding a file system to a namespace, and path evaluation are directly available to applications. The Namespace layer also introduces type-based separation between "application" and "system" code execution levels. At the application level, standard Haskell IO facilities based on the IO monad are not available: an application can only invoke system calls provided by the layer. At the system level, the IO monad is available.

The layer itself is implemented as a monad transformer similar to IdentityT but without the lift method available. Underneath this transformer, a standard ReaderT transformer is used, with its environment holding the per-process data (see below). The monad underlying the transformer must be capable of lifting IO functions (that is, it must be a member of the MonadIO class). Typically this is the IO monad, but not necessarily.
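
A minimal sketch of this idea, assuming a much reduced per-process record: the application-level type is a newtype over ReaderT that derives Monad but neither MonadIO nor MonadTrans, so application code cannot call liftIO or lift. All names here (ProcData, Sys, App, syscall, ownerName, putLine, runApp) are hypothetical and are not the library's API.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)

-- Hypothetical, heavily reduced per-process data.
newtype ProcData = ProcData { procOwner :: String }

-- System level: plain ReaderT, so liftIO is available when m is in MonadIO.
type Sys m = ReaderT ProcData m

-- Application level: no MonadIO and no MonadTrans instance is derived,
-- so neither liftIO nor lift can be used on App values.
newtype App m a = App (Sys m a)
  deriving (Functor, Applicative, Monad)

-- Turns a system-level action into a system call. A real library would
-- keep this (and the App constructor) out of the application-facing exports.
syscall :: Sys m a -> App m a
syscall = App

-- Two example system calls.
ownerName :: Monad m => App m String
ownerName = syscall (asks procOwner)

putLine :: MonadIO m => String -> App m ()
putLine s = syscall (liftIO (putStrLn s))

-- Run an application with the given per-process data.
runApp :: ProcData -> App m a -> m a
runApp pd (App r) = runReaderT r pd
```

The separation is enforced by the type checker only as long as the constructor and syscall stay hidden behind a module boundary.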

Namespace

Each process has a namespace which provides a per-process view of the underlying host resources as a single tree of directories and files. A reference to the namespace is stored in the per-process data structure (see below). A process may issue a request to update its namespace (to bind a file path somewhere in the namespace). The original Plan 9 implementation is described in http://doc.cat-v.org/plan_9/4th_edition/papers/lexnames, and the Namespaced IO Layer closely follows this description except that the concept of the "current" directory does not exist: all file paths are absolute from the filesystem root ("/") or the device table root ("#").

A namespace is implemented as a map with file path keys and union directory values. A union directory is a concatenation of directories.

Filename evaluation is a process of obtaining a physical (per device) path to the underlying host resource given a logical (visible to a process) path.

If for example there is a "console" device whose root is known as #c, and containing a file cons, then #c/cons is a physical path. If a command

 bind -b '#c' /dev

was once issued, then the logical path /dev/cons evaluates to the physical path #c/cons.

Each process must have at least one binding in its namespace, namely the root binding. In many cases this will be some fragment of the underlying host filesystem. Once such a root binding is introduced, filename evaluation becomes possible.

A logical path is first split into slash-separated components. The root component is replaced with whatever is bound to the namespace root. A prefix is then formed from the remaining components, starting from the topmost, and matched against the namespace entries. If no match is found, the next component is appended to the prefix, and the process repeats. If the whole path has been tried and no match was found in the namespace, the resulting physical path is the namespace root binding plus the logical path itself. That is, if the root was bound with

bind -c '#Z' /

then /foo/bar evaluates to #Z/foo/bar.

If, however, a match is found at some step (for example, the logical path is /dev/cons, and /dev matches a namespace entry containing a concatenation of several directories, including one exposed by the console device), then the matched prefix is replaced with one of the concatenated directories: the one containing an entry named by the path component that follows the matched prefix (if no such directory is found, evaluation fails). This component is cons in our case, so /dev will be replaced with #c/ rather than with #Z/. Thus, /dev/cons evaluates to #c/cons.
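
The evaluation procedure described above can be sketched as a pure function. This is a simplified model, not the library's code, and it rests on several assumptions: the root binding is a single physical path, only one prefix substitution is performed, paths starting with "#" are not handled, a path that equals a bound prefix resolves to the first directory of the union, and the question "does this physical directory contain this entry?" is answered by a caller-supplied function. All names are hypothetical.

```haskell
import Data.List (find, inits, intercalate)
import qualified Data.Map as M

type Components = [String]                 -- "/dev/cons" becomes ["dev", "cons"]
type UnionDir   = [String]                 -- physical directories, in search order
type Namespace  = M.Map Components UnionDir

-- Split a logical path on '/' and drop empty components.
splitPath :: String -> Components
splitPath = filter (not . null) . go
  where
    go s = case break (== '/') s of
      (c, [])       -> [c]
      (c, _ : rest) -> c : go rest

joinPath :: String -> Components -> String
joinPath base cs = intercalate "/" (base : cs)

evalPath
  :: (String -> String -> Bool)  -- hasEntry dir name: does dir contain name?
  -> String                      -- physical path bound to the namespace root
  -> Namespace
  -> String                      -- logical path
  -> Maybe String                -- physical path, or Nothing if evaluation fails
evalPath hasEntry root ns logical =
  case [ (p, u) | p <- prefixes, Just u <- [M.lookup p ns] ] of
    -- No prefix is bound: fall back to the root binding.
    [] -> Just (joinPath root comps)
    -- Shortest bound prefix: pick the union member holding the next component.
    (p, u) : _ ->
      case drop (length p) comps of
        []              -> case u of { d : _ -> Just d; [] -> Nothing }
        rest@(next : _) -> (\d -> joinPath d rest) <$> find (`hasEntry` next) u
  where
    comps    = splitPath logical
    prefixes = drop 1 (inits comps)  -- non-empty prefixes, shortest first
```

With a namespace that maps ["dev"] to the union ["#c", "#Z/dev"], a root bound to "#Z", and a hasEntry that reports cons inside #c, the path /dev/cons resolves to #c/cons and /foo/bar resolves to #Z/foo/bar, as in the examples above.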

Per-process Data

There is a small data structure that persists through the process lifetime. This structure is directly accessible only at the "system" level, via the ReaderT functionality. Device driver code does not have access to per-process data. Due to the use of ReaderT, this structure is immutable, except for the namespace map. See http://hs-ogl-misc.googlecode.com/hg/io-layer/System/IO9/NameSpace/Types.hs for the Haskell definition.

The most important fields of this per-process data structure are:

  • Process running privileges: Init, Admin, Host owner, and None. The first three are used with locally initiated processes (most processes run as Host owner). The None privilege is given to server processes running on behalf of external/remote users. When a process attaches a device, its running privileges are copied into the attachment descriptor. The logic of granting access rests entirely with the file servers; however, the common rule is that processes with Init, Admin, and Host owner privileges have almost full access to the underlying host resources while None has no access at all. Thus, processes running as None need to perform certain authentication procedures to obtain proper attachment descriptors (this is future work to implement a Plan 9 or Inferno authentication scheme).
  • Device map. This is a one-level map with character keys and device table values. It is used to find the proper device table for a file path starting with the '#' character.
  • Reference to the process namespace. This is an MVar whose contents are overwritten when the namespace is updated.

Other fields include the host owner name string, path handles for the process standard input and output, and the parent process (thread) identifier, but they are not essential for the Namespace layer itself.
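
The three fields listed above can be sketched as follows. This is an illustrative record only (the actual definition is in the Types.hs file linked above and is not reproduced here). It assumes the namespace is a map from logical path to a list of physical directories and uses a plain string as a stand-in for a device table; all names are hypothetical.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import qualified Data.Map as M

data Privilege = None | HostOwner | Admin | Init
  deriving (Eq, Ord, Show)

-- Assumed shape: logical path mapped to a union of physical directories.
type Namespace = M.Map String [String]

data ProcData = ProcData
  { pdPriv    :: Privilege
  , pdDevices :: M.Map Char String  -- device letter to device table (placeholder type)
  , pdNS      :: MVar Namespace     -- the only mutable part of the record
  }

newProcData :: Privilege -> M.Map Char String -> IO ProcData
newProcData priv devs = ProcData priv devs <$> newMVar M.empty

-- Look up the device table for a path such as "#c/cons".
deviceFor :: ProcData -> String -> Maybe String
deviceFor pd ('#' : c : _) = M.lookup c (pdDevices pd)
deviceFor _  _             = Nothing

-- Bind a physical directory in front of whatever is already bound at the
-- logical path; the MVar contents are replaced with the updated map.
bindBefore :: ProcData -> String -> String -> IO ()
bindBefore pd new old = modifyMVar_ (pdNS pd) (pure . M.insertWith (++) old [new])

currentNamespace :: ProcData -> IO Namespace
currentNamespace = readMVar . pdNS
```

The record itself stays immutable in the ReaderT environment; only the value held inside the MVar changes, which is how the namespace can be updated at all.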

New Process Creation

The Namespace Layer provides functionality to start new processes with possible control over child process namespace and running privileges. The basic rules are:

  • Running privileges of the child process can only be lower than or equal to those of the parent process, considering that Init > Admin > Host owner > None.
  • The namespace may be shared between parent and child, or cloned, or built from scratch (the child process starts with an empty namespace). Namespace sharing is only allowed for processes with the same privileges.
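
The two rules above can be sketched as a single check that either rejects the request or produces the namespace reference for the child. This is an illustration under assumptions, not the library's process-creation API: NsMode and childNamespace are hypothetical names, and the namespace is modeled as a map held in an MVar.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar)
import qualified Data.Map as M

-- Constructor order gives Init > Admin > HostOwner > None.
data Privilege = None | HostOwner | Admin | Init
  deriving (Eq, Ord, Show)

data NsMode = ShareNS | CloneNS | EmptyNS
  deriving (Eq, Show)

type Namespace = M.Map String [String]

-- Decide which namespace reference a child process receives.
childNamespace
  :: Privilege        -- parent's running privileges
  -> Privilege        -- privileges requested for the child
  -> NsMode
  -> MVar Namespace   -- parent's namespace reference
  -> IO (Either String (MVar Namespace))
childNamespace parent child mode ref
  | child > parent =
      pure (Left "child privileges exceed those of the parent")
  | mode == ShareNS && child /= parent =
      pure (Left "namespace sharing requires equal privileges")
  | otherwise = Right <$> case mode of
      ShareNS -> pure ref                  -- same MVar: updates visible to both
      CloneNS -> readMVar ref >>= newMVar  -- snapshot copied into a fresh MVar
      EmptyNS -> newMVar M.empty           -- child starts from scratch
```

Sharing hands out the same MVar, so a later bind by either process is seen by both, while cloning copies the current map into a new MVar and the two namespaces diverge from then on.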

Application Layer

This layer implements streaming IO operations using the Iteratee concept.


DRAFT! DRAFT! DRAFT!

This document as well as the library it describes are both work in progress and subject to changes of any unpredictable kind ;) The text on this page may look at times bizarre, and nonsense; this will be eventually corrected ;)