FreeArc/Universal Archive Format

From HaskellWiki
Revision as of 14:21, 8 July 2008 by Bulatz

This page describes the FreeArc archive format and ideas for how it can be further improved.

Archive block structure

An archive consists of blocks, which are divided into DATA BLOCKS (one data block contains the compressed data of one solid block) and CONTROL BLOCKS, which store archive meta-information (directories, comments, compression methods, recovery records...). Every block can be described by the following information:

  • block type (0 for a data block, 1 and above for the various control blocks)
  • its position in the archive (offset of its first byte)
  • original size
  • compressed size
  • compression algorithm used to compress this block (usually all blocks are compressed; data block compression is controlled by the -m option, control block compression by the -dm option)
  • CRC32 of the original data, used to check block consistency
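
The fields listed above can be pictured as a small record. The following is a minimal Python sketch, not FreeArc's actual code: the names `BlockDescriptor` and `is_consistent` are hypothetical, the on-disk encoding of a descriptor is not covered here, and the check assumes the CRC32 is the common variant computed by `zlib.crc32`.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDescriptor:
    """In-memory view of one block's description (field names are hypothetical)."""
    block_type: int       # 0 = data block, 1 and above = control blocks
    position: int         # offset of the block's first byte in the archive
    original_size: int    # size of the data before compression
    compressed_size: int  # size of the data as stored in the archive
    compressor: str       # compression method, kept here as a plain string
    crc32: int            # CRC32 of the original (uncompressed) data

    def is_data_block(self) -> bool:
        return self.block_type == 0


def is_consistent(desc: BlockDescriptor, original_data: bytes) -> bool:
    """Check already-decompressed block contents against the descriptor.

    Assumption: the stored checksum matches zlib's CRC-32 variant.
    """
    if len(original_data) != desc.original_size:
        return False
    return (zlib.crc32(original_data) & 0xFFFFFFFF) == desc.crc32
```

The size comparison runs first because it is cheap and catches truncated data before the checksum is computed.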

Usually, an archive consists of blocks in the following order:

  • HEADER BLOCK: contains the archive signature ("ArC\1") and the archiver version. It is the only block whose contents are never compressed, so the first 4 bytes of a FreeArc archive are always "ArC\1"
  • One or more DATA BLOCKS, containing compressed data for one or more solid blocks
  • DIRECTORY BLOCK, containing info about the files compressed in the preceding data blocks (filename, size, datetime, attributes, plus the file's CRC32 for consistency checking), descriptors of those data blocks, and info about which files are stored in which data blocks
  • One or more DATA BLOCKS followed by the corresponding DIRECTORY BLOCK may then repeat any number of times. Unlike in other archivers, the FreeArc archive directory may be split into many parts, each containing info about only part of the archive; this simplifies archive recovery. Directory splitting is controlled by the -s option
  • FOOTER BLOCK: contains info about all other control blocks plus archive-wide information (archive comment, amount of recovery info and so on)
  • optionally, TWO RECOVERY RECORD BLOCKS, followed by a second copy of the FOOTER BLOCK; these are added when the -rr option is enabled
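
Because the header block is never compressed, a quick format check only needs the first 4 bytes of the file. A minimal Python sketch, where `starts_with_signature` is a hypothetical helper name and "ArC\1" is read as the bytes 'A', 'r', 'C' followed by the byte 0x01:

```python
import io

# "ArC\1" from the text: 'A', 'r', 'C', then a single byte with value 1.
ARCHIVE_SIGNATURE = b"ArC\x01"


def starts_with_signature(stream) -> bool:
    """Return True if a seekable binary stream begins with the archive signature."""
    stream.seek(0)
    return stream.read(len(ARCHIVE_SIGNATURE)) == ARCHIVE_SIGNATURE


if __name__ == "__main__":
    # Synthetic in-memory data, not a real archive.
    print(starts_with_signature(io.BytesIO(b"ArC\x01" + b"rest of header")))
    print(starts_with_signature(io.BytesIO(b"PK\x03\x04")))
```

This only tests the signature; it does not parse the archiver version that follows it in the header block.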

So, archive decompression proceeds as follows:

  1) Read the last 4096 bytes of the archive and find the last occurrence of the archive signature in this data
  2) Read the block descriptor that follows the signature and ensure that it describes a footer block
  3) Decompress and parse the footer block to get info about the directory blocks
  4) Decompress and parse the directory blocks to get info about the files contained in the archive and the data blocks holding their compressed data
  5) Decompress the data blocks, writing the decompressed data into files
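
The first decompression step, locating the last signature in the final 4096 bytes, can be sketched in Python as follows. This is an illustration under stated assumptions rather than FreeArc's implementation: `find_last_signature` and `TAIL_SIZE` are hypothetical names, and the sketch stops at returning the offset because the layout of the block descriptor that follows the signature is not specified here.

```python
import os

ARCHIVE_SIGNATURE = b"ArC\x01"  # 'A', 'r', 'C', then the byte 0x01
TAIL_SIZE = 4096                # how many trailing bytes are searched


def find_last_signature(stream):
    """Return the absolute offset of the last signature found in the final
    TAIL_SIZE bytes of a seekable binary stream, or None if there is none."""
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    start = max(0, size - TAIL_SIZE)  # files shorter than TAIL_SIZE are read whole
    stream.seek(start)
    tail = stream.read(size - start)
    index = tail.rfind(ARCHIVE_SIGNATURE)
    return None if index < 0 else start + index
```

A caller would open the archive with `open(path, "rb")`, pass the file object in, and then continue with step 2 from the returned offset. Searching backwards with `rfind` matters when recovery records are present, since the tail may then contain more than one signature and the last one is the one wanted.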