Difference between revisions of "FreeArc/Standard API for compression libraries"

From HaskellWiki
(INIT/DONE in host, NOT_IMPL in codec)
(Removed header file and added idea description)
 
(2 intermediate revisions by the same user not shown)
Discussion: [http://encode.ru/forum/showthread.php?t=182 at encode.ru forum]
 
Download sources: [http://www.haskell.org/bz/cls.zip]
 
   
Header:
 
<pre-cpp>
 
// Operations
 
const int CLS_INIT = 1; // Called once on DLL load
 
const int CLS_DONE = 2; // Called once before DLL unload
 
const int CLS_COMPRESS = 3; // Requests to compress data using i/o via callbacks
 
const int CLS_DECOMPRESS = 4; // Requests to decompress data using i/o via callbacks
 
   
// Callbacks
 
const int CLS_READ = 1000; // Read data into buffer ptr:n. Use CLS_READ+i to read i'th stream. Retcode: <0 - error, 0 - EOF, >0 - amount of data read
 
const int CLS_WRITE = 2000; // Write data from buffer ptr:n. CLS_WRITE+i also works. Retcode: the same
 
const int CLS_MALLOC = 1; // Alloc n bytes and make *ptr point to this area
 
const int CLS_FREE = 2; // Free ptr previously allocated by CLS_MALLOC
 
const int CLS_PARAMETERS = 3; // Put ASCIIZ parameters string into buffer ptr:n
 
// INIT-stage callbacks
 
const int CLS_ID = 4; // Identify itself with unique ASCIIZ string at ptr (for example, "lzma.7zip.org")
 
   
// Error codes
 
const int CLS_OK = 0; // ALL RIGHT
 
const int CLS_ERROR_GENERAL = -1; // Unclassified error
 
const int CLS_ERROR_NOT_IMPLEMENTED = -2; // Requested feature isn't supported
 
const int CLS_ERROR_NOT_ENOUGH_MEMORY = -3; // Memory allocation failed
 
const int CLS_ERROR_READ = -4;
 
const int CLS_ERROR_WRITE = -5;
 
const int CLS_ERROR_ONLY_DECOMPRESS = -6; // This DLL supports only decompression
 
const int CLS_ERROR_INVALID_COMPRESSOR = -7; // Invalid compression method parameters
 
const int CLS_ERROR_BAD_COMPRESSED_DATA = -8; // Data can't be decompressed
 
const int CLS_ERROR_NO_MORE_DATA_REQUIRED = -9; // Required part of data was already decompressed
 
const int CLS_ERROR_OUTBLOCK_TOO_SMALL = -10; // Output block size in (de)compressMem is not enough for all output data
 
   
// Type of callback passed to ClsMain
 
typedef int CLS_CALLBACK(void* instance, int op, void *ptr, int n);
 
   
/* to do:
- check versions and backward/forward compatibility
- allow multiple codecs in the same dll. this may be solved by exporting ClsMain2, ClsMain3..., but that may not be enough for some more complex scenarios
- codecs need to know how much memory they are supposed to use for compression / decompression
- some codecs might handle multithreading in a smarter way than splitting streams
- there are also threading issues (such as the application allowing up to N threads) and whether the dll is thread-safe or not (if not, it can be made safe by loading multiple instances of the dll - might be a useful feature, as many experimental compressors are not really encapsulated)
- some interface methods are required for initialization and model flush (which are not the same, as there might be some precalculation required only once)
- what about detectors and filters
*/
 
</pre-cpp>
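The header comments say that a codec may use CLS_READ+i and CLS_WRITE+i to address the i'th stream. The following is only a sketch of how a codec might build such op codes and how a host might decode them. The names `streamOp`, `decodeStreamOp`, `StreamOp`, `StreamDir` and `MAX_STREAMS` are hypothetical and not part of cls.h, and the limit of 1000 streams per direction is an assumption derived from the gap between the two constants.

```cpp
#include <cassert>

// Values copied from the header above.
const int CLS_READ  = 1000;
const int CLS_WRITE = 2000;

// Assumption for this sketch only: fewer than 1000 streams per direction,
// so that CLS_READ+i never collides with CLS_WRITE.
const int MAX_STREAMS = CLS_WRITE - CLS_READ;

// Hypothetical helper types, not part of cls.h.
enum StreamDir { STREAM_NONE, STREAM_READ, STREAM_WRITE };

struct StreamOp {
    StreamDir dir;   // direction of the request, or STREAM_NONE
    int index;       // stream number, or -1 if op is not a stream request
};

// Codec side: build the op code for the i'th input or output stream.
int streamOp(int base, int i) {
    assert(base == CLS_READ || base == CLS_WRITE);
    assert(i >= 0 && i < MAX_STREAMS);
    return base + i;
}

// Host side: split an op code back into direction and stream number.
StreamOp decodeStreamOp(int op) {
    StreamOp r = { STREAM_NONE, -1 };
    if (op >= CLS_READ && op < CLS_READ + MAX_STREAMS) {
        r.dir = STREAM_READ;
        r.index = op - CLS_READ;
    } else if (op >= CLS_WRITE && op < CLS_WRITE + MAX_STREAMS) {
        r.dir = STREAM_WRITE;
        r.index = op - CLS_WRITE;
    }
    return r;
}
```

A host callback could call `decodeStreamOp(op)` first and fall back to its ordinary `switch` (CLS_MALLOC, CLS_FREE, ...) when the result is STREAM_NONE.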
 
   
   

Latest revision as of 00:32, 28 October 2008

Discussion: [http://encode.ru/forum/showthread.php?t=182 at encode.ru forum]
Download sources: [http://www.haskell.org/bz/cls.zip]


The main reason for FreeArc's success is its use of leading compression algorithms. But not every great algorithm is open source, which forces advanced users to rely on the "external compressors" feature, and that isn't very handy.

External programs have the advantage of being absolutely independent of me. Everyone can develop a compressor that is usable standalone and at the same time easily integrated with FA, while adding new algorithms to FA itself needs co-operation with me. Now I think that by providing the same level of independence for compressors developed as DLLs we can make things better.

So I propose a standard API for compression DLLs. Once you have a DLL developed according to this API, you can just drop it into the FreeArc folder (or that of any other program supporting this standard) and immediately use it for compression and decompression. Moreover, it will be possible to download on demand the DLLs required to decompress your archive, just as is done now in media players.


My proposal is based on experience of trying out various algorithms in FA. It is flexible enough to allow further extensions without losing backward compatibility; at the same time I tried to keep basic operations simple.

1) The library should be provided as a DLL named cls-*.dll: this makes it simpler to find all compatible libs in a large directory.

2) The only function that should be exported is

int ClsMain(CALLBACK* cb, void* instance)

where

typedef int CALLBACK(char *what, void* instance, void *ptr, int n)

3) All interaction with the caller is implemented via callbacks. The string `what` describes the operation we ask the caller to perform; instance passes instantiation-specific parameters (important for multithreaded environments), while ptr and n pass the operation's parameters. Operations requiring more parameters can use ptr as a pointer to a structure.

4) The minimum set of operations that should be supported consists of:

cb("action", instance, buf, len) - puts "compress" or "decompress" in buf. Required to determine which operation ClsMain should perform

cb("read", instance, buf, len) - reads input data into buf. Returns
>0 - amount of data read
=0 - EOF
<0 - error code

cb("write", instance, buf, len) - the same for writing data

Compression methods supporting multiple output streams (such as bcj2) may add a stream number to read or write:
cb("write0", instance, buf, len)
cb("write1", instance, buf, len)
...

The following operation may be used to determine compression parameters:
cb("parameters", instance, buf, len) - puts a string representing compression parameters into buf
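The string-based interface in points 2) to 4) can be sketched roughly as follows. This is only an illustration under several assumptions: `what` is declared `const char*` (instead of the `char*` in the proposal) so that string literals can be passed, -1 is used as a generic error code and 0 as success because the proposal does not fix these values, and `MemHost` and `hostCb` are hypothetical names for a host that keeps all streams in memory. The toy codec just copies its input to its output.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Assumption: const char* instead of char*, so string literals are accepted.
typedef int CALLBACK(const char* what, void* instance, void* ptr, int n);

// Hypothetical in-memory host state, passed to the codec as `instance`.
struct MemHost {
    std::string action;   // "compress" or "decompress"
    std::string params;   // compression parameters string
    std::string input;    // data served by "read"
    size_t pos;           // current read position in input
    std::string out[2];   // data received by "write"/"write0" and "write1"
    MemHost() : pos(0) {}
};

// Host side: dispatch on the operation name.
int hostCb(const char* what, void* instance, void* ptr, int n) {
    MemHost* h = static_cast<MemHost*>(instance);
    assert(h != NULL && what != NULL);
    if (n < 0) return -1;
    if (strcmp(what, "action") == 0 || strcmp(what, "parameters") == 0) {
        const std::string& s = (what[0] == 'a') ? h->action : h->params;
        if (s.size() + 1 > (size_t)n) return -1;   // buffer too small
        memcpy(ptr, s.c_str(), s.size() + 1);      // copy with trailing NUL
        return 0;
    }
    if (strcmp(what, "read") == 0) {
        size_t left = h->input.size() - h->pos;
        size_t len = left < (size_t)n ? left : (size_t)n;
        memcpy(ptr, h->input.data() + h->pos, len);
        h->pos += len;
        return (int)len;                           // 0 signals EOF
    }
    if (strncmp(what, "write", 5) == 0) {
        // "write" alone means stream 0, "writeN" selects stream N
        // (the suffix is not validated in this sketch).
        int i = what[5] ? atoi(what + 5) : 0;
        if (i < 0 || i >= 2) return -1;
        h->out[i].append(static_cast<const char*>(ptr), n);
        return n;
    }
    return -1;                                     // unknown operation
}

// Codec side: ask the host what to do, then copy input to output.
int ClsMain(CALLBACK* cb, void* instance) {
    char action[32];
    int rc = cb("action", instance, action, (int)sizeof action);
    if (rc < 0) return rc;
    if (strcmp(action, "compress") != 0 && strcmp(action, "decompress") != 0)
        return -1;
    char buf[4096];
    for (int len; (len = cb("read", instance, buf, (int)sizeof buf)) != 0; ) {
        if (len < 0) return len;                   // pass the error code on
        int ret = cb("write", instance, buf, len);
        if (ret != len) return ret < 0 ? ret : -1;
    }
    return 0;
}
```

Because all state lives in the `MemHost` passed as `instance`, several such hosts could run the same codec concurrently without sharing globals, which is the multithreading point made in 3).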


That's all for a beginning. One interesting idea may be an implementation of code that turns such a ClsMain into a standalone compressor, i.e. some standard shell with all the file/error/CRC/command-line handling, so that the developer can focus on writing just the compression code itself. This code may either interact with DLLs or be statically linked with a ClsMain-style library.


Simplest codec: <pre-cpp>

#include "cls.h"

// Trivial codec: copies input to output unchanged, for both compression and decompression
int ClsMain (int op, CLS_CALLBACK cb, void* instance) {

   switch(op)
   {
   case CLS_COMPRESS:
   case CLS_DECOMPRESS:
       {   
           const int BUFSIZE = 4096;
           char buf[BUFSIZE];
           for (int len; (len=cb(instance, CLS_READ, buf, BUFSIZE)) != 0; )
           {
               if (len<0)  return len;  // Return errcode on error
               int ret = cb(instance, CLS_WRITE, buf, len);
               if (ret!=len)  return ret<0? ret : CLS_ERROR_WRITE;
           }
           return CLS_OK;
       }
   default:
       return CLS_ERROR_NOT_IMPLEMENTED;
   }

} </pre-cpp>


Minimal host: <pre-cpp>

#include <stdlib.h>
#include <io.h>      // read/write on Windows; POSIX systems declare them in <unistd.h>
#include "cls.h"

int cb(void* instance, int op, void *ptr, int n) {

   switch(op)
   {
   case CLS_READ:
       return read(0,ptr,n);
   case CLS_WRITE:
       return write(1,ptr,n);
   case CLS_MALLOC:
       *(void**)ptr = malloc(n);
       return *(void**)ptr? CLS_OK : CLS_ERROR_NOT_ENOUGH_MEMORY;
   case CLS_FREE:
       free(ptr);
       return CLS_OK;
   default:
       return CLS_ERROR_NOT_IMPLEMENTED;
   }

}

int main () {

   extern int ClsMain (int op, CLS_CALLBACK cb, void* instance);
   ClsMain(CLS_INIT, cb, NULL);
   int ret = ClsMain(CLS_COMPRESS, cb, NULL);
   ClsMain(CLS_DONE, cb, NULL);
   return ret;

} </pre-cpp>
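The minimal host above passes NULL as `instance` and its codec never uses CLS_MALLOC or CLS_FREE. As a rough sketch of how those pieces might fit together, the variant below keeps its streams in memory, hands them to the callback through `instance`, and lets the codec request its buffer from the host. The constants are copied from the header; `MemStreams`, `memCb` and the `liveAllocs` counter are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Constants and callback type copied from the header (cls.h).
const int CLS_COMPRESS = 3, CLS_DECOMPRESS = 4;
const int CLS_READ = 1000, CLS_WRITE = 2000;
const int CLS_MALLOC = 1, CLS_FREE = 2;
const int CLS_OK = 0;
const int CLS_ERROR_GENERAL = -1;
const int CLS_ERROR_NOT_IMPLEMENTED = -2;
const int CLS_ERROR_NOT_ENOUGH_MEMORY = -3;
const int CLS_ERROR_WRITE = -5;
typedef int CLS_CALLBACK(void* instance, int op, void* ptr, int n);

// Hypothetical per-call state, passed to the callback as `instance`.
struct MemStreams {
    std::string in, out;  // input data and collected output
    size_t pos;           // read position in `in`
    int liveAllocs;       // CLS_MALLOC calls not yet matched by CLS_FREE
    MemStreams(const std::string& s) : in(s), pos(0), liveAllocs(0) {}
};

// Host callback: serves reads, writes and allocations from `instance`.
int memCb(void* instance, int op, void* ptr, int n) {
    MemStreams* s = static_cast<MemStreams*>(instance);
    assert(s != NULL);
    if (n < 0) return CLS_ERROR_GENERAL;
    switch (op) {
    case CLS_READ: {
        size_t left = s->in.size() - s->pos;
        size_t len = left < (size_t)n ? left : (size_t)n;
        memcpy(ptr, s->in.data() + s->pos, len);
        s->pos += len;
        return (int)len;                       // 0 signals EOF
    }
    case CLS_WRITE:
        s->out.append(static_cast<const char*>(ptr), n);
        return n;
    case CLS_MALLOC:
        *(void**)ptr = malloc(n);
        if (!*(void**)ptr) return CLS_ERROR_NOT_ENOUGH_MEMORY;
        s->liveAllocs++;
        return CLS_OK;
    case CLS_FREE:
        free(ptr);
        s->liveAllocs--;
        return CLS_OK;
    default:
        return CLS_ERROR_NOT_IMPLEMENTED;
    }
}

// Copying codec that obtains its work buffer from the host.
int ClsMain(int op, CLS_CALLBACK cb, void* instance) {
    if (op != CLS_COMPRESS && op != CLS_DECOMPRESS)
        return CLS_ERROR_NOT_IMPLEMENTED;
    const int BUFSIZE = 4096;
    void* buf = NULL;
    int rc = cb(instance, CLS_MALLOC, &buf, BUFSIZE);
    if (rc != CLS_OK) return rc;
    for (int len; (len = cb(instance, CLS_READ, buf, BUFSIZE)) != 0; ) {
        if (len < 0) { rc = len; break; }      // read error
        int ret = cb(instance, CLS_WRITE, buf, len);
        if (ret != len) { rc = ret < 0 ? ret : CLS_ERROR_WRITE; break; }
    }
    cb(instance, CLS_FREE, buf, 0);            // release on every exit path
    return rc;
}
```

A caller would construct a `MemStreams` with the input data, call `ClsMain(CLS_COMPRESS, memCb, &streams)`, and then read the result from `streams.out`. Routing allocations through the host is one possible way for an application to track or limit codec memory, which relates to the memory question in the to-do list.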