Resource limits

From HaskellWiki
Revision as of 19:41, 28 August 2007 by Gwern

When you are writing shell scripts, it often makes sense to put resource limits on your programs. Limits stop them from turning into fork bombs or doing any of the other nasty things shells allow. The same goes for a program that will execute untrusted input. In general, any time you can determine rough upper limits on how much of a resource your program will need, it may be worthwhile to set resource limits. For example, a program that searches a specified file for a given string (like grep) has no need to fork off 1000 processes or to create a hundred gigabyte-sized files. It would never need to do that if it were working correctly, so don't let it (see Principle of least privilege[1]).

TODO: On Unix systems, all of this can be done with System.Posix.Resource, a thin wrapper around the C functions getrlimit() and setrlimit(). Perhaps for that reason, its Haddock documentation is very minimal. Here's what I've found by trial and error.

Basic usage

The primary function you'll use is setResourceLimit, which does what its name says. Its first argument is a 'Resource', which can be any one of

  • ResourceCoreFileSize
  • ResourceCPUTime
  • ResourceDataSize
  • ResourceFileSize
  • ResourceOpenFiles
  • ResourceStackSize
  • ResourceTotalMemory

Depending on which Resource you pass, setResourceLimit calls setrlimit() in libc with the matching RLIMIT_* constant. So far, then, our hypothetical example looks like this:

import System.Posix.Resource
main = do setResourceLimit ResourceCPUTime ..  -- the limits argument is still missing

Now, setResourceLimit also takes a second argument: a 'ResourceLimits'. A 'ResourceLimits' contains two limits.

The first is the 'soft' limit, which is the one the kernel actually enforces. Once the process crosses it, the process gets a signal or the offending call fails, depending on the resource. The process may raise its own soft limit when it needs more, but only up to the second limit, the 'hard' limit, which acts as a ceiling.

An ordinary process can lower its hard limit but can never raise it again; only a privileged process can do that. A program that really needs more than the hard limit allows will therefore fail, which is why it is called a 'hard' limit. See:
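To make the soft/hard relationship concrete, here is a minimal sketch. It assumes Linux and the unix package's System.Posix.Resource. It raises a soft limit as far as an ordinary process is allowed to, namely up to the current hard limit. `raiseSoftToHard` is a hypothetical helper, not part of the library.

```haskell
import System.Posix.Resource

-- Hypothetical helper: raise the soft limit of a resource to the current
-- hard limit, which is as high as an unprivileged process may go.
-- The hard limit itself is left untouched.
raiseSoftToHard :: Resource -> IO ResourceLimits
raiseSoftToHard res = do
  limits <- getResourceLimit res
  let raised = limits { softLimit = hardLimit limits }
  setResourceLimit res raised
  return raised

main :: IO ()
main = do
  new <- raiseSoftToHard ResourceOpenFiles
  putStrLn ("soft open-files limit raised to the hard limit: "
            ++ show (softLimit new == hardLimit new))
```

Some requests are not allowed, and setResourceLimit throws an IOError for them:

- Asking for a soft limit above the hard limit makes setrlimit fail with EINVAL.
- Asking for a higher hard limit without privilege makes it fail with EPERM.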

"Processes have soft and hard resource limits. On crossing the soft limit they receive a signal (for example the SIGXCPU or SIGXFSZ, corresponding to the RLIMIT_CPU and RLIMIT_FSIZE, respectively). The processes can trap and handle some of these signals, please see perlipc/Signals. After the hard limit the processes will be ruthlessly killed by the KILL signal which cannot be caught."

A limit need not be a number; we might want to allow unlimited use of a given Resource instead. But suppose we do want numeric limits. We could try to write

main = do setResourceLimit ResourceCPUTime (ResourceLimits 1000 10000)

but that will not type check, because 1000 and 10000 are plain numbers, not values of type ResourceLimit. We must wrap each one in the ResourceLimit constructor:

main = do setResourceLimit ResourceCPUTime (ResourceLimits (ResourceLimit 1000) (ResourceLimit 10000))

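This gives the program a soft limit of 1000 seconds of CPU time and a hard limit of 10000 seconds, so it can never use more than 10000 seconds in total. To let it use as much as it needs, we could use ResourceLimitInfinity instead. Raising a hard limit that is already finite requires privilege, so an ordinary process can only do this while its hard limit is still unlimited: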

main = do setResourceLimit ResourceCPUTime (ResourceLimits ResourceLimitInfinity ResourceLimitInfinity)
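Putting this together for the grep-like program from the introduction, a sketch might cap a few resources at the very start of main, before doing any real work. `capResources` is a hypothetical helper, and the numbers are illustrative guesses rather than recommendations. The helper sets the soft and hard limits to the same value, so the program itself cannot lift the caps later.

```haskell
import Data.List (isInfixOf)
import System.Posix.Resource

-- Hypothetical helper: pin the soft AND hard limit of each resource to the
-- same value. An unprivileged process cannot undo a lowered hard limit, and
-- asking for a hard limit above the current one fails with EPERM.
capResources :: [(Resource, Integer)] -> IO ()
capResources = mapM_ cap
  where
    cap (res, n) =
      setResourceLimit res (ResourceLimits (ResourceLimit n) (ResourceLimit n))

main :: IO ()
main = do
  -- Illustrative numbers for a grep-like filter; choose your own.
  capResources
    [ (ResourceCPUTime,      10)  -- seconds
    , (ResourceCoreFileSize, 0)   -- bytes: no core dumps
    , (ResourceOpenFiles,    16)  -- stdin, stdout, stderr and a few spares
    ]
  -- The real work: copy the lines of stdin that contain "error" to stdout.
  interact (unlines . filter ("error" `isInfixOf`) . lines)
```

File size is deliberately left alone here. RLIMIT_FSIZE also applies to writes on an inherited stdout that the shell redirected to a regular file, so a limit of 0 would break `prog > out.txt`.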

That's basically it. getResourceLimit works in the other direction: given a Resource, it returns the current ResourceLimits. It is less often needed. One use is to exit a program early when its resource usage can be predicted beforehand and is known to exceed the current limits.
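Here is a sketch of both uses, assuming the unix package. It first prints every limit listed above. It then checks a hypothetical requirement of 200 open files against the soft limit and gives up early if that requirement will not fit.

`describe`, `fits` and `allResources` are made-up helpers. They render limits and resource names by hand rather than relying on Show instances. Treating ResourceLimitUnknown as "fits" is likewise just an assumption.

```haskell
import Control.Monad (forM_, unless)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)
import System.Posix.Resource

-- Render a limit by hand instead of relying on a Show instance.
describe :: ResourceLimit -> String
describe ResourceLimitInfinity = "unlimited"
describe ResourceLimitUnknown  = "unknown"
describe (ResourceLimit n)     = show n

-- Does a predicted requirement of n units fit under the given limit?
-- Assumption: an unknown limit is given the benefit of the doubt.
fits :: Integer -> ResourceLimit -> Bool
fits _ ResourceLimitInfinity = True
fits _ ResourceLimitUnknown  = True
fits n (ResourceLimit m)     = n <= m

-- Pair each Resource with a printable name.
allResources :: [(String, Resource)]
allResources =
  [ ("core file size", ResourceCoreFileSize)
  , ("CPU time",       ResourceCPUTime)
  , ("data size",      ResourceDataSize)
  , ("file size",      ResourceFileSize)
  , ("open files",     ResourceOpenFiles)
  , ("stack size",     ResourceStackSize)
  , ("total memory",   ResourceTotalMemory)
  ]

main :: IO ()
main = do
  forM_ allResources $ \(name, res) -> do
    limits <- getResourceLimit res
    putStrLn (name ++ ": soft " ++ describe (softLimit limits)
                   ++ ", hard " ++ describe (hardLimit limits))
  -- Hypothetical requirement: this program will hold 200 files open at once.
  open <- getResourceLimit ResourceOpenFiles
  unless (fits 200 (softLimit open)) $ do
    hPutStrLn stderr "open-files limit is too low for this job; giving up early"
    exitFailure
```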

Yes, but what do the arguments mean?

This section is drawn from the GNU C Library manual's "Limits on Resources" page and from the System.Posix.Resource source. It should help explain just what each Resource is and does.

packResource ResourceCoreFileSize  = (#const RLIMIT_CORE)

"The maximum size core file that this process can create. If the process terminates and would dump a core file larger than this maximum size, then no core file is created. So setting this limit to zero prevents core files from ever being created."

The value is in bytes. Core files can be extremely large; if you don't know why you would want a core dump, you should probably set both the soft and hard limits to 0.
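Following that advice takes one call. Below is a minimal sketch; `disableCoreDumps` is a hypothetical name, not a library function. Because the hard limit is also set to 0, an unprivileged process cannot turn core dumps back on later.

```haskell
import System.Posix.Resource

-- Hypothetical helper: forbid core files entirely, for good.
disableCoreDumps :: IO ()
disableCoreDumps =
  setResourceLimit ResourceCoreFileSize
    (ResourceLimits (ResourceLimit 0) (ResourceLimit 0))

main :: IO ()
main = do
  disableCoreDumps
  limits <- getResourceLimit ResourceCoreFileSize
  putStrLn ("core dumps disabled: "
            ++ show (limits == ResourceLimits (ResourceLimit 0) (ResourceLimit 0)))
```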

packResource ResourceCPUTime       = (#const RLIMIT_CPU)

"The maximum amount of CPU time the process can use. If it runs for longer than this, it gets a signal: SIGXCPU. The value is measured in seconds."
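Since the value is in seconds, a program can estimate how much of its CPU budget is left by comparing the soft limit with its own CPU usage. The sketch below assumes that base's System.CPUTime.getCPUTime counts the same CPU time the kernel checks against this limit. getCPUTime reports picoseconds. `secondsLeft` is a hypothetical helper.

```haskell
import System.CPUTime (getCPUTime)
import System.Posix.Resource

-- Hypothetical helper: seconds left before the soft CPU limit (and SIGXCPU)
-- is reached, given the soft limit and the CPU seconds used so far.
-- ResourceLimitInfinity and ResourceLimitUnknown both give Nothing.
secondsLeft :: ResourceLimit -> Integer -> Maybe Integer
secondsLeft (ResourceLimit lim) used = Just (max 0 (lim - used))
secondsLeft _                   _    = Nothing

main :: IO ()
main = do
  limits <- getResourceLimit ResourceCPUTime
  picos <- getCPUTime  -- CPU time used so far, in picoseconds
  let used = picos `div` 1000000000000
  case secondsLeft (softLimit limits) used of
    Nothing -> putStrLn "no finite soft CPU-time limit"
    Just s  -> putStrLn (show s ++ " CPU seconds left before SIGXCPU")
```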

packResource ResourceDataSize      = (#const RLIMIT_DATA)

"The maximum size of data memory for the process. If the process tries to allocate data memory beyond this amount, the allocation function fails."

In bytes.

packResource ResourceFileSize      = (#const RLIMIT_FSIZE)

"The maximum size of file the process can create. Trying to write a larger file causes a signal: SIGXFSZ."

In bytes.
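The signal is what makes this limit easy to trip over, because the default action for SIGXFSZ terminates the process. The sketch below assumes a Unix system with a writable /tmp. It works in three steps:

1. It ignores SIGXFSZ, so an oversized write fails with an ordinary IOException (EFBIG) instead.
2. It temporarily lowers the soft file-size limit to 1024 bytes.
3. It restores both the old limit and the old signal handler afterwards.

`withSoftLimit` and `writeFails` are hypothetical helpers.

```haskell
import Control.Exception (IOException, finally, try)
import System.IO (hClose, hFlush, hPutStr, openTempFile)
import System.Posix.Files (removeLink)
import System.Posix.Resource
import System.Posix.Signals

-- Hypothetical helper: run an action with a lowered soft limit and put the
-- old limits back afterwards. The limit applies to the whole process.
withSoftLimit :: Resource -> Integer -> IO a -> IO a
withSoftLimit res n act = do
  old <- getResourceLimit res
  setResourceLimit res (old { softLimit = ResourceLimit n })
  act `finally` setResourceLimit res old

-- Hypothetical helper: write `size` bytes to a fresh temporary file while
-- files are capped at `cap` bytes. Returns True if the write failed.
writeFails :: Integer -> Int -> IO Bool
writeFails cap size = do
  -- Ignore SIGXFSZ so the write fails with EFBIG instead of killing us.
  oldHandler <- installHandler fileSizeLimitExceeded Ignore Nothing
  (path, h) <- openTempFile "/tmp" "fsize-demo.txt"
  result <- withSoftLimit ResourceFileSize cap
              (try (hPutStr h (replicate size 'x') >> hFlush h)
                 :: IO (Either IOException ()))
  -- Clean up once the old limit is back in place.
  _ <- try (hClose h) :: IO (Either IOException ())
  removeLink path
  _ <- installHandler fileSizeLimitExceeded oldHandler Nothing
  return (either (const True) (const False) result)

main :: IO ()
main = do
  failed <- writeFails 1024 4096
  putStrLn (if failed
              then "writing past the 1024-byte limit failed, as expected"
              else "the write unexpectedly succeeded")
```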

packResource ResourceOpenFiles     = (#const RLIMIT_NOFILE)

"The maximum number of files that the process can open. If it tries to open more files than this, it gets error code EMFILE. Not all systems support this limit; GNU does, and 4.4 BSD does."

The value is a number of files (file descriptors), not a size in bytes.
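To watch EMFILE happen, the sketch below temporarily lowers the soft open-files limit. It opens /dev/null until openFile fails, then closes everything and restores the old limit. It assumes a Unix system with /dev/null; `countOpenable` and `openableUnder` are hypothetical helpers.

The count it reports is lower than the limit itself. Stdin, stdout, stderr and any descriptors the runtime already holds count against the limit too.

```haskell
import Control.Exception (IOException, finally, try)
import System.IO (Handle, IOMode (ReadMode), hClose, openFile)
import System.Posix.Resource

-- Hypothetical helper: open /dev/null over and over until opening fails
-- (EMFILE once the limit is reached), then close everything again.
-- Returns how many extra files could be opened.
countOpenable :: IO Int
countOpenable = go []
  where
    go hs = do
      r <- try (openFile "/dev/null" ReadMode) :: IO (Either IOException Handle)
      case r of
        Left _  -> mapM_ hClose hs >> return (length hs)
        Right h -> go (h : hs)

-- Hypothetical helper: lower only the soft open-files limit to n, count how
-- many more files fit under it, then restore the original limits.
openableUnder :: Integer -> IO Int
openableUnder n = do
  old <- getResourceLimit ResourceOpenFiles
  (setResourceLimit ResourceOpenFiles (old { softLimit = ResourceLimit n })
     >> countOpenable)
    `finally` setResourceLimit ResourceOpenFiles old

main :: IO ()
main = do
  k <- openableUnder 32
  putStrLn ("with a soft limit of 32, " ++ show k ++ " more files could be opened")
```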

packResource ResourceStackSize     = (#const RLIMIT_STACK)

"The maximum stack size for the process. If the process tries to extend its stack past this size, it gets a SIGSEGV signal."

In bytes.

packResource ResourceTotalMemory   = (#const RLIMIT_AS)

In bytes; this limits the total virtual address space of the process. It may be more limited in size than you would expect ("With all the discussion about the tradeoff between physical memory size and maximum virtual memory size awhile back, I was surprised to find that the RLIMIT_AS (address space) limit is about 2GB, and it doesn't seem possible for a process to get more VM than that." [2])