Introduction

In general, the question "how much memory is this process using at this moment" cannot be answered exactly on any modern UNIX system.

People use several different values to approximate the answer, and this document does its best to explain how to arrive at the best approximation.

The VIRT-field

VIRT is displayed by top (ps shows the same value as VSZ) and is the worst approximation to the correct answer out of the three candidates (VIRT/RES/SHRD).

VIRT, of course, means "the number of kilobytes that a process has set up in its virtual address space for SOME use". Since we're talking about UNIX-based systems (still), the "SOME" becomes quite significant.

Modern glibc-based systems (Linux) also grow VIRT whenever a process calls malloc() and glibc needs to add new space to the arena (from which memory is eventually returned to the caller). glibc extends the arena with brk() or, for large requests and additional arenas, with anonymous mmap(). Either way VIRT grows, even if the process never actually touches those memory areas. Just causing the arena to expand adds to VIRT.
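The effect can be observed from inside a process. The following is a minimal sketch, assuming a Linux /proc filesystem; it uses an anonymous mmap() directly instead of going through malloc(), and the function names (parse_status_kb, read_virt_and_res, demo) are made up for this illustration.

```python
import mmap


def parse_status_kb(status_text, field):
    """Return the kB value of a field (e.g. 'VmSize') from /proc/PID/status text."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    raise KeyError(field)


def read_virt_and_res():
    """Read (VmSize, VmRSS) in kB for the current process. Linux only."""
    with open("/proc/self/status") as f:
        text = f.read()
    return parse_status_kb(text, "VmSize"), parse_status_kb(text, "VmRSS")


def demo(size=64 * 1024 * 1024):
    """Hypothetical demo: reserve an anonymous mapping, then touch it."""
    samples = {"start": read_virt_and_res()}
    region = mmap.mmap(-1, size)  # anonymous mapping, not touched yet
    samples["after mmap"] = read_virt_and_res()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1  # write one byte per page to force it in
    samples["after touch"] = read_virt_and_res()
    region.close()
    return samples
```

Calling demo() on a Linux system should show VmSize jumping right after the mmap() call, while VmRSS only catches up once the pages have been written to.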

Another thing that people seem to miss quite often is that executables and dynamic libraries in Linux (or any demand-paging system) are not normally loaded into memory first and then executed. The binary loader and the dynamic linker mmap() the executable and the dynamic library files, and each of these mappings contributes to VIRT.

This is quite different from what people first expect the system to do, which is to somehow load the executable and libraries into memory first and then start executing. That would be inefficient, since not all code is executed on every run and certainly not all functions from all the libraries are used.

Only when the CPU actually executes code from such a mapped region does the kernel trap the page fault (which is minor if the page is already in the cache, or major if the kernel needs to read it from the storage device backing the mapped region). After the kernel has serviced the page fault, the page is RESIDENT in the SYSTEM. Note that it isn't resident in the PROCESS directly, since other processes might later (after the fault) use exactly the same code/data from the same files.
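The kernel keeps per-process counters for minor and major page faults, and they can be watched while pages are touched for the first time. The sketch below is only an illustration under some assumptions: it uses an anonymous mapping rather than a mapped executable (so the faults are expected to be minor ones), and fault_counts and faults_while_touching are hypothetical helper names. The exact numbers depend on the kernel and its configuration.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page fault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def faults_while_touching(size=16 * 1024 * 1024):
    """Map `size` bytes anonymously, touch every page, return the fault deltas."""
    region = mmap.mmap(-1, size)  # mapped, but nothing resident yet
    minor_before, major_before = fault_counts()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1  # first write to a page triggers a fault
    minor_after, major_after = fault_counts()
    region.close()
    return minor_after - minor_before, major_after - major_before
```

The returned pair is the number of minor and major faults that happened while the region was being touched; creating the mapping itself is not included in the measurement.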

Processes can also share memory explicitly. This is used when two (or more) processes want to exchange data with each other using the fastest mechanism available locally. There are two interfaces for this in Linux (SYS V IPC SHM, and POSIX SHM in more modern Linux systems). On modern Linux systems both are based on memory mappings, which means that the shared memory regions also increase the VIRT-field (for every process that has mapped the same memory region).

One classic example of misreading the VIRT-field involves X servers. Most of them use mmap() to map part of /dev/mem into their address space in order to access the framebuffer memory of the display device. That memory obviously exists separately on the graphics card/chip (in most cases), so it shouldn't be counted towards the memory usage of the X server. Still, some people do count it and complain that the X server is taking "a lot of memory".

You can see examples of all of these cases by running cat on /proc/PID/maps (where PID is the process ID). Try it on your X server as well.

From this we arrive at definitions for the usual fields that you see in top and ps:
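To get a feel for how much of the address space each kind of mapping accounts for, the maps file can be summarized with a few lines of code. This is a rough sketch: the four categories (file, anonymous, special, device) are a grouping invented for this example and are not something the kernel reports, and summarize_maps is a hypothetical name.

```python
from collections import defaultdict


def classify(pathname):
    """Put a mapping into one of four ad hoc categories based on its path."""
    if not pathname:
        return "anonymous"      # e.g. malloc arenas, anonymous mmap()
    if pathname.startswith("["):
        return "special"        # [heap], [stack], [vdso], ...
    if pathname.startswith("/dev/"):
        return "device"         # e.g. /dev/mem mapped by an X server
    return "file"               # executables, libraries, data files


def summarize_maps(maps_text):
    """Sum the address space (in bytes) per category from /proc/PID/maps text."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # the pathname may contain spaces
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(part, 16) for part in fields[0].split("-"))
        pathname = fields[5].strip() if len(fields) > 5 else ""
        totals[classify(pathname)] += end - start
    return dict(totals)
```

For a live process the input would come from something like open("/proc/self/maps").read(). The sum over all categories is the total mapped address space, which is what VIRT reflects.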

VIRT/size of Virtual space:
Total amount of ADDRESS SPACE that a process has reserved for SOMETHING. This includes the actual memory usage (as per the original question), but is normally significantly higher because of the uses described above.

RES/RSS/Resident set size:
Amount of data out of VIRT which is resident in the SYSTEM (i.e., in physical RAM, not on disk). Note that using RES directly is incorrect as well, since it includes pages that are shared with other processes.

SHR/SHRD/Shared:
Amount of data out of RES which is shared with other processes. Obviously this doesn't answer the original question either, but you'll see shortly why it is important. On very recent Linux kernels, some of this information is exposed per mapping via /proc/PID/smaps.
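Where /proc/PID/smaps is available, the shared and private parts of the resident set can be added up across all mappings. The following sketch assumes the usual "Field:  value kB" layout of smaps; sum_smaps is a hypothetical helper and only looks at the five fields listed in the code.

```python
def sum_smaps(smaps_text):
    """Add up selected per-mapping fields (in kB) from /proc/PID/smaps text."""
    wanted = ("Rss", "Shared_Clean", "Shared_Dirty",
              "Private_Clean", "Private_Dirty")
    totals = dict.fromkeys(wanted, 0)
    for line in smaps_text.splitlines():
        key, sep, rest = line.partition(":")
        # Mapping header lines also contain ':' (in the device number),
        # but their "key" never matches one of the wanted field names.
        if sep and key in totals:
            totals[key] += int(rest.split()[0])
    return totals
```

Shared_Clean plus Shared_Dirty gives the part of the resident set that smaps reports as shared, and Private_Clean plus Private_Dirty gives the part that it reports as private to the process.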

So, which field would you use to answer the question "how much memory is a process using"? Hopefully the answer at this point is "I can't answer that question directly".

Usable approximation with URES

I present a concept called "URES/Unique Resident set size". Nothing fancy really. It is formed by subtracting SHR from RES. This leaves a number which tells how much memory is present in physical RAM and is NOT SHARED with other processes.

As far as I can see, this is the closest approximation to the answer. Don't trick yourself into thinking it is the answer though.
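As a sketch of the arithmetic, URES can be computed from /proc/PID/statm, whose first three fields are the total size, the resident size and the shared size, all counted in pages. The code assumes that these three fields correspond to VIRT, RES and SHR as shown by top; parse_statm, ures_kb and read_ures_kb are names made up for this example.

```python
import os


def parse_statm(statm_text, page_size):
    """Convert the first three statm fields from pages to kB."""
    size, resident, shared = (int(x) for x in statm_text.split()[:3])
    kb_per_page = page_size // 1024
    return {
        "VIRT": size * kb_per_page,
        "RES": resident * kb_per_page,
        "SHR": shared * kb_per_page,
    }


def ures_kb(fields):
    """URES = RES - SHR: resident memory not shared with other processes."""
    return fields["RES"] - fields["SHR"]


def read_ures_kb(pid="self"):
    """Read statm for a process and return its URES in kB. Linux only."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/statm") as f:
        return ures_kb(parse_statm(f.read(), page_size))
```

For example, with 4096-byte pages a statm line starting with "5000 1200 700" gives RES = 4800 kB and SHR = 2800 kB, so URES = 2000 kB.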

Two things complicate using URES (which is still the best value IMHO):


Written by Aleksandr Koltsoff (czr(at)iki(dot)fi) in order to explain URES so that people can use Meminfo more efficiently. This document is under normal Copyright, but with the provision that it may be used for personal use (ie, not commercial).