NAME
vnode, vref, vrele, vrele_async, vput, vhold, holdrele, vcache_get,
vcache_new, vcache_rekey_enter, vcache_rekey_exit, vrecycle, vgone, vgonel,
vdead_check, vflush, vaccess, bdevvp, cdevvp, vfinddev, vdevgone, vwakeup,
vflushbuf, vinvalbuf, vtruncbuf, vprint - kernel representation of a file or
directory
SYNOPSIS
#include <sys/param.h>
#include <sys/vnode.h>
void
vref(struct vnode *vp);

void
vrele(struct vnode *vp);

void
vrele_async(struct vnode *vp);

void
vput(struct vnode *vp);

void
vhold(struct vnode *vp);

void
holdrele(struct vnode *vp);

int
vcache_get(struct mount *mp, const void *key, size_t key_len,
    struct vnode **vpp);

int
vcache_new(struct mount *mp, struct vnode *dvp, struct vattr *vap,
    kauth_cred_t cred, struct vnode **vpp);

int
vcache_rekey_enter(struct mount *mp, struct vnode *vp, const void *old_key,
    size_t old_key_len, const void *new_key, size_t new_key_len);

void
vcache_rekey_exit(struct mount *mp, struct vnode *vp, const void *old_key,
    size_t old_key_len, const void *new_key, size_t new_key_len);

int
vrecycle(struct vnode *vp);

void
vgone(struct vnode *vp);

void
vgonel(struct vnode *vp, struct lwp *l);

int
vdead_check(struct vnode *vp, int flags);

int
vflush(struct mount *mp, struct vnode *skipvp, int flags);

int
vaccess(enum vtype type, mode_t file_mode, uid_t uid, gid_t gid,
    mode_t acc_mode, kauth_cred_t cred);

int
bdevvp(dev_t dev, struct vnode **vpp);

int
cdevvp(dev_t dev, struct vnode **vpp);

int
vfinddev(dev_t dev, enum vtype, struct vnode **vpp);

void
vdevgone(int maj, int minl, int minh, enum vtype type);

void
vwakeup(struct buf *bp);

int
vflushbuf(struct vnode *vp, int sync);

int
vinvalbuf(struct vnode *vp, int flags, kauth_cred_t cred, struct lwp *l,
    int slpflag, int slptimeo);

int
vtruncbuf(struct vnode *vp, daddr_t lbn, int slpflag, int slptimeo);

void
vprint(const char *label, struct vnode *vp);
DESCRIPTION
A
vnode represents an on-disk file in use by the system. Each
vfs(9) file system provides a set
of
vnodeops(9) operations on
vnodes, invoked by file-system-independent system calls and supported by
file-system-independent library routines.
Each mounted file system provides a vnode for the root of the file system, via
VFS_ROOT(9). Other vnodes are
obtained by
VOP_LOOKUP(9).
Users of vnodes usually invoke these indirectly via
namei(9) to obtain vnodes from
paths.
Each file system usually maintains a cache mapping recently used inode numbers,
or the equivalent, to vnodes, and a cache mapping recently used file names to
vnodes. If memory is scarce, the system may decide to
reclaim an unused cached vnode, calling
VOP_RECLAIM(9) to remove it
from the caches and to free file-system-specific memory associated with it. A
file system may also choose to immediately reclaim a cached vnode once it is
unused, in
VOP_INACTIVE(9), if the
vnode has been deleted on disk.
When a file system retrieves a vnode from a cache, the vnode may not have any
users, and another thread in the system may be simultaneously deciding to
reclaim it. Thus, to retrieve a vnode from a cache, one must use
vcache_get(), not
vref(), to acquire the
first reference.
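The lookup-by-key behaviour described above can be illustrated with a small userland model. This is only a sketch under stated assumptions: cache_model_vnode, cache_model_get(), the fixed-size table, and the linear search are hypothetical stand-ins, not the kernel implementation. The real vcache_get() also takes a mount point and must cope with a concurrent reclaim, which the model leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MODEL_SLOTS   8
#define CACHE_MODEL_KEY_MAX 16

/* Hypothetical stand-in for a cached vnode; not the kernel's struct vnode. */
struct cache_model_vnode {
	int in_use;                             /* slot holds a cache entry */
	unsigned char key[CACHE_MODEL_KEY_MAX]; /* private copy of the key */
	size_t key_len;
	int usecount;                           /* models v_usecount */
};

static struct cache_model_vnode cache_model_table[CACHE_MODEL_SLOTS];

/*
 * Look up (key, key_len); on a miss, create the entry.  The vnode is
 * returned referenced through vpp.  Returns 0 on success and -1 if the
 * key length is unsupported or the table is full.
 */
int
cache_model_get(const void *key, size_t key_len,
    struct cache_model_vnode **vpp)
{
	struct cache_model_vnode *free_slot = NULL;
	size_t i;

	assert(key != NULL && vpp != NULL);
	if (key_len == 0 || key_len > CACHE_MODEL_KEY_MAX)
		return -1;

	for (i = 0; i < CACHE_MODEL_SLOTS; i++) {
		struct cache_model_vnode *vp = &cache_model_table[i];

		if (!vp->in_use) {
			if (free_slot == NULL)
				free_slot = vp;	/* remember first free slot */
			continue;
		}
		if (vp->key_len == key_len &&
		    memcmp(vp->key, key, key_len) == 0) {
			vp->usecount++;	/* hit: hand out a reference */
			*vpp = vp;
			return 0;
		}
	}

	if (free_slot == NULL)
		return -1;

	/* Miss: set up a new entry holding the caller's reference. */
	free_slot->in_use = 1;
	memcpy(free_slot->key, key, key_len);
	free_slot->key_len = key_len;
	free_slot->usecount = 1;
	*vpp = free_slot;
	return 0;
}
```

In this model the key would typically be an inode number or its equivalent, passed by address together with its size; two lookups with the same key yield the same entry with its count raised twice.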
The vnode has the following structure:
struct vnode {
	struct uvm_object v_uobj;	/* the VM object */
	kcondvar_t v_cv;	/* synchronization */
	voff_t v_size;	/* size of file */
	voff_t v_writesize;	/* new size after write */
	int v_iflag;	/* VI_* flags */
	int v_vflag;	/* VV_* flags */
	int v_uflag;	/* VU_* flags */
	int v_numoutput;	/* # of pending writes */
	int v_writecount;	/* ref count of writers */
	int v_holdcnt;	/* page & buffer refs */
	struct mount *v_mount;	/* ptr to vfs we are in */
	int (**v_op)(void *);	/* vnode operations vector */
	struct buflists v_cleanblkhd;	/* clean blocklist head */
	struct buflists v_dirtyblkhd;	/* dirty blocklist head */
	union {
		struct mount *vu_mountedhere;	/* ptr to vfs (VDIR) */
		struct socket *vu_socket;	/* unix ipc (VSOCK) */
		struct specnode *vu_specnode;	/* device (VCHR, VBLK) */
		struct fifoinfo *vu_fifoinfo;	/* fifo (VFIFO) */
		struct uvm_ractx *vu_ractx;	/* read-ahead ctx (VREG) */
	} v_un;
	enum vtype v_type;	/* vnode type */
	enum vtagtype v_tag;	/* type of underlying data */
	void *v_data;	/* private data for fs */
	struct klist v_klist;	/* notes attached to vnode */
};
Most members of the vnode structure should be treated as opaque and only
manipulated using the proper functions. There are some rather common
exceptions detailed throughout this page.
Files and file systems are inextricably linked with the virtual memory system
and
v_uobj contains the data maintained by the virtual
memory system. For compatibility with code written before the integration of
uvm(9) into
NetBSD, C-preprocessor directives are used to alias
the members of
v_uobj.
Vnode flags are recorded by
v_iflag,
v_vflag
and
v_uflag. Valid flags are:
VV_ROOT
- This vnode is the root of its file system.
VV_SYSTEM
- This vnode is being used by the kernel; only used to skip
quota files in vflush().
VV_ISTTY
- This vnode represents a tty; used when reading dead
vnodes.
VV_MAPPED
- This vnode might have user mappings.
VV_MPSAFE
- This file system is MP safe.
VV_LOCKSWORK
- This vnode's file system supports locking.
VI_TEXT
- This vnode is a pure text prototype.
VI_EXECMAP
- This vnode has executable mappings.
VI_WRMAP
- This vnode might have PROT_WRITE user mappings.
VI_WRMAPDIRTY
- This vnode might have dirty pages due to VWRITEMAP.
VI_XLOCK
- This vnode is currently locked to change underlying
type.
VI_ONWORKLST
- This vnode is on syncer work-list.
VI_MARKER
- A dummy marker vnode.
VI_CLEAN
- This vnode has been reclaimed and is no longer attached to
a file system.
VU_DIROP
- This vnode is involved in a directory operation. This flag
is used exclusively by LFS.
The
VI_XLOCK
flag is used to prevent multiple processes
from entering the vnode reclamation code. It is also used as a flag to
indicate that reclamation is in progress. Before
v_iflag can
be modified, the
v_interlock mutex must be acquired. See
lock(9) for details on the kernel
locking API.
Each vnode has three reference counts:
v_usecount,
v_writecount and
v_holdcnt. The first is
the number of active references within the kernel to the vnode. This count is
maintained by
vref(),
vrele(),
vrele_async(), and
vput(). The second is
the number of active references within the kernel to the vnode performing
write access to the file. It is maintained by the
open(2) and
close(2) system calls. The third
is the number of references within the kernel requiring the vnode to remain
active and not be recycled. This count is maintained by
vhold() and
holdrele(). When both the
v_usecount and
v_holdcnt reach zero, the
vnode is cached. The transition from the cache is handled by a kernel thread
and
vrecycle(). Access to
v_usecount,
v_writecount and
v_holdcnt is also
protected by the
v_interlock mutex.
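The interplay of v_usecount and v_holdcnt can be sketched as a small userland state model. The names ref_model_vnode, ref_model_classify() and the three state constants are hypothetical and exist only for illustration; the model ignores v_interlock and v_writecount, and it assumes, in line with the rule on first references, that a caller of the vref() stand-in already holds a reference.

```c
#include <assert.h>

/* Hypothetical states derived from the two counters. */
enum ref_model_state {
	REF_MODEL_ACTIVE,	/* v_usecount > 0 */
	REF_MODEL_HELD,		/* v_usecount == 0, v_holdcnt > 0 */
	REF_MODEL_CACHED	/* both counters are zero */
};

/* Stand-in carrying only the two counters under discussion. */
struct ref_model_vnode {
	int v_usecount;
	int v_holdcnt;
};

enum ref_model_state
ref_model_classify(const struct ref_model_vnode *vp)
{
	if (vp->v_usecount > 0)
		return REF_MODEL_ACTIVE;
	if (vp->v_holdcnt > 0)
		return REF_MODEL_HELD;
	return REF_MODEL_CACHED;
}

/* Models vref(): only valid when a reference is already held. */
void
ref_model_vref(struct ref_model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount++;
}

/* Models vrele(): drop one reference and report the resulting state. */
enum ref_model_state
ref_model_vrele(struct ref_model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
	return ref_model_classify(vp);
}

/* Models vhold(): keep the vnode from being recycled. */
void
ref_model_vhold(struct ref_model_vnode *vp)
{
	vp->v_holdcnt++;
}

/* Models holdrele(): drop one hold. */
void
ref_model_holdrele(struct ref_model_vnode *vp)
{
	assert(vp->v_holdcnt > 0);
	vp->v_holdcnt--;
}
```

A vnode whose last use reference is dropped while a hold remains ends up in the held state; only when the hold is released as well does the model report it as cached.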
The number of pending synchronous and asynchronous writes on the vnode is
recorded in
v_numoutput. It is used by
fsync(2) to wait for all writes
to complete before returning to the user. Its value must only be modified at
splbio (see
spl(9)). It does not
track the number of dirty buffers attached to the vnode.
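As a rough illustration of how a pending-write counter lets a caller wait for output to drain, consider the following userland sketch. Everything in it is an assumption made for the example: out_model_vnode and its helper functions are hypothetical, the "wakeup" is reduced to a counter, and the interrupt protection required for the real v_numoutput is not modelled.

```c
#include <assert.h>

/* Hypothetical stand-in tracking pending writes and one waiter. */
struct out_model_vnode {
	int v_numoutput;	/* models the pending-write count */
	int waiting;		/* a waiter asked to be woken */
	int wakeups;		/* wakeups delivered, for inspection */
};

/* A write has been issued. */
void
out_model_start_write(struct out_model_vnode *vp)
{
	vp->v_numoutput++;
}

/* A write has completed; wake the waiter once nothing is pending. */
void
out_model_write_done(struct out_model_vnode *vp)
{
	assert(vp->v_numoutput > 0);
	if (--vp->v_numoutput == 0 && vp->waiting) {
		vp->waiting = 0;
		vp->wakeups++;
	}
}

/*
 * Called by an fsync-style waiter.  Returns 1 if writes are still pending
 * (the caller would sleep until woken) and 0 if there is nothing to wait
 * for.
 */
int
out_model_must_wait(struct out_model_vnode *vp)
{
	if (vp->v_numoutput == 0)
		return 0;
	vp->waiting = 1;
	return 1;
}
```

The counter says nothing about dirty buffers that have not yet been written; it only covers writes already in flight.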
The link to the file system which owns the vnode is recorded by
v_mount. See
vfsops(9) for further
information on file system mount status.
The
v_op pointer points to its vnode operations vector. This
vector describes what operations can be done to the file associated with the
vnode. The system maintains one vnode operations vector for each file system
type configured into the kernel. The vnode operations vector contains a
pointer to a function for each operation supported by the file system. See
vnodeops(9) for a description
of vnode operations.
When a user wants a new vnode for another file or wants a valid vnode which is
cached,
vcache_get() or
vcache_new() is
invoked to allocate a vnode and initialize it for the new file.
The type of object the vnode represents is recorded by
v_type.
It is used by generic code to perform checks to ensure operations are
performed on valid file system objects. Valid types are:
VNON
- The vnode has no type.
VREG
- The vnode represents a regular file.
VDIR
- The vnode represents a directory.
VBLK
- The vnode represents a block special device.
VCHR
- The vnode represents a character special device.
VLNK
- The vnode represents a symbolic link.
VSOCK
- The vnode represents a socket.
VFIFO
- The vnode represents a pipe.
VBAD
- The vnode represents a bad file (not currently used).
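The kind of type check that generic code performs can be sketched in userland as follows. The enumerator order mirrors the list above, but the numeric values, the helper names vtype_model_name(), vtype_model_require_dir() and vtype_model_is_device(), and the choice of ENOTDIR as the error are assumptions of this example rather than a statement about the kernel sources.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Enumerators in the order listed above; values are an assumption. */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD };

/* Return a short human-readable name for a vnode type. */
const char *
vtype_model_name(enum vtype type)
{
	assert(type <= VBAD);
	switch (type) {
	case VNON:	return "none";
	case VREG:	return "regular file";
	case VDIR:	return "directory";
	case VBLK:	return "block device";
	case VCHR:	return "character device";
	case VLNK:	return "symbolic link";
	case VSOCK:	return "socket";
	case VFIFO:	return "pipe";
	case VBAD:	return "bad";
	}
	return "unknown";
}

/* Guard for an operation that only makes sense on directories. */
int
vtype_model_require_dir(enum vtype type)
{
	return (type == VDIR) ? 0 : ENOTDIR;
}

/* True for the two device-special types. */
int
vtype_model_is_device(enum vtype type)
{
	return type == VBLK || type == VCHR;
}
```

A caller would run such a guard before dispatching an operation, returning the error to the user instead of handing an unsuitable object to the file system.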
Vnode tag types are used by external programs only (e.g.,
pstat(8)), and should never be
inspected by the kernel. Their use is deprecated since new
v_tag values cannot be defined for loadable file systems.
The
v_tag member is read-only. Valid tag types are:
VT_NON
- non file system
VT_UFS
- universal file system
VT_NFS
- network file system
VT_MFS
- memory file system
VT_MSDOSFS
- FAT file system
VT_LFS
- log-structured file system
VT_LOFS
- loopback file system
VT_FDESC
- file descriptor file system
VT_NULL
- null file system layer
VT_UMAP
- uid/gid remapping file system layer
VT_KERNFS
- kernel interface file system
VT_PROCFS
- process interface file system
VT_AFS
- AFS file system
VT_ISOFS
- ISO 9660 file system(s)
VT_UNION
- union file system
VT_ADOSFS
- Amiga file system
VT_EXT2FS
- Linux's ext2 file system
VT_CODA
- Coda file system
VT_FILECORE
- filecore file system
VT_NTFS
- Microsoft NT's file system
VT_VFS
- virtual file system
VT_OVERLAY
- overlay file system
VT_SMBFS
- SMB file system
VT_PTYFS
- pseudo-terminal device file system
VT_TMPFS
- efficient memory file system
VT_UDF
- universal disk format file system
VT_SYSVBFS
- systemV boot file system
The vnode lock is acquired by calling
vn_lock(9) and released by
calling
VOP_UNLOCK(9). The
reason for this asymmetry is that
vn_lock(9) is a wrapper for
VOP_LOCK(9) with extra checks,
while the unlocking step usually does not need additional checks and thus has
no wrapper.
The vnode locking operation is complicated because it is used for many purposes.
Sometimes it is used to bundle a series of vnode operations (see
vnodeops(9)) into an atomic
group. Many file systems rely on it to prevent race conditions in updating
file system type specific data structures rather than using their own private
locks. The vnode lock can operate as a multiple-reader (shared-access) lock or a
single-writer (exclusive-access) lock; however, many current file system
implementations were written assuming only single-writer locking.
Multiple-reader locking functions equivalently only in the presence of
big-lock SMP locking or a uni-processor machine. The lock may be held while
sleeping. While the vnode lock is acquired, the holder is guaranteed that the
vnode will not be reclaimed or invalidated. Most file system functions require
that you hold the vnode lock on entry. See
lock(9) for details on the kernel
locking API.
Each file system underlying a vnode allocates its own private area and hangs it
from
v_data.
Most functions discussed in this page that operate on vnodes cannot be called
from interrupt context. The members
v_numoutput,
v_holdcnt,
v_dirtyblkhd, and
v_cleanblkhd are modified in interrupt context and must be
protected by
splbio(9) unless it
is certain that there is no chance an interrupt handler will modify them. The
vnode lock must not be acquired within interrupt context.
FUNCTIONS
- vref(vp)
- Increment v_usecount of the vnode
vp. Any kernel thread or subsystem that uses a vnode (e.g.,
during the operation of some algorithm or to store in a data structure)
should call vref().
- vrele(vp)
- Decrement v_usecount of unlocked vnode
vp. Any code in the system which is using a vnode should
call vrele() when it is finished with the vnode. If
v_usecount of the vnode reaches zero and
v_holdcnt is greater than zero, the vnode is placed on
the holdlist. If both v_usecount and
v_holdcnt are zero, the vnode is cached.
- vrele_async(vp)
- Release the vnode asynchronously, in a different context
than the caller, some time after the call.
- vput(vp)
- Legacy convenience routine for unlocking and releasing
vp. Equivalent to:
VOP_UNLOCK(vp);
vrele(vp);
New code should prefer using
VOP_UNLOCK(9) and
vrele() directly.
- vhold(vp)
- Mark the vnode vp as active by
incrementing vp->v_holdcnt. Once held, the vnode will
not be recycled until it is released with
holdrele().
- holdrele(vp)
- Mark the vnode vp as inactive by
decrementing vp->v_holdcnt.
- vcache_get(mp,
key, key_len,
vpp)
- Retrieve the vnode identified by key from the cache, allocating
a new vnode if none is cached. The vnode is returned referenced
in the address specified by vpp.
The argument mp is the mount point for the file system
to look up the file in.
The arguments key and key_len
uniquely identify the file in the file system.
If a vnode is successfully retrieved, zero is returned; otherwise an
appropriate error code is returned.
- vcache_new(mp,
dvp, vap,
cred, vpp)
- Allocate a new vnode with a new file. The new vnode is
returned referenced in the address specified by vpp.
The argument mp is the mount point for the file system
to create the file in.
The argument dvp points to the directory to create the
file in.
The argument vap points to the attributes for the file
to create.
The argument cred holds the credentials for the file
to create.
If a vnode is successfully created, zero is returned; otherwise an
appropriate error code is returned.
- vcache_rekey_enter(mp,
vp, old_key,
old_key_len, new_key,
new_key_len)
- Prepare to change the key of a cached vnode.
The argument mp is the mount point for the file system
the vnode vp resides in.
The arguments old_key and
old_key_len identify the cached vnode.
The arguments new_key and
new_key_len will identify the vnode after rename.
If the new key already exists, EEXIST is returned;
otherwise zero is returned.
- vcache_rekey_exit(mp,
vp, old_key,
old_key_len, new_key,
new_key_len)
- Finish rename after calling
vcache_rekey_enter().
- vrecycle(vp)
- Recycle the referenced vnode vp if
this is the last reference. vrecycle() is a null
operation if the reference count is greater than one.
- vgone(vp)
- Eliminate all activity associated with the unlocked vnode
vp in preparation for recycling. This operation is
restricted to suspended file systems. See
vfs_suspend(9).
- vgonel(vp,
l)
- Eliminate all activity associated with the locked vnode
vp in preparation for recycling.
- vdead_check(vp,
flags)
- Check whether the vnode vp is dead or
becoming dead. Returns ENOENT for a dead vnode and
zero otherwise. If flags is VDEAD_NOWAIT, the function
does not sleep and returns EBUSY if the vnode is becoming dead.
Whenever this function returns a non-zero value all future calls for this
vp will also return a non-zero value.
- vflush(mp,
skipvp, flags)
- Remove any vnodes in the vnode table belonging to mount
point mp. If skipvp is not NULL, it is exempt from being flushed.
The argument flags is a set of flags modifying the operation of
vflush(). If FORCECLOSE is not specified, there should not be any
active vnodes and the error EBUSY is returned if any are found (this
is a user error, not a system error). If FORCECLOSE is specified,
active vnodes that are found are detached. If WRITECLOSE is set, only
regular file vnodes open for writing are flushed. SKIPSYSTEM causes
any vnodes marked VV_SYSTEM to be skipped.
- vaccess(type,
file_mode, uid,
gid, acc_mode,
cred)
- Do access checking by comparing the file's permissions to
the caller's desired access type acc_mode and
credentials cred.
- bdevvp(dev,
vpp)
- Create a vnode for a block device.
bdevvp() is used for root file systems, swap areas and
for memory file system special devices.
- cdevvp(dev,
vpp)
- Create a vnode for a character device.
cdevvp() is used for the console and kernfs special
devices.
- vfinddev(dev,
vtype, vpp)
- Look up a vnode by device number. The vnode is referenced
and returned in the address specified by vpp.
- vdevgone(maj,
minl, minh,
type)
- Reclaim all vnodes that correspond to the specified minor
number range minl to minh
(endpoints inclusive) of the specified major
maj.
- vwakeup(bp)
- Update outstanding I/O count
vp->v_numoutput for the vnode
bp->b_vp and do a wakeup if requested and
vp->vflag has
VBWAIT
set.
- vflushbuf(vp,
sync)
- Flush all dirty buffers to disk for the file with the
locked vnode vp. The argument
sync specifies whether the I/O should be synchronous
and vflushbuf() will sleep until
vp->v_numoutput is zero and
vp->v_dirtyblkhd is empty.
- vinvalbuf(vp,
flags, cred,
l, slpflag,
slptimeo)
- Flush out and invalidate all buffers associated with locked
vnode vp. The arguments l and
cred specify the calling process and its
credentials. The ltsleep(9)
flag and timeout are specified by the arguments
slpflag and slptimeo
respectively. If the operation is successful zero is returned, otherwise
an appropriate error code is returned.
- vtruncbuf(vp,
lbn, slpflag,
slptimeo)
- Destroy any in-core buffers past the file truncation length
for the locked vnode vp. The truncation length is
specified by lbn. vtruncbuf() will
sleep while the I/O is performed. The
ltsleep(9) flag and timeout
are specified by the arguments slpflag and
slptimeo respectively. If the operation is
successful zero is returned, otherwise an appropriate error code is
returned.
- vprint(label,
vp)
- This function is used by the kernel to dump vnode
information during a panic. It is only used if the kernel option
DIAGNOSTIC is compiled into the kernel. The argument
label is a string to prefix the information dump of
vnode vp.
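To make the permission comparison performed by vaccess() more concrete, the following userland sketch applies the classic owner, group, other selection to a set of mode bits. It is a simplified model under explicit assumptions: model_cred, model_vaccess(), and the MODEL_READ, MODEL_WRITE and MODEL_EXEC constants are hypothetical; the type argument and any superuser or kauth(9) policy are deliberately left out; and EACCES is assumed as the failure code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical access bits for this model only. */
#define MODEL_READ	04
#define MODEL_WRITE	02
#define MODEL_EXEC	01

/* Hypothetical, much reduced stand-in for a credential. */
struct model_cred {
	uid_t uid;		/* effective user id */
	gid_t gid;		/* effective group id */
	const gid_t *groups;	/* supplementary groups, may be NULL */
	size_t ngroups;
};

/* Does the credential belong to group gid? */
static int
model_in_group(const struct model_cred *cred, gid_t gid)
{
	size_t i;

	if (cred->gid == gid)
		return 1;
	for (i = 0; i < cred->ngroups; i++)
		if (cred->groups[i] == gid)
			return 1;
	return 0;
}

/*
 * Compare the requested access acc_mode with the permission class that
 * applies to cred: owner bits if the uid matches, else group bits if the
 * credential is in the file's group, else the bits for everyone else.
 * Returns 0 if every requested bit is granted, EACCES otherwise.
 */
int
model_vaccess(mode_t file_mode, uid_t uid, gid_t gid, mode_t acc_mode,
    const struct model_cred *cred)
{
	mode_t granted;

	assert(cred != NULL);
	assert((acc_mode & ~(mode_t)(MODEL_READ | MODEL_WRITE | MODEL_EXEC)) == 0);

	if (cred->uid == uid)
		granted = (file_mode >> 6) & 07;
	else if (model_in_group(cred, gid))
		granted = (file_mode >> 3) & 07;
	else
		granted = file_mode & 07;

	return ((granted & acc_mode) == acc_mode) ? 0 : EACCES;
}
```

For a file with mode 0640 owned by uid 1000 and gid 200, the model grants read and write to the owner, read only to a member of group 200, and nothing to anyone else. Note that exactly one class applies: an owner whose own bits deny access is refused even if the group or other bits would allow it.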
CODE REFERENCES
The vnode framework is implemented within the file
sys/kern/vfs_subr.c.
SEE ALSO
intro(9),
lock(9),
namecache(9),
namei(9),
uvm(9),
vattr(9),
vfs(9),
vfsops(9),
vnodeops(9),
vnsubr(9)
BUGS
The locking protocol is inconsistent. Many vnode operations are passed locked
vnodes on entry but release the lock before they exit. The locking protocol is
used in some places to attempt to make a series of operations atomic (e.g.,
access check then operation). This does not work for non-local file systems
that do not support locking (e.g., NFS). The
vnode interface
would benefit from a simpler locking protocol.