There are two types of metrics in Alluxio: cluster-wide aggregated metrics and per-process detailed metrics.
Master metrics have the following format:
Master.[metricName].[tag1].[tag2]...
Metrics from other processes have the following format:
[processType].[metricName].[tag1].[tag2]...[hostName]
There is generally an Alluxio metric for every RPC invocation, to Alluxio or to the under store.
Tags are additional pieces of metadata for the metric such as user name or under storage location.
Tags can be used to further filter or aggregate on various characteristics.
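As a rough illustration of the naming scheme above, the following sketch assembles a metric name from its parts. The helper build_metric_name, the host name worker-1, and the User:alice tag format are hypothetical and chosen only for illustration; they are not taken from Alluxio itself.

```python
from typing import Optional, Sequence


def build_metric_name(process_type: str,
                      metric_name: str,
                      tags: Sequence[str] = (),
                      host_name: Optional[str] = None) -> str:
    """Join the parts of a metric name with dots, following the documented layout."""
    parts = [process_type, metric_name, *tags]
    # Master metrics carry no host name; other process types append it last.
    if host_name is not None:
        parts.append(host_name)
    return ".".join(parts)


# Hypothetical tag and host values, for illustration only.
print(build_metric_name("Master", "CreateFileOps", ["User:alice"]))
print(build_metric_name("Worker", "BytesReadRemote", ["User:alice"], "worker-1"))
```

Keeping the tags as an ordered list mirrors the [tag1].[tag2] layout, so a consumer can filter or aggregate on any one of them.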
Workers and clients send metrics data to the Alluxio master through heartbeats.
The intervals are defined by the properties alluxio.master.worker.heartbeat.interval and alluxio.user.metrics.heartbeat.interval, respectively.
Bytes metrics are aggregated values from workers or clients. Bytes throughput metrics are calculated on the leading master.
The value of a bytes throughput metric equals the corresponding bytes counter value divided by the metrics record time, and is shown in bytes per minute.
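The throughput calculation described above can be sketched as follows. This is a minimal illustration, not the master's actual implementation: the function name bytes_per_minute is hypothetical, and it assumes the record time is available in seconds.

```python
def bytes_per_minute(bytes_counter: int, record_time_seconds: float) -> float:
    """Divide a bytes counter by the elapsed record time, expressed in minutes."""
    if record_time_seconds <= 0:
        raise ValueError("record time must be positive")
    record_time_minutes = record_time_seconds / 60.0
    return bytes_counter / record_time_minutes


# 6,000,000 bytes recorded over 120 seconds (2 minutes).
print(bytes_per_minute(6_000_000, 120.0))  # 3000000.0
```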
Name | Type | Description |
Cluster.ActiveRpcReadCount |
COUNTER |
The number of active read-RPCs managed by workers |
Cluster.ActiveRpcWriteCount |
COUNTER |
The number of active write-RPCs managed by workers |
Cluster.BytesReadDirect |
COUNTER |
Total number of bytes read from Alluxio storage managed by workers and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. clients embedded in workers). |
Cluster.BytesReadDirectThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage managed by workers, and from the underlying UFS if data cannot be found in Alluxio storage, without external RPC involved. This records data read by worker internal calls (e.g. clients embedded in workers). |
Cluster.BytesReadDomain |
COUNTER |
Total number of bytes read from Alluxio storage via domain socket reported by all workers |
Cluster.BytesReadDomainThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage via domain socket by all workers |
Cluster.BytesReadLocal |
COUNTER |
Total number of bytes short-circuit read from local storage by all clients |
Cluster.BytesReadLocalThroughput |
GAUGE |
Bytes per minute throughput short-circuit read from local storage by all clients |
Cluster.BytesReadPerUfs |
COUNTER |
Total number of bytes read from a specific UFS by all workers |
Cluster.BytesReadRemote |
COUNTER |
Total number of bytes read from Alluxio storage or underlying UFS if data does not exist in Alluxio storage reported by all workers. This does not include short-circuit local reads and domain socket reads |
Cluster.BytesReadRemoteThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage or underlying UFS if data does not exist in Alluxio storage reported by all workers. This does not include short-circuit local reads and domain socket reads |
Cluster.BytesReadUfsAll |
COUNTER |
Total number of bytes read from all Alluxio UFSes by all workers |
Cluster.BytesReadUfsThroughput |
GAUGE |
Bytes read per minute throughput from all Alluxio UFSes by all workers |
Cluster.BytesWrittenDomain |
COUNTER |
Total number of bytes written to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenDomainThroughput |
GAUGE |
Throughput of bytes written per minute to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenLocal |
COUNTER |
Total number of bytes short-circuit written to local storage by all clients |
Cluster.BytesWrittenLocalThroughput |
GAUGE |
Bytes per minute throughput written to local storage by all clients |
Cluster.BytesWrittenPerUfs |
COUNTER |
Total number of bytes written to a specific Alluxio UFS by all workers |
Cluster.BytesWrittenRemote |
COUNTER |
Total number of bytes written to Alluxio storage in all workers or the underlying UFS. This does not include short-circuit local writes and domain socket writes. |
Cluster.BytesWrittenRemoteThroughput |
GAUGE |
Throughput of bytes written per minute to Alluxio storage in all workers or the underlying UFS. This does not include short-circuit local writes and domain socket writes. |
Cluster.BytesWrittenUfsAll |
COUNTER |
Total number of bytes written to all Alluxio UFSes by all workers |
Cluster.BytesWrittenUfsThroughput |
GAUGE |
Throughput of bytes written per minute to all Alluxio UFSes by all workers |
Cluster.CapacityFree |
GAUGE |
Total free bytes on all tiers, on all workers of Alluxio |
Cluster.CapacityTotal |
GAUGE |
Total capacity (in bytes) on all tiers, on all workers of Alluxio |
Cluster.CapacityUsed |
GAUGE |
Total used bytes on all tiers, on all workers of Alluxio |
Cluster.LostWorkers |
GAUGE |
Total number of lost workers inside the cluster |
Cluster.RootUfsCapacityFree |
GAUGE |
Free capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityTotal |
GAUGE |
Total capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityUsed |
GAUGE |
Used capacity of the Alluxio root UFS in bytes |
Cluster.Workers |
GAUGE |
Total number of active workers inside the cluster |
Name | Type | Description |
Master.AbsentCacheHits |
COUNTER |
Number of cache hits on the absent cache |
Master.AbsentCacheInvalidations |
COUNTER |
Number of invalidations on the absent cache |
Master.AbsentCacheMisses |
COUNTER |
Number of cache misses on the absent cache |
Master.AbsentCacheSize |
GAUGE |
Size of the absent cache |
Master.BlockHeapSize |
GAUGE |
An estimate of the blocks heap size |
Master.CompleteFileOps |
COUNTER |
Total number of the CompleteFile operations |
Master.CreateDirectoryOps |
COUNTER |
Total number of the CreateDirectory operations |
Master.CreateFileOps |
COUNTER |
Total number of the CreateFile operations |
Master.DeletePathOps |
COUNTER |
Total number of the Delete operations |
Master.DirectoriesCreated |
COUNTER |
Total number of successful CreateDirectory operations |
Master.EdgeCacheEvictions |
GAUGE |
Total number of edges (inode metadata) that were evicted from the cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheHits |
GAUGE |
Total number of hits in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheLoadTimes |
GAUGE |
Total load times in the edge (inode metadata) cache that resulted from a cache miss. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheMisses |
GAUGE |
Total number of misses in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheSize |
GAUGE |
Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeLockPoolSize |
GAUGE |
The size of master edge lock pool |
Master.EmbeddedJournalSnapshotDownloadGenerate |
TIMER |
The timer statistics of journal snapshot download from other masters |
Master.EmbeddedJournalSnapshotGenerateTimer |
TIMER |
The timer statistics of journal snapshot generation by this master |
Master.EmbeddedJournalSnapshotInstallTimer |
TIMER |
The timer statistics of journal snapshot install |
Master.EmbeddedJournalSnapshotLastIndex |
GAUGE |
The last index of the latest journal snapshot created by this master or downloaded from other masters |
Master.EmbeddedJournalSnapshotReplayTimer |
TIMER |
The timer statistics of journal snapshot replay |
Master.FileBlockInfosGot |
COUNTER |
Total number of successful GetFileBlockInfo operations |
Master.FileInfosGot |
COUNTER |
Total number of successful GetFileInfo operations |
Master.FileSize |
GAUGE |
File size distribution |
Master.FilesCompleted |
COUNTER |
Total number of successful CompleteFile operations |
Master.FilesCreated |
COUNTER |
Total number of successful CreateFile operations |
Master.FilesFreed |
COUNTER |
Total number of successful FreeFile operations |
Master.FilesPersisted |
COUNTER |
Total number of successfully persisted files |
Master.FilesPinned |
GAUGE |
Total number of currently pinned files |
Master.FilesToBePersisted |
GAUGE |
Total number of files currently waiting to be persisted |
Master.FreeFileOps |
COUNTER |
Total number of FreeFile operations |
Master.GetFileBlockInfoOps |
COUNTER |
Total number of GetFileBlockInfo operations |
Master.GetFileInfoOps |
COUNTER |
Total number of the GetFileInfo operations |
Master.GetNewBlockOps |
COUNTER |
Total number of the GetNewBlock operations |
Master.InodeCacheEvictions |
GAUGE |
Total number of inodes that were evicted from the cache. |
Master.InodeCacheHitRatio |
GAUGE |
Inode Cache hit ratio |
Master.InodeCacheHits |
GAUGE |
Total number of hits in the inodes (inode metadata) cache. |
Master.InodeCacheLoadTimer |
TIMER |
Total load latency in the inodes (inode metadata) cache |
Master.InodeCacheLoadTimes |
GAUGE |
Total load times in the inodes (inode metadata) cache that resulted from a cache miss. |
Master.InodeCacheMisses |
GAUGE |
Total number of misses in the inodes (inode metadata) cache. |
Master.InodeCacheSize |
GAUGE |
Total number of inodes (inode metadata) cached. |
Master.InodeHeapSize |
GAUGE |
An estimate of the inode heap size |
Master.InodeLockPoolSize |
GAUGE |
The size of master inode lock pool |
Master.JournalEntriesSinceCheckPoint |
GAUGE |
Journal entries since last checkpoint |
Master.JournalFlushFailure |
COUNTER |
Total number of failed journal flushes |
Master.JournalFlushTimer |
TIMER |
The timer statistics of journal flush |
Master.JournalFreeBytes |
GAUGE |
Bytes left on the journal disk(s) for an Alluxio master |
Master.JournalFreePercent |
GAUGE |
Percentage of free space left on the journal disk(s) for an Alluxio master |
Master.JournalGainPrimacyTimer |
TIMER |
The timer statistics of journal gain primacy |
Master.JournalLastCheckPointTime |
GAUGE |
Last Journal Checkpoint Time |
Master.JournalSequenceNumber |
GAUGE |
Current journal sequence number |
Master.LastBackupEntriesCount |
GAUGE |
The total number of entries written in the last leading master metadata backup |
Master.LastBackupRestoreCount |
GAUGE |
The total number of entries restored from backup when a leading master initializes its metadata |
Master.LastBackupRestoreTimeMs |
GAUGE |
The process time of the last restore from backup |
Master.LastBackupTimeMs |
GAUGE |
The process time of the last backup |
Master.ListingCacheEvictions |
COUNTER |
The total number of evictions in master listing cache |
Master.ListingCacheHits |
COUNTER |
The total number of hits in master listing cache |
Master.ListingCacheLoadTimes |
COUNTER |
The total load time (in nanoseconds) in master listing cache that resulted from a cache miss. |
Master.ListingCacheMisses |
COUNTER |
The total number of misses in master listing cache |
Master.ListingCacheSize |
GAUGE |
The size of master listing cache |
Master.MountOps |
COUNTER |
Total number of Mount operations |
Master.NewBlocksGot |
COUNTER |
Total number of successful GetNewBlock operations |
Master.PathsDeleted |
COUNTER |
Total number of successful Delete operations |
Master.PathsMounted |
COUNTER |
Total number of successful Mount operations |
Master.PathsRenamed |
COUNTER |
Total number of successful Rename operations |
Master.PathsUnmounted |
COUNTER |
Total number of successful Unmount operations |
Master.RenamePathOps |
COUNTER |
Total number of Rename operations |
Master.SetAclOps |
COUNTER |
Total number of SetAcl operations |
Master.SetAttributeOps |
COUNTER |
Total number of SetAttribute operations |
Master.TotalBlocks |
COUNTER |
Total number of blocks in Alluxio |
Master.TotalPaths |
GAUGE |
Total number of files and directories in the Alluxio namespace |
Master.UfsJournalCatchupTimer |
TIMER |
The timer statistics of journal catchup |
Master.UfsJournalFailureRecoverTimer |
TIMER |
The timer statistics of UFS journal failure recovery |
Master.UfsJournalInitialReplayTimeMs |
GAUGE |
The process time of the UFS journal initial replay |
Master.UnmountOps |
COUNTER |
Total number of Unmount operations |
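Several master counters in the table above come in pairs: one counts all operations of a kind (for example Master.CreateFileOps) and the other counts only the successful ones (Master.FilesCreated). The sketch below shows one way such a pair could be combined. The function names are hypothetical and the sample values are made up. It also assumes both counters were read at the same moment, so the difference may include operations still in flight as well as failures.

```python
from typing import Optional


def unsuccessful_operations(total_ops: int, successful_ops: int) -> int:
    """Operations that were counted but not (or not yet) reported as successful."""
    return total_ops - successful_ops


def success_ratio(total_ops: int, successful_ops: int) -> Optional[float]:
    """Fraction of operations that succeeded, or None when nothing was attempted."""
    if total_ops == 0:
        return None
    return successful_ops / total_ops


# Made-up sample values, not real measurements.
snapshot = {"Master.CreateFileOps": 200, "Master.FilesCreated": 150}
total = snapshot["Master.CreateFileOps"]
succeeded = snapshot["Master.FilesCreated"]
print(unsuccessful_operations(total, succeeded))  # 50
print(success_ratio(total, succeeded))            # 0.75
```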
Name | Type | Description |
Worker.ActiveRpcReadCount |
COUNTER |
The number of active read-RPCs managed by this worker |
Worker.ActiveRpcWriteCount |
COUNTER |
The number of active write-RPCs managed by this worker |
Worker.AsyncCacheDuplicateRequests |
COUNTER |
Total number of duplicated async cache requests received by this worker |
Worker.AsyncCacheFailedBlocks |
COUNTER |
Total number of async cache failed blocks in this worker |
Worker.AsyncCacheRemoteBlocks |
COUNTER |
Total number of blocks that need to be async cached from remote source |
Worker.AsyncCacheRequests |
COUNTER |
Total number of async cache requests received by this worker |
Worker.AsyncCacheSucceededBlocks |
COUNTER |
Total number of async cache succeeded blocks in this worker |
Worker.AsyncCacheUfsBlocks |
COUNTER |
Total number of blocks that need to be async cached from local source |
Worker.BlockRemoverBlocksToRemovedCount |
COUNTER |
The total number of blocks removed from this worker by the asynchronous block remover. |
Worker.BlockRemoverRemovingBlocksSize |
GAUGE |
The size of blocks being removed from this worker by the asynchronous block remover. |
Worker.BlockRemoverTryRemoveBlocksSize |
GAUGE |
The size of blocks to be removed from this worker by the asynchronous block remover. |
Worker.BlockRemoverTryRemoveCount |
COUNTER |
The total number of blocks that the asynchronous block remover tried to remove from this worker. |
Worker.BlocksAccessed |
COUNTER |
Total number of times any one of the blocks in this worker is accessed. |
Worker.BlocksCached |
GAUGE |
Total number of blocks used for caching data in an Alluxio worker |
Worker.BlocksCancelled |
COUNTER |
Total number of aborted temporary blocks in this worker. |
Worker.BlocksDeleted |
COUNTER |
Total number of deleted blocks in this worker by external requests. |
Worker.BlocksEvicted |
COUNTER |
Total number of evicted blocks in this worker. |
Worker.BlocksEvictionRate |
METER |
Block eviction rate in this worker. |
Worker.BlocksLost |
COUNTER |
Total number of lost blocks in this worker. |
Worker.BlocksPromoted |
COUNTER |
Total number of times any one of the blocks in this worker moved to a new tier. |
Worker.BlocksReadLocal |
COUNTER |
Total number of local blocks read by this worker. |
Worker.BlocksReadRemote |
COUNTER |
Total number of remote blocks read by this worker. |
Worker.BlocksReadUfs |
COUNTER |
Total number of UFS blocks read by this worker. |
Worker.BytesReadDirect |
COUNTER |
Total number of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesReadDirectThroughput |
METER |
Throughput of bytes read from Alluxio storage managed by this worker, and from the underlying UFS if data cannot be found in Alluxio storage, without external RPC involved. This records data read by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesReadDomain |
COUNTER |
Total number of bytes read from Alluxio storage via domain socket by this worker |
Worker.BytesReadDomainThroughput |
METER |
Bytes read throughput from Alluxio storage via domain socket by this worker |
Worker.BytesReadPerUfs |
COUNTER |
Total number of bytes read from a specific Alluxio UFS by this worker |
Worker.BytesReadRemote |
COUNTER |
Total number of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage via external RPC channel. This does not include short-circuit local reads and domain socket reads. |
Worker.BytesReadRemoteThroughput |
METER |
Throughput of bytes read via external RPC channel from Alluxio storage managed by this worker, and from the underlying UFS if data cannot be found in Alluxio storage. This does not include short-circuit local reads and domain socket reads. |
Worker.BytesReadUfsThroughput |
METER |
Bytes read throughput from all Alluxio UFSes by this worker |
Worker.BytesWrittenDirect |
COUNTER |
Total number of bytes written to Alluxio storage managed by this worker without external RPC involved. This records data written by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesWrittenDirectThroughput |
METER |
Throughput of bytes written to Alluxio storage managed by this worker without external RPC involved. This records data written by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesWrittenDomain |
COUNTER |
Total number of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenDomainThroughput |
METER |
Throughput of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenPerUfs |
COUNTER |
Total number of bytes written to a specific Alluxio UFS by this worker |
Worker.BytesWrittenRemote |
COUNTER |
Total number of bytes written to Alluxio storage or the underlying UFS by this worker. This does not include short-circuit local writes and domain socket writes. |
Worker.BytesWrittenRemoteThroughput |
METER |
Throughput of bytes written to Alluxio storage or the underlying UFS by this worker. This does not include short-circuit local writes and domain socket writes. |
Worker.BytesWrittenUfsThroughput |
METER |
Throughput of bytes written to all Alluxio UFSes by this worker |
Worker.CapacityFree |
GAUGE |
Total free bytes on all tiers of a specific Alluxio worker |
Worker.CapacityTotal |
GAUGE |
Total capacity (in bytes) on all tiers of a specific Alluxio worker |
Worker.CapacityUsed |
GAUGE |
Total used bytes on all tiers of a specific Alluxio worker |
Name | Type | Description |
Client.BlockReadDataChunk |
TIMER |
The timer statistics of reading block data in chunks from Alluxio workers. This metric is only recorded when alluxio.user.block.read.metrics.enabled is set to true |
Client.BlockReadDataFromChunk |
TIMER |
The timer statistics of reading data from data chunks that have already been fetched from Alluxio workers. This metric is only recorded when alluxio.user.block.read.metrics.enabled is set to true |
Client.BytesReadLocal |
COUNTER |
Total number of bytes short-circuit read from local storage by this client |
Client.BytesReadLocalThroughput |
METER |
Bytes throughput short-circuit read from local storage by this client |
Client.BytesWrittenLocal |
COUNTER |
Total number of bytes short-circuit written to local storage by this client |
Client.BytesWrittenLocalThroughput |
METER |
Bytes throughput short-circuit written to local storage by this client |
Client.BytesWrittenUfs |
COUNTER |
Total number of bytes written to Alluxio UFS by this client |
Client.CacheBytesEvicted |
METER |
Total number of bytes evicted from the client cache. |
Client.CacheBytesReadCache |
METER |
Total number of bytes read from the client cache. |
Client.CacheBytesReadExternal |
METER |
Total number of bytes read from external storage due to a cache miss on the client cache. |
Client.CacheBytesRequestedExternal |
METER |
Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads. |
Client.CacheBytesWrittenCache |
METER |
Total number of bytes written to the client cache. |
Client.CacheCleanupGetErrors |
COUNTER |
Number of failures when cleaning up a failed cache read. |
Client.CacheCleanupPutErrors |
COUNTER |
Number of failures when cleaning up a failed cache write. |
Client.CacheCreateErrors |
COUNTER |
Number of failures when creating a cache in the client cache. |
Client.CacheDeleteErrors |
COUNTER |
Number of failures when deleting cached data in the client cache. |
Client.CacheDeleteNonExistingPageErrors |
COUNTER |
Number of failures when deleting pages due to absence. |
Client.CacheDeleteNotReadyErrors |
COUNTER |
Number of failures when the cache is not ready to delete pages. |
Client.CacheDeleteStoreDeleteErrors |
COUNTER |
Number of failures when deleting pages due to failed delete in page stores. |
Client.CacheGetErrors |
COUNTER |
Number of failures when getting cached data in the client cache. |
Client.CacheGetNotReadyErrors |
COUNTER |
Number of failures when cache is not ready to get pages. |
Client.CacheGetStoreReadErrors |
COUNTER |
Number of failures when getting cached data in the client cache due to failed read from page stores. |
Client.CacheHitRate |
GAUGE |
Cache hit rate: (# bytes read from cache) / (# bytes requested). |
Client.CachePages |
COUNTER |
Total number of pages in the client cache. |
Client.CachePagesEvicted |
METER |
Total number of pages evicted from the client cache. |
Client.CachePutAsyncRejectionErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed injection to async write queue. |
Client.CachePutBenignRacingErrors |
COUNTER |
Number of failures when adding pages due to racing eviction. This error is benign. |
Client.CachePutErrors |
COUNTER |
Number of failures when putting cached data in the client cache. |
Client.CachePutEvictionErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed eviction. |
Client.CachePutInsufficientSpaceErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to insufficient space made after eviction. |
Client.CachePutNotReadyErrors |
COUNTER |
Number of failures when cache is not ready to add pages. |
Client.CachePutStoreDeleteErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed deletes in page store. |
Client.CachePutStoreWriteErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed writes to page store. |
Client.CachePutStoreWriteNoSpaceErrors |
COUNTER |
Number of failures when putting cached data in the client cache because the disk is full even though the cache capacity has not been reached. This can happen if the storage overhead ratio to write data is underestimated. |
Client.CacheSpaceAvailable |
GAUGE |
Amount of bytes available in the client cache. |
Client.CacheSpaceUsed |
GAUGE |
Amount of bytes used by the client cache. |
Client.CacheSpaceUsedCount |
COUNTER |
Amount of bytes used by the client cache as a counter. |
Client.CacheState |
COUNTER |
State of the cache: 0 (NOT_IN_USE), 1 (READ_ONLY) and 2 (READ_WRITE) |
Client.CacheStoreDeleteTimeout |
COUNTER |
Number of timeouts when deleting pages from page store. |
Client.CacheStoreGetTimeout |
COUNTER |
Number of timeouts when reading pages from page store. |
Client.CacheStorePutTimeout |
COUNTER |
Number of timeouts when writing new pages to page store. |
Client.CacheStoreThreadsRejected |
COUNTER |
Number of rejections of I/O threads when submitting tasks to the thread pool, likely due to an unresponsive local file system. |
Client.CacheUnremovableFiles |
COUNTER |
Amount of unusable bytes managed by the client cache. |
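The Client.CacheHitRate formula in the table above, (# bytes read from cache) / (# bytes requested), can be sketched as follows. This is an illustration under an assumption rather than the client's actual code: it treats the bytes requested as the sum of Client.CacheBytesReadCache and Client.CacheBytesRequestedExternal. The function name cache_hit_rate is hypothetical.

```python
def cache_hit_rate(bytes_read_cache: int, bytes_requested_external: int) -> float:
    """Bytes served from the cache divided by all bytes requested."""
    # Assumption: requested bytes = cache hits + requests that missed the cache.
    bytes_requested = bytes_read_cache + bytes_requested_external
    if bytes_requested == 0:
        return 0.0  # nothing requested yet, so report no hits
    return bytes_read_cache / bytes_requested


# Made-up sample values: 750 bytes served from cache, 250 bytes missed.
print(cache_hit_rate(750, 250))  # 0.75
```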
Fuse is a long-running Alluxio client.
Depending on how Fuse is launched, Fuse metrics show as
When a user or an application runs a filesystem command under the Fuse mount point,
the operating system processes and translates the command, which triggers the related Fuse operations
exposed in AlluxioFuse.
The number of times each operation is called and the duration of each call are recorded dynamically
under the metric name Fuse.<FUSE_OPERATION_NAME>.
The important Fuse metrics include:
| Metric Name | Description |
|---|---|
| Fuse.readdir | The duration metrics of listing a directory |
| Fuse.getattr | The duration metrics of getting the metadata of a file |
| Fuse.open | The duration metrics of opening a file for read |
| Fuse.read | The duration metrics of reading a part of a file |
| Fuse.create | The duration metrics of creating a file for write |
| Fuse.write | The duration metrics of writing a file |
| Fuse.release | The duration metrics of closing a file after read or write. Note that release is asynchronous, so Fuse threads do not wait for it to finish |
| Fuse.mkdir | The duration metrics of creating a directory |
| Fuse.unlink | The duration metrics of removing a file or a directory |
| Fuse.rename | The duration metrics of renaming a file or a directory |
| Fuse.chmod | The duration metrics of modifying the mode of a file or a directory |
| Fuse.chown | The duration metrics of modifying the user and/or group ownership of a file or a directory |
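The idea of recording a call count and a duration under a dynamically built Fuse.<FUSE_OPERATION_NAME> name can be sketched as below. This is not how AlluxioFuse implements it; the context manager record_fuse_operation and the two dictionaries are hypothetical stand-ins for a real metrics registry.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory stand-ins for a metrics registry.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


@contextmanager
def record_fuse_operation(operation_name: str):
    """Count one call and add its elapsed time under Fuse.<operation_name>."""
    metric_name = f"Fuse.{operation_name}"
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped operation raises.
        call_counts[metric_name] += 1
        total_seconds[metric_name] += time.perf_counter() - start


# Two simulated getattr calls.
for _ in range(2):
    with record_fuse_operation("getattr"):
        pass

print(call_counts["Fuse.getattr"])  # 2
```

Building the name from the operation at call time means no metric has to be registered in advance for each Fuse operation.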
The following metrics are collected on each instance (Master, Worker or Client).
Alluxio provides overall and detailed memory usage information.
Detailed memory usage information for the code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space is collected in each process.