List of Metrics

There are two types of metrics in Alluxio: cluster-wide aggregated metrics and per-process detailed metrics.

  • Cluster metrics are collected and calculated by the leading master and displayed in the metrics tab of the web UI. These metrics are designed to provide a snapshot of the cluster state and the overall amount of data and metadata served by Alluxio.

  • Process metrics are collected by each Alluxio process and exposed in a machine-readable format through any configured sinks. Process metrics are highly detailed and are intended to be consumed by third-party monitoring tools. Users can then view fine-grained dashboards with time-series graphs of each metric, such as data transferred or the number of RPC invocations.

Metrics in Alluxio have the following format for master node metrics:

Master.[metricName].[tag1].[tag2]...

Metrics in Alluxio have the following format for non-master node metrics:

[processType].[metricName].[tag1].[tag2]...[hostName]

There is generally an Alluxio metric for every RPC invocation, whether to Alluxio or to the under store.

Tags are additional pieces of metadata for the metric such as user name or under storage location. Tags can be used to further filter or aggregate on various characteristics.
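To illustrate the naming format above, the following Python sketch splits a metric name into its process type, metric name, tags, and hostname. The helper is hypothetical and not part of Alluxio. It assumes that no single component, including the hostname, contains a dot of its own. It also assumes that Master metrics and the master-computed Cluster metrics carry no trailing hostname segment.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption for this sketch: these process types have no hostname segment.
NO_HOST_PROCESS_TYPES = {"Master", "Cluster"}


@dataclass
class ParsedMetricName:
    process_type: str
    metric_name: str
    tags: List[str] = field(default_factory=list)
    host_name: Optional[str] = None


def parse_metric_name(full_name: str) -> ParsedMetricName:
    """Split a dotted metric name such as 'Worker.BytesReadAlluxio.host1'."""
    parts = full_name.split(".")
    # Reject names with fewer than two parts or with empty components.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"malformed metric name: {full_name!r}")
    process_type, metric_name, rest = parts[0], parts[1], parts[2:]
    if process_type in NO_HOST_PROCESS_TYPES:
        # Master.[metricName].[tag1].[tag2]...
        return ParsedMetricName(process_type, metric_name, rest)
    # [processType].[metricName].[tag1].[tag2]...[hostName]
    if not rest:
        raise ValueError(f"missing hostname segment: {full_name!r}")
    return ParsedMetricName(process_type, metric_name, rest[:-1], rest[-1])


if __name__ == "__main__":
    print(parse_metric_name("Worker.BytesReadAlluxio.worker-host-1"))
```

The hostnames and the `User:alice` tag used when trying this out are made-up examples. Real tag values, such as UFS addresses, may contain dots, and a plain `split(".")` would mis-split them.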

Cluster Metrics

Workers and clients send metrics data to the Alluxio master through heartbeats. The intervals are defined by the properties alluxio.master.worker.heartbeat.interval and alluxio.user.metrics.heartbeat.interval, respectively.

Bytes metrics are values aggregated from workers or clients. Bytes throughput metrics are calculated on the leading master: the value of a bytes throughput metric equals the corresponding bytes counter value divided by the metrics recording time, expressed in bytes per minute.
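The throughput formula above can be written out as follows. This illustrative Python sketch is not Alluxio's implementation. It assumes the "metrics record time" is the elapsed time, in milliseconds, over which the counter accumulated. For comparison, it also includes a windowed variant, which is a common way for external monitoring tools to derive a rate from a cumulative counter. Both function names are hypothetical.

```python
def throughput_bytes_per_minute(counter_bytes: int, record_time_ms: int) -> float:
    """Counter value divided by the recording time, expressed per minute."""
    if record_time_ms <= 0:
        raise ValueError("record_time_ms must be positive")
    return counter_bytes / (record_time_ms / 60_000)


def windowed_bytes_per_minute(prev_bytes: int, curr_bytes: int,
                              prev_ms: int, curr_ms: int) -> float:
    """Rate between two samples of a monotonically increasing counter."""
    elapsed_ms = curr_ms - prev_ms
    if elapsed_ms <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_bytes - prev_bytes) / (elapsed_ms / 60_000)
```

For example, 6,000 bytes accumulated over two minutes (120,000 ms) gives 3,000 bytes per minute.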

Name Type Description
Cluster.BytesReadAlluxio COUNTER Total number of bytes read from Alluxio storage reported by all workers. This does not include UFS reads.
Cluster.BytesReadAlluxioThroughput GAUGE Bytes read throughput from Alluxio storage by all workers
Cluster.BytesReadDomain COUNTER Total number of bytes read from Alluxio storage via domain socket reported by all workers
Cluster.BytesReadDomainThroughput GAUGE Bytes read throughput from Alluxio storage via domain socket by all workers
Cluster.BytesReadLocal COUNTER Total number of bytes short-circuit read from local storage by all clients
Cluster.BytesReadLocalThroughput GAUGE Bytes short-circuit read throughput from local storage by all clients
Cluster.BytesReadPerUfs COUNTER Total number of bytes read from a specific UFS by all workers
Cluster.BytesReadUfsAll COUNTER Total number of bytes read from all Alluxio UFSes by all workers
Cluster.BytesReadUfsThroughput GAUGE Bytes read throughput from all Alluxio UFSes by all workers
Cluster.BytesWrittenAlluxio COUNTER Total number of bytes written to Alluxio storage in all workers. This does not include UFS writes
Cluster.BytesWrittenAlluxioThroughput GAUGE Bytes write throughput to Alluxio storage in all workers
Cluster.BytesWrittenDomain COUNTER Total number of bytes written to Alluxio storage via domain socket by all workers
Cluster.BytesWrittenDomainThroughput GAUGE Throughput of bytes written to Alluxio storage via domain socket by all workers
Cluster.BytesWrittenLocal COUNTER Total number of bytes short-circuit written to local storage by all clients
Cluster.BytesWrittenLocalThroughput GAUGE Bytes short-circuit write throughput to local storage by all clients
Cluster.BytesWrittenPerUfs COUNTER Total number of bytes written to a specific Alluxio UFS by all workers
Cluster.BytesWrittenUfsAll COUNTER Total number of bytes written to all Alluxio UFSes by all workers
Cluster.BytesWrittenUfsThroughput GAUGE Bytes write throughput to all Alluxio UFSes by all workers
Cluster.CapacityFree GAUGE Total free bytes on all tiers, on all workers of Alluxio
Cluster.CapacityTotal GAUGE Total capacity (in bytes) on all tiers, on all workers of Alluxio
Cluster.CapacityUsed GAUGE Total used bytes on all tiers, on all workers of Alluxio
Cluster.RootUfsCapacityFree GAUGE Free capacity of the Alluxio root UFS in bytes
Cluster.RootUfsCapacityTotal GAUGE Total capacity of the Alluxio root UFS in bytes
Cluster.RootUfsCapacityUsed GAUGE Used capacity of the Alluxio root UFS in bytes
Cluster.Workers GAUGE Total number of active workers inside the cluster
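A monitoring script could derive overall storage utilization and average worker capacity from the capacity and worker gauges above. The sketch below assumes the gauge values have already been collected into a dictionary keyed by metric name. How they are fetched (for example, from a configured sink) is out of scope, and the function name is hypothetical.

```python
from typing import Mapping


def cluster_capacity_summary(snapshot: Mapping[str, float]) -> dict:
    """Derive utilization and average per-worker capacity from cluster gauges."""
    total = snapshot["Cluster.CapacityTotal"]
    used = snapshot["Cluster.CapacityUsed"]
    workers = snapshot["Cluster.Workers"]
    return {
        # Fraction of all worker storage (all tiers) that is in use.
        "utilization": used / total if total > 0 else 0.0,
        # Mean capacity per active worker, in bytes.
        "avg_capacity_per_worker": total / workers if workers > 0 else 0.0,
    }
```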

Master Metrics

Default master metrics:

Name Type Description
Master.CompleteFileOps COUNTER Total number of the CompleteFile operations
Master.CreateDirectoryOps COUNTER Total number of the CreateDirectory operations
Master.CreateFileOps COUNTER Total number of the CreateFile operations
Master.DeletePathOps COUNTER Total number of the Delete operations
Master.DirectoriesCreated COUNTER Total number of successful CreateDirectory operations
Master.EdgeCacheSize GAUGE Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.FileBlockInfosGot COUNTER Total number of successful GetFileBlockInfo operations
Master.FileInfosGot COUNTER Total number of successful GetFileInfo operations
Master.FilesCompleted COUNTER Total number of successful CompleteFile operations
Master.FilesCreated COUNTER Total number of successful CreateFile operations
Master.FilesFreed COUNTER Total number of successful FreeFile operations
Master.FilesPersisted COUNTER Total number of successfully persisted files
Master.FilesPinned GAUGE Total number of currently pinned files
Master.FreeFileOps COUNTER Total number of FreeFile operations
Master.GetFileBlockInfoOps COUNTER Total number of GetFileBlockInfo operations
Master.GetFileInfoOps COUNTER Total number of the GetFileInfo operations
Master.GetNewBlockOps COUNTER Total number of the GetNewBlock operations
Master.InodeCacheSize GAUGE Total number of inodes (inode metadata) cached
Master.JournalFlushFailure COUNTER Total number of failed journal flushes
Master.JournalFlushTimer TIMER Timer statistics of journal flushes
Master.JournalGainPrimacyTimer TIMER Timer statistics of the journal gaining primacy
Master.LastBackupEntriesCount GAUGE The total number of entries written in the last leading master metadata backup
Master.LastBackupRestoreCount GAUGE The total number of entries restored from backup when a leading master initializes its metadata
Master.LastBackupRestoreTimeMs GAUGE The processing time of the last restore from backup, in milliseconds
Master.LastBackupTimeMs GAUGE The processing time of the last backup, in milliseconds
Master.ListingCacheSize GAUGE The size of master listing cache
Master.MountOps COUNTER Total number of Mount operations
Master.NewBlocksGot COUNTER Total number of successful GetNewBlock operations
Master.PathsDeleted COUNTER Total number of successful Delete operations
Master.PathsMounted COUNTER Total number of successful Mount operations
Master.PathsRenamed COUNTER Total number of successful Rename operations
Master.PathsUnmounted COUNTER Total number of successful Unmount operations
Master.RenamePathOps COUNTER Total number of Rename operations
Master.SetAclOps COUNTER Total number of SetAcl operations
Master.SetAttributeOps COUNTER Total number of SetAttribute operations
Master.TotalPaths GAUGE Total number of files and directories in the Alluxio namespace
Master.UfsJournalFailureRecoverTime TIMER Timer statistics of UFS journal failure recovery
Master.UnmountOps COUNTER Total number of Unmount operations
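Many operations in the table above have both an attempt counter (for example, Master.CreateFileOps) and a success counter (Master.FilesCreated). A hypothetical helper can estimate failed operations as the difference between the two. The pairing below is inferred from the table descriptions. The sketch assumes every operation counted in the first metric either succeeds (and is counted in the second) or fails.

```python
from typing import Dict, Mapping, Tuple

# (attempted, succeeded) counter pairs, inferred from the descriptions above.
OP_PAIRS: Dict[str, Tuple[str, str]] = {
    "CompleteFile": ("Master.CompleteFileOps", "Master.FilesCompleted"),
    "CreateDirectory": ("Master.CreateDirectoryOps", "Master.DirectoriesCreated"),
    "CreateFile": ("Master.CreateFileOps", "Master.FilesCreated"),
    "Delete": ("Master.DeletePathOps", "Master.PathsDeleted"),
    "FreeFile": ("Master.FreeFileOps", "Master.FilesFreed"),
    "GetFileBlockInfo": ("Master.GetFileBlockInfoOps", "Master.FileBlockInfosGot"),
    "GetFileInfo": ("Master.GetFileInfoOps", "Master.FileInfosGot"),
    "GetNewBlock": ("Master.GetNewBlockOps", "Master.NewBlocksGot"),
    "Mount": ("Master.MountOps", "Master.PathsMounted"),
    "Rename": ("Master.RenamePathOps", "Master.PathsRenamed"),
    "Unmount": ("Master.UnmountOps", "Master.PathsUnmounted"),
}


def failed_ops(snapshot: Mapping[str, int]) -> Dict[str, int]:
    """Estimate failed operations per type from attempt and success counters."""
    result: Dict[str, int] = {}
    for op, (attempted, succeeded) in OP_PAIRS.items():
        if attempted in snapshot and succeeded in snapshot:
            # Clamp at zero: two counters read at slightly different moments
            # could briefly disagree.
            result[op] = max(0, snapshot[attempted] - snapshot[succeeded])
    return result
```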

Dynamically generated master metrics:

Metric Name Description
Master.CapacityTotalTier<TIER_NAME> Total capacity in tier <TIER_NAME> of the Alluxio file system in bytes
Master.CapacityUsedTier<TIER_NAME> Used capacity in tier <TIER_NAME> of the Alluxio file system in bytes
Master.CapacityFreeTier<TIER_NAME> Free capacity in tier <TIER_NAME> of the Alluxio file system in bytes
Master.UfsSessionCount-Ufs:<UFS_ADDRESS> The total number of currently open UFS sessions connected to the given <UFS_ADDRESS>
Master.<UFS_RPC_NAME>.UFS:<UFS_ADDRESS>.UFS_TYPE:<UFS_TYPE>.User:<USER> Details of the UFS RPC operations performed by the current master
Master.PerUfsOp<UFS_RPC_NAME>.UFS:<UFS_ADDRESS> The aggregated number of <UFS_RPC_NAME> operations run on UFS <UFS_ADDRESS> by the leading master
Master.<LEADING_MASTER_RPC_NAME> The duration statistics of RPC calls exposed on the leading master

Worker Metrics

Default worker metrics:

Name Type Description
Worker.AsyncCacheDuplicateRequests COUNTER Total number of duplicate async cache requests received by this worker
Worker.AsyncCacheFailedBlocks COUNTER Total number of blocks that failed to be async cached in this worker
Worker.AsyncCacheRemoteBlocks COUNTER Total number of blocks that need to be async cached from a remote source
Worker.AsyncCacheRequests COUNTER Total number of async cache requests received by this worker
Worker.AsyncCacheSucceededBlocks COUNTER Total number of blocks successfully async cached in this worker
Worker.AsyncCacheUfsBlocks COUNTER Total number of blocks that need to be async cached from a local source
Worker.BlocksAccessed COUNTER Total number of times any one of the blocks in this worker is accessed.
Worker.BlocksCached GAUGE Total number of blocks used for caching data in an Alluxio worker
Worker.BlocksCancelled COUNTER Total number of aborted temporary blocks in this worker.
Worker.BlocksDeleted COUNTER Total number of deleted blocks in this worker by external requests.
Worker.BlocksEvicted COUNTER Total number of evicted blocks in this worker.
Worker.BlocksLost COUNTER Total number of lost blocks in this worker.
Worker.BlocksPromoted COUNTER Total number of times any one of the blocks in this worker moved to a new tier.
Worker.BytesReadAlluxio COUNTER Total number of bytes read from Alluxio storage managed by this worker. This does not include UFS reads.
Worker.BytesReadAlluxioThroughput METER Bytes read throughput from Alluxio storage by this worker
Worker.BytesReadDomain COUNTER Total number of bytes read from Alluxio storage via domain socket by this worker
Worker.BytesReadDomainThroughput METER Bytes read throughput from Alluxio storage via domain socket by this worker
Worker.BytesReadPerUfs COUNTER Total number of bytes read from a specific Alluxio UFS by this worker
Worker.BytesReadUfsThroughput METER Bytes read throughput from all Alluxio UFSes by this worker
Worker.BytesWrittenAlluxio COUNTER Total number of bytes written to Alluxio storage by this worker. This does not include UFS writes
Worker.BytesWrittenAlluxioThroughput METER Bytes write throughput to Alluxio storage by this worker
Worker.BytesWrittenDomain COUNTER Total number of bytes written to Alluxio storage via domain socket by this worker
Worker.BytesWrittenDomainThroughput METER Throughput of bytes written to Alluxio storage via domain socket by this worker
Worker.BytesWrittenPerUfs COUNTER Total number of bytes written to a specific Alluxio UFS by this worker
Worker.BytesWrittenUfsThroughput METER Bytes write throughput to all Alluxio UFSes by this worker
Worker.CapacityFree GAUGE Total free bytes on all tiers of a specific Alluxio worker
Worker.CapacityTotal GAUGE Total capacity (in bytes) on all tiers of a specific Alluxio worker
Worker.CapacityUsed GAUGE Total used bytes on all tiers of a specific Alluxio worker
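The async cache counters above can be combined into simple health ratios. The sketch below is a hypothetical helper. It assumes the worker's counters have been collected into a dictionary keyed by metric name with the hostname suffix already stripped. It also assumes that every async cache attempt ends up counted as either succeeded or failed.

```python
from typing import Mapping, Optional


def async_cache_stats(snapshot: Mapping[str, int]) -> dict:
    """Success ratio and duplicate-request fraction for one worker."""
    succeeded = snapshot.get("Worker.AsyncCacheSucceededBlocks", 0)
    failed = snapshot.get("Worker.AsyncCacheFailedBlocks", 0)
    requests = snapshot.get("Worker.AsyncCacheRequests", 0)
    duplicates = snapshot.get("Worker.AsyncCacheDuplicateRequests", 0)
    finished = succeeded + failed
    success_ratio: Optional[float] = succeeded / finished if finished else None
    duplicate_fraction: Optional[float] = duplicates / requests if requests else None
    return {"success_ratio": success_ratio, "duplicate_fraction": duplicate_fraction}
```

A value of None means no relevant activity has been recorded yet, rather than a ratio of zero.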

Dynamically generated worker metrics:

Metric Name Description
Worker.UfsSessionCount-Ufs:<UFS_ADDRESS> The total number of currently open UFS sessions connected to the given <UFS_ADDRESS>
Worker.<RPC_NAME> The duration statistics of RPC calls exposed on workers

Client Metrics

Each client metric is recorded with the client's local hostname, or with the application ID if alluxio.user.app.id is configured. If alluxio.user.app.id is configured, multiple clients can be combined into a logical application.

Name Type Description
Client.BytesReadLocal COUNTER Total number of bytes short-circuit read from local storage by this client
Client.BytesReadLocalThroughput METER Bytes short-circuit read throughput from local storage by this client
Client.BytesWrittenLocal COUNTER Total number of bytes short-circuit written to local storage by this client
Client.BytesWrittenLocalThroughput METER Bytes short-circuit write throughput to local storage by this client
Client.BytesWrittenUfs COUNTER Total number of bytes written to Alluxio UFS by this client
Client.CacheBytesEvicted METER Total number of bytes evicted from the client cache.
Client.CacheBytesReadCache METER Total number of bytes read from the client cache.
Client.CacheBytesReadExternal METER Total number of bytes read from external storage due to a cache miss on the client cache.
Client.CacheBytesRequestedExternal METER Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads.
Client.CacheBytesWrittenCache METER Total number of bytes written to the client cache.
Client.CacheCleanupGetErrors COUNTER Number of failures when cleaning up a failed cache read.
Client.CacheCleanupPutErrors COUNTER Number of failures when cleaning up a failed cache write.
Client.CacheCreateErrors COUNTER Number of failures when creating a cache in the client cache.
Client.CacheDeleteErrors COUNTER Number of failures when deleting cached data in the client cache.
Client.CacheDeleteNonExistingPageErrors COUNTER Number of failures when deleting pages because the pages do not exist.
Client.CacheDeleteNotReadyErrors COUNTER Number of failures when the cache is not ready to delete pages.
Client.CacheDeleteStoreDeleteErrors COUNTER Number of failures when deleting pages due to failed delete in page stores.
Client.CacheGetErrors COUNTER Number of failures when getting cached data in the client cache.
Client.CacheGetNotReadyErrors COUNTER Number of failures when cache is not ready to get pages.
Client.CacheGetStoreReadErrors COUNTER Number of failures when getting cached data in the client cache due to failed read from page stores.
Client.CacheHitRate GAUGE Cache hit rate: (# bytes read from cache) / (# bytes requested).
Client.CachePages COUNTER Total number of pages in the client cache.
Client.CachePagesEvicted METER Total number of pages evicted from the client cache.
Client.CachePutAsyncRejectionErrors COUNTER Number of failures when putting cached data in the client cache due to failed injection to async write queue.
Client.CachePutBenignRacingErrors COUNTER Number of failures when adding pages due to racing eviction. This error is benign.
Client.CachePutErrors COUNTER Number of failures when putting cached data in the client cache.
Client.CachePutEvictionErrors COUNTER Number of failures when putting cached data in the client cache due to failed eviction.
Client.CachePutNotReadyErrors COUNTER Number of failures when cache is not ready to add pages.
Client.CachePutStoreDeleteErrors COUNTER Number of failures when putting cached data in the client cache due to failed deletes in page store.
Client.CachePutStoreWriteErrors COUNTER Number of failures when putting cached data in the client cache due to failed writes to page store.
Client.CacheSpaceAvailable GAUGE Amount of bytes available in the client cache.
Client.CacheSpaceUsed GAUGE Amount of bytes used by the client cache.
Client.CacheSpaceUsedCount COUNTER Amount of bytes used by the client cache as a counter.
Client.CacheState COUNTER State of the cache: 0 (NOT_IN_USE), 1 (READ_ONLY) and 2 (READ_WRITE)
Client.CacheUnremovableFiles COUNTER Number of unusable bytes managed by the client cache.
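The numeric Client.CacheState values can be mapped to the names listed in the table, and a hit rate can be recomputed over a custom window from the byte counters. The sketch below is hypothetical. In `window_hit_rate`, it assumes that the bytes requested equal the bytes served from the cache plus the bytes requested on a cache miss (Client.CacheBytesRequestedExternal). Client.CacheHitRate itself is reported directly, so this is only useful for a different time window.

```python
from enum import IntEnum
from typing import Optional


class CacheState(IntEnum):
    """Values reported by Client.CacheState, per the table above."""
    NOT_IN_USE = 0
    READ_ONLY = 1
    READ_WRITE = 2


def decode_cache_state(value: int) -> CacheState:
    # Raises ValueError for values other than 0, 1 or 2.
    return CacheState(value)


def window_hit_rate(bytes_read_cache: int,
                    bytes_requested_external: int) -> Optional[float]:
    """Hit rate assuming requested = cache hits + cache-miss requests."""
    requested = bytes_read_cache + bytes_requested_external
    return bytes_read_cache / requested if requested else None
```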

Process Common Metrics

The following metrics are collected on each instance (Master, Worker or Client).

JVM Attributes

Metric Name Description
name The name of the JVM
uptime The uptime of the JVM
vendor The current JVM vendor

Garbage Collector Statistics

Metric Name Description
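PS-MarkSweep.count Total number of PS MarkSweep (old generation) collections
PS-MarkSweep.time Total time spent in PS MarkSweep collections, in milliseconds
PS-Scavenge.count Total number of PS Scavenge (young generation) collections
PS-Scavenge.time Total time spent in PS Scavenge collections, in milliseconds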
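The .count and .time values are cumulative, so a dashboard usually looks at how they change between two samples. The sketch below is a hypothetical helper that derives the number of collections and the mean time per collection within such a window. It assumes the .time values are in milliseconds, as reported by the JVM's GarbageCollectorMXBean.

```python
from typing import Optional, Tuple


def gc_window(prev_count: int, prev_time_ms: int,
              curr_count: int, curr_time_ms: int) -> Tuple[int, Optional[float]]:
    """Collections and mean time per collection between two samples."""
    collections = curr_count - prev_count
    spent_ms = curr_time_ms - prev_time_ms
    if collections < 0 or spent_ms < 0:
        # Cumulative counters only grow within one JVM; a drop suggests a restart.
        raise ValueError("counter went backwards; was the process restarted?")
    avg_ms = spent_ms / collections if collections else None
    return collections, avg_ms
```

For example, moving from (count 10, time 500 ms) to (count 14, time 700 ms) yields 4 collections averaging 50 ms each.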

Memory Usage

Alluxio provides overall and detailed memory usage information. Detailed memory usage information of code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space is collected in each process.

A subset of the memory usage metrics is listed below:

Metric Name Description
total.committed The amount of memory in bytes that is guaranteed to be available for use by the JVM
total.init The amount of memory in bytes that the JVM initially requests from the operating system
total.max The maximum amount of memory in bytes that can be used by the JVM
total.used The amount of memory currently used in bytes
heap.committed The amount of heap memory in bytes that is guaranteed to be available
heap.init The amount of heap memory in bytes requested at initialization
heap.max The maximum amount of heap memory in bytes that can be used
heap.usage The ratio of heap memory currently used to the maximum heap size
heap.used The amount of heap memory currently used in bytes
pools.Code-Cache.used Used memory of the pool from which memory is used for compilation and storage of native code
pools.Compressed-Class-Space.used Used memory of the pool from which memory is used for class metadata
pools.PS-Eden-Space.used Used memory of the pool from which memory is initially allocated for most objects
pools.PS-Survivor-Space.used Used memory of the pool containing objects that have survived the garbage collection of the Eden space
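Heap utilization can also be computed directly from heap.used and heap.max, for example in an alerting rule. The sketch below is a hypothetical helper. It handles the case where the maximum is undefined: java.lang.management.MemoryUsage reports that case as -1.

```python
from typing import Optional


def heap_utilization(heap_used: int, heap_max: int) -> Optional[float]:
    """Fraction of the maximum heap currently in use, or None if max is undefined."""
    # java.lang.management.MemoryUsage reports max as -1 when it is undefined.
    if heap_max <= 0:
        return None
    return heap_used / heap_max
```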