There are two types of metrics in Alluxio: cluster-wide aggregated metrics and per-process detailed metrics.
There is generally an Alluxio metric for every RPC invocation, to Alluxio or to the under store.
Tags are additional pieces of metadata for the metric such as user name or under storage location.
Tags can be used to further filter or aggregate on various characteristics.
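As a rough illustration of filtering and aggregating on tags, the sketch below works on a small list of made-up samples. The tag keys (user, ufs), the tag values, and the helper names are hypothetical and chosen only for this example; they are not taken from Alluxio.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, tags, value). The tag keys and values
# are invented for illustration and do not come from a real cluster.
samples = [
    ("BytesReadPerUfs", {"user": "alice", "ufs": "s3://bucket-a"}, 100),
    ("BytesReadPerUfs", {"user": "bob", "ufs": "s3://bucket-a"}, 50),
    ("BytesReadPerUfs", {"user": "alice", "ufs": "hdfs://nn/data"}, 25),
]


def filter_by_tag(samples, key, value):
    """Keep only the samples whose tag `key` equals `value`."""
    return [s for s in samples if s[1].get(key) == value]


def aggregate_by_tag(samples, key):
    """Sum sample values, grouped by the value of tag `key`."""
    totals = defaultdict(int)
    for _name, tags, value in samples:
        totals[tags.get(key)] += value
    return dict(totals)


alice_only = filter_by_tag(samples, "user", "alice")
per_ufs = aggregate_by_tag(samples, "ufs")
```

Here alice_only holds the two samples tagged with user alice, and per_ufs maps each under storage location to the total bytes read from it.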
Workers and clients send metrics data to the Alluxio master through heartbeats.
The heartbeat intervals are defined by the properties alluxio.master.worker.heartbeat.interval (for workers)
and alluxio.user.metrics.heartbeat.interval (for clients).
Bytes metrics are aggregated values from workers or clients. Bytes throughput metrics are calculated on the leading master.
The value of a bytes throughput metric equals the corresponding bytes metric counter value divided by the metrics record time, and is shown in bytes per minute.
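The throughput calculation described above can be sketched as follows. This is a minimal illustration, not Alluxio's implementation: the function name, the use of milliseconds for the record time, and returning 0 when no time has elapsed are all assumptions.

```python
def bytes_per_minute(counter_bytes, record_time_ms):
    """Divide a bytes counter by the elapsed record time, per minute.

    Assumes the record time is given in milliseconds (an assumption made
    for this sketch) and reports 0.0 when no time has elapsed.
    """
    if record_time_ms <= 0:
        return 0.0
    minutes = record_time_ms / 60_000
    return counter_bytes / minutes


# 600 bytes counted over a 2-minute record time.
throughput = bytes_per_minute(600, 120_000)
```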
Name | Type | Description |
Cluster.ActiveRpcReadCount |
COUNTER |
The number of active read-RPCs managed by workers |
Cluster.ActiveRpcWriteCount |
COUNTER |
The number of active write-RPCs managed by workers |
Cluster.BytesReadDirect |
COUNTER |
Total number of bytes read from Alluxio storage managed by workers and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. clients embedded in workers). |
Cluster.BytesReadDirectThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage managed by workers and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. clients embedded in workers). |
Cluster.BytesReadDomain |
COUNTER |
Total number of bytes read from Alluxio storage via domain socket reported by all workers |
Cluster.BytesReadDomainThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage via domain socket by all workers |
Cluster.BytesReadLocal |
COUNTER |
Total number of bytes short-circuit read from local storage by all clients |
Cluster.BytesReadLocalThroughput |
GAUGE |
Bytes per minute throughput short-circuit read from local storage by all clients |
Cluster.BytesReadPerUfs |
COUNTER |
Total number of bytes read from a specific UFS by all workers |
Cluster.BytesReadRemote |
COUNTER |
Total number of bytes read from Alluxio storage or underlying UFS if data does not exist in Alluxio storage reported by all workers. This does not include short-circuit local reads and domain socket reads |
Cluster.BytesReadRemoteThroughput |
GAUGE |
Bytes read per minute throughput from Alluxio storage or underlying UFS if data does not exist in Alluxio storage reported by all workers. This does not include short-circuit local reads and domain socket reads |
Cluster.BytesReadUfsAll |
COUNTER |
Total number of bytes read from all Alluxio UFSes by all workers |
Cluster.BytesReadUfsThroughput |
GAUGE |
Bytes read per minute throughput from all Alluxio UFSes by all workers |
Cluster.BytesWrittenDomain |
COUNTER |
Total number of bytes written to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenDomainThroughput |
GAUGE |
Throughput of bytes written per minute to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenLocal |
COUNTER |
Total number of bytes short-circuit written to local storage by all clients |
Cluster.BytesWrittenLocalThroughput |
GAUGE |
Bytes per minute throughput written to local storage by all clients |
Cluster.BytesWrittenPerUfs |
COUNTER |
Total number of bytes written to a specific Alluxio UFS by all workers |
Cluster.BytesWrittenRemote |
COUNTER |
Total number of bytes written to Alluxio storage in all workers or the underlying UFS. This does not include short-circuit local writes and domain socket writes. |
Cluster.BytesWrittenRemoteThroughput |
GAUGE |
Bytes written per minute throughput to Alluxio storage in all workers or the underlying UFS. This does not include short-circuit local writes and domain socket writes. |
Cluster.BytesWrittenUfsAll |
COUNTER |
Total number of bytes written to all Alluxio UFSes by all workers |
Cluster.BytesWrittenUfsThroughput |
GAUGE |
Bytes written per minute throughput to all Alluxio UFSes by all workers |
Cluster.CacheHitRate |
GAUGE |
Cache hit rate: (# bytes read from cache) / (# bytes requested) |
Cluster.CapacityFree |
GAUGE |
Total free bytes on all tiers, on all workers of Alluxio |
Cluster.CapacityTotal |
GAUGE |
Total capacity (in bytes) on all tiers, on all workers of Alluxio |
Cluster.CapacityUsed |
GAUGE |
Total used bytes on all tiers, on all workers of Alluxio |
Cluster.LeaderId |
GAUGE |
Display current leader id |
Cluster.LeaderIndex |
GAUGE |
Index of current leader |
Cluster.LostWorkers |
GAUGE |
Total number of lost workers inside the cluster |
Cluster.RootUfsCapacityFree |
GAUGE |
Free capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityTotal |
GAUGE |
Total capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityUsed |
GAUGE |
Used capacity of the Alluxio root UFS in bytes |
Cluster.Workers |
GAUGE |
Total number of active workers inside the cluster |
Metrics shared by the Alluxio server processes.
Name | Type | Description |
Master.AbsentCacheHits |
GAUGE |
Number of cache hits on the absent cache |
Master.AbsentCacheMisses |
GAUGE |
Number of cache misses on the absent cache |
Master.AbsentCacheSize |
GAUGE |
Size of the absent cache |
Master.AbsentPathCacheQueueSize |
GAUGE |
Alluxio maintains a cache of absent UFS paths. This is the number of UFS paths being processed. |
Master.AsyncPersistCancel |
COUNTER |
The number of cancelled AsyncPersist operations |
Master.AsyncPersistFail |
COUNTER |
The number of failed AsyncPersist operations |
Master.AsyncPersistFileCount |
COUNTER |
The number of files created by AsyncPersist operations |
Master.AsyncPersistFileSize |
COUNTER |
The total size of files created by AsyncPersist operations |
Master.AsyncPersistSuccess |
COUNTER |
The number of successful AsyncPersist operations |
Master.AuditLogEntriesSize |
GAUGE |
The size of the audit log entries blocking queue |
Master.BlockHeapSize |
GAUGE |
An estimate of the blocks heap size |
Master.BlockReplicaCount |
GAUGE |
Total number of block replicas in Alluxio |
Master.CompleteFileOps |
COUNTER |
Total number of the CompleteFile operations |
Master.CompletedOperationRetryCount |
COUNTER |
Total number of completed operations that have been retried by the client. |
Master.CreateDirectoryOps |
COUNTER |
Total number of the CreateDirectory operations |
Master.CreateFileOps |
COUNTER |
Total number of the CreateFile operations |
Master.DeletePathOps |
COUNTER |
Total number of the Delete operations |
Master.DirectoriesCreated |
COUNTER |
Total number of successful CreateDirectory operations |
Master.EdgeCacheEvictions |
GAUGE |
Total number of edges (inode metadata) that were evicted from cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheHits |
GAUGE |
Total number of hits in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheLoadTimes |
GAUGE |
Total load times in the edge (inode metadata) cache that resulted from a cache miss. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheMisses |
GAUGE |
Total number of misses in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeCacheSize |
GAUGE |
Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.EdgeLockPoolSize |
GAUGE |
The size of master edge lock pool |
Master.EmbeddedJournalSnapshotDownloadGenerate |
TIMER |
Describes the amount of time taken to download journal snapshots from other masters in the cluster. Only valid when using the embedded journal. Use this metric to determine if there are potential communication bottlenecks between Alluxio masters. |
Master.EmbeddedJournalSnapshotGenerateTimer |
TIMER |
Describes the amount of time taken to generate local journal snapshots on this master. Only valid when using the embedded journal. Use this metric to measure the performance of Alluxio's snapshot generation. |
Master.EmbeddedJournalSnapshotInstallTimer |
TIMER |
Describes the amount of time taken to install a downloaded journal snapshot from another master. Only valid when using the embedded journal. Use this metric to determine the performance of Alluxio when installing snapshots from the leader. Higher numbers may indicate a slow disk or CPU contention. |
Master.EmbeddedJournalSnapshotLastIndex |
GAUGE |
Represents the latest journal index that was recorded by this master in the most recent local snapshot or from a snapshot downloaded from another master in the cluster. Only valid when using the embedded journal. |
Master.EmbeddedJournalSnapshotReplayTimer |
TIMER |
Describes the amount of time taken to replay a journal snapshot onto the master's state machine. Only valid when using the embedded journal. Use this metric to determine the performance of Alluxio when replaying a journal snapshot file. Higher numbers may indicate a slow disk or CPU contention. |
Master.FileBlockInfosGot |
COUNTER |
Total number of successful GetFileBlockInfo operations |
Master.FileInfosGot |
COUNTER |
Total number of successful GetFileInfo operations |
Master.FileSize |
GAUGE |
File size distribution |
Master.FilesCompleted |
COUNTER |
Total number of successful CompleteFile operations |
Master.FilesCreated |
COUNTER |
Total number of successful CreateFile operations |
Master.FilesFreed |
COUNTER |
Total number of successful FreeFile operations |
Master.FilesPersisted |
COUNTER |
Total number of successfully persisted files |
Master.FilesPinned |
GAUGE |
Total number of currently pinned files |
Master.FilesToBePersisted |
GAUGE |
Total number of currently to be persisted files |
Master.FreeFileOps |
COUNTER |
Total number of FreeFile operations |
Master.GetFileBlockInfoOps |
COUNTER |
Total number of GetFileBlockInfo operations |
Master.GetFileInfoOps |
COUNTER |
Total number of the GetFileInfo operations |
Master.GetNewBlockOps |
COUNTER |
Total number of the GetNewBlock operations |
Master.InodeCacheEvictions |
GAUGE |
Total number of inodes that were evicted from the cache. |
Master.InodeCacheHitRatio |
GAUGE |
Inode Cache hit ratio |
Master.InodeCacheHits |
GAUGE |
Total number of hits in the inodes (inode metadata) cache. |
Master.InodeCacheLoadTimes |
GAUGE |
Total load times in the inodes (inode metadata) cache that resulted from a cache miss. |
Master.InodeCacheMisses |
GAUGE |
Total number of misses in the inodes (inode metadata) cache. |
Master.InodeCacheSize |
GAUGE |
Total number of inodes (inode metadata) cached. |
Master.InodeHeapSize |
GAUGE |
An estimate of the inode heap size |
Master.InodeLockPoolSize |
GAUGE |
The size of master inode lock pool |
Master.JobCanceled |
COUNTER |
The number of jobs in canceled status |
Master.JobCompleted |
COUNTER |
The number of jobs in completed status |
Master.JobCount |
GAUGE |
The number of jobs in all statuses |
Master.JobCreated |
COUNTER |
The number of jobs in created status |
Master.JobDistributedLoadCancel |
COUNTER |
The number of cancelled DistributedLoad operations |
Master.JobDistributedLoadFail |
COUNTER |
The number of failed DistributedLoad operations |
Master.JobDistributedLoadFileCount |
COUNTER |
The number of files loaded by DistributedLoad operations |
Master.JobDistributedLoadFileSizes |
COUNTER |
The total size of files loaded by DistributedLoad operations |
Master.JobDistributedLoadRate |
METER |
The average DistributedLoad loading rate |
Master.JobDistributedLoadSuccess |
COUNTER |
The number of successful DistributedLoad operations |
Master.JobFailed |
COUNTER |
The number of jobs in failed status |
Master.JobRunning |
COUNTER |
The number of jobs in running status |
Master.JournalCheckpointWarn |
GAUGE |
If the raft log index exceeds alluxio.master.journal.checkpoint.period.entries, and the last checkpoint exceeds alluxio.master.journal.checkpoint.warning.threshold.time, it returns 1 to indicate that a warning is required, otherwise it returns 0 |
Master.JournalEntriesSinceCheckPoint |
GAUGE |
Journal entries since last checkpoint |
Master.JournalFlushFailure |
COUNTER |
Total number of failed journal flushes |
Master.JournalFlushTimer |
TIMER |
The timer statistics of journal flush |
Master.JournalFreeBytes |
GAUGE |
Bytes left on the journal disk(s) for an Alluxio master. This metric is only valid on Linux and when embedded journal is used. Use this metric to monitor whether your journal is running out of disk space. |
Master.JournalFreePercent |
GAUGE |
Percentage of free space left on the journal disk(s) for an Alluxio master. This metric is only valid on Linux and when embedded journal is used. Use this metric to monitor whether your journal is running out of disk space. |
Master.JournalGainPrimacyTimer |
TIMER |
The timer statistics of journal gain primacy |
Master.JournalLastAppliedCommitIndex |
GAUGE |
The last raft log index which was applied to the state machine |
Master.JournalLastCheckPointTime |
GAUGE |
Last Journal Checkpoint Time |
Master.JournalSequenceNumber |
GAUGE |
Current journal sequence number |
Master.LastBackupEntriesCount |
GAUGE |
The total number of entries written in the last leading master metadata backup |
Master.LastBackupRestoreCount |
GAUGE |
The total number of entries restored from backup when a leading master initializes its metadata |
Master.LastBackupRestoreTimeMs |
GAUGE |
The process time of the last restore from backup |
Master.LastBackupTimeMs |
GAUGE |
The process time of the last backup |
Master.ListingCacheEvictions |
COUNTER |
The total number of evictions in master listing cache |
Master.ListingCacheHits |
COUNTER |
The total number of hits in master listing cache |
Master.ListingCacheLoadTimes |
COUNTER |
The total load time (in nanoseconds) in master listing cache that resulted from a cache miss. |
Master.ListingCacheMisses |
COUNTER |
The total number of misses in master listing cache |
Master.ListingCacheSize |
GAUGE |
The size of master listing cache |
Master.LostBlockCount |
GAUGE |
Count of lost unique blocks |
Master.LostFileCount |
GAUGE |
Count of lost files. This number is cached and may not be in sync with Master.LostBlockCount |
Master.MetadataSyncActivePaths |
COUNTER |
The number of in-progress paths from all InodeSyncStream instances |
Master.MetadataSyncExecutorQueueSize |
GAUGE |
The number of queuing sync tasks in the metadata sync thread pool controlled by alluxio.master.metadata.sync.executor.pool.size |
Master.MetadataSyncFail |
COUNTER |
The number of InodeSyncStream that failed, either partially or fully |
Master.MetadataSyncNoChange |
COUNTER |
The number of InodeSyncStream that finished with no change to inodes. |
Master.MetadataSyncOpsCount |
COUNTER |
The number of metadata sync operations. Each sync operation corresponds to one InodeSyncStream instance. |
Master.MetadataSyncPathsCancel |
COUNTER |
The number of pending paths from all InodeSyncStream instances that are ignored in the end instead of processed |
Master.MetadataSyncPathsFail |
COUNTER |
The number of paths that failed during metadata sync from all InodeSyncStream instances |
Master.MetadataSyncPathsSuccess |
COUNTER |
The number of paths sync-ed from all InodeSyncStream instances |
Master.MetadataSyncPendingPaths |
COUNTER |
The number of pending paths from all active InodeSyncStream instances, waiting for metadata sync |
Master.MetadataSyncPrefetchCancel |
COUNTER |
Number of cancelled prefetch jobs from metadata sync |
Master.MetadataSyncPrefetchExecutorQueueSize |
GAUGE |
The number of queuing prefetch tasks in the metadata sync thread pool controlled by alluxio.master.metadata.sync.ufs.prefetch.pool.size |
Master.MetadataSyncPrefetchFail |
COUNTER |
Number of failed prefetch jobs from metadata sync |
Master.MetadataSyncPrefetchOpsCount |
COUNTER |
The number of prefetch operations handled by the prefetch thread pool |
Master.MetadataSyncPrefetchPaths |
COUNTER |
Total number of UFS paths fetched by prefetch jobs from metadata sync |
Master.MetadataSyncPrefetchRetries |
COUNTER |
Number of retries to get from prefetch jobs from metadata sync |
Master.MetadataSyncPrefetchSuccess |
COUNTER |
Number of successful prefetch jobs from metadata sync |
Master.MetadataSyncSkipped |
COUNTER |
The number of InodeSyncStream that are skipped because the Alluxio metadata is fresher than alluxio.user.file.metadata.sync.interval |
Master.MetadataSyncSuccess |
COUNTER |
The number of InodeSyncStream that succeeded |
Master.MetadataSyncTimeMs |
COUNTER |
The total time elapsed in all InodeSyncStream instances |
Master.MigrateJobCancel |
COUNTER |
The number of cancelled MigrateJob operations |
Master.MigrateJobFail |
COUNTER |
The number of failed MigrateJob operations |
Master.MigrateJobFileCount |
COUNTER |
The number of MigrateJob files |
Master.MigrateJobFileSize |
COUNTER |
The total size of MigrateJob files |
Master.MigrateJobSuccess |
COUNTER |
The number of successful MigrateJob operations |
Master.MountOps |
COUNTER |
Total number of Mount operations |
Master.NewBlocksGot |
COUNTER |
Total number of successful GetNewBlock operations |
Master.PathsDeleted |
COUNTER |
Total number of successful Delete operations |
Master.PathsMounted |
COUNTER |
Total number of successful Mount operations |
Master.PathsRenamed |
COUNTER |
Total number of successful Rename operations |
Master.PathsUnmounted |
COUNTER |
Total number of successful Unmount operations |
Master.PolicyCommitExecutorQueueSize |
GAUGE |
The queue size of commit executor in action scheduler |
Master.PolicyExecutionExecutorQueueSize |
GAUGE |
The queue size of execution executor in action scheduler |
Master.PolicyRunningActionCount |
GAUGE |
The number of running actions in action scheduler |
Master.PolicyScanTimer |
TIMER |
The timer statistics of policy scan |
Master.PolicyScheduledActionCount |
GAUGE |
The number of actions scheduled by action scheduler |
Master.PolicySyncTaskCount |
GAUGE |
The number of active sync tasks |
Master.RenamePathOps |
COUNTER |
Total number of Rename operations |
Master.ReplicaMgmtActiveJobSize |
GAUGE |
Number of active block replication/eviction jobs. These jobs are created by the master to maintain the block replica factor. The value is an estimate with lag. |
Master.RoleId |
GAUGE |
Display master role id |
Master.RpcQueueLength |
GAUGE |
Length of the master rpc queue. Use this metric to monitor the RPC pressure on master. |
Master.SetAclOps |
COUNTER |
Total number of SetAcl operations |
Master.SetAttributeOps |
COUNTER |
Total number of SetAttribute operations |
Master.ToRemoveBlockCount |
GAUGE |
Count of block replicas to be removed from the workers. If 1 block is to be removed from 2 workers, 2 will be counted here. |
Master.TotalPaths |
GAUGE |
Total number of files and directories in the Alluxio namespace |
Master.TotalRpcs |
TIMER |
Throughput of master RPC calls. This metric indicates how busy the master is serving client and worker requests |
Master.UfsJournalCatchupTimer |
TIMER |
The timer statistics of journal catchup. Only valid when UFS journal is used. This provides a summary of how long a standby master takes to catch up with the primary master, and should be monitored if master transition takes too long |
Master.UfsJournalFailureRecoverTimer |
TIMER |
The timer statistics of ufs journal failure recover |
Master.UfsJournalInitialReplayTimeMs |
GAUGE |
The process time of the UFS journal initial replay. Only valid when UFS journal is used. It records the time it took for the very first journal replay. Use this metric to monitor when your master boot-up time is high. |
Master.UfsStatusCacheChildrenSize |
COUNTER |
Total number of UFS file metadata cached. The cache is used during metadata sync. |
Master.UfsStatusCacheSize |
COUNTER |
Total number of Alluxio paths being processed by the metadata sync prefetch thread pool. |
Master.UniqueBlocks |
GAUGE |
Total number of unique blocks in Alluxio |
Master.UnmountOps |
COUNTER |
Total number of Unmount operations |
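The two-condition rule described for Master.JournalCheckpointWarn in the table above can be sketched as a small function. This is an illustration only: the function and parameter names are hypothetical, and reading "the raft log index exceeds alluxio.master.journal.checkpoint.period.entries" as a count of entries since the last checkpoint is an assumption of this sketch.

```python
def journal_checkpoint_warn(entries_since_checkpoint, ms_since_last_checkpoint,
                            period_entries, warning_threshold_ms):
    """Return 1 when both checkpoint conditions are exceeded, else 0.

    period_entries stands in for alluxio.master.journal.checkpoint.period.entries
    and warning_threshold_ms for
    alluxio.master.journal.checkpoint.warning.threshold.time (in milliseconds,
    an assumption of this sketch).
    """
    too_many_entries = entries_since_checkpoint > period_entries
    checkpoint_too_old = ms_since_last_checkpoint > warning_threshold_ms
    return 1 if (too_many_entries and checkpoint_too_old) else 0
```

Because both conditions must hold, a large backlog of entries alone does not raise the warning while the last checkpoint is still recent.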
Name | Type | Description |
Worker.ActiveClients |
COUNTER |
The number of clients actively reading from or writing to this worker |
Worker.ActiveRpcReadCount |
COUNTER |
The number of active read-RPCs managed by this worker |
Worker.ActiveRpcWriteCount |
COUNTER |
The number of active write-RPCs managed by this worker |
Worker.BlockReaderCompleteTaskCount |
GAUGE |
The approximate total number of block read tasks that have completed execution |
Worker.BlockReaderThreadActiveCount |
GAUGE |
The approximate number of block read threads that are actively executing tasks in reader thread pool |
Worker.BlockReaderThreadCurrentCount |
GAUGE |
The current number of read threads in the reader thread pool |
Worker.BlockReaderThreadMaxCount |
GAUGE |
The maximum allowed number of block read threads in the reader thread pool |
Worker.BlockRemoverBlocksRemovedCount |
COUNTER |
The total number of blocks successfully removed from this worker by asynchronous block remover. |
Worker.BlockRemoverRemovingBlocksSize |
GAUGE |
The size of blocks being removed from this worker at a given moment by the asynchronous block remover. |
Worker.BlockRemoverTryRemoveBlocksSize |
GAUGE |
The number of blocks to be removed from this worker at a moment by asynchronous block remover. |
Worker.BlockRemoverTryRemoveCount |
COUNTER |
The total number of blocks this worker attempted to remove with asynchronous block remover. |
Worker.BlockWriterCompleteTaskCount |
GAUGE |
The approximate total number of block write tasks that have completed execution |
Worker.BlockWriterThreadActiveCount |
GAUGE |
The approximate number of block write threads that are actively executing tasks in writer thread pool |
Worker.BlockWriterThreadCurrentCount |
GAUGE |
The current number of write threads in the writer thread pool |
Worker.BlockWriterThreadMaxCount |
GAUGE |
The maximum allowed number of block write threads in the writer thread pool |
Worker.BlocksAccessed |
COUNTER |
Total number of times any one of the blocks in this worker is accessed. |
Worker.BlocksCached |
GAUGE |
Total number of blocks used for caching data in an Alluxio worker |
Worker.BlocksCancelled |
COUNTER |
Total number of aborted temporary blocks in this worker. |
Worker.BlocksDeleted |
COUNTER |
Total number of deleted blocks in this worker by external requests. |
Worker.BlocksEvicted |
COUNTER |
Total number of evicted blocks in this worker. |
Worker.BlocksEvictionRate |
METER |
Block eviction rate in this worker. |
Worker.BlocksLost |
COUNTER |
Total number of lost blocks in this worker. |
Worker.BlocksPromoted |
COUNTER |
Total number of times any one of the blocks in this worker moved to a new tier. |
Worker.BlocksReadLocal |
COUNTER |
Total number of local blocks read by this worker. |
Worker.BlocksReadRemote |
COUNTER |
Total number of remote blocks read by this worker. |
Worker.BlocksReadUfs |
COUNTER |
Total number of UFS blocks read by this worker. |
Worker.BytesReadDirect |
COUNTER |
Total number of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesReadDirectThroughput |
METER |
Throughput of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage without external RPC involved. This records data read by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesReadDomain |
COUNTER |
Total number of bytes read from Alluxio storage via domain socket by this worker |
Worker.BytesReadDomainThroughput |
METER |
Bytes read throughput from Alluxio storage via domain socket by this worker |
Worker.BytesReadPerUfs |
COUNTER |
Total number of bytes read from a specific Alluxio UFS by this worker |
Worker.BytesReadRemote |
COUNTER |
Total number of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage via external RPC channel. This does not include short-circuit local reads and domain socket reads. |
Worker.BytesReadRemoteThroughput |
METER |
Throughput of bytes read from Alluxio storage managed by this worker and underlying UFS if data cannot be found in the Alluxio storage via external RPC channel. This does not include short-circuit local reads and domain socket reads. |
Worker.BytesReadUfsThroughput |
METER |
Bytes read throughput from all Alluxio UFSes by this worker |
Worker.BytesWrittenDirect |
COUNTER |
Total number of bytes written to Alluxio storage managed by this worker without external RPC involved. This records data written by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesWrittenDirectThroughput |
METER |
Throughput of bytes written to Alluxio storage managed by this worker without external RPC involved. This records data written by worker internal calls (e.g. a client embedded in this worker). |
Worker.BytesWrittenDomain |
COUNTER |
Total number of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenDomainThroughput |
METER |
Throughput of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenPerUfs |
COUNTER |
Total number of bytes written to a specific Alluxio UFS by this worker |
Worker.BytesWrittenRemote |
COUNTER |
Total number of bytes written to Alluxio storage or the underlying UFS by this worker. This does not include short-circuit local writes and domain socket writes. |
Worker.BytesWrittenRemoteThroughput |
METER |
Bytes write throughput to Alluxio storage or the underlying UFS by this worker. This does not include short-circuit local writes and domain socket writes. |
Worker.BytesWrittenUfsThroughput |
METER |
Bytes write throughput to all Alluxio UFSes by this worker |
Worker.CacheBlocksSize |
COUNTER |
Total number of bytes that are being cached through cache requests |
Worker.CacheFailedBlocks |
COUNTER |
Total number of failed cache blocks in this worker |
Worker.CacheRemoteBlocks |
COUNTER |
Total number of blocks that need to be cached from remote source |
Worker.CacheRequests |
COUNTER |
Total number of cache requests received by this worker |
Worker.CacheRequestsAsync |
COUNTER |
Total number of async cache requests received by this worker |
Worker.CacheRequestsSync |
COUNTER |
Total number of sync cache requests received by this worker |
Worker.CacheSucceededBlocks |
COUNTER |
Total number of successfully cached blocks in this worker |
Worker.CacheUfsBlocks |
COUNTER |
Total number of blocks that need to be cached from local source |
Worker.CapacityFree |
GAUGE |
Total free bytes on all tiers of a specific Alluxio worker |
Worker.CapacityTotal |
GAUGE |
Total capacity (in bytes) on all tiers of a specific Alluxio worker |
Worker.CapacityUsed |
GAUGE |
Total used bytes on all tiers of a specific Alluxio worker |
Worker.EncryptedBlocksRead |
COUNTER |
Total number of local encrypted blocks read by this worker. |
Worker.RpcQueueLength |
GAUGE |
Length of the worker rpc queue. Use this metric to monitor the RPC pressure on worker. |
Name | Type | Description |
Client.BlockMasterClientCount |
COUNTER |
Number of instances in the BlockMasterClientPool. |
Client.BlockReadChunkRemote |
TIMER |
The timer statistics of reading block data in chunks from remote Alluxio workers via RPC framework. This metric is only recorded when alluxio.user.block.read.metrics.enabled is set to true |
Client.BlockWorkerClientCount |
COUNTER |
Number of instances in the BlockWorkerClientPool. |
Client.BytesReadLocal |
COUNTER |
Total number of bytes short-circuit read from local storage by this client |
Client.BytesReadLocalThroughput |
METER |
Bytes throughput short-circuit read from local storage by this client |
Client.BytesWrittenLocal |
COUNTER |
Total number of bytes short-circuit written to local storage by this client |
Client.BytesWrittenLocalThroughput |
METER |
Bytes throughput short-circuit written to local storage by this client |
Client.BytesWrittenUfs |
COUNTER |
Total number of bytes written to Alluxio UFS by this client |
Client.CacheBytesEvicted |
METER |
Total number of bytes evicted from the client cache. |
Client.CacheBytesReadCache |
METER |
Total number of bytes read from the client cache. |
Client.CacheBytesReadExternal |
METER |
Total number of bytes read from external storage due to a cache miss on the client cache. |
Client.CacheBytesRequestedExternal |
METER |
Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads. |
Client.CacheBytesWrittenCache |
METER |
Total number of bytes written to the client cache. |
Client.CacheCleanErrors |
COUNTER |
Number of failures when cleaning out the existing cache directory to initialize a new cache. |
Client.CacheCleanupGetErrors |
COUNTER |
Number of failures when cleaning up a failed cache read. |
Client.CacheCleanupPutErrors |
COUNTER |
Number of failures when cleaning up a failed cache write. |
Client.CacheCreateErrors |
COUNTER |
Number of failures when creating a cache in the client cache. |
Client.CacheDeleteErrors |
COUNTER |
Number of failures when deleting cached data in the client cache. |
Client.CacheDeleteFromStoreErrors |
COUNTER |
Number of failures when deleting pages from page stores. |
Client.CacheDeleteNonExistingPageErrors |
COUNTER |
Number of failures when deleting pages due to absence. |
Client.CacheDeleteNotReadyErrors |
COUNTER |
Number of failures when cache is not ready to delete pages. |
Client.CacheGetErrors |
COUNTER |
Number of failures when getting cached data in the client cache. |
Client.CacheGetNotReadyErrors |
COUNTER |
Number of failures when cache is not ready to get pages. |
Client.CacheGetStoreReadErrors |
COUNTER |
Number of failures when getting cached data in the client cache due to failed read from page stores. |
Client.CacheHitRate |
GAUGE |
Cache hit rate: (# bytes read from cache) / (# bytes requested). |
Client.CachePageReadCacheTimeNanos |
METER |
Time in nanoseconds taken to read a page from the client cache when the cache hits. |
Client.CachePageReadExternalTimeNanos |
METER |
Time in nanoseconds taken to read a page from external source when the cache misses. |
Client.CachePages |
COUNTER |
Total number of pages in the client cache. |
Client.CachePagesEvicted |
METER |
Total number of pages evicted from the client cache. |
Client.CachePutAsyncRejectionErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed injection to async write queue. |
Client.CachePutBenignRacingErrors |
COUNTER |
Number of failures when adding pages due to racing eviction. This error is benign. |
Client.CachePutErrors |
COUNTER |
Number of failures when putting cached data in the client cache. |
Client.CachePutEvictionErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed eviction. |
Client.CachePutInsufficientSpaceErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to insufficient space made after eviction. |
Client.CachePutNotReadyErrors |
COUNTER |
Number of failures when cache is not ready to add pages. |
Client.CachePutStoreDeleteErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed deletes in page store. |
Client.CachePutStoreWriteErrors |
COUNTER |
Number of failures when putting cached data in the client cache due to failed writes to page store. |
Client.CachePutStoreWriteNoSpaceErrors |
COUNTER |
Number of failures when putting cached data in the client cache because the disk is full although the cache capacity has not been reached. This can happen if the storage overhead ratio to write data is underestimated. |
Client.CacheShadowCacheBytes |
COUNTER |
Amount of bytes in the client shadow cache. |
Client.CacheShadowCacheBytesHit |
COUNTER |
Total number of bytes that hit the client shadow cache. |
Client.CacheShadowCacheBytesRead |
COUNTER |
Total number of bytes read from the client shadow cache. |
Client.CacheShadowCacheFalsePositiveRatio |
COUNTER |
Probability that the working set bloom filter makes an error. The value is 0-100. If it is too high, more space needs to be allocated |
Client.CacheShadowCachePages |
COUNTER |
Amount of pages in the client shadow cache. |
Client.CacheShadowCachePagesHit |
COUNTER |
Total number of pages that hit the client shadow cache. |
Client.CacheShadowCachePagesRead |
COUNTER |
Total number of pages read from the client shadow cache. |
Client.CacheSpaceAvailable |
GAUGE |
Amount of bytes available in the client cache. |
Client.CacheSpaceUsed |
GAUGE |
Amount of bytes used by the client cache. |
Client.CacheSpaceUsedCount |
COUNTER |
Amount of bytes used by the client cache as a counter. |
Client.CacheState |
COUNTER |
State of the cache: 0 (NOT_IN_USE), 1 (READ_ONLY) and 2 (READ_WRITE) |
Client.CacheStoreDeleteTimeout |
COUNTER |
Number of timeouts when deleting pages from page store. |
Client.CacheStoreGetTimeout |
COUNTER |
Number of timeouts when reading pages from page store. |
Client.CacheStorePutTimeout |
COUNTER |
Number of timeouts when writing new pages to page store. |
Client.CacheStoreThreadsRejected |
COUNTER |
Number of rejections of I/O threads when submitting tasks to the thread pool, likely due to an unresponsive local file system. |
Client.DefaultHiveClientCount |
COUNTER |
Number of instances in the DefaultHiveClientPool. |
Client.FileSystemMasterClientCount |
COUNTER |
Number of instances in the FileSystemMasterClientPool. |
Client.MetadataCacheSize |
GAUGE |
The total number of files and directories whose metadata is cached on the client-side. Only valid if the filesystem is alluxio.client.file.MetadataCachingBaseFileSystem. |
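The hit-rate formula given for Client.CacheHitRate in the table above, (# bytes read from cache) / (# bytes requested), can be sketched as below. Treating the requested bytes as the sum of Client.CacheBytesReadCache and Client.CacheBytesRequestedExternal is an assumption of this sketch, as is returning 0 when nothing has been requested; the function name is hypothetical.

```python
def cache_hit_rate(bytes_read_cache, bytes_requested_external):
    """Hit rate = bytes served from cache / total bytes requested.

    Assumes total requested bytes = cache hits + bytes requested that
    missed the cache. Returns 0.0 when no bytes have been requested.
    """
    total_requested = bytes_read_cache + bytes_requested_external
    if total_requested == 0:
        return 0.0
    return bytes_read_cache / total_requested


# 750 bytes served from the cache, 250 requested bytes missed it.
rate = cache_hit_rate(750, 250)
```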
Fuse is a long-running Alluxio client.
Depending on the launching ways, Fuse metrics show as
The Fuse reading/writing file counts can be used as indicators of Fuse application pressure.
If a large number of concurrent reads/writes occur in a short period of time, each read/write operation may take longer to finish.
When a user or an application runs a filesystem command under the Fuse mount point,
the operating system processes and translates the command, which triggers the related Fuse operations
exposed in AlluxioFuse.
The number of times each operation is called and the duration of each call are recorded
dynamically under the metric name Fuse.<FUSE_OPERATION_NAME>.
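The idea of recording a call count and a duration per operation under a dynamically built name can be sketched as follows. This is a generic illustration, not the AlluxioFuse implementation: the OperationMetrics class, its timed helper, and the operation names Read and Getattr are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class OperationMetrics:
    """Tracks call counts and total durations keyed by '<prefix>.<operation>'."""

    def __init__(self, prefix="Fuse"):
        self._prefix = prefix
        self.counts = defaultdict(int)
        self.total_seconds = defaultdict(float)

    @contextmanager
    def timed(self, operation_name):
        # The metric name is built at call time, so new operations need
        # no registration up front.
        name = f"{self._prefix}.{operation_name}"
        start = time.perf_counter()
        try:
            yield
        finally:
            self.counts[name] += 1
            self.total_seconds[name] += time.perf_counter() - start


metrics = OperationMetrics()
with metrics.timed("Read"):      # hypothetical operation name
    pass
with metrics.timed("Read"):
    pass
with metrics.timed("Getattr"):   # hypothetical operation name
    pass
```

After these calls, metrics.counts holds one entry per operation name that was actually invoked, which mirrors how metrics appear only for operations that have been triggered.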
The following metrics are collected on each instance (Master, Worker or Client).
Alluxio provides overall and detailed memory usage information.
Detailed memory usage information of code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space
is collected in each process.