Data Orchestration Hub


Data Orchestration Hub, or the Hub, is a management console that makes it easy to manage an analytics cluster and connect it with multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, along with wizard-based deployment workflows.

  • Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, single cloud or on-premises using guided wizards.
  • Monitor Your Alluxio Cluster: Track the state of the processes on your Alluxio cluster from a dashboard.
  • Manage Configuration: Set and distribute configuration for a cluster.

When to Use

Data Orchestration Hub is co-deployed on an analytics cluster running Alluxio, and is bundled with Alluxio Enterprise Edition. The Hub connects to a single co-located Alluxio cluster, and manages that instance of Alluxio only. Further instructions can be found in the deployment section below.

Once connected to an Alluxio cluster, the Hub can be used to modify the state of the Alluxio cluster, such as updating configuration and restarting processes. The following scenarios illustrate usage of the Hub web interface.

Scenario A: Managing an Alluxio cluster

The Hub provides a dashboard for monitoring the state of processes on the cluster, and can also update configuration and restart processes.

[Figure: Hub cluster page]

Monitor the status of an Alluxio cluster from the web console, and start or stop cluster components from the UI.

Scenario B: Connecting to data sources across regions

Alluxio connects a compute cluster with data sources across private data centers and public clouds, potentially over a wide area network. The Hub uses self-guided wizards to help users connect to data sources and catalogs in the same or remote data centers, walking them through the required configuration steps and validating the connection.

These wizards apply to multiple scenarios, including hybrid cloud, cross-data center, single cloud, and private data center deployments.

[Figure: Hub data storage page]

Connect Alluxio to all your data sources across multiple clouds, single cloud or on-premises using self-guided wizards.

Further usage scenarios and descriptions of the available tools are covered in the sections below.

Deployment

The Hub consists of the following components deployed on your Alluxio cluster.

  • Hub Manager: The Hub Manager is the user's entry point and the web server for the console. By default, this process runs on the same node as an Alluxio master and provides the REST endpoints that serve UI requests. When using multiple Alluxio masters, any node can be chosen to deploy the Hub Manager.
  • Hub Agent: Hub Agents are deployed on both Alluxio masters and Alluxio workers. These agent processes serve requests from the Hub Manager to make changes to the cluster without SSH access.

The following diagram illustrates the Hub architecture:

[Figure: Hub architecture diagram]

Hub Agents must be present on all managed nodes, whereas the Hub Manager runs as a single instance.
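The placement rule above can be expressed as a small planning function. This is a minimal sketch, not part of the Hub itself: the `plan_hub_deployment` function, the `HubPlan` type, and the host names are hypothetical, and placing the Hub Manager on the first listed master is an assumption that mirrors the default described above.

```python
from dataclasses import dataclass


@dataclass
class HubPlan:
    manager_host: str        # the single Hub Manager instance
    agent_hosts: list[str]   # one Hub Agent per managed node


def plan_hub_deployment(masters: list[str], workers: list[str]) -> HubPlan:
    """Hypothetical helper: decide where Hub processes would run."""
    if not masters:
        raise ValueError("at least one Alluxio master is required")
    # By default the Hub Manager is co-located with an Alluxio master.
    # With several masters any node could be chosen; this sketch takes the first.
    manager = masters[0]
    # Every managed node, master or worker, needs a Hub Agent.
    # dict.fromkeys removes duplicate host names while preserving order.
    agents = list(dict.fromkeys(masters + workers))
    return HubPlan(manager_host=manager, agent_hosts=agents)


plan = plan_hub_deployment(["master-1", "master-2"], ["worker-1", "worker-2"])
print(plan.manager_host)  # master-1
print(plan.agent_hosts)   # ['master-1', 'master-2', 'worker-1', 'worker-2']
```

The asymmetry is the point of the sketch: one manager host, but an agent on every node the Hub is expected to manage.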

The steps for deploying Data Orchestration Hub depend on your compute environment.


Configuration

For a list of properties applicable to the Hub, search the Alluxio configuration property reference for properties prefixed with alluxio.hub. Note that alluxio.hub.manager.web.login.username and alluxio.hub.manager.web.login.password define the credentials required to sign in to the console.

Set these properties in alluxio-site.properties before starting the Hub processes. How to set them depends on the compute environment chosen in the deployment section above.
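As an illustration, the two sign-in properties named above would appear as key=value lines in alluxio-site.properties. The sketch below only renders those lines; the `render_hub_properties` helper and the example values are hypothetical placeholders, and in practice the file is usually edited directly or through your environment's deployment tooling.

```python
def render_hub_properties(username: str, password: str) -> str:
    """Hypothetical helper: render Hub sign-in properties as key=value lines."""
    props = {
        "alluxio.hub.manager.web.login.username": username,
        "alluxio.hub.manager.web.login.password": password,
    }
    # alluxio-site.properties uses one key=value pair per line.
    # Note: this sketch does not escape characters such as '=', ':' or '\'
    # that are special in Java-style properties files.
    return "".join(f"{key}={value}\n" for key, value in props.items())


snippet = render_hub_properties("admin", "change-me")
print(snippet, end="")
# alluxio.hub.manager.web.login.username=admin
# alluxio.hub.manager.web.login.password=change-me
```

The rendered lines would then be appended to alluxio-site.properties on the relevant nodes before the Hub processes start.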

What's Next

Once the Hub is deployed, open the web console on port 30077 (the default) of the node running the Hub Manager, and sign in using the configured username and password.


Sign in using the admin credentials. Unless changed, the defaults are username 'alluxio' and password 'alluxio'.
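If the console does not load, a quick first check is whether anything is listening on the Hub Manager's port. This is a hedged sketch: it assumes the default port 30077 and a plain-HTTP console URL, and the helper names are hypothetical. It only tests TCP reachability and does not sign in.

```python
import socket

DEFAULT_HUB_PORT = 30077  # default Hub Manager web console port


def hub_console_url(host: str, port: int = DEFAULT_HUB_PORT) -> str:
    # Assumption: the console is served over plain HTTP; adjust if TLS is used.
    return f"http://{host}:{port}"


def hub_port_open(host: str, port: int = DEFAULT_HUB_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

For example, hub_console_url("node-1") yields "http://node-1:30077", and hub_port_open("node-1") returns False if the Hub Manager is not running there or a firewall blocks the port.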

In the console you have access to the following: