Contributor Tools

IDE

We recommend using either Eclipse or IntelliJ IDEA to contribute to Alluxio. Instructions for setting up both IDEs can be found below.

Eclipse

You can generate an Eclipse configuration file by running:

$ mvn clean -Pdeveloper install -DskipTests
$ mvn clean -Pdeveloper -DskipTests eclipse:eclipse -DdownloadJavadocs=true -DdownloadSources=true

Then import the folder into Eclipse. You may also have to add the classpath variable M2_REPO by running:

$ mvn -Declipse.workspace="your Eclipse Workspace" eclipse:configure-workspace

Note: Alluxio 2.2 moved generated gRPC proto source files into alluxio/core/transport/target/generated-sources/protobuf/. You will need to mark the directory as a source folder for Eclipse to resolve the source files.

IntelliJ IDEA

To use IntelliJ IDEA to contribute to Alluxio, simply open IntelliJ and select “Import existing project”. Then select the “Maven” project type from the IntelliJ dialog. IntelliJ’s default configuration works without any modifications.

Enable the developer Maven profile

After successfully importing your local Alluxio repo into IntelliJ, you may need to add the Maven profile ‘developer’ in order to avoid import errors.

You can do this by going to

View > Tool Windows > Maven

In the Maven panel, find the “developer” profile and check the box next to it in the “Profiles” list.

Generated sources

Some source files in Alluxio are generated from templates or compiled from other languages.

  1. gRPC and ProtoBuf definitions are compiled into Java source files. Alluxio 2.2 moved generated gRPC proto source files into core/transport/target/generated-sources/protobuf/.
  2. Compile-time project constants are defined in core/common/src/main/java-templates/ and compiled to core/common/target/generated-sources/java-templates/.

You will need to mark these directories as “Generated Sources Root” for IntelliJ to resolve the source files. Alternatively, you can let IntelliJ generate them and mark the directories automatically by running “Generate Sources and Update Folders for All Projects”. You can find the button to trigger the generation at the top of the Maven panel, or you can search for this action from the Navigate > Search Everywhere dialog.

See also Modifying a gRPC definition and Modifying a Protocol Buffer Message.
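
Before marking the directories in the IDE, it can help to confirm that a build has actually produced them. The following is a minimal sketch, not part of the Alluxio repository: the function name check_generated_sources and its output format are illustrative, and only the two relative paths come from the list above.

```shell
# Report whether the generated-source directories exist under a checkout.
# check_generated_sources is a hypothetical helper; the two relative paths
# are the ones named in the "Generated sources" list.
check_generated_sources() {
  local repo_root="$1"
  local missing=0
  local dir
  for dir in \
    core/transport/target/generated-sources/protobuf \
    core/common/target/generated-sources/java-templates; do
    if [ -d "${repo_root}/${dir}" ]; then
      echo "found   ${dir}"
    else
      echo "missing ${dir}"
      missing=1
    fi
  done
  # Non-zero exit status when at least one directory is absent.
  return "${missing}"
}
```

For example, check_generated_sources "${ALLUXIO_HOME}" could be run after a Maven build; a "missing" line suggests the sources have not been generated yet.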

Run Alluxio processes within IntelliJ IDEA

Start a single master Alluxio cluster
  1. Run dev/intellij/install-runconfig.sh
  2. Restart IntelliJ IDEA
  3. Edit conf/alluxio-site.properties to contain these configurations
    alluxio.home=/path/to/alluxio
    alluxio.master.hostname=localhost
    alluxio.job.master.hostname=localhost
    alluxio.master.journal.type=UFS
    
  4. Edit conf/log4j.properties to print logs to the console. Replace the log4j.rootLogger configuration with
     log4j.rootLogger=INFO, ${alluxio.logger.type}, ${alluxio.remote.logger.type}, stdout
    

    and add the following configurations

     log4j.threshold=ALL
     log4j.appender.stdout=org.apache.log4j.ConsoleAppender
     log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
     log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
  5. Format the Alluxio master by running bin/alluxio formatMasters
  6. In IntelliJ, start the Alluxio master process by selecting Run > Run > AlluxioMaster
  7. In IntelliJ, start the Alluxio job master process by selecting Run > Run > AlluxioJobMaster
  8. Prepare the RamFS and format the Alluxio worker with bin/alluxio-mount.sh SudoMount && bin/alluxio formatWorker
  9. In IntelliJ, start the Alluxio worker process by selecting Run > Run > AlluxioWorker
  10. In IntelliJ, start the Alluxio job worker process by selecting Run > Run > AlluxioJobWorker
  11. Verify that the Alluxio cluster is up by following the steps in Running Alluxio Locally
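
The properties from step 3 can also be written by a small script instead of by hand. This is a sketch under assumptions: write_single_master_conf is a hypothetical helper, it overwrites the target file rather than merging with existing settings, and the Alluxio home path is supplied by the caller.

```shell
# Write the single-master settings from step 3 to a properties file.
# write_single_master_conf is a hypothetical helper; it OVERWRITES the target.
write_single_master_conf() {
  local alluxio_home="$1"
  local conf_file="$2"
  cat > "${conf_file}" <<EOF
alluxio.home=${alluxio_home}
alluxio.master.hostname=localhost
alluxio.job.master.hostname=localhost
alluxio.master.journal.type=UFS
EOF
}
```

A possible invocation from the repository root is write_single_master_conf "$(pwd)" conf/alluxio-site.properties. Back up any existing file first, since the sketch does not preserve other settings.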
Start a High Availability (HA) Alluxio cluster
  1. Create journal directories for the masters
     mkdir -p /tmp/alluxio-tmp/alluxio-0/journal
     mkdir -p /tmp/alluxio-tmp/alluxio-1/journal
     mkdir -p /tmp/alluxio-tmp/alluxio-2/journal
    

    These directories are defined in the run configurations, e.g. alluxio/dev/intellij/runConfigurations/AlluxioMaster_0.xml.

    Note: If the journal folders already exist and you want to start a fresh HA cluster, clear the files in the journal folders first.

  2. Run dev/intellij/install-runconfig.sh
  3. Restart IntelliJ IDEA
  4. Edit conf/alluxio-site.properties to contain these configurations
     alluxio.home=/path/to/alluxio
     alluxio.master.hostname=localhost
     alluxio.job.master.hostname=localhost
     alluxio.master.embedded.journal.addresses=localhost:19200,localhost:19201,localhost:19202
     alluxio.master.rpc.addresses=localhost:19998,localhost:19988,localhost:19978
    

    The ports are defined in the run configurations.

  5. In IntelliJ, start the Alluxio master processes by selecting Run > Run > AlluxioMaster-0, AlluxioMaster-1, and AlluxioMaster-2
  6. Prepare the RamFS and format the Alluxio worker with bin/alluxio-mount.sh SudoMount && bin/alluxio formatWorker
  7. In IntelliJ, start the Alluxio worker process by selecting Run > Run > AlluxioWorker
  8. In IntelliJ, start the Alluxio job master process by selecting Run > Run > AlluxioJobMaster
  9. In IntelliJ, start the Alluxio job worker process by selecting Run > Run > AlluxioJobWorker
  10. Verify that the HA Alluxio cluster is up by running bin/alluxio fsadmin journal quorum info -domain MASTER. You should see output like this:
    Journal domain	: MASTER
    Quorum size	: 3
    Quorum leader	: localhost:19201
        
    STATE       | PRIORITY | SERVER ADDRESS
    AVAILABLE   | 0        | localhost:19200
    AVAILABLE   | 0        | localhost:19201
    AVAILABLE   | 0        | localhost:19202
    

    Optionally, you can also run the job master in HA mode on top of this cluster by continuing with the steps below.

  11. Stop the Alluxio job master and job worker processes from steps 8 and 9 if they are running.
  12. Edit conf/alluxio-site.properties and add these configurations
    alluxio.job.master.rpc.addresses=localhost:20001,localhost:20011,localhost:20021
    alluxio.job.master.embedded.journal.addresses=localhost:20003,localhost:20013,localhost:20023
    
  13. In IntelliJ, start the Alluxio job master processes by selecting Run > Run > AlluxioJobMaster-0, AlluxioJobMaster-1, and AlluxioJobMaster-2
  14. In IntelliJ, start the Alluxio job worker process by selecting Run > Run > AlluxioJobWorker
  15. Verify that the HA JobMaster cluster is up by running bin/alluxio job leader. You should see output like this:
    localhost
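
The quorum check in step 10 can be scripted by counting the rows whose state is AVAILABLE. The sketch below assumes the pipe-separated table layout shown in the sample output above; count_available_masters is a hypothetical helper name, and the parsing may need adjusting if the output format differs in your version.

```shell
# Count quorum members reported as AVAILABLE.
# Reads the output of `bin/alluxio fsadmin journal quorum info -domain MASTER`
# on stdin and assumes the "STATE | PRIORITY | SERVER ADDRESS" table layout.
# Rows with any other state, and the header row, are not counted.
count_available_masters() {
  awk -F'|' '
    NF == 3 {
      state = $1
      gsub(/[ \t]/, "", state)   # strip the column padding
      if (state == "AVAILABLE") n++
    }
    END { print n + 0 }
  '
}
```

A possible use is bin/alluxio fsadmin journal quorum info -domain MASTER | count_available_masters, which for the three-master setup above would be expected to print 3 once all masters have joined.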
    
Start an AlluxioFuse process
  1. Start a single master Alluxio cluster or a High Availability cluster in IntelliJ.
  2. In IntelliJ, start the AlluxioFuse process by selecting Run > Run > AlluxioFuse. This creates a FUSE mount point at /tmp/alluxio-fuse.
  3. Verify the FUSE filesystem is working by running these commands:
     $ touch /tmp/alluxio-fuse/tmp1
     $ ls /tmp/alluxio-fuse
     $ bin/alluxio fs ls /
    

    You should be able to see that the file is created and listed by both ls commands.
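
The local half of this check (create a file, then confirm a plain ls lists it) can be wrapped in a small function. This is only a sketch: check_fuse_mount and the marker file name are hypothetical, and it does not cover the bin/alluxio fs ls / side, which still needs to be checked separately.

```shell
# Create a marker file under a mount point and confirm that `ls` lists it.
# check_fuse_mount is a hypothetical helper; it removes the marker afterwards.
check_fuse_mount() {
  local mount_point="$1"
  local marker="alluxio-fuse-check-$$"
  touch "${mount_point}/${marker}" || return 1
  if ls "${mount_point}" | grep -qx "${marker}"; then
    rm -f "${mount_point}/${marker}"
    return 0
  fi
  return 1
}
```

With the mount point from step 2, this could be called as check_fuse_mount /tmp/alluxio-fuse && echo "mount is writable".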

Maven Targets and Plugins

Before pushing changes or submitting pull requests, we recommend running various maven targets on your local machine to make sure your changes do not break existing behavior.

For these maven commands we’ll assume that your command terminal is located in the root directory of your local copy of the Alluxio repository.

$ cd ${ALLUXIO_HOME}

Checkstyle

To make sure your code follows our style conventions, you may run the command below. Note that checkstyle also runs any time you run targets such as compile, install, or test.

$ mvn checkstyle:checkstyle

FindBugs

Before submitting a pull request, run the latest code against the findbugs Maven plugin to verify that no new warnings are introduced.

$ mvn compile findbugs:findbugs

Compilation

To simply compile the code you can run the following command:

$ mvn clean compile -DskipTests

This will not execute any unit tests but will execute maven plugins such as checkstyle and findbugs.

To speed up compilation you may use the following command:

$ mvn -T 2C compile -DskipTests -Dmaven.javadoc.skip -Dfindbugs.skip -Dcheckstyle.skip \
  -Dlicense.skip -pl '!webui'

This command will skip many of our checks that are in place to help keep our code neat. We recommend running all checks before committing.

  • -T 2C runs maven with up to 2 threads per CPU core
  • -DskipTests skips running unit and integration tests
  • -Dmaven.javadoc.skip skips javadoc generation
  • -Dfindbugs.skip skips findbugs execution
  • -Dcheckstyle.skip skips code-style checking
  • -Dlicense.skip skips checking files for license headers
  • -pl '!webui' skips building the Alluxio UI module. If this module isn’t compiled then the UI cannot be accessed locally.

You may replace the compile target in the above command with any other valid maven target to skip checks as well. The targets compile, verify, and install are typically the most useful.
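
Swapping the target in the fast-build command can be sketched as a small function. The name fast_mvn_cmd is hypothetical; the flags are exactly those listed above. The function only prints the command, so it can be inspected before it is run.

```shell
# Print the fast-build Maven command for a given target (default: compile).
# fast_mvn_cmd is a hypothetical helper; the flags are the ones listed above.
fast_mvn_cmd() {
  local target="${1:-compile}"
  echo "mvn -T 2C ${target} -DskipTests -Dmaven.javadoc.skip -Dfindbugs.skip -Dcheckstyle.skip -Dlicense.skip -pl '!webui'"
}
```

For example, eval "$(fast_mvn_cmd install)" would run the install target with the same checks skipped.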

Creating a Local Install

If you want to test your changes with a compiled version of the repository, you may generate the jars with the Maven install target. The first time Maven executes it will likely need to download many dependencies. Please be patient as the first build may take a while.

$ mvn -T 2C install -DskipTests

After the install target executes, you can follow the instructions at Running Alluxio Locally to start a local cluster.

Unit Tests

Run all unit and integration tests

$ cd ${ALLUXIO_HOME}
$ mvn test

This will use the local filesystem as the under storage.

Run a single unit test

$ mvn -Dtest=<AlluxioTestClass>#<testMethod> -DfailIfNoTests=false test
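
Filling in the placeholders by hand is error-prone, so a small wrapper can build the command. This is a sketch: single_test_cmd is a hypothetical helper, and the optional third argument, which appends -pl for a module, is an added convenience rather than something this page prescribes.

```shell
# Print the Maven command for running one test method.
# single_test_cmd is a hypothetical helper.
#   $1: test class   $2: test method   $3: optional module path for -pl
single_test_cmd() {
  local test_class="$1"
  local test_method="$2"
  local module="${3:-}"
  local cmd="mvn -Dtest=${test_class}#${test_method} -DfailIfNoTests=false test"
  if [ -n "${module}" ]; then
    cmd="${cmd} -pl ${module}"
  fi
  echo "${cmd}"
}
```

With a hypothetical class ExampleTest and method testSomething, single_test_cmd ExampleTest testSomething prints the command from above with both placeholders filled in.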

Run unit tests for a specific module

You can execute the Maven test command targeting the desired submodule directory. For example, to run tests for the HDFS UFS module, run:

$ mvn test -pl underfs/hdfs

Run unit tests for HDFS UFS module with a different Hadoop version

# build and run test on HDFS under storage module for Hadoop 2.7.0
$ mvn test -pl underfs/hdfs -Phadoop-2 -Dhadoop.version=2.7.0

# build and run test on HDFS under storage module for Hadoop 3.0.0
$ mvn test -pl underfs/hdfs -Phadoop-3 -Dhadoop.version=3.0.0

The above unit tests will create a simulated HDFS service with a specific version. To run more comprehensive tests on HDFS under storage using a real and running HDFS deployment:

$ mvn test -pl underfs/hdfs -PufsContractTest -DtestHdfsBaseDir=hdfs://ip:port/alluxio_test

Redirect logs to STDOUT

To have the logs output to STDOUT, append the following arguments to the mvn command

-Dtest.output.redirect=false -Dalluxio.root.logger=DEBUG,CONSOLE
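
To avoid retyping these arguments, they can be appended by a wrapper function. The name with_stdout_logs is hypothetical; the two arguments are the ones given above.

```shell
# Run the given command with the STDOUT logging arguments appended.
# with_stdout_logs is a hypothetical helper.
with_stdout_logs() {
  "$@" -Dtest.output.redirect=false -Dalluxio.root.logger=DEBUG,CONSOLE
}
```

For example, with_stdout_logs mvn test -pl underfs/hdfs runs the HDFS UFS module tests with logs sent to STDOUT.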

Test FUSE

The FUSE tests are ignored if the libfuse library is missing. To run those tests, please install the libraries referenced in the Alluxio FUSE documentation.

Modifying a gRPC definition

Alluxio uses gRPC 1.37.0 for RPC communication between clients and servers. The .proto files defined in core/transport/src/grpc/ are used to auto-generate Java code for calling the RPCs on clients and implementing the RPCs on servers. To regenerate Java code after changing a gRPC definition, you must rebuild the alluxio-core-transport module with the 'generate' Maven profile.

$ mvn clean install -Pgenerate -pl "org.alluxio:alluxio-core-transport"

Modifying a Protocol Buffer Message

Alluxio uses Protocol Buffers 3.12 to read and write journal entries. The .proto files defined in core/transport/src/proto/ are used to auto-generate Java definitions for the protocol buffer messages.

To change one of these messages, first read about updating a message type to make sure your change will not break backwards compatibility. To regenerate Java code after changing a definition, you must rebuild the alluxio-core-transport module with the 'generate' Maven profile.

$ mvn clean install -Pgenerate -pl "org.alluxio:alluxio-core-transport"

Usage of ./bin/alluxio

Please refer to Alluxio commands for all available commands.

Some commands have different prerequisites.

All commands except bootstrapConf, killAll, copyDir and clearCache require that you have already built Alluxio (see Build Alluxio Master Branch for how to build Alluxio manually).

Some commands require the Alluxio cluster to be running, and others do not. Please check Alluxio commands, where each command specifies whether it requires the Alluxio cluster to be running.