
Apache Ranger and AWS EMR Automated Installation 4

Laurence Geng
Published: December 14, 2022

In the previous two articles, we introduced the EMR-native Ranger integration solutions with OpenLDAP and Windows AD. In this article, we turn to open-source Ranger integration, specifically the “OpenLDAP + Open-Source Ranger” solution.

1. OpenLDAP + Open-Source Ranger Solution Overview

1.1 Solution Architecture

[Figure: Solution architecture]

In this solution, OpenLDAP acts as the authentication provider, and all user account data is stored on it. Ranger acts as the authorization controller: it syncs account data from OpenLDAP so that privileges can be granted to OpenLDAP user accounts. Meanwhile, a series of Ranger plugins must be installed on the EMR cluster. These plugins pull policies from the Ranger server and check whether the current user has permission to perform an action. The EMR cluster also syncs account data from OpenLDAP via SSSD, so users can log in to the cluster nodes and submit jobs. End users can log in to EMR cluster nodes via SSH with their OpenLDAP accounts and, if Hue is available, log in to Hue with the same accounts.

1.2 Ranger in Detail

Let’s take a closer look at Ranger. Its architecture is as follows:

[Figure: Ranger in detail]

The installer performs the following jobs:

  1. Install MySQL as a Policy DB for Ranger.
  2. Install Solr as an Audit Store for Ranger.
  3. Install Ranger Admin.
  4. Install Ranger UserSync.
  5. Install the HDFS Ranger Plugin.
  6. Install the Hive Ranger Plugin.

2. Installation and Integration

Generally, the installation and integration process can be divided into three stages: 

1. Prerequisites

2. All-In-One Install

3. Create EMR Cluster

The following diagram illustrates the process in detail:

[Figure: Installation process in detail]

At stage 1, we do some preparatory work. At stage 2, we install and integrate the components. There are two options at this stage: an all-in-one installation driven by a command-line workflow, or a step-by-step installation. In most cases, the all-in-one installation is the best choice; however, the workflow may be interrupted by unforeseen errors. If you want to continue from the last failed step, use the step-by-step installation. It is also the better choice if you want to retry a step with different argument values to find the right ones. At stage 3, we create an EMR cluster; if you already have one, skip this stage. In most cases, Ranger needs to be installed on an existing cluster rather than a new one. EMR-native Ranger cannot be installed on an existing cluster (because EMR-native Ranger plugins can only be installed when a cluster is created), but open-source Ranger does not have this limitation. You are free to install it on an existing or a new EMR cluster.

There is some overlap in the execution sequence between stages 2 and 3. At step 2.4, the installation pauses, and the installer prompts users to create their own cluster and keeps monitoring the target cluster’s status. Once the cluster is ready, the installation resumes and performs the remaining actions.

As a design principle, the installer does not include any action to create an EMR cluster. You should always create the cluster yourself, because an EMR cluster may have unpredictable settings, e.g., application-specific (HDFS, YARN, etc.) configurations, step scripts, bootstrap scripts, and so on; coupling Ranger’s installation with the EMR cluster’s creation is not advisable.

Notes

  1. The installer treats the local host as the Ranger server and installs all Ranger components on it. For non-Ranger operations, e.g., installing OpenLDAP, it initiates remote operations via SSH. So, you can stay on the Ranger server to execute all command lines; there is no need to switch among multiple hosts.
  2. Although it is not required, we suggest always using the FQDN as the host address. Neither IP addresses nor hostnames without domain names are recommended.

2.1 Prerequisites

2.1.1 Create EC2 Instances as Ranger and OpenLDAP Server

First, we need to prepare two EC2 instances: one for the Ranger server and the other for the OpenLDAP server. When creating the instances, select the Amazon Linux 2 image and make sure the instances and the EMR cluster to be created can reach each other over the network.

As a best practice, it is recommended to add the Ranger server to the ElasticMapReduce-master security group, because Ranger works closely with the EMR cluster and can be regarded as a master service that is not built into EMR. For OpenLDAP, we must make sure its port 389 is reachable from the Ranger server and from all nodes of the EMR cluster to be created. For simplicity, you can also add the OpenLDAP server to the ElasticMapReduce-master security group.
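Once the security groups are in place, it can help to confirm from the Ranger server that OpenLDAP's port 389 is actually reachable. The following is a minimal sketch that relies only on Bash's built-in /dev/tcp redirection and the coreutils timeout command; the function name port_reachable is our own and is not part of the installer.

```bash
# Hypothetical helper (not part of the installer): succeeds if a TCP
# connection to HOST:PORT can be opened within TIMEOUT_SECS seconds.
port_reachable() {
    local host="$1" port="$2" timeout_secs="${3:-3}"
    # Bash opens a TCP socket when a file descriptor is redirected to
    # /dev/tcp/HOST/PORT; timeout aborts the attempt if packets are silently
    # dropped (for example, by a security group that does not allow the port).
    timeout "$timeout_secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage on the Ranger server (replace the placeholder with your OpenLDAP FQDN):
# port_reachable <openldap-fqdn> 389 && echo "port 389 reachable" || echo "port 389 blocked"
```

A silently dropped connection (the typical symptom of a missing security group rule) shows up as a timeout rather than an immediate "connection refused", which is why the check is bounded by timeout.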

2.1.2 Download Installer

After the EC2 instances are ready, log in to the Ranger server via SSH and run the following commands to download the installer package:

sudo yum -y install git
git clone https://github.com/bluishglc/ranger-emr-cli-installer.git

2.1.3 Upload SSH Key File

As mentioned before, the installer runs on the local host (the Ranger server). To perform remote installation actions on the OpenLDAP server or the EMR cluster, an SSH private key is required. Upload it to the Ranger server and make a note of the file path; it will be the value of the SSH_KEY variable.

2.1.4 Export Environment-Specific Variables

During installation, the following environment-specific arguments are passed to multiple commands. It is recommended to export them first, so that all subsequent command lines can refer to these variables instead of literal values.

export REGION='TO_BE_REPLACED'
export ACCESS_KEY_ID='TO_BE_REPLACED'
export SECRET_ACCESS_KEY='TO_BE_REPLACED'
export SSH_KEY='TO_BE_REPLACED'
export OPENLDAP_HOST='TO_BE_REPLACED'

The variables above are described as follows:

  • REGION: the AWS Region, e.g., cn-north-1, us-east-1, and so on.
  • ACCESS_KEY_ID: the AWS access key ID of your IAM account. Make sure your account has sufficient privileges; admin permissions are preferable.
  • SECRET_ACCESS_KEY: the AWS secret access key of your IAM account.
  • SSH_KEY: the SSH private key file path on the local host you just uploaded.
  • OPENLDAP_HOST: the FQDN of the OpenLDAP server.

Carefully replace the variable values according to your environment, and remember to use an FQDN as the hostname, i.e., for OPENLDAP_HOST. The following is an example:

export REGION='cn-north-1'
export ACCESS_KEY_ID='<change-to-your-aws-access-key-id>'
export SECRET_ACCESS_KEY='<change-to-your-aws-secret-access-key>'
export SSH_KEY='/home/ec2-user/key.pem'
export OPENLDAP_HOST='ip-10-0-14-0.cn-north-1.compute.internal'
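Because a forgotten placeholder only surfaces as an error deep inside the installer, a quick pre-flight check can save a failed run. The sketch below is hypothetical (check_required_vars is our own name); it simply flags variables that are empty, still 'TO_BE_REPLACED', or still in the '<change-to-...>' form used in the example, and verifies that the SSH key file is readable.

```bash
# Hypothetical pre-flight check (not part of the installer): fails if any
# required variable is unset, empty, or still holds a placeholder value.
check_required_vars() {
    local name value status=0
    for name in REGION ACCESS_KEY_ID SECRET_ACCESS_KEY SSH_KEY OPENLDAP_HOST; do
        value="${!name:-}"   # indirect expansion: value of the variable named by $name
        if [[ -z "$value" || "$value" == "TO_BE_REPLACED" || "$value" == '<'*'>' ]]; then
            echo "$name is not set to a real value" >&2
            status=1
        fi
    done
    # The SSH key must also be a readable file on this host.
    if [[ -n "${SSH_KEY:-}" && ! -r "$SSH_KEY" ]]; then
        echo "SSH_KEY file '$SSH_KEY' is not readable" >&2
        status=1
    fi
    return "$status"
}

# Usage, after exporting the variables:
# check_required_vars && echo "all variables look good"
```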

2.2 All-In-One Installation

2.2.1 Quick Start

Now, let’s start an all-in-one installation. Execute this command line:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive'

For the specification of the parameters in the above command line, refer to the appendix. If everything goes well, the command line executes steps 2.1 to 2.3 of the workflow diagram. This may take ten minutes or more, depending on your network bandwidth. It then pauses and prompts the user to enter the EMR cluster ID. If the target cluster already exists, enter its ID immediately; if not, switch to the EMR web console to create it. Next, the command line asks the user to confirm whether Hue should be integrated with LDAP. If so, once the cluster is ready, the installer updates the EMR configuration with Hue-specific settings (this action overwrites EMR’s existing configuration).

Fill in the two items above and enter “y” to confirm all inputs. The installation then resumes, and if the target EMR cluster is not ready yet, the command line keeps monitoring it until it enters the “WAITING” status. The following is a snapshot of the command line at this point:

[Figure: Snapshot of the command line]

When the cluster is ready (status is “WAITING”), the command line will continue to execute steps 2.5 to 2.8 of the workflow, and end with an “ALL DONE!!” message.

2.2.2 Customization

Now that the all-in-one installation is done, let’s look at customization. Generally, this installer follows the principle of “Convention over Configuration”: most parameters have preset default values. An equivalent version of the above command line with the full parameter list is as follows:

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson' \
    --example-users 'example-user-1,example-user-2' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --java-home '/usr/lib/jvm/java' \
    --skip-install-mysql 'false' \
    --skip-install-solr 'false' \
    --skip-install-openldap 'false' \
    --skip-configure-hue 'false' \
    --ranger-host $(hostname -f) \
    --ranger-version '2.1.0' \
    --mysql-host $(hostname -f) \
    --mysql-root-password 'Admin1234!' \
    --mysql-ranger-db-user-password 'Admin1234!' \
    --solr-host $(hostname -f) \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --restart-interval 30

The full-parameter version gives a complete view of all custom options. You may want to change some option values in the following scenarios:

  1. If you want to change the default organization name (dc=example,dc=com) or the default password (Admin1234!), run the full-parameter version and replace them with your own values.
  2. If you need to integrate with external facilities, e.g., a centralized OpenLDAP or an existing MySQL or Solr, add the corresponding --skip-xxx-xxx options and set them to true.
  3. If you have other pre-defined Bind DNs for Hue, Ranger, and SSSD, add the corresponding --xxx-bind-dn and --xxx-bind-password options to set them. Note: the Bind DNs for Hue, Ranger, and SSSD are created automatically when installing OpenLDAP, but they follow a fixed naming pattern, cn=hue|ranger|sssd,ou=services,<your-base-dn>, rather than the value given by the --xxx-bind-dn option. So, if you assign another DN with the --xxx-bind-dn option, you must create that DN yourself in advance. The installer does not create the DN assigned by the --xxx-bind-dn option because a DN is a tree path: creating it requires creating every node along the path, and it is not cost-effective to implement such a small but complicated function.
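If you do assign a custom Bind DN, here is one way to create it yourself, sketched under the assumption that its parent entries (for example ou=services,dc=example,dc=com) already exist. The helper name bind_dn_ldif is hypothetical, and the object classes (organizationalRole plus simpleSecurityObject, a common pattern for service bind accounts) are our choice, not necessarily what the installer uses for its own Bind DNs.

```bash
# Hypothetical helper (not part of the installer): prints an LDIF entry for a
# custom bind DN. Assumes the parent entries of the DN already exist.
bind_dn_ldif() {
    local dn="$1" password="$2" cn
    cn="${dn%%,*}"   # first RDN, e.g. "cn=ranger2"
    cn="${cn#cn=}"   # strip the attribute name, leaving "ranger2"
    # organizationalRole requires cn; simpleSecurityObject requires userPassword.
    cat <<EOF
dn: ${dn}
objectClass: simpleSecurityObject
objectClass: organizationalRole
cn: ${cn}
userPassword: ${password}
EOF
}

# Usage (assumes the OpenLDAP root DN and password from the examples above):
# bind_dn_ldif 'cn=ranger2,ou=services,dc=example,dc=com' 'MyPassword1!' |
#     ldapadd -x -H "ldap://$OPENLDAP_HOST" -D 'cn=admin,dc=example,dc=com' -w 'Admin1234!'
```

This stores the password as given; you may prefer to pass a hash generated with slappasswd instead of a plain-text value.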

2.3 Step-By-Step Installation

As an alternative to the all-in-one installation, you can also install step by step. The command line for each step is given below; for a description of each parameter, refer to the appendix.

2.3.1 Init EC2

This step performs some fundamental jobs, e.g., installing the AWS CLI, the JDK, and so on.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh init-ec2 \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY"

2.3.2 Install OpenLDAP

This step installs OpenLDAP on the given OpenLDAP host. Although the action is performed on the OpenLDAP server, you don’t need to log in to it; just run the command line on the local host (the Ranger server).

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-openldap \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!'

2.3.3 Install Ranger

This step will install all server-side components of Ranger, including MySQL, Solr, Ranger Admin, and Ranger UserSync.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --openldap-user-dn-pattern 'uid={0},ou=users,dc=example,dc=com' \
    --openldap-group-search-filter '(member=uid={0},ou=users,dc=example,dc=com)' \
    --openldap-user-object-class 'inetOrgPerson'
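After this step, a quick way to confirm that Ranger Admin came up is to request its web console, which listens on port 6080 as noted in the verification section. The sketch below is hypothetical (ranger_admin_up is our own name); it assumes the console root eventually answers with HTTP 200 once curl follows redirects to the login page.

```bash
# Hypothetical helper (not part of the installer): succeeds if the Ranger
# Admin web console on HOST:PORT answers with HTTP 200 (after redirects).
ranger_admin_up() {
    local host="$1" port="${2:-6080}" code
    # -w prints the final status code; it is 000 if no connection was made.
    code=$(curl -s -L -o /dev/null -w '%{http_code}' "http://${host}:${port}/")
    [ "$code" = "200" ]
}

# Usage on the Ranger server:
# ranger_admin_up "$(hostname -f)" && echo "Ranger Admin is up"
```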

2.3.4 Create EMR Cluster

For a step-by-step installation, there is no interactive process for creating an EMR cluster, so create the cluster on the EMR web console yourself, wait until it is completely ready (in “WAITING” status), and then export the following environment-specific variable:

export EMR_CLUSTER_ID='TO_BE_REPLACED'

The following is an example:

export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'
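Instead of refreshing the EMR console, the wait for the “WAITING” status can be scripted with the AWS CLI. This is a minimal sketch with a hypothetical function name; it assumes the AWS CLI on this host is already configured with credentials and a default region.

```bash
# Hypothetical helper (not part of the installer): polls the cluster state
# every INTERVAL seconds until it is WAITING; fails if the cluster terminates.
# Assumes the AWS CLI on this host is configured with credentials and a region.
wait_for_cluster_waiting() {
    local cluster_id="$1" interval="${2:-30}" state
    while true; do
        state=$(aws emr describe-cluster --cluster-id "$cluster_id" \
            --query 'Cluster.Status.State' --output text) || return 1
        echo "$(date '+%H:%M:%S') $cluster_id is $state"
        case "$state" in
            WAITING) return 0 ;;
            TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
        esac
        sleep "$interval"
    done
}

# Usage:
# wait_for_cluster_waiting "$EMR_CLUSTER_ID" && echo "cluster is ready"
```

The AWS CLI also ships a built-in waiter, aws emr wait cluster-running, but it succeeds as soon as the cluster is RUNNING or WAITING, so the explicit WAITING check above follows the text more closely.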

2.3.5 Install Ranger Plugins

This step installs the HDFS and Hive plugins on both the Ranger server side and the agent side (EMR nodes). This differs from the EMR-native Ranger solution, in which EMR installs the agent side on each node automatically. For open-source Ranger, we have to do this job ourselves via this installer.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger-plugins \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
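On the server side, the plugin installation should leave one HDFS and one Hive service definition in Ranger Admin. Judging from the policy cache paths used in section 3, these services appear to be named HDFS_<cluster-id> and HIVE_<cluster-id>. The sketch below lists service names through Ranger's public REST API (/service/public/v2/api/service) using the default admin/admin account; the JSON handling is a deliberately simple grep/sed extraction, and the helper name is hypothetical.

```bash
# Hypothetical helper (not part of the installer): extracts the values of the
# "name" keys from a Ranger service list (JSON read from stdin), one per line.
ranger_service_names() {
    grep -o '"name" *: *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
}

# Usage on the Ranger server (default admin/admin credentials):
# curl -s -u admin:admin -H 'Accept: application/json' \
#     "http://$(hostname -f):6080/service/public/v2/api/service" |
#     ranger_service_names | grep -E "^(HDFS|HIVE)_${EMR_CLUSTER_ID}$"
```

If the final grep prints both service names, the server-side part of the plugin installation is in place; the agent side is exercised later by the policy cache checks in section 3.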

2.3.6 Install SSSD

This step installs and configures SSSD on each node of the EMR cluster. As with installing OpenLDAP, run the command line on the local host; it performs the actions on the remote nodes via SSH.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-sssd \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --sssd-bind-dn 'cn=sssd,ou=services,dc=example,dc=com' \
    --sssd-bind-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
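Once SSSD is configured, OpenLDAP users should be resolvable on every node through the standard name service switch. The sketch below shows two hypothetical helpers: one runs the lookup locally with getent, the other repeats it on each running node over SSH. It assumes the AWS CLI is configured, SSH_KEY is exported, and the EMR nodes accept logins as the hadoop user.

```bash
# Hypothetical helpers (not part of the installer).
# user_resolvable: succeeds if NSS (which SSSD plugs into) can resolve USER.
user_resolvable() {
    getent passwd "$1" >/dev/null
}

# check_user_on_cluster: runs the same lookup on every running node via SSH.
# Assumes the AWS CLI is configured, SSH_KEY is exported, and the nodes
# accept logins as the "hadoop" user.
check_user_on_cluster() {
    local cluster_id="$1" user="$2" node
    for node in $(aws emr list-instances --cluster-id "$cluster_id" \
                    --instance-states RUNNING \
                    --query 'Instances[].PrivateDnsName' --output text); do
        if ssh -n -i "$SSH_KEY" -o StrictHostKeyChecking=no "hadoop@${node}" \
               "getent passwd '${user}'" >/dev/null 2>&1; then
            echo "OK      ${node}"
        else
            echo "MISSING ${node}"
        fi
    done
}

# Usage on the Ranger server, after the example users exist:
# check_user_on_cluster "$EMR_CLUSTER_ID" example-user-1
```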

2.3.7 Configure Hue

This step updates the Hue configuration of EMR, as mentioned in the all-in-one installation. If you maintain your own customized EMR configuration, skip this step; you can still manually merge the Hue configuration JSON file generated by the command line into your own JSON.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh configure-hue \
    --region "$REGION" \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --openldap-user-object-class 'inetOrgPerson' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
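If you merge the Hue settings by hand, it helps to know roughly what an LDAP-oriented hue-ini classification looks like. The sketch below is an illustration, not the installer's actual output: it prints an EMR configuration that sets Hue's LDAP backend and the standard [desktop] [[ldap]] keys (ldap_url, search_bind_authentication, base_dn, bind_dn, bind_password, and a users filter). The function name is hypothetical, and values are not JSON-escaped, so avoid double quotes in them.

```bash
# Hypothetical helper (not part of the installer): prints an EMR "hue-ini"
# classification that switches Hue to LDAP authentication. Values are not
# JSON-escaped, so they must not contain double quotes or backslashes.
hue_ldap_config() {
    local host="$1" base_dn="$2" bind_dn="$3" bind_password="$4"
    local object_class="${5:-inetOrgPerson}"
    cat <<EOF
[
  {
    "Classification": "hue-ini",
    "Configurations": [
      {
        "Classification": "desktop",
        "Configurations": [
          {
            "Classification": "auth",
            "Properties": { "backend": "desktop.auth.backend.LdapBackend" }
          },
          {
            "Classification": "ldap",
            "Properties": {
              "ldap_url": "ldap://${host}",
              "search_bind_authentication": "true",
              "base_dn": "${base_dn}",
              "bind_dn": "${bind_dn}",
              "bind_password": "${bind_password}"
            },
            "Configurations": [
              {
                "Classification": "users",
                "Properties": {
                  "user_filter": "objectclass=${object_class}",
                  "user_name_attr": "uid"
                }
              }
            ]
          }
        ]
      }
    ]
  }
]
EOF
}

# Usage:
# hue_ldap_config "$OPENLDAP_HOST" 'dc=example,dc=com' \
#     'cn=hue,ou=services,dc=example,dc=com' 'Admin1234!' > hue-ldap.json
```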

2.3.8 Create Example Users

This step creates two example users for the verification that follows.

sudo sh ./ranger-emr-cli-installer/bin/setup.sh add-example-users \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'openldap' \
    --openldap-host "$OPENLDAP_HOST" \
    --openldap-base-dn 'dc=example,dc=com' \
    --openldap-root-cn 'admin' \
    --openldap-root-password 'Admin1234!' \
    --example-users 'example-user-1,example-user-2'
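After the example users are created, Ranger UserSync should pick them up from OpenLDAP, possibly after a few minutes depending on its sync interval. Besides checking the “Users/Groups/Roles” page, the user list can be fetched from Ranger Admin's REST endpoint /service/xusers/users. The sketch below (hypothetical helper name, simple grep-based matching) checks whether a given user name appears in that JSON.

```bash
# Hypothetical helper (not part of the installer): succeeds if a Ranger user
# list (JSON read from stdin) contains an entry whose "name" is USER.
ranger_has_user() {
    grep -Eq "\"name\" *: *\"$1\""
}

# Usage on the Ranger server (default admin/admin credentials):
# curl -s -u admin:admin -H 'Accept: application/json' \
#     "http://$(hostname -f):6080/service/xusers/users" |
#     ranger_has_user example-user-1 && echo "example-user-1 synced"
```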

3. Verification

After installation and integration are complete, it’s time to check whether Ranger works. The verification consists of two parts: HDFS and Hive. First, log in to OpenLDAP with a client, e.g., LDAP Admin or Apache Directory Studio, and check all DNs; they should look as follows:

[Figure: OpenLDAP DNs after installation]
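If you prefer the command line to a GUI client, the same DN tree can be listed with ldapsearch from the openldap-clients package. This is a sketch with a hypothetical helper name; it assumes the root DN is cn=<root-cn>,<base-dn>, matching the --openldap-root-cn and --openldap-base-dn values used above.

```bash
# Hypothetical helper (not part of the installer): lists every DN under
# BASE_DN, binding as the root account. Assumes the root DN is
# cn=<root-cn>,<base-dn> and that ldapsearch (openldap-clients) is installed.
list_ldap_dns() {
    local host="$1" base_dn="$2" root_cn="$3" root_password="$4"
    # -x: simple bind; -LLL: plain LDIF output; "1.1": return no attributes, only DNs.
    ldapsearch -x -LLL \
        -H "ldap://${host}" \
        -D "cn=${root_cn},${base_dn}" \
        -w "${root_password}" \
        -b "${base_dn}" \
        '(objectClass=*)' 1.1
}

# Usage on the Ranger server (values from the examples above):
# list_ldap_dns "$OPENLDAP_HOST" 'dc=example,dc=com' admin 'Admin1234!'
```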

Next, open the Ranger web console at http://<YOUR-RANGER-HOST>:6080; the default admin account/password is admin/admin. After logging in, open the “Users/Groups/Roles” page first and check whether the example users on OpenLDAP have been synchronized to Ranger:

[Figure: Users synchronized to Ranger]

Next, log in to the master node of the EMR cluster and export the cluster ID, because subsequent command lines need this variable.

# run on master node of emr cluster
export EMR_CLUSTER_ID='TO_BE_REPLACED'

The following is an example:

# run on master node of emr cluster
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'

3.1 HDFS Access Control Verification

Usually, there is a set of pre-defined policies for the HDFS plugin after installation, as follows:

[Figure: Pre-defined HDFS policies]

We have not configured any HDFS permissions for example-user-1, but if you log in to Hue with the example-user-1 account, you will see that it can browse most directories and files on HDFS. This is because most directories and files grant access to all users through their HDFS mode bits. Keep in mind that HDFS r/w/x file mode attributes and Ranger-based permissions are both in effect: by default, when no Ranger policy matches a request, the plugin falls back to the native HDFS permissions.

To verify that the HDFS plugin works, we test it in “blacklist” mode, i.e., with a deny policy. First, create a directory named /ranger-test on HDFS and set example-user-1 as its owner:

# run on master node of emr cluster
sudo -u hdfs hdfs dfs -mkdir /ranger-test
sudo -u hdfs hdfs dfs -chown example-user-1:example-group /ranger-test
sudo -u hdfs hdfs dfs -chmod 700 /ranger-test

Next, add a deny policy that denies example-user-1 read and write access to /ranger-test:

[Figure: The “ranger-test” deny policy]

Any policy changes on the Ranger web console will sync to the agent side (EMR cluster nodes) within thirty seconds. We can run the following commands on the master node to see if the local policy file is updated:

# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n"|tr ' ' '='
    sudo stat /etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json
    sleep 3
done
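The loop above prints the file status ten times and leaves it to you to spot the change. As a variation, the sketch below waits until the file's modification time actually changes and then stops. The function name is hypothetical; start it right before saving the policy, and run it from a root shell (e.g. sudo -i), since the original commands use sudo to read the policy cache.

```bash
# Hypothetical helper (not part of the installer): waits until FILE's
# modification time changes, checking every INTERVAL seconds and giving
# up after MAX_WAIT seconds. Run it from a root shell (e.g. `sudo -i`),
# because the policy cache under /etc/ranger may not be readable otherwise.
wait_for_file_update() {
    local file="$1" max_wait="${2:-60}" interval="${3:-3}"
    local before now waited=0
    # Record the current mtime in epoch seconds; 0 if the file is missing.
    before=$(stat -c %Y "$file" 2>/dev/null || echo 0)
    while (( waited < max_wait )); do
        sleep "$interval"
        (( waited += interval ))
        now=$(stat -c %Y "$file" 2>/dev/null || echo 0)
        if [[ "$now" != "$before" ]]; then
            echo "$file updated after ~${waited}s"
            return 0
        fi
    done
    echo "$file not updated within ${max_wait}s" >&2
    return 1
}

# Usage on the master node: start it, then save the policy in the Ranger console.
# wait_for_file_update "/etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json" 90
```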

Once the local policy file is up to date, the deny policy takes effect. Next, log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open “File Browser,” click the root directory “/”, and then click the “ranger-test” folder. You will get the following error message: “Cannot access:/ranger-test:”

[Figure: Error when accessing the “ranger-test” folder]

Even though example-user-1 is the owner of this folder, access is still blocked by the Ranger HDFS plugin. This confirms that HDFS access control is enforced by Ranger.
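The same denial can also be observed from the master node's command line, which is handy when Hue is not available. This is a sketch with a hypothetical helper name; it assumes example-user-1 resolves on the master node (via SSSD) and that the hdfs CLI is on the PATH.

```bash
# Hypothetical helper (not part of the installer): succeeds if USER is NOT
# allowed to list PATH on HDFS, i.e. the deny policy is being enforced.
# Assumes USER resolves on this node (SSSD) and the hdfs CLI is on PATH.
hdfs_access_denied() {
    local user="$1" path="$2"
    ! sudo -u "$user" hdfs dfs -ls "$path" >/dev/null 2>&1
}

# Usage on the master node while the deny policy is active:
# hdfs_access_denied example-user-1 /ranger-test && echo "denied by Ranger, as expected"
```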

Finally, remember to remove the “ranger-test” policy so that example-user-1 regains full access to this folder, because the Hive verification below reuses it.

3.2 Hive Access Control Verification

Usually, there is a set of pre-defined policies for the Hive plugin after installation. To eliminate interference and keep verification simple, let’s remove them first:

[Figure: Removing the pre-defined Hive policies]

Any policy changes on the Ranger web console will sync to the agent side (EMR cluster nodes) within thirty seconds. We can run the following commands on the master node to see if the local policy file is updated:

# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n"|tr ' ' '='
    sudo stat /etc/ranger/HIVE_${EMR_CLUSTER_ID}/policycache/hiveServer2_HIVE_${EMR_CLUSTER_ID}.json
    sleep 3
done

Once the local policy file is up to date, the removal of all policies takes effect. Next, log in to Hue with the OpenLDAP account “example-user-1” created by the installer, open the Hive editor, and enter the following SQL to create a test table located in the /ranger-test HDFS directory created earlier:

-- run in hue hive editor
create table ranger_test (
  id bigint
)
row format delimited
stored as textfile location '/ranger-test';

Run it, and an error occurs:

The error shows that example-user-1 is blocked by database-level permissions, which proves that the Hive plugin is working. Next, go back to Ranger and add a Hive policy named “all - database, table, column” as follows:

[Figure: Hive policy “all - database, table, column”]

It grants example-user-1 all privileges on all databases, tables, and columns. Check the policy file again on the master node with the previous command line. Once it is updated, go back to Hue and re-run the SQL; this time it succeeds:

[Figure: Successful run]

To double-check that example-user-1 has full read and write permissions on the table, run the following SQL:

insert into ranger_test(id) values(1);
insert into ranger_test(id) values(2);
insert into ranger_test(id) values(3);
select * from ranger_test;

The execution result is:

[Figure: Execution result]

At this point, the Hive access control verification has passed.
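The Hive checks can also be repeated outside Hue with beeline on the master node. The sketch below is hypothetical (run_hive_sql_as is our own name) and assumes HiveServer2 listens on its default port 10000 and accepts the user name passed with -n; if HiveServer2 is configured for LDAP authentication, a password would also have to be passed with -p.

```bash
# Hypothetical helper (not part of the installer): runs SQL through
# HiveServer2 as USER with beeline. Assumes HiveServer2 listens on the
# default port 10000 of HOST and accepts the user name passed with -n;
# if HiveServer2 requires LDAP authentication, add -p with the password.
run_hive_sql_as() {
    local user="$1" sql="$2" host="${3:-$(hostname -f)}"
    beeline -u "jdbc:hive2://${host}:10000/default" -n "$user" -e "$sql"
}

# Usage on the master node:
# run_hive_sql_as example-user-1 'select * from ranger_test;'
```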

4. Appendix

The following is the parameter specification (parameter: comment):

  • --region: The AWS region.
  • --access-key-id: The AWS access key ID of your IAM account.
  • --secret-access-key: The AWS secret access key of your IAM account.
  • --ssh-key: The SSH private key file path.
  • --solution: The solution name; accepted values are ‘open-source’ or ‘emr-native’.
  • --auth-provider: The authentication provider; accepted values are ‘AD’ or ‘OpenLDAP’.
  • --openldap-host: The FQDN of the OpenLDAP host.
  • --openldap-base-dn: The Base DN of OpenLDAP, e.g., ‘dc=example,dc=com’. Change it according to your environment.
  • --openldap-root-cn: The cn of the root account, e.g., ‘admin’. Change it according to your environment.
  • --openldap-root-password: The password of the root account, e.g., ‘Admin1234!’. Change it according to your environment.
  • --ranger-bind-dn: The Bind DN for Ranger, e.g., ‘cn=ranger,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your environment.
  • --ranger-bind-password: The password of the Ranger Bind DN, e.g., ‘Admin1234!’. Change it according to your environment.
  • --openldap-user-dn-pattern: The DN pattern Ranger uses to search for users on OpenLDAP, e.g., ‘uid={0},ou=users,dc=example,dc=com’. Change it according to your environment.
  • --openldap-group-search-filter: The filter Ranger uses to search for groups on OpenLDAP, e.g., ‘(member=uid={0},ou=users,dc=example,dc=com)’. Change it according to your environment.
  • --openldap-user-object-class: The user object class Ranger uses to search for users, e.g., ‘inetOrgPerson’. Change it according to your environment.
  • --hue-bind-dn: The Bind DN for Hue, e.g., ‘cn=hue,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your environment.
  • --hue-bind-password: The password of the Hue Bind DN, e.g., ‘Admin1234!’. Change it according to your environment.
  • --example-users: The example users to be created on OpenLDAP and Kerberos to demonstrate Ranger’s features. This parameter is optional; if omitted, no example users are created.
  • --sssd-bind-dn: The Bind DN for SSSD, e.g., ‘cn=sssd,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your environment.
  • --sssd-bind-password: The password of the SSSD Bind DN, e.g., ‘Admin1234!’. Change it according to your environment.
  • --ranger-plugins: The Ranger plugins to be installed, comma-separated for multiple values, e.g., ‘open-source-hdfs,open-source-hive’. Change it according to your environment.
  • --skip-configure-hue: Skip configuring Hue. Accepted values are ‘true’ or ‘false’; the default is ‘false’.
  • --skip-migrate-kerberos-db: Skip migrating the Kerberos database. Accepted values are ‘true’ or ‘false’; the default is ‘false’.

Source: dzone.com