Hadoop Hive MySQL on Ubuntu 20.04

Hadoop Jun 22, 2023

Overall Steps

  1. Set the environment file hive-env.sh
  2. Set the main config file hive-site.xml
  3. Set up the Hive user/password in MySQL and grant privileges
  4. Download and install the MySQL JDBC connector
  5. Run schematool; you must see the Hive tables here
  6. Create the Hive default warehouse location
  7. Checkpoint with the hive command
  8. Run the metastore and HiveServer2 services
  9. Connect with a database tool

Step 1: Set hive-env.sh

Copy the template file:

cp conf/hive-env.sh.template conf/hive-env.sh

Then edit it to set the Hadoop path:

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/hadoop-3.3.1

Step 2: Set hive-site.xml

First, copy the template to create hive-site.xml:

cp conf/hive-default.xml.template conf/hive-site.xml

Then edit the following properties. This configures Hive to use MySQL as its metastore database.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore_db?createDatabaseIfNotExist=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>Hive111!!!</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>Username to use against metastore database</description>
</property>

Step 3: MySQL setup

Assuming MySQL is already up and running, create the Hive user:

CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'Hive111!!!';
GRANT ALL PRIVILEGES ON *.* TO 'hiveuser'@'localhost';
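To double-check the account before moving on, you can list its grants and confirm it can log in (a sketch; assumes you know the MySQL root password):

```shell
# Verify the metastore account and its privileges
mysql -u root -p -e "SHOW GRANTS FOR 'hiveuser'@'localhost';"

# Confirm the new user can actually authenticate
mysql -u hiveuser -p'Hive111!!!' -e "SELECT 1;"
```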

Note that this is the password for the Hive metastore, not a Hive session.

It must match the values in Step 2.

Step 4: MySQL JDBC Connector

Download MySQL Connector/J and copy the JAR into the Hive folder.
The download can be found here:

MySQL :: MySQL Connector/J 8.0 Developer Guide :: 4.1 Installing Connector/J from a Binary Distribution

Then wget it to a convenient location and copy the JAR into the Hive lib directory:

cp mysql-connector-java-8.0.28.jar /home/hadoop/hive/lib/
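For reference, the full download-and-copy sequence might look like this (the version and URL are assumptions; check the MySQL download page for the current link):

```shell
# Version and URL are examples; verify against the MySQL download page
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.28.tar.gz
tar -xzf mysql-connector-java-8.0.28.tar.gz
cp mysql-connector-java-8.0.28/mysql-connector-java-8.0.28.jar /home/hadoop/hive/lib/
```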

I tried setting CLASSPATH in several locations, but it did not seem to work, so just copy the JAR.

Step 5: Hive Schema Initialization

schematool -initSchema -dbType mysql

You must see the Hive metastore tables created in MySQL after this.
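You can verify the schema directly in MySQL (a sketch; the database name matches the ConnectionURL from Step 2):

```shell
# List the metastore tables created by schematool
mysql -u hiveuser -p'Hive111!!!' -e "SHOW TABLES IN metastore_db;"
# You should see tables such as DBS, TBLS, COLUMNS_V2, and VERSION
```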

Step 6: Create the Hive default location

hdfs dfs -mkdir /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
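You can confirm the directories and their group-write bit with a quick listing:

```shell
# Check permissions on the new directories (expect drwxrwxr-x or similar)
hdfs dfs -ls -d /tmp /user/hive/warehouse
```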

Step 7: Checkpoint with the Hive command line

Now run the command:

hive
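If the CLI starts cleanly, a simple non-interactive query makes a good smoke test of the metastore connection:

```shell
# Quick smoke test: list databases without entering the interactive shell
hive -e "SHOW DATABASES;"
```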

You might run into the issues below:

Possible Issue #1: Illegal character

The error points to a line in hive-site.xml (around line 3218).
Open the file at that line and you will see the character sequence &#8; that produces the error; remove it.

Possible Issue #2: ClassCastException
This happens because Hive still requires JDK 8 and does not support JDK 11 yet.
Reference: https://issues.apache.org/jira/browse/HIVE-22415

Downgrade to JDK 8, and don't forget to change the JAVA_HOME environment variable in both Hadoop and Hive:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

You can find your JAVA_HOME with readlink -f $(which java)
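On Ubuntu 20.04, switching to JDK 8 might look like this (a sketch; package names assume the default apt repositories):

```shell
# Install OpenJDK 8 and select it as the default java
sudo apt install openjdk-8-jdk
sudo update-alternatives --config java

# Confirm the resulting JAVA_HOME path
readlink -f "$(which java)"
```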

Possible Issue #3: tmpdir Relative path in absolute URI

Add the following properties to the beginning of the hive-site.xml file (inside the <configuration> element):

<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>

Credit: https://stackoverflow.com/questions/27099898/java-net-urisyntaxexception-when-starting-hive

Possible Warning: SLF4J: Class path contains multiple SLF4J bindings.
You can ignore this, but if it is annoying you can resolve it by removing the duplicate JAR.

Remove lib/log4j-slf4j-impl-2.10.0.jar from the Hive directory.

Credit: https://stackoverflow.com/questions/27050820/running-hive-0-12-with-error-of-slf4j

Note: I only removed the SLF4J JAR, but the thread above suggests removing two JARs.

Step 8: Running Hive Server

hive --service metastore &
hive --service hiveserver2 &

These run in the background. If everything looks good, you can disown them to prevent them from shutting down when your SSH session ends:

jobs
[1] ... metastore
[2] ... hiveserver2
disown %1
disown %2

Step 9: Connecting to Hive

Don't forget to change the port to 10001, since the Hive default is 10000.
It will take some time after starting the Hive server before you can connect.
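Before reaching for a GUI tool, you can verify connectivity from the server itself with beeline (the username and port here are assumptions; use the values from your setup):

```shell
# Smoke-test HiveServer2 over JDBC; adjust host, port, and user for your setup
beeline -u jdbc:hive2://localhost:10000 -n hadoop -e "SHOW DATABASES;"
```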

Possible Issue: Required field 'serverProtocolVersion' is unset!
Go back to the server and kill all Hive processes.
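One way to find and stop them (a sketch; the services started in Step 8 show up as RunJar processes in jps):

```shell
# Locate the Hive services; both appear as RunJar processes
jps -ml | grep -i hive

# Stop them by main class name (assumes they were started as in Step 8)
pkill -f HiveServer2
pkill -f HiveMetaStore
```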
Edit the hive-site.xml with the following property:

<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>

Then run Step 8 again to restart the services.

Finally, we're done.

I wish this were easier.
I've done this three times already, but it is painful every time.

So I'm writing this for my future self in case I have to do it again, and also for people on the internet.

Hope this helps you save some time.

Tags

TeamCMD

We are the CODEMONDAY team and we provide a variety of content about business, technology, and programming. Enjoy it with us.