Hadoop Hive MySQL on Ubuntu 20.04
Overall Steps
- Set the env file: hive-env.sh
- Set the main config file: hive-site.xml
- Set up the MySQL hive user/password and grant privileges
- Download and install the MySQL JDBC connector
- Run schemaTool (you must see the Hive tables here)
- Create the Hive default warehouse location
- Checkpoint with the hive command
- Run the metastore and hiveserver2 services
- Connect with a database tool
Step 1: Set hive-env.sh
Copy the template file to hive-env.sh, then edit it to point at your Hadoop installation:
HADOOP_HOME=/home/hadoop/hadoop-3.3.1
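The copy-and-edit can be sketched as follows (assuming `HIVE_HOME` points at your Hive installation; the Hadoop path is the one used in this guide):

```shell
# Copy the template shipped with Hive to the active config file
cd "$HIVE_HOME/conf"
cp hive-env.sh.template hive-env.sh

# Point Hive at the Hadoop installation
echo 'export HADOOP_HOME=/home/hadoop/hadoop-3.3.1' >> hive-env.sh
```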
Step 2: Set hive-site.xml
First, copy the template to hive-site.xml. Then edit the following values, which point Hive at MySQL:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore_db?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Hive111!!!</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>Username to use against metastore database</description>
</property>
Step 3: MySQL set up
Assume MySQL is already up and running. Now create the user and grant privileges (on MySQL 8 the user must exist before the grant, so create it first):
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'Hive111!!!';
GRANT ALL PRIVILEGES ON *.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
Note that this is the password for the Hive metastore, not for a Hive session.
It must match the ConnectionPassword value in Step 2.
Step 4: MySQL JDBC Connector
Download the MySQL Connector/J and copy its JAR into the Hive folder.
The download can be found here:
Then wget it to a convenient location and copy the JAR into the Hive path.
I tried setting CLASSPATH in several locations, but it didn't seem to work, so just copy the JAR.
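A sketch of the download-and-copy (the Connector/J version here is an assumption; substitute whatever version you downloaded, and `HIVE_HOME` is assumed to point at the Hive install):

```shell
# Fetch and unpack Connector/J (adjust the version to the one you downloaded)
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.26.tar.gz
tar -xzf mysql-connector-java-8.0.26.tar.gz

# Copy the driver JAR into Hive's lib directory so the metastore can load it
cp mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar "$HIVE_HOME/lib/"
```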
Step 5: Hive Schema Initialization Tool
Run schematool; afterwards you must see the Hive metastore tables created in MySQL.
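The initialization is one command with Hive's bundled schematool (assuming `$HIVE_HOME/bin` is on your PATH); the follow-up query is one way to confirm the tables landed in the metastore_db database configured in Step 2:

```shell
# Create the metastore schema in the MySQL database from hive-site.xml
schematool -dbType mysql -initSchema

# Verify: the Hive metastore tables should now exist
mysql -u hiveuser -p metastore_db -e 'SHOW TABLES;'
```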
Step 6: Create the Hive default warehouse location
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
Step 7: Checkpoint with the Hive command line
Now run:
hive
There might be issues, as below:
Possible Issue #1: Illegal character
The hive-site.xml template contains an illegal character around line 3218. Open the file at that line and delete the character that produces the error.
Possible Issue #2: ClassCastException
This is because Hive is built for JDK 8 and does not support JDK 11 yet.
Reference: https://issues.apache.org/jira/browse/HIVE-22415
Downgrade to JDK 8, and don't forget to change the environment variables in both Hadoop and Hive.
You can find your JAVA_HOME with:
readlink -f $(which java)
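On Ubuntu 20.04 the downgrade can be sketched like this (the package name and JAVA_HOME path are the standard ones for Ubuntu's OpenJDK 8; adjust if yours differ):

```shell
# Install JDK 8 and make it the active java
sudo apt-get install -y openjdk-8-jdk
sudo update-alternatives --config java

# Then set this in both hadoop-env.sh and hive-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```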
Possible Issue #3: tmpdir Relative path in absolute URI
Add the code below to the beginning of the hive-site.xml file:
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
Credit: https://stackoverflow.com/questions/27099898/java-net-urisyntaxexception-when-starting-hive
Possible Warning: SLF4J Class path contains multiple SLF4J bindings.
You can ignore this, but if it is annoying you can resolve it by removing the duplicated JAR.
Remove lib/log4j-slf4j-impl-2.10.0.jar
in the Hive directory
Credit: https://stackoverflow.com/questions/27050820/running-hive-0-12-with-error-of-slf4j
Note: I only removed the slf4j JAR, but the thread above suggests removing two JARs.
Step 8: Running the Hive Services
Start the metastore and HiveServer2 in the background:
hive --service metastore &
hive --service hiveserver2 &
These run in the background. If everything looks good, you can disown them so they don't shut down when your SSH session ends.
jobs
[1] ... metastore
[2] ... hiveserver2
disown %1
disown %2
Step 9: Connecting to Hive
Don't forget to change the port to 10001; the Hive default is 10000.
It takes some time after starting the Hive server before you can connect.
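One quick way to test the connection is with beeline, which ships with Hive (the port and user below assume the 10001 setup and the hiveuser account from this guide; use 10000 if your server listens on the default):

```shell
# Connect to HiveServer2 and run a smoke-test query
beeline -u jdbc:hive2://localhost:10001/default -n hiveuser -e 'SHOW DATABASES;'
```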
Possible Issue: Required field 'serverProtocolVersion' is unset!
Go back to the server and kill all Hive processes. Then edit hive-site.xml with the following property:
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
Run step 7 again.
Finally, we're done.
I wish this were easier. I've done this three times already, and it's painful every time.
So I wrote this for my future self in case I have to do it again, and for anyone else on the internet.
I hope this helps you save some time.