Python and Big Data are being widely adopted by the IT industry for several reasons.
If you are new to this topic, you should first look at the blogs below.
Going further, we will see how to establish a connection between Python and a Big Data environment.
Note: here are the tool versions we will use to establish the connection.
CentOS: 6.9
Hadoop: 2.6
Python: 3.6.9
pip: 3
The system must be Linux and have Hadoop installed.
Most Linux systems come with Python pre-installed.
To install Hadoop, you can follow https://acadgild.com/blog/key-configurations-in-hadoop-installation.
My system already has Hadoop 2.6 and Python 2.6.
Below are the commands to install the Python and pip versions mentioned above.
sudo rm -rf /usr/bin/python
sudo ln -s /usr/bin/python2.6 /usr/bin/python
cd /usr/local/bin/
sudo yum install gcc openssl-devel bzip2-devel sqlite-devel
wget https://www.python.org/ftp/python/3.6.9/Python-3.6.9.tgz
tar xzf Python-3.6.9.tgz
cd Python-3.6.9
./configure --enable-optimizations
make altinstall
ls    # check that python3.6 is present
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
/usr/local/bin/python3.6 get-pip.py    # run get-pip with the new interpreter so pip targets Python 3.6
ls    # check that pip3 is present
cd
sudo vi .bashrc    # add the two alias lines below, then save and quit with :wq
alias python='/usr/local/bin/python3.6'
alias pip='/usr/local/bin/pip3.6'
source .bashrc
python -V
pip -V
Once the versions are verified, we can connect Python to HDFS.
Open the Python shell:
from hdfs import InsecureClient

# Connect to the NameNode over WebHDFS (port 50070 in Hadoop 2.x)
client = InsecureClient('http://localhost:50070')
# With this the connection has been established.
# Below are a few commands to read from and write to HDFS.

content = client.content('/')    # space usage summary of the root directory
content                          # keys and values come back as unicode strings
for k, v in content.items():
    print("{} = {}".format(k, v))

fnames = client.list('/')        # names of files and directories under /
fnames

status = client.status('/')      # metadata: owner, permissions, type, ...
status
for k, v in status.items():
    print("{} = {}".format(k, v))

client.rename('/user/old_path', '/user/new_path')    # placeholder paths; replace with real HDFS paths

# download(hdfs_path, local_path): copy from HDFS to the local file system
client.download('/user/data/', '/home/acadgild/Desktop/', n_threads=5)

# upload(hdfs_path, local_path): copy from the local file system to HDFS
client.upload('/user/data/uplaod/Salary_Data.csv', '/home/acadgild/Desktop/', n_threads=5)

r_file = open('/home/acadgild/Desktop/Salary_Data.csv', 'r')
temp = []
for line in r_file:
    if line.startswith('p'):
        temp.append(line)

client.write('/user/filtered_lines.txt', data='\n'.join(temp))    # hypothetical target file name

client.delete('/user/data/', recursive=True)    # remove the directory and its contents
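The read-filter-write step above can be checked locally before pushing anything to HDFS. Here is a small sketch of the same filtering logic, using made-up sample lines in place of the real Salary_Data.csv contents:

```python
# Sample lines standing in for the contents of Salary_Data.csv (hypothetical)
sample_lines = [
    "product,price\n",
    "pen,10\n",
    "notebook,45\n",
    "pencil,5\n",
]

# Keep only lines starting with 'p', as in the loop above;
# rstrip avoids doubled newlines when joining with '\n'
temp = [line.rstrip("\n") for line in sample_lines if line.startswith("p")]

payload = "\n".join(temp)
print(payload)
```

The joined payload here is "product,price\npen,10\npencil,5", which is the string that would be handed to client.write.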

Dataset link:
https://acadgildsite.s3.amazonaws.com/wordpress_images/datasets/big_data_python/ratings.csv
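Once the ratings file has been downloaded from HDFS (for example with client.download as shown earlier), it can be loaded into pandas for further analysis. A minimal sketch, using an in-memory stand-in for the file; the column names here are made-up assumptions, so check the actual header of ratings.csv:

```python
import io

import pandas as pd

# Stand-in for a downloaded ratings.csv; the real columns may differ
sample_csv = io.StringIO(
    "userId,movieId,rating\n"
    "1,31,2.5\n"
    "1,1029,3.0\n"
)

df = pd.read_csv(sample_csv)
print(df.shape)              # (rows, columns) parsed from the CSV
print(df["rating"].mean())   # a quick sanity-check aggregate
```

For the real file, replace the StringIO object with the local path used in client.download.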
This is how we connect to Hadoop using Python. I hope you liked this blog. Do leave us a comment with any queries or suggestions.
Keep visiting our website for more blogs.