Big Data Python


The combination of Python and Big Data is being widely adopted by the IT industry, for several good reasons.

If you are new to this topic, you should first go through our introductory blogs on Python and Hadoop.

Going further, we will see how to establish a connection between Python and a Big Data environment.

Note: these are the tool versions we will be using to establish the connection.

CentOS: 6.9

Hadoop: 2.6

Python: 3.6.9

pip: 3

The system must be running Linux and have Hadoop installed. Most Linux distributions come with Python pre-installed.

To install Hadoop, you can follow the steps at https://acadgild.com/blog/key-configurations-in-hadoop-installation.

My system already has Hadoop 2.6 and Python 2.6.

Below are the commands we need to run to get the Python and pip versions mentioned above.

# Point the default python at the existing Python 2.6 interpreter
sudo rm -rf /usr/bin/python
sudo ln -s /usr/bin/python2.6 /usr/bin/python

cd /usr/local/bin/

# Install the build dependencies, then download and build Python 3.6.9 from source
sudo yum install -y gcc openssl-devel bzip2-devel sqlite-devel
wget https://www.python.org/ftp/python/3.6.9/Python-3.6.9.tgz
tar xzf Python-3.6.9.tgz

cd Python-3.6.9
./configure --enable-optimizations
# altinstall installs python3.6 without overwriting the system python binary
sudo make altinstall

ls /usr/local/bin/
# check that python3.6 is now present

curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python get-pip.py

ls
#and see if pip3 is present


cd
vi ~/.bashrc

# add these two lines at the end of the file, then save and quit with :wq
alias python='/usr/local/bin/python3.6'
alias pip='/usr/local/bin/pip3.6'

source ~/.bashrc


python -V
# Python 3.6.9

pip -V

Once we have verified the versions, we can connect Python to HDFS. The examples below use the hdfs package (HdfsCLI), which provides a WebHDFS client, so install it first:

pip install hdfs

Now open the Python shell:

from hdfs import InsecureClient

# connect to the NameNode's WebHDFS endpoint (port 50070 in Hadoop 2.x)
client = InsecureClient('http://localhost:50070')

# With this, the connection has been established. Below are a few commands to read from and write to HDFS.
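If your WebHDFS endpoint expects requests to run as a particular HDFS user, InsecureClient also accepts a user argument; a minimal sketch, where the user name 'hadoop' is an assumption:

# run all requests as the given HDFS user (the user name here is an assumption)
client = InsecureClient('http://localhost:50070', user='hadoop')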

# content() returns a summary of the directory: space used, quotas, file and directory counts
content = client.content('/')
content

for k, v in content.items():
    print("{} = {}".format(k, v))

# list() returns the names of the entries under a directory
fnames = client.list('/')
fnames
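list() can also return each entry's metadata along with its name by passing status=True; a small sketch:

# status=True yields (name, status) pairs instead of bare names
for name, info in client.list('/', status=True):
    print(name, info['type'])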

# status() returns metadata for a single path (type, owner, permissions, and so on)
status = client.status('/')
status
for k, v in status.items():
    print("{} = {}".format(k, v))
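To traverse a directory tree recursively, the client also provides walk(), which behaves like os.walk; a minimal sketch, where the path and depth are assumptions:

# walk the tree under /user, at most 2 levels deep
for root, dirs, files in client.walk('/user', depth=2):
    print(root, dirs, files)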

# rename/move a file or directory (the source and destination paths here are examples)
client.rename('/user/data/old_name', '/user/data/new_name')

# download an HDFS directory to the local filesystem, using 5 parallel threads
client.download('/user/data/', '/home/acadgild/Desktop/', n_threads=5)


# upload a local file to HDFS (the first argument is the HDFS destination, the second the local source)
client.upload('/user/data/upload/', '/home/acadgild/Desktop/Salary_Data.csv', n_threads=5)

# filter lines from a local file and write the result to a new file in HDFS
# (the HDFS target path below is an example)
temp = []
with open('/home/acadgild/Desktop/Salary_Data.csv', 'r') as r_file:
    for line in r_file:
        if line.startswith('p'):
            temp.append(line)
client.write('/user/data/filtered.csv', data=''.join(temp), overwrite=True)
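To read a file straight from HDFS without downloading it first, read() works as a context manager; a small sketch using the example path written above:

# stream the file contents back from HDFS
with client.read('/user/data/filtered.csv', encoding='utf-8') as reader:
    print(reader.read())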

# recursively delete a directory from HDFS
client.delete('/user/data/', recursive=True)
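After deleting, you can confirm a path is gone without triggering an error: status() accepts strict=False, which returns None for missing paths instead of raising an exception.

# returns None instead of raising an error when the path does not exist
print(client.status('/user/data/', strict=False))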

Dataset link:

https://acadgildsite.s3.amazonaws.com/wordpress_images/datasets/big_data_python/ratings.csv
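Once the dataset has been uploaded to HDFS, it can be loaded directly into a DataFrame through the same client; a minimal sketch, assuming pandas is installed and the file sits at /user/data/ratings.csv:

import pandas as pd

# read the CSV straight out of HDFS into a pandas DataFrame
with client.read('/user/data/ratings.csv', encoding='utf-8') as reader:
    df = pd.read_csv(reader)
print(df.head())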

This is how we connect to Hadoop from Python. I hope you liked this blog. Do leave us a comment with any queries or suggestions.

Keep visiting our website for more blogs.
