References:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
http://blog.csdn.net/fuwencaho/article/details/37727873
/////////////////////////////////////////////////////////////
Testing a single-node Hadoop cluster on CentOS 6.5
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.i686.rpm
yum remove java-1.5.0-*
yum install java-1.6.0-openjdk.i686
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk
java -version
sudo groupadd hadoop
sudo useradd -g hadoop hduser
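As hduser, set up passwordless SSH to localhost:
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 600 $HOME/.ssh/authorized_keys
ssh localhost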
Disable IPv6 by adding the following lines to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Then apply the settings with:
sudo sysctl -p
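Once the settings are active, a value of 1 in /proc/sys/net/ipv6/conf/<interface>/disable_ipv6 means IPv6 is disabled for that interface. A minimal Python sketch for checking the three entries configured above; the function name ipv6_disabled is hypothetical and not part of any Hadoop or CentOS tool:

```python
import os


def ipv6_disabled(ifaces=("all", "default", "lo"),
                  conf_root="/proc/sys/net/ipv6/conf"):
    """Map each interface to True (disabled), False (enabled) or None (no file)."""
    status = {}
    for iface in ifaces:
        path = os.path.join(conf_root, iface, "disable_ipv6")
        try:
            with open(path) as f:
                # The kernel exposes the sysctl value as "0" or "1" plus a newline.
                status[iface] = f.read().strip() == "1"
        except FileNotFoundError:
            # No such entry: unknown interface, or IPv6 support not loaded.
            status[iface] = None
    return status
```

Usage: print(ipv6_disabled()) should report True for all, default and lo after the sysctl settings have been applied.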
cd /usr/local
sudo wget http://mirrors.hust.edu.cn/apache/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz
sudo tar xzf hadoop-2.4.1.tar.gz
sudo mv hadoop-2.4.1 hadoop
sudo chown -R hduser:hadoop hadoop
//////////////////////////////
vi $HOME/.bashrc
Append the following lines to hduser's $HOME/.bashrc:
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Then reload it with:
source $HOME/.bashrc
/////////////////
Create the directory intended for hadoop.tmp.dir (set in core-site.xml):
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
To tighten security, change its permissions from 755 to 750:
sudo chmod 750 /app/hadoop/tmp
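The three commands above (mkdir -p, chown, chmod 750) can be scripted; a minimal Python sketch, where prepare_tmp_dir is a hypothetical helper and the chown step needs root just like the sudo commands:

```python
import grp
import os
import pwd
import stat


def prepare_tmp_dir(path, user=None, group=None, mode=0o750):
    """Create path like `mkdir -p`, optionally chown it, set mode, return the mode."""
    os.makedirs(path, exist_ok=True)
    if user is not None or group is not None:
        # -1 tells os.chown to leave that id unchanged.
        uid = pwd.getpwnam(user).pw_uid if user is not None else -1
        gid = grp.getgrnam(group).gr_gid if group is not None else -1
        os.chown(path, uid, gid)  # requires root, like "sudo chown"
    # chmod explicitly: the mode argument of os.makedirs is filtered by the umask.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Usage (as root): prepare_tmp_dir("/app/hadoop/tmp", "hduser", "hadoop") returns 0o750 once the directory is in place.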
Format HDFS once before first use (as hduser; reformatting erases the existing HDFS data):
/usr/local/hadoop/bin/hadoop namenode -format
Test the Python mapper locally:
echo "foo foo quux labs foo bar quux" | python ./mapper.py
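mapper.py itself is not included in these notes. A minimal sketch of a Hadoop Streaming word-count mapper, assuming the usual convention of one "word<TAB>1" line per input word; the helper names map_words and main are illustrative, and this is not necessarily the script the author used. It avoids f-strings so it also runs under CentOS 6's default Python 2.6:

```python
#!/usr/bin/env python
"""mapper.py: emit "<word><TAB>1" for every whitespace-separated word on stdin."""
import sys


def map_words(lines):
    """Yield a (word, 1) pair for each whitespace-separated token in lines."""
    for line in lines:
        for word in line.split():
            yield word, 1


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming treats everything before the first tab as the key.
    for word, count in map_words(stdin):
        stdout.write("%s\t%d\n" % (word, count))


if __name__ == "__main__":
    main()
```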
Test the mapper and reducer together (sort stands in for Hadoop's shuffle/sort phase):
echo "foo foo quux labs foo bar quux" | python ./mapper.py | sort -k1,1 | python /home/hduser/reducer.py
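reducer.py is not included either. A minimal sketch of a matching reducer, assuming "word<TAB>count" input already sorted by word (which sort -k1,1 provides locally and Hadoop's shuffle provides on a cluster); parse, reduce_counts and main are illustrative names, and the code stays Python 2.6 compatible:

```python
#!/usr/bin/env python
"""reducer.py: sum the counts for each word.

Expects "<word><TAB><count>" lines sorted by word.
"""
import sys
from itertools import groupby


def parse(lines):
    """Yield (word, count) pairs, skipping malformed lines."""
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab: not mapper output
        try:
            value = int(count)
        except ValueError:
            continue  # count is not an integer
        yield word, value


def reduce_counts(lines):
    """Yield (word, total) for each run of identical consecutive words."""
    for word, pairs in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in pairs)


def main(stdin=sys.stdin, stdout=sys.stdout):
    for word, total in reduce_counts(stdin):
        stdout.write("%s\t%d\n" % (word, total))


if __name__ == "__main__":
    main()
```

With a mapper that emits one "word<TAB>1" line per word, the echo test line should print bar 1, foo 3, labs 1 and quux 2, one tab-separated pair per line. Because groupby only merges consecutive keys, unsorted input would produce split totals, which is why the sort step matters.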
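If you get this error:
hadoop ssh: Could not resolve hostname
try starting HDFS with:
/usr/local/hadoop/sbin/start-dfs.sh
If you get this error:
cat: /usr/local/hadoop/conf/slaves: No such file or directory
try copying the slaves file from the Hadoop 2.x configuration directory:
mkdir -p /usr/local/hadoop/conf
cp /usr/local/hadoop/etc/hadoop/slaves /usr/local/hadoop/conf/
If you get this error:
-copyFromLocal: /home/hduser No such file or directory
try creating the user's home directory in HDFS:
/usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/hduser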
Test with a 332 KB text file:
/usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/hduser/tmp/*.txt /user/hduser/adam/
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount '/user/hduser/adam/*.txt' /user/hduser/adam-out
The same job with 16 reduce tasks:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount -D mapred.reduce.tasks=16 '/user/hduser/adam/*.txt' /user/hduser/adam-out2
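With mapred.reduce.tasks=16 the job writes one output file per reducer (part-r-00000 through part-r-00015) under /user/hduser/adam-out2. After copying the directory to local disk (for example with hadoop fs -get /user/hduser/adam-out2 .), a small Python sketch can combine the parts into one table of counts; load_counts is an illustrative name and the code assumes Python 2.7+ or 3 for collections.Counter:

```python
import glob
import os
from collections import Counter


def load_counts(out_dir):
    """Sum word counts from every part-r-* file in a local copy of a wordcount output dir."""
    counts = Counter()
    # The glob skips marker files such as _SUCCESS.
    for path in sorted(glob.glob(os.path.join(out_dir, "part-r-*"))):
        with open(path) as f:
            for line in f:
                word, sep, count = line.rstrip("\n").partition("\t")
                if sep:
                    # Each word normally lands in exactly one part file,
                    # so summing is simply a safe way to combine them.
                    counts[word] += int(count)
    return counts
```

Usage: load_counts("adam-out2").most_common(10) lists the ten most frequent words across all sixteen part files.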
Result:
/usr/local/hadoop/bin/hadoop dfs -cat /user/hduser/adam-out/part-r-00000 > words-count.txt
Sort by count (the second, tab-separated field), ascending:
sort -t $'\t' -k 2,2n words-count.txt
The most frequently repeated words are as follows:
development 11
element 11
effect 12
left 12
That 12
amount 13
heat 13
planet 13
vast 13
went 13
bright 14
get 14
last 14
Mount 14
put 14
against 15
right 15
ancient 16
just 16
present 16
yet 16
might 19
fact 21
almost 25
part 28
point 29
What 29
cannot 33
different 39
what 62
first 65
most 66
But 67
must 67
out 80
about 83
light 98
great 114
but 131
It 161
not 200
at 226
it 385
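The sort step under Result can also be done in Python, for example to print only the N most frequent words. A minimal sketch reading words-count.txt ("word<TAB>count" per line); top_words is an illustrative name, and words with equal counts may come out in a different order than in sort's output:

```python
import heapq


def top_words(path, n=10):
    """Return the n (word, count) pairs with the highest counts, highest first."""
    pairs = []
    with open(path) as f:
        for line in f:
            word, sep, count = line.rstrip("\n").partition("\t")
            if sep:
                pairs.append((word, int(count)))
    # nlargest keeps the n largest by count without sorting the whole list.
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])
```

For the counts listed above, top_words("words-count.txt", 5) would return it, at, not, It and but. The counting is case-sensitive, which is why "It" and "it" appear separately.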