References:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
http://blog.csdn.net/fuwencaho/article/details/37727873
/////////////////////////////////////////////////////////////
=======
Test Single-node-cluster Hadoop on CentOS6.5
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.i686.rpm
yum remove java-1.5.0-*
yum install java-1.6.0-openjdk.i686
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk
java -version
sudo groupadd hadoop
sudo useradd -g hadoop hduser
ssh-keygen -t rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
/etc/sysctl.conf
disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
cd /usr/local
http://mirrors.hust.edu.cn/apache/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz
cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
//////////////////////////////
vi $HOME/.bashrc
Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk
Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs=”hadoop fs”
unalias hls &> /dev/null
alias hls=”fs -ls”
If you have LZO compression enabled in your Hadoop cluster and
compress job outputs with LZOP (not covered in this tutorial):
Conveniently inspect an LZOP compressed file from the command
line; run via:
#
$ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
Requires installed ‘lzop’ command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
/////////////////
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
…and if you want to tighten up security, chmod from 755 to 750…
$ sudo chmod 750 /app/hadoop/tmp
/usr/local/hadoop/bin/hadoop namenode -format
echo “foo foo quux labs foo bar quux” | python ./mapper.py
echo “foo foo quux labs foo bar quux” | python ./mapper.py | sort -k1,1 | /home/hduser/reducer.py
if Error:
hadoop ssh: Could not resolve hostname
try:
sbin/start-dfs.sh
if Error:
cat: /usr/local/hadoop/conf/slaves: No such file or directory
try:
cp /hadoop/etc/hadoop/slaves /hadoop/conf/
if Error:
-copyFromLocal: /home/hduser  No such file or directory
try:
/usr/local/hadoop$ hdfs dfs -mkdir -p /user/hduser
Test with 332KB text:
/usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/hduser/tmp/.txt /user/hduser/adam/
/usr/local/hadoop/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hduser/adam/.txt /user/hduser/adam-out
 /usr/local/hadoop/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount -D mapred.reduce.tasks=16 /user/hduser/adam/*.txt /user/hduser/adam-out2
Result:
/usr/local/hadoop/bin/hadoop dfs -cat /user/hduser/adam-out/part-r-00000 > words-count.txt
sort -n -k 2 -t t words-count.txt
重复次数最多的单词如下:
development     11
element 11
effect  12
left    12
That    12
amount  13
heat    13
planet  13
vast    13
went    13
bright  14
get     14
last    14
Mount   14
put     14
against 15
right   15
ancient 16
just    16
present 16
yet     16
might   19
fact    21
almost  25
part    28
point   29
What    29
cannot  33
different       39
what    62
first   65
most    66
But     67
must    67
out     80
about   83
light   98
great   114
but     131
It      161
not     200
at      226
it      385