Implementing a WordCount program in Eclipse, then packaging it and running it on a distributed Hadoop cluster.

Category: Big Data Systems. Updated: 2026-03-14 23:20:34. Published: 1567 days ago. Source: 百科书网 / 趣学号

1. Write the WordCount program in Eclipse

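package cn.cqsw;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper: for every word in an input line, emit the pair (word, 1).
// The type parameters are <input key, input value, output key, output value>;
// without them the map method does not override Mapper.map.
public class WCMap extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line, value is the line itself.
        String[] splited = value.toString().split(" ");
        for (String str : splited) {
            context.write(new Text(str), new LongWritable(1));
        }
    }
}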
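
The mapper splits each line on a single space, so a line with consecutive spaces or tabs would yield empty tokens that are then counted as words. The following is a minimal plain-Java sketch of a more tolerant tokenizer, assuming words are separated by arbitrary whitespace. `TokenizeSketch` and `tokenize` are hypothetical names used only for illustration; they are not part of the original program or of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original WordCount program.
class TokenizeSketch {
    // Splits a line on runs of whitespace and drops empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {   // a blank line yields one empty token
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Two spaces between the first two words on purpose.
        System.out.println(tokenize("xiaolei  yi mi ba"));
    }
}
```

Inside `WCMap.map`, the same idea would amount to splitting on `"\\s+"` and skipping empty strings before calling `context.write`.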

package cn.cqsw;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer: receives one word together with all the 1s emitted for it
// and writes (word, total count).
public class WCReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text text, Iterable<LongWritable> iterable, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable lw : iterable) {
            count += lw.get();
        }
        context.write(text, new LongWritable(count));
    }
}
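
To see what the mapper and reducer compute together without starting a cluster, the logic can be imitated in plain Java. This is only a sketch under simplifying assumptions: everything runs in one process, a `TreeMap` stands in for the shuffle step that groups values by key, and `WordCountSimulation` is a hypothetical class name that does not appear in the original program.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the job; it does not use Hadoop.
class WordCountSimulation {
    // Each word contributes 1 (the map step) and the 1s are summed
    // per word (the reduce step).
    static Map<String, Long> count(String... lines) {
        // TreeMap keeps keys sorted, a stand-in for the sorted grouping
        // that the framework performs between map and reduce.
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {   // same split as WCMap
                totals.merge(word, 1L, Long::sum);  // same sum as WCReduce
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample input lines chosen for illustration.
        count("xiaolei yi mi ba", "yi mi")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The real job differs in that the pairs are serialized as `Text` and `LongWritable` and distributed across map and reduce tasks, but the per-word arithmetic is the same.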

package cn.cqsw;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapReduceDemo{
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
	    Configuration conf = new Configuration();
	    Job job = Job.getInstance(conf);
	    job.setJarByClass(MapReduceDemo.class);
	    job.setMapperClass(WCMap.class);
	    job.setReducerClass(WCReduce.class);
	    job.setMapOutputKeyClass(Text.class);
	    job.setMapOutputValueClass(LongWritable.class);
	    job.setOutputKeyClass(Text.class);
	    job.setOutputValueClass(LongWritable.class);
	    Path inputPath = new Path("/hello");
	    Path outputPath = new Path("/output");
	    FileInputFormat.setInputPaths(job, inputPath);
	    FileOutputFormat.setOutputPath(job, outputPath);
	    boolean waitForCompletion = job.waitForCompletion(true);
	    System.exit(waitForCompletion?0:1);
	}

}

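2. Package the WordCount program

Package the project as a jar (it is called jar_002.jar in the run command in step 3), then drag the jar from the local machine into the virtual machine.

3. Run the packaged jar on the Hadoop cluster

(1) Create a file named xiaole, write a few words into it, for example "xiaolei yi mi ba", and save it.

(2) Copy the file to HDFS. The driver above reads its input from /hello, so the data must be available at that HDFS path.

(3) Check in the HDFS web UI: the file was created successfully.

Count the words. The output directory /output must not exist yet, because Hadoop refuses to write into an existing output directory.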

hadoop jar jar_002.jar cn.cqsw.MapReduceDemo

Use cat to view the result

hadoop fs -cat /output/part-r-00000

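Check in the HDFS web UI: the output directory was created successfully.

The word counts for the hello file created earlier can also be viewed there.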
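
The driver does not set an output format, so the job uses Hadoop's default text output, which writes one record per line as the key, a tab character, and the value. If the counts need further processing, lines of part-r-00000 can be parsed along these lines. This is a hedged sketch: `PartFileParser` and `parse` are hypothetical names, and the sample lines in `main` are made up for illustration rather than taken from a real run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading "word<TAB>count" lines.
class PartFileParser {
    // Turns lines such as "mi\t2" into a word -> count map,
    // keeping the order in which the lines appear.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;   // skip lines without a tab separator
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not real job output.
        List<String> sample = Arrays.asList("ba\t1", "mi\t2");
        System.out.println(parse(sample));
    }
}
```

In practice the lines would come from the output of `hadoop fs -cat /output/part-r-00000`, for instance read from standard input.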

Supplement: files can also be uploaded to HDFS with the hadoop command

1. Use put to upload the "hello" file to the HDFS root directory

hadoop fs -put hello /

2. Use the examples jar that ships with Hadoop to count the words in the file

cd hadoop/share/hadoop/mapreduce/
hadoop jar hadoop-mapreduce-examples-2.7.1.jar wordcount /hello /out

3. Use cat to view the result

hadoop fs -cat /out/part-r-00000

