利用Hadoop实现求共同好友的示例详解-eolink官网

利用Hadoop实现求共同好友的示例详解

目录前言业务分析实现思路分析编码实现1、第一个map类2、第一个Reduce类3、第一个Job类4、第二个map类5、第二个Reducer类6、第二个Job类

前言

在很多社交APP中，比如大家熟悉的QQ好友列表中，打开会话框，经常可以看到下面有一栏共同好友的推荐列表，用户通过这种方式，可以添加潜在的关联好友

这种功能该如何实现呢？对redis比较了解的同学应该能很快想到，可以使用redis来实现这个功能。没错，redis确实是个不错的可以实现这个功能的方案。

但redis的实现有一定的局限性，因为redis存储和数据和计算时需要耗费较多的内存资源，设想一下，像腾讯QQ这样的规模，如果用这种方式做的话，估计Redis服务器的投入成本将是一笔不小的开销。

利用hadoop中的MapReduce同样可以实现这个功能，该如何实现呢？

业务分析

下面是原始的数据文件，第一栏可理解为本人，第二行为该用户的好友列表，以逗号分割，比如A用户的好友包括：B,C,D,F,E,O这几个，后面的行依次类推

A:B,C,D,F,E,OB:A,C,E,KC:F,A,D,ID:A,E,F,LE:B,C,D,M,LF:A,B,C,D,E,O,MG:A,C,D,E,FH:A,C,D,E,OI:A,OJ:B,OK:A,C,DL:D,E,FM:E,F,GO:A,H,I,J

现在的需求是：通过原始的数据文件，输出该文件中所有用户中哪些人两两之间存在共同好友并输出，格式如下：

A-B C,EA-C F,DA-D E,F......

实现思路分析

步骤一：将原始数据拆分为如下格式

通过这一步，得到一组K/V，可以清晰的反映出一个用户的所有好友

B：A #B是A的好友C：A #C是A的好友D：A #D是A的好友F：AE：AO：A

A：BC：BE：BK：B

F：CA：CD：CI：C

B：EC：ED：EM：EL：E

步骤二、对第一步的数据进一步处理成如下格式

从第一步格式完毕后的数据，可以很明显看出并总结出一个规律，那就是左边那些用户的好友列表，以C用户为例，可以看出C这个用户有A,B,E三个好友，反过来讲，ABE这三个用户，他们有一个共同的好友A

其他的类推进行理解

C A-B-E #C是A和B和E的共同好友D A-C #D是A和B的共同好友A B-C #A是B和C的共同好友B A-E #A是E和B的共同好友......

步骤三、将步骤二中的数据调换位置

从步骤2中我们得知，C的好友有ABE，反过来说，ABE他们的共同好友有C，针对这种超过3个的，可以考虑下一步进行两两组合即可

A-B-E C #A、B、E有共同好友CA-C D #A与C有共同好友DB-C A #B与C有共同好友AA-E B #A与E有共同好友B

步骤四、将步骤三得到的数据继续拆分

步骤三中，像： A-B-E C 这种数据，显然需要进一步拆分，因为最终的结果是求取两两好友之间的共同好友，所以可以拆为： A-B C，A-E C，B-E C，为下一步数据组合做最后的准备

A-B CA-E CB-E CA-C DB-C AA-E B......

步骤五、将步骤四得到的数据合并

在使用MapReduce编程中我们知道，Map阶段出去的数据，进入reduce方法中的数据都是key相同的，以第四步中的: A-E 这个key为例，就有2个，这样通过 reduce方法最终输出的结果就是： A-E C,B ，即A-E 这两个用户的共同好友为 C和B

A-B C #A,B共同好友有CA-E C,B #A，E有共同好友 C,BB-E C #B，E有共同好友 CA-C D #A，C有共同好友 DB-C A #B，C有共同好友 A......

通过以上的数据分析，最终可以达到预期的效果，同时也可以看出，上面的步骤划分到MapRedcue中，显然一个MapReduce肯定是无法完成的，至少需要2个

下面是结合上面的步骤分析，得出需要两个MapReduce的数据流程图，参考这个图来协助我们分析编写代码逻辑做参考

编码实现

1、第一个map类

public class FirstMapper extends Mapper {

@Override

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

String val = value.toString();

String[] split = val.split(":");

//A:B,C,D,F,E,O 拆分后，左边是原用户，右边是好友

String user = split[0];

String friends = split[1];

String[] friendLists = friends.split(",");

//Map1 输出的结果为：

/**

* B A

* C A

* D A

* F A

* E A

for(String str :friendLists ){

context.write(new Text(str),new Text(user));

}

2、第一个Reduce类

public class FirstReducer extends Reducer {

@Override

protected void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

StringBuffer stringBuffer = new StringBuffer();

for (Text text : values){

stringBuffer.append(text).append("-");

}

//最终写http://出去的数据格式为： A-E B ......

context.write(new Text(stringBuffer.toString()),key);

}

3、第一个Job类

public class FirstJob {

public static void main(String[] args) throws Exception {

//1、获取job

Configuration configuration = new Configuration();

Job job = Job.getInstance(configuration);

//2、设置jar路径

job.setJarByClass(FirstJob.class);

//3、关联mapper 和 Reducer

job.setMapperClass(FirstMapper.class);

job.setReducerClass(FirstReducer.class);

//4、设置 map输出的 key/val 的类型

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(Text.class);

//5、设置最终输出的key / val 类型

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(Text.class);

//6、设置最终的输出路径

String inputPath = "F:\\网盘\\csv\\friends.txt";

String outPath = "F:\\网盘\\csv\\friends1";

FileInputFormat.setInputPaths(job,new Path(inputPath));

FileOutputFormat.setOutputPath(job,new Path(outPath));

// 7 提交job

boolean result = job.waitForCompletion(true);

System.exit(result ? 0 : 1);

}

运行上面的Job代码，然后打开运行完毕后的第一个阶段的文件，从内容格式上看，符合第一阶段的输出结果要求的，即下面的这种数据格式

4、第二个map类

public class SecondMapper extends Mapper {

@Override

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

// I-K-C-B-G-F-H-O-D- A 阶段1的文件输出格式

/**

* 最终输出格式：

* I-K A

* I-C A

* I-B A

* ......

//需要将左边的数据进行两两拆分，与V进行组合输出

String val = value.toString();

String[] split = val.split("\t");

String v2 = split[1];

String[] allUsers = split[0].split("-");

Arrays.sort(allUsers);

for(int i=0;i

for(int j=i+1;j

context.write(new Text(allUsers[i] + "-" +http:// allUsers[j]),new Text(v2));

}

5、第二个Reducer类

public class SecondReducer extends Reducer {

@Override

protected void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

//上一步输出的结果：

/**

* A-B C

* A-B D

* A-E C

* A-E D

* ......

//只需要将相同的key的Val进行组合即可,即： A-B C-D，A-E C-D

StringBuffer stringBuffer = new StringBuffer();

for (Text text :values ){

stringBuffer.append(text.toString()).append("-");

}

context.write(key,new Text(stringBuffer.toString()));

}

6、第二个Job类

public class SecondJob {

public static void main(String[] args) throws Exception {

//1、获取job

Configuration configuration = new Configuration();

Job job = Job.getInstance(configuration);

//2、设置jar路径

job.setJarByClass(SecondJob.class);

//3、关联mapper 和 Reducer

job.setMapperClass(SecondMapper.class);

job.setReducerClass(SecondReducer.class);

//4、设置 map输出的 key/val 的类型

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(Text.class);

//5、设置最终输出的key / val 类型

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(Text.class);

//6、设置最终的输出路径

String inputPath = "F:\\网盘\\csv\\friends1\\part-r-00000";

String outPath = "F:\\网盘\\csv\\friends2";

FileInputFormat.setInputPaths(job,new Path(inputPath));

FileOutputFormat.setOutputPath(job,new Path(outPath));

// 7 提交job

boolean result = job.waitForCompletion(true);

System.exit(result ? 0 : 1);

}

运行上面的Job代码，查看最终的输出结果，可以看到，也是符合我们预期的业务的

以上就是利用Hadoop实现求共同好友的示例详解的详细内容，更多关于Hadoop求共同好友的资料请关注我们其它相关文章！

for(int j=i+1;j

context.write(new Text(allUsers[i] + "-" +http:// allUsers[j]),new Text(v2));

}

5、第二个Reducer类

public class SecondReducer extends Reducer {

@Override

protected void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

//上一步输出的结果：

/**

* A-B C

* A-B D

* A-E C

* A-E D

* ......

//只需要将相同的key的Val进行组合即可,即： A-B C-D，A-E C-D

StringBuffer stringBuffer = new StringBuffer();

for (Text text :values ){

stringBuffer.append(text.toString()).append("-");

}

context.write(key,new Text(stringBuffer.toString()));

}

6、第二个Job类

public class SecondJob {

public static void main(String[] args) throws Exception {

//1、获取job

Configuration configuration = new Configuration();

Job job = Job.getInstance(configuration);

//2、设置jar路径

job.setJarByClass(SecondJob.class);

//3、关联mapper 和 Reducer

job.setMapperClass(SecondMapper.class);

job.setReducerClass(SecondReducer.class);

//4、设置 map输出的 key/val 的类型

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(Text.class);

//5、设置最终输出的key / val 类型

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(Text.class);

//6、设置最终的输出路径

String inputPath = "F:\\网盘\\csv\\friends1\\part-r-00000";

String outPath = "F:\\网盘\\csv\\friends2";

FileInputFormat.setInputPaths(job,new Path(inputPath));

FileOutputFormat.setOutputPath(job,new Path(outPath));

// 7 提交job

boolean result = job.waitForCompletion(true);

System.exit(result ? 0 : 1);

}

运行上面的Job代码，查看最终的输出结果，可以看到，也是符合我们预期的业务的

以上就是利用Hadoop实现求共同好友的示例详解的详细内容，更多关于Hadoop求共同好友的资料请关注我们其它相关文章！

context.write(new Text(allUsers[i] + "-" +http:// allUsers[j]),new Text(v2));

}

5、第二个Reducer类

public class SecondReducer extends Reducer {

@Override

protected void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException {

//上一步输出的结果：

/**

* A-B C

* A-B D

* A-E C

* A-E D

* ......

//只需要将相同的key的Val进行组合即可,即： A-B C-D，A-E C-D

StringBuffer stringBuffer = new StringBuffer();

for (Text text :values ){

stringBuffer.append(text.toString()).append("-");

}

context.write(key,new Text(stringBuffer.toString()));

}

6、第二个Job类

public class SecondJob {

public static void main(String[] args) throws Exception {

//1、获取job

Configuration configuration = new Configuration();

Job job = Job.getInstance(configuration);

//2、设置jar路径

job.setJarByClass(SecondJob.class);

//3、关联mapper 和 Reducer

job.setMapperClass(SecondMapper.class);

job.setReducerClass(SecondReducer.class);

//4、设置 map输出的 key/val 的类型

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(Text.class);

//5、设置最终输出的key / val 类型

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(Text.class);

//6、设置最终的输出路径

String inputPath = "F:\\网盘\\csv\\friends1\\part-r-00000";

String outPath = "F:\\网盘\\csv\\friends2";

FileInputFormat.setInputPaths(job,new Path(inputPath));

FileOutputFormat.setOutputPath(job,new Path(outPath));

// 7 提交job

boolean result = job.waitForCompletion(true);

System.exit(result ? 0 : 1);

}

运行上面的Job代码，查看最终的输出结果，可以看到，也是符合我们预期的业务的

以上就是利用Hadoop实现求共同好友的示例详解的详细内容，更多关于Hadoop求共同好友的资料请关注我们其它相关文章！

Python接口自动化之文件上传/下载接口怎么实现

239 2022-09-03

利用Hadoop实现求共同好友的示例详解

java中的接口是类吗

Spring中的aware接口详情

Python接口自动化之文件上传/下载接口怎么实现

推荐文章

接口调用是什么意思？几种常用接口调用方式

接口设计原则

8款在线 API 接口文档管理工具

api管理系统是什么？

什么是接口调试？接口调试的步骤有哪些？

api 接口管理系统有哪些？

接口测试有几种测试方法

API文档生成工具有哪些？

微服务和api网关区别

交换机配置步骤

最近发表

热评文章

在线接口文档管理工具推荐，支持在线测试，HTTP接口

开源的在线接口文档wiki工具Mindoc的介绍与使

如何优雅的进行接口设计？接口设计的六大原则是什么？

什么是API测试,api检测公司

遇到百度网址安全中心提醒您该页面可能存在钓鱼欺诈信息

软件接口设计怎么做？前后端分离软件接口设计思路

利用Hadoop实现求共同好友的示例详解

微信扫一扫：分享

推荐文章

最近发表

热评文章