[Java] [Share] Java web crawlers

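Java crawler collection

I've just started learning web crawling and wanted to share the code I wrote while learning. It's all in one project, and each package holds a different small crawler demo.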

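What it crawls:
    Images from the Dilidili (嘀哩嘀哩) site
    Wallpapers from the Gamersky (游民星空) gallery
    ACG wallpapers
    QQ Music
    Portrait photo sets

The UI was thrown together quickly...

Here is some of the code: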

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the listing pages.
// i (a link counter), service (the thread pool) and download(...) are defined elsewhere in the project.
public void ParseHtml(int sin, List<String> pagelinks, File dirs) throws IOException {
    // Visit each page link
    for (String pagelink : pagelinks) {
        Document doc = Jsoup.connect(pagelink)
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36")
                .get();
        // Links to the entries we want
        Elements itemlinks = doc.select("tr td h3 a[href^=htm]");
        // Visit each entry
        for (Element itemlink : itemlinks) {
            // Create a thread, override run(), and follow the entry's link
            Thread t = new Thread() {
                @Override
                public void run() {
                    try {
                        String hrefs = itemlink.absUrl("href");
                        Document imgdoc = Jsoup.connect(hrefs)
                                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36")
                                .get();
                        // Pick the h1 tag depending on the page layout
                        Element h1 = imgdoc.selectFirst("div h1");
                        if (h1 == null)
                            h1 = imgdoc.selectFirst("h1#subject_tpc");
                        if (h1 == null)
                            return; // no title found, skip this entry
                        // Title of the picture set, novel, or movie
                        String name = h1.text();
                        // Images
                        Elements imgs = imgdoc.select("div.tpc_content img");
                        // Download link
                        Element links = imgdoc.selectFirst("div.tpc_content a[href^=http://www3.uptorrentfilespace]");
                        if (sin == 1) {
                            // Collect the image links
                            ArrayList<String> arr = new ArrayList<String>();
                            for (Element img : imgs) {
                                String link = img.absUrl("src");
                                arr.add(link);
                                System.out.println("Got link #" + i++ + " ----- " + link);
                            }
                            /*
                             * Choose which save method to call:
                             * 1. Images and a download link are both present: it is a movie.
                             * 2. Otherwise, if there are images: it is a picture set.
                             * select() never returns null, so test for emptiness instead.
                             */
                            if (!imgs.isEmpty() && links != null) {
                                download(name, arr, links.text(), dirs);
                            } else if (!imgs.isEmpty()) {
                                download(name, arr, dirs);
                            }
                        }
                        if (sin == 2) {
                            // Novel body
                            Element body = imgdoc.selectFirst("div.tpc_content");
                            // Replace <br> tags with newlines
                            // (replaceAll("<br>", "\\n") would insert a literal "n" instead)
                            String text = body.toString().replace("<br>", "\n");
                            // Save the novel on a separate thread
                            new Thread() {
                                @Override
                                public void run() {
                                    File dir = new File("D://download1//小说//" + name);
                                    if (!dir.exists())
                                        dir.mkdirs();
                                    // try-with-resources closes the stream even if the write fails
                                    try (BufferedOutputStream bos = new BufferedOutputStream(
                                            new FileOutputStream(new File(dir, "小说.txt")))) {
                                        bos.write(text.getBytes());
                                        System.out.println("Saved: " + dir);
                                    } catch (Exception e) {
                                        e.printStackTrace();
                                    }
                                }
                            }.start();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            };
            // Let the thread pool decide when the task actually runs
            service.execute(t);
        }
    }
}

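For the rest of the code, download the project and take a look.
There are actually some goodies hidden in there, but you'll have to dig them out yourself.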
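
For anyone who wants to try the novel-saving step from the snippet above without the rest of the project, here is a minimal sketch. It is not the project's code: the class name NovelSaveSketch, the method names, the file name novel.txt, the temp directory, and the pool size are all made up for illustration, and the Jsoup fetch is left out so it runs with only the JDK. It shows three things: String.replace for turning <br> into real newlines (with replaceAll, the backslash in the replacement "\\n" is treated as an escape, so you get a literal "n"), writing the file as UTF-8, and handing the work to a fixed-size pool as plain tasks instead of Thread objects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NovelSaveSketch {

    // Replace literal <br> tags with newlines. String.replace treats both
    // arguments as plain text, so "\n" needs no regex escaping.
    static String brToNewlines(String html) {
        return html.replace("<br>", "\n");
    }

    // Write one entry to <baseDir>/<title>/novel.txt as UTF-8.
    // (novel.txt is a placeholder name chosen for this sketch.)
    static Path save(Path baseDir, String title, String html) throws IOException {
        Path dir = Files.createDirectories(baseDir.resolve(title));
        Path file = dir.resolve("novel.txt");
        Files.write(file, brToNewlines(html).getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Submit one task per entry to a fixed-size pool, then wait for all of them.
    static void saveAll(Map<String, String> titleToHtml, Path baseDir, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Map.Entry<String, String> e : titleToHtml.entrySet()) {
            pool.execute(() -> {
                try {
                    Path file = save(baseDir, e.getKey(), e.getValue());
                    System.out.println("Saved: " + file);
                } catch (IOException ex) {
                    System.err.println("Failed to save " + e.getKey() + ": " + ex);
                }
            });
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for queued tasks to finish
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data; in the real crawler this would come from the fetched pages.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("demo-a", "line one<br>line two");
        pages.put("demo-b", "only one line");
        Path base = Files.createTempDirectory("crawler-sketch");
        saveAll(pages, base, 2);
    }
}
```

Passing lambdas to the pool avoids creating Thread objects that are never started; the pool only ever calls their run() method anyway.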

Link: https://pan.baidu.com/s/1zW1wRMyP2etiLvfnfASIgw Password: 8ent

Attachment: 捕获2.PNG (14.23 KB)