Scraping Real-Estate Data with Python and Showing It on a Map!

Hi everyone, I'm back. This time we'll write a Python crawler that scrapes real-estate data for Urumqi and displays it on a map. For the map I used BDP Personal Edition (BDP個人版), a free online data-analysis and data-visualization tool that can import CSV or Excel data.


As usual, start with the plan: crawl the site, extract each property's name, address, price, and longitude/latitude, and save them to an Excel file. Then upload the Excel data to the BDP site and generate a map report.

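This time I used the Scrapy framework. That may be overkill, but I had just finished learning it and wanted the practice. Before writing any code, I still recommend analyzing the site and its data first, because good analysis gets you twice the result for half the effort.

The Urumqi Jiwu site (烏魯木齊吉屋網; the spider starts from http://wlmq.jiwu.com/loupan) has fairly complete data. Each listing page gives a list of properties plus a next-page link, and clicking a property opens its detail page with the full information (name, address, price, longitude/latitude). A pipeline then saves each item to Excel, and finally BDP generates the map report. Enough talk, here is the code: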

JiwuspiderSpider.py

    # -*- coding: utf-8 -*-
    import re

    from scrapy import Spider, Request

    from jiwu.items import JiwuItem


    class JiwuspiderSpider(Spider):
        name = "jiwuspider"
        allowed_domains = ["wlmq.jiwu.com"]
        start_urls = ['http://wlmq.jiwu.com/loupan']

        def parse(self, response):
            """
            Parse the property list on one listing page.

            :param response: response for a listing page
            :return: requests for each detail page and for the next listing page
            """
            for url in response.xpath('//a[@class="index_scale"]/@href').extract():
                # Follow each link in the list and hand it to the detail parser.
                # urljoin leaves absolute URLs unchanged and resolves relative ones.
                yield Request(response.urljoin(url), self.parse_html)

            # If a "next page" link still exists, extract its URL.
            nextpage = response.xpath('//a[@class="tg-rownum-next index-icon"]/@href').extract_first()
            # Only follow it when it is not empty.
            if nextpage:
                yield Request(response.urljoin(nextpage), self.parse)  # call parse again on the next page

        def parse_html(self, response):
            """
            Parse one property detail page and build an item.

            :param response: response for a detail page
            :return: one JiwuItem per match found in the page
            """
            # The detail page carries these values as JavaScript assignments.
            pattern = re.compile(r".*?lng = '(.*?)';.*?lat = '(.*?)';.*?bname = '(.*?)';.*?"
                                 r"address = '(.*?)';.*?price = '(.*?)';", re.S)
            results = re.findall(pattern, response.text)
            for result in results:
                # Create and yield the item inside the loop, so a page without
                # a match never produces an empty item.
                item = JiwuItem()
                item['name'] = result[2]
                item['address'] = result[3]
                # Keep only the digits of the price; default to 0 if there are none.
                pricestr = result[4]
                s = re.findall(r'(\d+)', pricestr)
                if len(s) == 0:
                    item['price'] = 0
                else:
                    item['price'] = s[0]
                item['lng'] = result[0]
                item['lat'] = result[1]
                yield item
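The regular expression in parse_html can be tried on its own, without running a crawl. Below is a minimal sketch of that idea. SAMPLE_PAGE is a made-up excerpt (the real page markup may differ), and extract_property is a hypothetical helper name, not part of the project above.

```python
import re

# Hypothetical excerpt of the inline JavaScript on a detail page;
# the values and exact layout are assumptions for illustration only.
SAMPLE_PAGE = """
<script>
var lng = '87.6168';
var lat = '43.8256';
var bname = 'Example Garden';
var address = 'No. 1 Example Road';
var price = '约8500元/㎡';
</script>
"""

# Same capture groups as the spider: lng, lat, bname, address, price.
DETAIL_PATTERN = re.compile(
    r"lng = '(.*?)';.*?lat = '(.*?)';.*?bname = '(.*?)';.*?"
    r"address = '(.*?)';.*?price = '(.*?)';",
    re.S,
)


def extract_property(text):
    """Return a dict with the five fields, or None if the text does not match."""
    match = DETAIL_PATTERN.search(text)
    if match is None:
        return None
    lng, lat, name, address, price_text = match.groups()
    # Keep only the first run of digits in the price; fall back to 0.
    digits = re.findall(r"\d+", price_text)
    return {
        "name": name,
        "address": address,
        "price": int(digits[0]) if digits else 0,
        "lng": lng,
        "lat": lat,
    }


print(extract_property(SAMPLE_PAGE))
```

Testing the pattern against a saved copy of a real detail page this way makes it easier to notice when the site changes its markup.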

items.py

    # -*- coding: utf-8 -*-

    # Define here the models for your scraped items
    #
    # See documentation in:
    # http://doc.scrapy.org/en/latest/topics/items.html

    import scrapy


    class JiwuItem(scrapy.Item):
        # Fields collected for each property
        name = scrapy.Field()
        price = scrapy.Field()
        address = scrapy.Field()
        lng = scrapy.Field()  # longitude
        lat = scrapy.Field()  # latitude

pipelines.py. Note that the MongoDB save call is commented out here; choose whichever storage method you prefer.

    # -*- coding: utf-8 -*-

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
    import pymongo
    from openpyxl import Workbook


    class JiwuPipeline(object):

        def __init__(self, settings):
            # Excel workbook with a header row
            self.wb = Workbook()
            self.ws = self.wb.active
            self.ws.append(['Name', 'Address', 'Price', 'Longitude', 'Latitude'])

            # Read the database connection details from the project settings
            host = settings['MONGODB_URL']
            port = settings['MONGODB_PORT']
            dbname = settings['MONGODB_DBNAME']
            client = pymongo.MongoClient(host=host, port=port)

            # Select the database and the collection
            db = client[dbname]
            self.table = db[settings['MONGODB_TABLE']]

        @classmethod
        def from_crawler(cls, crawler):
            # scrapy.conf is deprecated (and absent from recent Scrapy releases),
            # so the settings are taken from the crawler instead.
            return cls(crawler.settings)

        def process_item(self, item, spider):
            jiwu = dict(item)
            # self.table.insert_one(jiwu)  # uncomment to save to MongoDB as well
            line = [item['name'], item['address'], str(item['price']), item['lng'], item['lat']]
            self.ws.append(line)
            self.wb.save('jiwu.xlsx')

            return item
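The pipeline only runs once it is registered in the project settings, and it reads four MongoDB settings by name. A minimal sketch of the matching settings.py fragment follows. The module path jiwu.pipelines.JiwuPipeline is inferred from the `from jiwu.items import JiwuItem` import in the spider, and every value (host, port, database and collection names) is a placeholder assumption, not taken from the original project.

```python
# settings.py (fragment) -- all values below are placeholder assumptions

# Register the pipeline; the number sets its order among pipelines (0-1000).
ITEM_PIPELINES = {
    "jiwu.pipelines.JiwuPipeline": 300,
}

# Setting names match what JiwuPipeline reads; the values are examples only.
MONGODB_URL = "localhost"
MONGODB_PORT = 27017        # must be an integer for pymongo.MongoClient
MONGODB_DBNAME = "jiwu"     # hypothetical database name
MONGODB_TABLE = "loupan"    # hypothetical collection name
```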

The final report data

The MongoDB database

Map report result: https://me.bdp.cn/share/index.html?shareId=sdo_b697418ff7dc4f928bb25e3ac1d52348

Source: http://fisionsoft.com.cn/article/cdjosee.html
