I want to scrape the size list in the JavaScript-driven 'Size' dropdown on this page:
http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119
I want to get only the sizes that are in stock, returned as a list. How can I do this?
Here's my full code:
# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.http import Request


class ShoesSpider(Spider):
    name = "shoes"
    allowed_domains = ["store.nike.com"]
    start_urls = ['http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119']

    def parse(self, response):
        # Follow each product link found on the page
        shoes = response.xpath('//*[@class="grid-item-image-wrapper sprite-sheet sprite-index-0"]/a/@href').extract()
        for shoe in shoes:
            yield Request(shoe, callback=self.parse_shoes)

    def parse_shoes(self, response):
        name = response.xpath('//*[@itemprop="name"]/text()').extract_first()
        price = response.xpath('//*[@itemprop="price"]/text()').extract_first()
        sizes = []  # ?? how do I fill this with the in-stock sizes?
        yield {
            'name': name,
            'price': price,
            'sizes': sizes,
        }
Thanks
Here is code that extracts only the sizes that are in stock:
import scrapy


class ShoesSpider(scrapy.Spider):
    name = "shoes"
    allowed_domains = ["store.nike.com"]
    start_urls = ['http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119']

    def parse(self, response):
        # Every <option> in the size dropdown, in stock or not
        sizes = response.xpath('//*[@class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown"]/option')
        for s in sizes:
            # Returns '' for options marked as out of stock
            size = s.xpath('text()[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]').extract_first('').strip()
            if size:  # skip the empty entries produced by out-of-stock options
                yield {'Size': size}
Here is the result:
M 4 / W 5.5
M 4.5 / W 6
M 6.5 / W 8
M 7 / W 8.5
M 7.5 / W 9
M 8 / W 9.5
M 8.5 / W 10
M 9 / W 10.5
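The spider above yields one item per size. If you would rather have a single list of in-stock sizes, or want to check the filtering offline against a saved copy of the page, here is a stdlib-only sketch using `html.parser` that applies the same rule: keep the `<option>` elements of the size dropdown unless they carry the `exp-pdp-size-not-in-stock` class. The class names come from the XPath above; the `SAMPLE` markup, `SizeOptionParser` and `in_stock_sizes` are hypothetical names made up for illustration, and the live page markup may have changed.

```python
from html.parser import HTMLParser

SIZE_SELECT = "exp-pdp-size-dropdown"          # class token taken from the answer's XPath
OUT_OF_STOCK = "exp-pdp-size-not-in-stock"     # class token taken from the answer's XPath


class SizeOptionParser(HTMLParser):
    """Hypothetical helper: collect in-stock <option> labels from the size <select>."""

    def __init__(self):
        super().__init__()
        self.in_size_select = False
        self.buffer = None  # text chunks of the current in-stock option, or None
        self.sizes = []

    def _flush(self):
        # Finish the current option; </option> is optional in HTML, so this
        # is also called when the next <option> or </select> is seen.
        if self.buffer is not None:
            text = "".join(self.buffer).strip()
            if text:
                self.sizes.append(text)
        self.buffer = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "select" and SIZE_SELECT in classes:
            self.in_size_select = True
        elif tag == "option" and self.in_size_select:
            self._flush()
            # Only collect text for options that are not marked out of stock.
            if OUT_OF_STOCK not in classes:
                self.buffer = []

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select" and self.in_size_select:
            self._flush()
            self.in_size_select = False


def in_stock_sizes(html):
    """Return the in-stock size labels from the size dropdown, in page order."""
    parser = SizeOptionParser()
    parser.feed(html)
    parser.close()
    return parser.sizes


# Hypothetical markup modeled on the class names used in the answer.
SAMPLE = """
<select class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown">
  <option class="exp-pdp-size-not-in-stock selectBox-disabled">M 3.5 / W 5</option>
  <option>M 4 / W 5.5</option>
  <option>M 4.5 / W 6</option>
  <option class="exp-pdp-size-not-in-stock selectBox-disabled">M 5 / W 6.5</option>
  <option>M 6.5 / W 8</option>
</select>
"""

print(in_stock_sizes(SAMPLE))
# ['M 4 / W 5.5', 'M 4.5 / W 6', 'M 6.5 / W 8']
```

Inside the question's `parse_shoes`, the same idea with Scrapy selectors would be to call `.extract()` (which returns a list of strings) on the answer's XPath extended to `option/text()[...]` and strip the results, e.g. `sizes = [t.strip() for t in response.xpath('//*[@class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown"]/option/text()[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]').extract() if t.strip()]`. This is untested against the live page.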
In the spider's for loop, if the line is written like this, it extracts every size, whether or not it is in stock:
size = s.xpath('text()').extract_first('').strip()
To keep only the sizes that are in stock, exclude the out-of-stock options. Those are marked with the class "exp-pdp-size-not-in-stock selectBox-disabled", so add this predicate to the text() step:
[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]
I have tested it on other shoe pages, and it works as well.
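One caveat about that predicate: `@class="..."` compares the whole attribute string, so it only matches when the class attribute is exactly "exp-pdp-size-not-in-stock selectBox-disabled". If the site adds a class or reorders them, out-of-stock sizes would silently pass the filter. A common XPath 1.0 idiom for testing a single class token is `contains(concat(' ', normalize-space(@class), ' '), ' exp-pdp-size-not-in-stock ')`, which in the loop above would read `text()[not(contains(concat(' ', normalize-space(parent::option/@class), ' '), ' exp-pdp-size-not-in-stock '))]`; I have not run that variant against the live page. The plain-Python sketch below only mirrors the two tests on a few class strings (all but the first are hypothetical) to show where they differ; `exact_class_match` and `has_class_token` are illustrative names.

```python
def exact_class_match(class_attr, expected):
    # Mirrors @class="..." in XPath: the whole attribute string must be identical.
    return class_attr == expected


def has_class_token(class_attr, token):
    # Mirrors contains(concat(' ', normalize-space(@class), ' '), ' token '):
    # collapse whitespace, pad with spaces, then look for the space-delimited token.
    padded = " " + " ".join(class_attr.split()) + " "
    return (" " + token + " ") in padded


EXPECTED = "exp-pdp-size-not-in-stock selectBox-disabled"
examples = [
    "exp-pdp-size-not-in-stock selectBox-disabled",          # markup from the answer
    "selectBox-disabled exp-pdp-size-not-in-stock",          # same classes, other order
    "exp-pdp-size-not-in-stock  selectBox-disabled is-new",  # extra class (hypothetical)
    "exp-pdp-size-in-stock",                                 # hypothetical in-stock class
]

for cls in examples:
    print(exact_class_match(cls, EXPECTED), has_class_token(cls, "exp-pdp-size-not-in-stock"))
# True True
# False True
# False True
# False False
```

Padding with spaces keeps the token test from matching a longer class name that merely starts with the same text.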