Hello everyone, I'm trying to use Selenium and Scrapy to scrape some information from https://answers.yahoo.com/dir/index/discover?sid=396545663
I tried different methods. I use Selenium with PhantomJS as the driver. Since this is an infinite-scroll page, I scroll down with this instruction:
elem.send_keys(Keys.PAGE_DOWN)
This simulates pressing the Page Down key. I use it instead of the JavaScript call:
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
because that one "seems" to load fewer elements on the page.
The main problem is: how can I tell when I have reached the bottom of the page? Since it is an infinite-scroll page, I can't know in advance when it ends, and there is no element at the bottom that I could check for.
Right now I use a timed loop, but it feels really clumsy.
Thanks
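A common alternative to a fixed-length timed loop (not the approach taken in the answer below) is to compare the page height before and after each scroll and stop once it has stopped growing for a few rounds in a row. The sketch below keeps the browser out of the logic so it can be tried on its own; `scroll_to_end`, `scroll`, `get_height`, `pause` and `max_idle_rounds` are hypothetical names. It still sleeps briefly after each scroll, so it is a heuristic rather than a guarantee.

```python
import time
from typing import Callable


def scroll_to_end(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    pause: float = 1.0,
    max_idle_rounds: int = 3,
) -> int:
    """Scroll repeatedly until the page height stops growing; return the final height.

    `scroll` and `get_height` are hypothetical callbacks supplied by the caller.
    """
    last_height = get_height()
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        scroll()
        time.sleep(pause)  # give the page time to fetch and render new items
        new_height = get_height()
        if new_height > last_height:
            last_height = new_height
            idle_rounds = 0  # content grew, so keep scrolling
        else:
            idle_rounds += 1  # nothing new this round; stop after a few of these
    return last_height
```

With Selenium, the callbacks could be `lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `lambda: browser.execute_script("return document.body.scrollHeight")`, since `execute_script` returns the value of the script's `return` statement. Requiring several idle rounds in a row guards against a single slow response being mistaken for the end of the list.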
I would look for the "Loading..." indicator instead. After every scroll, wait for it to become visible; if you get a TimeoutException, no loading indicator appeared this time, which means there are no more items to load.
Sample implementation:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `browser` is the WebDriver instance from the question
wait = WebDriverWait(browser, 10)
while True:
    # do the scrolling
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # wait up to 10 seconds for the "Loading..." indicator to show up
        wait.until(EC.visibility_of_element_located((By.XPATH, "//*[. = 'Loading...']")))
    except TimeoutException:
        break  # no more posts were loaded - exit the loop
Not tested. cc by-sa 3.0
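The loop above scrolls again as soon as the indicator becomes visible, while the previous batch may still be loading. One possible refinement, sketched here with hypothetical callback names (`load_all_items`, `scroll`, `loader_appeared`, `loader_gone`), also waits for the indicator to disappear before the next scroll and caps the number of rounds so a misbehaving page cannot loop forever.

```python
from typing import Callable


def load_all_items(
    scroll: Callable[[], None],
    loader_appeared: Callable[[], bool],
    loader_gone: Callable[[], object],
    max_rounds: int = 1000,
) -> int:
    """Scroll until a scroll no longer triggers the loading indicator.

    loader_appeared() should block until the indicator is visible and return
    False on timeout; loader_gone() should block until it is hidden again.
    Returns how many batches were loaded.
    """
    batches = 0
    for _ in range(max_rounds):  # hard cap on the number of scrolls
        scroll()
        if not loader_appeared():
            break  # no indicator after this scroll: assume nothing is left to load
        loader_gone()  # wait for this batch to finish before scrolling again
        batches += 1
    return batches
```

In Selenium terms, `loader_appeared` could wrap the answer's `wait.until(EC.visibility_of_element_located(...))` call and return `False` when it raises `TimeoutException`, and `loader_gone` could call `wait.until(EC.invisibility_of_element_located(...))` with the same locator. Note that the latter raises `TimeoutException` itself if a batch takes longer than the wait's timeout.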