I want to scrape data from a table on a page; the table is the only part of the page I care about. Earlier I was using Mechanize, but I found that some of the data is sometimes missing, especially at the bottom of the table. After some googling, I found that this may be because Mechanize does not handle jQuery/Ajax.
So I switched to Selenium today. How do I wait for one specific table to load completely and then extract all links from that table using Selenium and Python? Waiting for the complete page to load takes some time, so I only want to ensure that the data in the table is loaded. My current code:
    from selenium import webdriver

    driver = webdriver.Firefox()
    for page in range(1, 2):
        driver.get("http://somesite.com/page/" + str(page))
        table = driver.find_element_by_css_selector('div.datatable')
        links = table.find_elements_by_tag_name('a')
        for link in links:
            print(link.text)
Use WebDriverWait to wait until the table is located:
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    ...

    wait = WebDriverWait(driver, 10)  # wait up to 10 seconds
    # presence_of_element_located takes a single (by, value) locator tuple
    table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.datatable')))
This would be an explicit wait.
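Note that the presence of the table element does not by itself guarantee that every row has arrived if the page keeps appending rows via Ajax. One possible way to handle that, sketched below under the assumption that the links are added progressively, is to poll the link count until it stops changing. The helper `wait_until_stable` and its parameters are hypothetical names for illustration, not part of Selenium, and the sketch targets Python 3.

```python
import time


def wait_until_stable(get_count, timeout=10.0, poll_interval=0.5, required_repeats=2):
    """Poll get_count() until it returns the same non-zero value
    required_repeats times in a row, then return that value.

    Raises TimeoutError if the count has not settled within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    last_count = None
    repeats = 0
    while True:
        count = get_count()
        if count > 0 and count == last_count:
            repeats += 1  # same non-zero value as on the previous poll
        else:
            repeats = 1 if count > 0 else 0  # value changed, or nothing loaded yet
        last_count = count
        if repeats >= required_repeats:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError("link count did not settle within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical usage with the driver from the question (not run here):
# n = wait_until_stable(
#     lambda: len(driver.find_elements(By.CSS_SELECTOR, 'div.datatable a')))
```

A stable count is only a heuristic: a slow request can still deliver more rows after the count has settled, so `poll_interval` and `required_repeats` would need tuning for the actual site.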
Alternatively, you can make the driver wait implicitly:
An implicit wait is to tell WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements if they are not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object instance.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.implicitly_wait(10)  # wait up to 10 seconds while trying to locate elements
    for page in range(1, 2):
        driver.get("http://somesite.com/page/" + str(page))
        # find_element(By..., ...) also works on Selenium versions where find_element_by_* was removed
        table = driver.find_element(By.CSS_SELECTOR, 'div.datatable')
        links = table.find_elements(By.TAG_NAME, 'a')
        for link in links:
            print(link.text)