How to Implement Slow Page Scrolling with Python and Selenium

Usually, finding elements on a page with Python and Selenium is not too difficult: you just need to choose the right locator. But on some pages, even the simplest search for an element can fail. One common reason is lazy loading.

Lazy loading

What is it, and how does it affect the search for elements on the page? Usually, when you open a page, all of its elements are loaded immediately and can be found right away. Lazy-loaded pages work differently: elements are added to the page gradually as you scroll down.

I have come across two types of such pages. On some, it is enough to quickly scroll to the end of the page, and all the elements get loaded. On others, this is not enough, because elements are loaded only for the area currently visible on the screen. To load all the elements on such a page, you need to scroll through it slowly, screen by screen.

Scrolling to the bottom of the page

This solution is for the first type of page, and it takes only one action: scrolling to the bottom of the page. We can do this with the JavaScript window.scrollTo() method, passing document.body.scrollHeight (the full height of the page) as the vertical coordinate, and run it through Selenium's execute_script method:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

As a result, you will find yourself at the very bottom of the page and all the elements will immediately begin to load.
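Some pages keep growing after each jump to the bottom (new items are appended, which makes the page taller). A minimal sketch for that case, assuming you want to repeat the same jump until the page height stops changing: the helper name scroll_to_bottom and its pause and max_rounds parameters are hypothetical, not part of Selenium. It accepts any object with an execute_script method, such as a WebDriver instance.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Jump to the bottom of the page, repeating while the page keeps growing.

    Hypothetical helper: `driver` is any object with an execute_script method,
    e.g. a Selenium WebDriver. Returns the last page height that was read.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # The same one-line jump as above: go to the current bottom of the page.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give lazily loaded content a moment to be appended.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was added, so we really are at the bottom.
            break
        last_height = new_height
    return last_height
```

The max_rounds cap is an assumption that keeps the loop from running forever on an endless feed.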

Slowly scroll the entire page

To deal with the second type of page, we first need a page that loads this way. The only public example I managed to find is a page on the Vivino website. Let's take a product page and try to find the element with data-testid="mentions" on it. This element looks like this:

The element we are looking for

If we open this page manually, do not scroll at all, and try to find this element, we will fail. We get the same result when we search for it with Python and Selenium.

The element not found on the page

But if we slowly scroll the whole page and repeat the search, then the element will be found.

Finally we’ve found it

Implement slow scrolling

Now let’s move on to trying to find this element with Python and Selenium.

First, let's try the same approach we used for the first page type. The code looks like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/brewer-clifton-acin-pinot-noir/w/5087212?year=2015&price_id=21431479')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
mentions_text = driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text
print(mentions_text)

Instead of the element's text, this code raises an error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@data-testid="mentions"]"}

So this approach does not work for this page. Instead, we need to make the browser scroll the page gradually, one screen at a time, rather than jumping straight to the end.

To do this, we need to know the total height of the page and the height of the part that is visible on the screen. The first value can be obtained with the following code:

total_page_height = driver.execute_script("return document.body.scrollHeight")

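The second value is the height of the visible area of the page (the viewport). Note that get_window_size() returns the size of the whole browser window, including its toolbars, so it is more accurate to ask the page itself:

browser_window_height = driver.execute_script("return window.innerHeight")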

While scrolling, we also need to know where we currently are on the page. This code returns the current vertical scroll position:

current_position = driver.execute_script('return window.pageYOffset')
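The three values above can also be read in a single round trip to the browser. A minimal sketch, assuming a hypothetical helper named get_scroll_metrics: it uses window.innerHeight (the visible viewport) for the screen height, and it relies on Selenium converting a JavaScript array returned by the script into a Python list.

```python
def get_scroll_metrics(driver):
    """Return (total_page_height, viewport_height, current_position).

    Hypothetical helper: one execute_script call instead of three.
    window.innerHeight is the visible area of the page, unlike the outer
    browser window size, which also counts toolbars and tabs.
    """
    total, viewport, position = driver.execute_script(
        "return [document.body.scrollHeight, window.innerHeight, window.pageYOffset];"
    )
    # pageYOffset can be fractional on some displays; round down to whole pixels.
    return int(total), int(viewport), int(position)
```

With a real driver this would be used as total_page_height, browser_window_height, current_position = get_scroll_metrics(driver).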

Now we can create a loop that will scroll screen after screen until the page ends.

while total_page_height - current_position > browser_window_height:
    # scrollTo takes (x, y): keep x at 0 and move down by one screen
    driver.execute_script(f"window.scrollTo(0, {current_position + browser_window_height});")
    current_position = driver.execute_script('return window.pageYOffset')

We also need a short pause on each iteration so that the elements in the newly visible area have time to start loading.
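To make the loop's arithmetic concrete, here is a pure-Python sketch with no browser involved (scroll_targets is a hypothetical helper) that lists the vertical offsets the loop visits for a given page height and screen height. The last step is clamped because a browser cannot scroll past the total height minus the screen height.

```python
def scroll_targets(total_page_height, viewport_height):
    """List the vertical offsets a screen-by-screen scroll would stop at.

    Each step moves down by one viewport height; the final step is clamped to
    the maximum scroll offset, which is where the browser stops anyway.
    """
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    max_offset = max(total_page_height - viewport_height, 0)
    targets = []
    position = 0
    while position < max_offset:
        position = min(position + viewport_height, max_offset)
        targets.append(position)
    return targets


print(scroll_targets(3000, 800))  # → [800, 1600, 2200]
```

For a 3000-pixel page and an 800-pixel screen, the last stop is 2200, where the bottom edge of the screen meets the bottom of the page; at that point total_page_height - current_position equals the screen height, so the loop ends.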

Putting it all together, the complete code looks like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep


driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/brewer-clifton-acin-pinot-noir/w/5087212?year=2015&price_id=21431479')
driver.implicitly_wait(10)
total_page_height = driver.execute_script("return document.body.scrollHeight")
browser_window_height = driver.execute_script("return window.innerHeight")  # visible area only
current_position = driver.execute_script('return window.pageYOffset')
while total_page_height - current_position > browser_window_height:
    # scrollTo takes (x, y): keep x at 0 and move down by one screen
    driver.execute_script(f"window.scrollTo(0, {current_position + browser_window_height});")
    sleep(1)  # It is necessary here to give it some time to load the content
    new_position = driver.execute_script('return window.pageYOffset')
    if new_position == current_position:
        break  # the page did not move, so we have reached the bottom
    current_position = new_position
mentions_text = driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text
print(mentions_text)
driver.quit()

Running this code prints the text of the element we were looking for.
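The loop above reads the page height only once, so if lazy loading makes the page taller during scrolling, the loop can stop before reaching the new bottom. A minimal sketch of an alternative, assuming a hypothetical helper named slow_scroll: it scrolls down by one viewport with window.scrollBy, pauses, and stops as soon as the scroll position no longer changes, with max_steps as a safety cap. Selenium passes extra execute_script arguments to the script as arguments[0], arguments[1], and so on.

```python
import time


def slow_scroll(driver, pause=1.0, max_steps=200):
    """Scroll down one viewport at a time until the position stops changing.

    Hypothetical helper: it never relies on a height read up front, so rows
    appended by lazy loading while scrolling are scrolled through as well.
    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps:
        position, viewport = driver.execute_script(
            "return [window.pageYOffset, window.innerHeight];"
        )
        # scrollBy is relative to the current position, so no offsets to track.
        driver.execute_script("window.scrollBy(0, arguments[0]);", viewport)
        steps += 1
        time.sleep(pause)  # let the newly visible area start loading
        new_position = driver.execute_script("return window.pageYOffset")
        if new_position == position:
            # The browser could not scroll any further: we are at the bottom.
            break
    return steps
```

With a driver opened as in the full example, you would call slow_scroll(driver) before driver.find_element(...).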

By Eugene Okulik