session.get(url) is not returning all of the data in a page. How do I fix this?

April 26, 2024

I'm using a Python script to extract all the tables from a Confluence page, with requests to fetch the page and BeautifulSoup to parse it.

def fetch_page(session, url):
    """Fetch page content using a requests session for persistent connection."""
    response = session.get(url)
    if response.status_code == 200:
        return response.content
    raise ValueError(f"Failed to fetch the page: {response.status_code}")

I then parse the tables with this function:

def parse_table(content, tag):
    """Parse HTML tables and extract relevant data based on tag."""
    soup = BeautifulSoup(content, 'html.parser')
    tables = soup.find_all('table')
    all_tables_data = []
    ...

For some reason, I can't get the last few tables on the page. The page has about 12 tables, but I only get as far as table 5 or 6, and the rest never appear in my DataFrame. Why is this happening?
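
One way to narrow this down is to count the tables in the raw response before any of the extraction logic runs. The sketch below is only a diagnostic, not a confirmed fix, and the names diagnose_tables and TableCounter are hypothetical helpers that are not part of the original script. It uses only the standard library and takes the bytes returned by fetch_page. If the raw count is already below 12, the missing tables were never in the HTML that session.get(url) received; possible causes (assumptions, not verified for this page) include content that the browser fills in with JavaScript after load, or a response that differs from what a logged-in browser sees. If the raw count is 12 but fewer rows reach the DataFrame, the loss is more likely inside parse_table. A top_level count lower than the total means some tables are nested inside others.

```python
import re
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags and how many of them are not nested in another table."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # current table nesting depth
        self.total = 0       # every <table> start tag seen by the parser
        self.top_level = 0   # tables that are not inside another table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.total += 1
            if self.depth == 0:
                self.top_level += 1
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def diagnose_tables(content):
    """Return table counts for the bytes (or str) returned by fetch_page."""
    if isinstance(content, bytes):
        text = content.decode("utf-8", errors="replace")
    else:
        text = content
    # Plain text search: counts every "<table" in the response, parsed or not.
    raw = len(re.findall(r"<table\b", text, flags=re.IGNORECASE))
    counter = TableCounter()
    counter.feed(text)
    counter.close()
    return {"raw": raw, "parsed": counter.total, "top_level": counter.top_level}
```

Calling diagnose_tables(fetch_page(session, url)) and comparing the three numbers against the 12 tables visible in the browser should show whether the fetch or the parsing step is where the tables go missing.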

