How to check if the links on a page are valid?

Hi, I've got a space that hasn't been updated for a long time, and I need to migrate some of its content to another space. Since the content hasn't been touched for years, I strongly suspect that many of its links are already dead.

Can anybody suggest how I could check the links?

So far, my approach is to get body.view for all pages in the space and extract every link from it. There are far too many links to check manually, so I want to iterate over them with a Python script.
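
A minimal sketch of that approach, assuming the Confluence REST endpoint `/rest/api/content` with the `spaceKey`, `type`, `expand`, `start` and `limit` query parameters (check the REST documentation for your Confluence version). `fetch_json` is a hypothetical helper you would supply yourself (an authenticated GET that returns decoded JSON), which keeps the sketch runnable without a server. Link extraction uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List
from urllib.parse import urlencode


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a (non-empty) href
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    """Return all <a href> values found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def iter_page_bodies(base_url: str, space_key: str,
                     fetch_json: Callable[[str], dict],
                     limit: int = 25) -> Iterator[str]:
    """Yield the body.view HTML of each page in a space, one batch at a time.

    Assumes GET {base_url}/rest/api/content returns JSON with "results" and
    "size", and that each result carries body.view.value when
    expand=body.view is requested. fetch_json is hypothetical and caller-supplied.
    """
    start = 0
    while True:
        query = urlencode({"spaceKey": space_key, "type": "page",
                           "expand": "body.view", "start": start, "limit": limit})
        data = fetch_json(f"{base_url}/rest/api/content?{query}")
        for page in data["results"]:
            yield page["body"]["view"]["value"]
        if data["size"] < limit:  # a short batch means this was the last one
            break
        start += limit


# Usage with a stubbed fetch_json standing in for a real authenticated request
def fake_fetch(url: str) -> dict:
    html = '<p><a href="/display/OLD/Gone">old page</a> <a name="top">anchor</a></p>'
    return {"results": [{"body": {"view": {"value": html}}}], "size": 1}


links = [link
         for body in iter_page_bodies("https://wiki.example.com", "OLD", fake_fetch)
         for link in extract_links(body)]
print(links)  # ['/display/OLD/Gone']
```

Injecting `fetch_json` keeps the paging logic separate from authentication, so the same loop works with `requests`, `urllib3` or anything else that can return parsed JSON.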

import requests
from requests.auth import HTTPDigestAuth

# link, user and password are placeholders set elsewhere in the script
r = requests.get(link, auth=HTTPDigestAuth(user, password))
code = r.status_code
print(code)

My first attempt was to simply use the requests library to get the status codes, but no matter which link I pass, I get a 200 status, even for a page that doesn't exist.
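
One common cause of a blanket 200, and a way to spot it, is illustrated below: if the server answers unauthenticated requests with a redirect to a login page, a client that follows redirects (requests does so by default for GET) ends up reporting the login page's 200. This sketch uses only the standard library and a hypothetical local server (`FakeWiki`) that redirects every path except `/login`; it is not Confluence itself. Turning off redirect following exposes the real 3xx status.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


class FakeWiki(BaseHTTPRequestHandler):
    """Hypothetical server: every path except /login redirects to /login."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"please log in")
        else:
            self.send_response(302)
            self.send_header("Location", "/login")
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx status surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, follow_redirects: bool = True) -> Tuple[int, str]:
    """Return (status code, final URL) for a GET request to url."""
    handlers = [urllib.request.ProxyHandler({})]  # ignore proxy settings for localhost
    if not follow_redirects:
        handlers.append(NoRedirect)
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.url
    except urllib.error.HTTPError as err:
        return err.code, url


server = HTTPServer(("127.0.0.1", 0), FakeWiki)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

followed = fetch_status(base + "/no-such-page")
not_followed = fetch_status(base + "/no-such-page", follow_redirects=False)
server.shutdown()
server.server_close()

print(followed)      # status 200, but the final URL is .../login
print(not_followed)  # status 302: the redirect, not the requested page
```

With requests the same check could be done by inspecting `r.history` or `r.url` after the call, or by passing `allow_redirects=False`.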

Answer accepted

Later I realized that all requests returned 200 because they were being redirected to the authentication (login) page, which itself loads successfully. I updated the code to authenticate properly (Basic auth through urllib3 instead of Digest auth), and now it works as intended. Maybe someone will find it useful for a similar task.

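import getpass
import time

import urllib3

# Browser-like User-Agent sent with every request; adjacent string literals
# are joined, so the header value stays on one line
agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
         "AppleWebKit/537.36 (KHTML, like Gecko) "
         "Chrome/108.0.0.0 Safari/537.36")

user_id = "user_id"  # replace with your own user name
passwd = getpass.getpass("type your password: ")

# One connection pool and one set of headers (Basic auth + User-Agent),
# reused for every link instead of being rebuilt on each call
http = urllib3.PoolManager()
headers = urllib3.make_headers(basic_auth=f"{user_id}:{passwd}",
                               user_agent=agent)


def link_runner_auth(link):
    """GET `link` with Basic auth, print and return its status code (None on error)."""
    # KeyboardInterrupt is deliberately not caught, so Ctrl+C stops the whole run
    try:
        response = http.request("GET", link, headers=headers).status
        print(f"{link} ---- {response}")
    except urllib3.exceptions.HTTPError:
        # connection failures, invalid URLs, too many redirects, ...
        response = None
        print(f"{link} ---- ERROR")
    time.sleep(0.5)  # throttle so the server isn't flooded with requests
    return response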
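
Links pulled out of body.view may be relative (for example `/display/...`), may repeat, and can include `mailto:` or in-page `#anchor` links, so they need some normalizing before being checked. A minimal sketch, assuming `check` is any function that returns a status code (or `None` on failure) for a URL, for instance a wrapper around `link_runner_auth` above; `find_dead_links` and the statuses in the usage example are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


def find_dead_links(links: Iterable[str], base_url: str,
                    check: Callable[[str], Optional[int]]) -> Dict[str, Optional[int]]:
    """Check each distinct http(s) link once and return {url: status} for non-200 results.

    Relative links are resolved against base_url, and #fragments are dropped
    because they don't change which page is fetched.
    """
    dead: Dict[str, Optional[int]] = {}
    seen = set()
    for link in links:
        url, _fragment = urldefrag(urljoin(base_url, link))
        if urlparse(url).scheme not in ("http", "https") or url in seen:
            continue  # skip mailto:, javascript:, etc. and repeats
        seen.add(url)
        status = check(url)  # None stands for "request failed"
        if status != 200:
            dead[url] = status
    return dead


# Usage with made-up statuses standing in for real HTTP requests
fake_status = {"https://wiki.example.com/display/OLD/Home": 200}
dead = find_dead_links(
    ["/display/OLD/Home", "/display/OLD/Gone#section", "mailto:team@example.com",
     "/display/OLD/Home#intro"],
    "https://wiki.example.com",
    lambda url: fake_status.get(url, 404),
)
print(dead)  # {'https://wiki.example.com/display/OLD/Gone': 404}
```

Anything other than 200 is reported, so redirects that the checker doesn't follow would show up as well; relax the condition if that is too strict for your migration.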
