Hi, I've got a space that hasn't been updated in a long time, and I need to migrate some of its contents to another space. Since the content hasn't been touched in years, I strongly suspect that many of the links in it are already dead.
Can anybody suggest what I could try to check the links?
So far my approach is to get body.view from all pages in the space and extract all the links from it (a sketch of that extraction step is below). The problem is that the number of links makes it unfeasible to check them manually, so I want to write a Python script that iterates over them.
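For context, the link-extraction part could look roughly like the sketch below. It assumes the Confluence Server/Data Center REST API (GET /rest/api/content with expand=body.view) and BeautifulSoup; BASE_URL, SPACE_KEY and the credentials are placeholders, not real values.

import requests
from bs4 import BeautifulSoup

# Placeholders -- adjust to your instance, space and credentials.
BASE_URL = "https://confluence.example.com"
SPACE_KEY = "OLDSPACE"
AUTH = ("user_id", "password")

def collect_links(space_key):
    """Yield every href found in body.view of every page in the space."""
    start, limit = 0, 25
    while True:
        resp = requests.get(f"{BASE_URL}/rest/api/content",
                            params={"spaceKey": space_key, "expand": "body.view",
                                    "start": start, "limit": limit},
                            auth=AUTH)
        resp.raise_for_status()
        results = resp.json()["results"]
        for page in results:
            html = page["body"]["view"]["value"]
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                yield a["href"]
        if len(results) < limit:
            break
        start += limit

links = set(collect_links(SPACE_KEY))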
import requests
from requests.auth import HTTPDigestAuth

# First attempt: request each extracted link and print its status code.
# 'link', 'user' and 'password' are defined elsewhere.
r = requests.get(link, auth=HTTPDigestAuth(user, password))
code = r.status_code
print(code)
For the checking part, my first attempt (above) was to simply use the requests library to get the status codes, but it appeared that no matter what link I passed in, I got a 200 status, even when I looked up a non-existent page.
Later I realized that all requests returned 200 because they were being redirected to the authentication page. I updated the code with proper authentication and now it works as intended. Maybe someone will find it useful for a similar task too.
import getpass
import time
import urllib3

agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
         "AppleWebKit/537.36 (KHTML, like Gecko) "
         "Chrome/108.0.0.0 Safari/537.36")
passwd = getpass.getpass("type your password: ")

def link_runner_auth(link):
    """Request a link with basic auth and report its HTTP status code."""
    http = urllib3.PoolManager()
    # Replace 'user_id' with your actual username.
    headers = urllib3.make_headers(basic_auth=f'user_id:{passwd}',
                                   user_agent=agent)
    response = None
    try:
        response = http.request('GET', link, headers=headers).status
        time.sleep(0.5)  # avoid hammering the server
        print(f'{link} ---- {response}')
    except KeyboardInterrupt:
        print("Keyboard interrupt")
    except Exception:
        print(f'{link} ---- ERROR')
    return response
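For completeness, the checker can then be run over the collected links in a simple loop; the links variable below is whatever iterable of URLs you extracted earlier (an assumption, not part of the original script):

# 'links' is assumed to be the set/list of URLs extracted from body.view.
dead = [link for link in links if link_runner_auth(link) != 200]
print(f"{len(dead)} links did not return 200 and probably need attention")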