Hi Folks,
We have a documentation space which is published to our clients, and a private drafting space which we use internally to draft pages prior to publishing. Occasionally, links to the draft space are left in the published pages, which means clients are unable to use them. We also have the occasional dead link that makes it through.
I'd love to be able to generate a report of all links in a given space so that we could scan through it and find the ones that need to be cleaned up. I know you can see this on a page-by-page basis, but we have a large collection of pages and doing it that way would be quite time-consuming. It would be much easier if we could generate a report for the whole space and then search it.
Is there a native way to do this, or an app we could use to make it work?
-Brendan
I'd use the Find and Replace app by Easy Apps to find and modify URLs in one go. I left a comment with a screenshot in this thread.
More importantly...
If I may ask, what method do you use to move content from the drafting space to the client-facing space? Manual copy-paste, or a space sync app?
Hi @brendan_long,
Here are some apps that come to mind:
Links Hierarchy
Space Content Manager
Better Content Archiving
Scroll Documents
Personally, I would go the API route for more control over the kind of link report I generate. Here's a sample Python script:
import requests
from bs4 import BeautifulSoup
import csv

# ==============================
# CONFIGURATION
# ==============================
BASE_URL = "https://your-domain.atlassian.net/wiki"
SPACE_KEY = "SPACEKEY"
EMAIL = "your-email@example.com"
API_TOKEN = "your-api-token"
OUTPUT_FILE = "confluence_space_links.csv"
PAGE_LIMIT = 50

# ==============================
# AUTH & HEADERS
# ==============================
auth = (EMAIL, API_TOKEN)
headers = {
    "Accept": "application/json"
}

# ==============================
# FETCH ALL PAGES IN SPACE
# ==============================
def get_all_pages():
    pages = []
    start = 0
    while True:
        url = f"{BASE_URL}/rest/api/content"
        params = {
            "spaceKey": SPACE_KEY,
            "limit": PAGE_LIMIT,
            "start": start,
            "expand": "body.storage,version"
        }
        response = requests.get(url, headers=headers, auth=auth, params=params)
        response.raise_for_status()
        data = response.json()
        pages.extend(data["results"])
        # Stop when the API no longer advertises a next page of results
        if "_links" not in data or "next" not in data["_links"]:
            break
        start += PAGE_LIMIT
    return pages

# ==============================
# EXTRACT LINKS FROM HTML
# ==============================
def extract_links(html):
    # Note: this picks up <a href> links; Confluence storage format can also
    # represent page links as <ac:link>/<ri:page> elements, which you may want to parse too.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("http"):
            link_type = "External"
        elif href.startswith("/wiki"):
            link_type = "Internal"
        else:
            link_type = "Other"
        links.append((href, link_type))
    return links

# ==============================
# MAIN EXECUTION
# ==============================
def main():
    pages = get_all_pages()
    with open(OUTPUT_FILE, mode="w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([
            "Space",
            "Page Title",
            "Page ID",
            "Page URL",
            "Link URL",
            "Link Type"
        ])
        for page in pages:
            title = page["title"]
            page_id = page["id"]
            page_url = f"{BASE_URL}{page['_links']['webui']}"
            html = page["body"]["storage"]["value"]
            links = extract_links(html)
            for link_url, link_type in links:
                writer.writerow([
                    SPACE_KEY,
                    title,
                    page_id,
                    page_url,
                    link_url,
                    link_type
                ])
    print(f"✅ Report generated: {OUTPUT_FILE}")

if __name__ == "__main__":
    main()
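To zero in on the links that point at your drafting space, you can filter the generated CSV afterwards. A minimal sketch, assuming the drafting space key is DRAFT (replace with yours); Cloud page URLs usually contain /spaces/&lt;KEY&gt;/ or the older /display/&lt;KEY&gt;/ form:

import csv

DRAFT_SPACE_KEY = "DRAFT"  # assumption: replace with your drafting space key

with open("confluence_space_links.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row["Link URL"]
        # Links into a space typically contain /spaces/<KEY>/ or /display/<KEY>/
        if f"/spaces/{DRAFT_SPACE_KEY}/" in url or f"/display/{DRAFT_SPACE_KEY}/" in url:
            print(f'{row["Page Title"]}: {url}')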
👆🏻 Above is sample Python code! Let me know if you need anything else.
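One more thought on the dead links you mentioned: you could extend the report with a rough liveness check on each external URL. A sketch using requests (HEAD first, falling back to GET, since some servers reject HEAD; internal /wiki links would need the same auth as the script above):

import requests

def check_link(url, auth=None, timeout=10):
    # Returns the final HTTP status code, or None if the URL is unreachable.
    try:
        resp = requests.head(url, auth=auth, allow_redirects=True, timeout=timeout)
        if resp.status_code >= 400:
            # Some servers reject HEAD; retry with GET before flagging the link as dead.
            resp = requests.get(url, auth=auth, allow_redirects=True, timeout=timeout)
        return resp.status_code
    except requests.RequestException:
        return None

You could run this over every "External" row in the CSV and flag anything returning None or a 4xx/5xx status.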