How do I scrape pages that require login?

I am trying to crawl wiki pages, but the required authentication prevents me from getting the form field names or anything else. Is there a better way to do this? Thanks!


1 answer


If that page does not allow anonymous access, the web interface requires you to log in, and your web scraper must authenticate as well. The scraper is doing essentially the same thing as a browser: sending an HTTP request for that Confluence page and reading the response.

I suggest using the Confluence REST API instead of scraping the HTML, but even then you need to authenticate.
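As a rough illustration of the REST route, here is a minimal Python sketch that sends HTTP Basic credentials with a request for a single page. It assumes your instance accepts Basic authentication (on Confluence Cloud that would typically be your account email plus an API token rather than a password). The base URL, page ID, and credentials are placeholders, and the function names are my own, not part of any Atlassian library.

```python
import base64
import json
import urllib.request


def build_page_request(base_url, page_id, username, secret):
    """Build an authenticated GET request for one Confluence page.

    base_url, page_id, username and secret are placeholders; substitute
    the values for your own instance.
    """
    # Ask the content resource for the page body in storage format.
    url = f"{base_url.rstrip('/')}/rest/api/content/{page_id}?expand=body.storage"
    # HTTP Basic auth: base64("username:secret") in the Authorization header.
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_page(base_url, page_id, username, secret):
    """Send the request and return the decoded JSON response."""
    request = build_page_request(base_url, page_id, username, secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

You would call it along the lines of `fetch_page("https://wiki.example.com", "12345", "alice", "my-token")`. If the server answers with 401 or 403, the credentials or the page permissions are the first things to check.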
