Digging through your post stats by downloading your community content as JSON and parsing it

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
January 22, 2021

Hey there I'm still relatively new here, but I see that some of you have 1000s of posts, and if you're a nerd like me, you might be interested in statistics like, "How many of my posts were answers to questions, as opposed to comments on blogs or discussions?"

If the fine folks at Atlassian could add some timestamp to the data they publish, we could even answer questions like "How many posts did I do every month?" or "Do I spend more time answering questions at night, or in the morning?"

My community content

So, maybe you're wondering - how do I find this data? Well, if you edit your comunity profile and scroll down to the bottom, you'll see a few links at the bottom, and you'll want to click on "My community content":

Screen Shot 2021-01-22 at 10.53.45 AM.png

So you'll end up with a file named my_community_content.json. JSON just means that it's structured data, but lucky for us, Atlassian has added formatting to make it readable for humans too.

If you open it up in a text editor, you'll see a section called "images", which is a list of links to every image you've uploaded to community.atlassian.com.

The next section is "messages", which contain the subject, text, and "category" of every post you've made. So far the categories I've found are:

      "qanda" : "Questions"
      "blog" : "Articles"
      "forum" : "Discussions"

Can Atlassian please add timestamps? Tags? Product? Views/Likes?

Unfortunately there's no timestamps in the data, but I know Atlassian has the data (it's right there in your Recent Activity), so hopefully they can also add it to this export file? Oh, and what about tags, so we could see how many cloud vs server questions we answer? (Ahem, Cloudies.) It would also be really useful to see what product area we were posting to. Oh, and views and likes? How about everything in Recent Activity? Am I asking too much? :-}

Generic Statistics

Anyways though, until we get timestamps, we can still get generic stats, using some good old Unix command-line tools (these are standard on Mac/Linux, and if you want to play with them on Windows, you could install Cygwin). On my Mac, I opened up the Terminal app and typed the commands below in bold (everything after the $ sign):

$ cd Downloads
Downloads $ grep '"qanda" : "Questions"' my_community_content.json | wc -l
71
Downloads $ grep '"forum" : "Discussions"' my_community_content.json | wc -l
4
Downloads $ grep '"blog" : "Articles"' my_community_content.json | wc -l
5

So what's happening here? Well, first off, I'm changing to the Downloads directory, which for me is the default location for saved files.

Then I'm doing a grep which searches each line of a file for a particular string of text. Then I'm "piping" those search results into wc -l which counts the number of lines that were just found by grep.

So then, my humble stats show that I've posted 71 comments/answers to Questions, 4 comments on Discussions, and 5 comments on Articles.

What else is this good for?

One handy use of of this data might be to answer the question "I posted an answer about XXX but I can't find the question because it was years ago and it's not coming up in search."

Well, now you can just download your content and search through it with the text editor of your choice.

What if I want to convert my posts into a more readable format or do interesting things like graphing data, etc?

Great question! So, if and when Atlassian adds timestamps, views, likes, then we could process the data with tools like jq or python, and even bring that data into Excel/Google Sheets to make neat graphs.

But I'm probably getting ahead of myself. If you would like Atlassian to provide more metadata in the export file with our own posts, please like this post!

Thank you for your time!

7 comments

Comment

Log in or Sign up to comment
Ravi Sagar _Sparxsys_
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
January 22, 2021

Hi @Darryl Lee 

I like your curiosity :) and you made me curious as well, I think my weekend will be good.

Ravi

Like # people like this
Eva Kasiak
Contributor
January 22, 2021

Hi @Darryl Lee these are cool hints! I am gonna give it a try :) need to pimp up my numbers firstly and contribute more as I'm back to community after a break. 

Eva

Like Darryl Lee likes this
Carlos Garcia Navarro
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
January 22, 2021

Very cool, @Darryl Lee  :-)

Tanya C
Contributor
January 23, 2021

@Darryl Lee , This is one of the best postings I've read in a while. I love your writing style and how extremely helpful the information you provided is to me. I have two profiles so want to compare what is done under each one.

Cheers!

Like # people like this
John Funk
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
January 25, 2021

What a great idea!

Dave Liao
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
January 25, 2021

@Darryl Lee - thanks for pulling this together!

For any Windows users out there, first install Windows Subsystem for Linux (WSL), and you can follow along at home inside Windows Terminal. 😉 No need for Cygwin, though Cygwin is a little "lighter" resource-wise.

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
January 25, 2021

Thanks - yeah, it's been a while since I've been on a Windows-system proper. I've heard good things about WSL, and I'd probably install it if I had the chance. But heh, yeah a whole VM just to run grep and wc seems a bit much. :-}

OTOH, having a true *nix (ok fine, BSD) environment on a Mac was/has been a game-changer for somebody coming from a *nix sysadmin background.

Like Dave Liao likes this
Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
February 22, 2021

In case anybody remembers this thread...

So apparently Atlassian Community runs on the Khoros platform.

I posted to their forums and got back exactly... nothing.

I'm guessing as their customer, it would have to be Atlassian (@Daniel Eads?) filing a feature request with Khoros.

Who knows? Maybe it will "gather interest" and we can all vote on it. ;-P

Like # people like this
Daniel Eads
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
February 22, 2021

Khoros doesn't function the same way as Atlassian does when it comes to public-facing feature requests and roadmaps. They have an ideas board where you can post / vote, although I haven't yet seen strong evidence that it factors heavily into their priorities.

Like Dave Liao likes this
Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
February 22, 2021

Thanks for replying, Daniel.

Alas, my Developer account does not have access to that page, since I'm not a customer.Screen Shot 2021-02-22 at 1.31.22 PM.png

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 2, 2021

So I think the Khoros Community makes you wait a few days before making your first post, so I didn't properly pose my question until today:

Can you add metadata to "My community content"? 

And wouldn't you know it? @Stan Gromer (Legendary indeed) from the Khoros community showed me that the answer was here in the Atlassian Community all along! (Insert "It's coming from inside the house!" GIF)

So way back in 2017, @Tyler T posted how to use the Community API, which has a handy dandy SQL-ish language named LiQL that enables you to do searches on your own data (and well, everybody else's too), and download it in the aforementioned JSON format.

OK, let's get to it then. How can we get all of y/our posts? First off, you'll need your Community User Id. Go ahead and hover over your avatar up in the top-right corner, and you should see it show up in the "status bar" at the bottom of your browser. For me, it's 1365322. (You can also click on your avatar and get the Id from the URL, but let's save a click, shall we?)

Then, you simply need to add {YOUR_ID} to this link:

https://community.atlassian.com/api/2.0/search?api.pretty_print=true&q=SELECT%20*%20FROM%20messages%20WHERE%20author.id=%27{YOUR_ID}%27%20LIMIT%201000

So, if you wanted to see all of my posts, this link should work:

https://community.atlassian.com/api/2.0/search?api.pretty_print=true&q=SELECT%20*%20FROM%20messages%20WHERE%20author.id=%271365322%27%20LIMIT%201000

You can then save this output using your browser's "Save Page As..." command and use my Unix command line tricks like above.

Anywho though... let's write take a look at the query. It's actually:

SELECT * FROM messages WHERE author.id='1365322' LIMIT 1000

You can actually append that query (with spaces and all), to this string:

https://community.atlassian.com/api/2.0/search?api.pretty_print=true&q=

Too many posts!

One thing to note is that if you're a prolific poster (hi @William Sheboy :) then you may actually have more than 1000 posts, which is the max that a query can return. So then, as Tyler points out, you'll need to append OFFSET 1000 to the original query.

Having to merge big JSON files is beyond the scope of this article. Also, if Bill wanted to save all of his posts, he'd have to figure out how to keep adding OFFSETS until the JSON returned includes a data.size of < 1000 items. 

Can't we use a smarter query?

I was hoping I could combine an author.id query with the tags that Tyler details in his article to get something like:

SELECT * FROM messages WHERE tags.text = 'cloud' AND author.id='1365322' LIMIT 1000

But weirdly that only brought up my most recent post about 'cloud'. I have a feeling the 1000 message limit kicks in *before* the author.id filter can be applied. Time to do a little more digging into LiQL.

But hey at the very least, for those of you with < 1000 posts, you should be able to grab all your posts in one go. Or, using OFFSET (1000, 2000, 3000), you can just keep downloading your data until you get a response where data.size = 0. 

Coincidentally, according to my JSON file, this is going to be my 200th post. Nice one to commemorate it with.

Maybe I'll follow up with some jq, because now that we've got some *real data*, who doesn't love parsing JSON with command-line tools?

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 2, 2021

"Do I spend more time answering questions at night, or in the morning?"

jq .data.items[].post_time mydata.json | cut -d 'T' -f 2 | cut -d ':' -f 1|sort |uniq -c
 9 00
7 01
2 02
2 08
7 09
15 10
17 11
20 12
14 13
21 14
11 15
13 16
5 17
8 18
1 19
8 20
13 21
17 22
10 23 

(That is is the #s I posted during the hour of each day.)

So it would seem to indicate that I do most of my posting between 10AM-4PM, and then another spurt between 9PM and 12AM.

Alas, with only 201 posts, that's not a lot of data to work with.

Behind the command-line

As previously mentioned, jq is an incredibly powerful command-line tool for parsing JSON files. In my command, I'm specifically extracting the post_time from each of my posts:

jq .data.items[].post_time mydata.json

So what is going on here? Well, if you open up your JSON data with a text editor, you should see a structure like this:

{
...
  "data" : {
    ...
    "items" : [ {
      ...
      "subject" : "So I think the Khoros Community makes you wait a few days...",
      ...
      "post_time" : "2021-03-02T22:10:24.800-08:00",

I've used ... to skip over stuff we don't care about this time around. 

data is the main "container" for all of the items[], which is an array of my messages. Each message contains a post_time.

The jq query extracts post_time for each item in the items[] array, so I end up with:

"2021-03-02T22:10:24.800-08:00"
"2021-03-02T17:05:28.544-08:00"
"2021-03-01T22:37:20.428-08:00"
...

You could use a similar query to just get all the "subjects" of your posts:

jq .data.items[].subject mydata.json
"So I think the Khoros Community makes you wait a few days..."
"Thanks everyone - honored to be in such esteemed company...."
"Oh man almost forgot to include my favorite piece of swag..."

Anywho, once I got the date, it was some Unix command-line plumbing.

The cut command is super-useful. It lets you designate a delimiting character, and then which "field" you want. So Unix lets me take the dates that jq spit out and "pipe" them to another command, in this case, cut. In the first pass, I tell cut that the delimiting character is T, and I want the second field:

cut -d 'T' -f 2 
22:10:24.800-08:00"
17:05:28.544-08:00"
22:37:20.428-08:00"

I can then do another cut, using a delimiting character of ':' and just selecting the first field:

cut -d ':' -f 1
22
17
22

Woot. Now I have every hour of every post I made. Then it's a few more Unix commands, sort (which well, sorts all the lines) and uniq, which finds the unique entries in the list. Adding -c provides a count of each of those entries.

Why would I want to learn this?

Well if you ever wanted to clean up your Jira instance and wanted to know how many of your Workflow Schemes were all variants of "Scrum Workflow Scheme", you could use Bob Swift's CLI to get the list of all schemes and then use Unix commands to sort the list (well, CLI can also do regex for you).

And did you know that Automation Rules are exported in JSON format? What this means is that you could use jq to generate a listing of all your rules, including description, authors, or even extract specific pieces of the rules.

And well, it's good nerdy fun. :-}

Like Bill Sheboy likes this
Tyler T
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 3, 2021

I'm a bit rusty with the Community API, but I tried adding your author ID before tags.text and got 3 results. The order might be important there.

Nice post!

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 3, 2021

Thanks @Tyler T - yeah, I tried flipping them around and I get 2 results regardless of order. (Weird that you're seeing 3.)

Thing is, I'm pretty sure I've posted more than 2 or 3 messages with the cloud tag.

When I discovered "My community content", I hoped it would help me track my quest to earn a Cloudie (answering 50 questions tagged cloud).

I've since earned one...

BUT, it is interesting that the Kudos system shows that I'm only 93% of the way towards answering 30 questions.

Anyways, when I get some time, I'll write a little script to go through my 200something tickets to grab the tags, since annoyingly, they don't show up in the JSON itself, but instead refer me to another search query, scoped for each message:

      "tags" : {
        "query" : "SELECT * FROM tags WHERE messages.id = '1627586'"
      },
Like Tyler T likes this
Tyler T
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 3, 2021

Ok, I think I know what is happening. The content you are trying to query are answers to questions. I believe the tags might only be associated with the question itself (e.g. the top of the thread which is the message with a depth of 0).

My post from 2017 was about queries for top level posts. I think it might be possible to do your type of query but cannot remember how offhand.

The profile page UI is a bit misleading since the 'Contains tag' field only applies to top-level posts - meaning only questions are discussions you started will appear (hence only 2-3 results): https://community.atlassian.com/t5/user/viewprofilepage/user-id/1365322

The query would need to be something like

WHERE parent.tags.text = 'cloud'

But that doesn't work...

The Khoros site might have some updated API docs that could help!

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 3, 2021

Ah, yeah, that makes sense. I can get all the parent message ids like so:

SELECT parent.id FROM messages WHERE author.id='1365322' LIMIT 1000

I think I'll still end up having to make separate calls for each of those parents to grab the tags:

SELECT * FROM tags WHERE messages.id='1628420'

Unless there's some kind of JOIN functionality in LiQL. I didn't see that in the docs, but will keep digging. (As @Daniel Eads pointed out, anyone can sign up as a Khoros Developer, which does get you access.)

Tyler T
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 3, 2021

Seems like you are getting closer to a possible solution. Keep in mind that answers to questions can also have replies - so you'd want to check the post depth. Meaning, a reply to an answer will mean the answer is the parent post, not the topic (main) question.

Darryl Lee
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 3, 2021

Ah yes, it's topic.id that I want. And also, I figured I should filter only for messages where is_answer == true.

So then... yikes, it looks like I've only posted 41 Answers to topics with the tag of "cloud".

I'll spare everyone from the over-explanation of curl, bash for loops, etc, and just leave this here:

cat /dev/null > taglist.txt
for x in `curl -s 'https://community.atlassian.com/api/2.0/search?q=SELECT%20topic.id,is_answer%20FROM%20messages%20WHERE%20author.id=%271365322%27%20LIMIT%201000' | jq '.data.items[] | select(.is_answer == true)| .topic.id' |sed -e "s/\"/'/g"`; do echo $x >> taglist.txt; curl -s "https://community.atlassian.com/api/2.0/search?q=SELECT%20*%20FROM%20tags%20WHERE%20messages.id=$x" >> taglist.txt; echo >> taglist.txt;  done

Basically I end up with a file called taglist.txt with a separate JSON block for each the set of tags for each topic I've answered. I can then use good old grep to see which tags matched:

$ grep '"cloud"' taglist.txt |wc -l
      41
$ grep '"server"' taglist.txt |wc -l
       7

Ok, now I need to answer 9 more cloud topics to rightfully earn this guy on my desk.

TAGS
AUG Leaders

Atlassian Community Events