Values received for summary field are not UTF-8

Boris Karl Schlein March 6, 2024

Hi, I just started writing a new Python script called "Tiny Jira Exporter" (see https://github.com/bks07/tiny-jira-exporter) and have run into a problem I cannot find an answer to.

My code looks like this:

self._issues = self._jira.search_issues(jql_query, maxResults=max_results)
...
for issue in self._issues:
    summary = issue.fields.summary

However, whenever the summary contains special characters such as German umlauts, they are not handled correctly.

I also tried the following:

import chardet
character_set = chardet.detect(summary.encode())
encoded_string = summary.encode(character_set["encoding"])
return_string = encoded_string.decode("utf-8")

And also this:

character_set = chardet.detect(summary.encode())
encoded_string = summary.encode(character_set["encoding"])
return_string = bytes(summary, character_set["encoding"]).decode("utf-8")

Neither approach works, and I cannot find an answer elsewhere.

Best, Boris
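A note on why the two attempts above cannot help, assuming Python 3 and the `jira` package: `issue.fields.summary` is a `str`, i.e. already-decoded Unicode text. `summary.encode()` with no argument always produces UTF-8 bytes, so encoding and then decoding as UTF-8 is at best a no-op; if the encoding step uses any other codec, decoding the result as UTF-8 can fail. The sketch below uses only the standard library (fixed codec names stand in for chardet's guess), and the summary text and the `round_trip` helper are hypothetical.

```python
# Hypothetical summary containing German umlauts; in Python 3 this is a str,
# i.e. already-decoded Unicode text.
summary = "Größe der Übersicht prüfen"


def round_trip(text: str, codec: str) -> str:
    """Encode with `codec`, then decode as UTF-8, like the attempts above."""
    return text.encode(codec).decode("utf-8")


# str.encode() defaults to UTF-8, so a UTF-8 round trip returns the same text.
assert summary.encode() == summary.encode("utf-8")
print(round_trip(summary, "utf-8") == summary)  # True

# If the encoding step uses another codec (here latin-1), the bytes are no
# longer valid UTF-8, and decoding them raises UnicodeDecodeError.
try:
    round_trip(summary, "latin-1")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

In other words, there is nothing to convert while the text is held as a `str`; an encoding only comes into play when the text is turned into bytes, for example when it is written to a file.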

1 answer

Answer accepted
Boris Karl Schlein March 7, 2024

I solved the riddle. The strings were displayed correctly when I printed them with print(); the error only occurred when writing them to a CSV file with the pandas library.

import pandas as pd
data_frame = pd.DataFrame.from_dict(data)
data_frame.to_csv(location, index=False, sep=";", encoding="latin-1")
I had to set the encoding to "latin-1" instead of "utf-8".
Writing the CSV file as UTF-8 and making sure all incoming strings were UTF-8 did not work. Now I no longer convert the strings coming from Jira at all; I simply write the CSV file with the "latin-1" encoding.
It feels a bit dirty, since I got a decoding error when I encoded the issue summary and decoded it as shown in my code snippets above, but it works now. Still, I would prefer it if all strings coming from Jira were UTF-8.
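Latin-1 works here because the German umlauts and ß lie inside the Latin-1 range, but it cannot represent every character a Jira summary may contain, such as the euro sign € or curly quotes. Because the summary is already a Unicode `str`, the only place an encoding matters is the CSV file itself. One hedged alternative, assuming the file is opened in a program that only recognizes UTF-8 when the file starts with a byte order mark (Excel on Windows commonly behaves this way; the post does not say which program was used), is Python's `utf-8-sig` codec. The sketch below swaps pandas for the standard-library `csv` module so it stays self-contained; the rows, file names, and the `write_csv` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows standing in for exported Jira issues.
rows = [
    {"key": "DEMO-1", "summary": "Größe der Übersicht prüfen"},
    {"key": "DEMO-2", "summary": "Preis in € anzeigen"},
]


def write_csv(path: Path, encoding: str) -> None:
    """Write rows as a semicolon-separated CSV using the given encoding."""
    with path.open("w", newline="", encoding=encoding) as handle:
        writer = csv.DictWriter(handle, fieldnames=["key", "summary"], delimiter=";")
        writer.writeheader()
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())

# latin-1 can store the umlauts but not the euro sign, so this write fails.
try:
    write_csv(out_dir / "issues_latin1.csv", "latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 failed on:", exc.object[exc.start:exc.end])

# utf-8-sig writes UTF-8 prefixed with a byte order mark (BOM).
utf8_path = out_dir / "issues_utf8.csv"
write_csv(utf8_path, "utf-8-sig")
print(utf8_path.read_bytes()[:3])  # b'\xef\xbb\xbf'
```

With pandas, the same codec name can be passed straight to `to_csv`, e.g. `data_frame.to_csv(location, index=False, sep=";", encoding="utf-8-sig")`, which keeps every character representable instead of limiting the export to the Latin-1 range.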

Deployment type: Cloud
Product plan: Free
Permissions level: Site Admin
Tags: AUG Leaders
