
Garbled text display in "Log View"

claudio-silva November 20, 2012

In Sourcetree's Log View, in the datagrid with the columns "Graph", "Description", "Commit", "Author" and "Date", commit messages and author names that contain accented Unicode characters display as garbled text with strange characters.

But on the panel below, which displays the selected commit's full message, text appears correct.

Other applications display the text correctly; only Sourcetree has this problem.

On Mac OS X 10.8.2

2 answers

1 accepted

0 votes
Answer accepted
stevestreeting
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
November 20, 2012

SourceTree supports UTF everywhere, and accented characters (and Japanese, in fact) are very common. Is it possible the encoding used here is something else, perhaps one of the single-byte Latin encodings (extended ASCII) from another platform?

claudio-silva November 20, 2012

Hello Steve, thank you for your answer.

The commit messages are written in Portuguese, and encoded as UTF-8.

The messages have been written on SourceTree itself, or on SmartGit. It doesn't matter. If I write accented characters on a commit message in SourceTree, for instance, and then commit, they will appear garbled on the Log View.

All other applications (including command-line git) correctly show the messages.

Even Sourcetree shows the text correctly in several places, except in the Log View's main datagrid, as explained above.

I include below, in this message, a screenshot that may help you understand what I'm talking about.

I created a test message having several accented characters (in this case: ÁèíçããâêÇ) to make a more obvious example.

As you can see, the selected commit message appears correct on the commit details pane, but appears garbled on the log list.
All other messages also display the same problem.

The Author names also appear garbled.

I have highlighted some areas of the application's interface where the problems are clearly visible.

I hope this helps to diagnose the problem.

stevestreeting
November 21, 2012

I just copied & pasted those characters in your comment above into a commit in SourceTree and it worked fine for me:

stevestreeting
November 21, 2012

Are you able to give us a copy of the repo to investigate? You can do it privately at https://support.atlassian.com if you like.

claudio-silva November 21, 2012

Great! That means there is hope!

Have you any idea of what could be done to investigate this issue further?

The problem is, the way things are, I will have to give up using SourceTree, for it's too unpleasant to see a Log View of garbled messages.

Are you sure this isn't a bug in SourceTree that only happens on certain repo configurations? SmartGit and the command line Git show the log fine.

What can I do?

Thanks

claudio-silva November 21, 2012

Unfortunately, the repo contains a private company project which I cannot divulge. But thank you for your offer!

The problem also occurs on other repos, so it's not just a weird accident with a specific repo.

I decided to investigate the issue further, and after many tests, I discovered the source of the error:

The problem starts with a commit whose author name has accented characters that are encoded in a specific way.

When displaying a log where such a commit exists (even if just one), SourceTree displays all commit messages and author names with an incorrect encoding (it probably starts using the encoding of the offending commit).

I exported 2 patches to demonstrate the problem:

In the first patch excerpt, you may see the author name (Cláudio Silva) encoded as UTF-8
(this is just one line from the patch file):

From: =?UTF-8?q?Cla=CC=81udio=20Silva?= <claudio.silva@impactwave.com>

The accented a is encoded as 3 bytes: a plain "a" followed by 0xCC 0x81, which is the UTF-8 form of the combining acute accent (U+0301). This commit causes no problems in SourceTree.

Now, here's the author name from an offending (error inducing) commit:

From: =?UTF-8?q?Cl=E1udio=20Silva?= <claudio.silva@impactwave.com>

The accented a is encoded as a single byte (0xE1). That matches the character's Unicode code point, U+00E1, and its ISO-8859-1 byte, but it is NOT valid UTF-8, which encodes U+00E1 as the two bytes 0xC3 0xA1 (see this: Unicode Character 'LATIN SMALL LETTER A WITH ACUTE' (U+00E1)).

All it takes is just one commit with an author name encoded like this to make SourceTree go mad!...
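For anyone who wants to check the bytes themselves, here is a minimal Python sketch of the difference between the two headers quoted above. It only uses the two payload strings from the patches; reading the second one as ISO-8859-1 is an assumption about where the 0xE1 byte came from, not something the patch itself states.

```python
import quopri
import unicodedata

# The two encoded-word payloads quoted in the patch headers above.
good_payload = b"Cla=CC=81udio=20Silva"
bad_payload = b"Cl=E1udio=20Silva"

# Undo the quoted-printable ("q") transfer encoding to get the raw bytes.
good_bytes = quopri.decodestring(good_payload, header=True)
bad_bytes = quopri.decodestring(bad_payload, header=True)

# The first name is valid UTF-8: "a" followed by a combining accent.
good_name = good_bytes.decode("utf-8")
print([unicodedata.name(ch) for ch in good_name[2:4]])

# NFC normalization composes that pair into the single code point U+00E1,
# whose UTF-8 form is two bytes (0xC3 0xA1), not one.
composed = unicodedata.normalize("NFC", good_name)
print(composed.encode("utf-8"))

# The second name is not valid UTF-8: 0xE1 announces a three-byte sequence,
# but the bytes that follow it are plain ASCII.
try:
    bad_bytes.decode("utf-8")
    bad_is_utf8 = True
except UnicodeDecodeError:
    bad_is_utf8 = False
print(bad_is_utf8)

# ASSUMPTION: the byte came from an ISO-8859-1 (Latin-1) source.
# Read that way, it decodes cleanly to the intended name.
latin1_name = bad_bytes.decode("latin-1")
print(latin1_name == composed)
```

So both headers are labelled UTF-8, but only the first one actually contains UTF-8 bytes.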

Nevertheless, command line Git and SmartGit display the logs just fine, and are unaffected by this. At most, the incorrectly encoded characters may appear garbled, but all other text appears fine.

The problematic commit was probably created on another application (perhaps on Windows, with msysgit or with SmartGit, I don't know).
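If it helps anyone locate such commits in their own repositories, the following is a rough sketch of a scan, not a tested tool. The function names `find_bad_records` and `scan_repository` are made up for this example, and it assumes that `git log` passes the stored bytes through unchanged for commits that carry no encoding header.

```python
import subprocess
from typing import List


def is_valid_utf8(raw: bytes) -> bool:
    """Return True when the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_bad_records(log_output: bytes) -> List[str]:
    """Return hashes of commits whose author or subject is not valid UTF-8.

    Expects one record per line: <hash> NUL <author> NUL <subject>.
    """
    bad = []
    for record in log_output.split(b"\n"):
        if not record:
            continue  # skip the empty piece after the final newline
        commit_hash, _, rest = record.partition(b"\x00")
        if not is_valid_utf8(rest):
            bad.append(commit_hash.decode("ascii"))
    return bad


def scan_repository(repo_path: str) -> List[str]:
    """Run git log on repo_path (hypothetical helper) and scan its output."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H%x00%an%x00%s"],
        check=True,
        capture_output=True,
    ).stdout
    return find_bad_records(output)
```

Calling `scan_repository(".")` inside a working copy should list the hashes worth inspecting. It only looks at author names and subject lines, so a bad byte in a message body would go unnoticed.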

So, in conclusion, I believe making SourceTree able to handle incorrectly encoded strings without going nuts would be a nice enhancement to the software.

May I suggest bringing up this issue to the development team?

Best regards.

stevestreeting
November 22, 2012

Would you mind attaching the patch that reproduces this either here (as an attachment rather than inline, the encoding seems to have been lost) or against https://jira.atlassian.com/browse/SRCTREE-1285 ? That will make it easier to make sure we test this case directly.

stevestreeting
November 22, 2012

OK, that makes sense, thanks for the detailed analysis. I've seen this problem once before in fact, and the issue is the way that Cocoa deals with character encoding: if one character in the stream fails UTF decoding, it refuses to decode the entire stream as UTF and falls back on a simpler encoding (which then breaks the other extended UTF characters). It doesn't appear to be possible to tell it to skip the offending characters. SourceTree loads the log in bulk for performance reasons, which is why this problem can leak across up to 200 lines when it occurs.

I'm guessing that SmartGit works because Java is more tolerant of bad encoding. Command-line git is fine because it does one line at a time.
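To illustrate the failure mode described above, here is a small Python sketch. It is not SourceTree's code and does not use Cocoa. The Latin-1 fallback in `decode_bulk` is an assumption standing in for whatever simpler encoding is really used, and both function names are invented for the example.

```python
from typing import List


def decode_bulk(raw: bytes) -> List[str]:
    """Decode a whole batch at once, all-or-nothing."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ASSUMPTION: fall back to a single-byte encoding for the entire batch.
        text = raw.decode("latin-1")
    return text.split("\n")


def decode_per_line(raw: bytes) -> List[str]:
    """Decode each line on its own, replacing only the invalid bytes."""
    return [line.decode("utf-8", errors="replace") for line in raw.split(b"\n")]


# Three log lines: two valid UTF-8 messages around one badly encoded author name.
raw = b"\n".join([
    "Correção de acentuação".encode("utf-8"),
    b"Cl\xe1udio Silva",  # the lone 0xE1 byte from the offending commit
    "ÁèíçããâêÇ".encode("utf-8"),
])

for line in decode_bulk(raw):
    print("bulk:    ", line)
for line in decode_per_line(raw):
    print("per line:", line)
```

In the bulk version the single bad byte drags every valid line into the fallback encoding, which matches the symptom in the Log View. The per-line version confines the damage to the offending line, at the cost of one decode call per line.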

I've tried to find a workaround for this in the past and not managed it (without horribly killing performance), but I'll try again. The one case this happened in before became a non-issue because it faded into history really fast, but obviously this is more of a problem for you - the problem will go away eventually once that commit drops out of the first 200 lines in the log (after that it won't make the decoding fail for the entire first batch). We'll track it here: https://jira.atlassian.com/browse/SRCTREE-1285

claudio-silva November 22, 2012

Done!

Thank you very much for looking into this issue.

Best regards.

0 votes
nmb_isep October 4, 2013

Hi, I think I have the same exact problem. Any news related to this matter?
