In Sourcetree's Log View, on the datagrid with columns "Graph", "Description", "Commit", "Author" and "Date", commit messages and author names containing accented Unicode characters display as garbled text.
But on the panel below, which displays the selected commit's full message, text appears correct.
Other applications display the text correctly; only Sourcetree has this problem.
On Mac OS X 10.8.2
Hello Steve, thank you for your answer.
The commit messages are written in Portuguese, and encoded as UTF-8.
The messages have been written on SourceTree itself, or on SmartGit. It doesn't matter. If I write accented characters on a commit message in SourceTree, for instance, and then commit, they will appear garbled on the Log View.
All other applications (including command-line git) correctly show the messages.
Even Sourcetree shows the text correctly in several places, except on the Log View's main datagrid, as explained above.
I include below a screenshot that may help you understand what I'm talking about.
I created a test message having several accented characters (in this case: ÁèíçããâêÇ) to make a more obvious example.
As you can see, the selected commit message appears correct on the commit details pane, but appears garbled on the log list.
All other messages also display the same problem.
The Author names also appear garbled.
I have highlighted some areas of the application's interface where the problems are clearly visible.
I hope this helps to diagnose the problem.
Great! That means there is hope!
Have you any idea of what could be done to investigate this issue further?
The problem is, the way things are, I will have to give up using SourceTree, for it's too unpleasant to see a Log View of garbled messages.
Are you sure this isn't a bug in SourceTree that only happens on certain repo configurations? SmartGit and the command line Git show the log fine.
What can I do?
Unfortunately, the repo contains a private company project which I cannot divulge. But thank you for your offer!
The problem also occurs on other repos, so it's not just a weird accident with a specific repo.
I decided to investigate the issue further, and after many tests, I discovered the source of the error:
The problem starts with a commit whose author name has accented characters that are encoded in a specific way.
When displaying a log where such a commit exists (even if just one), SourceTree displays all commit messages and author names with an incorrect encoding (it probably starts using the encoding of the offending commit).
I exported 2 patches to demonstrate the problem:
In the first patch excerpt, you may see the author name (Cláudio Silva) encoded as UTF-8
(this is just one line from the patch file):
From: =?UTF-8?q?Cla=CC=81udio=20Silva?= <email@example.com>
The accented A is encoded as 3 bytes (a, xCC, x81): a plain "a" followed by the UTF-8 encoding of the combining acute accent (U+0301). This commit causes no problems on SourceTree.
Now, here's the author name from an offending (error inducing) commit:
From: =?UTF-8?q?Cl=E1udio=20Silva?= <firstname.lastname@example.org>
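The accented A is encoded as just 1 byte (xE1). This matches the character's Unicode code point, but a lone xE1 byte is NOT valid UTF-8, where this character would be the two bytes xC3 xA1 (see this: Unicode Character 'LATIN SMALL LETTER A WITH ACUTE' (U+00E1)). It looks like the ISO-8859-1 (Latin-1) encoding, mislabeled as UTF-8.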
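To make the difference between the two patch lines concrete, here is a small Python sketch that decodes the quoted-printable payloads from the two From: headers and checks whether the resulting bytes are valid UTF-8. The helper name is_valid_utf8 is made up for this illustration, and the guess that the bad name is Latin-1 is an assumption, not something the patch states.

```python
import quopri
import unicodedata

# Quoted-printable payloads copied from the two "From:" lines above.
good = quopri.decodestring(b"Cla=CC=81udio=20Silva")  # 'a' + 0xCC 0x81
bad = quopri.decodestring(b"Cl=E1udio=20Silva")       # lone 0xE1 byte


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The first name is valid UTF-8: 'a' followed by U+0301 COMBINING ACUTE ACCENT.
good_text = good.decode("utf-8")
# NFC normalization composes the pair into the single code point U+00E1.
good_nfc = unicodedata.normalize("NFC", good_text)

# The second name is not valid UTF-8. Assumption: it is really Latin-1,
# where byte 0xE1 maps directly to U+00E1.
bad_as_latin1 = bad.decode("latin-1")

# What the character looks like when it is properly encoded as UTF-8.
proper_utf8 = "\u00e1".encode("utf-8")  # two bytes: 0xC3 0xA1

if __name__ == "__main__":
    print(is_valid_utf8(good), is_valid_utf8(bad))
    print(good_nfc == bad_as_latin1)
    print(proper_utf8)
```

If the Latin-1 assumption holds, both names spell the same text once decoded with the right encoding; only the second one is mislabeled in the header.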
All it takes is just one commit with an author name encoded like this to make SourceTree go mad!...
Nevertheless, command line Git and SmartGit display the logs just fine, and are unaffected by this. At most, the incorrectly encoded characters may appear garbled, but all other text appears fine.
The problematic commit was probably created on another application (perhaps on Windows, with msysgit or with SmartGit, I don't know).
So, in conclusion, I believe that making SourceTree able to handle incorrectly encoded strings without going nuts would be a nice enhancement to the software.
May I suggest bringing up this issue to the development team?
OK, that makes sense, thanks for the detailed analysis. I've seen this problem once before in fact, and the issue is the way that Cocoa deals with character encoding - basically if one character in the stream fails UTF decoding, it refuses to decode the entire stream as UTF, meaning it falls back on a simpler encoding (which then breaks the other extended UTF characters). It doesn't appear to be possible to tell it to skip the offending characters. SourceTree loads the log in bulk for performance reasons which is why this problem can leak across up to 200 lines when it occurs.
I'm guessing that SmartGit works because Java is more tolerant of bad encoding. Command-line git is fine because it does one line at a time.
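The behaviour described above can be imitated in a few lines of Python. This is only an illustration of the general idea (one bad byte poisoning a whole batch), not Cocoa's or SourceTree's actual code. The function names are invented, and Latin-1 as the "simpler encoding" is an assumption made for the demo.

```python
from typing import List

FALLBACK = "latin-1"  # assumption: stands in for the "simpler encoding"


def decode_bulk(raw: bytes) -> List[str]:
    """Decode the whole batch at once; one bad byte sends everything to the fallback."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(FALLBACK)
    return text.split("\n")


def decode_per_line(raw: bytes) -> List[str]:
    """Decode line by line, so a bad line only affects itself."""
    lines = []
    for chunk in raw.split(b"\n"):
        try:
            lines.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            lines.append(chunk.decode(FALLBACK))
    return lines


def decode_lenient(raw: bytes) -> List[str]:
    """Decode in bulk, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace").split("\n")


# A tiny stand-in for a log batch: two valid UTF-8 lines around one bad author name.
log = b"\n".join([
    "Teste: \u00c1\u00e8\u00ed\u00e7\u00e3\u00e3\u00e2\u00ea\u00c7".encode("utf-8"),
    b"Cl\xe1udio Silva",
    "Corre\u00e7\u00e3o".encode("utf-8"),
])

if __name__ == "__main__":
    print(decode_bulk(log))      # every accented line comes out garbled
    print(decode_per_line(log))  # all three lines readable
    print(decode_lenient(log))   # only the bad byte is replaced
```

Per-line decoding trades some speed for isolation, which matches the performance concern mentioned above. The lenient variant keeps the bulk decode but relies on the decoder being able to substitute bad bytes, which the reply says does not appear to be possible with Cocoa.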
I've tried to find a workaround for this in the past and not managed it (without horribly killing performance), but I'll try again. The one case this happened in before became a non-issue because it faded into history really fast, but obviously this is more of a problem for you - the problem will go away eventually once that commit drops out of the first 200 lines in the log (after that it won't make the decoding fail for the entire first batch). We'll track it here: https://jira.atlassian.com/browse/SRCTREE-1285