Setting up Postgresql database for Confluence

Thomas Hsieh March 26, 2018

Hi,

I'm new to Postgresql and Confluence, and I'm trying to setup the database (I have the on-premise server version) by following this guide:

https://confluence.atlassian.com/doc/database-setup-for-postgresql-173244522.html

At step 2.2, it states that "Collation must also be set to utf8", but how exactly do I do that? I'm assuming it's the LC_COLLATE option, and I've tried 'UTF-8', 'UTF8', 'en_US.utf8', and 'en_US.UTF-8', none of which worked.

Any help is appreciated, thanks!

3 answers

1 accepted

14 votes
Answer accepted
Dave Chevell
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 26, 2018

Postgres's default database template should already have these values set correctly, so you should be able to issue a "normal" create DB statement with encoding specified and get the correct collation out of the box:

CREATE DATABASE confluencedb WITH OWNER connie ENCODING 'UTF-8';

 If you need to specify these manually at creatino time, you should be able to specify them like so:

CREATE DATABASE confluencedb WITH OWNER connie ENCODING 'UTF-8' LC_COLLATE='en_US.utf8' LC_CTYPE='en_US.utf8'

However if you're having to specify these and not getting it set "out of the box", my guess is the host system of your Postgres server has its locale information set to something explicitly different.  If you have the locales package installed, running the locale command should tell you what LC_CTYPE and LC_COLLATE are set to by default in your host environment.

Thomas Hsieh March 27, 2018

Hi Dave,

Thank you for the quick response!

I forgot to mention that I'm using Windows. I just added a tag.

I used 

SHOW LC_COLLATE;

to check the locale, and I got 'English_United States.1252'. I've also tried 'English_United States.utf8' but also got the same error invalid locale name (this is the same error I got for all values listed in OP). In this case, is it ok that I use default LC_COLLATE and LC_CTYPE values? Or is there something wrong with my database setting?

Like huang haizhou likes this
Igor_ Zhuravljov April 23, 2019

Hello!
The main language of my Confluence is Russian. Most of the content is in Russian, but many English words are also used. It is necessary that the search works correctly with two languages.
Will it work out?
What values ​​should I set to LC_COLLATE = and LC_CTYPE =?
I apologize if I could not understand the previous answers.

Metin Savignano
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
January 18, 2023

May I suggest that @atlassian integrates the above answer in their documentation?

It would be very useful to be able to copy and paste the database creation command for Confluence from the installation docs. It's like that for Jira, so why not for Confluence, too?

Like IT IT likes this
2 votes
Team Verwaltung
I'm New Here
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
March 12, 2019

Weird, 

we have this DB settings:

     Name     |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   

--------------+----------+----------+-------------+-------------+-----------------------

 confluencedb | postgres | UTF8     | en_US.utf8  | en_US.utf8  | 

And Confluence says:

The database collation 'en_US.utf8' is not supported by Confluence. You need to use 'utf-8'.

Cheers,

Dirk

M Hoogenboom
Contributor
April 10, 2019

I'm having the exact same issue when trying to upgrade confluence.  I'm not new to Confluence and Postgresql, and I think this is somehow incorrect. Confluence won't start saying the collation is incorrect and should be utf-8 and not en_US.utf8. I cannot create a database with that collation. Stuck...

Have you figured it out, Dirk or someone else?

Else I'll submit a ticket.

Like Martin Herren likes this
M Hoogenboom
Contributor
April 10, 2019

hmm, already done by someone else: https://jira.atlassian.com/browse/CONFSERVER-58052.  Maybe just move to an older version...

Like Martin Herren likes this
Martin Herren October 8, 2019

Still not fixed, bug still present when upgrading to 7.0.2

Like Daniel Santaella likes this
1 vote
Mike Fornefett September 22, 2023

This is tested with Debian 12.

You can check with locale -a if your locale is present in your system. If not 

locale-gen en_US.UTF-8 

should create one. (In my case, de_DE.UTF-8). Alternatively, 

dpkg-reconfigure locales

uses a small graphical interface to add locales. (In generation it is defined with '-': en_US.UTF-8, but in display it says: 'en_US.utf8')

Check again with locale -a and restart your database:

sudo service postgresql restart

Now you may add your database with desired locales:

CREATE DATABASE confluencedb OWNER confluenceuser ENCODING 'UTF-8' LC_COLLATE='de_DE.utf8' LC_CTYPE='de_DE.utf8' TEMPLATE template0;

which should result in a desired database:

 Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
--------------+----------------+----------+------------+------------+-----------------------+---------+------------+--------------------------------------------
confluencedb | confluenceuser | UTF8 | de_DE.utf8 | de_DE.utf8 | | 8409 kB | pg_default |
postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 | | 8561 kB | pg_default | default administrative connection database
template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +| 8409 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +| 8561 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(4 rows)

Any *.utf8 will store its data in UTF8 encoding, but what is the difference between the locale in the first position? Here, e.g. the difference between "C.utf8" and "en_US.utf8":

The differences are in:
int_curr_symbol
currency_symbol
mon_decimal_point
mon_thousands_sep
mon_grouping
negative_sign
int_frac_digits
frac_digits
p_cs_precedes
p_sep_by_space
n_cs_precedes
n_sep_by_space
p_sign_posn
n_sign_posn
crncystr
thousands_sep
grouping
d_t_fmt
d_fmt
t_fmt
yesexpr
noexpr

Found here: https://community.hpe.com/t5/operating-system-hp-ux/difference-between-c-utf8-and-en-us-utf8-points/td-p/4418194 

Here are also some more detailed explanations regarding the locale which can be configured during the database cluster generation using initdb:

https://www.postgresql.org/docs/current/locale.html 

See: 24.1.2. Behavior

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

Suggest an answer

Log in or Sign up to answer
TAGS
AUG Leaders

Atlassian Community Events