Postgres@CERN

PostgreSQL at the university: some lessons learned

Romuald THION
Université Claude Bernard Lyon 1

Friday January 17th 2020

Introduction

Whoami

Romuald THION

https://perso.liris.cnrs.fr/rthion/dokuwiki/

https://github.com/romulusFR

Why this talk?

What PostgreSQL have done for us:

  • offered a pro-grade full-featured DBMS to our students
  • with minimal effort for teachers
  • with new opportunities for classes
  • and some fun to practice DBA

Why (not) PostgreSQL ?

We switched from Oracle to PostgreSQL in 2018

  • PRO: “Oracle is a must have on a CV”
  • CON: “It’s bul..hit”

Loved DB by stackoverflow Dreaded DB by stackoverflow

Why (not) PostgreSQL ?

PostgreSQL's cliché :

  1. Lack of features, of tools, of support
  2. RDBMS has no future, see NoSQL (back in 2010)
  3. Old school, made for system geeks

Feedback: not true (with some reserve on #3)

PostgreSQL at the university

An unusual setting

A “production” PostgreSQL server in a university is quite atypical (from a DBA perspective)

  • Light availability requirement (lab sessions)
  • Data can (must) be lost!
  • Users are “real” rookie developers, not software

Feedback: Students in CS are “lame power users”

The main instance

  • HAProxy: 1 vcpu, 1 GB
  • PostgreSQL (x2): 2 vcpu, 4GB, 250 GB HDD
  • Other servers behind the reverse-proxy
    • Blazegraph (triplestore/RDF)
    • MongoDB

Feedback: size does not matter

The main instance

postgres@~=# select count(*) from pg_user ;
--  count 
-- -------
--    427
postgres@~=# select count(*) from pg_database;
--  count 
-- -------
--    429
postgres@~=# select sum(pg_database_size(datname))/10^6 AS total_m
from pg_database ;
--  total_m 
-- --------------
--   7776.486359

Feedback: a small instance with many dbs and users

The main instance

select   datname, tup_inserted/10^6 AS tup_ins_M,
         tup_deleted/10^3 AS tup_del_K,
         pg_database_size(datname)/10^6 as db_size_M
from     pg_stat_database
order by db_size_M DESC LIMIT 5;

--   datname   | tup_ins_m | tup_del_k | db_size_m  
-- ------------+-----------+-----------+------------
--  pg11809287 |  3.192844 |     1.726 | 403.378847
--  pg11711637 | 24.509523 |      7.13 | 373.961375
--  pedago     |  0.928755 |     0.186 | 267.743903
--  pg11709217 |    2.5619 |     9.165 | 264.598175
--  pg11511220 |  3.816837 |     2.813 | 257.053343

Feedback: no one tried to fill the volume!

Our DBA tasks

  • deployment and base configuration
  • add some datasets
    • configure FDW to have access accross
  • user management
    • create account and DB for students and teachers
    • give connection information and support
    • (many) updates of enrolments
  • add ram, restart instance, add ram, diagnose errors, add ram

Feedback: Oracle was a pain i. t.. a..

Feedback: Users’ accounts is a key point.

Lab sessions with PostgreSQL

Lab sessions with PostgreSQL

  • (3rd year) advanced databases and foundations

    • modelling, normalization, functional dependencies
    • intermediate SQL (1992 style)
    • views, transactions/isolation, indexing
    • triggers, pgplsql programming

feedback: transition was almost transparent BUT the some labs still need extra refactoring

Lab sessions with PostgreSQL

feedback: no issue at all, stopped installing Oracle as it was bad advertisement for the product

Lab sessions with PostgreSQL

Other smaller classes and labs

Feedback: documentation is GREAT, COPY is fast, extensions are cool!

Conclusion

Challenges for students (and teachers)

Proficient use of psql is still a challenge

  • no use of .psqlrc or .pgpass
  • no use of \x
  • \setenv PSQL_EDITOR 'nano'
  • \setenv PAGER 'less -S'
  • forget about \set HISTFILE or \set PROMPT1

Feedback: we asked the IT to install DBeaver…

Challenges for students (and teachers)

How to tame the schema beast?

  • SET search_path TO “$user”,public;
  • ALTER ROLE role_name SET search_path TO …;
  • ALTER DATABASE SET search_path TO …;
  • GRANT … ON SCHEMA schema_name;
  • GRANT … ON ALL TABLES IN SCHEMA schema_name;

Feedback: good practice: delete the public schema?

I’m a rookie DBA

Some (classical?) pitfalls:

  • default timeout in HAProxy is 50000ms
  • One of the student is named Dolores O’Riordan
  • Extensions are not instance-wide
  • GRANT USAGE ON FOREIGN DATA WRAPPER … TO
  • Default volume ($PGDATA) was filled, thanks rsync!

Feedback: nothing difficult, but the experience is worth sharing with academics

Next moves

More proficient DBA

  • fully automated deploiement with Ansible
  • monitoring
  • robust and idempotent user creation script

Feedback: need some time, but it’s worth it and it gonna be fun

Next moves

“A lot has changed since SQL-92” (Markus Winand)

Feedback: @l_avrot put pressure on me to do so!

Next moves

PostgreSQL features and extensions

Feedback: IMHO a convincing way to promote PostgreSQL!

Thanks for your attention

Feel free to ask questions!

Keep in touch :

romuald.thion [at] univ-lyon1.fr