This notebook shows how to access a PostgreSQL database when using Python.
Before beginning you will need access to a PostgreSQL database. PostgreSQL is a powerful, open source, object-relational database system. It is a multi-user database management system and has sophisticated features such as Multi-Version Concurrency Control, point in time recovery, and more. To learn more, see the PostgreSQL website. You can find PostgreSQL databases on IBM Cloud.
When dealing with large data sets (for example 50 GB) that potentially exceed the memory of your machine (RAM), it is nice to have another possibility such as an PostgreSQL database, where you can query the data in smaller digestible chunks. In this way, you just query data in smaller chunks (for instance 2 GB), and leave resources for the computation.
Psycopg2 is a driver for interacting with PostgreSQL from the Python scripting language. It provides to efficiently perform the full range of SQL operations against Postgres databases. Run the commands below to install and import the psycopg2 library:
!pip install psycopg2 --user
Requirement already satisfied: psycopg2 in /home/dsxuser/.local/lib/python3.5/site-packages
import psycopg2
import sys
/home/dsxuser/.local/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>. """)
Connecting to PostgreSQL database requires the following information:
All of this information must be captured in a connection string in a subsequent step. Provide the PostgreSQL connection information as shown:
#Enter the values for you database connection
dsn_database = "<database name>" # for example "compose"
dsn_hostname = "<your host name>" # for example "aws-us-east-1-portal.4.dblayer.com"
dsn_port = "<port>" # for example 11101
dsn_uid = "<your user id>" # for example "admin"
dsn_pwd = "<your password>" # for example "xxx"
Set up a connection as follows. If a connection cannot be made an exception will be raised. conn.cursor will return a cursor object and you can use this cursor to perform queries:
try:
conn_string = "host="+dsn_hostname+" port="+dsn_port+" dbname="+dsn_database+" user="+dsn_uid+" password="+dsn_pwd
# print("Connecting to database\n ->%s" % (conn_string))
conn=psycopg2.connect(conn_string)
print("Connected!\n")
except:
print("Unable to connect to the database.")
Connected!
The next step is to define a cursor to work with. It is important to note that Python/Psycopg cursors are not cursors as defined by PostgreSQL. Given the cursor, we can execute a query, for example, to retrieve the list of databases:
cursor = conn.cursor()
cursor.execute("""SELECT datname from pg_database""")
rows = cursor.fetchall()
We can iterate through rows to print the results:
print("\nShow me the databases:\n")
for row in rows:
print (" ", row[0])
Show me the databases: template1 template0 qhrlhjub dmdrgfkh hitkyrbb csjlrsrn gfcgpbmp pesqbntq isxflxan tciwytqw vbrbbqlz stnrbwbg ibcvrkdp kiczavdw erpkviqq hxxplgil mebyyfhn kvgcuugn ewxkbgbp ivhbftfp rzvrywdz qyhtpwrs noeoupse kyqhfqcc sslblurk zmizwogy efhpfvlz uyqpffth lmejkkfe lznlqbfm lozpcqvf cavssykz rqecnirg wvnjdpsh pwoxdake hrrqoeos lugcppzo xgolmari qlvrizxz izrvffgj slxnyrdh ueoioxlc vnepifvr urnyivcc aonrmntz fxaoigui cawrlsrj wgbikraq mmwydbpk drgjjnfu ghmvhxze bgpquqda fhqzrwzm alfqrejg rusdhhne phmbeawd gykauqjs rqhvdogj uinoaufa ajyubwtg brkezaqp vgxrztpn eocdzgqy ywgamtjc cxtbqzsi kpdcplxi daqougbn apidueru iqvstdjv hwlhtgmj ffnvoqxq zgbdrbpx wxasfubd agqzxrhf arkobzfr rnfaviiu eotqkzqc thowaxjw xuwbyrcu zvteqxdg fmusgkac mrkzanaw cvsxqqik eczfboas uannzbrp eeyqjswi rlnvjcrt htojcpxc xlgfxzpa kevjmcci wtnngzzy eukjokgu lukwngfo swjbgnne aqnoalsn xrgpqaee vvkjxjce wqnxbvue sugmvulo xesfggxt wbjosdcs bxvxhuzd jbrfcxmo osbllhtf sjmpquxu zmhmgryi kpgqisrf cidphtrv mmlsheyh pzvroxkc vnqplaqv jydfifna psozcgtf djsvwjso rebbldly rtyhkusj gtycpzhb zmqvypza jseujvly uqevsijv zifsplin bklkzwor njemvrea jvarnbqk suacduhj plsywcyh gzuqtcde uewybdsq talowkrn lnrwhxil usfaumdi ulynbjne xnfwrrzg pylfyxob abfhtxwo adplbzzy vdjwumlk mxevywmm zmwdfizx qzyjvuuu baixneqs dtpqreqt nshqrqsq mubjnhbl vmaehgdd juwbceby icvngilq xaphdmrh wywkzbmv zyiloiwq ikbqpefn ryumfutv deyypaan sddrzodz ycawmmmg xqkkhyrw qcbdtivh nyrorfst oabuzjaj okkzvzrq eiuvknpp pvzsajwe jwejbecs goorltcp gfrtmrok djkmwsvr pkavvdyc ymirfbiy shqmugsw coidjgzx gubxyfrc tjpznvoq bimeqbhn dtfflqvw vjclmwcz ilgpkcmm cugazknm xnofvrde jznbowrn lvnlitex ivfsjhci gjkxykrw xualkfdi cqurkqpy syysnrli cdycdlzt daahxehl xkbuiicw dnuedsjt tfhgspmk emdydllw hgykefuh qoajdlgb foxnhnyv pomkjwcu ieetjcal aeeznhvh uvgnogzl oomjvmbn mcizkwfw zcohvqvb ahhstkve jjxshbho dosicsln ituazben jtqofgvj kchbjdmh ddetmpst kgcqkfkn iwakllce fjrrtvlu hwzhmlvr cfqdfryi ahrbthol hthikklx htuqnllo qettkgxb tomabpfm fsrpqjdz gwwznybg znotmfkl qbhnixgt nzejatev ozqttcsr xagugyio hqbiafrk fmpzifzj erkbwcvp dugqqeri uzjmyglm nxxpkqam zvfotvrr lhodvydh pnbwgmtz srmsqutq alassxvj psejmcwd ejsikwmm uhqcrbcy odmrfxwt pzwgyfhz tyohgxra owlxzqtr wfboykzl mhrtcldh qlrijdwx iavtixom ddboqhjz ourvrkmr nkuzaajo odosfuwm hshvgphi oppiilnh gexiyalg wvpfuoal srsbyqek lmgpuizh bvstzqwi kdyryvhx xqmmnkvt uhlarqfq iuwttqak dqewvkev hpzpkgie zmisznzo eqbhskih zltduspr hwbnjqih gkzoocfc gonyifps jtcwnvsq lysrxmfk cqibtjve plulysxj lniyndps rssxbyjz hvyowaci fqojclxm dcoybxhh jutxpcqm pxgihbdm ueccfpfh inybustq btabsbju yjrluvts xcrttaqs imeunlnn mndzapdu kivehpow ckthmpvd dhjthadc rlndulbm udnxyrra pacmyvfb clsoabgs uejrmani povvlvyc lhecwuop ojmcfyxx vyahmkia dbdcoikv juaygglg lwdyhaak fzgxjiam kekmqjfo agxbpgkp kihwczpm lqgcmndx tzamegeb zpibqeko pgsvbxpu bhiwziur zjuvirhf ovdbteok hgidvavr tpvuvhuw ukbvxkjw pspciyxe dyincwoe tqslnrou ghgracsa ppgfpscc dtvjxeoc pwrcudmf tkmhfmrs gkbnbvjm yalvnnjj weqqugvi qverhlqa aktpjyft vsjrwtri jknvamdk krdrnzhg xbsitwul imunpnoy wdtzemyf jgnywghm jqzbrdob blobsdld oavpwktz lidycthz wefafrpi nrlevuec btcvuutt auenemcv qkchorpa yjeswaua yrklnnwp ozhzgnav fmdtzegf vbtkdjct wvimgloi mkqtrywq tbxvpebo mbfzkynp lftdevvf xnhcnhvm vbgugdsl xjtnvqad ccenrwio ulddbhjf bkreylrm rgbrhsxp hznswkyd tknoofwt jzdwiyck fxpasjaz ivnzcrjj ykvacnel ryovfden hfkjknqr ridgwzrf gygwedht elffptep sunyehsl hopffcay hcpqhdjn fewwjbym gmauabyr vyuqqdgt edwlugaz ngujjnrj afmjbtbu ztvbnuga cbkfbcbj iujmxiwy hinhkdhf mfqdfvgo jmnxskwu fcwpzjah lqjokehz nkdujexw flilnlej tcdhfigh qvbxzcqn eqrporzz zijakdmp woxasqjd ogaelsuy esukypst lxmmcjep kccboyww nacmafva ugditiki fkdqneqx xuwsgoqp qxrxgyzv ntoctzpi owoueamg uubmfzgd ufptzocr kpwojbav gvujhzsj ltftsfhe xpqkqhzh dnjhyjhr fmtclkcc bamldkzw duwjmwbt hvqmopjz wulszrlw rtmimmwy xuljkkwj jkhfwsli ukpfnaao blkwfqbj jfgxlqbe pcfrubaw znekxxhe zelrmmyw tkrzrikr fmhzrsvh izyqryky rjxwqrzt exzljibf cojdaojy yxcmekrt tciotcqn dashcebf rqqpekgy rounieet kmroychy pouvxrvb vidctpyx illutqiz tldtdcio mvezakwj vqqvqetv tkdeevgx pbmuopzk jqizwkbm elgstsao kpdebhgz jbiglheo lhsmlqhv venyyesr iadpzulh zwhowimt wrmrirpx kawwhbug hbktzoux dtvpiwxn rkjpgrxg ljxzrycr ngzjudna fnvsjyxq ijwwxjnw kxsyghym pbwrpthe orzhxbbj hixymryt myxwpnwn bxkbstxb hmzruslc cllqnuvl qmclxdnu oqdgplnp eopbmmzy wocuxyiw rojkwpfy cgevlmcf axtbsexe jsltrscz nqbharqa nqclzicc dkghuhjc haefmpof vgkyyxpi bcfeehdm qphyejbu uxeenczv iraqatwm nufdkzfl skmwpptz reldigbn feinvsxr oohbveku pojmczdw xgkghymi ewsztltq vkoiihzv jtndytqp knlfnmxj woofufga bfctbzrz pagrjpzj mcoabnrt citiglpf nvrczbof mefpzjfw zqqaikoe efgqztzj qhxywnjg friozyjo yayblmsk yhfodwbg itkrrdmk pbpljfze acnokrhk vzrzloym jxochzqk radcldap xgmmmytc sitiublq lphoqorg denjzshl bcgnoxbs ehthyvwp nbkteiik vayeiwzm ojvofsul mlmzbyhf nipbtdcx uwiclcyr yaxbsxhz wiwcvvht vfqqghrb postgres bxtufnek xbsjqbvv amxekblj lbfymghh hzvxiwbx kmezddbi noqddglp uojhfltr bsmdtcly wbargjod vetgxzoh xhzdqofi qavyilms xahcntyb xkdsuhmk ctfawdcp uhgtmjih qtjdokji qirccbht qqywdkfb xjzgvkia vevxdnux evbnklxn zrwvlvgv bixzkdnm ddbokrfa xhmaylyv lqziaxfa alghleug entogblq zdostxxj vqfpvwkz rvuzyeoq zlroflbh gadcjcul guyljdfw xxofepig gvvoyrku gldroiki gusleauu ngujflci uudgeysk hplxilrv hfasnidr kdozbvxa rdptdjsc jxrqtqqg mtahntnt vlquqgao fqlbcdry lewiezzp nokhxztc jrxykzwj hdhlyazj iuxtztqk wsozuavo vbsyxyle icrtqqyj qdtalkhc vzargzzf axzjnlzu jnvyonyu fqmiufge izdtfmxe ihemdvjq yfylvsaf iryrxcoy lhjpofdk nvusqouu ogckctik nmqcxequ axhumlnr lqhdtczw mrsrruwm asfiyvae cnelylqs vzlewhbn rjwvzokq pkehgyha rufxyqwn upjonqqo mzqelczd kmlhnjsx bjlkxsvy kdfykiob cztfnuzv upiwmape enniypgu zybgxqfo dpmlbtyr vrmdvqja emnpclqt xztyhmyt ruloofty zbxunmmo yaegylrw whyvhbjd kigxjalj iewdhjos osoffbjh cupkssjs jnqstahx tjcnxzee
Create a test table named Cars. The code below drops the Cars table if it already exists, and then creates the new table:
cursor.execute("DROP TABLE IF EXISTS Cars")
cursor.execute("CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)")
Run the following commands to create records in the new Cars table:
cursor.execute("INSERT INTO Cars VALUES(1,'Audi',52642)")
cursor.execute("INSERT INTO Cars VALUES(2,'Mercedes',57127)")
cursor.execute("INSERT INTO Cars VALUES(3,'Skoda',9000)")
cursor.execute("INSERT INTO Cars VALUES(4,'Volvo',29000)")
cursor.execute("INSERT INTO Cars VALUES(5,'Bentley',350000)")
cursor.execute("INSERT INTO Cars VALUES(6,'Citroen',21000)")
cursor.execute("INSERT INTO Cars VALUES(7,'Hummer',41400)")
cursor.execute("INSERT INTO Cars VALUES(8,'Volkswagen',21600)")
conn.commit()
The following Python code fetches and displays records from the Cars table:
cursor.execute("""SELECT * from Cars""")
rows = cursor.fetchall()
You can display the records neatly using pretty print:
print("\nShow me the databases:\n")
import pprint
pprint.pprint(rows)
Show me the databases: [(1, 'Audi', 52642), (2, 'Mercedes', 57127), (3, 'Skoda', 9000), (4, 'Volvo', 29000), (5, 'Bentley', 350000), (6, 'Citroen', 21000), (7, 'Hummer', 41400), (8, 'Volkswagen', 21600)]
Use a loop to show each row:
for row in rows:
print(" Number=", row[0] ," Name=", row[1]," Price", row[2])
Number= 1 Name= Audi Price 52642 Number= 2 Name= Mercedes Price 57127 Number= 3 Name= Skoda Price 9000 Number= 4 Name= Volvo Price 29000 Number= 5 Name= Bentley Price 350000 Number= 6 Name= Citroen Price 21000 Number= 7 Name= Hummer Price 41400 Number= 8 Name= Volkswagen Price 21600
Export data using copy_to() methods.
fout = open('cars.csv', 'w')
cursor.copy_to(fout, 'cars', sep=",")
Similarly, import data using copy_from() methods:
f = open('cars.csv', 'r')
cursor.copy_from(f, 'cars', sep=",")
conn.commit()
It is good practice to close your database connection after work is done:
conn.close()
This notebook demonstrated how to establish a connection to a PostgreSQL database from Python using the psycopg2 library.
Saeed Aghabozorgi, PhD, is a Data Scientist in IBM with a track record of developing enterprise-level applications that substantially increases clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large data sets.
Polong Lin is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through Big Data University. Polong is a regular speaker in conferences and meetups, and holds an M.Sc. in Cognitive Psychology.
Copyright © 2017, 2018. Cognitive Class. This notebook and its source code are released under the terms of the MIT License.