
Best way to update the content of large lookups tables

October 9

In our application we have about 20 tables that contain lookup information (read-only data that the application doesn't modify).

Nightly we'd like to update this information by deleting all the content and running a script that runs a series of inserts and updates. This script could be a couple of million lines long.

I was thinking of running this within a transaction.

What behaviour should I expect from the database? Will the tables be updated at the end of the transaction without downtime?

Is there something I should be aware of when running large transactions?

The database I'm using is Postgres 9.3.

Thanks.

Answers

Assuming that the script contains only INSERTs, UPDATEs and DELETEs to the target table(s), and that the app never attempts row locks (SELECT ... FOR UPDATE/SHARE) or DML on those tables, then the script shouldn't affect the app except for the increased load on the DB server.

When the script commits, the changes will instantly become visible to new statements (if the transaction is in READ COMMITTED isolation) or transactions (if they're in SERIALIZABLE isolation).
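The plan described in the question can be sketched as a single transaction; the table and column names here are placeholders, not from the original post:

```sql
BEGIN;

-- Other sessions keep seeing the old rows until COMMIT (MVCC),
-- so there is no window where the table appears empty.
DELETE FROM country_lookup;

INSERT INTO country_lookup (code, name) VALUES ('AU', 'Australia');
-- ... millions more INSERTs/UPDATEs ...

COMMIT;  -- the new contents become visible atomically
```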

However...

Nightly we'd like to update this information by deleting all the content and running a script that run a series of inserts and updates. This script could be a couple of million lines long.

You should use COPY for this instead of a script full of inserts and updates, if possible.

Just:

  • DELETE FROM mytable;
  • VACUUM mytable;
  • COPY mytable FROM 'inputfile.csv' WITH (FORMAT CSV)

or similar.

It will be a lot faster. Because of the next point, though, you might want to split the load into several smaller files.
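Wrapped in a transaction, the COPY-based reload might look like the following sketch (table name and file path are placeholders; note that server-side COPY needs an absolute path readable by the postgres server process, and VACUUM cannot run inside a transaction block):

```sql
BEGIN;
DELETE FROM country_lookup;
COPY country_lookup FROM '/srv/import/inputfile.csv' WITH (FORMAT CSV);
COMMIT;

-- VACUUM must run outside the transaction block
VACUUM country_lookup;
```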

Is there something I should be aware of when running large transactions?

Long-running data-modifying transactions prevent VACUUM from cleaning up dead rows in other tables that those transactions might still be able to "see". This can create quite a lot of table bloat.

If you use READ COMMITTED isolation and stick to individual statements that don't take an extremely long time you won't have a problem, because the snapshot that controls which rows the transaction can see gets updated after each statement.

It's mostly a concern if you're using cursors, working in SERIALIZABLE isolation, or running extremely long-running individual data modifying statements.

If you have foreign key constraints from your data tables that point to any of these 20 lookup tables, you will not be able to bulk delete from them without handling those constraints.
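One way to work around this, assuming you can alter the constraint (the table and constraint names below are illustrative), is to make the foreign key deferrable so it is only checked at COMMIT, after the reload has restored the referenced rows:

```sql
-- Recreate the FK as deferrable (one-time schema change)
ALTER TABLE orders
    DROP CONSTRAINT orders_country_fk;
ALTER TABLE orders
    ADD CONSTRAINT orders_country_fk
        FOREIGN KEY (country_code) REFERENCES country_lookup (code)
        DEFERRABLE INITIALLY IMMEDIATE;

-- Nightly reload
BEGIN;
SET CONSTRAINTS orders_country_fk DEFERRED;
DELETE FROM country_lookup;
COPY country_lookup FROM '/srv/import/inputfile.csv' WITH (FORMAT CSV);
COMMIT;  -- the FK is checked here, once the new rows are in place
```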

If the delete and reload are not wrapped in a single transaction, the application will not receive any results if it reads a lookup table after the deletion and before the new values are inserted. Will the application crash if this happens?

Deleting and loading millions of rows will blow out the size of your write-ahead log (the pg_xlog directory on 9.3). Ensure you have enough disk space to handle this, and that WAL archiving, if you use it, can keep up.

I would like to suggest a better architecture. Load the new data into a set of staging tables, one for each lookup table, with the same columns and data types. Then:

  • Run verification against the staging tables, if required.
  • Perform an "upsert" (a.k.a. merge) from each staging table to its corresponding lookup table.
  • Do this in batches of a few thousand rows at a time, each in an explicit transaction. The ideal batch size will depend on your system and requirements.
  • Once complete, truncate the staging tables ready for the next run.

This affords good restart capabilities in the case of an error and may limit log growth, depending on your setup.
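On 9.3 there is no INSERT ... ON CONFLICT (that arrived in 9.5), so the staging-to-live "upsert" can be written as an UPDATE of changed rows followed by an INSERT of missing ones. A minimal sketch, with placeholder table and column names:

```sql
BEGIN;

-- Update rows that already exist in the live table and have changed
UPDATE country_lookup AS t
SET    name = s.name
FROM   country_lookup_staging AS s
WHERE  t.code = s.code
AND    t.name IS DISTINCT FROM s.name;

-- Insert rows present only in staging
INSERT INTO country_lookup (code, name)
SELECT s.code, s.name
FROM   country_lookup_staging AS s
WHERE  NOT EXISTS (
    SELECT 1 FROM country_lookup AS t WHERE t.code = s.code
);

COMMIT;

TRUNCATE country_lookup_staging;  -- ready for the next run
```

To batch this as suggested, you could restrict each pass to a key range (or a LIMITed set of staged keys) and commit per batch.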
