Reading XML files into the database

This came up as a question on IRC: how to read an XML file on the server into a table column?

One solution, albeit far from ideal, would be:

create or replace function xml_import(filename text)
  returns xml
  volatile
  language plpgsql as
$f$
  declare
    content bytea;
    loid oid;
    lfd integer;
    lsize integer;
  begin
    loid := lo_import(filename);
    lfd := lo_open(loid,262144);
    lsize := lo_lseek(lfd,0,2);
    perform lo_lseek(lfd,0,0);
    content := loread(lfd,lsize);
    perform lo_close(lfd);
    perform lo_unlink(loid);
    return xmlparse(document convert_from(content,'UTF8'));
  end;
$f$;

This obviously only works when called as superuser; it’s also not safe to make it security-definer without also coding in a lot of restrictions on the filename. A possible refinement would be to examine the file content to try and determine the actual encoding, rather than hardcoding that as UTF-8.

EDIT:
Changing the return type to bytea and the last line of the function body to

    return content;

allows the code to work for binary files instead.

This entry was posted in PlanetPG, Postgres. Bookmark the permalink.

3 Responses to Reading XML files into the database

  1. Andrew says:

    something is very broken with this blog software; it took me two days (two days!) to find a way to post that code fragment without having the whitespace eaten by the editor.

  2. gregj says:

    I thought that XML by default should always be utf8, and other encodings are enforced/nonstandart.

  3. Andrew says:

    <?xml encoding="iso-8859-1">

Comments are closed.