PostgreSQL is broken #45

beanumber · 2016-06-10T04:37:55Z

One problem is that the dbWriteTable method from the RPostgreSQL package will not actually accept the HEADER flag (https://github.com/cran/RPostgreSQL/blob/master/R/PostgreSQLSupport.R#L570)
This breaks the current implementation.
Possible workarounds:

use the data.frame method -- but that will be slow and memory inefficient
don't write the header row in the CSV if you are using Postgres. Kludgy.
wait for @hadley et al to get the alternative RPostgres package working. This will probably be the long-term solution, but I don't know how long it will take.

I believe this problem applies to all etl packages that use the dbWriteTable character method.

The text was updated successfully, but these errors were encountered:

homerhanumat · 2017-12-08T16:52:36Z

Do you know if package RPostgres is now up to the task? I'm planning to create an airlines DB and would like to use PostgreSQL if possible.

beanumber · 2017-12-08T18:08:37Z

@homerhanumat I haven't tried it in a long time, but I know that package is under active development. Can you try it out and report back?

homerhanumat · 2017-12-08T18:15:42Z

Will do. Thanks, by the way, for MDSR -- very impressive text.

homerhanumat · 2017-12-08T21:35:39Z

Might be a while before I get to try it. My remote Ubuntu-based server installs things fine but has only about 30G disk space to devote to the DB. My laptop is a Mac with plenty of space but RPostgres installation fails on Mac at the moment: r-dbi/RPostgres#155.

nicholasjhorton · 2017-12-08T21:44:30Z

Can you test with a single years worth of data? All the best, Nick

…

On Dec 8, 2017, at 4:35 PM, Homer White ***@***.***> wrote: Might be a while before I get to try it. My remote Ubuntu-based server installs things fine but has only about 30G disk space to devote to the DB. My laptop is a Mac with plenty of space but RPostgres installation fails on Mac at the moment: r-dbi/RPostgres#155. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

homerhanumat · 2017-12-08T22:04:56Z

2015 through the problematic 2016 -- sure, I can give ti a try.

homerhanumat · 2017-12-08T23:37:26Z

I tried this:

library(airlines)
db <- src_postgres(RPostgres::Postgres(), user = "homer",
                password = "XX",
                host = "localhost",
                dbname = "airlines",
                port = 5432)
airlines <- etl("airlines", db, dir = "~/dumps/airlines")
airlines %>%
  etl_init() %>%
  etl_update(years = 2015)

Got an error saying that it could not find the initialization script.

So I tried again with an explicit reference to the script in the package:

library(airlines)
db <- src_postgres(RPostgres::Postgres(), user = "homer",
                password = "XX",
                host = "localhost",
                dbname = "airlines",
                port = 5432)
script <- system.file("sql/init.postgres", package = "airlines")
airlines <- etl("airlines", db, dir = "~/dumps/airlines")
airlines %>%
  etl_init(script = script) %>%
  etl_update(years = 2015)

Got two errors, the second of which was a complaint that it could not parse the first commented line of the script.

I copied the script into my home directory, deleted the comment-lines, and tried again:

library(airlines)
db <- src_postgres(RPostgres::Postgres(), user = "homer",
                password = "XX",
                host = "localhost",
                dbname = "airlines",
                port = 5432)
script <- "/home/homer/init.postgres"
airlines <- etl("airlines", db, dir = "~/dumps/airlines")
airlines %>%
  etl_init(script = script) %>%
  etl_update(years = 2015)

Now I'm down to just the one error, which is:

Running SQL script at /home/homer/init.postgres
trying URL 'http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_UNIQUE_CARRIERS'
Content type 'Application/X-Unknown' length 49465 bytes (48 KB)
==================================================
downloaded 48 KB

Parsed with column specification:
cols(
  Code = col_character(),
  Description = col_character()
)
trying URL 'https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat'
Content type 'text/plain; charset=utf-8' length 1068028 bytes (1.0 MB)
==================================================
downloaded 1.0 MB

Parsed with column specification:
cols(
  X1 = col_integer(),
  X2 = col_character(),
  X3 = col_character(),
  X4 = col_character(),
  X5 = col_character(),
  X6 = col_character(),
  X7 = col_double(),
  X8 = col_double(),
  X9 = col_integer(),
  X10 = col_double(),
  X11 = col_character(),
  X12 = col_character(),
  X13 = col_character(),
  X14 = col_character()
)
Warning: 322 parsing failures.
row # A tibble: 5 x 5 col     row   col           expected actual                                          file expected   <int> <chr>              <chr>  <chr>                                         <chr> actual 1   329    X2 delimiter or quote      C '/home/homer/dumps/airlines/raw/airports.dat' file 2   329    X2 delimiter or quote        '/home/homer/dumps/airlines/raw/airports.dat' row 3   666    X2 delimiter or quote      S '/home/homer/dumps/airlines/raw/airports.dat' col 4   666    X2 delimiter or quote        '/home/homer/dumps/airlines/raw/airports.dat' expected 5  1310    X2 delimiter or quote      M '/home/homer/dumps/airlines/raw/airports.dat'
... ................. ... ..................................................................................... ........ ..................................................................................... ...... ..................................................................................... .... ....... [... truncated]
Error in filter_impl(.data, quo) : found duplicated column name: NA
In addition: Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)

Thoughts?

homerhanumat · 2017-12-10T03:58:05Z

The above error occurs in the extract phase and is unrelated to choice of database. Perhaps the structure of airports.dat has changed during the past year.

homerhanumat · 2017-12-10T05:39:54Z

Yes, some raw data sets have changed, some URLs have changed, and some dplyr underscore functions no longer work. I'll submit a pull request if I can get everything working.

homerhanumat · 2017-12-11T03:54:19Z

So I have updated the package (here)to account for changes in location and structure of source files and in dplyr as well. It now works again with the RMySQL driver. It fails using RPostgres, with an error that suggests that the RPostgres method for DBI::dbWriteTable() attempts to write the header row of a flights csv file into the flights table, just as with RPostgreSQL.

homerhanumat · 2017-12-12T05:11:34Z

Latest update to my fork of airlines permits creation and updating of a Postgres DB, loading the processed flights csv file to a PostgresDB with a direct sql COPY table FROM command instead of going through dbWriteTable(). Tested on 2015 and the first three month of 2016, with no parsing problems or other errors.

Monthly updates (both in MySQL and Postgres) conclude with a warning:

Missing column names filled in: 'X110' [110]

This occurs because the flight csv files have a spurious trailing comma on the first line, so no harm is done.

At this point one should not build the connection with RPostgres::Postgres(), as dplyr does not yet support it as a back-end.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PostgreSQL is broken #45

PostgreSQL is broken #45

beanumber commented Jun 10, 2016

homerhanumat commented Dec 8, 2017

beanumber commented Dec 8, 2017

homerhanumat commented Dec 8, 2017

homerhanumat commented Dec 8, 2017

nicholasjhorton commented Dec 8, 2017 via email

homerhanumat commented Dec 8, 2017

homerhanumat commented Dec 8, 2017 •

edited

Loading

homerhanumat commented Dec 10, 2017

homerhanumat commented Dec 10, 2017

homerhanumat commented Dec 11, 2017

homerhanumat commented Dec 12, 2017 •

edited

Loading

PostgreSQL is broken #45

PostgreSQL is broken #45

Comments

beanumber commented Jun 10, 2016

homerhanumat commented Dec 8, 2017

beanumber commented Dec 8, 2017

homerhanumat commented Dec 8, 2017

homerhanumat commented Dec 8, 2017

nicholasjhorton commented Dec 8, 2017 via email

homerhanumat commented Dec 8, 2017

homerhanumat commented Dec 8, 2017 • edited Loading

homerhanumat commented Dec 10, 2017

homerhanumat commented Dec 10, 2017

homerhanumat commented Dec 11, 2017

homerhanumat commented Dec 12, 2017 • edited Loading

homerhanumat commented Dec 8, 2017 •

edited

Loading

homerhanumat commented Dec 12, 2017 •

edited

Loading