Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ch112819] Error importing KML files with slashes in 'name' nodes #15897

Merged
merged 10 commits into from
Nov 3, 2020

Conversation

Shylpx
Copy link
Collaborator

@Shylpx Shylpx commented Oct 22, 2020

Resources

Changes

  • When a KML is imported, sanitize the layer names of the file before create one file by layer using that names. The sanitized method is it the same used to sanitize the file names in order to create dataset tables, respecting backward compatibility this way.
  • Minor Rubocop fixes.

@Shylpx Shylpx requested a review from amiedes October 22, 2020 11:11
@Shylpx Shylpx self-assigned this Oct 22, 2020
@Algunenano
Copy link
Contributor

Can you please explain the relation between the CH ticket and this? Maybe it's wrong. Also, pinging @jgoizueta since he has been solving many issues in the different importers similar to this one and he's aware of the many implications a change like this can have in the service.

@Shylpx Shylpx changed the title [ch113184] Error importing KML files with slashes in 'name' nodes [ch112819] Error importing KML files with slashes in 'name' nodes Oct 22, 2020
@Shylpx
Copy link
Collaborator Author

Shylpx commented Oct 22, 2020

@Algunenano You are right, I was working with another story and I added the wrong one. It is updated now.

Copy link
Contributor

@amiedes amiedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. Good work and good idea trying to reuse core rails methods 💪 🎉

services/importer/lib/importer/kml_splitter.rb Outdated Show resolved Hide resolved
services/importer/spec/acceptance/kml_spec.rb Outdated Show resolved Hide resolved
@amiedes amiedes requested a review from jgoizueta October 22, 2020 13:59
@Shylpx Shylpx requested a review from amiedes October 23, 2020 11:17
@jgoizueta
Copy link
Contributor

Hey, a couple of questions to have in mind:

The final names of tables generated by imports are sanitized and adapted to PG requirements here:

cartodb/app/models/table.rb

Lines 1298 to 1305 in 97688bf

# Gets a valid postgresql table name for a given database
# See http://www.postgresql.org/docs/9.5/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
def get_valid_name(contendent)
user_table_names = owner.tables.map(&:name)
# We only want to check for UserTables names
Carto::ValidTableNameProposer.new.propose_valid_table_name(contendent, taken_names: user_table_names)
end

But for KML layers we need first to make sure we have names that are valid for the filesystem.

And we have to take care with existing sync tables: if we change the name of the generated table we make break them. This means we should alter only layer names that would cause errors (invalid file names) and not layer names that currently cause no error, so could be in use in some sync, or we could make sure the transmormations performed are consistent with get_valid_name. I'm not sure it would be a good idea to use the method get_valid_name itself here, since it also avoids table name collisions, so it may not be idempotent, and I don't like the idea of checking user tables names here too.

😅 Hey I feel I'm bringing too much complication into a simple matter 🤔 I'm tempted to check if there are any sync tables at all in production using KML files, if there aren't we could save ourselves some trouble. If we know KML syncs aren't too common I wouldn't care for these details.

@jgoizueta
Copy link
Contributor

We also have this intented to produced valid file names (and it seems used by Carto::ValidTableNameProposer

def self.sanitize_identifier(identifier, replacement_character: '_')
if replacement_character =~ DISALLOWED_CHARACTERS
raise "Unsafe replacement character '#{replacement_character}'"
else
identifier.gsub(DISALLOWED_CHARACTERS, replacement_character)
end
end

Maybe the ActiveStorage method is more comprehensive, but I think this will be good enough for our purposes, and since it's been used finally for filenames there should be less chance of altering the final result.

Copy link
Contributor

@jgoizueta jgoizueta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use Carto::FileSystem::Sanitize.sanitize_identifier for consistency with table naming and other uses.

If you find the ActiveStorage sanitize method has some advantage I would incorporate it into our method (but beware of altering existing syncs).

services/importer/lib/importer/kml_splitter.rb Outdated Show resolved Hide resolved
@Shylpx
Copy link
Collaborator Author

Shylpx commented Oct 27, 2020

@jgoizueta Thank you very much, I think using Carto::FileSystem::Sanitize.sanitize_identifier is the best approach. This way we keep the consistency without the need of checking the existent sync KML tables, and at the same time we are solving the problem by removing the slashes from the KML layer names.

@Shylpx Shylpx requested a review from jgoizueta October 27, 2020 14:47
Copy link
Contributor

@jgoizueta jgoizueta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏁

@Shylpx Shylpx merged commit f7d8980 into master Nov 3, 2020
@thedae thedae mentioned this pull request Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants