Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versioned column sanitization #15326

Merged
merged 34 commits into from
Jan 17, 2020
Merged
Show file tree
Hide file tree
Changes from 33 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
33c9a8b
Version column sanitization (WIP)
jgoizueta Dec 17, 2019
ae5622b
Fix extra_options modiication
jgoizueta Dec 18, 2019
2f1a1e9
Remove reserved_words options
jgoizueta Dec 18, 2019
4d27846
Fix sanitization version
jgoizueta Dec 18, 2019
40585a9
Use existing column sanitization
jgoizueta Dec 18, 2019
5cc3e11
Fix fetching Table
jgoizueta Dec 18, 2019
e282e04
Make method public
jgoizueta Dec 18, 2019
69ce107
Add NEWS entry
jgoizueta Dec 18, 2019
77228e5
Refactor sanitization functions
jgoizueta Dec 18, 2019
198a30e
Fix syntax
jgoizueta Dec 18, 2019
ab4db5a
Fix reserved name detection
jgoizueta Dec 18, 2019
86a9ed2
Tests for legacy sanitization
jgoizueta Dec 19, 2019
4b7be30
Remove unnecessary parameter
jgoizueta Dec 19, 2019
c24ea38
More sanitization tests
jgoizueta Dec 19, 2019
b0a65b4
Implement v2 sanitization
jgoizueta Dec 19, 2019
9e2dc67
Rename cyrillic parameter
jgoizueta Dec 19, 2019
777f755
Add rejected columns to preserve behaviour when adding columns
jgoizueta Dec 19, 2019
03bea34
Make StringSanitizer methods module functions
jgoizueta Dec 19, 2019
0527f31
Fix bug
jgoizueta Dec 19, 2019
b2e30b5
Fix column modification
jgoizueta Dec 19, 2019
cc788b3
Preserve sanitization behaviour for multiple underscores
jgoizueta Dec 19, 2019
c553e43
Adapt tests to new multiple underscore behaviour
jgoizueta Dec 19, 2019
bcf74c9
Neew sanitization tests
jgoizueta Dec 19, 2019
952b0bf
Bugfixing
jgoizueta Dec 19, 2019
4183bbf
Remove test case
jgoizueta Dec 19, 2019
6b481f8
Test using ActiveRecord transliteration
jgoizueta Dec 19, 2019
e1b2e15
Adjust tests to new sanitization
jgoizueta Dec 19, 2019
1ff4270
Fix reserved column names
jgoizueta Dec 19, 2019
d537a80
Allow asigning user_id in Table constructor
jgoizueta Dec 20, 2019
4eb840a
Fix access to Table data_import from column_sanitization_version
jgoizueta Dec 20, 2019
1726064
Code simplification
jgoizueta Jan 13, 2020
c0d773e
Separate methods per sanitization version
jgoizueta Jan 16, 2020
635f924
Merge branch 'master' into column_sanitization
jgoizueta Jan 16, 2020
8f7c420
Fix NEWS
jgoizueta Jan 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ sudo make install
* Fixed issue while creating a new user's database: force to alter extension always as template_postgis may have the current version defined and the extension won't be installed

### Bug fixes / enhancements
* New versioned sanitization of column names ([#15326](https://github.com/CartoDB/cartodb/issues/15326))
gonzaloriestra marked this conversation as resolved.
Show resolved Hide resolved
* Avoid warnings when running test in parallel with an empty environment
* Improve concurrent Ghost Tables syncs handling ([#15272](https://github.com/CartoDB/cartodb/pull/15272))
* Fix consent screen in OAuth apps without user ([#15247](https://github.com/CartoDB/cartodb/pull/15247))
Expand Down
23 changes: 17 additions & 6 deletions app/models/data_import.rb
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
require_relative '../../services/importer/lib/importer/overviews'
require_relative '../../services/importer/lib/importer/connector_runner'
require_relative '../../services/importer/lib/importer/exceptions'
require_relative '../../services/importer/lib/importer/column'

require_dependency 'carto/tracking/events'
require_dependency 'carto/valid_table_name_proposer'
Expand Down Expand Up @@ -118,9 +119,10 @@ def after_initialize
# New ones are already tracked during the data_import create inside the controller
# For the old dashboard
def before_create
if self.from_common_data?
self.extra_options = self.extra_options.merge({:common_data => true})
if from_common_data?
self.extra_options = extra_options.merge(common_data: true)
end
self.extra_options = extra_options.merge(column_sanitization_version: CartoDB::Importer2::Column::CURRENT_COLUMN_SANITIZATION_VERSION)
end

def before_save
Expand Down Expand Up @@ -426,6 +428,10 @@ def final_state?
[STATE_COMPLETE, STATE_FAILURE, STATE_STUCK].include?(state)
end

def column_sanitization_version
extra_options[:column_sanitization_version] || CartoDB::Importer2::Column::INITIAL_COLUMN_SANITIZATION_VERSION
end

private

def dispatch
Expand Down Expand Up @@ -574,12 +580,17 @@ def overwrite_strategy?
collision_strategy == Carto::DataImportConstants::COLLISION_STRATEGY_OVERWRITE
end


def sanitize_columns(table_name)
Table.sanitize_columns(table_name, {
# TODO: is this called before table is registered? otherwise we should use Table#sanitize_columns
Table.sanitize_columns(
table_name,
column_sanitization_version,
{
connection: current_user.in_database,
database_schema: current_user.database_schema,
reserved_words: CartoDB::Importer2::Column::RESERVED_WORDS
})
database_schema: current_user.database_schema
}
)
end

def migrate_existing(imported_name)
Expand Down
10 changes: 9 additions & 1 deletion app/models/synchronization/adapter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -276,7 +276,15 @@ def import_cleanup(schema_name, table_name)
user.db_service.in_database_direct_connection(statement_timeout: STATEMENT_TIMEOUT) do |user_database|

# For consistency with regular imports, also eases testing
Table.sanitize_columns(table_name, {database_schema: schema_name, connection: user_database})
# The sanitization of @table_name is applied to the newly imported table_name
# This should not be necessary, since setup_table, called after cartodbfication
# also perfomes column sanitization via Table#import_to_cartodb
# but is kept because there may be existing syncs for which this double sanitization
# (version 1 sanitization which wasn't idemponent) had the effect of altering some
# satinizated names (e.g. __1 -> _1).
table = ::Table.new(name: @table_name, user_id: @user.id)
# we could as well: table = Carto::UserTable.find(@user.tables.where(name: @table_name).first.id).service
table.sanitize_columns(table_name: table_name, database_schema: schema_name, connection: user_database)

# When tables are created using ogr2ogr they are added a ogc_fid or gid primary key
# In that case:
Expand Down
100 changes: 47 additions & 53 deletions app/models/table.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
require_relative './visualization/table_blender'
require_relative '../../services/importer/lib/importer/query_batcher'
require_relative '../../services/importer/lib/importer/cartodbfy_time'
require_relative '../../services/importer/lib/importer/column'
require_relative '../../services/datasources/lib/datasources/decorators/factory'
require_relative '../../services/table-geocoder/lib/internal-geocoder/latitude_longitude'
require_relative '../model_factories/layer_factory'
Expand All @@ -37,13 +38,6 @@ class Table
GEOMETRY_TYPES_PRESENT_CACHING_TIMEOUT = 24.hours
STATEMENT_TIMEOUT = 1.hour * 1000

# See http://www.postgresql.org/docs/9.5/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
PG_IDENTIFIER_MAX_LENGTH = 63

# @see services/importer/lib/importer/column.rb -> RESERVED_WORDS
# @see config/initializers/carto_db.rb -> RESERVED_COLUMN_NAMES
RESERVED_COLUMN_NAMES = %w(oid tableoid xmin cmin xmax cmax ctid ogc_fid).freeze

DEFAULT_THE_GEOM_TYPE = 'geometry'

VALID_GEOMETRY_TYPES = %W{ geometry multipolygon point multilinestring }
Expand All @@ -59,6 +53,7 @@ def initialize(args = {})
else
@user_table = args[:user_table]
end
self.user_id = args[:user_id] if args[:user_id].present?
# TODO: this probably makes sense only if user_table is not passed as argument
@user_table.set_service(self)
end
Expand Down Expand Up @@ -214,20 +209,23 @@ def self.table_and_schema(table_name)
end
end

def data_import
@data_import ||= DataImport.where(id: @user_table.data_import_id).first || DataImport.new(user_id: owner.id)
end

## Callbacks

def import_to_cartodb(uniname = nil)
@data_import ||= DataImport.where(id: @user_table.data_import_id).first || DataImport.new(user_id: owner.id)
if migrate_existing_table.present? || uniname
@data_import.data_type = DataImport::TYPE_EXTERNAL_TABLE if @data_import.data_type.nil?
@data_import.data_source = migrate_existing_table || uniname
@data_import.save
data_import.data_type = DataImport::TYPE_EXTERNAL_TABLE if data_import.data_type.nil?
data_import.data_source = migrate_existing_table || uniname
data_import.save

# ensure unique name, also ensures self.name can override any imported table name
uniname = get_valid_name(name ? name : migrate_existing_table) unless uniname

# Make sure column names are sanitized. Make it consistently.
self.class.sanitize_columns(uniname, {database_schema: owner.database_schema, connection: owner.in_database})
sanitize_columns(table_name: uniname, database_schema: owner.database_schema, connection: owner.in_database)

# with table #{uniname} table created now run migrator to CartoDBify
hash_in = ::SequelRails.configuration.environment_for(Rails.env).merge(
Expand All @@ -242,12 +240,12 @@ def import_to_cartodb(uniname = nil)
:debug => Rails.env.development?,
:remaining_quota => owner.remaining_quota,
:remaining_tables => owner.remaining_table_quota,
:data_import_id => @data_import.id
:data_import_id => data_import.id
).symbolize_keys
importer = CartoDB::Migrator.new(hash_in)
imported_name = importer.migrate!
@data_import.reload
@data_import.save
data_import.reload
data_import.save
imported_name
end
end
Expand Down Expand Up @@ -676,10 +674,11 @@ def make_sequel_compatible(attributes)
end

def add_column!(options)
raise CartoDB::InvalidColumnName if RESERVED_COLUMN_NAMES.include?(options[:name]) || options[:name] =~ /^[0-9]/
raise CartoDB::InvalidColumnName if CartoDB::Importer2::Column.rejected?(options[:name])
type = options[:type].convert_to_db_type
cartodb_type = options[:type].convert_to_cartodb_type
column_name = options[:name].to_s.sanitize_column_name
# FIXME: consider CartoDB::Importer2::Column.get_valid_column_name with CURRENT_COLUMN_SANITIZATION_VERSION
column_name = CartoDB::Importer2::Column.sanitize_name(options[:name].to_s)
owner.in_database.add_column name, column_name, type

update_cdb_tablemetadata
Expand All @@ -700,16 +699,19 @@ def drop_column!(options)
end

def modify_column!(options)
old_name = options.fetch(:name, '').to_s.sanitize
new_name = options.fetch(:new_name, '').to_s.sanitize
raise 'This column cannot be modified' if CARTODB_COLUMNS.include?(old_name.to_s)
old_name = options.fetch(:name, '').to_s
new_name = options.fetch(:new_name, '').to_s

# FIXME: consider CartoDB::Importer2::Column.get_valid_column_name with CURRENT_COLUMN_SANITIZATION_VERSION
old_sanitized_name = CartoDB::Importer2::Column.sanitize_name(old_name)
raise 'This column cannot be modified' if CARTODB_COLUMNS.include?(old_sanitized_name.to_s)

if new_name.present? && new_name != old_name
new_name = new_name.sanitize_column_name
rename_column(old_name, new_name)
new_sanitized_name = CartoDB::Importer2::Column.sanitize_name(new_name)
rename_column(old_sanitized_name, new_sanitized_name)
end

column_name = (new_name.present? ? new_name : old_name)
column_name = (new_name.present? ? new_sanitized_name : old_sanitized_name)
convert_column_datatype(owner.in_database, name, column_name, options[:type])
column_type = column_type_for(column_name)

Expand All @@ -731,7 +733,7 @@ def rename_column(old_name, new_name='')
raise 'Please provide a column name' if new_name.empty?
raise 'This column cannot be renamed' if CARTODB_COLUMNS.include?(old_name.to_s)

if new_name =~ /^[0-9]/ || RESERVED_COLUMN_NAMES.include?(new_name) || CARTODB_COLUMNS.include?(new_name)
if CartoDB::Importer2::Column.reserved_or_unsupported?(new_name) || CARTODB_COLUMNS.include?(new_name)
raise CartoDB::InvalidColumnName, 'That column name is reserved, please choose a different one'
end

Expand Down Expand Up @@ -1221,6 +1223,15 @@ def update_bounding_box
SequelRails.connection.run(update_sql)
end

# Apply the sanitization defined for this table.
# If the table_name parameter is passed, the sanitization which was originally applied to self
# is applied to the table so named.
# If no table_name parameters is paseed the normalization is applied to self.
# This allows re-applying the same normalization to imported tables.
def sanitize_columns(table_name: name, **options)
self.class.sanitize_columns(table_name, column_sanitization_version, options)
end

private

def valid_cartodb_id_candidate?(col_name)
Expand Down Expand Up @@ -1303,14 +1314,19 @@ def get_valid_name(contendent)
Carto::ValidTableNameProposer.new.propose_valid_table_name(contendent, taken_names: user_table_names)
end

def self.sanitize_columns(table_name, options={})
def column_sanitization_version
data_import&.column_sanitization_version || CartoDB::Importer2::Column::NO_COLUMN_SANITIZATION_VERSION
end

def self.sanitize_columns(table_name, column_sanitization_version, options={})

connection = options.fetch(:connection)
database_schema = options.fetch(:database_schema, 'public')

connection.schema(table_name, schema: database_schema, reload: true).each do |column|
column_name = column[0].to_s
column_type = column[1][:db_type]
column_name = ensure_column_has_valid_name(table_name, column_name, options)
column_name = ensure_column_has_valid_name(table_name, column_name, column_sanitization_version, options)
if column_type == 'unknown'
CartoDB::ColumnTypecaster.new(
user_database: connection,
Expand All @@ -1323,42 +1339,20 @@ def self.sanitize_columns(table_name, options={})
end
end

def self.ensure_column_has_valid_name(table_name, column_name, options={})
def self.ensure_column_has_valid_name(table_name, column_name, column_sanitization_version, options={})
connection = options.fetch(:connection)
database_schema = options.fetch(:database_schema, 'public')

valid_column_name = get_valid_column_name(table_name, column_name, options)
valid_column_name = get_valid_column_name(table_name, column_name, column_sanitization_version, options)
if valid_column_name != column_name
connection.run(%Q{ALTER TABLE "#{database_schema}"."#{table_name}" RENAME COLUMN "#{column_name}" TO "#{valid_column_name}";})
end

valid_column_name
end

def self.get_valid_column_name(table_name, candidate_column_name, options={})
reserved_words = options.fetch(:reserved_words, [])

existing_names = get_column_names(table_name, options) - [candidate_column_name]

candidate_column_name = 'untitled_column' if candidate_column_name.blank?
candidate_column_name = candidate_column_name.to_s.squish

# Subsequent characters can be letters, underscores or digits
candidate_column_name = candidate_column_name.gsub(/[^a-z0-9]/,'_').gsub(/_{2,}/, '_')

# Valid names start with a letter or an underscore
candidate_column_name = "column_#{candidate_column_name}" unless candidate_column_name[/^[a-z_]{1}/]

# Avoid collisions
count = 1
new_column_name = candidate_column_name
while existing_names.include?(new_column_name) || reserved_words.include?(new_column_name.upcase)
suffix = "_#{count}"
new_column_name = candidate_column_name[0..PG_IDENTIFIER_MAX_LENGTH-suffix.length] + suffix
count += 1
end

new_column_name
def self.get_valid_column_name(table_name, candidate_column_name, column_sanitization_version, options={})
CartoDB::Importer2::Column.get_valid_column_name(candidate_column_name, column_sanitization_version, get_column_names(table_name, options))
end

def self.get_column_names(table_name, options={})
Expand Down Expand Up @@ -1440,7 +1434,7 @@ def create_table_in_database!
sanitized_force_schema = force_schema.split(',').map do |column|
# Convert existing primary key into a unique key
if column =~ /\A\s*\"([^\"]+)\"(.*)\z/
"#{$1.sanitize} #{$2.gsub(/primary\s+key/i,'UNIQUE')}"
"#{CartoDB::Importer2::Column.sanitize_name $1} #{$2.gsub(/primary\s+key/i,'UNIQUE')}"
else
column.gsub(/primary\s+key/i,'UNIQUE')
end
Expand Down
2 changes: 0 additions & 2 deletions config/initializers/carto_db.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,6 @@ module CartoDB
SURROGATE_NAMESPACE_PUBLIC_PAGES = 'rp'.freeze
SURROGATE_NAMESPACE_VIZJSON = 'rj'.freeze

RESERVED_COLUMN_NAMES = %w(FORMAT CONTROLLER ACTION oid tableoid xmin cmin xmax cmax ctid ogc_fid).freeze

LAST_BLOG_POSTS_FILE_PATH = "#{Rails.root}/public/system/last_blog_posts.html"

# Helper method to encapsulate Rails full URL generation compatible with our subdomainless mode
Expand Down
Loading