Whitelist of Wikipedia linked terms #1317

Closed
acrymble opened this issue May 27, 2019 · 4 comments

@acrymble

Related to #1209 (building an English-language stylesheet): I have put together a list of all links to Wikipedia in lessons. About half of the lessons contain Wikipedia links, generally on what I'd call "technical" or "mathematical" terms. To get authors to link these terms to Wikipedia more consistently, I think it would be worth articulating which terms should be linked, and possibly finding an automated way to scan incoming lessons for them during the editorial process, which would reduce the effort required of editors.

The questions are:

  1. Is a simple whitelist (a set of terms that we check for) going to be good enough for our purposes, while we also encourage authors to use their judgment in adding links for terms not yet on the whitelist?
  2. Is there a clear way of describing what these types of words are, so that it's less ambiguous when a Wikipedia link is needed?

Here is the full list (I tried to remove links to proper names):

  • Accession_number_(library_science)
  • Advanced_Audio_Coding
  • ALA-LC_romanization_for_Russian
  • Algorithm
  • Anomaly_detection
  • Application_programming_interface
  • Approximate_string_matching
  • Array_data_structure#Two-dimensional_arrays
  • Array_data_type
  • Ascii
  • Attribute_(computing)
  • Attribute%E2%80%93value_pair
  • Authority_control
  • Bipartite_graph#Examples
  • Call_stack
  • CamelCase
  • Carriage_return
  • Cartesian_coordinate_system
  • Cascading_Style_Sheets
  • Categorical_variable
  • Chi-squared_test
  • Choropleth_map
  • Close_reading
  • Codec
  • Comma-separated_values
  • Comparison_of_source_code_hosting_facilities#Popularity
  • Comparison_of_version_control_software
  • Concatenation
  • Conceptual_graph
  • Controlled_vocabulary
  • Cross-platform
  • Cyrillic_script
  • Cyrillic_script_in_Unicode
  • Data_dictionary
  • Data_model
  • Data_profiling
  • Data_set
  • Data_structure
  • Delimiter
  • Determinism
  • Diacritic
  • Digital_container_format
  • Directed_graph#Indegree_and_outdegree
  • Directory_(computing)
  • Discourse_analysis
  • Document_Object_Model
  • Domain_name
  • Domain_Name_System
  • Dublin_Core
  • Emoji
  • Epistemology
  • Ethnography
  • Exploratory_data_analysis
  • Exponential_function
  • Expression_(computer_science)
  • Faceted_search
  • FOAF_(ontology)
  • For_loop
  • Force-directed_graph_drawing
  • Free_will
  • Function_word
  • Gazetteer
  • Geocoding
  • Geographic_information_system
  • Georeference
  • Gibbs_sampling
  • H.264/MPEG-4_AVC
  • HTML
  • Hue
  • Hypertext_Transfer_Protocol
  • IDLE_%28Python%29
  • Information_silo
  • ISO_8601
  • Join_(SQL)
  • Latent_Dirichlet_allocation
  • Latin_script
  • Leading_zero
  • Lemmatisation
  • Levenshtein_distance
  • Lexical_analysis#Tokenization
  • Library_(computing)
  • Library_of_Congress_Control_Number
  • Line_fitting
  • Linked_data
  • Linked_data#Linked_open_data
  • Linnaean_taxonomy
  • List_of_probability_distributions
  • Machine_learning
  • Markdown
  • Mathematical_object
  • Matrix_(mathematics)
  • Mean
  • Median
  • Metadata
  • Multivariate_statistics
  • N-gram
  • Naive_Bayes_classifier
  • Named-entity_recognition
  • Namespace
  • Natural_language_processing
  • Natural_logarithm
  • Natural-language_processing
  • Negative_binomial_distribution
  • Normalization_(statistics)
  • Object-oriented_programming
  • Ogg
  • Ontology_(information_science)
  • Optical_character_recognition
  • Order_of_operations
  • Ordnance_Survey_National_Grid
  • Orthophoto
  • P-value
  • Parameter
  • Part-of-speech_tagging
  • Pearson_correlation_coefficient
  • Pivot_table
  • Plain_text
  • Poisson_distribution
  • Prelinger_Archives
  • Prince_Royalty,_Prince_Edward_Island
  • PRINCE2
  • Probability_distribution
  • Probability_theory
  • Proprietary_software
  • Python_(programming_language)
  • Quartile
  • RDF/XML
  • Reference_(computer_science)
  • Regression_analysis
  • Regular_expression
  • Regular_language
  • Relational_database
  • Resource_Description_Framework
  • Rubbersheeting
  • Ruby_%28programming_language%29
  • Sample_(statistics)
  • Scatter_plot
  • Semantic_reasoner
  • Semantic_triple
  • Semantic_Web
  • Sentiment_analysis
  • Serialization
  • Shapefile
  • Simple_Knowledge_Organization_System
  • Simple_linear_regression
  • Slope
  • Social_network
  • SPARQL
  • Sparse_matrix
  • Spatial_reference_system
  • SQL
  • Square_root
  • Standard_deviation
  • Standard_score
  • Statistical_classification
  • Statistical_model
  • Statistical_significance
  • Stop_words
  • String_(computer_science)
  • Stylometry
  • Subroutine
  • Supervised_learning
  • Syntax
  • Syntax_error
  • Tab-separated_values
  • Table_(database)
  • Tagged_Image_File_Format
  • Taxonomy
  • Technicolor
  • Terminal_(macOS)
  • Terminal_%28OS_X%29
  • Text_corpus
  • Theora
  • Topology
  • Tree_(data_structure#Terminology)
  • Tree_structure
  • Turtle_(syntax)
  • Unicode
  • Uniform_Resource_Identifier
  • Uniform_Resource_Locator
  • Unix
  • Unix_shell
  • Upton_Sinclair
  • UTF-8
  • Variable_(computer_science)
  • Vectorscope#Video
  • Vim_%28text_editor%29
  • Vorbis
  • Wide_and_narrow_data
  • Windows_PowerShell
  • Word_processor
  • Working_directory
  • World_file
  • WYSIWYG
  • XML
  • Xpath
  • XSL
  • XSLT_elements
  • Y-intercept
  • YAML
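
A list like the one above could be compiled with a short script. This is only a sketch, not the method actually used for this issue: it assumes lessons are Markdown files linking to English Wikipedia, and the name `wikipedia_slugs` is hypothetical. It decodes percent-encoded slugs, so `Vim_%28text_editor%29` and `Vim_(text_editor)` count as the same entry.

```python
import re
from urllib.parse import unquote

# Article part of an English Wikipedia URL. One level of balanced
# parentheses is allowed so that slugs like "Vim_(text_editor)" survive
# inside a Markdown link of the form [text](url).
WIKI_LINK = re.compile(
    r"https?://en\.wikipedia\.org/wiki/((?:[^\s()\[\]<>]|\([^\s()]*\))+)"
)


def wikipedia_slugs(markdown_text):
    """Return the sorted, de-duplicated article slugs linked in the text."""
    slugs = set()
    for match in WIKI_LINK.finditer(markdown_text):
        # Decode percent-encoding, e.g. %28 -> "(" and %29 -> ")".
        slugs.add(unquote(match.group(1)))
    return sorted(slugs, key=str.lower)


# Made-up lesson text, for illustration only.
sample = (
    "We use [Vim](https://en.wikipedia.org/wiki/Vim_%28text_editor%29) and "
    "[Vim again](https://en.wikipedia.org/wiki/Vim_(text_editor)), plus "
    "[CSV](https://en.wikipedia.org/wiki/Comma-separated_values)."
)
print(wikipedia_slugs(sample))
```

Section anchors such as `#Examples` are kept as part of the slug, and redirects (for example `Natural_language_processing` versus `Natural-language_processing`) would still need merging by hand.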
@acrymble self-assigned this May 27, 2019
@arojascastro
Contributor

Maybe we could create a glossary of standard definitions that all lessons point to. This glossary could contain:

  • definitions in several languages
  • links to Wikipedia or other resources
  • suggested translations of terms into Spanish and French

@acrymble
Author

@arojascastro that sounds like a lot of work.

@acrymble
Author

@mdlincoln would it be technically possible to scan lesson requests with Travis for a list of terms and check whether they are linked to Wikipedia? Or is that going to be a nightmare?
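
To make the question concrete, here is a rough sketch of the kind of check being asked about. Everything in it is an assumption: the `WHITELIST` mapping and the `unlinked_terms` function are hypothetical, and how such a script would be wired into Travis is not shown. It flags whitelist phrases that occur in a lesson when no Wikipedia link to the matching article appears anywhere in that lesson.

```python
import re

# Hypothetical whitelist: phrase as written in prose -> Wikipedia article slug.
WHITELIST = {
    "regular expression": "Regular_expression",
    "standard deviation": "Standard_deviation",
    "metadata": "Metadata",
}


def unlinked_terms(markdown_text, whitelist=WHITELIST):
    """Return whitelist phrases that occur in the text but are never linked."""
    lowered = markdown_text.lower()
    missing = []
    for phrase, slug in whitelist.items():
        # Whole-word, case-insensitive match on the phrase.
        appears = re.search(r"\b" + re.escape(phrase) + r"\b", lowered) is not None
        # Simple prefix test on the URL; percent-encoded slugs are not handled.
        linked = ("wikipedia.org/wiki/" + slug.lower()) in lowered
        if appears and not linked:
            missing.append(phrase)
    return sorted(missing)


# Made-up lesson text, for illustration only.
lesson = (
    "A [regular expression](https://en.wikipedia.org/wiki/Regular_expression) "
    "can clean the metadata before computing the standard deviation."
)
print(unlinked_terms(lesson))
```

A check like this only catches exact phrases, so plurals and inflected forms would need extra whitelist entries or stemming, and an editor would still have to review the output.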

@mdlincoln
Contributor

At the 2019-06-26 meeting, we decided that the more general technical writing guidelines will advise linking technical terms to Wikipedia, so a separate "whitelist" would be redundant.
