This is a user and developer guide intended for Cartogratree. For any questions, email me (jjzieve AT ucdavis DOT edu) or treegenes (tg-help AT ucdavis DOT edu)
- Overview
- Backend/Database
- Core frontend libraries
- Styling
- Models & Collections
- Views
- Router
- Code & design caveats
- TODO
#### Overview
CTree is written in Backbone, a frontend (JavaScript) MVC framework. Do a tutorial or two to familiarize yourself before you dive into the code. Here are the ones I relied on most:
- Code school: Anatomy of Backbone.js <- the basics
- Backbone tutorials <- Organizational tips and using require.js for AMD
- Google and Stack Overflow! (by far my best resources)
Tips: If you're an experienced web developer, just ignore me because you probably have better tools than me. But if you aren't, here are some tools that helped me along the way:
- RESTClient, a cool plugin Damian showed me for testing REST requests in your browser (like curl on the command line)
- Firebug, best development tool (imho) for my favorite browser
- Empty Cache, useful for those times when "you don't know why your styles haven't been updated"
I decided to write ctree in a framework to try to adhere to coding best practices (e.g. DRY); how well I did that is debatable... haha. This project was a learning experience for me, so forgive me for my callback hell and spaghetti code :) Have fun!
#### Backend/Database
We essentially have two backends, Google's and our own (i.e. treegenes). We have 4 fusion tables (we only get max 5 for free!) that effectively mirror what we have in our db but provide fast rendering on Google Maps.
##### Treegenes
Postgres tables worth mentioning:
- inv_* -> source of genotypes, phenotypes, and the original data for the "is" part of the fusion table "sts_is"
- sample_treesamples -> source of genotypes, phenotypes, and the original data for the "sts" part of the fusion table "sts_is"
- tgdr_* -> source of genotypes, phenotypes, and the original data for the "tgdr" fusion table
- ctree_fusion_table_mv -> source of the "Taxa" part of the sidebar selection tree, essentially a query of the above tables
- tgdr_data_availability_mv -> source of the "Published studies" part of the sidebar selection tree and the tgdr page, also a query of the above tables
The "mv" suffix is my convention for materialized views. However, these are "snapshot" materialized views; if we have Postgres 9.3+ installed, they can be made more dynamic, or you can write some trigger functions, up to you. For now, run update_scripts/create_ctree_fusion_table_mv.php and update_scripts/create_tgdr_data_availability_mv.php every time the tgdr_* tables are updated. You will need the "utils.php" script to run them; it has sensitive info, so email me for it.
##### Fusion tables
Existing:
To update a fusion table, for example, tgdr:
- run update_scripts/genTSVtgdr.php > tgdr.tsv
- Login to the google account "treegenesdb" and go to drive (email me for the password).
- Upload tgdr.tsv as a fusion table
- Change icon style to reflect the "icon_name" column
- Make sure anyone with the link has access to the table (i.e. Cartogratree!)
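genTSVtgdr.php itself is PHP, but what matters for the upload is the shape of its output: one header row, tab-separated values, and an icon_name column that the fusion table icon style can be pointed at. A minimal JavaScript sketch of that serialization, assuming hypothetical column names (tree_id, species, latitude, longitude) and a hypothetical icon value; only icon_name comes from this guide:

```javascript
// Minimal TSV serializer sketch. The column names passed in are illustrative;
// the real ones come from the tgdr_* queries in genTSVtgdr.php.
function toTSV(rows, columns) {
  // A tab or newline inside a value would shift every later column,
  // so collapse them into a single space. Missing values become "".
  const clean = (value) =>
    value === null || value === undefined
      ? ""
      : String(value).replace(/[\t\r\n]+/g, " ");

  const header = columns.join("\t");
  const lines = rows.map((row) => columns.map((col) => clean(row[col])).join("\t"));
  return [header, ...lines].join("\n") + "\n";
}

// Example row; "icon_name" is the column the fusion table icon style reads.
// "small_green" is a hypothetical icon value.
const tsv = toTSV(
  [{ tree_id: "T1", species: "Pinus taeda", latitude: 35.9, longitude: -79.0, icon_name: "small_green" }],
  ["tree_id", "species", "latitude", "longitude", "icon_name"]
);
```

Sanitizing values rather than quoting them keeps the file trivially parseable by the upload step, at the cost of silently flattening any multi-line text fields.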
For the more data-intensive queries, such as viewing the genotypes, we query treegenes directly. See the GetCommon*.php scripts.
Tips:
- sts_is fusion table == inv_* + samples_treesamples tables, look at the queries in the php scripts to clarify
- sswap_agent is the db role that runs all the queries against treegenes. So if you hit a weird issue where you get a 200 OK and no data, check the permissions on this guy or check the Apache logs
#### Core frontend libraries
- jQuery, basically the backbone of Backbone.js, used extensively in the DOM manipulation and event binding
- Bootstrap v3.1.1: 90% of the widgets and styles are pulled directly from this library, to give everything a "Web 2.0"-ish vibe and save significant time
- jquery-treetable, for the map display selection tree
- select2, for the map display tree id search
- After much debate on which JS table API to use (dataTables, tablecloth.js, etc.), we chose SlickGrid for better or worse (mostly because of its "out of the box" lazy loading capabilities)
#### Styling
I highly recommend redoing all my terrible CSS with a pre-processor such as LESS or SASS. This was the most hacky part of the project, I'm truly sorry (I will hopefully never touch CSS again). If I left any inline styling in index.php, or "!important" declarations in the main stylesheet, please email me angry threats. Anyway, /css/style.css is the main stylesheet; the libraries also have corresponding stylesheets (e.g. slickgrid -> slick.grid.css + example-bootstrap.css)
#### Models & Collections
- js/models/tree_node.js, handles the data behind the selection tree on the left "Map display" panel. Admittedly, it was a way for me to practice template rendering with models
- js/(models|collections)/(query|queries).js, the core data element for selection.
Example: A user selects "Pinus taeda" to be displayed on the map. A model with a unique id, the column parameter (in this case, "species"), and its value (in this case, "Pinus taeda") is added to the queries collection, and the queries collection constructs a _meta variable for the sts_is fusion table (in this case, "species in ('Pinus taeda')"). If the user then ctrl+clicks "Picea glauca", the same thing happens, except the _meta gets updated to "species in ('Pinus taeda','Picea glauca')". I highly recommend playing with the console and logging the queries collection as you click on different parts of the selection tree to see what I mean. After a user selects some trees with the rectangle select, the _meta queries and the rectangle coordinates are sent to Google in the URLs (QueryFusionTables.php) and are used to populate the analysis sample table.
- js/(models|collections)/tree_id(s).js, the main data shared among the grids in the analysis pane. It makes sense that it's the tree ids, because they act as the joining "key" across genotypes, phenotypes, etc. It's also used as a sub_collection for the sample_grid, because that grid relies on both the map's queries and the other grids' tree ids. <- This led to a lot of race conditions, an unfortunate consequence of all the asynchronicity.
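The _meta construction in the Pinus taeda example can be sketched outside Backbone as a plain function over the query models. This is a sketch under assumptions, not the code in queries.js: each query is a plain object { id, column, value }, different columns are joined with AND, values are assumed to contain no single quotes (escaping rules depend on the query dialect), and the rectangle clause is modelled on the Fusion Tables ST_INTERSECTS/RECTANGLE spatial condition with a hypothetical location column named "location". Check it against how QueryFusionTables.php actually builds its URLs.

```javascript
// Sketch of how a queries collection could derive its _meta WHERE string.
// Each query is assumed to look like { id, column, value }.
function buildMeta(queries) {
  // Group values by column, keeping first-seen order and dropping duplicates.
  const byColumn = new Map();
  for (const q of queries) {
    if (!byColumn.has(q.column)) byColumn.set(q.column, []);
    const values = byColumn.get(q.column);
    if (!values.includes(q.value)) values.push(q.value);
  }

  // One "column in (...)" clause per column; clauses are ANDed (assumption).
  // Values are assumed to contain no single quotes.
  const clauses = [];
  for (const [column, values] of byColumn) {
    clauses.push(`${column} in (${values.map((v) => `'${v}'`).join(",")})`);
  }
  return clauses.join(" AND ");
}

// Append the rectangle-select bounds (south-west and north-east corners).
// "location" is a hypothetical name for the fusion table's location column.
function withRectangle(meta, bounds, locationColumn = "location") {
  const rect =
    `ST_INTERSECTS(${locationColumn}, RECTANGLE(` +
    `LATLNG(${bounds.south}, ${bounds.west}), LATLNG(${bounds.north}, ${bounds.east})))`;
  return meta ? `${meta} AND ${rect}` : rect;
}
```

Calling buildMeta on the two-click example ([{ id: 1, column: "species", value: "Pinus taeda" }, { id: 2, column: "species", value: "Picea glauca" }]) yields species in ('Pinus taeda','Picea glauca'), the same string as in the walkthrough. Because each click only adds or removes one model, rebuilding the whole string from the collection on every change is simpler than patching it in place.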
#### Views
- js/views/navbar.js, displays the nav-pills at the top of the page
- js/views/sidebar_tree_id_search.js, uses the select2 library so a user can search for specific tree_ids, it also queries the fusion tables
- js/views/sidebar_selection_tree.js, pulls from the tree_node model but handles most of the logic for the queries collection.
- js/views/sidebar_filters.js, handles the rest of the logic for the queries collection. It also displays the numbers showing users how many samples for each filter category appear on the map (very buggy). The implementation is a bit convoluted because we couldn't decide whether we wanted set addition or subtraction when selecting multiple filters. See online shopping websites for an example.
- js/views/map.js, uses the queries collection to query Google for the map rendering. It also handles configuring the map and the heatmap data; this is by far the largest file.
- js/views/bottom_tabs.js, a controller of sorts for all the grids: when they should be destroyed, created, deleted from, inserted into, etc., based on the run_tools and view buttons
- js/views/*grid.js, slickgrids with corresponding data. My solution for how to handle destruction and creation for these through the tree_nodes_meta attributes is very hacky. Needs a lot of work!!
- js/views/grid_mixin.js, added this to take out a lot of the duplicate functionality of the grids; it may have introduced some errors, but at least it's DRY! I highly recommend the use of mixins, and I'm sure it's relevant elsewhere in the app.
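The "set addition or subtraction" question for sidebar_filters.js usually gets the online-shopping answer: union within a category (Pinus taeda OR Picea glauca), intersection across categories (species AND source). A minimal sketch of that convention, assuming samples are plain objects already loaded on the client and using hypothetical category names; this is one possible rule, not how sidebar_filters.js behaves today:

```javascript
// Faceted filtering sketch: OR within a category, AND across categories.
// `selected` maps a category name to a Set of accepted values;
// an empty Set means "no constraint" for that category.
function applyFilters(samples, selected) {
  return samples.filter((sample) =>
    Object.entries(selected).every(
      ([category, values]) => values.size === 0 || values.has(sample[category])
    )
  );
}

// Per-value counts for one category, e.g. to label each filter checkbox.
function facetCounts(samples, category) {
  const counts = {};
  for (const sample of samples) {
    const value = sample[category];
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// Hypothetical samples; "species" and "source" are illustrative categories.
const samples = [
  { id: "a", species: "Pinus taeda", source: "tgdr" },
  { id: "b", species: "Picea glauca", source: "sts" },
  { id: "c", species: "Pinus taeda", source: "sts" },
];
```

With species set to both Pinus taeda and Picea glauca and source set to sts, applyFilters keeps samples b and c. Fixing the rule this way makes the counts well defined: each checkbox count is just facetCounts over the samples that pass the other categories' filters.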
Hopefully this drawing can visually explain what's going on. Essentially, the models (circles) share the data with the views (rectangles) that they overlap. The arrows indicate the direction of data flow (e.g. the selection_tree can update the map, but not vice versa), and everything is roughly laid out the way it is on the actual page.
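The mixin approach recommended for grid_mixin.js boils down to writing shared methods once and copying them onto each view's prototype. A minimal sketch with plain classes standing in for the Backbone grid views; the method names are hypothetical, not the actual contents of grid_mixin.js. With Backbone views the copy step is typically done with Underscore's `_.extend(SomeGridView.prototype, gridMixin)` instead of `Object.assign`.

```javascript
// Shared grid behaviour written once. Method names are illustrative,
// not the actual contents of js/views/grid_mixin.js.
const gridMixin = {
  setRows(rows) {
    this.rows = rows.slice();
    this.render();
  },
  clear() {
    this.setRows([]);
  },
  rowCount() {
    return this.rows ? this.rows.length : 0;
  },
};

// Two stand-in grid "views"; each only defines what differs (here, render).
class GenotypeGrid {
  constructor() {
    this.rows = [];
    this.renders = 0;
  }
  render() {
    this.renders += 1; // a real view would redraw its SlickGrid here
  }
}

class PhenotypeGrid {
  constructor() {
    this.rows = [];
  }
  render() {
    this.lastRendered = this.rows.length;
  }
}

// Copy the shared methods onto each prototype.
Object.assign(GenotypeGrid.prototype, gridMixin);
Object.assign(PhenotypeGrid.prototype, gridMixin);
```

The mixin calls `this.render()` without knowing which grid it is in, so each view keeps control of its own drawing while the bookkeeping lives in one place.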
#### Router
A.k.a. the "controller" in other MVC-like frameworks, the router was under-utilized in this single page app. The file (js/router.js) handles model, collection, and view creation (order matters!). It also handles taking tree_ids from the URL (see TODO). If it were utilized to its full potential, we could save states or handle user uploads via REST. However, this level of integration would require talking to the db more seamlessly, through a backend framework of some type (e.g. Laravel).
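Backbone's Router maps a route pattern such as "tree_ids/:ids" to a handler and passes the matched fragment to it, which is one way tree ids in the URL could drive (and later save) app state. The route name and the comma-separated format below are hypothetical; the sketch just shows a round-trippable encoding working on the raw location.hash:

```javascript
// Build a URL fragment that encodes a list of tree ids, e.g. for a shareable link.
function buildTreeIdHash(ids) {
  return "#tree_ids/" + ids.map(encodeURIComponent).join(",");
}

// Parse the same fragment back into ids; returns [] for any other route.
function parseTreeIds(hash) {
  const match = /^#?tree_ids\/(.+)$/.exec(hash);
  if (!match) return [];
  return match[1]
    .split(",") // split before decoding so encoded commas (%2C) survive
    .map((part) => decodeURIComponent(part).trim())
    .filter((id) => id.length > 0);
}
```

Encoding each id separately means an id containing a comma still round-trips, and the parser ignores empty entries left by stray commas.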
#### Code & design caveats
- What the hell is `var that = this;` anyway? I use it a lot in my AJAX callbacks; here's a good reference. Related topics are closures and callbacks.
- I made a _meta variable for the queries collection to hold the dynamic query strings that are sent off to Google. In object-oriented speak, you can think of it like a static class variable shared across the "query" objects/models.
- Often "snp", "genotype", "geno", and similarly "pheno", etc., are used interchangeably in variable names, sorry about that; the same goes for bottom_ and data_
- I didn't atomize the model <-> view relationships as much as I should have in hindsight, but it seemed overkill for me to add a view for every element in the DOM. After all, it wasn't a Java project haha
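The `var that = this;` pattern exists because a callback's `this` is decided by whoever calls it. jQuery's $.ajax, for example, calls `success` with `this` set to the settings object (or to the `context` setting, if one is given), not to your view. A minimal sketch with a hypothetical fakeAjax stand-in showing the bug and three fixes; `Function.prototype.bind` or jQuery's `context` option would work as well:

```javascript
// Stand-in for $.ajax (hypothetical helper): like jQuery, it invokes `success`
// with `this` set to the settings object. It runs synchronously here for
// simplicity; the real call is asynchronous.
function fakeAjax(settings) {
  settings.success.call(settings, { rows: 3 });
}

function GridLoader() {
  this.count = 0;
}

// Broken: inside the callback, `this` is the settings object,
// so the count lands on `settings` and the loader never changes.
GridLoader.prototype.loadBroken = function () {
  fakeAjax({
    success: function (data) {
      this.count = data.rows;
    },
  });
};

// Classic fix: capture the loader in a closure variable before the call.
GridLoader.prototype.loadWithThat = function () {
  var that = this;
  fakeAjax({
    success: function (data) {
      that.count = data.rows;
    },
  });
};

// ES2015 alternative: arrow functions reuse the enclosing `this`
// (not available in old IE without a transpiler).
GridLoader.prototype.loadWithArrow = function () {
  fakeAjax({
    success: (data) => {
      this.count = data.rows;
    },
  });
};
```

The closure works because `that` is an ordinary variable captured when the callback is created, so it still points at the loader whenever the callback eventually runs.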
#### TODO
- Allow the map display to reflect the URI. This is a significant problem because Google won't accept a GET parameter beyond a certain number of characters, and that is how the map is currently being filtered (see Models & Collections)
- Merge backend scripts and do general code refactoring. For example, GetCommonSNP.php and GetGenoData.php effectively run the same query; they just return different things. Again, a lightweight PHP framework would be relevant here.
- Allow filtering in analysis tables. Because the analysis tables are linked, this will allow a user to subset their data based on knowledge of metadata (e.g. only analyze the samples with a certain genotype). See the original cartogratree and how filtering works for the amplicon table. Also relevant is how to apply filtering in slickgrids
- Allow phenotype search in the map display. This would go under the tree id search and allow users to only show markers with certain phenotypes. Ontology may be necessary here, along with cleaning up some data in the backend.
- Integrate soil data. Ameriflux is too sparse a resource to really be utilized. If we could somehow mirror what was done with the worldclim data, using the same source as the soil survey ArcGIS layer, this could be invaluable. I also never fully integrated ameriflux with the analysis tables; this would be a start, so refer to the "ecp_worlclim" table for more info. Also, for trydb data, I just used ecp_trydb.obsdataid as the metric values; not sure if this was supposed to link out to the actual values or what??...
- Include other genotype marker types. For instance, right now our genotype grid sort of assumes the data is SNPs, but the majority of our data is actually SSRs.
- TEST!!!! I'm sure there are countless bugs (see last bullet). I only did "integration" testing, but unit testing might be in order (frontend: qunit.js, backend: phpunit). Also, cross-browser support is important, as many users will likely be using older versions of IE (which aren't supported at all yet). I recommend just a basic user-agent plugin or, if you're very thorough, using Vagrant to virtualize some older Microsoft + IE environments.
- Reduce the number of AJAX calls, especially for the filtering map counts; they slow down the app quite a bit (look at the network tab in Firebug). Also, error checking on these calls is necessary; I only added an "error" callback to a couple of the grid views.
- Update main.js dependencies periodically. The require.js shim config captures most of these dependencies (most libraries need jquery), but I didn't get them all, so if you reload the page and everything is broken... Firebug will tell you which dependency broke (e.g. "jQuery is undefined in slickgrid.js", so add `'slick_grid': { deps: ['jquery'] }`)
- Consider "unsyncing" the grids; this functionality makes the various states quite convoluted...
- Cron everything in the update_scripts directory, so no manual refreshes are required for new TGDR submissions
- Factor out db connection scripts to utils.php
- Known bugs:
  - Exiting a sub-grid/tab sometimes deletes the main sample grid, effectively breaking the app
  - Counts for filter map display are sometimes inaccurate
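For the "reduce number of AJAX calls" item, one cheap option is to remember count results per _meta string, so toggling back to a filter combination you have already seen doesn't hit the server again (debouncing clicks is another common option). A minimal sketch, assuming a hypothetical fetchCounts(meta) function that performs the real request and returns the counts or a Promise of them:

```javascript
// Memoize count lookups by their _meta query string.
// fetchCounts is a hypothetical function doing the real AJAX request.
function makeCachedCounter(fetchCounts) {
  const cache = new Map(); // _meta string -> counts (or a pending Promise of counts)

  function getCounts(meta) {
    if (!cache.has(meta)) {
      const result = fetchCounts(meta);
      cache.set(meta, result);
      // Don't keep failed requests: drop a rejected promise so the next call
      // retries, unless the entry was already replaced after clear().
      if (result && typeof result.then === "function") {
        result.then(null, () => {
          if (cache.get(meta) === result) cache.delete(meta);
        });
      }
    }
    return cache.get(meta);
  }

  // Call after the underlying data changes (e.g. new TGDR submissions).
  getCounts.clear = () => cache.clear();
  return getCounts;
}
```

Caching the pending Promise rather than the finished value also means two identical requests fired in quick succession share a single network call.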