Sample And Index

Sample and Index against layers

Introduction

In this page we document how to:

Sample occurrences against layers in biocache (add sampling data to Cassandra).
Index biocache sampling data in a new biocache SOLR core (to be swapped-in later).

Steps

We'll sample and index data against some newly added layers.

Check that the biocache-store configuration points to the sampling URL:
1. $ cd /data/biocache/config
2. $ more biocache-config.properties
Look for spatial.layers.url=http://spatial.l-a.site/ws/fields
1. Alternatively, connect with ssh to the livingatlas-demo server you are using and issue the command biocache config | grep -e ".*layers.*url"
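A minimal Python sketch of the same check, assuming the properties file sits at the path shown above and uses plain key=value lines. read_properties and layers_url are hypothetical helper names, and the parser deliberately ignores the line continuations and escapes that the full Java .properties format allows.

```python
from __future__ import annotations

from pathlib import Path

# Path taken from the steps above; adjust it if your install differs.
CONFIG = Path("/data/biocache/config/biocache-config.properties")


def read_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Simplified reader (hypothetical helper): no continuation lines,
    no escapes, no ':' separators.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def layers_url(text: str) -> str | None:
    """Return the configured spatial.layers.url, or None if it is missing."""
    return read_properties(text).get("spatial.layers.url")


if __name__ == "__main__":
    if CONFIG.exists():
        print(layers_url(CONFIG.read_text()) or "spatial.layers.url is not set")
    else:
        print(f"{CONFIG} not found")
```

If the printed URL is not the fields endpoint of your spatial service, sampling will not find the layers.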
Load a DwCA into the collectory. If you are just testing, choose a small dataset (fewer than 50k records) for speed, preferably Mammals (a later step of the documentation, on taxonomy, depends on this).
1. For IPT users:
  1. Start here: http://collections.l-a.site/admin/
  2. Create a data provider and point it at your IPT instance by setting its website URL to the IPT URL, e.g. https://ipt.gbif.es
  3. Click “Update data resources” button
  4. Note: check the unique fields. Typical values are catalogNumber or occurrenceID. The default is catalogNumber.
  5. Find the UID of the data resource to load, e.g. dr123
2. For Non-IPT users:
  1. Start here: http://collections.l-a.site/admin/
  2. Create a data resource
  3. Upload your DwCA
  4. Note: check the unique fields. Typical values are catalogNumber or occurrenceID. The default is catalogNumber.
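Before loading, it can help to confirm that the field you choose as unique really is unique, and that the archive is small enough for a quick test. The sketch below assumes the core file is a tab-delimited occurrence.txt with a header row and no quoting (a real archive declares its core file and delimiters in meta.xml); check_unique_field is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io
import zipfile
from collections import Counter


def check_unique_field(dwca, field: str = "catalogNumber",
                       core_file: str = "occurrence.txt") -> dict:
    """Count records, empty values and duplicated values of `field`.

    `dwca` is a path or binary file object for the archive. Assumes the core
    file is tab-delimited with a header row and no quoting.
    """
    with zipfile.ZipFile(dwca) as archive, archive.open(core_file) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        if field not in (reader.fieldnames or []):
            raise KeyError(f"{field} not found in the {core_file} header")
        values = [(row.get(field) or "").strip() for row in reader]
    counts = Counter(v for v in values if v)
    return {
        "records": len(values),
        "empty": sum(1 for v in values if not v),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }
```

For example, check_unique_field("mammals-dwca.zip", "occurrenceID") (file name hypothetical) reports the record count, the number of empty values and any duplicated values; a non-empty duplicates list suggests choosing another unique field.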
Load the DwCA into the biocache using the command line tool:
1. Use the command biocache load dr123
2. Validate that the data has been loaded using the Cassandra command line tool, cqlsh:
  1. Connect to the occ keyspace with use occ;
  2. Run select * from occ;
Process the data resource using the command biocache process -dr dr123
Sample the data resource against the layers using the command biocache sample -dr dr123
Index the data resource using the command biocache index -dr dr123
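Because each step depends on the one before it, it can help to run the four commands in order and stop at the first failure, so a problem is traced to a single step. A minimal Python sketch, assuming the biocache command is on the PATH and exits with a non-zero status when a step fails (an assumption, not something this page confirms); pipeline and run_pipeline are hypothetical names.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence


def pipeline(uid: str) -> list[list[str]]:
    """The biocache commands from the steps above, in the order they must run."""
    return [
        ["biocache", "load", uid],
        ["biocache", "process", "-dr", uid],
        ["biocache", "sample", "-dr", uid],
        ["biocache", "index", "-dr", uid],
    ]


def _run_subprocess(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (assumes biocache is on PATH)."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(uid: str,
                 run: Callable[[Sequence[str]], int] = _run_subprocess) -> list[str]:
    """Run each step in turn, stopping at the first non-zero exit code.

    Returns the names of the steps that completed, so a failure points at
    the step to investigate.
    """
    done: list[str] = []
    for cmd in pipeline(uid):
        step = cmd[1]
        print("running:", " ".join(cmd))
        code = run(cmd)
        if code != 0:
            print(f"step '{step}' failed with exit code {code}; stopping")
            break
        done.append(step)
    return done
```

Usage: run_pipeline("dr123"). The run parameter exists so the sequencing can be exercised with a stand-in function before touching a live server.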
Test that the indexing was successful by:
1. Viewing the SOLR admin console: http://index.l-a.site:8983. See the Solr admin interface page for tips on accessing it.
2. Viewing the results in biocache services
  1. http://biocache.l-a.site/occurrences/search?q=*:*
3. Testing with an Area Report in the Spatial Portal
  1. Search for Gazetteer Polygon e.g. “Queensland”
  2. Tools > Area Report, and follow the wizard

Troubleshooting

Successful sampling and indexing depend on specific details being applied correctly and in sequence. It is easy to miss or botch a step, and it can be hard to tell which one caused a problem. Testing the outcome of each step helps with troubleshooting. Some background on the system configuration also gives an overview of the process and should help to debug issues as they arise. To that end, a synopsis:

Sampling takes each occurrence in biocache (Cassandra) that has geospatial data (lat, lng) and samples it against each properly configured layer. The outcome of successful sampling for a single occurrence is a value in the cl_p column of the occ table in biocache (Cassandra). The value of cl_p looks something like this (from a Cassandra query):

cl_p | {"cl10001":"England", "cl10002":"North Yorkshire", "cl10003":"Beast Cliff", "cl10004":"OV 0000"}

where JSON keys such as "cl10001" are the layers' field IDs, which you identified when you configured the layers in the Spatial Portal. You can view a field directly, e.g. https://spatial.l-a.site/ws/manageLayers/field/cl10001
Indexing (of biocache) does more than index sampled layers, but for the purposes of this discussion, indexing copies the biocache (Cassandra) sampling data into a biocache (Solr) index. The outcome of successful indexing of sampled layers looks something like this (from a Solr query):
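When sampling looks incomplete, a quick check is to compare the cl_p map of one occurrence with the layer field IDs you expect. A minimal Python sketch; missing_layers and field_urls are hypothetical helpers, and it assumes the map is available as a dict (e.g. read through a Cassandra client) or as double-quoted JSON text like the example above, since cqlsh may print maps with single quotes.

```python
from __future__ import annotations

import json

# Host taken from the example URL above; point it at your own spatial service.
FIELD_URL = "https://spatial.l-a.site/ws/manageLayers/field/{field_id}"


def missing_layers(cl_p: str | dict[str, str], expected: list[str]) -> list[str]:
    """Return the expected layer field IDs with no sampled value in cl_p."""
    values = json.loads(cl_p) if isinstance(cl_p, str) else cl_p
    return [fid for fid in expected if not values.get(fid)]


def field_urls(field_ids: list[str]) -> list[str]:
    """Build manageLayers URLs for inspecting each field's configuration."""
    return [FIELD_URL.format(field_id=fid) for fid in field_ids]
```

For a row that has no value for cl10005, missing_layers(row, ["cl10001", "cl10005"]) returns ["cl10005"], and field_urls(["cl10005"]) gives the page where that field's configuration can be checked.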

..., "cl10001":"England", "cl10002":"North Yorkshire", "cl10003":"Beast Cliff", "cl10004":"OV 0000", ...,
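To confirm that indexing copied the sampled values faithfully, the Solr document for one record can be compared with its cl_p map from Cassandra. A minimal Python sketch: the Solr core name biocache, the id field used in the query and the helper names are assumptions to check against your own setup. Solr's select handler returns matching documents under response.docs in its JSON output; pass one of those documents as solr_doc.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

# Assumed base URL and core name; check yours in the Solr admin console.
SOLR_CORE_URL = "http://index.l-a.site:8983/solr/biocache"

# Layer fields look like cl10001; this skips other fields starting with "cl".
LAYER_FIELD = re.compile(r"cl\d+")


def select_url(record_id: str, field_ids: list[str]) -> str:
    """Build a Solr select URL returning the given layer fields for one record.

    Assumes the record identifier is indexed in a field named `id`.
    """
    params = {
        "q": f'id:"{record_id}"',
        "fl": ",".join(["id", *field_ids]),
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


def sampling_mismatches(cl_p: dict[str, str], solr_doc: dict) -> dict:
    """Return {field_id: (cassandra_value, solr_value)} for every difference.

    Fields present on only one side show up with None on the other side.
    """
    solr_cl = {k: v for k, v in solr_doc.items() if LAYER_FIELD.fullmatch(k)}
    keys = sorted(set(cl_p) | set(solr_cl))
    return {
        k: (cl_p.get(k), solr_cl.get(k))
        for k in keys
        if cl_p.get(k) != solr_cl.get(k)
    }
```

An empty result means the layer values in Solr match Cassandra for that record; a value present only in Cassandra points at indexing, while one missing from both points back at sampling. Matching on cl followed by digits keeps other fields that happen to start with "cl" (such as a taxonomic class field) out of the comparison.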

End
