SR3 AMQP Data Pump Assistance For Mirroring Directory #1376
-
I have currently set up an AMQP server that is able to get data from dd.weather.gc.ca using SR3. However, when I try to mirror the alerts directory from the website to a mounted directory, it does not work. Is there a known issue with mirroring, or is there a different approach for mirroring to a mounted directory versus a local one? Any clarification pertaining to this would be greatly appreciated, thanks!
Replies: 9 comments 28 replies
-
Mirroring should just be a subscriber that publishes to your AMQP broker after it downloads. I guess you can do a subscriber first... see if that works, then add posting after download to your config... Perhaps share the configuration you are using; this kind of thing should just work.
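To make that concrete, here is a minimal subscriber sketch. Everything below is illustrative, not from the thread: the file name, directory, and local broker URL are placeholders, and the anonymous dd.weather.gc.ca broker URL is the publicly documented one, assuming it still applies.

```
# ~/.config/sr3/subscribe/alerts_example.conf   (hypothetical name)
broker amqps://anonymous@dd.weather.gc.ca/
subtopic *.WXO-DD.alerts.#
directory /path/to/mounted/dir
mirror on

# once plain downloading works, add posting after download, e.g.:
# post_broker amqp://user:password@localhost/
# post_exchange xpublic
```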
-
Your topic is not going to work; try this one. But note that alerts are not issued very often.
Then you will get files being copied immediately. You can have both subtopics active.
-
uh... this answer is too long... but I guess it has the virtue of being thorough.

The topics do exactly match the (real) directory tree: https://dd.weather.gc.ca/20250127/WXO-DD/alerts/ is the directory where the alerts are written. The first directory (* in the subtopic) is the date. The second sub-topic, WXO-DD, is called a source in sarracenia parlance; it is a descriptive name meant to represent where a data set comes from. On internal servers there are dozens of different sources for data, but they get coalesced into one for the public. Other sources may show up in due course.

dd.weather's structure now matches exactly how all of our other deployments of sarracenia data pumps are done. dd was the original deployment and used to have a pre-standard tree. This change was announced here: prior to standardization:

With the standard tree, we remove the tree for a given date after a set number of days has passed.

So that's the official new tree. We did however understand that people were used to the old tree (in place since 2004), so there is a forest of links created to make the new organization look familiar. For example, alerts/ is linked to today's date/WXO-DD/alerts; this link gets replaced at midnight every day. So that makes it look like alerts/ is the same as it always was... but it doesn't contain data from previous days anymore.

I hope that clears things up. Does it make more sense? Or did I just overwhelm you with a wall of text... sorry if I did that...
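The link forest and the path-to-topic mapping described above can be pictured like this (an illustrative fragment, not actual server output):

```
# on dd.weather.gc.ca:
alerts/  ->  20250127/WXO-DD/alerts/    # symlink, re-pointed at midnight

# the dated tree is canonical, and subtopics follow it:
#   first  sub-topic: 20250127   (the date; use * in subscriptions)
#   second sub-topic: WXO-DD     (the "source")
#   then:             alerts ...
subtopic *.WXO-DD.alerts.#
```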
-
First problem: it's winter... there isn't much forest fire data available ;-) so this is hard to test. But fwiw: the following would be an example of a poll routine that produces messages the rest of the sarracenia components can consume, so that it works just like hpfx or a datamart: ~/.config/sr3/poll/hotspots
You might need an entry in ~/.config/sr3/credentials.conf like so:
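The actual entry did not survive in this thread; sr3's credentials.conf takes one credentialed URL per line, with the username and password embedded. A purely hypothetical sketch (host and credentials are placeholders):

```
# ~/.config/sr3/credentials.conf  -- one credentialed URL per line
https://someuser:somepassword@data.example.org/
```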
That would be needed if the poll spits out some missing or bad credential errors. Note that this configuration has a callback routine, `callback poll.hotspots_html_page`, in it to reformat the html listing from the site. You would need to put the plugin code in ~/.config/sr3/plugins/poll/hotspots_html_page.py:

```python
#!/usr/bin/env python3
"""
Description:
    This is an html page plugin used for NRCAN data.
    Ported to sr3 by ANL 2022/12/19

Usage:
    callback poll.hotspots_html_page
"""
from sarracenia.flowcb.poll import Poll
import html.parser, logging, paramiko
import time

logger = logging.getLogger(__name__)


class Hotspots_html_page(Poll):
    def __init__(self, options):
        super().__init__(options, logger)
        self.last = None
        self.myfname = None
        self.fdate = None

    def handle_starttag(self, tag, attrs):
        # remember the last attribute value seen; when the previous one
        # was 'file', the following href attribute is the file name.
        for attr in attrs:
            c, n = attr
            if self.last == 'file' and c == 'href':
                self.myfname = n
                return
            if n != '':
                self.last = n

    def handle_data(self, data):
        data = data.strip()
        data = data.strip('\t')
        if data == '':
            return
        if self.last == 'date':
            self.fdate = data
            return
        if self.last != 'size':
            return
        if self.myfname is None:
            return
        # normalize the size field: drop whitespace and the trailing 'b'
        mysize = data.lower()
        mysize = mysize.replace('\t', '')
        mysize = mysize.replace(' ', '')
        mysize = mysize.replace('b', '')
        # date example: 2019-06-12 06:06:04
        mydate = self.fdate
        try:
            t = time.strptime(mydate, '%d-%b-%Y %H:%M:%S')
        except ValueError:
            t = time.strptime(mydate, '%Y-%m-%d %H:%M:%S')
        mydate = time.strftime('%b %d %H:%M', t)
        # build a unix ls-style line, which the Poll base class parses into a message
        self.entries[self.myfname] = '-rwxr-xr-x 1 101 10 ' + mysize + ' ' + mydate + ' ' + self.myfname
        logger.debug("(%s) = %s" % (self.myfname, self.entries[self.myfname]))
        self.last = None
        self.myfname = None
```

So with that you will be able to create messages on your local broker, and subscribe to them so that the download happens.
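The ls-style entry line the plugin builds can be exercised on its own. Here is a standalone sketch (no sarracenia needed, function name and inputs are made up for illustration) of the same two-format date parsing and entry formatting:

```python
import time

def format_entry(fname, size_text, date_text):
    """Mimic the plugin: normalize the size field, parse either of the
    two date formats the listing may use, and emit a unix ls-style line."""
    size = size_text.lower().replace('\t', '').replace(' ', '').replace('b', '')
    try:
        # e.g. 12-Jun-2019 06:06:04
        t = time.strptime(date_text, '%d-%b-%Y %H:%M:%S')
    except ValueError:
        # e.g. 2019-06-12 06:06:04
        t = time.strptime(date_text, '%Y-%m-%d %H:%M:%S')
    date = time.strftime('%b %d %H:%M', t)
    return '-rwxr-xr-x 1 101 10 ' + size + ' ' + date + ' ' + fname

print(format_entry('hotspots.csv', '12 KB', '2019-06-12 06:06:04'))
# -> -rwxr-xr-x 1 101 10 12k Jun 12 06:06 hotspots.csv
```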
-
Hi @petersilva, I have noticed that when I run the following configuration file, my directory mirrors as expected; however, the actual file itself does not seem to show up. For example, right now I have created an instance of Observer that is watching my mounted directory, and it alerts me when a file is received, such as T_WWCN12_C_CWNT_202501311750_3167450066.cap. But when I go into my directory, the file is not actually present. Any suggestions on how to resolve this?
-
Also, I was wondering if there is a newsletter or anything of that sort one can subscribe to, to be notified of any updates or changes?
-
With mirror on, it will reproduce the entire directory tree from the source onto the destination. With it off, it places the file in the specified directory.
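To make the difference concrete, a hypothetical sketch (the directory and intermediate path segments are placeholders; the file name is the one from the question above):

```
directory ${HOME}/datamart

mirror on
#  file lands under the reproduced source tree, e.g.:
#  ${HOME}/datamart/20250131/WXO-DD/alerts/.../T_WWCN12_C_CWNT_202501311750_3167450066.cap

mirror off
#  file lands directly in the specified directory:
#  ${HOME}/datamart/T_WWCN12_C_CWNT_202501311750_3167450066.cap
```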
-
Let us set aside uploading to S3 for now... You just need one subscriber to download, so you would have a file called something like ~/.config/sr3/subscribe/alert_download.conf

The sr3 package includes a bunch of example configurations you can see with the sr3 list ie command:

```
sr3 add subscribe/ddc_cap-xml.conf
sr3 edit subscribe/ddc_cap-xml    # change the directory to match what you want.
sr3 declare
mkdir ~/MSCDatamartClone2/
sr3 start subscribe/ddc_cap-xml
```

In the sr3 edit command, I just changed the directory setting (for me it is ${HOME}/MSCDatamartClone2/alerts). With mirror on, it got an alert, and it ended up here:

The mirror reproduced the entire source tree of directories under the destination... if you just want to put the files in a single directory, turn mirror off.

So anyways... this is to download to your local debian vm. Please confirm this works for you, and then we can look at the next step.
-
so now you are downloading ok... Are you ok to publish to a local broker, and uplink to S3?