\chapter{Context Discovery Framework}
In this chapter, we look at the CueNet framework and its components: a \textit{data integration module} to model and query the various data sources and sensors, a \textit{discovery algorithm} to construct queries agnostic to the sources themselves, a \textit{knowledge representation module} to store relationships between the various real-world objects, and finally mechanisms to interact with a \textit{face verification algorithm}, which computes a confidence score for whether a person is present in a photo.
\section{Pruning Search Spaces with CueNet}
Automatic media annotation algorithms essentially assign one or more labels from a search space to a given input image. Figure \ref{fig:with-without-cuenet} shows the various approaches to constructing search spaces for such algorithms. The traditional approach is shown in \ref{fig:with-without-cuenet}(a). These spaces were limited to a set of labels chosen by an expert, with no way of pruning the search space when it grew very large.
The focus was instead on extracting the best features from images, to obtain high overall classification accuracy \cite{turk1991eigenfaces}.
\begin{figure}[t]
\centering
\includegraphics[width=0.95\textwidth]{media/with-without-cuenet-2.png}
\caption{The different approaches to search space construction for a multimedia annotation problem. A traditional classifier setup is shown in (a), where the search space candidates are manually specified. Context is used to generate large static search spaces in (b). The desired framework is shown in (c), which aims to prune search spaces and rank the correct candidates highly.}
\label{fig:with-without-cuenet}
\end{figure}
With the popularity of global social networks and the proliferation of mobile phones, information about people, their social connections and day-to-day activities is becoming available at a very large scale. The web provides an open platform for documenting many real-world events such as conferences, weather events and sports games. With such context sources, search space construction is being delegated to one or a few sources \cite{henter2012tag, li2012fusing, naaman2005identity, o2009context, stone2008autotagging} (figure \ref{fig:with-without-cuenet}(b)). These approaches rely on a single \textit{type} of context: for example, time and location information, or social network information from Facebook to help recognize faces in personal photos. We refer to such a direct dependency between the search space and a data source as \textbf{static linking}. Although these systems are meritorious in their own right, they suffer from a common drawback: they do not employ multiple sources, and therefore cannot exploit the \textbf{relations} between them. By realizing that these sources are interconnected, we are able to treat the entire source topology as a network. Our intuition in this work is to navigate this network to progressively discover the search space for a given media annotation problem. Figure \ref{fig:with-without-cuenet}(c) shows how context discovery can provide substantially smaller search spaces, which still contain a large number of correct tags. A small search space with a large number of true positives provides the ideal ground for a classification algorithm to exhibit superior performance.
\textbf{The CueNet framework} provides access to multiple heterogeneous, autonomous data sources containing event, social, and geographical information through a unified query interface. CueNet encapsulates our \textbf{Context Discovery Algorithm}, which utilizes this query interface to discover the most relevant search space for a media annotation problem. To ensure a hands-on discussion, we show the use of context discovery in a real-world application: face tagging in personal photos. As a case study, we will attempt to tag photos taken at conference, trip and party events by different users. These photos could contain friends, colleagues, speakers giving talks, or newly found acquaintances (who are not yet connected to the user through any social network). This makes the conference photos particularly interesting, because no single source can provide all the necessary information. At the same time, by studying the efficacy on trips and parties, we ensure that discovery can be done across different kinds of events. This emphasizes the need to utilize multiple sources in a meaningful way.
\begin{figure}[t]
\centering
\includegraphics[width=0.65\textwidth]{media/prog-discovery.png}
\caption{Navigation of a discovery algorithm between various data sources.}
\label{fig:prog-discovery}
\end{figure}
Here is an \textbf{example} to illustrate CueNet's discovery process. Suppose that Joe takes a photo with a camera that records time and GPS coordinates in the photo's EXIF header. Additionally, Joe has two friends: one with whom he interacts on Google+, and the other on Facebook. The framework checks whether either of them has any interesting event information pertaining to this time and location. We find that the friend on Google+ left a calendar entry describing an event (a title, a time interval and the name of a place). The entry also marks Joe as a participant. In order to determine the category of the place, the framework uses Yelp.com with the name and GPS location to find whether it is a restaurant, a sports stadium or an apartment complex. If the location of the event was a sports stadium, it navigates to upcoming.com to check what event was occurring there at this time. If a football game or a music concert was taking place at the stadium, we look at Facebook to see if the friend ``Likes" the sports team or music band. By traversing the different data sources in this fashion, the set of people who could potentially appear in Joe's photograph is incrementally built up, rather than simply reverting to everyone on his social network or everyone who could be in the area where the photograph was taken. We refer to such navigation between different data sources to identify relevant contextual information as \textbf{progressive discovery}. The salient feature of CueNet is its ability to progressively discover events, and their associated properties, from the different data sources and relate them to the photo capture event. We argue that given this structure and the relations between the various events, CueNet can make assertions about the presence of a person in the photograph. Once candidates have been identified by CueNet, they are passed to the face tagging algorithm (as in \cite{facever_pami2010}), which can perform very well since its search space is limited to two candidates.
Figure \ref{fig:cuenet-arch} shows the different components of the CueNet framework. The ontological \textbf{Event Models} specify various event and entity classes, and the different relations between them. These declared types provide a vocabulary for describing the relations stored in the various \textbf{Data Sources}. This module also provides a unified query interface, which is responsible for converting local queries to the native form that can be executed on individual data sources. The \textbf{Person Verification Tools} consist of a database of people, their profile information and photos containing these people. When this module is presented with a candidate and the input photograph, it compares the features extracted from the candidate's photos and the input photo to compute a confidence score. In the following sections, we describe each module in greater detail, and show how the context discovery algorithm utilizes them to accomplish its task.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\textwidth]{media/cuenet-high-level-arch.png}
\caption{The conceptual architecture of CueNet.}
\label{fig:cuenet-arch}
\end{figure}
\section{The Tagging Problem}
Consider a photograph, $H$, with a set of $n$ faces $F = \{f_1, f_2, \ldots, f_n\}$, and a set of candidates, $P = \{p_1, p_2, \ldots, p_m\}$. The traditional face tagging problem is described as a matching problem between the two sets $P$ and $F$, where each face $f_i$ is matched with some candidate person $p_j$ such that the tagging confidence $c_{ij} = C(f_i \rightarrow p_j)$ is maximized, where $0 < c_{ij} < 1$.
Current face verification technology \cite{nk_attribute_classifiers} answers the following question: do two given images contain the same person's face? Tagging a face therefore involves a comparison with each person in the database to find the highest-ranking candidate. In a typical situation involving a thousand photos and a thousand candidates, tagging the entire dataset takes about 14 hours if a single verification operation takes 50 milliseconds. This is a very large amount of time, especially when billions of personal photos are being uploaded to social networking sites on a weekly basis. In our work, we prune the search space to avoid such a large number of verifications. But that would not be possible if the final tag had to have \textit{maximum} confidence among all candidates (which implies that the confidence value for every candidate must be computed).
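To make this cost concrete, the arithmetic behind the 14-hour figure is simply:
\begin{equation*}
10^3~\text{photos} \times 10^3~\text{candidates} \times 50~\text{ms} = 5 \times 10^4~\text{s} \approx 13.9~\text{hours}.
\end{equation*}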
In order to relax this constraint of the original problem, we introduce a new threshold parameter $d$, where $0 < d < 1$. We associate a candidate with a face if its tagging confidence $c$ is greater than the threshold parameter. The problem addressed in this dissertation is to prune the search space of the face tagging problem for a given image so that at least one candidate is matched to a face with tagging confidence greater than the given constant threshold $d$, while performing far fewer than $m$ verifications, where $m$ is the number of candidates.
The more general version of this problem is the \textbf{First-k Discoveries} problem, where at most $k$ candidates can be associated for a given face, each of whose tagging confidence is greater than $d$. Thus, the first-k discoveries problem can be stated as:
\textbf{First-k Discoveries Problem}: Given a photograph, $H$, with a set of $n$ faces $F = \{f_1, f_2, \ldots, f_n\}$, and a set of candidates, $P = \{p_1, p_2, \ldots, p_m\}$, for each face $f_i$, find a set of candidates $T_i$, where $T_i \subset P$, $|T_i| \leq k$, and the tagging confidence of each candidate in $T_i$ is greater than $d$, while performing fewer than $m$ verifications.
\section{General Approach}
Figure \ref{fig:cuenet-arch} shows a high-level architecture of CueNet. The data integration module of CueNet provides a uniform query interface to a multitude of autonomous data sources, which may reside on a personal device (such as a mobile phone or PC) or on the World Wide Web \cite{halevy2001answering}. The event model describes the various real-world events that are relevant to the problem, and their relations and constraints with other entities. Any relevant axioms are specified in the model; we describe the axioms for the face tagging problem in section \ref{section:conditions-for-discovery}. For the face tagging problem, CueNet needs to work closely with tagging algorithms to check whether a person is present in a photo. For this purpose, we assume verification semantics in the tagging algorithm: given an input photo and a candidate person, the algorithm returns true or false (possibly with a confidence score). Face recognition models would have to be retrained whenever the candidate set changes, which is not necessary for verification algorithms. Also, as described in chapter 3, state-of-the-art techniques for face verification perform much more reliably than their recognition counterparts. Alternatively, a web-based service such as the former face.com could be utilized.
At the heart of CueNet lies the context discovery algorithm. Given a photo, the algorithm constructs a context network with all the known information. Using the knowledge base, the algorithm constructs queries to be executed on the interface provided by the data integration layer. A query is constructed in the following way: the current structure of the network is examined to list all nodes. For each node, we check the ontology to see what ontological relations this type of node usually shares with other types of objects. For example, if the node is a Person, the ontology will respond by saying that Persons \texttt{participate-in} events. Using this node and the possible relation information, the algorithm creates a query to find all entities which can satisfy these relations. These queries are executed on the different sources by the data integration module. Contextual information returned from the data sources is merged into the existing context network. New entities in the network are passed to the face tagging algorithm to check for their presence in the photo. If they are present, the context network is updated to reflect this information. The query and merge operations are performed iteratively until all the faces are tagged, or the data integration module is unable to furnish any new context. In later sections, we will see a more detailed specification of these algorithms.
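The loop just described can be sketched compactly. The listing below is an illustrative, self-contained Java sketch; all class, relation and source names in it are hypothetical placeholders, not the CueNet API (the actual implementation, described later, compiles such requests into SPARQL queries):
\begin{verbatim}
import java.util.*;

// Illustrative sketch of the query-construction loop. All names
// here are hypothetical placeholders, not the CueNet API.
public class DiscoverySketch {
  record Node(String id, String type) {}
  record Relation(String name, String targetType) {}

  // Toy ontology: which relations each node type can take part in.
  static final Map<String, List<Relation>> ONTOLOGY = Map.of(
      "Person", List.of(new Relation("participant-in", "Event")),
      "Event",  List.of(new Relation("participant", "Person")));

  public static void main(String[] args) {
    Deque<Node> dq = new ArrayDeque<>();        // discovery queue
    dq.add(new Node("ArjunSatish", "Person"));  // seeded with the owner
    while (!dq.isEmpty()) {
      Node n = dq.poll();
      // For each relation the ontology prescribes for this node type,
      // emit a query with the node as the known part and the related
      // object as the unknown.
      for (Relation r : ONTOLOGY.getOrDefault(n.type(), List.of())) {
        System.out.printf("find ?u : %s(%s, ?u) where ?u is a %s%n",
                          r.name(), n.id(), r.targetType());
        // The data integration layer would execute this on every
        // source whose schema mentions these classes, and merge the
        // results into the context network.
      }
    }
  }
}
\end{verbatim}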
\section{Execution Trace}
In this section, we trace the execution on two different photos, to see how the different modules interact to produce context networks, and how these are used to tag faces. The first example is a relatively simple one, requiring only two data sources, whereas the second photo requires multiple sources to successfully tag all faces.
\subsection{Simple Case}
Consider the photo shown in figure \ref{fig:stacktrace-simple-torsten-hidden}. For the purposes of this trace, we assume that we have access to the sources shown in figure \ref{fig:stacktrace-simple-sources} through the data integration module. Given an input photo, the knowledge base is queried to find what other objects can be associated with a photo object. The KB contains the information that every photo has an EXIF header, which stores a timestamp and location coordinates, and a fact which states that every photo is owned by a user object, where \textbf{owner-of} is a relationship described in the KB. This knowledge is used to construct the context network shown in figure \ref{fig:exif-network}.
\begin{figure}[ht]
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/torsten-hidden.png}
\caption{Input photo.}
\label{fig:stacktrace-simple-torsten-hidden}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/sources.png}
\caption{Available data sources.}
\label{fig:stacktrace-simple-sources}
\end{minipage}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter4/stacktrace/init-network.png}
\caption{Context network built after the user information and EXIF sources are queried and merged.}
\label{fig:exif-network}
\end{figure}
Now, the algorithm traverses the graph to list all the possible queries it can execute on the data integration layer. Given the knowledge that entities participate in events, and events can contain participants, it generates the following queries:
\begin{itemize}
\item Does any data source contain participant information related to the photo capture event?
\item At time $t$ and location $l$, which events contain the owner (entity:ArjunSatish) as a participant?
\end{itemize}
The data integration system looks at the different sources, determines that none of them store information about photo capture events, and skips executing the first query. But many sources describe events and store their participants too (Google Calendar, Facebook, Conference Calendar). The query is converted to their native formats (API calls or relational database queries) and the respective results are sent back to the data integration module. We see that a calendar entry was returned by the Google Calendar source, as shown in figure \ref{fig:stacktrace-simple-calendar}. This information is now merged with the existing context graph to produce a context graph similar to that shown in figure \ref{fig:stacktrace-simple-context-network}.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/calendar.png}
\caption{Calendar Event.}
\label{fig:stacktrace-simple-calendar}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter4/stacktrace/context-network-torsten.png}
\caption{Context network after integrating calendar information.}
\label{fig:stacktrace-simple-context-network}
\end{figure}
Now we have new entities related to the photo. The face verification algorithm is invoked with the new set of candidates. It must be noted that this verification problem is much easier than verifying against many thousands of candidates. Once the correct entity is identified, the photo is annotated as shown in figure \ref{fig:stacktrace-simple-torsten-tagged}.
One last look at the source diagram in figure \ref{fig:stacktrace-simple-all} shows which data sources revealed interesting information related to this photo. In this case, EXIF provided some relevant context on when and where the photo was taken. The owner's personal calendar provided information on what event was occurring during the time of photo capture, and who else was involved in it.
\begin{figure}[h!]
\begin{minipage}[b]{0.9\linewidth}
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/torsten-tagged.png}
\caption{Face tagged with the correct person.}
\label{fig:stacktrace-simple-torsten-tagged}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.9\linewidth}
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/time-space-calendar-selected.png}
\caption{Highlighted sources provided relevant context.}
\label{fig:stacktrace-simple-all}
\end{minipage}
\end{figure}
\subsection{Complex Case}
Now we consider a more complex case, which requires more than just metadata and personal sources for successful tagging. The photo under consideration is shown in figure \ref{fig:vldb-hidden}. We will use the same set of data sources, shown again in figure \ref{fig:vldb-sources}.
\begin{figure}[ht]
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-hide-all.jpg}
\caption{Input photo.}
\label{fig:vldb-hidden}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/sources.png}
\caption{Available data sources.}
\label{fig:vldb-sources}
\end{minipage}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/vldb-network-1.png}
\caption{Context network built after user information and EXIF sources are queried and merged.}
\label{fig:vldb-exif-network}
\end{figure}
Using metadata sources and personal information, we arrive at the context network shown in figure \ref{fig:vldb-exif-network}. The procedure up to this point is exactly the same as in the previous scenario. Now, given the known state of the world, if we invoke the face verification tools, we discover that the owner is actually present in the photo (figure \ref{fig:vldb-network-2}). In this case, the candidate set contains just one entity, which reduces the complexity of the tagging algorithm.
\begin{figure}[h]
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/vldb-network-2.png}
\caption{Context network after verifying the owner's presence in the photo.}
\label{fig:vldb-network-2}
\end{figure}
The next query generated by the system asks what (entity:ArjunSatish) was doing at this time. This time, we find that the conference calendar holds the answer. The resulting context network is shown in figure \ref{fig:vldb-network-3}.
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter4/stacktrace/vldb-network-3.png}
\caption{Context network after querying conference sources.}
\label{fig:vldb-network-3}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/vldb-source-1.png}
\caption{Sources which have provided relevant context so far.}
\label{fig:vldb-source-1}
\end{figure}
At this point, the conference event is known to our knowledge base to have a definite structure, in terms of keynote, session and talk events with lunch/coffee breaks interleaved, and to have many attendees. So the algorithm immediately queries the conference source to find and merge all of these objects. It discovers that the photo was taken during a break event, and that the conference (VLDB 2009) has many hundreds of participants, as shown in figure \ref{fig:vldb-network-4}.
\begin{figure}[h]
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-network-4.png}
\caption{Context network after discovering conference attendees.}
\label{fig:vldb-network-4}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-network-5.png}
\caption{Context network after discovering relations between attendees and owner.}
\label{fig:vldb-network-5}
\end{minipage}
\end{figure}
Figure \ref{fig:vldb-network-4} shows the various attendees discovered by the algorithm from the conference source. But finding 3 candidates among hundreds is an equally challenging task. Before invoking the face tagging algorithm, we want to see if we can discover any more relations between the objects in the context network. The discovery algorithm consults the knowledge base and finds that entities can be related through a \textbf{friend-of} relation. So it queries all known sources for friend relations, and finds that Facebook, Gmail and Twitter are sources which store data containing this relation. The social networking information reveals that a few of the people who were present at the conference are related to the user, and therefore have a greater chance of appearing in the photo. The face verifier is invoked only with these candidates. By doing this, we tag two more faces in the photo. The resulting context network is shown in figure \ref{fig:vldb-network-6}.
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter4/stacktrace/vldb-network-6.png}
\caption{Context network after discovering social relations.}
\label{fig:vldb-network-6}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=0.9\textwidth]{media/chapter4/stacktrace/vldb-source-2.png}
\caption{Sources used so far.}
\label{fig:vldb-source-2}
\end{figure}
Since we now have more people tagged in the photo, we can repeat the above procedure to discover more relations between the entities related to the photo and those present at the conference. This time, results are returned from Gmail, and none from Facebook or Twitter (these people had sent emails to each other during the conference, but were not connected through Facebook or Twitter). The changes in the context network are shown in figures \ref{fig:vldb-network-7} and \ref{fig:vldb-network-8} (where the final face is identified). Figure \ref{fig:vldb-source-3} highlights all the sources which returned relevant context for this trace.
\begin{figure}[h!]
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-network-7.png}
\caption{Context network after discovering further social relations.}
\label{fig:vldb-network-7}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-network-8.png}
\caption{Context network after tagging all faces.}
\label{fig:vldb-network-8}
\end{minipage}
\end{figure}
\begin{figure}[h!]
\centering
\includegraphics[width=\textwidth]{media/chapter4/stacktrace/vldb-source-3.png}
\caption{Sources used to tag all faces shown in figure \ref{fig:vldb-network-8}.}
\label{fig:vldb-source-3}
\end{figure}
\section{Event Model}
Our ontologies extend the E* model \cite{gupta2011managing} to specify relationships between events and entities. Specifically, we utilize the relationship ``\textbf{subevent-of}", which specifies event containment: an event $e_1$ is a subevent-of another event $e_2$ if $e_1$ occurs completely within the spatiotemporal bounds of $e_2$. Additionally, we utilize the relations \textbf{occurs-during} and \textbf{occurs-at}, which specify the time and space properties of an event. Another important relation between entities and events is the ``\textbf{participant}" property, which allows us to describe which entity participates in which event. It must be noted that participants of a subevent are also participants of the parent event. A participation relationship between an event and a person instance asserts the presence of the person within the spatiotemporal region of the event. We argue that the reverse is also true: if a participant $P$ is present at $\mathcal{L}_P$ during the time $\mathcal{T}_P$, and an event $E$ occurs within the spatiotemporal region $\langle\mathcal{L}_E, \mathcal{T}_E\rangle$, we say $P$ is a participant of $E$ if the event's spatiotemporal span contains that of the participant:
\begin{equation}
\label{eq:participation-region}
\begin{aligned}
\text{\texttt{participant}(E, P)} \iff (\mathcal{L}_P \sqsubset_L \mathcal{L}_E) \wedge (\mathcal{T}_P \sqsubset_T \mathcal{T}_E)
\end{aligned}
\end{equation}
The symbols $\sqsubset_L$ and $\sqsubset_T$ indicate spatial and temporal containment respectively. Please refer to \cite{gupta2011managing} for more details. In later sections, we refer to the location and time of the event, $\mathcal{L}_E$ and $\mathcal{T}_E$, as $E.$\textbf{occurs-at} and $E.$\textbf{occurs-during} respectively.
\section{Data Sources}
\label{sec:data-sources}
The ontology makes available a vocabulary of classes and properties. Using this vocabulary, we can declaratively specify the schema of each source. With these schema descriptions, CueNet can infer which data source can provide which type of data instances. For example, the framework can distinguish between a source which describes conferences and another which is a social network. We use a LISP-like syntax to allow developers of the system to specify these declarations. The example below describes a source containing conference information.
\begin{verbatim}
(:source conferences
  (:attrs a_url a_name a_time a_location a_title)
  (:rel c-conf type-of cuenet:conference)
  (:rel d-time type-of dolce:time-interval)
  (:rel d-loc type-of dolce:location)
  (:rel c-attendee type-of cuenet:person)
  (:rel c-attendee participant-in c-conf)
  (:rel c-conf occurs-at d-loc)
  (:rel c-conf occurs-during d-time)
  (:axioms
    (:map d-time a_time)
    (:map d-loc a_location)
    (:map c-conf.title a_title)
    (:map c-conf.url a_url)
    (:map c-attendee.name a_name)))
\end{verbatim}
A source declaration comprises a single nested s-expression. We will refer to the first symbol in each expression as a keyword, and the following symbols as operands. The above declaration uses five keywords (\texttt{source}, \texttt{attrs}, \texttt{rel}, \texttt{axioms}, \texttt{map}). The \texttt{source} keyword is the root operator, and declares a unique name for the data source. The source mapper can be queried for accessors using this name. The \texttt{attrs} keyword lists the attributes of this source. Currently we assume a tuple-based representation, and each operand in the attrs expression maps to an element in the tuple. The \texttt{rel} keyword allows construction of a relationship graph whose nodes are instances of ontology concepts and whose edges are the relationships described by this particular source. In the above example, we construct the individuals \textit{c-conf}, \textit{d-time}, \textit{d-loc} and \textit{c-attendee}, which are instances of the \textit{conference} class (from the cuenet namespace in the OWL ontology) and the \textit{time-interval}, \textit{location} and \textit{person} classes (from the DOLCE namespace) respectively. We further say that the attendee is a \textbf{\textit{participant-in}} the conference, which \textbf{\textit{occurs-at}} location \textit{d-loc} and \textbf{\textit{occurs-during}} the interval \textit{d-time}. Finally, the mapping \texttt{axioms} are used to map nodes in the relationship graph to attributes of the data source. For example, the first axiom (specified using the map keyword) maps the time node to the time attribute. The third map expression creates a literal called title, associates it with the conference node, and takes its value from the title attribute of the conference data source.
Formally, we represent the given ontology as $O$. The classes and properties in $O$ are represented by $C^O$ and $P^O$ respectively. Since our upper ontology consists of DOLCE and E*, we assume the inclusion of the classes \texttt{Endurant}, \texttt{Perdurant}, \texttt{Event} and \texttt{Person} in $C^O$. Each source $S$ consists of three parts. First, a relation graph $G^S(V^S, E^S)$, where the nodes $V^S \subseteq C^O$ specify the various ``things'' described by the source, and the edges $E^S \subseteq P^O$ specify the relations among the nodes; any graph retrieved from such a source is an instance of the relation graph $G^S$. Second, the tuple $A^S_T$ consists of the attributes of the data source. Finally, the mapping $M^S: \{G^S \rightarrow A^S_T\}$ specifies how to map the different nodes in the relation graph to the different attributes of the native data source.
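The formal triple $(G^S, A^S_T, M^S)$ maps naturally onto a small data structure. The following is a minimal Java sketch, mirroring the conference declaration above; the type names are illustrative, not the CueNet implementation:
\begin{verbatim}
import java.util.*;

// Sketch of a source S = (G, A, M). Names are illustrative only.
public class SourceSketch {
  record Edge(String from, String property, String to) {}      // subset of P^O
  record RelationGraph(Set<String> nodes, Set<Edge> edges) {}  // G
  record Source(String name,
                RelationGraph graph,         // what the source describes
                List<String> attributes,     // A, the native tuple
                Map<String, String> mapping  // M: graph node -> attribute
  ) {}

  public static void main(String[] args) {
    Source conferences = new Source("conferences",
      new RelationGraph(
        Set.of("c-conf", "d-time", "d-loc", "c-attendee"),
        Set.of(new Edge("c-attendee", "participant-in", "c-conf"),
               new Edge("c-conf", "occurs-at", "d-loc"),
               new Edge("c-conf", "occurs-during", "d-time"))),
      List.of("a_url", "a_name", "a_time", "a_location", "a_title"),
      Map.of("d-time", "a_time", "d-loc", "a_location",
             "c-attendee.name", "a_name", "c-conf.title", "a_title",
             "c-conf.url", "a_url"));
    System.out.println(conferences.name() + " describes "
                       + conferences.graph().nodes());
  }
}
\end{verbatim}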
\section{Conditions for Discovery}
\label{section:conditions-for-discovery}
CueNet is entirely based on reasoning in the event and entity (i.e., person) domain, and the relationships between them. These relationships include participation (an event-entity relation), social relations (entity-entity relations) and the subevent relation (an event-event relation). For the sake of simplicity, we restrict our discussion to events whose spatiotemporal spans either completely overlap or do not intersect at all; we do not consider events which partially overlap. In order to develop the necessary conditions for context discovery, we rely on the following two axioms:
\textbf{Object Existence Axiom}: An object can be present at only one place at any moment. The object cannot exist outside a spatiotemporal boundary containing this place.
\textbf{Participation Semantics Axiom}: If an object is participating in two events at the same time, then one is the subevent of the other.
% Before we provide an overview of the discovery algorithm, we must make a note of set of conditions required for its correct execution.
Given the ontology $O$, we can construct an event instance graph $G^I(V^I, E^I)$, whose nodes are instances of classes in $C^O$ and whose edges are instances of the properties in $P^O$. The context discovery algorithm relies on the notion that, given an instance graph, \textit{queries} to the different sources can be constructed automatically. A query is a set of predicates with one or more unknown variables. For the instance graph $G^I(V^I, E^I)$, we construct a query $Q(D, U)$, where $D$ is a set of predicates and $U$ is a set of unknown variables.
\textbf{Query Construction Condition:} Given an instance graph $G^I(V^I, E^I)$ and ontology $O(C^O, P^O)$, a query $Q(D, U)$ can be constructed such that $D$ is a set of predicates representing a subset of the relationships specified in $G^I$; in other words, $D$ is a subgraph induced by $G^I$. $U$ is a class which has a relationship $r \in P^O$ with a node $n \in D$. Essentially, the ontology must prescribe a relation between the unknown $U$ and some node $n$ through the relationship $r$. In our case, the relation $r$ will be either a \textbf{participant} or a \textbf{subevent} relation. If the relationship with the instances does not violate any object property assertions specified in the ontology, we can create the query $Q(D, U)$.
\textbf{Identity Condition:} Given an instance graph $G^I(V^I, E^I)$ and a result graph $G^R(V^R, E^R)$ obtained from querying a source, we can merge two events only if they are identical. Two nodes $v^I_i \in V^I$ and $v^R_r \in V^R$ are identical if they meet the following two conditions: \textbf{(i)} both $v^I_i$ and $v^R_r$ are of the same class type, and \textbf{(ii)} both $v^I_i$ and $v^R_r$ have exactly overlapping spatiotemporal spans, indicated by $=_L$ and $=_T$. Mathematically, we write:
\begin{equation}
\label{eq:identity}
\begin{aligned}
v^I_i = v^R_r \iff (v^I_i.\text{\textbf{type-of}} = v^R_r.\text{\textbf{type-of}}) \wedge \\
(v^I_i.\text{\textbf{occurs-at}} =_L v^R_r.\text{\textbf{occurs-at}}) \wedge \\
(v^I_i.\text{\textbf{occurs-during}} =_T v^R_r.\text{\textbf{occurs-during}})
\end{aligned}
\end{equation}
\textbf{Subevent Condition:} Given an instance graph $G^I(V^I, E^I)$, and a result graph $G^R(V^R, E^R)$ obtained from querying a source, we can construct a subevent edge between two nodes $v^I_i \in V^I$ and $v^R_r \in V^R$ if one is spatiotemporally contained within the other, and the two share at least one common \texttt{Endurant}.
\begin{equation}
\label{eq:sube-st-containment}
\begin{aligned}
v^I_i \sqsubset_L v^R_r,\\
v^I_i \sqsubset_T v^R_r
\end{aligned}
\end{equation}
\begin{equation}
\label{eq:sube-entity-containment}
\begin{aligned}
v^I_i.\text{\textbf{Endurants}} \cap v^R_r.\text{\textbf{Endurants}} \neq \emptyset
\end{aligned}
\end{equation}
Here, $v^I_i.$\textbf{Endurants} is defined as the set $\{w \mid w \in V^I_i \wedge w.\text{\textbf{type-of}} = \texttt{Endurant}\}$, where $V^I_i$ are the nodes connected to $v^I_i$ in $G^I$. If equation \eqref{eq:sube-entity-containment} does not hold, we say that $v^I_i$ and $v^R_r$ co-occur.
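The identity and subevent conditions reduce to simple predicates over an event's type and spatiotemporal span. Below is a minimal Java sketch of equations \eqref{eq:identity}--\eqref{eq:sube-entity-containment}; only the temporal checks are spelled out, spatial containment and equality are stubbed, and all names are hypothetical:
\begin{verbatim}
import java.util.Set;

// Sketch of the identity and subevent conditions. Spatial checks are
// stubbed; a real implementation would use a geometry library.
public class ConditionsSketch {
  record Interval(long start, long end) {
    boolean equalsT(Interval o)   { return start == o.start && end == o.end; }
    boolean containsT(Interval o) { return start <= o.start && o.end <= end; }
  }
  record Event(String type, Interval time, Object loc, Set<String> endurants) {}

  static boolean equalsL(Object a, Object b)   { return a.equals(b); } // =_L stub
  static boolean containsL(Object a, Object b) { return a.equals(b); } // stub

  // Identity Condition: same class type and identical spatiotemporal span.
  static boolean identical(Event a, Event b) {
    return a.type().equals(b.type())
        && equalsL(a.loc(), b.loc())
        && a.time().equalsT(b.time());
  }

  // Subevent Condition: spatiotemporal containment plus at least one
  // shared Endurant. Containment without a shared Endurant means the
  // two events merely co-occur.
  static boolean subeventOf(Event child, Event parent) {
    boolean contained = containsL(parent.loc(), child.loc())
                     && parent.time().containsT(child.time());
    boolean shared = child.endurants().stream()
                          .anyMatch(parent.endurants()::contains);
    return contained && shared;
  }
}
\end{verbatim}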
\begin{figure*}[h]
\centering
\includegraphics[width=\textwidth]{media/exec/exec-cycle-one-line.png}
\caption{The discover, merge, and prune-up stages in an iteration of the discovery algorithm.}
\label{fig:exec-cycle}
\end{figure*}
\textbf{Merging Event Graphs}: With the above conditions, we can now describe an important building block of the context discovery algorithm: the steps needed to merge two event graphs. An example is shown in figure \ref{fig:exec-cycle}(b-d). Given the event graph consisting of the photo capture event on the left of (b), a meeting event $m$ and a conference event $c$, each containing their respective participants, the goal is to produce a single network which combines the information from all three graphs. In this example, the meeting event graph $m$ is semantically equivalent to the original graph. The conference event $c$, however, indicates that the person $AG$ is also participating in a conference at the time the photo was taken. The result of merging is shown in (d). An event graph merge consists of two steps: the first is a \texttt{subevent hierarchy join}, and the second is a \texttt{prune-up} step. Algorithm \ref{alg:merge-alg} presents the merge in detail; here, we look at it only briefly.
Given an original graph $O_m$ and a new graph $N_m$, the join function works as follows. All nodes in $N_m$ are checked against all nodes in $O_m$ to find identical counterparts. For entities, identity is verified through an identifier; for events, equation \eqref{eq:identity} is used. Because of the object existence and participation semantics axioms, all events which contain a common participant are connected to their respective super-event using the subevent relation (equations \eqref{eq:sube-st-containment} and \eqref{eq:sube-entity-containment} must be satisfied by the events). Also, if two events have no common participant, they can still be related with a subevent edge if the event model says it is possible. For example, if a conference event model declares keynotes, lunches and banquets as known subevents, then every keynote or banquet event to be merged into an event graph is made a subevent of the conference event, provided equation \eqref{eq:sube-st-containment} holds between the respective events.
It must be noted that node $AG$ occurs twice in graph (c). In order to correct this, we use the participation semantics axiom: we traverse the final event graph from the leaf events to the root events, and remove a person node from an event if it also appears in one of its subevents. This is the \texttt{prune-up} step. Using these formalisms, we now look at the working of the context discovery algorithm.
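The prune-up step itself is a short bottom-up traversal. A minimal Java sketch (names hypothetical): every participant seen in a subevent is removed from all of its ancestor events, so each person remains attached only to the deepest event in which they appear.
\begin{verbatim}
import java.util.*;

// Sketch of prune-up: a post-order walk that removes a person from an
// event if the person already appears in one of its subevents.
public class PruneUpSketch {
  static class Event {
    final Set<String> participants = new HashSet<>();
    final List<Event> subevents = new ArrayList<>();
  }

  // Returns all participants seen in the subtree rooted at e.
  static Set<String> pruneUp(Event e) {
    Set<String> below = new HashSet<>();
    for (Event se : e.subevents)
      below.addAll(pruneUp(se));
    e.participants.removeAll(below); // keep people at the deepest event
    Set<String> seen = new HashSet<>(below);
    seen.addAll(e.participants);
    return seen;
  }
}
\end{verbatim}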
\section{Context Discovery Algorithm}
\label{sec:discovery-algorithm}
Algorithm \ref{alg:cx-alg} below outlines the discovery algorithm, denoted as the \textbf{discover} function. The function is tail-recursive, invoking itself until a termination condition is reached (when at most $k$ tags are obtained for all faces, or no new data is obtained from any data source for all generated queries). The input to the algorithm is a photo (with EXIF tags) and an associated owner (the user). It must be noted that by seeding the graph with owner information, we bias the discovery towards his/her personal information. An event instance graph is created in which each photo is modeled as a photo capture event. Each event and object is a node in the instance graph. Each event is associated with time and space attributes. All relationships are edges in this graph. All EXIF tags are literals, related to the photo with data property edges. Figure \ref{fig:exec-cycle} graphically shows the main stages in a single iteration of the algorithm.
The event graph is traversed to produce a queue of event and object nodes, which we refer to as DQ (the discovery queue). The algorithm consists of two primary functions: \textbf{query} and \textbf{merge}. The behavior of the query function depends on the type of the node. If the node is an event instance, the function consults the ontology to find any known subevents, and queries the data sources for these subevents, their properties and the participants of the input event node. On the other hand, if it is an entity instance, the function issues a query to find all the events it participates in.
Results from the data source wrappers are returned in the form of event graphs. These event graphs are merged into the original event graph in the following steps. First, the merge function identifies \textbf{duplicate} events using the identity condition mentioned above. Second, it identifies subevent hierarchies using the graph merge conditions described above, and performs a \textbf{subevent hierarchy join}. Third, the function \textbf{prune\_up} removes entities from an event when one of its subevents also lists them as participants. Fourth, \textbf{push\_down} is the face verification step, applied if the number of new entities in the parents of the photo-capture event is small (less than $T$).
\restylealgo{ruled}
\SetAlgoSkip{}
\begin{algorithm}[h!]
\dontprintsemicolon
\KwData{A photograph H, with a set of detected faces F. Voting threshold, T. The owner O of the photo.}
\KwResult{For each face f $\in$ F, a set of at most $k$ person tags.}
\Begin{
$ $\;
function discover(): \{ \;
\Indp while (\texttt{DQ} is not empty): \{ \;
\Indp \texttt{node} = \texttt{DQ}.deque() \;
\texttt{results} = query (\texttt{node}) \;
\texttt{E} $\leftarrow$ merge (\texttt{E}, \texttt{results}) \;
if (termination\_check()): \;
\Indp \textbf{return} prepare\_results(); \;
\Indm
\Indm
\} \;
reconstruct \texttt{DQ} $\leftarrow$ \texttt{E} \;
discover() \;
\Indm
\}\;
$ $\;
function merge(\texttt{O}, \texttt{N}): \{ \;
\Indp
remove\_duplicates() \;
\texttt{M} $\leftarrow$ subevent\_hierarchy\_join(\texttt{O}, \texttt{N}) \;
prune\_up(\texttt{M}) \;
if (less than \texttt{T} new candidates were discovered): \{ \;
\Indp
push\_down(\texttt{M}) \;
\Indm
\} else: \;
\Indp
vote\_and\_verify(\texttt{M}) \;
\Indm
return \texttt{M}; \;
\Indm
\}
$ $ \;
\texttt{E} $\leftarrow$ construct event graph with \texttt{H} and \texttt{O} \;
construct discoverable nodes queue, \texttt{DQ} $\leftarrow$ \texttt{E} \;
\textbf{return} discover() \;
}
\caption{The Context Discovery Algorithm}
\label{alg:cx-alg}
\end{algorithm}
The \textbf{push\_down} step tries to verify whether any of the newly discovered objects are present in the photo; if they are (i.e., if the tagging confidence of the object, obtained from the face verification algorithm, is higher than the given threshold), the objects are removed from the super-event and linked to the photo capture event as its participants. In other words, they are pushed down the subevent hierarchy. Alternatively, if the number of new objects is larger than $T$, the algorithm initiates the \textbf{vote\_and\_verify} method, which ranks all the candidates based on their social relationships with people already identified in the photo. For example, if a candidate is related through some social network to two persons present in the photo, then its score is 2. Ordering is done by simply sorting the candidate list in descending order of score. Face verification runs only on the top-ranked $k$ candidates. If there are still untagged faces after the termination of the algorithm, we vote over all the remaining people, and return the ranked list for each untagged face.
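The voting step can be sketched as a one-line ranking. The following Java fragment is illustrative (hypothetical names): each candidate scores one point per social tie to a person already verified in the photo, and only the top-$k$ candidates by score are sent to the face verifier.
\begin{verbatim}
import java.util.*;

// Sketch of vote_and_verify ranking. Names are illustrative.
public class VoteSketch {
  static List<String> rank(Collection<String> candidates,
                           Set<String> identified,
                           Map<String, Set<String>> ties, // candidate -> friends
                           int k) {
    return candidates.stream()
        .sorted(Comparator.comparingLong((String c) ->
            ties.getOrDefault(c, Set.of()).stream()
                .filter(identified::contains).count()).reversed())
        .limit(k)
        .toList();
  }
}
\end{verbatim}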
Figure \ref{fig:exec-cycle} shows the various stages in the algorithm graphically. (a) shows an example event graph describing a photo taken at a meeting. The meeting consists of three participants AG, SR and AS. The photo contains SR and AS. (b) shows two events returned from the data sources. One is a meeting event which is semantically identical to the input. The other is a conference event with AG. (c) shows the result of merging these graphs. (d) The \texttt{prune-up} function removes the duplicate reference to AG. A live visualization of these steps for different photos can be found at \url{http://www.ics.uci.edu/~arjun/cuenet/icmr-demo/}.
\section{Merging Context Networks}
In this section, we look more closely at the merge function. Algorithm \ref{alg:merge-alg} presents the pseudo-code for merging a secondary context network, $S$, into a primary context network, $P$. The terminology of primary and secondary signifies that all data instances from the secondary network will be merged into the primary network. While merging networks, we also assume that events have at most one super-event; thus, no diamond structures are found in either network. Algorithm \ref{alg:merge-alg} shows the steps needed to merge two networks, each with a single root. A root event is one which has no super-events. The symbol $\forall_{se}$ stands for ``for all subevents''.
\restylealgo{ruled}
\SetAlgoSkip{}
\begin{algorithm}[h!]
\dontprintsemicolon
\KwData{Context Network P and S, where S will be merged into P.}
\KwResult{All event instances in S will be merged into P.}
\Begin{
$ $\;
function merge(P, S): \{ \;
\Indp
\texttt{[Pr]} = root event of P \;
\texttt{[Sr]} = root event of S \;
recursive\_merge(P, Pr, S, Sr) \;
\Indm
\}
$ $\;
function recursive\_merge(P, Pr, S, Sr): \{ \;
\Indp
addAsSubevent = true \;
containedSubevents $\leftarrow \emptyset$ \;
$\forall_{se}$ (\texttt{ps} of P) \{ \;
\Indp
if (equals(ps, Sr)) \{ \;
\Indp
addAsSubevent = false \;
mergeInformation(ps, Sr) \;
$\forall_{se}$ (\texttt{s} of Sr): recursive\_merge(P, ps, S, s) \;
\Indm
\} \;
if (contains(ps, Sr)) \{ \;
\Indp
addAsSubevent = false \;
recursive\_merge(P, ps, S, Sr) \;
\Indm
\} \;
if (contains(Sr, ps)) \{ \;
\Indp
addAsSubevent = false \;
containedSubevents.add(ps) \;
\Indm
\} \;
\Indm
\} \;
$ $\;
if (addAsSubevent) \{ \;
\Indp
addSubeventEdge(Pr, Sr) \;
if (containedSubevents.size() $>$ 0) \{ \;
\Indp
removeSubevents(Pr, containedSubevents) \;
createSubevents(Sr, containedSubevents) \;
\Indm
\} \;
$\forall_{se}$ (\texttt{s} of Sr): recursive\_merge(P, Sr, S, s) \;
\Indm
\} \;
\Indm
\} \;
$ $\;
}
\caption{The Merge Algorithm}
\label{alg:merge-alg}
\end{algorithm}
Once the two root nodes, $Pr$ and $Sr$, have been identified, we descend the subevent trees of the two context networks and perform one of the following operations. For each subevent of $Pr$: if the subevent is equal to $Sr$, we merge the information from $Sr$ into this subevent, and continue recursively merging the children of $Sr$ into the children of the subevent. If a subevent of $Pr$ contains the new instance $Sr$, we simply continue the recursion. But if $Sr$ can contain the subevent node, we add all the sibling subevents which can become subevents of $Sr$ to a list \texttt{containedSubevents}, add $Sr$ as a subevent of $Pr$, remove all the children of $Pr$ in \texttt{containedSubevents}, and add them as subevents of $Sr$. Recursion then continues on the children of $Sr$ from the newly connected $Sr$ node in the primary network. Any new literal properties, spatiotemporal attributes and participant information from the secondary network are copied into the corresponding events of the primary network using the \texttt{mergeInformation} function.
\begin{figure}[t]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter4/merge-example-setup.png}
\caption{(a) Primary and (b) Secondary Example Networks.}
\label{fig:merge-setup}
\end{figure}
Let us consider an example to understand the merge function. Assume we have to merge network \ref{fig:merge-setup}(b) into network \ref{fig:merge-setup}(a). To simplify the example, we assume that all the events occur at the same location. The time intervals occupied by the events are provided in table \ref{tab:intervals} below.
\begin{table}[h]
\begin{center}
\begin{tabular}{|cc|}
\hline
Event & Time Interval \\
\hline
1 & 0 - 120 \\
2 & 10 - 45 \\
3 & 70 - 100 \\
4 & 10 - 15 \\
5 & 28 - 37 \\
6 & 70 - 80 \\
7 & 90 - 100 \\
\hline
\end{tabular}
\quad
\begin{tabular}{|cc|}
\hline
Event & Time Interval \\
\hline
A & 10 - 105 \\
B & 10 - 45 \\
C & 80 - 90 \\
\hline
\end{tabular}
\end{center}
\caption{Time intervals for events in the primary and secondary networks shown in figure \ref{fig:merge-setup}.}
\label{tab:intervals}
\end{table}
We first check whether the root of the primary network would contain the secondary. Since event \texttt{1}'s time interval completely contains event \texttt{A}'s, this is true, and we initiate the merge function with \texttt{Pr=1, Sr=A}. The subevents \texttt{2, 3} of event \texttt{1} are both contained within \texttt{A}, so lines 23-24 of the algorithm populate the \texttt{containedSubevents} list. This list is used to remove the edges between \texttt{1} and \texttt{2, 3}, and to create subevent edges between \texttt{1} and \texttt{A}, and between \texttt{A} and \texttt{2, 3}, in lines 28-36. The new primary network is shown in figure \ref{fig:merge-output-stages}(a). Now, the algorithm recursively proceeds to add events \texttt{B, C} as subevents of \texttt{A} in the primary network (\texttt{Pr=A, Sr=B}). Since the temporal extents of \texttt{2} and \texttt{B} are identical, event \texttt{B} is merged into \texttt{2}, as shown in figure \ref{fig:merge-output-stages}(b). All participant information which exists in \texttt{B} and not in \texttt{2} is copied; this is done in lines 13-17 of the algorithm. Finally, event \texttt{C} is compared with event \texttt{3}, and found to be contained within it. The recursion proceeds to the subevents of \texttt{3} in lines 18-21. Here, the algorithm realizes that neither of the events \texttt{6, 7} could contain \texttt{C}, and hence \texttt{C} is made a subevent of \texttt{3} itself; this is done at line 29. Figure \ref{fig:merge-output-stages}(c) shows the final primary network with all events from the secondary merged into it.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{media/chapter4/merge-example-output.png}
\caption{The three steps taken to merge the different nodes of the secondary network into the primary network shown in figure \ref{fig:merge-setup}.}
\label{fig:merge-output-stages}
\end{figure}
\section{Implementation}
In this section, we look at the current implementation of the CueNet framework. Figure \ref{fig:d-concept-arch} shows a detailed conceptual architecture of the framework. It consists of three verticals: the data integration modules on the left, the discovery algorithm and its supporting components in the middle, and the candidate management and face verification tools on the right. Let us look at them one by one. The data integration module is configured using a source script. This script contains the schema declarations of the various sources, and specifications of how the objects in these sources are related to the objects in the ontology. The complete script listing is provided in Appendix A. We use JavaCC to construct the compiler for this script. The schemas and relations are stored as graphs which can be queried by other modules. Any discover query to be executed on the multitude of sources is checked against each graph to see if the source can respond to that type of query; for example, social networks are excluded when queries request event information. Each source requires mediator code to align its content with the tuples requested by the query engine. This approach is reminiscent of GAV designs in the data integration literature. We chose it because of its simpler query processing design, and because, although we needed to plug sources in and out easily, the sources themselves changed very rarely. At the bottom of the data integration stack are common utilities for making HTTP requests and database queries, which are used by the source mediators. The discover queries are created in SPARQL. The query engine uses Jena's SPARQL query processing framework, and extracts triple patterns and predicates using the framework's visitor pattern capabilities.
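The routing decision described above amounts to checking, per source, whether the source's declared relation graph covers the classes a query mentions. A minimal Java sketch of this GAV-style dispatch (hypothetical names):
\begin{verbatim}
import java.util.*;

// Sketch of GAV-style query routing: a query is dispatched only to
// sources whose declared schema covers the classes it mentions.
public class RoutingSketch {
  record Source(String name, Set<String> classesDescribed) {}

  static List<Source> route(Set<String> classesInQuery, List<Source> sources) {
    return sources.stream()
        .filter(s -> s.classesDescribed().containsAll(classesInQuery))
        .toList();
  }

  public static void main(String[] args) {
    List<Source> sources = List.of(
      new Source("facebook", Set.of("person", "friend-of")),
      new Source("conferences",
                 Set.of("conference", "person", "participant-in")));
    // An event query skips the social network, as described above.
    System.out.println(route(Set.of("conference", "participant-in"), sources));
  }
}
\end{verbatim}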
Now, let us look at the various components making up the discovery algorithm. Our knowledge base is specified using OWL ontologies. It is parsed using the open source Jena library, and the objects and relations specified in it are loaded into in-memory DAGs. When a tag discovery request is made by a user, the discovery algorithm is initialized with a photo-capture-event: EXIF information is extracted through the Perl-based exiftool library, and the seed context network is constructed and provided to the discovery algorithm. The algorithm initiates the discover and merge operations using the query engine and the candidate manager. As new relations are discovered, the related entities are checked for their presence in the photo using the verification tools. We used face.com until its termination in August 2012, after which we used the system based on \cite{nk_attribute_classifiers} until early 2013, following which we resorted to manual verification of the top-ranked candidates.
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{media/chapter4/detailed-concept-arch.png}
\caption{Main components of the current CueNet implementation.}
\label{fig:d-concept-arch}
\end{figure}
When a tag request is initiated, a message is passed to the candidate manager, which scans the user's database to create a list of all entities. This is done by invoking scan queries on all data sources through the query engine. Each data source has a different representation for the same entity. The job of the candidate manager is to aggregate entities using cues such as common names or emails, pass them to an entity resolver (if the data is stored on a web page), and, if needed, ask the user to correct any mistakes it made in this process. At the end of its processing, we obtain a list of candidates, each of which holds identifiers local to each data source. When a context network is created by a source mediator, it looks up the candidate reference through the candidate manager before emitting the network back to the discovery algorithm. This step makes the discovery algorithm oblivious to the nature of the data sources.
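The aggregation step can be sketched as grouping by a normalized key. The Java fragment below is illustrative (hypothetical names): per-source entity records are keyed by a normalized email, falling back to the name, and each group accumulates the local identifier it carries in every source.
\begin{verbatim}
import java.util.*;

// Sketch of candidate aggregation across sources. Names are illustrative.
public class CandidateSketch {
  record SourceEntity(String source, String localId,
                      String name, String email) {}

  // Result: normalized key -> (source -> local identifier).
  static Map<String, Map<String, String>> aggregate(List<SourceEntity> es) {
    Map<String, Map<String, String>> candidates = new HashMap<>();
    for (SourceEntity e : es) {
      String key = (e.email() != null ? e.email() : e.name())
                       .trim().toLowerCase();
      candidates.computeIfAbsent(key, k -> new HashMap<>())
                .put(e.source(), e.localId());
    }
    return candidates; // ambiguous merges are shown to the user to correct
  }
}
\end{verbatim}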
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{media/chapter4/web-stack.png}
\caption{Web stack for aggregating personal, public and social data.}
\label{fig:web-stack}
\end{figure}
The web interface to the system is shown in figure \ref{fig:web-stack}. This stack is essential for aggregating data from various sources, and is instrumental in dealing with their intricacies related to authentication, API requests and data formatting. Sources like Facebook and Google have rigid authentication mechanisms which must be satisfied before any personal data can be accessed. Obtaining user permission needs to be done via a web client for the Google APIs, whereas Facebook requires a valid session key from a user. These details prompted us to implement the data aggregation using Javascript APIs served by an asynchronous web server based on node.js. The asynchronous nature of the server allows us to develop web clients which execute a large number of HTTP callbacks without worrying about multi-threading issues -- a design decision which is becoming increasingly popular in web architecture. Once the data has been aggregated at the client, it is passed back to the server, which pushes it to a specific process that can deal with it. For example, email information is sent to an independent Python process which initiates the IMAP protocol on the user's message server to aggregate emails. The Facebook events, social network, photo tag and Google calendar information is sent to a script which loads them into appropriate mongodb collections. Unstructured data like conference web pages is sent to a process which runs SNER over it to extract entities; these entities are stored in appropriate collections. The server communicates with these processes using REDIS, an in-memory cache/database which supports PubSub channels. Multiple channels are created, one for each \textit{type} of data source. Any process can listen for data on a channel, fetch the data when it becomes available, and process it independently of other processes.
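Although the implementation uses node.js and Python workers, the channel-per-source-type design can be sketched in a few lines with the Jedis Java client (assuming a local REDIS instance; channel names and payloads here are hypothetical):
\begin{verbatim}
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Sketch of the channel-per-source-type PubSub design.
public class ChannelSketch {
  public static void main(String[] args) throws InterruptedException {
    // Consumer: a worker process listening on the "email" channel.
    new Thread(() -> {
      try (Jedis sub = new Jedis("localhost", 6379)) {
        sub.subscribe(new JedisPubSub() {
          @Override
          public void onMessage(String channel, String message) {
            System.out.println(channel + ": " + message);
            // ...hand off to the IMAP aggregation process...
          }
        }, "email");
      }
    }).start();

    Thread.sleep(500); // crude: let the subscriber connect first

    // Producer: the web server announcing newly aggregated data.
    try (Jedis pub = new Jedis("localhost", 6379)) {
      pub.publish("email", "user data ready for aggregation");
    }
  }
}
\end{verbatim}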
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{media/chapter4/adj-list.png}
\caption{The in-memory data structure for storing entities and their relations in a context network.}
\label{fig:adj-list}
\end{figure}
Lastly, we describe the data structures used to hold context networks. A context network is a directed graph containing event and entity nodes, and their relationships. Any standard directed graph implementation could be used to hold this, but given the operations we perform on these networks, some additional properties can be exploited to achieve faster merges. One operation we need often is to eliminate merging networks which provide no new information. For this purpose, we modify the adjacency list structure to look like the one in figure \ref{fig:adj-list}. A traditional adjacency list holds a hash table for nodes, and each node is associated with a linked list or another hash table for the nodes it connects to. Since IDs do not hold any special meaning across context networks, we cannot rely on them alone to distinguish between events in separate context networks; event identity is established based on type, spatial and temporal information. For this purpose, our primary hash table contains the \textbf{types} of events present in the network. In the example of figure \ref{fig:adj-list}, we see events of type \texttt{conference}, \texttt{photo-capture}, \texttt{social-party} and \texttt{talk} contained in the network. With each event type, we associate a list of instances, currently implemented as a HashSet. Each instance in the context network is identified by a unique instance ID. This architecture allows us to quickly compare new events to existing ones and eliminate duplicates.
Edge relations are maintained as a list of (instance-id, edge-type) pairs. Traversing a network thus means starting from a root node, looking up the instance IDs of the outgoing edges by scanning the HashSet, looking up the outgoing nodes in the adjacency list, and repeating the process. In order to avoid multiple lookups, we maintain an additional hash table over instance IDs. Given an instance ID, this table allows fast lookups, avoiding time spent scanning the very large sets which are created when there are many instances of the same type of event.
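A minimal Java sketch of the structure in figure \ref{fig:adj-list} (names illustrative): instances are bucketed by event type for fast duplicate checks, with a secondary index from instance ID to node so that edge traversal avoids scanning large buckets.
\begin{verbatim}
import java.util.*;

// Sketch of the type-indexed adjacency structure described above.
public class ContextNetworkSketch {
  record EdgeRef(String targetInstanceId, String edgeType) {}
  static class Instance {
    final String id, type;
    final List<EdgeRef> edges = new ArrayList<>();
    Instance(String id, String type) { this.id = id; this.type = type; }
  }

  // Primary table: event type -> instances of that type.
  final Map<String, Set<Instance>> byType = new HashMap<>();
  // Secondary index: instance ID -> instance, for fast edge traversal.
  final Map<String, Instance> byId = new HashMap<>();

  void add(Instance i) {
    byType.computeIfAbsent(i.type, t -> new HashSet<>()).add(i);
    byId.put(i.id, i);
  }

  // Duplicate elimination only scans instances of the same type.
  Set<Instance> candidatesFor(String type) {
    return byType.getOrDefault(type, Set.of());
  }
}
\end{verbatim}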
This architecture suffices because we discover context for a single photo at a time. In the case where an application needs to discover context for many events, we use a separate network for each atomic event under consideration, and assume that the context for one event is independent of the others. Thus, a merge operation reduces to finding the existing context network the new one can be merged with, and performing the merge on only that network. This assumption does not hold if the events are spatiotemporally very close to each other, in which case information about one event will affect the other; handling this is beyond the scope of this dissertation. We also use an R-Tree, built using the Java Spatial Index library, to index the time intervals of the root events of the various context networks (those which have no super-events themselves), to reduce the search during the merge step. This increases the overall efficiency of the discovery algorithm.
In this chapter, we looked at the basic foundations of a context discovery framework. We saw the novel algorithms which discover and merge contextual information into the context network of a given photo, and the different design decisions that were taken to arrive at our current implementation. We also saw the participation semantics and object existence axioms, and the critical role they play in these algorithms. In the next chapter, we evaluate this framework on a set of real-world photos and simulated information to test its efficacy and performance.