\documentclass[11pt, oneside]{article} % use "amsart" instead of "article" for AMSLaTeX format
\usepackage{geometry} % See geometry.pdf to learn the layout options. There are lots.
\geometry{letterpaper} % ... or a4paper or a5paper or ...
%\geometry{landscape} % Activate for rotated page geometry
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
\usepackage{graphicx} % Use pdf, png, jpg, or eps with pdflatex; use eps in DVI mode
% TeX will automatically convert eps --> pdf in pdflatex
\usepackage{amssymb}
%SetFonts
%SetFonts
\title{Concept Dev for COMP study}
\author{Sitaram Devarakonda \& George Chacko}
%\date{} % Activate to display a given date or no date
\begin{document}
\maketitle
\section{Motivation} Computer science has grown vigorously over the last twenty years, and this growth has also affected other fields. We use citation analysis as our main approach to exploring this growth and diversification.
\section{Data and Methods} The following sections describe datasets and initial explorations.
\subsection{Scopus Year Slices} Year slices were developed for all publications from 1996--2018. We selected all publications in English with at least two references and a complete publication record in Scopus. Publication types include articles, conference papers, dissertations, etc., and all selected publications belong to the subject area `COMP'.
\subsection{DBLP Data} Data from DBLP for the years 1996--2018 were collected. A whole-text matching approach (@dk) was used to match titles to Scopus identifiers.
\subsection{Clustering}
%\vspace{-7mm}
% latex table generated in R 3.5.1 by xtable 1.8-4 package
% Thu Aug 15 14:13:47 2019
\begin{table}[ht]
\caption{Summary of the initial Scopus analytical data set: the number of unique publications, unique references, and total references for each year.}
\label{tab:summary_data}
\vspace{3 mm}
\centering
\scalebox{0.7}{
\begin{tabular}{|r rrr r|}
\hline
& Year & Unique Publications & Unique References & Total References \\
\hline
1 & 1996 & 30783 & 185816 & 320533 \\
2 & 1997 & 37284 & 230475 & 389779 \\
3 & 1998 & 45198 & 254160 & 445703 \\
4 & 1999 & 40776 & 241013 & 420839 \\
5 & 2000 & 45310 & 266764 & 471079 \\
6 & 2001 & 65168 & 357472 & 651196 \\
7 & 2002 & 75583 & 410290 & 761714 \\
8 & 2003 & 88701 & 458459 & 904755 \\
9 & 2004 & 88984 & 506063 & 980388 \\
10 & 2005 & 110741 & 609294 & 1247543 \\
11 & 2006 & 140729 & 731622 & 1579690 \\
12 & 2007 & 171357 & 845569 & 1879419 \\
13 & 2008 & 205852 & 1019330 & 2281452 \\
14 & 2009 & 239905 & 1239406 & 2841795 \\
15 & 2010 & 255254 & 1391036 & 3214371 \\
16 & 2011 & 255633 & 1534865 & 3464829 \\
17 & 2012 & 264011 & 1700862 & 3845051 \\
18 & 2013 & 261442 & 1866133 & 4199482 \\
19 & 2014 & 272711 & 2004406 & 4581855 \\
20 & 2015 & 309446 & 2220732 & 5202960 \\
21 & 2016 & 316430 & 2422531 & 5630283 \\
22 & 2017 & 308102 & 2622125 & 5918274 \\
23 & 2018 & 340221 & 3012740 & 6960056 \\
\hline
\end{tabular}}
\vspace{-1mm}
\end{table}
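As a rough worked example that uses only the figures in Table~\ref{tab:summary_data}, consider the mean number of references per publication, $\bar{r}_y = R_y / P_y$. Here $R_y$ and $P_y$ are shorthand introduced for this example (total references and unique publications in year $y$) and are not notation used elsewhere in this note. For the first and last year slices,
\[
\bar{r}_{1996} = \frac{320533}{30783} \approx 10.4, \qquad \bar{r}_{2018} = \frac{6960056}{340221} \approx 20.5 ,
\]
so the mean reference count roughly doubles over the study period. Over the same period the number of unique publications grows by a factor of $340221 / 30783 \approx 11.1$.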
\subsection{Disciplinary Composition} One way to identify the growth and diversity of a field is to consider the count and subject areas of its scientific publications and how these change over time. In our study of the evolution of computer science, we initially used the All Science Journal Classification (ASJC) codes that Scopus assigns to journals to estimate relevant publications and their cited references from 1996--2018. Accordingly, we searched for articles or conference proceedings labelled computer science (COMP), one of the 27 major subject areas in the Scopus classification system. It should be noted that COMP itself comprises 13 minor subject areas, such as Hardware and Architecture and Artificial Intelligence.

A limitation of the ASJC system is that publications inherit their ASJC codes from those assigned to their parent journal or conference. Further, multidisciplinary journals such as \emph{Nature}, \emph{Science}, and \emph{Proceedings of the National Academy of Sciences} are classified under General, so computer science articles published in those journals could be overlooked. The extent of both false positives (publications that are not computer science but have COMP labels) and false negatives (publications that are computer science but do not have COMP labels) must, therefore, be considered when engaging in such studies.

We chose to start from an initial subset of Scopus restricted to COMP and GENERAL and then refine it through article-level clustering based on citations. A second approach involves mining the dblp repository, assuming that it contains a validated set of computer science publications from journals and conference proceedings. Publications so mined are matched to Scopus identifiers and further analyzed.
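
One way to make the extent of these errors concrete, offered here as a sketch rather than as part of the analysis, is the usual precision and recall of the COMP label measured against a reference set of computer science publications. Let $TP$ be the number of publications that are computer science and have COMP labels, and let $FP$ and $FN$ be the numbers of false positives and false negatives as defined above (the symbols are our notation for this illustration). Then
\[
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}.
\]
Under the assumption stated above that dblp is a validated set, the dblp publications matched to Scopus could serve as the reference set when estimating recall.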
\subsection{Possible next steps (based on discussion)}
\begin{itemize}
\item Expt 1: Use dblp as input to Scopus instead of Scopus\_COMP (super idea from Sitaram)
\begin{enumerate}
\item Match dblp titles to Scopus titles using (insert full text search syntax), with a stringent rankfilter cutoff of $> 0.99$
\item Get cited references using Scopus ids from matched titles and construct an undirected citation graph
\item Cluster with Graclus into 30 clusters
\item Get Topics of Prominence (TOP) counts for articles in these clusters
\item Vary rankfilter cutoff
\item Vary cluster number
\end{enumerate}
\item Expt 2: Reduce Graclus and Leiden clusters to the top tenth percentile of cited articles
\begin{itemize}
\item Use total citations from Scopus for citation counts
\item Get TOP counts for these publications
\end{itemize}
\item Expt 3: George will keep fiddling with Sitaram's previously generated data to extract insight.
\end{itemize}
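\par
The selection steps in Expt 1 and Expt 2 could be written down as follows. The symbols are introduced here for illustration only and do not correspond to any existing implementation. Let $s(d, p)$ denote the rank score returned by the full text search for a dblp title $d$ and a Scopus publication $p$, and let $\tau$ be the rankfilter cutoff (initially $0.99$). The set of matched Scopus publications is
\[
M_\tau = \{\, p : s(d, p) > \tau \ \mathrm{for\ some\ dblp\ title}\ d \,\},
\]
and the undirected citation graph built from it is $G = (V, E)$ with
\[
E = \{\, \{u, v\} : u \in M_\tau,\ u \ \mathrm{cites}\ v \,\},
\]
where $V$ is the set of all publications that appear in some edge. For Expt 2, under one reading in which the cut is applied within each cluster, let $c(p)$ be the total Scopus citation count of $p$ and $q_{0.9}(C)$ the 90th percentile of $c$ over a cluster $C$. The reduced cluster is then
\[
C' = \{\, p \in C : c(p) \ge q_{0.9}(C) \,\}.
\]
Varying $\tau$ changes $M_\tau$ and hence $G$, which is the effect that the rankfilter variation in Expt 1 is meant to probe.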
\begin{itemize}
\item NSF assigns a single subject area to each journal, which can be used for publication-level classification.
\item Ludo Waltman's methodology for publication-level classification (uses direct citation)
\item Kevin Boyack's methodology
\item Using Scopus FPE to generate a publication-level classification
\item Obtain the top conferences and journals in the COMP field and analyze their citation patterns (assuming they contain articles related to a single discipline).
\end{itemize}
\end{document}