Description
I am trying to figure out how the digraph.pickle file is created, because I need to create my own for my research on a subset of Wikipedia. To do this, I downloaded a small subset of Wikipedia and extracted the plain text using WikiExtractor.
I am getting this error while running the following command:
python extract_wiki.py preprocess_wikipedia [Wikipedia/folder]
  File "DEER/extract_wiki.py", line 1073, in <module>
    cal_freq = CalFreq(path_pattern_count_file)
  File "DEER/extract_wiki.py", line 389, in __init__
    c, log_max_cnt = load_pattern_freq(path_freq_file)
  File "DEER/extract_wiki.py", line 382, in load_pattern_freq
One problem is that the files in "save_pair_files" are empty. In the following code, the condition "len(ents) <= 1" is always true, so data only ever contains empty strings. As a result, path_pattern.pickle and sub_path_pattern.pickle contain just an empty Counter().
print('cal_cooccur_similarity')
for f_id, save_cooccur__file in enumerate(tqdm.tqdm(save_cooccur__files)):
    with open(save_cooccur__file) as f_in:
        cooccurs = f_in.read().split('\n')
    print("cooccurs is: ", cooccurs)
    data = []
    for line in cooccurs:
        ents = line.split('\t')
        certain_len = len(ents)
        # This branch is always taken with my data:
        if len(ents) <= 1:
            data.append('')
        else:
            temp_data = []
            # valid_entities = []
            matrix = []
            for ent in ents:
                try:
                    vec = w2vec.get_entity_vector(ent)
                except:
                    vec = np.zeros(100, dtype=np.float32)
                matrix.append(vec)
            # Collect pairs between certain entities
            matrix = np.array(matrix)
            result = cosine_similarity(matrix, matrix)
            for i in range(certain_len):
                for j in range(i+1, certain_len):
                    tup = (float(result[i, j]), ents[i], ents[j])
                    temp_data.append(str(tup))
            data.append('\t'.join(temp_data))
    my_write(save_pair_files[f_id], data)
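One way to narrow this down might be to check the co-occurrence files directly. The loop above only reaches the else branch for lines that contain at least two tab-separated entities. The following is a minimal diagnostic sketch, not part of DEER: the function names and the glob pattern are hypothetical, and it assumes the co-occurrence files are plain UTF-8 text with one tab-separated entity list per line.

```python
import collections
import glob


def summarize_cooccur_file(path):
    """Count how many tab-separated fields each line of a co-occurrence file has."""
    field_counts = collections.Counter()
    with open(path, encoding="utf-8") as f_in:
        for line in f_in.read().split("\n"):
            # Same split as in the loop above; an empty line yields [''] (length 1).
            field_counts[len(line.split("\t"))] += 1
    return field_counts


def count_usable_lines(field_counts):
    """Number of lines with at least two entities, i.e. lines that reach the else branch."""
    return sum(n for k, n in field_counts.items() if k > 1)


if __name__ == "__main__":
    # Hypothetical pattern; point it at the files listed in save_cooccur__files.
    for path in sorted(glob.glob("cooccur/*.txt")):
        counts = summarize_cooccur_file(path)
        print(path, dict(counts), "usable lines:", count_usable_lines(counts))
```

If every file reports zero usable lines, the co-occurrence files themselves contain no line with two or more entities, so the cause would lie in an earlier step (for example, entity detection in the extracted text) rather than in this loop.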
Can you please help me figure out where the problem is?