# Wiki document source

Fetching documents from wikis is an essential feature for fine-tuning LLMs on public knowledge.

## Interfaces

The `document` section of the qna.yaml file contains:

- Wiki host: the base URL of the wiki host.
- Page titles: the titles of the wiki pages to fetch.
- oldid: the revision IDs (MediaWiki `oldid` values) that pin pages to older versions.

A qna.yaml file can define a single wiki host and multiple spaces and pages, each with an optional version (`oldid`).

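A minimal sketch of how the `document` section could be modeled in Python, assuming one host and a list of pages, each with an optional `oldid`. The class and field names (`WikiDocumentSource`, `WikiPage`, `host`, `pages`) are hypothetical and do not come from the schema module. Spaces are not modeled, and requiring a title on every page is an assumption based on the title being used for validation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WikiPage:
    # Hypothetical field: page title, e.g. "IBM_Granite".
    title: str
    # Hypothetical field: optional MediaWiki revision ID (oldid) pinning the version.
    oldid: Optional[int] = None


@dataclass
class WikiDocumentSource:
    # Hypothetical field: base URL of the single wiki host, e.g. "https://en.wikipedia.org".
    host: str
    # Hypothetical field: pages to fetch from that host.
    pages: List[WikiPage] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the section is structurally invalid (assumed rules)."""
        if not self.host.startswith(("http://", "https://")):
            raise ValueError(f"wiki host must be an http(s) URL: {self.host!r}")
        if not self.pages:
            raise ValueError("at least one page is required")
        for page in self.pages:
            if not page.title:
                raise ValueError("every page needs a title")
            if page.oldid is not None and page.oldid <= 0:
                raise ValueError(f"oldid must be a positive integer: {page.oldid!r}")
```

For example, a source with host `https://en.wikipedia.org` and the page `IBM_Granite` at `oldid` 1235007056 passes `validate()`, while a source with an empty page list raises `ValueError`.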
Example fetch URL:

- https://en.wikipedia.org/w/index.php?title=IBM_Granite&oldid=1235007056&action=raw

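The fetch URL can be assembled with `urllib.parse.urlencode`. This sketch assumes the host serves MediaWiki's `index.php` under `/w/`, as Wikipedia does; other wikis may use a different script path. `build_raw_url` is a hypothetical helper, not the SDG implementation.

```python
from typing import Optional
from urllib.parse import urlencode


def build_raw_url(host: str, title: Optional[str] = None, oldid: Optional[int] = None) -> str:
    """Build a MediaWiki index.php URL that returns the raw page source.

    Hypothetical helper; assumes the Wikipedia-style "/w/index.php" script path.
    """
    params = {}
    if title is not None:
        params["title"] = title
    if oldid is not None:
        params["oldid"] = oldid
    if not params:
        raise ValueError("either title or oldid is required")
    # action=raw asks MediaWiki for the page source instead of rendered HTML.
    params["action"] = "raw"
    return f"{host.rstrip('/')}/w/index.php?{urlencode(params)}"
```

With host `https://en.wikipedia.org`, title `IBM_Granite`, and `oldid` 1235007056, this produces the example URL above; omitting the title produces the `oldid`-only form.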
Note that `oldid` alone is sufficient to retrieve a page:

- https://en.wikipedia.org/w/index.php?oldid=1235007056&action=raw

Because `oldid` alone identifies the revision, the page title is used for validation.

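One possible way to perform this validation, sketched as an assumption rather than the method this design uses: `action=raw` returns only the page source, so a revision's title could instead be looked up through the MediaWiki Action API (`api.php?action=query&revids=<oldid>&format=json&formatversion=2`) and compared with the expected title. `revision_matches_title` and `normalize_title` are hypothetical helpers that take the already-parsed JSON response.

```python
from typing import Any, Dict


def normalize_title(title: str) -> str:
    # MediaWiki treats underscores and spaces in titles as equivalent.
    # Other normalization (e.g. first-letter case) is not handled here.
    return title.replace("_", " ").strip()


def revision_matches_title(api_response: Dict[str, Any], expected_title: str) -> bool:
    """Check that a revision lookup (formatversion=2) resolved to expected_title.

    Hypothetical helper: returns False for unknown revisions or title mismatches.
    """
    pages = api_response.get("query", {}).get("pages", [])
    # A single revid should resolve to exactly one page.
    if len(pages) != 1:
        return False
    page = pages[0]
    if page.get("missing") or "title" not in page:
        return False
    return normalize_title(page["title"]) == normalize_title(expected_title)
```

The response would typically be obtained with `json.loads` on the API reply; only the comparison logic is shown here.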
## Changes across modules

- [Schema module](https://github.com/instructlab/schema) defines the structure and validation rules for the qna.yaml file.
- [SDG taxonomy module](https://github.com/instructlab/sdg/blob/main/src/instructlab/sdg/utils/taxonomy.py) fetches the documents.
- [SDG unit tests](https://github.com/instructlab/sdg/tree/main/tests)

## Additional External Packages

- urllib (part of the Python standard library)

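Since `urllib` is the only package listed, the download itself could look like the following minimal sketch using `urllib.request`. The `User-Agent` string is a placeholder assumption (Wikimedia sites ask clients to send a descriptive one), and `fetch_raw` is a hypothetical helper name.

```python
from urllib.request import Request, urlopen

# Placeholder User-Agent (assumption); replace with a real project name and contact.
USER_AGENT = "example-wiki-fetcher/0.1 (contact: maintainer@example.org)"


def fetch_raw(url: str, timeout: float = 30.0) -> str:
    """Download the raw page source at url and return it as text."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset from Content-Type if present, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `fetch_raw("https://en.wikipedia.org/w/index.php?oldid=1235007056&action=raw")` would return the wikitext of that revision; network errors surface as `urllib.error.URLError`.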