Commit 4839a21

OPENNLP-1521: Add documentation to describe how to re-generate snowball stemmer code (#744)
1 parent c9440e6 commit 4839a21

2 files changed: +86 -0 lines changed

README.md

@@ -114,6 +114,10 @@ After cloning the repository go into the destination directory and run:
mvn install
```

### Additional Development Information

- [Building and Integrating Snowball Stemmer for OpenNLP](dev/Snowball-Stemmer.md)

## Contributing

The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.

dev/Snowball-Stemmer.md

@@ -0,0 +1,82 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Building and Integrating Snowball Stemmer for OpenNLP

This guide outlines the steps to build the Snowball compiler, generate stemmer classes, and integrate them into OpenNLP.

---

## Prerequisites

- A Unix-like environment with `make` installed.
- Access to the [Snowball repository](https://github.com/snowballstem/snowball).
- The OpenNLP repository checked out locally.

---

## Procedure

### 1. Clone and Build the Snowball Compiler

Clone the Snowball repository and build the compiler using `make`:

```bash
git clone https://github.com/snowballstem/snowball.git
cd snowball
make
```

This builds the `snowball` compiler executable in the root directory of the repository.
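
Before moving on, it can help to confirm that the build actually produced an executable. The following is a minimal sketch, assuming the binary is named `snowball` and sits in the checkout root as described above. The helper name `check_snowball_compiler` is hypothetical and is not part of Snowball or OpenNLP:

```bash
#!/bin/bash

# Hypothetical helper (not part of Snowball or OpenNLP): succeeds if an
# executable file named "snowball" exists in the given checkout directory.
# Usage: check_snowball_compiler [checkout-dir]   (defaults to ".")
check_snowball_compiler() {
  local repo_dir="${1:-.}"
  if [[ -f "${repo_dir}/snowball" && -x "${repo_dir}/snowball" ]]; then
    echo "found: ${repo_dir}/snowball"
  else
    echo "missing: ${repo_dir}/snowball (did 'make' succeed?)" >&2
    return 1
  fi
}
```

Calling `check_snowball_compiler .` from the checkout root returns a non-zero status if the build did not produce the binary.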

### 2. Run the Snowball Compiler

Run the Snowball compiler once per language to generate the Java stemmer classes. The following script does this for every language listed:

```bash
#!/bin/bash

# Languages to generate. Each one needs a matching <language>.sbl file
# in the directory the script is run from.
languages=("arabic" "catalan" "danish" "dutch" "english" "finnish" "french" "german" "greek" "hungarian" "indonesian" "irish" "italian" "norwegian" "porter" "portuguese" "romanian" "russian" "spanish" "swedish" "turkish")

# Base paths, relative to the working directory. Both reflect one particular
# local layout; adjust them to match your Snowball and OpenNLP checkouts.
snowball_exec_path="../snowball"
output_base="../../../../IdeaProjects/opennlp/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball"

# Compile each algorithm to Java, using <language>Stemmer as the output name.
for lang in "${languages[@]}"; do
  "${snowball_exec_path}" "${lang}.sbl" -java -o "${output_base}/${lang}Stemmer"
done
```
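
Usage:

1. Save this script as `generate_stemmers.sh` in the directory that contains the `.sbl` algorithm files, so that the relative paths in the script resolve (the compiler is expected one directory up).
2. Make it executable with `chmod +x generate_stemmers.sh`.
3. Run it from that directory using `./generate_stemmers.sh`.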
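
Because the script writes directly into the OpenNLP source tree, it may be worth previewing the commands before running them. The sketch below only prints the command line for each language, using the same arguments as the script above. The function name `print_stemmer_commands` and its parameters are illustrative assumptions, not part of Snowball:

```bash
#!/bin/bash

# Hypothetical dry-run helper: prints, but does not execute, the compiler
# invocation for each language, mirroring the arguments used in the loop above.
# Usage: print_stemmer_commands <compiler-path> <output-dir> <language>...
print_stemmer_commands() {
  local compiler="$1" out_dir="$2"
  shift 2
  local lang
  for lang in "$@"; do
    printf '%s %s -java -o %s\n' "${compiler}" "${lang}.sbl" "${out_dir}/${lang}Stemmer"
  done
}
```

For example, `print_stemmer_commands ../snowball /path/to/output english porter` prints two lines, one per language. The output is meant for reading only, since paths containing spaces are not quoted.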

### 3. Manually Reformat Code to Match OpenNLP Style

- Open the generated Java files in your preferred IDE or text editor.
- Reformat the code to match the OpenNLP code style. This may include:
  - Adjusting indentation.
  - Renaming variables or methods as needed.
  - Ensuring proper spacing and alignment.

### 4. Add License Information

- Ensure each generated file includes the appropriate license information for both Snowball and OpenNLP.
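
One way to check the license step is to list the generated files that still lack a given phrase from a license header. This is a sketch under assumptions: the helper name `list_files_missing_marker` is hypothetical, and the marker text is whatever phrase you choose from the header you expect (for instance, the first line of the ASF header).

```bash
#!/bin/bash

# Hypothetical helper: lists the *Stemmer.java files in a directory that do
# not contain the given marker text (for example, a phrase from a license header).
# Usage: list_files_missing_marker <directory> <marker-text>
list_files_missing_marker() {
  local dir="$1" marker="$2"
  local file
  for file in "${dir}"/*Stemmer.java; do
    [[ -e "${file}" ]] || continue   # the glob matched nothing
    if ! grep -qF -- "${marker}" "${file}"; then
      echo "${file}"
    fi
  done
}
```

Running it once with a phrase from the ASF header and once with a phrase from the Snowball license text covers both requirements; an empty result means every matching file contains the phrase.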
