| 1 | +<!-- |
| 2 | +Licensed to the Apache Software Foundation (ASF) under one or more |
| 3 | +contributor license agreements. See the NOTICE file distributed with |
| 4 | +this work for additional information regarding copyright ownership. |
| 5 | +The ASF licenses this file to You under the Apache License, Version 2.0 |
| 6 | +(the "License"); you may not use this file except in compliance with |
| 7 | +the License. You may obtain a copy of the License at |
| 8 | + |
| 9 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | + |
| 11 | +Unless required by applicable law or agreed to in writing, software |
| 12 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 13 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 14 | +See the License for the specific language governing permissions and |
| 15 | +limitations under the License. |
| 16 | +--> |
| 17 | + |
| 18 | +# Building and Integrating Snowball Stemmer for OpenNLP |
| 19 | + |
| 20 | +This guide outlines the steps to build the Snowball compiler, generate stemmer classes, and integrate them into OpenNLP. |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## Prerequisites |
| 25 | + |
| 26 | +- A Unix-like environment with `make` installed. |
| 27 | +- Access to the [Snowball repository](https://github.com/snowballstem/snowball). |
| 28 | +- The OpenNLP repository checked out locally. |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## Procedure |
| 33 | + |
| 34 | +### 1. Clone and Build the Snowball Compiler |
| 35 | + |
| 36 | +Clone the Snowball repository and build the compiler using `make`: |
| 37 | + |
| 38 | +```bash |
| 39 | +git clone https://github.com/snowballstem/snowball.git |
| 40 | +cd snowball |
| 41 | +make |
| 42 | +``` |
| 43 | + |
| 44 | +This builds the `snowball` compiler executable in the root directory of the repository. |
| 45 | + |
| 46 | +### 2. Run the Snowball Compiler |
| 47 | + |
| 48 | +Run the Snowball compiler once per language to generate the Java stemmer classes. The following script does this for every language in its list: |
| 49 | + |
| 50 | +```bash |
| 51 | +#!/bin/bash |
| 52 | + |
| 53 | +# Define an array of languages |
| 54 | +languages=("arabic" "catalan" "danish" "dutch" "english" "finnish" "french" "german" "greek" "hungarian" "indonesian" "irish" "italian" "norwegian" "porter" "portuguese" "romanian" "russian" "spanish" "swedish" "turkish") |
| 55 | + |
| 56 | +# Base paths, relative to the directory holding the .sbl files; adjust output_base to your own OpenNLP checkout |
| 57 | +snowball_exec_path="../snowball" |
| 58 | +output_base="../../../../IdeaProjects/opennlp/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball" |
| 59 | + |
| 60 | +# Compile each <lang>.sbl file into a Java class named <lang>Stemmer |
| 61 | +for lang in "${languages[@]}"; do |
| 62 | + "${snowball_exec_path}" "${lang}.sbl" -java -o "${output_base}/${lang}Stemmer" |
| 63 | +done |
| 64 | +``` |
| 65 | + |
| 66 | +Usage: |
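| 67 | +1. Save this script as `generate_stemmers.sh` in the directory that contains the `.sbl` files (the relative path `../snowball` assumes a subdirectory of the Snowball checkout, such as `algorithms/`), and point `output_base` at your OpenNLP checkout. |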
| 68 | +2. Make it executable with `chmod +x generate_stemmers.sh`. |
| 69 | +3. Run it using `./generate_stemmers.sh`. |
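After the script finishes, it can help to confirm that a file was produced for every language. The following is a minimal sketch, not part of the original procedure: `missing_stemmers` is a hypothetical helper name, and it assumes the compiler writes each class to `<lang>Stemmer.java` in the output directory.

```bash
#!/bin/bash
# Hypothetical helper: print every language whose stemmer file is missing.
# Assumption: the compiler wrote "<lang>Stemmer.java" into the directory given as $1.
# Usage: missing_stemmers <output_dir> <lang>...

missing_stemmers() {
  local output_dir="$1"
  shift
  local lang
  for lang in "$@"; do
    if [ ! -f "${output_dir}/${lang}Stemmer.java" ]; then
      echo "${lang}"
    fi
  done
}

# Example call, reusing the variables from the generation script:
#   missing_stemmers "${output_base}" "${languages[@]}"
```

An empty result means every expected file exists; any printed name points at a language whose compilation should be rerun or inspected.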
| 70 | + |
| 71 | +### 3. Manually Reformat Code to Match OpenNLP Style |
| 72 | + |
| 73 | +- Open the generated Java files in your preferred IDE or text editor. |
| 74 | +- Reformat the code to match the OpenNLP code style. This may include: |
| 75 | +  - Adjusting indentation. |
| 76 | +  - Renaming variables or methods as needed. |
| 77 | +  - Ensuring proper spacing and alignment. |
| 78 | + |
| 79 | +### 4. Add License Information |
| 80 | + |
| 81 | +- Ensure each generated file includes the appropriate license information for both Snowball and OpenNLP. |
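One way to spot files that still lack a header is to search each generated file for a phrase the header should contain. This is only a sketch: `files_missing_marker` is a hypothetical helper name, and the marker text is whatever phrase you expect in the header, passed in by the caller rather than assumed here.

```bash
#!/bin/bash
# Hypothetical helper: list generated stemmer files that do not contain a marker string.
# $1 = directory holding the generated *Stemmer.java files
# $2 = literal text that every license header is expected to contain

files_missing_marker() {
  local dir="$1"
  local marker="$2"
  local file
  for file in "${dir}"/*Stemmer.java; do
    [ -e "${file}" ] || continue   # the glob matched nothing
    if ! grep -qF -- "${marker}" "${file}"; then
      echo "${file}"
    fi
  done
}

# Example call (the marker is the opening phrase of the ASF header used in this file):
#   files_missing_marker "${output_base}" "Licensed to the Apache Software Foundation"
```

Running it once per required license, each time with a phrase from that license's header, covers both the Snowball and the OpenNLP notices.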
| 82 | + |