Advanced conversion techniques: Experiments and Discussion #150

Open
clort81 opened this issue Jul 5, 2023 · 27 comments
Labels: feature (New feature or request), quality (Output quality), research (Research & discussion)

@clort81 commented Jul 5, 2023

One thing human ASCII artists do is shift input image regions to adapt them to the text grid alignment and the available character set. I finally found some smart people who figured out a way to emulate this.

http://www.cse.cuhk.edu.hk/~ttwong/papers/asciiart/asciiart.pdf

I'm not up to implementing the described algorithm properly myself, so I'm leaving this here as a reference for future improvements to chafa.

[attached figure: iterative alignment example from the paper]

@hpjansson (Owner) commented Jul 6, 2023

Great find! I love it.

I've been thinking about flexible grid alignment and tested a naïve (but fast) approach -- basically shift the grid of each cell a few ways around the center and look for matches that are better by some factor. Unfortunately that didn't improve the output very much and made it worse in many cases (especially with connective characters).

The paper has a much better approach, but it's also much more complex. But from a quick skim, I think there's nothing stopping us from implementing it or something similar. A couple of things we'd need to resolve, though:

  • The paper's algorithm works on high-contrast outlines. In order to be as general as possible, we'd want to generate our own outlines, for instance with edge detection (Canny/Deriche?) or some segmentation approach, before vectorization (a rough sketch follows this list).
  • How do we best do vectorization? Ideally we'd write a small C implementation with no external dependencies. There must be many simple/fast algorithms out there that we could use.
  • We'd want it to work in full color. We could try something fancy like attaching color information to our vectors (either a "stroke color" or inner/outer color pair for the area delimited by the stroke so filled areas would deform along with the outline). Or we could have always-black outlines with color infill using connective characters or by modulating the background color.
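
For the outline step, here's a rough sketch of the kind of thing I mean: Canny for the edge map, with OpenCV contour tracing standing in for proper vectorization (the thresholds and simplification tolerance are made-up values, and the function name is just illustrative):

import cv2
import numpy as np

def image_to_polylines(path, canny_lo=100, canny_hi=200, eps=2.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    # Trace connected edge pixels into contours, then simplify each contour
    # into a short polyline (a crude stand-in for real vectorization)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.approxPolyDP(c, eps, closed=False) for c in contours]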

We could iterate on it in order to get some modest improvements more quickly. A tentative plan could look like this:

  • Stage 1: Implement alignment-insensitive shape similarity described in section 4. Use this instead of fixed grid at higher work factors. New feature: better quality.
  • Stage 2: Add vectorization and deformation. New feature: monochrome line art -> monochrome ASCII.
  • Stage 3: Add edge detection. New feature: any image -> monochrome ASCII.
  • Stage 4: Figure out how to deal with color. New feature: any image -> mixed-color mixed-charset (ASCII + connective) with deformation.

I'm going to dump a few loosely related ideas below. This issue seems like a good place, hope you don't mind :-)

@clort81 (Author) commented Jul 6, 2023

OK, I found two more papers! ASCII art is serious business.

2019: https://ieeexplore.ieee.org/document/7491376
(you may need .edu access; I don't think I'm allowed to share it)

2022: https://gwern.net/doc/design/typography/2022-chung.pdf

A git repo with some kind of implementation of the 2010 paper
https://github.com/MacLeek/ascii-art?ysclid=ljq8q05xt7678629926

Out of time for more investigation. Cheers!

@hpjansson (Owner)

Blocky look when using background colors

Another thing I've been thinking about is how to improve on the blocky look caused by variable background colors. I remember you pointing this out in another bug, but I don't think it would work to use bright colors for foreground details and dark colors as background. Imagine a blue sky gradient with a black bird and a white bird -- you'd want the blue gradient to be drawn in background color, and the details for both birds to be drawn as foreground symbols (e.g. ASCII letters).

The solution to this might be to use frequency analysis and draw low-frequency features (e.g. gradients) using background and high-frequency features (details, outlines, etc) as foreground. We could also use an iterative approach where cells' background colors are averaged with neighboring cells, or come up with some other fancy scheme.
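
A rough sketch of the frequency split, using a plain Gaussian blur as the low-pass (the blur radius is an arbitrary guess):

import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequencies(image_rgb, sigma=8.0):
    # Low-frequency part (gradients, large areas): candidate for background colors
    low = gaussian_filter(image_rgb.astype(np.float32), sigma=(sigma, sigma, 0))
    # High-frequency residual (details, outlines): candidate for foreground symbols
    high = image_rgb.astype(np.float32) - low
    return low, high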

@hpjansson (Owner)

Deep learning

Akiyama 2017: https://github.com/OsciiArt/DeepAA

That project is for Shift-JIS with a specific variable-width font, but we could use a similar approach for other forms of character art. It might be even simpler to do for fixed-width fonts.

Neural nets are all the rage, and I think it'd be possible to write a no-dependencies CNN that could run on the CPU with a pre-trained model. We'd need a model for each specific kind of character art, though. I thought about using it for PETSCII: you could take PETSCII images with 8x8-pixel cells, downsample them by a factor of 4, and augment (small shifts pre-sample, different downsamplers, added noise, modulated colors) to make training pairs of an image that resembles a "natural" image and its PETSCII equivalent.

PETSCII images are typically 40x25 cells, where each cell is 8x8 pixels. That's 320x200 pixels. After downsampling by a factor of four, that's 80x50 pixels, which should make for a fairly small model.
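
A sketch of that training-pair generation, with PIL doing the downsampling (the function and argument names are illustrative, and the shift range, noise level and resampler list are placeholder choices):

import numpy as np
from PIL import Image

def make_training_pair(petscii_png, cell_codes, rng):
    # petscii_png: a 320x200 rendering of a PETSCII image (40x25 cells of 8x8 pixels)
    # cell_codes: the character/color codes used in it -> the training target
    # rng: a numpy random Generator, e.g. np.random.default_rng()
    img = Image.open(petscii_png).convert("RGB")

    # Augmentation: small sub-cell shift before downsampling
    dx, dy = rng.integers(-2, 3, size=2)
    img = img.transform(img.size, Image.AFFINE, (1, 0, dx, 0, 1, dy))

    # Downsample by a factor of 4 with a randomly chosen resampler, then add a little noise
    resampler = rng.choice([Image.NEAREST, Image.BILINEAR, Image.BICUBIC])
    small = np.asarray(img.resize((80, 50), int(resampler)), dtype=np.float32) / 255.0
    small += rng.normal(0.0, 0.02, small.shape).astype(np.float32)

    return np.clip(small, 0.0, 1.0), cell_codes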

@clort81 (Author) commented Jul 6, 2023

Blocky look when using background colors

I don't think it would work to use bright colors for foreground details and dark colors as background. Imagine a blue sky gradient with a black bird and a white bird -- you'd want the blue gradient to be drawn in background color, and the details for both birds to be drawn as foreground symbols (e.g. ASCII letters).

That instance at least would have a large number of pixels with color vectors clustered around some mean (sky) in the regions around the currently processed character, with some outliers. The key for those braille characters would be to not be randomly flipping fg/bg assignments.

The solution to this might be to use frequency analysis and draw low-frequency features (e.g. gradients) using background and high-frequency features (details, outlines, etc) as foreground. We could also use an iterative approach where cells' background colors are averaged with neighboring cells, or come up with some other fancy scheme.

Ah yeah. Yes. Experiments needed. Funding needed for experiments.

@hpjansson (Owner)

We need to write a grant proposal, should be a slam dunk with its implications for global GDP.

clort81 changed the title from "Advanced deformation and alignrment ala ttwong" to "Advanced conversion techniques: Experiments and Discussion" on Jul 6, 2023
@clort81 (Author) commented Jul 6, 2023

And another paper from Japan! http://nishitalab.org/user/nis/cdrom/iccg/miyake_nico.pdf

hpjansson added the feature (New feature or request), research (Research & discussion) and quality (Output quality) labels on Jul 13, 2023
@hpjansson (Owner)

Leicht looks like a good starting point for minimalist NN experimentation.

@cdluminate (Collaborator)

Well, my Leicht project is largely educational. Its performance is hardly optimized at all, though it should run on all architectures. Intel has a highly optimized oneMKL library for amd64, and a couple of other high-performance C libraries exist as well.

@hpjansson (Owner)

That's good to know.

I think an NN approach made to fit within the scope of Chafa would have to be pretty naïve, ideally a plain C implementation with no external dependencies, with each layer of weights being just an array of packed floats, and one or two hardcoded activation functions (ReLU + softmax?). We could borrow inspiration from Akiyama et al, but make it more lightweight computationally since we're using a fixed grid, lower symbol resolution and much fewer symbols (in the ASCII and PETSCII cases, <= 128). It could have AVX and multithread optimizations, since it's not too complex to do and we're already using those approaches elsewhere in the library.
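
To make the packed-floats idea concrete, the forward pass could look like this numpy sketch (the names and layer sizes are arbitrary; a C version would just loop over the same flat arrays):

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(cell_pixels, layers):
    # layers: list of (W, b) pairs, each just a packed float array as loaded from disk
    x = cell_pixels.ravel().astype(np.float32)
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return softmax(W @ x + b)  # one probability per candidate symbol (<= 128 of them)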

It might not be feasible to make something so simple and retain a measure of usefulness, so I think it's nice to have a small implementation to play around with to see what works (and if it can't be made to work, failing fast) and not having to get into all the care and feeding of a bigger framework :-)

I can easily be wrong/dumb in my assumptions, though, since it's not really my field. I'm very glad to have your input @cdluminate.

@cdluminate (Collaborator)

OK. My suggestion is to separate the training and inference code for Chafa.

I'd suggest implementing the training scripts in PyTorch, so you can try out different NNs at a fast pace; implementing that part in C/C++ isn't worth the time at all. We can export the trained NN to some common format like ONNX, or just a self-defined JSON/binary dump format, as long as it is easy to load in ANSI C.

As for the inference code, you may take a look at https://github.com/BVLC/caffe, which is an obsolete but high-quality C++ code base widely used in many commercial products. Once we have a concrete idea of how the NN will look, we can start borrowing code from Caffe for the implementation. My Leicht is not as mature as Caffe, but it could serve as a reference if you like.

In terms of external dependencies, I'd suggest at least incorporating BLAS to accelerate the basic linear algebra routines. It deals with many of the computational bottlenecks in NN computation.

@cdluminate (Collaborator)

It makes no sense to optimize BLAS on our own; it's very complicated work. See the BLAS implementations in the following links:
https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
https://wiki.gentoo.org/wiki/Blas-lapack-switch
https://wiki.gentoo.org/wiki/BLAS_and_LAPACK_Providers

In some distributions, like Debian and Gentoo, the BLAS backend is switchable at run time, so compiling against the generic BLAS and running on an optimized BLAS is not an issue.

@cdluminate (Collaborator)

One of the most significant performance bottlenecks is gemm. The fully-connected layer (also called a linear layer, affine layer, or dense layer; people use different names for exactly the same thing) is simply a matrix multiplication (gemm) followed by a broadcast vector add for the bias.

Matrix addition and matrix multiplication are exactly what BLAS provides. Existing libraries like OpenBLAS and BLIS are already well optimized. Don't try to re-invent the wheel, because it is a very complicated thing if you want to reach high performance.

FYI, gemm stands for GEneral Matrix-Matrix multiplication. Fortran did not allow long routine names; that's how the abbreviation came about.
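
In numpy terms, the whole fully-connected layer amounts to just this (batch and layer sizes are made-up numbers):

import numpy as np

X = np.random.rand(64, 256).astype(np.float32)   # batch of 64 input vectors
W = np.random.rand(256, 128).astype(np.float32)  # weight matrix
b = np.random.rand(128).astype(np.float32)       # bias vector

Y = X @ W + b  # gemm, then the bias broadcast over the batch; exactly what BLAS accelerates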

@hpjansson (Owner) commented Jul 17, 2023

Great pointers! Yes, we'd definitely want to keep all the training stuff separate, and only implement inference in Chafa proper. My speculation around wheel-inventing was based on the need to do forward passes only.

Another obstacle is model distribution; I think anything over 10MB ought to be kept out of direct distribution. For comparison, the DeepAA model is 666MB. Ours would likely have to be download-on-demand from private hosting, fetched via libsoup or something like that. We could alleviate it somewhat by making a simpler model and maybe quantizing the weights.

There may be non-NN data-driven approaches with more modest requirements. For instance, it would be possible to build a frequency table of symbols' spatial relationships in art and use that to "repair" high-loss cells by predicting their contents from their local neighborhood after the currently implemented MSE-minimizing pass. With a kernel size of 5x5 cells and 128 possible symbols we'd need 5*5*128*128 = 409600 entries. We could normalize the frequencies to 16 bits and come in at under a megabyte. No idea if it'd actually improve the quality, though :-)
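
A sketch of what the lookup could look like (the table layout, normalization and names are assumptions, not a worked-out design):

import numpy as np

N_SYMBOLS = 128
KERNEL = 5  # 5x5 cell neighborhood

# freq[dy, dx, neighbor_symbol, center_symbol]: 16-bit normalized frequency.
# 5 * 5 * 128 * 128 entries at 2 bytes each = 819200 bytes, well under a megabyte.
freq = np.zeros((KERNEL, KERNEL, N_SYMBOLS, N_SYMBOLS), dtype=np.uint16)

def predict_center(neighborhood):
    # neighborhood: 5x5 array of symbol indices around a high-loss cell (center ignored)
    score = np.zeros(N_SYMBOLS, dtype=np.uint64)
    for dy in range(KERNEL):
        for dx in range(KERNEL):
            if dy == KERNEL // 2 and dx == KERNEL // 2:
                continue
            score += freq[dy, dx, neighborhood[dy, dx]]
    return int(score.argmax())  # candidate replacement symbol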

@cdluminate (Collaborator)

One more thing to mention is that the licensing of deep neural networks is still an unclear area, unless the training dataset is fully open source, e.g. all training data licensed under CC-BY-SA 4.0, CC-0, or something similar.

Debian has an unofficial document on the policy for distributing AI stuff: https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst (I drafted it). This policy is strict, in exchange for no loss of software freedom (at a cost in usefulness, as usual).

OSI is already working in this direction: https://deepdive.opensource.org/

@hpjansson (Owner)

My position is that we definitely want to err on the side of legal and ethical use, and speculation here on possible ML approaches is subject to that. Additionally we'd want to adhere to Debian's official policy wherever that's stricter.

I noticed you made an exception for simpler statistical models; you're more up to date on the legal details than I am, so I'll defer to your judgement as to what is permissible. The idea is to make good terminal-printable facsimiles of the user's own images, not wholesale generation (e.g. with text embedding) like Adobe, SD, OpenAI, Midjourney, Microsoft, etc. are doing.

Licensing is also a conundrum for ANSI art archives, since almost none of the art has historically been released with a license. A side effect of that is that this art form is a lot harder to find now than it was just 10 years ago.

@cdluminate (Collaborator)

Well, this ML-Policy draft is the result of a very lengthy discussion on the debian-devel mailing list, so it has gone through some review and reached something of a consensus. Simpler statistical models are already widespread in the open-source world, and imposing restrictions on them could be overkill; input methods, for instance, have been using them for decades. Simpler models can be interpreted by humans very well, but deep neural networks cannot be well understood yet. So what the model does won't matter much here.

My point here is just to provide information so we can avoid complicating distribution (in terms of licensing). In fact, the chafa8x8 font is also subject to the ML-Policy: it is an artifact resulting from the k-means algorithm, based on the COCO dataset, and the images are not actually under an open-source license: https://cocodataset.org/#termsofuse

I actually care more about latency than about whether deep learning can improve the display quality a little bit. One of my most important use cases for chafa is displaying images on servers through an ssh terminal without X forwarding, and I don't really want a deep neural network adding latency there. Similarly, people glancing at images in ranger-fm can't bear the latency introduced by deep neural networks, I suppose.

Here are some of my thoughts about the previously mentioned issues:

  1. Slightly shift the image in every direction and pick the best result. For instance, suppose we use the pipe symbol | to display a vertical edge. If the line is located at the 0.3 position of the corresponding cell instead of the 0.5 (center) position, the matching algorithm may select some other character instead. I think a very slight shift (like 0.1~0.3 of a cell) can already change the character matching for the whole image. But taking a step back, what I usually do is press Ctrl+- multiple times in my terminal and turn the chafa output into super high resolution for free.

  2. Regarding the blocky look of background cells: each cell is processed individually (IIRC) in chafa. What if we introduce a post-processing step similar to anti-aliasing? We could smooth the background blocks that way. I personally think this might improve chafa's visual output.

@hpjansson (Owner)

Well, this ML-Policy draft is the result of a very lengthy discussion on the debian-devel mailing list, so it has gone through some review and reached something of a consensus. Simpler statistical models are already widespread in the open-source world, and imposing restrictions on them could be overkill; input methods, for instance, have been using them for decades. Simpler models can be interpreted by humans very well, but deep neural networks cannot be well understood yet. So what the model does won't matter much here.

My point here is just to provide information so we can avoid complicating distribution (in terms of licensing). In fact, the chafa8x8 font is also subject to the ML-Policy: it is an artifact resulting from the k-means algorithm, based on the COCO dataset, and the images are not actually under an open-source license: https://cocodataset.org/#termsofuse

Yeah. Anyway, CNNs aren't exactly high on the agenda :-) The top contender for "really fancy processing" is still the first paper @clort81 posted. Its technique is old-school (edge thinning, path tracing, iterative deformation), and it seems to produce great results, though it'd still be a lot of work to implement.

I actually care more about latency than about whether deep learning can improve the display quality a little bit. One of my most important use cases for chafa is displaying images on servers through an ssh terminal without X forwarding, and I don't really want a deep neural network adding latency there. Similarly, people glancing at images in ranger-fm can't bear the latency introduced by deep neural networks, I suppose.

Absolutely. I went to some lengths to make it fast, and I'd like to keep it that way too. The -w flag exists because not everyone agrees on the right speed/quality tradeoff, but if we introduced some really slow processing it'd probably be behind a separate flag (even -w 9 should not take more than a couple of seconds to produce a picture).

The machinery that makes it work well on low-end terminals (e.g. Linux console or fbterm running on a tiny screen glued to some DIY project) provides an opening for more "artsy" applications, for lack of a better word. It's fun to explore those, and I want to do more of it, but it shouldn't come at the cost of more pragmatic use cases.

Here are some of my thoughts about the previously mentioned issues:

1. Slightly shift the image in every direction and pick the best result. For instance, suppose we use the pipe symbol `|` to display a vertical edge. If the line is located at the 0.3 position of the corresponding cell instead of the 0.5 (center) position, the matching algorithm may select some other character instead. I think a very slight shift (like 0.1~0.3 of a cell) can already change the character matching for the whole image. But taking a step back, what I usually do is press `Ctrl+-` multiple times in my terminal and turn the chafa output into super high resolution **for free**.

Yes, shifting the entire image and selecting the offset that results in the lowest error is a good idea. I think we could also revisit my attempt at shifting cells on an individual basis, but modify it to only apply when a non-connective character was chosen. That may make small details look better without tearing up connective cells. But as you point out, it probably wouldn't save you those keypresses :-)
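
As a sketch, the whole-image version could be as simple as this (conversion_error() is a hypothetical stand-in for the existing per-cell matching pass, not a real chafa API):

import numpy as np

def best_global_offset(image, cell_w=8, cell_h=8):
    # Try shifting the whole image by fractions of a cell and keep the offset
    # whose character matching produces the lowest total error.
    best, best_err = (0, 0), float("inf")
    for dy in range(0, cell_h, 2):
        for dx in range(0, cell_w, 2):
            shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
            err = conversion_error(shifted)  # hypothetical: total error of the normal pass
            if err < best_err:
                best_err, best = err, (dx, dy)
    return best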

2. Regarding the blocky look of background cells: each cell is processed individually (IIRC) in chafa. What if we introduce a post-processing step similar to [anti-aliasing](https://en.wikipedia.org/wiki/Spatial_anti-aliasing)? We could smooth the background blocks that way. I personally think this might improve chafa's visual output.

The image is already scaled with anti-aliasing before applying the rest of the algorithm, but it may be possible to do something, e.g. with a median filter to smooth things out while preserving edges. I'm not sure I understand 100% what you're suggesting, though.
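
For instance, something like this over the per-cell background colors, as a sketch (bg_colors is assumed to be a rows x cols x 3 array of the already-chosen cell background colors):

from scipy.ndimage import median_filter

def smooth_backgrounds(bg_colors):
    # Median-filter each color channel over a 3x3 cell neighborhood:
    # smooths blocky gradients while mostly preserving hard edges
    return median_filter(bg_colors, size=(3, 3, 1))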

I think adding some kind of spatial hysteresis (so BG/FG colors don't flip as demonstrated in #127) could also work.

@hpjansson (Owner)

Another lead one could follow would be to define and apply different kinds of parametric shape grammar. Just jotting it down here before I forget.

@hpjansson (Owner) commented Sep 12, 2023

How about a mesh transform with a fixed image size? Place control points where cell corners meet and have a solver move them around. In each step it'd resample the image according to the distorted grid, then calculate cell MSE as usual with a distortion penalty.

I think it'd be somewhat analogous to the first paper posted here, except simpler (e.g. no need to do path tracing), and it'd work with colors and areas, not just line art. It would blur the image, though, so it could be necessary to do contrast enhancement/line thickening pre-transform to preserve details. But that's not hard to do either.
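
A rough sketch of one greedy solver step under those assumptions (resample_with_mesh() and cell_error() are hypothetical placeholders for a mesh-based resampler and the existing matcher; the step size and penalty weight are made up):

import numpy as np

def mesh_cost(image, points, rest_points, penalty=0.1):
    # Total matching error of the warped image plus a penalty for moving corners
    warped = resample_with_mesh(image, points)  # hypothetical mesh resampler
    return cell_error(warped) + penalty * np.sum((points - rest_points) ** 2)

def solver_step(image, points, rest_points, step=0.5):
    # points: (rows+1, cols+1, 2) array of cell-corner control points
    best_cost = mesh_cost(image, points, rest_points)
    for idx in np.ndindex(points.shape[:2]):
        for delta in ((-step, 0.0), (step, 0.0), (0.0, -step), (0.0, step)):
            candidate = points.copy()
            candidate[idx] += delta
            cost = mesh_cost(image, candidate, rest_points)
            if cost < best_cost:
                points, best_cost = candidate, cost
    return points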

@Bedrovelsen

[
  {
    "title": "Structure-based ASCII art",
    "doi": "10.1145/1833349.1778789",
    "year": 2010
  },
  {
    "title": "ASCII Art Generation Using the Local Exhaustive Search on the GPU",
    "doi": "10.1109/CANDAR.2013.35",
    "year": 2013
  },
  {
    "title": "A character art generator using the local exhaustive search, with GPU acceleration",
    "doi": "10.1080/17445760.2014.962026",
    "year": 2014
  },
  {
    "title": "Texture-aware ASCII art synthesis with proportional fonts",
    "doi": "10.2312/EXP.20151191",
    "year": 2015
  },
  {
    "title": "Fast Rendering of Image Mosaics and ASCII Art",
    "doi": "10.1111/cgf.12597",
    "year": 2015
  },
  {
    "title": "Automatic ASCII Art conversion of binary images using non-negative constraints",
    "doi": "10.1049/CP:20080660",
    "year": 2015
  },
  {
    "title": "COMPARISON OF TWO ASCII ART EXTRACTION METHODS: A RUN-LENGTH ENCODING BASED METHOD AND A BYTE PATTERN BASED METHOD",
    "doi": "10.2316/P.2015.827-026",
    "year": 2015
  },
  {
    "title": "ASCII Art Synthesis from Natural Photographs",
    "doi": "10.1109/TVCG.2016.2569084",
    "year": 2016
  },
  {
    "title": "ASCII Art Classification based on Deep Neural Networks Using Image Feature of Characters",
    "doi": "10.17706/jsw.13.10.559-572",
    "year": 2018
  },
  {
    "title": "Generating ASCII-Art: A Nifty Assignment from a Computer Graphics Programming Course",
    "doi": "10.2312/EGED.20171021",
    "year": 2017
  },
  {
    "title": "ASCII Art Classification Model by Transfer Learning and Data Augmentation",
    "doi": "10.3233/faia200738",
    "year": 2020
  },
  {
    "title": "Fast Text Placement Scheme for ASCII Art Synthesis",
    "doi": "10.1109/ACCESS.2022.3167567",
    "year": 2022
  },
  {
    "title": "An Autoencoder Based ASCII Art Generator",
    "doi": "10.1145/3591569.3591587",
    "year": 2023
  }
]

@hpjansson (Owner)

Cool, some of those look interesting. The GPU/autoencoder/NN ones are probably out of scope for Chafa (too heavy/training data reliant), but worth a look to see what's possible. Relatedly, I came across https://github.com/theAdamColton/ascii-autoencoder and https://github.com/theAdamColton/ascii-unmasked when looking for open access sources for one of the papers (I didn't find one, unfortunately).

@Bedrovelsen

I implemented the image processing steps from a few of these papers, then sent the results to chafa, and things worked out great a few years ago when I first got into this topic:
[attached image: ascii_art_gallery]

I'm really sleepy now, but the one I can't recall at the moment, which does a vectorization/SVG step after pre-processing, was really nice for so little code. I still have to try it with the terminal zoomed way out, piping the chafa output into a file and then putting it in an HTML <pre> block with a tiny font for responsive ASCII art via CSS.

I really need to keep things more organized, as I have various folders of text files along with the PNG or SVG files they were generated from, and once in a while the script I was testing happens to be nearby :)

[attached: several example images of vectorized output]


import cv2
import numpy as np
import sys

# A. STRUCTURE LINE EXTRACTION
# Note: requires opencv-contrib-python and a pre-trained structured edge detection model ("model.yml")
def extract_structure_lines(input_image):
    # detectEdges() expects a float32 RGB image with values in [0, 1]
    rgb = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

    # Structured edge detection stands in for the edge tangent flow / flow-based DoG line extraction
    edge_detector = cv2.ximgproc.createStructuredEdgeDetection("model.yml")
    lines_image = edge_detector.detectEdges(rgb)

    # Convert the float edge map to an 8-bit binary image for the thinning step
    lines_u8 = (lines_image * 255).astype(np.uint8)
    _, binary_image = cv2.threshold(lines_u8, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    return binary_image

# B. THINNING
def remove_noise_and_thin(structure_lines_image):
    # Apply pre-thinning method to remove noise
    pre_thinning_image = cv2.ximgproc.niBlackThreshold(structure_lines_image, maxValue=255, type=cv2.THRESH_BINARY_INV, blockSize=15, k=-0.2)

    # Perform thinning using KMM thinning algorithm
    thinning_image = cv2.ximgproc.thinning(pre_thinning_image, thinningType=cv2.ximgproc.THINNING_GUOHALL)

    return thinning_image

# Read input image file
def read_input_image(file_path):
    input_image = cv2.imread(file_path)
    # cv2.imread() returns None instead of raising an exception when the file can't be read
    if input_image is None:
        print(f"Error: Failed to read the input image at {file_path}")
        sys.exit(1)
    return input_image

# Example usage
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Error: Please provide the input image file path as an argument.")
        sys.exit(1)

    input_image_path = sys.argv[1]
    input_image = read_input_image(input_image_path)

    structure_lines = extract_structure_lines(input_image)
    thinned_image = remove_noise_and_thin(structure_lines)

    cv2.imshow('Structure Lines', structure_lines)
    cv2.imshow('Thinned Image', thinned_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import string
from PIL import Image
from skimage import morphology
from skimage.measure import block_reduce
import zipfile

def split_image(img_path):
    img = Image.open(img_path)
    img_gray = img.convert('L')
    img_array = np.array(img_gray)
    sub_image_size = img_array.shape[0] // 4, img_array.shape[1] // 4
    sub_images = [img_array[y:y+sub_image_size[0], x:x+sub_image_size[1]]
                  for y in range(0, img_array.shape[0], sub_image_size[0])
                  for x in range(0, img_array.shape[1], sub_image_size[1])]
    return sub_images

def preprocess_images(sub_images):
    preprocessed_images = []
    for img in sub_images:
        edges = cv2.Canny(img, threshold1=100, threshold2=200)
        _, binary_image = cv2.threshold(edges, 127, 255, cv2.THRESH_BINARY)
        binary_image = binary_image > 0
        thinned_image = morphology.thin(binary_image)
        inverted_thinned_image = np.logical_not(thinned_image)
        preprocessed_images.append(inverted_thinned_image)
    return preprocessed_images

def generate_ascii_art(preprocessed_images):
    characters = string.printable[:-5]
    intensity_values = list(range(len(characters)))
    character_intensity = dict(zip(characters, intensity_values))
    block_size = (10, 10)
    reduced_images = [block_reduce(img, block_size, np.mean) for img in preprocessed_images]
    rescaled_images = [np.interp(img, (img.min(), img.max()), (0, len(characters) - 1)) for img in reduced_images]
    ascii_arts = []
    for img in rescaled_images:
        ascii_art = ''
        for row in img:
            for pixel in row:
                closest_intensity = min(character_intensity.values(), key=lambda x: abs(x - pixel))
                char = list(character_intensity.keys())[list(character_intensity.values()).index(closest_intensity)]
                ascii_art += char
            ascii_art += '\n'
        ascii_arts.append(ascii_art)
    return ascii_arts

def save_ascii_art(ascii_arts, preprocessed_images, output_dir):
    os.makedirs(output_dir, exist_ok=True)
    for i, ascii_art in enumerate(ascii_arts):
        with open(f"{output_dir}/image_{i+1}.txt", "w") as file:
            file.write(ascii_art)
        plt.imsave(f"{output_dir}/image_{i+1}.png", preprocessed_images[i], cmap='gray')
    with zipfile.ZipFile(f"{output_dir}.zip", "w") as zipf:
        for file in os.listdir(output_dir):
            zipf.write(os.path.join(output_dir, file), arcname=file)

def combine_ascii_art_files(directory):
    ascii_art_lines = []
    for i in range(1, 17):
        with open(f"{directory}/image_{i}.txt", "r") as file:
            ascii_art_lines.append(file.readlines())
    combined_lines = []
    for i in range(len(ascii_art_lines[0])):
        combined_line = '   '.join(ascii_art_lines[j][i] for j in range(16)).rstrip('\n')
        combined_lines.append(combined_line)
    combined_ascii_art = '\n'.join(combined_lines)
    html = f"<pre>{combined_ascii_art}</pre>"
    return html

# Specify the path of your image
image_path = 'test.png'

# Specify the output directory
output_dir = 'outresult'

# Split the image into a 4x4 grid
sub_images = split_image(image_path)

# Preprocess the sub-images
preprocessed_images = preprocess_images(sub_images)

# Generate ASCII art for the preprocessed images
ascii_arts = generate_ascii_art(preprocessed_images)

# Save the ASCII art files and preprocessed images, and compress them into a ZIP file
save_ascii_art(ascii_arts, preprocessed_images, output_dir)

# Combine the ASCII art files and wrap them in a <pre> block
html = combine_ascii_art_files(output_dir)

# Print the HTML code to check
print(html)

# Save the HTML code to a file
with open(f'{output_dir}/ascii_art.html', 'w') as file:
    file.write(html)


import cv2
import numpy as np

def create_gabor_filter(ksize, sigma, theta, lambd, gamma, phi):
    # Create a Gabor filter with the specified parameters
    gabor_filter = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, phi, ktype=cv2.CV_32F)
    return gabor_filter

def apply_gabor_filters(image, filters):
    # Apply a bank of Gabor filters to the image and return the responses
    responses = []
    for gabor in filters:
        filtered_image = cv2.filter2D(image, -1, gabor)
        responses.append(filtered_image)
    return responses

def modulate_responses(responses, sigma=4.0):
    # Modulate the filter responses based on the non-CRF (surround suppression) model.
    # The full model involves several steps:
    # 1. Computing the local energy of the response
    # 2. Computing the local contrast of the response
    # 3. Combining the energy and contrast to modulate the response
    # The version below is a simplified placeholder for those steps.
    modulated_responses = []
    for response in responses:
        response = response.astype(np.float32)
        local_energy = cv2.GaussianBlur(response ** 2, (0, 0), sigma)
        local_contrast = np.sqrt(local_energy) + 1e-6
        modulated_response = response / local_contrast
        modulated_responses.append(modulated_response)
    return modulated_responses

@hpjansson (Owner) commented Apr 1, 2024

I just pushed the structural-art branch. It's a work in progress and includes a "facet" shape matcher and a mesh solver. Only the facets are working correctly at the moment. It can be enabled like this:

CHAFA_USE_FACETS=1 chafa image.png

The output isn't "better" in a conventional sense, but it'll look more character-artsy.

The mesh solver can be enabled like this:

CHAFA_USE_SOLVER=1 chafa image.png

But the solver is trash at the moment. You can also enable both at once if you've got all day to wait for the output.

@Bedrovelsen

Here is an implementation of the fast text placement paper:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage import filters
from scipy.ndimage import gaussian_filter

# Load the image and convert to grayscale
def load_image(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return image

# Edge detection
def edge_detection(image):
    edges = cv2.Canny(image, 100, 200)
    return edges

# Thinning the edges
def thinning(image):
    skel = np.zeros(image.shape, np.uint8)
    temp = np.zeros(image.shape, np.uint8)
    eroded = np.zeros(image.shape, np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3,3))

    while True:
        cv2.erode(image, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(image, temp, temp)
        cv2.bitwise_or(skel, temp, skel)
        image, eroded = eroded, image  # Swap instead of copy

        if cv2.countNonZero(image) == 0:
            break
    return skel

# Extract pixel orientations
def extract_orientations(image):
    Ix = filters.sobel_h(image)
    Iy = filters.sobel_v(image)
    orientations = np.arctan2(Iy, Ix)
    return orientations

# Load ASCII characters
def load_ascii_chars():
    # ASCII characters can be loaded from a predefined set or generated
    ascii_chars = ["@", "#", "S", "%", "?", "*", "+", ";", ":", ",", "."]
    return ascii_chars

# Calculate character scores
def calculate_char_scores(image, orientations, ascii_chars):
    scores = {}
    for char in ascii_chars:
        char_image = np.full(image.shape, ord(char), np.uint8)  # Simplified example
        char_orientations = extract_orientations(char_image)
        score = np.sum((orientations - char_orientations) ** 2)
        scores[char] = score
    return scores

# Place text based on scores
def place_text(image, scores):
    ascii_art = np.full(image.shape, ' ', dtype='<U1')
    height, width = image.shape
    # Note: the scores are global here, so every cell receives the same best-scoring
    # character; a full implementation would score candidates against each cell's contents.
    best_char = min(scores, key=scores.get)
    for y in range(0, height, 10):  # Example stride
        for x in range(0, width, 10):
            ascii_art[y:y+10, x:x+10] = best_char
    return ascii_art

# Main function
def main(image_path):
    image = load_image(image_path)
    edges = edge_detection(image)
    thinned = thinning(edges)
    orientations = extract_orientations(thinned)
    ascii_chars = load_ascii_chars()
    scores = calculate_char_scores(thinned, orientations, ascii_chars)
    ascii_art = place_text(thinned, scores)
    return ascii_art

# Example usage
image_path = '/path/to/image.jpg'
ascii_art = main(image_path)
for row in ascii_art:
    print(''.join(row))

@Bedrovelsen

It's more focused on the edge thinning than the ASCII part.

@hpjansson (Owner)

Sweet! How's the output?
