Add option to keep a filesystem hierarchy for pages #686

Closed
pachi opened this issue Jan 18, 2013 · 16 comments

Comments

@pachi

pachi commented Jan 18, 2013

The use case is the following: being able to migrate a website with an existing directory structure.
The directory hierarchy can't be changed for SEO and other reasons.

For instance, the site has a file /news/recent/programming/newsweek20100312.htm that could be processed as a Pelican page, but there is currently no way to tell Pelican to preserve the folder structure under /content/pages/ so that the /news/recent/programming hierarchy is kept.

As a hack, I could use the category metadata and change the PAGE_SAVE_AS and PAGE_URL settings, but this only gets one more nesting level.

Proposal: Add a PRESERVE_CONTENT_HIERARCHY (= True) directive so the directory structure in content/pages/ is replicated when building pages.

@wking
Contributor

wking commented Jan 18, 2013

On Fri, Jan 18, 2013 at 09:09:49AM -0800, Pachi Burke wrote:

> Proposal: Add a PRESERVE_CONTENT_HIERARCHY (= True) directive so the
> directory structure in content/pages/ is replicated when building
> pages.

In #671 (which I'm rebasing against master right now), I add
PATH_METADATA (3202b881f18bdcd0acbdb98e94128c89f754ef08) and a
{path} formatting variable
(dba62effdeffca1ae1784652b25a4736bf69a3bc), both of which address this
sort of issue.

@justinmayer
Member

I believe this was merged in bfa8851. If that doesn't address your use case, @pachi, please feel free to comment here and we'll re-open the issue.

@pachi
Author

pachi commented Mar 22, 2013

I'm really new to Pelican and I'm not sure I've got this right.
Would setting PATH_METADATA = '(?P.*)|/' set the path of my files as wanted?
Should I use it along with :save_as: {path}/{slug}.htm in my page metadata, or is that only for overrides?
Thanks for your help and for working on this!

@wking
Contributor

wking commented Mar 22, 2013

On Fri, Mar 22, 2013 at 12:37:13PM -0700, Pachi Burke wrote:

> I'm really new to Pelican and I'm not sure I've got this right. Would
> setting PATH_METADATA = '(?P.*)|/' set the
> path of my files as wanted? Should I use it along with :save_as:
> {path}/{slug}.htm in my page metadata, or is that only for overrides?
> Thanks for your help and for working on this!

For source files whose relative path should not be changed, for example:

path/to/source/a/b/c.html → path/to/output/a/b/c.html

I use:

'STATIC_URL': '{path}',
'STATIC_SAVE_AS': '{path}',

This works on a content-class level (e.g. for all Static pages, see
4ff0d18). There are analogous config settings for other content
classes. You don't need to extract the path yourself using
PATH_METADATA; it's added to the formatting metadata by hand (in
49bf80e).
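
As a rough illustration of what a `{path}` template expands to, here is a minimal sketch. It assumes these settings are filled in with Python's `str.format` from a per-file metadata dict; the `expand` helper and the sample metadata are hypothetical and not part of Pelican's API.

```python
# Sketch only: mimics how a *_URL / *_SAVE_AS template could be expanded.
# `expand` and the sample metadata below are hypothetical, not Pelican API.

def expand(template, metadata):
    """Fill a settings template such as '{path}' from a metadata dict."""
    return template.format(**metadata)


# Assumed metadata for a source file at a/b/c.html (relative to the content dir).
metadata = {"path": "a/b/c.html", "slug": "c"}

print(expand("{path}", metadata))       # a/b/c.html
print(expand("{slug}.html", metadata))  # c.html
```

With `'{path}'` the relative source path is reused unchanged, which is why it suits static files whose names should not change.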

@pachi
Author

pachi commented Mar 22, 2013

Thanks for commenting, wking!
So, for my case (no articles so far, just pages), I'd just add the following settings:
'STATIC_URL': '{path}',
'STATIC_SAVE_AS': '{path}',
'PAGE_URL': '{path}',
'PAGE_SAVE_AS': '{path}'
and be done.
Great!

@wking
Contributor

wking commented Mar 22, 2013

On Fri, Mar 22, 2013 at 01:01:30PM -0700, Pachi Burke wrote:

> So, for my case (no articles so far, just pages), I'd just add the
> following settings:
> 'STATIC_URL': '{path}',
> 'STATIC_SAVE_AS': '{path}',
> 'PAGE_URL': '{path}',
> 'PAGE_SAVE_AS': '{path}'
> and be done.

For the STATIC_* settings, those will be the defaults, so you
shouldn't have to do anything. You will need to set the PAGE_*
settings (as you propose). Note that this functionality is in the
currently unmerged #671. We've been nibbling away, getting bits of
#671 merged into master, but I don't know what the timeline is for
these particular features landing.

@pachi
Author

pachi commented Mar 22, 2013

Ok. I'll be following Pelican's repo to keep an eye on these changes.
Thanks anyhow for your explanations and help.

@ssbarnea

Can someone explain to me how to obtain the behaviour below? I tried the documentation, but it wasn't clear at all about how to do this.

I am using almost only pages instead of articles, as most of the site is static. I use nginx to serve the content, and I have already configured it to serve pretty URLs like /contact, which returns the page /contact.html or /contact/index.html from disk.

Now the trick is to convince Pelican to deploy my pages in a hierarchy, as it seems that it does not do this by default. If I can tell it to keep the same hierarchy as the content, so much the better.

Current config:

PAGE_URL = '{path}'
PAGE_SAVE_AS = '{path}.html'

This seems to keep the directory structure but also the file extension, and I end up with things like output/pages/contact.md.html: correct content (compiled HTML) but the wrong extension and also the wrong location, as I want to publish the contact page as /output/contact.html.
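
The `.md.html` result is consistent with `{path}` holding the source-relative path, extension included. The sketch below reproduces that and shows one possible way around it. It assumes PATH_METADATA is applied to the path relative to the content directory with `re.match` and that its named groups become template fields; the `pages/` prefix handling and the group name `path_no_ext` are illustrative choices, not documented defaults.

```python
import re

# Assumed: the page's path relative to the content directory.
source_path = "pages/contact.md"

# What PAGE_SAVE_AS = '{path}.html' would expand to: the extension is kept.
naive = "{path}.html".format(path=source_path)

# Hypothetical pattern: drop a leading "pages/" and the final extension.
PATH_METADATA = r"(?:pages/)?(?P<path_no_ext>.*)\..*"
fields = re.match(PATH_METADATA, source_path).groupdict()
save_as = "{path_no_ext}.html".format(**fields)

print(naive)    # pages/contact.md.html
print(save_as)  # contact.html
```

Under the same assumptions, a nested source such as `pages/docs/install.md` would map to `docs/install.html`.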

Note: after I figure these out, I would like to document them so that others can benefit.

I guess the trick is to play with these config options, but I wasn't able to find documentation on the variables that can be used inside them. So far I have found (by chance) only {slug} and {url}, but no place that lists all the options.

Thanks.

@adiroiban

Hi,

I am not sure why it is so hard to fix this. I assume that the main developers only use a flat URL scheme.

I am trying to use Pelican to build a product website rather than a blog... I still use the blog part as the 'news' section :)

I went for this hack inside my pelicanconf.py file, with Pelican 3.4.0.

Basically, it just adds 'dirname': os.path.dirname(path_to_url(path)) to url_format.

Maybe another option to fix this kind of problem would be to provide some hook/callback to augment the metadata for each page/article.

SLUGIFY_SOURCE = 'basename'

PAGE_SAVE_AS = '{dirname}/{slug}.html'
PAGE_URL = '/{dirname}/{slug}.html'

#
# Patch Content to expose a dirname placeholder
#
import copy
import os
from pelican.contents import Content
from pelican.utils import path_to_url, SafeDatetime, slugify


def patch_Content_url_format_get(self):
    """Returns the URL, formatted with the proper values"""
    metadata = copy.copy(self.metadata)
    path = self.metadata.get('path', self.get_relative_source_path())
    default_category = self.settings['DEFAULT_CATEGORY']
    slug_substitutions = self.settings.get('SLUG_SUBSTITUTIONS', ())
    metadata.update({
        'path': path_to_url(path),
        'dirname': os.path.dirname(path_to_url(path)),
        'slug': getattr(self, 'slug', ''),
        'lang': getattr(self, 'lang', 'en'),
        'date': getattr(self, 'date', SafeDatetime.now()),
        'author': slugify(
            getattr(self, 'author', ''),
            slug_substitutions
        ),
        'category': slugify(
            getattr(self, 'category', default_category),
            slug_substitutions
        )
    })
    return metadata


Content.url_format = property(patch_Content_url_format_get)

@adiroiban

By the way, I think this ticket should be reopened, as in version 3.4.0 there is no easy way to preserve the filesystem hierarchy for pages.

@avaris
Member

avaris commented Jul 10, 2014

@adiroiban: PATH_METADATA should be sufficient to get what you want. To be precise:

PATH_METADATA = '(?P<dirname>.*)/(?P<basename>.*)\..*'
PAGE_SAVE_AS = '{dirname}/{basename}.html'
PAGE_URL = '{dirname}/{basename}.html'

for example:

In [14]: import pelican

In [15]: settings = pelican.settings.DEFAULT_CONFIG

In [16]: settings.update({'PATH_METADATA': '(?P<dirname>.*)/(?P<basename>.*)\..*',
    ...:                  'PATH': 'content',
    ...:                  'PAGE_SAVE_AS': '{dirname}/{basename}.html',
    ...:                  'PAGE_URL': '{dirname}/{basename}.html'})

In [17]: reader = pelican.readers.Readers(settings=settings)

In [18]: page = reader.read_file(base_path='content', 
    ...:                         path='some/nested/folder/test_page.rst', 
    ...:                         content_class=pelican.contents.Page)

In [19]: page.metadata
Out[19]: 
{'basename': 'test_page',
 u'category': <Category test>,
 'dirname': 'some/nested/folder',
 u'title': u'This is a test page'}

In [20]: page.save_as
Out[20]: 'some/nested/folder/test_page.html'

In [21]: page.url
Out[21]: 'some/nested/folder/test_page.html'

As for "provide some hook/callback to augment the metadata for each page/article", the plugin system allows you to do that.

@adiroiban

> @adiroiban: PATH_METADATA should be sufficient to get what you want. To be precise:
>
> PATH_METADATA= '(?P<dirname>.*)/(?P<basename>.*)\..*'

@avaris this will work as long as I have no page in the root... your expression does not match test_page.rst

Trying PATH_METADATA = '(?P<dirname>.*/){0,1}(?P<basename>.*)\..*' will result in None as a string :(
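
Both observations can be reproduced with plain `re` and `str.format`, outside Pelican. This is only a sketch of the regex behaviour, assuming the named groups are passed to the save-as template as they are; the variable names are illustrative.

```python
import re

strict = r"(?P<dirname>.*)/(?P<basename>.*)\..*"
optional = r"(?P<dirname>.*/){0,1}(?P<basename>.*)\..*"

# A root-level path contains no "/", so the strict pattern does not match at all.
root_match = re.match(strict, "test_page.rst")

# The optional variant matches, but the group that did not take part is None...
groups = re.match(optional, "test_page.rst").groupdict()
# ...and str.format renders None as the literal text "None".
save_as = "{dirname}/{basename}.html".format(**groups)

print(root_match)  # None
print(groups)      # {'dirname': None, 'basename': 'test_page'}
print(save_as)     # None/test_page.html
```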

I will look into the plugin system in order to augment page metadata.

Thanks!

@avaris
Member

avaris commented Jul 10, 2014

@adiroiban: Ah, you're right. But since you're using dirname and basename together in the output too, this should work:

PATH_METADATA = '(?P<path_no_ext>.*)\..*'
PAGE_SAVE_AS = '{path_no_ext}.html'
PAGE_URL = '{path_no_ext}.html'
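
A quick check of this single-group pattern against a root-level and a nested path, as a sketch in plain Python. The `save_as` helper is hypothetical and only mimics applying the pattern with `re.match` and then filling the template.

```python
import re

PATH_METADATA = r"(?P<path_no_ext>.*)\..*"
PAGE_SAVE_AS = "{path_no_ext}.html"


def save_as(source_path):
    """Hypothetical helper: apply the pattern, then fill the template."""
    match = re.match(PATH_METADATA, source_path)
    return PAGE_SAVE_AS.format(**match.groupdict())


print(save_as("test_page.rst"))                     # test_page.html
print(save_as("some/nested/folder/test_page.rst"))  # some/nested/folder/test_page.html
```

Because the greedy group runs up to the last dot, only the final extension is dropped, and no directory separator is required for a match.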

@adiroiban

Thanks! Yes, that should do the trick, as long as I don't want a different file name... which I don't.
I have also tried the plugin system and it is great!

Just for reference:

import os

from pelican import signals
from pelican.utils import path_to_url


class PelicanPlugin(object):
    """
    Extension for pelican system.
    """
    @classmethod
    def register(cls):
        """
        Entry point for pelican plugin system.
        """
        signals.content_object_init.connect(cls.on_content_initialized)

    @staticmethod
    def on_content_initialized(content):
        """
        Called when content (static or dynamic) is initialized.
        """
        path = path_to_url(
            content.metadata.get('path', content.get_relative_source_path()))
        dirname = os.path.dirname(path)
        if dirname:
            dirname += os.sep
        content.metadata['dirname'] = dirname

# Register our plugin.
PLUGINS = [PelicanPlugin]

@justinmayer
Member

Since the answer to this question should be more visible, I documented the aforementioned solution via b8f2326.

@kno10

kno10 commented Feb 8, 2023

This breaks the automatic translation system, unfortunately.

PATH_METADATA = '(?P<path_no_ext>.*?)([.-]\w\w)?\.[^.]+'
ARTICLE_TRANSLATION_ID = PAGE_TRANSLATION_ID = "path_no_ext"

strips two-letter language postfixes from the filename, so they get the same keys again. But it's a bit hacky.
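
To see what this pattern does to a few file names, here is a small sketch using plain `re`. It assumes the pattern is applied with `re.match` to the source-relative path; the `translation_key` helper and the sample paths are illustrative only.

```python
import re

PATH_METADATA = r"(?P<path_no_ext>.*?)([.-]\w\w)?\.[^.]+"


def translation_key(source_path):
    """Hypothetical helper: the value the path_no_ext group would capture."""
    return re.match(PATH_METADATA, source_path).group("path_no_ext")


for path in ("pages/about.md", "pages/about.de.md", "pages/about-fr.md"):
    print(path, "->", translation_key(path))
# pages/about.md -> pages/about
# pages/about.de.md -> pages/about
# pages/about-fr.md -> pages/about

# The suffix need not be a language code: any "-xx" or ".xx" ending is stripped.
print(translation_key("pages/to-do.md"))  # pages/to
```

The last line shows one way the approach can misfire: a file name that merely ends in a hyphen plus two letters loses that ending as well.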
