
New Wiki
Closed, Resolved · Public

Description

Status:

Wiki Migration Plan:

  • Release Notes: Remove the copy of the old Release Notes and make direct links to the archived version. @Inês Almeida (brita_)
  • Add-ons: Make direct links to the archived version. This is a temporary working solution until there is a decision on a new system in T54097. @Inês Almeida (brita_)
  • Technical Docs: Copy and organize Dev:Doc and Dev:Source. @Inês Almeida (brita_) Organizing and updating the wiki contents is ongoing work, but I consider it "ported" by now.
  • Licensing
  • Delete the old read-only PHP wiki. @Dan McGrath (dmcgrath)
  • [On hold] until:
    • copying the content source is no longer needed, and
    • the static HTML version is in place.
---- Original task details below ------
NOTE: The new wiki is in its transitional phase at the moment.

Its goal is much more limited than the previous one's: basically, this is a tool for developers and module team members to share technical docs. It should also have general help info like setting up a new build of Blender, some code architecture docs, etc.

Moving a subset of old wiki pages over is an ongoing process.

The old wiki shall be locked (made read-only) in a few weeks; the proposed date is June 15th. That means that developers (including GSoC students) should request accounts on the new wiki if they do not have one already, and move their pages there.

Note that once in the read-only state, it will still be possible to access and copy the wiki-markup content of the pages!

Recently Dan set up a fresh install of MediaWiki and migrated the old pages to it. This can be found at https://en.blender.org

To set up an account (developers only) just informally poke a wiki administrator for now
[these are @Campbell Barton (campbellbarton), @Brecht Van Lommel (brecht), @Sybren A. Stüvel (sybren), @Dan McGrath (dmcgrath) atm]

There are still several TODOs:

Theme

  • We should keep the theme as close to the original as possible, to make maintenance easy.

Clean Structure

  • Remove the wiki versions (2.4, 2.5, 2.6) and instead only have one that is always the latest.
  • Make all pages independent. Pages like https://wiki.blender.org/index.php/Dev:Doc/Building_Blender/Windows/msvc/CMake are really confusing to edit.
  • Remove namespaces, so [[Building_Blender/Windows]] instead of [[Dev:Doc/Building_Blender/Windows]] (can be done after creation by "moving" the page)

Excluded from Wiki Migration

There are a couple of sections that mostly contain historic content, and it is preferred not to port these over
(but this has to be checked carefully -- if in doubt, ask around... help with these decisions is welcome!)

What still needs to be done

  • Building
  • Tools
  • Developer Introduction
  • Move the Todo section (https://wiki.blender.org/index.php/Dev:Source/Development/Todo) to phabricator (parent) tasks. This way, inter-linking (existing) phabricator TODO tasks is just more useful, too. As a first step, this has now been brought over in its current state, see T55361 (and subtasks). These will now have to be cleaned up to remove out-of-date content.
    • Tools
    • Render
    • User Interface
    • Animation System
    • Game Engine (exclude this)
    • Editors
    • Breaking Backward Compatibility
    • Scripting
    • Import Export
    • Building Software
    • Installation, Environment
    • Regression Tests
    • User Based Todo (exclude this)
  • Port the Release Logs over (this is lower priority but does not require checking whether content is out of date). For older releases, up to 2.78, it was decided to just link to the old/archived wiki for now (lots of broken links / missing content / not subject to further editing anyway...).
    • Blender 2.80 (under development) note: https://wiki.blender.org/index.php/Dev:2.8/ hasn't been ported (yet?)
    • [real port] Blender 2.79 (latest stable release)
    • Blender 2.78
    • Blender 2.77
    • Blender 2.76
    • Blender 2.75
    • Blender 2.74
    • Blender 2.73
    • Blender 2.72
    • Blender 2.71
    • Blender 2.70
    • Blender 2.69
    • Blender 2.68
    • Blender 2.67
    • Blender 2.66
    • Blender 2.65
    • Blender 2.64
    • Blender 2.63
    • Blender 2.62
    • Blender 2.61
    • Blender 2.60
    • Blender 2.59
    • Blender 2.58
    • Blender 2.57
    • Blender 2.56
    • Blender 2.55
    • Blender 2.54
    • Blender 2.53
    • Blender 2.52
    • Blender 2.51
    • Blender 2.50
    • Blender 2.49
    • Blender 2.48
    • Blender 2.47
    • Blender 2.46
    • Blender 2.45
    • Blender 2.44
    • Blender 2.43
    • Blender 2.42
    • Blender 2.41
    • Blender 2.40

Details

Type
Design

Event Timeline


Here are some instructions which might help with statifying the old wiki: http://camwebb.info/blog/2012-12-20/

This all seems reasonable.

Regarding release notes, it's not entirely clear to me what is being proposed. Once the old wiki is archived to HTML we can redirect to that for old versions, and then afterwards remove the old versions from wiki.blender.org. I'm not sure anyone will convert the wiki release notes of the current or future releases to blender.org though; that's a lot of work. If we want the release notes in another format for future releases, we'd better write them in that format directly; otherwise we'll probably just keep linking to the wiki.

One request regarding the new wiki I would like to see: explicit version numbers. The new wiki does not specify which version of Blender the articles are relevant to. Something I appreciate in the Python documentation and the old wiki is the selectable version dropdown menu, which lets you make sure the page you are reading is correct for the version you are working with, and takes you straight to the correct page when you select a different version. I've been hitting a lot of dead links lately trying to figure out which set of build instructions is correct for 2.7; this has been difficult since the old wiki does not have anything labeled 2.7, and the instructions in the new wiki are specific to the latest builds of 2.8.

EDIT: I see that one of the goals of the changes is to remove version numbers and always keep everything up to date with the latest version. I suppose I simply disagree with this, for the above reasons.

@Benjamin Humpherys (brhumphe) The Blender User Manual does have versioning: https://docs.blender.org/manual/ . But you mention compiling? That shouldn't really change; it would be nice to have an easier explanation of how to work with branches, but otherwise I don't think the build docs need versioning.

@Brecht Van Lommel (brecht)
Regarding release notes, I propose to delete everything under https://wiki.blender.org/wiki/Reference/Release_Notes except for the ones for 2.80.
This is because the copies don't look good and it would be too much work to make them better, without a clear benefit.
Examples:
https://wiki.blender.org/wiki/Reference/Release_Notes/2.49
Compare an original version with the copy. -> Images are missing. Click the first link "Cycles Rendering". Images are also missing and links are broken.

I thought that the wiki release notes were already converted to blender.org by @Pablo Vazquez (pablovazquez) or @Francesco Siddi (fsiddi) ?

@Benjamin Humpherys (brhumphe), the versioning in the old wiki was never actually used for developer docs. The individual history of pages can be used to go back to a specific date, however build instructions are actually identical for master (2.7) and blender2.8 branches.

@Inês Almeida (brita_), wiki release notes don't get converted, there is only a landing page on blender.org that links to the wiki pages. We can delete the release notes on wiki.blender.org once there is a way to redirect to the archive versions of those pages with images, if we delete them right now it breaks a lot of links.

@Brecht Van Lommel (brecht)
I don't know what the process is for the release notes after they're finished on the wiki.

I see an html version on blender.org. For example: https://www.blender.org/download/releases/2-74/
Clicking on a "Read More" indeed links to the wiki.
I experimented with making an explicit redirect, but redirects to an external link require an extension as they are seen as a security risk.

I came up with this instead: https://wiki.blender.org/wiki/Dev:Ref/Release_Notes/2.74/UI

We can either do that (a link that people would need to click manually) or install the extension.
I think the security issue is that anyone could edit a popular page to redirect to their own scam website or so?
We can make it so that only external links to the old wiki domain are allowed. I don't see a security problem with that. @Dan McGrath (dmcgrath)?

It's still a choice whether we want an automatic redirect or an intermediate page, so that the reader clearly understands they are going from a beautifully maintained wiki into an archive.

Hi,

I tried to bake the old wiki as static HTML, but wget is disallowed in robots.txt. Also, it makes sense to run it locally / close to the server, so somebody with local access should be able to run the bake command (after modifying robots.txt):

cd archive
wget --recursive --level=10 --page-requisites --adjust-extension --convert-links --no-parent -R "*Special*" -R "*action=*" -R "*printable=*" -R "*oldid=*" -R "*title=Talk:*" -R "*limit=*" "http://en.blender.org/index.php"

After that maybe check if something else listed in http://camwebb.info/blog/2012-12-20/ is still needed.

Hi Tuomo, if the user agent is the only thing breaking your bake command, you can change it:

wget --user-agent="BlenderBaker/1.0"

You can also convince wget to ignore robots.txt with -e robots=off.
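Putting the two workarounds together with the original recursive command, the full invocation might look like this (a sketch; the option list is the one from the earlier comment):

cd archive
wget --user-agent="BlenderBaker/1.0" -e robots=off \
     --recursive --level=10 --page-requisites --adjust-extension \
     --convert-links --no-parent -R "*Special*" -R "*action=*" \
     -R "*printable=*" -R "*oldid=*" -R "*title=Talk:*" -R "*limit=*" \
     "http://en.blender.org/index.php"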

Thanks for the advice, it seems to work. So, will somebody do it locally, or should I give it a go next weekend?

@Tuomo Keskitalo (tkeskita)
I gave your command a quick try with --level=1 and it seems to work pretty well! (I don't have local access to the server)
Some links are going to the web version, but I think that's because those were excluded from the copy.

I played a bit with the rejection list and it looks possible to keep both the edit page and the history list (without the diffing), both of which are super useful.
I simply omitted -R "*action=*", but then there are all sorts of other things to discard, such as "action=delete" and "action=unprotect".
Curiously, some of the rejection expressions didn't work for me; I still see the oldid=* pages.
I wonder how much space the rejected pages occupy and whether it wouldn't be better to simply copy everything.

@Dan McGrath (dmcgrath)
Is it possible to unlock the old wiki for edits before the bake, in order to add a note saying it's the archive (as it was done when the manual was moved)?

The rejection list is mainly to prevent the process from purging all the pages and doing other crazy things. If some unwanted pages sneak in, you can always delete the files.

As for the note, is it possible to ensure all pages have it, without going and editing every single page?

Also, given all those months en.blender.org existed, is it really the place for an archive? Or do we break all the Google search links to it once the archive is done?

I think that when the manual was moved, the note was added to namespaces and not per page.
Poking a random page, I see the warning, but not the source for it and also nothing in the history. -> https://en.blender.org/index.php?title=Doc:2.6/Tutorials
I don't know how it was done.

I don't think there should be an "en.blender.org"; it has nothing resembling "archived old wiki" in the title. It's just confusing, and has been so for many months.
Search engine indexes will recover in another couple of months? Hopefully they'll start pointing to the new wiki once that's a bit more composed?

We could put the archive on archive.blender.org/wiki. If needed we can still set en.blender.org to redirect to the new location.

If we need a message on all pages, I can insert some html directly into the php/theme code on the server.
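For reference, stock MediaWiki also has a site-wide notice mechanism that avoids per-page edits; a minimal sketch, assuming it still works on this old MediaWiki version (the same text can also be set by editing the MediaWiki:Sitenotice page from inside the wiki):

// In LocalSettings.php: show a banner at the top of every page
$wgSiteNotice = "'''This wiki has been archived.''' The new wiki is at https://wiki.blender.org/";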

We can either do that (a link that people would need to click manually) or install the extension.
I think the security issue is that anyone could edit a popular page to redirect to their own scam website or so?
We can make it so that only external links to the old wiki domain are allowed. I don't see a security problem with that. @Dan McGrath (dmcgrath)?

Sorry, I'm not exactly sure what you are asking about here. I am not exactly the best person to ask about MediaWiki features, let alone security related features.

nBurn (nBurn) added a comment. Edited Nov 19 2018, 2:40 PM

One request regarding the new wiki I would like to see: explicit version numbers. The new wiki does not specify which version of Blender the articles are relevant to. <snip> I've been hitting a lot of dead links lately trying to figure out which set of build instructions is correct for 2.7; this has been difficult since the old wiki does not have anything labeled 2.7, and the instructions in the new wiki are specific to the latest builds of 2.8.

This can be tricky to do with editable wiki docs. What would be your typical use case here? If this is only for building older versions of Blender, one option could be to start packing a copy of the wiki's build instructions into the Blender source code when it gets archived during a release. That way, even if a certain build setup gets deprecated or undergoes significant changes, there will still be instructions available for how the code was compiled, without having to dig through the wiki's edit history or hope there's a backup copy of the instructions somewhere on archive.org.

@Brecht Van Lommel (brecht)
Regarding release notes, I propose to delete everything under https://wiki.blender.org/wiki/Reference/Release_Notes except for the ones for 2.80.
This is because the copies don't look good and it would be too much work to make them better, without a clear benefit.
Examples:
https://wiki.blender.org/wiki/Reference/Release_Notes/2.49
Compare an original version with the copy. -> Images are missing. Click the first link "Cycles Rendering". Images are also missing and links are broken.

I don't think it would take that much work to port over what's left of the release notes pages to the new wiki. The main difference between the two URLs you listed is the missing images, which should be a quick fix. Any special template formatting should not be a major problem either: just port the old templates to the new wiki or strip out the template formatting. The thing is, if the old wiki is left online as static HTML, that removes the main reason to port these pages over.

I would recommend keeping the 2.79 docs on the new wiki, not just the 2.80 info. My thinking is that many users are likely to continue using 2.79 for a long, long time after 2.80 is released, to keep 3rd-party add-ons working and because 2.80 will not have Blender Internal, the BGE, or compatibility with pre-OpenGL 3.3 hardware. If 2.79 retains a large user base, having editable wiki docs for it makes sense.

As for the note, is it possible to ensure all pages have it, without going and editing every single page?

If the pages are being flattened to HTML, this might be possible through the CSS, but web scripting is not something I have much familiarity with.
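To illustrate the idea, a hypothetical CSS-only sketch that injects a banner without touching the HTML (the thread later settles on rewriting the HTML instead):

/* Prepend an archive notice to every page body */
body::before {
  content: "This wiki has been archived. The current wiki is at wiki.blender.org.";
  display: block;
  padding: 15px;
  color: red;
  font-size: 22px;
  text-align: center;
}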

Also, given all those months en.blender.org existed, is it really the place for an archive? Or do we break all the Google search links to it once the archive is done?

I don't think there should be an "en.blender.org"; it has nothing resembling "archived old wiki" in the title. It's just confusing, and has been so for many months.
Search engine indexes will recover in another couple of months? Hopefully they'll start pointing to the new wiki once that's a bit more composed?

I don't think moving the old wiki will be a major issue. Last time I checked, search engines still had broken indexing for the old wiki from the prior move from wiki.blender.org to en.blender.org. Most of the older pages seem to be indexed now, but they rarely show up in the first few pages of search results unless you use rather stringent search strings. Maybe, as a temporary fix, there could be a placeholder 404-style page at the en.blender.org address for a while that points to whatever the archived wiki's new address will be.

@Inês Almeida (brita_) : Looks like you can specify pcre regexps for wget; does this do what you want? (updated:)

wget --recursive --level=1 --page-requisites --adjust-extension --convert-links --no-parent --regex-type pcre --reject-regex "Special\:|action=(?!(history|edit))|printable=|oldid=|title=Talk\:|limit=" -e robots=off "http://en.blender.org/index.php"

I guess you also want to remove remaining links to en.blender.org?

This removes normal, possibly multiline links to https://en.blender.org
find . -type f -exec perl -i -pe 'BEGIN{undef $/;} s/\<a\ href=\"https\:\/\/en\.blender\.org.*?\>(.*?)\<\/a\>/$1/smg' {} \;

This removes <script> clauses that include string en.blender.org
find . -type f -exec perl -i -pe 's/\<(\w+?)\s.*?https\:\/\/en\.blender\.org.*?\<\/\w+\>//smg' {} \;

This removes <link> clauses that include string en.blender.org
find . -type f -exec perl -i -pe 's/\<.*?https\:\/\/en\.blender\.org.*?\>//smg' {} \;

@Brecht Van Lommel (brecht) can you run Tuomo's script and add the notice to the development part saying that it's archived?
Or if not, it's not exactly clear yet how the html bake will end up on the server :)

Between https://archive.wiki.blender.org and https://archive.blender.org/wiki
archive.blender.org already exists. /wiki has nothing in it at the moment, but I don't know if it could cause a problem?
Apart from that, I don't have a preference. :)

Meanwhile, I made a separate task for the structure of the content and landing page -> T57987

@Brecht Van Lommel (brecht) @Inês Almeida (brita_) : If you want, I can do a test bake first during the weekend to see that everything is OK, and put it up on my own web site for you to check? You could then reproduce my process locally.

If you could add the warning "This wiki has been archived, the new wiki is available at ..." to all pages beforehand, that would be nice. Thanks!

@Tuomo Keskitalo (tkeskita) sounds good!
Thanks for helping out ^^

Status of the wiki statification bake: this is my current version of the wget command, which has been running for 20 hours now, has collected 3.4G of files, and is not yet finished. Not sure why it is so slow, but at least it has rejected 12 million URLs so far. I hope I can run it until it finishes.

wget --recursive --level=inf --page-requisites --adjust-extension --convert-links --regex-type pcre --reject-regex "Special\:|action=(?!(history|edit))|printable=|oldid=|diff=|limit=|redlink=1|\\\"" -e robots=off --rejected-log=wget.reject.log --no-verbose --limit-rate=500k "https://en.blender.org/index.php" 2>&1 | tee wget.log

Update 2018-11-26: Still not finished.

Update 2018-11-27: wget finished in 2d 5h. Proceeding now with the string replacements. Also, I found a somewhat nice way to inject the warning into all pages, so there is no need to modify the source. More tomorrow.

Update 2018-11-28: Continuing to finesse the string replacements. Currently 9.6 GB of data files. Gonna take a while (tomorrow/Friday).

OK, I managed to get the first version of the static wiki bake ready already today. I uploaded it temporarily to my web pages; please check the results here: http://tkeskita.kapsi.fi/temp/en.blender.org/ , compare with the original at https://en.blender.org/ , and let me know what you think. At least the tamsiteTree menus on the left panel are not working correctly; should I try to fix those?

I used the wget command in the previous comment and then ran these bash commands to get that result:

(script updated 2018-11-30)


# Clean-up string replacements for old Blender wiki after running wget
# This removes normal, possibly multiline links to https://en.blender.org but leaves the text intact
find . -type f -regex ".*\.html$" -exec perl -i -pe 'BEGIN{undef $/;} s/\<a\ href=\"https\:\/\/en\.blender\.org.*?\>(.*?)\<\/a\>/$1/smg' {} \;
# This removes <script> clauses that include string en.blender.org
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<(\w+?)\s.*?https\:\/\/en\.blender\.org.*?\<\/\w+\>//smg' {} \;
# This removes <link> clauses that include string en.blender.org
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<.*?https\:\/\/en\.blender\.org.*?\>//smg' {} \;
# This adds warning about wiki having been archived after start of content
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<\!--\ START\ content\ --\>/\<\!-- START content --\>\<div\>\<ul\>\<center\>\<font color\=\"red\" size\=\"+2\"\>Warning: This wiki has been archived. Current Blender wiki address is \<a href\=\"https\:\/\/wiki.blender.org\/wiki\/\"\>https\:\/\/wiki.blender.org\/wiki\/\<\/a\>\<\/font\>\<\/center\>\<\/ul\>\<\/div\>/' {} \;
# Change URLs in css files, replace old_blender_wiki_archive_url to correct archive URL
find . -type f -regex ".*\.css$" -exec perl -i -pe 's/en\.blender\.org/old_blender_wiki_archive_url/smg' {} \;
# Convert all remaining URLs to old_blender_wiki_url
find . -type f -exec perl -i -pe 's/https\:\/\/en\.blender\.org/old_blender_wiki_url/smg' {} \;
# Move x.css?270.css to x.css
mv skins/blender/main.css?270.css skins/blender/main.css
mv skins/blender/niftyCorners.css?270.css skins/blender/niftyCorners.css
mv skins/naiad/main.css?270.css skins/naiad/main.css
mv skins/common/commonPrint.css?270.css skins/common/commonPrint.css
mv skins/common/shared.css?270.css skins/common/shared.css
mv skins/monobook/main.css?270.css skins/monobook/main.css
# And remove references to ?270.css in html files
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\?270\.css//smg' {} \;
# Move x.js?270 to x.js
mv skins/common/metadata.js?270 skins/common/metadata.js
mv skins/common/wikibits.js?270 skins/common/wikibits.js
mv skins/common/history.js?270 skins/common/history.js
mv skins/common/edit.js?270 skins/common/edit.js
mv skins/common/ajax.js?270 skins/common/ajax.js
mv skins/common/rightclickedit.js?270 skins/common/rightclickedit.js
mv skins/common/mwsuggest.js?270 skins/common/mwsuggest.js
# And remove references to .js?270 in html files
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\.js\?270/\.js/smg' {} \;
# Copy index.php.html to index.html
cp index.php.html index.html

@Tuomo Keskitalo (tkeskita)
Is it possible to add the archived warning only on pages that are not part of the manual? Those already have a warning linking to the new Blender Manual.

I don't fully understand all the regexes and what their purpose is. I would say that links to en.blender.org should be replaced with the archive domain, but never removed?

I see that the links to edit a page and see its history are not available. Is there a reason for removing them? I thought they were very useful.

When you mention that the tree menus at the left are not working, is it just the problem with the icons? Otherwise, it's working fine for me.

Hi @Inês Almeida (brita_),

Is it possible to add the archived warning only on pages that are not part of the manual? Those already have a warning linking to the new Blender Manual.

Yes, if there is a common string in the manual page names, it is possible to generate a regex which catches and omits those. But wouldn't it be better to make all archived pages look similar?

I don't fully understand all the regexes and what their purpose is. I would say that links to en.blender.org should be replaced with the archive domain, but never removed?

My reasoning here is that if no links to en.blender.org remain in the static version, you can be sure it is self-contained (except for css files), so the static pages can be placed or moved anywhere without breaking internal links.

I see that the links to edit a page and see its history are not available. Is there a reason for removing them? I thought they were very useful.

I think they should be there? For example http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Doc:2.6/Manual/index.html if you click on "view source" on top right you get to http://tkeskita.kapsi.fi/temp/en.blender.org/index.php%3Ftitle=Doc:2.6%252FManual&action=edit.html and if you click on "history" you get http://tkeskita.kapsi.fi/temp/en.blender.org/index.php%3Ftitle=Doc:2.6%252FManual&action=history.html What page are you referring to specifically?

When you mention that the tree menus at the left are not working, is it just the problem with the icons? Otherwise, it's working fine for me.

Oh it is working even if there is no "+" icon. Well that might be OK then, I guess.

Warning Notice

Yes, if there is a common string in the manual page names, it is possible to generate a regex which catches and omits those. But wouldn't it be better to make all archived pages look similar?

The manual pages already have a warning linking to the new manual; they shouldn't have another linking to the developer docs.
I think the nicest thing to do is, for each page: if it has the NiceTip that says "IMPORTANT! Do not update this page!", remove it entirely and add:

Warning: This wiki has been archived. The Blender User Manual has moved to a [[https://docs.blender.org/manual|new location]].

otherwise, add:

Warning: This wiki has been archived. The Blender Developer Wiki has moved to a [[https://wiki.blender.org|new location]].

See paste: P848

Regex for en.blender.org
Are you trying to convert absolute links to relative ones, then?

Page Source and History
You're right, everything looks fine. I don't know where I was looking the other day. I'll poke more pages this weekend :)

Icons for the navtree
Icons should be under /extensions/BlenderTreeAndMenu/img/.

Hi @Inês Almeida (brita_) ,

about regexes: Yes, I'm trying to modify or get rid of unconverted URLs. When wget finishes, it converts only the links that point to pages it has downloaded into relative paths; for the rest it does nothing. This leaves a lot of references to en.blender.org in the page sources. Attached is an example of https://en.blender.org/index.php/Doc:2.6/Reference/Nodes/Node_Editor as produced by wget.

So, in order to make static pages work right, those URLs containing en.blender.org must be dealt with somehow.

I'll try to make a new conversion, including that modified treatment for NiceTip pages you requested.

I see the point of the regexes :) I would go for replacing them instead of removing, though!

Meanwhile, I noticed that some links have numbers in them:
http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html
http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Source/Development/Projects.1.html
This is important as it actually breaks links. Do the links really need the .html at the end?

I also got this weird thing:
Last link on http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html
has /2006 and it shouldn't (also on the navtree)
http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/2016/Students.html

@Inês Almeida (brita_) :

Meanwhile, I notice that some links have numbers on them:

Hmm.. Looks like wget adds a number to the filename, and also ".html" to the end, when it reads a URL ending in "/" (a directory, it seems to assume). Internally no links should be broken, but I guess you mean that when you replace just the host part of an old URL with the new static base path, you get a broken link? So you would want a rewrite/redirection rule to convert old URLs to static wiki pages?

OK, this has been one perl of a weekend. X-|

I've now uploaded the second version of the static old wiki pages to http://tkeskita.kapsi.fi/temp/en.blender.org/ including the wishes by @Inês Almeida (brita_) for page-top warnings (different for manual and other pages). I think I managed to solve the ".1.html" issue by creating index.html from that file, so you can access pages like http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode, and it should be possible to use rewrite/redirection rules to map pages to the archive location. Can you still find something wrong?

Here is the updated script which is fast becoming very scary looking.

(updated 2018-12-05)


# Clean-up string replacements for old Blender wiki (en.blender.org) after running wget

# This removes possibly multiline <a href> links to https://en.blender.org but leaves the text intact
find . -type f -regex ".*\.html$" -exec perl -i -pe 'BEGIN{undef $/;} s/\<a\ href=\"https\:\/\/en\.blender\.org.*?\>(.*?)\<\/a\>/$1/smg' {} \;
# This removes <script> clauses that include string en.blender.org
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<(\w+?)\s.*?https\:\/\/en\.blender\.org.*?\<\/\w+\>/\<\!--_removed_site_clause_--\>/smg' {} \;
# This removes <link> clauses that include string en.blender.org
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<.*?https\:\/\/en\.blender\.org.*?\>/\<\!--_removed_link_clause_--\>/smg' {} \;

# This monster replaces NiceTip section in manual pages with a simple archival warning. This also changes "<!-- START content -->" so that next replacement does not process these pages twice
find . -type f -regex ".*\.html$" -exec perl -i -pe 'BEGIN{undef $/;} s/\<\!--\ START\ content\ --\>\s*\<div\>\<table\ class\=\"NiceTip\"\>.*?IMPORTANT\!\ Do\ not\ update\ this\ page\!.*?documentation\ project.*?\<\/div\>/\<\!--_START_manual_page_content_--\>\<div style\=\"padding\:15px\;color\:red\;font-size\:22px\;\"\>\<center\>Warning\: This wiki has been archived. The Blender User Manual has moved to \<a href\=\"http\:\/\/www\.blender\.org\/manual\"\>a new location\<\/a\>\.\<\/center\>\<\/div\>/smg' {} \;

# Then non-manual related pages: This adds different warning about wiki having been archived after "<!-- START content -->" and changes "<!-- START content -->"
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\<\!--\ START\ content\ --\>/\<\!--_START_content_modified_--\>\<div style\=\"padding\:15px\;color\:red\;font-size\:22px\;\"\>\<center\>Warning\: This wiki has been archived. The Blender Developer Wiki has moved to \<a href\=\"https\:\/\/wiki\.blender\.org\"\>a new location\<\/a\>\.\<\/center\>\<\/div\>/' {} \;

# Change URLs in css files, replace blender_wiki_archive_css_path with correct css root path
find . -type f -regex ".*\.css$" -exec perl -i -pe 's/en\.blender\.org/blender_wiki_archive_css_path/smg' {} \;

# Convert all remaining mentions of en.blender.org to old_blender_wiki_url
find . -type f -exec perl -i -pe 's/https\:\/\/en\.blender\.org/old_blender_wiki_url/smg' {} \;

# 270-cleanup for css files: Rename x.css?270.css to x.css
mv skins/blender/main.css?270.css skins/blender/main.css
mv skins/blender/niftyCorners.css?270.css skins/blender/niftyCorners.css
mv skins/naiad/main.css?270.css skins/naiad/main.css
mv skins/common/commonPrint.css?270.css skins/common/commonPrint.css
mv skins/common/shared.css?270.css skins/common/shared.css
mv skins/monobook/main.css?270.css skins/monobook/main.css
# Remove "?270.css" from "x.css?270.css" in html files
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\%3F270\.css//smg' {} \;

# 270-cleanup for js files: Rename x.js?270 to x.js
mv skins/common/metadata.js?270 skins/common/metadata.js
mv skins/common/wikibits.js?270 skins/common/wikibits.js
mv skins/common/history.js?270 skins/common/history.js
mv skins/common/edit.js?270 skins/common/edit.js
mv skins/common/ajax.js?270 skins/common/ajax.js
mv skins/common/rightclickedit.js?270 skins/common/rightclickedit.js
mv skins/common/mwsuggest.js?270 skins/common/mwsuggest.js
# Remove "?270" from "x.js?270"
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\.js\%3F270/\.js/smg' {} \;

# Copy index.php.html to index.html
cp index.php.html index.html
# Populate also index.php/index.html to decrease load for requests to /index.php/
cp index.php/Doc.html index.php/index.html

# Create index.html files from "x.1.html" files, and add "../" to all relative href and src paths, and finally also to beginning of css style @import clause
for f in `find . -type f | grep "\.1\.html$"`; do
  dest=`echo $f | perl -pe 's/(.*)\.1\.html/$1\/index.html/'`
  echo "Processing $f --> $dest"
  cat "$f" | perl -pe 's/href\=\"((?!(\/|http\:|https\:)).*?)\"/href\=\"..\/$1\"/g' | perl -pe 's/src\=\"((?!(\/|http\:|https\:)).*?)\"/src\=\"..\/$1\"/g' | perl -pe 's/CDATA\[\*\/\ \@import\ \"/CDATA\[\*\/\ \@import\ \"..\//' > "$dest"
done

# Create x/index.html from ../x.html files for those directories that do not already have index.html
find . -type d > dirlist.all.txt
cat dirlist.all.txt | perl -pe 's/^\.\/uploads.*\n//' | perl -pe 's/^\.\/skins.*\n//' > dirlist.txt
for f in `cat dirlist.txt`; do
  if [ ! -f "$f/index.html" ]; then
    bname=`basename $f`;
    if [ -f "$f/../$bname.html" ]; then
      echo "Processing $f/../$bname.html --> $f/index.html";
      cat "$f/../$bname.html" | perl -pe 's/href\=\"((?!(\/|http\:|https\:)).*?)\"/href\=\"..\/$1\"/g' | perl -pe 's/src\=\"((?!(\/|http\:|https\:)).*?)\"/src\=\"..\/$1\"/g' | perl -pe 's/CDATA\[\*\/\ \@import\ \"/CDATA\[\*\/\ \@import\ \"..\//'> "$f/index.html";
    else
      echo "NO MAP FILE FOR $f";
    fi;
  else
    echo "EXISTS ALREADY, skipping: $f/index.html";
  fi;
done;

# Create web list of all html pages (excluding "&action=" pages)
find . -type f -regex ".*\.html$" > pagelist.all.txt
egrep -v "\&action\=" < pagelist.all.txt > pagelist.txt
cat pagelist.txt | perl -pe 's/\%/\%25/g' | perl -pe 's/\?/\%3F/g' | perl -pe 's/\:/\%3A/g' | perl -pe 'print $. . " "; s/^(.*)$/\<a href\=\"$1\"\>$1\<\/a\>\<br\>/g' > pagelist.html

# Create web list of all directory pages (except /uploads and /skins)
find . -type d > dirlist.all.txt
cat dirlist.all.txt | perl -pe 's/^\.\/uploads.*\n//' | perl -pe 's/^\.\/skins.*\n//' > dirlist.txt
rm -f dirlist.html
for f in `cat dirlist.txt`; do
  if [ ! -f "$f/index.html" ]; then
    echo $f | perl -pe 's/\%/\%25/g' | perl -pe 's/\?/\%3F/g' | perl -pe 's/\:/\%3A/g' | perl -pe 's/^(.*)$/\<a href\=\"$1\"\>$1\<\/a\>\ -\ no\ index\.html\<br\>/g' >> dirlist.html;
  else
    echo $f | perl -pe 's/\%/\%25/g' | perl -pe 's/\?/\%3F/g' | perl -pe 's/\:/\%3A/g' | perl -pe 's/^(.*)$/\<a href\=\"$1\"\>$1\<\/a\>\<br\>/g' >> dirlist.html;
  fi
done

And you wonder why I never wanted to open up this can of worms ;)

I wonder though, since you seem to be so interested in converting the site, why you don't just import the wiki dump files that I put online into your own MediaWiki installation, and then code up a php file to invoke the MediaWiki parsing routines, or something similar, so that you end up with simple pages. I mean, you are putting all of this time into parsing out the CSS anyway.

Anyway, just a thought, in case you weren't aware the page dump was available (technically, the SQL can be made available after some sanitisation, if needed).
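For reference, the stock way to load such a dump is MediaWiki's importDump.php maintenance script; a sketch, with illustrative paths that are not from this thread:

# Import the XML page dump into a local MediaWiki install
cd /var/www/mediawiki
php maintenance/importDump.php --conf LocalSettings.php /path/to/current.dump.xml
# Rebuild derived data afterwards
php maintenance/rebuildrecentchanges.php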

I tried the MediaWiki page dump; after working through a bunch of failures with no end in sight, I gave up. Since we're so close to getting it working this way, there's no point anyway.

I am very very happy that someone was willing to open the can of worms, since it needed to be done. Thanks @Tuomo Keskitalo (tkeskita) !

My issue with the .1 is exactly that it doesn't allow me to simply replace the domain part, as the .1 may or may not exist. If the .html always exists, that's OK though.
Example usage, "See if it exists in the Wiki Archive", in https://wiki.blender.org/wiki/A_Very_Special_Page

I'll let you know if I find more problems! (in a quick skim I only see the missing icons in the navtree)

I tried the MediaWiki page dump; after working through a bunch of failures with no end in sight, I gave up. Since we're so close to getting it working this way, there's no point anyway.

Oh for sure. And there will always be the DB and page dump archive for some future civilisation that wants to reconstruct the wiki :D I keep such archives around in the file system as long as I can help it. I still have the old bug tracker around here somewhere *blows dust off some boxes*.

Anyway, poke if there is anything needed server side! Good job so far! o/

Thanks everyone, it is nice and motivating to know that your work is being appreciated! :)

I noticed that after the .1.html conversion there were still many directories without index.html. I generated index.html for those whose basename.html file could be found in the parent directory; the result is now updated at my temporary web location.

I also generated lists, which may help searching from page names:

  • directories (8,073) in the static wiki: text file and html page. Note: the list excludes /uploads and /skins. The HTML page list additionally indicates if a directory contains no index.html.
  • html files (61,100) in the static wiki: text file and html page. Note: these don't include edit or history pages. With every html file included, the total is 167,290 (9.9 GB of disk space, says du).

Some page names contain special characters which are not converted correctly (resulting in broken links).

I'll update the script in the previous comment according to the current process.

Icons should be under /extensions/BlenderTreeAndMenu/img/

wget did not populate that directory, and when I try to access https://en.blender.org/extensions/BlenderTreeAndMenu/img/ I get 403 Forbidden..? @Dan McGrath (dmcgrath) ?

@Inês Almeida (brita_):
I also got this weird thing:
Last link on http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html
has /2006 and it shouldn't (also on the navtree)
http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/2016/Students.html

I'm sorry I don't follow, can you please clarify?

Icons should be under /extensions/BlenderTreeAndMenu/img/

wget did not populate that directory, and when I try to access https://en.blender.org/extensions/BlenderTreeAndMenu/img/ I get 403 Forbidden..? @Dan McGrath (dmcgrath) ?

I turned indexing on for that directory, for you. Try again.

In general, if something is not linked from anywhere in the site, you probably don't need to worry too much about it. Besides, that's the directory where we keep all of the secret blender documents! ;)

Thanks, I got the icons and put them in both /extensions/TreeAndMenu/img and /extensions/BlenderTreeAndMenu/img, but the navtree on the left still does not show them. This is now going beyond my web skills. Maybe it is caused by my removal of these lines (below) from the source. If somebody knows about this, please suggest replacement strings.


<script type="text/javascript" src="/index.php?title=-&amp;action=raw&amp;gen=js&amp;useskin=naiad"><!-- site js --></script>
<link rel="stylesheet" type="text/css" href="/index.php?title=MediaWiki:Common.css&amp;action=raw&amp;ctype=text/css&amp;smaxage=18000"/>
<link rel="stylesheet" type="text/css" href="/index.php?title=MediaWiki:Naiad.css&amp;action=raw&amp;ctype=text/css&amp;smaxage=18000"/>

@Inês Almeida (brita_):
I also got this weird thing:
Last link on http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html
has /2006 and it shouldn't (also on the navtree)
http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/2016/Students.html

I'm sorry I don't follow, can you please clarify?

If you go to http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html and check the very last link, 'Getting Started', it doesn't match the target of the same link on https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode
I can see that the target page https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/Getting_Started was moved. Maybe that is what is causing the issue.

Icons
I don't see why you'd remove the naiad css or javascript. I'd add those 3 lines back and everything should be fine :)

Hi @Inês Almeida (brita_),

If you go to http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GSoC.1.html and check the very last link, 'Getting Started', it doesn't match the target of the same link on https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode
I can see that the target page https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/Getting_Started was moved. Maybe that is what is causing the issue.

Thanks for the clarification. The reason wget has changed the target is that the page https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/2016/Students for some reason redirects to https://en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/Getting_Started . wget has logically merged these two targets, and therefore http://tkeskita.kapsi.fi/temp/en.blender.org/index.php/Dev:Ref/GoogleSummerOfCode/Getting_Started does not exist. Unfortunately I can't think of any easy way to find these. @Brecht Van Lommel (brecht) or @Dan McGrath (dmcgrath): Is it possible to get only the redirections out of the MediaWiki dumps? Otherwise this is impossible to fix, except manually whenever a missing target is found.
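For what it's worth, the standard MediaWiki XML dump does mark redirects; a sketch in the style of the other shell snippets here, assuming the usual <redirect title="..."/> element that follows each page's <title>:

# List redirect pages and their targets from the XML dump
grep -B 4 '<redirect title=' current.dump.xml | grep -E '<(title|redirect)'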

I don't see why you'd remove the naiad css or javascript. I'd add those 3 lines back and everything should be fine :)

Because the static wiki pages have no MediaWiki PHP. wget did not convert those three URLs containing index.php strings into anything. I need to have something as a text file that I can point to in those src and href strings.

The icon problem does not seem to be related to those three removed lines I mention in T48096#571067:

The "&amp;" in those three URLs seem to break the links, and when replaced with "&" I now got what they actually return: The first one is the only one which actually returns something:

var skin = 'naiad';
var stylepath = '/skins';

I added those two lines to the end of /skins/common/wikibits.js, which should be loaded just before that script line.

Unfortunately it does not fix the missing icon issue. Anyone with JavaScript skills who could help? Or we can just make do without the navtree working perfectly in the static wiki archive.

Hi Tuomo. First off, great work with the wiki baking. Thank you for your help with this!

Regarding the missing icons, currently they are referred to as :

<img id="jtamsiteTree1" src="/extensions/BlenderTreeAndMenu/img/plus.gif" alt="">

This means that they are expected to be at the root of http://tkeskita.kapsi.fi, while currently they are at http://tkeskita.kapsi.fi/temp/en.blender.org/.

Since the plan is to move the baked wiki to https://archive.blender.org/wiki, this problem will persist. I would recommend performing a search and replace on /extensions/BlenderTreeAndMenu/img/ to turn it into /wiki/extensions/BlenderTreeAndMenu/img/. That should solve the issue when the site is moved into its proper place.
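In the style of the clean-up scripts earlier in this thread, that search and replace might be (a sketch):

# Prefix the icon path with /wiki so it resolves under archive.blender.org/wiki
find . -type f -regex ".*\.html$" -exec perl -i -pe 's/\/extensions\/BlenderTreeAndMenu\/img\//\/wiki\/extensions\/BlenderTreeAndMenu\/img\//g' {} \;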

@Francesco Siddi (fsiddi) : Thanks, I'll do that! I briefly tested that it indeed seems to fix the icon issue.

So the biggest issue still standing is the missing redirected directories. @Dan McGrath (dmcgrath): Where is this MediaWiki XML dump? I could maybe take a look at it. Thanks!

@Francesco Siddi (fsiddi) : Thanks, I'll do that! I briefly tested that it indeed seems to fix the icon issue.
So the biggest issue still standing is the missing redirected directories. @Dan McGrath (dmcgrath): Where is this MediaWiki XML dump? I could maybe take a look at it. Thanks!

A few comments above, in this 50-page-long thread :)

https://developer.blender.org/T48096#522847

Thanks, found 'em! There was a file which seems to list each page and the links from it to other pages, which I hope I can use. Currently it looks mildly painful though; some relative path acrobatics is needed.

Uh oh, there are some lonesome island pages in the old wiki which are never linked to, so recursive wget did not find them. I need to do a new retrieval with this page list. Maybe no need for acrobatics... but this will take time.

Does the wiki log whenever someone tries to visit a missing page? That would be a good way to identify broken links.

@Benjamin Humpherys (brhumphe) I don't know about MediaWiki, but at least the web server logs errors. Good idea to use those for the final check, when I get there!

My current idea is to make a new retrieval with wget, but instead of trying recursive retrieval from the root, I'll retrieve each page listed in current.dump.xml separately, along with its talk, history and edit pages. Then do similar URL parsing as before, see how many broken links I end up with, and try to fix those. Hopefully I end up with a complete replica of the old wiki. Continuing with tests; gonna take a while.

Does the wiki log whenever someone tries to visit a missing page? That would be a good way to identify broken links.

It does indeed. I have attached a log of the 404's from December 2018 for you. Note that they also contain junk; filter as needed.
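One way to boil such a log down to the most-hit missing pages (a sketch, assuming the common Apache combined log format):

# Count 404s per requested path, most frequent first
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -50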

Hi,

  1. The old wiki contains deleted pages, such as this, which redirects to here, which is deleted. wget does not want to download such pages. Is it OK to leave these pages out, or should I try to find some way to get them in as well? My script is collecting a list of these failed pages; let's see how long it gets.
  2. The old wiki contains pages with non-ASCII characters in their page names, which at least my ext4 filesystem does not handle correctly as directory names, such as this. I'm going to convert all special characters in page names into %XX hexadecimal representation, using the wget option --restrict-file-names=ascii. That way the pages will at least be downloaded, but I think they will not be found if you just try to access the static wiki like this; instead they would be accessible like this, which makes them impossible to read. This affects only the page name; the page contents are still intact. Is this compromise acceptable? This affects only non-English pages.
  3. Just realized that the static wiki URL for "Page" would be "https://archive.blender.org/wiki/index.php/Page", so the base path is /wiki/index.php and not just /wiki. Is that OK?
  4. @Dan McGrath (dmcgrath): My script has stopped running a few times; wget says "Unable to establish SSL connection." When I try to continue, the page is loaded without errors. Any ideas what is causing this? I'm not choking the server, am I?
  4. @Dan McGrath (dmcgrath): My script has stopped running a few times; wget says "Unable to establish SSL connection." When I try to continue, the page is loaded without errors. Any ideas what is causing this? I'm not choking the server, am I?

There are PF (firewall) related limits that kick in if you make too many connections. The block auto-lifts after a few minutes, so no worries. I don't think we have anything configured in the wiki's Apache itself at the moment. If www.b.o is also not working for you, but git.blender.org is, that's a good indicator that you were blocked by the firewall. Try slowing down your crawl; otherwise, don't worry about it.
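For the record, wget has knobs for slowing a crawl down; a sketch of the relevant flags (values are illustrative, to be combined with the recursive options used earlier):

# Pause ~2 s between requests, randomize the pause, and cap bandwidth
wget --wait=2 --random-wait --limit-rate=200k --recursive "https://en.blender.org/index.php"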

I was browsing the scraped copy last week and bumped into a search box somewhere; we'd probably have to get rid of those, otherwise users will use them and file tickets that search is not working.

I was browsing the scraped copy last week and bumped into a search box somewhere; we'd probably have to get rid of those, otherwise users will use them and file tickets that search is not working.

I guess you mean the quick search in the lower left corner? Good idea! I'll also remove the "Page" menu from the top right and the "Wiki" menu from the bottom right.

  1. The old wiki contains deleted pages, such as this, which redirects to here, which is deleted. wget does not want to download such pages. Is it OK to leave these pages out, or should I try to find some way to get them in as well? My script is collecting a list of these failed pages; let's see how long it gets.

Here is a list of the failed pages accumulated so far. Retrieval per page is slow, so this will take maybe two more weeks until the next version of the static pages is ready for inspection.

I was browsing the scraped copy last week and bumped into a search box somewhere; we'd probably have to get rid of those, otherwise users will use them and file tickets that search is not working.

I guess you mean the quick search in the lower left corner? Good idea! I'll also remove the "Page" menu from the top right and the "Wiki" menu from the bottom right.

Any History page has a bunch of broken interaction as well.

I've been wondering about the history pages. After the pages are static it will not be possible to do comparisons between versions, so you will only see who edited the page last and when. Is it still interesting to save the history pages? @Inês Almeida (brita_)?

If yes, I can strip away the buttons etc. from the history pages too, and leave only the list. Is it enough to save only the first history page, which contains at most the 50 latest edits? Currently my non-recursive script retrieves only the first history page.

I wouldn't worry about saving the full edit history. I'm not sure how you could preserve that without a significant increase in the total size and file count of the wiki archive. It might be helpful to see who the last 50 or 100 people to edit the page were (and maybe the first 50 edits), but if someone needs more info than that, they can download the wiki data dump.

Agree, I wouldn't worry too much about fixing every broken link. It's nice if it doesn't take too much time, but for an archived website it's ok if only the basics are functional.

OK, a new version of the archived wiki is temporarily available for review at http://tkeskita.kapsi.fi/temp/en.blender.org/

  • Page list available in text and html formats. Pages now also include deleted pages and redirections to deleted pages. The page count is 53209 (up from 53173) out of 53221, and all page directories listed in the XML dump have index.html, so I think it is now pretty much complete.
  • Includes index.html, edit.html and history.html pages for each wiki page. History page lists date and username of 50 latest edits.
  • Page names may contain underscores instead of spaces, and they should both be found. This was done by creating directory symlinks like "Release_Notes" that point to "Release Notes" (see the sketch after this list). Is this OK?
  • All links to en.blender.org in pages have been converted into relative paths, so this can be freely placed in any directory and links should work. Since the whole script is now 350 lines, the current bash script is attached here (updated version 2019-01-04).
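A sketch of the underscore-symlink idea mentioned above (not the exact code from statify.sh): for each directory whose name contains a space, add a sibling symlink with underscores so that both URL forms resolve.

# Create "Release_Notes" -> "Release Notes" style symlinks
find . -depth -type d -name "* *" | while read -r d; do
  ln -s "$(basename "$d")" "$(dirname "$d")/$(basename "$d" | tr ' ' '_')"
done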

I fixed a ton of my errors along the way, so I'm pretty sure somewhere there is still something wrong, but I'm now pretty happy with the result. Can anyone still find something wrong?

Update 2019-01-01: Retrieved deleted pages again, fixed statify.sh and whole packet accordingly.

Download link for the whole packet: http://tkeskita.kapsi.fi/temp/en.blender.org.tar.bz2

Update 2019-01-03: Updated few missing pages, page count from 53173 to 53209.
Update 2019-01-04: Updated pagelist accordingly.

I would keep all the deleted pages and their redirects. This would avoid broken links.
I don't think that the ASCII character issue is a good compromise, as this will result in broken links? Or maybe I misunderstood something.

Imo, the broken functionality in the history and wiki menus is OK. It's an archive. I would definitely keep the history pages with the list of edits, which includes date, author and description. I agree that the last 50 edits should be enough.

Page names may contain underscores instead of spaces, and they should both be found.

If I type either "Release_Notes" or "Release Notes" in the URL both will work? This sounds fine.

Do you know what is missing so that 53173 is not exactly 53209?

Hi!

I would keep all the deleted pages and their redirects. This would avoid broken links.

Deleted pages are there, but without redirects. This means that, for deleted pages, you get the same view as you would now get in the MediaWiki, but the page name remains the original. If you then go to the edit page, you see the redirection if one is needed. OK?

I don't think that the ASCII character issue is a good compromise, as this will result in broken links? Or maybe I misunderstood something.

Yes, the character issue does result in links from outside to the archived wiki being broken, but I don't see how this could be made better with static pages. Any ideas? At least the pages are there, although their names are not readable.

Imo, the broken functionality in the history and wiki menus is OK. It's an archive. I would definitely keep the history pages with the list of edits, which includes date, author and description. I agree that the last 50 edits should be enough.

OK so I think that's how it is now. Please check though.

Do you know what is missing so that 53173 is not exactly 53209?

No. It's like trying to find a needle in a haystack. I've been able to get the count up to 53209/53218. It is most likely related to some special character(s), such that two page names evaluate to the same path.

There seems to be some weirdness going on where wiki users created subpages without a base page.

Compare these 2:

Despite the unusual look of the second page, I think having a "fallback base page" in this circumstance is nice, as it makes it easier to find subpages. I suspect that in some cases making subpages harder to find may have been intentional on the part of the page authors, but I'm not sure it makes sense to leave those pages "hidden" anymore.

On a different note, would it be worth archiving some of the "wiki special pages" from the old wiki? Two of the special pages I've often used for cross-referencing appear to have been skipped:

The "what links here" pages:
https://en.blender.org/index.php/Special:WhatLinksHere/Dev:Doc/Blender_Source/Files_Structure

And recent user contributions:
https://en.blender.org/index.php?title=Special:Contributions&limit=100&target=NBurn

These pages can be helpful when trying to find more info related to an article, but adding them back could significantly increase the size of the wiki archive.

Lastly, I'm not sure if the "warning" part is needed in the disclaimer at the top. Maybe:

Note: This is an archived version of the Blender Developer Wiki. The current wiki is located here.

And have a new (non-Google) link attached to "archived version", instead of just the one to the newer wiki at the end. I was thinking possibly of a brief FAQ-type page, maybe something like:

This is an older version of the wiki that was archived to preserve information about previous versions of Blender. This older wiki served as a general user manual for Blender, as an add-on repository, as a developer guide for working with Blender's source code, and as a guide for developing add-ons. With later versions of Blender, much of this info was moved out into the Blender manual and docs sites. With the release of 2.80, a decision was made to limit the scope of the wiki and drop the non-developer-related resources, as this info was being maintained elsewhere. (etc)

Thoughts?

Yes, the only special pages currently included are the edit and history pages. Technically, all those specials (non-existing parent page, what links here, recent user contributions for User:X pages) can be added, if someone else seconds this motion. "What links here" and "recent user contributions" could be retrieved to e.g. links.html and contributions.html located under the page folders. I don't think they would add too much to the archive size; it would just take me time to do it.

But first I need to get some sort of approval for the current retrieve+processing result, so I'm gonna wait for a while.

Hi, I'm now contemplating another full retrieval, so this would be a perfect time to make decisions about the suggestions from @nBurn (nBurn) in T48096#594983. Does anyone else also want those changes?

I've been digging into the special characters issue, and it seems that the wget option --restrict-file-names=nocontrol might preserve special characters correctly, so there would be no need for transcribing page names to hexadecimals (%XX). However, this means that I would need to do a third full retrieval from scratch. Last time it took 10 days(!), which is slooow. As a workaround, I'm thinking of setting up a local Apache + MediaWiki installation for myself, where I try to restore the old MediaWiki dump, and retrieve from there. @Dan McGrath (dmcgrath) Is MediaWiki 1.16.2 the correct version to use? Maybe add the MediaWiki version to README.txt in the MediaWiki dump?

It looks like the old MediaWiki doesn't want to work nicely with PHP 7. Also, there's no guarantee that the result would look anything like the current one, so I guess it's gonna be a new 10-day crawl.

Update 2019-01-15: New crawl started including what-links-here.

The third retrieval is proceeding (slowly), and in ~4 days it should be finished. At that point (before running the html processing for all pages) I need feedback about the suggestion in T48096#594983 by @nBurn (nBurn) to change the warning text (and add a link to an explanatory page). @Inês Almeida (brita_), @Francesco Siddi (fsiddi), @Aaron Carlisle (Blendify), anyone: do you agree? What URL to add for the explanatory page? Some page in the new wiki? Or better to leave the warning as it is now?

I already decided to include the other suggestions by @nBurn (nBurn).

I would recommend having a clear message stating that the wiki contains legacy information and to visit wiki.blender.org for the active wiki documentation. Something along the lines of what @nBurn (nBurn) suggested.

This is an archived version of the Blender Developer Wiki. The current and active wiki is available on wiki.blender.org.

Thanks for your work!

OK, I'm glad to inform you that the static old wiki bake is finished. The current version is temporarily uploaded to http://tkeskita.kapsi.fi/temp/en.blender.org and the download package is available at http://tkeskita.kapsi.fi/temp

  • Retrieval and processing is documented in statify.sh bash script.

  • All links to en.blender.org have been converted to relative paths, so the archive should be fully stand-alone: it can be placed anywhere, and the root folder (en.blender.org) can be renamed.
  • Non-existing parent pages were retrieved as well as normal wiki pages
  • Adapted the warning text at the top of pages according to the suggestion from @Francesco Siddi (fsiddi), but retained the link to the manual for manual-related pages, as @Inês Almeida (brita_) wished previously.
  • UTF-8 special characters in page names are now preserved in directory names.
  • Page list is located at the root
  • Each wiki page includes index.html, edit.html, history.html, and links.html (What Links Here). Links to those are located on top right of each page.
  • User pages (User:username) additionally include contributions.html

Please let me know if you find something wrong. Thanks!

It looks great, I found no problems. Thanks a lot for doing this.

I've copied the whole thing to https://archive.blender.org/wiki/.

Now it's just a matter of setting up a redirect from en.blender.org to archive.blender.org/wiki and taking the old wiki offline (I guess we keep a backup).

@Dan McGrath (dmcgrath), can you do that?

Now it's just a matter of setting up a redirect from en.blender.org to archive.blender.org/wiki and taking the old wiki offline (I guess we keep a backup).
@Dan McGrath (dmcgrath), can you do that?

It should be done. I redirect everything except /robots.txt to archive.b.o/wiki.

I will start the archival procedures for the MySQL db and on-disk files and place them in the wiki's /root folder, clean up old on-disk cache files, etc.
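For reference, such a redirect might look like this in the Apache vhost (a hypothetical sketch; the actual server configuration is not shown in this thread):

# Keep /robots.txt local, send everything else to the archive
RewriteEngine On
RewriteRule ^/robots\.txt$ - [L]
RewriteRule ^/(.*)$ https://archive.blender.org/wiki/$1 [R=301,L]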

Thanks @Tuomo Keskitalo (tkeskita)! And thanks @Brecht Van Lommel (brecht).
From a quick look the archived wiki seems to be running well in all its legacy glory.
Looking forward to the last steps that Dan will perform.

Thanks @Tuomo Keskitalo (tkeskita)! And thanks @Brecht Van Lommel (brecht).
From a quick look the archived wiki seems to be running well in all its legacy glory.
Looking forward to the last steps that Dan will perform.

@Francesco Siddi (fsiddi) It should already be redirecting en.blender.org to archive.blender.org/wiki/ since my last post.

Great! I guess that the 2 remaining points (Utilities and Delete the old read-only PHP wiki) can be wrapped up as well?
After that, I propose to close this task and move any other todo to separate topics, tagged as Documentation.

Great! I guess that the 2 remaining points (Utilities and Delete the old read-only PHP wiki) can be wrapped up as well?
After that, I propose to close this task and move any other todo to separate topics, tagged as Documentation.

The old wiki should be inaccessible already, but I will have to purge the files from disk before it's technically "deleted". The directory is already archived, though. I won't erase the SQL db until I can verify the dump is fine, but an initial glance suggests it's all good.

How does the archive look with the redirects etc?

Redirects are working well.

Other todo's are here:

  • T61097: port code architecture docs.
  • T54097: move addon docs to the manual.

Good to hear.

Well, the cleanup on the server side should be final now. I dumped and compared/verified both the on-disk files and the database dump.

I have left a copy of the tarball in /root/en.blender.org.tar.bz2. Feel free to grab a copy (~5.5G). This contains the vhost docroot, the old txt only dumps, and the database dump.

I forgot to mention that the old vhost docroot files have been entirely deleted, as well as the original database. The old docroot is a mere empty skeleton, needed to satisfy the Apache docroot path, as well as to provide a means for Certbot (Let's Encrypt) to acquire its certificates. Once the snapshots rotate enough, the space from the old files will be reclaimed.

Brecht Van Lommel (brecht) closed this task as Resolved. Feb 1 2019, 3:40 PM
Brecht Van Lommel (brecht) claimed this task.

Let's consider this resolved then.

I have left a copy of the tarball in /root/en.blender.org.tar.bz2. Feel free to grab a copy (~5.5G). This contains the vhost docroot, the old txt only dumps, and the database dump.

I wouldn't know what to do with that file, I just imagined you had someplace to archive these kinds of things.

Let's consider this resolved then.

I have left a copy of the tarball in /root/en.blender.org.tar.bz2. Feel free to grab a copy (~5.5G). This contains the vhost docroot, the old txt only dumps, and the database dump.

I wouldn't know what to do with that file, I just imagined you had someplace to archive these kinds of things.

Generally I just keep old stuff around where it was originally left, or I move (tarball) it to the root dir. It's tough since our rack never had a central storage location, unless you consider download.blender.org. I have started throwing secondary copies of stuff into CephFS. Who knows, if we keep it around, it might become the new "attic" (where I put such things).

Glad that you like the result! Cheers for ye olde wiki! :-)

One small thing I noticed: the 2.79 release notes page has an outdated Note box on top that you may want to remove manually.

@Tuomo, note box removed.