htmlArea

A directory of browser-based WYSIWYG editors

php wordClean workaround


The htmlArea 2 & 3 editors have been discontinued.

We've made these forums available as a read-only reference and knowledge-base for people using or developing editors based on htmlArea 2 or 3.

Anyone who is interested in taking over version 2 or 3 is free to do so. All we ask is that you choose a new name that doesn't have "htmlarea" in it, to avoid confusion with this site. We'll even give you a link in the directory to make it easier for people to find you. If you are developing or hosting an htmlArea-based editor under a new name, please submit it to our directory.

 


boldtbanan
New User

Oct 8, 2004, 6:36 PM

Post #1 of 10 (24557 views)
php wordClean workaround

I'm working on a website where users are likely to paste large passages from Word, which are then saved to a file (or database entry) on the server. wordClean is pretty helpful for that, but it doesn't work in Mozilla. To get around this, I coded up a PHP version of the function, which cleans up the passage before it is written to disk. Although this increases server load, it guarantees clean code and reduces how much is written to disk (which might pay for the processing time if the file is read frequently). Anyway, I just wanted to post it in case anyone else needs it.


function wordClean($D) {
$search = array(
// make one line
"/\r\n/",
"/\n/",
"/\r/",
"/\&nbsp\;/",
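// keep tags, strip attributes
// [^\s>]* stops at whitespace or at the end of the tag
"/ class=[^\s>]*/i",
// "/<p [^>]*TEXT-ALIGN: justify[^>]*>/i",
// [^\"]* stops at the closing quote, leaving later attributes (href, etc.) alone
"/ style=\"[^\"]*\"/i",
"/ align=[^\s>]*/i",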
//clean up tags
"/<b [^>]*>/i",
"/<i [^>]*>/i",
"/<li [^>]*>/i",
"/<ul [^>]*>/i",
// replace outdated tags
"/<b>/i",
"/<\/b>/i",
// mozilla doesn't like <em> tags
"/<em>/i",
"/<\/em>/i",
// kill unwanted tags
"/<\?xml:[^>]*>/", // Word xml
"/<\/?st1:[^>]*>/", // Word SmartTags
"/<\/?[a-z]\:[^>]*>/", // All other funny Word non-HTML stuff
"/<\/?font[^>]*>/i", // Disable if you want to keep font formatting
"/<\/?span[^>]*>/i",
"/<\/?div[^>]*>/i",
"/<\/?pre[^>]*>/i",
"/<\/?h[1-6][^>]*>/i");
//remove empty tags (watch array syntax if you turn these on)
// "/<strong><\/strong>/i",
// "/<i><\/i>/i",
// "/<P[^>]*><\/P>/i");

$replace = array(
// make one line
" "," "," "," ",
// keep tags, strip attributes
"",
// '<p align="justify">',
"","",
//clean up tags
"<b>",
"<i>",
"<li>",
"<ul>",
// replace outdated tags
"<strong>","</strong>",
// mozilla doesn't like <em> tags
"<i>","</i>",
// kill unwanted tags (one replacement per pattern, in the same order as $search)
"", // Word xml
"", // Word SmartTags
"", // All other funny Word non-HTML stuff
"", // font
" ", // span
" ", // div
" ", // pre
" "); // h1-h6
//remove empty tags (watch array syntax if you turn these on)
// "","","");

$D = preg_replace($search, $replace, $D);

// nuke double tags
$oldlen = strlen($D) + 1;
while($oldlen > strlen($D)) {
$oldlen = strlen($D);
// join us now and free the tags, we'll be free hackers, we'll be free... ;-)
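// patterns are single-quoted so \1 reaches PCRE as a backreference
// (in a double-quoted PHP string "\1" is an octal escape, i.e. chr(1))
$D = preg_replace(array('/<([a-z][a-z]*)> *<\/\1>/i',
'/<([a-z][a-z]*)> *<([a-z][^>]*)> *<\/\1>/i'),
array(' ','<$2>'),
$D);
}
// collapse doubled opening tags and doubled closing tags into one
$D = preg_replace(array('/<([a-z][a-z]*)><\1>/i',
'/<\/([a-z][a-z]*)><\/\1>/i'),
array('<$1>','</$1>'),
$D);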

// nuke double spaces (This will replace any combination of two whitespace characters
// in a row. Use the second line if you only want to eliminate two or more spaces in a row)
$D = preg_replace("/[\s]{2,}/",' ',$D);
// $D = preg_replace("/[ ]{2,}/",' ',$D); // spaces only

return $D;
}
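
The double-tag patterns in the function rely on the backreference \1, and whether that backreference works depends on how the pattern string is quoted in PHP. The sketch below is only an illustration (the sample markup and variable names are made up for this example, not taken from the post): it runs the same pattern once in double quotes and once in single quotes.

```php
<?php
// Sample input: an empty <b> pair followed by a non-empty <i> pair.
$html = '<b> </b><i>kept</i>';

// Double quotes: PHP converts "\1" to the single byte chr(1) before PCRE
// sees the pattern, so it looks for a literal 0x01 and never matches.
$broken = preg_replace("/<([a-z]+)> *<\/\1>/i", '', $html);

// Single quotes: \1 reaches PCRE intact and refers to the captured tag name.
$fixed = preg_replace('/<([a-z]+)> *<\/\1>/i', '', $html);

echo $broken, "\n"; // input comes back unchanged
echo $fixed, "\n";  // the empty <b> pair is gone
```

Writing the backreference as \\1 inside double quotes should behave the same as the single-quoted form.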


kyberfabrikken
User

Oct 9, 2004, 8:04 AM

Post #2 of 10 (24540 views)
Re: [boldtbanan] php wordClean workaround

Actually, I've written a PHP class that takes care of that. I'm not sure whether it's better or worse than your regexes, but you may want to give it a try:
http://www.phpclasses.org/browse/package/1020.html


billbody
Novice

Oct 12, 2004, 12:11 AM

Post #3 of 10 (24450 views)
Re: [kyberfabrikken] php wordClean workaround

So... how does a newbie like me include this class so it works with the editor?

T.I.A.


kyberfabrikken
User

Oct 12, 2004, 5:50 AM

Post #4 of 10 (24432 views)
Re: [billbody] php wordClean workaround

It's used on the server side.

Suppose your input from htmlArea arrives in the variable $_GET['content']; you would do something like this:


Code
  

<?php
require_once('htmlcleaner.php');
$content_cleaned = htmlcleaner::cleanup($_GET['content']);
?>

The clean content will now be in your variable $content_cleaned.


billbody
Novice

Oct 12, 2004, 1:52 PM

Post #5 of 10 (24420 views)
Re: [kyberfabrikken] php wordClean workaround

Sorry, I don't understand...

if I have this : <div><textarea rows="<?php echo $rows; ?>" cols="40" name="content" tabindex="5" id="content"><?php echo $content ?></textarea></div>
</fieldset>

Where do I have to include the class?



T.I.A.


kyberfabrikken
User

Oct 12, 2004, 4:09 PM

Post #6 of 10 (24415 views)
Re: [billbody] php wordClean workaround

You will have to show the code where you insert the data into your database or save it to a file, whichever you do.


billbody
Novice

Oct 12, 2004, 10:59 PM

Post #7 of 10 (24407 views)
Re: [kyberfabrikken] php wordClean workaround

Mmm, not sure which code that is...

I'm using WordPress (a blog system).


kyberfabrikken
User

Oct 12, 2004, 11:28 PM

Post #8 of 10 (24404 views)
Re: [billbody] php wordClean workaround

At the top of the script that is the target of your form (look at the action= attribute of the form tag), put this:

<?php
require_once('htmlcleaner.php');
$_POST['content'] = htmlcleaner::cleanup($_POST['content']);
?>

It's a wild guess, but it might work.
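
A slightly more defensive variant of the same idea, as a sketch only: clean the field only when it was actually submitted. The helper clean_field() and the stand-in demo_cleaner() are hypothetical names invented for this example. Neither is part of htmlcleaner or WordPress, and the stand-in only strips span tags so the snippet runs without htmlcleaner.php.

```php
<?php
// Hypothetical helper: run $cleaner over one field of $input if it is present.
// $cleaner is any callable that takes a string and returns a string.
function clean_field(array $input, $field, $cleaner) {
    if (isset($input[$field]) && is_string($input[$field])) {
        $input[$field] = call_user_func($cleaner, $input[$field]);
    }
    return $input;
}

// Stand-in cleaner for this example only: removes <span> tags, keeps their text.
function demo_cleaner($html) {
    return preg_replace('/<\/?span[^>]*>/i', '', $html);
}

// Simulated form data; in a real form target this would be $_POST.
$post = array('content' => '<p><span class="x">Hello</span></p>', 'title' => 'Test');
$post = clean_field($post, 'content', 'demo_cleaner');

echo $post['content'], "\n"; // the span wrapper is gone, the paragraph stays
```

In a real form target, the call would presumably look like $_POST = clean_field($_POST, 'content', array('htmlcleaner', 'cleanup')); assuming the class exposes cleanup() as a static method, as the snippets in this thread suggest.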


billbody
Novice

Oct 13, 2004, 8:03 AM

Post #9 of 10 (24395 views)
Re: [kyberfabrikken] php wordClean workaround

Thanks, will try and let you know.


wozke
New User

Jan 24, 2005, 8:37 AM

Post #10 of 10 (23253 views)
Re: [boldtbanan] php wordClean workaround

I've implemented your function wordClean($D), and it works almost perfectly! The one thing it doesn't do is remove <p /> tags. Does anyone have a solution for this?
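
One possible approach to the <p /> question, offered as an untested-in-production sketch rather than a fix from the original author: an extra pass that drops self-closing and empty paragraphs. The function name strip_empty_paragraphs() is hypothetical and not part of the posted wordClean().

```php
<?php
// Hypothetical helper: remove paragraphs that carry no content.
function strip_empty_paragraphs($html) {
    $patterns = array(
        // self-closing form: <p />, <p/>, <p class="x" />
        '/<p\b[^>]*\/>/i',
        // empty pair holding only whitespace or &nbsp;: <p></p>, <p> &nbsp; </p>
        '/<p\b[^>]*>(?:\s|&nbsp;)*<\/p>/i',
    );
    return preg_replace($patterns, '', $html);
}

echo strip_empty_paragraphs('<p>One</p><p /><p>&nbsp;</p><p>Two</p>'), "\n";
```

The \b after the p keeps the patterns from touching other tags that merely start with the same letter, such as pre. The same two patterns could probably be appended to the $search array instead, each with a matching "" entry in $replace.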

 
 
 

