Published: 2023/12/7 15:32:55 | Author: 大数据 (Big Data) | Source: 大数据 (Big Data)
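We live in the internet age and handle web page data all the time, so .NET developers sometimes have no choice but to process raw HTML source. This post shares a general-purpose .NET method that uses regular expressions to strip HTML tags and keep only the plain text of the page source. The method is as follows: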
```csharp
using System.Text.RegularExpressions;

// Hypothetical container class: the original shows only the method,
// and C# requires a static method to live inside a class.
public static class HtmlHelper
{
    #region Strip all HTML tags, keeping only plain text
    /// <summary>
    /// Removes all HTML tags and keeps only the plain text.
    /// </summary>
    /// <param name="strHtml">The HTML source to clean.</param>
    /// <returns>The text with script/style blocks and all tags removed, trimmed.</returns>
    public static string CleanHtml(string strHtml)
    {
        if (string.IsNullOrEmpty(strHtml))
            return strHtml;

        // Remove <script> and <style> elements together with their content.
        // A verbatim string (@"...") is required: "\<" is not a valid C# escape.
        // Singleline lets "." match newlines, so multi-line blocks are removed too.
        strHtml = Regex.Replace(strHtml,
            @"<script.*?</script>|<style.*?</style>",
            "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove every remaining opening, closing, or self-closing tag in one pass
        // (replaces the original match-by-match string.Replace loop).
        strHtml = Regex.Replace(strHtml, @"</?[^>]*>", "");

        return strHtml.Trim();
    }
    #endregion
}
```
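The `CleanHtml` method removes tags but leaves HTML entities such as `&amp;` and `&nbsp;` undecoded. Because it replaces tags with an empty string, it also joins words from adjacent elements: for `<p>Tom &amp; Jerry</p><br/>end` it returns `Tom &amp; Jerryend`. The following is a minimal sketch of one possible extension, assuming you want decoded, whitespace-normalized text. The class name `HtmlText` and the method `ToPlainText` are hypothetical and do not come from the original article. The sketch also drops HTML comments, whose content may contain `>`, and decodes entities with `System.Net.WebUtility.HtmlDecode`.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original article.
public static class HtmlText
{
    // Script/style blocks and comments, including their content.
    // Singleline lets "." span line breaks inside these blocks.
    private static readonly Regex ScriptStyleOrComment = new Regex(
        @"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<!--.*?-->",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    // Any remaining tag: "<" up to the next ">".
    private static readonly Regex Tag = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    // Runs of whitespace; in .NET, \s also matches U+00A0 (the decoded &nbsp;).
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
            return html;

        // 1. Drop script/style blocks and comments together with their content.
        string text = ScriptStyleOrComment.Replace(html, " ");

        // 2. Replace remaining tags with a space so words from adjacent elements stay apart.
        text = Tag.Replace(text, " ");

        // 3. Decode entities only after tags are gone, so an escaped "&lt;b&gt;"
        //    survives as the literal text "<b>" instead of being stripped.
        text = WebUtility.HtmlDecode(text);

        // 4. Collapse whitespace runs to a single space and trim the ends.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

With this sketch, `HtmlText.ToPlainText("<p>Tom &amp; Jerry</p><br/>end")` returns `"Tom & Jerry end"`. Like the original, it is still regex-based. A stray `<` in body text or a `>` inside an attribute value can therefore be mis-handled, and a real HTML parser is the more robust choice for arbitrary pages.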