[java] Normalization in DOM parsing with Java - how does it work?

I saw the line below in the DOM parser code in this tutorial.

doc.getDocumentElement().normalize();

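Why do we do this normalization? I read the docs, but I could not understand a word of them.

Puts all Text nodes in the full depth of the sub-tree underneath this Node

So can someone show me (preferably with a picture) what this tree looks like?

Can anyone explain why normalization is needed?
What happens if we don't normalize?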

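Answer

The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.

This basically means that the following XML element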

<foo>Hello
wor
ld</foo>

could be represented like this in a denormalized node (line breaks inside the text are left out for readability):

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this:

Element foo
    Text node: "Hello world"

The same goes for attributes (<foo bar="Hello world"/>), comments, etc.
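
The trees above can be reproduced in a few lines. This is only a minimal sketch: the class name NormalizeDemo and the helper buildFragmentedFoo are made up for illustration, and the fragmented text nodes are created by hand rather than by a parser, since how a parser splits text is implementation-specific.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    /** Hypothetical helper: builds <foo> with four separate Text children. */
    static Element buildFragmentedFoo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            // Same pieces as in the denormalized tree: one empty node, three fragments.
            foo.appendChild(doc.createTextNode(""));
            foo.appendChild(doc.createTextNode("Hello "));
            foo.appendChild(doc.createTextNode("wor"));
            foo.appendChild(doc.createTextNode("ld"));
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildFragmentedFoo();
        System.out.println("before: " + foo.getChildNodes().getLength() + " child nodes");

        // Merges adjacent Text nodes and removes empty ones.
        foo.normalize();

        System.out.println("after:  " + foo.getChildNodes().getLength() + " child node(s)");
        System.out.println("text:   " + foo.getFirstChild().getNodeValue());
    }
}
```

Before the call the element has four Text children; afterwards it has a single Text child whose value is "Hello world".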

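Answer

In simple terms, normalization is the reduction of redundancies.
Examples of redundancies:
a) whitespace outside of the root/document tags ( … <document></document> … )
b) whitespace within a start tag (< … >) and an end tag (</ … >)
c) whitespace between attributes and their values (i.e., spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks/whitespace in the text of attributes and tags
f) comments, etc. …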

Answer

As an extension to @JBNizet's answer for more technical users, here is what the implementation of the org.w3c.dom.Node interface in com.sun.org.apache.xerces.internal.dom.ParentNode looks like, to give you an idea of how it actually works.

public void normalize() {
    // No need to normalize if already normalized.
    if (isNormalized()) {
        return;
    }
    if (needsSyncChildren()) {
        synchronizeChildren();
    }
    ChildNode kid;
    for (kid = firstChild; kid != null; kid = kid.nextSibling) {
         kid.normalize();
    }
    isNormalized(true);
}

It traverses all the child nodes recursively and calls kid.normalize() on each of them.
This mechanism is overridden in org.apache.xerces.dom.ElementImpl:

public void normalize() {
     // No need to normalize if already normalized.
     if (isNormalized()) {
         return;
     }
     if (needsSyncChildren()) {
         synchronizeChildren();
     }
     ChildNode kid, next;
     for (kid = firstChild; kid != null; kid = next) {
         next = kid.nextSibling;

         // If kid is a text node, we need to check for one of two
         // conditions:
         //   1) There is an adjacent text node
         //   2) There is no adjacent text node, but kid is
         //      an empty text node.
         if ( kid.getNodeType() == Node.TEXT_NODE )
         {
             // If an adjacent text node, merge it with kid
             if ( next!=null && next.getNodeType() == Node.TEXT_NODE )
             {
                 ((Text)kid).appendData(next.getNodeValue());
                 removeChild( next );
                 next = kid; // Don't advance; there might be another.
             }
             else
             {
                 // If kid is empty, remove it
                 if ( kid.getNodeValue() == null || kid.getNodeValue().length() == 0 ) {
                     removeChild( kid );
                 }
             }
         }

         // Otherwise it might be an Element, which is handled recursively
         else if (kid.getNodeType() == Node.ELEMENT_NODE) {
             kid.normalize();
         }
     }

     // We must also normalize all of the attributes
     if ( attributes!=null )
     {
         for( int i=0; i<attributes.getLength(); ++i )
         {
             Node attr = attributes.item(i);
             attr.normalize();
         }
     }

    // changed() will have occurred when the removeChild() was done,
    // so does not have to be reissued.

     isNormalized(true);
 }
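
For experimenting outside Xerces, the same merge loop can be written against the public org.w3c.dom API only. Treat this as a rough sketch, not the library's code: ManualNormalize, mergeText and sample are hypothetical names, attribute handling and the isNormalized() bookkeeping are left out, and the sample tree is built by hand.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class ManualNormalize {
    /** Merges adjacent Text children, drops empty ones, recurses into elements. */
    static void mergeText(Node parent) {
        Node kid = parent.getFirstChild();
        while (kid != null) {
            Node next = kid.getNextSibling();
            if (kid.getNodeType() == Node.TEXT_NODE) {
                if (next != null && next.getNodeType() == Node.TEXT_NODE) {
                    // Adjacent text node: fold it into kid and remove it.
                    ((Text) kid).appendData(next.getNodeValue());
                    parent.removeChild(next);
                    next = kid; // Stay on kid; another text node may follow.
                } else if (kid.getNodeValue() == null || kid.getNodeValue().isEmpty()) {
                    // Lone empty text node: remove it.
                    parent.removeChild(kid);
                }
            } else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                mergeText(kid);
            }
            kid = next;
        }
    }

    /** Hypothetical sample: <foo> with children "a", "b", a comment, "", "c". */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            foo.appendChild(doc.createTextNode("a"));
            foo.appendChild(doc.createTextNode("b"));
            foo.appendChild(doc.createComment("separator"));
            foo.appendChild(doc.createTextNode(""));
            foo.appendChild(doc.createTextNode("c"));
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element foo = sample();
        mergeText(foo);
        // The comment is structure, so the text on either side of it stays separate.
        System.out.println(foo.getChildNodes().getLength() + " children, first = "
                + foo.getFirstChild().getNodeValue()
                + ", last = " + foo.getLastChild().getNodeValue());
    }
}
```

On the sample tree this leaves three children: the text "ab", the comment, and the text "c". Calling foo.normalize() on a fresh copy of the same tree should produce the same shape.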

Hope this saves you some time.