[SQL Server] SQL Server 2005: errors when XML contains ASCII control characters (1-31)


1. Code to reproduce the error

DECLARE @v_xml XML
DECLARE @v_looper INT
SELECT @v_looper = 1

-- @v_looper is incremented before it is used, so codes 2 through 31 are tested.
WHILE (@v_looper < 31)
BEGIN
    SELECT @v_looper = @v_looper + 1
    BEGIN TRY
        -- Assigning the string to an XML variable forces it to be parsed.
        SELECT @v_xml = N'<rows><row attr="' + CHAR(@v_looper) + '" /></rows>'
        PRINT 'Success! ASCII Code:' + CAST(@v_looper AS VARCHAR(10))
    END TRY
    BEGIN CATCH
        PRINT 'Error! ASCII Code:' + CAST(@v_looper AS VARCHAR(10))
        --PRINT ERROR_NUMBER()
        --PRINT ERROR_MESSAGE()
    END CATCH
END


Output

Error! ASCII Code:2
Error! ASCII Code:3
Error! ASCII Code:4
Error! ASCII Code:5
Error! ASCII Code:6
Error! ASCII Code:7
Error! ASCII Code:8
Success! ASCII Code:9
Success! ASCII Code:10
Error! ASCII Code:11
Error! ASCII Code:12
Success! ASCII Code:13
Error! ASCII Code:14
Error! ASCII Code:15
Error! ASCII Code:16
Error! ASCII Code:17
Error! ASCII Code:18
Error! ASCII Code:19
Error! ASCII Code:20
Error! ASCII Code:21
Error! ASCII Code:22
Error! ASCII Code:23
Error! ASCII Code:24
Error! ASCII Code:25
Error! ASCII Code:26
Error! ASCII Code:27
Error! ASCII Code:28
Error! ASCII Code:29
Error! ASCII Code:30
Error! ASCII Code:31

As the output above shows, only three characters in the ASCII range 1 to 31 can be used:

9  (TAB: horizontal tab, XML #x9)
10 (LF: line feed, new line, XML #xA)
13 (CR: carriage return, XML #xD)

Every other character in that range raises the following error:

Msg 9420, Level 16, State 1, Line 9
XML parsing: line 1, character 18, illegal xml character

Microsoft's position on this is that the XML specification prohibits these characters, so they should not be used:
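
Since the restriction comes from the XML specification rather than from SQL Server itself, the same experiment can be repeated with a different parser. The sketch below is not part of the original test; it assumes Python's standard xml.etree.ElementTree module, and the function name accepted_control_codes is made up for illustration. It builds the same one-row document for each code from 1 to 31 and records which codes parse.

```python
import xml.etree.ElementTree as ET


def accepted_control_codes():
    """Return the codes in 1..31 that the parser accepts in an attribute value.

    Hypothetical helper for illustration; it mirrors the T-SQL loop above
    but uses Python's bundled XML parser instead of SQL Server.
    """
    accepted = []
    for code in range(1, 32):
        document = '<rows><row attr="' + chr(code) + '" /></rows>'
        try:
            ET.fromstring(document)
        except ET.ParseError:
            continue  # the parser rejected this character
        accepted.append(code)
    return accepted


print(accepted_control_codes())
```

With a conformant parser the list should contain only 9, 10 and 13, matching the SQL Server output.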

PRB: Error Message When an XML Document Contains Low-Order ASCII Characters

Source: http://support.microsoft.com/kb/315580

SYMPTOMS

When you attempt to use versions 3.0 or later of the MSXML parser to parse XML documents that contain certain low-order non-printable ASCII characters (that is, characters below ASCII 32), you may receive the following error message:
An Invalid character was found in text content.


CAUSE

Versions 3.0 and later of the MSXML parser strictly enforce the valid XML character ranges that are defined by the World Wide Web Consortium (W3C) XML language specification. XML documents that are parsed using versions 3.0 or later of MSXML cannot contain characters that fall outside the defined valid XML character ranges. The low-order non-printable ASCII characters in the ranges that are listed in the "More Information" section are not valid XML characters. An XML document that contains instances of these characters is not conformant with the W3C specifications and cannot be parsed successfully with versions 3.0 and later of MSXML.


RESOLUTION
To resolve this problem, either remove instances of the low-order non-printable ASCII characters, or replace the characters with an alternate valid character such as the space character (ASCII 32, hex #x20). This solution makes the XML document compliant with the W3C specifications. However, removing or replacing instances of these characters may affect other applications that use the data and to which the characters are significant. Such additional impact can only be identified by testing and will need to be addressed by implementing a fix or workaround that is appropriate for a specific situation.

STATUS
This behavior is by design.


MORE INFORMATION
Versions 2.6 and earlier of the MSXML parser permit XML documents to contain low-order non-printable ASCII characters that fall outside the W3C valid XML character ranges. However, the design of versions 3.0 and later of the MSXML parser has been changed to strictly enforce the valid XML character ranges that are defined in the W3C XML language specification. This design change is required to be able to identify non-conformant XML documents.
The following are the valid XML characters and character ranges (hex values) as defined by the W3C XML language specifications 1.0:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
The following are the character ranges for low-order non-printable ASCII characters that are rejected by MSXML versions 3.0 and later:
#x0 - #x8 (ASCII 0 - 8)
#xB - #xC (ASCII 11 - 12)
#xE - #x1F (ASCII 14 - 31)
This design change may affect the following users and applications:
  • Internet Explorer users: Users who have been using Internet Explorer versions 5.5 and earlier (and who did not install MSXML 3.0 in Replace mode) to browse and view XML documents that contain one or more instances of the specified low-order non-printable ASCII characters encounter the error message after upgrading to Internet Explorer 6.0 because Internet Explorer 6.0 installs MSXML 3.0 SP2 in Replace mode and uses it to parse XML documents.
  • MDAC and ADO users: Developers and users who load ADO-persisted XML documents that contain one or more instances of the specified low-order non-printable ASCII characters into ADO Recordset objects encounter the error message after upgrading to MDAC 2.7 because MDAC 2.7 installs MSXML 3.0 SP2, which is the version of the MSXML parser that the ADO 2.7 Recordset object uses.
  • Applications that use the MSXML Document Object Model (DOM): Applications that use version independent PROGIDs to instantiate MSXML DOM objects that are used to parse XML documents generate the specified error when MSXML 3.0 or one of its service packs is installed in Replace mode or when the code is modified to use the MSXML 3.0 or 4.0 version specific PROGIDs.


REFERENCES

For additional information on other known causes and workarounds for the error message that is specified in the 'Symptoms' section, click the article numbers below to view the articles in the Microsoft Knowledge Base:
238833  (http://support.microsoft.com/kb/238833/EN-US/ ) PRB: XML Parser: Invalid Character Was Found in Text Content
275883  (http://support.microsoft.com/kb/275883/EN-US/ ) INFO: XML Encoding and DOM Interface Methods

APPLIES TO
  • Microsoft XML Parser 3.0
  • Microsoft XML Parser 3.0 Service Pack 1
  • Microsoft XML Parser 3.0 Service Pack 2
  • Microsoft XML Core Services 4.0
  • Microsoft Data Access Components 2.8
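
The RESOLUTION section of the article suggests removing the offending characters or replacing them with a space. A minimal sketch of the replacement approach follows, assuming the cleanup happens in application code before the string is handed to an XML parser or to SQL Server. The names ALLOWED_LOW and replace_low_order_controls are hypothetical and do not come from the article.

```python
# Code points below 32 that XML 1.0 allows: tab, line feed, carriage return.
ALLOWED_LOW = {0x9, 0xA, 0xD}

# Translation table mapping every other code point below 32 to a space.
_LOW_TO_SPACE = {code: " " for code in range(32) if code not in ALLOWED_LOW}


def replace_low_order_controls(text):
    """Replace the low-order characters rejected by XML 1.0 with spaces.

    Hypothetical helper for illustration; it follows the article's advice
    to substitute a valid character such as the space (#x20).
    """
    return text.translate(_LOW_TO_SPACE)


cleaned = replace_low_order_controls("a\x01b\tc\x1fd")
print(repr(cleaned))
```

As the article warns, other applications may treat these characters as significant, so whether replacing them is safe has to be checked case by case.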


On this point, the W3C's Extensible Markup Language (XML) 1.0 (Fifth Edition) defines the following. (Only part of section 2.2, "Characters", is copied here.)

Extensible Markup Language (XML) 1.0 (Fifth Edition)
W3C Recommendation 26 November 2008
2.2. Characters
[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors MUST accept any character in the range specified for Char. ]

Character Range
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode [Unicode]; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.

Note:
Document authors are encouraged to avoid "compatibility characters", as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
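
The Char production and the discouraged ranges quoted above translate almost directly into range checks. The following is only a sketch of that translation: the function names is_xml_char and is_discouraged are invented for illustration, and both take an integer code point.

```python
def is_xml_char(code_point):
    """True if code_point matches the Char production of XML 1.0.

    Hypothetical helper for illustration, written from production [2].
    """
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def is_discouraged(code_point):
    """True if code_point falls in one of the discouraged ranges in the Note."""
    if 0x7F <= code_point <= 0x84 or 0x86 <= code_point <= 0x9F:
        return True
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # Last two code points of planes 1 through 16:
    # #x1FFFE-#x1FFFF, #x2FFFE-#x2FFFF, ..., #x10FFFE-#x10FFFF.
    return 0x1FFFE <= code_point <= 0x10FFFF and (code_point & 0xFFFF) >= 0xFFFE


# The only legal characters below #x20 are tab, line feed and carriage return.
print([code for code in range(32) if is_xml_char(code)])
```

Note that the discouraged characters are still legal under Char, so a parser accepts them; only the characters outside Char cause a parse error.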

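My conclusion: until now I had no idea this problem existed. In all the time I have used XML, I never read the W3C XML specification closely, and none of the XML books I came across discussed the specification either. When adopting a new technology, one should stick to the basics and become thoroughly familiar with the relevant standard. One more point: most specifications are written in English, so building up English skills matters too. Once again I am reminded of the saying that English is the foundation of all learning... so why does that leave such a bitter taste?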






Posted by 좐군

2009/06/07 00:47
