Problems With MessageStructureProcessor.ParseContentPart

Jan 27, 2010 at 12:58 AM

I don't claim to be an expert, or even proficient, with IMAP responses, but email from one specific sender has been causing "Invalid format could not parse body part headers" exceptions in the MessageStructureProcessor. The problem is that the responses for this sender's messages have "UID 1234" (or whatever the UID value is) after the response string, so none of the existing patterns match and the exception is thrown. So I wrote a new regular expression that seems to handle these messages and added it to MessageStructureProcessor.ParseContentPart. Here is the now-working MessageStructureProcessor.ParseContentPart():

        private void ParseContentPart(IMessageContent part, string s)
        {
            const string non_attach = "^\\((?<type>(\"[^\"]*\"|NIL))\\s(?<subtype>(\"[^\"]*\"|NIL))\\s(?<attr>(\\([^\\)]*\\)|NIL))\\s(?<id>(\"[^\"]*\"|NIL))\\s(?<desc>(\"[^\"]*\"|NIL))\\s(?<encoding>(\"[^\"]*\"|NIL))\\s(?<size>(\\d+|NIL))\\s(?<lines>(\\d+|NIL))\\s(?<md5>(\"[^\"]*\"|NIL))\\s(?<disposition>(\\([^\\)]*\\)|NIL))\\s(?<lang>(\"[^\"]*\"|NIL))\\)$";
            const string attachment = "^\\((?<type>(\"[^\"]*\"|NIL))\\s(?<subtype>(\"[^\"]*\"|NIL))\\s(?<attr>(\\([^\\)]*\\)|NIL))\\s(?<id>(\"[^\"]*\"|NIL))\\s(?<desc>(\"[^\"]*\"|NIL))\\s(?<encoding>(\"[^\"]*\"|NIL))\\s(?<size>(\\d+|NIL))\\s((?<data>(.*))\\s|)(?<lines>(\"[^\"]*\"|NIL))\\s(?<disposition>((?>\\((?<LEVEL>)|\\)(?<-LEVEL>)|(?!\\(|\\)).)+(?(LEVEL)(?!))|NIL))\\s(?<lang>(\"[^\"]*\"|NIL))\\)$";
            const string alt_attach = "^\\([\"]*(?<type>([\\w]*))[\"\\s]*(?<subtype>(\\w*))[\"\\s]*\\([\"*]\\w*[\"\\s\"]*(?<filename>([^\"]*))[\")\\s]*NIL\\sNIL\\s\"(?<encoding>(\\w*))[\"]\\s(?<size>(\\d*))";
            //<Begin Edit>
            //Added to handle body part headers that are followed by "UID <n>" (seen in email from BB).
            //Same fields as non_attach, plus the trailing UID label and value:
            const string bb_msg = @"^\((?<type>(""[^""]*""|NIL))\s(?<subtype>(""[^""]*""|NIL))\s(?<attr>(\([^\)]*\)|NIL))\s(?<id>(\""[^\""]*\""|NIL))\s(?<desc>(\""[^\""]*\""|NIL))\s(?<encoding>(\""[^\""]*\""|NIL))\s(?<size>(\d+|NIL))\s(?<lines>(\d+|NIL))\s(?<md5>(\""[^\""]*\""|NIL))\s(?<disposition>(\([^\)]*\)|NIL))\s(?<lang>(\""[^\""]*\""|NIL))\)\s(?<uidlabel>(\D+|NIL))\s(?<uidvalue>(\d+|NIL))";
            //<End Edit>
            Match match;
            
            if ((match = Regex.Match(s, non_attach, RegexOptions.ExplicitCapture)).Success)
            {
                //this.Attachment = false;
                part.ContentType = string.Format("{0}/{1}", match.Groups["type"].Value.Replace("\"", ""), match.Groups["subtype"].Value.Replace("\"", ""));
                part.Charset = ParseCharacterSet(ParseNIL(match.Groups["attr"].Value));
                part.ContentId = ParseNIL(match.Groups["id"].Value);
                part.ContentDescription = ParseNIL(match.Groups["desc"].Value);
                part.ContentTransferEncoding = ParseNIL(match.Groups["encoding"].Value);
                part.ContentSize = Convert.ToInt64(ParseNIL(match.Groups["size"].Value));
                part.Lines = Convert.ToInt64(ParseNIL(match.Groups["lines"].Value));                
                part.MD5 = ParseNIL(match.Groups["md5"].Value);
                part.ContentDisposition = ParseNIL(match.Groups["disposition"].Value);
                part.Language = ParseNIL(match.Groups["lang"].Value);
            }
            else if ((match = Regex.Match(s, attachment, RegexOptions.ExplicitCapture)).Success)
            {
                //this.Attachment = true;
                part.ContentType = string.Format("{0}/{1}", match.Groups["type"].Value.Replace("\"", ""), match.Groups["subtype"].Value.Replace("\"", ""));
                part.ContentFilename = ParseFileName(ParseNIL(match.Groups["attr"].Value));
                part.ContentId = ParseNIL(match.Groups["id"].Value);
                part.ContentDescription = ParseNIL(match.Groups["desc"].Value);
                part.ContentTransferEncoding = ParseNIL(match.Groups["encoding"].Value.Replace("\"",""));
                part.ContentSize = Convert.ToInt64(ParseNIL(match.Groups["size"].Value));
                part.Lines = Convert.ToInt64(ParseNIL(match.Groups["lines"].Value));
                part.ContentDisposition = ParseNIL(match.Groups["disposition"].Value);
                part.Language = ParseNIL(match.Groups["lang"].Value);
            }
            else if ((match = Regex.Match(s, alt_attach, RegexOptions.ExplicitCapture)).Success)
            {
                part.ContentType = string.Format("{0}/{1}", match.Groups["type"].Value, match.Groups["subtype"].Value);
                part.ContentFilename = match.Groups["filename"].Value;
                part.ContentTransferEncoding = match.Groups["encoding"].Value;
                part.ContentSize = Convert.ToInt64(match.Groups["size"].Value);
            }
            //<Begin Edit>
            //Added for email originating from BB; the field handling is the same as in the non_attach branch
            else if ((match = Regex.Match(s, bb_msg, RegexOptions.ExplicitCapture)).Success)
            {   
                part.ContentType = string.Format("{0}/{1}", match.Groups["type"].Value.Replace("\"", ""), match.Groups["subtype"].Value.Replace("\"", ""));
                part.Charset = ParseCharacterSet(ParseNIL(match.Groups["attr"].Value));
                part.ContentId = ParseNIL(match.Groups["id"].Value);
                part.ContentDescription = ParseNIL(match.Groups["desc"].Value);
                part.ContentTransferEncoding = ParseNIL(match.Groups["encoding"].Value);
                part.ContentSize = Convert.ToInt64(ParseNIL(match.Groups["size"].Value));
                part.Lines = Convert.ToInt64(ParseNIL(match.Groups["lines"].Value));
                part.MD5 = ParseNIL(match.Groups["md5"].Value);
                part.ContentDisposition = ParseNIL(match.Groups["disposition"].Value);
                part.Language = ParseNIL(match.Groups["lang"].Value);
            }
            //<End Edit>
            else
                throw new Exception("Invalid format could not parse body part headers.");
                
        }
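
An alternative to adding a fourth pattern would be to strip the trailing UID item before the existing patterns run, so the `$`-anchored expressions can match unchanged. The following is only a minimal sketch of that idea: `UidSuffixSketch`, `StripTrailingUid`, and the sample string are hypothetical and not part of the library, and it assumes the extra text always has the shape ` UID <digits>`, optionally followed by a closing parenthesis, at the very end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

static class UidSuffixSketch
{
    // Hypothetical helper pattern (not part of the library): a trailing
    // " UID <digits>" item, optionally followed by ")" and whitespace.
    private static readonly Regex TrailingUid =
        new Regex(@"\sUID\s(?<uid>\d+)\)?\s*$", RegexOptions.ExplicitCapture);

    // Returns the input without the trailing UID item.
    // uid is set to the parsed value, or to null when no UID item was found.
    public static string StripTrailingUid(string s, out long? uid)
    {
        Match m = TrailingUid.Match(s);
        if (!m.Success)
        {
            uid = null;
            return s;
        }
        uid = long.Parse(m.Groups["uid"].Value);
        // m.Index points at the whitespace before "UID", so everything
        // before it is the body part description itself.
        return s.Substring(0, m.Index);
    }

    static void Main()
    {
        // Made-up sample in the shape described above, not a captured server response.
        string raw = "(\"TEXT\" \"PLAIN\" (\"CHARSET\" \"utf-8\") NIL NIL \"7BIT\" 1234 56 NIL NIL NIL) UID 1234";
        long? uid;
        string cleaned = StripTrailingUid(raw, out uid);
        Console.WriteLine(cleaned);
        Console.WriteLine(uid);
    }
}
```

If this approach were used, the call would presumably go at the top of ParseContentPart, before the first Regex.Match, and the bb_msg pattern and its branch would no longer be needed. I have not tried this against the messages in question, so treat it as a starting point only.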

I think more work needs to be done now on parsing messages from this sender, so that various fields populate correctly. The "subject" and "body" of the emails from this particular sender are wrong; however, the attachments, date, and to/from are correct. I don't know if this is the sender's fault for how they structure their email (I gather not everyone follows proper email protocols these days), but Outlook seems to handle these emails properly when set up as an IMAP client. So, I would guess that it's the content processors that need a little polishing.

Mar 26, 2010 at 1:54 PM
Thanks for your post about this. Processing the body structure and content parts of the IMAP message is the most difficult part of this whole library. Because a message can be sent in so many different ways, and because not every IMAP server plays by the rules, there will always be edge cases where special processing is needed. There are certainly better ways to do this than the way I have done it, which was mostly due to a lack of engineering prowess on my part. I would love to take that part of the library and start over, but at this point I don't have a lot of time to devote to that. My hope is that some other, stronger developers will be willing to contribute their knowledge to this project so that the message processing can be a little more forgiving of unexpected message structures.