Andrew Zuo

Summary

Andrew Zuo wrote an XML parser for his RSS reader project, "Project Keystone," with the help of ChatGPT, finding existing packages lacking and the XML specification complex.

Abstract

Andrew Zuo embarked on creating an RSS reader, dubbed "Project Keystone," to integrate his blogging, app development, and news reading activities. Finding existing RSS parsing libraries on Flutter's pub.dev outdated, he decided to write his own XML parser. Despite the complexity of the XML specification and initial challenges with encoding, Zuo successfully created a basic parser with guidance from ChatGPT. He notes that while ChatGPT was instrumental in understanding XML and answering specific questions, it was less helpful with API interactions and sometimes provided vague or incorrect answers. Zuo's experience underscores the utility of AI in navigating complex documentation and the importance of verification when using AI-generated information.

Opinions

  • Zuo believes that programming challenges, such as parsing XML, require a deep understanding of specifications, which can be facilitated by AI assistance like ChatGPT.
  • He criticizes the complexity of the XML specification and the assumption that developers should know the exact layout of XML documents.
  • Zuo is skeptical of the reliability of existing XML parsing packages, pointing out that even popular ones may not handle basic tasks like encoding detection.
  • He values the inclusion of the XML declaration in documents for clarity in processing, despite it not being mandatory.
  • Zuo finds ChatGPT to be a reliable resource for XML-related questions but acknowledges its limitations, such as providing vague answers or occasional incorrect information.
  • He emphasizes the importance of practical examples and hands-on experience in learning and problem-solving, as seen in his request for simple XML test cases from ChatGPT.
  • Zuo reflects on the simplicity of Atom and RSS feeds compared to the complexity of XML, suggesting that ChatGPT's assistance was more critical for XML due to its less straightforward documentation.
  • He encourages reader engagement by asking for claps to support the article's visibility, indicating the significance of community feedback in content creation and dissemination.
Photo by Obi - @pixel7propix on Unsplash

I Wrote An XML Parser With The Help Of ChatGPT

So remember that post where I was like:

So I haven’t mastered managing RSS feeds yet. I want to make my own RSS app to fix some of the problems I see with RSS. But in the meantime here are some tips for managing and organizing your feeds:

Well, it’s finally time to make an RSS reader.

This is actually an important project for me because I see it as uniting everything together. It will unite my blogging, it will unite my apps, and it will unite my news reading. It will act as the keystone between all of my projects.

And that is why I’m calling it Project Keystone.

Making An RSS Reader

So I looked at how I’d parse RSS in Flutter. First I looked at the options on pub.dev. And there are some like

Last updated 22 months ago with pull requests as far back as 2021. One of which is called ‘Added support for Youtube channel feed’.

And then there’s another one called

Last updated 21 months ago with even more pull requests going back even further, even 2019.

Apparently someone even forked this one into this other one

And there are a bunch of other ones that I didn’t want to try out. So what do I do? I make my own RSS parser of course. So how do I do that?

Making An XML Parser

So there are a few XML parsers available. The big one is this one

I looked at it and… it’s just so complicated. It seems to be expecting that you know exactly how the XML document is laid out. And after reviewing the Atom and RSS specifications, yeah, I do, but I also want to know if there are any unexpected XML nodes.

So I decided to write my own XML parser too. Bad idea, right? Surely a pub package with 265 likes which is on version 6.2.2 is more developed than anything I could write. Right?

And then I looked at this:

If your file is not UTF-8 encoded pass the correct encoding to readAsStringSync. It is the responsibility of the caller to provide a standard Dart [String] using the default UTF-16 encoding. To read and write large files you might want to use the event-driven API instead.

So you’re telling me that your XML parser, with 265 likes and on version 6.2.2, is not capable of reading the XML header that goes:

And automatically assigning a correct encoding?

So at this point I decided to write the XML parser myself. It was pretty easy actually, only took a day to do. I mean, it’s not complete yet, it can’t do multiple encodings yet, but the XML package can’t either so I guess it’s fine for now. I’ll add that functionality later.
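For reference, here is roughly what reading the encoding out of the XML declaration could look like. This is a minimal sketch in Python rather than the Dart used for Project Keystone, and the function names (`sniff_encoding`, `decode_xml`) are made up for illustration. It assumes the document is in an ASCII-compatible encoding, so it would not handle UTF-16.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of an XML document from its first bytes."""
    # A UTF-8 byte order mark is checked before the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration, or no encoding in it: fall back to the default.
    return default


def decode_xml(data: bytes) -> str:
    """Decode raw bytes into a string using the sniffed encoding."""
    return data.decode(sniff_encoding(data))
```

The idea is to peek at the raw bytes before decoding anything, then hand the decoded string to the parser.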

This actually reminds me of another post I wrote a long time ago.

In it I talked about another post about someone being unable to parse brackets. I don’t want to be rude, but if you’re having problems with that maybe programming is not for you. Or maybe just read this other article I read on the topic:

Because XML is actually really similar to that bracket problem. It’s just a bit more complicated, which brings me to:
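To make the comparison concrete, here is a sketch of the bracket-matching idea applied to XML tags: push every opening tag on a stack and pop it when the matching closing tag shows up. It is written in Python for illustration and is not the Project Keystone parser. It assumes there are no comments, no CDATA sections, and no ">" inside attribute values, and `tags_balanced` is a made-up name.

```python
import re

# Matches opening tags (<a>), closing tags (</a>) and self-closing tags (<a/>).
# Groups: optional leading "/", the tag name, optional trailing "/".
_TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^>]*?(/?)>")


def tags_balanced(xml: str) -> bool:
    """Return True if every opening tag is closed in the right order."""
    stack = []
    for closing, name, self_closing in _TAG.findall(xml):
        if self_closing:
            # <a/> opens and closes at once, so the stack is unchanged.
            continue
        if closing:
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    # Anything left on the stack was never closed.
    return not stack
```

With brackets the stack holds single characters. Here it holds tag names, which is most of the extra complication.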

ChatGPT

This entire XML/RSS thing is a pretty complicated process. Actually mostly XML. The Atom/RSS specification is described very simply. It’s basically just XML with a predefined schema. Actually, not basically, that’s literally what it is. XML, on the other hand… Well, just take a look:

The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

symbol ::= expression

I think I need a PhD to understand this document.

It’s not the worst document I’ve ever read, that would be the GDPR:

And probably not as bad as the Dubai font’s license. But I still don’t want to read it.

So the bulk of my understanding of XML actually came from ChatGPT. ChatGPT helped me throughout this process. So for example I asked it:

What is this line: and does it appear in every XML document?

And it gave me this:

The line is known as the XML declaration and it appears at the very beginning of an XML document. It provides information about the version of XML used in the document and the encoding used to represent the characters in the document.

This line is not required in every XML document, but it is a good practice to include it as it provides important information about the document and makes it easier to process. If an XML document does not include this line, then the processor will have to make assumptions about the version of XML and encoding used, which could lead to issues.

I asked it this:

Is Atom compatible with RSS?

And it gave me this:

[…] Since Atom and RSS are different feed formats, they are not directly compatible with each other. However, it is possible to convert between Atom and RSS, as both formats use XML as the underlying format for the data. This makes it possible to write code to convert from one format to another.

And I asked it some other questions like ‘Is it possible for an XML document to have more than one root node?’, ‘In XML can an attribute name have an = in it?’, ‘Can an attribute’s value have an = in it?’, ‘Can the value of an attribute contain the “ character?’, ‘Can you give me some simple example XML to use as test cases for my app?’, ‘Can an XML node both have text and children?’, ‘In XML are there any nodes that aren’t closed?’, ‘Is it possible to have overlapping nodes in XML?’, and ‘Are all versions of RSS backwards compatible with each other?’
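Several of these questions can also be checked against an existing parser, which is a cheap way to verify an answer. The sketch below uses Python's standard `xml.etree.ElementTree` module, not the parser from this project. The `CASES` list and `is_well_formed` helper are illustrative names, with one small document per question.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml: str) -> bool:
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# Each entry is (document, whether it is expected to parse).
CASES = [
    ("<a/><b/>", False),                      # more than one root node
    ("<a><b></a></b>", False),                # overlapping nodes
    ("<a><b></a>", False),                    # a node that is never closed
    ('<a x=y="1"/>', False),                  # "=" in an attribute name
    ('<a x="1=2"/>', True),                   # "=" in an attribute value
    ("<a x='say \"hi\"'/>", True),            # '"' in a single-quoted value
    ("<p>text <b>child</b> tail</p>", True),  # text and children together
]

# In the last case the text is split around the child element.
root = ET.fromstring("<p>text <b>child</b> tail</p>")
leading_text = root.text     # text before the first child
trailing_text = root[0].tail  # text after the child's closing tag
```

The same documents work as test cases for a hand-written parser, since the expected outcome of each one is already known.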

So as you can see ChatGPT helped a lot with Project Keystone.

Now, ChatGPT is not always helpful. Once I tried to use it for help interacting with an API.

And it didn’t do so well. And there were also times when ChatGPT would give me an incorrect answer. I asked it for an implementation of an XML parser. It was wrong. And sometimes it will give you extremely vague answers. Like I asked it ‘Then how do I know what’s text and what’s white space?’

and it told me this:

In XML, white space is considered to be part of the text content of an element. This means that any whitespace characters (such as spaces, tabs, and line breaks) within an element are treated as part of the text content of that element.

To determine what’s text and what’s white space in an XML document, you need to consider the context in which the text and whitespace characters appear. If the characters are within the opening and closing tags of an element, they are considered to be part of the text content of that element. If the characters are outside the opening and closing tags of any element, they are considered to be insignificant whitespace and are typically ignored by XML parsers.

When parsing XML, it’s important to keep track of the context in which text and whitespace characters appear in order to properly identify and extract the text content of elements. Most XML parsers have built-in support for handling white space, so you typically don’t need to worry about this when using a parser to process XML documents.

which is not exactly helpful. But after some Googling I got

XML ignores the first sequence of white space immediately after the opening tag and the last sequence of white space immediately before the closing tag. XML translates non-space characters (tab and new-line) into a space character and consolidates all multiple space characters into a single space.

So it just took me a bit more time to figure out. It didn’t mean that ChatGPT failed or anything.
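The rule quoted above is short enough to write down directly. This is a minimal Python sketch, assuming that rule is the behaviour wanted, and `normalize_text` is a made-up name. It only treats space, tab, carriage return and newline as white space.

```python
import re

# The four characters treated as white space here: space, tab, CR and LF.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_text(raw: str) -> str:
    """Collapse white space runs and trim both ends of element text."""
    # Turn every run of white space (tabs and newlines included) into a
    # single space, then drop the space left at the start and at the end.
    return _WHITESPACE_RUN.sub(" ", raw).strip(" ")
```

For example, `normalize_text("\n\t Hello   \n world \t")` gives `"Hello world"`, and text that is only white space comes back empty.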

Final Thoughts

ChatGPT is a pretty niche tool. It helped on XML because the documentation for XML is not that great. I don’t anticipate that it will help for RSS though because Atom and RSS feeds are very simple and the documentation is straightforward.

It’s just the XML that’s annoying.

You know, people are always saying that ChatGPT could make stuff up. But in my experience, when it’s making stuff up it’s really obvious. You shouldn’t trust it all the time, but at least with XML it has been pretty reliable.

If you liked this article be sure to give it a few claps. It helps out a lot with the algorithm.

Andrew Zuo is a programmer who has developed a wide range of applications across various industries. His current focus is Litany (Android, iOS), a language learning app that utilizes advanced algorithms to optimize flashcard review and ensure maximum retention of new vocabulary. Frustrated with the traditional brute force method of language learning, Andrew set out to create a more efficient way for users to learn new languages.
