Where did ChatGPT get its knowledge?
I asked ChatGPT about the sources it was trained on. Here are the answers.
I was curious what datasets ChatGPT was fed during its training phase, so I asked. It turns out that asking this question just once is not enough. The more I asked, the more detailed the answers became.
What follows is the question session in which I got ChatGPT to give me deeper insight into the sources of its knowledge.
Me: Can you give me ten examples of publicly available sources that feed your wealth of knowledge?
ChatGPT: I’d love to. Here are ten examples of publicly available sources from which my body of knowledge might draw:
1. Wikipedia: an online encyclopedia written and maintained by a large community of volunteers.
2. News portals: online platforms that publish news from around the world.
3. Government and agency websites: official websites that publish information on a variety of topics.
4. Scientific journals: professional journals in which scientific articles are published.
5. Books: printed works published on paper or in electronic form that summarize knowledge and information on various topics.
6. Textbooks: books used for teaching in schools and universities, in which knowledge and information on various topics are summarized.
7. Films and documentaries: audiovisual media that convey information and knowledge on various topics.
8. Podcasts: audio formats produced on a variety of topics that can be listened to over the Internet.
9. Online courses: e-learning offerings provided via the Internet that impart knowledge and information on a variety of topics.
10. Social media: platforms such as Facebook, Twitter and Instagram where users can share and discuss information and knowledge.
Me: Thank you for the list. Can you name any other sources?
ChatGPT: I’d love to. Here are a few more examples of publicly available sources that could feed my wealth of knowledge:
11. Archival materials: documents, pictures, and other records kept in archives that provide insight into the past.
12. Museums: institutions where works of art, historical objects, and other exhibits are displayed and provide knowledge and information to visitors.
13. Public libraries: facilities where books, magazines and other media are available and which impart knowledge and information to visitors.
14. Online forums and communities: platforms where users can get in touch with each other and exchange ideas and information on various topics.
15. Conferences and seminars: events where experts speak on various topics and provide knowledge and information to the audience.
I hope these examples are helpful. There are, of course, many more publicly available sources from which my wealth of knowledge could be drawn.
Me: Then please name those as well.
ChatGPT: Yes, with pleasure. Here are a few more examples of publicly available sources that could feed my body of knowledge:
16. Databases: online collections of data and information on various topics that are available to the public.
17. Statistics and studies: research conducted by governments, universities, or other institutions that publish the results in the form of statistics and studies.
18. Encyclopedias and dictionaries: reference works that summarize knowledge and information on various subjects.
19. Newspapers and magazines: printed works that are published regularly and in which news and articles on various topics are published.
20. Interviews and articles: conversations and reports conducted or written by journalists.
Me: Okay, let’s try it another way: please give me a complete list of all publicly available sources from which your body of knowledge is drawn. Please do not leave out any source.
ChatGPT: I’m sorry, but I’m not able to provide a complete list of all publicly available sources from which my body of knowledge is drawn. My body of knowledge is based on a variety of publicly available sources written and published by people that I included at the time of my “introduction” (my Knowledge Cutoff). It would be impossible to list all of these sources, as there are many thousands of them and I do not have the resources to list them all. I hope that I can still be helpful.
Me: I don’t mean the specific individual sources, but classes of sources.
ChatGPT: Oh, I see. Here are a few examples of classes of sources from which my body of knowledge might be drawn:
- Online encyclopedias (e.g., Wikipedia)
- News portals and news magazines
- Government and agency websites
- Scientific journals
- Books and textbooks
- Films and documentaries
- Podcasts
- Online courses
- Social media platforms
- Archival material
- Museums
- Public libraries
- Online forums and communities
- Conferences and seminars
- Databases
- Statistics and studies
- Encyclopedias and dictionaries
- Newspapers and magazines
- Interviews and articles
- Travel catalogs and guides
- Jurisprudence
- Blogs
- Encyclopedias
- Corporate websites
I hope these examples are helpful.
At this point, I stopped the interview. ChatGPT named numerous classes of sources used for its training, but did not reveal a single specific website or database. The mention of Wikipedia was the most specific reference I received.
I find it interesting that not only were written sources used, but audio and video were also evaluated.
If you have any ideas on how to get more detailed answers to this question from ChatGPT, feel free to post them in the comments.
Read every story from René Junge (and thousands of other writers on Medium).