Jacob Ferus


Weekly Findings In Data Science and AI

The AI That Had To Be Taken Down — Weekly Findings

Visualization of numbers, AI spreading misinformation and a Hugging Face + arXiv collaboration in this week’s findings.

Generated by Jacob Ferus using Midjourney.

The AI that had to be taken down

Galactica is a newly developed AI language model from Papers With Code (part of Facebook AI), trained on large amounts of scientific data. It can do a number of tasks:

  • Suggest papers based on descriptions, code and formulas
  • Translate expressions between different forms (code, formulas, English)
  • Simplify expressions
  • Summarize papers
  • Generate Wiki articles
  • And more

It looked impressive, yet it was taken down only a few days after its demo was published.

Because the model was trained on scientific data and presented as “a model for science”, it is very important that it’s correct and unbiased. But as most people who have tinkered with language models know, they can sometimes yield unreasonable or factually inaccurate information. This makes the release of a scientific language model somewhat contradictory.

Shortly after its release, as expected, the familiar weaknesses of language models surfaced. Two tweets illustrate this:

  • https://twitter.com/michael_j_black/status/1593133722316189696
  • https://twitter.com/garymarcus/status/1592931153673351168

The danger with language models isn’t just that they are sometimes wrong; it is that they give no indication when they are wrong or unsure.

Is there a solution? Perhaps: restrict the models to tasks that are less prone to error and where errors are easier to detect. For instance, if you asked a question and the AI returned an extract from a real paper along with its number of citations, authors, title, etc., it should be easier to judge whether the answer makes sense.
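To make the idea concrete, here is a minimal sketch of such an "extract, don't generate" setup. It is only an illustration under stated assumptions: the `Passage` class, the `answer` function, and the two corpus entries are hypothetical placeholders (the titles, authors, and citation counts are made up), and simple word overlap stands in for whatever retrieval method a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    """An extract from a paper plus the metadata a reader can verify."""
    title: str
    authors: str
    citations: int
    text: str


# Placeholder corpus: every entry here is invented for illustration only.
CORPUS = [
    Passage("Example Paper A", "A. Author", 12,
            "UMAP is a dimensionality reduction technique for visualizing "
            "high-dimensional data."),
    Passage("Example Paper B", "B. Author", 3,
            "Language models can generate fluent text that is factually "
            "inaccurate."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer(question, corpus):
    """Return the passage sharing the most words with the question.

    Returns None when no passage shares any word, so the system can say
    "no answer found" instead of making something up.
    """
    question_words = tokenize(question)
    best, best_score = None, 0
    for passage in corpus:
        score = len(question_words & tokenize(passage.text))
        if score > best_score:
            best, best_score = passage, score
    return best


result = answer("Why are language models sometimes inaccurate?", CORPUS)
if result is not None:
    print(f"{result.title} ({result.authors}, {result.citations} citations)")
    print(result.text)
```

Because the output is a verbatim extract with its source attached, a wrong answer is easy to spot: the reader can check the paper directly. The trade-off is that such a system can only return what already exists in its corpus.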

Hugging Face and arXiv Collaboration

arXiv has now integrated links to Hugging Face demos inside the abstract pages of papers. This is great for introducing a standard way to connect papers with actual models and share them easily. Note that the authors themselves don’t have to create the models; the community can do it too. This could also incentivize more authors to release models and demos for their papers, which is always a good thing.

I see no reason to only stick to PDFs anymore. It’s time to modernize papers and I think this is a good step forward. Check out the blog post, “Discover State-of-the-Art Machine Learning Demos on arXiv”: https://blog.arxiv.org/2022/11/17/discover-state-of-the-art-machine-learning-demos-on-arxiv/

What do numbers look like?

This is a blog post I found that visualizes numbers. Each number is encoded as a binary vector in which each element indicates whether a specific prime number appears in the number’s prime factorization. The vectors are then projected to 2D with the UMAP dimensionality reduction algorithm. The structure that emerges is quite fascinating. A video of the numbers being encoded iteratively is available at https://www.youtube.com/watch?v=nCk8dyU7zUM, and the full blog post can be found at https://johnhw.github.io/umap_primes/index.md.html.
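The encoding step can be sketched in a few lines of Python. This is my own minimal illustration of the description above, not the code from the blog post; the function names `primes_up_to`, `encode`, and `encode_range` are made up for this example, and the UMAP step is left out so that the snippet runs with the standard library only.

```python
def primes_up_to(n):
    """Return all primes <= n using a sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def encode(number, primes):
    """Binary vector: 1 where the prime divides `number`, else 0."""
    return [1 if number % p == 0 else 0 for p in primes]


def encode_range(limit):
    """Encode every integer from 1 to `limit` (inclusive)."""
    primes = primes_up_to(limit)
    return primes, [encode(n, primes) for n in range(1, limit + 1)]


primes, vectors = encode_range(12)
print(primes)       # [2, 3, 5, 7, 11]
print(vectors[11])  # 12 = 2 * 2 * 3 -> [1, 1, 0, 0, 0]
```

Note that the vector only records which primes occur, not how often, so 6 and 12 get the same encoding. To reproduce the 2D picture, the resulting list of vectors could be passed to a dimensionality reduction library; with the umap-learn package that would be something like `umap.UMAP(n_components=2).fit_transform(vectors)`, though the exact settings used in the blog post are not given here.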

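That was it for this week. See you next week!

If you’re interested in reading more articles about data science or AI, check out my reading lists:

  • AI: https://medium.com/@dreamferus/list/ea01474f2db5
  • Data science: https://medium.com/@dreamferus/list/57808dcf16f0

If you’d like to get a Medium membership, you can use my referral link: https://medium.com/@dreamferus/membership. Have a nice day.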
