On May 12, 2023, Kaggle opened a competition in which the Kaggle community could take part in building a report summarizing the rapid advancements in AI over the past two years. The Kaggle community is a diverse group with a wide range of experience across the field of AI.
Participants were asked to write an essay on a particular topic based on the changes and developments of the past 2 years, for example Generative AI or AI ethics.
The report is here and is made up of the following sections:
- Generative AI
- Text Data
- Image & Video Data
- Tabular & Time Series Data
- Kaggle Competitions
- AI Ethics
So let’s dive into what we’ve learnt…
Generative AI has been a popular topic of conversation recently. This opening section covers the rapid progress and applications of Generative AI over the past 2 years. We have seen advancements in text generation, image creation and music generation, using tools and techniques such as GANs and LLMs.
This progress has only been possible thanks to larger datasets and improved hardware for training these models. Although Generative AI is still in its early stages, the past year alone has shown how it is revolutionizing different industries. There are still ethical concerns to take into consideration, such as privacy, misinformation, and how these AI systems are used.
Read more in the following essays:
- Generative AI
- Understand, Generate and Transform the World
- A Glimpse into the Realm of Generative AI
Alongside the hype around Generative AI, interest in Natural Language Processing (NLP) has risen sharply, driven by large language models (LLMs). Naturally, the next section of the Kaggle AI report focuses on NLP techniques and their use in tasks such as summarisation and translation.
Going back a few years, early approaches to text-based tasks combined term-frequency-based feature engineering with machine learning methods that were not based on neural networks. Today, models are trained on much larger datasets and learn word representations directly from the data.
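To make the older approach concrete, here is a minimal sketch, assuming a tiny labelled corpus invented purely for illustration, of term-frequency features fed into a multinomial Naive Bayes classifier written from scratch in Python, with no neural network involved. The documents, labels and function names are hypothetical; real pipelines would typically use a library such as scikit-learn and richer preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real pipelines would clean text further.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on raw term-frequency counts."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # label -> Counter of term frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tf = Counter(tokenize(doc))
        term_counts[label].update(tf)
        vocab.update(tf)
    return class_counts, term_counts, vocab


def predict(model, doc):
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    tf = Counter(tokenize(doc))
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        # Log prior plus the log likelihood of each term, with add-one (Laplace) smoothing.
        score = math.log(n_docs / total_docs)
        for term, count in tf.items():
            p = (term_counts[label][term] + 1) / (total_terms + len(vocab))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data for illustration only.
docs = [
    "great movie loved the acting",
    "wonderful plot and great cast",
    "terrible movie boring plot",
    "awful acting and boring script",
]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(model, "great acting"))      # → pos
print(predict(model, "boring and awful"))  # → neg
```

The model only ever sees word counts, which is exactly the limitation that learned word representations later addressed: "great" and "wonderful" are unrelated features here.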
Using internet data as a training corpus has allowed these models to learn better and to perform better in areas such as transfer learning. In Kaggle competitions, there has been a trend towards fine-tuning publicly available models, which in some cases have been shown to surpass human-level performance.
The following top essays focus on the emergence and recent techniques of LLMs:
- Contemporary Large Language Models LLMs
- Large Language Models: Reasoning ability
- Mini-Giants: “Small” Language Models
Just as text data is used in tasks such as content generation, image and video generation has become very popular too. Computer vision has been around for a long time, but in recent years it has skyrocketed, and models can now handle tasks such as object detection and much more.
This section dives into model architectures as well as common computer vision practices such as data augmentation. Although computer vision is used across a variety of industries, such as healthcare for medical imaging, it still faces challenges in areas such as deepfakes, ethical and philosophical considerations, the limitations of multi-modal models and more.
Models such as the Segment Anything Model (SAM) and YOLO (You Only Look Once) have shown how general-purpose, open-source models can be adapted to different and unique tasks.
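As a rough illustration of what data augmentation means, here is a minimal sketch, assuming a grayscale image represented as a plain Python list of pixel rows, of two common augmentations: a random horizontal flip and a random crop. The `Image` alias and function names are hypothetical; in practice these transforms come from libraries such as torchvision or Albumentations and operate on tensors or arrays.

```python
import random

# Hypothetical representation: a grayscale image as a list of rows of pixel values.
Image = list[list[int]]


def horizontal_flip(img: Image) -> Image:
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    # Pick a random top-left corner so a size x size window fits inside the image.
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img: Image, crop_size: int, rng: random.Random) -> Image:
    # Flip with probability 0.5, then take a random crop.
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return random_crop(img, crop_size, rng)


rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
for _ in range(3):
    # Each call yields a slightly different 3x3 view of the same image.
    print(augment(image, crop_size=3, rng=rng))
```

Each training epoch sees a slightly different version of every image, which helps the model generalize without collecting new data.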
Dive into the advances in image and video data with these essays:
The next section dives into the historical significance of tabular and time series data. Neither has been in the spotlight in the past few years, as neither has had the same impact as the deep learning revolution. However, both are still widely used and very effective, with notable trends in areas such as:
- Unique approach for individual datasets/problems
- Importance of data preprocessing and feature engineering
- The dominance of gradient-boosted trees
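To show the idea behind gradient-boosted trees, here is a minimal sketch, assuming squared-error loss, a single numeric feature and depth-1 trees (stumps). Libraries such as XGBoost, LightGBM and CatBoost are far more sophisticated; the data and function names below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all x equal): fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new tree nudges predictions by a small step (the learning rate).
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Hypothetical toy data: two clusters of targets.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(7), 2))
```

Each tree corrects the errors left by the previous ones; this sequential error-correction on engineered features is a large part of why boosted trees remain so strong on tabular data.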
These trends are widely recognised within the Kaggle community, and the following essays dive into them as well as the unique challenges that tabular and time series data present.
- Learnings From the Typical Tabular Pipeline
- Time Series and Tabular Data
- Tabular Data in the Age of AI
Part of this report also analyzes Kaggle competitions themselves, looking at how they have developed and what the community has observed over the past 2 years. Kaggle competitions have been widely popular over the years, as the community has used the platform to test their skills, build a portfolio and prepare for the real world.
One observed change is that techniques such as pseudo labeling, seed averaging, and hill climbing, which were once considered “tricks,” have now become common practice. Kaggle competitions have also become more competitive over the past 2 years, and competitions such as RSNA, Learning Agency and others have been very popular.
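"Hill climbing" on Kaggle usually refers to greedily searching for ensemble weights on validation (out-of-fold) predictions. Here is a minimal sketch of one common variant, assuming greedy forward selection with replacement scored by RMSE; the model names, predictions and targets are hypothetical and chosen so the result is easy to check by hand.

```python
import math


def rmse(preds, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


def hill_climb(model_preds, targets, n_steps=20):
    """Greedy forward selection (with replacement) of models for an averaged ensemble."""
    names = list(model_preds)
    # Start from the best single model on the validation targets.
    best_name = min(names, key=lambda n: rmse(model_preds[n], targets))
    counts = {n: 0 for n in names}
    counts[best_name] = 1
    ensemble = list(model_preds[best_name])
    best_score = rmse(ensemble, targets)
    for _ in range(n_steps):
        size = sum(counts.values())
        candidate, candidate_score = None, best_score
        for n in names:
            # Blend in one more copy of model n and score the running average.
            trial = [(e * size + p) / (size + 1) for e, p in zip(ensemble, model_preds[n])]
            score = rmse(trial, targets)
            if score < candidate_score:
                candidate, candidate_score = n, score
        if candidate is None:
            break  # no model improves the score any further: stop climbing
        ensemble = [(e * size + p) / (size + 1) for e, p in zip(ensemble, model_preds[candidate])]
        counts[candidate] += 1
        best_score = candidate_score
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()}, best_score


# Hypothetical validation predictions: one model over-predicts, one under-predicts.
targets = [1.0, 2.0, 3.0, 4.0]
preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],
    "model_b": [0.5, 1.5, 2.5, 3.5],
    "model_c": [2.0, 3.0, 4.0, 5.0],
}
weights, score = hill_climb(preds, targets)
print(weights)  # → {'model_a': 0.5, 'model_b': 0.5, 'model_c': 0.0}
print(score)    # → 0.0
```

Because weights are only accepted when they improve the validation score, a weak model can end up with zero weight; in practice the score should be checked on held-out data to avoid overfitting the weights.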
Dive into the winning tricks of Kaggle competitions:
AI ethics is another area of concern, with many people in society having mixed feelings about the use and implementation of AI systems. Organizations are looking into the ethical principles of AI and creating new strategies to ensure that they can not only understand their AI systems but also monitor and mitigate their risks.
AI ethics is not only an academic study but a societal one: many opinions matter for understanding the world of AI and how it can still be used whilst safeguarding society’s values. We have seen organizations continuously audit their AI systems as they adopt ethics-by-design.
Learn more about the challenges around AI and the impact it is having on society:
- Exploring the Landscape of AI Ethics
- Developments in AI and Ethics in the Past 2 Years
- Ethical AI Is All We Need!!
The Kaggle team has created a unique report that allows its community to express their opinions on, and experience of, the world of AI and how it has changed over the last 2 years. Let us know if there was a particular section or essay you found very interesting!
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice or tutorials and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence benefits, or can benefit, the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills whilst helping guide others.