Top 10 Datasets in Machine Learning

Blog Post


Date Published


Everything can be represented by data, making it an essential part of both computing and Machine Learning. How well a Machine Learning model performs depends heavily on the datasets it is trained on. But how do you determine which dataset is best for your project? Here’s a list of the top 10 free and easily accessible online Machine Learning datasets.

MSleep Dataset

Contains sleep times and weights for a set of mammals with 11 variables.

  1. name: common name
  2. genus: taxonomic rank
  3. vore: carnivore, omnivore, herbivore or insectivore
  4. order: taxonomic rank
  5. conservation: conservation status of the mammal
  6. sleep_total: total amount of sleep, measured in hours
  7. sleep_rem: REM sleep, measured in hours
  8. sleep_cycle: length of sleep cycle, measured in hours
  9. awake: time spent awake, measured in hours
  10. brainwt: brain weight in kilograms
  11. bodywt: body weight in kilograms

Car Seat Dataset

Consists of car seat sales at 400 different store locations, with 11 variables. Sales, Income, Advertising, and Population are recorded in thousands; the remaining variables are not.

  1. Sales: unit sales at each location
  2. CompPrice: Price charged by a competitor at each location
  3. Income: Community income level measured in thousands of dollars
  4. Advertising: Local advertising budget for the company at each location
  5. Population: Population size in region
  6. Price: Price the company charges for car seats at each site
  7. ShelveLoc: Bad, Good, or Medium, indicating the quality of the shelving location for the car seats at each location
  8. Age: Average age of the local population
  9. Education: Education level at each location
  10. Urban: Yes/No to indicate if the store is in an urban or rural location
  11. US: Yes/No to indicate if the store is in the US or not

Diamond Dataset

Contains information regarding almost 54,000 diamonds with ten variables.

  1. Carat: the weight of the diamond
  2. Cut: quality of the diamond measured from Fair, Good, Very Good, Premium, Ideal
  3. Color: color of the diamond measured from D, the best, to J, the worst
  4. Clarity: how clear the diamond is, measured on the following scale (worst to best): I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF
  5. Depth: total depth percentage, calculated using the x, y, and z variables
  6. Table: width of the top of the diamond relative to its widest point
  7. Price: amount in USD
  8. X: length in millimeters
  9. Y: width in millimeters
  10. Z: depth in millimeters
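
The depth percentage in item 5 can be reproduced from the three measurements. The sketch below assumes the commonly documented definition for this dataset, in which z is the stone's depth in millimeters and the percentage is z divided by the mean of x and y. The function name and the sample measurements are made up for illustration.

```python
def depth_percentage(x_mm: float, y_mm: float, z_mm: float) -> float:
    """Depth (z) as a percentage of the mean of length (x) and width (y)."""
    mean_diameter = (x_mm + y_mm) / 2
    if mean_diameter == 0:
        raise ValueError("x and y cannot both be zero")
    return 100 * z_mm / mean_diameter


# Hypothetical stone: 4.0 mm long, 4.0 mm wide, 2.4 mm deep
example_depth = depth_percentage(4.0, 4.0, 2.4)
print(round(example_depth, 1))
```

A check like this is a quick way to spot rows where the recorded depth does not agree with the recorded dimensions.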

Free Spoken Digit Dataset

This dataset allows you to contribute your own recordings of spoken digits, as long as they are 8 kHz WAV files in English. The recordings are trimmed at the beginning and end so that they contain minimal silence. As an open dataset, it is expected to grow over time as contributions trickle in. It targets spoken digit recognition problems and, at the time of this post, consists of six speakers and 3,000 recordings (50 of each digit per speaker).

The Wikipedia Corpus

Wikipedia is not only a resource for students with research papers, but also a very useful tool for Natural Language Processing researchers.  This dataset consists of nearly 1.9 billion words from more than 4 million Wikipedia articles that can be searched by words, phrases, and paragraphs. 

Face Image Dataset

Subjects of this dataset are mostly male and female adults between 18 and 20 years old, from various ethnicities. The objective of this dataset is to help distinguish not only between genders but also between emotions. As part of the dataset, images with a resolution of 180×200 pixels were taken of the male and female subjects. In total, nearly 400 individuals participated, with 20 images taken per subject. Anyone can download this dataset as a zip file.

Spam SMS Classifier

Ham or spam? This dataset helps predict whether a text message is ham (legitimate) or spam. Consisting of more than 5,500 messages in English, it is beginner-friendly and simple to comprehend. The file uses a comma-separated value format with one message per line and two columns: v1, the label (ham or spam), and v2, the raw text.
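
As a rough illustration of that two-column layout, the sketch below reads label and text pairs with Python's standard csv module. Only the column names v1 and v2 come from the description above; the three sample rows and the helper name load_messages are invented stand-ins for the real file.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the downloaded file; only the
# v1/v2 column names are taken from the dataset description.
SAMPLE = io.StringIO(
    "v1,v2\n"
    'ham,"Ok, see you at 5"\n'
    'spam,"WINNER! Claim your free prize now"\n'
    "ham,Running late\n"
)


def load_messages(handle):
    """Return (label, text) pairs from a CSV with v1 and v2 columns."""
    reader = csv.DictReader(handle)
    return [(row["v1"], row["v2"]) for row in reader]


messages = load_messages(SAMPLE)
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

With a real download you would pass an opened file instead of SAMPLE; depending on the copy you have, you may need to pass an explicit encoding to open().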

Fashion MNIST Dataset

Like the Spam SMS Classifier dataset, this dataset is beginner-friendly and useful for understanding deep learning pattern recognition techniques on real-world data. With 70,000 grayscale images of 28×28 pixels, this set was created as a replacement for the original MNIST dataset and a new benchmark for algorithms. Each pixel has an integer value from 0 to 255 associated with it, with higher numbers representing darker pixels.
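
Because every pixel is an integer from 0 to 255, a common first preprocessing step is to rescale the values to the range 0 to 1 before training. The sketch below shows that step on a made-up 2×2 patch rather than a real 28×28 image; the function name is hypothetical and the step itself is a general convention, not something this dataset requires.

```python
def scale_pixels(image):
    """Map integer pixel values in 0..255 to floats in 0.0..1.0."""
    return [[value / 255 for value in row] for row in image]


# Hypothetical 2x2 patch, not taken from the dataset
patch = [[0, 51], [204, 255]]
scaled = scale_pixels(patch)
print(scaled)
```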

Breast Cancer Wisconsin (Diagnostic) Dataset

Used often to help with classification problems in machine learning, this dataset describes the cell nuclei characteristics present in the image with the following real-valued features:

  1. Radius 
  2. Texture (standard deviation of gray-scale values)
  3. Perimeter 
  4. Area
  5. Smoothness 
  6. Compactness  (perimeter^2 / area – 1.0) 
  7. Concavity
  8. Concave points
  9. Symmetry
  10. Fractal dimension 
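
To make the compactness formula in item 6 concrete, here is a small sketch that applies it to two simple shapes. The function name and the shapes are illustrative assumptions; the dataset itself stores precomputed feature values measured from the cell nuclei.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# A circle has the lowest possible value: perimeter^2 / area is 4*pi
# for any radius, so compactness is 4*pi - 1.
radius = 1.0
circle_value = compactness(2 * math.pi * radius, math.pi * radius ** 2)

# A 2x2 square (perimeter 8, area 4) scores higher than the circle.
square_value = compactness(8.0, 4.0)

print(circle_value, square_value)
```

The less circular the outline, the larger the value.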

Iris Flower Dataset

Used by R.A. Fisher, statistical science genius, in 1936, this dataset can still be used to build simple machine learning projects and is beginner-friendly. The dataset is small and consists of four attributes, all measured in centimeters (sepal length, sepal width, petal length and petal width), with three classes: Virginica, Setosa and Versicolor.

Creating datasets for machine learning is a laborious human task, but luckily there are several public datasets available.  The datasets mentioned above are user-friendly, but rest assured there are plenty of other accessible datasets available for use, regardless of your project or use case.  

Featured Post

7 Types of Artificial Intelligence


Coined by computer and cognitive scientist John McCarthy in 1956, the term Artificial Intelligence came into being more than a decade before the United States sent a man to the moon. AI is not a thing of the 21st Century; it is a thing of the 20th Century. Despite this, the term still seems intimidating to most, something reserved for computer scientists and the tech-savvy.

The goal of AI is to emulate human behavior; the more advanced the AI, the more human-like in its functionality. Based on this definition, seven segments of AI have been identified to this day. Let’s take a deeper dive into the different types of AI out there:

1. Reactive Machines

As the oldest form of AI, reactive machines have the most limited capabilities. This AI can react to stimuli but is unable to learn the way humans do because it has no memory-based functionality. It cannot rely on memory or past experiences to inform present or future decisions. IBM’s Deep Blue, a chess computer, is an example of this.

2. Limited Memory

Building on reactive AI, limited memory AI can learn from historical data to inform its decisions. Trained by large volumes of data that is then stored in their memory and used as a reference to inform decisions, these AI systems are currently the most common. Image recognition, chatbots, and self-driving vehicles are limited memory AI machines.
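
The difference between the first two types can be sketched with a toy example. Both classes below are hypothetical thermostat-style agents invented for illustration, not real AI systems: the reactive one applies a fixed rule to the current input only, while the limited memory one stores past observations and uses them as its reference.

```python
class ReactiveAgent:
    """Maps the current stimulus to an action with a fixed rule; keeps no history."""

    def act(self, temperature: float) -> str:
        return "cool" if temperature > 25.0 else "idle"


class LimitedMemoryAgent:
    """Stores past observations and uses them as a reference for decisions."""

    def __init__(self):
        self.history = []

    def observe(self, temperature: float) -> None:
        self.history.append(temperature)

    def act(self, temperature: float) -> str:
        if not self.history:
            return "idle"
        # Compare the current reading with the average of what was seen before
        baseline = sum(self.history) / len(self.history)
        return "cool" if temperature > baseline else "idle"


reactive = ReactiveAgent()
learner = LimitedMemoryAgent()
for past_reading in (18.0, 20.0):
    learner.observe(past_reading)

reactive_action = reactive.act(22.0)  # the fixed rule ignores any history
learner_action = learner.act(22.0)    # 22.0 is above the stored average of 19.0
print(reactive_action, learner_action)
```

The same reading produces different actions only because the second agent has historical data to compare it against.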

3. Theory of Mind

Theory of Mind AI is still in a developmental phase and is not clearly defined. A work in progress for researchers, the goal of Theory of Mind AI is to be able to understand humans. That includes being able to discern human needs, emotions, beliefs, and thought processes. Although very exciting, Theory of Mind requires additional development in various branches of AI to become an everyday reality.

4. Self-Awareness

Purely hypothetical for now, self-aware AI would essentially be an artificial counterpart of the human brain, conscious of its own existence. Much like humans, this AI would have its own set of beliefs, needs, emotions, and desires. This stage of AI is also the most controversial, in large part because popular culture portrays this point of AI development as the downfall of human civilization and the rise of robots. This fear is deeply rooted in the belief that self-aware AI not only challenges but threatens human intelligence. Nonetheless, this work is hypothetical for now and decades away from becoming tangible.

5. Artificial Narrow Intelligence (ANI)

Sometimes referred to as Weak AI, all existing AI falls under this term. Overlapping with reactive and limited memory AI, ANI systems can only perform one task autonomously using human-like capabilities. This AI system is limited to its default programming and nothing else. ANI systems cannot interpret anything beyond their code.

6. Artificial General Intelligence (AGI)

Also known as Strong AI or Human-Level AI, these systems can mimic how humans learn, plan, and perceive. We don’t have any systems like this yet, but AGI will have the general mental capacity to independently generalize and form connections, cutting down on training time with its ability to replicate how we reason and understand the world around us.

7. Artificial Superintelligence (ASI)

Superintelligence! That is the end goal, the grand prize. These systems build on all previous forms of AI and mark the pinnacle of AI research. Similar to self-aware AI, these systems are synonymous with the futuristic world that we’ve often envisioned. Surpassing AGI with superior memory and the ability to process and analyze data for faster, more informed decisions, these systems mark the beginning of a singularity. Defined as the moment when the capabilities and intelligence of technology surpass human abilities, the singularity symbolizes the end of the era of man and the beginning of the era of machine.

While many AI applications have yet to be perfected, much less discovered, AI’s role in our future is guaranteed. Perhaps the term intimidates because of the dismal worlds that popular culture blames on AI, or because its unmapped future seems so daunting. Regardless, the reality is that we’ve all unknowingly used some form of artificial intelligence in our day-to-day lives, and maybe during our lifetime we’ll get to interact with AGI and ASI systems, or maybe not. One thing is certain when looking at the evolution of AI: there are a lot of advancements to make and uncharted territory to cross first.



6 Ways to Improve Your eCommerce Customer Service in 2021

61% of consumers now view customer service as “very important”. And by 2022, customer service is predicted to take over price and product as the main brand differentiator.

An excellent customer experience will help build trust and encourage repeat purchases. However, a negative customer experience can result in a loss of sales, damaged reputation and negative online reviews.

60% of consumers said they stopped doing business with a brand due to a poor customer experience.

It’s estimated that it costs five times more to attract new customers than to retain existing ones, so keeping existing customers happy is crucial if you want to be successful long-term.

Key Elements of a Great Customer Experience in 2021

What does excellent customer service mean in 2021? It means being able to respond to customer queries efficiently and effectively using email, Live Chat and social media.

These are the top six things that make up a great online customer experience.

  1. Getting the issue resolved quickly.
  2. Getting the issue resolved in one interaction.
  3. Dealing with a friendly customer service representative.
  4. Being able to follow up with the same person if necessary.
  5. Being able to record, print, or save a copy of the interaction.
  6. Having some sort of follow-up afterward to ensure you’re satisfied.

Rising Customer Expectations

Customer expectations for customer service continue to grow. We live in an age of convenience where we want it all and we want it instantly!

According to a recent survey by HubSpot, 90% of customers rate an immediate response as “important” or “very important”.

Customer service and the customer experience have become key differentiators for brands and eCommerce businesses.

With that in mind, let’s take a look at some strategies to boost your eCommerce customer service for new and existing businesses.

1. Develop a Multichannel Strategy

A study by the Aberdeen Group found that companies with a well-defined omnichannel customer experience have a customer retention rate of 91%.

Developing a multichannel strategy means you need to know where your customers are, whether that’s Facebook, Instagram, WhatsApp, or Live Chat, and be there.

When you establish your channels for contact, let your customers know that you can help them through these channels. You should aim to provide a consistently high level of customer service across all your channels.

Your goal as an eCommerce business is to take customer service to your customers and make them happy.

2. Ensure Customers Don’t Have to Repeat Themselves

According to HubSpot Research, 66% of us rate waiting on hold or repeating information to different representatives as the most frustrating aspect of getting customer service help.

It’s probably happened to us all at one time or another, and it can be very frustrating and a waste of time for both parties.

Making sure customers don’t have to repeat themselves or wait days for a response is an important aspect of a good customer service experience.

3. Track Your Customer Satisfaction Score

If you want to improve something, you need to measure it.

Measuring customer satisfaction will help you see if the interaction with the customer was a successful one or not.

There are many different methods, each with pros and cons, but the most popular, thanks to its simplicity, is the Customer Satisfaction Score, or CSAT for short.

Customer satisfaction surveys commonly use the CSAT to measure the consumers’ satisfaction with the product or service. They often include the question “How satisfied were you with your experience today?” and offer a scale of 1-10 or 1-5.
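
One common way to turn those survey answers into a single number is to report the percentage of respondents who chose one of the top ratings. The sketch below assumes a 1-5 scale where 4 and 5 count as satisfied; that threshold, the function name, and the sample responses are illustrative assumptions, so adjust them to match how your own survey is defined.

```python
def csat_score(ratings, satisfied_threshold=4):
    """Percentage of responses at or above the 'satisfied' threshold."""
    if not ratings:
        raise ValueError("at least one rating is required")
    satisfied = sum(1 for rating in ratings if rating >= satisfied_threshold)
    return 100 * satisfied / len(ratings)


# Made-up answers to "How satisfied were you with your experience today?"
responses = [5, 4, 3, 5, 2, 4, 5, 1, 4, 5]
score = csat_score(responses)
print(f"CSAT: {score:.0f}%")
```

Tracking this figure over time, rather than looking at one survey in isolation, makes it easier to see whether changes to your support process are helping.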

Getting customer feedback can give you invaluable insight into what is working or not.

4. Implement an Automated Chat

Chatbots help to increase sales, with studies suggesting that having chatbots on your site can drive 3-5 times more conversions.

The messaging software allows customers (in most cases) to get an answer right away rather than wait for 24 hours or more for someone to respond to an email. Quicker response times help to improve customer satisfaction and loyalty.

It’s very convenient and less costly than phone support or live chat, as the customer service is fully automated.

5. Use Help Desk Software

Having the right software can be just as important as, or more important than, having the right team or processes.

Customer service tools like eDesk, when used correctly, can be crucial for retaining customers and keeping them satisfied.

Help desk software helps customer service representatives manage customer inquiries better and respond to them faster.

It can integrate with social media platforms and contact forms on your website, so everything is stored in one central location.

6. Make Key Information Available Online

If you’re thinking of making a purchase online, one of the most frustrating things is being unable to find the information you want.

To avoid this scenario, ensure your eCommerce website has an FAQ section and a knowledge base containing all the relevant information that the consumer requires. For example, information on exchange and returns should be easy to find.

Having a knowledge base will not only help to cut down on customer support requests, but it is also good for SEO, so it’s a win-win.

Final Thoughts

To grow your eCommerce business, you need to retain current customers and attract new ones by consistently delivering an excellent customer experience. Sam Walton once said, “The goal as a company is to have customer service that is not just the best, but legendary”.

We hope that this post has given you some ideas that you can implement in your business to improve your eCommerce customer service.


How Can Deep Learning Change the Evolution of Machine Learning


Getting deep in the development of Machine Learning.

In the past few years, we have seen an incredible shift toward new technologies. Big companies have reworked their growth strategies and turned to technologies such as artificial intelligence, deep learning, and machine learning. With the increased attention around these technologies in recent years, they have been praised for delivering promising innovations across different business functions.

Machine learning and deep learning are both forms of AI, but each has unique traits and uses in terms of benefiting the end user and delivering services. Machine learning had a long track record before deep learning, and researchers and service providers have mostly used ML algorithms to build a variety of models for statistical analysis, speech processing, risk prediction, and other applications.

Machine learning gives a computer the ability to learn and train itself so that it becomes an effective tool in a new data landscape. Over the years, its ability to evaluate complex, sophisticated, and large datasets has advanced significantly.

Arthur Samuel wrote one of the first ML programs in 1952 and later, in 1959, coined the phrase machine learning. After joining IBM’s Poughkeepsie Laboratory, Samuel built the first computer learning programs, which were developed to play the game of checkers. Each time checkers was played, the computer got better, fixing its mistakes and finding better ways to win from that data. This automatic learning is one of the first examples of machine learning.

Today, the technology has become an integral part of processing data. Compared with deep learning, the main difference is that machine learning needs manual intervention to select which features to process, whereas deep learning learns them on its own. Many experts believe that deep learning has increased interest in AI and stimulated the development of improved tools, processes, and infrastructure for all kinds of machine learning. Thanks to the results it has achieved in applications such as computer vision, speech recognition, natural language understanding (NLU), and threat detection, deep learning is generating more and more buzz among businesses.

Startups often see deep learning as an advanced, sophisticated subdivision of AI with predictive capabilities inspired by the brain’s ability to learn. The technology has the potential to identify objects within milliseconds and with a precision similar to that of the human brain.

However, despite the transformational impact of deep learning on business applications, software engineers still use traditional statistical machine learning algorithms to capture information about training data. Most machine learning applications in corporations do not rely on neural networks and instead capitalize on traditional machine learning models. Linear/logistic regression, random forests and boosted decision trees are the most common models. These are the ones behind friend suggestions, ad targeting, user interest prediction, supply/demand simulation, and search result ranking, among other services technology companies offer.
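
To give a sense of how lightweight these traditional models can be, here is a minimal sketch of logistic regression trained with plain gradient descent, using only the Python standard library. The single standardized feature, the labels, the hyperparameters, and the function names are all made up for illustration; a production system would normally rely on an established library rather than hand-written code like this.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 when the estimated probability is at least 0.5, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, already standardized feature (e.g. a user activity score)
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
print(predict(weights, bias, [-1.2]), predict(weights, bias, [1.2]))
```

The whole model is one weight per feature plus a bias, which is part of why such models remain easy to train, inspect, and deploy.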

Although deep neural networks are not always utilized directly, they are indirectly driving fundamental changes in the field of machine learning. For example, the predictive capabilities of deep learning have prompted data science professionals to consider new ways of framing problems that arise in other types of machine learning.





