🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated Dec 2, 2024 - Python
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets you run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Quizzes & Assignment Solutions for Google Data Analytics Professional Certificate on Coursera. Also includes a few side resources that I found helpful.
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Wrangler Transform: A DMD system for transforming Big Data
XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis
An SQL data cleaning project
This is a binary classification problem related to Autistic Spectrum Disorder (ASD) screening in adults. Given some attributes of a person, my model can predict whether that person is likely to have ASD, using different supervised learning techniques and a multi-layer perceptron.
Java DSL for (online) deduplication
This repo was created to share the files required or discussed during the online internship training program on Data Science Using Python in May 2021.
Comprehensive Power BI dashboards showcasing insights on Call Centre Trends, Customer Retention, and Diversity & Inclusion to drive business impact.
Quick and dirty data mining made easier in Sublime Text
Predict if a driver will file an insurance claim next year. (Kaggle Competition)
Data cleansing and clustering with Vector Quantization and Adaptive Resonance Theory
This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js
Product Rationalization of Pro Bikes Inc using Power BI
Data structures project in C++11 that uses custom Vector and String structures with move semantics (Rule of Five)
Data cleaning tool.