INTRODUCTION
The following are essential Python libraries for data science:
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Statsmodels
- TensorFlow
- Keras
- PyTorch
 
NumPy
Description:
NumPy (Numerical Python) is a fundamental package for scientific computing with Python. It provides support for arrays, matrices, and many mathematical functions to operate on these arrays.
Key Features:
Efficient array computation.
Mathematical functions for linear algebra, Fourier transform, and random number generation.
Integration with C/C++ and Fortran code.
Necessary Modules:
import numpy as np
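As a small illustration of the features listed above, the sketch below applies a vectorized operation to an array, solves a linear system, and draws reproducible random numbers. The variable names, the matrix values, and the seed are arbitrary choices made for this example.

```python
import numpy as np

# Vectorized arithmetic: the operation applies to every element at once.
values = np.array([1.0, 2.0, 3.0, 4.0])
squared = values ** 2

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random number generation; the seed (an arbitrary choice) makes runs repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(squared)
print(x)
print(samples.shape)  # (5,)
```

Vectorized expressions such as values ** 2 avoid explicit Python loops, which is where most of NumPy's speed advantage comes from.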
Pandas
Description: Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and data manipulation library built on top of the Python programming language.
Key Features:
Data manipulation and data analysis.
Data structures like Series and DataFrame.
Time-series functionality.
Necessary Modules:
import pandas as pd
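The time-series functionality mentioned above can be sketched briefly. The example assumes a made-up daily series (the dates and values are purely illustrative) and shows date-based slicing and a rolling mean.

```python
import pandas as pd

# A daily time series; the dates and values are made up for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 9, 15, 14, 20], index=dates)

# Label-based slicing with date strings (both endpoints are included).
first_days = sales.loc["2024-01-02":"2024-01-04"]

# Rolling mean over a 3-day window; the first two entries are NaN
# because a full window is not yet available.
rolling_mean = sales.rolling(window=3).mean()

print(first_days)
print(rolling_mean)
```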
Matplotlib
Description: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack.
Key Features:
Supports various plots like line plots, bar plots, scatter plots, and histograms.
Customizable plots.
Necessary Modules:
import matplotlib.pyplot as plt
Scikit-learn
Description: Scikit-learn is a free machine learning library for Python. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN.
Key Features:
Simple and efficient tools for data mining and data analysis.
Built on NumPy, SciPy, and Matplotlib.
Open source and commercially usable under the BSD license.
Necessary Modules:
import sklearn
Statsmodels
Description: Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and exploring data. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and for each estimator.
Key Features:
Regression models.
Time-series analysis.
Nonparametric methods.
Necessary Modules:
import statsmodels.api as sm
TensorFlow
Description: TensorFlow is an open-source machine learning library developed by Google. It is used for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
Key Features:
Highly efficient computation.
Support for deep learning and machine learning.
Necessary Modules:
import tensorflow as tf
Keras
Description: Keras is an open-source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.
Key Features:
User-friendly API.
Modular and extensible.
Support for convolutional and recurrent networks.
Necessary Modules:
import keras
PyTorch
Description: PyTorch is an open-source machine learning library developed by Facebook. It is based on the Torch library and is primarily used for applications such as computer vision and natural language processing.
Key Features:
Support for dynamic computation graphs.
Highly efficient tensor computation.
Necessary Modules:
import torch
5 GETTING STARTED WITH PANDAS
While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneously typed numerical array data. The two main data structures in Pandas are the Series and the DataFrame.
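To make the contrast between homogeneous and heterogeneous data concrete, here is a minimal sketch. The column names and values are invented for illustration; the point is only that a NumPy array carries one dtype while a DataFrame keeps one dtype per column.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixed numeric inputs are coerced
# to one common type (here the integers are upcast to float64).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A DataFrame keeps a separate dtype per column.
# The column names and values are made up for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, 45, 28],
    "score": [88.5, 92.0, 79.5],
})
print(df.dtypes)

# Each column of a DataFrame is itself a Series.
ages = df["age"]
print(type(ages).__name__)  # Series
```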
Series
In Pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is similar to a column in an Excel spreadsheet or a database table. Each element in a Series has an index, which is used to label the data.
The syntax for creating a Pandas Series is:
pandas.Series(data = None, index = None, dtype = None, name = None, copy = False, fastpath = False)
Series Parameters:
- data (array-like, Iterable, dict, or scalar value, optional): The data for the Series. This can be a list, NumPy array, dictionary, or scalar value. If data is a dictionary, the keys will be used as the index. If data is a scalar value, it is repeated to match the length of the index, so an index should normally be provided.
- index (array-like, optional): The index (row labels) for the Series. Values must be hashable and the index must have the same length as data; labels do not have to be unique. If not provided, a default integer index is used (0, 1, 2, …, n-1).
- dtype (str or numpy.dtype, optional): Data type for the output Series. If not specified, the data type will be inferred.
- name (str, optional): The name to give to the Series.
- copy (bool, default False): Copy the input data. This only affects Series or one-dimensional ndarray input.
- fastpath (bool, default False): This is an internal parameter and should generally not be used.
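The interaction between data and index described above can be sketched as follows. The labels and values are hypothetical; the example shows a scalar being repeated across an index, and a dictionary being matched against an explicit index.

```python
import pandas as pd

# Scalar data: the index determines the length, and the value is repeated.
constant = pd.Series(7, index=["a", "b", "c"])

# Dict data with an explicit index: values are matched by key, the order
# follows the index, and labels missing from the dict become NaN.
prices = {"apple": 1.2, "pear": 0.8}
selected = pd.Series(prices, index=["pear", "apple", "kiwi"], name="price")

print(constant)
print(selected)
```

Because NaN is a floating-point value, a Series with missing entries such as selected ends up with a float dtype even if the original values were integers.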
 
Creating a Series:
Here's a simple example to create a Pandas Series:
- Import Pandas:
import pandas as pd
- Create a Series from a list:
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64
In this example, the Series named series contains integers with the default integer index ranging from 0 to 4.
Creating a Series with Custom Index:
You can also specify a custom index for the Series:
data = [10, 20, 30, 40, 50]
info = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=info)
print(series)
Output:
a    10
b    20
c    30
d    40
e    50
dtype: int64
Creating a Series from a Dictionary:
A Series can also be created from a dictionary, where the keys become the index:
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)
print(series)
Output:
a    10
b    20
c    30
d    40
e    50
dtype: int64
Creating a Series with a Specified Data Type:
data = [1, 2, 3, 4, 5]
series = pd.Series(data, dtype='float64')
print(series)
Output:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
Creating a Series with a Name:
data = [1, 2, 3, 4, 5]
series = pd.Series(data, name='my_series')
print(series)
print(series.name)
Output:
0    1
1    2
2    3
3    4
4    5
Name: my_series, dtype: int64
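my_series
Accessing Data in a Series:
You can access data in a Series by index label or by integer position. The example below uses the Series with the custom index 'a' to 'e' created earlier. Recent pandas versions deprecate series[2] for positional access on a labeled index, so .iloc is used instead:
# Recreate the Series with the custom index
series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
# Using the index label
print(series['c'])      # Output: 30
# Using the position
print(series.iloc[2])   # Output: 30
Basic Operations on Series: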
Pandas Series supports various operations like arithmetic operations, applying functions, filtering, etc.
- Arithmetic Operations:
series2 = series + 5
print(series2)
Output:
a    15
b    25
c    35
d    45
e    55
dtype: int64
- Applying Functions:
series3 = series.apply(lambda x: x * 2)
print(series3)
Output:
a     20
b     40
c     60
d     80
e    100
dtype: int64
- Filtering:
series4 = series[series > 30]
print(series4)
Output:
d    40
e    50
dtype: int64
Accessing Array Representation and Index of a Pandas Series:
In Pandas, you can access the array representation and the index object of a Series using its .array and .index attributes, respectively. These attributes provide useful ways to work with the underlying data and the labels.
Array Representation
The .array attribute returns the underlying data of the Series as a Pandas ExtensionArray, which is an abstraction over the actual data array (e.g., a NumPy array or another array-like object).
Index Object
The .index attribute returns the index (labels) of the Series, which can be used to access or modify the index labels.
# Accessing the array
obj = pd.Series([4, 7, -5, 3])
obj.array
Output:
<NumpyExtensionArray>
[4, 7, -5, 3]
Length: 4, dtype: int64
# Accessing the index
obj = pd.Series([4, 7, -5, 3])
obj.index
Output:
RangeIndex(start=0, stop=4, step=1)
Another Example:
import pandas as pd
# Creating a Series
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
# Accessing the array representation
array_representation = series.array
print("Array Representation:")
print(array_representation)
# Accessing the index object
index_object = series.index
print("\nIndex Object:")
print(index_object)
Output:
Array Representation:
<NumpyExtensionArray>
[10, 20, 30, 40, 50]
Length: 5, dtype: int64

Index Object:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
The .array attribute allows you to work directly with the data values, while the .index attribute gives you access to the index labels, both of which can be crucial for various data operations. Older pandas versions print <PandasArray> instead of <NumpyExtensionArray>. Note that attributes are accessed without parentheses, e.g. obj.array and obj.index.
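The earlier remark that .index can be used to modify labels can be sketched as follows. This is a minimal illustration with made-up labels: the whole index can be replaced by assignment, while individual labels are changed through rename(), because an Index object itself is immutable. The to_numpy() call is included as one common way to get a plain NumPy array from the values.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Replace all labels at once by assigning a new sequence of the same length.
series.index = ["x", "y", "z"]

# An Index is immutable, so a single label cannot be changed in place.
try:
    series.index[0] = "w"
except TypeError as exc:
    print("Index is immutable:", exc)

# rename() returns a new Series with the selected labels changed.
renamed = series.rename({"x": "w"})

# to_numpy() converts the values to a plain NumPy array.
values = series.to_numpy()

print(list(series.index))   # ['x', 'y', 'z']
print(list(renamed.index))  # ['w', 'y', 'z']
print(values)               # [10 20 30]
```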