Week 8 - NumPy (Part II) - 2D NumPy Arrays, Basic Statistics

Liaw Bei Le · May 16, 2021

2D NumPy Arrays:

  1. type(np_name)
  2. 2D NumPy Array
  3. np_name.shape
  4. Subsetting using index positions

Basic Statistics:

  1. np.mean( )
  2. np.median( )
  3. np.corrcoef( )
  4. np.std( )
  5. Generate data using np.random.normal( ) and np.column_stack( )

There’s also np.sum( ), np.sort( ) and more…
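
As a small sketch of those two extra functions, here they are on the height values used later in this post. Rounding the sum is my own choice, only to keep the printed number tidy.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])

# np.sum adds up every value in the array
total = np.sum(np_height)
print(np.round(total, 2))        # 8.8

# np.sort returns a sorted copy; the original array keeps its order
sorted_height = np.sort(np_height)
print(sorted_height)             # [1.68 1.71 1.73 1.79 1.89]
print(np_height[0])              # 1.73
```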

Note: A NumPy array is homogeneous, so all values in it must have the same data type.
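
A quick sketch of what this means in practice: when a list mixes types, NumPy converts every item to one common type. The variable names `mixed` and `numbers` are made up for illustration.

```python
import numpy as np

# Mixing floats and a string: every item is converted to a string
mixed = np.array([1.73, "tall", 65.0])
print(mixed.dtype.kind)      # U  (unicode string)

# Mixing integers and a float: every item becomes a float
numbers = np.array([1, 2, 3.5])
print(numbers.dtype)         # float64
```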


What I learnt:

2D NumPy Arrays:

Example: we start with 2 separate arrays

import numpy as np
np_height=np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight=np.array([65.0, 59.2, 63.4, 88.5, 65.23])
  • type( ) shows the type of an object. Here ndarray means N-dimensional array.
    type(np_height)
    type(np_weight)
    

    Output:

    numpy.ndarray
    
  • 2D NumPy Array
    # the rows go inside one outer list, so np.array receives a list of lists
    np_2d=np.array([[1.73, 1.68, 1.71, 1.89, 1.79],[65.0, 59.2, 63.4, 88.5, 65.23]])
    

    Output:

    array([[1.73, 1.68, 1.71, 1.89, 1.79],[65.0, 59.2, 63.4, 88.5, 65.23]])
    
  • .shape shows the number of rows and columns (it is an attribute, so no parentheses)
    np_2d.shape
    

    Output:

    # 2 rows and 5 columns in the 2D NumPy array
    (2, 5)
    
  • Subsetting using index positions to generate values from specified rows and columns
    • the item at index 0 of this 2D array is the first (nested) array
        np_2d[0]
      

      Output:

        array([1.73, 1.68, 1.71, 1.89, 1.79])
      
    • indicate row index, then column index, to generate one value
        '''get the item in row index 0 and column index 2'''
        np_2d[0][2]
      
        '''same item, with the row and column index inside one pair of brackets'''
        np_2d[0,2]
      

      Output:

        1.71
      
    • we can indicate row and column ranges and generate (2d) arrays as output
        '''get the items in both rows at column index 1 and 2 (the end index 3 is excluded)'''
        np_2d[:,1:3]
      

      Output:

        # we get a 2d array with 2 rows and 2 columns as the output
        array([[1.68, 1.71],[59.2, 63.4]])
      
        '''get the items in the second row and all columns'''
        np_2d[1,:]
      

      Output:

        # we get a 1d array with 5 items (the whole second row) as the output
        array([65.0, 59.2, 63.4, 88.5, 65.23])
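
To put the subsetting steps above together, here is a minimal sketch that builds the 2D array from the two 1D arrays and checks the shape of each result. Building `np_2d` from `np_height` and `np_weight` is just one possible way to do it.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.0, 59.2, 63.4, 88.5, 65.23])

# Build the 2D array from the two 1D arrays (one row each)
np_2d = np.array([np_height, np_weight])
print(np_2d.shape)                    # (2, 5)

# Both notations pick the same single value
print(np_2d[0][2] == np_2d[0, 2])     # True

# A slice on both axes keeps the result 2D
print(np_2d[:, 1:3].shape)            # (2, 2)

# A single row index gives back a 1D array
print(np_2d[1, :].shape)              # (5,)
```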
      

Basic Statistics

  • np.mean( ) gets the average value from selected data (np_city is the 2D array of heights and weights generated in the last bullet of this section)
    np.mean(np_city[:,0])
    

    Output (your value will differ slightly, since the data is random):

    1.7472
    
  • np.median( ) gets the median value from selected data

  • np.corrcoef( ) computes correlation coefficients to show how strongly two columns are related
    np.corrcoef(np_city[:,0],np_city[:,1])
    
  • np.std( ) to find standard deviation
    np.std(np_city[:,0])
    
  • Generate data using np.random.normal(distribution mean, distribution standard deviation, number of samples), then join the 1D arrays as columns of one 2D array with np.column_stack( )
    height=np.round(np.random.normal(1.75,0.20,5000),2)
    weight=np.round(np.random.normal(60.32,15,5000),2)
    np_city=np.column_stack((height,weight))
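
Here is a sketch that runs all of the statistics functions above on the generated data in one go. The `np.random.seed(42)` line is my own addition so the numbers repeat between runs. I left out exact outputs for the random values and only noted what they should be close to.

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed, not part of the original example

height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
print(np_city.shape)                  # (5000, 2)

# Mean and median of the height column, both close to 1.75
print(np.mean(np_city[:, 0]))
print(np.median(np_city[:, 0]))

# np.corrcoef returns a 2x2 matrix; the value at [0, 1] is the
# correlation between height and weight
corr = np.corrcoef(np_city[:, 0], np_city[:, 1])
print(corr.shape)                     # (2, 2)
print(corr[0, 1])                     # close to 0, the columns were generated independently

# Standard deviation of the height column, close to 0.20
print(np.std(np_city[:, 0]))
```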
    
