Investigating Blob Store and Repository Size and Space Usage

When working with repositories and blob stores, you may want to have some insight into how the storage space is being used.

This article provides you with a few different ways to find out where repository and blob store space is being consumed.

Listing the Size of File-based Repositories and Blob Stores

For file-based blob stores (not AWS S3), you can run a script that reports the disk space being used by each blob store and repository.

The Groovy script that produces this report can be found here: nx-blob-repo-space-report.groovy

The script can be executed as a task in Nexus Repository Manager. In the Administration pane, select System > Tasks and create a new Execute Script task. Set the Language to groovy and the task frequency to Manual, then paste the contents of nx-blob-repo-space-report.groovy into the Source text box.

When you execute the task, the output within nexus.log will look similar to the following (the directories scanned will differ):

*SYSTEM Script47 - Blob Storage scan STARTED.
*SYSTEM Script47 - Scanning /home/nexus/sonatype-work/nexus3/blobs/default
*SYSTEM Script47 - Scanning /opt/nexus/test2
*SYSTEM Script47 - Scanning /home/nexus/sonatype-work/nexus3/blobs/test1
*SYSTEM Script47 - Blob Storage scan ENDED. Report at /home/nexus/sonatype-work/nexus3/tmp/repoSizes-20181213-104154.json

You should be able to find the generated JSON report at the location provided in the log:

Report at /home/nexus/sonatype-work/nexus3/tmp/repoSizes-20181213-104154.json

(The actual location will vary according to your Nexus configuration.)
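
If you want to pick the report location out of nexus.log programmatically, the following is a minimal Groovy sketch. It assumes the "scan ENDED" line has exactly the format shown above and that the path contains no spaces; extractReportPath is a hypothetical helper name, not part of Nexus Repository Manager.

```groovy
// Log line in the format shown above.
def logLine = '*SYSTEM Script47 - Blob Storage scan ENDED. Report at /home/nexus/sonatype-work/nexus3/tmp/repoSizes-20181213-104154.json'

// Hypothetical helper: return the report path from a "scan ENDED" log line,
// or null when the line does not contain one.
def extractReportPath(String line) {
  def matcher = line =~ /Blob Storage scan ENDED\. Report at (\S+\.json)/
  return matcher.find() ? matcher.group(1) : null
}

def reportPath = extractReportPath(logLine)
println reportPath
```

Lines that do not announce a finished scan return null, so the helper can be applied to every line of the log and the non-null results kept.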

Within the JSON report, there are details of each blob store and each repository that uses the blob store. For example, the output below shows two blob stores, each having a single repository:

{
  "blobstore1": {
    "repositories": {
      "repositoryA": {
        "reclaimableBytes": 0,
        "totalBytes": 4173387
      }
    },
    "totalBlobStoreBytes": 4173387,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  },
  "blobstore2": {
    "repositories": {
      "repositoryB": {
        "reclaimableBytes": 0,
        "totalBytes": 1397598
      }
    },
    "totalBlobStoreBytes": 1397598,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  }
}

For each repository, totalBytes indicates how much space is being used and reclaimableBytes indicates how much space may be reclaimed by running the Compact Blob Store maintenance task.

For each blob store, the repository entries are aggregated into totalBlobStoreBytes and totalReclaimableBytes. totalRepoNameMissingCount shows how many assets within the blob store are associated with a repository that no longer exists.

The report also includes repositories that are empty.
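
To compare repositories across blob stores, the report can be flattened into one row per repository. The following is a minimal Groovy sketch that assumes the report has the structure shown above. The sample text is embedded so the sketch is self-contained; flattenReport is a hypothetical helper name, and in practice you would read the text from the report file named in nexus.log.

```groovy
import groovy.json.JsonSlurper

// Sample report in the format shown above. In practice, read the text from
// the report file named in nexus.log instead of embedding it here.
def reportText = '''
{
  "blobstore1": {
    "repositories": {
      "repositoryA": { "reclaimableBytes": 0, "totalBytes": 4173387 }
    },
    "totalBlobStoreBytes": 4173387,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  },
  "blobstore2": {
    "repositories": {
      "repositoryB": { "reclaimableBytes": 0, "totalBytes": 1397598 }
    },
    "totalBlobStoreBytes": 1397598,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  }
}
'''

// Hypothetical helper: flatten the report into one row per repository,
// sorted so the largest repositories come first.
def flattenReport(String json) {
  def report = new JsonSlurper().parseText(json)
  def rows = []
  report.each { blobStoreName, blobStore ->
    blobStore.repositories.each { repoName, repo ->
      rows << [blobStore: blobStoreName,
               repository: repoName,
               totalBytes: repo.totalBytes as long,
               reclaimableBytes: repo.reclaimableBytes as long]
    }
  }
  return rows.sort { a, b -> b.totalBytes <=> a.totalBytes }
}

def rows = flattenReport(reportText)
rows.each {
  println "${it.blobStore}/${it.repository}: ${it.totalBytes} bytes (${it.reclaimableBytes} reclaimable)"
}
```

Sorting on reclaimableBytes instead of totalBytes would show which repositories are likely to benefit most from the Compact Blob Store task.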

NOTE: If your Nexus Repository Manager instance has a large number of repositories, use "REPOSITORY_WHITELIST" in the nx-blob-repo-space-report.groovy script to reduce the execution time and memory usage of this task.

Finding the Largest Blobs Within a Blob Store

1. Using a Groovy script from the command line.

The Groovy script below finds the largest blobs inside a blob store directory. Run it from the command line, making sure dir_name points to the correct path for your sonatype-work directory. The output lists the blobs larger than min_size (100,000,000 bytes, roughly 100 MB), sorted from largest to smallest.

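// If your Groovy version does not resolve AntBuilder by default,
// add: import groovy.ant.AntBuilder

long min_size = 100000000  // threshold in bytes (roughly 100 MB)
String dir_name = '/opt/Nexus/sonatype-work/nexus3'

// Collect the per-blob .properties files under every blobs directory
def ant = new AntBuilder()
def scanner = ant.fileScanner {
  fileset(dir: dir_name) {
    include(name: '**/blobs/**/*.properties')
    exclude(name: '**/metadata.properties')
    exclude(name: '**/*metrics.properties')
    // The trailing /** excludes the contents of tmp directories as well
    exclude(name: '**/tmp/**')
  }
}

// Map of blob name to size in bytes for blobs above the threshold
def results = [:]
scanner.each { File file ->
  def properties = new Properties()
  file.withInputStream { is ->
    properties.load(is)
  }
  // Skip property files that do not record a size
  String raw_size = properties.getProperty('size')
  if (raw_size == null) {
    return
  }
  long prop_size = raw_size as long
  if (prop_size > min_size) {
    results.put(properties['@BlobStore.blob-name'], prop_size)
  }
}

// Largest blobs first
def sorted = results.sort { a, b -> b.value <=> a.value }

sorted.each { k, v -> println "${k}:${v}" }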
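
To make the per-file step in the script above easier to follow, here is a minimal Groovy sketch of how a single blob .properties file is interpreted. The size and @BlobStore.blob-name keys are the ones the script reads. The @Bucket.repo-name key, the sample values, and the describeBlob helper are assumptions for illustration, so check a properties file in your own blob store before relying on them.

```groovy
// Sample contents of a blob .properties file. The 'size' and
// '@BlobStore.blob-name' keys are the ones the script above reads;
// '@Bucket.repo-name' and all values are assumptions for illustration.
def sample = '''
@BlobStore.blob-name=/org/example/app/1.0/app-1.0.jar
@Bucket.repo-name=repositoryA
size=123456789
'''

// Hypothetical helper: parse properties text and return a map describing
// the blob, or null when the size is missing, not numeric, or not above
// the threshold.
def describeBlob(String text, long minSize) {
  def props = new Properties()
  props.load(new StringReader(text))
  def rawSize = props.getProperty('size')
  if (rawSize == null || !rawSize.isLong()) {
    return null
  }
  long size = rawSize.toLong()
  if (size <= minSize) {
    return null
  }
  return [name: props.getProperty('@BlobStore.blob-name'),
          repo: props.getProperty('@Bucket.repo-name'),
          size: size]
}

def blob = describeBlob(sample, 100000000L)
println blob
```

Returning null for files without a usable size keeps a single malformed properties file from aborting a scan of the whole blob store.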

2. Running a repository manager task.

The Groovy script below can be executed as a task in Nexus Repository Manager. In the Administration pane, select System > Tasks and create a new Execute Script task. Set the Language to groovy and the task frequency to Manual, and use the Source below. When you execute the task, it writes one entry per repository to nexus.log, listing the assets larger than min_size in descending order of size.

import org.sonatype.nexus.repository.storage.StorageFacet
import org.sonatype.nexus.repository.Repository
import org.sonatype.nexus.repository.storage.Asset
import groovy.json.JsonOutput

long min_size = 100000000  // threshold in bytes (roughly 100 MB)

repository.repositoryManager.browse().each { Repository repo ->
  StorageFacet storageFacet = repo.facet(StorageFacet)
  def tx = storageFacet.txSupplier().get()
  // Map of asset name to size in bytes for assets above the threshold
  def results = [:]
  try {
    tx.begin()

    tx.browseAssets(tx.findBucket(repo)).each { Asset asset ->
      if (asset.size() > min_size) {
        results.put(asset.name(), asset.size())
      }
    }
  } finally {
    tx.close()
  }
  // Largest assets first, logged as pretty-printed JSON
  def sorted = results.sort { a, b -> b.value <=> a.value }
  log.info(JsonOutput.prettyPrint(JsonOutput.toJson(sorted)))
}